Aryan V S

Video to Manimation

It is kinda hard to explain what I have in mind. Something like:

Basically, vague-video-and-text-to-manimation: a way to quickly create visualizations from a normal conversation. Sometimes when speaking to someone, I draw a simple "thing" in the air, and that is enough to convey the meaning to them. It's a form of communication humans can do and understand, but imagine being able to turn these "things" into exact visualizations.

LLMs and VLMs in a loop can probably do this already.
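
To make "in a loop" a bit more concrete, here is a rough sketch of what I imagine, not a working system. Everything named here is hypothetical: `write_scene` stands in for an LLM that writes or revises Manim scene code, `render` for whatever turns that code into frames, and `critique` for a VLM that compares the frames against the description. No real model or Manim API is called; the three steps are passed in as plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Critique:
    """Hypothetical verdict from the VLM step."""
    matches: bool   # does the render convey the intended "thing"?
    feedback: str   # what to change if it does not


def refine_manimation(
    description: str,
    write_scene: Callable[[str, Optional[str], Optional[str]], str],
    render: Callable[[str], bytes],
    critique: Callable[[str, bytes], Critique],
    max_rounds: int = 3,
) -> str:
    """Generate scene code, render it, critique it, and retry with feedback.

    All three callables are assumptions standing in for an LLM, a renderer,
    and a VLM. Returns the last scene code produced.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    code: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        # LLM step: first draft when code is None, otherwise a revision.
        code = write_scene(description, code, feedback)
        # Render step: turn the scene code into something a VLM can look at.
        frames = render(code)
        # VLM step: check the render against the original description.
        verdict = critique(description, frames)
        if verdict.matches:
            break
        feedback = verdict.feedback
    return code
```

The loop stops as soon as the critique says the render matches, or after `max_rounds` attempts, in which case it returns the last attempt. The description here is just text; feeding in the video of the gesture itself would mean giving the VLM step (and probably the writing step) the original clip as well.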