ASID-Caption

We build ASID-Caption, a data-and-model suite for fine-grained audiovisual video understanding.

Our goal is to move beyond “one video → one generic caption”. We provide attribute-structured supervision and quality-verified annotations so that models can produce descriptions that are more complete, more controllable, and more temporally consistent, covering both visual content and audio cues.

What we release

Research interests