Abstract: Extending large pre-trained image-text models (e.g., CLIP) to video understanding has advanced significantly. To enable CLIP to perceive dynamic information in ...
Abstract: Text-to-video retrieval is an essential task in multimedia information retrieval, enabling users to search and retrieve videos based on natural language descriptions. In this paper, we ...