Every year, my mother makes us unwrap matching family pajamas on Christmas Eve. It's her favorite tradition, but it's one that takes a lot of planning each year: she has to buy pajamas for 10 people ...
GenPT is the first generative point tracker that addresses the limitations of conventional discriminative models in capturing multi-modality by directly modelling the multi-modality inherent to point ...
Abstract: Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert ...
Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced ...
While methods exist for aligning flow matching models — a popular and effective class of generative models — with human preferences, existing approaches fail to achieve both adaptation efficiency and ...
Deep generative models, including diffusion and flow matching, have shown outstanding performance in synthesizing realistic multi-modal content across images, audio, video, and text. However, the ...
This paper presents FLOAT, an audio-driven talking portrait video generation method based on a flow matching generative model. We shift the generative modeling from the pixel-based latent space to a ...
Multimodal modeling focuses on building systems to understand and generate content across visual and textual formats. These models are designed to interpret visual scenes and produce new images using ...