Aurora Core is a real-time emotion recognition system that leverages both facial expressions (visual data) and vocal cues (audio data) to accurately detect human emotions. By integrating these two ...
Abstract: 3D Visual Grounding (3DVG) involves localizing target objects in 3D point clouds based on natural language. While prior work has made strides using textual descriptions, leveraging spoken ...
Abstract: Community researchers have developed various advanced audio-visual segmentation (AVS) models to accurately segment sound-producing objects. However, existing methods face two key limitations ...