Abstract: Multimodal Emotion Recognition is challenging because of the heterogeneity gap among different modalities. Due to the powerful ability of feature abstraction, Deep Neural Networks (DNNs) ...