Abstract: Visual grounding in remote sensing images (RSVG) aims to detect the specific objects in those images that referring expressions describe. Existing methods typically combine outputs of ...