Abstract: Convolutional Neural Networks (CNNs) and Transformers are two powerful representation learning techniques for visual tracking. Although CNNs can effectively reduce local redundancy via ...