Learning recurrent memory activation networks for visual tracking

Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang

Research output: Contribution to journal › Article › peer-review


Facilitated by deep neural networks, numerous tracking methods have made significant advances. Existing deep trackers mainly utilize independent frames to model the target appearance, while paying less attention to its temporal coherence. In this paper, we propose a recurrent memory activation network (RMAN) to exploit the untapped temporal coherence of the target appearance for visual tracking. We build the RMAN on top of the long short-term memory network (LSTM) with an additional memory activation layer. Specifically, we first use the LSTM to model the temporal changes of the target appearance. Then we selectively activate the memory blocks via the activation layer to produce a temporally coherent representation. The recurrent memory activation layer enriches the target representations from independent frames and reduces the background interference through temporal consistency. The proposed RMAN is fully differentiable and can be optimized end-to-end. To facilitate network training, we propose a temporal coherence loss together with the original binary classification loss. Extensive experimental results on standard benchmarks demonstrate that our method performs favorably against the state-of-the-art approaches.
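
The abstract names three ingredients: a step that selectively activates stored memory blocks, a binary classification loss, and a temporal coherence loss. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The abstract does not give the form of the activation layer or of the coherence loss, so the softmax-weighted read, the squared-difference penalty, the `weight` parameter, and all function names here are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def activate_memory(memory_blocks, query):
    """Assumed stand-in for a memory activation step.

    Scores each stored memory vector by its dot product with the current
    frame's feature vector (query), then returns the softmax-weighted sum.
    """
    scores = [sum(q * m for q, m in zip(query, block)) for block in memory_blocks]
    weights = softmax(scores)
    return [
        sum(w * block[i] for w, block in zip(weights, memory_blocks))
        for i in range(len(query))
    ]


def binary_cross_entropy(prob, label, eps=1e-12):
    """Target-versus-background classification loss for one sample."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def temporal_coherence_penalty(prev_repr, curr_repr):
    """Assumed coherence term: mean squared change between consecutive frames."""
    return sum((a - b) ** 2 for a, b in zip(prev_repr, curr_repr)) / len(curr_repr)


def total_loss(prob, label, prev_repr, curr_repr, weight=0.1):
    """Classification loss plus a weighted coherence term (weight is hypothetical)."""
    return binary_cross_entropy(prob, label) + weight * temporal_coherence_penalty(
        prev_repr, curr_repr
    )
```

In this sketch, representations that change little between consecutive frames incur a small penalty, which is one simple way to encourage the temporal consistency the abstract describes.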

Original language: English
Article number: 9269487
Pages (from-to): 725-738
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Publication status: Published - 2021

Bibliographical note

Funding Information:
Manuscript received July 18, 2019; revised September 7, 2020; accepted November 2, 2020. Date of publication November 24, 2020; date of current version December 4, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 62076034. The work of Chao Ma was supported in part by NSFC under Grant 60906119 and in part by the Shanghai Pujiang Program. The work of Ming-Hsuan Yang was supported by the National Science Foundation CAREER under Grant 1149783. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Riccardo Leonardi. (Corresponding author: Ming-Hsuan Yang.) Shi Pu is with Tencent AI Lab, Beijing 100193, China (e-mail: pushi_519200@qq.com).

Publisher Copyright:
© 1992-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design

