=== Funding:
French-Canadian ANR/NSERC project “DeepVision” (2016-2020)
=== Supervisors:
Christian Wolf http://liris.cnrs.fr/christian.wolf
Julien Mille http://www.rfai.li.univ-tours.fr/PagesPerso/jmille/
=== Subject:
Human perception focuses selectively on parts of a scene, acquiring information at specific places and times. In machine learning, this kind of process is referred to as an attention mechanism, and it has drawn increasing interest for language, images and other data. Integrating attention can potentially improve overall accuracy, as the system can focus on the parts of the data that are most relevant to the task. In particular, mechanisms of visual attention currently play an important role in many vision tasks [3][6-10].
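As a rough, illustrative sketch of the idea (not a description of the project's methods), soft attention can be seen as a learned weighting over input locations: relevance scores are normalized into a distribution, and the input is summarized as a weighted sum. All names below are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_attention(features, query):
    """Weight feature vectors by their relevance to a query.

    features: (n, d) array of n location descriptors
    query:    (d,) task-dependent query vector
    Returns the attended summary (d,) and the attention weights (n,).
    """
    scores = features @ query          # one relevance score per location
    weights = softmax(scores)          # normalized attention distribution
    context = weights @ features       # weighted sum of the features
    return context, weights

# Toy example: 4 locations, 3-dimensional descriptors.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 3))
query = rng.standard_normal(3)
context, weights = soft_attention(features, query)
```

In practice the scores are produced by a learned network rather than a fixed dot product, and "hard" variants select a single location, which requires reinforcement-style training as in [6].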
The objective of this post-doc is to advance the state of the art in human-centered vision and robotics through visual attention mechanisms for human understanding. A particular focus will be placed on two applications:
– Mechanisms of visual attention for videos and still images;
– "Physical" attention mechanisms, where the agent is physical rather than virtual. This translates into tasks where mobile robots optimize their location/navigation in order to solve complex visual tasks.
In terms of methodological contributions, this research will focus on deep learning and deep reinforcement learning for agent control [11] and for vision [6,12].
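To illustrate the kind of agent-control loop that deep reinforcement learning [11] scales up with neural networks, a minimal tabular Q-learning update can be sketched as follows (purely illustrative; the state/action sizes and parameters are made up).

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One-step temporal-difference update of the action-value table Q.

    s, a: state and action taken; r: reward observed; s_next: next state.
    alpha is the learning rate, gamma the discount factor.
    """
    td_target = r + gamma * Q[s_next].max()   # bootstrapped return estimate
    Q[s, a] += alpha * (td_target - Q[s, a])  # move Q(s, a) toward the target
    return Q

# Toy run: 5 states, 2 actions, a single observed transition.
Q = np.zeros((5, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
```

Deep RL replaces the table with a network Q(s, a; theta), which is what makes the approach applicable to visual state spaces such as camera observations of a mobile robot.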
The post-doctoral researcher will participate in ongoing collaborations between INSA-Lyon and the University of Guelph (Canada) on deep learning; with UPMC/LIP6 on deep learning; and with INRIA (CHROMA research group) on reinforcement learning and agent control.
More information:
http://liris.cnrs.fr/christian.wolf/openpositions/postdoc-deepvision
[1] Natalia Neverova, Christian Wolf, Graham W. Taylor and Florian Nebout. ModDrop: adaptive multi-modal gesture recognition. In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2016.
[2] Natalia Neverova, Christian Wolf, Florian Nebout, Graham W. Taylor. Hand Pose Estimation through Weakly-Supervised Learning of a Rich Intermediate Representation. In Computer Vision and Image Understanding, 2017.
[3] Fabien Baradel, Christian Wolf, Julien Mille. Pose-conditioned Spatio-Temporal Attention for Human Action Recognition. arXiv:1703.10106, 2017.
[4] Christian Wolf, Eric Lombardi, Julien Mille, Oya Celiktutan, Mingyuan Jiu, Emre Dogan, Gonen Eren, Moez Baccouche, Emmanuel Dellandréa, Charles-Edmond Bichot, Christophe Garcia, Bülent Sankur. Evaluation of video activity localizations integrating quality and quantity measurements. In Computer Vision and Image Understanding (127):14-30, 2014.
[5] Moez Baccouche, Frank Mamalet, Christian Wolf, Christophe Garcia, Atilla Baskurt. Spatio-Temporal Convolutional Sparse Auto-Encoder for Sequence Classification. In the Proceedings of the British Machine Vision Conference (BMVC), 2012.
[6] Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. Recurrent models of visual attention. In NIPS, 2014.
[7] Jason Kuen, Zhenhua Wang, and Gang Wang. Recurrent Attentional Networks for Saliency Detection. In CVPR, 2016.
[8] Shikhar Sharma, Ryan Kiros, and Ruslan Salakhutdinov. Action Recognition using Visual Attention. ICLR Workshop track, 2016.
[9] S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu. An end-to-end spatio-temporal attention model for human action recognition from skeleton data. arXiv:1611.06067, 2016.
[10] S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. In CVPR, 2016.
[11] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529–533, 2015.
[12] M. Gygli, M. Norouzi, and A. Angelova. Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs. arXiv preprint, 2017.