Optical flow is the motion field describing the displacement of each pixel between two frames of a video. With a fixed camera, optical flow highlights moving objects, which are the regions of interest for many vision applications: moving-object segmentation, anomaly detection, etc. However, in dynamic scenes (i.e., scenes observed by a moving camera), the optical flow combines the motion of the objects in the scene with the camera's own motion. The background then also appears mobile, which makes applications based on optical flow challenging. Several state-of-the-art methods still rely on uncompensated optical flow [1,2], but at the cost of tedious filtering of the noise generated by background motion. Other methods have emphasized the importance of handling the moving background early on, in motion maps, but propose normalizations that attenuate camera motion without handling it robustly. In addition, geometric image registration methods such as homography, which are effective in simple scenarios, are not suited to non-planar scenes with complex camera movements, as in road scenes. The aim of this internship is to design and develop a deep-learning-based method capable of separating the optical flow from the flow induced by camera motion, also known as rigid flow or ego-motion flow.

Objectives of the internship

After conducting a scientific survey on rigid flow compensation, the trainee will investigate several directions:
- Explore supervised methods for learning to separate optical and rigid flow on synthetic datasets (ground truth available from previous work).
- Develop a rigid flow compensation method that exploits camera poses and self-supervised depth maps. This step aims to relax the constraint of having a perfect (ground-truth) depth map.
- To take this a step further, we will consider the case where the camera pose is unavailable, by investigating methods for estimating the camera's trajectory (ego-motion).

References

[1] A. Dave, P. Tokmakov, and D. Ramanan. Towards Segmenting Anything That Moves. In IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019.
[2] Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, and Martial Hebert. Discovering Objects that Can Move. In CVPR, 2022.
[3] Xinyu Zhang and Abdeslam Boularias. Optical Flow Boosts Unsupervised Localization and Segmentation. In IROS, 2023.
[4] Yaqing Ding, Daniel Barath, and Zuzana Kukelova. Homography-Based Egomotion Estimation Using Gravity and SIFT Features. In Computer Vision – ACCV 2020.
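To make the notion of rigid flow concrete: given a depth map, camera intrinsics, and the relative pose between two frames, the displacement of every pixel caused by camera motion alone can be computed by back-projecting, transforming, and re-projecting each pixel. Subtracting this rigid flow from the measured optical flow isolates object motion. The sketch below is a minimal NumPy illustration of this geometry (function and variable names are ours, not part of the internship description), not a description of the method to be developed:

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow induced by camera motion (R, t) on a static scene.

    depth : (H, W) per-pixel depth in the first camera's frame
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) from frame 1 to frame 2
    Returns a (2, H, W) array of pixel displacements (illustrative sketch).
    """
    H, W = depth.shape
    # Homogeneous pixel grid, shape (3, H*W)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    # Back-project each pixel to a 3D point in the first camera's frame
    pts = np.linalg.inv(K) @ pix * depth.ravel()
    # Express the points in the second camera's frame and re-project
    pts2 = R @ pts + t[:, None]
    proj = K @ pts2
    proj = proj[:2] / proj[2]
    # Rigid flow = pixel displacement due to camera motion alone
    return (proj - pix[:2]).reshape(2, H, W)
```

With an identity pose the rigid flow is zero everywhere, and for a pure lateral translation over a fronto-parallel scene it reduces to the familiar f·t/depth parallax; compensating the measured optical flow then amounts to subtracting this field.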
Based on the Paris-Saclay campus, CEA-LIST is one of the four technological research institutes of CEA TECH, the technological research division of CEA. Dedicated to intelligent digital systems, it contributes to the competitiveness of companies through research and knowledge transfer. The expertise and competences of the 800 research engineers and technicians at CEA-LIST help more than 200 companies in France and abroad every year, on subjects organized into 4 programs and 9 technological platforms. 21 start-ups have been created since 2003. The Computer Vision and Machine Learning for Scene Understanding laboratory addresses computer vision subjects with an emphasis on four axes:
- Recognition (detection or segmentation of objects and persons)
- Behavior analysis (action and gesture recognition, anomalous behavior of individuals or crowds)
- Smart annotation (large-scale annotation of 2D and 3D data using semi-supervised methods)
- Perception and decision-making (Markovian decision processes, navigation)
The intern will join a team of 30 researchers (research engineers, PhD students, interns) and will be able to interact with peers working on related subjects and methods.
Required level: Engineering degree or Master 2. This internship may lead to a PhD thesis or an R&D engineer position in our laboratory.
Duration: 6 months
Compensation: between €700 and €1300 per month, depending on education.
Required skills: computer vision, machine learning (deep learning), pattern recognition, Python, C/C++, proficiency with a deep learning framework (in particular TensorFlow or PyTorch).