3D object detection is a critical component of applications such as autonomous driving, robotics, and augmented reality, where a precise understanding of the 3D environment is essential. A key challenge lies in the high cost of annotating 3D bounding boxes, which makes it difficult to scale supervised learning methods to new applications. To address this, learning paradigms such as semi-supervised learning [1][2], weakly supervised learning [3], and unsupervised domain adaptation have been proposed to reduce the need for large amounts of annotated data while maintaining or improving performance. By leveraging minimal labeled data, or even unannotated data, these approaches reduce the reliance on costly 3D box annotations.

Most state-of-the-art methods rely on a teacher-student architecture, in which pseudo-label filtering is a crucial step. Filtering typically follows one of two strategies: untrained heuristics, such as the confidence scores produced by detection models, or uncertainty estimation modules trained on a small set of annotated 3D data. Both approaches have limitations: heuristics depend heavily on hyperparameters that may overfit, while learned uncertainty estimators can prove unreliable.

Recent breakthroughs in 2D vision-language models (VLMs) have inspired research in 3D vision, particularly around their potential for pretraining [4][5]. Despite this promise, VLMs remain largely unexplored for semi-supervised, weakly supervised, or unsupervised domain adaptation in 3D object detection. This internship therefore aims to fill this gap by leveraging foundation models for more robust pseudo-label filtering. One direction is to use pixel features at the 2D projections of 3D points to compute an intra-object coherence score and a neighborhood incoherence score, checking that detected objects are both internally consistent and well separated from their surroundings (a rough sketch of this scoring is given after the references). Another direction is to use 2D features as a pretext for scene completion, yielding finer object contours and estimates of the occluded parts of detected objects.

References:
[1] Zhao, N., et al. (2020). SESS: Self-Ensembling Semi-Supervised 3D Object Detection. CVPR.
[2] Xu, H., et al. (2021). Semi-Supervised 3D Object Detection via Adaptive Pseudo-Labeling. ICIP.
[3] Yao, B., et al. (2024). Uncertainty-Guided Contrastive Learning for Weakly Supervised Point Cloud Segmentation. IEEE Transactions on Geoscience and Remote Sensing.
[4] Chen, Z., et al. (2024). Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models. NeurIPS.
[5] Sirko-Galouchenko, S., et al. (2024). OccFeat: Self-Supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks. CVPR.
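As a purely illustrative sketch of the feature-based filtering direction mentioned above, the Python/NumPy code below projects LiDAR points into the camera image, samples pixel-level features from a frozen 2D backbone at the projected locations, and combines an intra-object coherence score and a neighborhood incoherence score with the detector confidence to decide whether a teacher pseudo-box is kept. All function names, the nearest-neighbour feature sampling, and the threshold values are assumptions made for illustration; they are not part of the internship specification.

```python
import numpy as np

def project_points(points_xyz, K, T_cam_from_lidar):
    """Project LiDAR points (N, 3) into the image plane using camera
    intrinsics K (3, 3) and extrinsics T_cam_from_lidar (4, 4).
    Returns pixel coordinates (N, 2) and a mask of points in front of
    the camera."""
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])      # (N, 4) homogeneous
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]           # (N, 3) camera frame
    in_front = cam[:, 2] > 1e-3
    uvw = (K @ cam.T).T                                   # (N, 3)
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-3, None)    # perspective divide
    return uv, in_front

def sample_pixel_features(feat_map, uv, valid):
    """Nearest-neighbour lookup of a dense 2D feature map (H, W, C),
    e.g. from a frozen VLM image encoder, at the projected pixel
    locations; out-of-image points are discarded."""
    h, w, _ = feat_map.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return feat_map[v[ok], u[ok]]                         # (M, C)

def coherence_scores(box_feats, context_feats):
    """Intra-object coherence: mean pairwise cosine similarity of the
    features of points inside a pseudo-box. Neighborhood incoherence:
    one minus the mean similarity between inside points and points in a
    shell just outside the box."""
    def unit(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    f_in = unit(box_feats)
    intra = float(np.mean(f_in @ f_in.T))
    if context_feats.shape[0] == 0:
        return intra, 1.0
    f_out = unit(context_feats)
    incoherence = float(1.0 - np.mean(f_in @ f_out.T))
    return intra, incoherence

def keep_pseudo_box(det_score, box_feats, context_feats,
                    tau_score=0.3, tau_intra=0.7, tau_nbr=0.3):
    """Keep a teacher pseudo-box only if the detector confidence and both
    feature-based scores agree (thresholds are placeholders)."""
    intra, incoherence = coherence_scores(box_feats, context_feats)
    return det_score >= tau_score and intra >= tau_intra and incoherence >= tau_nbr
```

In practice, `box_feats` would gather the sampled features of the points falling inside a teacher pseudo-box and `context_feats` those of points in a small shell around it, so that a well-isolated, internally consistent object yields high coherence and high incoherence at once.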
Based in Saclay (Essonne), the LIST is one of the institutes of CEA Tech, the technological research division of the CEA. Dedicated to intelligent digital systems, its mission is to carry out high-level technological development for industrial partners in order to create value. Within the LIST, the Laboratory of Vision and Learning for Scene Analysis (LVA) conducts research in computer vision and artificial intelligence for the perception of intelligent and autonomous systems. The laboratory's research themes include visual recognition, behavior and activity analysis, large-scale automatic annotation, and perception and decision models. These technologies are applied in major sectors such as security, mobility, advanced manufacturing, healthcare, and sports.
- Students in their 5th year of studies (M2 or gap year)
- Computer vision skills
- Machine learning skills (deep learning, perception models, generative AI…)
- Python proficiency in a deep learning framework (especially TensorFlow or PyTorch)
- Scientific research experience will be appreciated

In line with CEA's commitment to integrating people with disabilities, this job is open to all.
Bac+5 - Master 2