Internship: Towards real-time, semantics-infused 3D scene reconstruction (M/F)


Internship context:

3D reconstruction has historically been performed explicitly, with 3D objects modeled as point clouds, meshes, or voxels. These explicit representations use sparse data to represent complex geometry. Recently, implicit representations have met with great success. A complex 3D scene is implicitly represented by a function that outputs, for each position in the 3D world, a density/occupancy value (1 inside an object, 0 otherwise). Such functions can be learned with Neural Radiance Fields (NeRF) [1], a 3D scene representation technique that uses neural networks to reconstruct detailed 3D scenes from 2D images. NeRF models the scene's volumetric data and appearance (color + density), allowing for realistic 3D scene rendering. Since NeRF, many advances have been made, including the modeling of dynamic scenes [2].

However, these 3D representations do not explicitly model scene semantics, as they are purely reconstruction-oriented. Recently, semantics have been infused into these 3D representations by leveraging 2D foundation models and distilling their features into the 3D scene, allowing for surprisingly effective text-based object retrieval [3]. As effective as these methods are, their training is long and costly, and their inference is slow. Recent efforts to accelerate them include Gaussian Splatting [4], which models the 3D scene by "painting" it with 3D Gaussians in space. However, such representations have not yet been used for semantics learning, as this presents its own set of challenges.

Internship objectives:

This internship will be part of a cross-laboratory effort to model 3D scenes and inject semantic information into that model. Such a 3D representation contains information on geometry, color, and semantics, and can be used for a wide variety of tasks requiring some level of scene understanding, such as object detection, segmentation, or text-query-based object retrieval.

The work will focus on the autonomous driving use case, where we want to reconstruct the dynamic scene in which the car drives efficiently and, if possible, in real time. The internship will address the following points:
- Efficiently learn representations of 3D urban scenes that contain semantic information, leveraging fast learning methods such as Gaussian Splatting
- Study the problem of defining an object with respect to a chosen scale, which arises when an object is composed of smaller parts
- Potentially, study the reconstruction of dynamic scenes with moving parts or objects

Required skills:
- Computer Vision
- Machine Learning (especially Deep Learning)
- 3D geometry understanding
- Python, C/C++ (optional)
- Deep Learning framework (PyTorch or TensorFlow)

Bac+5 (Master's level) - Engineering school degree

