Job / PhD

Efficient Multimodal Vision Transformers for Embedded Systems

Challenge: New computing paradigms, including quantum; Communication networks, IoT, radiofrequencies and antennas; Cyber-physical systems - sensors and actuators; Health and environment technologies, medical devices
Type of Contract: PhD
Institute:
Location: Paris – Saclay
Level of study: Level 7
Desired start date: 2025-10-01
Supervisor: POREBA Martyna (DRT/DSCIN/DSCIN/LIAE)

Job description: The proposed thesis focuses on the optimization of multimodal vision transformers (ViT) for panoptic object segmentation, exploring two main directions. The first is to develop a versatile fusion pipeline to integrate multimodal data (RGB, IR, depth, events, point clouds) by leveraging inter-modal alignment relationships. The second is to investigate an approach combining pruning and mixed-precision quantization. The overall goal is to design lightweight multimodal ViT models, tailored to the constraints of embedded systems, while optimizing their performance and reducing computational complexity.
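For candidates less familiar with the second direction, the following minimal sketch illustrates the general idea of combining pruning with mixed-precision quantization. It is not the method of the thesis: it assumes simple magnitude pruning and symmetric uniform quantization, and every name and value in it (`magnitude_prune`, `quantize_symmetric`, `compress`, the layer names, weights, sparsity and bit widths) is hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: magnitude pruning followed by per-layer
# mixed-precision uniform quantization. All names and values are hypothetical.

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_symmetric(weights, bits):
    """Quantize to signed `bits`-bit integers, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax  # one scale per layer (per-tensor quantization)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def compress(layers, sparsity, bit_widths):
    """Prune every layer, then quantize each one with its own bit width."""
    return {
        name: quantize_symmetric(magnitude_prune(w, sparsity), bit_widths[name])
        for name, w in layers.items()
    }


# Toy "model": two layers given as flat weight lists (made-up values).
layers = {
    "attention": [0.9, -0.05, 0.4, -0.7],
    "mlp": [0.02, -0.6, 0.3, 0.01],
}
# Mixed precision: a layer assumed to be sensitive keeps 8 bits, the other gets 4.
bit_widths = {"attention": 8, "mlp": 4}
compressed = compress(layers, sparsity=0.5, bit_widths=bit_widths)
```

In practice, the sparsity level and the bit width of each layer would be chosen from a sensitivity analysis or a hardware-aware search rather than fixed by hand as in this sketch.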

Your profile: Master's degree in computer science, computer vision, or artificial intelligence