Efficient Multimodal Vision Transformers for Embedded System

Apply

The proposed thesis focuses on the optimization of multimodal vision transformers (ViT) for panoptic object segmentation, exploring two main directions. The first is to develop a versatile fusion pipeline to integrate multimodal data (RGB, IR, depth, events, point clouds) by leveraging inter-modal alignment relationships. The second is to investigate an approach combining pruning and mixed-precision quantization. The overall goal is to design lightweight multimodal ViT models, tailored to the constraints of embedded systems, while optimizing their performance and reducing computational complexity.

Master en informatique/vision par ordinateur/intelligence artificielle

en_USEN

Contact us

We will reply as soon as possible...