Large Language Models (LLMs), such as ChatGPT, have sparked a new AI revolution with applications in every domain. However, LLMs are very resource-intensive (energy, compute, etc.), so an important line of research focuses on optimizing these models. Existing open-source tool chains, such as LLM Compressor [1] and OpenVINO [2], enable almost-automatic optimizations that compress LLMs into smaller models through, e.g., quantization and pruning. However, they only target conventional hardware, such as GPUs. New hardware paradigms, such as In-Memory Computing (IMC), promise to accelerate LLMs and reduce their energy consumption [3]. Running LLMs on such hardware, however, requires specific optimizations owing to its characteristics. For instance, it requires extreme quantization of the model (reducing the number of bits on which data, i.e., weights and activations, are encoded), because the analogue IMC fabric offers only limited bit precision, and it requires optimizing the robustness of the model, because IMC computations are prone to errors. Nevertheless, software tools and methods for mapping state-of-the-art LLMs onto these hardware platforms lag behind.

This internship aims to put together a software infrastructure for mapping, simulating, and exploring the performance of LLMs on IMC hardware, starting from existing open-source tool chains and integrating functionalities dedicated to IMC hardware, such as quantization and error models (a toy sketch of these two ingredients is given after the references below).

The student will join a multidisciplinary team of research engineers, PhD students, postdocs, and interns, at the heart of an ecosystem of industrial and academic partners in the field of embedded AI. They will have access to supercomputing infrastructure and will build expertise in LLMs, compression methods, and efficient hardware for AI. Leveraging the tools and knowledge developed during the internship, the student could be offered the opportunity to pursue a PhD on compression methods for LLMs.

[1] LLM Compressor: https://github.com/vllm-project/llm-compressor
[2] OpenVINO: https://github.com/openvinotoolkit/openvino
[3] Büchel et al., "Analog Foundation Models", NeurIPS 2025.
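For illustration only, here is a minimal PyTorch sketch of the two IMC-specific ingredients mentioned above: uniform fake quantization of a weight matrix to a few bits, and additive Gaussian noise as a crude error model for the analogue IMC fabric. The function names (fake_quantize, imc_noise), the 4-bit setting, and the noise level sigma are hypothetical choices made for this sketch, not part of any existing toolchain.

import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Uniform symmetric fake quantization: snap weights to a low-bit grid
    # while keeping them in floating point (hypothetical helper).
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

def imc_noise(w: torch.Tensor, sigma: float = 0.02) -> torch.Tensor:
    # Additive Gaussian perturbation as a crude stand-in for analogue IMC
    # programming/read noise; sigma is an assumed, not measured, noise level.
    return w + sigma * w.abs().max() * torch.randn_like(w)

torch.manual_seed(0)
w = torch.randn(256, 256)   # toy weight matrix of one pretrained linear layer
x = torch.randn(8, 256)     # a small batch of activations
y_ref = x @ w.T                                        # ideal FP32 output
y_imc = x @ imc_noise(fake_quantize(w, n_bits=4)).T    # simulated IMC output
print("relative output error:", ((y_imc - y_ref).norm() / y_ref.norm()).item())

In a full toolchain, such error models would typically be injected during or after compression (e.g., within quantization-aware training) so that the model's robustness to IMC non-idealities can be evaluated and optimized.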
The internship will take place at CEA-LIST, in the Integrated Multi-Sensor Intelligence Laboratory, which brings together experts in artificial intelligence, embedded systems and sensors.
Final-year engineering-school student or Master 2 student in computer science, artificial intelligence, or embedded systems. Strong motivation to learn and to contribute to AI research. Good knowledge of computer science, programming environments (Unix), languages (Python), and development tools (Git). Good knowledge of and experience with deep learning and its associated development tools (PyTorch or TensorFlow). Knowledge of embedded systems is a plus.
Master's degree (Bac+5 / Master 2)