On the technical side, the internship covers machine learning (ML) and natural language processing (NLP), and more specifically natural language generation (NLG) and text classification. In collaboration with CEA research engineers, the intern will train classification models that recognize different types of text-based cyber attacks and distinguish attacks written by humans from those generated by AI, or by a specific generative model. The internship is intended as an introduction to research, with the goal of publishing a scientific article if the results are conclusive. The models may also be entered in a shared task such as AuTexTification (https://sites.google.com/view/autextification/home) or CLIN33 (https://sites.google.com/view/shared-task-clin33/home), or in a challenge such as MLMAC (https://mlmac.io/). The work may be followed by a PhD in a broader context.
Engineering degree and/or Master 2 (M2) degree in computer science, with a strong interest in artificial intelligence and natural language processing.
Required skills:
- Working environment: Linux
- Knowledge of text classification techniques
- Background in natural language generation and language modeling
- Familiarity with pre-trained language models and large language models
- Basic knowledge of the cybersecurity field
- Programming: Python + PyTorch/TensorFlow
Bac+5 (Master's level) - Engineering school degree