On the technical side, the internship draws on machine learning (ML) and natural language processing (NLP), and more specifically on natural language generation (NLG) and classification techniques. In collaboration with CEA research engineers, the intern will train classification models capable of recognizing different types of text-based cyber attacks and of distinguishing attacks written by humans from those generated by AI or by a specific generative model. The internship is intended as an introduction to research, with the goal of publishing a scientific article if the results are conclusive. The resulting models may also be entered in a shared task such as AuTexTification (https://sites.google.com/view/autextification/home) or CLIN33 (https://sites.google.com/view/shared-task-clin33/home), or in a challenge such as MLMAC (https://mlmac.io/). The work may be followed by a PhD in a broader context.
Engineering degree and/or Master 2 (M2) degree in computer science with a strong interest in artificial intelligence and natural language processing.
Required skills:
Working environment: Linux
Knowledge of text classification techniques
Background in natural language generation and language modeling
Familiarity with pre-trained language models and large language models
Basic knowledge of the cybersecurity field
Programming: Python + PyTorch/TensorFlow