“Join the CEA to give meaning to your work, lead and support national and international R&D projects, and cultivate and bring to life your spirit of curiosity.”

IN SUMMARY, WHAT DO WE OFFER YOU?
The CEA is looking for an engineer specializing in the use of LLMs to specify data sharing policies for the CEA/DRT (Technological Research Directorate). This 36-month fixed-term contract, at professional ("cadre") level, is based at the Nano-Innov site in Paris-Saclay, Essonne (91). The position is to be filled as soon as possible.

WHY JOIN US?
Developing physical or digital systems is complex and involves both technical and human challenges. The first step is to formalize ideas by writing requirements documents or specifications, which functional analysts generally write in natural language. These documents are crucial to the project and facilitate communication between stakeholders. Requirements engineering offers techniques (reviews, modeling, formalization) to improve the quality of these documents (consistency, completeness) and to correct defects before the system is implemented.

Large language models (LLMs) open up new possibilities in this field [2]. We propose to use a conversational agent (ChatGPT, Llama) to model data sharing policies in ODRL [1] from natural-language text. The tool will recommend rewriting options inspired by the INCOSE and EARS standards, analyze the results and provide an audit of the quality of the resulting model.

LLMs are particularly promising for:
- transforming unstructured requirements into structured forms such as EARS or user stories [2];
- classifying requirements [3] (behavioral, non-functional, etc.), as defined for example in [4];
- reporting ambiguities, inconsistencies or potential violations based on predefined validation heuristics [5] [2].

However, LLMs have limitations: hallucinations, algorithmic bias and limited generalization [2]. To mitigate these effects, we propose hybridizing LLMs, as in [1], with other techniques (NLP, process algebras).
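For illustration, here is what the target of the natural-text-to-ODRL modeling could look like, together with one very simple quality audit. This is a minimal Python sketch, not the project's tool: the input sentence, the `example.com` identifiers and the `rule_keys` / `find_conflicts` helpers are hypothetical. The policy follows the JSON-LD serialization of the ODRL Information Model 2.2 [1] (an `Agreement` whose `permission` and `prohibition` rules each carry a `target`, `assigner`, `assignee`, `action` and an optional `constraint`). The audit checks only one kind of inconsistency: the same action being both permitted and prohibited for the same party and asset.

```python
import copy
import json

# Hypothetical identifiers, for illustration only.
DATASET = "http://example.com/dataset/sensors"
CEA = "http://example.com/party/cea"
PARTNER = "http://example.com/party/partner"

# Hypothetical input sentence:
# "The partner may read the sensor dataset before 31 December 2025,
#  but may not distribute it."
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "uid": "http://example.com/policy/1",
    "permission": [{
        "target": DATASET,
        "assigner": CEA,
        "assignee": PARTNER,
        "action": "read",
        # Permission only holds while the current date is before 2025-12-31.
        "constraint": [{
            "leftOperand": "dateTime",
            "operator": "lt",
            "rightOperand": {"@value": "2025-12-31", "@type": "xsd:date"},
        }],
    }],
    "prohibition": [{
        "target": DATASET,
        "assigner": CEA,
        "assignee": PARTNER,
        "action": "distribute",
    }],
}


def rule_keys(rules: list[dict]) -> set[tuple]:
    """Return the (target, assignee, action) triple of each rule."""
    return {(r.get("target"), r.get("assignee"), r.get("action")) for r in rules}


def find_conflicts(policy: dict) -> set[tuple]:
    """Naive consistency audit: triples that are both permitted and prohibited.

    Simplifications (assumptions of this sketch): constraints are ignored,
    although they may in fact separate the two rules, and the hierarchy
    between ODRL actions is not taken into account.
    """
    permitted = rule_keys(policy.get("permission", []))
    prohibited = rule_keys(policy.get("prohibition", []))
    return permitted & prohibited


print(json.dumps(policy, indent=2))
print(find_conflicts(policy))  # set(): read is permitted, distribute is prohibited

# A faulty model, e.g. an LLM output that also prohibits reading:
faulty = copy.deepcopy(policy)
faulty["prohibition"].append(
    {"target": DATASET, "assigner": CEA, "assignee": PARTNER, "action": "read"}
)
print(find_conflicts(faulty))  # reports the (DATASET, PARTNER, "read") triple
```

Checking the LLM output against explicit rules like this one is one way to apply the hybrid approach: the structured model can be verified mechanically rather than trusted as generated.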
WHAT DO WE EXPECT FROM YOU?
Within the laboratory's "Intelligent Requirements" team, your challenge will be to:
- define schemas or a controlled language to represent the ODRL model;
- evaluate the effectiveness of different techniques and formalisms, such as NLP [6] or metrics inspired by BLEU [7], for avoiding hallucinations during rewriting;
- analyze, manage or generate training data for LLMs;
- configure and manage one or more LLMs using the most effective techniques to improve the consistency and completeness of data sharing policies;
- develop the software tools needed for the above work.
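The posting mentions drawing inspiration from the BLEU metric [7] to limit hallucinations during rewriting. As a hedged sketch of what that could mean, the Python below computes a simplified sentence-level BLEU (single reference, uniform weights up to bigrams, no smoothing) of a rewritten requirement against its source text: n-grams in the rewrite that have no counterpart in the source lower the clipped precision, which can flag content the LLM may have added. The function names, the example sentences and `max_n=2` are assumptions for illustration; in practice an established implementation such as NLTK's `sentence_bleu` or sacreBLEU would be the natural starting point.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, candidate: str, max_n: int = 2) -> float:
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n, uniform weights) times a brevity penalty.
    No smoothing: any n-gram order with zero matches gives a score of 0."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty: only penalizes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


# Hypothetical example: a source sentence and two candidate rewrites.
source = "the partner may read the sensor dataset"
rewrites = {
    "unchanged": "the partner may read the sensor dataset",
    "added action": "the partner may read and distribute the sensor dataset",
}
for label, text in rewrites.items():
    print(label, round(bleu(source, text), 3))
# prints: unchanged 1.0, then added action 0.697
```

A low score is only a weak signal: a correct rewrite into EARS form also changes wording, so any alert threshold would need calibration on real data, ideally combined with the other techniques listed above.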
The LECS laboratory (Laboratoire d'Études et de Conception de Systèmes) of the DILS department, part of the LIST institute within the CEA's DRT directorate, focuses mainly on cybersecurity and data protection in distributed systems. It takes part in projects such as TASTING, which aims to modernize and secure energy infrastructures. LECS collaborates with several research organizations and industrial partners, such as RTE, to develop solutions based on formal real-time verification methods.
REQUIRED SKILLS: PhD or Master's degree in computer science, mathematics or systems engineering.

STILL HAVE DOUBTS?
Beyond your main mission, there is more that may interest you:
- a cutting-edge, one-of-a-kind research ecosystem dedicated to topics with high societal stakes, which gives meaning to your work;
- training to strengthen your skills, acquire new ones and boost your career;
- a work-life balance recognized by our employees;
- the option to telework, to reduce commuting time and improve your quality of life;
- a works council (CE) offering many benefits as well as social, cultural and sports activities;
- a workplace at the heart of a dynamic plateau, surrounded by schools and tech companies.

Tempted? Apply, this position is made for you!

In line with the CEA's commitments to the inclusion of people with disabilities, this position is open to all.

#CEA-List #NumericalSimulation #AI

[1] "ODRL Information Model 2.2": https://www.w3.org/TR/odrl-model/
[2] C. Arora, J. Grundy, and M. Abdelrazek, "Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs", 1 Nov. 2023, arXiv:2310.13976: http://arxiv.org/abs/2310.13976
[3] X. Luo, Y. Xue, Z. Xing, and J. Sun, "PRCBERT: Prompt Learning for Requirement Classification using BERT-based Pretrained Language Models", in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, MI, USA: ACM, Oct. 2022, pp. 1–13. doi: 10.1145/3551349.3560417.
[4] A. Fan et al., "Large Language Models for Software Engineering: Survey and Open Problems", 11 Nov. 2023, arXiv:2310.03533. doi: 10.48550/arXiv.2310.03533.
[5] D. V. Dzung and A. Ohnishi, "Improvement of Quality of Software Requirements with Requirements Ontology", in 2009 Ninth International Conference on Quality Software, Aug. 2009, pp. 284–289. doi: 10.1109/QSIC.2009.44.
[6] I. K. Raharjana, D. Siahaan, and C. Fatichah, "User Stories and Natural Language Processing: A Systematic Literature Review", IEEE Access, vol. 9, pp. 53811–53826, 2021. doi: 10.1109/ACCESS.2021.3070606.
[7] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "Bleu: a Method for Automatic Evaluation of Machine Translation", in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, P. Isabelle, E. Charniak, and D. Lin, Eds., Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, Jul. 2002, pp. 311–318. doi: 10.3115/1073083.1073135.
Talent impulse, the scientific and technical job board of CEA's Technology Research Division
© Copyright 2023 – CEA – TALENT IMPULSE - All rights reserved