LLMs for specifying data sharing policies (M/F)

Job description: Developing physical or digital systems is a complex process involving both technical and human challenges. The first step is to give shape to ideas by drafting specifications for the future system. Usually written in natural language by business analysts, these documents are the key that binds all stakeholders for the duration of the project, making it easier to share and understand what needs to be done. Requirements engineering offers various techniques (reviews, modeling, formalization, etc.) to govern this process and improve the quality (consistency, completeness, etc.) of the documents produced, with the aim of detecting and correcting defects before the system is implemented. In requirements engineering, the recent arrival of large language models (LLMs) has the potential to be a “game changer”. We propose to support the analyst by working on specifications that concern data sharing. The idea is to derive data sharing policy models (ODRL) from natural-language text. The tool will use a transformer-based LLM (such as ChatGPT or Llama) combined with rigorous analysis and consulting methods. It will propose options for rewriting requirements in controlled languages inspired by the INCOSE or EARS standards, analyze the results produced by the LLM, and provide an audit of the quality of the resulting model. More specifically, LLMs are particularly promising for the following uses: automatically transforming unstructured requirements into requirements that follow structured templates such as EARS or user stories; classifying requirements (behavioral, non-functional, etc.); and flagging ambiguities, inconsistencies or potential violations on the basis of predefined validation heuristics. LLMs also have limitations that must be taken into account in requirements engineering: hallucination, non-determinism, algorithmic bias and limited generalization.
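To make the target of the work more concrete, here is a minimal sketch of what "modeling a data sharing policy in ODRL and auditing it" could look like. It is an illustration only, not the team's tool: the function names `make_policy` and `audit_policy`, the example URLs and the three checks are assumptions, and the checks are a simplified subset of what the ODRL Information Model requires. The input tuples stand in for what an LLM might extract from a natural-language requirement.

```python
import json

ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"
RULE_TYPES = ("permission", "prohibition", "obligation")


def make_policy(uid, rules):
    """Build an ODRL 'Set' policy (JSON-LD as a dict) from
    (rule_type, action, target) tuples. Hypothetical helper."""
    policy = {"@context": ODRL_CONTEXT, "@type": "Set", "uid": uid}
    for rule_type, action, target in rules:
        policy.setdefault(rule_type, []).append(
            {"action": action, "target": target}
        )
    return policy


def audit_policy(policy):
    """Return a list of findings; an empty list means none of the
    simplified checks below failed (not a full ODRL validation)."""
    findings = []
    if not policy.get("uid"):
        findings.append("policy has no uid")
    if not any(policy.get(t) for t in RULE_TYPES):
        findings.append("policy has no permission, prohibition or obligation")

    seen = {}  # (action, target) -> set of rule types using that pair
    for rule_type in RULE_TYPES:
        for rule in policy.get(rule_type, []):
            if not rule.get("action"):
                findings.append(f"{rule_type} without action")
            if rule_type != "obligation" and not rule.get("target"):
                findings.append(f"{rule_type} without target")
            key = (rule.get("action"), rule.get("target"))
            seen.setdefault(key, set()).add(rule_type)

    # Consistency check: the same action on the same asset must not be
    # both permitted and prohibited.
    for (action, target), types in seen.items():
        if {"permission", "prohibition"} <= types:
            findings.append(
                f"conflict: '{action}' on {target} is both permitted and prohibited"
            )
    return findings


if __name__ == "__main__":
    # Stand-in for rules extracted from a requirement such as
    # "Partners may read the sales dataset but must not distribute it."
    extracted = [
        ("permission", "read", "http://example.com/dataset/sales"),
        ("prohibition", "distribute", "http://example.com/dataset/sales"),
    ]
    policy = make_policy("http://example.com/policy/1", extracted)
    print(json.dumps(policy, indent=2))
    print(audit_policy(policy))
```

Keeping the audit separate from the generation step means the same checks can be applied to any policy, whether written by hand or proposed by an LLM.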
As part of the laboratory's “Intelligent Requirements” team, the candidate's work will involve: determining schemas or a controlled language to represent the ODRL model; evaluating the effectiveness of different techniques and formalisms, such as NLP methods or BLEU-inspired metrics, for avoiding hallucinations during rewriting; analyzing, managing or generating training data for LLMs; configuring and steering one or more LLMs using the most effective techniques for improving the consistency and completeness of data-sharing policies; and developing the software tools required for the above tasks.
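As an illustration of the BLEU-inspired idea mentioned above, the following sketch computes a clipped n-gram precision between an original requirement and its rewrite, and lists the words of the rewrite that have no support in the original. This is one possible interpretation, not an established method of the team: the function names, the tokenizer and the example sentences are assumptions, and the score is only the precision component of BLEU, without brevity penalty or multi-order averaging.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(source, rewrite, n=1):
    """Fraction of the rewrite's n-grams that also occur in the source,
    with each n-gram's count clipped to its count in the source."""
    src = ngrams(tokenize(source), n)
    rew = ngrams(tokenize(rewrite), n)
    total = sum(rew.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, src[gram]) for gram, count in rew.items())
    return matched / total


def unsupported_tokens(source, rewrite):
    """Tokens of the rewrite that never appear in the source:
    candidates for hallucinated content to show to the analyst."""
    src = set(tokenize(source))
    return [tok for tok in tokenize(rewrite) if tok not in src]


if __name__ == "__main__":
    source = "The partner may read the sales dataset until 2025."
    rewrite = "The partner may delete the sales dataset"
    print(clipped_precision(source, rewrite))
    print(unsupported_tokens(source, rewrite))
```

A rewrite into a controlled language such as EARS legitimately introduces template keywords ("when", "shall", and so on), so in practice such words would need to be excluded before scoring; a low score is a signal for review, not proof of hallucination.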

Your profile: PhD. Knowledge of Java, Python, Eclipse EMF, Node.js, React

Languages: Fluent English