Internship Subject
Design of an automatic metric for evaluating generated texts

Summary
- Internship in NLP at Orange Innovation (Research & Development team)
- Internship of 6 months in Lannion, France (on-site)
- Full-time, 35 hours weekly
- Starting date: flexible, between January-March 2025
- Internship is open for English-speaking as well as French-speaking students
- Supervisor: Anastasia Shimorina
- URL for application: https://orange.jobs/jobs/v3/offers/142434

Context
Many automatic metrics have been proposed to evaluate the generative capabilities of language models [1, 4-6, among others]. However, most of these studies primarily focus on English, which limits their applicability to other languages. To enhance language coverage, we aim to develop an automatic metric specifically for French.

During the internship, you will explore existing literature and various approaches for creating automatic metrics, for instance, Likert-style [1] and span-based methods [2, 3]. The focus will be on developing a metric to evaluate texts generated by large language models (LLMs). Key steps will include creating a training corpus, fine-tuning the model, and conducting evaluations. The final metric will be compared with LLM-based evaluators using both internal and public datasets.

This internship is part of a research program dedicated to natural language processing and language modeling, covering areas like dialogue modeling, fine-tuning, knowledge distillation, and semantic analysis. You will work in a collaborative environment alongside colleagues focused on related topics.

Main Responsibilities
- Synthetic data creation: build a training corpus
- Supervised fine-tuning of language models
- Benchmarking of existing solutions
- Evaluation of the proposed method on internal and public data

Qualifications
- 2-year Master's or engineering school student in Computer Science, Machine Learning, or Natural Language Processing (NLP)
- Experience with Machine Learning, Deep Learning
- Solid computer engineering skills (Python, Linux, shell, git)
- Experience with Large Language Models (LLMs)
- Languages: academic English, knowledge of French is a plus

Benefits
- professional development: engage in cutting-edge research and development in NLP
- R&D centre of Orange Innovation: participate in various events, including workshops and seminars
- a scientific publication may be produced based on the results
- location near the sea: beautiful landscapes, outdoor activities
- on-site canteen: subsidised meals

*Bibliography*
[1] Prometheus: Inducing Fine-grained Evaluation Capability in Language Models. Kim et al., ICLR 2024.
[2] xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection. Guerreiro et al., TACL 2024.
[3] Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation. Kasner and Dušek, ACL 2024.
[4] TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks. Jiang et al., Transactions on Machine Learning Research, 2024.
[5] INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback. Xu et al., ACL 2023.
[6] CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation. Ke et al., ACL 2024.