Internship Subject

Design of an automatic metric for evaluating generated texts


Summary

-   Internship in NLP at Orange Innovation (Research & Development team)
-   6-month internship in Lannion, France (on-site)
-   Full-time, 35 hours weekly
-   Starting date: flexible, between January and March 2025
-   The internship is open to both English-speaking and French-speaking
    students
-   Supervisor: Anastasia Shimorina
-   URL for application: https://orange.jobs/jobs/v3/offers/142434


Context

Many automatic metrics have been proposed to evaluate the generative
capabilities of language models [1, 4-6, among others]. However, most
of these studies primarily focus on English, which limits their
applicability to other languages. To enhance language coverage, we aim
to develop an automatic metric specifically for French.

During the internship, you will explore the existing literature and
various approaches to building automatic metrics, such as Likert-style
scoring [1] and span-based methods [2,3]. The focus will be on developing a metric
to evaluate texts generated by large language models (LLMs). Key steps
will include creating a training corpus, fine-tuning the model, and
conducting evaluations. The final metric will be compared with
LLM-based evaluators using both internal and public datasets.

This internship is part of a research program dedicated to natural
language processing and language modeling, covering areas like dialogue
modeling, fine-tuning, knowledge distillation, and semantic analysis.
You will work in a collaborative environment alongside colleagues
focused on related topics.


Main Responsibilities

-   Synthetic data creation: build a training corpus
-   Supervised fine-tuning of language models
-   Benchmarking of existing solutions
-   Evaluation of the proposed method on internal and public data


Qualifications

-   Second-year Master's or engineering school student in Computer Science,
    Machine Learning, or Natural Language Processing (NLP)
-   Experience with Machine Learning and Deep Learning
-   Solid computer engineering skills (Python, Linux, shell, git)
-   Experience with Large Language Models (LLMs)
-   Languages: academic English; knowledge of French is a plus


Benefits

-   Professional development: engage in cutting-edge research and
    development in NLP
-   R&D centre of Orange Innovation: participate in various events,
    including workshops and seminars
-   A scientific publication may be produced based on the results
-   Location near the sea: beautiful landscapes, outdoor activities
-   On-site canteen: subsidised meals


*Bibliography*
[1] Prometheus: Inducing Fine-grained Evaluation Capability in Language
    Models. Kim et al., ICLR 2024.
[2] xCOMET: Transparent Machine Translation Evaluation through
    Fine-grained Error Detection. Guerreiro et al., TACL 2024.
[3] Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on
    Data-to-Text Generation. Kasner and Dušek, ACL 2024.
[4] TIGERScore: Towards Building Explainable Metric for All Text
    Generation Tasks. Jiang et al., Transactions on Machine Learning
    Research, 2024.
[5] INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with
    Automatic Feedback. Xu et al., EMNLP 2023.
[6] CritiqueLLM: Towards an Informative Critique Generation Model for
    Evaluation of Large Language Model Generation. Ke et al., ACL 2024.