*Title:* Context-Aware Neural Machine Translation Evaluation II

*Description:* Context-Aware Neural Machine Translation (CA-NMT) [Tiedemann and Scherrer, 2017; Laubli et al., 2018; Miculicich et al., 2018; Maruf et al., 2019; Zheng et al., 2020; Ma et al., 2021; Lupo et al., 2022] is one of the most interesting research axes in NLP, with strong impact on both academic and industrial research. CA-NMT systems are mostly evaluated with "average-quality-measuring" metrics such as BLEU [Papineni et al., 2002] or COMET [Rei et al., 2020], and with dedicated contrastive test suites [Voita et al., 2019; Muller&Rios 2018; Lopes et al., 2020]. The latter have been designed to measure specifically the degree to which CA-NMT systems are able to exploit context when scoring sentences to be translated in context. Indeed, the sentence-level average translation quality measured by BLEU or COMET is inadequate in this respect [Lupo et al., 2022]. When evaluating models with contrastive test suites, however, models only score sentences in context, that is, they do not translate them. The ability of models to use context is thus only evaluated implicitly.

The work planned in this project follows two different research directions, resulting in two different internships. In the first, building on the work already done during a previous internship on the same subject [Dinarelli et al., 2024], we would like to take a step ahead in the evaluation of CA-NMT systems. The idea is to exploit annotated data like those already used for [Muller&Rios 2018; Lopes et al., 2020], for [Ekaterina et al., 2022], or for [Dinarelli et al., 2024], in order to explicitly involve discourse phenomena, such as coreference and anaphora, in the evaluation procedure of CA-NMT models. Such an evaluation procedure may allow the design of more accurate evaluation measures for "discourse-phenomena-aware" CA-NMT systems.
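As a hedged illustration of the contrastive-scoring setup described above, the sketch below checks whether a system scores the correct translation above a contrastive variant. The scorer, the pronoun example, and the overlap heuristic are all illustrative assumptions; a real setup would use an NMT model's log-probabilities instead of `toy_score`.

```python
# Sketch of contrastive-test-suite scoring: the model only *scores* the
# correct vs. the contrastive translation in context; it does not translate.
# toy_score is a hypothetical stand-in for an NMT model's log-probability.

def toy_score(context: str, translation: str) -> float:
    # Assumption: reward lexical overlap between context and candidate,
    # with a crude length penalty (a real scorer would be the NMT model).
    ctx_words = set(context.lower().split())
    tokens = translation.lower().split()
    overlap = sum(1 for t in tokens if t in ctx_words)
    return overlap - 0.1 * len(tokens)

def contrastive_accuracy(examples) -> float:
    # Each example: (context, correct_translation, contrastive_translation).
    # A system "passes" an example when it scores the correct variant higher.
    passed = sum(
        1 for context, good, bad in examples
        if toy_score(context, good) > toy_score(context, bad)
    )
    return passed / len(examples)

# Toy English-only example: the pronoun in the candidate must agree with
# the antecedent introduced in the context.
examples = [
    ("She is an actress . She arrived late .",
     "She was tired .",
     "He was tired ."),
]
print(contrastive_accuracy(examples))  # 1.0 on this toy example
```

Note that an accuracy computed this way measures scoring behaviour, not translation behaviour, which is precisely the limitation discussed above.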
In the second, we will work on a document-level dataset from the financial domain in six languages. The work will focus on all the steps of the pipeline designed for collecting the dataset, in order to improve the identification of context-sensitive linguistic phenomena in the aligned text segments, such as anaphora, ellipsis, and polysemous words. The identification will be performed using NLP tools such as coreference resolution systems, syntactic analysers, and LLMs.

*Practical Aspects:* In this internship, on one side the student will use Machine Learning and Deep Learning tools to automatically annotate with discourse phenomena parallel data used for NMT (at least English-French, but possibly also English-German and other language pairs), as well as Neural Machine Translation tools to automatically generate translations that will be used for CA-NMT evaluation. On the other side, the student will use coreference resolution systems, syntactic analysers, and LLMs to identify discourse phenomena in a document-level dataset from the financial domain. Based on the annotation of discourse phenomena, we will design an adequate evaluation metric for CA-NMT systems, taking into account the capability of the system to exploit discourse phenomena. Finally, the evaluation metric will be tested by evaluating CA-NMT systems already available [Lupo et al., 2022] or trained from scratch at LIG.

*Profile:*
- Master 2 student level in computer science or NLP
- Interest in Natural Language Processing and Deep Learning approaches
- Skills in machine learning for neural models
- Computer science skills:
  1. Python programming, with some knowledge of deep learning libraries such as PyTorch and possibly Fairseq
  2. Data manipulation and annotation

The internship may last from 5 to 6 months. It will take place at the LIG laboratory, GETALP team (http://lig-getalp.imag.fr/), starting in January or February 2025.
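The phenomenon-identification step described above could start from simple heuristics before moving to full coreference resolvers or LLMs. As a minimal sketch, the code below flags segments containing ambiguous English pronouns whose translation into French or German may depend on context; the pronoun list and the example segments are illustrative assumptions, not part of the actual pipeline.

```python
# Naive heuristic for flagging context-sensitive segments in a document:
# a segment containing an ambiguous pronoun may need surrounding context
# to be translated correctly (e.g. gender agreement in French or German).
# Real pipelines would use coreference resolvers, parsers, or LLMs.

AMBIGUOUS_EN_PRONOUNS = {"it", "its", "they", "them", "their", "this", "that"}

def has_ambiguous_pronoun(segment: str) -> bool:
    # True if any token of the segment is an ambiguous pronoun.
    return any(t in AMBIGUOUS_EN_PRONOUNS for t in segment.lower().split())

def flag_context_sensitive(document: list[str]) -> list[int]:
    # Return indices of segments likely needing context for translation.
    return [i for i, seg in enumerate(document) if has_ambiguous_pronoun(seg)]

doc = [
    "The company released a new report .",
    "It was criticized by analysts .",   # "It" -> antecedent in segment 0
]
print(flag_context_sensitive(doc))  # [1]
```

Segments flagged this way would then be passed to more precise tools (coreference resolution, syntactic analysis, LLM prompting) to confirm and categorise the phenomenon.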
The student will be tutored by Marco Dinarelli (http://www.marcodinarelli.it), and will collaborate with Ph.D. students at LIG involved in the same project. Interested candidates should send a CV and a motivation letter to marco.dinarelli@univ-grenoble-alpes.fr.

*Bibliography*
[Tiedemann and Scherrer, 2017] Neural machine translation with extended context. Workshop on Discourse in Machine Translation 2017.
[Laubli et al., 2018] Has machine translation achieved human parity? A case for document-level evaluation. EMNLP 2018.
[Miculicich et al., 2018] Document-level neural machine translation with hierarchical attention networks. EMNLP 2018.
[Maruf et al., 2019] Selective attention for context-aware neural machine translation. NAACL 2019.
[Zheng et al., 2020] Towards Making the Most of Context in Neural Machine Translation. IJCAI 2020.
[Ma et al., 2021] A Comparison of Approaches to Document-level Machine Translation. arXiv pre-print 2021.
[Lupo et al., 2022] Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models. ACL 2022.
[Papineni et al., 2002] BLEU: a method for automatic evaluation of machine translation. ACL 2002.
[Rei et al., 2020] COMET: A Neural Framework for MT Evaluation. EMNLP 2020.
[Voita et al., 2019] When a good translation is wrong in context: Context-aware machine translation improves on deixis, ellipsis, and lexical cohesion. ACL 2019.
[Muller&Rios 2018] A large-scale test set for the evaluation of context-aware pronoun translation in neural machine translation. WMT 2018.
[Lopes et al., 2020] Document-level neural MT: A systematic comparison. EAMT 2020.
[Ekaterina et al., 2022] ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference. LREC 2022.
[Dinarelli et al., 2024] Context-Aware Neural Machine Translation Models Analysis And Evaluation Through Attention. TAL volume 64-3 2024.