The LIG (Laboratoire d'Informatique de Grenoble) laboratory proposes the following M2 internship (research).

Title: Automatic Coreference Extraction

Description: Coreference resolution is one of the most challenging tasks in Natural Language Processing (NLP) [Ng 2010], [Godbert and Favre 2017], [Lee et al. 2017]. Recent advances in neural model architectures have allowed for impressive improvements in this domain [Wiseman et al. 2016], [Lee et al. 2017, 2018]. Traditionally, however, coreference resolution relies on manually annotated corpora [Pradhan et al. 2012], [Désoyer et al. 2016]. Such resources are relatively rare, and very expensive to build from scratch.

Recent neural translation models have proved very effective at capturing long-range context [Vaswani et al. 2017], [Voita et al. 2018], [Maruf and Haffari 2017], [Bawden et al. 2017], [Zhang et al. 2018], [Miculicich et al. 2018]. In particular, [Voita et al. 2018] showed that document-level neural translation models capture coreference (at least anaphora) phenomena to some extent. We want to exploit this property of neural translation models to automatically extract coreference phenomena from text. The automatically extracted annotations will be used as augmented training data for neural coreference resolution models [Lee et al. 2017, 2018], in order to study their impact on quantitative evaluations.

During this internship, the student will use and modify existing systems [Voita et al. 2018], [Lee et al. 2017] to automatically extract coreference phenomena from textual data. These extractions will then be used to augment existing data [Pradhan et al. 2012] for training a neural coreference resolution model, and to study the impact of the automatically created data on its performance. A toy sketch of this extraction idea is given after the practical details below.

Profile:
- Master 2 student in computer science, or from an engineering school
- Computer science skills:
  * Python programming with good knowledge of deep learning libraries (PyTorch)
  * Manipulation of textual data: loading different formats, format conversion, storage in suitable data structures, writing to disk in different formats, etc.
- Interest in Natural Language Processing
- Skills in machine learning for probabilistic models

The internship may last from 4 up to 6 months. It will take place at the LIG laboratory, GETALP team (http://lig-getalp.imag.fr/), starting in January/February 2020. The student will be supervised by Marco Dinarelli (http://www.marcodinarelli.it) and Laurent Besacier (https://cv.archives-ouvertes.fr/laurent-besacier).

Interested candidates should send a CV and a motivation letter to marco.dinarelli@univ-grenoble-alpes.fr and laurent.besacier@univ-grenoble-alpes.fr.
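To make the first step concrete, here is a minimal sketch, assuming token-level attention weights are available from a document-level translation model in the spirit of [Voita et al. 2018]. The pronoun shortlist, the single attention head, the confidence threshold, and the toy attention matrix are all illustrative assumptions, not the actual method of [Voita et al. 2018] or [Lee et al. 2017]; real extractions would additionally need mention spans and chain clustering before augmenting CoNLL-2012 data.

# Minimal sketch (illustrative, not the cited systems' actual code):
# derive candidate coreference links from encoder attention weights,
# then print them in a CoNLL-2012-style coreference column.
import torch

# Hypothetical shortlist of anaphoric tokens; a real system would use
# proper mention detection as in [Lee et al. 2017].
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def extract_links(tokens, attention, threshold=0.3):
    """For each pronoun, link it to the preceding token it attends to most.

    attention: (seq_len, seq_len) tensor, e.g. one encoder self-attention
    head, each row summing to 1. Returns (antecedent, pronoun) index pairs.
    """
    links = []
    for j, tok in enumerate(tokens):
        if tok.lower() not in PRONOUNS or j == 0:
            continue
        weights = attention[j, :j]          # consider only previous tokens
        i = int(torch.argmax(weights))
        if float(weights[i]) >= threshold:  # keep only confident links
            links.append((i, j))
    return links

def to_conll_column(tokens, links):
    """Render each link as a single-token chain in a CoNLL-2012-style column."""
    column = ["-"] * len(tokens)
    for chain_id, (i, j) in enumerate(links):
        column[i] = f"({chain_id})"
        column[j] = f"({chain_id})"
    return column

if __name__ == "__main__":
    tokens = "Mary saw the dog and she smiled".split()
    n = len(tokens)
    # Toy stand-in for model attention: the pronoun 'she' (index 5)
    # attends mostly to 'Mary' (index 0). In the internship, this matrix
    # would come from the trained document-level translation model.
    attention = torch.full((n, n), 0.01)
    attention[5, 0] = 0.9
    attention = attention / attention.sum(dim=1, keepdim=True)

    links = extract_links(tokens, attention)
    for tok, tag in zip(tokens, to_conll_column(tokens, links)):
        print(f"{tok}\t{tag}")

Running the sketch prints each token with its coreference tag, e.g. "Mary (0)" and "she (0)", with "-" elsewhere; such columns are the format in which the automatic annotations could be merged into existing training data [Pradhan et al. 2012].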
REFERENCES

Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. Evaluating discourse phenomena in neural machine translation. CoRR, abs/1711.00513, 2017. URL http://arxiv.org/abs/1711.00513.

Adèle Désoyer, Frédéric Landragin, Isabelle Tellier, Anaïs Lefeuvre, Jean-Yves Antoine, and Marco Dinarelli. Coreference resolution for French oral data: Machine learning experiments with ANCOR. In Proceedings of the 17th International Conference on Computational Linguistics and Intelligent Text Processing, Konya, Turkey, April 2016. Lecture Notes in Computer Science (Springer).

Elisabeth Godbert and Benoit Favre. Détection de coréférences de bout en bout en français. In TALN 2017, Orléans, France, June 2017. URL https://hal.archives-ouvertes.fr/hal-01687116.

Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. End-to-end neural coreference resolution. In Proceedings of EMNLP. Association for Computational Linguistics, 2017. URL http://aclweb.org/anthology/D17-1018.

Kenton Lee, Luheng He, and Luke Zettlemoyer. Higher-order coreference resolution with coarse-to-fine inference. CoRR, abs/1804.05392, 2018. URL http://arxiv.org/abs/1804.05392.

Sameen Maruf and Gholamreza Haffari. Document context neural machine translation with memory networks. CoRR, abs/1711.03688, 2017. URL http://arxiv.org/abs/1711.03688.

Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, and James Henderson. Document-level neural machine translation with hierarchical attention networks. CoRR, abs/1809.01576, 2018. URL http://arxiv.org/abs/1809.01576.

Vincent Ng. Supervised noun phrase coreference research: The first fifteen years. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 1396–1411, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=1858681.1858823.

Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL - Shared Task, CoNLL '12, pages 1–40, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=2391181.2391183.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of NIPS, 2017. URL https://arxiv.org/pdf/1706.03762.pdf.

Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. Context-aware neural machine translation learns anaphora resolution. CoRR, abs/1805.10163, 2018. URL http://arxiv.org/abs/1805.10163.

Sam Wiseman, Alexander M. Rush, and Stuart M. Shieber. Learning global features for coreference resolution. CoRR, abs/1604.03035, 2016. URL http://arxiv.org/abs/1604.03035.

Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Min Zhang, and Yang Liu. Improving the transformer translation model with document-level context. CoRR, abs/1810.03581, 2018. URL http://arxiv.org/abs/1810.03581.