The LIG (Laboratoire d'Informatique de Grenoble) laboratory proposes the following M2 internship (research).

Title: Automatic Coreference Extraction

Description: Coreference resolution is one of the most challenging tasks in Natural Language Processing (NLP) [Ng 2010], [Godbert and Favre 2017], [Lee et al. 2017]. Recent advances in neural model architectures have allowed for impressive improvements in this domain [Wiseman et al. 2016], [Lee et al. 2017, 2018]. Traditionally, however, coreference resolution relies on manually annotated corpora [Pradhan et al. 2012], [Désoyer et al. 2016]. Such resources are relatively rare, and very expensive to build from scratch.

Recent neural translation models have proved very effective at capturing long-range context [Vaswani et al. 2017], [Voita et al. 2018], [Maruf and Haffari 2017], [Bawden et al. 2017], [Zhang et al. 2018], [Miculicich et al. 2018]. In particular, [Voita et al. 2018] showed that document-level neural translation models capture coreference (at least anaphora) phenomena to some extent. We want to exploit this property of neural translation models to automatically extract coreference phenomena from text. The automatically extracted annotations will be used as augmented training data for neural coreference resolution models [Lee et al. 2017, 2018], in order to study their impact on quantitative evaluations.

During this internship, the student will use and modify existing systems [Voita et al. 2018], [Lee et al. 2017] to automatically extract coreference phenomena from textual data. These extractions will then be used to augment existing data [Pradhan et al. 2012] for training a neural coreference resolution model, and to study the impact of the automatically created data on its performance. A toy sketch of this extraction idea is given after the practical details below.

Profile:
- Master 2 student in computer science, or from an engineering school
- Computer science skills:
  * Python programming with good knowledge of deep learning libraries (PyTorch)
  * Manipulation of textual data: loading different formats, format conversion, storage in suitable data structures, writing to disk in different formats, etc.
- Interest in Natural Language Processing
- Skills in machine learning for probabilistic models

The internship may last from 4 up to 6 months. It will take place at the LIG laboratory, GETALP team (http://lig-getalp.imag.fr/), starting in January/February 2020. The student will be supervised by Marco Dinarelli (http://www.marcodinarelli.it) and Laurent Besacier (https://cv.archives-ouvertes.fr/laurent-besacier).

Interested candidates should send a CV and a motivation letter to marco.dinarelli@univ-grenoble-alpes.fr and laurent.besacier@univ-grenoble-alpes.fr.
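To make the first step concrete, here is a minimal sketch, assuming token-level attention weights are available from a document-level translation model in the spirit of [Voita et al. 2018]. The pronoun shortlist, the single attention head, the confidence threshold, and the toy attention matrix are all illustrative assumptions, not the actual method of [Voita et al. 2018] or [Lee et al. 2017]; real extractions would additionally need mention spans and chain clustering before augmenting CoNLL-2012 data.

# Minimal sketch (illustrative, not the cited systems' actual code):
# derive candidate coreference links from encoder attention weights,
# then print them in a CoNLL-2012-style coreference column.
import torch

# Hypothetical shortlist of anaphoric tokens; a real system would use
# proper mention detection as in [Lee et al. 2017].
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def extract_links(tokens, attention, threshold=0.3):
    """For each pronoun, link it to the preceding token it attends to most.

    attention: (seq_len, seq_len) tensor, e.g. one encoder self-attention
    head, each row summing to 1. Returns (antecedent, pronoun) index pairs.
    """
    links = []
    for j, tok in enumerate(tokens):
        if tok.lower() not in PRONOUNS or j == 0:
            continue
        weights = attention[j, :j]          # consider only previous tokens
        i = int(torch.argmax(weights))
        if float(weights[i]) >= threshold:  # keep only confident links
            links.append((i, j))
    return links

def to_conll_column(tokens, links):
    """Render each link as a single-token chain in a CoNLL-2012-style column."""
    column = ["-"] * len(tokens)
    for chain_id, (i, j) in enumerate(links):
        column[i] = f"({chain_id})"
        column[j] = f"({chain_id})"
    return column

if __name__ == "__main__":
    tokens = "Mary saw the dog and she smiled".split()
    n = len(tokens)
    # Toy stand-in for model attention: the pronoun 'she' (index 5)
    # attends mostly to 'Mary' (index 0). In the internship, this matrix
    # would come from the trained document-level translation model.
    attention = torch.full((n, n), 0.01)
    attention[5, 0] = 0.9
    attention = attention / attention.sum(dim=1, keepdim=True)

    links = extract_links(tokens, attention)
    for tok, tag in zip(tokens, to_conll_column(tokens, links)):
        print(f"{tok}\t{tag}")

Running the sketch prints each token with its coreference tag, e.g. "Mary (0)" and "she (0)", with "-" elsewhere; such columns are the format in which the automatic annotations could be merged into existing training data [Pradhan et al. 2012].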
REFERENCES

Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. Evaluating discourse phenomena in neural machine translation. CoRR, abs/1711.00513, 2017. URL http://arxiv.org/abs/1711.00513.

Adèle Désoyer, Frédéric Landragin, Isabelle Tellier, Anaïs Lefeuvre, Jean-Yves Antoine, and Marco Dinarelli. Coreference resolution for French oral data: Machine learning experiments with ANCOR. In Proceedings of the 17th International Conference on Computational Linguistics and Intelligent Text Processing, Konya, Turkey, April 2016. Lecture Notes in Computer Science (Springer).

Elisabeth Godbert and Benoit Favre. Détection de coréférences de bout en bout en français. In TALN 2017, Orléans, France, June 2017. URL https://hal.archives-ouvertes.fr/hal-01687116.

Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. End-to-end neural coreference resolution. In Proceedings of EMNLP. Association for Computational Linguistics, 2017. URL http://aclweb.org/anthology/D17-1018.

Kenton Lee, Luheng He, and Luke Zettlemoyer. Higher-order coreference resolution with coarse-to-fine inference. CoRR, abs/1804.05392, 2018. URL http://arxiv.org/abs/1804.05392.

Sameen Maruf and Gholamreza Haffari. Document context neural machine translation with memory networks. CoRR, abs/1711.03688, 2017. URL http://arxiv.org/abs/1711.03688.

Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, and James Henderson. Document-level neural machine translation with hierarchical attention networks. CoRR, abs/1809.01576, 2018. URL http://arxiv.org/abs/1809.01576.

Vincent Ng. Supervised noun phrase coreference research: The first fifteen years. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 1396–1411, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=1858681.1858823.

Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL - Shared Task, CoNLL '12, pages 1–40, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=2391181.2391183.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of NIPS, 2017. URL https://arxiv.org/pdf/1706.03762.pdf.

Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. Context-aware neural machine translation learns anaphora resolution. CoRR, abs/1805.10163, 2018. URL http://arxiv.org/abs/1805.10163.

Sam Wiseman, Alexander M. Rush, and Stuart M. Shieber. Learning global features for coreference resolution. CoRR, abs/1604.03035, 2016. URL http://arxiv.org/abs/1604.03035.

Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Min Zhang, and Yang Liu. Improving the transformer translation model with document-level context. CoRR, abs/1810.03581, 2018. URL http://arxiv.org/abs/1810.03581.