We propose two internship positions at IRIT, Toulouse, in the MELODI team. To apply, please send an email with a CV and a few lines explaining your motivation to chloe.braud@irit.fr and philippe.muller@irit.fr.

Multilingual discourse relation prediction

In Natural Language Processing, "discourse structure" refers to the semantic links between sentences or paragraphs, e.g. "explanation", "contrast", "elaboration", that organize a document in a coherent manner. Predicting such links is still a difficult task, all the more so because supervised data is scarce, when it exists at all, for most languages other than English.

The goal of this internship is to develop models that can leverage data in multiple languages to improve results on relation prediction. This task is crucial to improving the performance of current discourse parsers, and could also be used to develop larger datasets annotated with relations for other tasks, such as machine reading and question generation.

The first step of this internship will be dedicated to a review of the state of the art in discourse relation prediction, including a review of the existing datasets for English and for other languages. Many of them have been pre-processed and made available through a shared task organized in 2019 and 2021; see the website: https://sites.google.com/georgetown.edu/disrpt2021/call-for-participation. Through this shared task, we have access to datasets for 11 languages in the same format, but each corpus has its own specific features, especially in terms of relation sets.

The second step will be to develop a system for the identification of discourse relations in a multilingual setting. A few methods will be compared:
- merging corpora at training time while using a multilingual pretrained language model, with different merging strategies;
- multi-task learning, with different architectures;
- automated translation of corpora.

Supervision: Chloé Braud (chloe.braud@irit.fr), Philippe Muller (philippe.muller@irit.fr)
Location: IRIT, University of Toulouse, France, within the MELODI team
Duration: 5-6 months
Compensation: 546€/month
Requirements: Master 2 or equivalent in Computer Science or Mathematics; good programming skills and knowledge of Machine Learning principles and tools. Some knowledge of NLP would be a plus but is not required.

References:
- Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, and Sonia Badene. 2021. The DISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification. In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021).
- Chloé Braud, Barbara Plank, and Anders Søgaard. 2016. Multi-view and multi-task training of RST discourse parsers. In Proceedings of COLING 2016.
- Mathieu Dehouck and Pascal Denis. 2019. Phylogenetic Multi-Lingual Dependency Parsing. In Proceedings of NAACL 2019, Minneapolis, USA.
- Zhengyuan Liu, Ke Shi, and Nancy Chen. 2020. Multilingual Neural RST Discourse Parsing. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6730-6738.