*Postdoc in NLP - Discourse parsing at IRIT, Toulouse (France) - ANR AnDiAMO*

Developing systems towards robust discourse parsing and its application

- Contract duration: 12 months
- Starting date: May 2022 (flexible)
- Location: IRIT, Université P. Sabatier (Toulouse III)
- Remuneration: TODO euros, gross salary, depending on experience
- Application deadline: the position will remain open until filled
- Send application by email to chloe.braud@irit.fr
- More information at: https://www.irit.fr/~Chloe.Braud/andiamo/

*Natural Language Processing* (NLP) is a domain at the frontier of AI, computer science and linguistics that aims to develop systems able to automatically analyze textual documents. Within NLP, *discourse parsing* is a crucial but challenging task: its goal is to produce structures describing the relationships (e.g. *explanation*, *contrast*) between spans of text in full documents, which makes it possible to draw inferences about their content. High-performing and robust discourse parsers could help improve downstream applications such as automatic summarization, translation, question answering and chatbots (e.g. [1,2,3]). However, current performance is still low, mainly due to the lack of annotated data (see e.g. [4] on monologues, [5] on dialogues, [6,7] for the multilingual setting).

In order to develop robust discourse parsers within the *AnDiAMO* project, we want to explore multi-objective settings, where the ultimate goal is to perform discourse analysis while relying on a related objective, such as performing well on another task (e.g. morphological, syntactic or temporal analysis) or on an application (e.g. sentiment analysis or argument mining). We will also explore the issues of cross-language and cross-framework learning.

The recruited candidate will work on one or several of the following topics, depending on their interests:

- *Data representation*: Discourse processing requires information from various levels of linguistic analysis. Existing studies do not make clear what kind of information is important and needed, and some potentially relevant sources of information are ignored. We plan to explore this issue within a multi-task learning setting, where a system has to jointly learn different tasks. We will experiment on classification tasks (discourse relation classification, segmentation) and on full discourse parsing.
- *Transferring to new languages, domains and modalities*: Developing systems that perform well on domains or languages different from those used at training time is crucial, especially if the adaptation can be done in an unsupervised way. This is particularly important for discourse, since annotation is very hard and time-consuming. We plan to experiment with cross-lingual embeddings and to explore multi-task learning, while trying to understand how to integrate additional linguistic information when only little annotated data is available for the auxiliary tasks. We also want to investigate dialogues, for which only a few discourse parsers exist, and to better understand how they differ from monologues.
- *Extrinsic evaluation*: We will investigate a few downstream applications that could benefit from discourse information, as a way to provide an extrinsic evaluation. We will explore pipeline systems, varying the way we encode the discourse information given as input to the end system. We will also explore transfer learning strategies, either via multi-task learning or via representation learning. We plan to start with cognitive impairment detection (e.g. schizophrenia, Alzheimer's disease) and argument mining. More applications will be considered, depending on the interests of the recruited postdoc.

It will be possible to investigate other research directions, such as few-shot or unsupervised learning, depending on the interests of the recruited candidate.

The position is funded by the ANR AnDiAMO project, for which an engineer and Master's interns will also be recruited. Collaborations are planned with researchers in Toulouse, Grenoble, Nancy and Munich. The hired person will be part of the MELODI team at IRIT, will participate in team and project meetings, and will co-author articles.

### Profile

- PhD degree in computer science or computational linguistics
- Good knowledge of machine learning
- Interest in language technology / NLP
- Good programming skills, preferably in Python; knowledge of PyTorch is a plus

### Application

Please send a CV and a few lines explaining your interest in the position to chloe.braud@irit.fr

### References

[1] Feng, X., Feng, X., Qin, B., and Geng, X. Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization. In *Proceedings of IJCAI*. 2019.

[2] Bawden, R., Sennrich, R., Birch, A., and Haddow, B. Evaluating Discourse Phenomena in Neural Machine Translation. In *Proceedings of NAACL*. 2018.

[3] Xu, J., Gan, Z., Cheng, Y., and Liu, J. Discourse-Aware Neural Extractive Text Summarization. In *Proceedings of ACL*. 2020.

[4] Koto, F., Lau, J. H., and Baldwin, T. Top-down Discourse Parsing via Sequence Labelling. In *Proceedings of EACL*. 2021.

[5] Liu, Z., and Chen, N. Improving Multi-Party Dialogue Discourse Parsing via Domain Integration. In *Proceedings of the 2nd Workshop on Computational Approaches to Discourse*. 2021.

[6] Braud, C., Coavoux, M., and Søgaard, A. Cross-lingual RST Discourse Parsing. In *Proceedings of EACL*. 2017.

[7] Liu, Z., Shi, K., and Chen, N. DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing. In *Proceedings of the 2nd Workshop on Computational Approaches to Discourse*. 2021.