Postdoc on "Discourse Segmentation and Parsing of Spoken Conversations"
Laboratoire Parole et Langage (Aix-en-Provence) / LINAGORA Labs (Toulouse) / IRIT (Toulouse Computer Science Institute)

Applications are invited for a 24-month postdoctoral position on discourse modelling for spontaneous, spoken conversation within the context of the ANR project SUMM-RE (ANR-20-CE23-0017, https://anr.fr/Projet-ANR-20-CE23-0017).

The long-term goal of SUMM-RE is to improve algorithms for automatic meeting summarization and the generation of meeting minutes. The central hypothesis of the project is that such systems will benefit greatly from exploiting the rich information carried by discourse relations (Explanations, Questions/Answers, Corrections...) and discourse structure (in the form of graphs). One of the major objectives of the project is therefore to develop an incremental discourse parser for spontaneous conversation, building on existing work by SUMM-RE members using weak supervision (Badene et al. 2019). Discourse parsing will be done on English (the AMI corpus) and French data, but the principal focus will be on a 100h corpus of meetings in French whose creation will be completed by the time the postdoc starts.

The postdoc recruited for this position will be in charge of:

(i) adapting models of discourse segmentation (e.g. Muller et al. 2019) to meeting-style conversation by building on recent advances with weak supervision (Gravellier et al. 2021) and integrating both speech and acoustic parameters in the segmentation model;
(ii) applying insights from discourse segmentation, which provides the foundation for discourse parsing, to improve the incremental discourse parser;
(iii) considering and developing mitigation strategies for working directly on ASR output (rather than on gold, human-transcribed data) for both discourse segmentation and parsing. (The French corpus is transcribed automatically with LINAGORA's state-of-the-art speech-to-text system, LinSTT.)

Given these tasks, we are looking for candidates with as many of the following skills as possible:

- Experience with speech and ASR, and conversational speech in particular
- Dialogue/conversation/interaction analysis and modeling
- Machine Learning, in particular Weakly Supervised and Unsupervised approaches
- Multimodal (speech + text) Deep Representations for Natural Language Processing
- Multilingual model transfer

We aim for a starting date around April 2022. The salary will be determined according to French university standards (ranging from 2000 to 2300 euros / month depending on experience, after tax and health insurance coverage). The costs of presenting relevant research results at conferences will be covered by the SUMM-RE project.

A basic command of French is desirable, as the postdoc will be required to handle a large French corpus; mastery of French is, however, not required.

The postdoc will ideally be hosted by the Laboratoire Parole et Langage (LPL), though exceptions will be considered for candidates who wish to be based at IRIT.

LPL is located in the center of Aix-en-Provence (http://www.aixenprovencetourism.com/), a sunny, medium-sized city in southeastern France, nestled in the Provence countryside, 30 minutes from the Mediterranean and 1h30 from the Alps. It is an active lab currently involved in several large-scale projects (Institute for Language Communication and the Brain, https://www.ilcb.fr/ ; Conversational Brains, https://www.cobra-network.eu/), offering a stimulating research environment and many diverse opportunities for collaboration.

IRIT is located in the southwestern city of Toulouse, the fourth-largest city in France, only an hour from the Pyrénées and two hours from the Mediterranean. IRIT, and in particular the MELODI team, brings internationally recognized expertise in natural language processing, especially in the subdomains of discourse segmentation (Muller et al.
2019); machine learning for discourse parsing, including approaches using weak supervision (Badene et al. 2019); theories of discourse structure (SDRT; Asher & Lascarides 2003); and exploitation of corpora for studying discourse structure, both monologue and multilogue (Asher et al. 2016).

A curriculum vitae and a list of publications should be sent to Laurent Prévot (laurent.prevot@univ-amu.fr) no later than January 31st. We strongly encourage potential candidates to submit their applications as soon as possible, as we might fill the position earlier.

For more information, please visit the following web pages:

SUMM-RE: anr.fr/Projet-ANR-20-CE23-0017, https://labs.linagora.com/summ-re/
Laboratoire Parole et Langage: https://www.lpl-aix.fr/en/welcome-to-lpl/
LINAGORA Labs: labs.linagora.com
MELODI @ IRIT: https://www.irit.fr/departement/intelligence-artificielle/equipe-melodi/

Asher, N., Hunter, J., Morey, M., Benamara, F., Afantenos, S. (2016): Discourse structure and dialogue acts in multiparty dialogue: the STAC corpus. 10th Conference on Language Resources and Evaluation (LREC 2016), 2721-2727.

Asher, N., Lascarides, A. (2003): The Logics of Conversation. Cambridge University Press.

Badene, S., Thompson, K., Lorré, J. P., Asher, N. (2019): Weak Supervision for Learning Discourse Structure. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2296-2305.

Gravellier, L., Hunter, J., Muller, P., Pellegrini, T., Ferrané, I. (2021): Weakly supervised discourse segmentation for multiparty oral conversations. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1381-1392.

Muller, P., Braud, C., Morey, M. (2019): ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents. Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, 115-124.