*Postdoc in NLP - Discourse parsing at IRIT, Toulouse (France) - ANR
AnDiAMO*

Developing systems for robust discourse parsing and its applications


- Contract duration: 12 months

- Starting date: May 2022 (flexible)

- Location: IRIT, Université P. Sabatier (Toulouse III)

- Remuneration: TODO euros, gross salary, depending on experience

- Application deadline: the position will remain open until filled

- Send application by email to chloe.braud@irit.fr

- More information at: https://www.irit.fr/~Chloe.Braud/andiamo/


*Natural Language Processing* (NLP) is a domain at the frontier of AI,
computer science and linguistics, aiming at developing systems able to
automatically analyze textual documents. Within NLP, *discourse
parsing* is a crucial but challenging task: its goal is to produce
structures describing the relationships (e.g. *explanation*,
*contrast*) between spans of text in full documents, making it possible
to draw inferences about their content. Developing high-performing and
robust discourse parsers could help improve downstream applications
such as automatic summarization, translation, question answering and
chatbots (e.g. [1,2,3]). However, current performance is still low,
mainly due to the lack of annotated data (see e.g. [4] on monologues,
[5] on dialogues, [6,7] for the multilingual setting).

In order to develop robust discourse parsers within the *AnDiAMO*
project, we want to explore multi-objective settings, where the
ultimate goal is to perform discourse analysis while relying on another
related objective, such as performing well on another task
(e.g. morphological, syntactic or temporal analysis) or on an
application (e.g. sentiment analysis or argument mining). We will also
explore the issues of cross-language and cross-framework learning.

The recruited candidate will work on one or several of the following
topics, depending on their interests:

- *Data representation*: Discourse processing requires information
from various levels of linguistic analysis. So far, existing studies
have not made clear what kind of information is important and needed,
and some potentially relevant sources of information are ignored. We
plan to explore this issue within a multi-task learning setting, where
a system has to jointly learn different tasks. We will experiment on
classification tasks (discourse relation, segmentation) and on full
discourse parsing.

- *Transferring to new languages, domains and modalities*: Developing
systems that perform well on domains or languages different from those
used at training time is crucial, especially if the adaptation can be
done in an unsupervised way. This is particularly important for
discourse, since annotation is very hard and time-consuming. We plan
to experiment with cross-lingual embeddings and to explore multi-task
learning, while trying to understand how to integrate additional
linguistic information when only little annotated data is available
for the auxiliary tasks. We also want to investigate dialogues, for
which only a few discourse parsers exist, and to better understand how
they differ from monologues.

- *Extrinsic evaluation*: We will investigate a few downstream
applications that could benefit from discourse information, as a way
to give an extrinsic evaluation. We will explore pipeline systems,
varying the way we encode the discourse information as input to our
end system. We will also explore transfer learning strategies, either
via multi-task learning or representation learning. We plan to start
with cognitive impairment detection (e.g. schizophrenia, Alzheimer's
disease)
and argument mining. More applications will be considered, depending
on the interest of the recruited postdoc.

It will be possible to investigate other paths of research, such as
few-shot or unsupervised learning, depending on the interest of the
recruited candidate.

The position is funded by the ANR AnDiAMO project, for which an
engineer and Master's interns will also be recruited. Collaborations are
planned with researchers in Toulouse, Grenoble, Nancy and Munich. The
hired person will be part of the MELODI team at IRIT, participating in
team and project meetings, and co-authoring articles.


### Profile

- PhD degree in computer science or computational linguistics

- Good knowledge of machine learning

- Interest in language technology / NLP

- Good programming skills: preferably with Python, knowledge of
PyTorch is a plus


### Application

Please send a CV and a few lines explaining your interest in the
position to chloe.braud@irit.fr


### References

[1] Feng, X., Feng, X., Qin, B., and Geng, X. Dialogue Discourse-Aware
Graph Model and Data Augmentation for Meeting Summarization. In
*Proceedings of IJCAI*. 2019.

[2] Bawden, R., Sennrich, R., Birch, A., and Haddow, B. Evaluating
Discourse Phenomena in Neural Machine Translation. In *Proceedings of
NAACL*. 2018.

[3] Xu, J., Gan, Z., Cheng, Y., and Liu, J. Discourse-Aware Neural
Extractive Text Summarization. In *Proceedings of ACL*. 2020.

[4] Koto, F., Lau, J. H., and Baldwin, T. Top-down Discourse Parsing via
Sequence Labelling. In *Proceedings of EACL*. 2021.

[5] Liu, Z., and Chen, N. Improving Multi-Party Dialogue Discourse
Parsing via Domain Integration. In *Proceedings of the 2nd Workshop on
Computational Approaches to Discourse*. 2021.

[6] Braud, C., Coavoux, M., and Søgaard, A. Cross-lingual RST Discourse
Parsing. In *Proceedings of EACL*. 2017.

[7] Liu, Z., Shi, K., and Chen, N. DMRST: A Joint Framework for
Document-Level Multilingual RST Discourse Segmentation and Parsing. In
*Proceedings of the 2nd Workshop on Computational Approaches to
Discourse*. 2021.