*Research engineer in NLP at IRIT, Toulouse (France) - ANR AnDiAMO*

Data and software support for robust discourse parsing and its
application

-   Contract duration: 12 months

-   Starting date: June 2022 (flexible)

-   Location: IRIT, Université P. Sabatier (Toulouse III)

-   Remuneration: 2035-2630 euros, gross salary, depending on
    experience

-   Application deadline: the position will remain open until filled

-   Send application by email to chloe.braud@irit.fr

-   More information at:
    https://pagesperso.irit.fr/~Chloe.Braud/andiamo/

*Natural Language Processing* (NLP) is a domain at the frontier of AI,
computer science and linguistics, aiming at developing systems able to
automatically analyze textual documents.
Within NLP, *discourse parsing* is a crucial but challenging task:
its goal is to produce structures describing the relationships
(e.g. *explanation, contrast*...) between spans of text in full
documents, making it possible to draw inferences about their content.
Developing high-performing and robust discourse parsers could help to
improve downstream applications such as automatic summarization,
translation, question answering and chatbots. However, current
performance is still low, mainly due to the lack of annotated data.

In order to develop robust discourse parsers within the *AnDiAMO*
project, we want to explore multi-objective settings, where the
ultimate goal is to perform discourse analysis while relying on a
related objective, such as performing well on another task (e.g.
morphological, syntactic or temporal analysis) or on an application
(e.g. sentiment analysis or argument mining). We will also explore the
issues of cross-language and cross-framework learning.

The hired engineer will be in charge of:

-   *Evaluation setup*: setting up pipeline systems for the evaluation
    of downstream applications (e.g. sentiment analysis,
    question answering, argument mining...), and investigating
    different ways of using the discourse parsers' outputs to test the
    impact of discourse information;

-   *Corpus curation*: collecting datasets for several tasks (e.g. POS
    tagging, syntactic parsing, temporality, modality...) and
    pre-processing them;

-   *Corpus harmonization*: collecting existing discourse corpora and
    harmonizing them, following the format used for the DisRPT shared
    task
    (https://sites.google.com/georgetown.edu/disrpt2021/home?authuser=0).

The position is funded by the ANR AnDiAMO project, for which postdocs
and Master's interns will also be recruited. Collaborations are planned
with researchers in Toulouse, Grenoble, Nancy and Munich. The hired
person will be part of the MELODI team at IRIT, participating in team
and project meetings, and co-authoring articles.

### Profile

-   Master's or PhD degree in computer science or computational linguistics

-   Interest in language technology / NLP

The recruited engineer should have good software development skills.
Knowledge of machine learning would be a plus. In addition to the tasks
listed above, it will be possible to investigate other directions, such
as building multi-task learning architectures or testing few-shot
learning strategies, according to the interests of the candidate.


### Application

Please send a CV and a few lines explaining your interest in the
position to chloe.braud@irit.fr