Title: Multitask Deep Learning for Joint Syntactic and Semantic Easy-first Dependency Parsing

Context: Collaboration between RCLN (https://lipn.univ-paris13.fr/accueil/equipe/rcln/), LIPN, Université Paris 13, and CAMeL Lab (https://bit.ly/2M0XsAG), New York University Abu Dhabi
Host lab: LIPN, Université Paris 13, 99 Avenue Jean Baptiste Clément, 93430 Villetaneuse
Supervisors: Nadi Tomeh and Joseph Le Roux
Collaborator: Nizar Habash, NYU Abu Dhabi
Start date: February 2021
Duration: 6 months
Stipend: 550 euros/month

Profile and required skills:
- Master's degree in Computer Science, Computational Linguistics, Applied Mathematics, or Statistics
- Knowledge of Natural Language Processing and Deep Learning is highly appreciated
- Programming skills in Python (and libraries such as PyTorch, NumPy, or scikit-learn)

How to apply: send CV, grades, motivation and recommendation letters to tomeh@lipn.fr and leroux@lipn.fr
Permalink: https://lipn.univ-paris13.fr/~tomeh/public/uploads/offers/2021-internship-multitask-parsing.pdf

Context

In recent work on dependency parsing for Arabic (Kankanampati et al., 2020), we proposed a multitask algorithm based on the easy-first hierarchical LSTM parser of Kiperwasser and Goldberg (2016). The algorithm decodes a sentence into multiple formalisms and is learned from multiple corresponding treebanks. Our experiments considered two representations: the Columbia Arabic Treebank (CATiB) (Habash and Roth, 2009), which is inspired by traditional Arabic grammar and focuses on modeling syntactic structure, morpho-syntactic agreement, and case assignment; and the Universal Dependencies (UD) treebank for Arabic (Taji et al., 2017), which places relatively more focus on semantic/thematic relations within the sentence and is coordinated in design with treebanks for many other languages. The multitask system shares representations at various levels of abstraction and at different time steps of the parsing process, which makes it possible to communicate information across formalisms and to learn when sharing is beneficial and when it is not. The joint system outperforms the single-task baseline on both the CATiB and UD treebanks.

Propositions

We propose to extend the work of Kankanampati et al. (2020) in two ways:

(i) The joint parser indirectly learns the order in which to produce the arcs of the CATiB and UD trees: at each step during parsing, the easiest decision in a local context is selected. During training, the parser is allowed to explore erroneous arcs to reduce the effect of error propagation, sometimes referred to as the exposure bias problem. This is done by designing an optimal learning policy, also known as a dynamic oracle. In our multitask setting, dynamic-oracle-based training is suboptimal, since the number of arcs allowed by the dynamic oracle during training is large and the order in which they should be predicted is unknown. In our experiments, the parser switches between the two formalisms about 65% of the time, but we noticed that its performance can be improved by heuristically controlling the switching frequency. Instead of designing a new dynamic oracle for the multitask parser, we will explore reinforcement learning as a principled framework for learning this kind of sequential decision and further reducing error propagation, similar to Zhang and Chan (2009). We will, however, focus on a policy gradient approach, which, being gradient-based, is straightforward to integrate into a deep learning model. Policy gradient learning was shown to help transition-based syntactic dependency parsing (Le and Fokkens, 2017), constituency parsing (Fried and Klein, 2018), and semantic dependency parsing (Kurita and Søgaard, 2019). Other approaches to RL, such as DQN, actor-critic, and MaxEnt RL, can also be considered.
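To make proposition (i) concrete, here is a minimal REINFORCE-style sketch of policy-gradient training for easy-first action selection. It is an illustration only, not the baseline parser's API: the state object (with its is_final, candidate_actions, and apply methods), the score_actions function, and the +1/-1 gold-arc reward are all hypothetical stand-ins.

```python
import torch

def reinforce_episode(state, score_actions, gold_arcs, optimizer, gamma=1.0):
    """Sample one parse of a sentence and apply a policy-gradient update."""
    log_probs, rewards = [], []
    while not state.is_final():
        actions = state.candidate_actions()      # pending attachments, both formalisms
        scores = score_actions(state, actions)   # 1-D tensor of action scores
        dist = torch.distributions.Categorical(logits=scores)
        idx = dist.sample()                      # sample to explore, rather than argmax
        log_probs.append(dist.log_prob(idx))
        arc = actions[idx.item()]
        rewards.append(1.0 if arc in gold_arcs else -1.0)  # hypothetical reward choice
        state = state.apply(arc)                 # advance the parser state

    # Discounted returns G_t, then the REINFORCE loss -sum_t log pi(a_t|s_t) * G_t
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    # Standardizing the returns is a cheap variance-reduction baseline
    returns = (returns - returns.mean()) / (returns.std(unbiased=False) + 1e-8)

    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Standardizing the returns is only the simplest variance-reduction scheme; a learned critic, as in actor-critic methods, would be the natural next step.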
(ii) The current model uses Bi-LSTMs for contextual encoding of lexical items and part-of-speech tags, and a multilayer perceptron for arc and label scoring. We will explore replacing these components with transformer-based encoders and attention mechanisms. Moreover, the multitask model uses a predefined parameter-sharing strategy that specifies which layers have tied parameters; the search for the best sharing architecture considered only a few alternatives, compared on the development set. Similar to Yang and Hospedales (2017) and Ruder et al. (2019), we want to learn the best sharing strategy in a data-driven way, finding the layers or subspaces that benefit from sharing, the appropriate amount of sharing, and the appropriate relative weights of the different task losses; a minimal sketch of such a learnable sharing unit is given after the reference list.

The baseline multitask parser is implemented in Python and will be our starting point: https://github.com/yash-reddy/MEF_parser

References

- Nizar Habash and Ryan M. Roth. "CATiB: The Columbia Arabic Treebank." ACL (2009).
- Lidan Zhang and Kwok Ping Chan. "Dependency Parsing with Energy-based Reinforcement Learning." IWPT (2009).
- Eliyahu Kiperwasser and Yoav Goldberg. "Easy-First Dependency Parsing with Hierarchical Tree LSTMs." TACL (2016).
- Dima Taji, Nizar Habash, and Daniel Zeman. "Universal Dependencies for Arabic." WANLP (2017).
- Yongxin Yang and Timothy M. Hospedales. "Trace Norm Regularised Deep Multi-Task Learning." ICLR (2017).
- Minh Le and Antske Fokkens. "Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing." EACL (2017).
- Daniel Fried and Dan Klein. "Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing." ACL (2018).
- Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, and Anders Søgaard. "Latent Multi-Task Architecture Learning." AAAI (2019).
- Shuhei Kurita and Anders Søgaard. "Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies." ACL (2019).
- Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji, and Nizar Habash. "Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations." COLING (2020).
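As announced in proposition (ii) above, the following sketch illustrates one data-driven sharing mechanism in the spirit of cross-stitch/sluice units (cf. Ruder et al., 2019): learnable mixing coefficients decide how much each task's encoder draws on the other's. All layer names and sizes are illustrative and do not reflect the baseline parser's actual architecture.

```python
import torch
import torch.nn as nn

class SharingUnit(nn.Module):
    """Learns linear mixing coefficients between two tasks' representations."""
    def __init__(self):
        super().__init__()
        # Initialized near the identity: training starts close to independent
        # task encoders and lets the data decide how much cross-task flow helps.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, h_catib, h_ud):
        mixed_catib = self.alpha[0, 0] * h_catib + self.alpha[0, 1] * h_ud
        mixed_ud = self.alpha[1, 0] * h_catib + self.alpha[1, 1] * h_ud
        return mixed_catib, mixed_ud

class TwoTaskEncoder(nn.Module):
    """Two task-specific BiLSTM layers with a learned sharing unit on top."""
    def __init__(self, dim=128):
        super().__init__()
        self.catib_lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.ud_lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.share = SharingUnit()

    def forward(self, embeddings):                    # (batch, seq, dim)
        h_catib, _ = self.catib_lstm(embeddings)
        h_ud, _ = self.ud_lstm(embeddings)
        return self.share(h_catib, h_ud)              # mixed per-task states
```

Regularizing the mixing matrix toward the identity to control the amount of sharing, or making the task-loss weights learnable in the same spirit, are natural variations to explore during the internship.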