Title: Multitask Learning of Easy-first Hierarchical Tree LSTMs for Joint Syntactic and Semantic Arabic Dependency Parsing

Context: Collaboration between RCLN (https://lipn.univ-paris13.fr/accueil/equipe/rcln/), LIPN, Université Paris 13, and the CAMeL Lab (https://bit.ly/2M0XsAG), New York University Abu Dhabi

Host lab: LIPN, Université Paris 13, 99 Avenue Jean Baptiste Clément, 93430 Villetaneuse
Supervisors: Joseph Le Roux and Nadi Tomeh
Collaborators: Nizar Habash and Dima Taji
Start date: February 2020
Duration: 6 months
Salary: 550 euros/month

Profile and required skills:
- Masters in Computer Science, Computational Linguistics, Applied Mathematics, or Statistics
- Knowledge of Natural Language Processing and Deep Learning is highly appreciated
- Programming skills in Python (and libraries such as PyTorch, NumPy, or scikit-learn)

How to apply: send a CV and available Masters grades to tomeh@lipn.fr and leroux@lipn.fr

Description:
In recent work on semantic parsing, Peng et al. [2017; 2018] and Kurita and Søgaard [2019] showed that the overlap between three different theories of semantics and their corresponding representations can be exploited to improve performance on all three tasks. This is done using multitask learning in a deep neural architecture. We would like to explore ways in which this approach can be applied to Arabic, which has rich morphology and complex morpho-syntactic interactions.

We will work with two different dependency representations. The first is the Columbia Arabic Treebank (CATiB) representation [Habash and Roth, 2009], which is inspired by traditional Arabic grammar and focuses on modeling syntactic structure, morpho-syntactic agreement, and case assignment. The second is the Universal Dependencies (UD) representation for Arabic [Taji et al., 2017], which places relatively more focus on semantic/thematic relations within the sentence, and whose design is coordinated with that of a number of other languages [Nivre et al., 2016].
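To make the easy-first strategy concrete: the parser repeatedly picks, among all adjacent pairs of still-unattached words, the attachment it is most confident about, so "easy" decisions are made first. A minimal sketch of that loop in plain Python, with a pluggable `score` function standing in for a learned tree-LSTM scorer (in the multitask setting, each representation would have its own scoring head over a shared encoder); all names here are hypothetical illustrations, not the project's actual code:

```python
def easy_first_parse(words, score):
    """Greedily build a dependency tree, easiest attachments first.

    `score(pending, i, direction)` rates attaching pending[i+1] under
    pending[i] (direction="right") or pending[i] under pending[i+1]
    (direction="left"); higher scores are attached earlier ("easier").
    Returns a dict mapping each word to its head; the last remaining
    word is attached to "<root>".
    """
    pending = list(words)  # words not yet attached to a head
    heads = {}
    while len(pending) > 1:
        # consider every adjacent pair of pending nodes, both directions
        _, i, d = max(
            ((score(pending, i, d), i, d)
             for i in range(len(pending) - 1)
             for d in ("left", "right")),
            key=lambda t: t[0],
        )
        if d == "right":   # pending[i+1] becomes a child of pending[i]
            heads[pending[i + 1]] = pending[i]
            del pending[i + 1]
        else:              # pending[i] becomes a child of pending[i+1]
            heads[pending[i]] = pending[i + 1]
            del pending[i]
    heads[pending[0]] = "<root>"
    return heads

# Toy scorer for illustration only: prefer attaching shorter words
# under longer neighbours (a stand-in for a learned scorer).
def toy_score(pending, i, direction):
    child = pending[i + 1] if direction == "right" else pending[i]
    head = pending[i] if direction == "right" else pending[i + 1]
    return len(head) - len(child)

print(easy_first_parse(["the", "quick", "fox", "ran"], toy_score))
```

Because every decision is greedy and conditioned on previously built subtrees, training benefits from dynamic oracles, which supply correct next actions even from states the gold derivation would never reach.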
The two representations complement each other and stand to benefit from multitask learning approaches. In this context, we propose to:
(i) Extend the easy-first hierarchical tree LSTM parser of Kiperwasser and Goldberg [2016] to multitask settings. We have shown that this approach can be useful for joint lexical segmentation and dependency parsing [Constant et al., 2016]; in that work, our single-task model was the easy-first parser of Goldberg and Elhadad [2010] trained with dynamic oracles [Goldberg and Nivre, 2013];
(ii) Apply the model to parse Arabic sentences into both the CATiB and UD representations;
(iii) Employ multitask modeling insights from Peng et al. [2017; 2018] and Kurita and Søgaard [2019] to enhance the multitask easy-first parser.

References:
- Hao Peng, Sam Thomson, and Noah A. Smith. "Deep Multitask Learning for Semantic Dependency Parsing." ACL, 2017.
- Hao Peng, Sam Thomson, Swabha Swayamdipta, and Noah A. Smith. "Learning Joint Semantic Parsers from Disjoint Data." NAACL-HLT, 2018.
- Shuhei Kurita and Anders Søgaard. "Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies." ACL, 2019.
- Nizar Habash and Ryan M. Roth. "CATiB: The Columbia Arabic Treebank." Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2009.
- Dima Taji, Nizar Habash, and Daniel Zeman. "Universal Dependencies for Arabic." Proceedings of the Workshop on Arabic Natural Language Processing (with EACL), 2017.
- Yoav Goldberg and Michael Elhadad. "An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing." NAACL-HLT, 2010, pages 742-750, Los Angeles, California.
- Yoav Goldberg and Joakim Nivre. "Training Deterministic Parsers with Non-Deterministic Oracles." Transactions of the Association for Computational Linguistics, 1:403-414, 2013.
- Eliyahu Kiperwasser and Yoav Goldberg. "Easy-First Dependency Parsing with Hierarchical Tree LSTMs." Transactions of the Association for Computational Linguistics, 4:445-461, 2016.
- Mathieu Constant, Joseph Le Roux, and Nadi Tomeh. "Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework." NAACL-HLT, 2016, San Diego, United States.
- Joakim Nivre et al. "Universal Dependencies v1: A Multilingual Treebank Collection." LREC, 2016.
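For concreteness, the multitask extension proposed in (i) would share one encoder between the two representations, with a separate arc-scoring head per treebank. The sketch below illustrates that shape with plain NumPy matrices standing in for learned parameters (a real system would use learned tree LSTMs in PyTorch); all names and dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # encoder dimension (hypothetical)
n = 5   # sentence length

# shared encoder output: one d-dimensional vector per word,
# used by both tasks
H = rng.normal(size=(n, d))

# one bilinear scoring matrix per task: only the heads differ
W = {"catib": rng.normal(size=(d, d)),
     "ud":    rng.normal(size=(d, d))}

def arc_scores(task):
    # score[h, m] = plausibility of word h being the head of word m,
    # under the given task's scoring head
    return H @ W[task] @ H.T

for task in W:
    S = arc_scores(task)
    # greedy head choice per word (for illustration; no tree constraint)
    print(task, S.argmax(axis=0))
```

Because the encoder `H` is shared, gradients from both the CATiB and UD losses update the same word representations, which is the mechanism by which the two annotation schemes can inform each other.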