Title: Multitask Deep Learning for Joint Syntactic and Semantic Easy-first Dependency Parsing

Context: Collaboration between RCLN (https://lipn.univ-paris13.fr/accueil/equipe/rcln/), LIPN, Université Paris 13, and CAMeL Lab (https://bit.ly/2M0XsAG), New York University Abu Dhabi
Host lab: LIPN, Université Paris 13, 99 Avenue Jean Baptiste Clément, 93430 Villetaneuse
Supervisors: Nadi Tomeh and Joseph Le Roux
Collaborator: Nizar Habash, NYU Abu Dhabi
Start date: February 2021
Duration: 6 months
Stipend: 550 euros/month

Profile and required skills:
- Master's degree in Computer Science, Computational Linguistics, Applied Mathematics, or Statistics
- Knowledge of Natural Language Processing and Deep Learning is highly appreciated
- Programming skills in Python (and libraries such as PyTorch, NumPy, or scikit-learn)

How to apply: send CV, grades, motivation and recommendation letters to tomeh@lipn.fr and leroux@lipn.fr
Permalink: https://lipn.univ-paris13.fr/~tomeh/public/uploads/offers/2021-internship-multitask-parsing.pdf

Context

In recent work on dependency parsing for Arabic (Kankanampati et al., 2020), we proposed a multitask algorithm based on the easy-first hierarchical LSTM parser of Kiperwasser and Goldberg (2016). The algorithm decodes a sentence into multiple formalisms and is learned from multiple corresponding treebanks. Our experiments considered two representations: the Columbia Arabic Treebank (CATiB) (Habash and Roth, 2009), which is inspired by traditional Arabic grammar and focuses on modeling syntactic structure, morpho-syntactic agreement, and case assignment; and the Universal Dependencies (UD) treebank for Arabic (Taji et al., 2017), which places relatively more focus on semantic/thematic relations within the sentence and is coordinated in design with treebanks for many other languages. The multitask system shares representations at various levels of abstraction and at different time steps of the parsing process, which makes it possible to communicate information across formalisms and to learn when sharing is beneficial and when it is not. The joint system outperforms the single-task baseline on both the CATiB and UD treebanks.

Propositions

We propose to extend the work of Kankanampati et al. (2020) in two ways:

(i) The joint parser indirectly learns the order in which to produce the arcs of the CATiB and UD trees: at each step during parsing, the easiest decision in a local context is selected. During training, the parser is allowed to explore erroneous arcs to reduce the effect of error propagation, sometimes referred to as the exposure bias problem. This is done by designing an optimal learning policy, also known as a dynamic oracle. In our multitask setting, dynamic-oracle-based training is suboptimal, since the number of arcs allowed by the dynamic oracle during training is large and the order in which they should be predicted is unknown. In our experiments, the parser switches between the two formalisms about 65% of the time, but we noticed that its performance can be improved by heuristically controlling the switching frequency. Instead of designing a new dynamic oracle for the multitask parser, we will explore reinforcement learning as a principled framework for learning this kind of sequential decision and further reducing error propagation, similar to Zhang and Chan (2009). We will, however, focus on a policy gradient approach, which, being gradient-based, is straightforward to integrate into a deep learning model. Policy gradient learning was shown to help transition-based syntactic dependency parsing (Le and Fokkens, 2017), constituency parsing (Fried and Klein, 2018), and semantic dependency parsing (Kurita and Søgaard, 2019). Other approaches to RL, such as DQN, actor-critic, and MaxEnt RL, can also be considered.
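To make proposition (i) concrete, here is a minimal REINFORCE-style sketch of policy-gradient training for easy-first action selection. It is an illustration only, not the baseline parser's API: the state object (with its is_final, candidate_actions, and apply methods), the score_actions function, and the +1/-1 gold-arc reward are all hypothetical stand-ins.

```python
import torch

def reinforce_episode(state, score_actions, gold_arcs, optimizer, gamma=1.0):
    """Sample one parse of a sentence and apply a policy-gradient update."""
    log_probs, rewards = [], []
    while not state.is_final():
        actions = state.candidate_actions()      # pending attachments, both formalisms
        scores = score_actions(state, actions)   # 1-D tensor of action scores
        dist = torch.distributions.Categorical(logits=scores)
        idx = dist.sample()                      # sample to explore, rather than argmax
        log_probs.append(dist.log_prob(idx))
        arc = actions[idx.item()]
        rewards.append(1.0 if arc in gold_arcs else -1.0)  # hypothetical reward choice
        state = state.apply(arc)                 # advance the parser state

    # Discounted returns G_t, then the REINFORCE loss -sum_t log pi(a_t|s_t) * G_t
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    # Standardizing the returns is a cheap variance-reduction baseline
    returns = (returns - returns.mean()) / (returns.std(unbiased=False) + 1e-8)

    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Standardizing the returns is only the simplest variance-reduction scheme; a learned critic, as in actor-critic methods, would be the natural next step.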
(ii) The current model uses Bi-LSTMs for contextual encoding of lexical items and part-of-speech tags, and a multilayer perceptron for arc and label scoring. We will explore replacing these components with transformer-based encoders and attention mechanisms. Moreover, the multitask model uses a predefined parameter-sharing strategy that specifies which layers have tied parameters; the search for the best sharing architecture considered only a few alternatives, compared on the development set. Similar to Yang and Hospedales (2017) and Ruder et al. (2019), we want to learn the best sharing strategy in a data-driven way, finding the layers or subspaces that benefit from sharing, the appropriate amount of sharing, and the appropriate relative weights of the different task losses; a minimal sketch of such a learnable sharing unit is given after the reference list.

The baseline multitask parser is implemented in Python and will be our starting point: https://github.com/yash-reddy/MEF_parser

References

- Nizar Habash and Ryan M. Roth. "CATiB: The Columbia Arabic Treebank." ACL (2009).
- Lidan Zhang and Kwok Ping Chan. "Dependency Parsing with Energy-based Reinforcement Learning." IWPT (2009).
- Eliyahu Kiperwasser and Yoav Goldberg. "Easy-First Dependency Parsing with Hierarchical Tree LSTMs." TACL (2016).
- Dima Taji, Nizar Habash, and Daniel Zeman. "Universal Dependencies for Arabic." WANLP (2017).
- Yongxin Yang and Timothy M. Hospedales. "Trace Norm Regularised Deep Multi-Task Learning." ICLR (2017).
- Minh Le and Antske Fokkens. "Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing." EACL (2017).
- Daniel Fried and Dan Klein. "Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing." ACL (2018).
- Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, and Anders Søgaard. "Latent Multi-Task Architecture Learning." AAAI (2019).
- Shuhei Kurita and Anders Søgaard. "Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies." ACL (2019).
- Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji, and Nizar Habash. "Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations." COLING (2020).
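As announced in proposition (ii) above, the following sketch illustrates one data-driven sharing mechanism in the spirit of cross-stitch/sluice units (cf. Ruder et al., 2019): learnable mixing coefficients decide how much each task's encoder draws on the other's. All layer names and sizes are illustrative and do not reflect the baseline parser's actual architecture.

```python
import torch
import torch.nn as nn

class SharingUnit(nn.Module):
    """Learns linear mixing coefficients between two tasks' representations."""
    def __init__(self):
        super().__init__()
        # Initialized near the identity: training starts close to independent
        # task encoders and lets the data decide how much cross-task flow helps.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, h_catib, h_ud):
        mixed_catib = self.alpha[0, 0] * h_catib + self.alpha[0, 1] * h_ud
        mixed_ud = self.alpha[1, 0] * h_catib + self.alpha[1, 1] * h_ud
        return mixed_catib, mixed_ud

class TwoTaskEncoder(nn.Module):
    """Two task-specific BiLSTM layers with a learned sharing unit on top."""
    def __init__(self, dim=128):
        super().__init__()
        self.catib_lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.ud_lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.share = SharingUnit()

    def forward(self, embeddings):                    # (batch, seq, dim)
        h_catib, _ = self.catib_lstm(embeddings)
        h_ud, _ = self.ud_lstm(embeddings)
        return self.share(h_catib, h_ud)              # mixed per-task states
```

Regularizing the mixing matrix toward the identity to control the amount of sharing, or making the task-loss weights learnable in the same spirit, are natural variations to explore during the internship.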