Title: Multitask Learning of Easy-first Hierarchical Tree LSTMs for Joint Syntactic and Semantic Arabic Dependency Parsing

Context: Collaboration between RCLN (https://lipn.univ-paris13.fr/accueil/equipe/rcln/), LIPN, Université Paris 13, and the CAMeL Lab (https://bit.ly/2M0XsAG), New York University Abu Dhabi

Host lab: LIPN, Université Paris 13, 99 Avenue Jean Baptiste Clément, 93430 Villetaneuse
Supervisors: Joseph Le Roux and Nadi Tomeh
Collaborators: Nizar Habash and Dima Taji
Start date: February 2020
Duration: 6 months
Salary: 550 euros/month

Profile and required skills:
- Masters in Computer Science, Computational Linguistics, Applied Mathematics, or Statistics
- Knowledge of Natural Language Processing and Deep Learning is highly appreciated
- Programming skills in Python (and libraries such as PyTorch, NumPy, or scikit-learn)

How to apply: send a CV and available Masters grades to tomeh@lipn.fr and leroux@lipn.fr

Description:
In recent work on semantic parsing, Peng et al. [2017; 2018] and Kurita and Søgaard [2019] showed that the overlap between three different theories of semantics and their corresponding representations can be exploited to improve performance on all three tasks. This is done using multitask learning in a deep neural architecture. We would like to explore ways in which this approach can be applied to Arabic, which has rich morphology and complex morpho-syntactic interactions.

We will work with two different dependency representations. The first is the Columbia Arabic Treebank (CATiB) representation [Habash and Roth, 2009], which is inspired by traditional Arabic grammar and focuses on modeling syntactic structure, morpho-syntactic agreement, and case assignment. The second is the Universal Dependencies (UD) representation for Arabic [Taji et al., 2017], which places relatively more focus on semantic/thematic relations within the sentence, and whose design is coordinated with that of a number of other languages [Nivre et al., 2016].
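To make the easy-first strategy concrete: the parser repeatedly picks, among all adjacent pairs of still-unattached words, the attachment it is most confident about, so "easy" decisions are made first. A minimal sketch of that loop in plain Python, with a pluggable `score` function standing in for a learned tree-LSTM scorer (in the multitask setting, each representation would have its own scoring head over a shared encoder); all names here are hypothetical illustrations, not the project's actual code:

```python
def easy_first_parse(words, score):
    """Greedily build a dependency tree, easiest attachments first.

    `score(pending, i, direction)` rates attaching pending[i+1] under
    pending[i] (direction="right") or pending[i] under pending[i+1]
    (direction="left"); higher scores are attached earlier ("easier").
    Returns a dict mapping each word to its head; the last remaining
    word is attached to "<root>".
    """
    pending = list(words)  # words not yet attached to a head
    heads = {}
    while len(pending) > 1:
        # consider every adjacent pair of pending nodes, both directions
        _, i, d = max(
            ((score(pending, i, d), i, d)
             for i in range(len(pending) - 1)
             for d in ("left", "right")),
            key=lambda t: t[0],
        )
        if d == "right":   # pending[i+1] becomes a child of pending[i]
            heads[pending[i + 1]] = pending[i]
            del pending[i + 1]
        else:              # pending[i] becomes a child of pending[i+1]
            heads[pending[i]] = pending[i + 1]
            del pending[i]
    heads[pending[0]] = "<root>"
    return heads

# Toy scorer for illustration only: prefer attaching shorter words
# under longer neighbours (a stand-in for a learned scorer).
def toy_score(pending, i, direction):
    child = pending[i + 1] if direction == "right" else pending[i]
    head = pending[i] if direction == "right" else pending[i + 1]
    return len(head) - len(child)

print(easy_first_parse(["the", "quick", "fox", "ran"], toy_score))
```

Because every decision is greedy and conditioned on previously built subtrees, training benefits from dynamic oracles, which supply correct next actions even from states the gold derivation would never reach.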
The two representations complement each other and stand to benefit from multitask learning approaches. In this context, we propose to:
(i) Extend the easy-first hierarchical tree LSTM parser of Kiperwasser and Goldberg [2016] to multitask settings. We have shown that this approach can be useful for joint lexical segmentation and dependency parsing [Constant et al., 2016]; in that work, our single-task model was the easy-first parser of Goldberg and Elhadad [2010] trained with dynamic oracles [Goldberg and Nivre, 2013];
(ii) Apply the model to parse Arabic sentences into both the CATiB and UD representations;
(iii) Employ multitask modeling insights from Peng et al. [2017; 2018] and Kurita and Søgaard [2019] to enhance the multitask easy-first parser.

References:
- Hao Peng, Sam Thomson, and Noah A. Smith. "Deep Multitask Learning for Semantic Dependency Parsing." ACL, 2017.
- Hao Peng, Sam Thomson, Swabha Swayamdipta, and Noah A. Smith. "Learning Joint Semantic Parsers from Disjoint Data." NAACL-HLT, 2018.
- Shuhei Kurita and Anders Søgaard. "Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies." ACL, 2019.
- Nizar Habash and Ryan M. Roth. "CATiB: The Columbia Arabic Treebank." Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2009.
- Dima Taji, Nizar Habash, and Daniel Zeman. "Universal Dependencies for Arabic." Proceedings of the Workshop on Arabic Natural Language Processing (with EACL), 2017.
- Yoav Goldberg and Michael Elhadad. "An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing." NAACL-HLT, 2010, pages 742-750, Los Angeles, California.
- Yoav Goldberg and Joakim Nivre. "Training Deterministic Parsers with Non-Deterministic Oracles." Transactions of the Association for Computational Linguistics, 1:403-414, 2013.
- Eliyahu Kiperwasser and Yoav Goldberg. "Easy-First Dependency Parsing with Hierarchical Tree LSTMs." Transactions of the Association for Computational Linguistics, 4:445-461, 2016.
- Mathieu Constant, Joseph Le Roux, and Nadi Tomeh. "Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework." NAACL-HLT, 2016, San Diego, United States.
- Joakim Nivre et al. "Universal Dependencies v1: A Multilingual Treebank Collection." LREC, 2016.
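For concreteness, the multitask extension proposed in (i) would share one encoder between the two representations, with a separate arc-scoring head per treebank. The sketch below illustrates that shape with plain NumPy matrices standing in for learned parameters (a real system would use learned tree LSTMs in PyTorch); all names and dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # encoder dimension (hypothetical)
n = 5   # sentence length

# shared encoder output: one d-dimensional vector per word,
# used by both tasks
H = rng.normal(size=(n, d))

# one bilinear scoring matrix per task: only the heads differ
W = {"catib": rng.normal(size=(d, d)),
     "ud":    rng.normal(size=(d, d))}

def arc_scores(task):
    # score[h, m] = plausibility of word h being the head of word m,
    # under the given task's scoring head
    return H @ W[task] @ H.T

for task in W:
    S = arc_scores(task)
    # greedy head choice per word (for illustration; no tree constraint)
    print(task, S.argmax(axis=0))
```

Because the encoder `H` is shared, gradients from both the CATiB and UD losses update the same word representations, which is the mechanism by which the two annotation schemes can inform each other.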