Testing Multilingual Language Representations for Parsing

Language diversity remains a challenge for NLP. Multilingual distributional representations (multilingual word embeddings) are known to be an effective way to address different languages simultaneously, although other approaches, such as language transfer, have also been proposed [1]. These techniques have been applied to many NLP areas, including morphological analysis, named entity recognition, machine translation, and dependency parsing [2, 3, 4].

This internship is centered on a model-transfer dependency parsing approach using multilingual feature representations, previously developed at LATTICE. This system obtained state-of-the-art results in the CoNLL 2017 shared task (see http://universaldependencies.org/conll17/).

The goal of the internship is to gain a better understanding of the analysis process and, more specifically, of the multilingual approach. Neural networks are generally used as black boxes, but it is also worthwhile to examine the representations they rely on and to test, for example, the quality of the multilingual word embeddings (have the different languages been correctly aligned? a minimal sketch of such a check is given after the references). Different parameters (size of the training corpora, dictionary size, etc.) will be varied to see how they affect the results. The exact content of the internship will depend on the interests and skills of the student.

* Requirements
- excellent programming skills
- some knowledge of neural networks
- excellent English (both spoken and written; English will be the main communication language for this internship)

* Duration and conditions
3 to 5 months, Spring 2018. Normal working conditions (internship stipend + transport allowance).

* How to apply?
Send a CV, a recent transcript of grades, and a short email explaining your motivation in a few words to Thierry Poibeau (thierry.poibeau@ens.fr) and Kyungtae Lim (kyungtae.lim@ens.fr). The position is open until filled, but applications received after February 2018 will not be considered.

* References
[1] Stanford CS224n: Natural Language Processing with Deep Learning. Lecture notes: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes2.pdf
[2] Joulin, Armand, et al. "FastText.zip: Compressing text classification models." arXiv preprint arXiv:1612.03651 (2016). https://github.com/facebookresearch/fastText
[3] Artetxe, Mikel, Gorka Labaka, and Eneko Agirre. "Learning bilingual word embeddings with (almost) no bilingual data." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2017): 451-462.
[4] Lim, KyungTae, and Thierry Poibeau. "A system for multilingual dependency parsing based on bidirectional LSTM feature representations." Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2017): 63-70.
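
* Example: checking embedding alignment
As a concrete illustration of the alignment check mentioned above, the sketch below measures how often a source word's nearest cross-lingual neighbour in a shared embedding space is its dictionary translation (precision@1 on a bilingual test lexicon). This is only one possible diagnostic, not part of the LATTICE system itself; the file names and the toy test dictionary are hypothetical.

    # Minimal sketch: probe whether two embedding sets mapped to a shared space
    # are aligned, by nearest-neighbour retrieval over a bilingual test lexicon.
    # File names and word pairs below are hypothetical placeholders.
    import numpy as np

    def load_vectors(path):
        """Read a word2vec/fastText-style text file: 'word v1 v2 ... vd' per line."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            next(f)  # skip the "count dim" header line
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    def precision_at_1(src_vecs, tgt_vecs, test_dict):
        """test_dict: list of (source_word, expected_target_word) pairs."""
        tgt_words = list(tgt_vecs)
        tgt_matrix = np.stack([tgt_vecs[w] for w in tgt_words])
        tgt_matrix /= np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
        correct = total = 0
        for src_word, expected in test_dict:
            if src_word not in src_vecs or expected not in tgt_vecs:
                continue  # skip out-of-vocabulary pairs
            v = src_vecs[src_word]
            v = v / np.linalg.norm(v)
            nearest = tgt_words[int(np.argmax(tgt_matrix @ v))]  # cosine nearest neighbour
            correct += nearest == expected
            total += 1
        return correct / total if total else 0.0

    if __name__ == "__main__":
        # Hypothetical inputs: embeddings already mapped to a shared space
        # (e.g. with the method of [3]) and a small gold test dictionary.
        en = load_vectors("en.aligned.vec")
        fr = load_vectors("fr.aligned.vec")
        pairs = [("dog", "chien"), ("house", "maison"), ("eat", "manger")]
        print("precision@1:", precision_at_1(en, fr, pairs))

A high precision@1 on a held-out dictionary suggests the languages have been mapped into a genuinely shared space; a low score points to alignment problems that could in turn explain parsing errors in the model-transfer setting.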