- Title: Cross-lingual transfer with multi-lingual BERT via linguistically informed fine-tuning
- Duration: 5-6 months, during the year 2020
- Location: LIMSI, Orsay (south of Paris)
- Supervisor: Caio Corro - http://caio-corro.fr/
- Team: Spoken Language Processing / Traitement Automatique de la Parole
- Contact: caio.corro@limsi.fr

*Context*

Recently, much attention has been paid to large-scale pre-training of context-sensitive representations (or context-sensitive word embeddings), in particular the ELMo [1] and BERT [2] models. The main idea is to pre-train the first layers of a neural network on a large amount of unlabeled data before fine-tuning the rest of the network on a downstream task. As such, context-sensitive representations reduce annotation costs and improve prediction performance on a wide range of tasks. The multilingual BERT model pre-trains context-sensitive representations on a collection of texts in 104 languages instead of texts in a single language. One question that arises is whether we can use the multilingual BERT model for cross-lingual learning, that is, training a model on a subset of these languages (source languages) and testing it on a different subset (target languages). This problem is important both from a research perspective (how can we learn multilingual representations of typologically diverse languages?) and from an applied, industrial perspective (e.g. increasing the language coverage of NLP-based products at low cost). Previous work observed that cross-lingual transfer based on multilingual BERT works best for typologically similar languages (i.e. languages with similar word order), which is expected but disappointing [3].

This internship will focus on multilingual dependency parsing with the Universal Dependencies treebanks: https://universaldependencies.org/

Previous work has considered re-ordering source-language sentences to match the word order of target languages [4]. However, re-ordering is not possible during unsupervised large-scale pre-training, where syntactic structure is not annotated. A different line of work proposed to enforce word order statistics at test time using constraints [5], but this method relies on a costly Lagrangian optimization procedure and cannot be applied on a per-sentence basis. Alternatively, we propose to explore fine-tuning methods for the multilingual BERT model that use a linguistically informed training algorithm, i.e. that exploit dominant word order information (is the object placed before or after the verb in a given language?) to guide unsupervised transfer to target languages.

*Missions*

The successful candidate will develop neural network architectures and training algorithms for cross-lingual generalization of pre-trained context-sensitive representations. The main evaluation task will be cross-lingual dependency parsing. As there are many ways to tackle this problem, the specific approach will be determined by the intern's aspirations; it could rely, for example, on posterior regularization or latent variable modeling. In a nutshell, the aim is to:
- propose a method for cross-lingual generalization of multilingual BERT using typological information such as dominant word order (see the sketch below);
- evaluate the proposed method on cross-lingual parsing;
- evaluate whether results generalize to other tasks, for example cross-lingual named entity recognition.
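As an illustration of the typological information mentioned above, the sketch below shows one possible way to estimate the dominant object-verb order of a language from a Universal Dependencies treebank in CoNLL-U format. It is not part of the expected deliverables; the function name and file path are purely illustrative.

```python
# Illustrative sketch only: estimate the dominant object-verb order of a language
# from a Universal Dependencies treebank in CoNLL-U format.
from collections import Counter

def dominant_object_order(conllu_path):
    """Count whether tokens attached with the 'obj' relation appear after (VO)
    or before (OV) their head in a CoNLL-U file."""
    counts = Counter()
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip comments and sentence separators.
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Skip multi-word token ranges (e.g. "3-4") and empty nodes (e.g. "5.1").
            if "-" in cols[0] or "." in cols[0]:
                continue
            token_id, head, deprel = int(cols[0]), int(cols[6]), cols[7]
            if deprel.split(":")[0] == "obj":
                counts["VO" if token_id > head else "OV"] += 1
    return counts

if __name__ == "__main__":
    # Placeholder path; e.g. a French treebank is expected to be dominantly VO,
    # a Japanese one dominantly OV.
    print(dominant_object_order("fr_gsd-ud-train.conllu"))
```

Such per-language statistics could then be injected into training, e.g. as soft constraints in a posterior regularization objective; the exact formulation is left open and will be discussed with the intern.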
[1] "Deep Contextualized Word Representations", Matthew Peters et al.
[2] "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", Jacob Devlin et al.
[3] "How multilingual is Multilingual BERT?", Telmo Pires et al.
[4] "Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge", Lauriane Aufrant et al.
[5] "Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing", Tao Meng et al.
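For reference, the context-sensitive representations discussed in the context section can be obtained from the pre-trained multilingual BERT model with, for instance, the HuggingFace transformers library. The following is a minimal sketch assuming a recent version of that library and PyTorch; the example sentence is arbitrary.

```python
# Minimal sketch: extract context-sensitive representations from multilingual BERT
# (assumes the `transformers` and `torch` packages are installed).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

encoding = tokenizer("Le chat mange la souris .", return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# One vector per word piece; such representations could serve as input to a
# dependency parser and be fine-tuned during training.
print(outputs.last_hidden_state.shape)  # (1, number of word pieces, 768)
```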