# NLP Research Engineer: Developing a Toolkit for Learning and Evaluating Knowledge-Aware Word Embeddings

## Context

MAGNET and SÉMAGRAMME are Inria research teams located in Lille and Nancy (France), respectively. MAGNET carries out research on machine learning in information networks, with a strong focus on natural language processing and representation learning. SÉMAGRAMME carries out research in natural language processing and the mathematics of language, with a focus on semantics and discourse representation. In this context, we invite applications from highly motivated MSc or PhD holders to pursue research in close collaboration with our permanent research members.

Over recent years, the MAGNET team has developed MANGOES (https://gitlab.inria.fr/magnet/mangoes/), a Python-based open-source toolkit for learning and evaluating word embeddings. Unlike existing word embedding software, MANGOES was designed from the start to support a variety of methods based on both matrix factorization and deep learning algorithms. Currently supported word representations are standard sparse (P)PMI-reweighted word vectors, SVD-based dense word vectors, and word2vec. MANGOES was also designed to offer extra modularity and flexibility in both the pre-processing steps and the definition of contexts, for instance allowing not only word forms but also lemmas, potentially enriched with POS tags and syntactic dependency information. Furthermore, MANGOES was designed from the outset to handle multiple languages, though only English and French are currently supported. Finally, MANGOES provides a suite of intrinsic evaluation datasets as well as statistical and visualization tools.

## Job Description

The successful candidate will join the MAGNET team led by Prof. Marc Tommasi and will contribute to the Inria-DFKI joint project IMPRESS ("Improving Embeddings with Semantic Knowledge").
Though the research engineer will be mainly based in Lille, mostly interacting with the members of the MAGNET team, her work will be carried out in collaboration with researchers of the SÉMAGRAMME team and of the German Research Center for Artificial Intelligence (DFKI) in Saarbrücken (Germany), and will include extended stays in both Nancy and Saarbrücken.

The overall aim of the project is to investigate the integration of semantic and common sense knowledge into linguistic and multimodal embeddings, and its impact on selected downstream tasks (notably, high-level NLP tasks such as anaphora and coreference resolution). Additionally, we will consider a multilingual extension to handle French, German and English.

An essential aspect and outcome of the IMPRESS project will be to significantly extend the functionalities of the MANGOES toolkit in a number of directions, so as to support the research advances made in the project and to provide the NLP community at large with a reference toolkit for learning knowledge-aware embeddings. More specifically, the preliminary, tentative extensions include:

- updating MANGOES with recent transformer-based word representations;
- providing interfaces to existing resources for multimodal embeddings such as VisualBERT (Li et al., 2019) and MULE embeddings (https://github.com/VisionLearningGroup/MULE);
- creating a variety of interfaces for accessing and managing the different lexical and common sense knowledge databases (e.g., WordNet, BabelNet, ConceptNet, YAGO) that will serve as inputs to our methods;
- integrating state-of-the-art retrofitting algorithms (e.g., Faruqui et al., 2015; Lengerich et al., 2018) that will serve as baselines for our approaches;
- extending the evaluation benchmarks with additional intrinsic datasets and extrinsic tasks;
- providing interfaces to graph node embedding algorithms implemented in OpenNE (https://github.com/thunlp/OpenNE).

## Job Requirements

* A Master's or PhD degree in Computer Science
* Thorough understanding of Machine Learning and Natural Language Processing, ideally with research experience in the field
* Strong programming skills, especially in Python and PyTorch
* Knowledge of best practices in software development (prototyping, unit and regression testing, iterative improvement, versioning, documentation)
* Fluent written and verbal communication skills in English

## Contract

* Fixed-term, Inria AI engineer contract of 3 years
* Starting date: July 2020

## Applications

Applications will be considered until the position is filled. However, you are encouraged to apply early, as we will start processing applications as they are received. Applications, written in English, should be submitted online and should include:

* Curriculum Vitae (including your contact address, work experience, publications)
* Cover letter indicating your research interests and your motivation
* Contact information for at least 2 referees

Applications should be sent to: Pascal Denis and Sylvain Pogodalla (firstname.lastname@inria.fr).

## Useful Links

* Inria: https://www.inria.fr/fr
* DFKI: https://www.dfki.de/web/ueber-uns/standorte-kontakt/saarbruecken/
* MAGNET: https://team.inria.fr/magnet/
* SÉMAGRAMME: https://team.inria.fr/semagramme/