# NLP Research Engineer: Developing a Toolkit for Learning and Evaluating Knowledge-Aware Word Embeddings

## Context

MAGNET and SÉMAGRAMME are Inria research teams, respectively located
in Lille and Nancy (France). MAGNET carries out research in Machine
Learning in information networks, with a strong focus on natural
language processing and representation learning. SÉMAGRAMME carries
out research in natural language processing and mathematics of
language, with a focus on semantics and discourse representation. In
this context, we invite applications from highly motivated MSc or PhD
holders to pursue research in close collaboration with our permanent
research members.

In recent years, the MAGNET team has developed MANGOES
(https://gitlab.inria.fr/magnet/mangoes/), a Python-based open-source
toolkit for learning and evaluating word embeddings. Compared to
existing word embedding software, MANGOES was designed from the start
to support various methods based both on matrix factorization and deep
learning algorithms. Currently supported word representations are
standard sparse (P)PMI-reweighted word vectors, SVD-based dense word
vectors, as well as word2vec. MANGOES was also designed to offer extra
modularity and flexibility in both the pre-processing steps and the
definition of contexts, for instance allowing not only word forms, but
also lemmas, potentially enriched with POS tags and syntactic
dependency information. Furthermore, MANGOES was initially designed to
handle multiple languages, though only English and French are
currently supported. Finally, MANGOES provides a suite of intrinsic
evaluation datasets as well as statistical and visualization tools.
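
For readers unfamiliar with the sparse representations mentioned above, the following is a minimal, self-contained sketch of positive PMI (PPMI) reweighting over a toy corpus. It does not use the MANGOES API: the function name `ppmi_vectors`, the symmetric context window, and the toy corpus are assumptions made purely for illustration.

```python
import math
from collections import Counter


def ppmi_vectors(corpus, window=1):
    """Build sparse PPMI vectors {word: {context: weight}}.

    `corpus` is a list of tokenized sentences. Contexts are the words
    found within `window` positions of the target (an illustrative
    choice, not necessarily the MANGOES default).
    """
    # Count (word, context) co-occurrences inside the window.
    pair_counts = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pair_counts[(word, sentence[j])] += 1

    # Marginal counts for words and contexts.
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n

    # PMI = log2(P(w, c) / (P(w) * P(c))); keep only positive values.
    vectors = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log2(n * total / (word_counts[word] * context_counts[context]))
        if pmi > 0:
            vectors.setdefault(word, {})[context] = pmi
    return vectors


toy_corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
vectors = ppmi_vectors(toy_corpus)
```

Dense vectors of the SVD kind would then be obtained by factorizing the resulting word-by-context matrix, a step omitted here.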

## Job Description

The successful candidate will join the MAGNET team led by Prof. Marc
Tommasi and will contribute to the Inria-DFKI joint project IMPRESS
("Improving Embeddings with Semantic Knowledge"). Though the research
engineer will be mainly based in Lille, mostly interacting with the
members of the MAGNET team, her work will be carried out in
collaboration with researchers of the SÉMAGRAMME team and of the
German Research Center for Artificial Intelligence (DFKI) in
Saarbrücken (Germany) and will include extended stays in both Nancy
and Saarbrücken.

The overall aim of the project is to investigate the integration of
semantic and common sense knowledge into linguistic and multimodal
embeddings and its impact on selected downstream tasks (notably,
high-level NLP tasks like anaphora and coreference
resolution). Additionally, we will consider a multilingual extension to
handle French, German and English.

An essential aspect and outcome of the IMPRESS project will be to
significantly extend the functionalities of the MANGOES toolkit in a
number of directions, so as to support the research advances made in
the project, and to provide the NLP community at large with a
reference toolkit for learning knowledge-aware embeddings. More
specifically, the following tentative extensions are planned:

- updating MANGOES with recent transformer-based word representations;

- providing interfaces to existing resources for multimodal embeddings
  such as VisualBERT (Li et al., 2019) and MULE embeddings
  (https://github.com/VisionLearningGroup/MULE);

- creating a variety of interfaces for accessing and managing the
  different lexical and common sense knowledge databases (e.g.,
  WordNet, BabelNet, ConceptNet, YAGO) that will serve as inputs to
  our methods;

- integrating state-of-the-art retrofitting algorithms (e.g., Faruqui
  et al. 2015; Lengerich et al. 2018) that will serve as baselines to
  our approaches;

- extending the evaluation benchmarks with additional intrinsic
  datasets and extrinsic tasks;

- providing interfaces to graph node embedding algorithms implemented
  in OpenNE (https://github.com/thunlp/OpenNE).
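
To give a flavour of the retrofitting baselines mentioned in the list above, here is a minimal pure-Python sketch of the iterative update described by Faruqui et al. (2015), under the common setting where each word weighs its original vector and the average of its lexicon neighbours equally. The function name `retrofit`, the toy vectors, and the toy lexicon are hypothetical and are not part of MANGOES.

```python
def retrofit(vectors, lexicon, iterations=10):
    """Pull each vector towards its lexicon neighbours.

    `vectors` maps words to lists of floats; `lexicon` maps words to
    lists of related words (e.g., synonyms from a resource such as
    WordNet). Each update sets a word's vector to the midpoint between
    its original vector and the mean of its neighbours' current vectors.
    """
    new_vectors = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, related in lexicon.items():
            if word not in vectors:
                continue
            # Ignore neighbours that have no embedding.
            neighbours = [n for n in related if n in new_vectors]
            if not neighbours:
                continue
            dim = len(vectors[word])
            mean = [
                sum(new_vectors[n][k] for n in neighbours) / len(neighbours)
                for k in range(dim)
            ]
            new_vectors[word] = [
                (vectors[word][k] + mean[k]) / 2 for k in range(dim)
            ]
    return new_vectors


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


toy_vectors = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "table": [5.0, 5.0]}
toy_lexicon = {"happy": ["glad"], "glad": ["happy"]}
retrofitted = retrofit(toy_vectors, toy_lexicon)
```

In this toy example, "happy" and "glad" move closer together, while "table", which has no lexicon entry, keeps its original vector.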

## Job Requirements

* A Master's or PhD degree in Computer Science

* Thorough understanding of Machine Learning and Natural Language
  Processing, ideally with research experience in the field

* Strong programming skills, especially in Python and PyTorch

* Knowledge of best practices in software development (prototyping,
  unit and regression testing, iterative improvement, versioning,
  documentation)

* Fluent written and verbal communication skills in English


## Contract

* Fixed-term, Inria AI engineer contract of 3 years
* Starting date: July 2020


## Applications

Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received.

Applications, written in English, should be submitted online and
should include:

* Curriculum Vitae (including your contact address, work experience,
  publications)

* Cover letter indicating your research interests and your motivation

* Contact information for at least 2 referees

Applications should be sent to: Pascal Denis and Sylvain Pogodalla
(firstname.lastname@inria.fr).


## Useful Links

* Inria: https://www.inria.fr/fr
* DFKI: https://www.dfki.de/web/ueber-uns/standorte-kontakt/saarbruecken/
* MAGNET: https://team.inria.fr/magnet/
* SÉMAGRAMME: https://team.inria.fr/semagramme/