Title: Postdoctoral Researcher in Artificial Intelligence and Natural
Language Processing (SCAI/BnF research program)

Body: 12-month postdoctoral contract (renewable)

Attachment: UMR 7222 ISIR

Keywords: machine learning, explainability, databases, computer
science, applied mathematics, statistics, natural language processing,
recommendation

Who are we?

Sorbonne University is a multidisciplinary research university created
on January 1, 2018 by merging the universities Paris-Sorbonne and
UPMC. It trains 54,000 students, including 4,700 doctoral students
and 10,200 international students, and employs 6,300 teachers,
teacher-researchers and researchers and 4,900 library,
administrative, technical, social and health staff. Its budget is
€670 million. Sorbonne University has first-rate potential; it is
mainly located in the heart of Paris and is present at more than
twenty sites in Île-de-France and in the regions. Sorbonne University
is organized
into three faculties: Humanities, Science & Engineering and
Medicine. These faculties have significant autonomy to implement the
university's strategy within their own boundaries, based on a contract
of objectives and resources. University governance is primarily
devoted to promoting the university's strategy, steering, developing
partnerships and diversifying resources.

Presentation of the project

In a national and international context marked by competition around
artificial intelligence, Sorbonne University has created the "Sorbonne
Center for Artificial Intelligence" (SCAI), which brings together in a
single location, in the heart of the Latin Quarter, a
strategic range of disciplines in modern artificial intelligence. The
ambition of SCAI is to contribute significantly to the excellence of
interdisciplinary research in artificial intelligence by promoting
exchanges between professors, researchers, teachers, students and
industry partners.

The research project described below is part of the strategic
partnership between Sorbonne University and the BnF, which brings
together the expertise of the MLIA team of ISIR and of the BnF in
order to carry out joint research on recommender systems.

The Bibliothèque nationale de France (BnF) is one of the largest
heritage libraries in the world. Its mission is to collect, catalog,
preserve, enrich and communicate the national documentary
heritage. For many years now, BnF has been involved in ambitious
digitization programs for its collections, now joined by a massive
intake of born-digital collections. BnF is constantly enriching its
digital heritage, whose volume, diversity and rate of growth require
new processing and consultation tools. To enable as many people as
possible to discover this heritage and make it their own, BnF has
been working with artificial intelligence (AI) technologies for
several years.

Main activities

Gallica, the digital library of the BnF, contains nearly 10 million
digitized documents that are freely accessible online (18.5 million
visits per year). However, most users do not know that Gallica
contains not only printed documents, but also photographs, sound
recordings, videos, and 3D objects. In satisfaction surveys, only a
minority of users consider the search engine's answers to be relevant
and a majority would like to be better guided in their searches. A
recommendation system should be able to help users find their way
through the mass of collections and improve the visibility of the
least known. In this project, BnF is committed to adopting a
resolutely ethical approach. The exploitation of user logs must
respect their privacy and guarantee both the relevance and
transparency of the algorithms, avoiding the risk of filter
bubbles. The interface design is also at the heart of the approach: a
trustworthy system relies on a good user experience and on the
diversity and relevance of the proposed recommendations. Three lines
of thought emerge:

1) based on the available data, including both user logs and
collection descriptions, how can predictive algorithms be developed?

2) how can diversity be integrated into the recommendation algorithm
while letting users adjust their own serendipity threshold?

3) how can user trust be built through algorithm design and audit?

Main missions

This project involves working on information access in the Gallica
library using machine learning and deep learning techniques. The
research axes are (1) the analysis and indexing of
textual documents as well as (2) the analysis of user traces and (3)
recommendation systems. We are particularly interested in multimodal
techniques that allow contextualizing a document or a query based on
user interactions.

The successful candidate will be responsible for:

- Implementing models to learn the semantics of textual data for the
purpose of indexing them.

- Developing algorithms based on representation learning methodologies
to effectively blend text and user traces.

- Reporting and presenting development work in a clear and effective
manner, both for discussions with BnF experts and for writing machine
learning publications.

The printed book collection will be the primary focus of the program
described above, but an extension to other collections with textual
descriptors (in particular iconographic collections) may be
considered.

Education:

A PhD degree in Computer Science or equivalent is required, as well as
a strong scientific record, particularly in NLP and/or Recommender
Systems and/or Information Retrieval. Experience with international
research projects and applications in the humanities and social
sciences (SHS) would be an asset.

General information:

Location: Pierre and Marie Curie campus of Sorbonne University and
Datalab of the BnF

Contract: 12-month fixed-term contract with the possibility of an extension

Expected hiring date: as soon as possible

Workload: full time

Desired experience: 1 to 3 years

Salary according to experience

Main contacts:

Laure Soulier, Associate Professor (MCF) in computer science at
Sorbonne University, MLIA team, ISIR.

Emmanuelle Bermès, Scientific and Technical Assistant to the Director
of Services and Networks at BnF.

Jean-Philippe Moreux, Scientific expert for Gallica at the BnF.

Supervision: NO

Project management: YES

Knowledge and skills

A strong background in natural language processing or text analysis is
essential, and good programming skills are required. Experience with
recommender systems is assumed. An understanding of the ethical issues
of such systems is also expected.

Language: knowledge of French is not required but is strongly preferred

Applications (CV + cover letter + references) should be sent by email to
xavier.fresquet@sorbonne-universite.fr with a copy to
philippe.chevallier@bnf.fr