CEA List, a research institute of Paris-Saclay University, is looking
for a Postdoctoral Fellow to join its laboratory of semantic analysis
of texts and images.

In the context of the DeepGenSeq project, the person hired will
join an interdisciplinary team aiming to move closer to the goal
of predictive and generative artificial intelligence for biology by
exploiting deep contextual language models of biological sequences,
whose representations generalize to several applications, such as the
prediction of mutational effects.

BACKGROUND
Exponential growth in sequencing throughput, together with the sampling
of natural (uncultured) populations, is providing a deeper view of the
diversity of protein sequences across the tree of life. Proteins are
molecular engines sustaining cellular life, and the unobserved
determinants of their structure and function are encoded in the
distribution of observed natural sequences. Such vast amounts of
(unlabeled) sequences therefore provide evolutionary data that can
form the basis for unsupervised learning of predictive and generative
models of biological function.

Recent advances in machine learning, notably the development of the
transformer architecture, have enabled powerful language models that
can be used to model protein sequences. Through transfer learning, the
learned representations can be used to detect homology (i.e. the
relatedness between two protein sequences), predict secondary and
tertiary structures, predict residue-residue contacts, or predict
fluorescence landscapes.

CHALLENGES AND OBJECTIVES
Our focus here will be to develop high-capacity transformer-based
language models trained on protein sequence data. Intrinsic organizing
principles captured in the resulting representations can then be
applied in transfer learning settings to different predictive sub-tasks
using limited experimental data (e.g. the effect of sequence variation
on protein function). Following promising recent results, we also plan
to explore zero-shot inference, with no additional training and/or
supervision from experimental data.

Responsibilities:
-   Tune and optimize existing unsupervised transformer-based language
    models for protein sequences.
-   Develop and optimize code and machine learning algorithms for
    predictive models.
-   Integrate and analyze large data volumes.
-   Interact continuously with scientists in an interdisciplinary team.

APPLICATION
This project will be an excellent opportunity for a candidate who is
looking to contribute to cutting-edge research and to train with
experts in the field. We are seeking a detail-oriented computer
scientist and problem solver who is passionate about science. This
two-year position is open to a range of candidates, from recent college
graduates to more experienced scientists (e.g. post-docs).
The ideal candidate should have the following qualifications:

-   Ph.D. or M.Sc. in Applied Mathematics, Computer Science, or
    Computational Biology.
-   Experience in Deep Learning methods.
-   Experience with Python, open-source machine learning libraries,
    and Linux.
-   Strong mathematical background and analytical skills.
-   Effective organizational skills, e.g. the ability to prioritize
    work and contribute to the planning of a program of scientific
    research.
-   Demonstrated interpersonal skills, including the ability both to
    work independently and to perform collaborative research in an
    interdisciplinary team environment.
-   Good oral and written communication skills.

Preferred: previous experience with transformer-based techniques for
NLP pre-training and with transformer language models.

TERMS & COMPENSATION
This is a two-year position. The chosen candidate's salary will be
commensurate with their level of education, skills, and experience.
Other benefits include:
-   48 days of paid holidays
-   on-site subsidized restaurant
-   partial remote work is possible, up to 3 days per week, within a
    limit of 100 days per year
-   CEA contribution to the personal company savings plan


LOCATION
We are based on the Paris-Saclay research campus in the south of Paris,
France.

HOW TO APPLY
Interested candidates should submit a resume and short cover letter to
deep genseq «at» saxifrage.saclay.cea.fr

ABOUT US
About CEA-List: https://list.cea.fr/en/

About the LASTI lab:
https://kalisteo.cea.fr/index.php/ai/
https://kalisteo.cea.fr/index.php/textual-and-visual-semantic/

About Genoscope: https://www.genoscope.cns.fr