*11-month post-doc position at IECL and ATILF, University of Lorraine (Nancy, France)*
=============================================
Subject: Information retrieval for medical scientific publications

* Advisors:
  - M. Constant (U. of Lorraine). Website: https://perso.atilf.fr/mconstant/
  - M. Clausel (U. of Lorraine). Website: https://sites.google.com/site/marianneclausel/
  - R.S. Stoica (U. of Lorraine). Website: https://sites.google.com/site/radustefanstoica/
* Other partners of the project: C. Francois (INIST), P. Oudet (Cancéropôle Est), F. Schaffner (Cancéropôle Est), N. Thouvenin (INIST).
* Keywords: Natural language processing, word embeddings, biomedical text mining, graph matching
=================================================

# Context:
The Cancéropôle Est is one of the 7 Cancéropôles created by the first national cancer plan in 2003. Its missions are to organize, coordinate, and strengthen cancer research in partnership with academic and clinical institutions, bringing together researchers, healthcare professionals, industry partners and patients. The aim of the project is to establish a cartography of scientific research in oncology in the two French regions Grand Est and Bourgogne Franche Comté, using the full text of the scientific publications of each research team in the two regions.

# Description of the position:
This position is funded by AMIES, University of Lorraine and Cancéropôle Est. With this position, we would like to investigate the use of text mining techniques to extract characteristics related to the scientific content of the publications of each research team in Grand Est and Bourgogne Franche Comté. The recruited person will work on the following points:
- Preprocessing of the data. The data will be provided by the Cancéropôle Est and will consist of several full texts in XML or PDF format.
- Learning of oncology embeddings (see e.g. [1]).
  INIST will provide training data to learn the embeddings, and an ontology.
- Extraction of characteristics related to the scientific content of publications for each research team.
- Combination of these characteristics and the collaboration graph of each team (see e.g. [2]) to provide general characteristics for each team.
- Integration in a visualization tool.

The recruited person will benefit from the expertise of Cancéropôle Est, INIST and University of Lorraine in natural language processing, text mining and statistical learning.

# Candidate skills
We would ideally like to recruit an 11-month post-doc with the following preferred skills:
- Knowledgeable in natural language processing, text mining and word embeddings
- Knowledgeable in machine learning
- Good programming skills in Python (classical NLP libraries, scikit-learn, PyTorch and/or TensorFlow)
- Very good English (understanding and writing)

# Application
Candidates should send a CV, the names of two referees and a cover letter to the researchers mentioned above (Mathieu.Constant@univ-lorraine.fr, marianne.clausel@univ-lorraine.fr, radu-stefan.stoica@univ-lorraine.fr). The selected candidates will be interviewed in February for an expected start in March/April 2019.

# Bibliography
[1] J. Lee et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Ed. Jonathan Wren. Bioinformatics (2019).
[2] Q. Laporte-Chabasse et al. Morpho-statistical description of networks through graph modelling and Bayesian inference. Preprint (2019).