Postdoc position at MISTEA, INRAE Montpellier, France
Semantic Web, Data linking

*Areas:* Semantic Web, Linked Data, Data linking, Representation learning
*Qualifications:* PhD in Informatics, AI. Background in knowledge engineering.
*Context:* ANR DACE-DL (DAta-CEntric AI-driven Data Linking) https://anr.fr/Projet-ANR-21-CE23-0019
*Contact & Collaboration:*
Danai Symeonidou, danai.symeonidou@inrae.fr
Clement Jonquet, clement.jonquet@inrae.fr
*Dates:* Position available for 2 years. Start date is flexible.
*Location:* INRAE, Centre Occitanie-Montpellier, MISTEA research unit https://www6.montpellier.inrae.fr/mistea/
*Salary:* Between 2200 € and 2700 € gross monthly, depending on qualifications and situation.

*Institute:* INRAE is the French research organization in agriculture, food and environmental sciences; it is a pioneer in France in terms of data sharing and Open Science commitment. The MathNum research department gathers around 200 scientists in mathematics and digital technologies in 13 research units in France. MISTEA is a joint research unit of INRAE and Montpellier Institut Agro engineering school, developing mathematical, statistical and informatics methods dedicated to analysis and decision support for agronomy and environment. The team is also recognized for its expertise in knowledge engineering and ontology-based scientific data management and information systems.

*Project context:* Data linking is the scientific challenge of automatically establishing typed links between the entities of two or more structured datasets. A variety of complex data linking systems exist, evaluated on public benchmarks [1,2,3]. While they have allowed for the generation of vast amounts of linked data in the context of various dedicated projects, data-generic systems often have limited applicability in many real-world scenarios, where data are highly heterogeneous and domain-specific.
The ANR project DACE-DL (2022-2024) targets a paradigm shift in the data linking field with a data-centric, bottom-up methodology relying on machine learning and representation learning models [4]. We hypothesize that there exists a finite number of identifiable and generalisable linking problem types (LPTs), which we need to categorize and analyze in order to obtain better linking results.

*Topic:* The postdoc will identify and provide a categorisation/taxonomy of the different linking problem types, based on an in-depth analysis of the linked datasets provided by the project and beyond. The first objective is to provide an in-depth analysis of the available linked data, along with an exhaustive study of the state of the art in the field of data linking. A finite number of generalisable linking problem types will be classified, with the relations among the LPTs and their inherent structure made explicit to both humans and machines. The goal is to answer questions such as: are certain LPTs or groups of LPTs (e.g. siblings at a given level of the taxonomy) specific to a domain, a language or a community? Are certain LPTs inherent to specific types of data? Once a formal taxonomy of LPTs is produced, various datasets will be manually annotated. These annotations on existing pairs of datasets will be used to learn, with machine learning strategies, features for the automatic categorization of other datasets. The postdoc will co-supervise a PhD student working on the machine learning methods.

*Application:* Send your application to the contact emails, including:
- *a short description introducing yourself*
- *a statement of your suitability for the position*
- *a CV*
- *one major publication*

*References*
[1] Nentwig, M., (...), Rahm, E. A survey of current link discovery frameworks. Semantic Web, 2017.
[2] Euzenat, J., (...), Trojahn, C. Ontology matching benchmarks: generation, stability, and discriminability. Web Semantics, 2013.
[3] Zhou, L., (...), Trojahn, C., Zamazal, O. Towards evaluating complex ontology alignments. Knowl. Eng. Rev., 2020.
[4] Todorov, K. Datasets First! A Bottom-up Data Linking Paradigm. ISWC 2019 Satellite Tracks, Auckland, New Zealand, October 26-30, 2019.