The LabEx EFL is hiring a Post-Doc researcher on the topic of
"Distantly Supervised Relation Extraction for Scientific Texts". The
post will be supported by the "Laboratoire d'Excellence" Empirical
Foundations of Linguistics (LabEx EFL, http://www.labex-efl.org/), in
the context of a LabEx axis 5 collaboration between two research labs:
the RCLN team ("Représentation des Connaissances et Langage Naturel")
of LIPN (http://lipn.univ-paris13.fr/en/laboratory) and ERTIM ("Équipe
de Recherche Textes, Informatique, Multilinguisme",
http://www.er-tim.fr). These partners have already conducted several
experiments on unsupervised knowledge extraction from scientific
papers [7,8,9]. This post-doc continues that collaboration.

Context:

Semantic Relation Extraction (RE) is a central task in identifying
domain-specific knowledge in text and structuring it into knowledge
bases. In general, a semantic relationship is encoded as a triple
(entity_1, r, entity_2), where the two entities are linked by a
relation r. Most current systems for this task follow either an
unsupervised or a supervised paradigm, each with its own advantages
and disadvantages. Unsupervised methods usually rely on hand-crafted
patterns that can achieve very good precision but offer limited
coverage; moreover, such patterns are easier to define for some
relations than for others. Supervised methods usually obtain a better
overall score (i.e., a better balance between precision and coverage),
but they require annotated data, which are expensive and slow to
produce. In previous work, we explored the scope and advantages of
these paradigms [8, 11]. We found that the two paradigms have
complementary strengths and that hybridization techniques improve
their performance. These experiments were performed on the ACL-RelAcs
[7] corpus of scientific papers in NLP. The dataset was also used in a
SemEval evaluation campaign on supervised scientific information
extraction [10]. Distant Supervision (DS) is a methodology that avoids
manual intervention, whether for writing rules or for annotating
data. With DS, any text containing a pair of entities already known to
be linked (for instance, in an existing knowledge base) can serve as a
training example for that relation [13]. Recently, DS has been the
focus of several works that highlighted its effectiveness, especially
when combined with deep learning methods [14,15,16].
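To make the labelling idea concrete, here is a minimal Python sketch
of distant-supervision labelling, assuming a toy knowledge base of
(entity_1, r, entity_2) triples and naive substring matching for
entity mentions. The entities, the relation labels `used_by` and
`applied_to`, and the function `distant_label` are hypothetical
illustrations, not part of the project described in this post.

```python
from typing import List, Tuple

# Hypothetical toy knowledge base: each entry is an (entity_1, r, entity_2) triple.
Triple = Tuple[str, str, str]
TOY_KB: List[Triple] = [
    ("word embeddings", "used_by", "sentiment classifier"),
    ("CRF", "applied_to", "named entity recognition"),
]


def distant_label(sentences: List[str], kb: List[Triple]) -> List[Tuple[str, str, str, str]]:
    """Distant-supervision labelling (illustrative sketch).

    Every sentence that mentions both entities of a KB triple becomes a
    (sentence, entity_1, entity_2, relation) training example for that relation.
    Mentions are found by naive case-insensitive substring matching, an
    assumption made for brevity; real pipelines use entity recognition/linking.
    """
    examples = []
    for sentence in sentences:
        text = sentence.lower()
        for e1, relation, e2 in kb:
            if e1.lower() in text and e2.lower() in text:
                examples.append((sentence, e1, e2, relation))
    return examples


if __name__ == "__main__":
    # Hypothetical sentences; only the first two mention both entities of a triple.
    corpus = [
        "We train a sentiment classifier on top of pre-trained word embeddings.",
        "A CRF is applied to named entity recognition in biomedical abstracts.",
        "Word embeddings are evaluated on analogy tasks.",
    ]
    for example in distant_label(corpus, TOY_KB):
        print(example)
```

Labels produced this way are noisy by construction: a sentence that
mentions both entities does not necessarily express the relation,
which DS methods generally have to account for.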

Our research on relation extraction in scientific text has
highlighted the difficulty of the RE task in this specific domain. The
difficulties stem from several factors: entities are not "named
entities" as in other knowledge bases; the same entity can appear as
subject or object in different relations; and relations can be
expressed in very different ways, sometimes spanning several
sentences. Examples of such relations include "used by", "applied to"
and "improves". In our previous work [12], we had to combine several
extractors to compensate for their individual weaknesses and obtain
sufficient accuracy in scientific RE. We believe that Distant
Supervision could improve the extraction process and eventually
replace this ensemble of extractors. The post-doc will review the
state of the art in Distantly Supervised Relation Extraction and, in
collaboration with the team, work towards the definition of a
Distantly Supervised methodology for RE in scientific text.

Conditions:

Salary: between 2100 and 2300 €/month (net)

Selection Criteria:

- PhD in Computer Science

- Experience and/or interest in:

  - Natural Language Processing

  - Text Mining and Machine Learning

  - Knowledge Engineering, Semantic Web

- Good scientific writing skills

- Python programming, knowledge of PyTorch

Duration: 12 months (shared between LIPN and ERTIM)

Start: from September 2022

Note: the first interviews will take place on the afternoon of 29/06/2022.

 
Candidates should send the following to Davide Buscaldi (davide.buscaldi@lipn.univ-paris13.fr) and Kata Gábor (kata.gabor@inalco.fr):

 
- a detailed CV (with a list of publications)

- a cover letter

- the names and e-mails of two referees


Bibliography

[1] Agirre E., Ansa O., Hovy E.H., Martinez D. (2000). Enriching very large ontologies using the WWW. In ECAI Workshop on Ontology Learning.

[2] Chavalarias D., Cointet J.-P. (2013). Phylomemetic patterns in science evolution: the rise and fall of scientific fields. PLOS ONE, 8(2).

[3] Suchanek F.M., Sozio M., Weikum G. (2009). SOFIE: A self-organizing framework for information extraction. In WWW Conference, pp. 631-640.

[4] Bunescu R.C., Mooney R.J. (2005). A shortest path dependency kernel for relation extraction. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP '05), pp. 724-731.

[5] Auger A., Barrière C. (2008). Pattern-based approaches to semantic relation extraction: A state-of-the-art. Terminology, 14(1), pp. 1-19.

[6] Béchet N., Cellier P., Charnois T., Crémilleux B. (2012). Discovering Linguistic Patterns Using Sequence Mining. In CICLing 2012, pp. 154-165.

[7] Gábor K., Zargayouna H., Buscaldi D., Tellier I., Charnois T. (2016). Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature. LREC, Portorož (Slovenia).

[8] Gábor K., Zargayouna H., Buscaldi D., Tellier I., Charnois T. (2016). Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining. Advances in Intelligent Data Analysis XV (IDA 2016), LNCS 9897, pp. 237-248, Stockholm (Sweden).

[9] Gábor K., Zargayouna H., Tellier I., Buscaldi D., Charnois T. (2016). A Typology of Semantic Relations Dedicated to Scientific Literature Analysis. SAVE-SD Workshop at the 25th World Wide Web Conference.

[10] Gábor K., Buscaldi D., Schumann A.-K., QasemiZadeh B., Zargayouna H., Charnois T. (2018). SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans (USA).


_______________________________________________
Corpora mailing list
Corpora@uib.no
https://mailman.uib.no/listinfo/corpora