The PARSEME-FR (http://parsemefr.lif.univ-mrs.fr/doku.php) project offers an 18-month post-doc position in Natural Language Processing, starting in April 2018. Candidates should send their application before February 1st, 2018 (see contact information below).

* Duration: 18 months, starting in April 2018 (open until filled)
* Location: to be discussed with the members of the PARSEME-FR consortium (Nancy, Orléans or Paris)
* Employer: University of Orléans
* Contract: fixed-term position
* Remuneration: approx. €2,300 per month net income (in addition to the salary, the contract includes health benefits)

## Topic: **French MultiWord Expressions representation and parsing**

Many NLP applications require a fine-grained representation of the syntactic (and sometimes semantic) structure of texts. The process of building such a representation is called deep parsing. Recent work combining symbolic and data-driven techniques has led to significant advances in this field, notably in terms of robustness and efficiency. However, multiword expressions (MWEs), that is, groups of (not necessarily contiguous) words that exhibit some idiosyncratic properties, such as "hot dog", "hard disk", "kick the bucket", "pay attention", etc., remain a major bottleneck for deep parsing (Sag et al. 2001, Baldwin and Kim 2010). This is due, among other things, to their unpredictable behavior at several levels (irregular morpho-syntax, non-compositional semantics, ...) and to the lack of annotated training data.

One of the goals of the PARSEME-FR project is to enhance the support of MWEs in French parsing. To do so, four work packages (WPs) have been defined, dealing respectively with (i) MWE annotation in texts or treebanks, (ii) MWE lexicons, (iii) statistical MWE parsing and (iv) symbolic MWE parsing. The recruited post-doc will work in the last WP. Two complementary aspects will be considered:

- the representation of MWEs in linguistic resources (including electronic grammars, see e.g.
(Abeillé, 2002)),
- the use of these MWE-aware resources in deep (symbolic and hybrid) parsing (see e.g. (Foth and Menzel, 2006)).

Among existing resources for French, one may cite FRMG (FRench MetaGrammar), a linguistically motivated, abstract and modular description of the syntax of French (De La Clergerie, 2010). FRMG has been successfully used to compute deep representations of French texts.

The first phase of the postdoctoral project will consist in extending the expressive power of metagrammars to provide compact representations of MWEs. A second step will consist in extending FRMG with information about MWEs automatically extracted from treebanks (e.g. syntactic or lexical constraints, distribution information, etc.) and from external resources (e.g. lexicons and grammars).

This extension of the linguistic description fed to the parser may raise some efficiency issues. Indeed, the larger the input grammar, the larger the parsing search space (due to syntactic and/or lexical ambiguities). To control the exploration of this search space, several techniques have been proposed, including A* algorithms for MWEs (Waszczuk et al., 2017). The second phase of the postdoctoral project will focus on the extension of existing algorithms dedicated to MWE parsing and their application to the DyALog engine used to run FRMG (De La Clergerie, 2013).

## Profile:

* PhD in computer science or computational linguistics
* Good knowledge of French and English (not necessarily native)
* Interest in linguistics and familiarity with language technology
* Capacity to work independently and as part of a team

## Important dates:

* Application deadline: February 1, 2018 (or until the position is filled)
* Position starts: April 2018
* Duration: 18 months

## Contact information:

Enquiries and/or applications should be sent to Yannick Parmentier (yannick.parmentier@loria.fr) and Eric de la Clergerie (eric.de_la_clergerie@inria.fr).
Applications should contain an extended CV (mentioning the PhD defense date and the names and contact information of 2 to 3 references) and a cover letter.

## References:

Abeillé A. (2002) « Une grammaire électronique du français », CNRS Editions, Paris.

Baldwin T. and Kim S. N. (2010) « Multiword Expressions », in Nitin Indurkhya and Fred J. Damerau (eds.), Handbook of Natural Language Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267-292.

De La Clergerie E. (2010) « Building factorized TAGs with meta-grammars », in The 10th International Conference on Tree Adjoining Grammars and Related Formalisms (TAG+10), New Haven, CT, USA, pp. 111-118.

De La Clergerie E. (2013) « Improving a symbolic parser through partially supervised learning », in The 13th International Conference on Parsing Technologies (IWPT), Nara, Japan.

Foth K. and Menzel W. (2006) « Hybrid parsing: using probabilistic models as predictors for a symbolic parser », in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, pp. 321-328.

Sag I., Baldwin T., Bond F., Copestake A. and Flickinger D. (2001) « Multiword Expressions: A Pain in the Neck for NLP », in Proceedings of CICLing 2002: Computational Linguistics and Intelligent Text Processing, Mexico, pp. 1-15.

Waszczuk J., Savary A. and Parmentier Y. (2017) « Multiword expression-aware A* TAG parsing revisited », in 13th International Workshop on Tree-Adjoining Grammar and Related Formalisms (TAG+13), Umeå, Sweden, pp. 84-93.