Starting date: February or March 2024 (flexible)
Place of work: Laboratoire d'Informatique de Grenoble, CNRS, Grenoble, France
Duration: 6 months
Keywords: natural language processing, speech processing, dependency parsing, treebanks
Contact: maximin.coavoux@univ-grenoble-alpes.fr

Description:
The SynPaX ANR project (Syntactic Parsing of Spontaneous speech with cross-modal transfer learning) aims to investigate modality fusion systems in order to develop end-to-end speech parsing models, i.e. systems that parse directly from the speech signal instead of using only (potentially noisy) speech transcriptions as input. The two main bottlenecks in parsing speech are (i) the scarcity of available training data and (ii) the difficulty of transcribing spontaneous speech. The internship will focus on (i) and explore data augmentation methods that leverage both audio corpora without syntactic annotations and existing treebanks for written texts.

Tasks:
- literature review on speech parsing
- designing and implementing data augmentation methods for speech parsing
- empirical evaluation of the proposed methods

Scientific environment:
The internship will be conducted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/). The GETALP team has strong expertise and a track record in Natural Language Processing. The recruited person will be welcomed into the team, which offers a stimulating, multinational and pleasant working environment. The internship will be supervised by Maximin Coavoux, Adrien Pupier and Benjamin Lecouteux. The SynPaX ANR project will also fund a PhD scholarship starting in Fall 2024 to work on extensions of this internship.

Profile:
- enrolled in a Master's program (M2) in NLP, computational linguistics or computer science
- background in NLP and/or speech processing
- proficiency in Python (transformers, pytorch)
- good communication skills in English; rudimentary knowledge of French is also expected, since the internship will focus on French data

How to apply:
- please send a CV, a cover letter and recent (last ~2 years) academic transcripts to maximin.coavoux@univ-grenoble-alpes.fr

References:
- Adrien Pupier, Maximin Coavoux, Benjamin Lecouteux, Jérôme Goulian. End-to-End Dependency Parsing of Spoken French (Interspeech 2022). https://hal.science/hal-03713551/
- Gaofei Shen, Afra Alishahi, Arianna Bisazza, Grzegorz Chrupala. Wave to Syntax: Probing spoken language models for syntax. https://arxiv.org/abs/2305.18957