The LIG (Laboratoire d'Informatique de Grenoble) proposes the following Master 2 level internship:

*Title*: Multi-Task Neural Spoken Language Understanding from Speech

*Description*: Spoken Language Understanding (SLU) is an important part of Human-Computer Interaction (HCI), and aims at extracting semantic interpretations from human utterances [De Mori, 2008]. Because of the high complexity of the problem, most real applications focus on specific narrow domains, e.g. hotel reservation and information [Bonneau-Maynard et al., 2005]. Traditionally, SLU was performed on automatic transcriptions of the speech signal or, at best, on word lattices. With the emergence of Deep Neural Networks (DNN), SLU can be performed directly from the speech signal, overcoming or at least alleviating the problems related to automatic transcription. Such end-to-end approaches from speech have already been proposed for spoken language translation [Berard et al., 2018; Berard et al., 2016; Weiss et al., 2017], and more recently for E2E SLU [Qian et al., 2017; Serdyuk et al., 2018; Haghani et al., 2018; Desot et al., 2019; Caubrière et al., 2019]. Additionally, the use of neural networks such as RNNs (LSTM/GRU) [Hochreiter and Schmidhuber, 1997; Cho et al., 2014] and Transformers [Vaswani et al., 2017], in combination with attention mechanisms [Bahdanau et al., 2014], potentially allows the use of contextual information going beyond a single or a few dialogue turns [Bothe et al., 2018]. This information may be crucial to resolve long-range ambiguities.

In this internship the student will investigate multi-task learning with several neural models, decoding semantic interpretations directly from the speech signal and learning SLU tasks in a multi-task learning framework (a minimal illustrative sketch of such a multi-task model is given after the references below). The student will use modular pre-built systems based on Convolutional and Recurrent Neural Networks [Berard et al., 2018; Dinarelli et al., 2020] and/or Transformer networks, with the objective of creating a fully integrated SLU system. The student will run experiments on the team's GPUs, and the system will be evaluated on the SLU benchmark corpora MEDIA [Bonneau-Maynard et al., 2006; Hahn et al., 2010], PORT-MEDIA [Lefèvre et al., 2012] and VOCADOM [Desot et al., 2019].

*Profile*:
- Master 2 student level in computer science or NLP
- Interest in Natural Language Processing
- Skills in machine learning for probabilistic models
- Computer science skills:
  1. Python programming with good knowledge of the deep learning libraries PyTorch and Fairseq
  2. Data manipulation (both textual data and audio signal)

The internship may last from 5 to 6 months. It will take place at the LIG laboratory, GETALP team (http://lig-getalp.imag.fr/), starting from January/February 2021. The student will be tutored by Marco Dinarelli (http://www.marcodinarelli.it) and François Portet (https://lig-membres.imag.fr/portet/home.php).

Interested candidates must send a CV and a motivation letter to marco.dinarelli@univ-grenoble-alpes.fr and/or francois.portet@imag.fr

*References*:

Desot, T., Portet, F., and Vacher, M. (2019). SLU for voice command in smart home: Comparison of pipeline and end-to-end approaches. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 822-829. IEEE.

Ghannay, S., Caubrière, A., Estève, Y., Camelin, N., Simonnet, E., Laurent, A., and Morin, E. (2018). End-to-end named entity and semantic concept extraction from speech. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 692-699. IEEE.
Haghani, P., Narayanan, A., Bacchiani, M., Chuang, G., Gaur, N., Moreno, P., Prabhavalkar, R., Qu, Z., and Waters, A. (2018). From audio to semantics: Approaches to end-to-end spoken language understanding. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 720-726. IEEE.

Qian, Y., Ubale, R., Ramanarayanan, V., Lange, P., Suendermann-Oeft, D., Evanini, K., and Tsuprun, E. (2017). Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 569-576. IEEE.

Serdyuk, D., Wang, Y., Fuegen, C., Kumar, A., Liu, B., and Bengio, Y. (2018). Towards end-to-end spoken language understanding. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5754-5758. IEEE.
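
To make the multi-task setup concrete, here is a minimal PyTorch sketch of the kind of model described above: a shared convolutional/recurrent speech encoder feeding two task-specific heads (frame-level semantic concept tagging trained with CTC, and utterance-level intent classification), optimized with a weighted joint loss. This is an illustration only, not the team's actual system; all dimensions, label inventories, the dummy data and the 0.5 loss weight are arbitrary assumptions.

    # Minimal multi-task end-to-end SLU sketch (illustrative assumptions only).
    import torch
    import torch.nn as nn

    class MultiTaskSLU(nn.Module):
        def __init__(self, n_mels=80, hidden=256, n_concepts=100, n_intents=20):
            super().__init__()
            # Convolutional front-end: downsamples the acoustic frames in time.
            self.conv = nn.Sequential(
                nn.Conv1d(n_mels, hidden, kernel_size=5, stride=2, padding=2),
                nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
                nn.ReLU(),
            )
            # Shared recurrent encoder over the downsampled frames.
            self.encoder = nn.LSTM(hidden, hidden, num_layers=2,
                                   batch_first=True, bidirectional=True)
            # Task 1: frame-level semantic concept tagging (CTC-style).
            self.concept_head = nn.Linear(2 * hidden, n_concepts)
            # Task 2: utterance-level intent classification.
            self.intent_head = nn.Linear(2 * hidden, n_intents)

        def forward(self, feats):
            # feats: (batch, time, n_mels) log-Mel filterbank features.
            x = self.conv(feats.transpose(1, 2)).transpose(1, 2)
            enc, _ = self.encoder(x)                 # (batch, time', 2*hidden)
            concept_logits = self.concept_head(enc)  # per-frame concept scores
            intent_logits = self.intent_head(enc.mean(dim=1))  # pooled utterance
            return concept_logits, intent_logits

    # Joint training step on dummy data: the multi-task loss is a weighted
    # sum of the per-task losses (0.5 is an arbitrary illustrative weight).
    model = MultiTaskSLU()
    feats = torch.randn(4, 200, 80)          # dummy batch of 4 utterances
    concept_logits, intent_logits = model(feats)
    ctc = nn.CTCLoss(blank=0)
    log_probs = concept_logits.log_softmax(-1).transpose(0, 1)  # (T, B, C)
    targets = torch.randint(1, 100, (4, 10))                    # dummy concepts
    in_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
    tgt_lens = torch.full((4,), 10, dtype=torch.long)
    loss = ctc(log_probs, targets, in_lens, tgt_lens) \
           + 0.5 * nn.functional.cross_entropy(intent_logits,
                                               torch.randint(0, 20, (4,)))
    loss.backward()

Sharing the encoder across tasks is the point of the multi-task framework: the concept and intent objectives regularize each other through the common speech representation, instead of each task being learned from scratch on its own.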