The LIG (Laboratoire d'Informatique de Grenoble) and LIA (Laboratoire d'Informatique d'Avignon) laboratories propose the following M2 internship (research).

Title: Dialog-Level Neural Spoken Language Understanding from Speech

Description:
Spoken Language Understanding (SLU) is an important part of human-computer interaction, and aims at extracting semantic interpretations from human utterances [De Mori et al., 2008]. Because of the high complexity of the problem, real applications focus on specific domains, e.g. hotel reservation and information [Bonneau-Maynard et al., 2006]. Most of the time, SLU is performed on automatic transcriptions of the speech signal or, at best, on ASR word lattices. Thanks to neural networks, SLU can possibly be performed directly on the speech signal, overcoming or at least alleviating the problems related to automatic transcription. Such end-to-end approaches from speech have already been proposed for spoken language translation [Bérard et al., 2018, Bérard et al., 2016, Weiss et al., 2017]. Additionally, the use of neural networks such as RNNs (LSTM/GRU) [Hochreiter and Schmidhuber, 1997, Cho et al., 2014] and Transformers [Vaswani et al., 2017], in combination with attention mechanisms [Bahdanau et al., 2014], potentially allows the use of contextual information going beyond a single or a few dialog turns [Bothe et al., 2018]. This information may be crucial for resolving long-range ambiguities.

In this internship the student will implement a complete neural SLU system, decoding semantic interpretations directly from the speech signal and taking into account contextual information at the whole-dialog level. The student will use modular pre-built systems based on convolutional and recurrent neural networks [Bérard et al., 2018, Dinarelli et al., 2017] and/or Transformer networks, with the objective of creating a whole integrated SLU system. The student will run experiments on their own using GPUs, and the system will be evaluated on the SLU benchmark corpus MEDIA [Bonneau-Maynard et al., 2006, Hahn et al., 2010].

Student Profile:
- Student at internship level (Master 2) in computer science, or from an engineering school
- Computer science skills:
  - Python programming with good knowledge of deep learning libraries (PyTorch)
  - Data manipulation (both textual data and audio signal)
- Interested in Natural Language Processing
- Skills in machine learning for probabilistic models

The internship may last from 4 up to 6 months. It will take place at the LIG laboratory (with potential visits to LIA, Avignon), GETALP team (http://lig-getalp.imag.fr/), starting from January/February 2019. The student will be tutored by Marco Dinarelli (http://www.marcodinarelli.it), Laurent Besacier (https://cv.archives-ouvertes.fr/laurent-besacier), and Bassam Jabaian (http://univ-avignon.fr/m-bassam-jabaian--3265.kjsp).

Interested candidates must send a CV and a motivation letter to marco.dinarelli@ens.fr, laurent.besacier@univ-grenoble-alpes.fr, and bassam.jabaian@univ-avignon.fr.
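For illustration only, below is a minimal PyTorch sketch of the kind of model the internship targets: an attention-based encoder-decoder that reads speech features (e.g. filterbank frames) and emits semantic concept tags, conditioned on a dialog-level context vector. All module names, dimensions and the toy data are hypothetical; the actual work will build on the modular systems of [Bérard et al., 2018, Dinarelli et al., 2017] rather than on this sketch.

# Illustrative sketch (not the internship codebase): speech features -> concept tags,
# with a placeholder dialog-context vector. All dimensions and names are hypothetical.
import torch
import torch.nn as nn


class SpeechEncoder(nn.Module):
    """Strided 1-D convolution over filterbank frames followed by a bidirectional LSTM."""

    def __init__(self, n_feats=40, hidden=256):
        super().__init__()
        # The strided convolution reduces the frame rate before the recurrence.
        self.conv = nn.Conv1d(n_feats, hidden, kernel_size=5, stride=2, padding=2)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, feats):                  # feats: (batch, frames, n_feats)
        x = self.conv(feats.transpose(1, 2))   # (batch, hidden, frames/2)
        x = torch.relu(x).transpose(1, 2)      # (batch, frames/2, hidden)
        out, _ = self.rnn(x)                   # (batch, frames/2, 2*hidden)
        return out


class AttentiveTagger(nn.Module):
    """Decode one semantic concept per step, attending over the encoded speech frames."""

    def __init__(self, n_concepts, hidden=256, ctx_dim=128):
        super().__init__()
        self.encoder = SpeechEncoder(hidden=hidden)
        self.emb = nn.Embedding(n_concepts, hidden)
        # Decoder input: previous concept embedding + dialog-context vector.
        self.dec = nn.LSTMCell(hidden + ctx_dim, hidden)
        self.att = nn.Linear(2 * hidden + hidden, 1)   # concatenation-based attention score
        self.out = nn.Linear(hidden + 2 * hidden, n_concepts)

    def forward(self, feats, prev_concepts, dialog_ctx):
        enc = self.encoder(feats)                      # (B, T, 2H)
        B, T, _ = enc.shape
        h = enc.new_zeros(B, self.dec.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(prev_concepts.size(1)):
            inp = torch.cat([self.emb(prev_concepts[:, t]), dialog_ctx], dim=-1)
            h, c = self.dec(inp, (h, c))
            # Score every encoder frame against the current decoder state.
            scores = self.att(torch.cat([enc, h.unsqueeze(1).expand(B, T, -1)], dim=-1))
            weights = torch.softmax(scores, dim=1)     # (B, T, 1)
            context = (weights * enc).sum(dim=1)       # (B, 2H)
            logits.append(self.out(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)              # (B, steps, n_concepts)


if __name__ == "__main__":
    model = AttentiveTagger(n_concepts=100)
    feats = torch.randn(2, 200, 40)               # 2 utterances, 200 frames, 40 filterbanks
    prev = torch.zeros(2, 10, dtype=torch.long)   # teacher-forced concept history (toy)
    ctx = torch.zeros(2, 128)                     # placeholder dialog-level context vector
    print(model(feats, prev, ctx).shape)          # torch.Size([2, 10, 100])

In such a setup, the placeholder dialog_ctx vector would be produced by encoding the previous turns of the dialog (for instance with a second, turn-level recurrent or Transformer encoder); this is where the dialog-level contextual modelling targeted by the internship would come in.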
References:

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014. URL http://arxiv.org/abs/1409.0473.

Alexandre Bérard, Olivier Pietquin, Christophe Servan, and Laurent Besacier. Listen and translate: A proof of concept for end-to-end speech-to-text translation. CoRR, abs/1612.01744, 2016. URL http://arxiv.org/abs/1612.01744.

Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, and Olivier Pietquin. End-to-end automatic speech translation of audiobooks. CoRR, abs/1802.04200, 2018. URL http://arxiv.org/abs/1802.04200.

Hélène Bonneau-Maynard, Christelle Ayache, F. Bechet, A. Denis, A. Kuhn, Fabrice Lefèvre, D. Mostefa, M. Quignard, S. Rosset, C. Servan, and J. Villaneau. Results of the French EVALDA-MEDIA evaluation campaign for literal understanding. In LREC, pages 2054-2059, Genoa, Italy, May 2006.

Chandrakant Bothe, Sven Magg, Cornelius Weber, and Stefan Wermter. Conversational analysis using utterance-level attention-based bidirectional recurrent neural networks. CoRR, abs/1805.06242, 2018. URL http://arxiv.org/abs/1805.06242.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR, abs/1406.1078, 2014. URL http://arxiv.org/abs/1406.1078.

R. De Mori, F. Bechet, D. Hakkani-Tur, M. McTear, G. Riccardi, and G. Tur. Spoken language understanding: A survey. IEEE Signal Processing Magazine, 25:50-58, 2008.