The LIG (Laboratoire d'Informatique de Grenoble) and LIA (Laboratoire d'Informatique d'Avignon) laboratories propose the following M2 internship (research).

Title: Dialog-Level Neural Spoken Language Understanding from Speech

Description:
Spoken Language Understanding (SLU) is an important part of human-computer interaction, and aims at extracting semantic interpretations from human utterances [De Mori et al., 2008]. Because of the high complexity of the problem, real applications focus on specific domains, e.g. hotel reservation and information [Bonneau-Maynard et al., 2006]. Most of the time, SLU is performed on automatic transcriptions of the speech signal or, at best, on ASR word lattices. Thanks to neural networks, SLU can possibly be performed directly on the speech signal, overcoming or at least alleviating the problems related to automatic transcription. Such end-to-end approaches from speech have already been proposed for spoken language translation [Bérard et al., 2018, Bérard et al., 2016, Weiss et al., 2017]. Additionally, the use of neural networks such as RNNs (LSTM/GRU) [Hochreiter and Schmidhuber, 1997, Cho et al., 2014] and Transformers [Vaswani et al., 2017], in combination with attention mechanisms [Bahdanau et al., 2014], potentially allows the use of contextual information going beyond a single or a few dialog turns [Bothe et al., 2018]. This information may be crucial for resolving long-range ambiguities.

In this internship the student will implement a complete neural SLU system, decoding semantic interpretations directly from the speech signal and taking into account contextual information at the whole-dialog level. The student will use modular pre-built systems based on convolutional and recurrent neural networks [Bérard et al., 2018, Dinarelli et al., 2017] and/or Transformer networks, with the objective of creating a whole integrated SLU system. The student will run experiments on their own using GPUs, and the system will be evaluated on the SLU benchmark corpus MEDIA [Bonneau-Maynard et al., 2006, Hahn et al., 2010].

Student Profile:
- Student at internship level (Master 2) in computer science, or from an engineering school
- Computer science skills:
  - Python programming with good knowledge of deep learning libraries (PyTorch)
  - Data manipulation (both textual data and audio signal)
- Interested in Natural Language Processing
- Skills in machine learning for probabilistic models

The internship may last from 4 up to 6 months. It will take place at the LIG laboratory (with potential visits to LIA, Avignon), GETALP team (http://lig-getalp.imag.fr/), starting from January/February 2019. The student will be tutored by Marco Dinarelli (http://www.marcodinarelli.it), Laurent Besacier (https://cv.archives-ouvertes.fr/laurent-besacier), and Bassam Jabaian (http://univ-avignon.fr/m-bassam-jabaian--3265.kjsp).

Interested candidates must send a CV and a motivation letter to marco.dinarelli@ens.fr, laurent.besacier@univ-grenoble-alpes.fr, and bassam.jabaian@univ-avignon.fr.
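For illustration only, below is a minimal PyTorch sketch of the kind of model the internship targets: an attention-based encoder-decoder that reads speech features (e.g. filterbank frames) and emits semantic concept tags, conditioned on a dialog-level context vector. All module names, dimensions and the toy data are hypothetical; the actual work will build on the modular systems of [Bérard et al., 2018, Dinarelli et al., 2017] rather than on this sketch.

# Illustrative sketch (not the internship codebase): speech features -> concept tags,
# with a placeholder dialog-context vector. All dimensions and names are hypothetical.
import torch
import torch.nn as nn


class SpeechEncoder(nn.Module):
    """Strided 1-D convolution over filterbank frames followed by a bidirectional LSTM."""

    def __init__(self, n_feats=40, hidden=256):
        super().__init__()
        # The strided convolution reduces the frame rate before the recurrence.
        self.conv = nn.Conv1d(n_feats, hidden, kernel_size=5, stride=2, padding=2)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, feats):                  # feats: (batch, frames, n_feats)
        x = self.conv(feats.transpose(1, 2))   # (batch, hidden, frames/2)
        x = torch.relu(x).transpose(1, 2)      # (batch, frames/2, hidden)
        out, _ = self.rnn(x)                   # (batch, frames/2, 2*hidden)
        return out


class AttentiveTagger(nn.Module):
    """Decode one semantic concept per step, attending over the encoded speech frames."""

    def __init__(self, n_concepts, hidden=256, ctx_dim=128):
        super().__init__()
        self.encoder = SpeechEncoder(hidden=hidden)
        self.emb = nn.Embedding(n_concepts, hidden)
        # Decoder input: previous concept embedding + dialog-context vector.
        self.dec = nn.LSTMCell(hidden + ctx_dim, hidden)
        self.att = nn.Linear(2 * hidden + hidden, 1)   # concatenation-based attention score
        self.out = nn.Linear(hidden + 2 * hidden, n_concepts)

    def forward(self, feats, prev_concepts, dialog_ctx):
        enc = self.encoder(feats)                      # (B, T, 2H)
        B, T, _ = enc.shape
        h = enc.new_zeros(B, self.dec.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(prev_concepts.size(1)):
            inp = torch.cat([self.emb(prev_concepts[:, t]), dialog_ctx], dim=-1)
            h, c = self.dec(inp, (h, c))
            # Score every encoder frame against the current decoder state.
            scores = self.att(torch.cat([enc, h.unsqueeze(1).expand(B, T, -1)], dim=-1))
            weights = torch.softmax(scores, dim=1)     # (B, T, 1)
            context = (weights * enc).sum(dim=1)       # (B, 2H)
            logits.append(self.out(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)              # (B, steps, n_concepts)


if __name__ == "__main__":
    model = AttentiveTagger(n_concepts=100)
    feats = torch.randn(2, 200, 40)               # 2 utterances, 200 frames, 40 filterbanks
    prev = torch.zeros(2, 10, dtype=torch.long)   # teacher-forced concept history (toy)
    ctx = torch.zeros(2, 128)                     # placeholder dialog-level context vector
    print(model(feats, prev, ctx).shape)          # torch.Size([2, 10, 100])

In such a setup, the placeholder dialog_ctx vector would be produced by encoding the previous turns of the dialog (for instance with a second, turn-level recurrent or Transformer encoder); this is where the dialog-level contextual modelling targeted by the internship would come in.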
References:

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014. URL http://arxiv.org/abs/1409.0473.

Alexandre Bérard, Olivier Pietquin, Christophe Servan, and Laurent Besacier. Listen and translate: A proof of concept for end-to-end speech-to-text translation. CoRR, abs/1612.01744, 2016. URL http://arxiv.org/abs/1612.01744.

Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, and Olivier Pietquin. End-to-end automatic speech translation of audiobooks. CoRR, abs/1802.04200, 2018. URL http://arxiv.org/abs/1802.04200.

Hélène Bonneau-Maynard, Christelle Ayache, F. Bechet, A. Denis, A. Kuhn, Fabrice Lefèvre, D. Mostefa, M. Quignard, S. Rosset, C. Servan, and J. Villaneau. Results of the French EVALDA-MEDIA evaluation campaign for literal understanding. In LREC, pages 2054-2059, Genoa, Italy, May 2006.

Chandrakant Bothe, Sven Magg, Cornelius Weber, and Stefan Wermter. Conversational analysis using utterance-level attention-based bidirectional recurrent neural networks. CoRR, abs/1805.06242, 2018. URL http://arxiv.org/abs/1805.06242.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR, abs/1406.1078, 2014. URL http://arxiv.org/abs/1406.1078.

R. De Mori, F. Bechet, D. Hakkani-Tur, M. McTear, G. Riccardi, and G. Tur. Spoken language understanding: A survey. IEEE Signal Processing Magazine, 25:50-58, 2008.