The LIG (Laboratoire d'Informatique de Grenoble) proposes the following Master 2 level internship:

*Title*: Multi-Task Neural Spoken Language Understanding from Speech

*Description*: Spoken Language Understanding (SLU) is an important part of Human-Computer Interaction (HCI), and aims at extracting semantic interpretations from human utterances [De Mori, 2008]. Because of the high complexity of the problem, most real applications focus on specific narrow domains, e.g. hotel reservation and information [Bonneau-Maynard et al., 2005]. Traditionally, SLU was performed on automatic transcriptions of the speech signal or, at best, on word lattices. With the emergence of Deep Neural Networks (DNN), SLU can be performed directly from the speech signal, overcoming or at least alleviating the problems related to automatic transcription. Such end-to-end approaches from speech have already been proposed for spoken language translation [Berard et al., 2018; Berard et al., 2016; Weiss et al., 2017], and more recently for E2E SLU [Qian et al., 2017; Serdyuk et al., 2018; Haghani et al., 2018; Desot et al., 2019; Caubrière et al., 2019]. Additionally, the use of neural networks such as RNNs (LSTM/GRU) [Hochreiter and Schmidhuber, 1997; Cho et al., 2014] and Transformers [Vaswani et al., 2017], in combination with attention mechanisms [Bahdanau et al., 2014], potentially allows the use of contextual information going beyond a single or a few dialogue turns [Bothe et al., 2018]. This information may be crucial to resolve long-range ambiguities.

In this internship the student will investigate multi-task learning with several neural models, decoding semantic interpretations directly from the speech signal and learning SLU tasks in a multi-task learning framework (a minimal illustrative sketch of such a multi-task model is given after the references below). The student will use modular pre-built systems based on Convolutional and Recurrent Neural Networks [Berard et al., 2018; Dinarelli et al., 2020] and/or Transformer networks, with the objective of creating a fully integrated SLU system. The student will run experiments on the team's GPUs, and the system will be evaluated on the SLU benchmark corpora MEDIA [Bonneau-Maynard et al., 2006; Hahn et al., 2010], PORT-MEDIA [Lefèvre et al., 2012] and VOCADOM [Desot et al., 2019].

*Profile*:
- Master 2 student level in computer science or NLP
- Interest in Natural Language Processing
- Skills in machine learning for probabilistic models
- Computer science skills:
  1. Python programming with good knowledge of the deep learning libraries PyTorch and Fairseq
  2. Data manipulation (both textual data and audio signal)

The internship may last from 5 to 6 months. It will take place at the LIG laboratory, GETALP team (http://lig-getalp.imag.fr/), starting from January/February 2021. The student will be tutored by Marco Dinarelli (http://www.marcodinarelli.it) and François Portet (https://lig-membres.imag.fr/portet/home.php).

Interested candidates must send a CV and a motivation letter to marco.dinarelli@univ-grenoble-alpes.fr and/or francois.portet@imag.fr

*References*:

Desot, T., Portet, F., and Vacher, M. (2019). SLU for voice command in smart home: Comparison of pipeline and end-to-end approaches. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 822-829. IEEE.

Ghannay, S., Caubrière, A., Estève, Y., Camelin, N., Simonnet, E., Laurent, A., and Morin, E. (2018). End-to-end named entity and semantic concept extraction from speech. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 692-699. IEEE.
Haghani, P., Narayanan, A., Bacchiani, M., Chuang, G., Gaur, N., Moreno, P., Prabhavalkar, R., Qu, Z., and Waters, A. (2018). From audio to semantics: Approaches to end-to-end spoken language understanding. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 720-726. IEEE.

Qian, Y., Ubale, R., Ramanarayanan, V., Lange, P., Suendermann-Oeft, D., Evanini, K., and Tsuprun, E. (2017). Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 569-576. IEEE.

Serdyuk, D., Wang, Y., Fuegen, C., Kumar, A., Liu, B., and Bengio, Y. (2018). Towards end-to-end spoken language understanding. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5754-5758. IEEE.
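
To make the multi-task setup concrete, here is a minimal PyTorch sketch of the kind of model described above: a shared convolutional/recurrent speech encoder feeding two task-specific heads (frame-level semantic concept tagging trained with CTC, and utterance-level intent classification), optimized with a weighted joint loss. This is an illustration only, not the team's actual system; all dimensions, label inventories, the dummy data and the 0.5 loss weight are arbitrary assumptions.

    # Minimal multi-task end-to-end SLU sketch (illustrative assumptions only).
    import torch
    import torch.nn as nn

    class MultiTaskSLU(nn.Module):
        def __init__(self, n_mels=80, hidden=256, n_concepts=100, n_intents=20):
            super().__init__()
            # Convolutional front-end: downsamples the acoustic frames in time.
            self.conv = nn.Sequential(
                nn.Conv1d(n_mels, hidden, kernel_size=5, stride=2, padding=2),
                nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
                nn.ReLU(),
            )
            # Shared recurrent encoder over the downsampled frames.
            self.encoder = nn.LSTM(hidden, hidden, num_layers=2,
                                   batch_first=True, bidirectional=True)
            # Task 1: frame-level semantic concept tagging (CTC-style).
            self.concept_head = nn.Linear(2 * hidden, n_concepts)
            # Task 2: utterance-level intent classification.
            self.intent_head = nn.Linear(2 * hidden, n_intents)

        def forward(self, feats):
            # feats: (batch, time, n_mels) log-Mel filterbank features.
            x = self.conv(feats.transpose(1, 2)).transpose(1, 2)
            enc, _ = self.encoder(x)                 # (batch, time', 2*hidden)
            concept_logits = self.concept_head(enc)  # per-frame concept scores
            intent_logits = self.intent_head(enc.mean(dim=1))  # pooled utterance
            return concept_logits, intent_logits

    # Joint training step on dummy data: the multi-task loss is a weighted
    # sum of the per-task losses (0.5 is an arbitrary illustrative weight).
    model = MultiTaskSLU()
    feats = torch.randn(4, 200, 80)          # dummy batch of 4 utterances
    concept_logits, intent_logits = model(feats)
    ctc = nn.CTCLoss(blank=0)
    log_probs = concept_logits.log_softmax(-1).transpose(0, 1)  # (T, B, C)
    targets = torch.randint(1, 100, (4, 10))                    # dummy concepts
    in_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
    tgt_lens = torch.full((4,), 10, dtype=torch.long)
    loss = ctc(log_probs, targets, in_lens, tgt_lens) \
           + 0.5 * nn.functional.cross_entropy(intent_logits,
                                               torch.randint(0, 20, (4,)))
    loss.backward()

Sharing the encoder across tasks is the point of the multi-task framework: the concept and intent objectives regularize each other through the common speech representation, instead of each task being learned from scratch on its own.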