LIMSI offers an internship whose subject is detailed below. If this topic interests you, please contact sahar.ghannay@limsi.fr or sophie.rosset@limsi.fr.

Master internship at LIMSI, CNRS
Title: Dialogue history integration
Contact: Sahar Ghannay (ghannay@limsi.fr), Sophie Rosset (rosset@limsi.fr)

Subject:
The proposed internship concerns a task-oriented dialogue system working on the cooking domain. This dialogue system handles two different types of scenarios: (1) the user wants to find a recipe meeting his/her criteria, and (2) the user asks a question related to the cooking domain. For the first scenario, the system accesses a database containing recipes. For the second scenario, the system accesses unstructured data using a community question answering (cQA) module.

For this internship, we are interested in two tasks. First, we propose to investigate new approaches to integrate the dialogue history into different modules of the dialogue system, including the NLU and cQA modules. These modules therefore have to capture the information shared between the dialogue history and the candidate answers. The second task concerns the evaluation of the different modules and of the dialogue system as a whole through user simulation.

Some of these terms are defined as follows:
- Dialogue system: a dialogue system allows a user to interact with a machine using natural language [DRO+19]. Two families of dialogue systems exist: conversational systems and task-oriented systems. Conversational systems have to generate the most appropriate reaction given a user's utterance and a context, without any restriction on the domain. A task-oriented system aims to help the user perform a task or access information. A dialogue system generally consists of three modules: natural language understanding (NLU), dialogue management and natural language generation (NLG).
- NLU: the NLU module takes as input the utterance of the user and returns the slots and the intent associated with this utterance. Considering the user utterance "Please find me a recipe of pancakes without eggs", the NLU should detect the slots "recipe: pancakes" and "neg-ingredient: eggs", plus the intent "RECING", which means that the user is looking for a recipe by giving the name of the recipe and the ingredients.
- Community Question Answering: Community Question Answering (cQA) [Pat17] forums, such as Quora and Stack Overflow, offer a new opportunity for users to provide, search and share knowledge. A cQA system automatically searches for relevant answers among the many responses provided for a given question, and searches for relevant previously asked questions in order to reuse their existing answers.

Many approaches have been proposed to integrate the dialogue history [TRC+20, BZZZ19, PRMU18, BTHTH17]. Popular contextual NLU models [BTHTH17, BZZZ19] exploit the dialogue history with a memory network [WCB14]. The memory mechanism helps the NLU model retrieve contextual knowledge and reduce the ambiguity of the current utterance. Other approaches represent the dialogue history as dialogue history embedding vectors. These vectors can be computed either by predicting the bag of concepts expected in the user's answer to the last system response [TRC+20], or by adapting the word2vec approach [MCCD13] to compute utterance embeddings that take the dialogue context into account [PRMU18]. The dialogue history embedding vectors are then provided as additional input to the NLU module.
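As an illustration of this last idea, the sketch below shows one possible way a dialogue history embedding could be fed to an NLU model as an extra input. It is a minimal sketch assuming PyTorch; the names (HistoryAwareNLU, history_embedding) are purely illustrative and do not refer to any existing LIMSI codebase, and the simple averaging used to build the history vector is a stand-in for the richer approaches of [TRC+20, PRMU18].

```python
# Minimal sketch (PyTorch assumed): an NLU model that receives a dialogue
# history embedding as an additional input. Illustrative names only.
import torch
import torch.nn as nn

class HistoryAwareNLU(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_intents, n_slot_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # The history vector is concatenated to the encoded current utterance
        # for both intent classification and slot tagging.
        self.intent_head = nn.Linear(2 * hidden_dim + emb_dim, n_intents)
        self.slot_head = nn.Linear(2 * hidden_dim + emb_dim, n_slot_tags)

    def forward(self, utterance_ids, history_vec):
        # utterance_ids: (batch, seq_len); history_vec: (batch, emb_dim)
        emb = self.embed(utterance_ids)
        states, _ = self.encoder(emb)          # (batch, seq_len, 2*hidden_dim)
        utt_repr = states.mean(dim=1)          # pooled utterance representation
        intent_logits = self.intent_head(
            torch.cat([utt_repr, history_vec], dim=-1))
        # Broadcast the history vector over the tokens for slot tagging.
        hist_tok = history_vec.unsqueeze(1).expand(-1, states.size(1), -1)
        slot_logits = self.slot_head(torch.cat([states, hist_tok], dim=-1))
        return intent_logits, slot_logits

def history_embedding(history_utterance_ids, embed):
    """Toy history representation: average the word embeddings of past turns.
    Returns a (emb_dim,) vector; unsqueeze(0) before passing it to forward()."""
    turn_vecs = [embed(ids).mean(dim=0) for ids in history_utterance_ids]
    return torch.stack(turn_vecs).mean(dim=0)
```

In this setup the history vector plays the same role as the additional dialogue history embedding described above: the NLU heads see both the current utterance and a compact summary of the previous turns.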
History modeling is essential for conversational question answering (ConvQA), since previous turns play a key role in understanding the user's current information need. Some existing methods simply prepend the history turns to the current question [CCO+20, RCM19, BHJM19] or mark previous answers in the passage [CHI+18]; these methods cannot handle a long conversation history. Another method [HCY18] uses complex attention mechanisms to model the history and therefore incurs a relatively large system overhead. [QYQ+19] propose a history answer embedding method to model the conversation history; the proposed method is specifically tailored to BERT-based architectures.
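A minimal sketch of the simplest of these strategies, history-turn prepending, is given below. It assumes the Hugging Face transformers library; the checkpoint name and the cooking-domain example are illustrative only, and in practice a reader fine-tuned on a ConvQA dataset such as QuAC or CoQA would be used instead of a raw BERT model.

```python
# Sketch of the "prepend history turns" strategy for ConvQA, assuming the
# Hugging Face transformers library. Illustrative only: "bert-base-uncased"
# has no trained QA head, so a fine-tuned ConvQA checkpoint is needed for
# meaningful answers.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
model.eval()

def answer_with_history(history, question, passage, max_history_turns=2):
    """Prepend the last k history turns to the current question so that the
    reader sees the conversational context (cf. [CCO+20, RCM19, BHJM19])."""
    context_turns = history[-max_history_turns:]
    augmented_question = " ".join(context_turns + [question])
    inputs = tokenizer(augmented_question, passage,
                       return_tensors="pt", truncation=True, max_length=384)
    with torch.no_grad():
        outputs = model(**inputs)
    start = int(outputs.start_logits.argmax())
    end = int(outputs.end_logits.argmax())
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1])

# Illustrative cooking-domain exchange:
history = ["What can I use instead of eggs in pancakes?",
           "Mashed banana or applesauce are common substitutes."]
print(answer_with_history(history, "How much banana per egg?",
                          "Use half a mashed banana to replace one egg."))
```

The truncation to max_length is exactly where this strategy breaks down on long conversations, which motivates the alternatives cited above.

Expected profile
- Master 2 student in computer science, specialized in at least one of the following topics:
  - Machine learning
  - Natural language processing
- Technical skills: Python, Linux

Practical information
- Duration of the internship: 5-6 months
- Beginning of the internship: start date to be defined with the intern
- Gratification: around 591.91 €/month, plus reimbursement of transport costs and a canteen subsidy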
References

[BHJM19] Basma El Amel Boussaha, Nicolas Hernandez, Christine Jacquin, and Emmanuel Morin. Multi-level context response matching in retrieval-based dialog systems. In Proceedings of the 7th Edition of the Dialog System Technology Challenges Workshop at AAAI (DSTC7), Honolulu, HI, USA, 2019.
[BTHTH17] Ankur Bapna, Gokhan Tür, Dilek Hakkani-Tür, and Larry Heck. Sequential dialogue context modeling for spoken language understanding. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 103-114, Saarbrücken, Germany, August 2017. Association for Computational Linguistics.
[BZZZ19] He Bai, Yu Zhou, Jiajun Zhang, and Chengqing Zong. Memory consolidation for contextual spoken language understanding with dialogue logistic inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5448-5453, 2019.
[CCO+20] Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, and Eneko Agirre. Improving conversational question answering systems after deployment using feedback-weighted learning. arXiv preprint arXiv:2011.00615, 2020.
[CHI+18] Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. QuAC: Question answering in context. arXiv preprint arXiv:1808.07036, 2018.
[DRO+19] Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, and Mark Cieliebak. Survey on evaluation methods for dialogue systems. arXiv preprint arXiv:1905.04071, 2019.
[HCY18] Hsin-Yuan Huang, Eunsol Choi, and Wen-tau Yih. FlowQA: Grasping flow in history for conversational machine comprehension. arXiv preprint arXiv:1810.06683, 2018.
[MCCD13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[Pat17] Barun Patra. A survey of community question answering. CoRR, abs/1705.04009, 2017.
[PRMU18] Louisa Pragst, Niklas Rach, Wolfgang Minker, and Stefan Ultes. On the vector representation of utterances in dialogue context. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[QYQ+19] Chen Qu, Liu Yang, Minghui Qiu, W. Bruce Croft, Yongfeng Zhang, and Mohit Iyyer. BERT with history answer embedding for conversational question answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1133-1136, 2019.
[RCM19] Siva Reddy, Danqi Chen, and Christopher D. Manning. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249-266, 2019.
[TRC+20] Natalia Tomashenko, Christian Raymond, Antoine Caubrière, Renato De Mori, and Yannick Estève. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8509-8513. IEEE, 2020.
[WCB14] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916, 2014.