LIMSI offers an internship whose subject is detailed below. If this topic interests you, please contact sahar.ghannay@limsi.fr or sophie.rosset@limsi.fr.

Master internship at LIMSI, CNRS
Title: Dialogue history integration
Contact: Sahar Ghannay (ghannay@limsi.fr), Sophie Rosset (rosset@limsi.fr)

Subject:
The proposed internship concerns a task-oriented dialogue system working on the cooking domain. This dialogue system handles two different types of scenarios: (1) the user wants to find a recipe meeting his/her criteria, and (2) the user asks a question related to the cooking domain. For the first scenario, the system accesses a database containing recipes. For the second scenario, the system accesses unstructured data using a community question answering (cQA) module.

For this internship, we are interested in two tasks. First, we propose to investigate new approaches to integrate the dialogue history into different modules of the dialogue system, including the NLU and cQA modules. These modules therefore have to capture the information shared between the dialogue history and the candidate answers. The second task concerns the evaluation of the different modules and of the dialogue system as a whole through user simulation.

Some of these terms are defined as follows:
- Dialogue system: a dialogue system allows a user to interact with a machine using natural language [DRO+19]. Two families of dialogue systems exist: conversational systems and task-oriented systems. Conversational systems have to generate the most appropriate reaction given a user's utterance and a context, without any restriction on the domain. A task-oriented system aims to help the user perform a task or access information. A dialogue system generally consists of three modules: natural language understanding (NLU), dialogue management and natural language generation (NLG).
- NLU: the NLU module takes as input the utterance of the user and returns the slots and the intent associated with this utterance. Considering the user utterance "Please find me a recipe of pancakes without eggs", the NLU should detect the slots "recipe: pancakes" and "neg-ingredient: eggs", plus the intent "RECING", which means that the user is looking for a recipe by giving the name of the recipe and the ingredients.
- Community Question Answering: Community Question Answering (cQA) [Pat17] forums, such as Quora and Stack Overflow, offer a new opportunity for users to provide, search and share knowledge. A cQA system automatically searches for relevant answers among the many responses provided for a given question, and searches for relevant previously asked questions in order to reuse their existing answers.

Many approaches have been proposed to integrate the dialogue history [TRC+20, BZZZ19, PRMU18, BTHTH17]. Popular contextual NLU models [BTHTH17, BZZZ19] exploit the dialogue history with a memory network [WCB14]. The memory mechanism helps the NLU model retrieve contextual knowledge and reduce the ambiguity of the current utterance. Other approaches represent the dialogue history as dialogue history embedding vectors. These vectors can be computed either by predicting the bag of concepts expected in the user's answer to the last system response [TRC+20], or by adapting the word2vec approach [MCCD13] to compute utterance embeddings that take the dialogue context into account [PRMU18]. The dialogue history embedding vectors are then provided as additional input to the NLU module.
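As an illustration of this last idea, the sketch below shows one possible way a dialogue history embedding could be fed to an NLU model as an extra input. It is a minimal sketch assuming PyTorch; the names (HistoryAwareNLU, history_embedding) are purely illustrative and do not refer to any existing LIMSI codebase, and the simple averaging used to build the history vector is a stand-in for the richer approaches of [TRC+20, PRMU18].

```python
# Minimal sketch (PyTorch assumed): an NLU model that receives a dialogue
# history embedding as an additional input. Illustrative names only.
import torch
import torch.nn as nn

class HistoryAwareNLU(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_intents, n_slot_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # The history vector is concatenated to the encoded current utterance
        # for both intent classification and slot tagging.
        self.intent_head = nn.Linear(2 * hidden_dim + emb_dim, n_intents)
        self.slot_head = nn.Linear(2 * hidden_dim + emb_dim, n_slot_tags)

    def forward(self, utterance_ids, history_vec):
        # utterance_ids: (batch, seq_len); history_vec: (batch, emb_dim)
        emb = self.embed(utterance_ids)
        states, _ = self.encoder(emb)          # (batch, seq_len, 2*hidden_dim)
        utt_repr = states.mean(dim=1)          # pooled utterance representation
        intent_logits = self.intent_head(
            torch.cat([utt_repr, history_vec], dim=-1))
        # Broadcast the history vector over the tokens for slot tagging.
        hist_tok = history_vec.unsqueeze(1).expand(-1, states.size(1), -1)
        slot_logits = self.slot_head(torch.cat([states, hist_tok], dim=-1))
        return intent_logits, slot_logits

def history_embedding(history_utterance_ids, embed):
    """Toy history representation: average the word embeddings of past turns.
    Returns a (emb_dim,) vector; unsqueeze(0) before passing it to forward()."""
    turn_vecs = [embed(ids).mean(dim=0) for ids in history_utterance_ids]
    return torch.stack(turn_vecs).mean(dim=0)
```

In this setup the history vector plays the same role as the additional dialogue history embedding described above: the NLU heads see both the current utterance and a compact summary of the previous turns.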
History modeling is essential for conversational question answering (ConvQA), since previous turns play a key role in understanding the user's current information need. Some existing methods simply prepend the history turns to the current question [CCO+20, RCM19, BHJM19] or mark previous answers in the passage [CHI+18]; these methods cannot handle a long conversation history. Another method [HCY18] uses complex attention mechanisms to model the history and therefore incurs a relatively large system overhead. [QYQ+19] propose a history answer embedding method to model the conversation history; the proposed method is specifically tailored to BERT-based architectures.
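A minimal sketch of the simplest of these strategies, history-turn prepending, is given below. It assumes the Hugging Face transformers library; the checkpoint name and the cooking-domain example are illustrative only, and in practice a reader fine-tuned on a ConvQA dataset such as QuAC or CoQA would be used instead of a raw BERT model.

```python
# Sketch of the "prepend history turns" strategy for ConvQA, assuming the
# Hugging Face transformers library. Illustrative only: "bert-base-uncased"
# has no trained QA head, so a fine-tuned ConvQA checkpoint is needed for
# meaningful answers.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
model.eval()

def answer_with_history(history, question, passage, max_history_turns=2):
    """Prepend the last k history turns to the current question so that the
    reader sees the conversational context (cf. [CCO+20, RCM19, BHJM19])."""
    context_turns = history[-max_history_turns:]
    augmented_question = " ".join(context_turns + [question])
    inputs = tokenizer(augmented_question, passage,
                       return_tensors="pt", truncation=True, max_length=384)
    with torch.no_grad():
        outputs = model(**inputs)
    start = int(outputs.start_logits.argmax())
    end = int(outputs.end_logits.argmax())
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1])

# Illustrative cooking-domain exchange:
history = ["What can I use instead of eggs in pancakes?",
           "Mashed banana or applesauce are common substitutes."]
print(answer_with_history(history, "How much banana per egg?",
                          "Use half a mashed banana to replace one egg."))
```

The truncation to max_length is exactly where this strategy breaks down on long conversations, which motivates the alternatives cited above.

Expected profile
- Master 2 student in computer science, specialized in at least one of the following topics:
  - Machine learning
  - Natural language processing
- Technical skills: Python, Linux

Practical information
- Duration of the internship: 5-6 months
- Beginning of the internship: start date to be defined with the intern
- Gratification: around 591.91 €/month, plus reimbursement of transport costs and a canteen subsidy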
References

[BHJM19] Basma El Amel Boussaha, Nicolas Hernandez, Christine Jacquin, and Emmanuel Morin. Multi-level context response matching in retrieval-based dialog systems. In Proceedings of the 7th Edition of the Dialog System Technology Challenges Workshop at AAAI (DSTC7), Honolulu, HI, USA, 2019.
[BTHTH17] Ankur Bapna, Gokhan Tür, Dilek Hakkani-Tür, and Larry Heck. Sequential dialogue context modeling for spoken language understanding. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 103-114, Saarbrücken, Germany, August 2017. Association for Computational Linguistics.
[BZZZ19] He Bai, Yu Zhou, Jiajun Zhang, and Chengqing Zong. Memory consolidation for contextual spoken language understanding with dialogue logistic inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5448-5453, 2019.
[CCO+20] Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, and Eneko Agirre. Improving conversational question answering systems after deployment using feedback-weighted learning. arXiv preprint arXiv:2011.00615, 2020.
[CHI+18] Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. QuAC: Question answering in context. arXiv preprint arXiv:1808.07036, 2018.
[DRO+19] Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, and Mark Cieliebak. Survey on evaluation methods for dialogue systems. arXiv preprint arXiv:1905.04071, 2019.
[HCY18] Hsin-Yuan Huang, Eunsol Choi, and Wen-tau Yih. FlowQA: Grasping flow in history for conversational machine comprehension. arXiv preprint arXiv:1810.06683, 2018.
[MCCD13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[Pat17] Barun Patra. A survey of community question answering. CoRR, abs/1705.04009, 2017.
[PRMU18] Louisa Pragst, Niklas Rach, Wolfgang Minker, and Stefan Ultes. On the vector representation of utterances in dialogue context. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[QYQ+19] Chen Qu, Liu Yang, Minghui Qiu, W. Bruce Croft, Yongfeng Zhang, and Mohit Iyyer. BERT with history answer embedding for conversational question answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1133-1136, 2019.
[RCM19] Siva Reddy, Danqi Chen, and Christopher D. Manning. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249-266, 2019.
[TRC+20] Natalia Tomashenko, Christian Raymond, Antoine Caubrière, Renato De Mori, and Yannick Estève. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8509-8513. IEEE, 2020.
[WCB14] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916, 2014.