Title: Context-aware multilingual semantic representations of dialog turns for SLU tasks

Contact: sahar.ghannay@lisn.upsaclay.fr, sophie.rosset@lisn.upsaclay.fr

Description

In the field of spoken language understanding (SLU) for dialogue systems, contextual representation remains an open problem despite extensive prior work [Tomashenko et al., 2020]. The main objective of this study is to build a context-aware representation of dialog turns, enriched with multilingual multimodal semantic information.

A recent study [Laperrière et al., 2023] investigates a specific in-domain semantic enrichment of the SSL (self-supervised learning) SAMU-XLSR model by specializing it on a small amount of transcribed data from a challenging SLU task, in order to improve semantic information extraction on this downstream task. Building on this, we propose to enrich the SAMU-XLSR [Khurana et al., 2022] model with contextual information from dialog turns, in addition to the previously acquired multilingual multimodal semantic information. We are also interested in extracting semantic information from speech signals using end-to-end approaches.

The performance of the Contextual-SAMU-XLSR model will be evaluated on SLU tasks in different languages and domains. The experiments will be performed on two challenging SLU datasets:
I) A new version of the French MEDIA corpus [Bonneau-Maynard et al., 2005], enriched with intent information in addition to the slots.
II) The TARIC corpus [Masmoudi et al., ] in Tunisian dialect, enriched with semantic annotations (slots and dialog acts).
Both corpora will be publicly available soon. In addition, we propose to use the DailyDialog corpus [Li et al., 2017] to enrich the SAMU-XLSR model with contextual information.

The objectives of the internship are:
- Extend the recent work of [Laperrière et al., 2023] to develop an end-to-end SLU system for joint slot and intent detection on the new version of the MEDIA task.
- Enrich the SAMU-XLSR model with contextual information from dialog turns.
- Evaluate the performance of the Contextual-SAMU-XLSR representation on both corpora, and investigate how cross-lingual and cross-domain portability from distant languages can make the semantically enriched representation more accurate.

The SLU models will be implemented using SpeechBrain [Ravanelli et al., 2021], an open-source toolkit dedicated to neural speech processing.

Expected profile

- Master 2 (M2) student in Computer Science, specialized in at least one of the following topics:
  - Machine learning
  - Natural language processing
- Technical skills: Python, Linux

Practical information

- Duration of the internship: 5-6 months
- Start date: to be defined with the intern, preferably January or February
- Stipend: around 660 €/month, plus reimbursement of transport costs and a canteen subsidy
- Location: LISN

References

[Bonneau-Maynard et al., 2005] Bonneau-Maynard, H., Rosset, S., Ayache, C., Kuhn, A., and Mostefa, D. (2005). Semantic annotation of the French MEDIA dialog corpus. In Interspeech.
[Khurana et al., 2022] Khurana, S., Laurent, A., and Glass, J. (2022). SAMU-XLSR: Semantically-aligned multimodal utterance-level cross-lingual speech representation. IEEE Journal of Selected Topics in Signal Processing, 16(6):1493-1504.
[Laperrière et al., 2023] Laperrière, G., Nguyen, H., Ghannay, S., Jabaian, B., and Estève, Y. (2023). Specialized semantic enrichment of speech representations. In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pages 1-5.
[Li et al., 2017] Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S. (2017). DailyDialog: A manually labelled multi-turn dialogue dataset. ArXiv, abs/1710.03957.
[Masmoudi et al., ] Masmoudi, A., Estève, Y., Belguith, L. H., and Habash, N. A corpus and phonetic dictionary for Tunisian Arabic speech recognition.
[Ravanelli et al., 2021] Ravanelli, M., Parcollet, T., Plantinga, P., Rouhe, A., Cornell, S., Lugosch, L., Subakan, C., Dawalatabad, N., Heba, A., Zhong, J., Chou, J.-C., Yeh, S.-L., Fu, S.-W., Liao, C.-F., Rastorgueva, E., Grondin, F., Aris, W., Na, H., Gao, Y., Mori, R. D., and Bengio, Y. (2021). SpeechBrain: A general-purpose speech toolkit. arXiv:2106.04624.
[Tomashenko et al., 2020] Tomashenko, N., Raymond, C., Caubrière, A., Mori, R. D., and Estève, Y. (2020). Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8509-8513.