*About Vivoka*

Founded in 2015 and awarded two CES Innovation Awards, Vivoka (https://vivoka.com/en/) created and sells the Voice Development Kit (VDK), the very first solution that allows a company to design a voice interface in a simple, autonomous, and quick way. Moreover, this interface is embedded: it can be deployed on devices without an Internet connection and fully preserves privacy. Spurred by the COVID-19 health crisis and the need for "no-touch" interfaces, Vivoka is now strengthening this technology by developing its own speech and language processing solutions, able to compete with the most efficient current technologies.

The internship will be carried out within Vivoka's R&D team. Interns will benefit from Vivoka's startup spirit and will interact with the researchers and Ph.D. students of the R&D team, as well as with the engineers responsible for integrating their results into the VDK.

Internship Requirements:
- M2 in Computer Science with a specialization in Machine Learning (ML) or Natural Language Processing (NLP)
- Prior knowledge of and/or experience with ML/NLP
- Experience with Python programming and frameworks such as PyTorch

2. Data Augmentation for Low-Resource Slot Filling and Intent Classification

Context:
Neural models achieve outstanding performance on slot filling and intent classification when fairly large in-domain training data is available. However, new domains are added frequently, and creating sizable data for each of them is expensive. Some approaches [1, 2] propose augmentation methods based on word-span and sentence-level operations that alleviate this data scarcity. We target more advanced, state-of-the-art augmentation approaches that allow models to reach competitive performance on small (English and French) datasets. We will also investigate the use of pretrained large language models such as [3] for data augmentation, and how it affects slot filling and intent classification performance in those languages.

Objectives and Expected Outcomes:
- Run experiments on low- and high-resource data
- Implement different approaches to augment data for slot filling and intent classification
- Evaluate the quality of the generated data
- Evaluate the effect of data augmentation on slot filling and intent classification
- Integrate the tool into our NLU system
- Develop a Python module for data augmentation dedicated to the task
- Evaluate the module on several real use cases
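To make the word-level augmentation operations of [1] more concrete, the sketch below illustrates two of them (random swap and random deletion) adapted to slot-annotated utterances so that token/label alignment is preserved. The function names and the toy example are purely illustrative assumptions, not part of Vivoka's VDK or NLU system.

```python
# Minimal sketch of two EDA-style operations [1] adapted to BIO-tagged
# utterances for slot filling; purely illustrative, not Vivoka's API.
import random

def random_swap(tokens, labels, n_swaps=1, rng=random):
    """Swap n random (token, label) pairs, keeping alignment."""
    tokens, labels = list(tokens), list(labels)
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
        labels[i], labels[j] = labels[j], labels[i]
    return tokens, labels

def random_delete(tokens, labels, p=0.1, rng=random):
    """Drop non-slot tokens (label 'O') with probability p."""
    kept = [(t, l) for t, l in zip(tokens, labels)
            if l != "O" or rng.random() > p]
    if not kept:                      # never return an empty utterance
        return list(tokens), list(labels)
    new_tokens, new_labels = zip(*kept)
    return list(new_tokens), list(new_labels)

if __name__ == "__main__":
    # Toy slot-filling example (hypothetical intent: set_alarm)
    toks = ["wake", "me", "up", "at", "seven", "tomorrow"]
    tags = ["O", "O", "O", "O", "B-time", "B-date"]
    print(random_swap(toks, tags, n_swaps=1))
    print(random_delete(toks, tags, p=0.3))
```

In practice, the intern would go beyond such simple operations, for example with LLM-based generation [3], and measure the effect on downstream slot filling and intent classification.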
References:
1. Jason W. Wei and Kai Zou. 2019. "EDA: Easy data augmentation techniques for boosting performance on text classification tasks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pages 6381-6387. Association for Computational Linguistics. https://aclanthology.org/D19-1670/
2. Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. "Data augmentation for low-resource neural machine translation." In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 567-573, Vancouver, Canada. Association for Computational Linguistics. https://aclanthology.org/P17-2090/
3. Partha Pratim Ray. 2023. "ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope." Internet of Things and Cyber-Physical Systems. https://www.sciencedirect.com/science/article/pii/S266734522300024X

Please submit your application to tulika.bose@vivoka.com or firas.hmida@vivoka.com. Please feel free to share this call for applications with any interested students.