*About Vivoka*

Founded in 2015 and awarded two CES Innovation Awards, Vivoka (https://vivoka.com/en/) created and sells the Voice Development Kit (VDK), the very first solution that allows a company to design a voice interface in a simple, autonomous, and quick way. Moreover, this interface is embedded: it can be deployed on devices without an Internet connection and fully preserves privacy. Spurred by the COVID-19 health crisis and the need for "no-touch" interfaces, Vivoka is now strengthening this technology by developing its own speech and language processing solutions, able to compete with the most efficient current technologies.

The internship will be carried out within Vivoka's R&D team. Interns will benefit from Vivoka's startup spirit and will interact with the researchers and Ph.D. students of the R&D team, as well as with the engineers responsible for integrating their results into the VDK.

Internship Requirements:
- M2 in Computer Science with a specialization in Machine Learning (ML) or Natural Language Processing (NLP)
- Prior knowledge of and/or experience with ML/NLP
- Experience with Python programming and frameworks such as PyTorch

2. Data Augmentation for Low-Resource Slot Filling and Intent Classification

Context:
Neural models achieve outstanding performance on slot filling and intent classification when fairly large in-domain training data is available. However, new domains are added frequently, and creating sizable data for each of them is expensive. Some approaches [1, 2] propose augmentation methods based on word-span and sentence-level operations that alleviate this data scarcity. We target more advanced, state-of-the-art augmentation approaches that allow models to reach competitive performance on small (English and French) datasets. We will also investigate the use of pretrained large language models such as [3] for data augmentation, and how it affects slot filling and intent classification performance in those languages.

Objectives and Expected Outcomes:
- Run experiments on low- and high-resource data
- Implement different approaches to augment data for slot filling and intent classification
- Evaluate the quality of the generated data
- Evaluate the effect of data augmentation on slot filling and intent classification
- Integrate the tool into our NLU system
- Develop a Python module for data augmentation dedicated to the task
- Evaluate the module on several real use cases
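To make the word-level augmentation operations of [1] more concrete, the sketch below illustrates two of them (random swap and random deletion) adapted to slot-annotated utterances so that token/label alignment is preserved. The function names and the toy example are purely illustrative assumptions, not part of Vivoka's VDK or NLU system.

```python
# Minimal sketch of two EDA-style operations [1] adapted to BIO-tagged
# utterances for slot filling; purely illustrative, not Vivoka's API.
import random

def random_swap(tokens, labels, n_swaps=1, rng=random):
    """Swap n random (token, label) pairs, keeping alignment."""
    tokens, labels = list(tokens), list(labels)
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
        labels[i], labels[j] = labels[j], labels[i]
    return tokens, labels

def random_delete(tokens, labels, p=0.1, rng=random):
    """Drop non-slot tokens (label 'O') with probability p."""
    kept = [(t, l) for t, l in zip(tokens, labels)
            if l != "O" or rng.random() > p]
    if not kept:                      # never return an empty utterance
        return list(tokens), list(labels)
    new_tokens, new_labels = zip(*kept)
    return list(new_tokens), list(new_labels)

if __name__ == "__main__":
    # Toy slot-filling example (hypothetical intent: set_alarm)
    toks = ["wake", "me", "up", "at", "seven", "tomorrow"]
    tags = ["O", "O", "O", "O", "B-time", "B-date"]
    print(random_swap(toks, tags, n_swaps=1))
    print(random_delete(toks, tags, p=0.3))
```

In practice, the intern would go beyond such simple operations, for example with LLM-based generation [3], and measure the effect on downstream slot filling and intent classification.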
References:
1. Jason W. Wei and Kai Zou. 2019. "EDA: Easy data augmentation techniques for boosting performance on text classification tasks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pages 6381-6387. Association for Computational Linguistics. https://aclanthology.org/D19-1670/
2. Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. "Data augmentation for low-resource neural machine translation." In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 567-573, Vancouver, Canada. Association for Computational Linguistics. https://aclanthology.org/P17-2090/
3. Partha Pratim Ray. 2023. "ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope." Internet of Things and Cyber-Physical Systems. https://www.sciencedirect.com/science/article/pii/S266734522300024X

Please submit your application to tulika.bose@vivoka.com or firas.hmida@vivoka.com. Please feel free to share this call for applications with any interested students.