Cross-language continual learning for conversational systems

Sahar Ghannay, Laure Soulier, Christophe Servan, Sophie Rosset
LISN Laboratory - Saclay

Keywords: cross-lingual transfer, continual learning, natural language understanding, information extraction, slot-filling, conversational systems

Subject

Understanding natural language in the context of conversational systems is a critical step to ensure their effectiveness. Much work in the natural language processing community is devoted to natural language understanding (NLU), tackling tasks such as information extraction, paraphrase, summarization, or slot-filling. Most of this work has been made possible by large language models, which have demonstrated strong capabilities across many tasks. However, one drawback of these solutions is that they often target a single language or a small set of languages. If we want to deploy virtual assistants worldwide, it is therefore important to design models able to handle a large number of languages.

In this internship, we assume that virtual assistants are deployed step by step across different countries and will therefore face different languages at different times. This implies that, when designing and training a model for a given task, languages can be added incrementally to the training procedure. This setting relates to two main research fields:

- Cross-lingual transfer [Coria et al., 2022], which aims at exploiting knowledge of languages seen during pre-training to train the model on another language. In such a setting, the knowledge from previous languages serves as the initialization of the language model for a new language, reducing training time.

- Continual learning [Kirkpatrick et al., 2016, Ke et al., 2020], which aims at designing models trained on a stream of tasks that learn from new tasks without forgetting what was learned on previous ones. Some work along these lines exists in NLP [Lee, 2017, Garcia et al., 2021].

In our case, we propose a continual learning setting in which the task is fixed but the stream is made of different languages; the model thus accumulates knowledge about language peculiarities. To satisfy the initial requirement that virtual assistants address many languages, we need to ensure that our task-based model does not forget previous languages while training on new ones. Two preliminary works have been carried out: 1) [Coria et al., 2022], investigating BERT's cross-lingual transfer capabilities in two continual sequence labeling tasks, and 2) [Gerald and Soulier, 2022], designing continual learning streams for information retrieval.

In practice, we will focus on the Massively Multilingual NLU 2022 data [FitzGerald et al., 2022], which includes slot-filling and NER tasks for 51 languages in parallel. The objective of the internship will be to 1) build a stream of languages for a given task, 2) run baseline models on the stream, and 3) design a continual learning model for cross-lingual transfer. A first sketch of steps 1 and 2 is given below.
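To make objectives 1 and 2 concrete, here is a minimal Python sketch of how a language stream could be organized over the MASSIVE data and evaluated with a naive sequential fine-tuning baseline. It is only an illustration of the continual protocol, not a prescribed implementation: the dataset identifier, the locale codes, and the helper functions `train_fn` and `eval_fn` (a slot-filling fine-tuning loop and a slot-F1 scorer) are assumptions that the intern would replace with their own code.

```python
# Hypothetical sketch of objectives 1 and 2: a language stream over MASSIVE
# plus a naive sequential fine-tuning baseline. The dataset identifier
# "AmazonScience/massive" and its per-locale configurations are assumptions
# about the public release of [FitzGerald et al., 2022]; `train_fn` and
# `eval_fn` are placeholders to be implemented during the internship.
from datasets import load_dataset

def run_language_stream(model, stream, train_fn, eval_fn):
    """Train sequentially on a stream of locales and track per-language scores."""
    scores = {}
    for step, locale in enumerate(stream):
        train = load_dataset("AmazonScience/massive", locale, split="train")
        model = train_fn(model, train)  # fine-tune on the newly arrived language
        # Evaluate on every language seen so far: the drop on earlier
        # languages after later steps measures catastrophic forgetting.
        for seen in stream[: step + 1]:
            test = load_dataset("AmazonScience/massive", seen, split="test")
            scores[(step, seen)] = eval_fn(model, test)
    return scores

# Example stream (arbitrary order; designing and motivating such orders is
# objective 1 of the internship).
stream = ["en-US", "fr-FR", "de-DE", "th-TH", "sw-KE"]
```

The gap between the best score ever reached on a language and its score at the end of the stream is a standard measure of forgetting; reducing it without hurting new languages is what the continual learning model of objective 3 should achieve.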
Information

Supervisors: Sahar Ghannay, Laure Soulier, Christophe Servan, Sophie Rosset
Contact: sahar.ghannay@lisn.fr, laure.soulier@isir.upmc.fr, christophe.servan@lisn.fr, sophie.rosset@lisn.fr
Location: Université Paris Saclay (Laboratoire LISN), France
Duration: 6 months, between February and August 2023.
Stipend: around 591.91 euros/month
Expected profile: Master's or engineering degree in Computer Science or Applied Mathematics related to machine learning/natural language processing. The candidate should have a strong scientific background, good programming skills, and be fluent in reading and writing English. Autonomy and curiosity are also valuable soft skills for this internship.

How to apply?

Send a CV, a motivation letter and Master's transcripts to sahar.ghannay@lisn.fr, laure.soulier@isir.upmc.fr, christophe.servan@lisn.fr, sophie.rosset@lisn.fr. Recommendation letters would be appreciated. Interviews will be conducted as applications arrive and the position will be filled as soon as possible; the latest application date is 15th January.

References

[Coria et al., 2022] Coria, J. M., Veron, M., Ghannay, S., Bernard, G., Bredin, H., Galibert, O., and Rosset, S. (2022). Analyzing BERT cross-lingual transfer capabilities in continual sequence labeling. In Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models, pages 15-25, Virtual. International Conference on Computational Linguistics.

[FitzGerald et al., 2022] FitzGerald, J., Hench, C., Peris, C., Mackie, S., Rottmann, K., Sanchez, A., Nash, A., Urbach, L., Kakarala, V., Singh, R., Ranganath, S., Crist, L., Britan, M., Leeuwis, W., Tur, G., and Natarajan, P. (2022). MASSIVE: A 1M-example multilingual natural language understanding dataset with 51 typologically-diverse languages.

[Garcia et al., 2021] Garcia, X., Constant, N., Parikh, A. P., and Firat, O. (2021). Towards continual learning for multilingual machine translation via vocabulary substitution. In NAACL-HLT, pages 1184-1192.

[Gerald and Soulier, 2022] Gerald, T. and Soulier, L. (2022). Continual learning of long topic sequences in neural information retrieval. In Hagen, M., Verberne, S., Macdonald, C., Seifert, C., Balog, K., Nørvåg, K., and Setty, V., editors, Advances in Information Retrieval - 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022, Proceedings, Part I, volume 13185 of Lecture Notes in Computer Science, pages 244-259. Springer.

[Ke et al., 2020] Ke, Z., Liu, B., and Huang, X. (2020). Continual learning of a mixed sequence of similar and dissimilar tasks. In NeurIPS.

[Kirkpatrick et al., 2016] Kirkpatrick, J., Pascanu, R., Rabinowitz, N. C., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., and Hadsell, R. (2016). Overcoming catastrophic forgetting in neural networks. CoRR, abs/1612.00796.

[Lee, 2017] Lee, S. (2017). Toward continual learning for conversational agents. CoRR, abs/1712.09943.