Text to Pictograms - 6 Months Internship

Natural Language Processing
Master 1/2 or engineering school in final year

Context

Sony CSL Paris is a pure research laboratory embedded in the rich Sony galaxy. The extreme flexibility and transdisciplinarity of the approach adopted in the four labs in Tokyo, Kyoto, Paris and Rome make it possible to engage with diverse research fields, combining science, technological innovation, art and the public good in a single conceptual space. The specific themes range from music to language, from the future of cities to sustainable agriculture, from creative processes to humanity's great challenges. Our mission is to conduct fundamental research that shows the promise of changing the world for the better. Sony CSL Paris is composed of a diverse range of researchers working on topics ranging from music, creativity and sustainability to language.

Project

In one of the projects of the language team, we are investigating whether and how NLP tools can contribute to improving online speech therapy for children with language and reading comprehension disorders. As a proof of concept, we first implemented an artificial intelligence-powered tool that helps children with reading comprehension difficulties make sense of the information contained in a written text.

Objectives of the Internship

Pictograms are standardized image-based representations of words or concepts that help people with limited speech or reading ability communicate or comprehend written materials. According to estimates, 2 to 5 million people in the European Union could benefit from using symbols, or text that contains symbols, to communicate in writing (Keskinen et al. 2012). The goal of this internship is to implement a model that generates pictographs from Italian and English texts. More specifically, during your internship you will first carry out a literature review of the existing text-to-pictograph datasets and computational models for English and Italian.
Second, you will train and evaluate a model for automatic text-to-pictogram generation. Finally, you will add a component to your model that takes the surrounding context into account when generating the pictograms and that performs a safety check on the generated pictograms. Depending on the timing, the results of the underlying model could also be validated through a field experiment at a partner speech and language therapy center.

Roadmap:
1. Literature review
2. Dataset selection and preprocessing
3. Model implementation and training
4. Implementation of the surrounding context component
5. Evaluation
6. Optional field experiment

Required hard and soft skills
* Student in Master 1 or 2 or equivalent engineering school
* Background in Natural Language Processing
* Good writing and oral skills in English and/or French
* Strong programming experience (Python)
* Experience with machine learning and deep learning, including practical experience with frameworks such as PyTorch and scikit-learn
* Well-organized, determined, quick learner

Ideally also:
* Interest or background in experimental psychology
* Interest in the field of AI for Healthcare or AI for Education

Location: Paris, France
Duration: 6 months (full time), starting in September/October 2023

Stipend & benefits:
* 1100 € (M1) or 1200 € (M2) gross per month
* Meal vouchers and 50% of the transport pass
* Possibility of two days of remote work per week

How to apply: Send a CV and a cover letter by e-mail to martina.galletti@sony.com before 31/07/23

Relevant Literature:
1. Vandeghinste, V., Sevens, I. S. L., & Van Eynde, F. (2017). Translating text into pictographs. Natural Language Engineering, 23(2), 217-244.
2. Norré, M., Vandeghinste, V., Bouillon, P., & François, T. (2021, September). Extending a text-to-pictograph system to French and to Arasaac.
In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021) (pp. 1050-1059).
3. Sevens, L., Jacobs, G., Vandeghinste, V., Schuurman, I., & Van Eynde, F. (2016, August). Improving text-to-pictograph translation through word sense disambiguation. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics (pp. 131-135).
4. Sevens, L., Vandeghinste, V., Schuurman, I., & Van Eynde, F. (2017). Simplified text-to-pictograph translation for people with intellectual disabilities. In Natural Language Processing and Information Systems: 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Liège, Belgium, June 21-23, 2017, Proceedings 22 (pp. 185-196). Springer International Publishing.
5. Zeng-Treitler, Q., Kim, H., & Hunter, M. (2008). Improving patient comprehension and recall of discharge instructions by supplementing free texts with pictographs. In AMIA Annual Symposium Proceedings (Vol. 2008, p. 849). American Medical Informatics Association.
6. Choi, J. (2012). Development and pilot test of pictograph-enhanced breast health-care instructions for community-residing immigrant women. International Journal of Nursing Practice, 18(4), 373-378.
7. Sevens, L. (2018). Words divide, pictographs unite: Pictograph communication technologies for people with an intellectual disability.