Please find below a proposal for an M2 internship funded by the TIRIS project Text-Ecop.

Master's thesis offer: Polarization in Natural Language
Student profile: Master 2 student in Machine Learning, Data Science, or Natural Language Processing
Keywords: Natural Language Processing, Machine Learning
Supervision and contacts:
Stergos Afantenos, stergos.afantenos@irit.fr
Karine Van Der Straeten, Karine.Van-Der-Straeten@TSE-fr.eu
Sébastien Gadat, Sebastien.Gadat@TSE-fr.eu
Place of work: Institut de Recherche en Informatique de Toulouse (IRIT) & Toulouse School of Economics (TSE)
Length: 5 to 6 months, starting in February/March 2025
Remuneration: standard internship grant, 550 euros/month

Context: The proposed master's thesis will be carried out within the TIRIS project "Text-mining et nouvelles stratégies de mesure en économie politique" (Text-Ecop), a multidisciplinary research project between Toulouse School of Economics and Université Paul Sabatier. The project deals with the use of applied mathematics, computer science and A.I. for textual data analysis, with applications in various fields; political economy is the specific focus of Text-Ecop.

Objectives and Contents of the Internship: Polarization is a societal phenomenon that can take many forms. It usually manifests itself in the different topics that opposing parties choose to discuss or focus on. However, these parties do not operate in isolation from society, so they often need to discuss the same topics. In such cases, polarization manifests itself through the linguistic choices they make to express the same concept. Consider, for example, the following two phrases (obtained from ChatGPT on September 9, 2024):
A: The city invested in much needed green spaces to promote wellbeing and community activities.
B: The city wasted millions on useless patches of grass that no-one even wants.
In this example, we can clearly see the framing bias of each party, as manifested by the lexical choices made to express the same concept. Furthermore, this framing bias lies on a spectrum of polarization, from neutral to more polarized. More explicitly, this example contains two instances of polarized lexical choices. The first concerns the action denoted by the main verb of each sentence ("invested" vs. "wasted"), while the second concerns the description of the object ("much needed green spaces" vs. "useless patches of grass"). Note that a third party might have made another, more or less neutral, lexical choice, such as "spent". The same holds for "green spaces" versus "useless patches of grass". In a certain sense, an analogy is formed between these lexical choices: they all refer to the same concept, but each expresses it in a way that reflects polarization.

The goal of this master's thesis is to develop Machine Learning models, based on neural networks or other approaches, that will enable us to identify the existence of framing bias expressed via particular lexical choices that show a degree of polarization. The Machine Learning models will be built using the NEUS corpus described in Lee et al. (2022). To measure the degree of polarization, various approaches based on the distance (Euclidean or other) between the aforementioned lexical choices in a latent embedding space can be examined, in order to provide a numerical value reflecting the degree of polarization.

The successful candidate is expected to have a background in Machine Learning or Computational Linguistics. Depending on the results obtained, this master's thesis may be extended to a PhD thesis funded by the same project.

References
Nayeon Lee, Yejin Bang, Tiezheng Yu, Andrea Madotto, and Pascale Fung. NeuS: Neutral multi-news summarization for mitigating framing bias. 2022.
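The distance-based measure of polarization mentioned in the objectives can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the project's method: the three-dimensional vectors are made-up placeholders for embeddings that a trained model would produce, the names `EMBEDDINGS` and `polarization_score` are hypothetical, and using "spent" as the neutral reference point is an assumption borrowed from the example above. The score is simply the distance of a lexical choice from that neutral reference, computed with either Euclidean or cosine distance.

```python
import math

# Hypothetical toy embeddings: in practice these would come from a trained
# encoder; these 3-d values are made up purely for illustration.
EMBEDDINGS = {
    "invested": (0.9, 0.2, 0.1),
    "spent": (0.5, 0.5, 0.1),   # assumed neutral reference choice
    "wasted": (0.1, 0.9, 0.2),
}


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; close to 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def polarization_score(word, neutral, embeddings, metric=euclidean_distance):
    """Hypothetical score: distance of a lexical choice from a neutral reference."""
    return metric(embeddings[word], embeddings[neutral])


if __name__ == "__main__":
    for word in ("invested", "spent", "wasted"):
        e = polarization_score(word, "spent", EMBEDDINGS)
        c = polarization_score(word, "spent", EMBEDDINGS, cosine_distance)
        print(f"{word:>9}: euclidean={e:.3f} cosine={c:.3f}")
```

With these toy values, "invested" lies at a Euclidean distance of 0.5 from "spent" and "wasted" at about 0.574, while "spent" scores 0. On real data, the scores depend entirely on the embedding model and on how the neutral reference is chosen.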