Internship for Final-Year Engineering or Master 2 Students

Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody, Expressive Speech

Context

This internship is part of the project Ontology and Tools for the Annotation of Political Speech (OOPAIP), a transdisciplinary project funded under the DIM-STCN (Text Sciences and New Knowledge, http://www.dim-humanites-numeriques.fr/en/) by the Regional Council of Île-de-France. The project is carried out by the European Center for Sociology and Political Science (CESSP, https://www.cessp.cnrs.fr/) of the University of Paris 1 Panthéon-Sorbonne, the National Audiovisual Institute (INA, https://www.ina.fr/), and the LISN (https://www.limsi.fr/en/). Its objective is to design new approaches for detailed qualitative and quantitative analyses of political speech in the French media. Part of the project concerns the dynamics of conflictual interactions in political interviews and debates. This requires both a detailed description and a large corpus so that the models can generalize. Some of the main challenges concern speaker and speaking-style segmentation: improving segmentation accuracy, detecting overlapping speech, and measuring vocal effort and other expressive elements.

Objectives

The main objective of the internship is to improve the automatic segmentation of political interviews. In this context, we are particularly interested in detecting "hubbub" (strong and prolonged overlapping speech). More precisely, we would like to extract features from the speech signal (Eyben, 2015) that correlate with the level of conflict in the exchanges. These could be based, for example, on the arousal level in the speaker's voice (an intermediate level between speech signal analysis and the description of expressivity; Rilliard, 2018) or on vocal effort (Liénard, 2019).
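To illustrate the "prolonged overlapping speech" part of the hubbub definition, the following is a minimal sketch in plain Python, not the project's method. It finds the intervals where at least two speech turns are active at once and keeps those lasting longer than a threshold. Several choices are assumptions:

* The turn format is a hypothetical (start, end, speaker) tuple in seconds.
* The helper names `overlap_regions` and `hubbub_candidates` are illustrative.
* The 2-second threshold is purely illustrative.
* The "strong" aspect (intensity or vocal effort) is not modeled at all.

```python
from typing import List, Optional, Tuple

# Hypothetical turn format: (start in seconds, end in seconds, speaker label).
# Assumption: a given speaker's own turns never overlap each other.
Turn = Tuple[float, float, str]
Interval = Tuple[float, float]


def overlap_regions(turns: List[Turn]) -> List[Interval]:
    """Return the time intervals where at least two turns are active at once."""
    events = []
    for start, end, _speaker in turns:
        events.append((start, +1))  # a turn begins
        events.append((end, -1))    # a turn ends
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back turns are not reported as overlap.
    events.sort()

    regions: List[Interval] = []
    active = 0
    overlap_start: Optional[float] = None
    for t, delta in events:
        active += delta
        if active >= 2 and overlap_start is None:
            # Re-open the previous region if it ended at this exact instant,
            # so one continuous overlap is not split in two.
            if regions and regions[-1][1] == t:
                overlap_start = regions.pop()[0]
            else:
                overlap_start = t
        elif active < 2 and overlap_start is not None:
            if t > overlap_start:
                regions.append((overlap_start, t))
            overlap_start = None
    return regions


def hubbub_candidates(turns: List[Turn], min_duration: float = 2.0) -> List[Interval]:
    """Keep overlap regions lasting at least min_duration seconds.

    The 2-second default is a hypothetical threshold; only the
    "prolonged" part of the hubbub definition is checked here.
    """
    return [(s, e) for s, e in overlap_regions(turns) if e - s >= min_duration]


if __name__ == "__main__":
    turns = [
        (0.0, 5.0, "journalist"),
        (3.0, 9.0, "politician"),
        (4.0, 7.5, "guest"),
        (10.0, 12.0, "journalist"),
        (11.5, 12.0, "politician"),
    ]
    print(overlap_regions(turns))    # [(3.0, 7.5), (11.5, 12.0)]
    print(hubbub_candidates(turns))  # [(3.0, 7.5)]
```

A sweep over sorted start/end events keeps the computation linear after sorting, whatever the number of speakers. In practice, the turns would come from manual annotations or from a diarization system rather than being written by hand.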
The internship will initially rely on two corpora of 30 political interviews, manually annotated in speech turns and speech acts within the framework of the OOPAIP project. It will begin with a state-of-the-art review of speaker diarization and overlapped speech detection (Chowdhury, 2019). The aim will then be to propose solutions based on recent frameworks (Bredin, 2020) to localize speech segments more precisely, particularly when speaker changes are frequent. In the second part of the internship, we will focus on a finer-grained measurement and prediction of the level of conflict in the exchanges. To this end, we will search for the most relevant features to describe it and adapt or develop a neural network architecture to model it.

The programming language used for this internship will be Python. The candidate will have access to the LISN computing resources (servers and clusters with recent-generation GPUs).

Publications

Depending on the maturity of the work carried out, we expect the applicant to:
* Distribute the tools produced under an open-source license
* Write a scientific publication

Conditions

The internship will last 4 to 6 months and will take place at the LISN (formerly LIMSI), in the Spoken Language Processing (TLP) group. The laboratory is located near the Plateau de Saclay: university campus, building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard (marc.evrard@lisn.upsaclay.fr). The internship allowance follows the official standards (https://www.service-public.fr/professionnels-entreprises/vosdroits/F32131).
Applicant profile

* Student in the final year of a 5-year degree in computer science (AI is a plus)
* Proficiency in Python and experience with ML libraries (Scikit-Learn, TensorFlow, PyTorch)
* Strong interest in digital humanities, and in political science in particular
* Experience in automatic speech processing is preferred
* Ability to conduct a literature review of scientific articles written in English

To apply: send an email to marc.evrard@lisn.upsaclay.fr including a résumé and a cover letter.

Bibliography

Bredin, H., et al. (2020). pyannote.audio: neural building blocks for speaker diarization. In ICASSP 2020 (pp. 7124-7128).
Chowdhury, S. A., Stepanov, E. A., Danieli, M., & Riccardi, G. (2019). Automatic classification of speech overlaps: Feature representation and algorithms. Computer Speech & Language, 55, 145-167.
Eyben, F., Scherer, K. R., et al. (2015). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing, 7(2), 190-202.
Liénard, J.-S. (2019). Quantifying vocal effort from the shape of the one-third octave long-term-average spectrum of speech. The Journal of the Acoustical Society of America, 146(4), October 2019.
OOPAIP (Ontologie et outil pour l'annotation des interventions politiques), DIM STCN (Sciences du Texte et connaissances nouvelles), Conseil régional d'Île-de-France. URL: http://www.dim-humanites-numeriques.fr/projets/oopaip-ontologie-et-outils-pour-lannotation-des-interventions-politiques/
Rilliard, A., d'Alessandro, C., & Evrard, M. (2018). Paradigmatic variation of vowels in expressive speech: Acoustic description and dimensional analysis. The Journal of the Acoustical Society of America, 143(1), 109-122.