Internship for Final-Year Engineering or Master 2 Students

Keywords: Machine Learning, Diarization, Digital Humanities, Political
Speech, Prosody, Expressive Speech


Context

This internship is part of the Ontology and Tools for the Annotation of
Political Speech (OOPAIP), a transdisciplinary project funded under the
DIM-STCN (Text Sciences and New Knowledge,
http://www.dim-humanites-numeriques.fr/en/) by the Regional Council of
Île-de-France. The project is carried out by the European Center for
Sociology and Political Science (CESSP, https://www.cessp.cnrs.fr/) of
the University of Paris 1 Panthéon-Sorbonne, the National
Audiovisual Institute (INA, https://www.ina.fr/), and the LISN
(https://www.limsi.fr/en/). Its objective is to design new approaches
to detailed qualitative and quantitative analyses of political speech
in the French media. Part of the project concerns the study of the
dynamics of conflictual interactions in political interviews and
debates, which requires a detailed description and a large corpus so
that the models can generalize. The main challenges concern the
performance of speaker and speech style segmentation, e.g., improving
segmentation accuracy, detecting overlapping speech, and measuring
vocal effort and other expressive elements.


Objectives

The main objective of the internship is to improve the automatic
segmentation of political interviews. In this context, we are
particularly interested in detecting "hubbub" (strong and prolonged
overlapping speech). More precisely, we would like to extract features
from the speech signal (Eyben, 2015) that correlate with the level of
conflict in the exchanges, based, for example, on the arousal level in
the speaker's voice (an intermediate level between speech signal
analysis and the description of expressivity; Rilliard, 2018) or on
vocal effort (Liénard, 2019).
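As a rough illustration of the "hubbub" notion, the Python sketch below
flags prolonged overlapping speech in a manual annotation given as
speaker turns. It only captures the timing side of the definition (at
least two speakers active at once for a minimum duration), not the
"strong" side. The turn format (start, end, speaker), the function
names, and the 2-second threshold are hypothetical choices made for
illustration; they are not part of the OOPAIP annotation scheme.

```python
from typing import List, Tuple

# Hypothetical turn format: (start in seconds, end in seconds, speaker label).
# Assumes a given speaker's own turns never overlap each other.
Turn = Tuple[float, float, str]


def overlap_regions(turns: List[Turn], min_speakers: int = 2) -> List[Tuple[float, float]]:
    """Return maximal (start, end) regions where at least `min_speakers` turns are active."""
    events = []
    for start, end, _speaker in turns:
        if end > start:  # ignore empty or malformed turns
            events.append((start, 1))
            events.append((end, -1))
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so turns that merely touch are not counted as overlap.
    events.sort()
    regions: List[Tuple[float, float]] = []
    active = 0
    region_start = 0.0
    for t, delta in events:
        before = active
        active += delta
        if before < min_speakers <= active:
            # Entering an overlap: merge with a region that closed at this same instant.
            if regions and regions[-1][1] == t:
                region_start = regions.pop()[0]
            else:
                region_start = t
        elif active < min_speakers <= before:
            # Leaving an overlap: record the finished region.
            regions.append((region_start, t))
    return regions


def hubbub_candidates(turns: List[Turn], min_duration: float = 2.0) -> List[Tuple[float, float]]:
    """Keep only the overlap regions lasting at least `min_duration` seconds."""
    return [(s, e) for s, e in overlap_regions(turns) if e - s >= min_duration]


if __name__ == "__main__":
    turns = [
        (0.0, 10.0, "journalist"),
        (8.0, 15.0, "politician"),
        (14.5, 20.0, "journalist"),
    ]
    print(overlap_regions(turns))    # → [(8.0, 10.0), (14.5, 15.0)]
    print(hubbub_candidates(turns))  # → [(8.0, 10.0)]
```

In practice, such a timing criterion could be combined with acoustic
features extracted from the signal to account for the "strong" part of
the definition.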

The internship will initially rely on two corpora of 30 political
interviews, manually annotated with speech turns and speech acts within
the framework of the OOPAIP project. It will begin with a review of the
state of the art in speaker diarization and overlapped speech detection
(Chowdhury, 2019). The aim will then be to propose solutions based on
recent frameworks (Bredin, 2020) to localize speech segments more
precisely, in particular when speakers change frequently.
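The frequency of speaker changes can likewise be quantified from the
same kind of turn annotation, which may help locate the passages where
segmentation is hardest. The Python sketch below counts speaker changes
per minute over sliding windows. The turn format, the window and hop
sizes, and the function names are hypothetical choices for
illustration.

```python
from typing import List, Tuple

# Hypothetical turn format: (start in seconds, end in seconds, speaker label).
Turn = Tuple[float, float, str]


def speaker_changes(turns: List[Turn]) -> List[float]:
    """Start times (s) of turns whose speaker differs from the previous turn's."""
    ordered = sorted(turns, key=lambda turn: turn[0])
    return [curr[0] for prev, curr in zip(ordered, ordered[1:]) if curr[2] != prev[2]]


def change_rate(
    turns: List[Turn], window: float = 60.0, hop: float = 30.0
) -> List[Tuple[float, float]]:
    """Return (window_start, speaker changes per minute) for sliding windows."""
    if not turns:
        return []
    changes = speaker_changes(turns)
    end_time = max(end for _start, end, _speaker in turns)
    rates = []
    t = 0.0
    while t < end_time:
        n = sum(1 for c in changes if t <= c < t + window)
        rates.append((t, n * 60.0 / window))  # normalize the count to a per-minute rate
        t += hop
    return rates


if __name__ == "__main__":
    turns = [
        (0.0, 5.0, "A"), (5.0, 8.0, "B"), (8.0, 9.0, "A"),
        (9.0, 12.0, "B"), (12.0, 40.0, "A"),
    ]
    print(speaker_changes(turns))                     # → [5.0, 8.0, 9.0, 12.0]
    print(change_rate(turns, window=20.0, hop=20.0))  # → [(0.0, 12.0), (20.0, 0.0)]
```

Using overlapping windows (hop smaller than the window) keeps the
measure local while smoothing out isolated rapid exchanges.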

In the second part of the internship, we will focus on a finer-grained
measurement and prediction of the level of conflict in the exchanges.
This involves identifying the most relevant features to describe it and
adapting or developing a neural network architecture to model it.

The programming language used for this internship will be Python. The
candidate will have access to the LISN computing resources (servers and
clusters with recent-generation GPUs).


Publications

Depending on the degree of maturity of the work carried out, we expect
the applicant to:

*   Distribute the tools produced under an open-source license

*   Write a scientific publication


Conditions

The internship will last 4 to 6 months and will take place at the
LISN (formerly LIMSI), in the Spoken Language Processing (TLP) group.
The laboratory is located near the Plateau de Saclay, in building 507
of the university campus, rue du Belvédère, 91400 Orsay. The intern
will be supervised by Marc Evrard (marc.evrard@lisn.upsaclay.fr).
The internship allowance follows the official standards
(https://www.service-public.fr/professionnels-entreprises/vosdroits/F32131).


Applicant profile

*   Student in the final year of a five-year degree in computer
    science (AI is a plus)

*   Proficiency in Python and experience with ML libraries
    (scikit-learn, TensorFlow, PyTorch)

*   Strong interest in digital humanities, and in political science in
    particular

*   Experience in automatic speech processing is preferred

*   Ability to carry out a bibliographic study from scientific articles
    written in English

To apply: Send an email to marc.evrard@lisn.upsaclay.fr including a
résumé and a cover letter.


Bibliography

Bredin, H., et al. (2020). pyannote.audio: Neural building blocks for
speaker diarization. In Proc. ICASSP 2020 (pp. 7124-7128).

Chowdhury, S. A., Stepanov, E. A., Danieli, M., & Riccardi, G. (2019).
Automatic classification of speech overlaps: Feature representation
and algorithms. Computer Speech & Language, 55, 145-167.

Eyben, F., Scherer, K. R., et al. (2015). The Geneva minimalistic
acoustic parameter set (GeMAPS) for voice research and affective
computing. IEEE Transactions on Affective Computing, 7(2), 190-202.

Liénard, J.-S. (2019). Quantifying vocal effort from the shape of the
one-third octave long-term-average spectrum of speech. The Journal of
the Acoustical Society of America, 146(4).

OOPAIP (Ontologie et outils pour l'annotation des interventions
politiques), DIM STCN (Sciences du Texte et connaissances nouvelles),
Conseil régional d'Île-de-France. URL:
http://www.dim-humanites-numeriques.fr/projets/oopaip-ontologie-et-outils-pour-lannotation-des-interventions-politiques/

Rilliard, A., d'Alessandro, C., & Evrard, M. (2018). Paradigmatic
variation of vowels in expressive speech: Acoustic description and
dimensional analysis. The Journal of the Acoustical Society of America,
143(1), 109-122.