Internship position at Telecom-Paris on deep learning approaches for social computing

*Place of work*
Telecom Paris, Palaiseau (Paris outskirts)

*Starting date*
From February 2021 (a later start is possible)

*Duration*
4-6 months

*Context*
The intern will take part in the REVITALISE project, funded by the ANR (the French National Research Agency). The research activity of the internship will bring together the research topics of Prof. Chloé Clavel [Clavel] of the S2a team [SSA] at Telecom-Paris (social computing [SocComp]), Dr. Mathieu Chollet [Chollet] from the University of Glasgow (multimodal systems for social skills training), and Dr. Beatrice Biancardi [Biancardi] from the CESI Engineering School, Nanterre (social behaviour modelling).

*Candidate profile*
As a minimum requirement, the successful candidate should have:
- A master's degree in one or more of the following areas: human-agent interaction, deep learning, computational linguistics, affective computing, reinforcement learning, natural language processing, speech processing
- Excellent programming skills (preferably in Python)
- Excellent command of English
- The desire to pursue an academic thesis at Telecom-Paris after the internship

*How to apply*
The application should be formatted as **a single pdf file** and should include:
- A complete and detailed curriculum vitae
- A cover letter
- The contact details of two referees
The pdf file should be sent to the three supervisors, Chloé Clavel, Beatrice Biancardi, and Mathieu Chollet: chloe.clavel@telecom-paris.fr, bbiancardi@cesi.fr, mathieu.chollet@glasgow.ac.uk

Multimodal attention models for assessing and providing feedback on users' public speaking ability

*Keywords*
human-machine interaction, attention models, recurrent neural networks, social computing, natural language processing, speech processing, non-verbal behavior processing, multimodality, soft skills, public speaking

*Supervision*
Chloé Clavel, Mathieu Chollet, Beatrice Biancardi

*Description*
Oral communication skills are essential in many situations and have been identified as core skills of the 21st century. Technological innovations have enabled social skills training applications that hold great training potential: speakers' behaviors can be automatically measured, machine learning models can be trained to predict public speaking performance from these measurements, and personalized feedback can then be generated for trainees.

The REVITALISE project proposes to study explainable machine learning models for the automatic assessment of public speaking and for the automatic production of feedback to public speaking trainees. In particular, the recruited intern will address the following points:
- identify relevant datasets for public speaking training and prepare them for model training
- propose and implement multimodal machine learning models for public speaking assessment and compare them to existing approaches in terms of predictive performance
- integrate the public speaking assessment models into a public speaking training interface to produce feedback, and evaluate the usefulness and acceptability of the produced feedback in a user study

The results of the project will help to advance the state of the art in social signal processing and will further our understanding of the performance/explainability trade-off of these models. The compared models will include traditional machine learning models proposed in previous work [Wortwein] and sequential neural approaches (recurrent networks) that integrate attention models, as a continuation of the work done in [Hemamou] and [Ben-Youssef]; a minimal illustration of such a model is sketched below.
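To give candidates a concrete idea of the kind of architecture involved, here is a minimal sketch (in PyTorch) of a recurrent network with additive attention pooling for sequence-level assessment. All module names, dimensions, and the assumption that per-frame multimodal features are already extracted and concatenated are illustrative only, not the project's actual design.

```python
# Minimal sketch: GRU encoder + additive attention pooling that maps a
# sequence of per-frame multimodal features to a single performance score.
# Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class AttentiveGRURegressor(nn.Module):
    def __init__(self, input_dim=128, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.attn_score = nn.Linear(hidden_dim, 1)  # scores each time step
        self.head = nn.Linear(hidden_dim, 1)        # predicts a performance score

    def forward(self, x):
        # x: (batch, time, input_dim) -- per-frame features, e.g. prosodic,
        # visual, and lexical descriptors concatenated upstream
        h, _ = self.gru(x)                                   # (batch, time, hidden_dim)
        weights = torch.softmax(self.attn_score(h), dim=1)   # (batch, time, 1)
        context = (weights * h).sum(dim=1)                   # attention-pooled summary
        return self.head(context), weights

# Toy usage: 8 clips, 200 frames each, 128-dim features per frame.
model = AttentiveGRURegressor()
scores, attn = model(torch.randn(8, 200, 128))
```

A side benefit of this family of models, and one reason they are relevant to the project's performance/explainability trade-off, is that the attention weights indicate which moments of the talk drove the prediction and can thus feed into the feedback produced for trainees.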
The feedback production interface will extend a system developed in previous work [Chollet21].

Selected references of the team:
[Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, and C. Clavel. HireNet: a hierarchical attention model for the automatic analysis of asynchronous video job interviews. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
[Ben-Youssef] Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim. UE-HRI: a new dataset for the study of user engagement in spontaneous human-robot interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464-472. ACM, 2017.
[Wortwein] Torsten Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, Rainer Stiefelhagen, and Stefan Scherer. Multimodal public speaking performance assessment. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction (ICMI '15), pages 43-50. Association for Computing Machinery, New York, NY, USA, 2015.
[Chollet21] M. Chollet, S. Marsella, and S. Scherer. Training public speaking with virtual social interactions: effectiveness of real-time feedback and delayed feedback. Journal on Multimodal User Interfaces, pages 1-13, 2021.

Other references:
[TPT] https://www.telecom-paristech.fr/eng/
[IMTA] https://www.imt-atlantique.fr/fr
[SocComp] https://www.tsi.telecom-paristech.fr/recherche/themes-de-recherche/analyse-automatique-des-donnees-sociales-social-computing/
[SSA] http://www.tsi.telecom-paristech.fr/ssa/#
[PACCE] https://www.ls2n.fr/equipe/pacce/
[Clavel] https://clavel.wp.imt.fr/publications/
[Chollet] https://matchollet.github.io/
[Biancardi] https://sites.google.com/view/beatricebiancardi
- S. Rasipuram and D. B. Jayagopi. Automatic multimodal assessment of soft skills in social interactions: a review. Multimedia Tools and Applications, pages 1-24, 2020.
- R. Sharma, T. Guha, and G. Sharma. Multichannel attention network for analyzing visual behavior in public speaking. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018.
- R. Acharyya, S. Das, A. Chattoraj, and M. I. Tanveer. FairyTED: a fair rating predictor for TED talk data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 338-345, 2020.