Political Viewpoint Analysis from Social and Web Sources

Xavier Tannier (LIMSI / U. Paris Sud, CNRS)
Ioana Manolescu (INRIA, U. Paris Sud)

Duration: 5-6 months; the starting date is flexible (ideally March
1st, 2016)

Location: Orsay, France

Keywords: Natural Language Processing, Text Mining, Information
Extraction, Information Retrieval, Social data management, Databases
(the tasks will be adapted according to interest and level of
qualification of the candidate).

The work is to be carried out in close collaboration between the INRIA OAK
team, LIMSI-CNRS and the major French newspaper Le Monde. The work
is supported by a Computational Research Journalism Award from Google.

Context

The French political arena comprises many political parties, spanning
the spectrum from the extreme left to the extreme right. In 2012, no fewer
than 10 candidates ran for the presidential election (12 in 2007),
representing sensitivities such as communism, socialism, ecology, centre
left, centre right, right, far right, extreme right, or even a party
literally called "hunting, fishing, nature and tradition". Further, as in
any democratic country, ideas and values intertwine with group tactics,
personal ambitions and communication strategies, which lead politicians
to take stands that do not always match those of their political
party.

For all these reasons, deciphering candidates' positions
and deciding which candidate is closest to our own opinions is a
complex task for citizens. Similarly, putting statements and facts
into perspective is complex work for journalists. The goal of the
internship is to automatically build (from a variety of sources such
as online news articles, Twitter feeds, structured databases, etc.)
topical threads that will organize and visualize claims made by
politicians, in order to help journalists and citizens decode them and
distinguish between personal opinion, communication tools established
by the parties, and deliberate distortions of reality.

Data sources

We will use different textual and/or structured data sources as input
to our extraction process:

• Newspaper articles from the Le Monde website and print edition.

• Social network data, which already comes with metadata
specifying, e.g., the author and date of every item published in a
social context, and possibly the previously published items to which
the new item refers (e.g., retweets);

• The link structure (e.g., news articles citing each other, links
appearing in tweets), from which we will extract information on how
a thread continues;

• Background knowledge about people's political affiliations, as
well as their position on the political chessboard (importance and
side), which can be exploited to contextualize the content of social items;

• Possibly, data such as voting intention polls, in order to track
events that may have influenced them.

Scientific areas

This project involves scientific fields such as Natural Language
Processing, Information Extraction, Information Retrieval, Social data
management, Databases, and Data visualization.


Description

Depending on the candidate's level of qualification and the duration of
the internship, (s)he will work on one or more of the following steps:

• Using existing metrics (such as mutual information, tf-idf, lexical
  specificity) for extracting differences and similarities between
  claims from different political sides, or different political
  personalities.

• Implementing and adapting unsupervised algorithms, such as topic
  models (e.g., LDA), to different types of claims.

• Building a unified framework for collecting, organizing and querying
claims from various sources.  
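As an illustration of the first step, the sketch below ranks the terms most distinctive of each political side with tf-idf, treating all claims from one side as a single document. It is a minimal sketch under stated assumptions: the function names `tokenize` and `distinctive_terms` and the toy claims are hypothetical, and real work on Le Monde articles or tweets would add French-specific preprocessing (lemmatization, stop words).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and keep runs of word
    # characters (Unicode-aware in Python 3, so accented letters are kept).
    return re.findall(r"\w+", text.lower())


def distinctive_terms(claims_by_side, top_k=3):
    """Rank terms by tf-idf, with each side's claims pooled into one document."""
    # Term frequencies per side: one "document" per political side.
    tf = {
        side: Counter(tok for claim in claims for tok in tokenize(claim))
        for side, claims in claims_by_side.items()
    }
    n_docs = len(tf)

    # Document frequency: number of sides that use each term.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    result = {}
    for side, counts in tf.items():
        total = sum(counts.values())
        # Normalized term frequency times inverse document frequency.
        scores = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        # Terms shared by all sides have idf 0 and are dropped.
        result[side] = [term for term, s in ranked if s > 0][:top_k]
    return result


# Hypothetical toy claims, for illustration only.
claims = {
    "left": ["raise minimum wage", "protect public services"],
    "right": ["cut business taxes", "reduce public spending"],
}
print(distinctive_terms(claims))
```

Here "public" is used by both sides, so its idf is zero and it is never reported as distinctive. With a single side every score is zero, so at least two sides are needed; mutual information or lexical specificity could replace the scoring expression without changing the rest of the pipeline.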

Required competencies are: good software development skills, strong
qualification in one or several of the scientific areas involved in
the project (demonstrated, e.g., by academic results or past successful
projects), good communication skills and willingness to work in a
team.

On a daily basis, work will take place in a collaborative team
comprising the internship supervisors, an INRIA engineer whose task
is to oversee and coordinate the development of our unified platform
for data-based fact checking, and probably other interns. The work is
related to a longer scientific effort within the four-year ANR
project ContentCheck (Content Management Techniques for Fact-Checking:
Models, Algorithms, and Tools), which starts in January 2016.

The internship may lead to a PhD within the project.
 
The internship will take the form of a full-time INRIA employment
contract. The intern will be paid 1100 €/month.

Contacts

• Ioana Manolescu (ioana.manolescu@inria.fr),
  http://pages.saclay.inria.fr/ioana.manolescu/

• Xavier Tannier (xavier.tannier@limsi.fr),
  https://perso.limsi.fr/xtannier/en/