Inducing semantic dimensions for a Personal Information Platform
Position type: Post-doctoral Fellow
Functional area: Palaiseau
Research theme: Algorithmics, programming, software and architecture
Project: AT-SAC
Scientific advisor: Gregory.Grefenstette@inria.fr
HR Contact: elodie.barra@inria.fr
Application deadline: 30/06/2014
 
 
About Inria and the job
A public science and technology institution established in 1967, Inria
is the only public research body fully dedicated to computational
sciences. Combining computer sciences with mathematics, Inria's 3,400
researchers strive to invent the digital technologies of the future.

Educated at leading international universities, they creatively
integrate basic research with applied research and dedicate themselves
to solving real problems, collaborating with the main players in public
and private research in France and abroad and transferring the fruits of
their work to innovative companies. The researchers at Inria published
over 4,800 articles in 2010. They are behind over 270 active patents and
105 start-ups. In 2010, Inria's budget came to 252.5 million euros, 26%
of which represented its own resources.

With recent advances in technology behind the movements of Quantified
Self, LifeLogging and Sousveillance, people will soon be generating
enormous quantities of data associated with their personal lives. The
TRACES project seeks to transform this data into classified information
and privately exploitable knowledge by creating the semantic structures
to access personal information archives. User-generated personal data
can be textual (emails sent, internet posts, instant messages), attached
to a user from external sources (emails received, messages received, web
browsing history), or passively captured by wearable computing (GPS
coordinates, digital glasses motion captures, vital signs).

To store, retrieve, and exploit this information, it has to be
classified and semantically annotated. To perform this
classification, we have semantic resources built by experts (e.g., MeSH
for medical knowledge, the NASA thesaurus for aeronautics). We also have
general knowledge resources built by lexicographers (e.g., dictionaries,
WordNet). There are also crowd-sourced semantic resources (FreeBase,
DMOZ, etc.). But for this personal information, we will need personal
semantic resources. An example of such a resource is the face
recognition models that Google Photos or Facebook builds for your
labeled friends and family. We do not yet know how to build
user-oriented, personal semantic models and resources from a person's
digital life: mails, browsing, quantified life, daily routes, vital
statistics. Though collection and classification of personal information
is exploited for categorising people into advertising or national
security categories, producing personal categories that allow a user to
exploit their own digitally generated and captured information for their
own benefit remains an open research problem that the TRACES project
addresses.
 
Mission
The principal mission of the postdoctoral candidate is to find new ways
of inducing taxonomies and semantic dimensions from user generated and
user captured personal data, integrating textual, quantified,
geolocalized, image, sound and video data. The postdoc will also assist
the TRACES team in developing algorithms and technology for creating a
platform for private, personal information management.
 
Job offer description
The postdoctoral researcher will perform the following work:
- implement recent taxonomy/ontology induction algorithms and adapt them to
  the problem of personal information (see references below), apply them to
  personal data contributed by TRACES team members, evaluate the results,
  and present them at an international conference or workshop;
- aid TRACES members in the construction of a private personal information
  platform based on open source information retrieval systems (Lucene/Solr);
- discover and adapt existing open data taxonomies to the platform;
- study how GPS information and other quantified personal data can be
  integrated into and augment the personal semantic structures induced from
  textual sources.

References:
Medelyan, Olena, Steve Manion, Jeen Broekstra, Anna Divoli, Anna-Lan
Huang, and Ian H. Witten. "Constructing a Focused Taxonomy from a
Document Collection." ESWC 2013 (2013).
Treeratpituk, Pucktada, Madian Khabsa, and C. Lee Giles. "Graph-based
Approach to Automatic Taxonomy Generation (GraBTax)." arXiv preprint
arXiv:1307.1718 (2013).
Cimiano, Philipp, and Johanna Völker. "Text2Onto." Natural language
processing and information systems. Springer Berlin Heidelberg,
2005. 227-238.
 
Skills and profile
Experience with natural language processing (e.g., Stanford Parser)
Experience with ontologies/taxonomies (e.g., MeSH, FreeBase)
Experience with classification algorithms
Familiarity with dealing with large, noisy data sets
Experience with web crawlers and information retrieval systems (e.g.,
Lucene/Solr/ElasticSearch)
Desire to produce functioning end-to-end systems and life-scale live demos
Scientific rigour
Imagination
 
Benefits
- Duration: 12 months
- Salary: 2,621 euros gross per month
 
Additional information
Place of work: Plateau de Saclay
Contact: Gregory.Grefenstette@inria.fr

Security and Defense procedure:

In the interests of protecting its scientific and technological assets,
Inria is a restricted-access establishment. Consequently, it observes
special regulations for welcoming foreign visitors from outside of the
Schengen area.
The final acceptance of each candidate thus depends on applying this
security and defense procedure.