Inducing semantic dimensions for a Personal Information Platform

Position type: Post-doctoral Fellow
Functional area: Palaiseau
Research theme: Algorithmics, programming, software and architecture
Project: AT-SAC
Scientific advisor: Gregory.Grefenstette@inria.fr
HR Contact: elodie.barra@inria.fr
Application deadline: 30/06/2014

About Inria and the job

A public science and technology institution established in 1967, Inria is the only public research body fully dedicated to computational sciences. Combining computer science with mathematics, Inria's 3,400 researchers strive to invent the digital technologies of the future. Educated at leading international universities, they creatively integrate basic research with applied research and dedicate themselves to solving real problems, collaborating with the main players in public and private research in France and abroad and transferring the fruits of their work to innovative companies. Inria researchers published over 4,800 articles in 2010. They are behind over 270 active patents and 105 start-ups. In 2010, Inria's budget came to 252.5 million euros, 26% of which came from its own resources.

With recent technological advances driving the Quantified Self, LifeLogging and Sousveillance movements, people will soon be generating enormous quantities of data about their personal lives. The TRACES project seeks to transform this data into classified information and privately exploitable knowledge by creating the semantic structures needed to access personal information archives. Personal data can be generated by the user as text (emails sent, internet posts, instant messages), attached to the user from external sources (emails received, messages received, web browsing history), or passively captured by wearable computing (GPS coordinates, motion captured by digital glasses, vital signs). To store, retrieve and exploit this information, it must be classified and semantically annotated.
To perform this classification, we have semantic resources built by experts (e.g., MeSH for medical knowledge, the NASA Thesaurus for aeronautics), general-knowledge resources built by lexicographers (e.g., dictionaries, WordNet), and crowd-sourced semantic resources (e.g., Freebase, DMOZ). Personal information, however, calls for personal semantic resources. One example of such a resource is the face-recognition models that Google Photos or Facebook build for your labeled friends and family. We do not yet know how to build user-oriented, personal semantic models and resources from a person's digital life: emails, browsing, quantified life, daily routes, vital statistics. Although the collection and classification of personal information is already exploited to sort people into advertising or national-security categories, producing personal categories that let users exploit their own digitally generated and captured information for their own benefit remains an open research problem, and it is this problem that the TRACES project addresses.

Mission

The principal mission of the postdoctoral researcher is to find new ways of inducing taxonomies and semantic dimensions from user-generated and user-captured personal data, integrating textual, quantified, geolocated, image, sound and video data. The postdoc will also help the TRACES team develop the algorithms and technology for a platform for private, personal information management.
Job offer description

The postdoctoral researcher will:
- implement recent taxonomy/ontology induction algorithms (see References below), adapt them to the problem of personal information, apply them to personal data contributed by TRACES team members, evaluate the results, and present them at an international conference or workshop;
- help TRACES members build a private personal information platform based on open-source information retrieval systems (Lucene/Solr);
- discover existing open-data taxonomies and adapt them to the platform;
- study how GPS information and other quantified personal data can be integrated into, and augment, the personal semantic structures induced from textual sources.

References
- Olena Medelyan, Steve Manion, Jeen Broekstra, Anna Divoli, Anna-Lan Huang, and Ian H. Witten. "Constructing a Focused Taxonomy from a Document Collection." Extended Semantic Web Conference (ESWC), 2013.
- Pucktada Treeratpituk, Madian Khabsa, and C. Lee Giles. "Graph-based Approach to Automatic Taxonomy Generation (GraBTax)." arXiv preprint arXiv:1307.1718, 2013.
- Philipp Cimiano and Johanna Völker. "Text2Onto." Natural Language Processing and Information Systems, Springer Berlin Heidelberg, 2005, pp. 227-238.

Skills and profile
- Experience with natural language processing (e.g., Stanford Parser)
- Experience with ontologies/taxonomies (e.g., MeSH, Freebase)
- Experience with classification algorithms
- Familiarity with large, noisy data sets
- Experience with web crawlers and information retrieval systems (e.g., Lucene/Solr/Elasticsearch)
- Desire to produce functioning end-to-end systems and life-scale live demos
- Scientific rigour
- Imagination

Benefits
- Duration: 12 months
- Salary: 2,621 euros gross per month

Additional information

Place of work: Plateau de Saclay
Contact: Gregory.Grefenstette@inria.fr

Security and Defense procedure:
In the interests of protecting its scientific and technological assets, Inria is a restricted-access establishment.
Consequently, it observes special regulations for welcoming foreign visitors from outside the Schengen area. The final acceptance of each candidate therefore depends on the outcome of this security and defense procedure.