Xerox Research Centre Europe, located in Grenoble, is offering the following internship for Spring 2015:

-------------------------------------------------------------------
Internship: Modeling next sentence in a dialog using vector space models
-------------------------------------------------------------------

See: http://www.xrce.xerox.com/About-XRCE/Internships/Modeling-next-sentence-in-a-dialog-using-vector-space-models

Contacts:
Dymetman, Marc  marc.dymetman@xrce.xerox.com
Venkatapathy, Sriram  sriram.venkatapathy@xrce.xerox.com

Please mention "Modeling next sentence in a dialog using vector space models" in your subject line.

Duration: 4.5 months
Start Date: April - June 2015

Given the context of an observed dialog history up to a certain point, the goal of this internship will be to develop predictive models for the next utterance. A good model of the next utterance should substantially reduce its "perplexity" with respect to a baseline language model. Based on a large collection of available chats in the customer-care domain, the feasibility of learning such models will be explored. The perplexity score will be used to evaluate the performance of various models that will be examined.

The internship will also involve using such models for the task of actually predicting the next utterance, up to some limited editing actions. The idea will be to retrieve the top candidates for the next sentence from all the sentences that have ever been uttered. One of the criteria for retrieval will be the perplexity of these sentences relative to the learnt model.

For modelling, we will focus on models that represent the dialog history as a real vector. Such representations have been extensively used in recent years for encoding the semantics of words and sentences. Several ideas will be explored using frameworks such as Topic modelling and Deep Neural Networks.

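
As an illustration of the perplexity criterion mentioned above, here is a minimal sketch, not the method planned for the internship. It assumes a simple add-one-smoothed unigram model in place of the vector-space models described in the announcement, and the function names (train_unigram, perplexity, rank_candidates) and the toy sentences are hypothetical. It only shows how candidate next utterances could be ranked by perplexity under some learnt model.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return an add-one-smoothed unigram model as a function word -> probability."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserved for unseen words

    def prob(word):
        return (counts[word] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, sentence):
    """Exponential of the average negative log-probability per token."""
    tokens = sentence.split()
    log_prob = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def rank_candidates(prob, candidates):
    """Sort candidate next utterances from lowest to highest perplexity."""
    return sorted(candidates, key=lambda s: perplexity(prob, s))


# Toy, made-up sentences standing in for a customer-care chat collection.
training_chats = [
    "please restart your device",
    "please restart the router",
    "did the restart help",
]
candidates = ["please restart the device", "the weather is nice today"]

model = train_unigram(training_chats)
ranked = rank_candidates(model, candidates)
print(ranked[0])
```

A model conditioned on the dialog history would replace the unigram probabilities with probabilities that depend on a vector representation of the preceding turns; the ranking step would stay the same.
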
The ideal candidate is a Computer Science student at the Master's or (preferably) PhD level, with a background in Machine Learning and NLP. Strong programming skills (Python, C++, Java...) are required. Conference publication of the work will be strongly encouraged.