Post-doctoral position at ERIC, University of Lyon (France): Modeling and analysis dynamics of web reputation

ERIC is a research unit specialized in business intelligence and data mining. In the context of the ImagiWeb project, a 10-month postdoctoral position is available on machine learning / data clustering, with applications to social media analysis. See below for more details about the offer.

Julien Velcin
julien.velcin@univ-lyon2.fr

Post-doctoral Position

Title: Modeling and analysis dynamics of web reputation
Supervision: Julien Velcin, Stéphane Bonnevay
Place: ERIC Lab (University of Lyon)
Duration: 10 months
Funding: 2250 € per month (ANR Project ImagiWeb, gross salary)

Context

The ImagiWeb project, funded by the French National Research Agency (2012-2015), aims at studying the image (a.k.a. web reputation) of entities of various kinds (companies, politicians, etc.) as it is diffused and viewed on the Internet. The study of these representations and their dynamics is considered a real challenge today and touches on several data mining issues: information/topic extraction, opinion mining, web reputation, social network analysis, etc. The project involves six partners (3 academic labs, 3 companies), and two real case studies are considered.

Description

Building and tracking images (web reputations) while taking into account both temporal and spatial dimensions can be addressed as a machine learning problem. In the context of the ImagiWeb project, this post-doctoral position aims mainly at building new unsupervised machine learning models and algorithms. Graphical models [1, 8], probabilistic models, or various dynamic models that take conditional dependences into account will be considered for dealing with this problem. For data clustering, several models have been designed for attribute-value data [5, 6] and relational data [9].
Recent models of evolutionary clustering have been proposed for integrating temporal evolution into the process [2, 3, 4, 10].

Researchers of the ImagiWeb project have recently designed a new model for representing and manipulating the "images" (paper under review). This model is able to deal with entities (e.g., politicians, companies, brands, etc.) that are temporally described by opinionated labels. Up to now, it has been tested on short descriptions extracted from a Twitter sample.

The recruited researcher will address theoretical issues and will perform experiments on real datasets provided by the ImagiWeb project. More precisely:
- She/he will update the graphical model by both addressing some of its shortcomings and integrating additional information available in the data (in particular, the author of the message).
- She/he will propose an accurate way to deal with the timeline by going beyond evenly distributed time windows, for instance by using the notion of change points [7].
- She/he will participate in submitting the new model(s) to a high-level international conference in machine learning and/or data mining.
- She/he will design the algorithm that implements the new model and test it on the datasets of the ImagiWeb project. She/he will be involved in the integration of the code into a full prototype.

Profile requirements

Applicants must hold a PhD in Science with a clear research orientation. Priority will be given to applicants who have already worked in the domains of statistical machine learning, probabilistic graphical models, and data clustering.

Application procedure

Applications must be sent by email to Julien Velcin (julien.velcin@univ-lyon2.fr) and Stéphane Bonnevay (stephane.bonnevay@univ-lyon1.fr).
Candidates should send the following elements:
- Cover letter
- CV (including recent publications)
- Marks and awards obtained during their Master's degree
- Recommendation letters

After a first selection step, interviews will be organized before taking the final decision.

References

[1] C.M. Bishop. Pattern recognition and machine learning, volume 4. Springer, New York, 2006. Chapter 8.
[2] Fuyuan Cao, Jiye Liang, Liang Bai, Xingwang Zhao, and Chuangyin Dang. A framework for clustering categorical time-evolving data. Fuzzy Systems, IEEE Transactions on, 18(5):872-882, October 2010.
[3] Deepayan Chakrabarti, Ravi Kumar, and Andrew Tomkins. Evolutionary clustering. In International conference on Knowledge discovery and data mining, KDD '06, pages 554-560. ACM, 2006.
[4] Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, and Belle L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In International conference on Knowledge discovery and data mining, KDD '07, pages 153-162. ACM, 2007.
[5] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1-38, 1977.
[6] M.A.F. Figueiredo and A.K. Jain. Unsupervised learning of finite mixture models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(3):381-396, 2002.
[7] Lajos Horváth and Marie Husková. Change-point detection in panel data. Journal of Time Series Analysis, 33(4):631-648, 2012.
[8] D. Magatti. Graphical models for text mining: knowledge extraction and performance estimation. PhD thesis, Università degli Studi di Milano-Bicocca, 2010.
[9] M. Shafiei and H. Chipman. Mixed-membership stochastic block-models for transactional networks. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 1019-1024. IEEE, 2010.
[10] Tianbing Xu, Zhongfei (Mark) Zhang, Philip S. Yu, and Bo Long. Dirichlet process based evolutionary clustering. In International Conference on Data Mining, ICDM '08, pages 648-657. IEEE Computer Society, 2008.