An open internship position on a mix of string algorithms, grammatical inference and statistical machine learning

Description

We are looking for a motivated intern for a project involving the use of sequential patterns for the inference of grammars. The Smallest Grammar Problem is the problem of finding the smallest context-free grammar that generates exactly one given sequence. We plan to generalize this in order to find grammars that generate a set of natural language documents with a strong but hidden structure. This structure will then be converted into additional features (through tree kernels, for example) in our analytics pipeline or, alternatively, used as a starting template for existing multilingual authoring tools.

Requirements:

- Research-oriented master's student or PhD candidate in computer science
- Knowledge of standard text algorithms and data structures
- Knowledge of formal grammars (for example, a course covering the Hopcroft & Ullman book or equivalent)
- Knowledge of statistical machine learning applied to text is a strong plus
- Fluency in C, C++ or Java is a plus

The intern will work closely with researchers in a very international environment, and will be strongly encouraged to produce scientific publications.

Duration: 5-6 months

Start Date: March-April 2014

Application instructions

Informal inquiries are welcome and can be made at matthias.galle@xrce.xerox.com. To submit an application, please send your CV and cover letter to both xrce-candidates@xrce.xerox.com and matthias.galle@xrce.xerox.com. Ideally, your CV will also list people we can contact for letters of recommendation.

Link: http://www.xrce.xerox.com/About-XRCE/Internships/Grammatical-Inference-with-sequential-motifs