In conjunction with the European Commission Joint Research Centre, ELDA offers a 6-month position to produce an updated version of the sentence-aligned multilingual parallel corpus JRC-Acquis (http://langtech.jrc.ec.europa.eu/JRC-Acquis.html).

*Purpose of the work / Tasks:*
- Download multilingual EU documentation from a server via a dedicated Java application
- Convert all documents to a standardised XML format
- Clean and pre-process the data by identifying specific text parts such as document footers, lists of addresses and annexes
- Possibly: run off-the-shelf tools to sentence-align the documents
- Carry out consistency checking of the data
- Produce statistics on the data
- Prepare the data for distribution

Various Perl scripts used to produce the first version of the corpus exist and should be reused.

*Profile and required skills:*
- Degree or MSc in computer science, computational linguistics, natural language processing or similar fields
- Good knowledge of Perl to read and change existing data processing scripts
- Java and SQL, to use the application accessing the EU's document database
- XML and XSLT
- Proficiency in English
- At least passive knowledge of several of the 23 official EU languages (see the JRC-Acquis page for details)

Salary: Commensurate with qualifications and experience.

Applications will be considered until the position is filled.

The position is based in Paris, France, with about one week to be spent at the European Commission's Joint Research Centre (JRC) at Ispra in Northern Italy.

Candidates should have the citizenship (or residency papers) of a European Union country.

Applicants should send (preferably via email) a cover letter addressing the points listed above together with a curriculum vitae to:

Victoria Arranz
ELRA / ELDA
55-57, rue Brillat Savarin
75013 Paris
France
Fax: +33 1 43 13 33 30
Email: job@elda.org

For further information about ELRA/ELDA, see:
http://www.elda.org
http://www.elra.info

For further information about JRC, see:
http://langtech.jrc.ec.europa.eu