In the context of the upcoming interdisciplinary project "impresso - Media Monitoring of the Past II" ("impresso doppio"), the EPFL Digital Humanities Laboratory is looking for a research data engineer who will work with us on the design, development and evaluation of large-scale text mining pipelines for multilingual historical newspaper and radio archives.

About EPFL:
EPFL, the Swiss Federal Institute of Technology in Lausanne, is one of the most dynamic university campuses in Europe and ranks among the top 20 universities worldwide. EPFL employs more than 6,000 people supporting the three main missions of the institution: education, research and innovation. The EPFL campus offers an exceptional working environment at the heart of a community of more than 16,000 people, including over 12,000 students and 4,000 researchers from more than 120 different countries.

About the project:
"impresso - Media Monitoring of the Past II" is an interdisciplinary research project which aims to pioneer new approaches to the joint exploration of newspaper and radio archive contents across time, languages, and national borders. Funded by the Swiss National Science Foundation and the Luxembourg National Research Fund (2023-2027), it is carried out by the EPFL DHLAB, the Department of Computational Linguistics of the University of Zurich, the Centre for Contemporary and Digital History (C2DH) and the History Department of the University of Lausanne, with the additional support of 21 European partners. Computational linguists, computer scientists, digital humanists, historians, and designers will work closely together to enrich and connect newspaper and radio sources through multiple layers of cutting-edge semantic enrichments represented in a shared multilingual vector space, and to design adequate, meaningful and transparent exploration capabilities for (data-driven) historical research from a transnational and transmedia perspective.
Impresso doppio follows on from the first impresso project, which developed a scalable architecture for the processing of Swiss and Luxembourgish newspaper collections and created an interface with powerful search, filter and discovery functionalities based on semantic enrichments. The present project puts forward the vision of a complete connection between media archives across languages and media types.

Application deadline: 21.04.2023
Interviews: End of April
Place of work: EPFL DHLAB, Lausanne, Switzerland
Salary: according to EPFL salary scales and experience
How to apply: please upload your application (full CV and cover letter) via this portal.

Research Data Engineer in Natural Language Processing

Your mission:
The impresso project will compile an unprecedented transmedia and transnational corpus (historical newspaper and radio collections from 8 Western European countries) and develop a technical framework for its annotation, integration and exploitation. In this endeavour, you will lead the activities related to the management and engineering of the project data and system architecture. In collaboration with other project team members, you will contribute to the design and implementation of the technical framework.

Main duties and responsibilities include:
- Design and implement scalable data pipelines to convert, cleanse, integrate and consolidate media archives. This includes defining appropriate data structures, models and formats for source documents and enrichments, as well as developing large-scale ingest workflows.
- Establish a sustainable system architecture and pipeline management, including unit and integration testing.
- Manage, document, and release code modules and datasets.
- Actively collaborate with C2DH and UZH teams on data modelling, formats and APIs.
- Engage in participative interface and API design with project team and partners.
- Contribute to the organisation of annotation and evaluation campaigns (e.g. in the vein of HIPE).
- Contribute to the organisation of project workshops on the development and adoption of standards for the representation and exchange of historical data (raw material and annotations).
- Contribute to the definition of a roadmap towards the long-term maintenance and expansion of a rich ecosystem of tools, resources and services around historical media.
- Participate in other impresso work packages where your expertise is required and coordinate with project team members and partners.
- Initiate and/or contribute to scientific publications on data releases, processing and standards (and more topics if interested).

The work will be carried out in collaboration with the project team (ca. 12 people).

Your profile:
- An experienced research data engineer (2-4 years) or NLP researcher/programmer with an interest in history, media and participatory design.
- A degree in computer science, natural language processing or a related field (Master's or PhD), or equivalent professional experience.
- Proficiency in: Python; Unix-based operating systems; database development and use (MySQL and NoSQL); use of cloud storage and cloud computing (S3 object storage, Kubernetes); automation and scripting.
- Good understanding of machine learning.
- Willingness to write good documentation.
- Good communication skills.
- Strong collaborative and team spirit.
- Autonomous and accountable with a proactive approach.
- Efficient, committed to deadlines and concerned with production readiness.
- Fluency in English.
- Comfortable in an international and multi-cultural context.

Desirable:
- Experience working in a scientific and academic context.
- Knowledge of French or German is a plus.
- Interest in getting involved in supervising activities (MSc students).
- Interest in writing scientific papers (on data and infrastructure-related topics, or more if interested).
We offer:
- Opportunity to join an experienced and highly motivated interdisciplinary team conducting innovative and relevant research at the intersection of computer science and humanities research.
- Applied research framework: what you will develop will be deployed in production and directly used by a community of researchers.
- Work in an interdisciplinary team at the intersection of computer science, NLP, history, journalism and digital libraries.
- Flexible working hours and teleworking.

Located in Lausanne, Switzerland, EPFL has a highly international environment, state-of-the-art research facilities, and is consistently ranked among the world's leading institutions in scientific research. Lausanne is a vibrant and cosmopolitan city in a unique natural environment with great outdoor activities (Jura, Alps, Lake Leman). Salaries and benefits are internationally competitive.

Start date: 01.09.2023 (foreseen start of contract)
Term of employment: Fixed-term (CDD)
Duration: 3.5 years (1-year contract renewable until the end of Feb 2027)
Contact: For any questions, feel free to contact Maud Ehrmann: maud.ehrmann@epfl.ch
Remark: Only candidates who apply through the EPFL website or our partner Jobup's website will be considered. Files sent by agencies without a mandate will not be taken into account.
Reference: Job Nb 2821 https://recruiting.epfl.ch/Vacancies/2821/Description/2