We are looking for candidates for a research internship position at Deezer Research on AI lyrics detection. Please find the internship description below. Applications can be sent here: https://jobs.smartrecruiters.com/Deezer/743999946532840-ai-lyrics-detection-internship

*****

We are a leading company in the music streaming industry, with one of the largest catalogues on the market (over 120 million HiFi music tracks) and more than 10 million subscribers worldwide. Machine learning models are core to how we manage musical content. With over 100,000 tracks ingested daily, automatically identifying fraudulent content is essential to delivering a high-quality service to our users and ensuring fair remuneration for professional artists.

AI-generated content has become increasingly easy to produce, even for inexperienced users, thanks to state-of-the-art generative models (e.g. ChatGPT for text). Although this can benefit many domains and applications (e.g. code auto-completion), it has also created more opportunities for fraud, abuse and disinformation. Moreover, generated text has been shown to reach a level of grammar, fluency and coherence that often fools humans asked to distinguish fake from real news or social media posts.

In the music domain, multiple models have recently been proposed for lyrics generation, with impressive results. While such tools could help songwriters in their creative process, they could also be used with fraudulent intent.

The objective of this internship is to detect whether lyrics are AI-generated or human-written. The proposed solution should be as accurate as possible and data-efficient, but also generalisable.

The intern will be supervised by research scientists and engineers from the Deezer Research team, who will provide practical and scientific support for the task. The intern is nonetheless encouraged to propose solutions and work autonomously.
For experiments, we ensure data availability, cutting-edge technology and appropriate computing power.

Qualifications
- Master's / PhD student with a background in Computer Science / Computational Linguistics / Applied Mathematics / Statistics
- Strong knowledge of natural language processing, applied machine learning and data mining
- Good programming skills for data processing and experimentation (Python preferred)
- Creativity and autonomy