Title: Automatic moderation of cultural events
Description:
a. ideactiv is a young start-up that relies heavily on NLP to read cultural event data from the websites of cultural organisations (theatres, show venues, etc.). This allows these organisations to share their events with cultural media and promote them without any manual operation. ideactiv reads ~15 000 events per 24 hours, on the websites of ~10 000 organisations. The data can be seen at www.ideactiv.com. ideactiv was created by Thomas Chenevier, an alumnus of Ecole Polytechnique. It is used daily by hundreds of cultural organisations and by national cultural media. Working for ideactiv is a great opportunity to gain experience with "deep search engines", i.e. search engines that index objects according to their meaning, and not just web pages according to their content.
b. Topics. For the data to be actually useful:
1. the description of each event needs to be clean.
2. the system must categorize an event based on its title, description and venue.
3. the system must generate a short description of each event.
4. the system should detect multiple identical pictures of an event.
These 4 topics are described in more detail at www.ideactiv.com/doc/2025-NLP-ideactiv-moderation.pdf
c. The internship should address at least topic 1 and/or topic 2, and if possible topic 3 and/or topic 4. The goal of the internship is to produce a system that can run in production, working autonomously behind an API, with code that is clean and commented so that it can be maintained and improved over time after the internship. The system must be as frugal as possible, and must therefore not rely on third-party services, except probably for topic 3. ideactiv will provide plenty of data to train and test the system, in-depth knowledge of these issues, and real-time feedback on the output of the system being developed.
The internship will be mostly remote, with regular online sessions and some in-person sessions with the ideactiv developer (who is based in Brussels and can come to Paris). If the internship proves successful, ideactiv might offer hiring opportunities as of summer 2025, in order to deepen the research and development on these topics (and more generally on the automatic extraction and structuring of data from websites).
d. Please send your questions, your motivation for these topics, and your application to thomas@ideactiv.com. Applications are expected in late January or early February 2025, for an internship that should start between February and April and last at least 4 months.