Title: Reference-Free Evaluation of Image Descriptions for Visually Impaired Users

Supervisors: Camille Guinaudeau, Michèle Gouiffès

Keywords: Natural Language Processing, image captioning, computer vision, AI, visually impaired users

Project Description:
The accessibility of visual content for blind or visually impaired individuals relies heavily on the quality of automatically generated textual descriptions, commonly referred to as alternative text. These descriptions play a crucial role in conveying essential information embedded in images, videos, and other visual content. However, generating descriptions that are truly useful for this audience presents several challenges: effective descriptions must not only capture the image content but also adapt to the usage context, remain concise for clarity, and precisely convey key visual information.

One of the significant challenges in this field is evaluating the quality of generated alternative texts. Current metrics, often based on comparisons with reference descriptions (e.g., BLEU, CIDEr, or SPICE), are ill-suited to assessing the relevance and usability of descriptions from the perspective of visually impaired users. Moreover, the scarcity of datasets tailored to alternative text generation and evaluation adds another layer of complexity to the problem.

Building on our prior work, including the development of the AD2AT dataset [1], consisting of 3,000 image-alternative text pairs, and the ContextRef benchmark [2], this internship aims to address these challenges by focusing on reference-free evaluation of image descriptions for visually impaired users.

Internship Objectives:
The primary objective of this internship is to deepen the analysis of this dataset to identify the key characteristics of descriptions that best meet the needs of blind or visually impaired individuals.
To achieve this, we will examine various aspects, such as the textual richness of the descriptions, the specific parts of the image being described, and the degree of semantic overlap with the global context of the image. This preliminary analysis will help establish objective and relevant criteria for evaluating the quality of automatically generated descriptions.

Based on this foundation, we then plan to propose a new reference-free evaluation metric tailored to these specific accessibility requirements. The metric should be capable of assessing image descriptions finely and effectively in relation to their context of appearance, while taking into account the specific needs of blind users.

Finally, to validate this approach, we plan to carry out a manual evaluation of the metric in collaboration with the Institut National des Jeunes Aveugles (INJA), with whom we have already established contact. This step will allow us to compare the automatically computed scores with feedback from end users and to refine the metric to ensure its robustness.

References:
[1] Élise Lincker, Camille Guinaudeau, and Shin'ichi Satoh. AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies. In the International Conference on Multimedia Modeling (MMM), 2025.
[2] Kreiss, E., Zelikman, E., Potts, C., and Haber, N. ContextRef: Evaluating Referenceless Metrics for Image Description Generation. In The Twelfth International Conference on Learning Representations (ICLR), 2024.

Practicalities:
The internship is funded at €659.76 per month, plus reimbursement of transport costs (75% of a monthly or annual Navigo pass), for a duration of 5 or 6 months (starting in March or April 2025). It will take place at LISN within the LIPS team.
Candidate Profile:
We are seeking highly motivated candidates with the following qualifications:
- Education: Master's degree (M2) in Computer Science, with a preference for candidates experienced in Natural Language Processing (NLP), Computer Vision (CV), or Artificial Intelligence (AI).
- Technical Skills:
  - Proficiency in Python and familiarity with deep learning libraries such as TensorFlow, PyTorch, or Keras.
  - Experience in data analysis and handling multimodal datasets is a plus.
- Soft Skills: Strong analytical abilities, an interest in accessibility and human-centric AI, and the ability to work both independently and collaboratively in a research environment.

To apply, please send your CV, a cover letter, and your M1 and M2 transcripts (if available) by email to Camille Guinaudeau (camille.guinaudeau@universite-paris-saclay.fr) and Michèle Gouiffès (gouiffes@lisn.fr).