Title: Reference-Free Evaluation of Image Descriptions for Visually Impaired Users

Supervisors: Camille Guinaudeau, Michèle Gouiffès

Keywords: Natural Language Processing, image captioning, computer vision, AI, visually impaired users

Project Description:
The accessibility of visual content for blind or visually impaired individuals relies heavily on the quality of automatically generated textual descriptions, commonly referred to as alternative text. These descriptions play a crucial role in conveying essential information embedded in images, videos, and other visual content. However, generating descriptions that are truly useful for this audience presents several challenges: effective descriptions must not only capture the image content but also adapt to the usage context, remain concise for clarity, and precisely convey key visual information.

One of the significant challenges in this field is evaluating the quality of generated alternative texts. Current metrics, often based on comparisons with reference descriptions (e.g., BLEU, CIDEr, or SPICE), are ill-suited to assessing the relevance and usability of descriptions from the perspective of visually impaired users. Moreover, the scarcity of datasets tailored to alternative text generation and evaluation adds another layer of complexity to the problem.

Building on our prior work, including the development of the AD2AT dataset [1], consisting of 3,000 image-alternative text pairs, and the ContextRef benchmark [2], this internship aims to address these challenges by focusing on reference-free evaluation of image descriptions for visually impaired users.

Internship Objectives:
The primary objective of this internship is to deepen the analysis of this dataset to identify the key characteristics of descriptions that best meet the needs of blind or visually impaired individuals.
To achieve this, we will examine various aspects, such as the textual richness of the descriptions, the specific parts of the image being described, and the degree of semantic overlap with the global context of the image. This preliminary analysis will help establish objective and relevant criteria for evaluating the quality of automatically generated descriptions.

Based on this foundation, we then plan to propose a new reference-free evaluation metric tailored to these specific accessibility requirements. The metric should be capable of assessing image descriptions finely and effectively in relation to their context of appearance, while taking into account the specific needs of blind users.

Finally, to validate this approach, we plan to carry out a manual evaluation of the metric in collaboration with the Institut National des Jeunes Aveugles (INJA), with whom we have already established contact. This step will allow us to compare the automatically computed scores with feedback from end users and to refine the metric to ensure its robustness.

References:
[1] Élise Lincker, Camille Guinaudeau, and Shin'ichi Satoh. AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies. In the International Conference on Multimedia Modeling (MMM), 2025.
[2] Kreiss, E., Zelikman, E., Potts, C., and Haber, N. ContextRef: Evaluating Referenceless Metrics for Image Description Generation. In The Twelfth International Conference on Learning Representations (ICLR), 2024.

Practicalities:
The internship is funded at €659.76 per month, plus reimbursement of transport costs (75% of a monthly or annual Navigo pass), for a duration of 5 or 6 months (starting in March or April 2025). It will take place at LISN within the LIPS team.
Candidate Profile:
We are seeking highly motivated candidates with the following qualifications:
- Education: Master's degree (M2) in Computer Science, with a preference for candidates experienced in Natural Language Processing (NLP), Computer Vision (CV), or Artificial Intelligence (AI).
- Technical Skills:
  - Proficiency in Python and familiarity with deep learning libraries such as TensorFlow, PyTorch, or Keras.
  - Experience in data analysis and handling multimodal datasets is a plus.
- Soft Skills: Strong analytical abilities, an interest in accessibility and human-centric AI, and the ability to work both independently and collaboratively in a research environment.

To apply, please send your CV, a cover letter, and your M1 and M2 transcripts (if available) by email to Camille Guinaudeau (camille.guinaudeau@universite-paris-saclay.fr) and Michèle Gouiffès (gouiffes@lisn.fr).