IRFundusSet - Large heterogeneous retinal fundus dataset

March 16, 2024

TL;DR

Access a larger and more heterogeneous retinal fundus dataset
Integrates and harmonizes pixel-level and label data
Current coverage is 10 public datasets
IRFundusSet paper
IRFundusSet on GitHub

Table: Composition of IRFundusSet

The availability and quality of training data are a common challenge when developing AI models. Obtaining comprehensive, sufficiently large health-related datasets is non-trivial.

A current challenge with retinal fundus imaging is that the publicly available datasets are fragmented. They often differ significantly in data organization, archiving methods, and the definition of disease labels. In addition, the definition of a “healthy” or “normal” eye can vary considerably between datasets: some include images with minor, non-vision-threatening conditions as “normal,” while others apply stricter criteria. This inconsistency makes it hard to pool the datasets, which restricts the diversity of data available for training AI models and often forces researchers into substantial additional curation work.

The Integrated Retinal Fundus Set (IRFundusSet) aims to consolidate, harmonize, and curate several existing public retinal fundus image datasets into a more unified and accessible resource.

The primary goal of IRFundusSet is to facilitate the consumption of these previously fragmented datasets as a cohesive whole by harmonizing the pixel data and providing a consistent “is_normal” label across all included images.
Furthermore, a user-friendly Python package has been created to automate the harmonization process and provide a standardized dataset object that is compatible with popular deep learning frameworks like PyTorch.
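
To make the idea concrete, here is a minimal sketch of what label harmonization behind a single “is_normal” flag could look like. It is not the IRFundusSet package's actual API: the source names (dataset_a, dataset_b), the rules in NORMAL_RULES, and the Record and HarmonizedFundusDataset classes are all hypothetical and chosen for illustration only. The sketch returns image paths rather than pixel data; a real implementation would also load and resize the images.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical per-source rules that decide whether a raw label means "normal".
# Real datasets define "normal" differently, so each source gets its own rule.
NORMAL_RULES: Dict[str, Callable[[str], bool]] = {
    "dataset_a": lambda label: label == "0",               # assumed: grade 0 = no disease
    "dataset_b": lambda label: label.lower() == "normal",  # assumed: textual label
}


@dataclass(frozen=True)
class Record:
    """One image entry as it appears in its (hypothetical) source dataset."""
    source: str
    image_path: str
    raw_label: str


class HarmonizedFundusDataset:
    """Map-style dataset exposing (image_path, is_normal) pairs.

    It implements __len__ and __getitem__, which is the protocol a
    map-style dataset needs in order to be wrapped by PyTorch's DataLoader.
    """

    def __init__(self, records: List[Record],
                 rules: Dict[str, Callable[[str], bool]] = NORMAL_RULES) -> None:
        # Fail early if a record comes from a source without a harmonization rule.
        for rec in records:
            if rec.source not in rules:
                raise KeyError(f"no harmonization rule for source {rec.source!r}")
        self._records = list(records)
        self._rules = rules

    def __len__(self) -> int:
        return len(self._records)

    def __getitem__(self, idx: int) -> Tuple[str, bool]:
        rec = self._records[idx]
        is_normal = self._rules[rec.source](rec.raw_label)
        return rec.image_path, is_normal


# Example records from two sources with different label conventions.
records = [
    Record("dataset_a", "a/001.png", "0"),
    Record("dataset_a", "a/002.png", "3"),
    Record("dataset_b", "b/img1.jpg", "Normal"),
]
ds = HarmonizedFundusDataset(records)
```

Keeping the per-source rules in one table means that adding a dataset only requires adding one entry, while downstream training code sees the same (image, is_normal) interface regardless of where an image came from.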

Accessing IRFundusSet