Retinal Imaging and Image Analysis

As discussed previously in Section II-A, fundus imaging is the most established modality of retinal imaging. Until recently, fundus image analysis was the only source of quantitative indices reflecting retinal morphology. Many subjects lend themselves to fundus image analysis; the most actively researched among them are surveyed below.

As this paper went to press, over 700 papers had been published on these subjects in fundus image analysis, and discussing each one is beyond the scope of this review. We therefore focus in more detail on the fundamental tasks and related approaches that are actively researched by a large number of groups: retinal vessel detection (Section IV-A), retinal lesion detection (Section IV-B), construction of fundus-imaging-based retinal atlases (Section IV-C), and analysis of optic nerve head morphology from fundus photographs (Section IV-E). Registration of fundus images and change detection are discussed in Section VI-A. In addition, individual methods have been combined into disease-detection systems, particularly for diabetic retinopathy [69]–[71].

Because retinal vessel diameter, and especially the relative diameters of arteries and veins, is known to signal the risk of systemic diseases including stroke, accurate determination of retinal vessel diameters and differentiation of veins from arteries have become increasingly important, and several semi-automated and automated approaches have now been published [24], [25], [79]. Other active areas of research include separation of arteries and veins, detection of small vessels with diameters of less than a pixel, and analysis of complete vessel trees using graphs.

The number of potential features in the multifeature vector that can be associated with each pixel is essentially infinite. One or more subsets of this infinite set can be considered optimal for classifying the image according to some reference standard. Hundreds of features per pixel can be calculated in the training stage to cast as wide a net as possible, with algorithmic feature-selection steps then used to determine the most distinguishing set of features. Extensions of this approach subsequently classify groups of neighboring pixels by utilizing group properties in some manner, for example cluster feature classification, where the size, shape, and average intensity of the cluster may be used.

The n-dimensional multifeature vectors are calculated for each pixel, frequently utilizing local convolutions with multiple Gaussian derivative, Gabor, or other wavelet kernels [78]. The image is thus transformed into an n-dimensional feature space and pixels are classified according to their position in feature space. The resulting hard (categorical) or soft (probabilistic) classification is then used either to assign labels to each pixel (for example vessel or nonvessel in the case of hard classification), or to construct class-specific likelihood maps (e.g., a vesselness map for soft classification).
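
As a concrete illustration, the following is a minimal sketch of this pipeline, not any specific published method: Gaussian-derivative features at a few assumed scales are stacked per pixel, and scikit-learn's logistic regression stands in for the supervised classifier producing a soft vesselness map. The scales, feature set, classifier choice, and stand-in data are all illustrative assumptions.

```python
# Minimal sketch of n-dimensional multifeature pixel classification.
# Scales, features, and classifier are illustrative assumptions.
import numpy as np
from scipy import ndimage
from sklearn.linear_model import LogisticRegression

def pixel_features(img, scales=(1, 2, 4)):
    """Stack Gaussian-derivative responses into a feature vector per pixel."""
    feats = []
    for s in scales:
        feats.append(ndimage.gaussian_filter(img, s))                # smoothed intensity
        feats.append(ndimage.gaussian_filter(img, s, order=(0, 1)))  # 1st derivative, x
        feats.append(ndimage.gaussian_filter(img, s, order=(1, 0)))  # 1st derivative, y
        feats.append(ndimage.gaussian_laplace(img, s))               # 2nd-order blob response
    return np.stack(feats, axis=-1)                                  # shape (H, W, n)

rng = np.random.default_rng(0)
train_img = rng.random((64, 64))            # stand-in for a training image
train_lbl = (train_img > 0.9).astype(int)   # stand-in for manual vessel labels

# Training stage: learn pixel classes from known classifications.
F = pixel_features(train_img)
clf = LogisticRegression(max_iter=1000).fit(F.reshape(-1, F.shape[-1]),
                                            train_lbl.ravel())

# Testing stage: soft classification of a previously unseen image.
test_img = rng.random((64, 64))
Ft = pixel_features(test_img)
vesselness = clf.predict_proba(Ft.reshape(-1, Ft.shape[-1]))[:, 1].reshape(test_img.shape)
```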

Originally, pixel intensity was used as a single feature. More recently, n-dimensional multifeature vectors are utilized, including the pixel's contrast with the surrounding region, its proximity to an edge, and similar properties. Two distinct stages are required for a supervised learning/classification algorithm to function: 1) a training stage, in which the algorithm “statistically learns” to correctly classify pixels from known classifications, and 2) a testing or classification stage, in which the algorithm classifies previously unseen images. For proper assessment of a supervised classification method, the training data and the performance testing data sets must be completely disjoint [77].

The similarities among the different approaches to vessel detection are often not obvious at first because different terms are used for the same concepts. For example, template matching, kernel convolution, and detector correlation all describe the same concept, explained in more detail in the following, though implementation details may vary.

One suitable approach for detecting such lesions is to use a retinal atlas, in which the image is compared to a generic normal retina (Section IV-C). After a retinal atlas has been built by registering the fundus images according to a disc-, fovea-, and vessel-based coordinate system, image properties at each atlas location of a previously unseen image can be compared to the atlas-based image properties. Locations can then be identified as abnormal if groups of pixels have values outside the normal atlas range, as sketched below.
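
As a rough illustration of this idea, assume a test image already warped into the atlas coordinate system and an atlas storing a per-pixel mean and standard deviation estimated from normal images. The 3-sigma threshold, minimum group size, and stand-in data below are arbitrary choices, not parameters from the cited work.

```python
# Minimal sketch of atlas-based abnormality detection; thresholds illustrative.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
atlas_mean = rng.random((128, 128))                    # stand-in atlas statistics
atlas_std = 0.05 + 0.01 * rng.random((128, 128))
warped = atlas_mean + rng.normal(0, 0.05, (128, 128))  # test image in atlas space

z = (warped - atlas_mean) / atlas_std                  # per-pixel deviation
abnormal = np.abs(z) > 3.0                             # outside the normal range

# Keep only groups of abnormal pixels, not isolated noise responses.
labels, n = ndimage.label(abnormal)
sizes = ndimage.sum(abnormal, labels, index=range(1, n + 1))
lesion_ids = [i + 1 for i, s in enumerate(sizes) if s >= 10]
```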

Performance of a system developed for screening should not be evaluated based solely on its sensitivity and specificity for detection of the target disease, because such metrics do not accurately reflect the complete performance in a screening setup. Rare, irregular, or atypical lesions often do not occur frequently enough in standard datasets to affect sensitivity and specificity, but they can have major health and safety implications. To maximize screening relevance, the system must therefore have a mechanism to detect rare, atypical, or irregular abnormalities, for example in DR detection algorithms [70]. For proper performance assessment, the types of potential false negatives, i.e., lesions that can be expected or shown to be incorrectly missed by the automated system, must be determined. While detection of red lesions and bright lesions is widely covered in the literature, detection of rare or irregular lesions, such as hemorrhages, neovascularizations, geographic atrophy, scars, and ocular neoplasms, has received much less attention, despite the fact that all of these can occur in combination with diabetic retinopathy and other retinal diseases as well as in isolation. For example, the presence of such lesions in isolated form, without any co-occurrence of small red lesions, is rare in DR [59], and thus missing them does not measurably affect standard performance metrics such as ROC curves, unless they are properly weighted as corresponding to serious lesions.

Because the different types of bright lesions have different diagnostic importance and patient management implications, algorithms should be capable not only of detecting bright lesions, but also of differentiating among the bright lesion types. One example algorithm capable of detection and differentiation of bright lesions was reported in [85]; it builds on an earlier red lesion algorithm [84].

Bright lesions, defined as lesions brighter than the retinal background, are often found in the presence of retinal and systemic disease. Drusen are the hallmark of age-related macular degeneration, cotton wool spots are typical for diabetic retinopathy and hypertensive retinopathy, while lipoprotein exudates are most frequently seen in diabetic retinopathy, but also in Coats’ disease and other retinal disorders. To complicate the analysis, flash artifacts can be present as false positives for bright lesions. If lipoprotein exudates appeared only in combination with red lesions, they would be useful only for grading diabetic retinopathy. In some cases, however, exudates appear as isolated signs of diabetic retinopathy in the absence of any other lesion. This strengthens their diagnostic importance, and several computer-based systems to detect exudates have been proposed [80], [85], [87], [88], [90].

Other recent algorithms detect only microaneurysms, foregoing a phase of detecting normal retinal structures such as the optic disc, fovea, and retinal vessels, which can act as confounders for abnormal lesions. Instead, these approaches find the microaneurysms directly [89] using template matching in wavelet subbands. In this approach, an optimal adapted wavelet transform is found using a lifting-scheme framework, and the microaneurysms are labeled by applying a threshold to the template-matching result. This approach has since been extended to explicitly account for false negatives and false positives [69]. Because they avoid detecting the normal structures, such algorithms can be very fast, on the order of less than a second per image. A generic sketch of the template-matching idea follows.
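
The sketch below illustrates template matching in a generic form, using a simple Gaussian blob template and normalized cross-correlation in the image domain rather than the optimal adapted wavelet subbands of [89]. The template size and threshold are illustrative assumptions, and the image is a random stand-in.

```python
# Minimal sketch of microaneurysm candidate labeling by template matching.
import numpy as np
from skimage.feature import match_template, peak_local_max

def blob_template(radius=5, sigma=2.0):
    """Small dark-blob template: microaneurysms are darker than background."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return -np.exp(-(x**2 + y**2) / (2.0 * sigma**2))

rng = np.random.default_rng(0)
green = rng.random((256, 256))        # stand-in for a green-channel fundus image

ncc = match_template(green, blob_template(), pad_input=True)  # correlation map
# Thresholding the matching result labels the microaneurysm candidates.
candidates = peak_local_max(ncc, min_distance=5, threshold_abs=0.5)
print(f"{len(candidates)} candidate locations")
```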

Niemeijer et al. [84] presented a hybrid scheme that used both the top-hat-based method and a supervised pixel-classification-based method to detect microaneurysm candidates in color fundus photographs. This method allowed the detection of larger red lesions (i.e., hemorrhages) in addition to microaneurysms using the same system. A large set of additional features, including color, was added to those described in [82] and [86], and these features were used in a supervised classifier to distinguish between real and spurious candidate lesions. Such algorithms can usually deal with overlapping microaneurysms because they generate multiple candidate responses.

Initially, red lesions were detected in fluorescein angiograms because their contrast against the background is much higher than that of microaneurysms in color fundus photographs [81], [82], [86]. Hemorrhages mask fluorescence and present as dark spots in the angiograms. These methods employed a mathematical morphology technique, first described in 1984 [39], that eliminated the vasculature from a fundus image while leaving possible microaneurysm candidates untouched. Later, this method was extended to high-resolution red-free fundus photographs by Hipwell et al. [83]. Instead of morphology operations, a neural network was used, for example by Gardner et al. [87]; in their work, images are divided into 20 × 20 pixel grids that are classified individually. Sinthanayothin et al. [88] applied a recursive region-growing procedure to segment both the vessels and red lesions in a fundus image; a neural network was used to detect the vessels exclusively, and the remaining objects were labeled as microaneurysms.

Historically, red lesion detection algorithms focused first on detection of normal anatomical objects, especially the vessels, because they can locally mimic red lesions. Subsequently, a combination of one or more filtering operations and mathematical morphology is employed to detect red lesion suspects. In some cases, suspect red lesions are further classified into individual lesion types, and refined algorithms are capable of detecting specific retinal structures and abnormalities.

Small red retinal lesions, namely microaneurysms and small retinal hemorrhages, are typical for diabetic retinopathy, hypertensive retinopathy, and other retinal disorders such as idiopathic juxtafoveal telangiectasia. The primary importance of small red lesions is that they are the leading indicators of diabetic retinopathy. Because they are difficult for clinicians to differentiate on standard fundus images from nonmydriatic cameras, hemorrhages and microaneurysms are usually detected together and given a single combined label. Larger red lesions, primarily large hemorrhages and retinal neovascularizations, are still problematic and are discussed in Section IV-B3.

In this section, we primarily focus on detection of lesions in diabetic retinopathy, which has the longest history as a research subject in retinal image analysis. Typical DR lesions can be detected automatically in fundus photographs. Many approaches follow the same principle: a transform of some kind is used to detect candidate lesions, after which a mathematical morphology template is utilized to characterize the candidates. This approach, or a modification of it, is in use in many algorithms for detecting DR and AMD [80]. Spencer, Cree, Frame, and co-workers [81], [82] added preprocessing steps, such as shade correction, and matched-filter post-processing to this basic framework to improve performance. Algorithms of this kind detect candidate microaneurysms of various shapes based on their response to specific image filters, and a supervised classifier is typically developed to separate valid microaneurysms from spurious or false responses. However, these algorithms were originally developed to detect the high-contrast signatures of microaneurysms in fluorescein angiograms.

The next important development resulted from applying a modified version of the top-hat algorithm to red-free fundus photographs rather than angiograms, as first described by Hipwell et al. [83], who tested their algorithm on a large set of more than 3500 images and found a sensitivity/specificity operating point of 0.85/0.76. Once this step had been taken, development accelerated. The approach was further refined by broadening the candidate detection transform, originally developed by Baudoin to detect candidate pixels, to a multifilter filter-bank approach [73], [84]: the filter responses are used to identify pixel candidates with a classification scheme, and mathematical morphology and additional classification steps are applied to these candidates to decide whether they indeed represent microaneurysms and hemorrhages. A similar approach has also been successful in detecting other types of DR lesions, including exudates and cotton-wool spots, as well as drusen in AMD [85]. A sketch of the morphological candidate-detection step follows.
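
The following is a minimal sketch of top-hat-style candidate detection, under stated assumptions rather than the exact parameters of the cited methods: openings with linear structuring elements at several orientations preserve elongated vessels, so subtracting their supremum from the (inverted) image leaves small round candidates. Structuring-element length, orientations, and threshold are illustrative.

```python
# Minimal sketch of top-hat-style microaneurysm candidate detection.
import numpy as np
from scipy import ndimage

def line_footprint(length, angle_deg):
    """Binary line-shaped structuring element at a given orientation."""
    r = length // 2
    fp = np.zeros((length, length), dtype=bool)
    t = np.deg2rad(angle_deg)
    for k in range(-r, r + 1):
        fp[int(round(r + k * np.sin(t))), int(round(r + k * np.cos(t)))] = True
    return fp

rng = np.random.default_rng(0)
green = rng.random((256, 256))      # stand-in green-channel fundus image
inverted = 1.0 - green              # invert so dark lesions/vessels become bright

# Supremum of openings over orientations retains linear (vessel) structures.
vessels = np.max([ndimage.grey_opening(inverted, footprint=line_footprint(15, a))
                  for a in range(0, 180, 15)], axis=0)

tophat = inverted - vessels         # small round candidates survive the subtraction
candidate_mask = tophat > 0.3       # illustrative threshold; candidates would then
                                    # go to a supervised classifier for validation
```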

A retinal atlas created with this method can be used as a reference to quantitatively assess the level of deviation from normality: an analyzed image is compared with the retinal atlas directly in the atlas coordinate space. Normality can be defined in several ways depending on the application, using local or global chromatic distribution, degree of vessel tortuosity, presence of pathological features, presence of artifacts, etc. One example application driven by a retinal atlas highlights regions where imaging artifacts are present. The atlas was created from 1000 color fundus images (two fields per left eye, from 500 subjects without retinal pathology or imaging artifacts).

The atlas landmarks serve as the reference set, so each color fundus image can be mapped to the coordinate system defined by the landmarks. As the last step of atlas generation, color fundus images are warped to the atlas coordinate system so that the vascular arch of each image is aligned to the atlas arch. A thin-plate spline (TPS) [94] is used in this method to map retinal images to the atlas coordinate system. Rigid coordinate alignment, described below, first registers the disc center and the fovea of each fundus image. The control points required for the TPS are determined by sampling seven points at equidistant locations in radial directions along each arch branch, centered at the disc center; consequently, 16 control points (1 at the disc center, 1 at the fovea, and 2 × 7 on the vascular arch) are used to calculate the TPS. The sampling usually uses smoothed trace lines obtained by third-order polynomial curve fitting, because naive traces of the vascular arch can have locally high tortuosity, which may cause large geometric distortions under the TPS. In the mapping process, the vessel main arch that runs along the naive trace is mapped onto the atlas vessel arch.
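
A minimal sketch of the TPS warp follows, using scipy's RBFInterpolator as a thin-plate-spline solver. The 16 control-point pairs below are random stand-ins; in the method described above they would come from the disc center, the fovea, and the 2 × 7 arch samples.

```python
# Minimal sketch of warping a fundus image into an atlas coordinate system
# with a thin-plate spline (backward mapping: atlas coords -> image coords).
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)
atlas_pts = rng.uniform(0, 255, (16, 2))           # control points in atlas space
image_pts = atlas_pts + rng.normal(0, 3, (16, 2))  # matching points in the image

# TPS mapping from atlas coordinates to image coordinates, so each atlas
# pixel knows where to sample the source image.
tps = RBFInterpolator(atlas_pts, image_pts, kernel='thin_plate_spline')

H = W = 256
grid = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing='ij'), axis=-1)
src = tps(grid.reshape(-1, 2)).reshape(H, W, 2)    # per-pixel sampling coordinates

image = rng.random((H, W))                         # stand-in fundus channel
warped = map_coordinates(image, [src[..., 0], src[..., 1]], order=1)
```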

For each pair of landmarks $p_d^i$ and $p_f^i$, where $i = 0, 1, \ldots, N-1$ and $N = 500$, pinning all disc landmarks $p_d^i$ to an arbitrary point $\mu_d$ removes the translation. The centroid of the point cloud formed by the fovea landmarks $p_f^i$ gives the fovea atlas location $\mu_f$, so every $p_f^i$ can be aligned to $\mu_f$ using a similarity transformation, removing the inter-image variations in scale and rotation. An aligned pixel position $p'$ is then determined from $p$ as shown below.
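
One consistent way to write this transform, reconstructed from the alignment steps just described (the symbols $s$ and $R_\theta$ are our own notation, not necessarily that of the original method), is

$$
p' = \mu_d + s\, R_\theta \,\bigl(p - p_d^i\bigr), \qquad
s = \frac{\lVert \mu_f - \mu_d \rVert}{\lVert p_f^i - p_d^i \rVert},
$$

where $R_\theta$ is the rotation taking the direction of $p_f^i - p_d^i$ onto that of $\mu_f - \mu_d$, so that the disc lands on $\mu_d$ and the fovea on $\mu_f$.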

Retinal images in clinical practice are acquired under diverse fundus camera settings, subject to saccadic eye movement and with variable focal center, zoom, and tilt. Thus, atlas landmarks from training data need to be aligned before any meaningful statistical properties can be derived from the atlas. Since the projective distortion within an image is corrected during pairwise registration, the inter-image variations in the registered images appear as differences in the rigid coordinate transformation parameters of translation, scale, and rotation.

An isotropic coordinate system is desirable for the atlas so that images can refer to it independently of spatial pixel location through a linear one-to-one mapping. The radial-distortion-correction (RADIC) model [92] attempts to register images in a distortion-free coordinate system using a planar-to-spherical transformation, so the registered image is isotropic under a perfect registration, or quasi-isotropic when low registration error is allowed. Whereas a quadratic registration model reproduces the fundus curvature in the registration result, RADIC-based registration unfolds this curvature to place the registered image in an isotropic coordinate system. An isotropic atlas makes the mapping of correspondences between the atlas and a test image independent of spatial location. Intensities in the overlapping area are determined by a distance-weighted blending scheme [93].

Choosing either the disc center or the fovea alone to define the atlas coordinate system would allow each image in the population to be translated so that a pinpoint alignment is achieved. Choosing both disc and fovea allows correction for translation, scale, and rotational differences across the population. However, nonlinear shape variations across the population would not be accounted for; this can be accomplished when the vascular arch information is also utilized. The ends of the arches can be defined as the first major bifurcations of the arch branches. Arch shape and orientation vary from individual to individual and influence the structure of the remaining vessel network. An atlas coordinate system that incorporates the disc, fovea, and arches therefore accommodates translation, rotation, scaling, and nonlinear shape variation across a population.

The choice of atlas landmarks in retinal images may vary depending on the view of interest. Regardless, the atlas should represent most retinal image properties in a concise and intuitive way. Three landmarks can be used as the key features of a retinal atlas: the optic disc center, the fovea, and the main vessel arch, defined as the location of the largest vein–artery pairs. The disc and fovea provide landmark points, while the arch is a more complicated two-part curved structure that can be represented by its central axis. The atlas coordinate system then defines an intrinsic, anatomically meaningful framework within which anatomic size, shape, color, and other characteristics can be objectively measured and compared.

Compared to other anatomic structures (e.g., the brain, heart, or lungs), the retina has a relatively small number of key anatomic structures (landmarks) visible in fundus camera imaging, and the shape, size, and color variations across a population are expected to be high. While there have been a few reports [91] on estimating retinal anatomic structure from a single retinal image, we are not aware of any published work demonstrating the construction of a statistical retinal atlas using data from a large number of subjects.

D. Assessing Performance of Fundus Image Analysis Algorithms

Fundus lesion detection algorithms are primarily intended to operate automatically and autonomously; in other words, some retinal images may never be seen by a human expert. Consequently, high demands must be placed on a fundus lesion detection system, since its diagnostic decisions may have vision-threatening consequences. Lesion detection systems are most commonly employed for diabetic retinopathy screening, and in all such systems a high level of confidence in the agreement between the system and expert human readers is required. In reality, that agreement may be affected by many factors: performance may be impaired by algorithmic limitations, the imaging protocol, properties of the camera used to acquire the fundus images, and a number of other causes. For example, an imaging protocol that does not allow small lesions to be depicted, and thus detected, will lead to artificially overestimated system performance if such lesions might have been detected with a better camera or imaging protocol; the system then appears to perform better than it truly does because human experts and the algorithm both overlook true lesions.

The performance of a lesion detection system can be measured by its sensitivity, a number between 0 and 1, which is the number of true positives divided by the sum of the number of false negatives (incorrectly missed) and the number of true positives (correctly identified) [77]. System specificity, also a number between 0 and 1, is the number of true negatives divided by the sum of the number of false positives (incorrectly identified as diseased) and true negatives. Assessing sensitivity and specificity requires ground truth, represented by location-specific discrete labels of disease absent (0) or present (1) for each subject in the evaluation set.
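
In formula form, with $TP$, $TN$, $FP$, and $FN$ the counts of true positives, true negatives, false positives, and false negatives:

$$
\mathrm{sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{specificity} = \frac{TN}{TN + FP}.
$$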

The location-specific output of an algorithm can also be represented by a discrete number (0 or 1). However, the output is often a continuous value, the likelihood p of local disease presence, with an associated probability value between 0 and 1. Consequently, the algorithm can be made more specific or more sensitive by setting an operating threshold on p. The resulting sensitivity/specificity pairs are plotted in a graph, yielding a receiver operating characteristic (ROC) curve [77], [96]. The area under the ROC curve (AUC, denoted by its value Az) is determined by setting a number of different thresholds on p and obtaining the sensitivity and specificity of the algorithm at each threshold, with the ground truth kept unchanged. The algorithm behavior represented by the ROC curve can thus be reduced to a single number. The maximum value of the AUC is 1, denoting perfect diagnostic performance, with both sensitivity and specificity equal to 1 (100% performance). While AUC-based assessment is highly relevant and covers the most important aspects of lesion detection behavior, it has a number of limitations, including its dependence on the quality of annotated datasets [69], [70] and its underestimation of the cost of missing rare but sight- or life-threatening abnormalities, as discussed in Section IV-B3.
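
A minimal sketch of this threshold sweep using scikit-learn follows; the likelihoods and ground truth below are synthetic stand-ins.

```python
# Minimal sketch: sweep the operating threshold on likelihood p and reduce
# the resulting ROC curve to a single AUC (Az) value.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 500)                             # 0/1 ground truth
p = np.clip(truth * 0.3 + rng.normal(0.4, 0.2, 500), 0, 1)  # algorithm output

fpr, tpr, thresholds = roc_curve(truth, p)  # one point per threshold on p
# tpr is the sensitivity; specificity = 1 - fpr at each threshold.
Az = roc_auc_score(truth, p)
print(f"AUC (Az) = {Az:.3f}")
```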

1) Performance Comparison of Diabetic Retinopathy Detection Systems to That of Retinal Specialists

Several groups have studied the performance of detection algorithms in a real-world setting, i.e., on populations of patients with diabetes not previously known to have diabetic retinopathy. The main goal of such a system is to decide, using only automated analysis of retinal images, whether the patient should be evaluated by a human expert or can return for follow-up [70], [71]. As mentioned previously, the performance of the algorithm that placed first in the 2009 Retinopathy Online Challenge competition [97] was compared to that of the algorithm of the large computer-aided early DR detection project EyeCheck [59]. In this comparison, fundus photographic sets, consisting of two fundus images from each eye, from 17 877 patient visits of 17 877 people with diabetes who had not previously been diagnosed with DR were used. The fundus photographic set from each visit was analyzed by a single retinal expert, and 792 of the 17 877 sets were classified as containing more than minimal DR (the threshold for patient referral). The two algorithmic lesion detectors were applied separately to the dataset and compared by standard statistical measures, with the area under the ROC curve as the main performance characteristic. The agreement between the two computerized lesion detectors was high: retinal exams containing more than minimal DR were detected with an AUC of Az = 0.84 by the EyeCheck algorithm and Az = 0.82 by the ROC-2009 winner, a difference that was not statistically significant (z-score of 1.91). If the detection outputs of the two algorithms were combined (at least one detection constituting a hit), the detection AUC increased to Az = 0.86, a value identical to the theoretically expected maximum [69]. At 90% sensitivity, the specificity of the EyeCheck algorithm was 47.7% and that of the ROC-2009 winner was 43.6%. By comparison with the interobserver variability of the employed experts, the study concluded that DR detection algorithms appear to be mature and that further improvements in detection performance cannot be differentiated from current best clinical practice, because the performance of competitive algorithms has now reached the human intrareader variability limit [69]. Additional validation studies on larger, well-defined, but more diverse populations of patients with diabetes are urgently needed, anticipating cost-effective early detection of DR in millions of people with diabetes to triage those patients who need further care while they have early rather than advanced DR. Such trials are currently underway in the U.S., U.K., and the Netherlands, though the results have not yet been disclosed.

2) Multilevel Approach to Lesion Detection: From Pixel to Patient

As outlined above, retinal lesion detection algorithms operate at a broad range of levels, according to how the detection outputs are utilized. This range is limited at one end by the finite resolution of the imaging device and at the other by the amount of imaging feasible in finite time (i.e., the number of repeated image acquisitions on the same subject). At the lowest level, algorithms classify individual pixels; then groups of pixels (possibly representing lesions); then areas (organs or organ structures) within images; then complete images; multiple images may form a subject-level exam; and finally, at the highest level, multifaceted analyses of individual subjects are attempted. At each level, the probability of abnormality is frequently determined by relying on findings at the lower levels, and at the highest level the system may diagnose a single patient based on information fused from all lower-level contributions. Clearly, answering the question of how to effectively fuse all such information is nontrivial.

This subject was studied by Niemeijer et al. [98], whose approach applied multiple unsupervised and supervised analysis methods that were compared in terms of performance at the patient level. A compound computer-aided retinal diagnostic system was developed that takes into account abnormalities of multiple types and at multiple levels, as well as the estimated confidence in individual analysis outcomes. A reliable analysis scheme based on supervised fusion of the outputs of the different components was proposed, and its performance was evaluated on 60 000 images from 15 000 patients. The choice of fusion system significantly influenced overall performance: simple fusion methods achieved a classification performance of AUC = 0.82, while the supervised fusion system reached AUC = 0.89 [98].
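
In the spirit of that comparison, though not reproducing the actual system of [98], the sketch below contrasts a simple fusion rule with a supervised fusion step over per-patient component outputs. The three components, the synthetic data, and the logistic-regression fuser are all assumptions for illustration.

```python
# Minimal sketch: simple vs. supervised fusion of component outputs
# into a patient-level referral score.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Per-patient outputs of three hypothetical detectors (e.g., red lesions,
# bright lesions, image quality), each a likelihood in [0, 1].
components = rng.random((1000, 3))
referral = (components.mean(axis=1) + rng.normal(0, 0.2, 1000) > 0.6).astype(int)

# Training and evaluation sets must be disjoint (cf. [77]).
X_tr, X_te, y_tr, y_te = train_test_split(components, referral, random_state=0)

simple = X_te.max(axis=1)                     # simple rule: most suspicious finding
fuser = LogisticRegression().fit(X_tr, y_tr)  # supervised fusion of components
supervised = fuser.predict_proba(X_te)[:, 1]

print("simple fusion AUC:    ", roc_auc_score(y_te, simple))
print("supervised fusion AUC:", roc_auc_score(y_te, supervised))
```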

3) Role of Publicly Available and Comparative Databases

To drive the development of progressively better fundus image analysis methods, research groups have established publicly available, annotated image databases in various fields. For fundus imaging, examples include the STARE [72], DRIVE [73], REVIEW [99], and MESSIDOR [100] databases, with large numbers of retinal fundus images and expert annotations for vessel segmentation, vessel width measurement, and diabetic retinopathy detection, as well as competitions such as the Retinopathy Online Challenge [97], some of which are discussed in the following. A major inspiration for these online image databases and online competitions was the Middlebury Stereo Vision competition [101], [102].

4) DRIVE (Digital Retinal Images for Vessel Evaluation)

The DRIVE database was established to enable comparative studies of retinal blood vessel segmentation in fundus images. It contains 40 fundus images from subjects with diabetes, both with and without retinopathy, together with retinal vessel segmentations performed by two human observers. In one of the images, high-contrast choroidal regions were also segmented, because these can easily be confused with retinal vessels. Starting in 2005, researchers have been invited to test their algorithms on this database and to share their results with other researchers through the DRIVE website [103], where the results of various methods can be found and compared. An early comparative analysis of the performance of vessel segmentation algorithms was reported in [73], and by now over 100 papers have been published using the DRIVE database as a benchmark. Current retinal vessel segmentation research focuses primarily on improved segmentation of small vessels and on segmenting vessels in images with substantial abnormalities.

5) ROC (Retinopathy Online Challenge)

The DRIVE database was a great success, allowing comparisons of algorithms on a common dataset. In retinal image analysis, it represented a substantial improvement over method evaluations on non-public datasets. However, different groups of researchers tend to use different metrics to compare algorithm performance, making truly meaningful comparisons difficult or impossible. Additionally, even when the same evaluation measures are used, implementation specifics of the performance metrics may influence the final results. Consequently, until the advent of the Retinopathy Online Challenge (ROC) competition in 2009, comparing the performance of retinal image analysis algorithms was difficult [97].

A logical next step was therefore to provide publicly available annotated datasets for use in the context of online, standardized evaluations: asynchronous competitions. In an asynchronous competition, a subset of images is made available with annotations, while the remainder are available with annotations withheld. This allows researchers to optimize their algorithm's performance on the population from which the images were drawn (assuming the annotated subset is representative of the entire population), but they are unable to test and retest on the evaluation images, because those annotations are withheld. All results are subsequently evaluated with the same evaluation software, and research groups are allowed to submit results continuously over time. Nevertheless, some groups may be tempted to artificially influence the performance outcome, for example by using human readers to assist their algorithm, or by iteratively improving performance through serial submission of multiple results, using the obtained performance differences to tune their algorithms.

More recently, the concept of synchronous competitions was introduced, in which a deadline is set for submitting analysis results and competition results are announced at a single moment in time. The best-known example of this approach is the Netflix competition [104]. Such joint evaluations on a common dataset have the potential to steer future research by exposing the failure modes of certain techniques and to guide the practical application of techniques in clinical practice, especially if appropriate reward mechanisms are available (again, the highly successful Netflix competition may serve as a motivational example).

The first Retinopathy Online Challenge competition [105], organized in 2009, focused on detection of microaneurysms. Twenty-six groups participated, of which six submitted their results on time, as published in [97]. One group decided to drop out of the competition after the results were announced; the remainder allowed their performance to be discussed publicly [89], [106]–[108]. The results of each method in this competition are summarized in Table I.

Table I
Sensitivity of each method at seven false-positive-per-image rates (⅛ to 8) and the average over these rates.

| Method | ⅛ | ¼ | ½ | 1 | 2 | 4 | 8 | Average |
|---|---|---|---|---|---|---|---|---|
| Valladolid [106] | 0.19 | 0.22 | 0.25 | 0.30 | 0.36 | 0.41 | 0.52 | 0.32 |
| Waikato [109] | 0.06 | 0.11 | 0.18 | 0.21 | 0.25 | 0.30 | 0.33 | 0.21 |
| LaTIM [89] | 0.17 | 0.23 | 0.32 | 0.38 | 0.43 | 0.53 | 0.60 | 0.38 |
| OK Medical [107] | 0.20 | 0.27 | 0.31 | 0.36 | 0.39 | 0.47 | 0.50 | 0.36 |
| Fujita Lab [108] | 0.18 | 0.22 | 0.26 | 0.29 | 0.35 | 0.40 | 0.47 | 0.31 |

ROC-2009 Datasets

A set of 100 digital color fundus photographs was selected from a large dataset of over 150 000 images acquired during diabetic retinopathy screening [59]. The inclusion criteria were that the screening program ophthalmologist had marked the image as containing microaneurysms and had not marked it as ungradable. Since multiple screening sites using different fundus camera types were involved in the screening program, the images in the ROC-2009 set are quite heterogeneous. Three sizes of field of view (FOV) are present in the dataset, each corresponding to a different image resolution. The images were captured with a Topcon NW 100, Topcon NW 200, or Canon CR5-45NM camera, resulting in two differently shaped FOVs. All images were made available in JPEG format, with standard image compression levels set in the camera. Four retinal experts annotated all microaneurysms as well as all “don’t care” lesions in the 100 images. For the training set, a logical OR was used to combine the lesion locations annotated by the four experts, ensuring that the reference standard was highly sensitive to lesions, since only one retinal expert was required to identify a lesion (see the sketch below). The annotations were exported as an XML file containing the center locations of all microaneurysms and all “don’t care” lesions in each image of the set.
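
The OR combination itself is simple; the following sketch uses random stand-in per-pixel masks for the four experts rather than the actual center-location annotations.

```python
# Minimal sketch of the logical-OR reference standard: a location counts
# as a lesion if any one of the four experts marked it.
import numpy as np

rng = np.random.default_rng(0)
expert_masks = rng.random((4, 128, 128)) > 0.999   # four sparse annotation masks
reference = np.logical_or.reduce(expert_masks)     # highly sensitive reference
```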
