Computed Tomography Emphysema Database

This website describes and hosts a computed tomography (CT) emphysema database that has previously been used to develop texture-based CT biomarkers of chronic obstructive pulmonary disease (COPD).

Emphysema, characterized by loss of lung tissue, is one of the main components of COPD, and accurate classification of emphysematous and healthy lung tissue is useful for a more detailed analysis of the disease. This may, for example, lead to improved understanding of the disease and improved computer-aided diagnosis (CAD). One way to objectively characterize emphysema morphology is to describe the CT image intensity patterns using texture analysis techniques. Texture-based biomarkers often rely on supervised learning; hence a set of labeled examples, such as the data contained in this database, is needed. This area of research has received considerable attention in recent years, yet the amount of available data is limited, making it impossible to compare different methods on common ground. This is why the database is made publicly available.


The database can be used free of charge for research and educational purposes. Redistribution and commercial use are not permitted. If you publish using data from this website (journal publications, conference papers, abstracts, technical reports, etc.), please cite the following paper:

L. Sørensen, S. B. Shaker, and M. de Bruijne, Quantitative Analysis of Pulmonary Emphysema using Local Binary Patterns, IEEE Transactions on Medical Imaging 29(2): 559-569, 2010. [PDF | BibTex]

Further, we would appreciate it if a reference to any publication using the data were forwarded to the following email address: lauges (at) diku (dot) dk. This information will be added to the list of studies using data from this database at the bottom of this website.

Description of the data

The database comprises 115 high-resolution CT (HRCT) slices as well as 168 square patches manually annotated in a subset of the slices.

CT scanning was performed using General Electric (GE) equipment (LightSpeed QX/i; GE Medical Systems, Milwaukee, WI, USA) with four detector rows and the following parameters: in-plane resolution 0.78 x 0.78 mm, slice thickness 1.25 mm, tube voltage 140 kV, and tube current 200 mAs. The slices were reconstructed using a high-spatial-resolution (bone) algorithm. The data comes from a study group comprising 39 subjects (9 never-smokers, 10 smokers, and 20 smokers with COPD) who were all CT scanned. See [1] and [2] for more details.

Apart from the data, we also provide information about which subject each patch and each HRCT slice comes from, to enable appropriate cross-validation at the subject level. For the patches, this information is given in a separate file; for the slices, it is encoded in the file names.
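Subject-level cross-validation means that all patches (or slices) from one subject end up on the same side of each train/test split. The following is a minimal sketch of a leave-one-subject-out split, not the procedure used in [1]. The function name and the example subject IDs are hypothetical; the real IDs come from the subject file distributed with the patches.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject that sample i (a patch or a slice) comes from.
    """
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        # Every sample from every other subject goes into the training set.
        train = sorted(
            i
            for subject, indices in by_subject.items()
            if subject != held_out
            for i in indices
        )
        yield held_out, train, test


# Hypothetical subject IDs for six samples (illustration only).
example_subjects = [1, 1, 2, 3, 3, 3]
folds = list(leave_one_subject_out(example_subjects))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Splitting on sample index alone would allow patches from the same subject to appear in both the training and the test set, which tends to give optimistic performance estimates.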


The 512 x 512 pixel slices were acquired in the upper, middle, and lower part of the lung of each subject. A slice was missing in two cases; hence a total of 115 HRCT slices (39 x 3 - 2) are available.

An experienced chest radiologist and a CT-experienced pulmonologist each assessed the leading pattern (either normal tissue (NT), centrilobular emphysema (CLE), paraseptal emphysema (PSE), or panlobular emphysema (PLE)) as well as the associated severity (either no emphysema or minimal, mild, moderate, severe, or very severe emphysema) in each of the 115 slices. A consensus was reached in cases of disagreement. The leading pattern was later used for obtaining patch labels. The leading pattern, or label, of each slice as well as the associated severity is also available.


The 168 patches, each 61 x 61 pixels, come from three different classes: NT (59 observations), CLE (50 observations), and PSE (59 observations). The NT patches were annotated in never-smokers, and the CLE and PSE patches were annotated in healthy smokers and smokers with COPD, in areas of the leading pattern.

The largest patch size considered in [1] was 51 x 51 pixels. However, here we provide larger patches in order to enable handling of the patch border in a fashion similar to [1].
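A 61 x 61 patch leaves a margin of (61 - 51) / 2 = 5 pixels on every side of a central 51 x 51 region, so operators that look at neighboring pixels can still be evaluated at the border of that region. The sketch below only shows how the central region could be extracted; it assumes the patch has already been loaded as a list of rows, and the function name is hypothetical. How the margin is actually used in [1] is described there.

```python
def center_crop(patch, size):
    """Return the central size x size region of a square patch (list of rows)."""
    full = len(patch)
    if any(len(row) != full for row in patch):
        raise ValueError("patch must be square")
    if size > full or (full - size) % 2 != 0:
        raise ValueError("size must leave an equal margin on every side")
    margin = (full - size) // 2
    return [row[margin:margin + size] for row in patch[margin:margin + size]]


# Synthetic 61 x 61 patch whose values encode the (row, column) position.
patch61 = [[r * 61 + c for c in range(61)] for r in range(61)]
patch51 = center_crop(patch61, 51)
print(len(patch51), len(patch51[0]))
```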

Example 51 x 51 pixel patches from the database. Left: NT. Middle: CLE. Right: PSE.





* Please note that some programs cannot handle 16-bit TIFF images properly, causing the slices and patches to be converted to or displayed as binary images, although they are grayscale. Examples of programs that can handle the data properly include Matlab and ImageJ. If the data is read correctly, the first 15 columns in the first row of patch1.tiff contain the following intensities: 374, 126, 182, 208, 223, 684, 927, 1201, 1346, 1103, 581, -37, -532, -614, -686.
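The check described above can be automated once patch1.tiff has been loaded with a reader that supports 16-bit TIFF. The sketch below deliberately leaves the loading step out and only compares the first row against the listed intensities. The function names and return strings are hypothetical. The "unsigned" case rests on an assumption: a reader that ignores the sign of the 16-bit samples would return 65499 in place of -37, which two's-complement reinterpretation maps back.

```python
EXPECTED_FIRST_15 = [374, 126, 182, 208, 223, 684, 927, 1201,
                     1346, 1103, 581, -37, -532, -614, -686]


def as_signed16(value):
    """Reinterpret a 16-bit value as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def check_first_row(row):
    """Compare the first 15 values of the first row of patch1.tiff.

    Returns "ok" if the values match, "unsigned" if they match only after
    signed reinterpretation, and "mismatch" otherwise (for example when the
    image was converted to binary).
    """
    head = [int(v) for v in row[:15]]
    if head == EXPECTED_FIRST_15:
        return "ok"
    if [as_signed16(v) for v in head] == EXPECTED_FIRST_15:
        return "unsigned"
    return "mismatch"


# Simulated rows; replace with the first row of the loaded image.
print(check_first_row(EXPECTED_FIRST_15))
print(check_first_row([v & 0xFFFF for v in EXPECTED_FIRST_15]))
print(check_first_row([1] * 15))
```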


Studies using data from this database

This is a list of studies using data from this database. In some cases, only a subset of the data is used.


Please forward any questions to the following email address: lauges (at) diku (dot) dk

This website is copyright © 2013 - 2018 Lauge Sørensen, Saher B. Shaker, and Marleen de Bruijne.