Respiratory Sound Dataset


The Respiratory Sound Database contains audio samples, collected independently by two research teams in two different countries, over several years. Most of the database consists of audio samples recorded by the School of Health Sciences, University of Aveiro (ESSUA) research team at the Respiratory Research and Rehabilitation Laboratory (Lab3R), ESSUA and at Hospital Infante D. Pedro, Aveiro, Portugal. The second research team, from the Aristotle University of Thessaloniki (AUTH) and the University of Coimbra (UC), acquired respiratory sounds at the Papanikolaou General Hospital, Thessaloniki and at the General Hospital of Imathia (Health Unit of Naousa), Greece.

The database consists of a total of 5.5 hours of recordings containing 6898 respiratory cycles, of which 1864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes, in 920 annotated audio samples from 126 subjects.

The cycles were annotated by respiratory experts as including crackles, wheezes, a combination of them, or no adventitious respiratory sounds. The recordings were collected using heterogeneous equipment and their duration ranged from 10s to 90s. The chest locations from which the recordings were acquired is also provided. Noise levels in some respiration cycles is high, which simulate real life conditions.

Possible use cases

Respiratory sounds contain relevant information about the structure and function of the respiratory system and are generally classified as normal or adventitious. Auscultation-based diagnosis of pulmonary disorders relies heavily on the presence of adventitious sounds and on the altered transmission characteristics of the chest wall. Adventitious sounds are respiratory sounds superimposed on breath sounds which can be discontinuous (crackles) or continuous (wheezes).

Files organisation 

Each file name is divided into 5 elements, separated with underscores (_).

1. Patient number (101,102,…,226)

2. Recording index

3. Chest location

a. Trachea (Tc)

b. Anterior left (Al)

c. Anterior right (Ar)

d. Posterior left (Pl)

e. Posterior right (Pr)

f. Lateral left (Ll)

g. Lateral right (Lr)

4.    Acquisition mode

a. sequential/single channel (sc),

b. simultaneous/multichannel (mc)

5.    Recording equipment

a. AKG C417L Microphone (AKGC417L),

b. 3M Littmann Classic II SE Stethoscope (LittC2SE),

c. 3M Litmmann 3200 Electronic Stethoscope (Litt3200),

d.  WelchAllyn Meditron Master Elite Electronic Stethoscope (Meditron)

The annotation files comprise four columns:

  • Beginning of respiratory cycle(s)
  • End of respiratory cycle(s)
  • Presence/absence of crackles (presence=1, absence=0)
  • Presence/absence of wheezes (presence=1, absence=0)

The diagnosis for each subject can be found here. The abbreviations used in the diagnosis file are:

  • COPD: Chronic Obstructive Pulmonary Disease
  • LRTI: Lower Respiratory Tract Infection
  • URTI: Upper Respiratory Tract Infection

This database is freely available for research and can be downloaded using the link on the top right of this page. Furthermore, you can download the files containing the respective demographic information and the detailed events. (The columns in the text file correspond to the following variables (NA: Not Available): Participant ID, Age, Sex, Adult BMI (kg/m2), Child Weight (kg), Child Height (cm)).


This database is available to logged users using the link below.

How to cite this database:

Rocha BM et al. (2019) “An open access database for the evaluation of respiratory sound classification algorithms” Physiological Measurement 40 035001

This dataset was used in the support of:
ICBHI 2017 Scientific Challenge