The overall approach towards the development of the smarty4covid database is depicted in Fig. 2. It includes a crowd-sourcing data collection strategy followed by a two-step data curation method involving data cleaning and labeling. A multi-modal dataset was collected, including audio recordings and tabular data. The curated dataset was used to extract breathing-related features, create publicly available data records, and develop the smarty4covid OWL knowledge base, which enables data selection and reasoning.

Fig. 2 Overall approach towards developing the smarty4covid database.

Crowd-sourcing Data Collection

The smarty4covid crowd-sourcing data collection was approved by the National Technical University’s Ethics Committee of Research (16141/15.04.2020) and complied with all relevant ethical regulations. A responsive and user-friendly web-based application was implemented, targeting Greek and Cypriot citizens older than 18 years. The smarty4covid questionnaire consisted of several sections accompanied by instructions for users to perform audio recordings of voice, breath and cough and to provide information regarding demographics, COVID-19 vaccination status, medical history, vital signs as measured by means of an oximeter and a blood pressure monitor, COVID-19 symptoms, smoking habits, hospitalization, emotional state and working conditions. Four types of audio recordings were considered: (i) three voice recordings where the user was required to read a specific sentence, (ii) five deep breaths, (iii) 30 s of regular breathing close to the microphone of the device and (iv) three voluntary coughs. A framework safeguarding data protection, while taking into consideration all the necessary ethical aspects, was implemented. The user terms and privacy policy were drafted to clarify the data usage and conditions for sharing, the users’ rights and the exact measures taken to protect the data. Prior to initiating the smarty4covid questionnaire, the users were required to read the informed consent form, which included links to the user terms and privacy policy, in order to provide their consent. Following an effective media plan, more than 10,000 individuals provided demographic information and underlying medical conditions to the smarty4covid application, yet only about half of them (4,679) gave the necessary permissions to perform the audio recordings. The web-based application was released in January 2022 during the spread of the Omicron wave in Greece, resulting in high COVID-19 prevalence (17.3% of users tested positive for COVID-19).

Data Curation

Part of the crowd-sourced dataset was invalid due to erroneous audio recording submissions by the users and the presence of distortions and high background noise. The data cleaning process was performed by means of a crowd-sourcing campaign utilizing the Label Studio open source data labeling tool. AI engineers who volunteered to annotate the audio signals signed a Non-Disclosure Agreement (NDA) and were granted the necessary access permissions. A user-friendly environment was implemented, enabling the annotators to listen to the audio signals and answer questions regarding their validity (yes/no) and their quality (Good, Acceptable, Poor) in terms of background noise and distortion. In order to evaluate the quality of the annotations, a set of 1,389 randomly selected audio files was presented more than once, and up to 5 times, in the annotation procedure. A high level of consistency (92.5%) among the annotators was observed, indicating that there was no need to have multiple annotators for each audio recording.
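
The source does not state how the consistency figure was computed. One plausible way to quantify agreement on repeatedly annotated files is the fraction of annotation pairs for the same file that carry the same label. The following minimal Python sketch illustrates this under that assumption; the function name, the label strings and the example data are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Fraction of same-file annotation pairs that agree.

    annotations: dict mapping a file id to the list of labels it
    received across repeated annotations (assumed data layout).
    """
    agree = 0
    total = 0
    for labels in annotations.values():
        # Files annotated only once contribute no pairs.
        for a, b in combinations(labels, 2):
            total += 1
            agree += int(a == b)
    if total == 0:
        raise ValueError("no file has more than one annotation")
    return agree / total


# Hypothetical example: validity labels for three recordings.
example = {
    "rec_001": ["valid", "valid", "valid"],  # 3 pairs, all agree
    "rec_002": ["valid", "invalid"],         # 1 pair, disagrees
    "rec_003": ["invalid"],                  # no pairs
}
print(pairwise_agreement(example))  # 3 agreeing pairs out of 4
```

Other agreement statistics, such as chance-corrected ones, would give different values on the same data; the sketch only shows the simplest raw-agreement variant.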

The smarty4covid crowd-sourced dataset was enriched with labels provided by healthcare professionals (pulmonologists, anesthesiologists, internists) who volunteered to characterize the collected audio recordings in terms of audible abnormalities and to provide personalized recommendations regarding the need for medical advice. To this end, four crowd-sourcing campaigns were initiated using Label Studio. Three campaigns focused on the audio recordings (breath, voice, cough). As depicted in Fig. 3, the healthcare professionals were asked to assess the presence of audible abnormalities by selecting one or more options from the available labels. In the fourth campaign, the healthcare professionals were shown all available multimodal information about the user, excluding vital signs (oxygen saturation, beats per minute (BPM), diastolic/systolic pressure) that could bias their assessment, in order to estimate the risk of health deterioration and suggest a next course of action: (a) Seek medical advice, (b) Repeat the Smarty4Covid test in 24 hours, or (c) In case you notice changes in your health status, repeat the Smarty4Covid test. They were also asked to state their level of confidence (from 1 to 10) in their assessment.

Fig. 3 The smarty4covid labeling campaigns. The cough, breath, and voice campaigns include labels that are indicative of respiratory abnormalities.

Breathing Feature Extraction

Respiration is a complex physiological process, involving both voluntary and involuntary processes, as well as underlying reflexes. A breathing pattern is the result of fine coordination between peripheral chemoreceptors, the organizing structures of the central nervous system, lung mechanoreceptors and parenchyma, musculoskeletal components, intrinsic metabolic rate, emotional state, and many other factors. The breathing pattern adopted at any given moment is assumed to be the one that produces adequate alveolar ventilation at the lowest possible energy cost, given the current mechanical status of the system and the metabolic needs of the organism. A disruption in any of these pillars of respiratory homeostasis will be reflected in a change of the respiratory pattern, shifting the balance to the most favorable energetic state for the prevailing conditions [17]. A viral infection could be one factor that disrupts the breathing pattern [18,19,20,21]. Some quantitative indicators commonly used to describe a breathing pattern and its readjustments are the respiratory rate (RR), respiratory phases and volumes, partial gas pressures, blood gas analysis and others [17,22].

Most of the studies associated with COVID-19 crowd-sourced databases of breathing audio recordings explore features generated through signal processing or deep learning. The smarty4covid dataset [14] advances the current state of the art by including clinically relevant and informative respiratory indicators extracted from regular breathing recordings, such as the respiratory rate (RR), the inspiratory-to-expiratory (I/E) ratio, and the fractional inspiration time (FIT). RR is the number of breaths per minute, which is normally 16–20 breaths/min. It can be affected by both external and internal factors such as temperature, endogenous acid-base balance, metabolic state, diseases, injuries, toxicity, etc. The I/E ratio is the ratio between the inspiratory time (Ti) and the expiratory time (Te), and it can be indicative of a flow disturbance in the respiratory tract [23]. Normal breathing usually presents a 1:2 or 1:3 I/E ratio at rest [23], while airway obstruction may lead to prolonged expiration or inspiration, resulting in an abnormal I/E ratio. FIT, also termed the inspiratory “duty cycle” of the respiratory system, is the ratio between Ti and the duration of a total respiratory cycle (\(T_{tot}\)) [22]. It provides a rough measure of airway obstruction and stress on the respiratory muscles. Table 1 summarizes the description and the normal ranges of the aforementioned respiratory indicators.
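
As a worked illustration of these definitions, the following minimal Python sketch computes the three indicators from a mean inspiratory and expiratory time. It assumes that a respiratory cycle consists only of inspiration and expiration (Ttot = Ti + Te, with no pause) and that RR is obtained as 60 / Ttot; the function name and this simplification are illustrative and are not taken from the smarty4covid pipeline.

```python
def respiratory_indicators(ti, te):
    """Compute RR, I/E ratio and FIT from mean phase durations.

    ti, te: mean inspiratory and expiratory time in seconds.
    Assumption: one cycle lasts ti + te (pauses are ignored).
    """
    if ti <= 0 or te <= 0:
        raise ValueError("ti and te must be positive")
    t_tot = ti + te                   # duration of one respiratory cycle (s)
    return {
        "RR": 60.0 / t_tot,           # breaths per minute
        "IE_ratio": ti / te,          # inspiratory / expiratory time
        "FIT": ti / t_tot,            # inspiratory duty cycle
    }


# Example: 1 s inspiration and 2 s expiration, i.e. a 1:2 I/E ratio.
print(respiratory_indicators(1.0, 2.0))
```

With these inputs the cycle lasts 3 s, so the sketch yields 20 breaths/min, an I/E ratio of 0.5 and a FIT of one third.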

Table 1 Normal ranges of respiratory indicators.

A two-step approach was developed in order to extract Ti and Te from the crowd-sourced breathing audio signals: (i) localization of the segments of the audio signal that contain breathing, and (ii) detection of the exhaling and inhaling parts. In the first step, an AI-based model, described in the “Technical Validation” Section, was applied. The obtained breathing segments were split into non-silent intervals. The second step was particularly challenging for two reasons: the inhalation part, which was characterized by low mean amplitude, was often not appropriately captured due to the hardware of the recording device, and the short distance of the sound source from the microphone during the exhalation phase distorted the waveform. In order to address this challenge, an unsupervised method was developed with the aim of identifying similar parts within a single breathing audio signal that in turn could be considered as either inhalation or exhalation. This method presents several advantages over the state of the art [24], since it does not require a dataset of human-labeled data for training and does not rely on the prior knowledge that inhalation follows exhalation and vice versa. Furthermore, applying the unsupervised method to a single breathing audio signal adds robustness against distortion and background noise, since all inhalation/exhalation parts of the same breathing recording are subject to the same level of distortion and background noise.
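
The source does not specify how the breathing segments were split into non-silent intervals. A common approach is to threshold a frame-level amplitude measure, which the following minimal Python sketch illustrates. The function name, the frame length and the threshold value are all assumptions chosen for the example, and the signal is a plain list of samples so that the sketch has no dependencies.

```python
def non_silent_intervals(samples, frame_len, threshold):
    """Return (start, end) sample indices of non-silent runs.

    A frame counts as non-silent when its mean absolute amplitude
    exceeds `threshold`; `end` is exclusive.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    intervals = []
    start = None
    n = len(samples)
    for pos in range(0, n, frame_len):
        frame = samples[pos:pos + frame_len]
        level = sum(abs(x) for x in frame) / len(frame)
        if level > threshold:
            if start is None:
                start = pos          # a non-silent run begins here
        elif start is not None:
            intervals.append((start, pos))  # the run ended at this frame
            start = None
    if start is not None:
        intervals.append((start, n))  # run extends to the end of the signal
    return intervals


# Toy signal: silence, a loud part, silence, a quieter part.
signal = [0.0] * 4 + [0.5] * 8 + [0.0] * 4 + [0.2] * 4
print(non_silent_intervals(signal, frame_len=4, threshold=0.1))
# prints [(4, 12), (16, 20)]
```

In practice the threshold would need to be tuned to the recording conditions, since a quiet inhalation can fall below a level that suits the louder exhalation.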

The unsupervised method featured a clustering algorithm based on affinity propagation [24] at a frequency level. To this end, the mel-spectrogram (MFCC-128) of the audio signal was obtained and transformed into a vector of 128 values, each one corresponding to the sum of the respective frequency band over time. The obtained clusters were labeled as “inhalation”, “exhalation” or “other” by applying a heuristic approach. More specifically, the mean amplitude of each cluster was calculated by averaging the mean amplitudes of all the members of the cluster. Next, the clusters were sorted from largest to smallest mean amplitude. The top-ranked cluster was considered as exhalation and the second cluster (if it existed) as inhalation. The remaining clusters were labeled as “other”.
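
The labeling heuristic described above can be sketched in a few lines of Python. The sketch assumes that the clustering step has already assigned a cluster id to each audio part and that a mean amplitude per part is available; the function name, the input layout and the example values are hypothetical, and the clustering itself is not shown.

```python
from collections import defaultdict


def label_clusters(part_clusters, part_amplitudes):
    """Map each cluster id to 'exhalation', 'inhalation' or 'other'.

    part_clusters: cluster id of each audio part (assumed given).
    part_amplitudes: mean amplitude of each audio part.
    """
    by_cluster = defaultdict(list)
    for cid, amp in zip(part_clusters, part_amplitudes):
        by_cluster[cid].append(amp)
    # Cluster mean amplitude = average of its members' mean amplitudes.
    cluster_means = {cid: sum(a) / len(a) for cid, a in by_cluster.items()}
    # Sort clusters from largest to smallest mean amplitude.
    ranked = sorted(cluster_means, key=cluster_means.get, reverse=True)
    labels = {}
    for rank, cid in enumerate(ranked):
        if rank == 0:
            labels[cid] = "exhalation"
        elif rank == 1:
            labels[cid] = "inhalation"
        else:
            labels[cid] = "other"
    return labels


# Hypothetical example: five parts grouped into three clusters.
clusters = [0, 1, 0, 1, 2]
amplitudes = [0.30, 0.10, 0.34, 0.12, 0.02]
print(label_clusters(clusters, amplitudes))
# prints {0: 'exhalation', 1: 'inhalation', 2: 'other'}
```

If the clustering returns a single cluster, the sketch labels it as exhalation and no part is labeled as inhalation, which matches the "if it existed" condition in the description.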

For validation purposes, the inhalation and exhalation parts of 127 audio recordings of regular breathing were manually annotated in order to enable the calculation of the corresponding respiratory indicators. The proposed unsupervised method achieved a Root Mean Square Error (RMSE) of up to 1.77, 0.21 and 0.08 for the RR, FIT, and I/E ratio, respectively. The RMSE values are considered to be low taking into consideration the normal ranges of each respiratory indicator (Table 1). The algorithm’s efficacy in accurately identifying the inhalation and exhalation parts within the audio recordings of regular breathing was assessed by applying the Intersection over Union (IoU) criterion [25]. The obtained IoU values were up to 75% for inhalation and 76% for exhalation. These results indicate an acceptable degree of alignment between the algorithm’s output and the actual respiratory phases.
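
For reference, the two evaluation measures can be sketched as follows. This is a minimal illustration using the standard definitions of RMSE and of IoU for one-dimensional time intervals; how the source matched predicted and annotated parts, or aggregated IoU across recordings, is not stated, so the function names and the example values are assumptions.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def interval_iou(a, b):
    """Intersection over Union of two time intervals (start, end)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical RR estimates versus manual annotations (breaths/min).
print(rmse([18.0, 16.0], [17.0, 18.0]))

# A predicted exhalation of 0-2 s against an annotated one of 1-3 s:
# 1 s of overlap over a 3 s union.
print(interval_iou((0.0, 2.0), (1.0, 3.0)))
```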
