This prospective study was performed at a single university center from March 2021 to April 2022 among patients who received an elective tracheostomy. All tracheostomies were performed by a single surgeon. Under general anesthesia, a horizontal skin incision was made at the level of the isthmus. The isthmus was cut after dividing the strap muscles, and a window was created at the second tracheal ring. Patients who needed ventilator unit care after tracheostomy or underwent an emergency tracheostomy in the emergency room or intensive care unit (ICU), patients younger than 20 years, and pregnant patients were excluded. Using those criteria, 23 patients with tracheostomy were enrolled in this study. Clinical information was obtained for all patients. The study was approved by the institutional review board (IRB) of Bucheon St. Mary Hospital of the Catholic University of Korea (The physiologic changes of trachea according to the degree of sputum after tracheostomy, HC20ONSI0106, approved November 10, 2020). All procedures followed the ethical standards of the IRB and the Helsinki Declaration of 1975.

Recording system

Breathing sound samples were recorded with a voice recorder (Model PCM-A10; Sony, Japan) using a condenser microphone (Model ECM-CS3; Sony, Japan) placed 2–3 cm from the outer opening of the tracheostomy tube, in line with the direction of the tube. The recording format was linear pulse code modulation, which records the original sound without compression. All participants also wore parts of a polysomnography device during the study. The oronasal airflow sensor was placed at the opening of the tracheostomy tube and did not interfere with breathing or recording. Electroencephalogram, electrooculogram, and electromyography sensors were not set up. A photograph of the devices installed on a participant is shown in Fig. 1.

Figure 1

A photograph of the devices installed on the patient with a tracheostomy. A condenser microphone with a voice recorder and parts of a polysomnogram were set on the patient. A nasal prong connected with a polysomnogram was located at the entrance of the tracheostomy tube. A condenser microphone was located at a distance of 2–3 cm from the outer opening of the tracheostomy tube. Pulse oximetry, thoracic and abdominal bands were also set on the patient.

Data collection and classification

All data collection started in the ICU immediately after surgery. In general, a size 6 or 7 tube is inserted in women and a size 7 or 8 tube in men. After consulting with the anesthesiologists and considering the height, weight, and pulmonary function of the participants, we decided to use a size 7 tube (TRACOE twist 306-7; TRACOE Medical GmbH, Germany). Participants were transferred to the general ward two or three days after tracheostomy, and the existing tracheostomy tube was replaced with a new fenestrated tracheostomy tube (TRACOE twist 304-7; TRACOE Medical GmbH, Germany) three to five days after surgery. We collected data only until the tube was changed. As a result, all participants were recorded for an average of 12–16 h a day for 1–5 days after tracheostomy.

Breathing sounds with severe background noise were excluded, as were very quiet breathing sounds that could not be detected in the spectrogram, for example during sleep. Breathing sounds recorded while participants expressed severe dyspnea were also excluded. Breathing sounds were included even if participants were aware of sputum in their airways, provided that no medical intervention was required.

Breathing sounds were classified in three stages. First, two expert otorhinolaryngology doctors with more than 10 years of experience listened to all the recording samples and selected the breathing sounds that satisfied the inclusion criteria. Second, all selected breathing sounds were converted into spectrograms and analyzed using the waveforms of the spectrogram. An audio spectrogram is a two-dimensional image that shows how the frequency spectrum of a sound changes over time. By representing continuously changing spectra as a single data sample, spectrograms provide rich audio information and are widely used in deep learning frameworks based on image classification10,11,12. The breathing sound samples were converted into spectrograms using a short-time Fourier transform. A more detailed description of the conversion is presented in Supplementary 1.
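
As a rough illustration of the short-time Fourier transform step, the following Python sketch converts a mono signal into a spectrogram. The study itself used MATLAB 2019a and its settings are given in Supplementary 1; the sampling rate, frame length, hop size, and the function name `stft_spectrogram` are assumptions made here for illustration only, and the input is a synthetic tone rather than a recording.

```python
import numpy as np

def stft_spectrogram(signal, fs, frame_len=1024, hop=256):
    """Return (freqs_hz, times_s, power_db) for a mono signal.

    frame_len and hop are illustrative values, not the study's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # The one-sided FFT of each frame gives the short-time spectrum.
    spectrum = np.fft.rfft(frames, axis=1)
    power = np.abs(spectrum) ** 2
    power_db = 10.0 * np.log10(power + 1e-12)  # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return freqs, times, power_db.T  # rows: frequency bins, columns: frames

# Synthetic 1500 Hz tone standing in for a recording (assumed 44.1 kHz).
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1500 * t)
freqs, times, spec_db = stft_spectrogram(tone, fs)
peak_hz = freqs[np.argmax(spec_db.mean(axis=1))]
```

Each column of `spec_db` is the spectrum of one short frame, so patterns that repeat in time show up along the horizontal axis and sustained tones along the vertical axis.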

All the breathing sounds were classified into three categories based on the spectrogram waveform: normal breathing sound (NS); low-frequency vibrant breathing sound (VS), which indicates a movable obstacle such as sputum in the tracheostomy tube that requires suctioning; and high-frequency sharp breathing sound (SS), which indicates a fixed obstacle, such as crusts or blood clots, in the tracheostomy tube that requires suctioning or changing the inner cannula of the tube. Examples of the time-domain wave characteristics and spectrograms of each breathing sound are shown in Fig. 2.

In NS, because the airway had no obstacles and minimal friction, the acoustic energy was relatively low and mainly below 2000 Hz. This became evident when examining the spectrogram zoomed to below 3000 Hz. Two primary types of noise were present. Background noise, which occurs without any specific event, is predominantly found below 1000 Hz, whereas noise resulting from events such as speech is most prominent below 1500 Hz. NS, in contrast, concentrates its energy in the 1500–2000 Hz range.

The abnormal breathing sounds exhibited a broader acoustic energy distribution, spanning from 500 to 12,000 Hz, because the obstacles generated sounds of various frequencies. VS exhibited a repetitive pattern occurring approximately 100 times per second during respiration, indicating the presence of a movable obstacle blocking the trachea or tracheostomy tube. This pattern appears as multiple vertical lines in the spectrogram. In contrast, SS occurred when stiff or fixed obstacles narrowed the cross-section of the airway, which induced a wide range of high-frequency breathing sounds that were more continuous than VS. The pattern of SS appears as multiple horizontal lines in the spectrogram (Fig. 2).

Some samples exhibited both VS and SS patterns in the spectrogram. These were classified as VS for clinical reasons. First, fixed obstacles are typically composed of sputum or blood clots, which are themselves movable obstacles3. Samples with both patterns are therefore in an intermediate stage before becoming entirely SS and tend to display features closer to VS. Second, suction is a more rapid and easily accessible approach than tube change. Tube change can be performed only by medical staff and carries a higher risk of tracheostomy tube displacement if done within a week after tracheostomy13. In contrast, suction can be carried out by caregivers, and medical staff can reassess the airway after all movable obstacles have been removed through suction. Additionally, the extent to which a suction catheter can enter the tracheostomy tube itself serves as one method of assessing tube obstruction. If fixed obstacles are removed along with movable obstacles during suction, airway problems can often be entirely resolved with suction alone. In cases of a dual cannula of the tracheostomy tube, regardless of the presence of fixed obstacles in the inner cannula, suction should be performed after. For these reasons, the cases displaying both VS and SS patterns of the spectrogram were classified as VS.
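
The frequency ranges described above (NS mainly below 2000 Hz, abnormal sounds spread from 500 to 12,000 Hz) can be illustrated with a simple band-energy measurement. This is a minimal sketch and not part of the study's pipeline: the function name `band_energy_fraction`, the 44.1 kHz sampling rate, and the synthetic tones standing in for breathing sounds are all assumptions.

```python
import numpy as np

def band_energy_fraction(signal, fs, f_lo, f_hi):
    """Fraction of total spectral energy between f_lo and f_hi (Hz)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return power[in_band].sum() / power.sum()

fs = 44100  # assumed sampling rate
t = np.arange(fs) / fs

# Synthetic stand-ins, not patient data.
normal_like = np.sin(2 * np.pi * 1800 * t)                    # energy near 1500-2000 Hz
abnormal_like = normal_like + np.sin(2 * np.pi * 6000 * t)    # adds high-frequency energy

low_normal = band_energy_fraction(normal_like, fs, 0, 2000)
low_abnormal = band_energy_fraction(abnormal_like, fs, 0, 2000)
```

In this toy case nearly all of the energy of `normal_like` lies below 2000 Hz, while about half of the energy of `abnormal_like` lies above it. Real recordings would also contain the background and speech noise described in the text, so a single threshold on this ratio should not be expected to separate the classes.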

Figure 2

Time-domain sound characteristics and spectrograms of breathing sounds. (A) In normal breathing sounds, the acoustic energy was relatively small and concentrated mainly below 2000 Hz. (B) A detailed image of normal breathing sounds below 3 kHz. (C) In abnormal breathing sounds, the acoustic energy was scattered over a large area, from 500 to 12,000 Hz. Vibrant breathing sounds additionally showed characteristic multiple vertical lines, created by the movable obstacle repeatedly blocking the transmission of sound. (D) Sharp breathing sounds showed characteristic multiple horizontal lines because fixed obstacles, which induced a wide range of high-frequency breathing sounds, narrowed the airway continuously.

Third, two experts classified the breathing sounds according to the spectrogram results. A sample was included only when both experts agreed on its classification.
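
The labeling rules described in this section (samples showing both VS and SS patterns are labeled VS, and only samples on which both experts agree are kept) can be written down compactly. The sketch below is illustrative only; the function names, sample identifiers, and ratings are hypothetical and do not come from the study data.

```python
def resolve_label(patterns):
    """Map the set of spectrogram patterns seen in one sample to a class.

    Samples showing both VS and SS patterns are labeled VS, as in the text.
    """
    if "VS" in patterns:
        return "VS"
    if "SS" in patterns:
        return "SS"
    return "NS"

def consensus(labels_a, labels_b):
    """Keep only samples to which both raters assigned the same class."""
    return {sid: lab for sid, lab in labels_a.items()
            if labels_b.get(sid) == lab}

# Hypothetical sample IDs and ratings, for illustration only.
rater_a = {"s1": resolve_label({"VS", "SS"}), "s2": "NS", "s3": "SS"}
rater_b = {"s1": "VS", "s2": "NS", "s3": "VS"}
kept = consensus(rater_a, rater_b)
```

Here sample `s3` is dropped because the two raters disagree, while `s1` is kept as VS despite showing both patterns.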

Methods for AI-based analysis

In this study, we converted breathing sound samples into spectrograms and Mel-frequency cepstral coefficients (MFCCs), which are audio features widely used to analyze respiratory status. The details of the converted features are described in the following sections. All data processing for sound classification by the AI algorithms, such as spectrogram conversion and MFCC extraction, was performed in MATLAB 2019a.

MFCC extraction

MFCCs are a set of audio features designed to reflect human auditory characteristics and have been widely applied in speech recognition14,15 and respiratory diagnosis16,17,18. In this study, we used MFCCs as tracheal breathing sound features for machine learning–based classifiers. The MFCCs were extracted in the same frequency range as the spectrograms, and the coefficients were obtained for each frame in two variants: MFCC (20) and MFCC (40). A more detailed description of MFCC extraction, the designed filter banks, and examples of extracted MFCCs are presented in Supplementary 2, 3 and 4.
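
For readers unfamiliar with the feature, a generic MFCC computation can be sketched as follows: frame the signal, take the power spectrum, apply a triangular mel filter bank, take the logarithm, and apply a discrete cosine transform. This is a textbook-style sketch in Python, not the study's MATLAB implementation (see Supplementary 2 to 4 for that). The frame length, hop size, number of filters, and sampling rate are assumptions, and we assume here that "MFCC (20)" and "MFCC (40)" refer to the number of coefficients kept per frame.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs, f_min, f_max):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, fs, n_coeffs=20, n_filters=40, frame_len=1024, hop=512,
         f_min=0.0, f_max=None):
    """Return an array of shape (n_frames, n_coeffs)."""
    f_max = fs / 2 if f_max is None else f_max
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = mel_filterbank(n_filters, frame_len, fs, f_min, f_max)
    log_mel = np.log(power @ bank.T + 1e-10)  # offset avoids log(0)
    # DCT-II across the filter axis decorrelates the log filter energies.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct_basis.T

fs = 44100  # assumed sampling rate
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)  # synthetic input, not patient data
mfcc20 = mfcc(noise, fs, n_coeffs=20)
mfcc40 = mfcc(noise, fs, n_coeffs=40)
```

Production toolboxes usually add steps such as pre-emphasis, liftering, and DCT normalization, so values from this sketch will not match those of any particular library.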

MFCC-based machine learning classification methods

For MFCC-based breathing sound classification, a support vector machine (SVM)19,20 and k-nearest neighbor (kNN)21,22, which are widely used for health status diagnosis using MFCCs, were employed. All breathing sound classification with the machine learning algorithms was performed on a desktop machine with an Intel i5-10500F CPU and an NVIDIA GeForce GTX 1660 Ti (6 GB) GPU. A more detailed description of the process is presented in Supplementary 5.
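
To make the kNN idea concrete, the sketch below classifies feature vectors by majority vote among their nearest training vectors. It is a minimal illustration under stated assumptions: the value k = 3, the Euclidean distance, the function name `knn_predict`, and the synthetic 20-dimensional vectors standing in for MFCC features are all chosen here for illustration and are not the study's settings (see Supplementary 5). The SVM is not shown.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Classify each query vector by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    train_y = np.asarray(train_y)
    predictions = []
    for q in query_x:
        distances = np.linalg.norm(train_x - q, axis=1)
        nearest = np.argsort(distances)[:k]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Synthetic, well-separated clusters standing in for per-sample features.
rng = np.random.default_rng(0)
centers = {"NS": 0.0, "VS": 3.0, "SS": 6.0}
train_x = np.vstack([rng.normal(c, 0.5, size=(30, 20)) for c in centers.values()])
train_y = np.repeat(list(centers.keys()), 30)
query_x = np.vstack([rng.normal(c, 0.5, size=(5, 20)) for c in centers.values()])
query_y = np.repeat(list(centers.keys()), 5)

pred = knn_predict(train_x, train_y, query_x, k=3)
accuracy = float(np.mean(pred == query_y))
```

Because the synthetic clusters are far apart, this toy example is easy by construction and says nothing about the accuracy achievable on real breathing sounds.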

Spectrogram-based deep learning method: convolutional neural network (CNN)

The CNN is a well-established deep learning algorithm for image classification and pattern recognition23. Its basic building block is the convolution layer, which computes the tensor passed to the next layer by convolving its input tensor with a kernel. A CNN topology includes many convolution layers, each designed around factors such as kernel size and number of kernels. A CNN provides a framework for learning the features common among images in a data group without requiring manual feature extraction, and it can produce trained models for accurate pattern recognition or image classification. Biomedical signal classification studies have been accelerated by imaging and learning with CNNs10,11,12. For example, CNN classification based on spectrograms has increased the accuracy of respiratory pattern classification in many clinical fields. In this study, we applied spectrograms converted from one cycle of respiratory data to the following CNN topologies: AlexNet24, VGGNet25, ResNet26, Inception_v327, and MobileNet28. The CNN classification was conducted with Lenovo Intelligent-Computing-Orchestration with a batch size of 32 and a maximum of 200,000 iterations.
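
The convolution computation at the core of each layer can be shown on a toy example. The sketch below is not one of the topologies used in the study; it applies a single hand-written 3 × 3 kernel to a small synthetic "spectrogram" containing one vertical line, loosely mimicking the vertical-line pattern described for VS. The function name `conv2d`, the kernel values, and the input are assumptions for illustration. In a trained CNN the kernel values are learned rather than specified by hand.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation CNN layers compute."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6 x 6 "spectrogram" with one bright vertical line in column 3.
spec = np.zeros((6, 6))
spec[:, 3] = 1.0

# Horizontal-gradient kernel that responds to vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d(spec, kernel)
```

Each row of `feature_map` is `[0, 3, 0, -3]`: the kernel responds positively where the line enters its right column and negatively where the line sits in its left column, which is how a convolution layer turns local image structure into features for the next layer.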

Ethical approval and consent to participate

The study was approved by the institutional review board (IRB) of Bucheon St. Mary Hospital of the Catholic University of Korea. The approval number is HC20ONSI0106. All participants were provided with an explanation of the study and gave their informed consent. Consent documents were obtained from all participants.
