Faculty Publications

Machine Learning-Based Estimation of Hoarseness Severity Using Acoustic Signals Recorded During High-Speed Videoendoscopy

Tobias Schraut, Universitätsklinikum Erlangen
Michael Döllinger, Universitätsklinikum Erlangen
Melda Kunduk, Louisiana State University
Matthias Echternach, Klinikum der Universität München
Stephan Dürr, Klinikum der Universität Regensburg und Medizinische Fakultät
Julia Werz, Universitätsklinikum Erlangen
Anne Schützenberger, Universitätsklinikum Erlangen

Document Type

Article

Publication Date

1-1-2025

Abstract

Objectives: This study investigates the use of sustained phonations recorded during high-speed videoendoscopy (HSV) for machine learning-based assessment of hoarseness severity (H). The performance of this approach is compared with conventional recordings obtained during voice therapy to evaluate key differences and limitations of HSV-derived acoustic recordings.

Methods: A database of 617 voice recordings with a duration of 250 ms was gathered during HSV examination (HS). Two databases comprising 809 vowels recorded during voice therapy were used for comparison, examining recording durations of 1 second (VT-1) and 250 ms (VT-2). A total of 490 features were extracted, including perturbation and noise characteristics, spectral and cepstral coefficients, as well as features based on modulation spectrum, nonlinear dynamic analysis, entropy, and empirical mode decomposition. Model development focused on selecting a minimal-optimal feature subset and suitable classification algorithms. Recordings were classified into two groups of hoarseness based on auditory-perceptual ratings by experts, yielding a continuous hoarseness score ŷ. Model performance was evaluated based on classification accuracy, correlation between predicted scores ŷ ∈ [0,1] and subjective ratings H ∈ {0,1,2,3}, and correlation between the relative change in quantitative and subjective ratings.

Results: Logistic regression combined with five acoustic features achieved a classification accuracy of 0.863 (VT-1), 0.847 (VT-2), and 0.742 (HS) on the test sets. A correlation of 0.797 (VT-1), 0.763 (VT-2), and 0.637 (HS) was obtained between ŷ and H, respectively. For 21 test subjects with two recordings, the model yielded a correlation of 0.592 (VT-1), 0.486 (VT-2), and 0.088 (HS) between Δŷ and ΔH.

Conclusion: While acoustic signals recorded during HSV show potential for quantitative hoarseness assessment, they are less reliable than voice therapy recordings due to practical challenges associated with oral laryngeal examination. Addressing these limitations, for example, through the use of flexible nasal endoscopy, could improve the quality of HSV-derived acoustic recordings and voice assessments.
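The three evaluation measures named in the abstract (classification accuracy, correlation between predicted scores and H, and correlation between the changes in both across two recordings) can be illustrated with a minimal sketch. This is not the authors' code. Several choices here are assumptions made only for illustration: the Pearson coefficient as the correlation measure, a 0.5 cut-off on the score, a split of H into two groups at H >= 2, plain differences as the "change", and all function names and toy values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed here; the abstract does not name the measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def accuracy(scores, ratings, score_cut=0.5, rating_cut=2):
    """Share of recordings whose thresholded score matches the binarized rating.

    Both cut-offs are hypothetical; the abstract only states that two groups were used.
    """
    predicted = [s >= score_cut for s in scores]
    actual = [h >= rating_cut for h in ratings]
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(scores)


def change_correlation(pairs):
    """Correlation between score change and rating change for subjects with two recordings.

    Each item in `pairs` is ((score_1, rating_1), (score_2, rating_2)).
    """
    d_score = [second[0] - first[0] for first, second in pairs]
    d_rating = [second[1] - first[1] for first, second in pairs]
    return pearson(d_score, d_rating)


# Toy values for illustration only; these are not data from the study.
scores = [0.1, 0.3, 0.6, 0.9, 0.4, 0.8]   # predicted scores in [0, 1]
ratings = [0, 1, 2, 3, 2, 1]              # perceptual ratings H in {0, 1, 2, 3}
pairs = [((0.8, 3), (0.4, 1)), ((0.2, 0), (0.5, 1)), ((0.6, 2), (0.6, 2))]

print("accuracy:", accuracy(scores, ratings))
print("corr(score, H):", pearson(scores, ratings))
print("corr(change in score, change in H):", change_correlation(pairs))
```

Under these assumptions, a low value for the third measure, such as the 0.088 reported for HS, would mean that the change in score between two recordings barely tracks the change in perceptual rating, even when single-recording scores correlate reasonably with H.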

Publication Source (Journal or Book title)

Journal of Voice

Recommended Citation

Schraut, T., Döllinger, M., Kunduk, M., Echternach, M., Dürr, S., Werz, J., & Schützenberger, A. (2025). Machine Learning-Based Estimation of Hoarseness Severity Using Acoustic Signals Recorded During High-Speed Videoendoscopy. Journal of Voice. https://doi.org/10.1016/j.jvoice.2024.12.008
