Document Type

Article

Publication Date

1-1-2025

Abstract

Objectives: This study investigates the use of sustained phonations recorded during high-speed videoendoscopy (HSV) for machine learning-based assessment of hoarseness severity (H). The performance of this approach is compared with conventional recordings obtained during voice therapy to evaluate key differences and limitations of HSV-derived acoustic recordings.
Methods: A database of 617 voice recordings with a duration of 250 ms was gathered during HSV examination (HS). Two databases comprising 809 vowels recorded during voice therapy were used for comparison, examining recording durations of 1 second (VT-1) and 250 ms (VT-2). A total of 490 features were extracted, including perturbation and noise characteristics, spectral and cepstral coefficients, as well as features based on modulation spectrum, nonlinear dynamic analysis, entropy, and empirical mode decomposition. Model development focused on selecting a minimal-optimal feature subset and suitable classification algorithms. Recordings were classified into two groups of hoarseness based on auditory-perceptual ratings by experts, yielding a continuous hoarseness score ŷ. Model performance was evaluated based on classification accuracy, correlation between predicted scores ŷ∈[0,1] and subjective ratings H∈{0,1,2,3}, and correlation between the relative change in quantitative and subjective ratings.
Results: Logistic regression combined with five acoustic features achieved a classification accuracy of 0.863 (VT-1), 0.847 (VT-2), and 0.742 (HS) on the test sets. A correlation of 0.797 (VT-1), 0.763 (VT-2), and 0.637 (HS) was obtained between ŷ and H, respectively. For 21 test subjects with two recordings, the model yielded a correlation of 0.592 (VT-1), 0.486 (VT-2), and 0.088 (HS) between ∆ŷ and ∆H.
Conclusion: While acoustic signals recorded during HSV show potential for quantitative hoarseness assessment, they are less reliable than voice therapy recordings due to practical challenges associated with oral laryngeal examination. Addressing these limitations, for example, through the use of flexible nasal endoscopy, could improve the quality of HSV-derived acoustic recordings and voice assessments.

Publication Source (Journal or Book title)

Journal of Voice
