Semester of Graduation

Fall 2024

Degree

Master of Science (MS)

Department

Division of Computer Science & Engineering

Document Type

Thesis

Abstract

Wearable exoskeletons offer significant potential for enhancing human mobility in industrial environments. However, their adaptability to dynamic, task-intensive settings remains challenging, particularly in accurately predicting locomotion modes such as ladder climbing, stair navigation, low-space movement, and obstacle navigation. This research proposes a multimodal framework that integrates visual data and speech commands to improve locomotion mode prediction in unpredictable environments. Multimodal data were collected using smart glasses, capturing both the user's perspective (field of view, FOV) and voice during locomotion tasks. State-of-the-art models (CLIP, ImageBind, and GPT-4o) processed these visual and linguistic inputs to predict locomotion activities. The models were evaluated under zero-shot and fine-tuned conditions, with preprocessing steps aligning voice commands to FOV frames. Class imbalances were addressed through data generation and augmentation techniques. Results show that fine-tuned models significantly improve prediction accuracy, especially when integrating visual and textual modalities. The CLIP model achieved an F1-score of 90.05% when fine-tuned on image-text data, while GPT-4o reached 87.87% in zero-shot reasoning tasks using chain-of-thought prompting. ImageBind performed well with image-text fusion, though audio integration produced mixed outcomes. This research demonstrates that multimodal approaches, especially the combination of vision and language, can substantially enhance locomotion prediction in complex industrial environments.
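
As a rough illustration of the zero-shot setting described in the abstract, the sketch below scores a single FOV frame against text prompts for the four named locomotion modes using an open-source CLIP checkpoint. The checkpoint name, label phrasings, and prompt template are illustrative assumptions, not the thesis's actual pipeline; the fine-tuned condition would instead update the encoders on paired FOV frames and transcribed voice commands.

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Locomotion modes taken from the abstract; exact label wording is assumed.
LOCOMOTION_MODES = [
    "climbing a ladder",
    "navigating stairs",
    "moving through a low space",
    "navigating around an obstacle",
]

# Hypothetical checkpoint choice for a zero-shot baseline.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def predict_mode(fov_frame_path: str) -> str:
    """Zero-shot prediction of the locomotion mode for one FOV frame."""
    image = Image.open(fov_frame_path)
    # Assumed prompt template describing the first-person (FOV) viewpoint.
    prompts = [f"a first-person view of a worker {m}" for m in LOCOMOTION_MODES]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        # logits_per_image holds image-text similarity scores, one per prompt.
        logits = model(**inputs).logits_per_image
    probs = logits.softmax(dim=-1).squeeze(0)
    return LOCOMOTION_MODES[int(probs.argmax())]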

Date

11-21-2024

Committee Chair

Jasim, Mahmood
