Semester of Graduation
Spring 2025
Degree
Master of Science (MS)
Department
Division of Computer Science & Engineering
Document Type
Thesis
Abstract
Reverse engineering is a cybersecurity process that focuses on understanding the underlying functionality of software or malware. This is an arduous process that demands large amounts of time and effort from cybersecurity practitioners. Large Language Models (LLMs) offer a potential solution to this problem. LLMs have worked their way into various fields of cybersecurity in recent years, including incident response and malware classification. However, LLMs have historically struggled with low-level code comprehension, a necessary part of reverse engineering. While LLMs can generate code and explain its function at the surface level, they struggle to grasp the wider context. In this thesis, we utilize parameter-efficient fine-tuning to train LLMs to generate contextual comments for x86 assembly code in an effort to expedite reverse engineering. We select LLMs from several parameter classes within each of the Qwen2.5-Coder, CodeLlama, and CodeGemma families and fine-tune them on a dataset of x86 assembly code. We evaluate each model's performance on cross-entropy loss and cosine similarity before and after fine-tuning. We observe promising results, with a significant boost in similarity score for five out of the seven LLMs selected. This is particularly evident in the 0.11 increase in similarity score for Qwen2.5-Coder-7B and the 0.18 increase in similarity score for CodeLlama-7B after fine-tuning.
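To make the workflow concrete, the sketch below illustrates one common way parameter-efficient fine-tuning and the cosine-similarity evaluation described above could be set up, assuming a LoRA-style adapter via Hugging Face's peft library and sentence embeddings for the similarity score. The model identifier, hyperparameters, embedding model, and example comments are illustrative assumptions, not the configuration used in the thesis.

# Hedged sketch: LoRA-style parameter-efficient fine-tuning of a code LLM to
# emit comments for x86 assembly, plus a cosine-similarity check of a generated
# comment against a reference. All names and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from sentence_transformers import SentenceTransformer, util

base_id = "Qwen/Qwen2.5-Coder-7B"            # one of the evaluated model families
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA adapters train a small fraction of the weights instead of the full model.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],      # attention projections only (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()            # sanity check: well under 1% trainable

# ... fine-tune with a standard causal-LM (cross-entropy) objective on pairs of
# x86 assembly snippets and their contextual comments, e.g. via transformers.Trainer ...

# Evaluation: embed the generated and reference comments and compare them.
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model
generated = "Saves callee registers and sets up the stack frame for the loop."
reference = "Function prologue: preserves registers and allocates stack space."
score = util.cos_sim(embedder.encode(generated), embedder.encode(reference)).item()
print(f"cosine similarity: {score:.2f}")

In practice the base model would be swapped for each of the seven evaluated LLMs, and the similarity score would be averaged over a held-out set of assembly/comment pairs rather than a single example.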
Date
4-3-2025
Recommended Citation
Lea, Darrin Michael, "Optimizing LLM x86 Assembly Code Comprehension through Fine-Tuning" (2025). LSU Master's Theses. 6140.
https://repository.lsu.edu/gradschool_theses/6140
Committee Chair
Dr. James M Ghawaly