Degree

Doctor of Philosophy (PhD)

Department

Division of Computer Science & Engineering

Document Type

Dissertation

Abstract

Code generation, the automated synthesis of source code from natural language descriptions, has emerged as a transformative technology in software development, promising to accelerate workflows, reduce human error, and democratize programming. Recent large language models, such as OpenAI's ChatGPT and Meta's LLaMA, leverage transformer architectures and vast pre-training datasets to generate functional code from user queries with remarkable accuracy. However, their practical application is often undermined by significant challenges: limited reasoning capabilities, insufficient generalization to novel programming challenges, and a critical lack of robustness, where semantically trivial variations in user prompts can lead to catastrophic failures in the generated code.

This dissertation presents a multi-faceted approach to addressing these limitations. First, to enhance core reasoning and adaptability, we employ a targeted fine-tuning process that aligns models with specific coding tasks. This process leverages a curated dataset that combines diverse, high-quality synthetic programming tasks with expert, human-written examples drawn from real-world coding challenges. The performance of these enhanced models is then assessed using a new benchmark developed to evaluate complex problem-solving while guarding against data contamination, a limitation of many existing benchmarks. Second, to directly combat the critical issue of prompt sensitivity, this research introduces a multi-stage architecture that decouples semantic interpretation from code synthesis. This is achieved by translating ambiguous natural language descriptions into a formal, unambiguous intermediate representation (IR), which then serves as a stable blueprint for generating the final executable code.
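
The abstract does not specify the form of the intermediate representation or the prompting scheme. Purely as an illustrative sketch, assuming a generic LLM callable and a simple JSON-style specification (all names here, including Spec, interpret, and synthesize, are hypothetical), the decoupling of interpretation from synthesis might be organized as follows:

```python
# Illustrative sketch only: the dissertation's actual IR format and prompts are not
# given in the abstract. All names and fields below are hypothetical.
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Spec:
    """A formal intermediate representation distilled from a natural language task."""
    function_name: str
    inputs: list[str]        # parameter names and types, e.g. "nums: list[int]"
    output: str              # return type and meaning
    constraints: list[str]   # explicit, unambiguous requirements
    examples: list[dict]     # input/output pairs


def interpret(task: str, llm: Callable[[str], str]) -> Spec:
    """Stage 1: semantic interpretation -- translate the prompt into a stable IR."""
    prompt = (
        "Convert the following programming task into a JSON object with keys "
        "function_name, inputs, output, constraints, examples. Task:\n" + task
    )
    return Spec(**json.loads(llm(prompt)))


def synthesize(spec: Spec, llm: Callable[[str], str]) -> str:
    """Stage 2: code synthesis -- generate code from the IR, not the raw prompt."""
    prompt = (
        "Implement exactly this specification in Python:\n"
        + json.dumps(spec.__dict__, indent=2)
    )
    return llm(prompt)


def generate_code(task: str, llm: Callable[[str], str]) -> str:
    """Decoupled pipeline: paraphrased prompts normalize to the same IR, so
    surface-level wording changes do not propagate into the generated program."""
    return synthesize(interpret(task, llm), llm)
```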

Experimental results demonstrate the efficacy of this dual approach. Fine-tuned models achieve significant gains in pass@k on context-rich problems, exhibiting improved generalization to novel scenarios and producing functionally correct code. Crucially, the IR-based architecture yields a dramatic improvement in semantic robustness, consistently and substantially outperforming direct-generation baselines across all model scales. This method demonstrates a remarkable ability to correctly interpret and implement subtle but critical changes in problem specifications, reducing logical errors caused by prompt variations. Together, these contributions pave the way for more reliable, robust, and versatile code generation tools, advancing the potential of LLMs to meet the diverse and evolving needs of modern software development.
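
For context on the reported metric: the abstract does not state which variant is used, but pass@k is conventionally computed with the unbiased estimator of Chen et al. (2021), where n >= k samples are drawn per problem and c of them pass all unit tests:

\[
\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]
\]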

Date

10-31-2025

Committee Chair

Chen, Jianhua

Available for download on Friday, October 29, 2032
