Degree

Doctor of Philosophy (PhD)

Department

Biological Sciences

Document Type

Dissertation

Abstract

Machine learning has become a key tool in computational biology, enabling large-scale analysis of complex biological systems and supporting advances in therapeutic discovery. However, major challenges persist, including limited availability of labeled data, scarcity of experimental negative examples, and difficulty in modeling multi-scale biological interactions underlying microbial metabolism and cancer drug response. Existing approaches often lack biologically meaningful representations for proteome-driven metabolic inference and fail to capture network-level dependencies in drug synergy. This dissertation addresses these gaps through the development of data-driven frameworks for predictive modeling in the human gut microbiome and anticancer drug combinations.

First, this work develops a machine learning framework to predict bacteria–metabolite interactions in the human gut microbiome. A random forest (RF) model integrates enzyme-level features derived from Enzyme Commission (EC) encodings with metabolite structural embeddings to classify metabolite consumption and production. To address the absence of experimentally confirmed negative examples, the study then explores two strategies for generating biologically relevant negative sets: a distance-based synthetic approach using structurally dissimilar compounds, and an experimentally informed approach incorporating structural information from validated negative instances. These methods improve dataset quality and model robustness, enabling accurate prediction of microbial metabolic behavior. The study further examines the relationship between proteome similarity and metabolism using mean amino acid identity (AAI). Results show that increased proteomic similarity between microbial species corresponds to greater similarity in metabolic profiles, supporting the use of proteome-level features for functional inference beyond taxonomy.
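The distance-based idea of drawing negatives from structurally dissimilar compounds can be illustrated with a minimal sketch. It is not the dissertation's implementation: the Tanimoto measure over sets of fingerprint bits, the 0.3 cutoff, the function names, and the toy metabolite data are all assumptions made for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_negatives(positives, candidates, max_similarity=0.3):
    """Keep candidates whose similarity to every known positive is at most the cutoff.

    The cutoff value is a hypothetical choice, not one taken from the source.
    """
    negatives = []
    for name, fingerprint in candidates.items():
        nearest = max(tanimoto(fingerprint, p) for p in positives.values())
        if nearest <= max_similarity:
            negatives.append(name)
    return sorted(negatives)


# Toy fingerprints (hypothetical): each metabolite is a set of "on" bits.
positives = {
    "met_A": frozenset({1, 2, 3, 4}),
    "met_B": frozenset({3, 4, 5, 6}),
}
candidates = {
    "cand_X": frozenset({1, 2, 3, 7}),     # shares 3 of 5 bits with met_A
    "cand_Y": frozenset({10, 11, 12}),     # shares no bits with any positive
    "cand_Z": frozenset({6, 20, 21, 22}),  # shares 1 of 7 bits with met_B
}

negatives = select_negatives(positives, candidates)
print(negatives)
```

Here `cand_X` is rejected because it is too similar to a known positive, while the two dissimilar candidates are kept as synthetic negatives.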

Second, to address data limitations in drug discovery, this dissertation introduces a data augmentation strategy for anticancer drug synergy prediction. A pharmacologically grounded similarity metric is used to generate realistic drug combinations, expanding benchmark datasets and improving model generalization. Building on this, SynerGNet, a graph neural network model, is developed to predict drug synergy by integrating heterogeneous biological data within protein–protein interaction networks. This approach captures network-level dependencies and outperforms traditional machine learning methods on benchmark datasets.
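Similarity-driven augmentation of this kind can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the precomputed `similarity` table, the 0.8 threshold, and the rule that a substituted pair inherits the original label are hypothetical and are not taken from the dissertation.

```python
def augment(records, similarity, threshold=0.8):
    """Create new records by swapping one drug for a sufficiently similar one.

    records:    list of (drug_a, drug_b, cell_line, label) tuples
    similarity: dict mapping (original_drug, substitute_drug) to a score in [0, 1]
    The new record inherits the original label (an assumption of this sketch).
    """
    augmented = []
    for drug_a, drug_b, cell_line, label in records:
        for (original, substitute), score in similarity.items():
            if score < threshold:
                continue
            # Skip substitutions that would pair a drug with itself.
            if original == drug_a and substitute != drug_b:
                augmented.append((substitute, drug_b, cell_line, label))
            if original == drug_b and substitute != drug_a:
                augmented.append((drug_a, substitute, cell_line, label))
    return augmented


# Toy data (hypothetical drug and cell line names).
records = [("drug_1", "drug_2", "cell_X", 1)]
similarity = {
    ("drug_1", "drug_3"): 0.90,  # similar enough to substitute
    ("drug_2", "drug_4"): 0.50,  # below the threshold, ignored
    ("drug_2", "drug_1"): 0.95,  # would pair drug_1 with itself, ignored
}

new_records = augment(records, similarity)
print(new_records)
```

Only the `drug_1` to `drug_3` substitution passes both checks, so the single original record yields one additional combination.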

Overall, this work advances predictive modeling by integrating biologically informed data generation, proteome-based representations, and graph-based learning, with applications in microbiome research, precision medicine, and drug discovery.

Date

5-20-2026

Committee Chair

Brylinski, Michal

LSU Acknowledgement

1

LSU Accessibility Acknowledgment

1

Available for download on Friday, May 11, 2029
