Degree
Doctor of Philosophy (PhD)
Department
Mathematics
Document Type
Dissertation
Abstract
This research explores both theoretical and practical aspects of disentangled representation learning by extending the VAE framework. We address the core challenge of extracting independent generative factors from observed data while preserving high reconstruction fidelity. To this end, we propose two novel VAE variants: (i) the $\lambda\beta$-VAE, which incorporates an additional $\ell^2$-norm reconstruction loss to improve accuracy, and (ii) the $\gamma\beta$-VAE, which introduces a mutual information regularization term to encourage independence across latent dimensions.
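The dissertation defines the exact objectives for these variants; as a rough orientation only, a plausible sketch (assuming the standard $\beta$-VAE objective as the starting point, with $\lambda$ and $\gamma$ as the weights the names suggest, and $\hat{x}$ denoting the decoder reconstruction) is:

```latex
% Standard beta-VAE objective (Higgins et al.):
\mathcal{L}_{\beta}
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \beta \, D_{\mathrm{KL}}\!\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)

% lambda-beta-VAE (sketch): add an explicit l2 reconstruction penalty
\mathcal{L}_{\lambda\beta}
  = \mathcal{L}_{\beta} - \lambda \,\|x - \hat{x}\|_2^2

% gamma-beta-VAE (sketch): penalize statistical dependence
% between latent dimensions via pairwise mutual information
\mathcal{L}_{\gamma\beta}
  = \mathcal{L}_{\beta} - \gamma \sum_{i \neq j} I(z_i; z_j)
```

The precise form of the regularizers (e.g. whether the dependence penalty is pairwise or a total-correlation term) is specified in the dissertation itself.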
Our theoretical analysis is conducted in a linear Gaussian setting, where we derive optimal solutions for these VAE-based models. We further examine how varying levels of correlation among ground-truth factors influence disentanglement performance, using established metrics such as SAP and MIG, along with a newly proposed mutual-information-based metric, $I_m$. Empirical evaluations on the dSprites dataset show that introducing explicit architectural constraints can effectively address key limitations of standard VAE approaches.
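For context on the metrics mentioned above, the MIG (Mutual Information Gap) score is conventionally computed as the average, over ground-truth factors, of the gap between the two largest latent-factor mutual informations, normalized by each factor's entropy. The sketch below illustrates that standard computation from a precomputed mutual-information matrix; the function and argument names are illustrative, and the dissertation's own evaluation code (including the proposed $I_m$ metric) may differ.

```python
import numpy as np

def mutual_information_gap(mi, factor_entropy):
    """Standard MIG score (sketch).

    mi             : array of shape (num_latents, num_factors), where
                     mi[i, k] = I(z_i; v_k), estimated mutual information
                     between latent i and ground-truth factor k.
    factor_entropy : array of shape (num_factors,), H(v_k) for each factor.

    Returns the mean over factors of the normalized gap between the
    top-two mutual informations for that factor.
    """
    # Sort each column (factor) in descending order of mutual information.
    sorted_mi = np.sort(mi, axis=0)[::-1]
    # Gap between the most and second-most informative latent, normalized.
    gaps = (sorted_mi[0] - sorted_mi[1]) / factor_entropy
    return gaps.mean()
```

A perfectly disentangled representation concentrates each factor's mutual information in a single latent, driving the normalized gap toward 1; a representation where two latents share a factor equally drives it toward 0.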
Date
4-15-2025
Recommended Citation
Vu, Minh Hong, "Disentanglement in Representation Learning: Interpretability in Dimension Reduction With VAE" (2025). LSU Doctoral Dissertations. 6744.
https://repository.lsu.edu/gradschool_dissertations/6744
Committee Chair
Wan, Xiaoliang