VAE-Assisted Data Augmentation for Improved Molecular Prediction with Graph Neural Networks (GNNs) in Low-Data Regimes

Document Type

Article

Publication Date

1-1-2025

Abstract

This study presents a novel approach to enhancing molecular property prediction through variational autoencoder (VAE)-assisted data augmentation in low-data regimes. The methodology combines graph neural networks (GNNs) with VAEs to improve predictive accuracy on molecular datasets from MoleculeNet, specifically ESOL (water solubility) and FreeSolv (hydration-free energy). By generating chemically valid molecules that align with the original dataset's chemical space, the approach enhances model performance, particularly for graph attention networks (GATs). Results show significant improvements in prediction accuracy, with GAT models demonstrating increased R2 values from 0.879 to 0.918 for FreeSolv and 0.873 to 0.885 for ESOL when trained on augmented datasets. The study validates the effectiveness of VAE-generated molecules through chemical space analysis and property distribution comparisons, offering a promising solution for molecular property prediction in data-limited scenarios.

Publication Source (Journal or Book title)

Chemical Engineering Transactions

First Page

1021

Last Page

1026

This document is currently not available here.

Share

COinS