Predicting Folding Dynamics and Thermodynamic Stability of Viral Proteins Using Graph Neural Networks
Abstract
Protein folding and stability are central to virology because these structural properties influence viral infectivity, immune recognition, and drug-target interactions. Despite decades of progress in computational biology, accurate and scalable prediction of folding dynamics and thermodynamic stability in viral proteins remains a major challenge, owing to the high structural heterogeneity and mutation-driven variability of viruses and the complex topologies of their proteins. Molecular dynamics simulations, the traditional approach to protein folding prediction, are computationally expensive, and sequence-based models often ignore critical spatial dependencies. To overcome these limitations, this work presents a framework that encodes viral protein tertiary structures as residue-level graphs and uses a Graph Neural Network (GNN) to predict folding states and Gibbs free energy (ΔG)-based thermodynamic stability simultaneously. We assembled a curated dataset of experimental viral protein structures from the Protein Data Bank (PDB), cross-referenced with matched ΔG annotations from the ProTherm database. Proteins are modeled as undirected graphs in which nodes correspond to amino acid residues and edges are defined by spatial proximity and bonding patterns between residues. Node features comprise evolutionary conservation scores, B-factors, solvent accessibility, hydrophobicity indices, and 3D coordinates. The model combines multi-layer graph convolutions with self-attention-based message passing to learn hierarchical, spatially informed representations across protein topologies. Folding states were classified from contact map continuity, residue clustering, and secondary structure annotations, and ΔG-based stability was estimated by supervised regression.
The model achieved an average classification accuracy of 92.6% and a Pearson correlation of 0.89 for ΔG prediction, outperforming baseline convolutional and sequence-based deep learning models by a substantial margin. These results indicate that GNNs provide a scalable and biologically interpretable framework for viral protein structure–function modeling. This work bridges the gap between static structural information and the dynamic behavior of proteins, providing tools for viral surface protein characterization, mutation effect prediction, and rational therapeutic design.
Keywords: Graph Neural Networks (GNNs), Protein Folding Prediction, Thermodynamic Stability, Viral Proteins, Structural Bioinformatics, Gibbs Free Energy (ΔG), Protein Data Bank (PDB), ProTherm Database, Residue-Level Graph Modeling
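As an illustration of the residue-level graph encoding summarized in the abstract, the following minimal sketch builds an undirected adjacency matrix from per-residue coordinates. It is not the authors' implementation: the function name build_residue_graph, the use of one C-alpha coordinate per residue, and the 8 Å distance cutoff are all assumptions chosen for illustration, and "bonding pattern" is approximated here by linking consecutive residues along the backbone.

```python
import math

def build_residue_graph(ca_coords, cutoff=8.0):
    """Build a symmetric 0/1 adjacency matrix for a residue-level graph.

    ca_coords: list of (x, y, z) tuples, one per residue, in angstroms
               (assumed to be C-alpha positions).
    cutoff:    distance threshold in angstroms; 8.0 is an assumed,
               illustrative value, not one taken from the paper.
    """
    n = len(ca_coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Spatial-proximity edge: residues closer than the cutoff.
            close_in_space = math.dist(ca_coords[i], ca_coords[j]) < cutoff
            # Bonding-pattern edge: consecutive residues in the chain.
            bonded = (j == i + 1)
            if close_in_space or bonded:
                # Undirected graph: set both directions.
                adj[i][j] = adj[j][i] = 1
    return adj

# Toy chain of four residues: three close together, one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_adj = build_residue_graph(toy_coords)
```

In this toy chain, residue 3 is far from the others in space but still connects to residue 2 through the backbone edge. Node features such as hydrophobicity or solvent accessibility would be stored in a separate per-residue array alongside this adjacency matrix.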