MFBM-PS02

Discovering Sequence-Activity Relationships using Machine Learning: Convolutional Neural Networks (CNNs) and Gaussian Process Regressions (GPRs).

Tuesday, June 15 at 03:15pm (PDT)
Tuesday, June 15 at 11:15pm (BST)
Wednesday, June 16 at 07:15am (KST)

Tuesday (Wednesday) during the "PS02" time block.

Samarth Kadaba

University of California, Santa Barbara
"Discovering Sequence-Activity Relationships using Machine Learning: Convolutional Neural Networks (CNNs) and Gaussian Process Regressions (GPRs)."
We show how recent machine learning methods can learn representations for classes of proteins and other macromolecules that relate sequence information to predicted activity on specific substrates. In the synthetic evolution of enzymes, predicting activity is a crucial step towards finding functional variants that have not yet been screened. However, identifying sequence-activity relationships for organic polymers is complicated by latent features associated with the 3D structure of the folded molecule. Here we propose methods that use Convolutional Neural Networks (CNNs) to extract relationships between sequence and activity from 3D structural information, such as crystallographic data and Molecular Dynamics (MD) simulation data. In particular, we develop CNN feature extractors for kernels within Gaussian Process Regressions (GPRs) to make predictions over sequence space. As a demonstration, we use amino acid rotamer data to show how CNNs can predict amino acid torsion angles from a sequence of chi-angle conformation types. Applying these networks to develop kernel functions for GPRs, we predict conformational state information and activities. We also study toy models to show how deep learning approximations of macromolecular structure can yield representations of sequence-activity relationships that are potentially useful for synthetic evolution.
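The idea of a CNN feature extractor inside a GPR kernel can be illustrated with a minimal sketch. It is not the authors' implementation: the convolutional filters below are random and untrained, the sequences are random, and the "activity" is a synthetic stand-in (the count of tryptophan residues). All names (`one_hot`, `conv_features`, `rbf_kernel`, `gp_predict`) and all hyperparameters are assumptions made for illustration. The kernel is an RBF applied to convolutional features, k(s, s') = exp(-||f(s) - f(s')||^2 / 2), where f is the feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Encode a sequence as a (length, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(ALPHABET)))
    x[np.arange(len(seq)), [ALPHABET.index(a) for a in seq]] = 1.0
    return x


def conv_features(x, filters):
    """1D convolution over positions, ReLU, then global max pooling.

    filters has shape (n_filters, width, 20); the result has shape (n_filters,).
    """
    width = filters.shape[1]
    # Sliding windows over the sequence: (positions, width, 20)
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    act = np.einsum("pwc,fwc->pf", windows, filters)
    return np.maximum(act, 0.0).max(axis=0)


def rbf_kernel(A, B, length_scale=1.0):
    """RBF kernel between rows of A and rows of B (here: CNN features)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_predict(F_train, y_train, F_test, noise=1e-2):
    """Standard GP regression posterior mean and variance via Cholesky."""
    K = rbf_kernel(F_train, F_train) + noise * np.eye(len(F_train))
    K_s = rbf_kernel(F_test, F_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(rbf_kernel(F_test, F_test)) - (v**2).sum(axis=0)
    return mean, var


def random_seq(n=12):
    return "".join(rng.choice(list(ALPHABET), size=n))


# Hypothetical setup: 8 untrained filters of width 3, synthetic activity labels.
filters = rng.normal(size=(8, 3, len(ALPHABET)))
train_seqs = [random_seq() for _ in range(30)]
test_seqs = [random_seq() for _ in range(5)]
y_train = np.array([s.count("W") for s in train_seqs], dtype=float)

F_train = np.stack([conv_features(one_hot(s), filters) for s in train_seqs])
F_test = np.stack([conv_features(one_hot(s), filters) for s in test_seqs])
mean, var = gp_predict(F_train, y_train, F_test)

for s, m, v in zip(test_seqs, mean, var):
    print(f"{s}  predicted activity {m:.3f}  variance {v:.3f}")
```

In a realistic setting the filters would be learned, for example from rotamer or MD-derived data as the abstract describes, rather than drawn at random, and the GPR hyperparameters would be fitted rather than fixed.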

Hosted by SMB2021
Virtual conference of the Society for Mathematical Biology, 2021.