Invited Speaker


Dr. Eric J. Sorin


Associate Professor
Department of Chemistry & Biochemistry, California State University, Long Beach, USA
Speech Title: Overcoming the Heuristic Nature of k‑Means Clustering: Identification and Characterization of Binding Modes from Simulations of Molecular Recognition Complexes

Abstract: The accurate and reproducible detection and description of thermodynamic states in computational data is a nontrivial problem, particularly when the number of states is unknown a priori and for large, flexible chemical systems and complexes. To this end, we report a novel clustering protocol that combines high-resolution structural representation, brute-force repeat clustering, and optimization of clustering statistics to reproducibly identify the number of clusters present in a data set (k) for simulated ensembles of butyrylcholinesterase in complex with two previously studied organophosphate inhibitors. Each structure within our simulated ensembles was depicted as a high-dimensionality vector with components defined by specific protein−inhibitor contacts at the chemical group level and the magnitudes of these components defined by their respective extents of pair-wise atomic contact, thus allowing for algorithmic differentiation between varying degrees of interaction. These surface-weighted interaction fingerprints were tabulated for each of over 1 million structures from more than 100 μs of all-atom molecular dynamics simulation per complex and used as the input for repetitive k-means clustering. Minimization of cluster population variance and range afforded accurate and reproducible identification of k, thereby allowing for the characterization of discrete binding modes from molecular simulation data in the form of contact tables that concisely encapsulate the observed intermolecular contact motifs. While the protocol presented herein to determine k and achieve non-heuristic clustering is demonstrated on data from massive atomistic simulation, our approach is generalizable to other data types and clustering algorithms, and is tractable with limited computational resources.

Keywords: molecular dynamics, enzyme inhibition, interaction fingerprint, contact motif, contact table
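
Illustrative sketch: The abstract describes repeating k-means many times and choosing k by minimizing the variance and range of cluster populations. The short Python example below is one possible reading of that idea, not the authors' implementation. It assumes that cluster populations are sorted by size within each repeat and that variance and range are then computed per size rank across repeats and summed. The function names (kmeans, sorted_populations, population_spread, scan_k), the plain Lloyd's k-means, and the default of 20 repeats are all hypothetical choices made for illustration; real input would be the interaction fingerprint vectors rather than arbitrary points.

```python
import random
from statistics import pvariance


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumption: initial centers are k distinct points drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sorted_populations(labels, k):
    """Cluster populations, largest first, so repeats can be compared by rank."""
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    return sorted(counts, reverse=True)


def population_spread(runs):
    """Sum over size ranks of the population variance and range across repeats."""
    per_rank = list(zip(*runs))
    variance = sum(pvariance(rank) for rank in per_rank)
    spread = sum(max(rank) - min(rank) for rank in per_rank)
    return variance, spread


def scan_k(points, k_values, repeats=20, seed=0):
    """Repeat k-means for each candidate k and score population reproducibility."""
    rng = random.Random(seed)
    scores = {}
    for k in k_values:
        runs = [sorted_populations(kmeans(points, k, rng), k)
                for _ in range(repeats)]
        scores[k] = population_spread(runs)
    return scores
```

Under these assumptions, calling scan_k(points, range(2, 11)) returns a (variance, range) pair for each candidate k, and low values mark k at which repeated clusterings give consistent populations. Note that k = 1 always scores zero under this metric, so trivial values of k would need to be excluded or handled separately.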


Biography: Research in the Sorin Lab focuses on using molecular modelling and simulation to examine the structure and dynamics of biological molecules of varying sizes and chemical compositions. The Folding@Home Distributed Computing network allows us to use hundreds of thousands of personal computers donated by clients from around the world to run large numbers of molecular simulations, giving us the ability to predict ensemble average kinetic, thermodynamic, and structural/mechanistic properties, and thereby bridging the gap between single-molecule and bulk experimental measurements. We use this infrastructure to simulate tens of thousands of molecular systems at a time, including (a) probing the physics of RNA folding dynamics and energetics, (b) studying the intermolecular interactions within enzyme-inhibitor complexes, and (c) investigating the structural and energetic implications of single-point mutations in the collagen triple-helix structure, as related to human health and disease.