Bystroff Lab Downloads

I-sites Library of sequence-structure motifs

HMMSTR-SS secondary structure prediction

HMMSTR-CM contact map prediction

HMMSTR utilities

HMMSUM pairwise alignment using a local structure-based model

MASKER molecular surface area calculator

PROTEAN torsion space molecular simulations

SCALI non-sequential structure-based alignment tool

Backbone ensemble generation for protein design in PyRosetta

InteractiveROSETTA

Click here for help compiling Fortran 90

MASKER molecular surface package

This package contains a Fortran 90 module for calculating the solvent excluded surface (SES) and the solvation free energy of any molecule in PDB format. Atomic radii and surface tensions may be set by the user. The solvation free energy is assumed to be a weighted sum of the solvent excluded surfaces of the atoms, where each atom has a weight determined by the solvation free energy of that atom type. New atom types may be defined by the user.
The downloadable package includes a program (pdbmask) for calculating the SES and the solvation energy. It can optionally write a Raster3D file as well.
Servers are available for calculating the buried surface between all pairs of amino acids in a structure (MASKER-CM), and for calculating the locations of buried void spaces within a structure (VOIDMASK).
The MASKER module compiles using gfortran.
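
The weighted-sum model described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the MASKER Fortran 90 module: the function name, the atom type labels, and the numeric weights and areas are hypothetical placeholders, and the per-atom SES areas are assumed to have been computed already (for example by pdbmask).

```python
from typing import Dict, Iterable, Tuple


def solvation_free_energy(
    atom_areas: Iterable[Tuple[str, float]],
    weights: Dict[str, float],
) -> float:
    """Sum weight(atom type) * SES area over all atoms.

    atom_areas: (atom_type, ses_area) pairs, one per atom.
    weights:    per-atom-type weight (a surface tension); user-defined
                atom types are added simply by extending this dict.
    """
    total = 0.0
    for atom_type, area in atom_areas:
        # An unknown atom type raises KeyError rather than being
        # silently ignored.
        total += weights[atom_type] * area
    return total


# Hypothetical weights and areas, for illustration only.
example_weights = {"C": 0.012, "N": -0.06, "O": -0.06}
example_atoms = [("C", 10.0), ("N", 5.0), ("O", 5.0)]
example_energy = solvation_free_energy(example_atoms, example_weights)
```

Keeping the weights in a lookup table keyed by atom type mirrors the statement that radii, surface tensions, and atom types are user-settable; the real package reads these from its own input files.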

MASKER downloads

MASKER contact map server

MASKER buried voids server

Please cite:

Bystroff C. (2002). MASKER: Improved solvent excluded molecular surface area estimations using Boolean masks. Protein Eng 15, 959 - 965 abstract PDF

HMMSTR secondary structure prediction

This package contains the programs needed to predict secondary structure starting with a sequence profile. The sequence profile (a vector of 20 probabilities for each residue in the sequence) can be the output of a profile HMM such as HMMer. It may also be the output of Psi-Blast, which uses profiles internally, or may be generated from a multiple sequence alignment. The programs in this package, HMMSTR and associated format converters, will give you a probabilistic prediction of each of the six DSSP symbols: H, E, G, S, T and _. For now, this is a bare-bones package. Note that this is a small part of the script that runs on the HMMSTR/Rosetta server. You will need a Unix system and C++ and Fortran 90 compilers to run the package.
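
To make the idea of a sequence profile concrete, here is a minimal Python sketch that builds one from a multiple sequence alignment. It is not part of the HMMSTR package and does not reproduce the profile file format the package expects. The function name, the alphabetical amino acid ordering, and the simple unweighted counting with a pseudocount are assumptions made for illustration; HMMer and Psi-Blast use more elaborate weighting.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order


def profile_from_alignment(
    aligned_seqs: Sequence[str], pseudocount: float = 1.0
) -> List[List[float]]:
    """Return one vector of 20 probabilities per alignment column."""
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    profile = []
    for col in range(length):
        # Start every amino acid at the pseudocount so that no
        # probability is exactly zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_seqs:
            residue = seq[col].upper()
            if residue in counts:  # gaps and unknowns are skipped
                counts[residue] += 1.0
        total = sum(counts.values())
        profile.append([counts[aa] / total for aa in AMINO_ACIDS])
    return profile


example_profile = profile_from_alignment(["ACD", "ACE", "A-D"])
```

Each row of the result sums to 1, which is the property the text describes: 20 probabilities for each residue position.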

Please cite:

Bystroff C, Thorsson V & Baker D. (2000). HMMSTR: A hidden Markov model for local sequence-structure correlations in proteins. Journal of Molecular Biology 301, 173-90. abstract PDF

Download Compressed TAR file

HMMSTR models

HMMSTR-CM contact map prediction

This package contains the programs needed to predict the contact potential map for a protein. As above, a sequence profile is the input. HMMSTR-CM gives you a JPEG image and a text file showing the residues that are most likely to come into contact (distance < 8.0 Å) in the folded structure. This too is a bare-bones package. This script runs as part of the HMMSTR/Rosetta server. You need a Linux/Pentium system and the gcc and pgf90 compilers to install the package.
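
The contact definition used above (two residues closer than 8.0 Å) can be illustrated on a known structure. The Python sketch below computes an observed contact map from coordinates; it does not perform any prediction and is not taken from HMMSTR-CM. Using a single representative point per residue (for example the alpha carbon) is an assumption, since the atoms used for the distance are not stated here, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
CONTACT_CUTOFF = 8.0  # Angstroms, the threshold quoted in the text


def contact_map(
    coords: Sequence[Point], cutoff: float = CONTACT_CUTOFF
) -> List[List[bool]]:
    """Symmetric boolean matrix: True where distance < cutoff.

    coords holds one representative point per residue (assumed).
    The diagonal is left False because a residue is not counted as
    being in contact with itself.
    """
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = True
                contacts[j][i] = True
    return contacts


# Four points spaced 3.8 Angstroms apart along a line (toy data).
example_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0),
                  (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
example_contacts = contact_map(example_coords)
```

A map computed this way from a solved structure is what a predicted contact map would be compared against.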

Please cite:

Shao Y & Bystroff C. (2003a). Predicting inter-residue contacts using templates and pathways. Proteins, Structure, Function and Genetics 53 Suppl 6:497-502. abstract PDF

Download Compressed TAR file

PROTEAN molecular simulations

PROTEAN is a set of Fortran subroutines for calculating the equations of motion in torsion space for polypeptides. Torsion space is the space of all rotatable bonds. Bond lengths and bond angles remain fixed at their ideal values in a torsion space simulation. Simulations in torsion space are at least ten times more efficient than simulations in Cartesian space. PROTEAN is lightly documented, but the adventurous student of molecular simulations may discover the hidden script language by trial and error, and with the help command. Requires a Fortran compiler such as PGF90.
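
Each degree of freedom in torsion space is a dihedral angle defined by four consecutive atoms around a rotatable bond. As a small illustration of that coordinate, the Python sketch below measures a torsion angle from Cartesian positions using the standard atan2 formula. It is not PROTEAN code and does not integrate any equations of motion; the function and helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    # atan2 keeps the sign of the rotation, unlike acos of the
    # angle between the two plane normals.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: rotating the last atom about the z axis.
cis_angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
right_angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans_angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Because bond lengths and bond angles stay fixed, a list of such angles is enough to describe a conformation, which is why a torsion space model has far fewer variables than a Cartesian one.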

Download Compressed TAR file

If you find this useful, please cite:

Bystroff, C. (2001) An alternative derivation of the equations of motion in torsion space for a branched linear chain. Protein Engineering 14, 825-828. abstract

SCALI: Non-sequential structure-based alignments.

Proteins of the same class often share a secondary structure packing arrangement but differ in how the secondary structure units are ordered in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing the compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence, and compactness. The new program, SCALI (Structural Core ALIgnment), can efficiently find conserved packing arrangements, even if they are non-sequentially ordered in space. SCALI alignments conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. For example, the 3-layer sandwich domain architecture may be divided into four structural subclasses based on internal packing arrangements. These subclasses represent an intermediate level of structure classification, more general than topology but more specific than architecture as defined in CATH. A strategy is presented for developing a set of predictive hidden Markov models based on multiple SCALI alignments.
An online SCALI structure comparison server is available here.

Download compressed TAR file

If you find this useful, please cite:

Yuan X & Bystroff, C. (2005) Non-sequential Structure-based Alignments Reveal Topology-independent Core Packing Arrangements in Proteins. Bioinformatics 21(7):1010-1019. abstract

PDF

HMMSUM structure-based substitution matrices

HMMSUM (HMMSTR-based SUbstitution Matrices) is a new model for structural context-based amino acid substitution probabilities consisting of a set of 281 matrices, each for a different sequence-structure context. HMMSUM does not require the structure of the protein to be known. Instead, predictions of local structure are made using HMMSTR, a hidden Markov model for local structure. Alignments using the HMMSUM matrices compare favorably to alignments carried out using the BLOSUM50 matrix when validated against curated remote homolog alignments from BAliBASE. HMMSUM has been implemented using local dynamic programming and with the Bayesian Adaptive alignment method. The download package contains the essential programs from HMMSTR (see above) and the HMMSTR model itself, along with Smith-Waterman local dynamic programming and Bayesian Adaptive alignment programs, modified to use the HMMSUM matrices. A server for HMMSUM alignment is under construction.
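
The general idea of local dynamic programming with context-dependent substitution scores can be sketched as follows. This Python example is not the HMMSUM code. How HMMSUM selects or combines its 281 matrices is not described on this page, so the sketch makes a simplifying assumption: each position of the first sequence carries one hard context label, and that label picks the scoring table. The two toy contexts, their scores, the linear gap penalty, and all names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def smith_waterman_score(
    len_a: int,
    len_b: int,
    score: Callable[[int, int], float],
    gap: float = -4.0,
) -> float:
    """Best local alignment score with a linear gap penalty.

    score(i, j) returns the substitution score for aligning position
    i of sequence A with position j of sequence B (0-based), so it
    may depend on position-specific context, not only residue types.
    """
    best = 0.0
    prev = [0.0] * (len_b + 1)
    for i in range(1, len_a + 1):
        curr = [0.0] * (len_b + 1)
        for j in range(1, len_b + 1):
            curr[j] = max(
                0.0,                                # start a new alignment
                prev[j - 1] + score(i - 1, j - 1),  # align i with j
                prev[j] + gap,                      # gap in B
                curr[j - 1] + gap,                  # gap in A
            )
            best = max(best, curr[j])
        prev = curr
    return best


def context_scorer(
    seq_a: str,
    seq_b: str,
    contexts_a: Sequence[str],
    tables: Dict[str, Tuple[float, float]],
) -> Callable[[int, int], float]:
    """Pick (match, mismatch) scores by the context of position i."""
    def score(i: int, j: int) -> float:
        match, mismatch = tables[contexts_a[i]]
        return match if seq_a[i] == seq_b[j] else mismatch
    return score


# Toy data: two contexts standing in for a much larger matrix set.
toy_tables = {"H": (2.0, -1.0), "E": (3.0, -1.0)}
toy_scorer = context_scorer("ACDE", "CD", ["H", "H", "E", "E"], toy_tables)
toy_best = smith_waterman_score(4, 2, toy_scorer)
```

Passing the scoring function as an argument keeps the recursion identical to ordinary Smith-Waterman; only the source of the substitution score changes.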

Download compressed TAR file

If you find this useful, please cite:

Huang, Y-M, & Bystroff, C. (2006) Improved pairwise alignment of proteins in the Twilight Zone using local structure predictions. Bioinformatics 22(4):413-422 PDF

Ensemble generation scripts for PyRosetta

Motivation: Mutations in homologous proteins affect changes in the backbone conformation that involve a complex interplay of forces and are hard to predict. Protein design algorithms need to anticipate these backbone changes in order to accurately calculate the energy of the structure given an amino acid sequence, and they must do so without the knowledge of the final, designed sequence.
Results: We explored the ability of the Rosetta suite of protein design tools to move the backbone from its position in one structure (template) to its position in a homolog structure (target) as a function of the diversity of the backbone ensemble, the percent sequence identity, and the size of the local zone being modeled. We describe a Pareto front in the likelihood of moving the backbone toward the target as a function of ensemble diversity and zone size. The numbers presented here will be useful for homology modeling and for protein design using the piecemeal approach.
ensemblegen.py superimp.py

NOTE: Requires PyRosetta and mpi4py

If you find this useful, please cite:

Schenkelberg, C. D., & Bystroff, C. (2016). Protein backbone ensemble generation explores the local structural space of unseen natural homologs. Bioinformatics, 32(10), 1454-1461.

InteractiveROSETTA

Summary: Modern biotechnical research is becoming increasingly reliant on computational structural modeling programs to develop novel solutions to pressing scientific questions. Rosetta is one such protein modeling suite that has already demonstrated wide applicability to a number of diverse research projects. Unfortunately, Rosetta is largely a command-line driven software package, which restricts its use among non-computational researchers. Some graphical interfaces for Rosetta exist, but typically are not as sophisticated as commercial software. Here we present InteractiveROSETTA, a graphical interface for the PyRosetta framework that presents easy-to-use controls for several of the most widely-used Rosetta protocols alongside a sophisticated selection system utilizing PyMOL as a visualizer. InteractiveROSETTA is also capable of interacting with remote servers running a standalone Rosetta install, rendering it easy to incorporate more sophisticated protocols that are not accessible in PyRosetta and/or require significant computational resources.
Availability: InteractiveROSETTA is freely available at github.com/schenc3/InteractiveROSETTA. This Python script requires Python and a separate download of PyRosetta, which is available at http://www.pyrosetta.org after obtaining a license (free for academic use).
Contact: schenc3@rpi.edu, bystrc@rpi.edu
If you find this useful, please cite:

Schenkelberg, CD & Bystroff, C. (2015) InteractiveROSETTA: A client graphical user interface for the PyRosetta and Rosetta protein modeling suite. Bioinformatics btv492

email me

Last updated Tue Jan 3 16:39:41 EST 2017