From: Sergei Izrailev (sergei_at_ks.uiuc.edu)
Date: Wed Jun 11 1997 - 16:45:17 CDT

Hello,

In this issue of VMDTech we present Hingefind -
a novel algorithm to investigate domain motions in proteins
(by Willy Wriggers).

---------------------------------------------------------------------

1. General remarks
==================

Hingefind is an algorithm for the identification of domain
movements and their characterization and visualization by effective
rotation axes (hinges). The program compares two known structures
(e.g. two different crystal structures of a protein or the results
of molecular dynamics simulations) and partitions the protein with a
prespecified tolerance in preserved subdomains. It then determines
effective rotation axes which characterize the domain movements with
respect to the reference domain ("rigid core"). With little
modification, the parts of the algorithm can also be used alone,
i.e. one can assign domains manually and let the algorithm determine
the movements between the domains, or one can use the algorithm to
partition a protein into preserved subdomains. The method does not
require any previous knowledge about functionally relevant domains
or hinged domain motions, however a critical analysis of the results
using the program output is recommended. The output files provide
information about the accuracy of the found effective rotation. The
variety of options and tolerances allow to find an optimal
partitioning. The user can assess the validity of the depicted
movement and change the script if necessary.

2. Examples
===========

Examples of the application of the method can be found at
http://www.ks.uiuc.edu/Research/METHOD/domain_mov/ - or simply
follow the instructions below in 6.) and check it out yourself.

3. About the method
===================

The algorithm is separated in two parts: the "partitioning" and the
"rotational fit" section. The "partitioning" part determines
domains with preserved structure in the two compared coordinate
sets, depending on a prespecified tolerance. The method uses the
least-squares fit method (W. Kabsch, 1976) to superimpose
structures. A domain is found in a iterative "adaptive selection"
procedure, in which poor matching residues are excluded from the
domain and good matches are included (in "slo" mode partitiononing,
available in the X-PLOR version, only the heaviest connected set is
considered, maintaining the connectivity of the changing domain).
After a partitioning of the protein the rigid domains are sorted by
their size and domains with a size larger than a cutoff considered
for further evaluation.

The "rotational fit" method attempts to locate a rotation axis
which characterize the movement of the domain as a hinged rigid body
motion. The problem is to find an approximation of the rigid body
motion with the constraint that transformations are not allowed,
only rotations about the unknown hinge axis. Note that the the
rotation about an axis without translation, in general, will not
yield the closest fit of the Kabsch least-squares method. The
accuracy of the approximation can be assessed by comparing the
least-squares fit with the proposed rotational fit. The relative
error (in percent) is computed as

[ RMSD (rot. fit) - RMSD (least sq.) ] / v,

where v is the COM separation of the domain in both positions.

The constructed rotation upholds (similar to a least-squares fit)
the removal of the COM difference between the two coordinate sets,
but approximates the rotation relative to a least-squares fit. Thus,
the validity of the approximation can also be assessed by checking
the "projection angle" beta between the least-squares rotation axis
and the hinge axis, which should be small. One finds that the method
works best for larger domains comprising several secondary
structure elements. The projection angle beta and the relative error
can be found in the output of the Hingefind runs.

4. Implementations
==================

Currently, users have the choice of two implementations of Hingefind.

* X-PLOR scripting language implementation.
* Tcl plug-in for our free molecular graphics program vmd.

The two versions of Hingefind are complementary in their
application. The X-PLOR version is designed for longer batch jobs
which allow the user to run a large number of trial runs in various
modes. The Tcl version of the program runs with our free molecular
graphics package vmd and offers the user greater flexibility in the
selection of suitable atoms for comparison. It is designed for
interactive use. For efficiency in interactive use, the Tcl version
does not provide the option to maintain the spatial connectivity of
found domains.
In the following, I will only discuss the Tcl version for vmd. The
X-PLOR version of hingefind is described at URL
ftp://lisboa.ks.uiuc.edu/pub/hingefind/hingefind_xplor.html

5. About the Tcl version
========================

The Tcl (John Ousterhout, 1988) version of Hingefind is designed
for interactive use with our molecular graphics package vmd. The
version for vmd has the advantage that it only requires PDB files as
input. Thereby, selections of atoms are easier to adapt to the
user's needs than with X-PLOR. E.g., conformations of nonidentical
proteins can be compared, as long as pairs of atoms in both
structures can be matched sequentially.

6. The Tcl source file for vmd
==============================

There is a single file at the ftp site
(ftp://ftp.ks.uiuc.edu/pub/hingefind/hingefind.tcl) which contains
the algorithm.

Start up vmd and source the Tcl script by typing

source hingefind.tcl

in the vmd command shell and return. The default setup of the
script loads two structures of lactoferrin from the PDB databank. To
load these structures, type

load

and return. The procedure load now initializes the program, selects
the pairs of corresponding atoms (default: alpha carbons) and
initializes some global variables. To partition the protein
structures with a given tolerance epsilon, type

partition epsilon

and return. The tolerance epsilon must be entered in Angstrom
units. The tolerance should be larger than the imprecision of the
coordinates in the two structures (local noise level), but smaller
than the total rms deviation of the two conformations. The protein
now iteratively finds the rigid domains, sorts them by size, and
draws tubes in the color of the domain-id, starting at domain-id 0,
the largest found domain. To determine the effective rotation axis
(hinge) and other information about the relative movement of two
domains domid1 and domid2, enter

hinge domid1 domid2

and return. Domain domid1 is the reference domain and domain domid2
the moving domain. Both structures are superimposed by the
reference domain. The rotation axis is shown as an arrow with it's
orientation representing a left-handed rotation about the axis. A
"pivot" point can be found in the middle of the arrow which is
connected to the COM of the moving domain in both structures to
illustrate the rotation angle. The rotation angle and other useful
information about the run, the domains, and the accuracy of the
rotational fitting can be found in the output (vmd command shell).

An example vmd session would look like this:

source hingefind.tcl
load
partition 2.0
hinge 0 1
hinge 0 2
partition 1.8
hinge 0 1

The user has to edit the procedure load to select different PDB
files for comparison, to change the selection of atoms for
comparison, or to customize the hingefind computation by resetting
some of the global variables. Here is a partial list of variables
which may be changed:

* $maxccounter: The number of maximum cycles of the "converge"
loop. In case the algorithm does not converge within the specified
number of cycles (this was very rarely observed at extreme
tolerances), a warning message is written.

* $ndomains: The max. number of domains to be found. Recommended:
Set very large to yield full partitioning. Small values (2-5) should
be used for test runs.

* $cutdom: Minimum domain size of the found domains.

The initial rendering of the loaded molecules may also be altered
in procedure load. By default molecule 1 is shown in purple and
codes with colored tubes for the found domains, while molecule 2
remains a solid white line which follows the backbone.

7. Output
=========

The rotation angle and other useful information about the hinge
rotation, the domains, and the accuracy of the rotational fitting
are written to the vmd command shell. The information includes:

* the pivot point of a rotation
* the effective rotation axis (hinge)
* the effective rotation angle in degrees
* the rmsd (least squ.) of the moving domain
* the rmsd of the moving domain after hinge rotation
* the relative error (see Hingefind homepage)
* the projection (deviation) angle beta in degrees (see Hingefind
homepage)

8. About the Tcl code
=====================

A programmer can find more information about available vmd commands
and routines at the vmd user's guide. All procedures and variables
are documented in the program. The datastructures used to store
structural information and domain membership are briefly described
here:

* list of lists of indices - Used by procedures criterion,
superimpose, seed and convergence to store information about
selected atomindices in molecules 1 and 2. A list containing a pair
of lists of equal length. The first stores the atom indices of
molecule 1. The second list stores the corresponding atom indices of
molecule 2.

* list $dindex - List of domain membership indices for each atom of
$full1 (atom selection in procedure load). This list gets updated
by procedure update_boundaries once a domain has been found.

* list $sup_rmsd - Contains a history of conformance of each atom
of $full1 with previously found domains. Procedure update_boundaries
updates the domain membership indices only for those atoms which
show an improved conformance with the new domain. The conformance of
an atom is measured by the separation of the corresponding atoms in
molecule 1 and 2 depending on the domain chosen for superposition.

* 'domindex' arrays - after the protein is fully partitioned into
domains, the procedure partition transfers the membership
information from the list $dindex to the arrays $domindex1($i) and
$domindex2($i), which contain the atom indices of molecules 1 and 2
corresponding to each domain $i. The domindex arrays are then sorted
in the procedure sort_n_render to be used for the determination of
the effective rotation axis in procedure hinge.

9. Reference
============

W. Wriggers and K. Schulten, Protein Domain Movements: Detection of
Rigid Domains and Visualization of Hinges in Comparisons of Atomic
Coordinates, Proteins (1997) In press.

10. Correspondence
==================

The author would appreciate any comments regarding the usefulness
of the algorithm. Please send any comments to wriggers_at_ks.uiuc.edu.

--------------------------------------------------

Sergei Izrailev

--