From: Andrew Dalke (dalke_at_ks.uiuc.edu)
Date: Fri Mar 07 1997 - 16:22:57 CST

In this week's issue of VMDTech we decided to cover some of the
details about how VMD makes its images. There isn't much about the
scripting language in this one, but it does get to be somewhat long,
so is split into two parts. Next week's version will cover the
rendering styles not mentioned here.

                                                Andrew Dalke
                                                dalke_at_ks.uiuc.edu
P.S.

  As we just found out, the VMD online web registration form at
http://www.ks.uiuc.edu/Research/vmd/VMDfeedback.html wasn't configured
properly, and hasn't been so for almost a year. If you were one of
the 30 or so people that registered your use during that time (or want
to do now), please fill out that form. We keep track of this
information so we have an idea of how many users there are.

 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
                        Drawing Methods

  All of the drawing methods are implemented in one of three files,
DrawMolItem.C, DrawMolItem2.C, and DrawMolItemSurface.C and for
version 1.2 there is a helper file, DrawMolItemSolventPoints.data.
Nearly all of the functions are implemented in the first file, and
I'll go through the list of them and explain how they work. Actually,
this order is somewhat different than the list you get from the pop-up
menu in the graphics form, but I think this way lends itself better to
understanding.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        Points

  Probably the easiest to understand, this draws a point at the
location of the atom.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        VDW

  For each selected atom, draw a sphere centered at the atom position
with radius scaled by the atom name. Sounds easy enough, but there
are some difficulties.
  First off, the GL (and OpenGL) graphics libraries don't have a
sphere primitive. Instead they provide another library which
constructs the spheres out of triangles. The GL sphere library (see
'man libsphere' uses recursive bisection to generate these triangles.
At the most primitive level it starts with a shape, like a square
bipyramid (two pyramids with a square base joined base-to-base). For
each level of recursion, given a triangular face, compute the
endpoints of the centers and scale it so it is on the surface of the
sphere. Given the endpoints and the centers you can construct four
sub-triangles. Thanks goes to Jon Leech <leech_at_cs.unc.edu> for
explaining this and for the following picture and pseudocode:

         Subdivide each triangle in the old approximation and normalize
          the new points thus generated to lie on the surface of the unit
          sphere.
         Each input triangle with vertices labelled [0,1,2] as shown
          below will be turned into four new triangles:
         
                               Make new points
                                   a = (0+2)/2
                                   b = (0+1)/2
                                   c = (1+2)/2
                 1
                /\ Normalize a, b, c
               / \
             b/____\ c Construct new triangles
             /\ /\ [0,b,a]
            / \ / \ [b,1,c]
           /____\/____\ [a,b,c]
          0 a 2 [a,c,2]

  Luckily, the sphere library takes care of the all the details, so
the only thing special we have to worry about is the initial shape to
use (the square bipyramid) and the number of levels of recursion. Of
those, the only one available to you is the recusion level, which is
the "Sphere Res" control in the graphics form.

  A downside of this method is it is slow. Granted, all the
highlights and 3D perspective a good, but for most cases, Rasmol using
memory-mapped X calls beats out VMD by a factor of 30 or so (but if
you make the spheres _really_ big, Rasmol is slower and can core
dump).

  The other problem is how to determine the radius. Properly you want
to get the radius from the type of atom, but getting the atom type is
a notorious problem. The atom name from the PDB file is allocated
four characters, the first two are (usually) for the atom element and
the last two specify the position. This means 'HG ' is a mercury and
' HG ' is the hydrogen off the gamma carbon. I say usually because
you can get things like '1HG2' for the first hydrogen of the second
gamma carbon.

  Anyway, VMD doesn't quite follow this specification, mostly because
we didn't know about it when we implemented the PDB read/write
functions (and neither, as it turns out, do quite a few other
programs). It is somewhat hard to retrofit that nuance into things
and it may not be useful since the new PDB defintions now specifies an
element field off to the far right of the ATOM record. So now we are
faced with the possibility of reading several different formats of PDB
files, at which point I start to pull my hair out.

  In the end, we decided just to read the first character of the name
and use that to detemine the radius, and let you scale the radius as
need be. This code is in BaseMolecule.C in default_radius (and
default_mass for the mass and default_charge for charge). Actually,
looking at the code we should probably look at the first non-numeric
character, so I'll have to add that to the list of things to fix.

  You can override this if you want. For instance, suppose all the
HGs in residues named ION of the "top" molecule are mercury ions, and
they should all have a radius of 1.9A (no, I don't know the real
radius of mercury, but I do have a reference somewhere)

Then you can do:

  set sel [atomselect top "name HG and resname ION"]
  $sel set radius 1.9

You'll also have to press "Apply Changes" in the graphics form to
update everything that depends on the radius. BTW, you can also set
the mass and charge on an atom-by-atom basis.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                Dotted

  This one is much easier -- instead of drawing a triangle for every
triangle found with the recursive bisection used for VDW, just draw a
point for each vertex. Otherwise the options (and problems) are the
same.

  It doesn't do a good job at describing a dot surface. The recursive
bisection method doesn't give a uniform distribution over the surface
and there is no checking to see if the points overlap with another
atoms

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                Solvent

  Instead, try 'Solvent', which is new in the 1.2 release. This uses
several (precomputed) dot distributions to give a more uniform
coverage of the surface. The method VMD uses to check for overlaps
isn't technically correct, but it is fast and works well enough.

  For each point of the surface distribution (of radius r = atom
radius + probe radius) of atom i, check each of the atoms j to which
it is covalently bound. If the point is too close to j, don't display
it. Also, if the point is to close to any neight k of j (k != i) then
don't draw it. This is fast since there aren't that many neighbors to
check, but it doesn't omit parts of the surface in contact with atoms
which aren't one or two bonds away. This can be considered a good
thing since you might get a better idea of the contact surface.

  There are three parameters for this option. One is the probe
radius, which was mentioned in the description. If the probe radius
is too large, the problem of over-lapping surfaces between
non-connected atoms becomes more apparent. The second is "Detail,"
which should probably be renamed "Density" as it determined the
surface density of the distributions. The higher the detail, the
higher the density.

  The final option is the "Method." By default the surface is drawn
as a point, but a point is a pixel in size no matter the scale of the
molecule, so when scaled small the surface density appears high, and
when scaleed large, the density appears low.

  Method 2 draws little plus signs instead of points, which does scale
better so the density appears more contant. Method 3 draw lines
between the surface points that are on the same atom, but makes no
attempt to connect the two spheres. It doesn't look to bad, but I'm
not sure about its usefulness, so I'll leave that to you to decide.

  Thanks to Jan Hermans <hermans_at_med.unc.edu> for implementation
pointers and thanks again to Jon Leech <leech_at_cs.unc.edu> for the code
to compute the uniform point distributions. That code will be
included as part of the 1.2 distribution. It does an electostatic
minimization of N charges on a sphere. When N gets large, this
computation takes a minute or so, so instead of computing these values
on the fly, they are precomputed and saved in the file
DrawMolItemSolventPoints.data.

  
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

                                Lines

  This option is also known as 'wireframe'. It adraws a line between
an atom and the atoms to which it is bonded. Both atoms have to be
selected for the bond to be drawn (we tried drawing half-bonds when
only one atom is selected, but that was very confusing). The first
half of the bond is colored appropriately for the first atom, and the
second half for the second atom.

  The only parameter for this option is the line thickness. On some
SGIs you'll only see a difference between sizes of 0, 1 and 2, but
anything past 2 looks the same as 2. See the man page for linewidth
to understand why this is so.

  The biggest problem with drawing things as lines is trying to
determine the bond connections. VMD only supports one file format
that lists the complete bond topology, and that is an XPlor PSF file
(though we are working on AMBER and other formats). The PDB files
does have CONECT records, but the documentation says they are used for
non-standard connections, such as lipids. Thus, to do this properly
VMD would have to have to know the standard definitions and how the
use the CONECT records, and then for systems which don't have CONECT
infomation and/or are not standard (like a buckeyball) still have to
have a way to construct the information on the fly.

  It was easier just to give our best guess, and for the most part
this works pretty well. The function find_bonds_from_coordinates in
MoleculeFilePDB.C does a distance search, and if two atoms are within
R1*R2*0.6 of each other, where R1 and R2 are the respective radii,
then they are considered bonded. Note that this check only happens
once, so if you redefine the radii the bonds are not recomputed.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                Bonds

  Everything about this options is the same as lines except that
instead of drawing a line between two atoms is draw a cylinder.
Again, to be technically correct it draws an n-sided prism where the
number of sides is determined in the graphics form by the "Bond Res"
control and the radius is the value of "Bond Rad", in Angstroms.

  Actually, I take that back. Imagine two hollow cylinders coming
together so the center of the face of one end is in the same position
as the center of a face of the other cylinder. Suppose these two
cylinders come together at 90 degrees. Part of the two cylinders will
overlap, but there will also appear to be a gap, like this (not the
best ASCII drawing I've made :)

    ----------- Gap
    S |
    S ---+---
    S | | |
    -------+--- |
           | |
           | |
           |~~~~~|

What VMD does is extend both cylinders somewhat so the far ends touch.
This produces more of an overlap which is still noticable if you look
at it closely, but it is much less noticable than the gap.

  If you think about this you might wonder what happens when three or
more bonds join at one atom. What VMD actually does is choose the
lowest numbered bond and extend all other bonds to meet with that one.
It also extends the lowest numbered bond to meet with the second
lowest numbered one. A bit tricky, but not hard to do.

  The default bond resolution of 8 is too high and makes things slow
without much gain in appearance, so I've got to fix that as well. If
the radius or number of sides gets too small, the bonds are drawn as
lines.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                CPK

  This is a combination of VDW and Bonds (except for the extension
aspect), though the bond radius and the sphere radius are scaled down a
bit. I don't know of many people who use this option.

  If the values for a sphere or bond are too small, that attribute is
not drawn, so you can actually see the gap I was talking about by
setting the "S Rad" to 0 and increasing the "B Rad" to 1.3 or so.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                Licorice

  This is another combination of VDW and Bonds (again, except for the
extension aspect which isn't needed since the sphere fills in the
empty part). In this case, the radius of the spheres does _not_
depend on the radius of the atom; it is controled by the bond radius
so that the bonds and the spheres are the same size. This produces a
very nice image, but since this and CPK use many spheres and cylinders
they tend to be slow.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                Trace

  The Trace option is the first method described so far which produces
a reduced representation of the system. The others would show all the
selected atom or bonds whereas this only draws connections between
successive residues. As with the bonds, these are drawn as n-sided
prisms, with the values of the radius and the resolution as chosen in
the graphics form and the cylinders extended to form a smooth join.
If either of the parameters are too small, the trace is drawn as
lines.

  The more complicated problem is how to determine which residues to
follow when building the trace. The obvious method is if two residues
have sequential resids (and are in the same chain/segment) then
connect the two CA atoms (hopefully they are carbons and not
calciums). But what if they have the same resid but successive
insertion codes (you can find these, for example, when the resids are
based on a homologous match to some other protein, but there are a
couple new residues, as in loops)? VMD doesn't even read in the
insertion codes (at some point nature just gets too complicated) so
the original method would fail here.

  In a similar vein, you can have structures with missing resids,
again because the numbers actually came from a homology comparison
(this, by the way, is why you can also find some residues with a
negative resid).

  Instead, we do things the harder (but more satisfying) way.
Assuming the original bond connectivity is correct we build up the
definition of the molecule. Atoms that are connected together and
have the same resid and are not in different segments are put in one
residue and checked to see if it is a protein, nucleic acid, or water.
If two protein residues are connected via a bond between backbone
atoms, then these are part of the same protein. Another data
structure is built that lists the proteins, starting from the amino
end, and the nucleic acids, starting from the 5' end.

  Once this data is available, it is a simple matter of going down the
list and drawing the bonds between the CA's of proteins and between
the P's of nucleic acids. But once again, real life is more
complicated. There are some systems which only have CA's and/or P's
because the full structure could not be accurately resolved. In this
case, VMD can't find any residues since no bonds were generated.
Instead, if no proteins were found it goes back to the original method
and draws a connection between CAs and Ps of successivly numbered
resids, and leaving a gap where the numbers change. (Besides, if you
want to view a calcium gas, why would you want to view the trace?)

  An additional advantage of this second implementation is that those
people doing threading or protein folding and only look at the CA
coordinate and the residue name don't have to worry about building all
the sidechains to see their structure. Also, people working on
polymers can fake their structure by just naming everything " CA ".

  Thinking about this now, it would be nice, and hopefully useful, if
the tube representation (to be discussed next week) could also handle
the case where there are only C-alphas. Yet another something to add
to the list of things to do for the final 1.2 release.

 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =