Judging X-ray Model Quality

Technical Note

How Structures Are Determined From Diffracted X-Rays

If you know just a little bit about where crystallographic structures come from, you'll be better equipped to use them wisely. This page provides that little bit. To simplify the language, I'll talk about proteins, but the ideas apply to all macromolecules: nucleic acids, carbohydrates, or complexes of these -- anything that will form highly ordered crystals.

To determine the structure of a protein, a team of x-ray crystallographers first grows crystals of the substance of interest. Then they irradiate the crystals with x-rays and make images, called diffraction patterns, of the diffracted x-rays that emerge from the crystal. These images can be seen as arrays of dots, called reflections, on a detector (originally a square of photographic film, but more commonly now, an electronic detector).

If diffracted x-rays could be focused by a lens, the result would be an image of the average molecule in the crystal (and hundreds of crystallographers would be looking for jobs). But no known material can focus x-rays, so crystallographers must employ computers to simulate the action of a focusing lens. For this purpose, they must measure precisely the intensity of each diffracted ray and its angle of emergence from the crystal (or equivalently, its position on the film). They also need to know the relationships among the phases of all the rays. Unfortunately, the phase information, which would all be kept in good order by a lens, is lost when the rays strike and expose the film.
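To make the loss of phase information concrete, here is a minimal sketch in Python of a one-dimensional toy "crystal". It is an illustration only, not how real crystallographic software works: the 8-point grid, the atom weights, and the function names (structure_factors, density_from, max_abs_diff) are all invented for this example. The inverse transform plays the role of the computed lens.

```python
import cmath
import math


def structure_factors(density):
    """Forward transform: one complex structure factor per reflection index h."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]


def density_from(amplitudes, phases):
    """Inverse transform: the 'computed lens' that turns reflections into a map."""
    n = len(amplitudes)
    return [
        sum(
            amplitudes[h] * cmath.exp(1j * phases[h] - 2j * math.pi * h * x / n)
            for h in range(n)
        ).real / n
        for x in range(n)
    ]


def max_abs_diff(a, b):
    """Largest point-by-point difference between two maps."""
    return max(abs(p - q) for p, q in zip(a, b))


# Hypothetical toy data: three atoms of different weight on an 8-point grid.
true_density = [0.0, 3.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0]

F = structure_factors(true_density)
intensities = [abs(f) ** 2 for f in F]            # what the detector records
amplitudes = [math.sqrt(i) for i in intensities]  # recoverable from intensities
true_phases = [cmath.phase(f) for f in F]         # lost at the detector

map_with_phases = density_from(amplitudes, true_phases)
map_without_phases = density_from(amplitudes, [0.0] * len(F))  # phases guessed as zero

print("error with true phases:", max_abs_diff(map_with_phases, true_density))
print("error with zero phases:", max_abs_diff(map_without_phases, true_density))
```

With the true phases, the map reproduces the original density to within rounding error. With the same amplitudes but every phase set to zero, the map no longer matches the density, which is why the intensities alone are not enough.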

There are various methods (too complex to describe here) of getting rough initial estimates of the phases. One method involves analyzing how the addition of a heavy metal atom to the protein alters reflection intensities. Often, even crude phase estimates allow calculation of a fuzzy molecular image, called an electron-density map. If a map is clear enough, crystallographers can at least tell where protein ends and solvent begins, and thus define the overall shape of the molecule. Then they can compute the phases that this crude model would exhibit, if it were correct. These phases serve as better estimates of the true phases.

Using these phases and the original data (intensities and positions of x-ray reflections), they calculate a new map, which should be more detailed. Perhaps continuous chains of density can now be discerned, and some or most of the amino-acid residues (previously determined from chemical sequencing) can be built to fit the map. From each new, more detailed model, crystallographers calculate new phases. If much of the new model is accurate, calculation using the new phases and the original data gives an even clearer map, into which more of the model can be built accurately.

At some point, the map exhibits all parts of the molecule, allowing all residues to be built convincingly, and even reveals images of water molecules bound to the macromolecular surface. Most of these waters are of no significance to the function of the protein, and they probably do not reside in the same positions in solution. But adding models of the water molecules revealed by the map makes the overall model more complete and accurate, and thus improves the accuracy of calculated phases and the clarity of the map. Finally, this repetitive, bootstrapping process -- model building, calculating phases, calculating new maps, and rebuilding -- converges, and the latest model gives the same phases as the previous one. The resulting model is as accurate as the data allows. The atomic coordinates of this model are then deposited in the Protein Data Bank.

In addition to the coordinates, the PDB file contains some other useful information. The file header (comments at the beginning) should include the R-factor for the model, a statistical measure that tells how well the model fits the original data. Smaller R is better. For proteins, a desirable target R-factor for a model at 2.5 angstrom resolution is 0.2. Rarely, small, well-ordered proteins may exhibit R-factors of 0.1. The R-factor for a model in early stages of structure work may be as high as 0.4, but with each successive rebuilding stage, R should decrease. Crystallographers use the R-factor to monitor how well the procedure is going.
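For readers who like to see the arithmetic, here is a minimal sketch of the R-factor in Python, using the conventional definition: the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The function name r_factor and the four amplitudes are made up for illustration, and the sketch assumes the two lists are already on the same scale (real programs also fit a scale factor, which is omitted here).

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_obs and f_calc list amplitudes for the same reflections,
    in the same order and on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must match in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections (not real data).
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]

r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f}")  # (10 + 5 + 5 + 5) / 200 = 0.125
```

A model that predicted every amplitude perfectly would give R = 0, so as the calculated amplitudes move closer to the observed ones, R falls.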

The PDB file header also tells how chemically realistic the model is, by listing how well bond lengths and angles agree with expected values (the values found in small molecules). A good model should show average deviations from expected values of no more than 0.02 angstroms in bond lengths and 4 degrees in bond angles.
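As a rough illustration of how such a deviation might be computed, here is a short Python sketch. It assumes the figure reported is a root-mean-square (rms) deviation, which is a common convention but an assumption here, and the bond lengths below are illustrative values rather than numbers taken from any real model. The function name rms_deviation is likewise invented for this example.

```python
import math


def rms_deviation(model_values, ideal_values):
    """Root-mean-square difference between model and ideal geometry values."""
    if len(model_values) != len(ideal_values):
        raise ValueError("model and ideal lists must match in length")
    squared = [(m - i) ** 2 for m, i in zip(model_values, ideal_values)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative bond lengths in angstroms (hypothetical, not from a real file).
model_bonds = [1.53, 1.33, 1.24, 1.46]
ideal_bonds = [1.52, 1.33, 1.23, 1.46]

bond_rmsd = rms_deviation(model_bonds, ideal_bonds)
print(f"rms bond-length deviation: {bond_rmsd:.4f} angstroms")
```

The same function works for bond angles if both lists are given in degrees.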

Another result of crystallographic structure determination is a number called a B-factor or temperature factor for each atom. Roughly speaking, B-factors indicate the precision of the atom positions. Atom positions can be uncertain because of disorder in the crystal from which the structure was determined. In a high-quality model, B-factors reflect the mobility or flexibility of various parts of the molecule. High B-factors mean greater uncertainty about the actual atom position. Values of 60 or greater may imply disorder (for example, free movement of a side chain or alternative side-chain conformations). Values of 20 and 5 correspond to uncertainties of 0.5 and 0.25 angstroms, respectively.
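The correspondence between B-factors and uncertainties quoted above can be checked with the standard relation B = 8 * pi^2 * <u^2>, where <u^2> is the mean-square displacement of the atom from its average position. The Python sketch below applies that relation; the function name rms_displacement is my own, and the calculation assumes the simple isotropic case.

```python
import math


def rms_displacement(b_factor):
    """Convert an isotropic B-factor (square angstroms) to an rms displacement
    (angstroms), using B = 8 * pi**2 * <u**2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))


for b in (5, 20, 60):
    print(f"B = {b:2d}  ->  rms displacement of about {rms_displacement(b):.2f} angstroms")
```

Because the displacement grows as the square root of B, quadrupling B from 5 to 20 only doubles the uncertainty, from about 0.25 to about 0.5 angstroms.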

You might wonder whether the structure of a macromolecule in a crystal is the same as that in solution. In fact, crystallographers go to great lengths to demonstrate that the crystal structure is pertinent to molecular function in solution. One of the most dramatic ways to prove this is to show that the molecule retains its function in the crystal. For example, many crystalline enzymes, when soaked in solutions of their substrates, are still catalytically active in the crystal. In other words, they convert substrates to products without entering the solution. It turns out that the crystalline protein is well hydrated. As much as 50% of a protein crystal is water, some of it ordered on the molecular surface, but more of it disordered in aqueous channels between macromolecules. Substrates diffuse into the crystal through channels of water, enter the active site of an enzyme, and are converted to products. This catalysis can only occur if the enzyme's crystalline structure is the same as in solution.

A full treatment of x-ray crystallography goes beyond the scope of this tutorial. For more information on the strengths and limitations of crystallographic models, click here. If you want a more rigorous treatment of the subject, aimed at beginners, consider reading this book.

X-ray diffraction and NMR methods complement each other. For example, diffraction methods can reveal larger structures in less time, while NMR can reveal more about molecular dynamics in solution. In most cases, crystallographic and NMR models of the same molecules agree with each other quite well, indicating that both types of models can guide us toward better understanding of molecular function in the cell.
