Judging NMR Model Quality

Technical Note

How Structures Are Determined By NMR Spectroscopy

If you know just a little bit about where NMR structures come from, you'll be better equipped to use them wisely. This page provides that little bit. To simplify the language, I talk about proteins, but these ideas apply to all macromolecules: nucleic acids, carbohydrates, or complexes of these -- anything for which you can obtain well-resolved NMR spectra.

NMR spectroscopists study protein structures in solution, rather than in crystals, allowing them to determine structures of molecules that will not crystallize. NMR also has the potential to provide insight into the dynamics of molecules in solution. Finally, it can help us to discern differences between solution and crystal structures.

Unfortunately, NMR structure determination is limited to smaller molecules, with a maximum mass of about 15,000 daltons (about 150 residues for proteins). In contrast, huge structures, including those of whole viruses, can be determined by crystallography -- if the material can be crystallized.

NMR structure determination is conceptually simple but, like crystallography, demanding in practice. First, the protein must be purified. Next, a very high-resolution correlation spectrum is obtained from solutions of the purified protein. This spectrum provides both chemical shifts and through-bond coupling constants (from magnetic coupling between atoms within a few bonds of each other, like the coupling you learned about in organic chemistry). Then a large number of decoupling experiments are carried out in order to assign peaks in the spectrum to specific nuclei in the protein. This assignment process requires a very high-resolution spectrum, because a protein contains many chemically identical groups, such as its many alanine residues. As a result, a protein spectrum is complex, with many overlapping peaks. Simply assigning peaks to specific nuclei, like the hydrogens of ALA152, is a massive task.

After peak assignments are made, another type of spectrum, called an NOE (nuclear Overhauser effect) spectrum, is obtained (correlation and NOE spectroscopy are commonly called COSY and NOESY). This NOE spectrum reveals couplings and coupling constants through space, rather than through bonds. In other words, in an NOE spectrum, nuclei exhibit coupling if they are near each other in space, even though they may be distant from each other in the sequence of residues. These couplings provide a list of pairs of nuclei that are near each other because of the way the protein is folded. The measured values of the coupling constants even provide distance estimates between these pairs of atoms.
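To make the idea of a distance estimate concrete, here is a minimal sketch. It assumes the commonly used isolated spin-pair approximation, in which NOE intensity falls off roughly as the inverse sixth power of the distance, and it assumes a reference pair of protons whose separation is already known. The function name and all of the numbers are hypothetical, chosen only for illustration.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance from an NOE cross-peak intensity.

    Assumption: intensity is proportional to r**-6 (isolated spin pair),
    calibrated against a reference pair of known separation.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical calibration: a reference pair 1.8 angstroms apart
# gives a cross-peak of intensity 100 (arbitrary units).
for peak in (100.0, 10.0, 1.0):
    print(peak, round(noe_distance(peak, 100.0, 1.8), 2))
```

Because of the sixth-power dependence, a hundredfold drop in intensity only about doubles the estimated distance. This is one reason NOE data are usually treated as loose upper bounds on distance rather than as precise measurements.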

The list of NOE-coupled nuclei amounts to a set of constraints on how the protein could possibly be folded. The values of through-bond (COSY) coupling constants allow calculation of torsion angles between coupled nuclei, thus providing additional constraints. Hydrogen-bonding pairs of nuclei can also be identified by amide hydrogen exchange with the solvent, as detected by changes in NOESY spectra. The spectroscopist also knows that bond lengths, bond angles, and conformational angles throughout the molecule lie within the ranges found in small molecules. This constitutes another set of constraints on possible protein conformations. After all this labor, the spectroscopist still does not know the structure, but does know that the structure must conform to all these constraints.
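The link between a through-bond coupling constant and a torsion angle is usually expressed with a Karplus-type relation, J = A cos^2(theta) + B cos(theta) + C. The sketch below applies it to the coupling between a backbone amide hydrogen and the alpha hydrogen, with theta taken as phi - 60 degrees. The coefficients are one published empirical fit and should be read as an assumption here, as should the function names and the tolerance.

```python
import math

# Assumed empirical Karplus coefficients (Hz) for the HN-HA coupling.
A, B, C = 6.4, -1.4, 1.9


def karplus_j(phi_deg):
    """Predicted three-bond HN-HA coupling (Hz) for backbone angle phi."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def phi_candidates(j_obs, tolerance=0.5):
    """All phi values (whole degrees) whose predicted J is near j_obs."""
    return [phi for phi in range(-180, 180)
            if abs(karplus_j(phi) - j_obs) <= tolerance]


# A helix-like phi gives a small coupling; a sheet-like phi a large one.
print(round(karplus_j(-60.0), 1), round(karplus_j(-120.0), 1))
# A single measured J is usually compatible with several ranges of phi.
print(phi_candidates(6.0))
```

Notice that one coupling constant does not pin down one angle. It narrows phi to a few allowed ranges, which is why the text calls the result a constraint rather than a measurement of the angle.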

With the list of constraints in hand, the stage is set for structure determination, which from this point is a computational process. The computer algorithm, in essence, attempts to fold a model of the protein chain into a conformation that fits all the constraints. If there are enough constraints, only a small number of similar protein conformations will meet all of them.
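One way to picture what the algorithm is doing is to score a trial conformation by how badly it violates the distance constraints. The sketch below is not any particular program's method; it is a simplified penalty of the kind that real programs combine with covalent-geometry terms and then minimize, for example by distance geometry or simulated annealing. The atom labels, coordinates, and upper bounds are hypothetical.

```python
import math


def violation_score(coords, restraints):
    """Sum of squared amounts by which distances exceed their upper bounds.

    coords:     dict mapping an atom label to an (x, y, z) tuple
    restraints: list of (atom_a, atom_b, upper_bound) tuples
    """
    total = 0.0
    for atom_a, atom_b, upper in restraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        excess = max(0.0, distance - upper)  # no penalty inside the bound
        total += excess ** 2
    return total


# Hypothetical three-atom example with two upper-bound restraints.
restraints = [("A", "B", 4.0), ("A", "C", 5.0)]
folded = {"A": (0, 0, 0), "B": (3, 0, 0), "C": (3, 4, 0)}
extended = {"A": (0, 0, 0), "B": (3, 0, 0), "C": (9, 0, 0)}

print(violation_score(folded, restraints))
print(violation_score(extended, restraints))
```

A conformation that satisfies every restraint scores zero, and the score grows as atoms that should be close drift apart. With enough restraints, only a small family of conformations can reach a score near zero.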

As an illustration of how constraints reveal structure, consider this list of mainchain hydrogen bonds, deduced by NMR work: ALA90 to CYS94, SER91 to ALA95, VAL92 to LYS96. These are the hydrogen bonds expected if residues 90 through 96 lie in an alpha helix. Similarly, GLY54 to ALA42, TYR53 to THR43, ASP52 to ASN44 implies that residues 52 to 54 lie alongside residues 42 to 44, as in an antiparallel pleated sheet.
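The reasoning in this example can be written down as a small rule of thumb. In a helix, each bond joins residue i to residue i + 4. In a pair of antiparallel strands, one residue number goes down as the other goes up, so the sum of the two numbers stays constant. The sketch below is only a heuristic for lists like the ones above, and the function name is hypothetical.

```python
def classify_hbonds(pairs):
    """Guess secondary structure from (residue_i, residue_j) hydrogen bonds.

    Heuristic only: a constant offset of 4 suggests an alpha helix;
    a constant sum i + j over several bonds suggests antiparallel strands.
    """
    offsets = {j - i for i, j in pairs}
    sums = {i + j for i, j in pairs}
    if offsets == {4}:
        return "alpha helix"
    if len(pairs) > 1 and len(sums) == 1:
        return "antiparallel sheet"
    return "unclassified"


# The two hydrogen-bond lists from the text above.
helix_bonds = [(90, 94), (91, 95), (92, 96)]
sheet_bonds = [(54, 42), (53, 43), (52, 44)]

print(classify_hbonds(helix_bonds))
print(classify_hbonds(sheet_bonds))
```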

The result of NMR structure determination is not one model, but a set of similar models, all of which fit the experimentally determined constraints. A final structure is obtained by averaging the models, and then finding the conformation of minimum energy that lies nearest to this average conformation.

NMR spectroscopists submit two types of files to the Protein Data Bank. One type contains the complete set, or ensemble, of models (from 5 or 6 to 30 or more) that fit the constraints. The second type of file contains an averaged, energy-minimized model, usually with information about how well the individual models agree with each other at each atomic position. This information is roughly analogous to B-factors for crystallographic models. In regions where the NMR models agree well with each other, the structure is well determined (analogous to low B-factors), and atom positions can be accepted with confidence. In regions where the models diverge from each other, atom positions are much less certain (analogous to high B-factors).
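The agreement among models at each atomic position can be expressed as a simple number: the root-mean-square deviation of that atom from its average position across the ensemble. The sketch below assumes the models have already been superimposed and list their atoms in the same order. The coordinates are made up, and reading real PDB files is left out.

```python
import math


def per_atom_spread(models):
    """Mean position and RMS deviation of each atom across an ensemble.

    models: list of models, each a list of (x, y, z) tuples in the same
    atom order. Assumes the models are already superimposed.
    """
    n_models = len(models)
    results = []
    for positions in zip(*models):  # one atom at a time, across models
        mean = tuple(sum(p[k] for p in positions) / n_models
                     for k in range(3))
        msd = sum(math.dist(p, mean) ** 2 for p in positions) / n_models
        results.append((mean, math.sqrt(msd)))
    return results


# Hypothetical two-atom ensemble of two models: the first atom is
# identical in both models, the second differs by 2 angstroms.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
for mean, spread in per_atom_spread(ensemble):
    print(mean, spread)
```

An atom with a small spread lies in a well-determined region, like an atom with a low B-factor. A large spread marks a region where the constraints leave the position uncertain.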

Comments in the PDB file header usually include lists of the number and types of constraints used in obtaining the final model.

A full treatment of protein structure determination by NMR goes beyond the scope of this tutorial. If you want a fuller understanding of the subject, start by reading Chapter 10 of this book.

X-ray diffraction and NMR methods complement each other. For example, diffraction methods can reveal larger structures in less time, while NMR can reveal more about molecular dynamics in solution. In most cases, crystallographic and NMR models of the same molecules agree with each other quite well, indicating that both types of models can guide us toward better understanding of molecular function in the cell.
