12. Exploring Protein Structure by Homology Modeling

Under construction 2008-06-13 ABANDONED

This section is not currently ready or linked to anything.

****NB to GR: Abandoned this in favor of the advanced tutorials at DeepView Home. In bioinformatics tutorial, users are referred to the homology tutorials there.

This procedure is for DeepView version 4. There are quite a few differences between homology modeling tools in versions 3.9 and 4.0, including new links to the SwissModel Workspace. If you do not have version 4, get it before continuing.

In this section, you will produce a model of the protein peropsin. This protein is found in the human retina, and it exhibits sequence homology with the visual pigments of the retina, such as rhodopsin. The function of peropsin is unknown. Its structure is also unknown, but because of its homology with rhodopsin, a protein of known structure, you can build a homology model of peropsin (the target) by folding it the same way as rhodopsin (the template). Building the model entails first finding the best alignment of the sequences of target and template, and then making the assumption that regions of peropsin sequence have the same conformation (three-dimensional structure) as aligned regions of rhodopsin. After making the model, you will try out some of DeepView's tools for assessing model quality (using some of the same tools as for crystallographic and NMR models), and then you will explore the structure for clues to its function. For example, rhodopsin and the other visual pigments have a covalently bound prosthetic group, retinal, joined by an imine link to a lysine side chain. You will use your homology model to ask whether peropsin contains the structural features necessary to bind retinal.

FYI: More information about peropsin

The first step in judging the quality of a homology model is simple: recognize, and don't forget, that even the best homology model is not as good as an experimental model from crystallography or NMR. At best, it is an educated guess about the structure of a protein (the target) from its sequence, based on the structure of one or more homologous proteins (templates) that are available in the Protein Data Bank. How good a guess? In short, the more similar the sequence and function of your target and your template(s), the better the model. So in our case, if we can find a template of similar sequence and function (a homologous retinal-binding opsin, that is), we should be able to get a decent model of peropsin, and see whether it's really feasible to expect it to bind retinal. We might even be able to tell whether it binds retinal covalently or noncovalently, depending on whether an appropriate covalent-binding residue is suitably located in the homology model. Even at the very best, however, you cannot learn fine details of side-chain conformations, as you can from determining protein structure by X-ray crystallography or NMR. But homology models can be useful for preliminary exploration, and might also point to useful target residues for chemical analyses of structure, such as site-directed mutagenesis.

Defining Some Terms
Creating a model is loosely called structure determination, but in fact, we never really determine the structures of molecules. What we call structure determination is really creating molecular models that fit data or fit what we know about a substance. A model based on data from X-ray crystallography or spectroscopy is called an experimental model. A model not based on experimental data is a theoretical model, and a theoretical model based on homology to one or more experimental models is a homology model.

Exploring Rhodopsin and Peropsin

Quick Homology Models (but you don't learn anything).

Many genomics databases include homology models generated by automated servers, or quick and easy ways to submit a sequence or database entry for modeling. You get a model quickly, but you don't learn much about the process. Let's get a peropsin model by this route, and then we will do it by a more hands-on method that helps you to see a little bit about what the automated servers are doing. Then later on, if an automated server gives you a poor model (one that conflicts with things that other research has told you, or one with obvious errors, like charged residues in its interior), you will know how to intervene in the modeling in hopes of getting a better model (one that agrees with reliable evidence about the structure).

Go to http://us.expasy.org/, select UniProt Knowledgebase (Swiss-Prot and TrEMBL), and enter OPSX, the protein name for peropsin, in the Search Swiss-Prot and TrEMBL for box. Click Go. On the result page, click OPSX Human. You are now at UniProtKB/Swiss-Prot entry O14718. Look over the wealth of information that makes up the annotations for this entry. Through this page, you are connected to the information about this gene in databases all over the world of structural biology.

After looking over this page, and perhaps visiting some other sites to see what they have on OPSX, scroll to the bottom and click on Submit a homology modeling request to SWISS-MODEL.

At Swiss-Model, you see that the server has already started filling out the form for you, by displaying the accession number O14718 at the top. Fill in your email address and your name. Click Submit to SWISS-MODEL. You just asked Swiss-Model to make a homology model for you. The server will search (using pBlast) for templates with sufficient sequence similarity; gather the templates and thread your model onto them; search databases of protein loops to build parts that do not match well with the templates, using loops similar to those in the databases if possible; build any remaining loops, which are mostly guesswork but have reasonable conformations; and finally, optimize the model by minimizing its conformational energy.

NOTE: You can also paste any FASTA sequence into a form for a homology modeling request by going to http://swissmodel.expasy.org/ and clicking on First Approach Mode under Modeling Requests.

If this is your first model request, SwissModel will open a new account for you in the SwissModel Workspace. You will receive email about this, along with an ID, password, and a link to your Workspace. There you will find lots of information about your model, as well as a project file all ready for viewing in DeepView, including the model and all templates used in the project. You might want to visit your Workspace after this tutorial, and use the help tools to learn how you can run and manage modeling jobs at SwissModel.

Note: From the UniProtKB/Swiss-Prot entry O14718 page, you can also get homology models of peropsin using various templates by clicking O14718 next to ModBase under Cross References, 3D Structure Databases. By now, there may be links to other modeling sites also. From all of these, you get a model, but you don't learn much about modeling. In the next section, you will make a homology model by a slightly more manual method. What you learn will help you understand modeling better, and help you learn how to judge the quality of homology models.

Homology Modeling Step by Step

This procedure is for DeepView version 4. There are quite a few differences between homology modeling tools in versions 3.9 and 4.0. If you do not have version 4, get it before continuing.

Now let's make a peropsin homology model by the do-it-yourself (well, partially, at least) method. This will give you a much clearer picture of what's going on when an automated server makes a model for you.

First, we'll need a FASTA file for human peropsin. Go to http://us.expasy.org/, select UniProt Knowledgebase (Swiss-Prot and TrEMBL), and enter OPSX, the protein name for peropsin, in the Search Swiss-Prot and TrEMBL for box. Click Go. On the result page, click OPSX Human. Near the bottom of the resulting UniProtKB/Swiss-Prot entry, click on O14718 in FASTA format. Save the resulting text file as peropsin.txt. (I selected and copied the file from the browser display, pasted it into a new word processor file, and saved it in text format).
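
If you want to check the saved file before loading it into DeepView, a few lines of Python are enough. The following is only a minimal sketch: the function name read_fasta and the short example record are illustrative assumptions, and the example sequence is a made-up fragment, not peropsin.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only (not the real O14718 entry).
example = ">demo|hypothetical fragment\nMKTAYIAK\nQRQISFVK\n"
records = read_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

To try it on your own file, pass in the text of peropsin.txt, for example read_fasta(open("peropsin.txt").read()). The sequence length it reports should agree with the residue count that DeepView shows after loading.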

NOTE: You can also paste any FASTA sequence into a form for homology modeling by going to http://swissmodel.expasy.org/ and clicking on First Approach Mode under Modeling Requests.

Start DeepView. Then Cancel the initial dialog box, which is expecting you to load a PDB file. Your peropsin file is in FASTA format, so you have to load it by a different procedure.

SwissModel: Load Raw Sequence to Model...
A reminder about DeepView Tutorial format: the instruction above this line tells you to select the command Load Raw Sequence to Model... from the SwissModel menu.

The resulting dialog box looks just like one for loading a PDB file, but now DeepView is looking for a sequence file in FASTA format. Navigate to your peropsin.txt file, select it, and click Open.

DeepView displays the sequence of peropsin as an alpha helix. This is a compact way to get the 337 residues onto the screen. First, let's see what ProSite (above) has to say about the nature of this protein. DeepView contains an internal link to ProSite through the command Edit: Search for ProSite Patterns. Because ProSite works on sequence only, we don't need to know anything about structure to see what signatures of protein function ProSite can find.

Edit: Search for ProSite Patterns

This command elicits a small window listing signatures or patterns found in peropsin. Note the last two entries, indicating that ProSite recognizes patterns indicating that peropsin is a G protein-coupled receptor (GPCR) with a retinal-binding site. Click the black descriptions of sites to highlight the residues that ProSite recognized. Click the red ProSite entry numbers to download the full ProSite documentation for recognizing a specific type of protein. The entry for GPCRs contains a full list of sequences from this family in SwissProt/TrEMBL. It's a huge list. The entry for retinal binding sites contains a list of sequences that appear to contain this site, followed by a description of the types of proteins in which this pattern is found. All entries end with the specific patterns that ProSite looks for to identify a particular type of protein. Study these lines to find out what ProSite is looking for when it scans a sequence.
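
To get a feel for how such a pattern scan works, here is a minimal Python sketch that translates ProSite pattern syntax into a regular expression and searches a sequence with it. It is an illustration only, not how DeepView or ProSite is implemented. The function name prosite_to_regex is hypothetical, the demonstration uses the short N-glycosylation pattern N-{P}-[ST]-{P} rather than the opsin pattern, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a ProSite-style pattern into a Python regular expression.

    Handles x (any residue), [..] (allowed), {..} (forbidden),
    repeat counts such as x(2) or x(2,4), and the < and > anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(<?)([^()<>]+)(?:\((\d+(?:,\d+)?)\))?(>?)", element)
        if m is None:
            raise ValueError("unsupported element: " + element)
        start, core, repeat, end = m.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            regex = core                     # one residue or a [..] set
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
sequence = "AANGSAANPSA"  # made-up sequence for illustration
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(regex, hits)
```

Note that re.finditer reports only non-overlapping matches, so a careful scanner would need extra work to find overlapping sites.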

Now you know that there is likely to be a retinal-binding site in peropsin. Let's see if we can find it.

SwissModel: Find Appropriate ExPDB Templates

DeepView starts up your preferred browser and completes a form containing the peropsin sequence in FASTA format. Click Submit to conduct a search for proteins of known three-dimensional structures that are homologous to peropsin. The Swiss-Model Template Server returns a list of models that should serve as suitable templates for making a model of peropsin. These models are in a special structure database called ExPDB, for excerpts of PDB models. An ExPDB entry is usually one domain from a multidomain protein, or one chain from a PDB model that contains more than one chain.

On 2007/01/08, I got sixteen possible templates, all of which were various models of bovine rod rhodopsin. As of this date, rhodopsin from the rod cells of good ol' Bos taurus (whence "Come, Bossy!") is the only visual pigment of known structure. The first recommended template, 1f88A, is chain A from PDB entry 1F88, a model determined by X-ray crystallography. The BLAST score for 1f88A indicates that the probability that the sequences of peropsin and model 1f88A are similar by chance is 9 x 10^-37, which is strong evidence that the similarity is not accidental. Biologists would say that the only reasonable explanation for such similarity is evolution: peropsin and bovine rhodopsin evolved from a common ancestor.

Because all the potential templates are the same protein, we will work with just one template, 1f88A. Click its file name in the download ExPDB column of the Template Selection table. Depending on how your browser and DeepView are set up, the file might open automatically in DeepView, or you may have to specify DeepView as the helper application, or you may have to save the file to your desktop. If you've been using DeepView before, you've probably worked out the best way to handle PDB downloads. Use your favorite method, and then open the file in DeepView (by File: Open PDB File, if you saved the file to your desktop).

The model should appear near your large alpha helix of peropsin residues.

Wind: Sequences Alignment
DeepView displays the sequences of both models, left-justified -- that is, aligned at their N-terminal residues. Now for some real magic.

Fit: Magic Fit
In the blink of an eye, it appears that your peropsin helix is gone. But its sequence has been aligned with that of 1f88A, and each of its residues has been superimposed upon the residue with which it aligns sequentially. In short, the peropsin chain has been threaded onto the 3-dimensional model of bovine rhodopsin.

<control-tab-tab-tab...>
Holding down the control key and pressing tab tells DeepView to "blink" between the structures. You are seeing, alternately, 1f88A and the target peropsin homology model. Your target might still be colored gray, with a ProSite pattern highlighted in cyan; if so, display the target, and Color: CPK to give it "normal" color. As you blink back and forth between the models, you might see some strange things about the target. For example, look for a very long peptide bond at one end (between residues 315 and 316). Such features are obviously not chemically realistic; they just represent the best DeepView could do at aligning the sequences. Such a problem suggests that, in this region, the two proteins are not structurally very similar. But the overall alignment seems to work well.

In the Sequences Alignment window, notice that the sequences are no longer left-justified; they are aligned by homology. Straight vertical lines connect residues that are identical in the two models, two dots connect quite similar residues (like valine and leucine, both bulky nonpolar), one dot connects less similar pairs (serine and glutamine, both polar), and dissimilar pairs are unconnected (glycine and isoleucine). To see the full alignment conveniently, click the little document icon at the left end of the Sequences Alignment window. You can save this diagram as a plain text file for printing (File: Save: Sequence Alignment).
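
The marker line between two aligned sequences can be sketched in a few lines of Python. This is only an illustration: the similarity groups below are hypothetical, chosen to reproduce the examples just mentioned (valine/leucine, serine/glutamine, glycine/isoleucine), and they are not DeepView's actual scoring scheme. The two short sequences are made up.

```python
# Hypothetical similarity groups, chosen only to reproduce the examples
# in the text; DeepView's real scoring scheme may differ.
STRONG = [set("VLIM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]
WEAK = [set("STNQ"), set("AG"), set("DENQ")]


def marker(a, b):
    """Return the symbol drawn between two aligned residues."""
    if a == "-" or b == "-":
        return " "  # a gap aligns with nothing
    if a == b:
        return "|"  # identical
    if any(a in g and b in g for g in STRONG):
        return ":"  # quite similar
    if any(a in g and b in g for g in WEAK):
        return "."  # less similar
    return " "      # dissimilar


def annotate(seq1, seq2):
    """Build the marker line for two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return "".join(marker(a, b) for a, b in zip(seq1, seq2))


target = "MVSG-K"    # made-up aligned fragments, "-" marks a gap
template = "MLQIAK"
line = annotate(target, template)
print(target)
print(line)
print(template)
```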

Wind: Layer Infos
This window gives information about the display properties of all models currently loaded. With what we are doing, it's a handy way to be more quantitative about similarities. According to the Sel column (far right), 299 residues are currently selected. This is the number of residues in the aligned regions of the two models. Blink to display 1f88A. Select: All. The Sel column tells you that 1f88A contains 346 residues. Blink to display peropsin. How many residues does it contain?

With the peropsin model remaining on display, Select: aa Identical to ref. Structure. DeepView tells you that 88 residues of peropsin align with identical amino-acid residues of 1f88A. This is 88/346 or only 25% sequence identity. Select: aa Similar to ref. Structure. The percentage of aligned residues that are chemically similar (double dots in the Sequences Alignment window) is 182/346 or 53%. If two proteins show more than about 35% sequence similarity under best alignment, then they are almost certain to be of similar structure.
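
The arithmetic here is simple enough to check by hand, but a short Python sketch makes the choice of denominator explicit. The counts are the ones quoted in this tutorial; the function name percent is just an illustrative helper, and using the 299 aligned residues as an alternative denominator is an assumption added for comparison, not something DeepView reports.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


identical = 88         # residues identical to the template
similar = 182          # residues similar to the template
template_length = 346  # residues in 1f88A
aligned_length = 299   # residues in the aligned regions

print(f"identity over template:   {percent(identical, template_length):.1f}%")
print(f"similarity over template: {percent(similar, template_length):.1f}%")
print(f"identity over aligned:    {percent(identical, aligned_length):.1f}%")
print(f"similarity over aligned:  {percent(similar, aligned_length):.1f}%")
```

Dividing by the aligned length instead of the full template length raises the figures to about 29% and 61%, so always note which denominator a quoted percentage uses.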

Select: aa Making Clashes
A large number of residues in the homology model are trying to occupy the same space. This is obviously unrealistic. Make the model more feasible by using Tools: Fix Selected Sidechains: Quick and Dirty. Again, Select: aa Making Clashes to see if DeepView has improved these problems. Press <return> to show only the selected residues. Pink dotted lines reveal the clashes. Some of them look pretty serious.
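
The idea behind a clash check can be sketched in Python as a search for atoms from different residues that sit closer together than some cutoff. This is a rough illustration under stated assumptions, not DeepView's method: the 2.0 angstrom cutoff, the function name find_clashes, and all coordinates are made up, and a real check would also have to exclude covalently bonded atoms such as those joined by a peptide bond.

```python
import math
from itertools import combinations


def find_clashes(atoms, cutoff=2.0):
    """Return atom pairs from different residues closer than cutoff (angstroms).

    The cutoff is a hypothetical value; bonded atoms in neighboring
    residues are not excluded in this simplified sketch.
    """
    clashes = []
    for a, b in combinations(atoms, 2):
        if a["res"] == b["res"]:
            continue  # atoms within one residue are bonded neighbors
        d = math.dist(a["xyz"], b["xyz"])
        if d < cutoff:
            clashes.append((a["res"], a["name"], b["res"], b["name"], round(d, 2)))
    return clashes


# Made-up atoms for illustration only.
atoms = [
    {"res": 10, "name": "CB", "xyz": (0.0, 0.0, 0.0)},
    {"res": 10, "name": "CG", "xyz": (0.0, -1.5, 0.0)},
    {"res": 57, "name": "CD1", "xyz": (0.0, 1.2, 0.0)},
    {"res": 90, "name": "CA", "xyz": (8.0, 0.0, 0.0)},
]
clashes = find_clashes(atoms)
print(clashes)
```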

You can solve all of the problems of this primitive model by sending it off to the Swiss-Model server for optimization.

SwissModel: Submit Modeling Request
Your browser appears again, this time with a new form. The form tells you that a project file has been created, and gives you its location (on a Macintosh, the location is /Applications/SPDBV_3.9b1.01_univ/temp/, and the file name is SwissModelRequest-xxx.spd). On the form, under Your Swiss-Model project file can be found in:, click Browse, and navigate to the file listed on the form. This is the project file that your browser will send to Swiss-Model for optimization. (The other file in the same location, SwissModelRequest-xxx.htm, is just the web page form you are viewing. Don't select this file by accident.) Complete the form by checking your email information, select Swiss-PdbViewer mode for the format of your final model, and uncheck the option of getting a WhatCheck report of the final model. Then click Send Request.

NOTE: After clicking the Browse button on the form, you could select and send ANY DeepView project file that you created by Fit:Magic Fitting a FASTA sequence on to one or more templates. So your procedure to this point does not have to follow the tutorial exactly. But the project file must contain one target sequence, loaded as the first model, and one or more templates loaded afterward. If you are working on several modeling projects, you might want to move a copy of SwissModelRequest-xxx.spd to a working folder for this model, and use the same folder for the results files that you will receive.

ABANDON HOPE, ALL YE WHO ENTER HERE.

REMAINDER OF TUTORIAL WAS NEVER REVISED, OUT OF DATE.

ALTERNATIVE REQUEST METHOD: If you have problems sending the project file by way of your browser, you can submit it directly to Swiss-Model. Go to http://swissmodel.expasy.org/. Under Modeling Requests, click Project (optimise) mode. You will see a form similar to the one that DeepView creates. Fill in your email address, name, and a project title. Click Choose File, navigate to your project file, and choose it. Select Swiss-PdbViewer mode for the format of your final model, and uncheck the option of getting a WhatCheck report of the final model. Then click Send Request.

By either submission method, your browser should return a message indicating successful uploading of your project file (385787 bytes for this project), and provide further information. You will receive your optimized model by email. It may take several hours. Once you receive several email files from Swiss-Model, you are ready to resume this tutorial.

NOTE TO USERS: 2007/01/13 -- revised to here, then found that this project was failing at Swiss-Model. The folks there are trying to track down the problem. Tutorial beyond this point is revised to fit the expected results from the hands-on method, and the actual results from the automated method, but it might need additional changes.

On 2007/01/11, my automated modeling request elicited four email messages from Swiss-Model, with one that contains this subject line: SwissModel-Model-AAAa080xq. The AAA number is a Swiss-Model project number (yours will be different, of course), and this is the email that contains your homology model as an attachment (mine was named AAAa080xq.pdb). Save your model file to a convenient location, and then start DeepView and open the file.

NOTE: The other emails contain information about how your modeling project was carried out, and some news for Swiss-Model users. If a modeling project fails, one of these emails will tell you (perhaps cryptically) just what happened. The mail whose subject line includes the word TraceLog gives a list of all operations in making your model.

ANOTHER NOTE: For an automated request, there may be more than one template. In the remaining instructions, I will assume more than one template, in order to include instructions for looking at only the ones you want.

Studying and Evaluating Your Model

Blink (hold down ctrl and press tab repeatedly) to see the target (peropsin) and the templates in sequence. In the cyc column of the Layer Infos window, for all models except the target and the first template, click each checkmark once to turn it to +, then again to turn it to -. This prevents the other templates from appearing during blinking. Now blinking will simply alternate between showing the target and the first template.

By default, templates are displayed as backbone only. Turn on all of the first template's side chains by shift-clicking anywhere in the side column of the Control Panel. With peropsin on display, shift-click any checkmark in the show column to turn off display of all residues, leaving a ribbon model. By default, the ribbon is colored to show the quality of the model. Most of the ribbon is blue or green, and some short segments are red. Blue indicates residues that fit very well with the template, green means not bad, while red indicates residues that did not match up well with template residues. (The menu command for showing this color scheme is Color: B-Factor, although the term B-Factor does not apply here.)

It is typical in a modeling project like this that scaffold residues, such as those in the seven helices, model well, but surface loops, which define the specific function of the protein, constitute the most significant differences between target and template, and do not model as well. Ironically, you learn mostly what you already know about your target (in this case, that it's a seven-helix bundle), and you learn least about the most interesting parts, the parts that differ most from your template, and that give your target protein a different function from that of your template.

Let's see whether optimization really improved our hand-made model noticeably. With the peropsin layer displayed, and the Layer Infos window open, Select: aa Making Clashes. If you find any, fix them as outlined above. Look for other funny stuff, such as long peptide bonds. If such things are all gone, the model is at least structurally realistic. To learn more about judging the quality of models -- homology models and others -- visit these two resources:

  1. Principles of Protein Structure, Comparative Protein Modelling and Visualisation, by Nicolas Guex (creator of DeepView) and Manuel C. Peitsch
  2. Judging the Quality of Macromolecular Models, by Gale Rhodes

Now let's see whether it appears that peropsin contains a pocket for retinal binding. Blink to the 1f88A layer (in my project, the first template). In the Control Panel, scroll down to the bottom and click RET977, to select the retinal molecule in the bovine rhodopsin model. The line RET977 should turn red when you click it. Press <return> to remove all but retinal from the display.

Select: Neighbors of Selected aa..: Select groups that are within 4.0 A of the picked atom. Click OK and press <return>. Click the labl heading in the Control Panel to label all displayed residues, and click the side heading to add side chains. Next to RET977 at the bottom of the Control Panel, add a checkmark to the column headed by four dots and a small v (van der Waals surface column). You should now have a dramatic display of the retinal-binding pocket of bovine rhodopsin. Note that LYS296 forms a covalent imine link with retinal, the result of addition of the LYS296 amino group to the aldehyde group of retinal. Note also the exquisite fit of the retinal among other residues, with most hydrophobic side chains snuggling up to the hydrophobic retinal molecule.
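
The neighbor selection amounts to a distance filter, which the following Python sketch illustrates. It is an assumption-laden simplification: the function name neighbors_within is hypothetical, the sketch measures from every atom of the ligand rather than from a single picked atom, and all coordinates, along with the PHE and GLY residues, are invented for illustration (only the names RET977 and LYS296 come from this tutorial).

```python
import math


def neighbors_within(atoms, ligand_name, cutoff=4.0):
    """Return (res_name, res_num) pairs that have any atom within cutoff
    angstroms of any atom of the named ligand, sorted by residue number."""
    ligand = [a for a in atoms if a[0] == ligand_name]
    found = set()
    for res_name, res_num, xyz in atoms:
        if res_name == ligand_name:
            continue  # do not report the ligand as its own neighbor
        if any(math.dist(xyz, lig[2]) <= cutoff for lig in ligand):
            found.add((res_name, res_num))
    return sorted(found, key=lambda r: r[1])


# Each atom is (residue name, residue number, (x, y, z)).
# Coordinates are made up and are not taken from 1F88.
atoms = [
    ("RET", 977, (0.0, 0.0, 0.0)),
    ("RET", 977, (1.4, 0.0, 0.0)),
    ("LYS", 296, (2.7, 0.5, 0.0)),
    ("PHE", 100, (0.0, 3.9, 0.0)),
    ("GLY", 200, (0.0, 0.0, 9.0)),
]
pocket = neighbors_within(atoms, "RET")
print(pocket)
```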

Now let's see whether our model of peropsin allows such a pocket, and provides an appropriately placed lysine residue for covalent bonding. Select: Extend to other layers. This selects the residues in model peropsin that are aligned with the displayed residues of 1f88A. Blink to the peropsin layer, press <return> to display the selected residues, shift-click any checkmark in the ribn column to remove the ribbon display, and click the labl heading to label all displayed residues. Now blink between the layers, looking for similarities and differences. You should see that LYS284 of peropsin is perfectly placed to link to retinal, that no pocket residues of peropsin intrude upon the retinal, and that many of the pocket residues are identical to those in bovine rhodopsin. It appears that peropsin could accommodate a retinal molecule, and could attach it by an imine link as well. Finally, Color:Layer to give the target and template different colors, and then make both layers visible (in the Layer Infos window, click to put check marks in the vis column for both). With the models superimposed, you can readily see that the target model has a nicely formed retinal pocket.

So. Does peropsin carry retinal in the human eye? I don't know. This homology model certainly suggests that retinal binding is feasible. Proving that peropsin is a retinal carrier in the eye requires more than just building models. It requires purifying peropsin from retinal tissue, followed by chemical analysis to detect retinal. Finding out if the binding is just like what we are seeing in the model would require determining the structure of the peropsin-retinal complex by X-ray crystallography or NMR. Or a researcher could use the model we've made to select residues to change (by site-directed mutagenesis) and see if the changes affect binding. Modeling only begins the quest to determine what peropsin actually does. To fully understand peropsin will require a conversation between theory (which includes model building) and experiment (chemical analysis, spectroscopy, structure determination, monitoring peropsin gene expression). This powerful dialog is the engine that propels science, and our growing understanding of nature.


To The Molecular Level