Homology Model from Human Genome

Making a Homology Model AND Using the Human Genome

With Tips on Judging Model Quality

2008-06-11 NOTICE: This homology modeling project no longer works. Not sure if I will ever fix it.

For a project that works, and might actually continue to work for a while, despite endless changes in the web tools for such tasks, go HERE.

Revised 2008-05-31

An important development has occurred since I last revised this tutorial. A much more appropriate template has appeared: a crystallographic model of beta-2 adrenergic receptor (see PDB 2RH1). To follow this tutorial to completion, and to see some of the defects that homology models can harbor, you will need to choose, intentionally, the same (now less appropriate) template that I used in the original tutorial. I will remind you of this action at the appropriate place in this tutorial, and I will make some suggestions about how to use the new template to learn more about homology modeling, after you complete the tutorial as originally designed.

This tutorial has not yet been tested by students. Please inform me of errors or unclear instructions!
Thanks.

Many things have changed at the NCBI since the last time I looked at these instructions. Just in case you cannot find the necessary sequence file easily, I updated Section 1 to give you quick access to it for this exercise.

If you have problems getting the sequence file, you might have more luck with the homology modeling tutorial that is part of Assignment 2 (sequence provided, so it does not change, and student tested), or with the tutorials at Deep View Home.

Introduction

In this tutorial, you will make a homology model of the human beta-1 adrenergic receptor, a G-protein coupled receptor (GPCR). Such receptors are integral membrane proteins and thus difficult to crystallize for X-ray studies or to dissolve for NMR studies.

You will search the human genome to find the amino-acid sequence of the receptor, as well as to see some of the information available as a result of the Human Genome Project. Then you will search for a template for modeling the receptor. An appropriate template is a protein of known structure that shares at least 20-25% identity in amino-acid sequence with the target. Next, you will build a model of the receptor by threading it onto the template and submitting the model to the Swiss-Model server for optimization. Finally, you will examine the model for clues as to its function and assess the limitations of your model.

Let me warn you at the beginning that part of this lesson is to show you some of the things that can go wrong in a modeling project, and how to detect problems in the final model. If you notice questionable decisions made along the way, that's a good sign. I hope that I am aware of all of them already, and I'll explain them at the end.

If you have not worked through sections 1-6, 8, and 11 of the Deep View Tutorial, please do so before attempting Part 2 of this exercise. If you are an experienced user, please review the conventions used for specifying commands in these tutorials (first section of Deep View Tutorial).

For more information about the nature of homology models and the procedures for building them, see Principles of Protein Structure, Comparative Protein Modelling and Visualisation by Nicolas Guex and Manuel C. Peitsch.

Tutorial

1. Obtaining the Sequence of Human Beta-1 Adrenergic Receptor

Click HERE to open a new browser window showing the home page of NCBI, the National Center for Biotechnology Information. You may want to bookmark this page and explore the NCBI resources further. For now, click Map Viewer (in the list at right).

On the resulting page, click the words Homo sapiens (do not click the word Blast adjacent).

You are looking at the genome view of the human genome, one of many links into the data from the Human Genome Project. In the Search For window, type adrenergic receptor, and click Find.

Now the chromosome diagram of the Map Viewer is marked in red with locations of established or highly probable adrenergic receptor genes. The loci of these genes are also listed below the chromosome diagram. Click on the number 10 under chromosome #10. When an expanded view of chromosome #10 appears, search near the bottom of the map for ADRB1. Scroll far to the right to find ADRB1 in full-size type. Click on ADRB1.

On the resulting page, find and click on Reference Sequences in the list on the right. This takes you down the page, where you will see a heading for mRNA and Protein(s), and two links joined by an arrow: NM_000684.2→NP_000675.1. The first link is to the mRNA sequence of this gene, and the second link is to the protein sequence. Click the second link.

At the top of the Protein page, in the menu next to Display, pick FASTA. Starting with (and including) the > symbol, select and copy the protein sequence shown. Paste it into a blank word-processor file, and save the file in plain text format. Name the file BAdrn.fst.
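Before moving on, it can be worth confirming that what you saved really is plain-text FASTA: a header line beginning with >, followed by the sequence. The sketch below is not part of the original exercise. It assumes you have Python 3 available, and the function name read_fasta and the toy record are purely illustrative.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must begin with a '>' header line")
    header = lines[0][1:]
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # a second record starts here; stop reading
            break
        seq_lines.append(ln)
    return header, "".join(seq_lines).upper()


# Toy record, NOT the receptor sequence.
example = ">toy_record made-up example\nMGAGV\nLVLGA\n"
header, sequence = read_fasta(example)
print(header)
print(len(sequence), "residues")
```

To check your own file, pass the function the contents of BAdrn.fst instead of the toy record. If it raises an error, the file was probably saved in a word-processor format rather than plain text.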

Skip to the beginning of Section 2. The remainder of this section needs revision.

You are looking at the LocusLink page for our receptor. LocusLink provides links to a great variety of resources for each gene in the genome. Read the brief summary near the top, and then look around here. Under Locus Information, one great site is OMIM, Online Mendelian Inheritance in Man (women, too, I'll bet). OMIM provides descriptions of gene functions and malfunctions, heavily referenced to original literature. This is one of the most useful of the many "annotations" of the genome.

Open a new window with the linked number beside "OMIM" to read about the beta adrenergic receptor's function and associated diseases and defects. Note the many links to further reading. Then close the window.

Another neat resource is the Conserved Domain Browser. To see it, scroll down to NCBI Reference Sequences (RefSeq), and open a new browser window by clicking the words next to Domains. You'll see a list of other genes whose protein products very likely have the same domain fold as our target receptor. Close the window when you have finished browsing.

Next we'll get an amino-acid sequence of the receptor so we can make a model.

Under NCBI Reference Sequences (RefSeq), click the numerical link next to Protein.

You are now at the Entrez Protein page, which contains links to references about this receptor. At the bottom is the receptor's amino-acid sequence, but it's not in the proper format for Deep View modeling. At the top of the page, pull down the menu beside the Display button (it says Default View), and select FASTA. Change the next menu to the right from HTML to Plain Text. Click Display to see the receptor sequence in FastA format. Click the Save button. Direct the FastA file to a convenient location on your computer, and name the file BAdrn.fst.

Congratulations! From the staggeringly vast stores of information about the human genome, including over 99% of its sequence, you have extracted the amino-acid sequence of one important signalling protein. Next, you'll use the FastA file to make a model of the receptor, using Deep View (Swiss-PdbViewer).

2. Finding a Template for Modeling

If you have not worked through the first six sections of the Deep View Tutorial, please do so now. If you are an experienced user, please review the conventions used for specifying commands in these tutorials (see Overview under Contents in the left frame).

Start Deep View and click Cancel on the startup open-file dialog. This dialog is for loading a structure file, but you are going to load a sequence file first.

SwissModel: Load Raw Sequence to Model
In the open-file dialog that results, locate your FastA file BAdrn.fst, highlight it, and click Open.

Deep View builds an alpha-helix model from the sequence. This is a fairly compact way of displaying the sequence.

SwissModel: Find Appropriate ExPDB Templates
Your web browser opens, displaying a template request form. The FastA sequence of the receptor is already pasted into the form. Click Submit. You are sending the sequence for comparison with sequences in a database composed of single protein domains extracted from the Protein Data Bank. This database, called ExPDB, also contains structure files for all domains in the PDB.

Wait for the reply. In your browser, you see a list of templates.

NOTE: In 2008, a better template appeared: a crystallographic model of beta-2 adrenergic receptor (PDB 2RH1). I expect that this model will show up as the best template. In order to obtain the same results as described in this tutorial, pick 1F88B as your template. At the end of the tutorial, compare the model you obtained with PDB 2RH1. Or, for more practice, go back and make another homology model of beta-1 adrenergic receptor, this time using PDB 2RH1 as the template. Then compare your two models.

In this case, there are only two, but look closely, and you will find that they are the two chains of bovine rhodopsin that make up the asymmetric unit in the PDB file 1F88. So there is really only one template choice. Before continuing, browse this page. You'll see comparisons of the sequences of these two chains with the receptor sequence, including the numbers of residues that are identical (identities) and similar (positives) to those in the aligned sequence of the receptor. This information would help you make template choices if there were more options.

Question: If the two templates are the same, why are the numbers of identities and similarities different? Answer: There are some missing residues in chain A of PDB model 1F88. As a result, chain B aligns better with the receptor sequence, giving more sequence matches. By these criteria, chain B, or ExPDB file 1F88B, is the better template.

In the Download ExPDB column, click 1F88B. The template file should download and automatically open in Deep View, which is still displaying the receptor sequence as a long helix.

3. Threading the Sequence and Submitting a Modeling Request

Color: Layer
Deep View shows the receptor in yellow, the template in blue. Now you will thread the receptor onto the template, aligning as many identical and similar residues as possible.

Fit: Magic Fit
In the blink of an eye, Deep View threads the sequence to produce a raw model.

Examine the model (yellow) superimposed on the template (blue). Notice two or three very long peptide bonds. These are at the locations of gaps in the alignment. Completing the homology model entails rebuilding the regions that do not fit the template very well.
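The long peptide bonds can also be found numerically instead of by eye. A normal peptide C-N bond is about 1.33 angstroms, so any much longer distance between the carbonyl C of one residue and the amide N of the next marks a gap. The sketch below is not something Deep View does for you. The function name, the 1.5-angstrom cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import math


def find_long_peptide_bonds(backbone, cutoff=1.5):
    """Flag consecutive residues whose C(i)-N(i+1) distance exceeds cutoff.

    backbone: list of (residue_number, c_xyz, n_xyz) tuples in chain order,
    where c_xyz and n_xyz are (x, y, z) coordinates in angstroms.
    Returns a list of (residue_i, residue_j, distance).
    """
    flagged = []
    for (res_i, c_xyz, _), (res_j, _, n_xyz) in zip(backbone, backbone[1:]):
        distance = math.dist(c_xyz, n_xyz)
        if distance > cutoff:
            flagged.append((res_i, res_j, distance))
    return flagged


# Toy coordinates laid out along the x axis; not taken from any real model.
toy_backbone = [
    (10, (1.00, 0.0, 0.0), (0.00, 0.0, 0.0)),
    (11, (4.00, 0.0, 0.0), (2.33, 0.0, 0.0)),  # normal bond to residue 10
    (12, (11.0, 0.0, 0.0), (9.00, 0.0, 0.0)),  # stretched bond to residue 11
]
for res_i, res_j, distance in find_long_peptide_bonds(toy_backbone):
    print(f"long bond between residues {res_i} and {res_j}: {distance:.2f} A")
```

With real coordinates, the flagged pairs should coincide with the alignment gaps you can see on screen.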

Using Deep View, you can make a number of adjustments to this model, including altering the alignment at gaps, but you probably can't do it as well as the program Swiss-Model can do it. Now you will allow Swiss-Model, running on a computer in Geneva, Switzerland, to optimize your model.

Swiss-Model: Submit Modelling Request
When you first use this command, you may have to fill in some email information in a Deep View Preference window. Provide the necessary information, click OK, and execute the request command again. Deep View asks you for a name for the file containing this project: use BAdren, and direct the file to a convenient location.

Again, your browser opens, displaying a form. Check the personal information in the top box. Correct if necessary. Click Browse, find the file proj_BAdren, and click Open. Now the file name proj_BAdren should appear in the line beside the Browse button. Under Results Options, click to darken the button beside Swiss-PdbViewer Mode, and make sure all other options are unchecked. Click Send Request.

The file proj_BAdren contains your model and template, saved just as they were when you sent the request. Save it for later comparison with the completed model. You will also find in the same directory another file named BAdren. This file is the browser model request form you completed just before you sent the request.

4. Receiving and Evaluating Your Model

The modeling project will come back to you as an attachment to email. You will also receive several other emails with information about the modeling server and the history of your modeling project.

Start Deep View and open the attached file using the startup file-open dialog. (In some cases, depending on your computer, you may have trouble associating this file with Deep View. It is adequate to change the file type to TEXT using FileTyper or a similar utility.)

The completed project file contains two models, your homology model and the rhodopsin template. Blink between them, and you will see that their backbone conformations are very similar.

Display the homology model only (top layer listed in Layer Infos window). Turn off display of residues to reveal the ribbon model. Swiss-Model colors the ribbon by confidence factor, a measure of how well your model fits the template(s). In this model, the red areas are loops whose size differed from that of the template, so the template could not guide model building in these areas. The green areas fit the template very well, as you can see by comparing the two models. Roughly speaking, confidence-value colors toward the blue end of the visible spectrum are cause for higher confidence (low numerical value, as with crystallographic B-factors), and colors toward the red end of the spectrum recommend lower confidence in the actual positions of those residues. (These values are written into the project coordinate file in the B-factor column, so you can use the command Color: B-factor to apply these colors to backbone, sidechains, surfaces or labels, in addition to ribbons.)
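Because the confidence values sit in the B-factor column, you can also read them as numbers. The sketch below is not part of Deep View or Swiss-Model. It assumes the model layer has been saved as a standard PDB-format text file, in which the B-factor occupies columns 61-66 of each ATOM record. The function name and the toy lines are illustrative, and chain identifiers are ignored for simplicity.

```python
from collections import defaultdict

# Fixed-column layout of a PDB ATOM record, used here only to build toy lines.
ATOM_FORMAT = (
    "ATOM  {serial:5d} {name:<4s} {res:>3s} A{num:4d}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
)


def mean_b_by_residue(pdb_text):
    """Average the B-factor column over the atoms of each residue."""
    values = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            res_name = line[17:20].strip()   # columns 18-20
            res_num = int(line[22:26])       # columns 23-26
            b_factor = float(line[60:66])    # columns 61-66
            values[(res_num, res_name)].append(b_factor)
    return {key: sum(v) / len(v) for key, v in sorted(values.items())}


# Toy records with made-up coordinates and values.
toy_lines = [
    ATOM_FORMAT.format(serial=1, name="N", res="LEU", num=64,
                       x=0.0, y=0.0, z=0.0, occ=1.0, b=10.0),
    ATOM_FORMAT.format(serial=2, name="CA", res="LEU", num=64,
                       x=1.5, y=0.0, z=0.0, occ=1.0, b=30.0),
    ATOM_FORMAT.format(serial=3, name="N", res="GLY", num=65,
                       x=3.0, y=0.0, z=0.0, occ=1.0, b=50.0),
]
for (num, name), mean_b in mean_b_by_residue("\n".join(toy_lines)).items():
    print(f"{name}{num}: {mean_b:.2f}")
```

Sorting the output by value would list the residues from highest confidence (lowest number) to lowest.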

Now explore the model and look for indications of the quality of this model.

Color (Ribbon): Secondary Structure Succession
Prefs: Ribbons
Set nb Strands to 1
(Reminder: to set the Color menu for coloring ribbon, pick ribbon from the tiny menu under the heading col in the Control Panel.) Deep View gives a distinct color to each helix in the structure, making it easy to trace the chain from N-terminus (blue) to C-terminus (red). You will leave this ribbon on display in order to help you keep your bearings, but you give it only one strand to make it less obtrusive.

Start by noting the meaning of ribbon color: areas modeled from template vs areas made by loop building.

Select: All
Wind: Ramachandran Plot
Note the clustering of colored residues in the alpha-helix region of the Rama diagram. Because all residues are selected, Deep View shows all of them on the Rama diagram. Run the cursor over (don't click) residues in conformation-forbidden areas outside the blue boundaries. In well-refined experimental models (from crystallography or NMR), it is rare to find residues other than gly and ala in forbidden regions. Although most residues lie in allowed regions, the presence of some bulky aliphatic residues in forbidden conformations is not a good sign (leu64, leu188, ile329). Close the Rama window.
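Each point on the Rama diagram is a pair of backbone dihedral angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four points. It is not Deep View's code, the function names are illustrative, and the points are toy coordinates chosen so the answers are easy to check.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy geometry: the last point is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the backbone atoms of each residue, this calculation yields the phi and psi values that the Rama window plots.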

Select: aa Making Clashes
<return> (reminder: the return key displays selected residues.)
Deep View displays residues whose side chains are clashing. Notice the sel column in the Layer Infos window. No residues are selected. This model does not have side chains with steric problems.

Select: aa Making Clashes with Backbone
<return>
Deep View displays pro283 and pro285. To see the clashes, display a few neighboring residues: in the Control Panel, select and display residues 280-288. Deep View shows clashes as pink dotted lines.

Select: Sidechains Lacking Proper H-Bond
<return>
Deep View displays sidechains that are expected to have hydrogen bonds. It's normal to find many such side chains on a water-soluble surface, because they probably H-bond to water. It's troubling to find buried polar or charged side chains lacking H-bonds. Deep View finds 67 residues lacking expected H-bonds, and too many of these are in the interior.

Display: Side Chains Even When Backbone is Hidden
Color (sidechain): Type
Select: Group Property: Nonpolar
heading: side
Deep View displays the side chains of nonpolar residues on the ribbon backbone. Sidechains are now colored by residue type, so all residues on display are gray.

Clicking the side heading displays only the side chains of selected residues. In a membrane protein, we expect the membrane-buried surface to contain nonpolar residues only, and the interior to be much like a water-soluble protein: primarily nonpolar residues, but a few more polars than on a membrane-buried surface.

Polarity of Residues by Protein Type and Region
(Most Polar to Least Polar)

Water-Accessible Protein Surface

Interior of Water-Soluble or Interior of Membrane-Buried Protein

Membrane-Buried Protein Surface

Explore the location of these nonpolar residues. Are nonpolars missing from any surface areas?

Select: Group Kind: Acidic
<control> Select: Group Kind: Basic (press control key before and during command selection)
<control> Select: Group Kind: Polar
heading: side
Deep View displays only acidic (red), basic (blue), and polar (yellow) residues.

Can you find putative membrane-buried surface areas that contain charged or polar residues? Note in particular the helices comprising residues 253-281 (yellow-orange) and 291-312 (orange-red). Both contain exposed Lys and Arg residues. We simply would not expect charged residues in these locations.
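The same question can be put to the sequence directly: which residues within a given helix are charged? The sketch below is only an illustration. The function name and the toy sequence are made up, counting His as basic is an assumption, and it cannot tell you whether a residue is exposed or buried. For that, you still need the model.

```python
ACIDIC = set("DE")   # Asp, Glu
BASIC = set("KRH")   # Lys, Arg, His (counting His as basic is an assumption)


def charged_in_segment(sequence, start, end, first_number=1):
    """List charged residues numbered start..end (inclusive).

    sequence: one-letter amino-acid string.
    first_number: residue number of the first letter in the string.
    Returns a list of (residue_number, letter, kind).
    """
    hits = []
    for offset, letter in enumerate(sequence):
        number = first_number + offset
        if start <= number <= end and letter in ACIDIC | BASIC:
            kind = "acidic" if letter in ACIDIC else "basic"
            hits.append((number, letter, kind))
    return hits


toy_sequence = "AAKLLDLLRA"  # made-up sequence, not the receptor
for number, letter, kind in charged_in_segment(toy_sequence, 2, 8):
    print(number, letter, kind)
```

To apply it to the helices named above, supply the model's sequence and make sure first_number matches the residue numbering used in the model.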

In summary, this is not a very good model, and despite its reasonable appearance at first glance, we are not safe drawing any conclusions from it about the location of specific residues in the beta-1-adrenergic receptor.

Were there earlier signs that this modeling process might go awry? There were no compelling ones. In the final model, the number of identities in the alignment is 62 out of 312 rhodopsin residues, or 19.9%. The percent of similarities is 49% (135/312). These are low, but still sometimes viable, percentages for successful modeling. One questionable decision was to use only one template. If multiple templates with substantial aligned regions are available, use them. Even in this case, when our only two choices were "the same" molecule, they are not quite the same. Specifically, they compose the asymmetric unit of the rhodopsin crystal, and they differ slightly in conformation. In addition, some residues are missing in one chain, so alignment of the target sequence with the two aligned models might produce slightly better results. (In this case, it does, but only slightly. Try it, and use Fit: Best (with structural alignment) to align the two templates before Magic-Fitting the target onto them.)
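The identity percentage quoted above is simple to reproduce from any pairwise alignment. The sketch below assumes the two aligned sequences are available as equal-length strings with "-" for gaps, and it divides by the number of template residues, as the figure above does. The function name and the toy alignment are illustrative.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of template residues matched by an identical target residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    template_residues = 0
    identities = 0
    for target_letter, template_letter in zip(aligned_target, aligned_template):
        if template_letter == gap:
            continue  # insertion in the target; no template residue here
        template_residues += 1
        if target_letter == template_letter:
            identities += 1
    if template_residues == 0:
        raise ValueError("template contains no residues")
    return 100.0 * identities / template_residues


# Toy alignment, not the receptor/rhodopsin alignment.
print(percent_identity("MK-LV", "MKALI"))

# The identity figure from this tutorial: 62 identities in 312 residues.
print(round(100 * 62 / 312, 1))
```

Other conventions divide by the alignment length or by the shorter sequence, so check which denominator a program uses before comparing percentages.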

The take-home lesson is that a systematic search for unrealistic aspects of a model can help you to decide whether confidence is justified. Or more precisely, such a search can tell you when confidence is certainly not justified, as in this example.

The lack of templates for modeling the important family of GPCRs is well known, and has led to other approaches to modeling them (for example, go to Swiss-Model and click on GPCR Mode). For the most part, it is not difficult to model the 7-helix cores. But the sites for extracellular interaction with ligand (hormone, for example) and for intracellular interaction with G-protein are surface loops that probably vary greatly among the GPCRs. It is ironic that we can model the least interesting parts of these proteins with some apparent reliability (at least no residues in unlikely locations), but can get no structural clues about the most interesting regions until someone solves one or more hormone-binding GPCR structures.

Like GPCRs, other families of proteins (for some of which there are no good templates) have sometimes yielded to more specialized modeling techniques. Web searching may turn up special modeling sites for specific families. For example, there are many sites for modeling antibodies.

For a simple homology-modeling exercise using a template that is more similar in structure and function to the target, see Part 4 of USM Assignment 2 in the Deep View Tutorial. Or search the human genome for opsins and model one of the cone opsins. You will have the same template choice, bovine rhodopsin, but because of greater homology in sequence and function, you should get a very reasonable model, with a plausible binding pocket for retinal. Of course, the more similar the target and template proteins, the more likely that the resulting model will be of high quality.

For more information about judging the quality of homology models, read this.
