Bioinformatics Tutorial
Cast of Characters
You will encounter these databases and software tools one by one as you follow this tutorial. Use this page for reference if you can't remember the meaning of an acronym or program name.
I. The Databases
- GenBank, operated by NCBI (National Center for Biotechnology Information)
Contains all publicly available sequences of DNA, with annotations, which are constantly being extended and updated. Annotations include identification of a gene, its gene product(s) (if known), and extensive links to all kinds of information about the gene in other databases.
GenBank contains the same DNA sequence content as the databases of EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan).
- OMIM (Online Mendelian Inheritance in Man, and woman, too)
An encyclopedia of human genes and genetic disorders, linked to gene
entries in GenBank and to scientific literature in PubMed. Gives complete and up-to-the-minute information about many human genes.
- PDB (Protein Data Bank)
Contains all publicly available experimentally determined (by X-ray crystallography and NMR) structural models of proteins and nucleic acids. Does not contain homology models or other types of theoretical models.
- PubMed
Described in Wikipedia as "a free search engine for accessing the MEDLINE database of citations and abstracts of biomedical research articles. The core subject is medicine, and PubMed covers fields related to medicine, such as nursing and other allied health disciplines. It also provides very full coverage of the related biomedical sciences, such as biochemistry and cell biology. It is offered by the United States National Library of Medicine at the National Institutes of Health as part of the Entrez information retrieval system."
- UniProt Knowledgebase (Swiss-Prot and TrEMBL), operated by SIB (Swiss
Institute of Bioinformatics) and EBI (European
Bioinformatics Institute).
Contains most of the publicly available sequences of
proteins (not DNA or RNA). Sequences in Swiss-Prot are annotated manually, and provide or link you to just about all published information about the sequence. Sequences in TrEMBL are collected and annotated automatically from sequence databases, and will make their way to Swiss-Prot, but only after they are manually annotated to meet Swiss-Prot standards.
II. The Tools
- BLAST (Basic Local Alignment Search Tool)
For searching databases to find genes or proteins with sequences similar to yours
- ClustalW
For comparing your sequence with others, or lots of sequences
with each other
- DeepView (also known as Swiss-PdbViewer)
For seeing and exploring macromolecular models in three
dimensions, and for manual and semiautomated homology
modeling
- ExPASy (Expert Protein Analysis System)
Not so much a tool as a tool box -- a very complete set of protein-analysis tools
- NCBI Map Viewer
For finding genes and gene products (RNAs and proteins) that
interest you, and for seeing where they lie on the set of chromosomes for each organism
- PubMed
For searching ALL the literature of the life sciences
- Phylip
For making rigorous phylogenetic trees when you want to control all the parameters
- Phylodendron
For drawing and printing phylogenetic trees from tree data produced by other programs
- PhyML
For making rigorous phylogenetic trees automatically by a maximum-likelihood method; probably the best, but the slowest
- Swiss-Model and the Swiss-Model Workspace
For automated building of theoretical structural models of your sequence based on known structures (homology modeling)
- T-Coffee
Like ClustalW, a tool for sequence comparisons, but more powerful, and can use known structures to improve the comparisons
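To make the "local alignment" in BLAST's name concrete, here is a minimal Python sketch of the classic Smith-Waterman scoring scheme for two sequences. It is not BLAST's own method: BLAST uses faster heuristics to approximate this kind of search across a whole database. The function name local_alignment_score and the scoring values (match, mismatch, gap) are illustrative assumptions, not settings taken from any of the tools above.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    Scoring values are illustrative assumptions; real tools use
    substitution matrices and more elaborate gap penalties.
    """
    # prev holds the previous row of the dynamic-programming table.
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always 0 in a local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Flooring at 0 lets an alignment start anywhere,
            # which is what makes the alignment "local".
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # The shared ACGT stretch is found; the unrelated flanks are ignored.
    print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Because the score is never allowed to drop below zero, the best-scoring region can begin and end anywhere inside either sequence. That is the idea behind searching a database for regions similar to your sequence rather than requiring the whole sequences to match.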