How To Find Gene Sequence Of A Protein

This question serves as a basic introduction to the three major genome viewers. One gene, ADAM23, will be examined using all three sites so that the reader can gain an appreciation of the subtle differences in data presented at each of these sites.

National Center for Biotechnology Information Map Viewer

The NCBI Map Viewer can be accessed from the NCBI's home page, at http://www.ncbi.nlm.nih.gov. Follow the hyperlink in the right-hand column labeled Map Viewer to go to the Map Viewer home page. NCBI provides Map Viewers for 17 organisms, including mammals, other vertebrates, plants, protozoa, fungi, and invertebrates. Select an organism from the pull-down menu, enter a search term in the text box, and press Go! The search term can be any element mapped in that genome, which in human includes gene symbol, GenBank accession number, marker name or disease name. For this example, change the Organism to human, and enter 'ADAM23'.

The notation at the top of the resulting page (Fig. 1.1) indicates that this is Build 31, or the NCBI's 31st assembly of the human genome. Build 31 is based on sequence data from 15 November 2002. The previous genome assembly, Build 30, was based on sequence data from 28 June 2002. This overview page shows a schematic of all of the human chromosomes, pinpointing the position of ADAM23 to the q arm of chromosome 2 (Fig. 1.1). The hit on chromosome 8 is to ADAM28, which was once called ADAM23. The search results section shows that ADAM23 exists on two NCBI maps, Genes_cyto and Genes_seq. Genes_cyto refers to the cytogenetic map, whereas Genes_seq refers to the sequence map. Clicking on either of those two links opens a view of just that map.
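The two hits returned for a single symbol can be understood as a search over both current symbols and former names. The following is a minimal sketch of that idea, not the Map Viewer's actual search logic; the `GeneRecord` class and `search` function are hypothetical names, and the toy records encode only the facts stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical record type: a current symbol, a chromosome, and former names."""
    symbol: str
    chromosome: str
    aliases: set = field(default_factory=set)


# Toy data: ADAM23 maps to chromosome 2; ADAM28 on chromosome 8 was once called ADAM23.
RECORDS = [
    GeneRecord("ADAM23", "2"),
    GeneRecord("ADAM28", "8", aliases={"ADAM23"}),
]


def search(term, records=RECORDS):
    """Return records whose current symbol or any former alias equals the term."""
    term = term.upper()
    return [r for r in records if r.symbol == term or term in r.aliases]


hits = search("adam23")
for hit in hits:
    print(hit.symbol, "on chromosome", hit.chromosome)
```

Searching for 'ADAM23' in this toy data returns both records, which mirrors why the overview page marks a position on chromosome 8 as well as on chromosome 2.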

Detailed descriptions of these and other NCBI human maps are available at http://www.ncbi.nlm.nih.gov/mapview/static/humansearch.html. To get the most general overview of the genomic context of ADAM23, including all available maps, click on the item in the Map element column (in this case, ADAM23). This view shows ADAM23 and a bit of flanking sequence on chromosome 2q33. By default, the maps are shown compressed. To display the maps in a wider format, remove the check mark next to Compress Map in the left blue sidebar (Fig. 1.2). Three maps are displayed in this view, each of which will be discussed below. Additional maps, discussed in other examples in this guide, can be added to this view using the Maps & Options link.

The rightmost map is the master map, the map providing the most detail. The master map in this case is the Genes_seq map, which depicts the intron/exon organization of ADAM23 and is created by aligning the ADAM23 mRNA to the genome. The gene appears to have 27 exons. The vertical arrow next to the ADAM23 gene symbol (within the pink box) shows the direction in which the gene is transcribed. The gene symbol itself is linked to LocusLink, an NCBI resource that provides comprehensive information about the gene, including aliases, nucleotide and protein sequences, and links to other resources¹⁰ (see Question 10). The links to the right of the gene symbol point to additional information about the gene.

sv, or sequence view, shows the position of the gene in the context of the genomic contig, including the nucleotide and encoded protein sequences.
ev brings the user to the evidence viewer, a view that displays the biological evidence supporting a particular gene model. This view shows all RefSeq models, GenBank mRNAs, transcripts (whether annotated, known or potential) and expressed sequence tags (ESTs) aligning to this genomic contig. More information on the evidence viewer can be found on the NCBI web site by clicking Evidence Viewer Help on any ev report page.
hm is a link to the NCBI's Human–Mouse Homology Map, showing genome sequences with predicted orthology between mouse and human (Fig. 12.2).
seq allows the user to retrieve the genomic sequence of the region in text format. The region of sequence displayed can easily be changed.
mm is a link to the Model Maker, which shows the exons that result when GenBank mRNAs, ESTs and gene predictions are aligned to the genomic sequence. The user can then select individual exons to create a customized model of the gene. More information on the Model Maker can be found on the NCBI web site by clicking help on any mm report page.

The UniG_Hs map shows human UniGene clusters that have been aligned to the genome. The gray histogram depicts the number of aligning ESTs and the blue lines show the mapping of UniGene clusters to the genome. The thick blue bars are regions of alignment (that is, exons) and the thin blue lines indicate potential introns. In this example, the mapping of UniGene cluster Hs.7164 to the genome follows that of ADAM23, and all the exons align.
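The relationship between aligned blocks (exons) and the gaps between them (potential introns) can be sketched in a few lines. This is only an illustration of the concept, assuming 1-based inclusive coordinates; the function name `introns_from_exons` and the example coordinates are hypothetical and are not the real ADAM23 exon positions.

```python
def introns_from_exons(exons):
    """Given exon (start, end) pairs, 1-based inclusive, return the gaps between them."""
    ordered = sorted(exons)
    introns = []
    # Walk over consecutive exon pairs; any uncovered bases between them form an intron.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical coordinates for illustration only.
example_exons = [(100, 250), (400, 520), (900, 1000)]
example_introns = introns_from_exons(example_exons)
print(len(example_exons), "exons,", len(example_introns), "introns")
print(example_introns)
```

Under these assumptions, a transcript aligned as N non-adjacent blocks yields N - 1 potential introns, which is what the alternating thick bars and thin lines depict.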

The Genes_cyto map shows genes that have been mapped cytogenetically; the orange bar shows the position of the gene. Many genes have been broadly mapped to this region of chromosome 2.

Clicking on the zoom control in the blue sidebar allows the user to zoom out to view a larger region of chromosome 2. Zooming out one level shows 1/100th of the chromosome. There are 22 genes in the region, but only 20 are labeled (displayed) in this view (Fig. 1.3). The region of ADAM23 is highlighted in red on all maps. On the basis of the Genes_seq map, ADAM23 is located between KIAA1571 and LOC151405.

University of California, Santa Cruz Genome Browser

The home page for the UCSC Genome Browser is http://genome.ucsc.edu/. UCSC provides browsers not only for the most recent version of the rat, mouse, and human genome data, but also for several earlier assemblies. To use the Genome Browser, select the appropriate organism from the pull-down menu at the top of the blue sidebar (Human, in this case) and then click the link labeled Browser. On the resulting page, select the version of the human assembly to view. The Dec. 2001 browser displays annotations based on NCBI's build 28 of the human genome, the Apr. 2002 browser displays annotations on NCBI's build 29, the June 2002 browser displays annotations on NCBI's build 30, and the Nov. 2002 browser displays annotations on NCBI's build 31. Select Human Nov. 2002 from the pull-down menu to access the assembly from that date (Fig. 1.4).
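When comparing views across sites, it helps to keep the correspondence between UCSC release dates and NCBI builds explicit. The lookup below is a small sketch that records only the four pairings listed above; the dictionary and function names are hypothetical, and the abbreviated release labels are an assumption about how the menu entries are written.

```python
# UCSC browser release label -> NCBI build number, as listed in the text.
UCSC_TO_NCBI_BUILD = {
    "Dec. 2001": 28,
    "Apr. 2002": 29,
    "June 2002": 30,
    "Nov. 2002": 31,
}


def ucsc_release_for_build(build):
    """Return the UCSC release label that annotates a given NCBI build, or None."""
    for release, ncbi_build in UCSC_TO_NCBI_BUILD.items():
        if ncbi_build == build:
            return release
    return None


print(ucsc_release_for_build(31))
```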

Supported types of queries are listed beneath the text input boxes. Enter 'ADAM23' in the box labeled position and then click Submit. The results of this search are presented in two categories, RefSeq Genes and mRNA Associated Search Results (Fig. 1.5). The section marked RefSeq Genes shows the mapping of the NCBI Reference mRNA sequences to the genome. The mRNA Associated Search Results represent the mapping of other GenBank mRNA sequences to the genome. Click on the RefSeq Genes link for ADAM23 (arrow, Fig. 1.5) to see the genomic context of the ADAM23 mRNA Reference Sequence (NM_003812).

The resulting zoomed-in view shows a region of chromosome 2 from base pair 206032982 to 206207297, located within 2q33.3 (Fig. 1.6). The blue track entitled Known Genes based on SWISS-PROT, TrEMBL, mRNA, and RefSeq shows the intron–exon structure of known genes. The vertical boxes indicate exons and the horizontal lines introns. The ADAM23 gene seems to have 26 exons. The direction of transcription is indicated by the arrowheads on the introns. The tracks labeled Ensembl Genes, Acembly Genes, Twinscan, SGP Genes, and Genscan Genes are the results of gene predictions (see Question 7). Alignments of other database nucleotide sequences are shown in the Human mRNAs from GenBank, Spliced EST, and Nonhuman mRNAs from GenBank tracks. Translated alignments of Fugu rubripes genomic sequence are in the Fugu BLAT tracks. The Mouse Cons and Best Mouse tracks show conservation between the human and mouse genomes. Tracks displaying single-nucleotide polymorphisms (SNPs) and repetitive elements are shown at the bottom. Additional details about each track are available by selecting the track name in the Track Controls at the bottom.

To view the genomic context of ADAM23, zoom out 3x by clicking on the zoom out 3x box in the upper right corner. ADAM23 is located between AF338192 and BC033509 (Fig. 1.7).
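The arithmetic behind a 3x zoom-out can be sketched as follows. This is an illustration, not the browser's own code: it assumes a position string of the form chr2:206032982-206207297, treats coordinates as 1-based and inclusive, widens the window symmetrically about its midpoint, and does not clamp to the end of the chromosome. The function names `parse_region` and `zoom_out` are hypothetical.

```python
import re


def parse_region(text):
    """Parse a string such as 'chr2:206032982-206207297' into (chrom, start, end)."""
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", text.replace(",", ""))
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def zoom_out(chrom, start, end, factor=3):
    """Widen a 1-based inclusive window about its midpoint by the given factor."""
    length = end - start + 1
    new_length = length * factor
    mid = (start + end) // 2
    # Never go below position 1; the upper bound is not clamped in this sketch.
    new_start = max(1, mid - new_length // 2)
    new_end = new_start + new_length - 1
    return chrom, new_start, new_end


region = parse_region("chr2:206032982-206207297")
print(region[2] - region[1] + 1)  # window length in base pairs
print(zoom_out(*region))
```

The wider window produced this way is what brings neighboring genes into view alongside the gene of interest.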

Ensembl

The Ensembl⁷ project, http://www.ensembl.org/, provides genome browsers for nine species: human, mouse, rat, zebrafish, fugu, mosquito, fruitfly, C. elegans, and C. briggsae. Click on Human to view the main entry point for the human genome. The current version of human Ensembl is version 11.31.1, based on the NCBI's 31st build of the genome. To perform a text search, enter 'ADAM23' in the text box, and limit the search by selecting Gene from the pull-down menu. Click on the upper button labeled Lookup. As at NCBI, two results are returned, the first with a link to the ADAM28 gene, and the second with a link to the ADAM23 gene (Fig. 1.8).

Click on either of the ADAM23 links (Ensembl Gene ENSG00000114948) to retrieve the GeneView window. The returned page contains two sections of information. The Ensembl Gene Report (Fig. 1.9) is an overview of ADAM23, including a link to the genomic location of the gene, a schematic of the intron/exon structure, and links to homologous genes from other organisms. Some of these fields will be described in more detail in later examples. The Transcripts/Translation Summary provides data on the gene transcript (Fig. 1.10). This section of the GeneView shows links to ADAM23 in other databases, as well as protein domain information. If more than one transcript is predicted for the gene, each is allocated its own summary section.

The complete genomic context of ADAM23 is viewed by returning to the first section of the GeneView (Fig. 1.9) and clicking on one of the two links inside the Genomic Location box. The top portion of the resulting ContigView (Fig. 1.11) depicts the chromosome, with the region of interest outlined in red. The Overview shows the genomic context of the gene, including the chromosome bands, contigs, markers and genes that map near 2q33.3. Clicking on any of these items recenters the display around that item. The section of interest is boxed in red on the DNA(contigs) map. The known genes annotated by Ensembl as being around ADAM23 are Q9BZ60 and NM_014929.

The center panel of the ContigView, the Detailed View (Fig. 1.12), shows a zoomed-in view of the boxed region, highlighting all features that have been mapped to this region of the human genome. The navigator buttons between the Overview and the Detailed View move the display to the left and right and zoom in and out. The features to be displayed can be changed by selecting the Features pull-down menu and then checking which features to view.

The Features shown in Fig. 1.12 are the defaults. The DNA(contigs) map separates items on the forward strand (above) from those on the reverse (below). The forward strand shows seven types of features. Starting at the bottom, the ADAM23 transcript is shown in red, indicating that it is a known transcript corresponding to a near-full-length cDNA sequence, protein sequence or both already available in the public sequence database. Black transcripts are predicted based on EST or protein sequence similarity. EST Transcr. links to individual aligning ESTs, whereas the UniGene track near the top displays UniGene clusters. The Genscan model on the forward strand contains many exons found in the known transcript and was predicted by the GENSCAN gene prediction program¹¹ (see Question 7). The Proteins and Human proteins boxes indicate protein sequences that align to this version of the genome, whereas Human cDNAs shows mRNA sequences in the EMBL nucleotide sequence database and NCBI RefSeqs. Positioning the computer mouse over any feature brings up the feature's name and links to more detailed information. The only features on the reverse strand in this view are portions of an EST transcript and a Genscan transcript. The Basepair view, at the bottom of the ContigView (Fig. 1.13), shows a very fine view of a 101-nucleotide region of ADAM23, showing the actual nucleotide and protein sequence, as well as restriction enzyme sites.

The NCBI, UCSC and Ensembl sometimes use different symbols for the same genes, so it can be difficult to compare the views obtained by the different browsers. Furthermore, the three sites maintain independent annotation pipelines and do not all attempt to align the same mRNA sequences to the genome. All three sites are currently displaying annotations based on NCBI's build 31. However, it takes significant time to update an annotation based on a new genome assembly, so soon after the release of a new assembly, the sites may display different versions. At present, UCSC is the only site to maintain browsers based on older assemblies. Nevertheless, it is fairly easy to navigate among the three sites. The NCBI, for instance, links to Ensembl and UCSC through the black boxes at the top of LocusLink entries for human genes, and Ensembl directs users to NCBI and UCSC through the "Jump to" link in its ContigView. Some versions of UCSC's Genome Browser have links to Ensembl and NCBI's Map Viewer in the blue bar at the top of each browser page.

References

1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Collins, F.S. & McKusick, V.A. Implications of the Human Genome Project for medical science. J. Am. Med. Assoc. 285, 540–544 (2001).
3. Watson, J.D. & Crick, F.H.C. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171, 737–738 (1953).
4. Green, E.D. Strategies for the systematic sequencing of complex genomes. Nature Rev. Genet. 2, 573–583 (2001).
5. Ouellette, B.F.F. & Boguski, M.S. Database divisions and homology search files: a guide for the perplexed. Genome Res. 7, 952–955 (1997).
6. Bairoch, A. & Apweiler, R. The SWISS-PROT Protein Sequence Database and its supplement TREMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
7. Hubbard, T. et al. The Ensembl Genome Database Project. Nucleic Acids Res. 30, 38–41 (2002).
8. Kent, W.J. BLAT—the BLAST-like Alignment Tool. Genome Res. 12, 656–664 (2002).
9. Stein, L. Genome annotation: from sequence to biology. Nature Rev. Genet. 2, 493–503 (2001).
10. Pruitt, K.D. & Maglott, D.R. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137–140 (2001).
11. Burge, C.B. & Karlin, S. Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8, 346–354 (1998).
12. Schuler, G.D. Electronic PCR: bridging the gap between genome mapping and genome sequencing. Trends Biotechnol. 16, 456–459 (1998).
13. Sherry, S.T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
14. Hamosh, A. et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30, 52–55 (2002).
15. Baxevanis, A.D. & Ouellette, B.F.F. (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (John Wiley & Sons, New York, 2001).
16. Solovyev, V.V., Salamov, A.A. & Lawrence, C.B. Identification of human gene structure using linear discriminant functions and dynamic programming. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 367–375 (1995).
17. Yeh, R.F., Lim, L.P. & Burge, C.B. Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803–816 (2001).
18. Marchler-Bauer, A. et al. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281–283 (2002).
19. Apweiler, R. et al. InterPro—an integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16, 1145–1150 (2000).
20. Rebhan, M., Chalifa-Caspi, V., Prilusky, J. & Lancet, D. GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14, 656–664 (1998).
21. Blake, J.A., Richardson, J.E., Bult, C.J., Kadin, J.A. & Eppig, J.T. The Mouse Genome Database (MGD): the model organism database for the laboratory mouse. Nucleic Acids Res. 30, 113–115 (2002).
22. Hudson, T.J. et al. A radiation hybrid map of mouse genes. Nature Genet. 29, 201–205 (2001).
23. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 30, 276–280 (2002).
24. Letunic, I. et al. Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30, 242–244 (2002).
25. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
26. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, Cambridge, 1998).
27. Peri, S., Ibarrola, N., Blagoev, B., Mann, M. & Pandey, A. Common pitfalls in bioinformatics-based analyses: look before you leap. Trends Genet. 17, 541–545 (2001) [erratum Trends Genet. 18, 218 (2002)].
28. Ponting, C. Issues in predicting protein function from sequence. Brief. Bioinform. 2, 19–29 (2001).
29. Aparicio, S.A.J.R. How to count ... human genes. Nature Genet. 25, 129–130 (2000).
30. Beadle, G.W. & Tatum, E.L. Genetic control of biochemical reactions in Neurospora. Proc. Natl Acad. Sci. USA 27, 499–506 (1941).
31. Jeffery, C.J., Bahnson, B.J., Chien, W., Ringe, D. & Petsko, G.A. Crystal structure of rabbit phosphoglucose isomerase, a glycolytic enzyme that moonlights as neuroleukin, autocrine motility factor, and differentiation mediator. Biochemistry 39, 955–964 (2000).
32. Wistow, G. & Piatigorsky, J. Recruitment of enzymes as lens structural proteins. Science 236, 1554–1556 (1987).
33. Jeffery, C.J. Moonlighting proteins. Trends Biochem. Sci. 24, 8–11 (1999).
34. Chothia, C. Proteins. One thousand families for the molecular biologist. Nature 357, 543–544 (1992).
35. Hegyi, H. & Gerstein, M. The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 288, 147–164 (1999).
36. Jansen, R. & Gerstein, M. Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res. 28, 1481–1488 (2000).
37. Brenner, S.E. Errors in genome annotation. Trends Genet. 15, 132–133 (1999).
38. Smith, R.F. Perspectives: sequence data base searching in the era of large-scale genomic sequencing. Genome Res. 6, 653–660 (1996).

Question 1 How does one find a gene of interest and determine that gene's structure? Once the gene has been located on the map, how does one easily examine other genes in that same region? Nat Genet 35, 9–17 (2003). https://doi.org/10.1038/ng1189
