Databases
A collection of genomics, functional genomics, and
genetics studies and links to their resulting datasets. This resource describes
project scope, material, and objectives and provides a mechanism to retrieve
datasets that are often difficult to find due to inconsistent annotation,
multiple independent submissions, and the varied nature of diverse data types
which are often stored in different databases.
The BioSample database contains descriptions of
biological source materials used in experimental assays.
A collaborative effort to identify a core set of human
and mouse protein coding regions that are consistently annotated and of high
quality.
A division of GenBank that contains short single-pass
reads of cDNA (transcript) sequences. dbEST can be searched directly through
the Nucleotide EST Database.
A division of GenBank that contains short single-pass
reads of genomic DNA. dbGSS can be searched directly through the Nucleotide GSS
Database.
Includes single nucleotide variations,
microsatellites, and small-scale insertions and deletions. dbSNP contains
population-specific frequency and genotype data, experimental conditions,
molecular context, and mapping information for both neutral variations and
clinical mutations.
The NIH genetic sequence database, an annotated
collection of all publicly available DNA sequences. GenBank is part of the
International Nucleotide Sequence Database Collaboration, which comprises the
DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL),
and GenBank at NCBI. These three organizations exchange data on a daily basis.
GenBank consists of several divisions, most of which can be accessed through
the Nucleotide database. The exceptions are the EST and GSS divisions, which
are accessed through the Nucleotide EST and Nucleotide GSS databases,
respectively.
A compilation of data from the NIAID Influenza Genome
Sequencing Project and GenBank. It provides tools for flu sequence
analysis, annotation and submission to GenBank. This resource also has links to
other flu sequence resources, and publications and general information about
flu viruses.
A collection of nucleotide sequences from several
sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database,
and PDB. Searching the Nucleotide Database will yield available results from
each of its component databases.
Database of related DNA sequences that originate from
comparative studies: phylogenetic, population, environmental and, to a lesser
degree, mutational. Each record in the database is a set of DNA sequences. For
example, a population set provides information on genetic variation within an
organism, while a phylogenetic set may contain sequences, and their alignment,
of a single gene obtained from several related organisms.
A public registry of nucleic acid reagents designed
for use in a wide variety of biomedical research applications, together with
information on reagent distributors, probe effectiveness, and computed sequence
similarities.
A collection of human gene-specific reference genomic
sequences. RefSeqGene is a subset of NCBI’s RefSeq database, and its records are
defined based on review from curators of locus-specific databases and the
genetic testing community. They form a stable foundation for reporting
mutations, for establishing consistent intron and exon numbering conventions,
and for defining the coordinates of other biologically significant variation.
RefSeqGene is a part of the Locus Reference Genomic (LRG) Collaboration.
A collection of curated, non-redundant genomic DNA,
transcript (RNA), and protein sequences produced by NCBI. RefSeqs provide a
stable reference for genome annotation, gene identification and
characterization, mutation and polymorphism analysis, expression studies, and
comparative analyses. The RefSeq collection is accessed through the Nucleotide
and Protein databases.
The Sequence Read Archive (SRA) stores sequencing data
from the next generation of sequencing platforms including Roche 454 GS
System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos
Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
A database that contains sequences built from the
existing primary sequence data in GenBank. The sequences and corresponding
annotations are experimentally supported and have been published in a
peer-reviewed scientific journal. TPA records are retrieved through the
Nucleotide Database.
A repository of DNA sequence chromatograms (traces), base
calls, and quality estimates for single-pass reads from various large-scale
sequencing projects.
A database that provides sets of transcript sequences
that appear to come from the same transcription locus (gene or expressed
pseudogene), together with information on protein similarities, gene
expression, cDNA clone reagents, and genomic location.
This database contains libraries of Expressed Sequence
Tags (ESTs) organized by organism, tissue type and developmental stage.
A comprehensive database of sequence tagged sites
(STSs) derived from STS-based maps and other experiments. STSs are defined by
PCR primer pairs and are associated with additional information, such as
genomic position, genes, and sequences.
Downloads
BLAST executables for local use are provided for
Solaris, Linux, Windows, and Mac OS X systems. See the README file in the FTP
directory for more information. Pre-formatted databases for BLAST nucleotide,
protein, and translated searches also are available for downloading under the
db subdirectory.
Sequence databases for use with the stand-alone BLAST
programs. The files in this directory are pre-formatted databases that are
ready to use with BLAST.
Sequence databases in FASTA format for use with the
stand-alone BLAST programs. These databases must be formatted using formatdb
before they can be used with BLAST.
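Before formatting a FASTA file for BLAST, it can help to confirm that the file parses as expected. The following is a minimal Python sketch of a FASTA reader, not part of any NCBI tool; the function name read_fasta and the sample records are hypothetical and chosen only for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record input, held in memory instead of a downloaded file.
example = io.StringIO(">seq1 hypothetical record\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

The same function works on a real file opened in text mode, since it only iterates over lines.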
This site contains files for all sequence records in
GenBank in the default flat file format. The files are organized by GenBank
division, and the full contents are described in the README.genbank file.
This site contains all nucleotide and protein sequence
records in the Reference Sequence (RefSeq) collection. The
"release" directory contains the most current release of
the complete collection, while data for selected organisms (such as human,
mouse and rat) are available in separate directories. Data are available in
FASTA and flat file formats. See the README file for details.
This site contains next-generation sequencing data
organized by the submitted sequencing project.
This site contains the trace chromatogram data
organized by species. Data include chromatogram, quality scores, FASTA
sequences from automatic base calls, and other ancillary information in
tab-delimited text as well as XML formats. See the README file for details.
This site contains individual directories for each
organism with data in UniGene. The data for each species includes the unique
sequence for each UniGene cluster, all sequences in each cluster in FASTA
format and library information for the cluster. See the README file for further
details.
This site contains the UniVec and UniVec_Core
databases in FASTA format. See the README.uv file for details.
This site contains whole genome shotgun sequence data
organized by the 4-digit project code. Data include GenBank and GenPept flat
files, quality scores and summary statistics. See the README.genbank.wgs file
for more information.
Submissions
An online form that provides an interface for
researchers, consortia and organizations to register their BioProjects. This
serves as the starting point for the submission of genomic and genetic data for
the study. The data does not need to be submitted at the time of BioProject registration.
A web-based sequence submission tool for one or a few
submissions to the GenBank database, designed to make the submission process
quick and easy.
Tool for submission to the GenBank database of Barcode
short nucleotide sequences from a standard genetic locus for use in species
identification.
A stand-alone software tool developed by the NCBI for
submitting and updating entries to public sequence databases (GenBank, EMBL, or
DDBJ). It can handle simple submissions that contain a single short mRNA
sequence as well as complex submissions containing long sequences, multiple
annotations, segmented sets of DNA, or sequences from phylogenetic and
population studies with alignments. For simple submissions, use the online
submission tool BankIt instead.
A command-line program that automates the creation of
sequence records for submission to GenBank using many of the same functions as
Sequin. It is used primarily for submission of complete genomes and large
batches of sequences.
This link describes how submitters of SRA data can
obtain a secure NCBI FTP site for their data, and also describes the allowed
data formats and directory structures.
A single entry point for submitters to link to and
find information about all of the data submission processes at NCBI. Currently,
this serves as an interface for the registration of BioProjects and BioSamples
and submission of data for WGS and GTR. Future additions to this site are
planned.
This link describes how submitters of trace data can
obtain a secure NCBI FTP site for their data, and also describes the allowed
data formats and directory structures.
Tools
Finds regions of local similarity between biological
sequences. The program compares nucleotide or protein sequences to sequence
databases and calculates the statistical significance of matches. BLAST can be
used to infer functional and evolutionary relationships between sequences as
well as to help identify members of gene families.
Allows you to retrieve records from many Entrez
databases by uploading a file of GI or accession numbers from the Nucleotide or
Protein databases, or a file of unique identifiers from other Entrez databases.
Search results can be saved in various formats directly to a local file on your
computer.
Tools that provide access to data within NCBI's Entrez
system outside of the regular web query interface. They provide a method of
automating Entrez tasks within software applications. Each utility performs a
specialized retrieval task, and can be used simply by writing a specially
formatted URL.
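As an illustration of what such a formatted URL might look like, here is a small Python sketch that builds a request for the EFetch utility. The base address, the parameter names (db, id, rettype, retmode), and the accession numbers are assumptions made for this example and are not taken from the text above; check the current E-utilities documentation before relying on them. The sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed base address of the E-utilities; verify against NCBI documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Return an EFetch URL for the given database and record identifiers."""
    params = {
        "db": db,
        "id": ",".join(ids),  # several identifiers are joined with commas
        "rettype": rettype,
        "retmode": retmode,
    }
    # urlencode percent-escapes the values, so the comma becomes %2C.
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative accession numbers only.
url = build_efetch_url("nucleotide", ["NM_000546", "NM_001126112"])
print(url)
```

Keeping URL construction in one function makes it easy to swap the database or return format without touching the calling code.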
This tool compares nucleotide or protein sequences to
genomic sequence databases and calculates the statistical significance of
matches using the Basic Local Alignment Search Tool (BLAST) algorithm.
NCBI's Remap tool allows users to project annotation
data and convert locations of features from one genomic assembly to another or
to RefSeqGene sequences through a base-by-base analysis. Options are provided
to adjust the stringency of remapping, and summary results are displayed on the
web page. Full results can be downloaded for viewing in NCBI's Genome Workbench
graphical viewer, and annotation data for the remapped features, as well as
summary data, are also available for download.
An integrated application for viewing and analyzing
sequence data. With Genome Workbench, you can view data in publicly available
sequence databases at NCBI, and mix these data with your own data.
A graphical analysis tool that finds all open reading
frames in a user's sequence or in a sequence already in the database. Sixteen
different genetic codes can be used. The deduced amino acid sequence can be
saved in various formats and searched against protein databases using BLAST.
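The idea of scanning a sequence for open reading frames can be sketched in a few lines of Python. This is a simplified illustration, not the tool's actual algorithm: it assumes the standard genetic code only, treats ATG as the sole start codon, and scans the three forward frames. The function names and the sample sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq):
    """Return the reverse complement, so the opposite strand can be scanned too."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=1):
    """Return (frame, start, end, orf) tuples for ATG...stop runs, forward frames only."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                # Count the codons before the stop; keep the ORF if long enough.
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


orfs = find_orfs("CCATGAAATAGGG")  # hypothetical input sequence
```

To cover all six frames, the same function can be applied to reverse_complement(seq); coordinates then refer to the reversed strand.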
The Primer-BLAST tool uses Primer3 to design PCR
primers to a sequence template. The potential products are then automatically
analyzed with a BLAST search against user-specified databases to check their
specificity for the intended target.
A utility for computing alignment of proteins to
genomic nucleotide sequence. It is based on a variation of the Needleman-Wunsch
global alignment algorithm and specifically accounts for introns and splice
signals. Due to this algorithm, ProSplign is accurate in determining splice
sites and tolerant to sequencing errors.
Provides a configurable graphical display of a
nucleotide or protein sequence and features that have been annotated on that
sequence. In addition to use on NCBI sequence database pages, this viewer is
available as an embeddable webpage component. Detailed
documentation including an API Reference guide is available for
developers wishing to embed the viewer in their own pages.
A utility for computing cDNA-to-Genomic sequence
alignments. It is based on a variation of the Needleman-Wunsch global alignment
algorithm and specifically accounts for introns and splice signals. Due to this
algorithm, Splign is accurate in determining splice sites and tolerant to
sequencing errors.
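For readers unfamiliar with Needleman-Wunsch global alignment, the following Python sketch computes the optimal global alignment score of two sequences by dynamic programming. It shows only the textbook algorithm with a simple linear gap penalty; it does not model introns or splice signals, and it is not the variation used by Splign or ProSplign. The scoring values are arbitrary assumptions chosen for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


identical = needleman_wunsch_score("GATTACA", "GATTACA")
one_gap = needleman_wunsch_score("GATTACA", "GATACA")
```

Only two rows of the scoring matrix are kept in memory, which is enough for the score; recovering the alignment itself would require storing the full matrix or traceback pointers.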
A system for quickly identifying segments of a nucleic
acid sequence that may be of vector origin. VecScreen searches a query sequence
for segments that match any sequence in a specialized non-redundant vector
database (UniVec).