728

Web Search

 

May 6, 2010

DNA&RNA Databases and Tools

Databases

BioSystems

Database that groups biomedical literature, small molecules, and sequence data in terms of biological relationships.


Database of Expressed Sequence Tags (dbEST)

A divison of GenBank that contains short single-pass reads of cDNA (transcript) sequences. dbEST can be searched directly through the Nucleotide EST Database.


Database of Genome Survey Sequences (dbGSS)

A division of GenBank that contains short single-pass reads of genomic DNA. dbGSS can be searched directly through the Nucleotide GSS Database.


GenBank

The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. GenBank consists of several divisions, most of which can be accessed through the Nucleotide database. The exceptions are the EST and GSS divisions, which are accessed through the Nucleotide EST and Nucleotide GSS databases, respectively.


NCBI Website Search

A database of static NCBI web pages, documentation, and online tools. These pages include such content as specialized online sequence analysis tools, back issues of newsletters, legacy resource description pages, sample code, and other miscellaneous resources. Searching this database is equivalent to a site search tool for the whole NCBI web site. FTP site is not covered.


Nucleotide Database

A collection of nucleotide sequences from several sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database, and PDB. Searching the Nucleotide Database will yield available results from each of its component databases.

PopSet

Database of related DNA sequences that originate from comparative studies: phylogenetic, population, environmental and, to a lesser degree, mutational. Each record in the database is a set of DNA sequences. For example, a population set provides information on genetic variation within an organism, while a phylogenetic set may contain sequences, and their alignment, of a single gene obtained from several related organisms.


Probe

A public registry of nucleic acid reagents designed for use in a wide variety of biomedical research applications, together with information on reagent distributors, probe effectiveness, and computed sequence similarities.

Reference Sequence (RefSeq)

A collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI. RefSeqs provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. The RefSeq collection is accessed through the Nucleotide and Protein databases.


Sequence Read Archive (SRA)

The Sequence Read Archive (SRA) stores sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System® , Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®


Third Party Annotation (TPA) Database

A database that contains sequences built from the existing primary sequence data in GenBank. The sequences and corresponding annotations are experimentally supported and have been published in a peer-reviewed scientific journal. TPA records are retrieved through the Nucleotide Database.


Trace Archive

A repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects.


Tools

Batch Entrez

Allows you to retrieve records from many Entrez databases by uploading a file of GI or accession numbers from the Nucleotide or Protein databases, or a file of unique identifiers from other Entrez databases. Search results can be saved in various formats directly to a local file on your computer.


BLAST (Basic Local Alignment Search Tool)

Finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.


E-Utilities

Tools that provide access to data within NCBI's Entrez system outside of the regular web query interface. They provide a method of automating Entrez tasks within software applications. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL.


Genome Workbench

An integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix these data with your own data.


Influenza Virus

Presents data from the NIAID Influenza Genome Sequencing Project and from GenBank, and provides tools for flu sequence analysis, annotation and submission to GenBank. It also provides links to other flu sequence resources, and publications and general information about flu viruses.


VecScreen

A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. VecScreen searches a query sequence for segments that match any sequence in a specialized non-redundant vector database (UniVec).

Downloads


BLAST (Stand-alone)

BLAST executables for local use are provided for Solaris, LINUX, Windows, and MacOSX systems. See the README file in the ftp directory for more information. Pre-formatted databases for BLAST nucleotide, protein, and translated searches also are available for downloading under the db subdirectory.


FTP: GenBank

This site contains files for all sequence records in GenBank in the default flat file format. The files are organized by GenBank division, and the full contents are described in the README.genbank file.


FTP: RefSeq

This site contains all nucleotide and protein sequence records in the Reference Sequence (RefSeq) collection. The "release" directory contains the most current release of the complete collection, while data for selected organisms (such as human, mouse and rat) are available in separate directories. Data are available in FASTA and flat file formats. See the README file for details.


FTP: UniVec

This site contains the UniVec and UniVec_Core databases in FASTA format. See the README.uv file for details.

Submissions


BankIt

A web-based sequence submission tool for one or a few submissions, designed to make the submission process quick and easy.


Barcode Submission

Tool to submit Barcodes, short nucleotide sequences from a standard genetic locus for use in species identification.


Sequin

A stand-alone software tool developed by the NCBI for submitting and updating entries to public sequence databases (GenBank, EMBL, or DDBJ). It is capable of handling simple submissions that contain a single short mRNA sequence, complex submissions containing long sequences, multiple annotations, segmented sets of DNA, as well as sequences from phylogenetic and population studies with alignments. For simple submission, use the online submission tool BankIt instead.


tbl2asn

A command-line program that automates the creation of sequence records for submission to GenBank using many of the same functions as Sequin. It is used primarily for submission of complete genomes and large batches of sequences.

No comments: