Warning

This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.

Are you looking for:

the QIIME 2 homepage? That’s https://qiime2.org.
learning resources for microbiome marker gene (i.e., amplicon) analysis? See the QIIME 2 amplicon distribution documentation.
learning resources for microbiome metagenome analysis? See the MOSHPIT documentation.
installation instructions, plugins, books, videos, workshops, or resources? See the QIIME 2 Library.
general help? See the QIIME 2 Forum.

Old content beyond this point… 👴👵

get-ncbi-data-protein: Download, parse, and import NCBI protein sequences and taxonomies

Citations:

Dennis A. Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, and Eric W. Sayers. GenBank. Nucleic Acids Research, 41(D1):D36–D42, 2012.

NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 46(D1):D8–D13, 2018. doi:10.1093/nar/gkx1095.

Command line interface
Artifact API

Docstring:

Usage: qiime rescript get-ncbi-data-protein [OPTIONS]

  Download and import sequences from the NCBI Protein database and download,
  parse, and import the corresponding taxonomies from the NCBI Taxonomy
  database.

  Please be aware of the NCBI Disclaimer and Copyright notice
  (https://www.ncbi.nlm.nih.gov/home/about/policies/), particularly "run
  retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays
  for any series of more than 100 requests". As a rough guide, if you are
  downloading more than 125,000 sequences, only run this method at those
  times.

  The NCBI servers can be capricious but reward polite persistence. If the
  download fails and gives you a message that contains the words "Last
  exception was ReadTimeout", you should probably try again, maybe with more
  connections. If it fails for any other reason, please create an issue at
  https://github.com/bokulich-lab/RESCRIPt.

Parameters:
  --p-query TEXT          Query on the NCBI Protein database        [optional]
  --m-accession-ids-file METADATA...
    (multiple arguments   List of accession IDs for sequences in the NCBI
     will be merged)      Protein database.                         [optional]
  --p-ranks TEXT... Choices('domain', 'superkingdom', 'kingdom',
    'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum',
    'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder',
    'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family',
    'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group',
    'species subgroup', 'species', 'subspecies', 'forma')
                          List of taxonomic ranks for building a taxonomy
                          from the NCBI Taxonomy database. [default:
                          'kingdom', 'phylum', 'class', 'order', 'family',
                          'genus', 'species']                       [optional]
  --p-rank-propagation / --p-no-rank-propagation
                          Propagate known ranks to missing ranks if true
                                                               [default: True]
  --p-logging-level TEXT Choices('DEBUG', 'INFO', 'WARNING', 'ERROR',
    'CRITICAL')           Logging level, set to INFO for download progress or
                          DEBUG for copious verbosity               [optional]
  --p-n-jobs INTEGER      Number of concurrent download connections. More is
    Range(1, None)        faster until you run out of bandwidth.  [default: 1]
Outputs:
  --o-sequences ARTIFACT FeatureData[ProteinSequence]
                          Sequences from the NCBI Protein database  [required]
  --o-taxonomy ARTIFACT FeatureData[Taxonomy]
                          Taxonomies from the NCBI Taxonomy database
                                                                    [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.
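The scheduling guidance in the help text above can be encoded as a small pre-flight check. The sketch below is a hypothetical helper, not part of RESCRIPt or QIIME 2. It assumes you pass a wall-clock time already expressed in US Eastern Time, treats Saturday and Sunday as the weekend, and uses the rough 125,000-sequence threshold quoted in the help text.

```python
from datetime import datetime

# Rough guide quoted in the help text: downloads larger than this belong off-peak.
LARGE_DOWNLOAD = 125_000


def is_ncbi_off_peak(eastern: datetime) -> bool:
    """True on weekends, or on weekdays at/after 9 pm or before 5 am.

    `eastern` is assumed to be a US Eastern Time wall-clock datetime.
    """
    if eastern.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return eastern.hour >= 21 or eastern.hour < 5


def ok_to_start(n_sequences: int, eastern: datetime) -> bool:
    """Hypothetical check: small jobs any time, large jobs only off-peak."""
    return n_sequences <= LARGE_DOWNLOAD or is_ncbi_off_peak(eastern)


def now_eastern() -> datetime:
    """Current US Eastern Time (requires Python 3.9+ and tz data on the host)."""
    # Imported lazily so the checks above work even without time-zone data.
    from zoneinfo import ZoneInfo
    return datetime.now(ZoneInfo("America/New_York"))
```

For example, `ok_to_start(200_000, now_eastern())` returns `False` on a weekday afternoon. Keeping the time-zone lookup in `now_eastern()` separates the policy logic from the host's time-zone database.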

Import:

from qiime2.plugins.rescript.methods import get_ncbi_data_protein
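The docstring's advice to retry after a ReadTimeout can be automated with a small wrapper. This is a sketch under stated assumptions: `call_with_retries` is a hypothetical helper, not part of RESCRIPt, and it assumes the text "Last exception was ReadTimeout" appears in the string form of the exception raised through the Python API. Any other error is re-raised immediately so that it can be reported as an issue.

```python
import time
from typing import Any, Callable

# Substring that the docstring says signals a transient NCBI timeout.
RETRYABLE_MESSAGE = "Last exception was ReadTimeout"


def call_with_retries(fn: Callable[..., Any], *args: Any, max_attempts: int = 3,
                      wait_seconds: float = 60.0, **kwargs: Any) -> Any:
    """Call fn(*args, **kwargs), retrying only on ReadTimeout-style failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as err:
            # Give up on unrelated errors, or once the attempts are exhausted.
            if RETRYABLE_MESSAGE not in str(err) or attempt == max_attempts:
                raise
            # Be polite: wait a little longer before each new attempt.
            time.sleep(wait_seconds * attempt)
```

For example, after the import above, `results = call_with_retries(get_ncbi_data_protein, query=my_query, n_jobs=2)` makes up to three attempts, waiting 60 and then 120 seconds between them; `my_query` stands for your own NCBI Protein query. The outputs can then be read by name, as `results.sequences` and `results.taxonomy`.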

Docstring:

Download, parse, and import NCBI protein sequences and taxonomies

Download and import sequences from the NCBI Protein database and download,
parse, and import the corresponding taxonomies from the NCBI Taxonomy
database.

Please be aware of the NCBI Disclaimer and Copyright notice
(https://www.ncbi.nlm.nih.gov/home/about/policies/), particularly "run
retrieval scripts on weekends or between 9 pm and 5 am Eastern Time
weekdays for any series of more than 100 requests". As a rough guide, if
you are downloading more than 125,000 sequences, only run this method at
those times.

The NCBI servers can be capricious but reward polite persistence. If the
download fails and gives you a message that contains the words "Last
exception was ReadTimeout", you should probably try again, maybe with more
connections. If it fails for any other reason, please create an issue at
https://github.com/bokulich-lab/RESCRIPt.

Parameters
----------
query : Str, optional
    Query on the NCBI Protein database
accession_ids : Metadata, optional
    List of accession IDs for sequences in the NCBI Protein database.
ranks : List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')], optional
    List of taxonomic ranks for building a taxonomy from the NCBI Taxonomy
    database. [default: 'kingdom', 'phylum', 'class', 'order', 'family',
    'genus', 'species']
rank_propagation : Bool, optional
    Propagate known ranks to missing ranks if true
logging_level : Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'), optional
    Logging level, set to INFO for download progress or DEBUG for copious
    verbosity
n_jobs : Int % Range(1, None), optional
    Number of concurrent download connections. More is faster until you run
    out of bandwidth.

Returns
-------
sequences : FeatureData[ProteinSequence]
    Sequences from the NCBI Protein database
taxonomy : FeatureData[Taxonomy]
    Taxonomies from the NCBI Taxonomy database
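To illustrate what the `rank_propagation` option describes ("propagate known ranks to missing ranks"), the sketch below fills each missing rank with the name of the nearest known rank above it. This interpretation is an assumption made for illustration only: `propagate_ranks` is a hypothetical helper, and RESCRIPt's actual output (for example, any rank prefixes or separators in the taxonomy strings) may differ. The lineage names are placeholders.

```python
from typing import Dict, List, Optional, Sequence

# Default ranks listed in the docstring above.
DEFAULT_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def propagate_ranks(lineage: Dict[str, str], ranks: Sequence[str] = DEFAULT_RANKS,
                    propagate: bool = True) -> List[str]:
    """Return one name per requested rank, optionally filling gaps from above."""
    filled: List[str] = []
    last_known: Optional[str] = None
    for rank in ranks:
        name = lineage.get(rank)
        if name:
            last_known = name
            filled.append(name)
        elif propagate and last_known is not None:
            filled.append(last_known)  # copy the nearest known higher rank
        else:
            filled.append("")  # nothing known yet, or propagation is off
    return filled
```

With `lineage = {"kingdom": "K1", "phylum": "P1", "genus": "G1"}`, propagation fills class, order, and family with `P1` and species with `G1`; with `propagate=False` those ranks stay empty.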
