Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
get-ncbi-data-protein: Download, parse, and import NCBI protein sequences and taxonomies
Docstring:
Usage: qiime rescript get-ncbi-data-protein [OPTIONS]

  Download and import sequences from the NCBI Protein database and download, parse, and import the corresponding taxonomies from the NCBI Taxonomy database.

  Please be aware of the NCBI Disclaimer and Copyright notice (https://www.ncbi.nlm.nih.gov/home/about/policies/), particularly "run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays for any series of more than 100 requests". As a rough guide, if you are downloading more than 125,000 sequences, only run this method at those times.

  The NCBI servers can be capricious but reward polite persistence. If the download fails and gives you a message that contains the words "Last exception was ReadTimeout", you should probably try again, maybe with more connections. If it fails for any other reason, please create an issue at https://github.com/bokulich-lab/RESCRIPt.

Parameters:
  --p-query TEXT
      Query on the NCBI Protein database [optional]
  --m-accession-ids-file METADATA... (multiple arguments will be merged)
      List of accession ids for sequences in the NCBI Protein database. [optional]
  --p-ranks TEXT... Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')
      List of taxonomic ranks for building a taxonomy from the NCBI Taxonomy database. [default: 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species'] [optional]
  --p-rank-propagation / --p-no-rank-propagation
      Propagate known ranks to missing ranks if true [default: True]
  --p-logging-level TEXT Choices('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL')
      Logging level, set to INFO for download progress or DEBUG for copious verbosity [optional]
  --p-n-jobs INTEGER Range(1, None)
      Number of concurrent download connections. More is faster until you run out of bandwidth. [default: 1]

Outputs:
  --o-sequences ARTIFACT FeatureData[ProteinSequence]
      Sequences from the NCBI Protein database [required]
  --o-taxonomy ARTIFACT FeatureData[Taxonomy]
      Taxonomies from the NCBI Taxonomy database [required]

Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden).
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this action. If not provided, the default cache under $TMP/qiime2/ will be used. IMPORTANT FOR HPC USERS: If you are on an HPC system and are using parallel execution it is important to set this to a location that is globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
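The options above can be assembled into a command line programmatically. The following is a minimal sketch, not part of RESCRIPt: `build_command` is a hypothetical helper, the output file names and the query string are placeholders, and two points are assumptions rather than documented behavior: that at least one of `--p-query` or `--m-accession-ids-file` must be given, and that the `TEXT...` notation for `--p-ranks` means several values may follow a single flag. The helper only builds the argument list and does not run anything.

```python
import shlex

# Rank and logging-level names copied from the Choices(...) lists in the help text.
ALLOWED_RANKS = (
    "domain", "superkingdom", "kingdom", "subkingdom", "superphylum",
    "phylum", "subphylum", "infraphylum", "superclass", "class",
    "subclass", "infraclass", "cohort", "superorder", "order",
    "suborder", "infraorder", "parvorder", "superfamily", "family",
    "subfamily", "tribe", "subtribe", "genus", "subgenus",
    "species group", "species subgroup", "species", "subspecies", "forma",
)
ALLOWED_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def build_command(sequences, taxonomy, query=None, accession_ids_file=None,
                  ranks=None, n_jobs=1, logging_level=None):
    """Return the argv list for `qiime rescript get-ncbi-data-protein`.

    Hypothetical helper: it validates values against the documented
    choices and assembles flags, but never executes the command.
    """
    if query is None and accession_ids_file is None:
        # Assumption: both inputs are marked optional, but a download
        # needs at least one of them to select sequences.
        raise ValueError("provide a query or an accession ids file")
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1 (Range(1, None))")
    unknown = [rank for rank in (ranks or []) if rank not in ALLOWED_RANKS]
    if unknown:
        raise ValueError(f"unknown ranks: {unknown}")
    if logging_level is not None and logging_level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown logging level: {logging_level}")

    argv = ["qiime", "rescript", "get-ncbi-data-protein"]
    if query is not None:
        argv += ["--p-query", query]
    if accession_ids_file is not None:
        argv += ["--m-accession-ids-file", accession_ids_file]
    if ranks:
        # Assumption: "TEXT..." means multiple values may follow one flag.
        argv += ["--p-ranks", *ranks]
    if logging_level is not None:
        argv += ["--p-logging-level", logging_level]
    argv += ["--p-n-jobs", str(n_jobs),
             "--o-sequences", sequences,
             "--o-taxonomy", taxonomy]
    return argv


if __name__ == "__main__":
    # Placeholder query and file names, for illustration only.
    argv = build_command("protein-seqs.qza", "protein-tax.qza",
                         query="PLACEHOLDER QUERY",
                         ranks=["phylum", "genus"],
                         n_jobs=2, logging_level="INFO")
    print(shlex.join(argv))
```

In an environment where QIIME 2 and RESCRIPt are installed, the returned list could be handed to `subprocess.run(argv, check=True)`. Building a list rather than a single string avoids shell-quoting problems with queries that contain spaces or brackets.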
Import:
from qiime2.plugins.rescript.methods import get_ncbi_data_protein
Docstring:
Download, parse, and import NCBI protein sequences and taxonomies

Download and import sequences from the NCBI Protein database and download, parse, and import the corresponding taxonomies from the NCBI Taxonomy database.

Please be aware of the NCBI Disclaimer and Copyright notice (https://www.ncbi.nlm.nih.gov/home/about/policies/), particularly "run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays for any series of more than 100 requests". As a rough guide, if you are downloading more than 125,000 sequences, only run this method at those times.

The NCBI servers can be capricious but reward polite persistence. If the download fails and gives you a message that contains the words "Last exception was ReadTimeout", you should probably try again, maybe with more connections. If it fails for any other reason, please create an issue at https://github.com/bokulich-lab/RESCRIPt.

Parameters
----------
query : Str, optional
    Query on the NCBI Protein database
accession_ids : Metadata, optional
    List of accession ids for sequences in the NCBI Protein database.
ranks : List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')], optional
    List of taxonomic ranks for building a taxonomy from the NCBI Taxonomy database. [default: 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
rank_propagation : Bool, optional
    Propagate known ranks to missing ranks if true
logging_level : Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'), optional
    Logging level, set to INFO for download progress or DEBUG for copious verbosity
n_jobs : Int % Range(1, None), optional
    Number of concurrent download connections. More is faster until you run out of bandwidth.

Returns
-------
sequences : FeatureData[ProteinSequence]
    Sequences from the NCBI Protein database
taxonomy : FeatureData[Taxonomy]
    Taxonomies from the NCBI Taxonomy database
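The docstring's two operational hints, downloading in NCBI's off-peak window and retrying after a "Last exception was ReadTimeout" failure, can be sketched in plain Python. Both `is_off_peak` and `retry_on_read_timeout` are hypothetical helpers, not RESCRIPt functions. The sketch assumes that the raised exception's message contains the quoted phrase, and that the caller supplies a datetime already expressed in US Eastern Time. The attempt count and wait time are arbitrary illustrative defaults.

```python
import time
from datetime import datetime


def is_off_peak(eastern_now):
    """Return True on weekends or between 9 pm and 5 am on weekdays.

    `eastern_now` must already be in US Eastern Time; converting from
    another time zone is left to the caller.
    """
    if eastern_now.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
        return True
    return eastern_now.hour >= 21 or eastern_now.hour < 5


def retry_on_read_timeout(download, attempts=3, wait_seconds=60.0,
                          sleep=time.sleep):
    """Call `download()` and retry only when the error mentions ReadTimeout.

    Any other error, or a ReadTimeout on the final attempt, is re-raised
    unchanged so it can be reported upstream.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except Exception as error:
            # Assumption: the failure message contains this exact phrase.
            retryable = "Last exception was ReadTimeout" in str(error)
            if not retryable or attempt == attempts:
                raise
            sleep(wait_seconds)


if __name__ == "__main__":
    # Wednesday at noon Eastern falls outside the recommended window.
    print(is_off_peak(datetime(2024, 1, 3, 12, 0)))
```

With the plugin installed, `download` could be a zero-argument callable such as a `lambda` wrapping `get_ncbi_data_protein(...)` with the parameters documented above. Since the docstring suggests trying again "maybe with more connections", a caller might also raise `n_jobs` between attempts; that variation is not shown here.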