Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
get-ncbi-genomes: Fetch entire genomes and associated taxonomies and metadata using NCBI Datasets.
Docstring:
Usage: qiime rescript get-ncbi-genomes [OPTIONS]

  Uses NCBI Datasets to fetch genomes for indicated taxa. Nucleotide
  sequences and protein/gene annotations will be fetched and supplemented
  with full taxonomy of every sequence.

Parameters:
  --p-taxon TEXT
      NCBI Taxonomy ID or name (common or scientific) at any taxonomic rank.
      [required]
  --p-assembly-source TEXT Choices('refseq', 'genbank', 'all')
      Fetch only RefSeq or GenBank genome assemblies.
      [default: 'refseq']
  --p-assembly-levels TEXT... Choices('complete_genome', 'chromosome',
    'scaffold', 'contig')
      Fetch only genome assemblies that are one of the specified assembly
      levels.
      [default: ['complete_genome']]
  --p-only-reference / --p-no-only-reference
      Fetch only reference and representative genome assemblies.
      [default: True]
  --p-only-genomic / --p-no-only-genomic
      Exclude plasmid, mitochondrial and chloroplast molecules from the
      final results (i.e., keep only genomic DNA).
      [default: False]
  --p-tax-exact-match / --p-no-tax-exact-match
      If true, only return assemblies with the given NCBI Taxonomy ID, or
      name. Otherwise, assemblies from taxonomy subtree are included, too.
      [default: False]
  --p-page-size INTEGER Range(20, 1000, inclusive_end=True)
      The maximum number of genome assemblies to return per request. If
      number of genomes to fetch is higher than this number, requests will
      be repeated until all assemblies are fetched.
      [default: 20]
  --p-ranks TEXT... Choices('domain', 'superkingdom', 'kingdom',
    'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum',
    'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder',
    'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family',
    'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group',
    'species subgroup', 'species', 'subspecies', 'forma')
      List of taxonomic ranks for building a taxonomy from the NCBI
      Taxonomy database.
      [default: ['kingdom', 'phylum', 'class', 'order', 'family', 'genus',
      'species']]
  --p-rank-propagation / --p-no-rank-propagation
      If a rank has no taxonomy associated with it, the taxonomy from the
      upper-level rank of that lineage, will be propagated downward. For
      example, if we are missing the genus label for
      'f__Pasteurellaceae; g__' then the 'f__' rank will be propagated to
      become: 'f__Pasteurellaceae; g__Pasteurellaceae'.
      [default: True]

Outputs:
  --o-genome-assemblies ARTIFACT FeatureData[Sequence]
      Nucleotide sequences of requested genomes.
      [required]
  --o-loci ARTIFACT GenomeData[Loci]
      Loci features of requested genomes.
      [required]
  --o-proteins ARTIFACT GenomeData[Proteins]
      Protein sequences originating from requested genomes.
      [required]
  --o-taxonomies ARTIFACT FeatureData[Taxonomy]
      Taxonomies of requested genomes.
      [required]

Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of
      this action. Or silence output if execution is successful (silence is
      golden).
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this
      action. If not provided, the default cache under $TMP/qiime2/ will be
      used. IMPORTANT FOR HPC USERS: If you are on an HPC system and are
      using parallel execution it is important to set this to a location
      that is globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
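The options above can be assembled into a full command line programmatically. The following is a minimal sketch, not part of RESCRIPt: the helper name `build_get_ncbi_genomes_command`, the `prefix` argument, and the output file names are hypothetical, and it assumes that the `TEXT...` notation accepts several values after a single flag. It only builds and prints the command; it does not run QIIME 2 or contact NCBI.

```python
import shlex

# Allowed values copied from the help text above.
ASSEMBLY_SOURCES = {"refseq", "genbank", "all"}
ASSEMBLY_LEVELS = {"complete_genome", "chromosome", "scaffold", "contig"}


def build_get_ncbi_genomes_command(
    taxon,
    assembly_source="refseq",
    assembly_levels=("complete_genome",),
    only_reference=True,
    page_size=20,
    prefix="ncbi",  # hypothetical prefix for the output file names
):
    """Return the argument list for `qiime rescript get-ncbi-genomes`."""
    if assembly_source not in ASSEMBLY_SOURCES:
        raise ValueError(f"unknown assembly source: {assembly_source!r}")
    unknown = set(assembly_levels) - ASSEMBLY_LEVELS
    if unknown:
        raise ValueError(f"unknown assembly levels: {sorted(unknown)}")
    # The help text documents Range(20, 1000, inclusive_end=True).
    if not 20 <= page_size <= 1000:
        raise ValueError("page_size must be between 20 and 1000")

    return [
        "qiime", "rescript", "get-ncbi-genomes",
        "--p-taxon", str(taxon),
        "--p-assembly-source", assembly_source,
        # Assumption: list parameters take all values after one flag.
        "--p-assembly-levels", *assembly_levels,
        "--p-only-reference" if only_reference else "--p-no-only-reference",
        "--p-page-size", str(page_size),
        # Hypothetical output file names; all four outputs are required.
        "--o-genome-assemblies", f"{prefix}-genomes.qza",
        "--o-loci", f"{prefix}-loci.qza",
        "--o-proteins", f"{prefix}-proteins.qza",
        "--o-taxonomies", f"{prefix}-taxonomies.qza",
    ]


command = build_get_ncbi_genomes_command(
    "Pasteurellaceae",
    assembly_levels=["complete_genome", "chromosome"],
    page_size=100,
)
# shlex.join quotes arguments where needed, so the result can be pasted
# into a shell.
print(shlex.join(command))
```

Validating the choices and the page-size range before building the command catches typos locally, before any request is sent.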
Import:
from qiime2.plugins.rescript.methods import get_ncbi_genomes
Docstring:
Fetch entire genomes and associated taxonomies and metadata using NCBI
Datasets.

Uses NCBI Datasets to fetch genomes for indicated taxa. Nucleotide
sequences and protein/gene annotations will be fetched and supplemented
with full taxonomy of every sequence.

Parameters
----------
taxon : Str
    NCBI Taxonomy ID or name (common or scientific) at any taxonomic rank.
assembly_source : Str % Choices('refseq', 'genbank', 'all'), optional
    Fetch only RefSeq or GenBank genome assemblies.
assembly_levels : List[Str % Choices('complete_genome', 'chromosome', 'scaffold', 'contig')], optional
    Fetch only genome assemblies that are one of the specified assembly
    levels.
only_reference : Bool, optional
    Fetch only reference and representative genome assemblies.
only_genomic : Bool, optional
    Exclude plasmid, mitochondrial and chloroplast molecules from the
    final results (i.e., keep only genomic DNA).
tax_exact_match : Bool, optional
    If true, only return assemblies with the given NCBI Taxonomy ID, or
    name. Otherwise, assemblies from taxonomy subtree are included, too.
page_size : Int % Range(20, 1000, inclusive_end=True), optional
    The maximum number of genome assemblies to return per request. If
    number of genomes to fetch is higher than this number, requests will
    be repeated until all assemblies are fetched.
ranks : List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')], optional
    List of taxonomic ranks for building a taxonomy from the NCBI
    Taxonomy database.
rank_propagation : Bool, optional
    If a rank has no taxonomy associated with it, the taxonomy from the
    upper-level rank of that lineage, will be propagated downward. For
    example, if we are missing the genus label for
    'f__Pasteurellaceae; g__' then the 'f__' rank will be propagated to
    become: 'f__Pasteurellaceae; g__Pasteurellaceae'.

Returns
-------
genome_assemblies : FeatureData[Sequence]
    Nucleotide sequences of requested genomes.
loci : GenomeData[Loci]
    Loci features of requested genomes.
proteins : GenomeData[Proteins]
    Protein sequences originating from requested genomes.
taxonomies : FeatureData[Taxonomy]
    Taxonomies of requested genomes.
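The rank propagation behaviour described for `rank_propagation` can be illustrated with a short, self-contained sketch. This is not RESCRIPt's implementation; the function name `propagate_ranks` is hypothetical, and the sketch assumes a taxonomy string whose entries are separated by semicolons and use a `prefix__label` form, as in the docstring's example.

```python
def propagate_ranks(taxonomy: str) -> str:
    """Fill empty rank labels with the nearest non-empty label above them."""
    filled = []
    last_label = ""  # most recent non-empty label seen so far
    for entry in taxonomy.split(";"):
        entry = entry.strip()
        prefix, sep, label = entry.partition("__")
        if not sep:
            # Entry does not follow the 'prefix__label' form; keep it as-is.
            filled.append(entry)
            continue
        if label:
            last_label = label
        else:
            # Empty rank: reuse the label of the closest higher rank.
            label = last_label
        filled.append(f"{prefix}{sep}{label}")
    return "; ".join(filled)


print(propagate_ranks("f__Pasteurellaceae; g__"))
# prints: f__Pasteurellaceae; g__Pasteurellaceae
```

If the highest rank itself is empty, there is nothing to propagate, so this sketch leaves it unchanged.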