get-gtdb-data: Download, parse, and import SSU GTDB reference data.

Docstring:

Usage: qiime rescript get-gtdb-data [OPTIONS]

  Download, parse, and import SSU GTDB files, given a version number.
  Downloads data directly from GTDB, parses the taxonomy files, and outputs
  ready-to-use sequence and taxonomy artifacts. REQUIRES STABLE INTERNET
  CONNECTION. NOTE: THIS ACTION ACQUIRES DATA FROM GTDB. SEE
  https://gtdb.ecogenomic.org/about FOR MORE INFORMATION and be aware that
  earlier versions may be released under a different license.

Parameters:
  --p-version TEXT Choices('202.0', '207.0', '214.0', '214.1')
                       GTDB database version to download.   [default: '214.1']
  --p-domain TEXT Choices('Both', 'Bacteria', 'Archaea')
                       SSU sequence and taxonomy data to download from a
                       given microbial domain from GTDB. 'Both' will fetch
                       both bacterial and archaeal data. 'Bacteria' will only
                       fetch bacterial data. 'Archaea' will only fetch
                       archaeal data. This only applies when '--p-db-type'
                       is 'SpeciesReps'.                     [default: 'Both']
  --p-db-type TEXT Choices('All', 'SpeciesReps')
                       'All': All SSU data that pass the quality-control of
                       GTDB, but are not clustered into representative
                       species. 'SpeciesReps': SSU gene sequences identified
                       within the set of representative species. Note: if
                       'All' is used, the 'domain' parameter will be ignored
                       as GTDB does not maintain separate domain-level files
                       for these non-clustered data.  [default: 'SpeciesReps']
Outputs:
  --o-gtdb-taxonomy ARTIFACT FeatureData[Taxonomy]
                       SSU GTDB reference taxonomy.                 [required]
  --o-gtdb-sequences ARTIFACT FeatureData[Sequence]
                       SSU GTDB reference sequences.                [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is successful (silence is golden).
  --example-data PATH  Write example data and exit.
  --citations          Show citations and exit.
  --help               Show this message and exit.
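
As a minimal sketch, the options listed above can be assembled into a command line from Python. The helper name `build_get_gtdb_command` and the output file names are hypothetical; the flags and their allowed values are copied from the help text. The sketch only builds the command and does not run it.

```python
import shlex

# Allowed values copied from the help text above.
VERSIONS = ("202.0", "207.0", "214.0", "214.1")
DOMAINS = ("Both", "Bacteria", "Archaea")
DB_TYPES = ("All", "SpeciesReps")


def build_get_gtdb_command(
    version="214.1",
    domain="Both",
    db_type="SpeciesReps",
    taxonomy_out="gtdb-tax.qza",    # hypothetical output file name
    sequences_out="gtdb-seqs.qza",  # hypothetical output file name
):
    """Return the argument list for `qiime rescript get-gtdb-data`."""
    # Reject values the action would not accept before building the command.
    if version not in VERSIONS:
        raise ValueError(f"version must be one of {VERSIONS}, got {version!r}")
    if domain not in DOMAINS:
        raise ValueError(f"domain must be one of {DOMAINS}, got {domain!r}")
    if db_type not in DB_TYPES:
        raise ValueError(f"db_type must be one of {DB_TYPES}, got {db_type!r}")

    return [
        "qiime", "rescript", "get-gtdb-data",
        "--p-version", version,
        "--p-domain", domain,
        "--p-db-type", db_type,
        "--o-gtdb-taxonomy", taxonomy_out,
        "--o-gtdb-sequences", sequences_out,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for archaeal species representatives.
    print(shlex.join(build_get_gtdb_command(domain="Archaea")))
```

With QIIME 2 and RESCRIPt installed and a stable internet connection available, the returned list could be passed to `subprocess.run(cmd, check=True)`.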

Import:

from qiime2.plugins.rescript.methods import get_gtdb_data

Docstring:

Download, parse, and import SSU GTDB reference data.

Download, parse, and import SSU GTDB files, given a version number.
Downloads data directly from GTDB, parses the taxonomy files, and outputs
ready-to-use sequence and taxonomy artifacts. REQUIRES STABLE INTERNET
CONNECTION. NOTE: THIS ACTION ACQUIRES DATA FROM GTDB. SEE
https://gtdb.ecogenomic.org/about FOR MORE INFORMATION and be aware that
earlier versions may be released under a different license.

Parameters
----------
version : Str % Choices('202.0', '207.0', '214.0', '214.1'), optional
    GTDB database version to download.
domain : Str % Choices('Both', 'Bacteria', 'Archaea'), optional
    SSU sequence and taxonomy data to download from a given microbial
    domain from GTDB. 'Both' will fetch both bacterial and archaeal data.
    'Bacteria' will only fetch bacterial data. 'Archaea' will only fetch
    archaeal data. This only applies when db_type is 'SpeciesReps'.
db_type : Str % Choices('All', 'SpeciesReps'), optional
    'All': All SSU data that pass the quality-control of GTDB, but are not
    clustered into representative species. 'SpeciesReps': SSU gene
    sequences identified within the set of representative species. Note: if
    'All' is used, the 'domain' parameter will be ignored as GTDB does not
    maintain separate domain-level files for these non-clustered data.

Returns
-------
gtdb_taxonomy : FeatureData[Taxonomy]
    SSU GTDB reference taxonomy.
gtdb_sequences : FeatureData[Sequence]
    SSU GTDB reference sequences.
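
A minimal sketch of calling the action through the Python API and saving both outputs follows. The wrapper name `fetch_gtdb_reference` and the output paths are hypothetical; the import, parameter names, and result names are taken from the docstring above. It assumes QIIME 2 with RESCRIPt is installed and that an internet connection is available when the function is called.

```python
def fetch_gtdb_reference(
    version="214.1",
    domain="Both",
    db_type="SpeciesReps",
    taxonomy_path="gtdb-tax.qza",    # hypothetical output path
    sequences_path="gtdb-seqs.qza",  # hypothetical output path
):
    """Download GTDB SSU reference data and save both artifacts to disk."""
    # Imported inside the function so this file can be loaded
    # even where QIIME 2 is not installed.
    from qiime2.plugins.rescript.methods import get_gtdb_data

    # Note: per the docstring, 'domain' is ignored when db_type is 'All'.
    results = get_gtdb_data(version=version, domain=domain, db_type=db_type)

    # Outputs are available under the names listed in "Returns".
    results.gtdb_taxonomy.save(taxonomy_path)
    results.gtdb_sequences.save(sequences_path)
    return results.gtdb_taxonomy, results.gtdb_sequences
```

For example, `fetch_gtdb_reference(domain="Bacteria")` would request only the bacterial species-representative data for the default version.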