
get-gtdb-data: Download, parse, and import SSU GTDB reference data.

Docstring:

Usage: qiime rescript get-gtdb-data [OPTIONS]

  Download, parse, and import SSU GTDB files, given a version number.
  Downloads data directly from GTDB, parses the taxonomy files, and outputs
  ready-to-use sequence and taxonomy artifacts. REQUIRES STABLE INTERNET
  CONNECTION. NOTE: THIS ACTION ACQUIRES DATA FROM GTDB. SEE
  https://gtdb.ecogenomic.org/about FOR MORE INFORMATION and be aware that
  earlier versions may be released under a different license.

Parameters:
  --p-version TEXT Choices('202.0', '207.0', '214.0', '214.1', '220.0')
                         GTDB database version to download. [default: '220.0']
  --p-domain TEXT Choices('Both', 'Bacteria', 'Archaea')
                         SSU sequence and taxonomy data to download from a
                         given microbial domain from GTDB. 'Both' will fetch
                         both bacterial and archaeal data. 'Bacteria' will
                         only fetch bacterial data. 'Archaea' will only fetch
                         archaeal data. This only applies to 'db-type
                         SpeciesReps'.                       [default: 'Both']
  --p-db-type TEXT Choices('All', 'SpeciesReps')
                         'All': All SSU data that pass GTDB quality control,
                         but are not clustered into representative species.
                         'SpeciesReps': SSU gene sequences identified
                         within the set of representative species. Note: if
                         'All' is used, the 'domain' parameter will be ignored
                         as GTDB does not maintain separate domain-level files
                         for these non-clustered data.
                                                      [default: 'SpeciesReps']
Outputs:
  --o-gtdb-taxonomy ARTIFACT FeatureData[Taxonomy]
                         SSU GTDB reference taxonomy.               [required]
  --o-gtdb-sequences ARTIFACT FeatureData[Sequence]
                         SSU GTDB reference sequences.              [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action. Or silence output if
                         execution is successful (silence is golden).
  --example-data PATH    Write example data and exit.
  --citations            Show citations and exit.
  --use-cache DIRECTORY  Specify the cache to be used for the intermediate
                         work of this action. If not provided, the default
                         cache under $TMP/qiime2/ will be used.
                         IMPORTANT FOR HPC USERS: If you are on an HPC system
                         and are using parallel execution it is important to
                         set this to a location that is globally accessible to
                         all nodes in the cluster.
  --help                 Show this message and exit.
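The options above map directly onto a command-line argument list. Below is a minimal Python sketch that assembles a `qiime rescript get-gtdb-data` call from them. It is not part of RESCRIPt, and the function name `build_get_gtdb_data_cmd` is hypothetical. The allowed values and defaults are copied from the help text. Two behaviours are design choices of this sketch, not of the plugin: it drops `--p-domain` when `db_type` is 'All', and it warns when a non-default domain is requested in that case. Both follow the note that the domain parameter is ignored for non-clustered data.

```python
import shlex
import warnings

# Allowed values, copied from the Choices(...) listed in the help text.
VERSIONS = ("202.0", "207.0", "214.0", "214.1", "220.0")
DOMAINS = ("Both", "Bacteria", "Archaea")
DB_TYPES = ("All", "SpeciesReps")


def build_get_gtdb_data_cmd(taxonomy_out, sequences_out, version="220.0",
                            domain="Both", db_type="SpeciesReps"):
    """Return an argument list for `qiime rescript get-gtdb-data`.

    Defaults mirror the [default: ...] values shown in the help text.
    """
    if version not in VERSIONS:
        raise ValueError(f"unsupported GTDB version: {version!r}")
    if domain not in DOMAINS:
        raise ValueError(f"unsupported domain: {domain!r}")
    if db_type not in DB_TYPES:
        raise ValueError(f"unsupported db-type: {db_type!r}")

    cmd = ["qiime", "rescript", "get-gtdb-data",
           "--p-version", version, "--p-db-type", db_type]
    if db_type == "SpeciesReps":
        cmd += ["--p-domain", domain]
    elif domain != "Both":
        # GTDB has no per-domain files for 'All', so the action ignores domain.
        warnings.warn("domain is ignored when db_type is 'All'")
    cmd += ["--o-gtdb-taxonomy", taxonomy_out,
            "--o-gtdb-sequences", sequences_out]
    return cmd


if __name__ == "__main__":
    cmd = build_get_gtdb_data_cmd("gtdb-taxonomy.qza", "gtdb-sequences.qza",
                                  domain="Bacteria")
    print(shlex.join(cmd))
    # Running it needs QIIME 2 + RESCRIPt and internet access, e.g.:
    # import subprocess; subprocess.run(cmd, check=True)
```

The sketch builds an argument list rather than a single shell string. This avoids quoting problems when output paths contain spaces, and `subprocess.run` accepts the list directly.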

Import:

from qiime2.plugins.rescript.methods import get_gtdb_data

Docstring:

Download, parse, and import SSU GTDB reference data.

Download, parse, and import SSU GTDB files, given a version number.
Downloads data directly from GTDB, parses the taxonomy files, and outputs
ready-to-use sequence and taxonomy artifacts. REQUIRES STABLE INTERNET
CONNECTION. NOTE: THIS ACTION ACQUIRES DATA FROM GTDB. SEE
https://gtdb.ecogenomic.org/about FOR MORE INFORMATION and be aware that
earlier versions may be released under a different license.

Parameters
----------
version : Str % Choices('202.0', '207.0', '214.0', '214.1', '220.0'), optional
    GTDB database version to download.
domain : Str % Choices('Both', 'Bacteria', 'Archaea'), optional
    SSU sequence and taxonomy data to download from a given microbial
    domain from GTDB. 'Both' will fetch both bacterial and archaeal data.
    'Bacteria' will only fetch bacterial data. 'Archaea' will only fetch
    archaeal data. This only applies when db_type is 'SpeciesReps'.
db_type : Str % Choices('All', 'SpeciesReps'), optional
    'All': All SSU data that pass GTDB quality control, but are not
    clustered into representative species. 'SpeciesReps': SSU gene
    sequences identified within the set of representative species. Note: if
    'All' is used, the 'domain' parameter will be ignored as GTDB does not
    maintain separate domain-level files for these non-clustered data.

Returns
-------
gtdb_taxonomy : FeatureData[Taxonomy]
    SSU GTDB reference taxonomy.
gtdb_sequences : FeatureData[Sequence]
    SSU GTDB reference sequences.
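Below is a minimal sketch of calling the method from Python and writing both outputs to disk. It assumes QIIME 2 with the RESCRIPt plugin is installed and a stable internet connection is available. The helper names `fetch_gtdb_reference` and `output_names`, and the file-naming scheme, are hypothetical and chosen only for illustration. The sketch also assumes, rather than documents, two pieces of QIIME 2 Artifact API behaviour: that the returned results expose outputs as attributes named as in the Returns section, and that `.save()` returns the path written.

```python
def output_names(prefix, version, db_type, domain):
    """Hypothetical naming scheme: encode the download options in file names."""
    # domain only matters for SpeciesReps, so it is left out of 'All' names.
    tag = f"{db_type}-{domain}" if db_type == "SpeciesReps" else db_type
    stem = f"{prefix}-{version}-{tag}"
    return f"{stem}-taxonomy.qza", f"{stem}-sequences.qza"


def fetch_gtdb_reference(prefix="gtdb", version="220.0", domain="Both",
                         db_type="SpeciesReps"):
    """Download GTDB SSU data via RESCRIPt and save both artifacts to disk.

    Needs QIIME 2 with the RESCRIPt plugin and a stable internet connection.
    """
    # Imported here so the module can be loaded where QIIME 2 is absent.
    from qiime2.plugins.rescript.methods import get_gtdb_data

    results = get_gtdb_data(version=version, domain=domain, db_type=db_type)
    taxonomy_file, sequences_file = output_names(prefix, version, db_type, domain)
    # Outputs are accessed as attributes named as in the Returns section.
    saved_taxonomy = results.gtdb_taxonomy.save(taxonomy_file)
    saved_sequences = results.gtdb_sequences.save(sequences_file)
    return saved_taxonomy, saved_sequences
```

Putting the version and database type in the file names makes it easier to tell apart several GTDB releases downloaded side by side. For example, calling `fetch_gtdb_reference(version="214.1", db_type="All")` would produce files whose names record that those options were used.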