
Utilities in QIIME 2

There are many non-plugin-based utilities available in QIIME 2, and this document demonstrates many of them. It is organized by interface and cross-references similar functionality available in other interfaces.

q2cli

Most of the interesting utilities can be found in the tools subcommand of q2cli:

qiime tools --help

stdout:

Usage: qiime tools [OPTIONS] COMMAND [ARGS]...

  Tools for working with QIIME 2 files.

Options:
  --help      Show this message and exit.

Commands:
  cache-create              Create an empty cache at the given location.
  cache-fetch               Fetches an artifact out of a cache into a .qza.
  cache-garbage-collection  Runs garbage collection on the cache at the
                            specified location.
  cache-import              Imports data into an Artifact in the cache under a
                            key.
  cache-remove              Removes a given key from a cache.
  cache-status              Checks the status of the cache.
  cache-store               Stores a .qza in the cache under a key.
  cast-metadata             Designate metadata column types.
  citations                 Print citations for a QIIME 2 result.
  export                    Export data from a QIIME 2 Artifact or a
                            Visualization
  extract                   Extract a QIIME 2 Artifact or Visualization
                            archive.
  import                    Import data into a new QIIME 2 Artifact.
  inspect-metadata          Inspect columns available in metadata.
  list-formats              List the available formats.
  list-types                List the available semantic types.
  peek                      Take a peek at a QIIME 2 Artifact or
                            Visualization.
  replay-citations          Reports all citations from a QIIME 2 Artifact...
  replay-provenance         Replay provenance from a QIIME 2 Artifact...
  replay-supplement         Produces a zipfile package of useful...
  validate                  Validate data in a QIIME 2 Artifact.
  view                      View a QIIME 2 Visualization.

Let’s get our hands on some data so that we can learn more about this functionality! First, we will take a look at the taxonomic bar charts from the PD Mice Tutorial:

Please select a download option that is most appropriate for your environment:
wget \
  -O "taxa-barplot.qzv" \
  "https://data.qiime2.org/2024.10/tutorials/utilities/taxa-barplot.qzv"
curl -sL \
  "https://data.qiime2.org/2024.10/tutorials/utilities/taxa-barplot.qzv" > \
  "taxa-barplot.qzv"

Retrieving Citations

Now that we have some results, let’s learn more about the citations relevant to the creation of this visualization. First, we can check the help text for the qiime tools citations command:

qiime tools citations --help

stdout:

Usage: qiime tools citations [OPTIONS] ARTIFACT/VISUALIZATION

  Print citations as a BibTex file (.bib) for a QIIME 2 result.

Options:
  --help      Show this message and exit.

Now that we know how to use the command, we will run the following:

qiime tools citations taxa-barplot.qzv

stdout:

@article{framework|qiime2:2019.10.0|0,
 author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R. and Bokulich, Nicholas A. and Abnet, Christian C. and Al-Ghalith, Gabriel A. and Alexander, Harriet and Alm, Eric J. and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E. and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J. and Brown, C. Titus and Callahan, Benjamin J. and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily K. and Da Silva, Ricardo and Diener, Christian and Dorrestein, Pieter C. and Douglas, Gavin M. and Durall, Daniel M. and Duvallet, Claire and Edwardson, Christian F. and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M. and Gibbons, Sean M. and Gibson, Deanna L. and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin A. and Janssen, Stefan and Jarmusch, Alan K. and Jiang, Lingjing and Kaehler, Benjamin D. and Kang, Kyo Bin and Keefe, Christopher R. and Keim, Paul and Kelley, Scott T. and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan G. I. and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan D. and McDonald, Daniel and McIver, Lauren J. and Melnik, Alexey V. and Metcalf, Jessica L. and Morgan, Sydney C. and Morton, Jamie T. and Naimey, Ahmad Turan and Navas-Molina, Jose A. and Nothias, Louis Felix and Orchanian, Stephanie B. and Pearson, Talima and Peoples, Samuel L. and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, Michael S. and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R. and Swafford, Austin D. and Thompson, Luke R. and Torres, Pedro J. 
and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J. and Ul-Hasan, Sabah and van der Hooft, Justin J. J. and Vargas, Fernando and Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C. and Williamson, Charles H. D. and Willis, Amy D. and Xu, Zhenjiang Zech and Zaneveld, Jesse R. and Zhang, Yilong and Zhu, Qiyun and Knight, Rob and Caporaso, J. Gregory},
 doi = {10.1038/s41587-019-0209-9},
 issn = {1546-1696},
 journal = {Nature Biotechnology},
 number = {8},
 pages = {852-857},
 title = {Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2},
 url = {https://doi.org/10.1038/s41587-019-0209-9},
 volume = {37},
 year = {2019}
}

@article{view|types:2019.10.0|BIOMV210DirFmt|0,
 author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
 doi = {10.1186/2047-217X-1-7},
 journal = {GigaScience},
 number = {1},
 pages = {7},
 publisher = {BioMed Central},
 title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
 volume = {1},
 year = {2012}
}

@inproceedings{view|types:2019.10.0|pandas.core.frame:DataFrame|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@inproceedings{view|types:2019.10.0|pandas.core.series:Series|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@article{plugin|dada2:2019.10.0|0,
 author = {Callahan, Benjamin J and McMurdie, Paul J and Rosen, Michael J and Han, Andrew W and Johnson, Amy Jo A and Holmes, Susan P},
 doi = {10.1038/nmeth.3869},
 journal = {Nature methods},
 number = {7},
 pages = {581},
 publisher = {Nature Publishing Group},
 title = {DADA2: high-resolution sample inference from Illumina amplicon data},
 volume = {13},
 year = {2016}
}

@article{framework|qiime2:2019.4.0|0,
 author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R and Bokulich, Nicholas A and Abnet, Christian and Al-Ghalith, Gabriel A and Alexander, Harriet and Alm, Eric J and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J and Brown, C Titus and Callahan, Benjamin J and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily and Da Silva, Ricardo and Dorrestein, Pieter C and Douglas, Gavin M and Durall, Daniel M and Duvallet, Claire and Edwardson, Christian F and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M and Gibson, Deanna L and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin and Janssen, Stefan and Jarmusch, Alan K and Jiang, Lingjing and Kaehler, Benjamin and Kang, Kyo Bin and Keefe, Christopher R and Keim, Paul and Kelley, Scott T and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan GI and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan and McDonald, Daniel and McIver, Lauren J and Melnik, Alexey V and Metcalf, Jessica L and Morgan, Sydney C and Morton, Jamie and Naimey, Ahmad Turan and Navas-Molina, Jose A and Nothias, Louis Felix and Orchanian, Stephanie B and Pearson, Talima and Peoples, Samuel L and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, II, Michael S and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R and Swafford, Austin D and Thompson, Luke R and Torres, Pedro J and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J and Ul-Hasan, Sabah and van der Hooft, Justin JJ and Vargas, Fernando and 
Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C and Williamson, Chase HD and Willis, Amy D and Xu, Zhenjiang Zech and Zaneveld, Jesse R and Zhang, Yilong and Knight, Rob and Caporaso, J Gregory},
 doi = {10.7287/peerj.preprints.27295v1},
 issn = {2167-9843},
 journal = {PeerJ Preprints},
 month = {oct},
 pages = {e27295v1},
 title = {QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science},
 url = {https://doi.org/10.7287/peerj.preprints.27295v1},
 volume = {6},
 year = {2018}
}

@article{action|feature-classifier:2019.10.0|method:classify_sklearn|0,
 author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
 journal = {Journal of machine learning research},
 number = {Oct},
 pages = {2825--2830},
 title = {Scikit-learn: Machine learning in Python},
 volume = {12},
 year = {2011}
}

@article{plugin|feature-classifier:2019.10.0|0,
 author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
 doi = {10.1186/s40168-018-0470-z},
 journal = {Microbiome},
 number = {1},
 pages = {90},
 title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
 url = {https://doi.org/10.1186/s40168-018-0470-z},
 volume = {6},
 year = {2018}
}

@article{view|types:2019.10.0|biom.table:Table|0,
 author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
 doi = {10.1186/2047-217X-1-7},
 journal = {GigaScience},
 number = {1},
 pages = {7},
 publisher = {BioMed Central},
 title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
 volume = {1},
 year = {2012}
}

@article{plugin|feature-classifier:2019.4.0|0,
 author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
 doi = {10.1186/s40168-018-0470-z},
 journal = {Microbiome},
 number = {1},
 pages = {90},
 title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
 url = {https://doi.org/10.1186/s40168-018-0470-z},
 volume = {6},
 year = {2018}
}

@article{action|feature-classifier:2019.4.0|method:fit_classifier_naive_bayes|0,
 author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
 journal = {Journal of machine learning research},
 number = {Oct},
 pages = {2825--2830},
 title = {Scikit-learn: Machine learning in Python},
 volume = {12},
 year = {2011}
}

@inproceedings{view|types:2019.4.1|pandas.core.series:Series|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

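As you can see, the citations for this visualization are printed in BibTeX format. They cover the actions recorded in the visualization's provenance, not only the action that produced it. This is why the same publication can appear more than once under different keys: for example, the BIOM format paper is listed for both BIOMV210DirFmt and biom.table:Table.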

We can also see the citations for a specific plugin:

qiime vsearch --citations

stdout:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
 author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
 doi = {10.7717/peerj.2584},
 journal = {PeerJ},
 pages = {e2584},
 publisher = {PeerJ Inc.},
 title = {VSEARCH: a versatile open source tool for metagenomics},
 volume = {4},
 year = {2016}
}

And also for a specific action of a plugin:

qiime vsearch cluster-features-open-reference --citations

stdout:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
 author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
 doi = {10.7717/peerj.2584},
 journal = {PeerJ},
 pages = {e2584},
 publisher = {PeerJ Inc.},
 title = {VSEARCH: a versatile open source tool for metagenomics},
 volume = {4},
 year = {2016}
}

@article{key1,
 author = {Rideout, Jai Ram and He, Yan and Navas-Molina, Jose A. and Walters, William A. and Ursell, Luke K. and Gibbons, Sean M. and Chase, John and McDonald, Daniel and Gonzalez, Antonio and Robbins-Pianka, Adam and Clemente, Jose C. and Gilbert, Jack A. and Huse, Susan M. and Zhou, Hong-Wei and Knight, Rob and Caporaso, J. Gregory},
 doi = {10.7717/peerj.545},
 journal = {PeerJ},
 pages = {e545},
 publisher = {PeerJ Inc.},
 title = {Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences},
 volume = {2},
 year = {2014}
}

Viewing Visualizations

What if we want to view our taxa bar plots? One option is to load the visualization at https://view.qiime2.org. Any QIIME 2 Result, whether an Artifact or a Visualization, can be opened this way. The viewer presents the visualization (if the file is a .qzv), Result details (e.g. filename, uuid, type, format, citations), and a provenance graph showing how the Visualization or Artifact was created.

Note

Provenance viewing is only available at https://view.qiime2.org.

Another option is to use qiime tools view. This command works only with Visualizations and does not display Visualization details (see Peeking at Results) or provenance. However, it provides a quick and easy way to view your results from the command line.

qiime tools view taxa-barplot.qzv

This will open a browser window with your visualization loaded in it. When you are done, close the browser window and press Ctrl-C in the terminal to terminate the command.
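If you want to browse a visualization without keeping qiime tools view running, one alternative is to unpack the archive and open its index page yourself. qiime tools extract is the supported command for unpacking. The Python sketch below assumes the usual archive layout: a top-level directory named after the UUID, which holds the rendered pages under data/ with index.html as the entry point. `extract_visualization` and `open_visualization` are hypothetical helpers and are not part of QIIME 2.

```python
import pathlib
import tempfile
import webbrowser
import zipfile


def extract_visualization(qzv_path, dest_dir=None) -> pathlib.Path:
    """Unpack a .qzv and return the path to its <uuid>/data/index.html page.

    Assumes the <uuid>/data/index.html layout; dest_dir defaults to a
    fresh temporary directory.
    """
    dest = pathlib.Path(dest_dir or tempfile.mkdtemp(prefix="qzv-"))
    with zipfile.ZipFile(qzv_path) as archive:
        pages = [name for name in archive.namelist()
                 if name.count("/") == 2 and name.endswith("/data/index.html")]
        if len(pages) != 1:
            raise ValueError("expected exactly one <uuid>/data/index.html entry")
        archive.extractall(dest)
    return dest / pages[0]


def open_visualization(qzv_path, dest_dir=None) -> bool:
    """Extract the visualization and ask the default browser to open it."""
    index = extract_visualization(qzv_path, dest_dir)
    return webbrowser.open(index.resolve().as_uri())
```

For example, `open_visualization('taxa-barplot.qzv')` would extract the bar plots to a temporary directory and open them. Some visualizations may not render fully from a file:// URL because browsers restrict what local pages can load. In those cases qiime tools view or https://view.qiime2.org remain the better choices.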

Peeking at Results

Oftentimes we need to verify the type and uuid of an Artifact. We can use the qiime tools peek command to view a brief summary report of those facts. First, let’s get some data to look at:

Please select a download option that is most appropriate for your environment:
wget \
  -O "faith-pd-vector.qza" \
  "https://data.qiime2.org/2024.10/tutorials/utilities/faith-pd-vector.qza"
curl -sL \
  "https://data.qiime2.org/2024.10/tutorials/utilities/faith-pd-vector.qza" > \
  "faith-pd-vector.qza"

Now that we have data, we can learn more about the file:

qiime tools peek faith-pd-vector.qza

stdout:

UUID:        d5186dce-438d-44bb-903c-cb51a7ad4abe
Type:        SampleData[AlphaDiversity] % Properties('phylogenetic')
Data format: AlphaDiversityDirectoryFormat

Here we can see that the type of the Artifact is SampleData[AlphaDiversity] % Properties('phylogenetic'), as well as the Artifact’s UUID and format.
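Under the hood, a .qza or .qzv file is a ZIP archive. The sketch below reads the same three facts using only the Python standard library. It assumes the archive layout used by current QIIME 2 releases: a single top-level directory named after the UUID, containing a metadata.yaml file with unquoted `uuid`, `type`, and `format` lines. `peek_archive` is a hypothetical helper. It parses simple `key: value` lines rather than using a YAML library, and qiime tools peek remains the supported way to do this.

```python
import io
import zipfile

WANTED = ("uuid", "type", "format")


def peek_archive(source) -> dict[str, str]:
    """Read uuid, type and format from a QIIME 2 archive (.qza or .qzv).

    source may be a path, a binary file object, or the archive's raw bytes.
    Assumes the layout <uuid>/metadata.yaml with one unquoted "key: value"
    pair per line; this is a simplification, not a YAML parser.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as archive:
        # metadata.yaml directly under the single top-level directory
        # (provenance/ holds more metadata.yaml files at deeper levels).
        names = [name for name in archive.namelist()
                 if name.count("/") == 1 and name.endswith("/metadata.yaml")]
        if len(names) != 1:
            raise ValueError("expected exactly one <uuid>/metadata.yaml entry")
        text = archive.read(names[0]).decode("utf-8")

    info = {}
    for line in text.splitlines():
        # Split on the first ':' only; the type string itself has none.
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            info[key.strip()] = value.strip()
    return info
```

If the archive follows the assumed layout, `peek_archive('faith-pd-vector.qza')` should report the same UUID, type, and format as the qiime tools peek output above.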

Validating Results

We can also validate the integrity of the file by running qiime tools validate:

qiime tools validate faith-pd-vector.qza

stdout:

Result faith-pd-vector.qza appears to be valid at level=max.

If there is a problem with the file, this command will usually report what it is, although it cannot diagnose every possible problem.

Inspecting Metadata

In the Metadata tutorial we learned about the metadata tabulate command, and the resulting visualization it creates. Oftentimes we don’t care so much about the values of the Metadata, but rather, just the shape of it: how many columns? What are their names? What are their types? How many rows (or IDs) are in the file?

We can demonstrate this by first downloading some sample metadata:

Please select a download option that is most appropriate for your environment:
wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv"
curl -sL \
  "https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv" > \
  "sample-metadata.tsv"

Then, we can run the qiime tools inspect-metadata command:

qiime tools inspect-metadata sample-metadata.tsv

stdout:

              COLUMN NAME  TYPE       
=========================  ===========
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric    
genotype_and_donor_status  categorical
=========================  ===========
                     IDS:  48
                 COLUMNS:  8
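The sketch below is a rough Python approximation of the column-type summary for a TSV file. It assumes a simplified reading of the QIIME 2 metadata rules:

- The first row is the header and the first column holds the IDs.
- A `#q2:types` row, if present, declares column types explicitly.
- Other rows starting with `#` are comments.
- Otherwise, a column is numeric when every non-empty value parses as a number.

The real parser handles more cases, such as the accepted ID header names and missing-value handling, so treat `summarize_metadata` as a hypothetical helper. Note that mouse_id holds numbers yet is reported as categorical above. This is consistent with the file declaring that type in a `#q2:types` row.

```python
import csv
import io


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def summarize_metadata(tsv_text: str) -> tuple[dict[str, str], int]:
    """Return ({column name: type}, number of IDs) for TSV metadata text.

    Simplified rules: the first row is the header, the first column holds
    the IDs, a '#q2:types' row declares types, and other rows starting
    with '#' are treated as comments.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, declared, data = rows[0], [], []
    for row in rows[1:]:
        if not any(cell.strip() for cell in row):
            continue  # blank line
        if row[0] == "#q2:types":
            declared = row[1:]
        elif not row[0].startswith("#"):
            data.append(row)

    types = {}
    for i, name in enumerate(header[1:], start=1):
        if i - 1 < len(declared) and declared[i - 1]:
            types[name] = declared[i - 1]  # explicit declaration wins
            continue
        values = [row[i] for row in data if i < len(row) and row[i] != ""]
        # Numeric only if there are values and every one parses as a number.
        types[name] = ("numeric" if values and all(map(_is_number, values))
                       else "categorical")
    return types, len(data)
```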

Question

How many metadata columns are there in sample-metadata.tsv? How many IDs? Identify how many categorical columns are present. Now do the same for numeric columns.

The inspect-metadata tool also works on Artifacts that can be viewed as Metadata, which makes it helpful for learning their column names. Let's download one to try it:

Please select a download option that is most appropriate for your environment:
wget \
  -O "jaccard-pcoa.qza" \
  "https://data.qiime2.org/2024.10/tutorials/utilities/jaccard-pcoa.qza"
curl -sL \
  "https://data.qiime2.org/2024.10/tutorials/utilities/jaccard-pcoa.qza" > \
  "jaccard-pcoa.qza"

The file we just downloaded is a Jaccard PCoA result (from the PD Mice Tutorial), which can be used in place of a “typical” TSV-formatted Metadata file. If we need its column names for commands we wish to run, inspect-metadata can tell us:

qiime tools inspect-metadata jaccard-pcoa.qza

stdout:

COLUMN NAME  TYPE   
===========  =======
     Axis 1  numeric
     Axis 2  numeric
     Axis 3  numeric
     Axis 4  numeric
     Axis 5  numeric
     Axis 6  numeric
     Axis 7  numeric
     Axis 8  numeric
     Axis 9  numeric
    Axis 10  numeric
    Axis 11  numeric
    Axis 12  numeric
    Axis 13  numeric
    Axis 14  numeric
    Axis 15  numeric
    Axis 16  numeric
    Axis 17  numeric
    Axis 18  numeric
    Axis 19  numeric
    Axis 20  numeric
    Axis 21  numeric
    Axis 22  numeric
    Axis 23  numeric
    Axis 24  numeric
    Axis 25  numeric
    Axis 26  numeric
    Axis 27  numeric
    Axis 28  numeric
    Axis 29  numeric
    Axis 30  numeric
    Axis 31  numeric
    Axis 32  numeric
    Axis 33  numeric
    Axis 34  numeric
    Axis 35  numeric
    Axis 36  numeric
    Axis 37  numeric
    Axis 38  numeric
    Axis 39  numeric
    Axis 40  numeric
    Axis 41  numeric
    Axis 42  numeric
    Axis 43  numeric
    Axis 44  numeric
    Axis 45  numeric
    Axis 46  numeric
    Axis 47  numeric
===========  =======
       IDS:  47
   COLUMNS:  47

Question

How many IDs are there? How many columns? Are there any categorical columns? Why?

Casting Metadata Column Types

In the Metadata tutorial we learned about column types and how to use the qiime tools cast-metadata tool to specify column types within a provided metadata file. Below we will go through a few scenarios showing how this tool can be used, along with some common mistakes that may come up.

We’ll start by downloading some sample metadata. Note: this is the same file used in the Inspecting Metadata section, but the commands below save it as sample_metadata.tsv (with underscores) rather than sample-metadata.tsv. If you already downloaded it above, you can copy it to sample_metadata.tsv instead of downloading it again.

Please select a download option that is most appropriate for your environment:
wget \
  -O "sample_metadata.tsv" \
  "https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv"
curl -sL \
  "https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv" > \
  "sample_metadata.tsv"

In this example, we will cast the days_post_transplant column from numeric to categorical, and the mouse_id column from categorical to numeric. The rest of the columns contained within our metadata will be left as-is.

qiime tools cast-metadata sample_metadata.tsv \
  --cast days_post_transplant:categorical \
  --cast mouse_id:numeric

stdout:

sample_name	barcode	mouse_id	genotype	cage_id	donor	donor_status	days_post_transplant	genotype_and_donor_status
#q2:types	categorical	numeric	categorical	categorical	categorical	categorical	categorical	categorical
recip.220.WT.OB1.D7	CCTCCGTCATGG	457	wild type	C35	hc_1	Healthy	49	wild type and Healthy
recip.290.ASO.OB2.D1	AACAGTAAACAA	456	susceptible	C35	hc_1	Healthy	49	susceptible and Healthy
recip.389.WT.HC2.D21	ATGTATCAATTA	435	susceptible	C31	hc_1	Healthy	21	susceptible and Healthy
recip.391.ASO.PD2.D14	GTCAGTATGGCT	435	susceptible	C31	hc_1	Healthy	14	susceptible and Healthy
recip.391.ASO.PD2.D21	AGACAGTAGGAG	437	susceptible	C31	hc_1	Healthy	21	susceptible and Healthy
recip.391.ASO.PD2.D7	GGTCTTAGCACC	435	susceptible	C31	hc_1	Healthy	7	susceptible and Healthy
recip.400.ASO.HC2.D14	CGTTCGCTAGCC	437	susceptible	C31	hc_1	Healthy	14	susceptible and Healthy
recip.401.ASO.HC2.D7	ATTTACAATTGA	437	susceptible	C31	hc_1	Healthy	7	susceptible and Healthy
recip.403.ASO.PD2.D21	CGCAGATTAGTA	456	susceptible	C35	hc_1	Healthy	21	susceptible and Healthy
recip.411.ASO.HC2.D14	ATGTTAGGGAAT	456	susceptible	C35	hc_1	Healthy	14	susceptible and Healthy
recip.411.ASO.HC2.D21	CTCATATGCTAT	457	wild type	C35	hc_1	Healthy	21	wild type and Healthy
recip.411.ASO.HC2.D49	GCAACGAACGAG	435	susceptible	C31	hc_1	Healthy	49	susceptible and Healthy
recip.412.ASO.HC2.D14	AAGTGGCTATCC	457	wild type	C35	hc_1	Healthy	14	wild type and Healthy
recip.412.ASO.HC2.D7	GCATTCGGCGTT	456	susceptible	C35	hc_1	Healthy	7	susceptible and Healthy
recip.413.WT.HC2.D7	ACCAGTGACTCA	457	wild type	C35	hc_1	Healthy	7	wild type and Healthy
recip.456.ASO.HC3.D49	ACGGCGTTATGT	468	wild type	C42	hc_1	Healthy	49	wild type and Healthy
recip.458.ASO.HC3.D21	ACGGCCCTGGAG	468	wild type	C42	hc_1	Healthy	21	wild type and Healthy
recip.458.ASO.HC3.D49	CATTTGACGACG	469	wild type	C42	hc_1	Healthy	49	wild type and Healthy
recip.459.WT.HC3.D14	ACATGGGCGGAA	468	wild type	C42	hc_1	Healthy	14	wild type and Healthy
recip.459.WT.HC3.D21	CATAAATTCTTG	469	wild type	C42	hc_1	Healthy	21	wild type and Healthy
recip.459.WT.HC3.D49	GCTGCGTATACC	536	susceptible	C43	pd_1	PD	49	susceptible and PD
recip.460.WT.HC3.D14	CTGCGGATATAC	469	wild type	C42	hc_1	Healthy	14	wild type and Healthy
recip.460.WT.HC3.D21	GTCAATTAGTGG	536	susceptible	C43	pd_1	PD	21	susceptible and PD
recip.460.WT.HC3.D49	GAGAAGCTTATA	537	wild type	C43	pd_1	PD	49	wild type and PD
recip.460.WT.HC3.D7	GACCCGTTTCGC	468	wild type	C42	hc_1	Healthy	7	wild type and Healthy
recip.461.ASO.HC3.D21	AGCCCGCAAAGG	537	wild type	C43	pd_1	PD	21	wild type and PD
recip.461.ASO.HC3.D49	GGCGTAACGGCA	538	wild type	C44	pd_1	PD	49	wild type and PD
recip.461.ASO.HC3.D7	ATTGCCTTGATT	469	wild type	C42	hc_1	Healthy	7	wild type and Healthy
recip.462.WT.PD3.D14	GTGAGGGCAAGT	536	susceptible	C43	pd_1	PD	14	susceptible and PD
recip.462.WT.PD3.D21	GGCCTATAAGTC	538	wild type	C44	pd_1	PD	21	wild type and PD
recip.462.WT.PD3.D49	AATACAGACCTG	539	susceptible	C44	pd_1	PD	49	susceptible and PD
recip.462.WT.PD3.D7	TTAGGATTCTAT	536	susceptible	C43	pd_1	PD	7	susceptible and PD
recip.463.WT.PD3.D14	ATATTGGCAGCC	537	wild type	C43	pd_1	PD	14	wild type and PD
recip.463.WT.PD3.D21	CGCGGCGCAGCT	539	susceptible	C44	pd_1	PD	21	susceptible and PD
recip.463.WT.PD3.D7	GTTTATCTTAAG	537	wild type	C43	pd_1	PD	7	wild type and PD
recip.464.WT.PD3.D14	TCATCCGTCGGC	538	wild type	C44	pd_1	PD	14	wild type and PD
recip.465.ASO.PD3.D14	GGCTTCGGAGCG	539	susceptible	C44	pd_1	PD	14	susceptible and PD
recip.465.ASO.PD3.D7	CAGTCTAGTACG	538	wild type	C44	pd_1	PD	7	wild type and PD
recip.466.ASO.PD3.D7	GTGGGACTGCGC	539	susceptible	C44	pd_1	PD	7	susceptible and PD
recip.467.WT.HC3.D49.a	GTCAGGTGCGGC	437	susceptible	C31	hc_1	Healthy	49	susceptible and Healthy
recip.467.WT.HC3.D49.b	GTTAACTTACTA	546	susceptible	C49	pd_1	PD	49	susceptible and PD
recip.536.ASO.PD4.D49	CAAATTCGGGAT	547	wild type	C49	pd_1	PD	49	wild type and PD
recip.537.WT.PD4.D21	CTCTATTCCACC	546	susceptible	C49	pd_1	PD	21	susceptible and PD
recip.538.WT.PD4.D21	ATGGATAGCTAA	547	wild type	C49	pd_1	PD	21	wild type and PD
recip.539.ASO.PD4.D14	GATCCGGCAGGA	546	susceptible	C49	pd_1	PD	14	susceptible and PD
recip.539.ASO.PD4.D7	GTTCGAGTGAAT	546	susceptible	C49	pd_1	PD	7	susceptible and PD
recip.540.ASO.HC4.D14	CTTCCAACTCAT	547	wild type	C49	pd_1	PD	14	wild type and PD
recip.540.ASO.HC4.D7	CGGCCTAAGTTC	547	wild type	C49	pd_1	PD	7	wild type and PD

If the --output-file option is provided, the specified file will contain the column types we cast above, along with the rest of the columns and data from sample_metadata.tsv.

If you do not wish to save your cast metadata to a file, omit the --output-file parameter and the result will be written to stdout (as in the example above).
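The effect of a cast is visible in the `#q2:types` row of the output above. The hypothetical function below is a rough sketch of that transformation, not q2cli's implementation. It takes casts in the same `column:type` form used by --cast and writes a `#q2:types` directive row after the header. It relies on several simplifying assumptions:

- Columns that are not cast get a naively inferred type.
- Existing comment and directive rows are dropped.
- Cast columns that do not exist in the file are silently ignored.
- It does not check whether values can actually be interpreted as numbers.

```python
import csv
import io


def _looks_numeric(values: list[str]) -> bool:
    """True if there is at least one value and every value parses as a float."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return bool(values)


def cast_metadata(tsv_text: str, casts: list[str]) -> str:
    """Return the metadata with a '#q2:types' row reflecting the casts.

    casts use the same 'column:type' form as --cast, e.g. ['mouse_id:numeric'].
    Columns that are not cast get a naively inferred type. Existing comment
    and directive rows are dropped.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = rows[0]
    data = [row for row in rows[1:] if row and not row[0].startswith("#")]

    requested = {}
    for cast in casts:
        # rpartition splits on the last ':' so column names may contain ':'.
        column, _, col_type = cast.rpartition(":")
        if col_type not in ("categorical", "numeric"):
            raise ValueError(f"unsupported column type in cast {cast!r}")
        requested[column] = col_type

    types = []
    for i, name in enumerate(header[1:], start=1):
        if name in requested:
            types.append(requested[name])
        else:
            values = [row[i] for row in data if i < len(row) and row[i]]
            types.append("numeric" if _looks_numeric(values) else "categorical")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerow(["#q2:types", *types])
    writer.writerows(data)
    return out.getvalue()
```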

The --ignore-extra and --error-on-missing flags control two situations. --ignore-extra handles cast columns that are not present in the original metadata file. --error-on-missing handles metadata columns that are not included in the cast call. The examples below show how each flag is used.

In the first example, we’ll take a look at utilizing the --ignore-extra flag when a column is cast that is not included within the original metadata file. Let’s start by looking at what will happen if an extra column is included and this flag is not enabled.

qiime tools cast-metadata sample_metadata.tsv \
  --cast spleen:numeric

stderr:

Usage: qiime tools cast-metadata [OPTIONS] METADATA...
Try 'qiime tools cast-metadata --help' for help.

Error: Invalid value for cast: The following cast columns were not found within the metadata: spleen

Notice that the spleen column included in the cast call results in a raised error. If we want to ignore any extra columns that are not present in the original metadata file, we can enable the --ignore-extra flag.

qiime tools cast-metadata sample_metadata.tsv \
  --cast spleen:numeric \
  --ignore-extra

When this flag is enabled, any cast columns that are not present in the original metadata file are ignored. The output for this example has been omitted; with this flag enabled, no error is raised.

In our second example, we’ll take a look at the --error-on-missing flag, which handles columns that are present within the metadata that are not included in the cast call.

By default, the cast call may cover only a subset of the metadata columns (i.e., not every metadata column needs to appear in the cast call). If the --error-on-missing flag is enabled, every metadata column must be included in the cast call; otherwise, an error is raised.

qiime tools cast-metadata sample_metadata.tsv \
  --cast mouse_id:numeric \
  --error-on-missing

stderr:

Usage: qiime tools cast-metadata [OPTIONS] METADATA...
Try 'qiime tools cast-metadata --help' for help.

Error: Invalid value for cast: The following columns within the metadata were not provided in the cast: barcode, genotype_and_donor_status, cage_id, donor_status, donor, days_post_transplant, genotype
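Both flags amount to set comparisons between the cast columns and the metadata columns. The following is a minimal sketch of that logic. `check_cast_columns` is a hypothetical helper, its error messages only approximate the ones shown above, and the ID column is assumed to be excluded from `metadata_columns`, as in the example.

```python
def check_cast_columns(metadata_columns, cast_columns,
                       ignore_extra=False, error_on_missing=False):
    """Return the cast columns to apply, mimicking --ignore-extra / --error-on-missing."""
    metadata_columns = list(metadata_columns)
    cast_columns = list(cast_columns)

    # Cast columns that do not exist in the metadata.
    extra = [c for c in cast_columns if c not in metadata_columns]
    if extra and not ignore_extra:
        raise ValueError("The following cast columns were not found within "
                         "the metadata: " + ", ".join(extra))

    # Metadata columns that the cast call does not mention.
    missing = [c for c in metadata_columns if c not in cast_columns]
    if missing and error_on_missing:
        raise ValueError("The following columns within the metadata were not "
                         "provided in the cast: " + ", ".join(missing))

    # Only casts that refer to real columns are applied.
    return [c for c in cast_columns if c in metadata_columns]
```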

Artifact API

Unlike q2cli, the Artifact API (using QIIME 2 with Python) does not have a single central location for utility functions. Rather, utilities are often bound to objects as methods which operate on those objects.

Discovering Actions registered to a plugin

When working with a new plugin, it may be useful to check which Actions are available. We first import the plugin and then inspect its actions attribute. This lists the plugin's public actions and shows whether each one is a method, visualizer, or pipeline.

>>> from qiime2.plugins import feature_table
>>> help(feature_table.actions)
Help on module qiime2.plugins.feature_table.actions in qiime2.plugins.feature_table:

NAME
    qiime2.plugins.feature_table.actions

DATA
    __plugin__ = <qiime2.plugin.plugin.Plugin object>
    core_features = <visualizer qiime2.plugins.feature_table.visualizers.c...
    filter_features = <method qiime2.plugins.feature_table.methods.filter_...
    ...

If you already know that you are looking for a method, pipeline, or visualizer, you can get that subgroup of actions directly:

>>> help(feature_table.methods)

If you are working in a Jupyter Notebook or in IPython, you may prefer tab completion to running help():

>>> feature_table.visualizers.  # press tab after the . for tab-complete...

Getting help with an Action

Once you have imported a plugin, an action's help text is accessible in interactive sessions with the IPython ? operator:

>>> feature_table.methods.merge?
Call signature:
feature_table.methods.merge(
    tables: List[FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²],
    overlap_method: Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', 'sum')¹ | Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample')² = 'error_on_overlapping_sample',
) -> (FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²,)
Type:           Method
String form:    <method qiime2.plugins.feature_table.methods.merge>
File:           ~/miniconda/envs/q2-dev/lib/python3.8/site-packages/qiime2/sdk/action.py
Docstring:      QIIME 2 Method
Call docstring:
Combine multiple tables

Combines feature tables using the `overlap_method` provided.

Parameters
----------
tables : List[FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²]
overlap_method : Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', 'sum')¹ | Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample')², optional
    Method for handling overlapping ids.

Returns
-------
merged_table : FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²
    The resulting merged feature table.

Retrieving Citations

The Artifact API does not provide a utility for getting all citations from a plugin. Per-action citations are accessible through each action’s citations attribute. This is a tuple of CitationRecord objects, each holding a BibTeX entry type and its fields.

>>> feature_table.actions.rarefy.citations
(CitationRecord(type='article', fields={'doi': '10.1186/s40168-017-0237-y', 'issn': '2049-2618', 'pages': '27', 'number': '1', 'volume': '5', 'month': 'Mar', 'year': '2017', 'journal': 'Microbiome', 'title': 'Normalization and microbial differential abundance strategies depend upon data characteristics', 'author': 'Weiss, Sophie and Xu, Zhenjiang Zech and Peddada, Shyamal and Amir, Amnon and Bittinger, Kyle and Gonzalez, Antonio and Lozupone, Catherine and Zaneveld, Jesse R. and Vázquez-Baeza, Yoshiki and Birmingham, Amanda and Hyde, Embriette R. and Knight, Rob'}),)
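If you want BibTeX text rather than record objects, a small formatter is enough. The sketch below is a hypothetical helper, not part of the QIIME 2 API: it models a record as a plain namedtuple with the type and fields attributes seen in the output above, and the citation key is chosen by the caller.

```python
from collections import namedtuple

# Stand-in for QIIME 2's CitationRecord: only the two attributes shown in
# the output above (type and fields) are modeled here.
CitationRecord = namedtuple('CitationRecord', ['type', 'fields'])


def to_bibtex(record, key):
    """Render one citation record as a BibTeX entry string.

    `key` is a citation key picked by the caller (hypothetical; the
    record itself does not carry one).
    """
    lines = ['@%s{%s,' % (record.type, key)]
    for name, value in record.fields.items():
        lines.append('  %s = {%s},' % (name, value))
    lines.append('}')
    return '\n'.join(lines)


# A subset of the fields from the rarefy citation shown above.
record = CitationRecord(type='article', fields={
    'doi': '10.1186/s40168-017-0237-y',
    'year': '2017',
    'journal': 'Microbiome',
})
print(to_bibtex(record, 'weiss2017normalization'))
```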

Peeking at Results

The Artifact API provides a peek method that returns the UUID, Semantic Type, and data format of any QIIME 2 archive.

>>> from qiime2 import Artifact
>>> Artifact.peek('observed_features_vector.qza')
ResultMetadata(uuid='2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9', type='SampleData[AlphaDiversity]', format='AlphaDiversityDirectoryFormat')
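A .qza or .qzv file is a zip archive whose top-level directory is named after the result's UUID and holds a metadata.yaml file recording the UUID, type, and format. The sketch below reads those fields directly with Python's zipfile module to show roughly what peek reports. It assumes that layout and simple `key: value` lines in metadata.yaml; peek_archive is a hypothetical helper, not the library's implementation, and the toy archive contains no real data.

```python
import io
import zipfile


def peek_archive(path_or_file):
    """Return uuid/type/format from a QIIME 2 archive's metadata.yaml.

    Assumes the layout <uuid>/metadata.yaml at the top level; nested
    copies under <uuid>/provenance/ are skipped.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        name = next(n for n in zf.namelist()
                    if n.count('/') == 1 and n.endswith('/metadata.yaml'))
        text = zf.read(name).decode('utf-8')
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # ignore lines without a "key: value" pair
            info[key.strip()] = value.strip()
    return info


# Build a toy archive in memory that mimics the layout.
uid = '2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9'
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr(uid + '/metadata.yaml',
                'uuid: %s\ntype: SampleData[AlphaDiversity]\n'
                'format: AlphaDiversityDirectoryFormat\n' % uid)
    zf.writestr(uid + '/provenance/metadata.yaml', 'uuid: %s\n' % uid)
buf.seek(0)
print(peek_archive(buf))
```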

If you have already loaded an artifact into memory and you’re not concerned with the data format, the artifact’s string representation will give you its UUID and Semantic Type.

>>> from qiime2 import Artifact
>>> table = Artifact.load('table.qza')
>>> table
<artifact: FeatureTable[Frequency] uuid: 2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9>

Validating Results

Artifacts may be validated by loading them and then calling their validate method. validate takes one parameter, level, which may be set to 'max' or 'min' and defaults to 'max'. Min validation is useful for quick checks, while max validation is more comprehensive but generally takes longer.

The validate method returns None if validation is successful, so calling x.validate() in the interpreter produces no output. If the artifact is invalid, a ValidationError or NotImplementedError is raised.

>>> from qiime2 import Artifact
>>> table = Artifact.load('table.qza')
>>> table.validate(level='min')

>>> print(table.validate())  # equivalent to print(table.validate(level='max'))
None

Viewing Data

The view method lets us inspect a result's data as a Python object (for example, a pandas Series) without first saving it as a .qza.

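>>> import pandas as pd
>>> from qiime2 import Artifact
>>> art = Artifact.load('some.qza')

...  # perform some analysis on art, producing a SampleData[AlphaDiversity] result named myresult

>>> myresult.view(pd.Series)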
s00000001   74
s00000002   48
s00000003   79
s00000004   113
s00000005   111
Name: observed_otus, Length: 471, dtype: int64

Viewing data as a particular type is only possible if a transformer is registered from the result's stored format to the type you want; if no such transformer exists, view raises an error. For example, this SampleData[AlphaDiversity] result cannot be viewed as a DataFrame:

>>> myresult.view(pd.DataFrame)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_18201/824837086.py in <module>
     12 # Note: Views are only possible if there are transformers registered from the default
     13 # view type to the type you want. We get an error if there's no tranformer
---> 14 art.view(pd.DataFrame)

...  # remainder of traceback omitted

Exception: No transformation from <class 'q2_types.sample_data._format.AlphaDiversityDirectoryFormat'> to <class 'pandas.core.frame.DataFrame'>
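The transformer rule behaves like a lookup table keyed by (stored format, requested view type). The toy registry below is a conceptual model only, not QIIME 2's transformer machinery: types are plain strings, a dict stands in for a pandas Series, NoTransformation is a hypothetical exception class, and any multi-step conversions QIIME 2 may perform are ignored.

```python
class NoTransformation(Exception):
    """Raised when no transformer is registered for a conversion."""


class ToyTransformerRegistry:
    """Conceptual model of view(): a dict of (source, target) -> function."""

    def __init__(self):
        self._transformers = {}

    def register(self, source, target):
        """Decorator that records func as the source -> target transformer."""
        def decorator(func):
            self._transformers[(source, target)] = func
            return func
        return decorator

    def view(self, data, source, target):
        """Convert data from source to target, or fail if none is registered."""
        try:
            func = self._transformers[(source, target)]
        except KeyError:
            raise NoTransformation(
                'No transformation from %s to %s' % (source, target)) from None
        return func(data)


registry = ToyTransformerRegistry()


@registry.register('AlphaDiversityDirectoryFormat', 'Series')
def _alpha_to_series(records):
    # A dict of sample ID -> value stands in for a pandas Series.
    return dict(records)


stored = [('s00000001', 74), ('s00000002', 48)]
print(registry.view(stored, 'AlphaDiversityDirectoryFormat', 'Series'))
try:
    registry.view(stored, 'AlphaDiversityDirectoryFormat', 'DataFrame')
except NoTransformation as exc:
    print(exc)
```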

Some Artifacts are viewable as metadata. If you’d like to check, try:

>>> myresult.has_metadata()
True

>>> from qiime2 import Metadata
>>> result_as_md = myresult.view(Metadata)
>>> result_as_md
Metadata
--------
471 IDs x 1 column
observed_otus: ColumnProperties(type='numeric')

Call to_dataframe() for a tabular representation.

Viewing Visualizations

The Artifact API does not provide utilities for viewing QIIME 2 visualizations. Users generally save visualizations as .qzv files and explore them with QIIME 2 View.

>>> viz.save('obs_features.qzv')  # viz: a Visualization returned by some visualizer
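To look at a visualization offline, note that a .qzv is also a zip archive whose rendered output normally lives at <uuid>/data/index.html (q2cli's `qiime tools extract` unpacks archives the same way). The hypothetical extract_visualization helper below unzips a .qzv and returns the path to that file, which you could then open in a browser, for example with webbrowser.open. The layout is an assumption based on the archive format, and the toy archive uses a made-up directory name and HTML.

```python
import os
import tempfile
import zipfile


def extract_visualization(qzv_path, dest_dir):
    """Unzip a .qzv into dest_dir and return the path to its index.html.

    Assumes the archive layout <uuid>/data/index.html.
    """
    with zipfile.ZipFile(qzv_path) as zf:
        zf.extractall(dest_dir)  # also keeps the CSS/JS files index.html links to
        for name in zf.namelist():
            parts = name.split('/')
            if parts[1:] == ['data', 'index.html']:
                return os.path.join(dest_dir, *parts)
    raise FileNotFoundError('no <uuid>/data/index.html in %s' % qzv_path)


# Demo on a toy archive (hypothetical directory name and content).
with tempfile.TemporaryDirectory() as tmp:
    toy = os.path.join(tmp, 'toy.qzv')
    with zipfile.ZipFile(toy, 'w') as zf:
        zf.writestr('abc123/data/index.html', '<html></html>')
    index = extract_visualization(toy, os.path.join(tmp, 'out'))
    print(os.path.relpath(index, tmp))
```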

Inspecting Metadata

Once loaded, metadata can be summarized with print() or displayed as a pandas DataFrame with to_dataframe().

>>> from qiime2 import Metadata
>>> metadata = Metadata.load('simple-metadata.tsv')
>>> print(metadata)
Metadata
--------
516 IDs x 3 columns
barcode:               ColumnProperties(type='categorical')
days:                  ColumnProperties(type='numeric')
extraction:            ColumnProperties(type='categorical')

>>> metadata.to_dataframe()
              barcode   days  extraction
sampleid
s00000001     806rcbc0   1       1
s00000002     806rcbc1   3       1
s00000003     806rcbc2   7       1
s00000004     806rcbc3   1       1
s00000005     806rcbc4   11      1
...           ...        ...     ...

Casting Metadata Column Types

The Artifact API does not provide a dedicated utility for casting metadata column types, and Metadata.columns is a read-only property. However, you can edit your .tsv and re-load it with Metadata.load, or convert your Metadata to a pandas DataFrame, cast the columns whose types you need to change, and convert the result back to Metadata. Here's a walkthrough of the latter approach.

Load some Metadata

# Imagine you have loaded a tsv as metadata
>>> md = Metadata.load('md.tsv')
>>> print(md)

Metadata
--------
3 IDs x 5 columns
strCatOnly: ColumnProperties(type='categorical')
intNum:     ColumnProperties(type='numeric')
intCat:     ColumnProperties(type='categorical')
floatNum:   ColumnProperties(type='numeric')
floatCat:   ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation.

We have declared three columns as categorical in the tsv and two as numeric. Each column name describes the kind of values it holds (e.g. int) and the column type declared for it (e.g. Num for numeric, Cat for categorical).
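For reference, here is one plausible md.tsv behind this example, with values copied from the DataFrame output in this walkthrough. The declared types come from a #q2:types directive row placed directly under the header. The MD_TSV string and the declared_types helper are illustrative, not part of QIIME 2; writing MD_TSV to md.tsv should let Metadata.load reproduce a summary like the one above.

```python
import csv
import io

# Hypothetical md.tsv contents: header row, #q2:types directive row, data.
MD_TSV = (
    'sampleid\tstrCatOnly\tintNum\tintCat\tfloatNum\tfloatCat\n'
    '#q2:types\tcategorical\tnumeric\tcategorical\tnumeric\tcategorical\n'
    'S1\tTCCCTTGTCTCC\t1\t1\t1.01\t1.01\n'
    'S2\tACGAGACTGATT\t3\t3\t3.01\t3.01\n'
    'S3\tGCTGTACGGATT\t7\t7\t7.01\t7.01\n'
)


def declared_types(tsv_text):
    """Return {column: declared type} read from a #q2:types directive row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter='\t'))
    header = rows[0]
    for row in rows[1:]:
        if row and row[0] == '#q2:types':
            return dict(zip(header[1:], row[1:]))
    return {}  # no directive: QIIME 2 would infer the types instead


print(declared_types(MD_TSV))

# To reproduce the example on disk (not run here):
#   with open('md.tsv', 'w') as fh:
#       fh.write(MD_TSV)
```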

Limitations on casting

The sequences in strCatOnly are read in as Python strings and represented in the NumPy/pandas stack as objects. Loading the metadata would fail with an error if we declared this column numeric, because these strings cannot be represented as numbers. For the same reason, pandas raises a ValueError if you try to cast this column with astype(int) or astype(float).

Convert to DataFrame

>>> md = md.to_dataframe()

>>> print(md)
>>> print()
>>> print("intCat should be an object (because categorical): ", str(md['intCat'].dtype))
>>> print("floatNum should be a float (because numerical): ", str(md['floatNum'].dtype))
>>> print("intNum should be a float, not an int (because numerical): ", str(md['intNum'].dtype))

            strCatOnly  intNum intCat  floatNum floatCat
sampleid
S1        TCCCTTGTCTCC     1.0      1      1.01     1.01
S2        ACGAGACTGATT     3.0      3      3.01     3.01
S3        GCTGTACGGATT     7.0      7      7.01     7.01

intCat should be an object (because categorical):  object
floatNum should be a float (because numerical):  float64
intNum should be a float, not an int (because numerical):  float64

The intNum and intCat columns of the original .tsv contained integer data. MetadataColumns typed as categorical are represented in Pandas as object. MetadataColumns typed as numeric are represented in Pandas as float. As such, intNum is rendered as floating point data when to_dataframe is called, and intCat is represented as an object in the DataFrame.

These behaviors roundtrip cleanly. If we cast our DataFrame back to Metadata without making any changes, the new Metadata will be identical to the original Metadata we loaded from the tsv. We’re here to see how DataFrames allow us to cast metadata column types, though, so let’s give it a shot.

Cast columns

>>> md['intCat'] = md['intCat'].astype("int")
>>> md['floatNum'] = md['floatNum'].astype('str')

>>> print(md)
>>> print()
>>> print("intCat should be an int now: ", str(md['intCat'].dtype))
>>> print("floatNum should be an object now: ", str(md['floatNum'].dtype))

            strCatOnly  intNum  intCat floatNum floatCat
sampleid
S1        TCCCTTGTCTCC     1.0       1     1.01     1.01
S2        ACGAGACTGATT     3.0       3     3.01     3.01
S3        GCTGTACGGATT     7.0       7     7.01     7.01

intCat should be an int now:  int64
floatNum should be an object now:  object

The DataFrame looks the same, but the column dtypes have changed as expected. When we turn this DataFrame back into Metadata, the ColumnProperties have changed accordingly. Columns represented in Pandas as objects (including strs) are categorical. Columns represented in Pandas as ints or floats are numeric.

Cast the DataFrame back to Metadata

>>> md = Metadata(md)
>>> md

Metadata
--------
3 IDs x 5 columns
strCatOnly: ColumnProperties(type='categorical')
intNum:     ColumnProperties(type='numeric')
intCat:     ColumnProperties(type='numeric')
floatNum:   ColumnProperties(type='categorical')
floatCat:   ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation.

Note that intCat, formerly categorical, is now numeric, while floatNum has changed from numeric to categorical.
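The dtype rule described in this walkthrough can be condensed into a small lookup. column_type_for_dtype is a hypothetical helper covering only object, integer, and float dtypes; it is a mnemonic for the behavior shown here, not a reimplementation of the Metadata constructor. Applied to the dtypes after casting, it predicts the ColumnProperties printed above.

```python
def column_type_for_dtype(dtype_name):
    """Map a pandas dtype name to the Metadata column type it yields.

    Covers only the cases discussed above: object -> categorical,
    int*/float* -> numeric. Anything else is outside this sketch.
    """
    if dtype_name == 'object':
        return 'categorical'
    if dtype_name.startswith(('int', 'float')):
        return 'numeric'
    raise ValueError('dtype %r is not covered by this sketch' % dtype_name)


# Column dtypes after the astype() calls in the walkthrough.
dtypes = {'strCatOnly': 'object', 'intNum': 'float64', 'intCat': 'int64',
          'floatNum': 'object', 'floatCat': 'object'}
for column, dtype_name in dtypes.items():
    print(column, column_type_for_dtype(dtype_name))
```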