Utilities in QIIME 2¶
There are many non-plugin-based utilities available in QIIME 2. This document demonstrates many of them. It is organized by interface, and cross-references similar functionality available in other interfaces.
q2cli
¶
Most of the interesting utilities can be found in the tools subcommand of q2cli:
qiime tools --help
stdout:
Usage: qiime tools [OPTIONS] COMMAND [ARGS]...
Tools for working with QIIME 2 files.
Options:
--help Show this message and exit.
Commands:
cache-create Create an empty cache at the given location.
cache-fetch Fetches an artifact out of a cache into a .qza.
cache-garbage-collection Runs garbage collection on the cache at the
specified location.
cache-import Imports data into an Artifact in the cache under a
key.
cache-remove Removes a given key from a cache.
cache-status Checks the status of the cache.
cache-store Stores a .qza in the cache under a key.
cast-metadata Designate metadata column types.
citations Print citations for a QIIME 2 result.
export Export data from a QIIME 2 Artifact or a
Visualization
extract Extract a QIIME 2 Artifact or Visualization
archive.
import Import data into a new QIIME 2 Artifact.
inspect-metadata Inspect columns available in metadata.
list-formats List the available formats.
list-types List the available semantic types.
peek Take a peek at a QIIME 2 Artifact or
Visualization.
replay-citations Reports all citations from a QIIME 2 Artifact...
replay-provenance Replay provenance from a QIIME 2 Artifact...
replay-supplement Produces a zipfile package of useful...
validate Validate data in a QIIME 2 Artifact.
view View a QIIME 2 Visualization.
Let’s get our hands on some data so that we can learn more about this functionality! First, we will take a look at the taxonomic bar charts from the PD Mice Tutorial:
Download URL: https://data.qiime2.org/2024.10/tutorials/utilities/taxa-barplot.qzv
Save as: taxa-barplot.qzv
wget \
-O "taxa-barplot.qzv" \
"https://data.qiime2.org/2024.10/tutorials/utilities/taxa-barplot.qzv"
curl -sL \
"https://data.qiime2.org/2024.10/tutorials/utilities/taxa-barplot.qzv" > \
"taxa-barplot.qzv"
Retrieving Citations¶
Now that we have some results, let's learn more about the citations relevant to the creation of this visualization. First, we can check the help text for the qiime tools citations command:
qiime tools citations --help
stdout:
Usage: qiime tools citations [OPTIONS] ARTIFACT/VISUALIZATION
Print citations as a BibTex file (.bib) for a QIIME 2 result.
Options:
--help Show this message and exit.
Now that we know how to use the command, we will run the following:
qiime tools citations taxa-barplot.qzv
stdout:
@article{framework|qiime2:2019.10.0|0,
author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R. and Bokulich, Nicholas A. and Abnet, Christian C. and Al-Ghalith, Gabriel A. and Alexander, Harriet and Alm, Eric J. and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E. and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J. and Brown, C. Titus and Callahan, Benjamin J. and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily K. and Da Silva, Ricardo and Diener, Christian and Dorrestein, Pieter C. and Douglas, Gavin M. and Durall, Daniel M. and Duvallet, Claire and Edwardson, Christian F. and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M. and Gibbons, Sean M. and Gibson, Deanna L. and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin A. and Janssen, Stefan and Jarmusch, Alan K. and Jiang, Lingjing and Kaehler, Benjamin D. and Kang, Kyo Bin and Keefe, Christopher R. and Keim, Paul and Kelley, Scott T. and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan G. I. and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan D. and McDonald, Daniel and McIver, Lauren J. and Melnik, Alexey V. and Metcalf, Jessica L. and Morgan, Sydney C. and Morton, Jamie T. and Naimey, Ahmad Turan and Navas-Molina, Jose A. and Nothias, Louis Felix and Orchanian, Stephanie B. and Pearson, Talima and Peoples, Samuel L. and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, Michael S. and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R. and Swafford, Austin D. and Thompson, Luke R. and Torres, Pedro J. 
and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J. and Ul-Hasan, Sabah and van der Hooft, Justin J. J. and Vargas, Fernando and Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C. and Williamson, Charles H. D. and Willis, Amy D. and Xu, Zhenjiang Zech and Zaneveld, Jesse R. and Zhang, Yilong and Zhu, Qiyun and Knight, Rob and Caporaso, J. Gregory},
doi = {10.1038/s41587-019-0209-9},
issn = {1546-1696},
journal = {Nature Biotechnology},
number = {8},
pages = {852-857},
title = {Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2},
url = {https://doi.org/10.1038/s41587-019-0209-9},
volume = {37},
year = {2019}
}
@article{view|types:2019.10.0|BIOMV210DirFmt|0,
author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
doi = {10.1186/2047-217X-1-7},
journal = {GigaScience},
number = {1},
pages = {7},
publisher = {BioMed Central},
title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
volume = {1},
year = {2012}
}
@inproceedings{view|types:2019.10.0|pandas.core.frame:DataFrame|0,
author = { Wes McKinney },
booktitle = { Proceedings of the 9th Python in Science Conference },
editor = { Stéfan van der Walt and Jarrod Millman },
pages = { 51 -- 56 },
title = { Data Structures for Statistical Computing in Python },
year = { 2010 }
}
@inproceedings{view|types:2019.10.0|pandas.core.series:Series|0,
author = { Wes McKinney },
booktitle = { Proceedings of the 9th Python in Science Conference },
editor = { Stéfan van der Walt and Jarrod Millman },
pages = { 51 -- 56 },
title = { Data Structures for Statistical Computing in Python },
year = { 2010 }
}
@article{plugin|dada2:2019.10.0|0,
author = {Callahan, Benjamin J and McMurdie, Paul J and Rosen, Michael J and Han, Andrew W and Johnson, Amy Jo A and Holmes, Susan P},
doi = {10.1038/nmeth.3869},
journal = {Nature methods},
number = {7},
pages = {581},
publisher = {Nature Publishing Group},
title = {DADA2: high-resolution sample inference from Illumina amplicon data},
volume = {13},
year = {2016}
}
@article{framework|qiime2:2019.4.0|0,
author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R and Bokulich, Nicholas A and Abnet, Christian and Al-Ghalith, Gabriel A and Alexander, Harriet and Alm, Eric J and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J and Brown, C Titus and Callahan, Benjamin J and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily and Da Silva, Ricardo and Dorrestein, Pieter C and Douglas, Gavin M and Durall, Daniel M and Duvallet, Claire and Edwardson, Christian F and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M and Gibson, Deanna L and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin and Janssen, Stefan and Jarmusch, Alan K and Jiang, Lingjing and Kaehler, Benjamin and Kang, Kyo Bin and Keefe, Christopher R and Keim, Paul and Kelley, Scott T and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan GI and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan and McDonald, Daniel and McIver, Lauren J and Melnik, Alexey V and Metcalf, Jessica L and Morgan, Sydney C and Morton, Jamie and Naimey, Ahmad Turan and Navas-Molina, Jose A and Nothias, Louis Felix and Orchanian, Stephanie B and Pearson, Talima and Peoples, Samuel L and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, II, Michael S and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R and Swafford, Austin D and Thompson, Luke R and Torres, Pedro J and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J and Ul-Hasan, Sabah and van der Hooft, Justin JJ and Vargas, Fernando and 
Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C and Williamson, Chase HD and Willis, Amy D and Xu, Zhenjiang Zech and Zaneveld, Jesse R and Zhang, Yilong and Knight, Rob and Caporaso, J Gregory},
doi = {10.7287/peerj.preprints.27295v1},
issn = {2167-9843},
journal = {PeerJ Preprints},
month = {oct},
pages = {e27295v1},
title = {QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science},
url = {https://doi.org/10.7287/peerj.preprints.27295v1},
volume = {6},
year = {2018}
}
@article{action|feature-classifier:2019.10.0|method:classify_sklearn|0,
author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
journal = {Journal of machine learning research},
number = {Oct},
pages = {2825--2830},
title = {Scikit-learn: Machine learning in Python},
volume = {12},
year = {2011}
}
@article{plugin|feature-classifier:2019.10.0|0,
author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
doi = {10.1186/s40168-018-0470-z},
journal = {Microbiome},
number = {1},
pages = {90},
title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
url = {https://doi.org/10.1186/s40168-018-0470-z},
volume = {6},
year = {2018}
}
@article{view|types:2019.10.0|biom.table:Table|0,
author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
doi = {10.1186/2047-217X-1-7},
journal = {GigaScience},
number = {1},
pages = {7},
publisher = {BioMed Central},
title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
volume = {1},
year = {2012}
}
@article{plugin|feature-classifier:2019.4.0|0,
author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
doi = {10.1186/s40168-018-0470-z},
journal = {Microbiome},
number = {1},
pages = {90},
title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
url = {https://doi.org/10.1186/s40168-018-0470-z},
volume = {6},
year = {2018}
}
@article{action|feature-classifier:2019.4.0|method:fit_classifier_naive_bayes|0,
author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
journal = {Journal of machine learning research},
number = {Oct},
pages = {2825--2830},
title = {Scikit-learn: Machine learning in Python},
volume = {12},
year = {2011}
}
@inproceedings{view|types:2019.4.1|pandas.core.series:Series|0,
author = { Wes McKinney },
booktitle = { Proceedings of the 9th Python in Science Conference },
editor = { Stéfan van der Walt and Jarrod Millman },
pages = { 51 -- 56 },
title = { Data Structures for Statistical Computing in Python },
year = { 2010 }
}
As you can see, the citations for this particular visualization are presented above in BibTeX format.
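The citation keys above (e.g. plugin|dada2:2019.10.0|0) appear to encode where in provenance each citation came from. The following sketch splits a key into its apparent segments: origin (framework, plugin, action, or view), package:version, an optional detail, and an index. This is an informal reading of the keys shown above, not a documented format.

```python
def parse_citation_key(key):
    """Split a citation key like
    'action|feature-classifier:2019.10.0|method:classify_sklearn|0'
    into its apparent segments. Informal interpretation only --
    the key layout is not a documented QIIME 2 format."""
    parts = key.split('|')
    origin, pkg_ver, index = parts[0], parts[1], int(parts[-1])
    package, version = pkg_ver.split(':')
    detail = parts[2] if len(parts) == 4 else None
    return {'origin': origin, 'package': package, 'version': version,
            'detail': detail, 'index': index}

print(parse_citation_key('plugin|dada2:2019.10.0|0'))
# {'origin': 'plugin', 'package': 'dada2', 'version': '2019.10.0', 'detail': None, 'index': 0}
```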
We can also see the citations for a specific plugin:
qiime vsearch --citations
stdout:
% use `qiime tools citations` on a QIIME 2 result for complete list
@article{key0,
author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
doi = {10.7717/peerj.2584},
journal = {PeerJ},
pages = {e2584},
publisher = {PeerJ Inc.},
title = {VSEARCH: a versatile open source tool for metagenomics},
volume = {4},
year = {2016}
}
And also for a specific action of a plugin:
qiime vsearch cluster-features-open-reference --citations
stdout:
% use `qiime tools citations` on a QIIME 2 result for complete list
@article{key0,
author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
doi = {10.7717/peerj.2584},
journal = {PeerJ},
pages = {e2584},
publisher = {PeerJ Inc.},
title = {VSEARCH: a versatile open source tool for metagenomics},
volume = {4},
year = {2016}
}
@article{key1,
author = {Rideout, Jai Ram and He, Yan and Navas-Molina, Jose A. and Walters, William A. and Ursell, Luke K. and Gibbons, Sean M. and Chase, John and McDonald, Daniel and Gonzalez, Antonio and Robbins-Pianka, Adam and Clemente, Jose C. and Gilbert, Jack A. and Huse, Susan M. and Zhou, Hong-Wei and Knight, Rob and Caporaso, J. Gregory},
doi = {10.7717/peerj.545},
journal = {PeerJ},
pages = {e545},
publisher = {PeerJ Inc.},
title = {Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences},
volume = {2},
year = {2014}
}
Viewing Visualizations¶
What if we want to view our taxa bar plots? One option is to load the visualization at https://view.qiime2.org. All QIIME 2 Results may be opened this way. This will present the visualization (assuming the file is a .qzv), Result details (e.g. filename, uuid, type, format, citations), and a provenance graph showing how the Visualization or Artifact was created.
Note
Provenance viewing is only available at https://view.qiime2.org.
Another option is to use qiime tools view to accomplish the job. This command may only be used with Visualizations, and will not display Visualization details (see Peeking at Results) or provenance, but it provides a quick and easy way to view your results from the command line.
qiime tools view taxa-barplot.qzv
This will open a browser window with your visualization loaded in it. When you are done, you can close the browser window and press ctrl-c on the keyboard to terminate the command.
Peeking at Results¶
Oftentimes we need to verify the type and uuid of an Artifact. We can use the qiime tools peek command to view a brief summary report of those facts. First, let's get some data to look at:
Download URL: https://data.qiime2.org/2024.10/tutorials/utilities/faith-pd-vector.qza
Save as: faith-pd-vector.qza
wget \
-O "faith-pd-vector.qza" \
"https://data.qiime2.org/2024.10/tutorials/utilities/faith-pd-vector.qza"
curl -sL \
"https://data.qiime2.org/2024.10/tutorials/utilities/faith-pd-vector.qza" > \
"faith-pd-vector.qza"
Now that we have data, we can learn more about the file:
qiime tools peek faith-pd-vector.qza
stdout:
UUID: d5186dce-438d-44bb-903c-cb51a7ad4abe
Type: SampleData[AlphaDiversity] % Properties('phylogenetic')
Data format: AlphaDiversityDirectoryFormat
Here we can see that the type of the Artifact is SampleData[AlphaDiversity] % Properties('phylogenetic'), as well as the Artifact's UUID and format.
Validating Results¶
We can also validate the integrity of the file by running qiime tools validate:
qiime tools validate faith-pd-vector.qza
stdout:
Result faith-pd-vector.qza appears to be valid at level=max.
If there was an issue with the file, this command will usually do a good job of reporting what the problem is (within reason).
Inspecting Metadata¶
In the Metadata tutorial we learned about the metadata tabulate command, and the resulting visualization it creates. Oftentimes we don't care so much about the values of the Metadata, but rather just its shape: how many columns? What are their names? What are their types? How many rows (or IDs) are in the file?
We can demonstrate this by first downloading some sample metadata:
Download URL: https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv
Save as: sample-metadata.tsv
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv"
curl -sL \
"https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv" > \
"sample-metadata.tsv"
Then, we can run the qiime tools inspect-metadata command:
qiime tools inspect-metadata sample-metadata.tsv
stdout:
COLUMN NAME TYPE
========================= ===========
barcode categorical
mouse_id categorical
genotype categorical
cage_id categorical
donor categorical
donor_status categorical
days_post_transplant numeric
genotype_and_donor_status categorical
========================= ===========
IDS: 48
COLUMNS: 8
Question
How many metadata columns are there in sample-metadata.tsv? How many IDs?
Identify how many categorical columns are present. Now do the same for numeric columns.
This tool can be very helpful for learning about Metadata column names for files that are viewable as Metadata.
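To make the column typing above concrete, here is a rough standard-library approximation of the rule inspect-metadata appears to apply: a column is numeric if every non-empty value parses as a number, otherwise categorical. This is an illustrative sketch only; QIIME 2's real metadata handling also honors an explicit #q2:types directive row, which this sketch ignores.

```python
import csv
import io

def infer_column_types(tsv_text):
    """Roughly mimic inspect-metadata column typing: numeric if all
    non-empty values parse as floats, else categorical. Directive
    rows (lines starting with '#') are skipped."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter='\t')
            if r and not r[0].startswith('#')]
    header, data = rows[0], rows[1:]
    types = {}
    for i, name in enumerate(header[1:], start=1):  # skip the ID column
        values = [row[i] for row in data if row[i] != '']
        try:
            for v in values:
                float(v)
            types[name] = 'numeric'
        except ValueError:
            types[name] = 'categorical'
    return types

tsv = (
    "sample-id\tdonor\tdays_post_transplant\n"
    "s1\thc_1\t7\n"
    "s2\tpd_1\t21\n"
)
print(infer_column_types(tsv))
# {'donor': 'categorical', 'days_post_transplant': 'numeric'}
```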
Download URL: https://data.qiime2.org/2024.10/tutorials/utilities/jaccard-pcoa.qza
Save as: jaccard-pcoa.qza
wget \
-O "jaccard-pcoa.qza" \
"https://data.qiime2.org/2024.10/tutorials/utilities/jaccard-pcoa.qza"
curl -sL \
"https://data.qiime2.org/2024.10/tutorials/utilities/jaccard-pcoa.qza" > \
"jaccard-pcoa.qza"
The file we just downloaded is a Jaccard PCoA (from the PD Mice Tutorial), which can be used in place of the "typical" TSV-formatted Metadata file. We might need to know about its column names for commands we wish to run; using inspect-metadata, we can learn all about it:
qiime tools inspect-metadata jaccard-pcoa.qza
stdout:
COLUMN NAME TYPE
=========== =======
Axis 1 numeric
Axis 2 numeric
Axis 3 numeric
Axis 4 numeric
Axis 5 numeric
Axis 6 numeric
Axis 7 numeric
Axis 8 numeric
Axis 9 numeric
Axis 10 numeric
Axis 11 numeric
Axis 12 numeric
Axis 13 numeric
Axis 14 numeric
Axis 15 numeric
Axis 16 numeric
Axis 17 numeric
Axis 18 numeric
Axis 19 numeric
Axis 20 numeric
Axis 21 numeric
Axis 22 numeric
Axis 23 numeric
Axis 24 numeric
Axis 25 numeric
Axis 26 numeric
Axis 27 numeric
Axis 28 numeric
Axis 29 numeric
Axis 30 numeric
Axis 31 numeric
Axis 32 numeric
Axis 33 numeric
Axis 34 numeric
Axis 35 numeric
Axis 36 numeric
Axis 37 numeric
Axis 38 numeric
Axis 39 numeric
Axis 40 numeric
Axis 41 numeric
Axis 42 numeric
Axis 43 numeric
Axis 44 numeric
Axis 45 numeric
Axis 46 numeric
Axis 47 numeric
=========== =======
IDS: 47
COLUMNS: 47
Question
How many IDs are there? How many columns? Are there any categorical columns? Why?
Casting Metadata Column Types¶
In the Metadata tutorial we learned about column types and utilizing the qiime tools cast-metadata tool to specify column types within a provided metadata file. Below we will go through a few scenarios of how this tool can be used, and some common mistakes that may come up.
We'll start by first downloading some sample metadata. Note: this is the same sample metadata used in the Inspecting Metadata section, so you can skip this step if you have already downloaded the sample_metadata.tsv file from above.
Download URL: https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv
Save as: sample_metadata.tsv
wget \
-O "sample_metadata.tsv" \
"https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv"
curl -sL \
"https://data.qiime2.org/2024.10/tutorials/pd-mice/sample_metadata.tsv" > \
"sample_metadata.tsv"
In this example, we will cast the days_post_transplant column from numeric to categorical, and the mouse_id column from categorical to numeric. The rest of the columns contained within our metadata will be left as-is.
qiime tools cast-metadata sample_metadata.tsv \
--cast days_post_transplant:categorical \
--cast mouse_id:numeric
stdout:
sample_name barcode mouse_id genotype cage_id donor donor_status days_post_transplant genotype_and_donor_status
#q2:types categorical numeric categorical categorical categorical categorical categorical categorical
recip.220.WT.OB1.D7 CCTCCGTCATGG 457 wild type C35 hc_1 Healthy 49 wild type and Healthy
recip.290.ASO.OB2.D1 AACAGTAAACAA 456 susceptible C35 hc_1 Healthy 49 susceptible and Healthy
recip.389.WT.HC2.D21 ATGTATCAATTA 435 susceptible C31 hc_1 Healthy 21 susceptible and Healthy
recip.391.ASO.PD2.D14 GTCAGTATGGCT 435 susceptible C31 hc_1 Healthy 14 susceptible and Healthy
recip.391.ASO.PD2.D21 AGACAGTAGGAG 437 susceptible C31 hc_1 Healthy 21 susceptible and Healthy
recip.391.ASO.PD2.D7 GGTCTTAGCACC 435 susceptible C31 hc_1 Healthy 7 susceptible and Healthy
recip.400.ASO.HC2.D14 CGTTCGCTAGCC 437 susceptible C31 hc_1 Healthy 14 susceptible and Healthy
recip.401.ASO.HC2.D7 ATTTACAATTGA 437 susceptible C31 hc_1 Healthy 7 susceptible and Healthy
recip.403.ASO.PD2.D21 CGCAGATTAGTA 456 susceptible C35 hc_1 Healthy 21 susceptible and Healthy
recip.411.ASO.HC2.D14 ATGTTAGGGAAT 456 susceptible C35 hc_1 Healthy 14 susceptible and Healthy
recip.411.ASO.HC2.D21 CTCATATGCTAT 457 wild type C35 hc_1 Healthy 21 wild type and Healthy
recip.411.ASO.HC2.D49 GCAACGAACGAG 435 susceptible C31 hc_1 Healthy 49 susceptible and Healthy
recip.412.ASO.HC2.D14 AAGTGGCTATCC 457 wild type C35 hc_1 Healthy 14 wild type and Healthy
recip.412.ASO.HC2.D7 GCATTCGGCGTT 456 susceptible C35 hc_1 Healthy 7 susceptible and Healthy
recip.413.WT.HC2.D7 ACCAGTGACTCA 457 wild type C35 hc_1 Healthy 7 wild type and Healthy
recip.456.ASO.HC3.D49 ACGGCGTTATGT 468 wild type C42 hc_1 Healthy 49 wild type and Healthy
recip.458.ASO.HC3.D21 ACGGCCCTGGAG 468 wild type C42 hc_1 Healthy 21 wild type and Healthy
recip.458.ASO.HC3.D49 CATTTGACGACG 469 wild type C42 hc_1 Healthy 49 wild type and Healthy
recip.459.WT.HC3.D14 ACATGGGCGGAA 468 wild type C42 hc_1 Healthy 14 wild type and Healthy
recip.459.WT.HC3.D21 CATAAATTCTTG 469 wild type C42 hc_1 Healthy 21 wild type and Healthy
recip.459.WT.HC3.D49 GCTGCGTATACC 536 susceptible C43 pd_1 PD 49 susceptible and PD
recip.460.WT.HC3.D14 CTGCGGATATAC 469 wild type C42 hc_1 Healthy 14 wild type and Healthy
recip.460.WT.HC3.D21 GTCAATTAGTGG 536 susceptible C43 pd_1 PD 21 susceptible and PD
recip.460.WT.HC3.D49 GAGAAGCTTATA 537 wild type C43 pd_1 PD 49 wild type and PD
recip.460.WT.HC3.D7 GACCCGTTTCGC 468 wild type C42 hc_1 Healthy 7 wild type and Healthy
recip.461.ASO.HC3.D21 AGCCCGCAAAGG 537 wild type C43 pd_1 PD 21 wild type and PD
recip.461.ASO.HC3.D49 GGCGTAACGGCA 538 wild type C44 pd_1 PD 49 wild type and PD
recip.461.ASO.HC3.D7 ATTGCCTTGATT 469 wild type C42 hc_1 Healthy 7 wild type and Healthy
recip.462.WT.PD3.D14 GTGAGGGCAAGT 536 susceptible C43 pd_1 PD 14 susceptible and PD
recip.462.WT.PD3.D21 GGCCTATAAGTC 538 wild type C44 pd_1 PD 21 wild type and PD
recip.462.WT.PD3.D49 AATACAGACCTG 539 susceptible C44 pd_1 PD 49 susceptible and PD
recip.462.WT.PD3.D7 TTAGGATTCTAT 536 susceptible C43 pd_1 PD 7 susceptible and PD
recip.463.WT.PD3.D14 ATATTGGCAGCC 537 wild type C43 pd_1 PD 14 wild type and PD
recip.463.WT.PD3.D21 CGCGGCGCAGCT 539 susceptible C44 pd_1 PD 21 susceptible and PD
recip.463.WT.PD3.D7 GTTTATCTTAAG 537 wild type C43 pd_1 PD 7 wild type and PD
recip.464.WT.PD3.D14 TCATCCGTCGGC 538 wild type C44 pd_1 PD 14 wild type and PD
recip.465.ASO.PD3.D14 GGCTTCGGAGCG 539 susceptible C44 pd_1 PD 14 susceptible and PD
recip.465.ASO.PD3.D7 CAGTCTAGTACG 538 wild type C44 pd_1 PD 7 wild type and PD
recip.466.ASO.PD3.D7 GTGGGACTGCGC 539 susceptible C44 pd_1 PD 7 susceptible and PD
recip.467.WT.HC3.D49.a GTCAGGTGCGGC 437 susceptible C31 hc_1 Healthy 49 susceptible and Healthy
recip.467.WT.HC3.D49.b GTTAACTTACTA 546 susceptible C49 pd_1 PD 49 susceptible and PD
recip.536.ASO.PD4.D49 CAAATTCGGGAT 547 wild type C49 pd_1 PD 49 wild type and PD
recip.537.WT.PD4.D21 CTCTATTCCACC 546 susceptible C49 pd_1 PD 21 susceptible and PD
recip.538.WT.PD4.D21 ATGGATAGCTAA 547 wild type C49 pd_1 PD 21 wild type and PD
recip.539.ASO.PD4.D14 GATCCGGCAGGA 546 susceptible C49 pd_1 PD 14 susceptible and PD
recip.539.ASO.PD4.D7 GTTCGAGTGAAT 546 susceptible C49 pd_1 PD 7 susceptible and PD
recip.540.ASO.HC4.D14 CTTCCAACTCAT 547 wild type C49 pd_1 PD 14 wild type and PD
recip.540.ASO.HC4.D7 CGGCCTAAGTTC 547 wild type C49 pd_1 PD 7 wild type and PD
If the --output-file flag is enabled, the specified output file will contain the modified column types that we cast above, along with the rest of the columns and associated data contained in sample_metadata.tsv.
If you do not wish to save your cast metadata to an output file, you can omit the --output-file parameter and the results will be written to stdout (as shown in the example above).
The --ignore-extra and --error-on-missing flags are used to handle cast columns not contained within the original metadata file, and columns contained within the metadata file that aren't included in the cast call, respectively. We can take a look at how these flags can be used below.
In the first example, we'll take a look at utilizing the --ignore-extra flag when a column is cast that is not included within the original metadata file. Let's start by looking at what will happen if an extra column is included and this flag is not enabled.
qiime tools cast-metadata sample_metadata.tsv \
--cast spleen:numeric
stderr:
Usage: qiime tools cast-metadata [OPTIONS] METADATA...
Try 'qiime tools cast-metadata --help' for help.
Error: Invalid value for cast: The following cast columns were not found within the metadata: spleen
Notice that the spleen column included in the cast call results in a raised error. If we want to ignore any extra columns that are not present in the original metadata file, we can enable the --ignore-extra flag.
qiime tools cast-metadata sample_metadata.tsv \
--cast spleen:numeric \
--ignore-extra
When this flag is enabled, all columns included in the cast that are not present in the original metadata file will be ignored. Note that stdout for this example has been omitted, since we will not see a raised error with this flag enabled.
In our second example, we'll take a look at the --error-on-missing flag, which handles columns that are present within the metadata but not included in the cast call. The default behavior permits a subset of the full metadata file to be included in the cast call (i.e. not all columns within the metadata must be present in the cast call). If the --error-on-missing flag is enabled, all metadata columns must be included in the cast call, otherwise an error will be raised.
qiime tools cast-metadata sample_metadata.tsv \
--cast mouse_id:numeric \
--error-on-missing
stderr:
Usage: qiime tools cast-metadata [OPTIONS] METADATA...
Try 'qiime tools cast-metadata --help' for help.
Error: Invalid value for cast: The following columns within the metadata were not provided in the cast: barcode, genotype_and_donor_status, cage_id, donor_status, donor, days_post_transplant, genotype
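The behavior described above amounts to rewriting the #q2:types directive row of the metadata file. The sketch below illustrates that idea (including the --ignore-extra error handling) using only the standard library; it is not QIIME 2's implementation, and the tiny TSV is a trimmed-down stand-in for sample_metadata.tsv.

```python
import csv
import io

def cast_types(tsv_text, casts, ignore_extra=False):
    """Sketch of what cast-metadata does: rewrite the #q2:types
    directive row for the requested columns. `casts` maps column
    name -> 'numeric' or 'categorical'."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter='\t'))
    header, types = rows[0], rows[1]  # assume row 2 is the #q2:types row
    unknown = set(casts) - set(header)
    if unknown and not ignore_extra:
        raise ValueError(
            "The following cast columns were not found within the "
            "metadata: " + ", ".join(sorted(unknown)))
    for col, new_type in casts.items():
        if col in header:
            types[header.index(col)] = new_type
    out = io.StringIO()
    csv.writer(out, delimiter='\t', lineterminator='\n').writerows(rows)
    return out.getvalue()

tsv = ("sample_name\tmouse_id\tdays_post_transplant\n"
       "#q2:types\tcategorical\tnumeric\n"
       "recip.220.WT.OB1.D7\t457\t49\n")
print(cast_types(tsv, {'mouse_id': 'numeric',
                       'days_post_transplant': 'categorical'}))
```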
Artifact API¶
Unlike q2cli, the Artifact API (using QIIME 2 with Python) does not have a single central location for utility functions. Rather, utilities are often bound to objects as methods which operate on those objects.
Discovering Actions registered to a plugin¶
When working with a new plugin, it may be useful to check what Actions are available. We first import the plugin, and then query its actions attribute. This gives us a list of public methods, and details of whether they are methods, visualizers, or pipelines.
>>> from qiime2.plugins import feature_table
>>> help(feature_table.actions)
Help on module qiime2.plugins.feature_table.actions in qiime2.plugins.feature_table:
NAME
qiime2.plugins.feature_table.actions
DATA
__plugin__ = <qiime2.plugin.plugin.Plugin object>
core_features = <visualizer qiime2.plugins.feature_table.visualizers.c...
filter_features = <method qiime2.plugins.feature_table.methods.filter_...
...
If you already know that you are looking for a method, pipeline, or visualizer, you can get that subgroup of actions directly:
>>> help(feature_table.methods)
If you are working in a Jupyter Notebook or in IPython, you may prefer tab-complete to running help():
>>> feature_table.visualizers. # press tab after the . for tab-complete...
Getting help with an Action¶
Once you have imported a plugin, action helptext is accessible in interactive sessions with the IPython ? operator:
>>> feature_table.methods.merge?
Call signature:
feature_table.methods.merge(
tables: List[FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²],
overlap_method: Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', 'sum')¹ | Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample')² = 'error_on_overlapping_sample',
) -> (FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²,)
Type: Method
String form: <method qiime2.plugins.feature_table.methods.merge>
File: ~/miniconda/envs/q2-dev/lib/python3.8/site-packages/qiime2/sdk/action.py
Docstring: QIIME 2 Method
Call docstring:
Combine multiple tables
Combines feature tables using the `overlap_method` provided.
Parameters
----------
tables : List[FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²]
overlap_method : Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', 'sum')¹ | Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample')², optional
Method for handling overlapping ids.
Returns
-------
merged_table : FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²
The resulting merged feature table.
Retrieving Citations¶
The Artifact API does not provide a utility for getting all citations from a plugin. Per-action citations are accessible in each action's citations attribute, in BibTeX format.
>>> feature_table.actions.rarefy.citations
(CitationRecord(type='article', fields={'doi': '10.1186/s40168-017-0237-y', 'issn': '2049-2618', 'pages': '27', 'number': '1', 'volume': '5', 'month': 'Mar', 'year': '2017', 'journal': 'Microbiome', 'title': 'Normalization and microbial differential abundance strategies depend upon data characteristics', 'author': 'Weiss, Sophie and Xu, Zhenjiang Zech and Peddada, Shyamal and Amir, Amnon and Bittinger, Kyle and Gonzalez, Antonio and Lozupone, Catherine and Zaneveld, Jesse R. and Vázquez-Baeza, Yoshiki and Birmingham, Amanda and Hyde, Embriette R. and Knight, Rob'}),)
Peeking at Results¶
The Artifact API provides a .peek method that displays the UUID, Semantic Type, and data format of any QIIME 2 archive.
>>> from qiime2 import Artifact
>>> Artifact.peek('observed_features_vector.qza')
ResultMetadata(uuid='2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9', type='SampleData[AlphaDiversity]', format='AlphaDiversityDirectoryFormat')
If you have already loaded an artifact into memory and you’re not concerned with the data format, the artifact’s string representation will give you its UUID and Semantic Type.
>>> from qiime2 import Artifact
>>> table = Artifact.load('table.qza')
>>> table
<artifact: FeatureTable[Frequency] uuid: 2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9>
Validating Results¶
Artifacts may be validated by loading them and then running the validate method. validate takes one parameter, level, which may be set to max or min, defaulting to max. Min validation is useful for quick checks, while max validation is more comprehensive at the cost of longer runtimes. The validate method returns None if validation is successful; simply running x.validate() in the interpreter will output a blank line. If the artifact is invalid, a ValidationError or NotImplementedError is raised.
>>> from qiime2 import Artifact
>>> table = Artifact.load('table.qza')
>>> table.validate(level='min')
>>> print(table.validate()) # equivalent to print(table.validate(level='max'))
None
Viewing Data¶
The view API allows us to review many types of data without the need to export it from the .qza first.
>>> art = Artifact.load('some.qza')
... # perform some analysis, producing a result
>>> myresult.view(pd.Series)
s00000001 74
s00000002 48
s00000003 79
s00000004 113
s00000005 111
Name: observed_otus, Length: 471, dtype: int64
Viewing data in a specific format is only possible if there is a transformer registered from the current view type to the type you want; if there isn’t, we get an error. For example, if we try to view this SampleData[AlphaDiversity] artifact as a DataFrame:
>>> art.view(pd.DataFrame)
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_18201/824837086.py in <module>
12 # Note: Views are only possible if there are transformers registered from the default
13 # view type to the type you want. We get an error if there's no transformer
---> 14 art.view(pd.DataFrame)
... # traceback Here
Exception: No transformation from <class 'q2_types.sample_data._format.AlphaDiversityDirectoryFormat'> to <class 'pandas.core.frame.DataFrame'>
Some Artifacts are viewable as metadata. If you’d like to check, try:
>>> art.has_metadata()
True
>>> art_as_md = art.view(Metadata)
>>> art_as_md
Metadata
--------
471 IDs x 1 column
observed_otus: ColumnProperties(type='numeric')
Call to_dataframe() for a tabular representation.
Viewing Visualizations¶
The Artifact API does not provide utilities for viewing QIIME 2 visualizations. Users generally save visualizations (as .qzv files) and use QIIME 2 View to explore them.
viz.save('some-viz.qzv')
Inspecting Metadata¶
Metadata sheets can be viewed in summary or displayed nicely in DataFrame format, once they have been loaded.
>>> from qiime2 import Metadata
>>> metadata = Metadata.load('simple-metadata.tsv')
>>> print(metadata)
Metadata
--------
516 IDs x 3 columns
barcode: ColumnProperties(type='categorical')
days: ColumnProperties(type='numeric')
extraction: ColumnProperties(type='categorical')
>>> metadata.to_dataframe()
barcode days extraction
sampleid
s00000001 806rcbc0 1 1
s00000002 806rcbc1 3 1
s00000003 806rcbc2 7 1
s00000004 806rcbc3 1 1
s00000005 806rcbc4 11 1
... ... ... ...
Casting Metadata Column Types¶
The Artifact API does not provide a dedicated utility for casting metadata column types, and Metadata.columns is a read-only property. However, it is possible to edit your .tsv and re-load it with Metadata.load, or to cast your Metadata to a pandas.DataFrame, cast the columns whose properties you need to change, and turn the result back into Metadata with the types corrected. Here’s a walkthrough of the latter approach.
Load some Metadata¶
# Imagine you have loaded a tsv as metadata
>>> md = Metadata.load('md.tsv')
>>> print(md)
Metadata
--------
3 IDs x 5 columns
strCatOnly: ColumnProperties(type='categorical')
intNum: ColumnProperties(type='numeric')
intCat: ColumnProperties(type='categorical')
floatNum: ColumnProperties(type='numeric')
floatCat: ColumnProperties(type='categorical')
Call to_dataframe() for a tabular representation.
We have defined three columns of categorical data in the tsv, and two numeric. The column IDs describe the data values (e.g. int) and the declared column type (e.g. Num for numeric).
Limitations on casting¶
The sequences in strCatOnly are read in as Python strings, and represented in the NumPy/Pandas stack as “objects”. Loading the metadata would fail with an error if we typed this column numeric, because we don’t have a good way to represent strings as numbers. Similarly, you won’t have much luck casting string data to int or float in Pandas.
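That limitation can be demonstrated directly in Pandas, independent of QIIME 2 (a standalone sketch using sequence strings like those in strCatOnly):

```python
import pandas as pd

# A column of sequence-like strings, as in strCatOnly above.
seqs = pd.Series(['TCCCTTGTCTCC', 'ACGAGACTGATT', 'GCTGTACGGATT'])

# Casting string data to a numeric dtype fails, because Pandas has
# no sensible numeric interpretation of these values.
try:
    seqs.astype(float)
except ValueError as err:
    print("cast failed:", err)
```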
Convert to DataFrame¶
>>> md = md.to_dataframe()
>>> print(md)
>>> print()
>>> print("intCat should be an object (because categorical): ", str(md['intCat'].dtype))
>>> print("floatNum should be a float (because numerical): ", str(md['floatNum'].dtype))
>>> print("intNum should be a float, not an int (because numeric): ", str(md['intNum'].dtype))
strCatOnly intNum intCat floatNum floatCat
sampleid
S1 TCCCTTGTCTCC 1.0 1 1.01 1.01
S2 ACGAGACTGATT 3.0 3 3.01 3.01
S3 GCTGTACGGATT 7.0 7 7.01 7.01
intCat should be an object (because categorical): object
floatNum should be a float (because numerical): float64
intNum should be a float, not an int (because numeric): float64
The intNum and intCat columns of the original .tsv contained integer data. MetadataColumns typed as categorical are represented in Pandas as object, and MetadataColumns typed as numeric are represented in Pandas as float. As such, intNum is rendered as floating point data when to_dataframe is called, and intCat is represented as an object in the DataFrame.
These behaviors roundtrip cleanly. If we cast our DataFrame back to Metadata without making any changes, the new Metadata will be identical to the original Metadata we loaded from the tsv. We’re here to see how DataFrames allow us to cast metadata column types, though, so let’s give it a shot.
Cast columns¶
>>> md['intCat'] = md['intCat'].astype("int")
>>> md['floatNum'] = md['floatNum'].astype('str')
>>> print(md)
>>> print()
>>> print("intCat should be an int now: ", str(md['intCat'].dtype))
>>> print("floatNum should be an object now: ", str(md['floatNum'].dtype))
strCatOnly intNum intCat floatNum floatCat
sampleid
S1 TCCCTTGTCTCC 1.0 1 1.01 1.01
S2 ACGAGACTGATT 3.0 3 3.01 3.01
S3 GCTGTACGGATT 7.0 7 7.01 7.01
intCat should be an int now: int64
floatNum should be an object now: object
The DataFrame looks the same, but the column dtypes have changed as expected.
When we turn this DataFrame back into Metadata, the ColumnProperties change accordingly. Columns represented in Pandas as objects (including strs) are categorical. Columns represented in Pandas as ints or floats are numeric.
Cast the DataFrame back to Metadata¶
>>> md = Metadata(md)
>>> md
Metadata
--------
3 IDs x 5 columns
strCatOnly: ColumnProperties(type='categorical')
intNum: ColumnProperties(type='numeric')
intCat: ColumnProperties(type='numeric')
floatNum: ColumnProperties(type='categorical')
floatCat: ColumnProperties(type='categorical')
Call to_dataframe() for a tabular representation.
Note that intCat, formerly categorical, is now numeric, while floatNum has changed from numeric to categorical.