Filtering feature tables
We’ll next obtain a much larger feature table representing all of the samples included in the ([LTC+21]) dataset. Denoising all of these samples would take too much time in this course, so we’ll start with the feature table, sequences, and metadata provided by the authors and filter them down to the samples that we’ll use for our analyses. If you’d like to perform other experiments with these data, you can use the full feature table or a subset that you define by filtering.
Access the data
First, download the full feature table.
url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
fn = 'feature-table.qza'
request.urlretrieve(url, fn)
feature_table = Artifact.load(fn)
url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
fn <- 'feature-table.qza'
request$urlretrieve(url, fn)
feature_table <- Artifact$load(fn)
wget \
-O 'feature-table.qza' \
'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
feature_table_url = 'https://data.qiime2.org/2024.5/tutorials/liao/full-feature-table.qza'
def artifact_from_url(url):
    def factory():
        import tempfile
        import requests
        import qiime2

        data = requests.get(url)

        with tempfile.NamedTemporaryFile() as f:
            f.write(data.content)
            f.flush()
            result = qiime2.Artifact.load(f.name)

        return result
    return factory

feature_table = use.init_artifact(
    'feature-table',
    artifact_from_url(feature_table_url))
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
  - Set “Name” (first text-field) to: feature-table.qza
  - In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza
  - (“Type”, “Genome”, and “Settings” can be ignored)
  - Press the Start button at the bottom.
Next, download the ASV sequences.
url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
fn = 'rep-seqs.qza'
request.urlretrieve(url, fn)
rep_seqs = Artifact.load(fn)
url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
fn <- 'rep-seqs.qza'
request$urlretrieve(url, fn)
rep_seqs <- Artifact$load(fn)
wget \
-O 'rep-seqs.qza' \
'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
seqs_url = 'https://data.qiime2.org/2024.5/tutorials/liao/rep-seqs.qza'
feature_sequences = use.init_artifact(
'rep-seqs',
artifact_from_url(seqs_url))
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
  - Set “Name” (first text-field) to: rep-seqs.qza
  - In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza
  - (“Type”, “Genome”, and “Settings” can be ignored)
  - Press the Start button at the bottom.
View the metadata
We’ll take a quick look at the QIIME 2-formatted study metadata to refresh our memories. Either review the summary that you previously generated, or generate another one.
Expand this box for help generating a metadata summary.
metadata_summ_viz, = metadata_actions.tabulate(
input=sample_metadata_md,
)
action_results <- metadata_actions$tabulate(
input=sample_metadata_md,
)
metadata_summ_viz <- action_results$visualization
qiime metadata tabulate \
--m-input-file sample-metadata.tsv \
--o-visualization metadata-summ.qzv
use.action(
use.UsageAction(plugin_id='metadata', action_id='tabulate'),
use.UsageInputs(input=sample_metadata),
use.UsageOutputNames(visualization='metadata_summ')
)
- Using the qiime2 metadata tabulate tool:
  - For “input”, perform the following steps:
    - Leave as Metadata from TSV
    - Set “Metadata Source” to sample-metadata.tsv
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 metadata tabulate [...] : visualization.qzv
  - “Name” to set (be sure to press Save): metadata-summ.qzv
Generate summaries of full table and sequence data
Next, it’s useful to generate summaries of the feature table and sequence data. We did this after running DADA2 previously, but since we’re now working with a new feature table and new sequence data, we should look at a summary of this table as well.
table_viz, = feature_table_actions.summarize(
table=feature_table,
sample_metadata=sample_metadata_md,
)
rep_seqs_viz, = feature_table_actions.tabulate_seqs(
data=rep_seqs,
)
action_results <- feature_table_actions$summarize(
table=feature_table,
sample_metadata=sample_metadata_md,
)
table_viz <- action_results$visualization
action_results <- feature_table_actions$tabulate_seqs(
data=rep_seqs,
)
rep_seqs_viz <- action_results$visualization
qiime feature-table summarize \
--i-table feature-table.qza \
--m-sample-metadata-file sample-metadata.tsv \
--o-visualization table.qzv
qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
use.action(
use.UsageAction(plugin_id='feature_table', action_id='summarize'),
use.UsageInputs(table=feature_table, sample_metadata=sample_metadata),
use.UsageOutputNames(visualization='table'),
)
use.action(
use.UsageAction(plugin_id='feature_table', action_id='tabulate_seqs'),
use.UsageInputs(data=feature_sequences),
use.UsageOutputNames(visualization='rep_seqs'),
)
- Using the qiime2 feature-table summarize tool:
  - Set “table” to #: feature-table.qza
  - Expand the additional options section
    - For “sample_metadata”:
      - Press the + Insert sample_metadata button to set up the next steps.
      - Leave as Metadata from TSV
      - Set “Metadata Source” to sample-metadata.tsv
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table summarize [...] : visualization.qzv
  - “Name” to set (be sure to press Save): table.qzv
- Using the qiime2 feature-table tabulate-seqs tool:
  - Set “data” to #: rep-seqs.qza
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table tabulate-seqs [...] : visualization.qzv
  - “Name” to set (be sure to press Save): rep-seqs.qzv
Which column or columns in the metadata could be used to identify samples that were included in the autoFMT study?
Solution:
Several columns contain this information. One is autoFmtGroup, which contains the value “treatment” if the subject was in the treatment group, “control” if the subject was in the control group, and no value if the subject was not enrolled in this particular study.
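One quick way to check a column like this is to tally its values. The sketch below does that with Python’s standard library on a small invented TSV; the sample IDs, the rows, and the helper name tally_column are assumptions for illustration, and a real QIIME 2 metadata file may contain additional rows (such as a types directive) that this toy parser does not handle.

```python
import csv
import io
from collections import Counter

# Toy stand-in for sample-metadata.tsv; the sample IDs and rows are invented.
toy_tsv = (
    "sample-id\tautoFmtGroup\n"
    "s1\ttreatment\n"
    "s2\tcontrol\n"
    "s3\t\n"
    "s4\ttreatment\n"
    "s5\t\n"
)

def tally_column(tsv_text, column):
    """Count the values in one metadata column, labeling empty cells."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] or "(no value)" for row in reader)

group_counts = tally_column(toy_tsv, "autoFmtGroup")
print(group_counts)
```

Samples counted under “(no value)” are the ones that were not enrolled in the autoFMT study.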
Filter the feature table to the autoFMT study samples
In this tutorial, we’re going to work specifically with samples that were included in the autoFMT randomized trial. We’ll now begin a series of filtering steps applied to both the feature table and the sequences to select only features and samples that are relevant to that study.
First, we’ll remove samples that are not part of the autoFMT study from the feature table. We identify these samples using the metadata. Specifically, this step removes every sample that has no value in the autoFmtGroup column of the metadata, and retains the rest.
autofmt_table, = feature_table_actions.filter_samples(
table=feature_table,
metadata=sample_metadata_md,
where='autoFmtGroup IS NOT NULL',
)
action_results <- feature_table_actions$filter_samples(
table=feature_table,
metadata=sample_metadata_md,
where='autoFmtGroup IS NOT NULL',
)
autofmt_table <- action_results$filtered_table
qiime feature-table filter-samples \
--i-table feature-table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where 'autoFmtGroup IS NOT NULL' \
--o-filtered-table autofmt-table.qza
autofmt_table, = use.action(
use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
use.UsageInputs(table=feature_table, metadata=sample_metadata,
where="autoFmtGroup IS NOT NULL"),
use.UsageOutputNames(filtered_table='autofmt_table')
)
- Using the qiime2 feature-table filter-samples tool:
  - Set “table” to #: feature-table.qza
  - Expand the additional options section
    - For “metadata”:
      - Press the + Insert metadata button to set up the next steps.
      - Leave as Metadata from TSV
      - Set “Metadata Source” to sample-metadata.tsv
    - Set “where” to autoFmtGroup IS NOT NULL
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table filter-samples [...] : filtered_table.qza
  - “Name” to set (be sure to press Save): autofmt-table.qza
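The where expression used in this step is written in SQLite WHERE-clause syntax, so its effect can be previewed with Python’s built-in sqlite3 module. The following is a minimal sketch on an invented four-sample metadata table, not QIIME 2’s own implementation; the helper name ids_matching and the toy rows are assumptions.

```python
import sqlite3

# Hypothetical metadata rows; None stands for an empty autoFmtGroup cell.
toy_rows = [
    ("s1", "treatment"),
    ("s2", "control"),
    ("s3", None),
    ("s4", None),
]

def ids_matching(rows, where):
    """Return sample IDs whose metadata satisfies a SQLite WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE metadata ("sample-id" TEXT, autoFmtGroup TEXT)')
    conn.executemany("INSERT INTO metadata VALUES (?, ?)", rows)
    query = f'SELECT "sample-id" FROM metadata WHERE {where} ORDER BY "sample-id"'
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

kept_ids = ids_matching(toy_rows, "autoFmtGroup IS NOT NULL")
print(kept_ids)
```

Only the samples with a recorded group are returned; the samples with an empty cell are the ones the filter drops from the feature table.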
We can now summarize the feature table again to observe how it changed as a result of this first filtering step.
autofmt_table_summ_viz, = feature_table_actions.summarize(
table=autofmt_table,
sample_metadata=sample_metadata_md,
)
action_results <- feature_table_actions$summarize(
table=autofmt_table,
sample_metadata=sample_metadata_md,
)
autofmt_table_summ_viz <- action_results$visualization
qiime feature-table summarize \
--i-table autofmt-table.qza \
--m-sample-metadata-file sample-metadata.tsv \
--o-visualization autofmt-table-summ.qzv
use.action(
use.UsageAction(plugin_id='feature_table', action_id='summarize'),
use.UsageInputs(table=autofmt_table, sample_metadata=sample_metadata),
use.UsageOutputNames(visualization='autofmt_table_summ'),
)
- Using the qiime2 feature-table summarize tool:
  - Set “table” to #: autofmt-table.qza
  - Expand the additional options section
    - For “sample_metadata”:
      - Press the + Insert sample_metadata button to set up the next steps.
      - Leave as Metadata from TSV
      - Set “Metadata Source” to sample-metadata.tsv
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table summarize [...] : visualization.qzv
  - “Name” to set (be sure to press Save): autofmt-table-summ.qzv
How many samples and features are in this feature table after filtering? How does that compare to the feature table prior to filtering?
Solution:
After filtering, there are 556 samples and 4,256 features represented in the feature table.
Prior to filtering to just the autoFMT study, there were 12,546 samples and 17,865 features represented in the feature table.
Perform additional filtering steps on feature table
Before we proceed with the analysis, we’ll apply a few more filtering steps.
First, we’re going to focus on a specific window of time: from ten days prior to the patient’s cell transplant through seventy days following the transplant. Some of the subjects in this study have very long-term microbiota data, but since many don’t, it helps to restrict our analysis to the temporal range that is most relevant.
filtered_table_1, = feature_table_actions.filter_samples(
table=autofmt_table,
metadata=sample_metadata_md,
where='DayRelativeToNearestHCT BETWEEN -10 AND 70',
)
action_results <- feature_table_actions$filter_samples(
table=autofmt_table,
metadata=sample_metadata_md,
where='DayRelativeToNearestHCT BETWEEN -10 AND 70',
)
filtered_table_1 <- action_results$filtered_table
qiime feature-table filter-samples \
--i-table autofmt-table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where 'DayRelativeToNearestHCT BETWEEN -10 AND 70' \
--o-filtered-table filtered-table-1.qza
filtered_table_1, = use.action(
use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
use.UsageInputs(table=autofmt_table, metadata=sample_metadata,
where="DayRelativeToNearestHCT BETWEEN -10 AND 70"),
use.UsageOutputNames(filtered_table='filtered_table_1')
)
- Using the qiime2 feature-table filter-samples tool:
  - Set “table” to #: autofmt-table.qza
  - Expand the additional options section
    - For “metadata”:
      - Press the + Insert metadata button to set up the next steps.
      - Leave as Metadata from TSV
      - Set “Metadata Source” to sample-metadata.tsv
    - Set “where” to DayRelativeToNearestHCT BETWEEN -10 AND 70
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table filter-samples [...] : filtered_table.qza
  - “Name” to set (be sure to press Save): filtered-table-1.qza
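In SQLite syntax, BETWEEN includes both endpoints, and a missing value never satisfies the comparison. The sketch below demonstrates this with Python’s sqlite3 module on invented samples; the sample IDs and day values are assumptions chosen to sit on and around the window boundaries.

```python
import sqlite3

# Hypothetical samples; None stands for a missing DayRelativeToNearestHCT value.
toy_days = [("a", -11), ("b", -10), ("c", 0), ("d", 70), ("e", 71), ("f", None)]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE metadata ("sample-id" TEXT, DayRelativeToNearestHCT REAL)')
conn.executemany("INSERT INTO metadata VALUES (?, ?)", toy_days)

# Same expression as in the filtering step above.
in_window = [
    row[0]
    for row in conn.execute(
        'SELECT "sample-id" FROM metadata '
        "WHERE DayRelativeToNearestHCT BETWEEN -10 AND 70 "
        'ORDER BY "sample-id"'
    )
]
conn.close()
print(in_window)
```

Samples collected exactly on day -10 or day 70 are kept, while day -11, day 71, and the sample with no recorded day are excluded.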
Finally, we’ll remove features from the feature table if they don’t occur in at least two samples. This filter is used here primarily to reduce the runtime of some of the downstream steps for the purpose of this tutorial. You don’t need to apply this filter in your own analyses.
filtered_table_2, = feature_table_actions.filter_features(
table=filtered_table_1,
min_samples=2,
)
action_results <- feature_table_actions$filter_features(
table=filtered_table_1,
min_samples=2L,
)
filtered_table_2 <- action_results$filtered_table
qiime feature-table filter-features \
--i-table filtered-table-1.qza \
--p-min-samples 2 \
--o-filtered-table filtered-table-2.qza
filtered_table_2, = use.action(
use.UsageAction(plugin_id='feature_table', action_id='filter_features'),
use.UsageInputs(table=filtered_table_1, min_samples=2),
use.UsageOutputNames(filtered_table='filtered_table_2')
)
- Using the qiime2 feature-table filter-features tool:
  - Set “table” to #: filtered-table-1.qza
  - Expand the additional options section
    - Set “min_samples” to 2
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table filter-features [...] : filtered_table.qza
  - “Name” to set (be sure to press Save): filtered-table-2.qza
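To make the min_samples criterion concrete, here is a minimal pure-Python sketch of the same idea on an invented three-feature table. It is an illustration only, not the plugin’s implementation; the function name filter_by_min_samples and all counts are assumptions.

```python
# Toy feature table: feature ID -> {sample ID: count}. All values are invented.
toy_table = {
    "asv1": {"s1": 10, "s2": 4, "s3": 0},
    "asv2": {"s1": 0, "s2": 7, "s3": 0},
    "asv3": {"s1": 1, "s2": 1, "s3": 1},
}

def filter_by_min_samples(table, min_samples):
    """Keep features with a nonzero count in at least `min_samples` samples."""
    kept = {}
    for feature_id, counts in table.items():
        n_observed = sum(1 for count in counts.values() if count > 0)
        if n_observed >= min_samples:
            kept[feature_id] = counts
    return kept

filtered = filter_by_min_samples(toy_table, min_samples=2)
print(sorted(filtered))
```

Here asv2 is dropped because it is observed in only one sample. Note that the criterion counts the samples in which a feature appears, not its total abundance, so the low-abundance asv3 is retained.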
Generate a summary of this latest filtered feature table on your own (expand this box for help if necessary). How many samples and features are in this feature table?
Solution:
filtered_table_2_summ_viz, = feature_table_actions.summarize(
table=filtered_table_2,
sample_metadata=sample_metadata_md,
)
action_results <- feature_table_actions$summarize(
table=filtered_table_2,
sample_metadata=sample_metadata_md,
)
filtered_table_2_summ_viz <- action_results$visualization
qiime feature-table summarize \
--i-table filtered-table-2.qza \
--m-sample-metadata-file sample-metadata.tsv \
--o-visualization filtered-table-2-summ.qzv
use.action(
use.UsageAction(plugin_id='feature_table', action_id='summarize'),
use.UsageInputs(table=filtered_table_2, sample_metadata=sample_metadata),
use.UsageOutputNames(visualization='filtered_table_2_summ'),
)
- Using the qiime2 feature-table summarize tool:
  - Set “table” to #: filtered-table-2.qza
  - Expand the additional options section
    - For “sample_metadata”:
      - Press the + Insert sample_metadata button to set up the next steps.
      - Leave as Metadata from TSV
      - Set “Metadata Source” to sample-metadata.tsv
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table summarize [...] : visualization.qzv
  - “Name” to set (be sure to press Save): filtered-table-2-summ.qzv
The final feature table resulting from this series of steps contains 406 samples and 2,458 features. Approximately 23,000,000 sequences are represented in this feature table.
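To put these numbers in perspective, the short calculation below expresses each stage as a percentage of the full table, using the sample and feature counts reported in this section. The stage labels and the helper name percent_retained are ours, and the mean depth is only a rough figure because the total sequence count is itself approximate.

```python
# (samples, features) at each stage, taken from the table summaries in this section.
stages = [
    ("full table", 12546, 17865),
    ("autoFMT samples only", 556, 4256),
    ("final filtered table", 406, 2458),
]

def percent_retained(count, original):
    """Percentage of the original count that remains, rounded to one decimal."""
    return round(100 * count / original, 1)

_, full_samples, full_features = stages[0]
retention = {
    name: (percent_retained(n_samples, full_samples),
           percent_retained(n_features, full_features))
    for name, n_samples, n_features in stages
}
for name, (pct_samples, pct_features) in retention.items():
    print(f"{name}: {pct_samples}% of samples, {pct_features}% of features")

# Rough mean sequences per sample in the final table (the total is approximate).
approx_mean_depth = round(23_000_000 / 406)
print(f"approximate mean sequences per sample: {approx_mean_depth}")
```

The final table keeps a small fraction of the original samples but a noticeably larger fraction of the features.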
Filter features from sequence data to reduce runtime of feature annotation
At this point, we have filtered features from our feature table, but those features are still present in our sequence data. In the next section we’ll perform some computationally expensive operations on these sequences, so to speed those up we’ll now remove from our collection of feature sequences every feature that is no longer in our feature table.
filtered_sequences_1, = feature_table_actions.filter_seqs(
data=rep_seqs,
table=filtered_table_2,
)
action_results <- feature_table_actions$filter_seqs(
data=rep_seqs,
table=filtered_table_2,
)
filtered_sequences_1 <- action_results$filtered_data
qiime feature-table filter-seqs \
--i-data rep-seqs.qza \
--i-table filtered-table-2.qza \
--o-filtered-data filtered-sequences-1.qza
filtered_sequences_1, = use.action(
use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
use.UsageInputs(data=feature_sequences, table=filtered_table_2),
use.UsageOutputNames(filtered_data='filtered_sequences_1')
)
- Using the qiime2 feature-table filter-seqs tool:
  - Set “data” to #: rep-seqs.qza
  - Expand the additional options section
    - Set “table” to #: filtered-table-2.qza
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows (renaming is optional, but it will make any subsequent steps easier to complete):
  - History Name: #: qiime2 feature-table filter-seqs [...] : filtered_data.qza
  - “Name” to set (be sure to press Save): filtered-sequences-1.qza
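Conceptually, this step keeps only the sequences whose feature IDs still appear in the feature table. The sketch below shows that idea in plain Python on invented data; it is not the plugin’s implementation, and the function name filter_sequences, the IDs, and the sequences are all assumptions.

```python
# Toy inputs: feature ID -> sequence, and the feature IDs left in the table.
# Every ID and sequence here is invented for illustration.
toy_sequences = {
    "asv1": "ACGTACGT",
    "asv2": "TTGACCAA",
    "asv3": "GGCATTCG",
}
table_feature_ids = {"asv1", "asv3"}

def filter_sequences(sequences, keep_ids):
    """Keep only the sequences whose feature ID appears in `keep_ids`."""
    return {fid: seq for fid, seq in sequences.items() if fid in keep_ids}

filtered_sequences = filter_sequences(toy_sequences, table_feature_ids)
print(sorted(filtered_sequences))
```

After this kind of filtering, the sequence collection and the feature table describe the same set of features, so downstream steps do no work on features that were already discarded.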