Filtering feature tables

Next, we’ll obtain a much larger feature table representing all of the samples included in the ([LTC+21]) dataset. Denoising these data would take too long for this course, so we’ll start with the feature table, sequences, and metadata provided by the authors and filter down to the samples that we’ll use in our analyses. If you’d like to perform other experiments with these data, you can use either the full feature table or a subset that you define by filtering.

Access the data

First, download the full feature table.

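# Imports used by this snippet (skip them if already loaded earlier in your session)
from urllib import request
from qiime2 import Artifact

url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
fn = 'feature-table.qza'
request.urlretrieve(url, fn)
feature_table = Artifact.load(fn)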
url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
fn <- 'feature-table.qza'
request$urlretrieve(url, fn)
feature_table <- Artifact$load(fn)
wget \
  -O 'feature-table.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
feature_table_url = 'https://data.qiime2.org/2024.5/tutorials/liao/full-feature-table.qza'

def artifact_from_url(url):
    def factory():
        import tempfile
        import requests
        import qiime2

        data = requests.get(url)

        with tempfile.NamedTemporaryFile() as f:
            f.write(data.content)
            f.flush()
            result = qiime2.Artifact.load(f.name)

        return result
    return factory

feature_table = use.init_artifact(
    'feature-table',
    artifact_from_url(feature_table_url))
Using the Upload Data tool:
  1. On the first tab (Regular), press the Paste/Fetch data button at the bottom.

    1. Set “Name” (first text-field) to: feature-table.qza

    2. In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza

    3. (“Type”, “Genome”, and “Settings” can be ignored)

  2. Press the Start button at the bottom.

Next, download the ASV sequences.

url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
fn = 'rep-seqs.qza'
request.urlretrieve(url, fn)
rep_seqs = Artifact.load(fn)
url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
fn <- 'rep-seqs.qza'
request$urlretrieve(url, fn)
rep_seqs <- Artifact$load(fn)
wget \
  -O 'rep-seqs.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
seqs_url = 'https://data.qiime2.org/2024.5/tutorials/liao/rep-seqs.qza'

feature_sequences = use.init_artifact(
    'rep-seqs',
    artifact_from_url(seqs_url))
Using the Upload Data tool:
  1. On the first tab (Regular), press the Paste/Fetch data button at the bottom.

    1. Set “Name” (first text-field) to: rep-seqs.qza

    2. In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza

    3. (“Type”, “Genome”, and “Settings” can be ignored)

  2. Press the Start button at the bottom.
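A `.qza` artifact is a zip archive, so one quick sanity check after downloading is to confirm that the file on disk is a readable zip file rather than, say, a saved error page. The following is a minimal sketch of that check. The helper name `looks_like_artifact` and the stand-in files are hypothetical and are created locally so the example runs without network access. It does not validate the artifact's contents; loading the file with QIIME 2 remains the real test.

```python
import os
import tempfile
import zipfile


def looks_like_artifact(path):
    """Return True if `path` is a non-empty file that is a readable zip archive."""
    return (
        os.path.isfile(path)
        and os.path.getsize(path) > 0
        and zipfile.is_zipfile(path)
    )


# Demonstration on stand-in files (hypothetical names, no download involved).
with tempfile.TemporaryDirectory() as tmp:
    # A small zip archive standing in for a successfully downloaded artifact.
    good = os.path.join(tmp, 'good.qza')
    with zipfile.ZipFile(good, 'w') as zf:
        zf.writestr('placeholder/VERSION', 'stand-in content\n')

    # A text file standing in for a failed download (e.g. a saved error page).
    bad = os.path.join(tmp, 'bad.qza')
    with open(bad, 'w') as fh:
        fh.write('<html>404 Not Found</html>')

    good_ok = looks_like_artifact(good)
    bad_ok = looks_like_artifact(bad)

print(good_ok, bad_ok)
```

In practice you would call the helper on `feature-table.qza` and `rep-seqs.qza` after downloading them.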

View the metadata

We’ll take a quick look at the QIIME 2-formatted study metadata to refresh our memories. Either review the summary that you previously generated, or generate another one.

Generate summaries of full table and sequence data

Next, it’s useful to generate summaries of the feature table and sequence data. We did this after running DADA2 previously, but since we’re now working with a new feature table and new sequence data, we should summarize both again.

table_viz, = feature_table_actions.summarize(
    table=feature_table,
    sample_metadata=sample_metadata_md,
)
rep_seqs_viz, = feature_table_actions.tabulate_seqs(
    data=rep_seqs,
)
action_results <- feature_table_actions$summarize(
    table=feature_table,
    sample_metadata=sample_metadata_md,
)
table_viz <- action_results$visualization
action_results <- feature_table_actions$tabulate_seqs(
    data=rep_seqs,
)
rep_seqs_viz <- action_results$visualization
qiime feature-table summarize \
  --i-table feature-table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization table.qzv
qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv
use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=feature_table, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='table'),
)

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='tabulate_seqs'),
    use.UsageInputs(data=feature_sequences),
    use.UsageOutputNames(visualization='rep_seqs'),
)
Using the qiime2 feature-table summarize tool:
  1. Set “table” to #: feature-table.qza

  2. Expand the additional options section

    • For “sample_metadata”:

      • Press the + Insert sample_metadata button to set up the next steps.

        1. Leave as Metadata from TSV

        2. Set “Metadata Source” to sample-metadata.tsv

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table summarize [...] : visualization.qzv
“Name” to set (be sure to press Save): table.qzv

Using the qiime2 feature-table tabulate-seqs tool:
  1. Set “data” to #: rep-seqs.qza

  2. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table tabulate-seqs [...] : visualization.qzv
“Name” to set (be sure to press Save): rep-seqs.qzv
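To make concrete the kind of headline numbers a feature table summary reports (number of samples, number of features, and total frequency), here is a minimal sketch that computes them for a toy table. The table, its IDs, and the helper name `summarize_toy_table` are all hypothetical. This only illustrates the bookkeeping and is not how the `summarize` action is implemented.

```python
# Toy feature table: sample ID -> {feature ID: observed count}. All values are made up.
toy_table = {
    'sample-a': {'asv-1': 10, 'asv-2': 0, 'asv-3': 5},
    'sample-b': {'asv-1': 3, 'asv-2': 7, 'asv-3': 0},
    'sample-c': {'asv-1': 0, 'asv-2': 0, 'asv-3': 2},
}


def summarize_toy_table(table):
    """Count samples, observed features, and total frequency in a toy table."""
    # A feature counts as observed if it has a non-zero count in at least one sample.
    observed = {
        feature
        for counts in table.values()
        for feature, count in counts.items()
        if count > 0
    }
    # Per-sample frequency is the sum of all counts in that sample.
    per_sample = {sample: sum(counts.values()) for sample, counts in table.items()}
    return {
        'n_samples': len(table),
        'n_features': len(observed),
        'total_frequency': sum(per_sample.values()),
        'per_sample_frequency': per_sample,
    }


summary = summarize_toy_table(toy_table)
print(summary)
```

Comparing these numbers before and after each filtering step is a simple way to confirm that a filter did what you expected.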

Exercise 1

Which column or columns in the metadata could be used to identify samples that were included in the autoFMT study?

Filter the feature table to the autoFMT study samples

In this tutorial, we’re going to work specifically with samples that were included in the autoFMT randomized trial. We’ll now begin a series of filtering steps applied to both the feature table and the sequences to select only features and samples that are relevant to that study.

First, we’ll remove samples that are not part of the autoFMT study from the feature table. We identify these samples using the metadata: this step filters out samples that have no value in the autoFmtGroup column, keeping only those for which that column is populated.

autofmt_table, = feature_table_actions.filter_samples(
    table=feature_table,
    metadata=sample_metadata_md,
    where='autoFmtGroup IS NOT NULL',
)
action_results <- feature_table_actions$filter_samples(
    table=feature_table,
    metadata=sample_metadata_md,
    where='autoFmtGroup IS NOT NULL',
)
autofmt_table <- action_results$filtered_table
qiime feature-table filter-samples \
  --i-table feature-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where 'autoFmtGroup IS NOT NULL' \
  --o-filtered-table autofmt-table.qza
autofmt_table, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=feature_table, metadata=sample_metadata,
                    where="autoFmtGroup IS NOT NULL"),
    use.UsageOutputNames(filtered_table='autofmt_table')
)
Using the qiime2 feature-table filter-samples tool:
  1. Set “table” to #: feature-table.qza

  2. Expand the additional options section

    1. For “metadata”:

      • Press the + Insert metadata button to set up the next steps.

        1. Leave as Metadata from TSV

        2. Set “Metadata Source” to sample-metadata.tsv

    2. Set “where” to autoFmtGroup IS NOT NULL

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table filter-samples [...] : filtered_table.qza
“Name” to set (be sure to press Save): autofmt-table.qza
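The `where` expression used in the filtering step above follows SQLite WHERE-clause syntax, so its behavior can be explored on a small scale with Python's built-in `sqlite3` module. The sketch below is only an illustration: the sample IDs, group labels, and counts are hypothetical, and the toy table is a plain dictionary rather than a QIIME 2 artifact.

```python
import sqlite3

# Hypothetical metadata rows: (sample ID, autoFmtGroup). None stands for a blank cell.
rows = [
    ('s1', 'treatment'),
    ('s2', None),
    ('s3', 'control'),
    ('s4', None),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE metadata (id TEXT PRIMARY KEY, autoFmtGroup TEXT)')
conn.executemany('INSERT INTO metadata VALUES (?, ?)', rows)

# The same expression that is passed as `where` in the tutorial step.
where = 'autoFmtGroup IS NOT NULL'
query = f'SELECT id FROM metadata WHERE {where} ORDER BY id'
kept_ids = [row[0] for row in conn.execute(query)]
conn.close()

# Toy table (sample ID -> total count); only samples matching the clause remain.
toy_table = {'s1': 120, 's2': 45, 's3': 300, 's4': 8}
filtered = {sample: n for sample, n in toy_table.items() if sample in kept_ids}

print(kept_ids, filtered)
```

Samples with a blank `autoFmtGroup` cell (`s2` and `s4` here) fail the `IS NOT NULL` test and are dropped.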

We can now summarize the feature table again to observe how it changed as a result of this first filtering step.

autofmt_table_summ_viz, = feature_table_actions.summarize(
    table=autofmt_table,
    sample_metadata=sample_metadata_md,
)
action_results <- feature_table_actions$summarize(
    table=autofmt_table,
    sample_metadata=sample_metadata_md,
)
autofmt_table_summ_viz <- action_results$visualization
qiime feature-table summarize \
  --i-table autofmt-table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization autofmt-table-summ.qzv
use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=autofmt_table, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='autofmt_table_summ'),
)
Using the qiime2 feature-table summarize tool:
  1. Set “table” to #: autofmt-table.qza

  2. Expand the additional options section

    • For “sample_metadata”:

      • Press the + Insert sample_metadata button to set up the next steps.

        1. Leave as Metadata from TSV

        2. Set “Metadata Source” to sample-metadata.tsv

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table summarize [...] : visualization.qzv
“Name” to set (be sure to press Save): autofmt-table-summ.qzv

Exercise 2

How many samples and features are in this feature table after filtering? How does that compare to the feature table prior to filtering?

Perform additional filtering steps on feature table

Before we proceed with the analysis, we’ll apply a few more filtering steps.

First, we’ll focus on a specific window of time: from ten days before the patient’s cell transplant through seventy days after it. Some subjects in this study have very long-term microbiota data, but many don’t, so it helps to restrict our analysis to the temporal range that is most relevant here.

filtered_table_1, = feature_table_actions.filter_samples(
    table=autofmt_table,
    metadata=sample_metadata_md,
    where='DayRelativeToNearestHCT BETWEEN -10 AND 70',
)
action_results <- feature_table_actions$filter_samples(
    table=autofmt_table,
    metadata=sample_metadata_md,
    where='DayRelativeToNearestHCT BETWEEN -10 AND 70',
)
filtered_table_1 <- action_results$filtered_table
qiime feature-table filter-samples \
  --i-table autofmt-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where 'DayRelativeToNearestHCT BETWEEN -10 AND 70' \
  --o-filtered-table filtered-table-1.qza
filtered_table_1, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=autofmt_table, metadata=sample_metadata,
                    where="DayRelativeToNearestHCT BETWEEN -10 AND 70"),
    use.UsageOutputNames(filtered_table='filtered_table_1')
)
Using the qiime2 feature-table filter-samples tool:
  1. Set “table” to #: autofmt-table.qza

  2. Expand the additional options section

    1. For “metadata”:

      • Press the + Insert metadata button to set up the next steps.

        1. Leave as Metadata from TSV

        2. Set “Metadata Source” to sample-metadata.tsv

    2. Set “where” to DayRelativeToNearestHCT BETWEEN -10 AND 70

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table filter-samples [...] : filtered_table.qza
“Name” to set (be sure to press Save): filtered-table-1.qza
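Two details of the time-window clause are easy to overlook: in SQLite, `BETWEEN` includes both endpoints, and a row with a missing value never matches. The following sketch checks both on hypothetical metadata using Python's built-in `sqlite3` module. The sample IDs and day values are made up for illustration.

```python
import sqlite3

# Hypothetical metadata rows: (sample ID, DayRelativeToNearestHCT).
# None stands for a sample with no recorded day.
rows = [
    ('s1', -15),
    ('s2', -10),
    ('s3', 0),
    ('s4', 70),
    ('s5', 71),
    ('s6', None),
]

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE metadata (id TEXT PRIMARY KEY, DayRelativeToNearestHCT INTEGER)'
)
conn.executemany('INSERT INTO metadata VALUES (?, ?)', rows)

# The same expression that is passed as `where` in the tutorial step.
where = 'DayRelativeToNearestHCT BETWEEN -10 AND 70'
query = f'SELECT id FROM metadata WHERE {where} ORDER BY id'
in_window = [row[0] for row in conn.execute(query)]
conn.close()

print(in_window)
```

The samples at day -10 and day 70 are retained, while day -15, day 71, and the sample with no recorded day are excluded.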

Finally, we’ll remove features from the feature table if they don’t occur in at least two samples. This filter is used here primarily to reduce the runtime of some of the downstream steps for the purpose of this tutorial. You don’t need to apply it in your own analyses.

filtered_table_2, = feature_table_actions.filter_features(
    table=filtered_table_1,
    min_samples=2,
)
action_results <- feature_table_actions$filter_features(
    table=filtered_table_1,
    min_samples=2L,
)
filtered_table_2 <- action_results$filtered_table
qiime feature-table filter-features \
  --i-table filtered-table-1.qza \
  --p-min-samples 2 \
  --o-filtered-table filtered-table-2.qza
filtered_table_2, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_features'),
    use.UsageInputs(table=filtered_table_1, min_samples=2),
    use.UsageOutputNames(filtered_table='filtered_table_2')
)
Using the qiime2 feature-table filter-features tool:
  1. Set “table” to #: filtered-table-1.qza

  2. Expand the additional options section

    • Set “min_samples” to 2

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table filter-features [...] : filtered_table.qza
“Name” to set (be sure to press Save): filtered-table-2.qza
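The idea behind a minimum-samples filter can be sketched in a few lines of plain Python: count the number of samples in which each feature has a non-zero count, then keep only the features that reach the threshold. The toy table and the helper name `filter_features_by_prevalence` below are hypothetical. This illustrates the concept and is not the plugin's implementation.

```python
# Toy feature table: sample ID -> {feature ID: observed count}. All values are made up.
toy_table = {
    'sample-a': {'asv-1': 10, 'asv-2': 0, 'asv-3': 5},
    'sample-b': {'asv-1': 3, 'asv-2': 7, 'asv-3': 0},
    'sample-c': {'asv-1': 0, 'asv-2': 0, 'asv-3': 2},
}


def filter_features_by_prevalence(table, min_samples):
    """Keep only features with a non-zero count in at least `min_samples` samples."""
    prevalence = {}
    for counts in table.values():
        for feature, count in counts.items():
            if count > 0:
                prevalence[feature] = prevalence.get(feature, 0) + 1

    keep = {feature for feature, n in prevalence.items() if n >= min_samples}

    # Rebuild every sample's counts using only the retained features.
    return {
        sample: {f: c for f, c in counts.items() if f in keep}
        for sample, counts in table.items()
    }


filtered = filter_features_by_prevalence(toy_table, min_samples=2)
print(filtered)
```

Here `asv-2` is observed in only one sample, so it is removed, while `asv-1` and `asv-3` are each observed in two samples and are kept.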

Exercise 3

Generate a summary of this latest filtered feature table on your own. How many samples and features are in this feature table?

Filter features from sequence data to reduce runtime of feature annotation

At this point, we have filtered features from our feature table, but those features are still present in our sequence data. In the next section we’ll perform some computationally expensive operations on these sequences, so to speed those up we’ll now remove from our collection of feature sequences every feature that is no longer in our feature table.

filtered_sequences_1, = feature_table_actions.filter_seqs(
    data=rep_seqs,
    table=filtered_table_2,
)
action_results <- feature_table_actions$filter_seqs(
    data=rep_seqs,
    table=filtered_table_2,
)
filtered_sequences_1 <- action_results$filtered_data
qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table-2.qza \
  --o-filtered-data filtered-sequences-1.qza
filtered_sequences_1, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
    use.UsageInputs(data=feature_sequences, table=filtered_table_2),
    use.UsageOutputNames(filtered_data='filtered_sequences_1')
)
Using the qiime2 feature-table filter-seqs tool:
  1. Set “data” to #: rep-seqs.qza

  2. Expand the additional options section

    • Set “table” to #: filtered-table-2.qza

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name: #: qiime2 feature-table filter-seqs [...] : filtered_data.qza
“Name” to set (be sure to press Save): filtered-sequences-1.qza
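Conceptually, filtering sequences against a table keeps only the sequences whose feature IDs still appear in that table. The sketch below shows the idea on hypothetical data. The IDs, sequences, and helper name `filter_toy_seqs` are made up, and real feature sequences live in a QIIME 2 artifact rather than a dictionary.

```python
# Hypothetical feature sequences: feature ID -> DNA sequence.
toy_sequences = {
    'asv-1': 'ACGTACGT',
    'asv-2': 'TTGACCGA',
    'asv-3': 'GGCATCAG',
}

# Hypothetical set of feature IDs that survived table filtering.
retained_features = {'asv-1', 'asv-3'}


def filter_toy_seqs(sequences, feature_ids):
    """Keep only the sequences whose feature ID is in `feature_ids`."""
    return {fid: seq for fid, seq in sequences.items() if fid in feature_ids}


filtered_sequences = filter_toy_seqs(toy_sequences, retained_features)

# Sanity check: every retained feature should have a sequence available.
missing = retained_features - set(toy_sequences)

print(filtered_sequences, missing)
```

After this step the sequence collection and the feature table describe the same set of features, which is what the downstream annotation steps rely on.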