Alpha diversity visualizations

Alpha diversity visualizations

Alpha group significance

First we’ll look for general patterns, by comparing different categorical groupings of samples to see if there is some relationship to richness.

To start with, we’ll examine ‘observed features’:

alpha_group_sig_obs_feats_viz, = diversity_actions.alpha_group_significance(
    alpha_diversity=observed_features_vector,
    metadata=sample_metadata_md,
)
qiime diversity alpha-group-significance \
  --i-alpha-diversity diversity-core-metrics-phylogenetic/observed_features_vector.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization alpha-group-sig-obs-feats.qzv
use.action(
    use.UsageAction('diversity', 'alpha_group_significance'),
    use.UsageInputs(
        alpha_diversity=core_metrics_results.observed_features_vector,
        metadata=sample_metadata),
    use.UsageOutputNames(visualization='alpha_group_sig_obs_feats')
)
Using the qiime2 diversity alpha-group-significance tool:
  1. Set “alpha_diversity” to #: qiime2 diversity core-metrics-phylogenetic [...] : observed_features_vector.qza

  2. For “metadata”:

    • Perform the following steps.

      1. Leave as Metadata from TSV

      2. Set “Metadata Source” to sample-metadata.tsv

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 diversity alpha-group-significance [...] : visualization.qzv

alpha-group-sig-obs-feats.qzv

The first thing to notice is the high variability in each individual’s richness (PatientID). The centers and spreads of the individual distributions are likely to obscure other effects, so we will want to keep this in mind. Additionally, we have repeated measures of each individual, so we are violating independence assumptions when looking at other categories. (Kruskal-Wallis is a non-parameteric test, but like most tests, still requires samples to be independent.)

Keeping in mind that other categories are probably inconclusive, we notice that there are (amusingly, and somewhat reassuringly) differences in stool consistency (solid vs non-solid).

Because these data were derived from a study in which participants recieved auto-fecal microbiota transplant, we may also be interested in whether there was a difference in richness between the control group and the auto-FMT goup.

Looking at autoFmtGroup we see that there is no apparent difference, but we also know that we are violating independence with our repeated measures, and all patients recieved a bone-marrow transplant which may be a stronger effect. (The goal of the auto-FMT was to mitigate the impact of the marrow transplant.)

We will use a more advanced statistical model to explore this question.

Linear Mixed Effects

In order to manage the repeated measures, we will use a linear mixed-effects model. In a mixed-effects model, we combine fixed-effects (your typical linear regression coefficients) with random-effects. These random effects are some (ostensibly random) per-group coefficient which minimizes the error within that group. In our situation, we would want our random effect to be the PatientID as we can see each subject has a different baseline for richness (and we have multiple measures for each patient). By making that a random effect, we can more accurately ascribe associations to the fixed effects as we treat each sample as a “draw” from a per-group distribution.

There are several ways to create a linear model with random effects, but we will be using a random-intercept, which allows for the per-subject intercept to take on a different average from the population intercept (modeling what we saw in the group-significance plot above).

First let’s evaluate the general trend of the transplant.

import qiime2.plugins.longitudinal.actions as longitudinal_actions

md_observed_features_md = observed_features_vector.view(Metadata)
merged_observed_features_md = sample_metadata_md.merge(md_observed_features_md)
lme_obs_features_HCT_viz, = longitudinal_actions.linear_mixed_effects(
    metadata=merged_observed_features_md,
    state_column='DayRelativeToNearestHCT',
    individual_id_column='PatientID',
    metric='observed_features',
)
qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv diversity-core-metrics-phylogenetic/observed_features_vector.qza \
  --p-state-column DayRelativeToNearestHCT \
  --p-individual-id-column PatientID \
  --p-metric observed_features \
  --o-visualization lme-obs-features-HCT.qzv
md_observed_features = use.view_as_metadata(
    'md_observed_features',
     core_metrics_results.observed_features_vector)

merged_observed_features = use.merge_metadata(
    'merged_observed_features',
    sample_metadata,
    md_observed_features)

use.action(
    use.UsageAction('longitudinal', 'linear_mixed_effects'),
    use.UsageInputs(metadata=merged_observed_features,
                    state_column='DayRelativeToNearestHCT',
                    individual_id_column='PatientID',
                    metric='observed_features'),
    use.UsageOutputNames(visualization='lme-obs-features-HCT')
)
Using the qiime2 longitudinal linear-mixed-effects tool:
  1. For “metadata”:

    1. Perform the following steps.

      1. Leave as Metadata from TSV

      2. Set “Metadata Source” to sample-metadata.tsv

    2. Press the + Insert metadata button to set up the next steps.

      1. Change to Metadata from Artifact

      2. Set “Metadata Source” to qiime2 diversity core-metrics-phylogenetic [...] : observed_features_vector.qza

  2. Set “state_column” to DayRelativeToNearestHCT

  3. Set “individual_id_column” to PatientID

  4. Expand the additional options section

    • Set “metric” to observed_features

  5. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 longitudinal linear-mixed-effects [...] : visualization.qzv

lme-obs-features-HCT.qzv

Here we see a significant association between richness and the bone marrow transplant.

We may also be interested in the effect of the auto fecal microbiota transplant. It should be known that these are generally correlated, so choosing one model over the other will require external knowledge.

lme_obs_features_FMT_viz, = longitudinal_actions.linear_mixed_effects(
    metadata=merged_observed_features_md,
    state_column='day-relative-to-fmt',
    individual_id_column='PatientID',
    metric='observed_features',
)
qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv diversity-core-metrics-phylogenetic/observed_features_vector.qza \
  --p-state-column day-relative-to-fmt \
  --p-individual-id-column PatientID \
  --p-metric observed_features \
  --o-visualization lme-obs-features-FMT.qzv
use.action(
    use.UsageAction('longitudinal', 'linear_mixed_effects'),
    use.UsageInputs(metadata=merged_observed_features,
                    state_column='day-relative-to-fmt',
                    individual_id_column='PatientID',
                    metric='observed_features'),
    use.UsageOutputNames(visualization='lme-obs-features-FMT')
)
Using the qiime2 longitudinal linear-mixed-effects tool:
  1. For “metadata”:

    1. Perform the following steps.

      1. Leave as Metadata from TSV

      2. Set “Metadata Source” to sample-metadata.tsv

    2. Press the + Insert metadata button to set up the next steps.

      1. Change to Metadata from Artifact

      2. Set “Metadata Source” to qiime2 diversity core-metrics-phylogenetic [...] : observed_features_vector.qza

  2. Set “state_column” to day-relative-to-fmt

  3. Set “individual_id_column” to PatientID

  4. Expand the additional options section

    • Set “metric” to observed_features

  5. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 longitudinal linear-mixed-effects [...] : visualization.qzv

lme-obs-features-FMT.qzv

We also see a downward trend from the FMT. Since the goal of the FMT was to ameliorate the impact of the bone marrow transplant protocol (which involves an extreme course of antibiotics) on gut health, and the timing of the FMT is related to the timing of the marrow transplant, we might deduce that the negative coefficient is primarily related to the bone marrow transplant procedure. (We can’t prove this with statistics alone however, in this case, we are using abductive reasoning).

Looking at the log-likelihood, we also note that the HCT result is slightly better than the FMT in accounting for the loss of richness. But only slightly, if we were to perform model testing it may not prove significant.

In any case, we can ask a more targeted question to identify if the FMT was useful in recovering richness.

By adding the autoFmtGroup to our linear model, we can see if there are different slopes for the two groups, based on an interaction term.

lme_obs_features_treatmentVScontrol_viz, = longitudinal_actions.linear_mixed_effects(
    metadata=merged_observed_features_md,
    state_column='day-relative-to-fmt',
    group_columns='autoFmtGroup',
    individual_id_column='PatientID',
    metric='observed_features',
)
qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv diversity-core-metrics-phylogenetic/observed_features_vector.qza \
  --p-state-column day-relative-to-fmt \
  --p-group-columns autoFmtGroup \
  --p-individual-id-column PatientID \
  --p-metric observed_features \
  --o-visualization lme-obs-features-treatmentVScontrol.qzv
use.action(
    use.UsageAction('longitudinal', 'linear_mixed_effects'),
    use.UsageInputs(metadata=merged_observed_features,
                    state_column='day-relative-to-fmt',
                    group_columns='autoFmtGroup',
                    individual_id_column='PatientID',
                    metric='observed_features'),
    use.UsageOutputNames(visualization='lme-obs-features-treatmentVScontrol')
)
Using the qiime2 longitudinal linear-mixed-effects tool:
  1. For “metadata”:

    1. Perform the following steps.

      1. Leave as Metadata from TSV

      2. Set “Metadata Source” to sample-metadata.tsv

    2. Press the + Insert metadata button to set up the next steps.

      1. Change to Metadata from Artifact

      2. Set “Metadata Source” to qiime2 diversity core-metrics-phylogenetic [...] : observed_features_vector.qza

  2. Set “state_column” to day-relative-to-fmt

  3. Set “individual_id_column” to PatientID

  4. Expand the additional options section

    1. Set “metric” to observed_features

    2. Set “group_columns” to autoFmtGroup

  5. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 longitudinal linear-mixed-effects [...] : visualization.qzv

lme-obs-features-treatmentVScontrol.qzv

Here we see that the autoFmtGroup is not on its own a significant predictor of richness, but its interaction term with Q('day-relative-to-fmt') is. This implies that there are different slopes between these groups, and we note that given the coding of Q('day-relative-to-fmt'):autoFmtGroup[T.treatment] we have a positive coefficient which counteracts (to a degree) the negative coefficient of Q('day-relative-to-fmt').

Warning

The slopes of the regression scatterplot in this visualization are not the same fit as our mixed-effects model in the table. They are a naive OLS fit to give a sense of the situation. A proper visualization would be a partial regression plot which can condition on other terms to show some effect in “isolation”. This is not currently implemented in QIIME 2.

Exercise 5

How would you test the above models with a different diversity index, such as Faith’s Phylogenetic Diversity?