R: Pairwise comparisons of lm.rrpp fits

pairwise

R Documentation

Pairwise comparisons of lm.rrpp fits

Description

Function generates distributions of pairwise statistics for a lm.rrpp fit and returns important statistics for hypothesis tests.

Usage

pairwise(
  fit,
  fit.null = NULL,
  groups,
  covariate = NULL,
  print.progress = FALSE
)

Arguments

`fit`	A linear model fit using `lm.rrpp`.
`fit.null`	An alternative linear model fit to use as a null model for RRPP, if the null model of the fit is not desired. Note, for FRPP this argument should remain NULL and FRPP must be established in the lm.rrpp fit (RRPP = FALSE). If the null model is uncertain, using `reveal.model.designs` will help elucidate the inherent null model used.
`groups`	A factor or vector that is coercible into a factor, describing the levels of the groups for which to find LS means or slopes. Normally this factor would be part of the model fit, but it is not necessary for that to be the case in order to obtain results.
`covariate`	A numeric vector for which to calculate slopes for comparison If NULL, LS means will be calculated instead of slopes. Normally this variable would be part of the model fit, but it is not necessary for that to be the case in order to obtain results.
`print.progress`	If a null model fit is provided, a logical value to indicate whether analytical results progress should be printed on screen. Unless large data sets are analyzed, this argument is probably not helpful.

Details

Based on an lm.rrpp fit, this function will find fitted values over all permutations and based on a grouping factor, calculate either least squares (LS) means or slopes, and pairwise statistics among them. Pairwise statistics have multiple flavors, related to vector attributes:

Distance between vectors, "dist" Vectors for LS means or slopes originate at the origin and point to some location, having both a magnitude and direction. A distance between two vectors is the inner-product of of the vector difference, i.e., the distance between their endpoints. For LS means, this distance is the difference between means. For multivariate slope vectors, this is the difference in location between estimated change for the dependent variables, per one-unit change of the covariate considered. For univariate slopes, this is the absolute difference between slopes.
Vector correlation, "VC" If LS mean or slope vectors are scaled to unit size, the vector correlation is the inner-product of the scaled vectors. The arccosine (acos) of this value is the angle between vectors, which can be expressed in radians or degrees. Vector correlation indicates the similarity of vector orientation, independent of vector length.
Difference in vector lengths, "DL" If the length of a vector is an important attribute – e.g., the amount of multivariate change per one-unit change in a covariate – then the absolute value of the difference in vector lengths is a practical statistic to compare vector lengths. Let d1 and d2 be the distances (length) of vectors. Then |d1 - d2| is a statistic that compares their lengths. For slope vectors, this is a comparison of rates.
Variance, "var Vectors of residuals from a linear model indicate can express the distances of observed values from fitted values. Mean squared distances of values (variance), by group, can be used to measure the amount of dispersion around estimated values for groups. Absolute differences between variances are used as test statistics to compare mean dispersion of values among groups. Variance degrees of freedom equal n, the group size, rather than n-1, as the purpose is to compare mean dispersion in the sample. (Additionally, tests with one subject in a group are possible, or at least not a hindrance to the analysis.)

The summary.pairwise function is used to select a test statistic for the statistics described above, as "dist", "VC", "DL", and "var", respectively. If vector correlation is tested, the angle.type argument can be used to choose between radians and degrees.

The null model is defined via lm.rrpp, but one can also use an alternative null model as an optional argument. In this case, residual randomization in the permutation procedure (RRPP) will be performed using the alternative null model to generate fitted values. If full randomization of values (FRPP) is preferred, it must be established in the lm.rrpp fit and an alternative model should not be chosen. If one is unsure about the inherent null model used if an alternative is not specified as an argument, the function reveal.model.designs can be used.

Observed statistics, effect sizes, P-values, and one-tailed confidence limits based on the confidence requested will be summarized with the summary.pairwise function. Confidence limits are inherently one-tailed as the statistics are similar to absolute values. For example, a distance is analogous to an absolute difference. Therefore, the one-tailed confidence limits are more akin to two-tailed hypothesis tests. (A comparable example is to use the absolute value of a t-statistic, in which case the distribution has a lower bound of 0.)

Notes for RRPP 0.6.2 and subsequent versions

In previous versions of pairwise, codesummary.pairwise had three test types: "dist", "VC", and "var". When one chose "dist", for LS mean vectors, the statistic was the inner-product of the vector difference. For slope vectors, "dist" returned the absolute value of the difference between vector lengths, which is "DL" in 0.6.2 and subsequent versions. This update uses the same calculation, irrespective of vector types. Generally, "DL" is the same as a contrast in rates for slope vectors, but might not have much meaning for LS means. Likewise, "dist" is the distance between vector endpoints, which might make more sense for LS means than slope vectors. Nevertheless, the user has more control over these decisions with version 0.6.2 and subsequent versions.

Value

An object of class pairwise is a list containing the following

`LS.means`	LS means for groups, across permutations.
`slopes`	Slopes for groups, across permutations.
`means.dist`	Pairwise distances between means, across permutations.
`means.vec.cor`	Pairwise vector correlations between means, across permutations.
`means.lengths`	LS means vector lengths, by group, across permutations.
`means.diff.length`	Pairwise absolute differences between mean vector lengths, across permutations.
`slopes.dist`	Pairwise distances between slopes (end-points), across permutations.
`slopes.vec.cor`	Pairwise vector correlations between slope vectors, across permutations.
`slopes.lengths`	Slope vector lengths, by group, across permutations.
`slopes.diff.length`	Pairwise absolute differences between slope vector lengths, across permutations.
`n`	Sample size
`p`	Data dimensions; i.e., variable number
`PermInfo`	Information for random permutations, passed on from lm.rrpp fit and possibly modified if an alternative null model was used.

Author(s)

Michael Collyer

References

Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity. 115:357-365.

Adams, D.C and M.L. Collyer. 2018. Multivariate phylogenetic ANOVA: group-clade aggregation, biological challenges, and a refined permutation procedure. Evolution. In press.

Examples


# Examples use geometric morphometric data on pupfishes
# See the package, geomorph, for details about obtaining such data

# Body Shape Analysis (Multivariate) --------------

data("Pupfish")

# Note:

dim(Pupfish$coords) # highly multivariate!

Pupfish$logSize <- log(Pupfish$CS) 

# Note: one should use all dimensions of the data but with this 
# example, there are many.  Thus, only three principal components 
# will be used for demonstration purposes.

Pupfish$Y <- ordinate(Pupfish$coords)$x[, 1:3]

## Pairwise comparisons of LS means

# Note: one should increase RRPP iterations but a 
# smaller number is used here for demonstration 
# efficiency.  Generally, iter = 999 will take less
# than 1s for these examples with a modern computer.

fit1 <- lm.rrpp(Y ~ logSize + Sex * Pop, SS.type = "I", 
data = Pupfish, print.progress = FALSE, iter = 199) 
summary(fit1, formula = FALSE)
anova(fit1) 

pup.group <- interaction(Pupfish$Sex, Pupfish$Pop)
pup.group
PW1 <- pairwise(fit1, groups = pup.group)
PW1

# distances between means
summary(PW1, confidence = 0.95, test.type = "dist") 
summary(PW1, confidence = 0.95, test.type = "dist", stat.table = FALSE)

# absolute difference between mean vector lengths
summary(PW1, confidence = 0.95, test.type = "DL") 

# correlation between mean vectors (angles in degrees)
summary(PW1, confidence = 0.95, test.type = "VC", 
   angle.type = "deg") 

# Can also compare the dispersion around means
summary(PW1, confidence = 0.95, test.type = "var")

## Pairwise comparisons of slopes

fit2 <- lm.rrpp(Y ~ logSize * Sex * Pop, SS.type = "I", 
data = Pupfish, print.progress = FALSE, iter = 199) 
summary(fit2, formula = FALSE)
anova(fit1, fit2)

# Using a null fit that excludes all factor-covariate 
# interactions, not just the last one  

PW2 <- pairwise(fit2, fit.null = fit1, groups = pup.group, 
covariate = Pupfish$logSize, print.progress = FALSE) 
PW2

# distances between slope vectors (end-points)
summary(PW2, confidence = 0.95, test.type = "dist") 
summary(PW2, confidence = 0.95, test.type = "dist", stat.table = FALSE)

# absolute difference between slope vector lengths
summary(PW2, confidence = 0.95, test.type = "DL") 

# correlation between slope vectors (and angles)
summary(PW2, confidence = 0.95, test.type = "VC",
   angle.type = "deg") 
   
# Can also compare the dispersion around group slopes
summary(PW2, confidence = 0.95, test.type = "var")