# Correlate biological and physical patterns

A number of methods are available to examine the correlation between biological and physical patterns. Two commonly used computer packages with a variety of multivariate methods are PRIMER and CANOCO.

**PRIMER**

PRIMER comprises a wide range of univariate, graphical and
multivariate routines for analysing biological and
physical/environmental data (Clarke and Warwick, 2001; Clarke and
Gorley, 2006). ‘BEST’ and ‘LINKTREE’ are two routines targeted at
linking multivariate biological patterns with single or multiple
environmental variables.
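As a rough illustration of what linking a biological pattern to environmental variables involves, the sketch below searches every subset of a few environmental variables in the manner of BIO-ENV. It is not PRIMER code. The choice of Bray-Curtis dissimilarity for the fauna, Euclidean distance on standardised environmental variables, Spearman rank correlation as the matching coefficient, and all station data and names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    den = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(rows, dist):
    """Dissimilarities for every pair of samples, as a flat list."""
    return [dist(rows[i], rows[j]) for i, j in combinations(range(len(rows)), 2)]

def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(u, v):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    ru, rv = average_ranks(u), average_ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    su = sqrt(sum((a - mu) ** 2 for a in ru))
    sv = sqrt(sum((b - mv) ** 2 for b in rv))
    return cov / (su * sv) if su and sv else 0.0

def standardise(column):
    m = sum(column) / len(column)
    sd = sqrt(sum((x - m) ** 2 for x in column) / (len(column) - 1))
    return [(x - m) / sd if sd else 0.0 for x in column]

def bio_env(abundance, env, names):
    """Try every subset of variables; return the best correlation and subset."""
    bio = pairwise(abundance, bray_curtis)
    cols = [standardise([row[k] for row in env]) for k in range(len(names))]
    best_rho, best_subset = -2.0, ()
    for size in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), size):
            rows = [[cols[k][i] for k in subset] for i in range(len(env))]
            rho = spearman(bio, pairwise(rows, euclidean))
            if rho > best_rho:
                best_rho, best_subset = rho, subset
    return best_rho, [names[k] for k in best_subset]

# Hypothetical data: five stations, two species, three environmental variables.
abundance = [[40, 0], [30, 10], [20, 20], [10, 30], [0, 40]]
env = [[0, 50, 5], [10, 20, 9], [20, 45, 2], [30, 15, 8], [40, 35, 4]]
rho, chosen = bio_env(abundance, env, ["gravel", "depth", "silt"])
print(chosen, round(rho, 3))
```

The number of subsets doubles with each added variable, which is why an exhaustive search of this kind becomes slow for long variable lists.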

The BEST routine available in PRIMER v6 combines the BIO-ENV
and BVSTEP procedures found in PRIMER v5. BIO-ENV uses *all* the
available environmental variables to find the combination that
‘best explains’ the patterns in the biological data. However, when
large numbers (>15 or 16) of environmental variables are used the
procedure can become impractical, as computation time may be
excessive. In such cases the BVSTEP option can be employed to carry
out a stepwise search of the variables, employing both
*forward selection* and *backward elimination*. Starting with the
variable showing the maximum matching coefficient, variables are
successively added, the combinations tested and (at each stage) the
variable contributing least eliminated. Several iterations of the
procedure are carried out from a random selection of (e.g., ≤6)
variables to ensure that the ‘best’ match is found.

The LINKTREE routine takes the combination of variables that
were identified as ‘best’ in BIO-ENV together with the faunal
inter-station similarities to find the most effective way of
describing the biological-environment relationships relative to the
successive use of single variables. Starting with the group of all
samples, it divides them into two groups (a binary split),
determined by the most influential environmental variable(s). So,
the first split could be on the grounds that the two resulting
groups are most dissimilar in terms of their salinity. By
iteratively repeating this procedure on the resulting groups, the
samples are divided into a number of groups, within which all the
samples have similar biological and physical characteristics.
Expressed more technically, the group of samples is successively
divided according to the environmental variable(s) that maximise
the separation between the groups in multidimensional space.
Sometimes more than one variable is identified at a split (if
each variable gives the same result). A statistical test is used to
examine the significance (5% level) before each split, with
division stopped when non-significant. An output value (B%, see
table) provides an absolute measure of group differences, and low
values occur when samples are most similar.
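A single binary split of this kind can be sketched as follows. This is not the PRIMER routine: it assumes the ANOSIM R statistic (computed on ranked biological dissimilarities) as the measure of separation, it omits the significance test applied before each split, and the stations, salinities, depths and dissimilarities are invented for the example.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def anosim_r(dissim, group_a, group_b):
    """ANOSIM R for two groups; dissim maps (i, j) with i < j to a dissimilarity."""
    pairs = list(combinations(sorted(group_a | group_b), 2))
    ranked = average_ranks([dissim[p] for p in pairs])
    within, between = [], []
    for (i, j), rank in zip(pairs, ranked):
        same_group = (i in group_a) == (j in group_a)
        (within if same_group else between).append(rank)
    if not within or not between:
        return 0.0
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2)

def best_split(dissim, env):
    """Try each threshold on each variable; keep the split with the highest R."""
    n = len(next(iter(env.values())))
    best = None
    for name, values in env.items():
        for cut in sorted(set(values))[:-1]:
            low = {i for i in range(n) if values[i] <= cut}
            high = set(range(n)) - low
            r = anosim_r(dissim, low, high)
            if best is None or r > best[0]:
                best = (r, name, cut, sorted(low), sorted(high))
    return best

# Hypothetical data: stations 0-2 and 3-5 hold two distinct assemblages.
dissim = {(i, j): (0.2 if (i < 3) == (j < 3) else 0.9)
          for i, j in combinations(range(6), 2)}
env = {"salinity": [34, 35, 33, 20, 18, 22],
       "depth": [10, 40, 25, 15, 45, 30]}
split = best_split(dissim, env)
print(split)
```

Applying `best_split` again to each of the two resulting groups, and stopping when a split is no longer significant, would build up the full tree.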

This is divisive clustering, as opposed to the agglomerative
approach used in cluster analysis, and inversions can sometimes
occur in the clustering pattern. Unlike in BIO-ENV, the
environmental variables are non-additive, and one advantage is that
a variable can be identified as important in *part* of the overall
faunal distribution, yet not so in other parts (conversely, BIO-ENV
examines the overall wider situation). The LINKTREE procedure also
has potential for prediction: if the environmental conditions are
known for a new sample station, then the LINKTREE results may allow
it to be assigned to a particular assemblage or group of sites.

**An example table of part of the descriptive information for a LINKTREE analysis of benthic macrofaunal distributions in the Outer Bristol Channel (from Mackie et al., 2006)**

**CANOCO**

CANOCO is a computer program for **CANO**nical **C**ommunity **O**rdination
by (partial/detrended/canonical) correspondence analysis, principal
components analysis and redundancy analysis (ter Braak, 1986 and
1988), that originated as an extension of DECORANA (Hill, 1979b).
Over the last 20 years it has evolved to include a variety of
multivariate ordination methods and the current version (4.5) is
available with a Microsoft Windows interface (ter Braak and
Smilauer, 2002). Jongman *et al.* (1995) provide a detailed account
of the theory and implementation of the various techniques.

Ordinations, like cluster analysis, are ‘indirect’ methods of
analysing species-environment relationships since additional
procedures are necessary to correlate the biological patterns to
the environmental variables. Canonical (or constrained) analyses
overcome this by integrating ordination with regression.
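The idea of combining ordination with regression can be illustrated with the simplest constrained case: a redundancy analysis with one explanatory variable, where the single canonical axis is the set of fitted values from regressing each species on that variable. The sketch below is not CANOCO code and is a deliberately reduced linear example (CCA itself works with chi-square weighted data); the function name and the data are assumptions.

```python
def centre(values):
    """Subtract the mean so that each variable has mean zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def constrained_fraction(species, x):
    """Fraction of total species variance explained by one constraining variable.

    species: one list of abundances per species (same sample order as x)
    x: the environmental variable, one value per sample
    """
    xc = centre(x)
    sxx = sum(v * v for v in xc)
    total = 0.0   # total sum of squares over all species
    fitted = 0.0  # sum of squares of the regression-fitted values
    for column in species:
        yc = centre(column)
        slope = sum(a * b for a, b in zip(xc, yc)) / sxx
        total += sum(v * v for v in yc)
        fitted += sum((slope * v) ** 2 for v in xc)
    return fitted / total

# Hypothetical data: five samples, two species, one variable (gravel content).
gravel = [10, 20, 30, 40, 50]
species = [
    [1, 2, 3, 4, 5],  # tracks gravel closely
    [3, 1, 2, 1, 3],  # unrelated to gravel
]
fraction = constrained_fraction(species, gravel)
print(round(fraction, 3))
```

With several explanatory variables the regression becomes a multiple regression and the fitted values are themselves ordinated to obtain the canonical axes.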

The methods available fall into four categories:

- **Unconstrained ordinations** describe the structure in a single data set
- **Canonical ordinations** explain one data set by another data set (ordinations are constrained by explanatory variables)
- **Partial ordinations** describe the structure in a data set after accounting for variation explained by a second data set (co-variable data)
- **Partial canonical ordinations** explain one data set by another data set after accounting for variation by a third data set (co-variable data)

ter Braak and Verdonschot (1995) examine the use of Canonical
Correspondence Analysis (CCA) in aquatic ecology and this technique
is the most commonly used direct gradient analysis method. It has
been widely used in marine benthic situations, from the intertidal
to deep water (Ysebaert and Herman, 2002; Narayanaswamy *et al.*,
2003; Bergquist *et al.*, 2005). In CCA the ordination axes are
derived from linear combinations of the environmental variables such
that the dispersion of the species (and sample) scores is maximised.
Environmental variables are shown on the ordinations as arrows
directed from the origin of the plot, where the origin represents
the grand mean for each variable. Variables with longer arrows are
more strongly correlated with the ordination axes than those with
short ones.

In the following example, CCA was employed to investigate the
species-environment relationships of benthic polychaetes in the
Irish Sea (Mackie *et al.*, 1997). Forward selection of the
variables revealed seven that ‘best’ explained the data. At each
step, a Monte Carlo permutation test was used to determine the
significance of each variable. The first five variables were highly
significant (P<0.0001), the others less so (P<0.05). The seven
variables collectively explained 34.75% of the total inertia.

In the ordination, Axes I and II were the most important,
accounting for 21.3% of the species variance and 61.2% of that
explained by the variables.
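The logic of a Monte Carlo permutation test for one candidate variable can be sketched as below. This is a simplified illustration, not the procedure implemented in CANOCO: the test statistic (fraction of species variance explained by a linear fit to the variable), the number of permutations, the seed and the data are all assumptions.

```python
import random

def explained_fraction(species, x):
    """Fraction of total species variance explained by a linear fit to x."""
    mean_x = sum(x) / len(x)
    xc = [v - mean_x for v in x]
    sxx = sum(v * v for v in xc)
    total = fitted = 0.0
    for column in species:
        mean_y = sum(column) / len(column)
        yc = [v - mean_y for v in column]
        sxy = sum(a * b for a, b in zip(xc, yc))
        total += sum(v * v for v in yc)
        fitted += sxy * sxy / sxx
    return fitted / total

def permutation_p_value(species, x, n_perm=999, seed=0):
    """Shuffle x among samples and count how often the fit is at least as good."""
    rng = random.Random(seed)
    observed = explained_fraction(species, x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that floating-point ties count as "at least as good".
        if explained_fraction(species, shuffled) >= observed - 1e-12:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: eight samples, two species responding to gravel content.
gravel = [5, 10, 15, 20, 25, 30, 35, 40]
species = [
    [2, 3, 5, 6, 8, 9, 11, 13],
    [12, 11, 9, 8, 6, 4, 3, 1],
]
p_value = permutation_p_value(species, gravel)
print(p_value)
```

In forward selection, a test of this kind would be repeated at each step for the best remaining candidate variable, with the variables already selected treated as co-variables. The smallest attainable P value is 1/(n_perm + 1), so reporting very small P values requires correspondingly many permutations.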

As can be seen from the ordination
plot and the correlation table, sediment gravel content was most
influential for axis I. Depth and latitude were most important in
defining axis II. Variables such as depth (and latitude) may,
however, act as proxies for other co-varying factors (e.g. temperature,
pressure, currents) rather than being influential in themselves.

Although omitted from the CCA plot
displayed here, species can also be displayed. This can be on the
same plot alongside the sample stations, or (for clarity)
separately. The species displayed can be restricted to those showing
the best relationships with the environmental factors. Likewise,
the species-environment relationships could be investigated further
through partial CCA. Oug (1998) demonstrated this in a study of the
benthic macrofauna near Tromsø, Norway.