### Summary and Discussion of Ecological Formulas

This post continues the previous two posts, which were prompted by William James Tychonievich's excellent post "Calculating beta diversity." My aim here is to describe these issues in an intuitive, "big picture" way, so that the earlier posts, which give the details, can be skipped. In those posts, I derived two formulas:

$$\gamma = \frac{\alpha + (F-1)\,\delta}{F}$$

and

$$\gamma \ge \frac{\alpha + (F-1)\left(\beta^{2} + \mathrm{Var}(B)\right)}{F}$$

each of which relates γ (the probability of picking two trees of different species from a population of trees sorted into forests) to other variables. α is the average probability of picking two different trees from the same forest, δ is the average probability of picking two different trees when each one comes from a different forest, and F is the number of forests. δ was also described by William James Tychonievich in the original post, where it was called Approach 3. β is the average proportion by which the compositions of a pair of forests do not overlap. For example, this picture by William James Tychonievich shows a case where β = 0.5 for two forests:

Here is another picture showing the same situation in a different way:

B is the collection of β values over all pairs of forests (one value per pair), so Var(B), the variance of B, measures the spread of these values, i.e., how close they are to their mean β.
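To make these definitions concrete, here is a minimal Python sketch. It assumes that each forest is described by a list of species proportions over a shared species list and that trees are picked independently with replacement. It also assumes that slice-matching amounts to lining up matching species slices, so the overlapping proportion of two forests is the sum, over species, of the smaller of the two proportions. The function names and example forests are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

# Assumption: a forest is a list of species proportions (summing to 1),
# indexed over a species list shared by all forests.

def alpha(p):
    """Chance that two trees picked (with replacement) from one forest differ in species."""
    return 1.0 - sum(x * x for x in p)

def delta(p, q):
    """Chance that a tree from forest p and a tree from forest q differ in species."""
    return 1.0 - sum(x * y for x, y in zip(p, q))

def beta(p, q):
    """Non-overlapping proportion of two compositions (assumed slice-matching reading)."""
    return 1.0 - sum(min(x, y) for x, y in zip(p, q))

# Hypothetical forests over three species.
forests = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

# B: one beta value per pair of forests; its mean is β.
B = [beta(p, q) for p, q in combinations(forests, 2)]
print("alpha per forest:", [alpha(p) for p in forests])
print("B:", B, "mean:", mean(B), "Var(B):", pvariance(B))
```

The sketch uses `pvariance` (dividing by the number of pairs) rather than the sample variance, because here the pairs are treated as the whole population of interest rather than a sample from it.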

Both of these formulas were derived under the assumption that every forest has the same number of trees, but they do not depend on the number of species or on the particular composition of each forest. If we allow forests of different sizes, then we have to follow the suggestion made in the comments on the original post by John Goes and by William James's brother Luther, and use weighted averages. This is because the α values of larger forests have a greater influence on γ than those of smaller forests, and likewise the β and δ values of pairs of large forests have a greater influence than those of pairs of small forests.
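Here is one way to carry out the weighting, sketched in Python under the same assumptions as before (species-proportion lists, picks with replacement). Each forest gets a weight w_i equal to its share of all trees. Forest i's α is weighted by w_i², and the ordered pair (i, j)'s δ by w_i·w_j. With these weights, the terms add up exactly to γ computed from the pooled population. This is an illustration, not necessarily the exact weighting proposed in the comments. The tree counts are made up, and the function names are hypothetical.

```python
def alpha(p):
    # Chance two trees from one forest differ (picks with replacement).
    return 1.0 - sum(x * x for x in p)

def delta(p, q):
    # Chance a tree from forest p and a tree from forest q differ.
    return 1.0 - sum(x * y for x, y in zip(p, q))

def gamma_pooled(forests, sizes):
    """γ computed directly from the pooled population of all trees."""
    total = sum(sizes)
    pooled = [sum(n * p[s] for p, n in zip(forests, sizes)) / total
              for s in range(len(forests[0]))]
    return 1.0 - sum(x * x for x in pooled)

def gamma_weighted(forests, sizes):
    """γ rebuilt from size-weighted α terms and δ terms."""
    total = sum(sizes)
    w = [n / total for n in sizes]  # each forest's share of all trees
    F = len(forests)
    same_forest = sum(w[i] ** 2 * alpha(forests[i]) for i in range(F))
    different_forests = sum(w[i] * w[j] * delta(forests[i], forests[j])
                            for i in range(F) for j in range(F) if i != j)
    return same_forest + different_forests

forests = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
sizes = [100, 300, 600]  # hypothetical tree counts
print(gamma_pooled(forests, sizes), gamma_weighted(forests, sizes))
```

When all sizes are equal, every w_i is 1/F, and this reduces to the unweighted averages used in the formulas above.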

Discussion of Formulas:

In this part of the post, I am going to discuss what these formulas tell us about the relationships of these measures of ecological diversity.

Formula 1:
This formula expresses γ in terms of α and δ, which makes sense given how γ is defined. γ is the probability of selecting two different trees from the population as a whole, and there are two ways to do this: we can select two different trees from the same forest, or from two different forests. Our equation has a term for the average probability of each of these options, weighted by how likely that option is. When all forests are the same size and trees are picked independently, two trees come from the same forest with probability 1/F and from different forests with probability (F - 1)/F.
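The two-way split can be checked numerically. The Python sketch below assumes equal-sized forests, trees picked independently with replacement, and Formula 1 in the form γ = (α + (F - 1)δ)/F, with δ averaged over all ordered pairs of distinct forests. It compares that expression with γ computed directly from the pooled species proportions. The helper names and example forests are hypothetical.

```python
from itertools import permutations

def alpha(p):
    # Chance two trees from one forest differ (picks with replacement).
    return 1.0 - sum(x * x for x in p)

def delta(p, q):
    # Chance a tree from forest p and a tree from forest q differ.
    return 1.0 - sum(x * y for x, y in zip(p, q))

def gamma_pooled(forests):
    """γ from the pooled population, assuming all forests have the same size."""
    F = len(forests)
    pooled = [sum(p[s] for p in forests) / F for s in range(len(forests[0]))]
    return 1.0 - sum(x * x for x in pooled)

def formula_1(forests):
    """(α + (F - 1) δ) / F with α and δ averaged over forests and forest pairs.

    Needs at least two forests, so that δ is defined.
    """
    F = len(forests)
    mean_alpha = sum(alpha(p) for p in forests) / F
    pairs = list(permutations(forests, 2))  # ordered pairs of distinct forests
    mean_delta = sum(delta(p, q) for p, q in pairs) / len(pairs)
    return (mean_alpha + (F - 1) * mean_delta) / F

forests = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(gamma_pooled(forests), formula_1(forests))  # both equal 11/18 up to rounding
```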

Aside: If we wanted to know the probability of selecting three different trees, I suspect there would be an equation with three terms. There are three ways to select three different trees: all three from the same forest, each from a different forest, or two from one forest and one from another. We would probably have a term for the average probability of each of these three options, with coefficients that depend on the number of forests.
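This suspicion can be checked in the same setting (equal-sized forests, trees picked independently with replacement). The Python sketch below groups every ordered choice of three forests by how many distinct forests it uses. It then shows that the pooled three-tree probability equals a three-term weighted sum whose weights are 1/F², 3(F - 1)/F², and (F - 1)(F - 2)/F². The helper names and example forests are hypothetical.

```python
from itertools import product

def all_distinct(p, q, r):
    """Chance that trees picked from forests p, q, r are three different species.

    Inclusion-exclusion: 1 - P(1st=2nd) - P(1st=3rd) - P(2nd=3rd) + 2 P(all equal).
    """
    pq = sum(x * y for x, y in zip(p, q))
    pr = sum(x * z for x, z in zip(p, r))
    qr = sum(y * z for y, z in zip(q, r))
    pqr = sum(x * y * z for x, y, z in zip(p, q, r))
    return 1.0 - pq - pr - qr + 2.0 * pqr

def three_terms(forests):
    """Map 'number of distinct forests used' -> (weight, average probability)."""
    F = len(forests)
    groups = {1: [], 2: [], 3: []}
    for i, j, k in product(range(F), repeat=3):
        groups[len({i, j, k})].append(all_distinct(forests[i], forests[j], forests[k]))
    return {
        n: (len(vals) / F ** 3, sum(vals) / len(vals))
        for n, vals in groups.items() if vals
    }

def gamma3_pooled(forests):
    """Three-different-species probability for the pooled population."""
    F = len(forests)
    P = [sum(p[s] for p in forests) / F for s in range(len(forests[0]))]
    return 1.0 - 3.0 * sum(x * x for x in P) + 2.0 * sum(x ** 3 for x in P)

forests = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
terms = three_terms(forests)
print(terms)
print(sum(w * avg for w, avg in terms.values()), gamma3_pooled(forests))
```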

The derivation of this formula involves first writing γ, the α of each forest, and the δ of each pair of forests in terms of the species proportions in each forest, and then using algebra to substitute the expressions for the individual alphas and deltas into the formula for gamma.
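A compact version of this derivation follows. It assumes that all forests have the same number of trees and that trees are picked independently with replacement. It also uses our own notation p_{i,s} for the proportion of species s in forest i, so the pooled proportion is P_s = (1/F) Σ_i p_{i,s}:

$$
\begin{aligned}
\gamma &= 1 - \sum_s P_s^2 = 1 - \frac{1}{F^2}\sum_s \sum_i \sum_j p_{i,s}\,p_{j,s} \\
&= \frac{1}{F^2}\Big[\sum_i \Big(1 - \sum_s p_{i,s}^2\Big) + \sum_{i \ne j}\Big(1 - \sum_s p_{i,s}\,p_{j,s}\Big)\Big] \\
&= \frac{1}{F^2}\Big[\sum_i \alpha_i + \sum_{i \ne j} \delta_{ij}\Big]
= \frac{F\,\alpha + F(F-1)\,\delta}{F^2}
= \frac{\alpha + (F-1)\,\delta}{F}
\end{aligned}
$$

The second line splits the constant 1 as (F + F(F - 1))/F² across the same-forest and different-forest terms. The last line uses α as the average of the F values α_i and δ as the average of the F(F - 1) values δ_{ij} over ordered pairs i ≠ j.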

This formula tells us that if the number of forests stays constant, then as either α or δ increases, γ increases as well. This makes sense: if there is a greater chance of selecting two different trees from a single forest or from a pair of forests, then the probability of selecting two different trees from the population as a whole should rise.

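Also, if α and δ are held constant, the formula can be rewritten as γ = δ - (δ - α)/F. Because δ is never smaller than α (for each pair of forests, δ is at least the average of the two forests' α values), γ rises toward δ as the number of forests increases. In other words, if within-forest and between-forest diversity stay the same, adding forests gives more weight to between-forest pairs, so the variety of the population as a whole does not decrease.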

Lastly, we see that the coefficient of δ is F-1, so as the number of forests increases, α has a much smaller effect on γ than δ.  We would expect this because if there are a large number of forests, then there will be many more ways to pick two different trees from different forests than from the same forest.
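The shift in influence from α to δ can be tabulated directly. Assuming Formula 1 has the form γ = (α + (F - 1)δ)/F, the weights on α and δ are 1/F and (F - 1)/F. This short Python sketch (the function name is hypothetical) prints them for a few forest counts.

```python
def formula_1_weights(F):
    """Weights on α and δ, assuming γ = (α + (F - 1) δ) / F."""
    return 1 / F, (F - 1) / F

for F in (1, 2, 5, 10, 100):
    w_alpha, w_delta = formula_1_weights(F)
    print(f"F = {F:>3}: weight on alpha = {w_alpha:.2f}, weight on delta = {w_delta:.2f}")
```

With 10 forests, δ already carries nine times the weight of α.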

Formula 2:

This is a somewhat strange formula. I was not expecting it, and I certainly did not expect the variance of the different betas to be involved. It is an inequality, not an equation, but it shows that γ is at least as large as the expression on the right. The main idea is that if we calculate β according to William James Tychonievich's slice-matching method, then there is a relationship between β for a pair of forests and δ for that pair. Given a pair of forests, if we pick one tree from the non-overlapping region of one forest and the other tree from the non-overlapping region of the other forest, then the two trees are guaranteed to be of different species. For the pair consisting of forest i and forest j, this relationship can be expressed by the inequality:

$$\delta_{ij} \ge \beta_{ij}^{2}$$

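where the left-hand side is δ for this pair and the right-hand side is the square of β for this pair, since each forest's non-overlapping region makes up a proportion β of that forest. After this, we average the inequality over all pairs of forests to get a lower bound on δ and substitute it into Formula 1. The average of the squared betas equals β² + Var(B), which is where the variance enters.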
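The pairwise inequality can be spot-checked with random forests. This Python sketch assumes that the slice-matching β is one minus the summed overlap of species proportions, and that δ uses picks with replacement. The random compositions are arbitrary test data, not real forests, and the helper names are hypothetical.

```python
import random

def delta(p, q):
    # Chance a tree from forest p and a tree from forest q differ in species.
    return 1.0 - sum(x * y for x, y in zip(p, q))

def beta(p, q):
    # Assumed slice-matching β: the proportion of each forest outside the overlap.
    return 1.0 - sum(min(x, y) for x, y in zip(p, q))

def random_forest(n_species, rng):
    """Random species proportions that sum to 1."""
    raw = [rng.random() for _ in range(n_species)]
    total = sum(raw)
    return [x / total for x in raw]

rng = random.Random(0)
gaps = []
for _ in range(10_000):
    p, q = random_forest(5, rng), random_forest(5, rng)
    gaps.append(delta(p, q) - beta(p, q) ** 2)

print("smallest gap delta - beta**2:", min(gaps))
```

Two forests with no species in common give δ = β = 1, so the inequality can hold with equality.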

This formula tells us that β has a direct relationship with γ: as β increases or decreases, the lower bound on γ increases or decreases as well. This makes sense because if the compositions of the forests become more similar, then the probability of picking two different trees should decrease. In addition, as β decreases, the contribution of the second term shrinks, so α has more influence on γ. Likewise, Var(B) has a direct relationship with γ: if there is a larger spread in the beta values, the lower bound on γ increases. Conversely, as Var(B) decreases, the second term shrinks, so the effect of α on γ becomes greater.
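Here is a numerical spot-check of the bound in Python. It assumes that Formula 2 has the form γ ≥ (α + (F - 1)(β² + Var(B)))/F, with Var(B) the population variance of the pairwise β values. It also uses the same modelling assumptions as before: equal-sized forests, picks with replacement, and slice-matching β as one minus the summed overlap. The forests are random test data, and the helper names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pvariance

def alpha(p):
    # Chance two trees from one forest differ (picks with replacement).
    return 1.0 - sum(x * x for x in p)

def beta(p, q):
    # Assumed slice-matching β.
    return 1.0 - sum(min(x, y) for x, y in zip(p, q))

def gamma_pooled(forests):
    """γ for equal-sized forests, from pooled species proportions."""
    F = len(forests)
    pooled = [sum(p[s] for p in forests) / F for s in range(len(forests[0]))]
    return 1.0 - sum(x * x for x in pooled)

def formula_2_bound(forests):
    """(α + (F - 1)(β² + Var(B))) / F, with B the β values of all forest pairs."""
    F = len(forests)
    B = [beta(p, q) for p, q in combinations(forests, 2)]
    mean_alpha = mean([alpha(p) for p in forests])
    return (mean_alpha + (F - 1) * (mean(B) ** 2 + pvariance(B))) / F

def random_forest(n_species, rng):
    """Random species proportions that sum to 1."""
    raw = [rng.random() for _ in range(n_species)]
    total = sum(raw)
    return [x / total for x in raw]

rng = random.Random(1)
forests = [random_forest(6, rng) for _ in range(4)]
print("gamma:", gamma_pooled(forests), "bound:", formula_2_bound(forests))
```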

What is interesting about this inequality is that it shows that one of the formulas given in the original post, β = γ - α (which we can rewrite as γ = α + β), captures the right relationships among these variables: α and β both have a direct relationship with γ, and γ depends on both of them. The inequality above goes further by letting us quantify more precisely how these variables affect each other.