
The appropriate role for inferential statistics in metaanalysis is not merely unclear; it is seen quite differently by different methodologists.
In 1981, in the first extended discussion of the topic, Glass, McGaw and Smith raised doubts about the applicability of inferential statistics to the metaanalysis problem. Inference at the level of persons within studies (of the type addressed by Rosenthal, 1984) seemed quite unnecessary to them, since even a modest-sized synthesis will involve a few hundred persons (nested within studies) and lead to nearly automatic rejection of null hypotheses. Moreover, the chances are remote that these persons or subjects within studies were drawn from defined populations with anything approaching probabilistic techniques; hence, probabilistic calculations advanced as if subjects had been randomly selected are dubious.
At the level of "studies," the question of the appropriateness of inferential statistics can be asked again, and Glass et al. seem to answer in the negative. They pointed out that there are two instances in which common inferential methods are clearly appropriate: when a defined population has been randomly sampled and when subjects have been randomly assigned to conditions in a controlled experiment. In the latter case, Fisher showed how the permutation test can be used to make inferences to the universe of all possible permutations. But this case is of little interest to metaanalysts, who never assign units to treatments. Glass et al. claimed that the typical metaanalysis virtually never meets the condition of probabilistic sampling of a population. They took the position that inferential statistics has little role to play in metaanalysis: "The probability conclusions of inferential statistics depend on something like probabilistic sampling, or else they make no sense" (p. 199).
It is a common habit of thought to acknowledge that many data sets fail to meet probabilistic sampling conditions, but to argue that one might well treat the data in hand "as if" they were a random sample from some hypothetical population. Under this supposition, inferential techniques are applied and the results inspected. This circumlocution has neither logic nor common sense to support it. Indeed, it seems to be little short of a rationalization for applying statistical techniques that one has gone to the trouble to learn, whether or not they are appropriate. If the sample is fixed and the population is allowed to be hypothetical, then surely the data analyst will imagine a population that resembles the sample of data. Hence all of these "hypothetical populations" will be reflections of the samples, and there will be no need for inferential statistics. The researcher runs the risk of generalizing to what may well be a fictitious, and hence irrelevant, population.
Hedges and Olkin (1985) developed inferential techniques that ignored the pro forma testing of null hypotheses and focused on the estimation of regression functions that estimate effects at different levels of study characteristics. They worried about both sources of statistical instability: that arising from persons within studies and that arising from variation between studies. As they properly pointed out, the study based on 500 persons deserves greater weight than the study based on 5 persons in determining the response of the treatment condition to changes in study conditions. The techniques they present are based on traditional assumptions of random sampling and independence. It is, of course, unclear precisely how the validity of their methods is compromised by failure to achieve probabilistic sampling of persons and studies.
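The weighting idea can be illustrated with a minimal sketch. It assumes that each study reports a standardized mean difference d together with its two group sizes, and it uses the usual large-sample variance of d to form inverse-variance weights. The function names are hypothetical, and the code is not taken from MetaStat. The homogeneity statistic Q returned alongside the mean is the weighted sum of squared deviations from that mean; referring it to a chi-square distribution with k - 1 degrees of freedom rests on the same sampling assumptions discussed above.

```python
import math


def d_variance(d, n_treat, n_ctrl):
    """Large-sample variance of a standardized mean difference d."""
    n_total = n_treat + n_ctrl
    return n_total / (n_treat * n_ctrl) + d * d / (2.0 * n_total)


def weighted_mean_effect(studies):
    """Combine studies given as (d, n_treat, n_ctrl) tuples.

    Returns (weighted mean, standard error of the mean, homogeneity Q).
    Each study is weighted by the inverse of its sampling variance, so a
    study of 500 persons counts for far more than a study of 5.
    """
    weights = [1.0 / d_variance(d, nt, nc) for d, nt, nc in studies]
    total_weight = sum(weights)
    effects = [d for d, _, _ in studies]
    mean = sum(w * d for w, d in zip(weights, effects)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    return mean, std_error, q


if __name__ == "__main__":
    # Illustrative values only: one large study and one very small study.
    example = [(0.2, 250, 250), (1.0, 3, 2)]
    mean, std_error, q = weighted_mean_effect(example)
    print(f"weighted mean d = {mean:.3f}, SE = {std_error:.3f}, Q = {q:.3f}")
```

In this made-up example the combined estimate stays close to the effect from the 500-person study, because the 5-person study carries very little weight.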
Rubin (1990) addressed most of these issues squarely and staked out a position that appeals to the authors of this manual: "...consider the idea that sampling and representativeness of the studies in a metaanalysis are important. I will claim that this is nonsense -- we don't have to worry about representing a population but rather about other far more important things" (p. 155). These more important things, to Rubin, are the estimation of treatment effects under a set of standard or ideal study conditions. This process, as he outlined it, involves the fitting of response surfaces (a form of quantitative model building) between study effects (Y) and study conditions (X, W, Z, etc.).
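A deliberately simplified sketch of the response-surface idea follows. It assumes a single study condition X (for instance, a hypothetical study-quality score), a straight-line relation between X and the study effect Y, and unweighted least squares. Rubin's proposal is more general than this, and the function names here are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def effect_at(ideal_x, x, y):
    """Predicted study effect at a chosen 'ideal' value of the study condition."""
    intercept, slope = fit_line(x, y)
    return intercept + slope * ideal_x


if __name__ == "__main__":
    # Illustrative values only: effects shrink as the condition score rises.
    condition = [0, 1, 2, 3]
    effect = [1.0, 0.8, 0.6, 0.4]
    print(effect_at(4, condition, effect))
```

The quantity of interest is the fitted effect at the ideal condition, which may lie outside the range of conditions actually observed; how far such extrapolation can be trusted is a modeling judgment, not a sampling question.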
Where theorists disagree, technicians are well advised to leave open as many options as possible. Consequently, we have provided many methods of data analysis in MetaStat that address inferential issues, including Hedges's homogeneity tests, Rosenthal and Rubin's aggregate significance levels, conventional methods of analysis such as regression, and a newly rediscovered approximate randomization test.
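For readers unfamiliar with approximate randomization tests, the general technique can be sketched as follows. This is a generic illustration, not MetaStat's own routine: it assumes two groups of study effect sizes, uses the absolute difference in group means as the test statistic, and estimates the p-value from a fixed number of random reshufflings of the group labels. The function names, the number of shuffles, and the seed are all hypothetical choices.

```python
import random


def mean(values):
    return sum(values) / len(values)


def approx_randomization_test(group_a, group_b, n_shuffles=9999, seed=0):
    """Approximate two-sided randomization test for a difference in means.

    Group labels are reshuffled n_shuffles times; the p-value is the share
    of shuffles (counting the observed arrangement itself) whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Illustrative effect sizes for two hypothetical groups of studies.
    group_a = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2]
    group_b = [2.0, 2.1, 2.2, 2.1, 2.0, 2.2]
    print(approx_randomization_test(group_a, group_b))
```

The resulting p-value refers only to the set of possible reassignments of the observed effects to the two groups; it requires no assumption that the studies were sampled from a population.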
Where conditions of random sampling are met, the inferential techniques in MetaStat will have clear and undisputed application. In all other instances, it will be up to the metaanalyst to decide whether to apply them and how to interpret them. We provide fair warning that the inferential statistical techniques in MetaStat are easy to misapply and misuse.