(33) H. Bozdogan, S. L. Sclove, and A. K. Gupta. "AIC-replacements for some multivariate tests of homogeneity with applications in multisample clustering and variable selection." Proceedings of the 1st U.S./Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, 2 (Multivariate Statistical Modeling), pp. 199-232. H. Bozdogan (ed.), Kluwer Academic Publishers, Dordrecht, Netherlands, 1993.
Many practical situations require the presentation of multivariate data from several structured samples for comparative inference and the grouping of the heterogeneous samples into homogeneous sets of samples. While many multiple comparison procedures (MCPs) have been proposed in the literature for the univariate case, few MCPs are available in practice for the multivariate case. Little or no work has been done on comparative simultaneous inference under covariance heterogeneity or on variable selection. This paper studies the AIC-replacements for Box's (1949) M for testing the homogeneity of covariances, for Wilks' (1932) lambda-criterion for testing the equality of mean vectors, and for testing complete homogeneity, from an information-theoretic viewpoint via Akaike's (1973) Information Criterion (AIC) and the Consistent Akaike's Information Criterion (CAIC) proposed by Bozdogan (1987). The results presented in this paper are based on an extension of the original work of Bozdogan (1981, 1986) and Bozdogan and Sclove (1984). These criteria combine the maximum value of the likelihood with the number of parameters used in achieving that value. Asymptotic implied levels of significance for both the AIC- and CAIC-replacements for the three tests are computed and tabulated for varying p, the number of variables; k, the number of groups; and n, the sample size. The results are shown on computer-generated three-dimensional mesh surface plots to aid the interpretation of the tabled values.
Finally, numerical examples are presented by applying the new approach to:
(1) Multisample clustering of oxygen consumption in males and females on p = 4 measurements, and identifying the optimal features that contribute to the separation of these two groups according to the model selection criteria. These results are compared with an expert physician's ranking of the variables by their ability to discriminate between the male and female groups, based on biological considerations in medicine.
(2) Multisample clustering of male Egyptian skulls from five historical epochs, measured on four variables, to determine the differences between the epochs (or historical periods) and to identify the optimal subset of the variables that distinguishes these five periods.
Our results show how to measure the amount of homogeneity and heterogeneity in clustering samples. The approach simultaneously identifies the relevant variables across the groups or samples without any test theory and without the need to specify an arbitrary level of significance. Furthermore, with this new approach, we avoid imposing dubious structures on the covariance matrices when reducing the dimensionality of multisample data sets.