The statistical analysis of individual choice behavior can be hampered by aggregate information. When a choice response variable is fully observed, individual choice behavior is typically studied with generalized linear models with the logit or probit link function. The individual choice response variable is, however, often available only in an aggregate form because of data confidentiality, privacy protection, and limited database quotas. Generalized linear models are then no longer directly applicable to such aggregate data. In this article, we confront such an aggregate data problem by using the method of data augmentation, where latent individual choice responses are augmented while satisfying a constraint on their aggregate sum. To do so, we devise a novel choice-wise sampling algorithm for a generalized linear model with aggregate binary responses. The proposed algorithm is applied to the 2006 Pennsylvania gubernatorial election data, where only aggregate votes cast for each candidate are available because of the privacy protection of voters. This article has supplementary material online.
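To make the augmentation idea concrete, the following is a minimal sketch of drawing latent binary responses whose sum must equal an observed aggregate count. It is not the article's choice-wise sampling algorithm, which the abstract does not spell out. It swaps in a standard exact sampler for independent Bernoulli variables conditioned on their total, built from a tail-sum recursion. The function name `sample_binary_given_sum`, the covariates `x`, and the coefficient `beta` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_binary_given_sum(probs, total, rng=random):
    """Draw y in {0,1}^n with independent P(y_i = 1) = probs[i],
    conditioned on sum(y) == total (hypothetical helper)."""
    n = len(probs)
    if not 0 <= total <= n:
        raise ValueError("total must lie between 0 and len(probs)")

    # tail[i][k] = P(y_i + ... + y_{n-1} == k) under the unconstrained model.
    tail = [[0.0] * (total + 1) for _ in range(n + 1)]
    tail[n][0] = 1.0
    for i in range(n - 1, -1, -1):
        for k in range(total + 1):
            stay = (1.0 - probs[i]) * tail[i + 1][k]
            take = probs[i] * tail[i + 1][k - 1] if k > 0 else 0.0
            tail[i][k] = stay + take
    if tail[0][total] == 0.0:
        raise ValueError("the sum constraint has zero probability under probs")

    # Sample sequentially: each draw is conditioned on the count still needed.
    y, remaining = [], total
    for i in range(n):
        if remaining == 0:
            p_one = 0.0
        else:
            p_one = probs[i] * tail[i + 1][remaining - 1] / tail[i][remaining]
        draw = 1 if rng.random() < p_one else 0
        y.append(draw)
        remaining -= draw
    return y


# Assumed toy setup: logit link with one covariate and a made-up coefficient.
x = [-1.0, 0.0, 0.5, 2.0]
beta = 0.8
probs = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
y = sample_binary_given_sum(probs, total=2, rng=random.Random(0))
```

Within a data augmentation scheme, a draw like `y` would stand in for the unobserved individual responses at one iteration, after which the regression coefficients could be updated as if the responses were fully observed. The recursion costs on the order of `n * total` operations per draw, so this sketch suits small groups rather than election-scale counts.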
Bibliographical note
Funding Information:
The author thanks anonymous referees and the editor for helpful comments and suggestions. Part of this work was done while the author was Assistant Professor in the Department of Statistics at the University of Pittsburgh. This research was supported in part by the Yonsei New Faculty Research grant 2010-1-0205.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty