This article concerns datasets whose variables take the form of intervals obtained by aggregating information about variables from a larger dataset. We propose to view the observed set of hyper-rectangles as an empirical histogram and to approximate its underlying distribution nonparametrically with a Gaussian kernel-type estimator. We apply this idea to both univariate density estimation and regression problems. Unlike many existing regression methods, the proposed method can estimate the conditional distribution of the response variable for any given set of predictors, even when some of them are not interval-valued. Empirical studies show that the proposed approach is highly flexible across scenarios with complex relationships between the location and width of the intervals of the response and predictor variables.
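
To make the univariate idea concrete, the following is a minimal sketch rather than the article's stated estimator. It assumes that each observed interval [a_i, b_i] is treated as a uniform histogram bin and smoothed with a Gaussian kernel of bandwidth h, which gives the closed form f(x) = (1/n) Σ_i [Φ((x − a_i)/h) − Φ((x − b_i)/h)] / (b_i − a_i), with Φ the standard normal CDF. The function name `interval_kde`, the bandwidth value, and the example intervals are hypothetical.

```python
import math


def _phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_kde(x, intervals, h):
    """Density estimate at x from interval-valued observations.

    Assumption (not taken from the article): every interval (a, b) is a
    uniform histogram bin, and the resulting histogram is convolved with a
    Gaussian kernel of bandwidth h.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    total = 0.0
    for a, b in intervals:
        if b <= a:
            raise ValueError("each interval needs a < b")
        # Gaussian kernel integrated over [a, b], divided by the bin width.
        total += (_phi((x - a) / h) - _phi((x - b) / h)) / (b - a)
    return total / len(intervals)


# Hypothetical interval-valued sample and bandwidth.
intervals = [(0.0, 1.0), (0.5, 2.0), (1.5, 2.5)]
h = 0.3

# Evaluate on a grid and check numerically that the estimate integrates to ~1.
step = 0.01
grid = [-10.0 + step * k for k in range(2301)]
density = [interval_kde(x, intervals, h) for x in grid]
mass = sum(density) * step
```

Under this assumption the estimate is a proper density for any positive bandwidth, and as h shrinks it approaches the average of the uniform densities on the observed intervals, that is, the empirical histogram itself.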
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modelling and Simulation
- Applied Mathematics