Abstract
Many algorithms have been proposed for estimating the intrinsic dimension of high-dimensional data. A phenomenon common to all of them is a negative bias, perceived to be the result of undersampling. We propose improved methods for estimating intrinsic dimension that take manifold boundaries into consideration. By estimating dimension locally, we are able to analyze and reduce the effect that sample data depth has on the negative bias. Additionally, we propose improvements to an existing dimension estimation algorithm based on k-nearest neighbor graphs, and present an algorithm for adapting any dimension estimation algorithm to operate locally. Finally, we illustrate the uses of local dimension estimation on data sets consisting of multiple manifolds, including applications such as diagnosing anomalies in router networks and image segmentation.
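To make the idea of local dimension estimation concrete, the following is a minimal sketch of one well-known kNN-based local estimator, the Levina-Bickel maximum-likelihood estimator. It is used here purely for illustration and is not the kNN-graph estimator of the paper; the function name and parameters are our own.

```python
import numpy as np

def local_dimension_mle(X, k=10):
    """Per-point intrinsic dimension via the Levina-Bickel kNN MLE (illustrative).

    X : (n, D) array of n samples in D-dimensional ambient space.
    k : number of nearest neighbors used at each point.
    Returns an (n,) array of local dimension estimates.
    """
    # Pairwise Euclidean distances (brute force; fine for small n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # MLE: inverse mean log-ratio of the k-th to the j-th neighbor distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    return (k - 1) / np.sum(log_ratios, axis=1)
```

A quick sanity check: for points sampled uniformly on a 2-D square embedded in a higher-dimensional ambient space, the estimates cluster near 2, and points near the square's boundary (low data depth) tend to estimate lower, which is exactly the boundary-driven negative bias analyzed in Figure 2.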
Paper
Kevin M. Carter, Raviv Raich, and Alfred O. Hero III, “Debiasing for intrinsic dimension estimation,” in Proc. IEEE Statistical Signal Processing Workshop (SSP), August 2007, pp. 601-605. (.pdf)
Matlab Code
(.zip) This code requires the Kernel Density Estimation Toolbox if you wish to plot the PDF of dimension vs data depth (Fig. 2 below).
Figures

Figure 1. The probability of randomly selecting a point on the boundary of an m-dimensional hypercube

Figure 2. Analysis of the effect of data depth on local dimension estimation. Points with less depth are estimated at a lower dimension, contributing to the overall negative bias.

Figure 3a. Biased global dimension estimates of the hypercube

Figure 3b. Developing a debiased global dimension estimate by averaging over the 50% of points with the greatest depth on the manifold

Figure 4. The kNN algorithm applied to network traffic data

Figure 5. Satellite image of New York City; three regions of differing complexity are noted

Figure 6a. NYC Region 1 dimension estimation (image complexity)

Figure 6b. NYC Region 2 dimension estimation (image complexity)

Figure 6c. NYC Region 3 dimension estimation (image complexity)