This page collects references to classic papers and books in various fields of machine learning and big data analysis. For beginners and new students, it provides a foundation for your own research. For senior students, we hope you will draw inspiration from the earlier work of pioneers in your field.
Machine Learning

For an overview:

Hastie, Tibshirani, Friedman; The Elements of Statistical Learning.

Duda, Hart, Stork; Pattern Classification.


More theoretical material:

Devroye, Györfi, Lugosi; A Probabilistic Theory of Pattern Recognition.

Mohri, Rostamizadeh, Talwalkar; Foundations of Machine Learning (Adaptive Computation and Machine Learning series)

Boucheron, Lugosi, Massart; Concentration Inequalities: A Nonasymptotic Theory of Independence


Penalized estimation:

Liu, Roeder, Wasserman; Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

General Surveys

Domingos; A Few Useful Things to Know about Machine Learning (recommended for beginners)
Spectral Methods

Spectral clustering:

Ng, Jordan, Weiss; On Spectral Clustering: Analysis and an Algorithm.

Shi, Malik; Normalized Cuts and Image Segmentation.
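The Ng-Jordan-Weiss algorithm cited above is short enough to sketch directly. The following is an illustrative toy implementation, not the authors' code; the Gaussian bandwidth `sigma` and the farthest-point k-means initialization are our own choices:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0):
    """Toy Ng-Jordan-Weiss spectral clustering on the rows of X."""
    # Gaussian affinity matrix with zeroed diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    dinv = 1.0 / np.sqrt(np.maximum(A.sum(1), 1e-12))
    L = dinv[:, None] * A * dinv[None, :]
    # Embed each point via the top-k eigenvectors, rows renormalized.
    w, V = np.linalg.eigh(L)
    U = V[:, np.argsort(w)[-k:]]
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Plain k-means on the embedding, farthest-point initialization.
    C = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(1) for c in C], axis=0)
        C.append(U[dist.argmax()])
    C = np.array(C)
    for _ in range(50):
        lab = ((U[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([U[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return lab
```

On well-separated clusters the embedded rows collapse to k nearly distinct points, which is why even a crude k-means step suffices here.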


Laplacian Eigenmaps: http://www.cse.ohio-state.edu/~mbelkin/papers/papers.html

von Luxburg, Belkin, Bousquet; Consistency of Spectral Clustering

Diffusion maps:

Coifman, Lafon; Diffusion maps

Lafon, Keller, Coifman; Data Fusion and Multicue Data Matching by Diffusion Maps

Nadler, Lafon, Coifman, Kevrekidis; Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
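A diffusion-map embedding itself fits in a few lines. Below is a hedged sketch in the style of Coifman-Lafon (fixed Gaussian kernel, no density renormalization; `sigma` and `t` are illustrative parameters, not values from the papers):

```python
import numpy as np

def diffusion_map(X, n_components=2, sigma=1.0, t=1):
    """Toy diffusion-map embedding of the rows of X."""
    # Gaussian kernel and its degree (row-sum) vector.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    deg = K.sum(1)
    # Symmetric conjugate of the Markov matrix P = D^{-1} K; it has the
    # same eigenvalues as P and is safer to eigendecompose.
    S = K / np.sqrt(np.outer(deg, deg))
    w, V = np.linalg.eigh(S)
    order = np.argsort(w)[::-1]
    w, V = w[order], V[:, order]
    # Recover right eigenvectors of P; column 0 is the trivial constant one.
    psi = V / np.sqrt(deg)[:, None]
    # Diffusion coordinates: eigenvalues^t scale the nontrivial eigenvectors.
    return (w[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]
```

The first diffusion coordinate is essentially the quantity analyzed in the Nadler et al. paper above: for data with two well-separated groups it is nearly piecewise constant, one level per group.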


(Semi)supervised learning:

Costa, Hero; Classification constrained dimensionality reduction

Raich, Costa, Damelin, Hero; Classification constrained dimensionality reduction

Zhou, Li; Semi-supervised learning by disagreement

Dimensionality Estimation

van der Maaten, Postma, van den Herik; Dimensionality reduction: A comparative review
Online learning and Boosting

Shalev-Shwartz; Online Learning and Online Convex Optimization

Murata, Takenouchi, Kanamori, Eguchi; Information geometry of U-Boost and Bregman divergence
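The basic algorithm behind the online convex optimization framework in the Shalev-Shwartz survey is online gradient descent. A minimal sketch on the toy loss sequence f_t(x) = (x - z_t)^2 (the loss family and step-size constant are our own illustrative choices):

```python
import numpy as np

def ogd(zs, eta0=0.5):
    """Online gradient descent on the loss sequence f_t(x) = (x - z_t)^2.

    The learner plays x_t, then observes z_t and steps along the loss
    gradient with rate eta0 / sqrt(t), which gives O(sqrt(T)) regret
    against the best fixed point in hindsight.
    """
    x, plays = 0.0, []
    for t, z in enumerate(zs, start=1):
        plays.append(x)
        x -= (eta0 / np.sqrt(t)) * 2.0 * (x - z)   # gradient of (x - z_t)^2
    return np.array(plays), x
```

With this quadratic loss the best fixed comparator is the mean of the z_t, so the iterates drift toward it as the step size decays.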
Sparse coding, dictionary learning and matrix factorization

Dictionary learning:

Aharon, Elad, Bruckstein; K-SVD and its non-negative variant for dictionary design

Mairal, Bach, Ponce; Task-Driven Dictionary Learning

Mairal, Bach, Ponce, Sapiro; Online learning for matrix factorization and sparse coding
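To make the dictionary-learning references above concrete, here is a stripped-down K-SVD sketch: alternate OMP sparse coding with rank-1 SVD atom updates. This is our own simplification (no atom replacement or other refinements from the Aharon-Elad-Bruckstein paper; all dimensions and parameters are arbitrary):

```python
import numpy as np

def omp(D, y, s):
    """Orthogonal matching pursuit: s-sparse code of y in dictionary D."""
    idx, r = [], y.copy()
    for _ in range(s):
        idx.append(int(np.argmax(np.abs(D.T @ r))))        # most correlated atom
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        r = y - D[:, idx] @ coef                           # new residual
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def ksvd(Y, n_atoms, s=2, iters=15, seed=0):
    """Minimal K-SVD: learn D, X with Y ~ D X and s-sparse columns of X."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(iters):
        X = np.stack([omp(D, y, s) for y in Y.T], axis=1)  # sparse coding stage
        for j in range(n_atoms):                           # atom-by-atom update
            used = np.nonzero(X[j])[0]
            if used.size == 0:
                continue
            # Error matrix without atom j's contribution, restricted to
            # the signals that actually use atom j.
            E = Y[:, used] - D @ X[:, used] + np.outer(D[:, j], X[j, used])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                              # best rank-1 fit
            X[j, used] = S[0] * Vt[0]
    return D, X
```

The key K-SVD idea is visible in the update stage: each atom and its coefficients are refitted jointly from the rank-1 SVD of the residual, rather than holding the codes fixed.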


Sparse coding and compressed sensing:

Candes, Wakin; Enhancing Sparsity by Reweighted L1 Minimization
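Candès and Wakin iterate equality-constrained weighted l1 minimization (min ||Wx||_1 s.t. Ax = b). The sketch below substitutes a penalized lasso variant solved by plain ISTA so it stays self-contained in NumPy; `lam`, `eps`, and the iteration counts are our own illustrative choices, not values from the paper:

```python
import numpy as np

def reweighted_l1(A, b, lam=0.02, rounds=4, inner=2000, eps=0.1):
    """Reweighted l1 sketch in the spirit of Candes-Wakin.

    Each round solves the weighted lasso
        min_x 0.5 * ||Ax - b||^2 + lam * sum_i w_i * |x_i|
    by proximal gradient descent (ISTA), then re-weights with
    w_i = 1 / (|x_i| + eps) so that small coefficients are penalized
    harder, pushing the solution toward the sparsest one.
    """
    n = A.shape[1]
    w = np.ones(n)
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant
    for _ in range(rounds):
        for _ in range(inner):
            z = x - step * (A.T @ (A @ x - b))      # gradient step, smooth part
            thr = lam * w * step
            x = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)  # soft-threshold
        w = 1.0 / (np.abs(x) + eps)
    return x
```

The reweighting is the whole point: after the first round, coefficients near zero receive weight roughly 1/eps, so subsequent rounds suppress them much more aggressively than plain l1.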
Deep learning, neural networks, and feature learning

Deep learning:

Feature learning:

Lee, Grosse, Ranganath, Ng; Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

Bengio, Courville, Vincent; Representation Learning: A Review and New Perspectives

Learning from multiple sources

Multi-view semi-supervised learning:

Xu, Tao, Xu; A survey on multi-view learning

Crammer, Kearns, Wortman; Learning from Multiple Sources


Multi-task learning:

Argyriou, Evgeniou, and Pontil; Convex multi-task feature learning

Random Geometric Graphs and Networks

(Generalized) BHH theorem and application:

Percolation theory:

Penrose; Random Geometric Graphs


Random network theory:
Differential Geometry in statistics, information theory and learning

Manton; A Primer on Stochastic Differential Geometry for Signal Processing (http://arxiv.org/abs/1302.0430)

Amari, Nagaoka; Methods of Information Geometry (Translations of Mathematical Monographs). A classic in information geometry (requires familiarity with differential geometry first).

Murata, Takenouchi, Kanamori, Eguchi; Information geometry of U-Boost and Bregman divergence
Information Divergence Estimation and Applications

Graph-Based Approaches

Henze, Penrose; On the multivariate runs test


KNN Methods

Moon, Hero; Ensemble estimation of multivariate f-divergence
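The ensemble estimators cited here build on simpler single-bandwidth k-NN estimators. As background, the following is a hedged sketch of the classical k-NN Kullback-Leibler divergence estimator in the style of Wang-Kulkarni-Verdú (not the Moon-Hero ensemble method); `k` and the brute-force distance computation are illustrative choices:

```python
import numpy as np

def knn_kl_divergence(X, Y, k=5):
    """k-NN estimate of KL(p || q) from samples X ~ p (n x d), Y ~ q (m x d).

    Uses the k-th nearest-neighbor distances rho_k (within X) and
    nu_k (from X into Y):
        D_hat = (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1))
    """
    n, d = X.shape
    m = Y.shape[0]
    # Brute-force pairwise distances (fine for a sketch; use a KD-tree
    # for large samples).
    dXX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dXY = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # Sorted row of dXX has the zero self-distance first, so index k is
    # the k-th genuine neighbor; dXY has no self-distance.
    rho = np.sort(dXX, axis=1)[:, k]
    nu = np.sort(dXY, axis=1)[:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1.0))
```

The intuition matches the graph-based and KDE approaches in this section: the local neighbor distances act as implicit density estimates of p and q at each sample.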


KDE Plugin Methods

Moon, Sricharan, Greenewald, Hero; Improving convergence of divergence functional ensemble estimators

Kandasamy, Krishnamurthy, Poczos, Wasserman, Robins; Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations


Other Methods

Nguyen, Wainwright, Jordan; Estimating divergence functionals and the likelihood ratio by convex risk minimization


Bayes Error Bounds

Berisha, Wisler, Hero, Spanias; Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure

Moon, Delouille, Hero; Meta learning of bounds on the Bayes classifier error

Target Detection/Tracking/Localization

Localization in wireless sensor networks:

Rangarajan, Raich, and Hero; Sparse multidimensional scaling for blind tracking in sensor networks

An overview of tracking algorithms, including Kalman filters, extensions to Kalman filters, and particle filters:

Arulampalam et al.; A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking
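For a concrete picture of what the tutorial covers, here is a minimal bootstrap particle filter on a toy 1-D random-walk model; the model, noise variances, and particle count are illustrative choices, not taken from the paper:

```python
import numpy as np

def bootstrap_pf(ys, n=500, q=0.1, r=0.25, seed=0):
    """Bootstrap particle filter for the toy model
        x_t = x_{t-1} + N(0, q),    y_t = x_t + N(0, r).

    Each step: propagate particles through the motion model, weight them
    by the measurement likelihood, estimate, then resample.
    """
    rng = np.random.default_rng(seed)
    parts = rng.normal(0.0, 1.0, n)                      # initial particle cloud
    est = []
    for y in ys:
        parts = parts + rng.normal(0.0, np.sqrt(q), n)   # predict step
        logw = -0.5 * (y - parts) ** 2 / r               # Gaussian log-likelihood
        w = np.exp(logw - logw.max())                    # stable exponentiation
        w /= w.sum()
        est.append(float(np.sum(w * parts)))             # weighted posterior mean
        parts = rng.choice(parts, size=n, p=w)           # multinomial resampling
    return np.array(est)
```

Resampling at every step is the "bootstrap" choice; the tutorial's extensions (effective-sample-size tests, better proposal densities) all modify this basic loop.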


An overview of linearization of the particle filter proposal density:

Doucet et al.; On sequential Monte Carlo sampling methods for Bayesian filtering


Multiple target tracking using particle filters:

Kreucher, Kastella, and Hero; Multitarget Tracking using the Joint Multitarget Probability Density

Adaptive Sensing

Bashan, Raich, and Hero; Optimal two-stage search for sparse targets using convex criteria

Chong, Kreucher, and Hero; Partially Observable Markov Decision Process Approximations for Adaptive Sensing

Hero, Kreucher, and Blatt; “Information theoretic approaches to sensor management”, Ch.3 in Foundations and Applications of Sensor Management
LaTeX tools
* TikZ and PGF for drawing within LaTeX slides
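A minimal standalone example of the kind of diagram TikZ/PGF produce (the node names and styles are purely illustrative; for slides, put the tikzpicture inside a beamer frame instead of using the standalone class):

```latex
\documentclass{standalone}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}[node distance=2.5cm,
                    every node/.style={draw, rounded corners}]
  \node (x) {data $x$};
  \node (y) [right of=x] {label $\hat{y}$};
  \draw[->, thick] (x) -- node[above, draw=none] {$f_\theta$} (y);
\end{tikzpicture}
\end{document}
```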