References
This page provides references to classic papers and books in various fields of machine learning and big data analysis. For beginners and new students, it offers foundations for your own research; for senior students, we hope the pioneering work of others in your field inspires your own.
Machine Learning
- For an overview:
- Hastie, Tibshirani, Friedman; Elements of Statistical Learning.
- Duda, Hart, Stork; Pattern Classification.
- Wasserman; All of Statistics: A Concise Course in Statistical Inference
- MacKay; Information Theory, Inference, and Learning Algorithms
- More theoretical material:
- Devroye, Györfi, Lugosi; A Probabilistic Theory of Pattern Recognition.
- Mohri, Rostamizadeh, Talwalkar; Foundations of Machine Learning (Adaptive Computation and Machine Learning series)
- Scott; Prof. Scott's lecture notes
- Boucheron, Lugosi, Massart; Concentration Inequalities: A Nonasymptotic Theory of Independence
- Penalized estimation:
- Liu, Roeder, Wasserman; Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
General Surveys
- Seeger; Bayesian Modelling in Machine Learning: A Tutorial Review
- Domingos; A Few Useful Things to Know about Machine Learning (good for beginners)
- Kass; Statistical Inference: The Big Picture
Spectral Methods
- Spectral clustering:
- Ng, Jordan, Weiss; On Spectral Clustering: Analysis and an Algorithm.
- Shi, Malik; Normalized Cuts and Image Segmentation.
- Laplacian Eigenmaps: http://www.cse.ohio-state.edu/~mbelkin/papers/papers.html
- Belkin, Niyogi; Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- von Luxburg, Belkin, Bousquet; Consistency of Spectral Clustering
- Diffusion maps:
- Coifman, Lafon; Diffusion maps
- Lafon, Keller, Coifman; Data Fusion and Multicue Data Matching by Diffusion Maps
- Nadler, Lafon, Coifman, Kevrekidis; Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
- (Semi-)supervised learning:
- Costa, Hero; Classification constrained dimensionality reduction
- Raich, Hero; On dimensionality reduction for classification and its application
- Raich, Costa, Damelin, Hero; Classification constrained dimensionality reduction
- Zhou, Li; Semi-supervised learning by disagreement
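As a companion to the spectral clustering papers above, here is a minimal NumPy sketch in the spirit of the Ng-Jordan-Weiss algorithm: Gaussian affinity matrix, symmetrically normalized Laplacian, and k-means on the row-normalized bottom eigenvectors. The bandwidth `sigma` and the plain Lloyd's loop are illustrative choices, not ones prescribed by the papers.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50, seed=0):
    """Sketch of spectral clustering (Ng-Jordan-Weiss style):
    Gaussian affinities -> normalized Laplacian -> bottom-k
    eigenvector embedding -> k-means on the embedded rows."""
    n = X.shape[0]
    # Pairwise squared distances and Gaussian affinities (zero diagonal).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    Dinv = 1.0 / np.sqrt(W.sum(1))
    L = np.eye(n) - Dinv[:, None] * W * Dinv[None, :]
    # Bottom-k eigenvectors give the spectral embedding; row-normalize.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Plain Lloyd's k-means on the embedding.
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(n, k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(0)
    return labels
```

On two well-separated blobs the affinity matrix is nearly block-diagonal, so the bottom two eigenvectors act as (smoothed) cluster indicators and the k-means step becomes trivial.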
Dimensionality Estimation
- Costa, Hero; Geodesic entropic graphs for dimension and entropy estimation in manifold learning
- Carter, Hero; Variance reduction with neighborhood smoothing for local intrinsic dimension estimation
- van der Maaten, Postma, van den Herik; Dimensionality Reduction: A Comparative Review
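The estimators cited above are graph-based and handle curved manifolds; as a point of comparison, here is the much simpler linear baseline of counting principal components needed to explain most of the variance. The 95% threshold is an arbitrary illustrative choice.

```python
import numpy as np

def pca_dimension_estimate(X, var_threshold=0.95):
    """Crude intrinsic-dimension estimate: the number of principal
    components needed to explain `var_threshold` of the total
    variance. A linear baseline only -- unlike the geodesic-graph
    estimators of Costa-Hero, it cannot detect curved structure."""
    Xc = X - X.mean(0)
    # Covariance eigenvalues, largest first.
    evals = np.linalg.eigvalsh(np.cov(Xc.T))[::-1]
    ratios = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(ratios, var_threshold) + 1)
```

For data lying on a 2-D plane embedded in 5-D (plus tiny noise), two components capture essentially all the variance, so the estimate is 2.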
Online learning and Boosting
- Schapire, Freund; Boosting: Foundations and Algorithms (Adaptive Computation and Machine Learning series)
- Shalev-Shwartz; Online Learning and Online Convex Optimization
- Murata, Takenouchi, Kanamori, Eguchi; Information geometry of U-Boost and Bregman divergence
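To make the boosting references concrete, here is a minimal sketch of AdaBoost with exhaustive decision stumps, following the Freund-Schapire weighting scheme. The stump weak learner and round count are illustrative choices.

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=20):
    """Minimal AdaBoost (Freund-Schapire) with decision stumps.
    Labels y must be in {-1, +1}. Each round picks the stump with
    lowest weighted error, then reweights the examples."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # example weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive stump search: feature, threshold, polarity.
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # stump weight
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)           # upweight mistakes
        w /= w.sum()
        ensemble.append((j, t, s, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1)
                for j, t, s, a in ensemble)
    return np.sign(score)
```

A single stump cannot fit an interval-shaped 1-D target, but the boosted ensemble of thresholds can, which is the classic illustration of the weak-to-strong guarantee.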
Sparse coding, dictionary learning and matrix factorization
- Dictionary learning:
- Aharon, Elad, Bruckstein; K-SVD and its non-negative variant for dictionary design
- Mairal, Bach, Ponce; Task-Driven Dictionary Learning
- Mairal, Bach, Ponce, Sapiro; Online learning for matrix factorization and sparse coding
- Sparse coding and compressed sensing:
- Hastie, Tibshirani, Wainwright; Statistical Learning with Sparsity: The Lasso and Generalizations (Chapman & Hall/CRC Monographs on Statistics & Applied Probability)
- Candes, Wakin; Enhancing Sparsity by Reweighted L1 Minimization
- Foucart, Rauhut; A Mathematical Introduction to Compressive Sensing (Applied and Numerical Harmonic Analysis)
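The lasso objective at the heart of the sparsity references above can be minimized by iterative soft-thresholding (ISTA), a standard proximal-gradient scheme; here is a short NumPy sketch. The fixed iteration count is an illustrative choice (in practice one would check a convergence criterion or use the accelerated FISTA variant).

```python
import numpy as np

def ista_lasso(A, b, lam, n_iter=500):
    """Iterative soft-thresholding (ISTA) for the lasso problem
        min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
    Each step is a gradient step on the smooth part followed by the
    soft-threshold proximal map of the l1 penalty."""
    L = np.linalg.norm(A, 2) ** 2    # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)        # gradient of the smooth part
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x
```

With an orthonormal design the solution has a closed form (elementwise soft-thresholding of `b`), which gives an easy correctness check.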
Deep learning, neural network, feature learning
- Deep learning:
- Feature learning:
- Lee, Grosse, Ranganath, Ng; Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
- Bengio, Courville, Vincent; Representation Learning: A Review and New Perspectives
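As a toy illustration of the representation-learning theme above, here is a two-layer network trained by plain backpropagation on XOR, the standard example of a target no linear model (and hence no fixed linear feature map) can fit. Everything here (architecture, learning rate, loss) is an illustrative choice, not taken from the cited papers.

```python
import numpy as np

def train_xor_net(hidden=8, lr=0.5, epochs=5000, seed=0):
    """Two-layer net (tanh hidden layer, sigmoid output) trained by
    full-batch gradient descent on cross-entropy for XOR. Returns
    the final predicted probabilities for the four inputs."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
    y = np.array([0.0, 1.0, 1.0, 0.0])
    W1 = rng.normal(0, 1, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, hidden);      b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # learned hidden features
        p = 1 / (1 + np.exp(-(H @ W2 + b2)))      # output probability
        dout = (p - y) / len(y)                   # dCE/dlogit
        dH = np.outer(dout, W2) * (1 - H ** 2)    # backprop through tanh
        W2 -= lr * H.T @ dout; b2 -= lr * dout.sum()
        W1 -= lr * X.T @ dH;   b1 -= lr * dH.sum(0)
    H = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(H @ W2 + b2)))
```

After training, the hidden layer has learned a nonlinear feature of the inputs under which the classes become separable, which is the point the representation-learning survey makes at scale.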
Learning from multiple sources
- Multi-view semi-supervised learning:
- Xu, Tao, Xu; A survey of multi-view learning
- Crammer, Kearns, Wortman; Learning from Multiple Sources
- Multi-task learning:
- Argyriou, Evgeniou, and Pontil; Convex multi-task feature learning
Random Geometric Graphs and Networks
- (Generalized) BHH theorem and application:
- Percolation theory:
- Penrose; Random Geometric Graphs
- Random network theory:
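A quick way to build intuition for Penrose's book is to simulate: sample points uniformly in the unit square, connect pairs closer than a radius r, and check connectivity. (Penrose studies, among other things, the threshold radius at which such graphs become connected, roughly of order sqrt(log n / n).) The BFS check below is a generic implementation detail, not from the book.

```python
import numpy as np

def rgg_connected(n, r, seed=0):
    """Sample a random geometric graph: n uniform points in the unit
    square, edges between pairs at Euclidean distance < r. Returns
    True if the resulting graph is connected (checked by BFS)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n, 2))
    dist = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    adj = (dist < r) & ~np.eye(n, dtype=bool)
    # Flood-fill from vertex 0.
    seen = np.zeros(n, dtype=bool)
    stack = [0]; seen[0] = True
    while stack:
        v = stack.pop()
        for u in np.nonzero(adj[v] & ~seen)[0]:
            seen[u] = True
            stack.append(u)
    return bool(seen.all())
```

With r larger than the square's diagonal the graph is complete, hence connected; with a tiny radius almost no edges form and the graph is essentially surely disconnected.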
Differential Geometry in statistics, information theory and learning
- Manton; A Primer on Stochastic Differential Geometry for Signal Processing (http://arxiv.org/abs/1302.0430)
- Amari, Nagaoka; Methods of Information Geometry (Translations of Mathematical Monographs). A classic of information geometry; requires some background in differential geometry.
- Murata, Takenouchi, Kanamori, Eguchi; Information geometry of U-Boost and Bregman divergence
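Since the U-Boost paper above is built on Bregman divergences, it may help to recall the definition before reading it. For a strictly convex, differentiable generator $\varphi$:

```latex
\[
  D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y)
    - \langle \nabla\varphi(y),\, x - y \rangle ,
\]
% i.e. the gap between phi and its tangent plane at y.
% Examples: phi(x) = ||x||^2 gives squared Euclidean distance;
% phi(x) = sum_i x_i log x_i, on the probability simplex,
% gives the Kullback-Leibler divergence.
```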
Information Divergence Estimation and Applications
- Graph-Based Approaches
- Henze, Penrose; On the multivariate runs test
- K-NN Methods
- KDE Plug-in Methods
- Moon, Sricharan, Greenewald, Hero; Improving convergence of divergence functional ensemble estimators
- Kandasamy, Krishnamurthy, Poczos, Wasserman, Robins; Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations
- Singh, Poczos; Exponential Concentration of a Density Functional Estimator
- Other Methods
- Nguyen, Wainwright, Jordan; Estimating divergence functionals and the likelihood ratio by convex risk minimization
- Bayes Error Bounds
- Berisha, Wisler, Hero, Spanias; Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure
- Moon, Delouille, Hero; Meta learning of bounds on the Bayes classifier error
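To make the k-NN family above concrete, here is a one-dimensional sketch of the 1-nearest-neighbor KL divergence estimator in the style of Wang, Kulkarni, and Verdu (2009). The brute-force distance computation and sample sizes are illustrative; the cited papers treat general dimension, bias correction, and convergence rates.

```python
import numpy as np

def knn_kl_estimate(x, y):
    """1-NN Kullback-Leibler divergence estimate D(P||Q) from 1-D
    samples x ~ P (size n) and y ~ Q (size m):
        D_hat = (1/n) sum_i log(nu_i / rho_i) + log(m / (n - 1)),
    where rho_i is the distance from x_i to its nearest neighbor
    among the other x's, and nu_i its distance to the nearest y."""
    n, m = len(x), len(y)
    dxx = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dxx, np.inf)      # exclude self-distance
    rho = dxx.min(1)
    nu = np.abs(x[:, None] - y[None, :]).min(1)
    return np.log(nu / rho).mean() + np.log(m / (n - 1))
```

Sanity checks: on two sample sets from the same distribution the estimate should be near zero, and it should grow when the second distribution is shifted away (true KL between N(0,1) and N(2,1) is 2).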
Target Detection/Tracking/Localization
- Localization in wireless sensor networks:
- Costa, Patwari, and Hero; Distributed weighted-multidimensional scaling with adaptive weighting for node localization in sensor networks
- Rangarajan, Raich, and Hero; Sparse multidimensional scaling for blind tracking in sensor networks
- An overview of tracking algorithms, including Kalman filters, extensions to Kalman filters, and particle filters:
- Arulampalam et al.; A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking
- An overview on linearization of the particle filter proposal density:
- Doucet et al.; On sequential Monte Carlo sampling methods for Bayesian filtering
- Multiple target tracking using particle filters:
- Kreucher, Kastella, and Hero; Multitarget Tracking using the Joint Multitarget Probability Density
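The particle-filter tutorials above center on the bootstrap (sampling-importance-resampling) filter; here is a minimal sketch for a scalar random-walk model with Gaussian observation noise. The model, particle count, and resample-every-step policy are illustrative simplifications of what the tutorials cover.

```python
import numpy as np

def bootstrap_pf(obs, n_particles=500, q=0.5, r=1.0, seed=0):
    """Bootstrap (SIR) particle filter for the scalar model
        x_t = x_{t-1} + N(0, q^2),   y_t = x_t + N(0, r^2).
    Each step: propagate particles through the dynamics, weight by
    the Gaussian observation likelihood, record the posterior mean,
    then resample. Returns the sequence of posterior-mean estimates."""
    rng = np.random.default_rng(seed)
    parts = rng.normal(0.0, 1.0, n_particles)
    est = []
    for y in obs:
        parts = parts + rng.normal(0.0, q, n_particles)   # propagate
        logw = -0.5 * ((y - parts) / r) ** 2              # log-likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est.append(np.dot(w, parts))                      # posterior mean
        idx = rng.choice(n_particles, n_particles, p=w)   # resample
        parts = parts[idx]
    return np.array(est)
```

On this linear-Gaussian model the filter's estimates should track the hidden state more accurately than the raw noisy observations do.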
Adaptive Sensing
- Bashan, Raich, and Hero; Optimal two-stage search for sparse targets using convex criteria
- Chong, Kreucher, and Hero; Partially Observable Markov Decision Process Approximations for Adaptive Sensing
- Hero, Kreucher, and Blatt; “Information theoretic approaches to sensor management”, Ch.3 in Foundations and Applications of Sensor Management
LaTeX tools
- TikZ and PGF for drawing within LaTeX slides
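A minimal self-contained example of the kind of figure TikZ makes easy (two boxed nodes joined by a labeled arrow); node names and styles here are arbitrary choices, and the `standalone` class just crops the output to the picture:

```latex
\documentclass{standalone}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  % Two named nodes joined by a labeled arrow -- the basic TikZ idiom.
  \node[draw, rounded corners] (in)  at (0, 0) {input};
  \node[draw, rounded corners] (out) at (3, 0) {output};
  \draw[->, thick] (in) -- node[above] {$f$} (out);
\end{tikzpicture}
\end{document}
```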