Contents

Preface
Acknowledgments
Notation

CHAPTER 1 Introduction
  1.1 What Machine Learning is About
    1.1.1 Classification
    1.1.2 Regression
  1.2 Structure and a Road Map of the Book
  References

CHAPTER 2 Probability and Stochastic Processes
  2.1 Introduction
  2.2 Probability and Random Variables
    2.2.1 Probability
    2.2.2 Discrete Random Variables
    2.2.3 Continuous Random Variables
    2.2.4 Mean and Variance
    2.2.5 Transformation of Random Variables
  2.3 Examples of Distributions
    2.3.1 Discrete Variables
    2.3.2 Continuous Variables
  2.4 Stochastic Processes
    2.4.1 First and Second Order Statistics
    2.4.2 Stationarity and Ergodicity
    2.4.3 Power Spectral Density
    2.4.4 Autoregressive Models
  2.5 Information Theory
    2.5.1 Discrete Random Variables
    2.5.2 Continuous Random Variables
  2.6 Stochastic Convergence
  Problems
  References

CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions
  3.1 Introduction
  3.2 Parameter Estimation: The Deterministic Point of View
  3.3 Linear Regression
  3.4 Classification
  3.5 Biased Versus Unbiased Estimation
    3.5.1 Biased or Unbiased Estimation?
  3.6 The Cramér-Rao Lower Bound
  3.7 Sufficient Statistic
  3.8 Regularization
  3.9 The Bias-Variance Dilemma
    3.9.1 Mean-Square Error Estimation
    3.9.2 Bias-Variance Tradeoff
  3.10 Maximum Likelihood Method
    3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case
  3.11 Bayesian Inference
    3.11.1 The Maximum a Posteriori Probability Estimation Method
  3.12 Curse of Dimensionality
  3.13 Validation
  3.14 Expected and Empirical Loss Functions
  3.15 Nonparametric Modeling and Estimation
  Problems
  References

CHAPTER 4 Mean-Square Error Linear Estimation
  4.1 Introduction
  4.2 Mean-Square Error Linear Estimation: The Normal Equations
    4.2.1 The Cost Function Surface
  4.3 A Geometric Viewpoint: Orthogonality Condition
  4.4 Extension to Complex-Valued Variables
    4.4.1 Widely Linear Complex-Valued Estimation
    4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus
  4.5 Linear Filtering
  4.6 MSE Linear Filtering: A Frequency Domain Point of View
  4.7 Some Typical Applications
    4.7.1 Interference Cancellation
    4.7.2 System Identification
    4.7.3 Deconvolution: Channel Equalization
  4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms
    4.8.1 The Lattice-Ladder Scheme
  4.9 Mean-Square Error Estimation of Linear Models
    4.9.1 The Gauss-Markov Theorem
    4.9.2 Constrained Linear Estimation: The Beamforming Case
  4.10 Time-Varying Statistics: Kalman Filtering
  Problems
  References

CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family
  5.1 Introduction
  5.2 The Steepest Descent Method
  5.3 Application to the Mean-Square Error Cost Function
    5.3.1 The Complex-Valued Case
  5.4 Stochastic Approximation
  5.5 The Least-Mean-Squares Adaptive Algorithm
    5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
    5.5.2 Cumulative Loss Bounds
  5.6 The Affine Projection Algorithm
    5.6.1 The Normalized LMS
  5.7 The Complex-Valued Case
  5.8 Relatives of the LMS
  5.9 Simulation Examples
  5.10 Adaptive Decision Feedback Equalization
  5.11 The Linearly Constrained LMS
  5.12 Tracking Performance of the LMS in Nonstationary Environments
  5.13 Distributed Learning: The Distributed LMS
    5.13.1 Cooperation Strategies
    5.13.2 The Diffusion LMS
    5.13.3 Convergence and Steady-State Performance: Some Highlights
    5.13.4 Consensus-Based Distributed Schemes
  5.14 A Case Study: Target Localization
  5.15 Some Concluding Remarks: Consensus Matrix
  Problems
  References

CHAPTER 6 The Least-Squares Family
  6.1 Introduction
  6.2 Least-Squares Linear Regression: A Geometric Perspective
  6.3 Statistical Properties of the LS Estimator
  6.4