1 Introduction
  1.1 Elements of System Identification
  1.2 Traditional Identification Criteria
  1.3 Information Theoretic Criteria
    1.3.1 MEE Criteria
    1.3.2 Minimum Information Divergence Criteria
    1.3.3 Mutual Information-Based Criteria
  1.4 Organization of This Book
  Appendix A: Unifying Framework of ITL
2 Information Measures
  2.1 Entropy
  2.2 Mutual Information
  2.3 Information Divergence
  2.4 Fisher Information
  2.5 Information Rate
  Appendix B: α-Stable Distribution
  Appendix C: Proof of (2.17)
  Appendix D: Proof of Cramér-Rao Inequality
3 Information Theoretic Parameter Estimation
  3.1 Traditional Methods for Parameter Estimation
    3.1.1 Classical Estimation
    3.1.2 Bayes Estimation
  3.2 Information Theoretic Approaches to Classical Estimation
    3.2.1 Entropy Matching Method
    3.2.2 Maximum Entropy Method
    3.2.3 Minimum Divergence Estimation
  3.3 Information Theoretic Approaches to Bayes Estimation
    3.3.1 Minimum Error Entropy Estimation
    3.3.2 MC Estimation
  3.4 Information Criteria for Model Selection
  Appendix E: EM Algorithm
  Appendix F: Minimum MSE Estimation
  Appendix G: Derivation of AIC Criterion
4 System Identification Under Minimum Error Entropy Criteria
  4.1 Brief Sketch of System Parameter Identification
    4.1.1 Model Structure
    4.1.2 Criterion Function
    4.1.3 Identification Algorithm
  4.2 MEE Identification Criterion
    4.2.1 Common Approaches to Entropy Estimation
    4.2.2 Empirical Error Entropies Based on KDE
  4.3 Identification Algorithms Under MEE Criterion
    4.3.1 Nonparametric Information Gradient Algorithms
    4.3.2 Parametric IG Algorithms
    4.3.3 Fixed-Point Minimum Error Entropy Algorithm
    4.3.4 Kernel Minimum Error Entropy Algorithm
    4.3.5 Simulation Examples
  4.4 Convergence Analysis
    4.4.1 Convergence Analysis Based on Approximate Linearization
    4.4.2 Energy Conservation Relation
    4.4.3 Mean Square Convergence Analysis Based on Energy Conservation Relation
  4.5 Optimization of φ-Entropy Criterion
  4.6 Survival Information Potential Criterion
    4.6.1 Definition of SIP
    4.6.2 Properties of the SIP
    4.6.3 Empirical SIP
    4.6.4 Application to System Identification
  4.7 δ-Entropy Criterion
    4.7.1 Definition of δ-Entropy
    4.7.2 Some Properties of the δ-Entropy
    4.7.3 Estimation of δ-Entropy
    4.7.4 Application to System Identification
  4.8 System Identification with MCC
  Appendix H: Vector Gradient and Matrix Gradient
5 System Identification Under Information Divergence Criteria
  5.1 Parameter Identifiability Under KLID Criterion
    5.1.1 Definitions and Assumptions
    5.1.2 Relations with Fisher Information
    5.1.3 Gaussian Process Case
    5.1.4 Markov Process Case
    5.1.5 Asymptotic KLID-Identifiability
  5.2 Minimum Information Divergence Identification with Reference PDF
    5.2.1 Some Properties
    5.2.2 Identification Algorithm
    5.2.3 Simulation Examples
    5.2.4 Adaptive Infinite Impulsive Response Filter with Euclidean Distance Criterion
6 System Identification Based on Mutual Information Criteria
  6.1 System Identification Under the MinMI Criterion
    6.1.1 Properties of MinMI Criterion
    6.1.2 Relationship with Independent Component Analysis
    6.1.3 ICA-Based Stochastic Gradient Identification Algorithm
    6.1.4 Numerical Simulation Example
  6.2 System Identification Under the MaxMI Criterion
    6.2.1 Properties of the MaxMI Criterion
    6.2.2 Stochastic Mutual Information Gradient Identification Algorithm
    6.2.3 Double-Criterion Identification Method
  Appendix I: MinMI Rate Criterion
References