EM method

Description:

Expectation and Maximization are the two steps of an iterative procedure for fitting mixture models. If our data stem from K distinct processes, we build a model for each process and estimate a mixing distribution over them.

The basic idea of EM in this context is to pretend that we know the parameters of the model and then to infer the probability that each data point belongs to each component. After that, we refit the components to the data, where each component is fitted to the entire data set with each point weighted by the probability that it belongs to that component. The process iterates until convergence. Essentially, we are "completing" the data by inferring probability distributions over the hidden variables - which component each data point belongs to - based on the current model. For the mixture of Gaussians, we initialize the mixture model parameters arbitrarily and then iterate the following two steps:
  1. E-step: Compute the probabilities p_ij = P(C = i | x_j), the probability that datum x_j was generated by component i. By Bayes' rule, we have p_ij = a P(x_j | C = i) P(C = i), where a is a normalizing constant. The term P(x_j | C = i) is just the density of the ith Gaussian evaluated at x_j, and the term P(C = i) is just the weight parameter of the ith Gaussian. Define p_i = Sum_j p_ij.
  2. M-step: Compute the new means, covariances, and component weights as follows:
    mu_i    <- Sum_j p_ij x_j / p_i
    Sigma_i <- Sum_j p_ij (x_j - mu_i)(x_j - mu_i)^T / p_i
    w_i     <- p_i / N
    where N is the total number of data points.
  3. Iterate steps 1 and 2 until convergence. A minimal Python sketch of this loop is given below.
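
The following is a minimal sketch of the above loop in Python with NumPy, assuming the data are given as a rows-are-samples array. The function and variable names (fit_gaussian_mixture, gaussian_pdf, n_components, and so on) are illustrative only and are not taken from the source or from any particular library; a fixed iteration count stands in for a proper convergence test.

import numpy as np


def fit_gaussian_mixture(X, n_components, n_iters=100, seed=0):
    """Fit a mixture of Gaussians to X (n_samples x n_features) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Arbitrary initialization: random means, identity covariances, uniform weights.
    means = X[rng.choice(n, size=n_components, replace=False)]
    covs = np.stack([np.eye(d) for _ in range(n_components)])
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iters):
        # E-step: p_ij = P(C=i | x_j) proportional to P(x_j | C=i) P(C=i).
        resp = np.empty((n_components, n))
        for i in range(n_components):
            resp[i] = weights[i] * gaussian_pdf(X, means[i], covs[i])
        resp /= resp.sum(axis=0, keepdims=True)  # normalize over components (Bayes' rule)

        # M-step: refit each component with points weighted by p_ij.
        p_i = resp.sum(axis=1)                       # p_i = Sum_j p_ij
        for i in range(n_components):
            means[i] = resp[i] @ X / p_i[i]          # mu_i <- Sum_j p_ij x_j / p_i
            diff = X - means[i]
            covs[i] = (resp[i][:, None] * diff).T @ diff / p_i[i]  # weighted covariance
        weights = p_i / n                            # w_i <- p_i / N

    return weights, means, covs


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("nd,dk,nk->n", diff, inv, diff)) / norm


if __name__ == "__main__":
    # Toy data drawn from two well-separated Gaussians.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3, 1, size=(200, 2)), rng.normal(3, 1, size=(200, 2))])
    w, mu, sigma = fit_gaussian_mixture(X, n_components=2)
    print("weights:", w)
    print("means:", mu)
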
Publications: Bilmes/97b: A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models