EM method

Description:

Expectation and Maximization are the two steps of an iterative procedure for fitting mixture models. If our data stem from K distinct processes, we build a model for each process and estimate a mixing distribution over them.

The basic idea of EM in this context is to pretend that we know the parameters of the model and then to infer the probability that each data point belongs to each component. After that, we refit the components to the data, where each component is fitted to the entire data set with each point weighted by the probability that it belongs to that component. The process iterates until convergence. Essentially, we are "completing" the data by inferring probability distributions over the hidden variables - which component each data point belongs to - based on the current model. For the mixture of Gaussians, we initialize the mixture model parameters arbitrarily and then iterate the following two steps:
  1. E-step: Compute the probabilities p_ij = P(C = i | x_j), the probability that datum x_j was generated by component i. By Bayes' rule, we have p_ij = a P(x_j | C = i) P(C = i), where a is a normalizing constant. The term P(x_j | C = i) is just the density of the ith Gaussian evaluated at x_j, and the term P(C = i) is just the weight parameter of the ith Gaussian. Define p_i = Sum_j p_ij.
  2. M-step: Compute the new means, covariances, and component weights as follows:
    mu_i    <- Sum_j p_ij x_j / p_i
    Sigma_i <- Sum_j p_ij (x_j - mu_i)(x_j - mu_i)^T / p_i
    w_i     <- p_i / N
    where N is the total number of data points.
  3. Iterate steps 1 and 2 until convergence. A minimal Python sketch of this loop is given below.
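
The following is a minimal sketch of the above loop in Python with NumPy, assuming the data are given as a rows-are-samples array. The function and variable names (fit_gaussian_mixture, gaussian_pdf, n_components, and so on) are illustrative only and are not taken from the source or from any particular library; a fixed iteration count stands in for a proper convergence test.

import numpy as np


def fit_gaussian_mixture(X, n_components, n_iters=100, seed=0):
    """Fit a mixture of Gaussians to X (n_samples x n_features) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Arbitrary initialization: random means, identity covariances, uniform weights.
    means = X[rng.choice(n, size=n_components, replace=False)]
    covs = np.stack([np.eye(d) for _ in range(n_components)])
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iters):
        # E-step: p_ij = P(C=i | x_j) proportional to P(x_j | C=i) P(C=i).
        resp = np.empty((n_components, n))
        for i in range(n_components):
            resp[i] = weights[i] * gaussian_pdf(X, means[i], covs[i])
        resp /= resp.sum(axis=0, keepdims=True)  # normalize over components (Bayes' rule)

        # M-step: refit each component with points weighted by p_ij.
        p_i = resp.sum(axis=1)                       # p_i = Sum_j p_ij
        for i in range(n_components):
            means[i] = resp[i] @ X / p_i[i]          # mu_i <- Sum_j p_ij x_j / p_i
            diff = X - means[i]
            covs[i] = (resp[i][:, None] * diff).T @ diff / p_i[i]  # weighted covariance
        weights = p_i / n                            # w_i <- p_i / N

    return weights, means, covs


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("nd,dk,nk->n", diff, inv, diff)) / norm


if __name__ == "__main__":
    # Toy data drawn from two well-separated Gaussians.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3, 1, size=(200, 2)), rng.normal(3, 1, size=(200, 2))])
    w, mu, sigma = fit_gaussian_mixture(X, n_components=2)
    print("weights:", w)
    print("means:", mu)
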
Publications: Bilmes/97b: A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models