Likelihood vs probability

The terminology of likelihood is a bit misleading: $\operatorname{L}(\theta | x)$ should NOT be confused with $\operatorname{P}(\theta | x)$. The likelihood is defined as $\operatorname{L}(\theta | x) = \operatorname{P}_\theta(x) = \operatorname{P}(X = x | \theta)$, i.e. the probability of the observed data $x$ read as a function of the parameter $\theta$.
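
A minimal sketch of the distinction, with illustrative numbers: for fixed data $x$, the likelihood is a function of $\theta$ and does not integrate to 1 over $\theta$, while for fixed $\theta$, $\operatorname{P}(X = x | \theta)$ is a genuine probability distribution over $x$.

```python
import numpy as np
from math import comb

# Fixed data: 7 heads in 10 Bernoulli trials (illustrative numbers).
k, n = 7, 10

def likelihood(theta):
    # L(theta | x) = P(X = x | theta), read as a function of theta
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# As a function of theta, the likelihood does NOT integrate to 1 ...
thetas = np.linspace(0.0, 1.0, 1001)
print(likelihood(thetas).sum() * (thetas[1] - thetas[0]))  # ~0.0909 (= 1/11), not 1

# ... while for a fixed theta, the probabilities over all outcomes sum to 1.
theta0 = 0.7
print(sum(comb(n, j) * theta0**j * (1 - theta0)**(n - j) for j in range(n + 1)))  # 1.0
```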

MLE, MAP and Bayes estimator

Definition

  • MLE: $\hat{\theta}_{\text{MLE}} = \underset{\theta}{\operatorname{argmax}} \operatorname{P}(X | \theta)$
  • MAP: $\hat{\theta}_{\text{MAP}} = \underset{\theta}{\operatorname{argmax}} \operatorname{P}(\theta | X)$
  • Bayes estimator: $\hat{\theta}_{\text{Bayes}} = \operatorname{E}(\theta | X)$
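
All three estimators have closed forms in a conjugate Beta-Bernoulli model, which makes a convenient minimal sketch (the Beta(2, 2) prior and the 7-of-10 data below are illustrative assumptions):

```python
# Beta-Bernoulli: k successes in n trials, Beta(a, b) prior on theta.
# All numbers are illustrative assumptions.
k, n = 7, 10
a, b = 2.0, 2.0  # hypothetical prior pseudo-counts

# MLE: argmax_theta P(X | theta) = k / n
mle = k / n

# The posterior is Beta(k + a, n - k + b); MAP is its mode.
map_est = (k + a - 1) / (n + a + b - 2)

# Bayes estimator: posterior mean.
bayes = (k + a) / (n + a + b)

print(f"MLE   = {mle:.4f}")      # 0.7000
print(f"MAP   = {map_est:.4f}")  # 0.6667
print(f"Bayes = {bayes:.4f}")    # 0.6429
```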

Comparison

  • Frequentist vs Bayesian
    • MLE aims to maximize $\operatorname{P}(X=x | \theta)$ and is a frequentist inference method
    • MAP and the Bayes estimator both assume a prior distribution and are Bayesian inference methods
  • MAP vs MLE
    • For MAP, $\operatorname{P}(\theta | X) \propto \operatorname{P}(X | \theta)\operatorname{P}(\theta)$ by Bayes' theorem
    • Therefore, MLE is a special case of MAP: a MAP with a flat (uniform) prior
  • MAP vs Bayes estimator
    • MAP = posterior mode
    • Bayes estimator = posterior mean (the optimal point estimate under squared-error loss)
  • Optimization
    • Both MLE and MAP are distribution modes (of the likelihood and the posterior, respectively) and can be found by differentiation or gradient descent (see the sketch after this list)
    • The Bayes estimator is a distribution mean and generally requires numerical integration
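
To sketch the optimization point above, the same Beta-Bernoulli posterior can be treated as a black box: the mode (MAP) is found with a numerical optimizer and the mean (Bayes estimator) with numerical integration. scipy, the Beta(2, 2) prior, and the 7-of-10 data are assumptions carried over from the earlier example:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

k, n = 7, 10     # illustrative data
a, b = 2.0, 2.0  # hypothetical Beta prior

def log_posterior(theta):
    # log P(theta | X) up to an additive constant:
    # log P(X | theta) + log P(theta) for the Beta-Bernoulli model
    return (k + a - 1) * np.log(theta) + (n - k + b - 1) * np.log(1 - theta)

# MAP: maximize the log-posterior, i.e. find the mode numerically.
res = minimize_scalar(lambda t: -log_posterior(t),
                      bounds=(1e-9, 1 - 1e-9), method="bounded")
print(f"MAP   = {res.x:.4f}")  # ~0.6667, matches (k + a - 1) / (n + a + b - 2)

# Bayes estimator: posterior mean via numerical integration.
z, _ = quad(lambda t: np.exp(log_posterior(t)), 0, 1)  # normalizing constant
mean, _ = quad(lambda t: t * np.exp(log_posterior(t)) / z, 0, 1)
print(f"Bayes = {mean:.4f}")   # ~0.6429, matches (k + a) / (n + a + b)
```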

In the Bayesian world, you can argue that the Bayes estimator carries more information than MAP since it takes the entire posterior distribution into consideration. By the Bayesian central limit theorem, the posterior becomes approximately normal as data accumulate, so the mode converges to the mean; therefore, the difference between the MAP and Bayes estimators should be limited if you have a lot of data. Both MAP and the Bayes estimator are point estimators, so the full posterior distribution should always be checked when available.
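
A quick numeric illustration of that convergence, under the same assumed Beta-Bernoulli setup with a fixed 70% success rate: the gap between posterior mode (MAP) and posterior mean (Bayes estimator) shrinks roughly as $O(1/n)$.

```python
# Posterior mode vs. mean for a Beta(k + a, n - k + b) posterior as n grows.
# Illustrative assumptions: 70% observed success rate, Beta(2, 2) prior.
a, b = 2.0, 2.0
for n in (10, 100, 1000, 10000):
    k = round(0.7 * n)
    mode = (k + a - 1) / (n + a + b - 2)  # MAP
    mean = (k + a) / (n + a + b)          # Bayes estimator
    print(f"n={n:>6}: mode={mode:.4f}  mean={mean:.4f}  gap={abs(mode - mean):.2e}")
```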

If you are interested in a solid example, you can read more here.