Maximum likelihood: why we do not seek the most likely data
ABSTRACT
Parameter optimization selects the single parameter value that maximizes the likelihood function. One may ask, however, why we are not equally interested in the sample configuration that maximizes the likelihood. For example, for N samples drawn from a Gaussian with fixed mean and variance, the likelihood is maximized when all N samples equal the mean exactly, a degenerate configuration that is rarely useful in practice. To clarify this and related questions, we derive a simple expression and use it to explain, from a mathematical perspective, why maximum likelihood estimation optimizes the parameters rather than seeking such maximum-likelihood samples.
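A quick numerical check of the Gaussian example follows. It is a minimal sketch, assuming i.i.d. samples from a Gaussian with known mean and standard deviation. The helper `gaussian_log_likelihood` and all numeric values are hypothetical illustrations, not taken from the paper. The sketch evaluates the same log-likelihood in two ways. First it treats the data as the variable, with the parameters held fixed. Then it treats the mean as the variable, with the data held fixed.

```python
import math

def gaussian_log_likelihood(samples, mu, sigma):
    """Hypothetical helper: sum of log N(x | mu, sigma^2) over i.i.d. samples."""
    n = len(samples)
    sq = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

# Illustrative (assumed) parameters and an arbitrary, spread-out sample.
mu, sigma = 1.5, 2.0
typical = [0.3, 2.9, -1.2, 1.8, 4.0]

# Data as the variable: putting every sample exactly at the mean makes the
# squared-error term zero, so no other configuration scores higher.
degenerate = [mu] * len(typical)
ll_typical = gaussian_log_likelihood(typical, mu, sigma)
ll_degenerate = gaussian_log_likelihood(degenerate, mu, sigma)
print(ll_typical < ll_degenerate)  # True

# Parameters as the variable: with the data fixed, the sample mean maximizes
# the log-likelihood over mu (the usual maximum likelihood estimate).
mu_hat = sum(typical) / len(typical)
ll_at_mu_hat = gaussian_log_likelihood(typical, mu_hat, sigma)
shifted = [gaussian_log_likelihood(typical, mu_hat + d, sigma) for d in (-0.5, -0.1, 0.1, 0.5)]
print(all(ll_at_mu_hat > ll for ll in shifted))  # True
```

Both maximizations use the same function. Only the choice of which argument is free changes. That choice separates a useful estimate (the sample mean) from the degenerate "most likely data" configuration.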