maximum likelihood estimation
suppose that the likelihood function depends on \(k\) parameters \(\theta_1, \theta_2, \ldots, \theta_k\). choose as estimates those values of the parameters that maximize the likelihood \(L(y_1, y_2, \ldots, y_n \mid \theta_1, \theta_2, \ldots, \theta_k)\).

to emphasize the fact that the likelihood function is a function of the parameters \(\theta_1, \theta_2, \ldots, \theta_k\), we sometimes write the likelihood function as \(L(\theta_1, \theta_2, \ldots, \theta_k)\). it is common to refer to maximum-likelihood estimators as MLEs.

[cite:@wackerly_stats_2008 section 9.7 the method of maximum likelihood]
a binomial experiment consisting of \(n\) trials resulted in observations \(y_1, y_2, \ldots, y_n\), where \(y_i = 1\) if the \(i\)th trial was a success and \(y_i = 0\) otherwise. find the MLE of \(p\), the probability of a success.

the likelihood of the observed sample is the probability of observing \(y_1, y_2, \ldots, y_n\). hence,
\[ L(p) = L(y_1, y_2, \ldots, y_n \mid p) = p^{y}(1 - p)^{n - y}, \qquad \text{where } y = \sum_{i=1}^{n} y_i. \]
we now wish to find the value of \(p\) that maximizes \(L(p)\). if \(y = 0\), \(L(p) = (1 - p)^n\), and \(L(p)\) is maximized when \(p = 0\). analogously, if \(y = n\), \(L(p) = p^n\) and \(L(p)\) is maximized when \(p = 1\). if \(y = 1, 2, \ldots, n - 1\), then \(L(p) = p^{y}(1 - p)^{n - y}\) is zero when \(p = 0\) and \(p = 1\) and is continuous for values of \(p\) between 0 and 1. thus, for \(y = 1, 2, \ldots, n - 1\), we can find the value of \(p\) that maximizes \(L(p)\) by setting the derivative \(dL(p)/dp\) equal to 0 and solving for \(p\).
note that \(\ln L(p)\) is a monotonically increasing function of \(L(p)\). hence, both \(\ln L(p)\) and \(L(p)\) are maximized for the same value of \(p\). because \(L(p)\) is a product of functions of \(p\) and finding the derivative of products is tedious, it is easier to find the value of \(p\) that maximizes \(\ln L(p)\). we have
\[ \ln L(p) = y \ln p + (n - y) \ln(1 - p). \]
if \(y = 1, 2, \ldots, n - 1\), the derivative of \(\ln L(p)\) with respect to \(p\) is
\[ \frac{d \ln L(p)}{dp} = \frac{y}{p} - \frac{n - y}{1 - p}. \]
for \(y = 1, 2, \ldots, n - 1\), the value of \(p\) that maximizes (or minimizes) \(\ln L(p)\) is the solution of the equation
\[ \frac{y}{\hat{p}} - \frac{n - y}{1 - \hat{p}} = 0. \]
solving, we obtain the estimate \(\hat{p} = y / n\). you can easily verify that this solution occurs when \(\ln L(p)\) (and hence \(L(p)\)) achieves a maximum.
because \(L(p)\) is maximized at \(\hat{p} = 0\) when \(y = 0\), at \(\hat{p} = 1\) when \(y = n\), and at \(\hat{p} = y/n\) when \(y = 1, 2, \ldots, n - 1\), whatever the observed value of \(y\), \(L(p)\) is maximized when \(\hat{p} = y/n\).

the MLE, \(\hat{p} = \frac{1}{n}\sum_{i=1}^{n} Y_i\), is the fraction of successes in the total number of trials \(n\). hence, the MLE of \(p\) is actually the intuitive estimator for \(p\).
[cite:@wackerly_stats_2008 example 9.14]
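as a quick numerical sanity check of this example, the sketch below maximizes \(\ln L(p)\) over a grid and compares the maximizer with the closed form \(\hat{p} = y/n\). the sample and grid resolution are illustrative assumptions of mine, not part of the example.

#+begin_src python
import numpy as np

# hypothetical binomial sample: n = 20 Bernoulli trials, each y_i in {0, 1}
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=20)
n, total = y.size, int(y.sum())

# ln L(p) = y ln p + (n - y) ln(1 - p), evaluated on an interior grid of p values
p_grid = np.linspace(0.001, 0.999, 9999)
log_lik = total * np.log(p_grid) + (n - total) * np.log(1 - p_grid)

p_hat_numeric = p_grid[np.argmax(log_lik)]
p_hat_closed_form = total / n
print(p_hat_numeric, p_hat_closed_form)  # agree up to the grid resolution
#+end_src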
the method of maximum likelihood is, by far, the most popular technique for deriving estimators. recall that if \(X_1, \ldots, X_n\) are an iid sample from a population with pdf or pmf \(f(x \mid \theta_1, \ldots, \theta_k)\), the likelihood function is defined by
\[ L(\theta \mid \mathbf{x}) = L(\theta_1, \ldots, \theta_k \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_k). \]
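as a concrete (hypothetical) illustration of this definition, the sketch below builds the likelihood of an assumed exponential(\(\theta\)) model as a product of the individual densities; the log form is usually preferred in practice because the raw product underflows as \(n\) grows.

#+begin_src python
import numpy as np

# hypothetical iid sample from an assumed Exponential model with rate theta,
# i.e. f(x | theta) = theta * exp(-theta * x) for x > 0
x = np.array([0.8, 1.5, 0.3, 2.1, 0.9])

def likelihood(theta, x):
    """L(theta | x) = prod_i f(x_i | theta); fine for tiny samples,
    but the product underflows quickly as n grows."""
    return np.prod(theta * np.exp(-theta * x))

def log_likelihood(theta, x):
    """ln L(theta | x) = sum_i ln f(x_i | theta); numerically safer."""
    return np.sum(np.log(theta) - theta * x)

print(likelihood(1.0, x), log_likelihood(1.0, x))
#+end_src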
intuitively, the MLE is a reasonable choice for an estimator. The MLE is the parameter point for which the observed sample is most likely.
there are two inherent drawbacks associated with the general problem of finding the maximum of a function, and hence of maximum likelihood estimation. the first problem is that of actually finding the global maximum and verifying that, indeed, a global maximum has been found. in many cases this problem reduces to a simple differential calculus exercise but, sometimes even for common densities, difficulties do arise. the second problem is that of numerical sensitivity. that is, how sensitive is the estimate to small changes in the data? unfortunately, it is sometimes the case that a slightly different sample will produce a vastly different MLE, making its use suspect. we consider first the problem of finding MLEs.
if the likelihood function is differentiable (in \(\theta_i\)), possible candidates for the MLE are the values of \((\theta_1, \ldots, \theta_k)\) that solve
\[ \frac{\partial}{\partial \theta_i} L(\theta \mid \mathbf{x}) = 0, \qquad i = 1, \ldots, k. \]
note that the solutions to this equation are only possible candidates for the MLE since the first derivative being 0 is only a necessary condition for a maximum, not a sufficient condition. furthermore, the zeros of the first derivative locate only extreme points in the interior of the domain of a function. if the extrema occur on the boundary, the first derivative may not be 0. thus, the boundary must be checked separately for extrema.
points at which the first derivatives are 0 may be local or global minima, local or global maxima, or inflection points. our job is to find a global maximum.
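continuing the hypothetical exponential model sketched above, the following is a numerical counterpart of this recipe: find a zero of the first derivative of \(\ln L\) in the interior, then cross-check it against a direct maximization and the known closed form \(\hat{\theta} = n / \sum_i x_i\).

#+begin_src python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

x = np.array([0.8, 1.5, 0.3, 2.1, 0.9])  # same hypothetical sample as above

def score(theta):
    # d/dtheta of sum_i [ln theta - theta * x_i] = n / theta - sum_i x_i
    return x.size / theta - x.sum()

# candidate: zero of the first derivative inside an interior bracket
theta_candidate = brentq(score, 1e-6, 100.0)

# the second derivative, -n / theta^2, is negative everywhere, so the candidate
# is a maximum; a generic cross-check is to maximize ln L directly
res = minimize_scalar(lambda t: -np.sum(np.log(t) - t * x),
                      bounds=(1e-6, 100.0), method="bounded")

print(theta_candidate, res.x, x.size / x.sum())  # all agree with n / sum(x_i)
#+end_src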
for each sample point \(\mathbf{x}\), let \(\hat{\theta}(\mathbf{x})\) be a parameter value at which \(L(\theta \mid \mathbf{x})\) attains its maximum as a function of \(\theta\), with \(\mathbf{x}\) held fixed. a maximum likelihood estimator (MLE) of the parameter \(\theta\) based on a sample \(\mathbf{X}\) is \(\hat{\theta}(\mathbf{X})\).
[cite:;taken from @berger_inference_2002 definition 7.2.4]
notice that, by its construction, the range of the MLE coincides with the range of the parameter. we also use the abbreviation MLE to stand for maximum likelihood estimate when we are talking of the realized value of the estimator.[cite:;taken from @berger_inference_2002 definition 7.2.4]
let \(X_1, \ldots, X_n\) be iid \(\mathrm{N}(\theta, 1)\), and let \(L(\theta \mid \mathbf{x})\) denote the likelihood function. then
\[ L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-(x_i - \theta)^2 / 2} = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2}. \]
the equation \(\frac{d}{d\theta} L(\theta \mid \mathbf{x}) = 0\) reduces to
\[ \sum_{i=1}^{n} (x_i - \theta) = 0, \]
which has the solution \(\hat{\theta} = \bar{x}\). hence, \(\bar{x}\) is a candidate for the MLE. to verify that \(\bar{x}\) is, in fact, a global maximum of the likelihood function, we can use the following argument. first, note that \(\bar{x}\) is the only solution to \(\sum_{i=1}^{n}(x_i - \theta) = 0\); hence \(\bar{x}\) is the only zero of the first derivative. second, verify that
\[ \left.\frac{d^2}{d\theta^2} L(\theta \mid \mathbf{x})\right|_{\theta = \bar{x}} < 0. \]
thus, \(\bar{x}\) is the only extreme point in the interior and it is a maximum. to finally verify that \(\bar{x}\) is a global maximum, we must check the boundaries, \(\pm\infty\). by taking limits it is easy to establish that the likelihood is 0 at \(\pm\infty\). so \(\bar{x}\) is a global maximum and hence \(\bar{x}\) is the MLE. (actually, we can be a bit more clever and avoid checking \(\pm\infty\). since we established that \(\bar{x}\) is a unique interior extremum and is a maximum, there can be no maximum at \(\pm\infty\). if there were, then there would have to be an interior minimum, which contradicts uniqueness.)
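a short numerical check of this example, using a simulated (hypothetical) sample: maximizing the \(\mathrm{N}(\theta, 1)\) log-likelihood numerically should recover the sample mean.

#+begin_src python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# hypothetical iid N(theta, 1) sample with true theta = 2
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=50)

# minimize the negative log-likelihood of the N(theta, 1) model
neg_log_lik = lambda theta: -norm.logpdf(x, loc=theta, scale=1.0).sum()
res = minimize_scalar(neg_log_lik)

print(res.x, x.mean())  # the numerical maximizer agrees with the sample mean
#+end_src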
another way to find an MLE is to abandon differentiation and proceed with a direct maximization. this method is usually simpler algebraically, especially if the derivatives tend to get messy, but is sometimes harder to implement because there are no set rules to follow. one general technique is to find a global upper bound on the likelihood function and then establish that there is a unique point for which the upper bound is attained.
[cite:;taken from @berger_inference_2002 chapter 7.2 methods of finding estimators]
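as a stand-in illustration of my own (not the example the cited chapter works through), the uniform\((0, \theta)\) model below is a case where differentiation fails at the optimum and a direct argument works: the likelihood is zero for \(\theta < \max_i x_i\) and strictly decreasing above it, so \(\hat{\theta} = \max_i x_i\) is the unique point attaining the upper bound, without any calculus.

#+begin_src python
import numpy as np

# hypothetical iid Uniform(0, theta) sample
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 3.0, size=25)
n = x.size

def likelihood(theta):
    # L(theta | x) = theta^(-n) if theta >= max(x_i), else 0
    return theta ** (-n) if theta >= x.max() else 0.0

theta_hat = x.max()  # direct maximization: the unique point attaining the bound

# crude grid evaluation confirms the maximizer sits at max(x_i)
grid = np.linspace(0.5, 5.0, 2001)
print(theta_hat, grid[np.argmax([likelihood(t) for t in grid])])
#+end_src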