maximum likelihood estimation
suppose that the likelihood function depends on \(k\) parameters \(\theta_1, \theta_2, \ldots, \theta_k\). choose as estimates those values of the parameters that maximize the likelihood \(L(y_1, y_2, \ldots, y_n \mid \theta_1, \theta_2, \ldots, \theta_k)\).

to emphasize the fact that the likelihood function is a function of the parameters \(\theta_1, \theta_2, \ldots, \theta_k\), we sometimes write the likelihood function as \(L(\theta_1, \theta_2, \ldots, \theta_k)\). it is common to refer to maximum-likelihood estimators as MLEs.

[cite:@wackerly_stats_2008 section 9.7 the method of maximum likelihood]
a binomial experiment consisting of \(n\) trials resulted in observations \(y_1, y_2, \ldots, y_n\), where \(y_i = 1\) if the \(i\)th trial was a success and \(y_i = 0\) otherwise. find the MLE of \(p\), the probability of a success.

the likelihood of the observed sample is the probability of observing \(y_1, y_2, \ldots, y_n\). hence,
\[ L(p) = L(y_1, y_2, \ldots, y_n \mid p) = p^{y}(1 - p)^{n - y}, \qquad \text{where } y = \sum_{i=1}^{n} y_i. \]
we now wish to find the value of \(p\) that maximizes \(L(p)\). if \(y = 0\), \(L(p) = (1 - p)^n\), and \(L(p)\) is maximized when \(p = 0\). analogously, if \(y = n\), \(L(p) = p^n\) and \(L(p)\) is maximized when \(p = 1\). if \(y = 1, 2, \ldots, n - 1\), then \(L(p) = p^{y}(1 - p)^{n - y}\) is zero when \(p = 0\) and \(p = 1\) and is continuous for values of \(p\) between 0 and 1. thus, for \(y = 1, 2, \ldots, n - 1\), we can find the value of \(p\) that maximizes \(L(p)\) by setting the derivative \(dL(p)/dp\) equal to 0 and solving for \(p\).
note that \(\ln L(p)\) is a monotonically increasing function of \(L(p)\). hence, both \(\ln L(p)\) and \(L(p)\) are maximized for the same value of \(p\). because \(L(p)\) is a product of functions of \(p\) and finding the derivative of products is tedious, it is easier to find the value of \(p\) that maximizes \(\ln L(p)\). we have
\[ \ln L(p) = y \ln p + (n - y) \ln(1 - p). \]
if \(y = 1, 2, \ldots, n - 1\), the derivative of \(\ln L(p)\) with respect to \(p\) is
\[ \frac{d \ln L(p)}{dp} = \frac{y}{p} - \frac{n - y}{1 - p}. \]
for \(y = 1, 2, \ldots, n - 1\), the value of \(p\) that maximizes (or minimizes) \(\ln L(p)\) is the solution of the equation
\[ \frac{y}{\hat{p}} - \frac{n - y}{1 - \hat{p}} = 0. \]
solving, we obtain the estimate \(\hat{p} = y / n\). you can easily verify that this solution occurs when \(\ln L(p)\) (and hence \(L(p)\)) achieves a maximum.
because \(L(p)\) is maximized at \(\hat{p} = 0\) when \(y = 0\), at \(\hat{p} = 1\) when \(y = n\), and at \(\hat{p} = y/n\) when \(y = 1, 2, \ldots, n - 1\), whatever the observed value of \(y\), \(L(p)\) is maximized when \(\hat{p} = y/n\).

the MLE, \(\hat{p} = \frac{1}{n}\sum_{i=1}^{n} Y_i\), is the fraction of successes in the total number of trials \(n\). hence, the MLE of \(p\) is actually the intuitive estimator for \(p\).
[cite:@wackerly_stats_2008 example 9.14]
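as a quick numerical sanity check of this example, the sketch below maximizes \(\ln L(p)\) over a grid and compares the maximizer with the closed form \(\hat{p} = y/n\). the sample and grid resolution are illustrative assumptions of mine, not part of the example.

#+begin_src python
import numpy as np

# hypothetical binomial sample: n = 20 Bernoulli trials, each y_i in {0, 1}
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=20)
n, total = y.size, int(y.sum())

# ln L(p) = y ln p + (n - y) ln(1 - p), evaluated on an interior grid of p values
p_grid = np.linspace(0.001, 0.999, 9999)
log_lik = total * np.log(p_grid) + (n - total) * np.log(1 - p_grid)

p_hat_numeric = p_grid[np.argmax(log_lik)]
p_hat_closed_form = total / n
print(p_hat_numeric, p_hat_closed_form)  # agree up to the grid resolution
#+end_src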
the method of maximum likelihood is, by far, the most popular technique for deriving estimators. recall that if \(X_1, \ldots, X_n\) are an iid sample from a population with pdf or pmf \(f(x \mid \theta_1, \ldots, \theta_k)\), the likelihood function is defined by
\[ L(\theta \mid \mathbf{x}) = L(\theta_1, \ldots, \theta_k \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_k). \]
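as a concrete (hypothetical) illustration of this definition, the sketch below builds the likelihood of an assumed exponential(\(\theta\)) model as a product of the individual densities; the log form is usually preferred in practice because the raw product underflows as \(n\) grows.

#+begin_src python
import numpy as np

# hypothetical iid sample from an assumed Exponential model with rate theta,
# i.e. f(x | theta) = theta * exp(-theta * x) for x > 0
x = np.array([0.8, 1.5, 0.3, 2.1, 0.9])

def likelihood(theta, x):
    """L(theta | x) = prod_i f(x_i | theta); fine for tiny samples,
    but the product underflows quickly as n grows."""
    return np.prod(theta * np.exp(-theta * x))

def log_likelihood(theta, x):
    """ln L(theta | x) = sum_i ln f(x_i | theta); numerically safer."""
    return np.sum(np.log(theta) - theta * x)

print(likelihood(1.0, x), log_likelihood(1.0, x))
#+end_src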
intuitively, the MLE is a reasonable choice for an estimator. The MLE is the parameter point for which the observed sample is most likely.
there are two inherent drawbacks associated with the general problem of finding the maximum of a function, and hence of maximum likelihood estimation. the first problem is that of actually finding the global maximum and verifying that, indeed, a global maximum has been found. in many cases this problem reduces to a simple differential calculus exercise but, sometimes even for common densities, difficulties do arise. the second problem is that of numerical sensitivity. that is, how sensitive is the estimate to small changes in the data? unfortunately, it is sometimes the case that a slightly different sample will produce a vastly different MLE, making its use suspect. we consider first the problem of finding MLEs.
if the likelihood function is differentiable (in \(\theta_i\)), possible candidates for the MLE are the values of \((\theta_1, \ldots, \theta_k)\) that solve
\[ \frac{\partial}{\partial \theta_i} L(\theta \mid \mathbf{x}) = 0, \qquad i = 1, \ldots, k. \]
note that the solutions to this equation are only possible candidates for the MLE since the first derivative being 0 is only a necessary condition for a maximum, not a sufficient condition. furthermore, the zeros of the first derivative locate only extreme points in the interior of the domain of a function. if the extrema occur on the boundary, the first derivative may not be 0. thus, the boundary must be checked separately for extrema.
points at which the first derivatives are 0 may be local or global minima, local or global maxima, or inflection points. our job is to find a global maximum.
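continuing the hypothetical exponential model sketched above, the following is a numerical counterpart of this recipe: find a zero of the first derivative of \(\ln L\) in the interior, then cross-check it against a direct maximization and the known closed form \(\hat{\theta} = n / \sum_i x_i\).

#+begin_src python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

x = np.array([0.8, 1.5, 0.3, 2.1, 0.9])  # same hypothetical sample as above

def score(theta):
    # d/dtheta of sum_i [ln theta - theta * x_i] = n / theta - sum_i x_i
    return x.size / theta - x.sum()

# candidate: zero of the first derivative inside an interior bracket
theta_candidate = brentq(score, 1e-6, 100.0)

# the second derivative, -n / theta^2, is negative everywhere, so the candidate
# is a maximum; a generic cross-check is to maximize ln L directly
res = minimize_scalar(lambda t: -np.sum(np.log(t) - t * x),
                      bounds=(1e-6, 100.0), method="bounded")

print(theta_candidate, res.x, x.size / x.sum())  # all agree with n / sum(x_i)
#+end_src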
for each sample point \(\mathbf{x}\), let \(\hat{\theta}(\mathbf{x})\) be a parameter value at which \(L(\theta \mid \mathbf{x})\) attains its maximum as a function of \(\theta\), with \(\mathbf{x}\) held fixed. a maximum likelihood estimator (MLE) of the parameter \(\theta\) based on a sample \(\mathbf{X}\) is \(\hat{\theta}(\mathbf{X})\).
[cite:;taken from @berger_inference_2002 definition 7.2.4]
notice that, by its construction, the range of the MLE coincides with the range of the parameter. we also use the abbreviation MLE to stand for maximum likelihood estimate when we are talking of the realized value of the estimator.[cite:;taken from @berger_inference_2002 definition 7.2.4]
let \(X_1, \ldots, X_n\) be iid \(\mathrm{N}(\theta, 1)\), and let \(L(\theta \mid \mathbf{x})\) denote the likelihood function. then
\[ L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-(x_i - \theta)^2 / 2} = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2}. \]
the equation \(\frac{d}{d\theta} L(\theta \mid \mathbf{x}) = 0\) reduces to
\[ \sum_{i=1}^{n} (x_i - \theta) = 0, \]
which has the solution \(\hat{\theta} = \bar{x}\). hence, \(\bar{x}\) is a candidate for the MLE. to verify that \(\bar{x}\) is, in fact, a global maximum of the likelihood function, we can use the following argument. first, note that \(\bar{x}\) is the only solution to \(\sum_{i=1}^{n}(x_i - \theta) = 0\); hence \(\bar{x}\) is the only zero of the first derivative. second, verify that
\[ \left.\frac{d^2}{d\theta^2} L(\theta \mid \mathbf{x})\right|_{\theta = \bar{x}} < 0. \]
thus, \(\bar{x}\) is the only extreme point in the interior and it is a maximum. to finally verify that \(\bar{x}\) is a global maximum, we must check the boundaries, \(\pm\infty\). by taking limits it is easy to establish that the likelihood is 0 at \(\pm\infty\). so \(\bar{x}\) is a global maximum and hence \(\bar{x}\) is the MLE. (actually, we can be a bit more clever and avoid checking \(\pm\infty\). since we established that \(\bar{x}\) is a unique interior extremum and is a maximum, there can be no maximum at \(\pm\infty\). if there were, then there would have to be an interior minimum, which contradicts uniqueness.)
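a short numerical check of this example, using a simulated (hypothetical) sample: maximizing the \(\mathrm{N}(\theta, 1)\) log-likelihood numerically should recover the sample mean.

#+begin_src python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# hypothetical iid N(theta, 1) sample with true theta = 2
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=50)

# minimize the negative log-likelihood of the N(theta, 1) model
neg_log_lik = lambda theta: -norm.logpdf(x, loc=theta, scale=1.0).sum()
res = minimize_scalar(neg_log_lik)

print(res.x, x.mean())  # the numerical maximizer agrees with the sample mean
#+end_src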
another way to find an MLE is to abandon differentiation and proceed with a direct maximization. this method is usually simpler algebraically, especially if the derivatives tend to get messy, but is sometimes harder to implement because there are no set rules to follow. one general technique is to find a global upper bound on the likelihood function and then establish that there is a unique point for which the upper bound is attained.
[cite:;taken from @berger_inference_2002 chapter 7.2 methods of finding estimators]
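as a stand-in illustration of my own (not the example the cited chapter works through), the uniform\((0, \theta)\) model below is a case where differentiation fails at the optimum and a direct argument works: the likelihood is zero for \(\theta < \max_i x_i\) and strictly decreasing above it, so \(\hat{\theta} = \max_i x_i\) is the unique point attaining the upper bound, without any calculus.

#+begin_src python
import numpy as np

# hypothetical iid Uniform(0, theta) sample
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 3.0, size=25)
n = x.size

def likelihood(theta):
    # L(theta | x) = theta^(-n) if theta >= max(x_i), else 0
    return theta ** (-n) if theta >= x.max() else 0.0

theta_hat = x.max()  # direct maximization: the unique point attaining the bound

# crude grid evaluation confirms the maximizer sits at max(x_i)
grid = np.linspace(0.5, 5.0, 2001)
print(theta_hat, grid[np.argmax([likelihood(t) for t in grid])])
#+end_src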