hidden markov model
a hidden markov model (HMM) is a probabilistic sequence model: given a sequence of units (words, letters, morphemes, sentences, whatever), it computes a probability distribution over possible sequences of labels and chooses the best label sequence.
a hidden markov model allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model.
a first-order hidden markov model instantiates two simplifying assumptions. first, as with a first-order markov chain, the probability of a particular state depends only on the previous state (markov assumption): $P(q_i \mid q_1 \dots q_{i-1}) = P(q_i \mid q_{i-1})$. second, the probability of an output observation $o_i$ depends only on the state that produced the observation, $q_i$, and not on any other states or any other observations (output independence): $P(o_i \mid q_1, \dots, q_T, o_1, \dots, o_T) = P(o_i \mid q_i)$. (both assumptions are multiplied out in the joint-probability sketch after the component list below.)
[cite:;taken from @nlp_jurafsky_2020 chapter 8.4 part-of-speech tagging]
an HMM is specified by the following components:
- $Q = q_1 q_2 \dots q_N$ : a set of $N$ states,
- $A = a_{11} \dots a_{ij} \dots a_{NN}$ : a transition probability matrix $A$, each $a_{ij}$ representing the probability of moving from state $i$ to state $j$, \shortfor[such that]{s.t.} $\sum_{j=1}^{N} a_{ij} = 1 \;\; \forall i$,
- $O = o_1 o_2 \dots o_T$ : a sequence of $T$ observations, each one drawn from a vocabulary $V = v_1, v_2, \dots, v_V$,
- $B = b_i(o_t)$ : a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation $o_t$ being generated from a state $q_i$,
- $\pi = \pi_1, \pi_2, \dots, \pi_N$ : an initial probability distribution over states. $\pi_i$ is the probability that the markov chain will start in state $i$. some states $j$ may have $\pi_j = 0$, meaning that they cannot be initial states. also, $\sum_{i=1}^{N} \pi_i = 1$.
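
a minimal sketch of these components in code (the tags, words, and numbers are invented for illustration, not from the book). the two assumptions above mean the joint probability of a state sequence $Q$ and an observation sequence $O$ factorizes as $P(O, Q) = \pi_{q_1} b_{q_1}(o_1) \prod_{i=2}^{T} a_{q_{i-1} q_i} b_{q_i}(o_i)$, which is what ~joint_prob~ computes:

#+begin_src python
import numpy as np

# toy HMM with two hidden states (tags) and a three-word vocabulary;
# all probabilities here are made up for illustration
states = ["NOUN", "VERB"]            # Q: set of N states
vocab  = ["fish", "sleep", "dogs"]   # V: output vocabulary

# A[i, j]: probability of moving from state i to state j; rows sum to 1
A = np.array([[0.3, 0.7],
              [0.8, 0.2]])

# B[i, k]: probability of emitting word k from state i; rows sum to 1
B = np.array([[0.5, 0.1, 0.4],
              [0.4, 0.5, 0.1]])

# pi[i]: probability that the chain starts in state i; sums to 1
pi = np.array([0.6, 0.4])

def joint_prob(obs, path):
    """P(O, Q) under the markov and output-independence assumptions:
    pi[q1] * b[q1](o1) * prod over i>=2 of a[q_{i-1}, q_i] * b[q_i](o_i)."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for i in range(1, len(obs)):
        p *= A[path[i - 1], path[i]] * B[path[i], obs[i]]
    return p

obs  = [vocab.index(w) for w in ["dogs", "sleep"]]   # observation indices
path = [states.index(t) for t in ["NOUN", "VERB"]]   # candidate tag sequence
print(joint_prob(obs, path))  # 0.6*0.4 * 0.7*0.5 ≈ 0.084
#+end_src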
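
the "chooses the best label sequence" part of the definition is decoding, which chapter 8 solves with the viterbi algorithm. a compact sketch against the toy model above (again illustrative, not the book's pseudocode):

#+begin_src python
def viterbi(obs):
    """most probable state path for the observation index sequence obs,
    by dynamic programming over best-prefix probabilities."""
    N, T = len(states), len(obs)
    v = np.zeros((N, T))               # v[s, t]: best prob of a path ending in s at t
    back = np.zeros((N, T), dtype=int) # backpointers to the best previous state
    v[:, 0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for s in range(N):
            scores = v[:, t - 1] * A[:, s] * B[s, obs[t]]
            back[s, t] = np.argmax(scores)
            v[s, t] = scores[back[s, t]]
    # follow backpointers from the best final state
    best = [int(np.argmax(v[:, T - 1]))]
    for t in range(T - 1, 0, -1):
        best.append(int(back[best[-1], t]))
    return [states[s] for s in reversed(best)]

print(viterbi(obs))  # ['NOUN', 'VERB'] for "dogs sleep"
#+end_src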