part-of-speech tagging

the task of part-of-speech tagging consists of taking a sequence of words and assigning each word a part of speech like NOUN or VERB, and the task of named entity recognition, assigning words or phrases tags like PERSON, LOCATION, or ORGANIZATION.
[cite:;taken from @nlp_jurafsky_2020 chapter 8 sequence labeling for parts of speech and named entities]

tagging as decoding

for any model, such as an HMM, that contains hidden variables, the task of determining the hidden-variable sequence corresponding to the sequence of observations is called decoding. more formally,
given as input an HMM \(\lambda = (A, B)\) and a sequence of observations \(O = o_1 o_2 \ldots o_T\), find the most probable sequence of states \(Q = q_1 q_2 \ldots q_T\).
for part-of-speech tagging, the goal of HMM decoding is to choose the tag sequence \(t_{1:n}\) that is most probable given the observation sequence of \(n\) words \(w_{1:n}\):
\[\hat{t}_{1:n} = \operatorname*{argmax}_{t_{1:n}} P(t_{1:n} \mid w_{1:n})\]
the way we'll do this in the HMM is to use bayes' rule to instead compute:
\[\hat{t}_{1:n} = \operatorname*{argmax}_{t_{1:n}} \frac{P(w_{1:n} \mid t_{1:n})\, P(t_{1:n})}{P(w_{1:n})}\]
furthermore, we simplify this expression by dropping the denominator \(P(w_{1:n})\), which is the same for every candidate tag sequence:
\[\hat{t}_{1:n} = \operatorname*{argmax}_{t_{1:n}} P(w_{1:n} \mid t_{1:n})\, P(t_{1:n})\]
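dropping the denominator is safe because \(P(w_{1:n})\) is a positive constant with respect to the tag sequence being maximized, so dividing every candidate's score by it cannot change which candidate wins the argmax. a tiny illustration (the scores and the constant here are made up, not from the chapter):

```python
# made-up joint scores P(w, t) for three candidate tag sequences
scores = {"t1": 0.012, "t2": 0.030, "t3": 0.006}
z = 0.048  # stand-in for the constant denominator P(w_{1:n})

# argmax over the joint vs. argmax over the posterior (joint / constant)
best_joint = max(scores, key=lambda t: scores[t])
best_posterior = max(scores, key=lambda t: scores[t] / z)
# both criteria pick the same sequence
```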
HMM taggers make two further simplifying assumptions. the first is that the probability of a word appearing depends only on its own tag and is independent of neighboring words and tags:
\[P(w_{1:n} \mid t_{1:n}) \approx \prod_{i=1}^{n} P(w_i \mid t_i)\]
the second assumption, the bigram assumption, is that the probability of a tag depends only on the previous tag, rather than on the entire tag sequence:
\[P(t_{1:n}) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})\]
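the emission probabilities \(P(w_i \mid t_i)\) and transition probabilities \(P(t_i \mid t_{i-1})\) in these two assumptions are typically estimated by maximum likelihood, i.e. relative-frequency counts over a tagged training corpus. a minimal sketch, assuming a toy two-sentence corpus of (word, tag) pairs (the corpus and the `<s>` start pseudo-tag are illustrative assumptions, not from the chapter):

```python
from collections import defaultdict

# toy tagged corpus: each sentence is a list of (word, tag) pairs
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

trans = defaultdict(lambda: defaultdict(int))  # C(t_{i-1}, t_i)
emit = defaultdict(lambda: defaultdict(int))   # C(t_i, w_i)
tag_count = defaultdict(int)                   # C(t_i)

for sent in corpus:
    prev = "<s>"  # sentence-start pseudo-tag
    for word, tag in sent:
        trans[prev][tag] += 1
        emit[tag][word] += 1
        tag_count[tag] += 1
        prev = tag

def p_trans(prev, tag):
    # P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    total = sum(trans[prev].values())
    return trans[prev][tag] / total if total else 0.0

def p_emit(tag, word):
    # P(w_i | t_i) = C(t_i, w_i) / C(t_i)
    return emit[tag][word] / tag_count[tag] if tag_count[tag] else 0.0
```

for instance, both sentences start with a determiner, so `p_trans("<s>", "DET")` is 1.0, while `p_emit("NOUN", "dog")` is 0.5 because "dog" accounts for one of the two NOUN tokens.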
plugging these two simplifying assumptions into the expression for the most probable tag sequence results in the following equation for a bigram tagger:
\[\hat{t}_{1:n} = \operatorname*{argmax}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})\]
the two parts of this equation correspond neatly to the HMM observation likelihoods (the emission probabilities B) and the transition probabilities (the matrix A).
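this argmax over all tag sequences is computed efficiently with the Viterbi dynamic-programming algorithm rather than by enumeration. a minimal sketch, assuming `p_trans` and `p_emit` probability functions and a `<s>` start pseudo-tag are supplied by the caller (these names and the tagset are illustrative assumptions):

```python
def viterbi(words, tags, p_trans, p_emit, start="<s>"):
    # v[i][t]: probability of the best tag path ending in tag t at position i
    v = [{t: p_trans(start, t) * p_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            # best previous tag for extending a path into tag t
            best_prev = max(tags, key=lambda p: v[i - 1][p] * p_trans(p, t))
            v[i][t] = v[i - 1][best_prev] * p_trans(best_prev, t) * p_emit(t, words[i])
            back[i][t] = best_prev
    # follow backpointers from the best final tag
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

a production tagger would work in log space to avoid underflow on long sentences and would smooth the probabilities for unseen words; this sketch only mirrors the product form of the bigram equation above.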
[cite:;taken from @nlp_jurafsky_2020 chapter 8.4.4 HMM tagging as decoding]