part-of-speech tagging
the task of part-of-speech tagging consists of taking a sequence of words and assigning each word a part of speech like NOUN or VERB, and the task of named entity recognition, assigning words or phrases tags like PERSON, LOCATION, or ORGANIZATION.
[cite:;taken from @nlp_jurafsky_2020 chapter 8 sequence labeling for parts of speech and named entities]
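as a quick concrete illustration (my own, not the chapter's), an off-the-shelf tagger can be run on the chapter's example sentence "Janet will back the bill"; the sketch below uses NLTK's ~pos_tag~, and the downloaded resource name is an NLTK detail that can vary across versions:

#+begin_src python
# a minimal sketch using NLTK's default English POS tagger
# (an averaged perceptron, not the HMM discussed in this note)
import nltk

nltk.download("averaged_perceptron_tagger")  # tagger model; name varies by NLTK version

tokens = "Janet will back the bill".split()
print(nltk.pos_tag(tokens))
# Penn Treebank tags, e.g.:
# [('Janet', 'NNP'), ('will', 'MD'), ('back', 'VB'), ('the', 'DT'), ('bill', 'NN')]
# note that 'back' is ambiguous (verb/adverb/noun); resolving such
# ambiguities is exactly what a tagger is for
#+end_src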
tagging as decoding
for any model, such as an HMM, that contains hidden variables, the task of determining the hidden-variable sequence corresponding to the sequence of observations is called decoding (the general statement of the problem is quoted at the end of this note). more formally, for part-of-speech tagging the goal of HMM decoding is to choose the tag sequence $t_1^n$ that is most probable given the observation sequence of $n$ words $w_1^n$:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$$
the way we'll do this in the HMM is to use bayes' rule to instead compute:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)}$$
furthermore, since the denominator $P(w_1^n)$ is the same for every candidate tag sequence and so cannot change the argmax, we simplify the previous equation by dropping it:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)$$
HMM taggers make two further simplifying assumptions. the first is that the probability of a word appearing depends only on its own tag and is independent of neighboring words and tags:

$$P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$
the second assumption, the bigram assumption, is that the probability of a tag is dependent only on the previous tag, rather than the entire tag sequence:

$$P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$$
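both factors can be estimated by maximum likelihood as ratios of counts from a tagged corpus, $P(t_i \mid t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})$ and $P(w_i \mid t_i) = C(t_i, w_i) / C(t_i)$. a minimal sketch of that estimation; the two-sentence toy corpus and the ~<s>~ start pseudo-tag are illustrative assumptions of mine:

#+begin_src python
# MLE estimates of the two HMM factors from a toy tagged corpus:
# P(word | tag) = C(tag, word)/C(tag), P(tag | prev) = C(prev, tag)/C(prev)
from collections import Counter

corpus = [  # each sentence is a list of (word, tag) pairs
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

emit = Counter()       # C(tag, word)
trans = Counter()      # C(prev_tag, tag)
tag_count = Counter()  # C(tag)

for sent in corpus:
    prev = "<s>"  # start-of-sentence pseudo-tag
    tag_count[prev] += 1
    for word, tag in sent:
        emit[(tag, word)] += 1
        trans[(prev, tag)] += 1
        tag_count[tag] += 1
        prev = tag

def p_word_given_tag(word, tag):
    return emit[(tag, word)] / tag_count[tag]

def p_tag_given_prev(tag, prev):
    return trans[(prev, tag)] / tag_count[prev]

print(p_word_given_tag("dog", "NN"))  # 0.5
print(p_tag_given_prev("NN", "DT"))   # 1.0
#+end_src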
plugging these two simplifying assumptions into the simplified bayes equation above results in the following equation for the most probable tag sequence from a bigram tagger:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
the two parts of this product correspond neatly to the HMM observation likelihood $B$, $P(w_i \mid t_i)$, and the HMM transition probability $A$, $P(t_i \mid t_{i-1})$.
[cite:;taken from @nlp_jurafsky_2020 chapter 8.4.4 HMM tagging as decoding]
decoding: given as input an HMM $\lambda = (A, B)$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, find the most probable sequence of states $Q = q_1 q_2 q_3 \ldots q_T$. for part-of-speech tagging, the hidden states are the tags and the observations are the words, so decoding the HMM yields the most probable tag sequence. [cite:;taken from @nlp_jurafsky_2020 chapter 8.4.4 HMM tagging as decoding]
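the argmax over all possible tag sequences is computed efficiently with the viterbi algorithm (the subject of the chapter's next section). a minimal sketch of viterbi decoding for the bigram tagger above, assuming the probability tables are nested dicts keyed as ~trans[prev_tag][tag]~ and ~emit[tag][word]~, with an entry for every tag and for a ~<s>~ start symbol (all conventions of mine, not the book's):

#+begin_src python
# Viterbi decoding for the bigram HMM tagger:
# argmax over tag sequences of prod_i P(w_i | t_i) * P(t_i | t_{i-1})
def viterbi(words, tags, trans, emit):
    # best[i][t]: probability of the best tag sequence for words[:i+1]
    # ending in tag t; backptr[i][t]: the tag at i-1 on that best path
    best = [{t: trans["<s>"].get(t, 0.0) * emit[t].get(words[0], 0.0)
             for t in tags}]
    backptr = [{}]
    for i, word in enumerate(words[1:], start=1):
        best.append({})
        backptr.append({})
        for t in tags:
            # extend each previous tag's best path; keep the max
            p_best = max(tags, key=lambda p: best[i - 1][p] * trans[p].get(t, 0.0))
            best[i][t] = (best[i - 1][p_best] * trans[p_best].get(t, 0.0)
                          * emit[t].get(word, 0.0))
            backptr[i][t] = p_best
    # follow backpointers from the most probable final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    return list(reversed(path))

# usage with tables in nested-dict form (e.g. derived from the counts above):
# trans = {"<s>": {"DT": 1.0}, "DT": {"NN": 1.0}, "NN": {"VBZ": 1.0}, "VBZ": {}}
# emit = {"DT": {"the": 1.0}, "NN": {"dog": 0.5, "cat": 0.5},
#         "VBZ": {"barks": 0.5, "sleeps": 0.5}}
# viterbi(["the", "dog", "barks"], ["DT", "NN", "VBZ"], trans, emit)
# -> ["DT", "NN", "VBZ"]
#+end_src

(a real implementation works with log probabilities, summing instead of multiplying, to avoid floating-point underflow on long sentences.)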