context-free grammar
a context-free grammar is a 4-tuple
, where
if -
is a finite set called the variables,
-
is a finite set, disjoint from
, called the terminals,
-
is a finite set of rules, with each rule being a variable and a string of variables and terminals, and
-
is the start variable.
the following is an example of a context-free grammar, which we call
.
a grammar consists of a collection of substitution rules, also called productions. each rule appears as a line in the grammar, comprising a symbol and a string separated by an arrow. the symbol is called a variable. the string consists of variables and other symbols called terminals. the variable symbols often are represented by capital letters. the terminals are analogous to the input alphabet and often are represented by lowercase letters, numbers, or special symbols. one variable is designated as the start variable. it usually occurs on the left-hand side of the topmost rule. for example, grammar
contains three rules.
’s variables are
and
, where
is the start variable. its terminals are 0, 1, and #. you use a grammar to describe a language by generating each string of that language in the following manner.
generates the string 000111 the sequence of substitutions to obtain a string is called a derivation. a derivation of string 000111 in grammar
is
you may also represent the same information pictorially with a parse tree. an example of a parse tree is shown in fig-cfg-parse-tree-1.
all strings generated in this way constitute the language of the grammar. we write
for the language of grammar
. some experimentation with the grammar
shows us that
is
. any language that can be generated by some context-free grammar is called a context-free language (CFL). for convenience when presenting a context-free grammar, we abbreviate several rules with the same left-hand variable, such as
and
, into a single line
, using the symbol
as an "or".
the following is a second example of a context-free grammar, called
, which describes a fragment of the English language.
grammar
has 10 variables (the capitalized grammatical terms written inside brackets); 27 terminals (the standard English alphabet plus a space character); and 18 rules. strings in
include:
each of these strings has a derivation in grammar
. the following is a derivation of the first string on this list.

- write down the start variable. It is the variable on the left-hand side of the top rule, unless specified otherwise.
- find a variable that is written down and a rule that starts with that variable. replace the written down variable with the right-hand side of that rule.
- repeat step 2 until no variables remain.
the following is a second example of a context-free grammar, called
a boy sees the boy sees a flower a girl with a flower likes the boy
consider grammar
. the set of rules,
, is
this grammar generates strings such as
,
, and
. you can see more easily what this language is if you think of a as a left parenthesis "(" and
as a right parenthesis ")". viewed in this way,
is the language of all strings of properly nested parentheses. observe that the right-hand side of a rule may be the empty string
.
[cite:@sipser_comp_2012 example 2.3]
[cite:@sipser_comp_2012 chapter 2 context-free languages][cite:@sipser_comp_2012 example 2.3]
build cfg's for each of the following languages:

[cite:@sipser_comp_2012 chapter 2 context-free languages, designing context-free grammars]
-
.
-
.
-
.
or
- the empty language.
- all the words over
in which the string
appears.
-
[cite:@sipser_comp_2012 chapter 2 context-free ambiguity]