context-free grammar

a context-free grammar is a 4-tuple , where
  1. is a finite set called the variables,
  2. is a finite set, disjoint from , called the terminals,
  3. is a finite set of rules, with each rule being a variable and a string of variables and terminals, and
  4. is the start variable.
[cite:;from @sipser_comp_2012 definition 2.2]
if , and are strings of variables and terminals, and is a rule of the grammar, we say that yields , written . say that derives , written , if or if a sequence exists for and
the language of the grammar is .
the following is an example of a context-free grammar, which we call .
a grammar consists of a collection of substitution rules, also called productions. each rule appears as a line in the grammar, comprising a symbol and a string separated by an arrow. the symbol is called a variable. the string consists of variables and other symbols called terminals. the variable symbols often are represented by capital letters. the terminals are analogous to the input alphabet and often are represented by lowercase letters, numbers, or special symbols. one variable is designated as the start variable. it usually occurs on the left-hand side of the topmost rule. for example, grammar contains three rules. ’s variables are and , where is the start variable. its terminals are 0, 1, and #. you use a grammar to describe a language by generating each string of that language in the following manner.
  1. write down the start variable. It is the variable on the left-hand side of the top rule, unless specified otherwise.
  2. find a variable that is written down and a rule that starts with that variable. replace the written down variable with the right-hand side of that rule.
  3. repeat step 2 until no variables remain.
for example, grammar generates the string 000111 the sequence of substitutions to obtain a string is called a derivation. a derivation of string 000111 in grammar is
you may also represent the same information pictorially with a parse tree. an example of a parse tree is shown in fig-cfg-parse-tree-1.
all strings generated in this way constitute the language of the grammar. we write for the language of grammar . some experimentation with the grammar shows us that is . any language that can be generated by some context-free grammar is called a context-free language (CFL). for convenience when presenting a context-free grammar, we abbreviate several rules with the same left-hand variable, such as and , into a single line , using the symbol as an "or".
the following is a second example of a context-free grammar, called , which describes a fragment of the English language.
grammar has 10 variables (the capitalized grammatical terms written inside brackets); 27 terminals (the standard English alphabet plus a space character); and 18 rules. strings in include:
a boy sees
the boy sees a flower
a girl with a flower likes the boy
each of these strings has a derivation in grammar . the following is a derivation of the first string on this list.
consider grammar . the set of rules, , is
this grammar generates strings such as , , and . you can see more easily what this language is if you think of a as a left parenthesis "(" and as a right parenthesis ")". viewed in this way, is the language of all strings of properly nested parentheses. observe that the right-hand side of a rule may be the empty string .
[cite:@sipser_comp_2012 example 2.3]
[cite:@sipser_comp_2012 chapter 2 context-free languages]
build cfg's for each of the following languages:
  • .
  • .
  • . or
  • the empty language.
  • all the words over in which the string appears.
how about
[cite:@sipser_comp_2012 chapter 2 context-free languages, designing context-free grammars]
[cite:@sipser_comp_2012 chapter 2 context-free ambiguity]