Alternate Educational Experience for March 19, 1998

Griebach Normal Form

In class, we discussed the fact the CFGs (Context Free Grammars) were very unrestrained, and would be difficult to prove things about. Since we love to prove things, we must first prove that we can restrict the structure of our grammars--without losing any expressiveness (i.e. we can generate just as many languages with the restricted grammars).

We have already consider in class the restriction of the CFGs to Chomsky Normal Form (CNF), that every production in the set of production P for the grammar will be in one of the following forms:

A -> a (where "A" is a member of V, and "a" is a member of "T")
A -> BC (where A,B,C all members of V, but necessarily distinct)

( and if we haven't, for some reason, just assume we have anyway, we'll circle back and cover it later).

But, this is two different forms, which would require two separate cases to be consider when we do our proofs ;->, surely it would be better if we could restrict the grammars to just one case!

Well, it turns out we can; and its called Griebach Normal Form (GNF). It says that all grammars can reduced to productions of only the following form:

A -> aw ( where "A" is a member of V, "a" a member of "T", and w is a member of V*)

Notice that the bunch of variables at the end of the right-hand side (RHS) may be empty, so A->a is a perfectly valid production.

So, how do we do that? To do it, we will need to introduce two little procedures:

1 A-production tidying

Consider an arbitrary production of the form: A->w₁Bw₂. Find all of the B-productions: B -> B₁ | B₂ | ..... | B_n for that B. It is not surprising to note that we can remove the A-production noted above, and replace it with the following set of productions: A -> w₁B₁w₂ | w₁B₂w₂ | .... | w₁B_nw₂ Of course, we can prove it as well:

Call G the old grammar and G1 the new grammar: show that L(G) == L(G1)

<=
this is really pretty simple, because any derivation in G1 is just a series of productions. Each production is either in G already (since we just borrowed them from there directly) or is of the form A -> w₁B_iw₂. If so, we can replace that production in the derivation with A -> w₁Bw₂ and B -> B_i. So clearly, any derivation in G1 is also a derivation in G.
=>
Since all of the productions in G have duplicates in G1, except for A -> w₁Bw₂, we only have to consider the cases where this production is used.
Of course, if it is used (and assuming we really derive a string of terminals), then B must eventually derive something (via one of the B-productions), like this: B -> B_i. Now, just reorder these two productions (so that the B-production immediately follows the A-production), remove these two productions from the derivation, and use the step from G1: A -> w₁B_iw₂ instead. Therefore, we have built a grammar in G1 for any grammar in G.

N.B. We haven't given you any good reason to use this first rule yet! All we said is that you can pick out an arbitrary A-production from a random grammar--and that if you apply this transformation to the grammar, you will get back an equivalent grammar. We'll see why this might be useful soon.

2. Self Left-Most Productions

For a particular variable A, consider all of the A-productions of the following form:

A -> Aw,

That is, all those productions where the variable on the left-hand side (LHS) is also the leftmost variable on the RHS (w can be any mixture of terminals and variables--although when we use this rule, we will only have variables there). Any particular variable may have 0, 1, or lots of such productions:

A -> Aw₁ | Aw₂ | ... | A w_n

Also, make note of all of the other A-productions for this variable:

A -> B₁ | B₂ | ... | B_m

Add a fresh (new, unused) variable to the set V, get rid of all of the A-productions (with A as the leftmost item or not), and add in the following productions:

A -> B_i (for all i < m, NB in one sense, this just means we kept some of the A-productions....)
A -> B_iB (for all i < m)
B -> w_{i (for all i < n)}
B -> w_{iB (for all i < n)}

Hmmm, we don't have much better reason to know when or why to use this (except that you might note that is moving left-recursion [in the operation of the grammar] in favor of right-recursion...) rule than we did the last rule. But now, we'll take care of that:

GNF

Put the grammar into CNF (as above--at the top)
Pick an arbitrary ordering for the variables. The easiest thing to do is rename the variables in V (for example, make their names now be A₁, A₂.....), but all that matters is that there is an ordering, and you know what it is. (NB Some orderings are easier to work with than others, but any ordering will work.)
Starting with the lowest numbered productions, there are four possibilities:
1. A_i -> aw (ok!)
2. A_i -> A_jw (i < j: ok!)
3. A_i -> A_jw (i = j: save it)
4. A_i -> A_jw (j < i: NEXT STEP)

Whenever j < i, (case 4) simply apply the first algorithm (where the w_i is the empty string. Do this first for A₁, then for A₂, and so on, till you reach A_n. Note that after applying this algorithm for a particular variable, all of the productions for A₁ up to A_i will be only be the forms shown by cases 1,2,3.

Then (again working your way up the numbering scheme) apply the second algorithm to all of the productions of the form in case 3. This leaves all productions to have RHS which either begin with a terminal, or with a higher numbered variable.

But, we want to always have a variable at the beginning of any RHS, so (this time working from the largest number variables, downward) replace any production of the form A_n-1 -> A_nw by combining it will all of the A_n productions (note that since A_n is the highest numbered variable, all of its RHS must start with terminals). Now A_n-1 will have the same property, do it for A_n-2, and continue down to A₁.

Finally we have to take care of the B-productions, but notice that, due to the structure of algorithm 2, the RHS of any B-production must start with an A-variable. And, we just made it so that all of the A-productions have terminals as the leftmost symbol of their RHS. So, we can just substitute the A-productions in to the B-productions, and now every production will be of the appropriate form.


Consider the following example:

Here's the grammar:

S -> AA | 0
A -> SS | 1

first rename the variables:

A1 -> A2 A2 | 0
A2 -> A1 A1 | 1

three of the four productions are just fine,
only A2 -> A1 A1 is a problem.

Apply algorithm 1:

A2 -> A2 A2 A1 | 0 A1

So, now we have still have one problem production:

A2 -> A2 A2 A1

apply algorithm 2:

A2 -> 1 | 0 A1 | 1 B | 0 A1 B

B -> A2 A1 | A2 A1 B

so now our grammar looks like:

A1 -> A2 A2 | 0

A2 -> 1 | 0 A1 | 1 B | 0 A1 B

B -> A2 A1 | A2 A1 B

Now we must fix A1, so that is only starts with terminals:

A1 -> 1 A2 | 0 A1 A2 | 1 B A2 | 0 A1 B A2 | 0

then we must B in a similar fashion (replacing initial occurences of A2)

B -> 1 A1 | 0 A1 A1 | 1 B A1 | 0 A1 B A1 |
     1 A1 B | 0 A1 A1 B | 1 B A1 B | 0 A1 B A1 B

and now we have the following grammar:

A1 -> 1 A2 | 0 A1 A2 | 1 B A2 | 0 A1 B A2 | 0

A2 -> 1 | 0 A1 | 1 B | 0 A1 B
 
B -> 1 A1 | 0 A1 A1 | 1 B A1 | 0 A1 B A1 |
     1 A1 B | 0 A1 A1 B | 1 B A1 B | 0 A1 B A1 B

Which is in GNF.

ASSIGNMENT To proved you learned something:

Hand in for class on Tuesday, March 31, 1998:

Take the grammar S -> 0 | 1 | 0S1

1) put it into CNF

2) put it into GNF

Then take the grammar (already in CNF)
A -> BA | a
B -> AB | b

3) put it into GNF.