We have already consider in class the restriction of the CFGs to Chomsky Normal Form (CNF), that every production in the set of production P for the grammar will be in one of the following forms:
But, this is two different forms, which would require two separate cases to be consider when we do our proofs ;->, surely it would be better if we could restrict the grammars to just one case!
Well, it turns out we can; and its called Griebach Normal Form (GNF). It says that all grammars can reduced to productions of only the following form:
So, how do we do that? To do it, we will need to introduce two little procedures:
Call G the old grammar and G1 the new grammar: show that L(G) == L(G1)
this is really pretty simple, because any derivation in G1 is just a series of productions. Each production is either in G already (since we just borrowed them from there directly) or is of the form A -> w1Biw2. If so, we can replace that production in the derivation with A -> w1Bw2 and B -> Bi. So clearly, any derivation in G1 is also a derivation in G.
Since all of the productions in G have duplicates in G1, except for A -> w1Bw2, we only have to consider the cases where this production is used.
Of course, if it is used (and assuming we really derive a string of terminals), then B must eventually derive something (via one of the B-productions), like this: B -> Bi. Now, just reorder these two productions (so that the B-production immediately follows the A-production), remove these two productions from the derivation, and use the step from G1: A -> w1Biw2 instead. Therefore, we have built a grammar in G1 for any grammar in G.
A -> Aw,
That is, all those productions where the variable on the left-hand side (LHS) is also the leftmost variable on the RHS (w can be any mixture of terminals and variables--although when we use this rule, we will only have variables there). Any particular variable may have 0, 1, or lots of such productions:
A -> Aw1 | Aw2 | ... | A wn
Also, make note of all of the other A-productions for this variable:
A -> B1 | B2 | ... | Bm
Add a fresh (new, unused) variable to the set V, get rid of all of the A-productions (with A as the leftmost item or not), and add in the following productions:
Then (again working your way up the numbering scheme) apply the second algorithm to all of the productions of the form in case 3. This leaves all productions to have RHS which either begin with a terminal, or with a higher numbered variable.
But, we want to always have a variable at the beginning of any RHS, so (this time working from the largest number variables, downward) replace any production of the form An-1 -> Anw by combining it will all of the An productions (note that since An is the highest numbered variable, all of its RHS must start with terminals). Now An-1 will have the same property, do it for An-2, and continue down to A1.
Finally we have to take care of the B-productions, but notice that, due to the structure of algorithm 2, the RHS of any B-production must start with an A-variable. And, we just made it so that all of the A-productions have terminals as the leftmost symbol of their RHS. So, we can just substitute the A-productions in to the B-productions, and now every production will be of the appropriate form.
Consider the following example: Here's the grammar: S -> AA | 0 A -> SS | 1 first rename the variables: A1 -> A2 A2 | 0 A2 -> A1 A1 | 1 three of the four productions are just fine, only A2 -> A1 A1 is a problem. Apply algorithm 1: A2 -> A2 A2 A1 | 0 A1 So, now we have still have one problem production: A2 -> A2 A2 A1 apply algorithm 2: A2 -> 1 | 0 A1 | 1 B | 0 A1 B B -> A2 A1 | A2 A1 B so now our grammar looks like: A1 -> A2 A2 | 0 A2 -> 1 | 0 A1 | 1 B | 0 A1 B B -> A2 A1 | A2 A1 B Now we must fix A1, so that is only starts with terminals: A1 -> 1 A2 | 0 A1 A2 | 1 B A2 | 0 A1 B A2 | 0 then we must B in a similar fashion (replacing initial occurences of A2) B -> 1 A1 | 0 A1 A1 | 1 B A1 | 0 A1 B A1 | 1 A1 B | 0 A1 A1 B | 1 B A1 B | 0 A1 B A1 B and now we have the following grammar: A1 -> 1 A2 | 0 A1 A2 | 1 B A2 | 0 A1 B A2 | 0 A2 -> 1 | 0 A1 | 1 B | 0 A1 B B -> 1 A1 | 0 A1 A1 | 1 B A1 | 0 A1 B A1 | 1 A1 B | 0 A1 A1 B | 1 B A1 B | 0 A1 B A1 B Which is in GNF. To proved you learned something: Hand in for class on Tuesday, March 31, 1998: Take the grammar S -> 0 | 1 | 0S1 1) put it into CNF 2) put it into GNF Then take the grammar (already in CNF) A -> BA | a B -> AB | b 3) put it into GNF.