<- previous index next ->
Chomsky Normal Form is used by the CYK algorithm to determine
if a string is accepted by a Context Free Grammar.
Step 4) in the overall grammar "simplification" process is
to convert the grammar to Chomsky Normal Form.
Productions can be one of two formats A -> a or A -> BC
The right side of the production is either exactly
one terminal symbol or exactly two variables.
The grammars must have the "simplification" steps 1), 2) and 3) out
of the way, that is 1) No useless variables, 2) No nullable variables and
3) no unit productions.
Step 4) of "simplification" is the following algorithm:
'length' refers to the number of variables plus terminal
symbols on the right side of a production.
Loop through the productions
For each production with length greater than 1 do
Replace each terminal symbol with a new variable and
add a production new variable -> terminal symbol.
Loop through the productions
For each production with length grater than 2 do
Replace two rightmost variables with a new variable and
add a production new variable -> two rightmost variables.
(Repeat - either on a production or loop until no replacements.)
Now the grammar, as represented by the productions, is in Chomsky
Normal Form. proceed with CYK.
An optimization is possible but not required, for any two productions
with the same right side, delete the second production and
replace all occurrences of the second productions left variable
with the left variable of the first production in all productions.
Example grammar:
G = (V, T, P, S) V={S,A} T={a,b} S=S
S -> aAS
S -> a
A -> SbA
A -> SS
A -> ba
First loop through productions (Check n>1)
S -> aAS becomes S -> BAS (B is the next unused variable name)
B -> a
S -> a stays S -> a
A -> SbA becomes A -> SCA (C is the next unused variable name)
C -> b
A -> SS stays A -> SS
A -> ba becomes A -> DE
D -> b
E -> a
Second loop through productions (Check n>2)
S -> BAS becomes S -> BF (F is the next unused variable)
F -> AS
B -> a stays B -> a
S -> a stays S -> a
A -> SCA becomes A -> SG
G -> CA
C -> b stays C -> b
A -> SS stays A -> SS
A -> DE stays A -> DE
D -> b stays D -> b
E -> a stays E -> a
Optimization is possible, B -> a, S -> a, E -> a can be replaced
by the single production S -> a (just to keep 'S')
and all occurrences of 'B' and 'E' get replaced by 'S'.
Similarly D -> b can be deleted, keeping the C -> b production and
substituting 'C' for 'D'.
Giving the reduced Chomsky Normal Form:
S -> SF
F -> AS
S -> a
A -> CG
G -> CA
C -> b
A -> SS
A -> CS
<- previous index next ->