web.cs.city.ac.uk
Topic 2: Automata and Regular Expressions (Kleene's Theorem)

Regular Expressions (RE)
Regular Languages (RL)
Kleene's Theorem
Applying Kleene's Theorem
Mealy/Moore Machines

Regular Expressions

The class of all languages recognisable by Finite State Automata is known as the regular (or type3) languages.
Regular languages may be described by regular expressions (RE).

Definition

Example

The RE for the language BI of Binary Integers is

BI = 0 + 1(0 + 1)*

which we may read as:
Every word in the language BI is either

the symbol 0, all by itself, or
the symbol 1 followed by any sequence (including the null sequence) of 0s and 1s, in any order.

Self Assessment Question

Write down a RE for the language of Binary Fractions

Note that 'intermediate variables' may be freely introduced into the equations of REs, as in the following

Example

The RE for the language DI of Decimal Integers is

DI = 0 + N(N + 0)*
N = 1+2+3+4+5+6+7+8+9

which may be read as:
Every word in the language DI is either

the symbol 0, all by itself, or
a word of the language N followed by any sequence (including the null sequence) of words from the language N or 0s, in any order, where
every word in the language N is either 1, or 2, or 3, ... or 9.

Back to Topic 2

Regular Languages

We formalise this notion in the following

Definition

Self Assessment Questions

Write down a RE for the language of Binary Fractions
Write down a RE for the language of Decimal Fractions

The above definition leaves open some important

Questions

It is possible for different regular expressions to define the same language, but can we determine whether or not any two regular expressions are equivalent in this sense?
We have seen that every regular expression is associated with some language, but is it also true that every language is associated with some regular expression?

These questions were answered by the logician Stephen C. Kleene.
Back to Topic 2

Kleene's Theorem

Any language that can be defined by

regular expressions (RE), or
by finite state automata (FSA), or
by transition graphs (TG),

can be defined by all three methods.

Proof

The proof is in three parts:

Every language that can be defined by a FSA can also be defined by a TG.
Every language that can be defined by a RE can also be defined by a FSA.
Every language that can be defined by a TG can also be defined by a RE.

Part 1: TG=FSM

This is easy.
Every finite state automaton is representable by a corresponding transition graph.
Therefore any language that has been defined by a FSA has already been defined by a TG.

Part 2: RE=FSA

The proof of this part is by recursive definition and constructive algorithm:

Rule 1:

There is an FSA that accepts any particular letter of the alphabet.
There is an FSA that accepts only the empty string.

Self Assessment Questions

Construct FSAs for each of the two cases in rule 1.

Rule 2:

If there is an FSA that accepts the language defined by the RE r
and there is an FSA that accepts the language defined by the RE s,
then there is an FSA that accepts the language defined by the RE (r+s).

Self Assessment Questions

Construct FSAs for two simple REs and derive the FSA for their 'sum'.

Rule 3:

If there is an FSA that accepts the language defined by the RE r
and there is an FSA that accepts the language defined by the RE s,
then there is an FSA that accepts the language defined by the RE rs.

Self Assessment Questions

Construct FSAs for two simple REs and derive the FSA for their 'concatenation'.

Rule 4:

If there is an FSA that accepts the language defined by the RE r,
then there is an FSA that accepts the RE r*.

Self Assessment Questions

Construct FSAs for two simple REs and derive the FSA for their 'closure'.

End Proof

Part 3: TG=RE

The proof of this part provides a constructive algorithm that will transform every conceivable TG into an equivalent RE in a finite number of steps.

Step 1: Single Start State

A TG may have many start states.
We simplify such a TG by defining a single start state from which a transition, labeled with the symbol for the null string, is constructed to each state that was a start state of the original graph (but is not a start state of the new one), thus:

Step 2: Single Accepting State

Similarly, a TG with more than one accepting state can be transformed to an equivalent graph with a single accepting state to which an arc, labeled with the symbol for the null string, is constructed from each accepting state of the original graph, thus:

Step 2a: No Transitions into Start State or out of Final State

This rule eliminates loops at the start and end by introducing, if necessary, a new start state with a null transition to the original one , and/or a new accepting state with a null transition from the original one.

Step 3: Regular Expressions on Edges

Next we build up, piece by piece, a RE that defines the same language as the TG.
Previously, we allowed the edges of the graph to be labeled only with strings of symbols from the alphabet.
However, since every string can itself be thought of as a RE, we now allow REs themselves to label edges of the graph.

Step 3.1: Sum Loops

Consider a state with an edge looping back to itself.
However many different edges like this that there are on a given state, they can always be reduced to just one edge, the REs on those edges being summed, thus:

Step 3.2: Sum Edges

Similarly, two states that are connected by more than one edge going in the same direction can be reduced to just one edge, labelled by a summation, thus:

Step 3.3: Bypass States

The bypass operation is used progressively to reduce the TG to a RE: thus:

Here, the removal of state 2 from the graph breaks a path that leads from state 1 to state 3.
This path must be restored, and labelled with the appropriate RE, in this case ab, if the new graph is to be equivalent to the original.
When performing a bypass transformation, remember to account for all such broken paths, and no others.
If the bypassed state has an arc to itself (i.e. a loop) then the RE on that loop is Kleene starred, thus:

Back to Topic 2

Applying Kleene's Theorem

Given any TG, one may derive a Regular Expression for the language recognised by the FSA that the TG represents by applying appropriate steps of Kleene's theorem, successively reducing the graph until it consists of exactly one start state and one accept state.
At that point, the arc between those states will have become labelled by the required regular expression.

Example

Derive a Regular Expression for the language recognised by the following Transition Graph by applying an appropriate sequences of transformations, each of which is a step in Kleene's Theorem.

We will describe the steps in sequence here.
You should draw the succesive graphs yourself as as a Self Assessment Exercise.

Step 1 is not applicable here because there is already only one start state (1).
Step 2 is not applicable here because there is already only one accept state (4).
Step 2a is applicable because the start state (1) has two arcs leaving it and the accept state (4) has two arcs entering it.
We therefore introduce null transitions from a new start state (say, 5) to a new final state (say, 6) amd remove the start\final state indications from states 1 and 2.

Now we can apply Step 3.1 to sum the loops on state 4, which transforms the decoration on the loop from "a,b" to the RE "a+b".

Step 3.2 is not applicable here because there is no pair of arcs between the same states in the same direction.

Step 3.3 is applicable here, and the only choices (that do not reintroduce multiple arcs on the start and accept states) are to bypass state 2 or to bypass state 3.
Let's choose (arbitrarily) to bypass state 2.
This breaks four paths:

from state 1 to state 3, which requires the RE "ab",
from state 1 to state 4, which requires the RE "aa",
from state 3 to state 4, which requires the RE "aa", and
from state 3 to itself, which requires the RE "ab".

We remove the broken paths, and state 2, from the TG and insert the new paths with their REs.
Note the difference between the sum "a+b" on the loop at state 4 and the concatenation "ab" on the loop at state 3.

Now we can apply Step 3.2 to sum the arcs from state 1 to state 3, giving "ab+b", and the arcs from state 3 to state 4, giving ""aa+b", because they run in the same direction.

Now we can apply Step 3.4 to bypass state 3, including the loop on it.
This breaks one path from state 1 to state 4, which requires the RE "(ab+b)(ab)*(aa+b)".
Note how:

the whole RE labelling the loop on state 3 is enclosed within the Kleene star, and
parentheses are needed to disambiguate the new RE on the lower arc from state 1 to state 4.

Now we can apply Step 3.2 again to sum the arcs from state 1 to state 4, because they run in the same direction, giving a single arc from state 1 to state 4 labelled with the RE "aa+(ab+b)(ab)*(aa+b)".
Noe we can remove state 4, together with the loop on it. This breaks one path from state 1 to state 6 which requires the RE "(aa+(ab+b)(ab)*(aa+b))(a+b)*".
Note that the null transition, that we introduced from state 4 to state 5, vanishes again, because concatenating the null string to any RE does not alter it.

And finally, we can apply Step 3.2 again to bypass state 1. This breaks one path from state 5 to state 6 but does not change the RE because the path from state 1 to state 5 was labelled with the null string.
We now have a TG with a single start state (5), a single final state (6) and a single arc between them.
The label on this arc is the RE "(aa+(ab+b)(ab)*(aa+b))(a+b)*", which describes the same language as the original TG, as required.

Self Assessment Questions

Choose some words of the language generated by this RE and check that they are indeed recognised by the TG.
Choose some words not generated by the RE, but with the same alphabet, and check that the TG rejects them.
Repeat the derivation of an RE from the TG, but this time bypassing state 3 first.
Compare the REs produced by the two alternative derivations. They both generate the same language but is that fact obvious by inspection?

Back to Topic 2

Mealy/Moore Machines

So far, out machines have all been language acceptors, but it is possible to have FSAs which also produce outputs.
A Moore Machine is a collection of five things:

A finite set of states q0, q1, ... where q0 is designated as the start state.
An alphabet of symbols for forming the input string.
An alphabet of possible output symbols.
A transition table mapping each state and input symbol to the next state.
An output table showing what symbol from the output alphabet is generated at each state.

The input and output alphabets are not necessarily the same.

The Mealy Machines are similar, but their outputs are generated during transitions. The output alphabet therefore labels arcs rather than states.

It can be shown that Moore and Mealy machines are equivalent, that is, that the behaviour of every Moore machine may be simulated by a Mealy machine, and vice versa.
Back to Topic 2
Next Topic

Start of Module

If you have comments or suggestions, email to author at b.cohen@city.ac.uk
or me Vid-Com-Ram / members.tripod.com/bummerang

หมายเหตุ : เอกสารเวปหน้านี้นำมาจากมหาวิทยาลัยแห่งหนึ่งในประเทศอังกฤษ เป็นหลักสูตรคล้ายคลึงกับ CT313 นำมาเพื่อเป็นเอกสารเปรียบเทียบและอ้างอิง มิได้หวังผลประโยชน์ใดๆ และขอบอกว่าเค้ามีลิขสิทธิ์นะ

ทฤษฏีคอมพิวเตชั่น CT313 Original form MODULE P220 Theory of Computation 2001 http://web.cs.city.ac.uk Topic 2: Automata and Regular Expressions (Kleene's Theorem)

Contents

Regular Expressions

Definition

Example

Self Assessment Question

Example

Regular Languages

Definition

Self Assessment Questions

Questions

Kleene's Theorem

Proof

Part 1: TG=FSM

Part 2: RE=FSA

Rule 1:

Self Assessment Questions

Rule 2:

Self Assessment Questions

Rule 3:

Self Assessment Questions

Rule 4:

Self Assessment Questions

End Proof

Part 3: TG=RE

Step 1: Single Start State

Step 2: Single Accepting State

Step 2a: No Transitions into Start State or out of Final State

Step 3: Regular Expressions on Edges

Step 3.1: Sum Loops

Step 3.2: Sum Edges

Step 3.3: Bypass States

Applying Kleene's Theorem

Example

Self Assessment Questions

Mealy/Moore Machines

ทฤษฏีคอมพิวเตชั่น CT313
Original form MODULE P220
Theory of Computation 2001 http://web.cs.city.ac.uk
Topic 2: Automata and Regular Expressions (Kleene's Theorem)