Consider the language
which we might also write as
L = anbn for n = 0, 1, 2, ...
That is, every word in L consists of a certain number of as followed
by the same number ofbs.
There is an infinite number of such words, one for each value of n.
Now consider the language generated by
the Regular Expression a*b*.
Every word in this language consists of some (possibly none) as followed
by some (possibly none) bs.
Clearly L is a subset of a*b* but it is not exactly equivalent
to it.
a*b* also has an infinite number of words, but it is clearly regular,
since it is generated by a regular expression.
Find a string that is in a*b* but is not in L.
Suppose that L is a regular language.
Then there would have to exist some FSA that it.
By definition, this FSA would have to have a finite number of states.
Assume that the FSA that recognises L has, say, 95 states.
By definition, this FSA must accept all words in L and reject all words
not in L.
Now, the string a96b96is in L.
The first 96 letters of this string are as.
Our FSA cannot visit a new state for each of these letters because it has only
95 states.
Therefore, at some point it must return to a state that it has visited previously.
If this happens, there is a circuit in the transition diagram and if there is
such a circuit, then the FSA may loop round that circuit as many times as it
likes.
Say the circuit has 7 states in it, then the FSA would also accept words such
as a96+7b96.
But such a word is not in L because it has more as than bs.
So our FSA accepts some words not in L.
This contradicts the original assumption that our FSA recognised L.
Our choice of 95 for the number of states, and of 7 for the length of the circuit,
have no effect on this argument, which may be constructed for any such numbers.
We must therefore reject our original assumption, that an FSA accepting
L exists at all,
and we can conclude that L is not regular.
Back
to Topic 3
This form of argument may be generalized as follows:
Let L be any regular language that
has infinitely many words.
Then there exist some three strings x, y and z (where y
is not the null string) such that all strings of the form
xynz , for n = 1, 2, 3...
are words in L.
If L is a regular language, then
there is an FSA that accepts exactly the words in L.
This FSA has a finite number of states, but L has infinitely many words
in it, so L must contain some words that are longer than the number of
states in the FSA.
Let w be some word in L that is at least as long as the number
of states in the FSA.
When this word generates a path through the machine, the path cannot visit a
new state for each symbol because there are more symbols than states.
Therefore, it must at some point revisit a state that it has been to before.
We break the word w up into three parts.
TheTD for such an FSA would have a structure
like this:
Clearly, every word w = xynz for n = 0,1,2,3,É
is accepted by this FSA.
But this FSA recognises L, so every word accepted by it is in L.
Therefore, all words of the form w = xynz for n
= 0,1,2,3,É are in L.
In order to prove that a language L
is not regular, we first assume that it is (i.e. that some FSA
can recognise it), then we use the Pumping Lemma to demonstrate that any such
FSA would also recognise strings not in L.
This is known as proof by contradiction, or reductio ad absurdum.
The Pumping Lemma itself does not constitute a proof of the non-regularity of
any particular language.
It must be used in a proof that makes use of structural properties of
the language being considered.
The trick in applying it is to find suitable structural properties.
Example:
Prove, using the Pumping Lemma that the 'addition' language
L = apbqcr
where p, q, r= 0, 1, 2, 3, ... and p + q = ris not Regular
To apply the Pumping Lemma, we must assume
that there exists some FSA that recognises L, then demonstrate that this
assumption leads to a contradiction.
Since that FSA must have, by definition, a finite number of states, but must
recognise the infinite number oi words in L, it must have a circuit in
it somewhere.
By the Pumping Lemma,
if L contains any word of the form x y z,
where y is the string consumed by going round the circuit once and x and y are any, possibly null, strings, then L must also contain all strings of the form x y nz, for n = 0,1,2,3,...
If we examine the general structure of
the words of L, we find that every non-null word is in the form a...ab...bc...c,
that is an unbroken (possibly null) string of as followed by an unbroken
(possibly null) string of bs, followed by an unbroken (possibly null)
string of cs, with no more than one point where an a is immediately
followed by a b and no more than one point where a b is
immediately followed by a c, where the number of as plus the number
of bs equals the number of cs.
The arbitrary non-null word in L may be partitioned into three successive
substrings, x, y and z, in only five ways:
These can be shown in diagrammatic form
thus:
We have to show that, in each case,
by going round the circuit an arbitrary number of times (that is, by 'pumping'
y),
the FSA will accept words of the form x y nz that are demonstrably
not in L.
Examining each case in turn, we see that pumping y gives words that have:
None of these words is in L.
Thus, our assumption that some FSA could recognise Lmust be false.
Therefore there is no such FSA.
Therefore L is not regular.
QED
There are some non-regular languages for
which this technique fails.
Consider, for example, the language Palindrome
which comprises every string from the alphabet {a,b} that reads
the same when written backwards. For example, the null string is a word in Palindrome
as are the strings a, b, aa, abba, aba, etc.
We cannot use the Pumping Lemma in the form given above because we can set
x=a, y=b, and z=a,
then xyz = aba is in the language,
but so are xyyz = abba, xyyyz = abbba, ..., xynz = abna, for all n.
Let L be an infinite language recognised by a finite automaton with n states.
Then for all words w in L whose length is greater than n,
there exist strings x, y and z, where y is not null and |x| + |y| ? n, such that
w = xyz and all strings of the form xynz (for n= 1, 2, 3, ...) are in L.
Now consider an FSA that might recognise
the language Palindrome. Let's say that this machine has 77 states.
It must accept the palindrome w = a80ba80.
Because this word has more symbols (161) than the FSA has states, we can break
it into the three parts x, y and z.
But because |x|+|y| must be less than or equal to 77, both x
and y must be solid strings of as.
So when we form the string xyyz, we add as to the front of w
but not to the back, because all the as in the rear of w are in
the z part, which stays fixed at 80 as.
So the string xyyz cannot be a palindrome.
But the second version of the Pumping Lemma requires Palindrome to include
this string.
Therefore, Palindrome cannot be a regular language.
QED
Back
to Topic 3
Some question that we can ask about FSAs and REs are:
These are Ôyes/noÕ questions.
An effective solution to a yes/no question is called a decision procedure.
A problem that has a decision procedure is called decidable.
It has been proven that all these questions
are decidable for the regular languages, but this is not the case for all languages.
Noam Chomsky introduced a classification of languages in which classes of language
are distinguished according to the decidability of questions such as these.
The languages in each class share certain other properties, such as:
The union of two languages comprises the set all strings that are words in either language. This question asks whether the union of any two languages in some class is also a language in that class.
The intersection of two languages comprises the set all strings that are words in both languages.
The product of two languages, A and B, comprises the set of all strings that are formed by concatenating a word in A with a word in B.
The complement of a language, L, whose alphabet is some set, S, of symbols, comprises the set of all strings that are in S* but not in L. In other words, it is the set difference S* -L.
The Kleene
Star of a language comprises the set of words formed by concatenating each
word of the language with itself an arbitrary number of times.
Language Class | Type 3 | Type 2 | Type 1 | Type 0 |
Defined by | Regular Expression (RE) | Context Free | Context Sensitive | Phrase Structure = Recursively Enumerable |
Corresponding Classes of Acceptor Machine | Finite State Automaton (FSA) Transition Graph (TG) |
Pushdown Automaton (PDA) | Linear Bounded Automata (LBA) [TM with bounded tape] |
Turing Machine (TM) |
Is Computational Power of Deterministic = Non-Deterministic? | Yes: DFSA = NFSA | No: NPDA has more power. [DPDA = LR(k) grammars] |
Yes | Maybe (this question is still open) |
Closed Under? | Union, Product, Intersection, Kleene Star, Complement | Union, Product, Kleene Star | ? | Union, Intersection, Product, Kleene Star |
What's Decidable? | Equivalence, Emptiness, Finiteness, Membership | Emptiness, Finiteness, Membership | Membership | Not much! |
Examples of Application | Text Editors, Sequential Circuit Synthesisers | Programming Language Statements, Compilers | Programming Languages, Compilers | Computers |
We will address the Type 2 and Type 0 language
classes later in this module.
You will meet Type 3, Type 2 and Type 1 languages in several other modules of
your course.
REs are often used to describe the lexical structure of natural and programming
languages and the behaviour of discrete control systems. FSAs feature
in the design of the corresponding lexical analysers and controllers.
The Unixª tool set is largely constructed out of regular expression processors,
such as grep and lex, and can therefore ease the construction
of text editors and other language processors.
The closure properties make some proofs
of non-regularity (and non-context freeness) much easier.
Consider the language Equal, which comprises all strings from the alphabet
{a,b} that contain equal numbers of as and bs.
Some of the words in Equal are: the null string, ab, ba,
aaabbb, abab, and so on.
It is difficult to see a suitable way of breaking a typical word of Equal
into three parts in preparation for 'pumping'.
Instead, we observe that the language {anbn}, which
we already know to be non-regular is equal to the intersection
of Equal and the language defined by the regular expression a*b*,
which is, of course, regular.
Now, if Equal were a regular language, then {anbn}
would be equal to the intersection of two regular languages and therefore, by
the closure property of regular languages, it would have to be regular itself.
But {anbn} is not regular, so Equal
cannot be regular either.
QED
If you have comments or
suggestions, email to author at b.cohen@city.ac.uk
or me Vid-Com-Ram
/ members.tripod.com/bummerang
หมายเหตุ : เอกสารเวปหน้านี้นำมาจากมหาวิทยาลัยแห่งหนึ่งในประเทศอังกฤษ
เป็นหลักสูตรคล้ายคลึงกับ CT313 นำมาเพื่อเป็นเอกสารเปรียบเทียบและอ้างอิง มิได้หวังผลประโยชน์ใดๆ
และขอบอกว่าเค้ามีลิขสิทธิ์นะ