web.cs.city.ac.uk
Topic 3: The Pumping Lemma for Regular Laguages

Nonregular Languages
The Pumping Lemma
Applying the Pumping Lemma
Decidability and Chomsky's Classification

Nonregular Languages

Consider the language

which we might also write as
L = aⁿbⁿ for n = 0, 1, 2, ...
That is, every word in L consists of a certain number of as followed by the same number ofbs.
There is an infinite number of such words, one for each value of n.

Now consider the language generated by the Regular Expression a*b*.
Every word in this language consists of some (possibly none) as followed by some (possibly none) bs.
Clearly L is a subset of a*b* but it is not exactly equivalent to it.
a*b* also has an infinite number of words, but it is clearly regular, since it is generated by a regular expression.

Self Assessment Question

Find a string that is in a*b* but is not in L.

Is L is regular?

Suppose that L is a regular language.
Then there would have to exist some FSA that it.
By definition, this FSA would have to have a finite number of states.
Assume that the FSA that recognises L has, say, 95 states.
By definition, this FSA must accept all words in L and reject all words not in L.
Now, the string a⁹⁶b⁹⁶is in L.
The first 96 letters of this string are as.
Our FSA cannot visit a new state for each of these letters because it has only 95 states.
Therefore, at some point it must return to a state that it has visited previously.
If this happens, there is a circuit in the transition diagram and if there is such a circuit, then the FSA may loop round that circuit as many times as it likes.
Say the circuit has 7 states in it, then the FSA would also accept words such as a⁹⁶⁺⁷b⁹⁶.
But such a word is not in L because it has more as than bs.
So our FSA accepts some words not in L.
This contradicts the original assumption that our FSA recognised L.
Our choice of 95 for the number of states, and of 7 for the length of the circuit, have no effect on this argument, which may be constructed for any such numbers.
We must therefore reject our original assumption, that an FSA accepting L exists at all,
and we can conclude that L is not regular.
Back to Topic 3

The Pumping Lemma

This form of argument may be generalized as follows:

Theorem

Let L be any regular language that has infinitely many words.
Then there exist some three strings x, y and z (where y is not the null string) such that all strings of the form

xyⁿz , for n = 1, 2, 3...

are words in L.

Proof

If L is a regular language, then there is an FSA that accepts exactly the words in L.
This FSA has a finite number of states, but L has infinitely many words in it, so L must contain some words that are longer than the number of states in the FSA.
Let w be some word in L that is at least as long as the number of states in the FSA.
When this word generates a path through the machine, the path cannot visit a new state for each symbol because there are more symbols than states.
Therefore, it must at some point revisit a state that it has been to before.
We break the word w up into three parts.

All the symbols of w starting at the beginning that lead up to the first state that is revisited. Call this part x (it could be null if the state revisited is the start state).
Starting at the letter after the substring x, let y denote the substring of w that travels around the circuit coming back to the same state the circuit began with. Since there must be a circuit, y cannot be the null string. y contains the symbols of w for exactly one lop around the circuit.
Let z be the rest of w starting with one symbol after the substring y and going to the end of the string w. This z could be null. The path for z could also loop around the y (or or any other) circuit.

TheTD for such an FSA would have a structure like this:

Clearly, every word w = xyⁿz for n = 0,1,2,3,É is accepted by this FSA.
But this FSA recognises L, so every word accepted by it is in L.
Therefore, all words of the form w = xyⁿz for n = 0,1,2,3,É are in L.

End Proof

Back to Topic 3

Applying the Pumping Lemma

In order to prove that a language L is not regular, we first assume that it is (i.e. that some FSA can recognise it), then we use the Pumping Lemma to demonstrate that any such FSA would also recognise strings not in L.
This is known as proof by contradiction, or reductio ad absurdum.
The Pumping Lemma itself does not constitute a proof of the non-regularity of any particular language.
It must be used in a proof that makes use of structural properties of the language being considered.
The trick in applying it is to find suitable structural properties.
Example:

Prove, using the Pumping Lemma that the 'addition' language

L = a^pb^qc^r
where p, q, r= 0, 1, 2, 3, ... and p + q = r

is not Regular

To apply the Pumping Lemma, we must assume that there exists some FSA that recognises L, then demonstrate that this assumption leads to a contradiction.
Since that FSA must have, by definition, a finite number of states, but must recognise the infinite number oi words in L, it must have a circuit in it somewhere.
By the Pumping Lemma,

if L contains any word of the form x y z,
where y is the string consumed by going round the circuit once and x and y are any, possibly null, strings, then L must also contain all strings of the form x y ⁿz, for n = 0,1,2,3,...

If we examine the general structure of the words of L, we find that every non-null word is in the form a...ab...bc...c, that is an unbroken (possibly null) string of as followed by an unbroken (possibly null) string of bs, followed by an unbroken (possibly null) string of cs, with no more than one point where an a is immediately followed by a b and no more than one point where a b is immediately followed by a c, where the number of as plus the number of bs equals the number of cs.
The arbitrary non-null word in L may be partitioned into three successive substrings, x, y and z, in only five ways:

x is an unbroken (possibly null) string of as, y is a non-null unbroken string of as, and z contains the transitions from as to bs, and from bs to cs, or
x is an unbroken (possibly null) string of as, y contains the transition from as to bs, and z contains the transitions from bs to cs, or
x contains the transition from as to bs, y is an unbroken string of bs, and z contains the transitions from bs to cs, or
x contains the transition from as to bs, y contains the transitions from bs to cs, and z is an unbroken (possibly null) string of cs, or
x contains the transitions from as to bs, and from bs to cs, y is a non-null unbroken string of cs, and z is an unbroken (possibly null) string of cs.

These can be shown in diagrammatic form thus:

We have to show that, in each case, by going round the circuit an arbitrary number of times (that is, by 'pumping' y),
the FSA will accept words of the form x y ⁿz that are demonstrably not in L.
Examining each case in turn, we see that pumping y gives words that have:

too many as (i.e. p + q > r); or
more than one transition from a to b; or
too many bs (i.e. p + q > r); or
more than one transition from b to c; or
too many cs (i.e. p + q < r).

None of these words is in L.
Thus, our assumption that some FSA could recognise Lmust be false.
Therefore there is no such FSA.
Therefore L is not regular.
QED

There are some non-regular languages for which this technique fails.
Consider, for example, the language Palindrome which comprises every string from the alphabet {a,b} that reads the same when written backwards. For example, the null string is a word in Palindrome as are the strings a, b, aa, abba, aba, etc.
We cannot use the Pumping Lemma in the form given above because we can set

x=a, y=b, and z=a,
then xyz = aba is in the language,
but so are xyyz = abba, xyyyz = abbba, ..., xyⁿz = abⁿa, for all n.

A Stronger Form of the Pumping Lemma

Let L be an infinite language recognised by a finite automaton with n states.

Then for all words w in L whose length is greater than n,

there exist strings x, y and z, where y is not null and |x| + |y| ? n, such that

w = xyz and all strings of the form xyⁿz (for n= 1, 2, 3, ...) are in L.

Now consider an FSA that might recognise the language Palindrome. Let's say that this machine has 77 states.
It must accept the palindrome w = a⁸⁰ba⁸⁰.
Because this word has more symbols (161) than the FSA has states, we can break it into the three parts x, y and z.
But because |x|+|y| must be less than or equal to 77, both x and y must be solid strings of as.
So when we form the string xyyz, we add as to the front of w but not to the back, because all the as in the rear of w are in the z part, which stays fixed at 80 as.
So the string xyyz cannot be a palindrome.
But the second version of the Pumping Lemma requires Palindrome to include this string.
Therefore, Palindrome cannot be a regular language.
QED
Back to Topic 3

Decidability and Chomsky's Classification

Some question that we can ask about FSAs and REs are:

Do two given REs define the same language?
Are two given FSAs are equivalent?
Does the language defined by a given FSA have finitely or infinitely many words?

These are Ôyes/noÕ questions.
An effective solution to a yes/no question is called a decision procedure.
A problem that has a decision procedure is called decidable.

It has been proven that all these questions are decidable for the regular languages, but this is not the case for all languages.
Noam Chomsky introduced a classification of languages in which classes of language are distinguished according to the decidability of questions such as these.

The languages in each class share certain other properties, such as:

Is the class closed under union?

The union of two languages comprises the set all strings that are words in either language. This question asks whether the union of any two languages in some class is also a language in that class.

Is the class closed under intersection?

The intersection of two languages comprises the set all strings that are words in both languages.

Is the class closed under product?

The product of two languages, A and B, comprises the set of all strings that are formed by concatenating a word in A with a word in B.

Is the language closed under complement?

The complement of a language, L, whose alphabet is some set, S, of symbols, comprises the set of all strings that are in S^* but not in L. In other words, it is the set difference S* -L.

Is the class closed under Kleene Star?

The Kleene Star of a language comprises the set of words formed by concatenating each word of the language with itself an arbitrary number of times.

Language Class	Type 3	Type 2	Type 1	Type 0
Defined by	Regular Expression (RE)	Context Free	Context Sensitive	Phrase Structure = Recursively Enumerable
Corresponding Classes of Acceptor Machine	Finite State Automaton (FSA) Transition Graph (TG)	Pushdown Automaton (PDA)	Linear Bounded Automata (LBA) [TM with bounded tape]	Turing Machine (TM)
Is Computational Power of Deterministic = Non-Deterministic?	Yes: DFSA = NFSA	No: NPDA has more power. [DPDA = LR(k) grammars]	Yes	Maybe (this question is still open)
Closed Under?	Union, Product, Intersection, Kleene Star, Complement	Union, Product, Kleene Star	?	Union, Intersection, Product, Kleene Star
What's Decidable?	Equivalence, Emptiness, Finiteness, Membership	Emptiness, Finiteness, Membership	Membership	Not much!
Examples of Application	Text Editors, Sequential Circuit Synthesisers	Programming Language Statements, Compilers	Programming Languages, Compilers	Computers

We will address the Type 2 and Type 0 language classes later in this module.
You will meet Type 3, Type 2 and Type 1 languages in several other modules of your course.
REs are often used to describe the lexical structure of natural and programming languages and the behaviour of discrete control systems. FSAs feature in the design of the corresponding lexical analysers and controllers.
The Unixª tool set is largely constructed out of regular expression processors, such as grep and lex, and can therefore ease the construction of text editors and other language processors.

The closure properties make some proofs of non-regularity (and non-context freeness) much easier.
Consider the language Equal, which comprises all strings from the alphabet {a,b} that contain equal numbers of as and bs.
Some of the words in Equal are: the null string, ab, ba, aaabbb, abab, and so on.
It is difficult to see a suitable way of breaking a typical word of Equal into three parts in preparation for 'pumping'.
Instead, we observe that the language {aⁿbⁿ}, which we already know to be non-regular is equal to the intersection of Equal and the language defined by the regular expression a*b*, which is, of course, regular.
Now, if Equal were a regular language, then {aⁿbⁿ} would be equal to the intersection of two regular languages and therefore, by the closure property of regular languages, it would have to be regular itself.
But {aⁿbⁿ} is not regular, so Equal cannot be regular either.
QED

Back to Topic 3
Next Topic

Start of Module

If you have comments or suggestions, email to author at b.cohen@city.ac.uk
or me Vid-Com-Ram / members.tripod.com/bummerang

หมายเหตุ : เอกสารเวปหน้านี้นำมาจากมหาวิทยาลัยแห่งหนึ่งในประเทศอังกฤษ เป็นหลักสูตรคล้ายคลึงกับ CT313 นำมาเพื่อเป็นเอกสารเปรียบเทียบและอ้างอิง มิได้หวังผลประโยชน์ใดๆ และขอบอกว่าเค้ามีลิขสิทธิ์นะ

ทฤษฏีคอมพิวเตชั่น CT313 Original form MODULE P220 Theory of Computation 2001 http://web.cs.city.ac.uk Topic 3: The Pumping Lemma for Regular Laguages

Contents