Structural
Representation
There are two important
notations that are not automation like but play an important role in the study
of automata and their applications.
1.
Grammar 2.
Regular Expressions.
Grammars :
Are useful models when designing software that processes data with a recursive structure. The best known example is a “parser” the component of a compiler that deals with the recursively nested features of the typical programming language such as expressions, arithmetic conditional and so on.
For instance, a grammatical rule like EèE+E states that an expression can be farmed by taking any two expressions and connecting them by a plus sign.
This rule is typical of how expressions of real programming languages are formed. We introduce context free grammars, as they are usually called.
Regular Expressions :
Regular Expressions are also denote the structure
of data, especially text strings. As we shall see, the patterns of strings they
describe are exactly the same as what can be described by finite automata, the
style of these expressions differs significantly from that of grammars and we
shall content ourselves with a simple example here.
The UNIX style regular expressions, [A-Z][a-z] * [ ][A-Z][A-Z]1,
represents capitalized words followed by a space and two capital letters. This
expression represents patterns in text that could be a city and state, ex,
Ithaca NY. It misses multiword city names such as Palo Alto CA, Which could be
captured by the more complex expression.
[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z] when interpreting such
expressions we only need to know that [A-Z] represents a range of characters
from capital ‘A’ to Capital ‘Z’ (i.e. any capital letter), and [ ] is used to
represent the blank character alone. Also the symbol * represents “Any number
of” the preceding expression. Parentheses are used to group components of the
expression; they do not represent characters of the text described.
Automata and complexity:
Automata are essential for the
study of the limits of computation. As we mentioned in the introduction to the
chapter, these are two important issues:
What can a
computer do at all? This study is called “decidability”, and the problems that
can be solved by computer are called “decidable”.
What can a computer do efficiently? This study can is called “intractability”, and the problems that can be solved by a computer using no more time than some slowly growing function of the size of the i/p are called “tractable”. Often we take all polynomial functions to be “slowly growing”, while functions that grow faster than any polynomial are deemed to grow too fast.
What can a computer do efficiently? This study can is called “intractability”, and the problems that can be solved by a computer using no more time than some slowly growing function of the size of the i/p are called “tractable”. Often we take all polynomial functions to be “slowly growing”, while functions that grow faster than any polynomial are deemed to grow too fast.
The central concepts of
Automata Theory
In this
section we shall introduce the most important definitions of terms that parade
the theory of automata. These concepts include the
Alphabet
( a set of symbols)
Strings
(a list of symbols from an alphabet)
Language(a
set of strings from the same alphabet)
Alphabets: An alphabet is a finite, non empty set of symbols. Conventionally
we use the symbol “ ∑“ for an
alphabet. Common alphabets include
i.
∑= {0,1 }, The binary
alphabet
ii.
∑={a, b, …z} the set of all
lower case letters
iii.
The set of all
ASCII characters, or the set of all printable ASCII characters.
Strings: A
string is a finite sequence of symbol chosen from a some alphabet. For ex,
01101 is a string from the binary alphabet ∑={0,1}.
The string 111 is another string chosen from this alphabet.
The empty string: is the string with zero occurrences of symbols. This string,
denoted , € is a string that may be chosen from any alphabet what so ever.
Length of a string : it is
often useful to classify strings by their length , that is the number of
positions for symbols in the string. For instance, 01101 has length 5. It is
common to say that the
length of a string is “the number of symbols” in the string.
The standard notation for the length
of a string w is |w|. for ex |10010| =5, |€|=0.
Powers of an alphabet: If ∑ is an alphabet, we
can express the set of all strings of a certain length from that alphabet by
using an exponential notation. We define ∑k to be the set of strings of length k, each of whose symbols is in ∑.
Ex: Note that ∑0= €, no matter what the alphabet ∑ is. In other words , € is the only string of length 0.
If ∑ = { a,b,c } then ∑1
= {a,b,c
∑2= { aa, ab, ac, ba, bb, bc, ca, cb, cc}
∑3= { aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac,
bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc }
Type coonvertion for symbols and strings:
Commonly
we shall use lowercase letters at the beginning of the alphabet (or digits) to
denote symbols, and lowercase letters near the end of the alphabet typically
w,x,y and z to denote strings. You should try to get used to this convention to
help remind you of the types of the elements being discussed.
The set of all strings over an
alphabet ∑ is conventionally denoted
∑* for instances {0,1}* = { €,
0,1,00,01,10,11,000,….}
Put
another way,
∑* = ∑1U∑2U∑3U∑4……
Sometimes we wish to exclude the
empty string from the set of strings. The set of non empty strings from
alphabet ∑ is denoted . Thus two
appropriate equivalences are
∑+ = ∑1U∑2U∑3U∑4……
∑* = ∑+U { € }
Concatenation of strings :
Let x
and y be strings, then xy denotes the concatenation of x and y, that is the
string formed by making a copy of x and following it by a copy of y. more
precisely, if x is the string composed of I symbols x=a1a2….ai and y is the
string composed of j symbols y = b1b2b3…bj, then xy is the string of length i+j
: xy=a1a2….aib1b2….bj
Ex :Let
x=1101 and y=0011 then xy=11010011 and yx = 00111101. For any string w, the
equations €w = w€ = w hold. That is, €
is the identity for concatenation, since when concatenated with any string it
yields the other string as a result.
Languages : A set of strings all of which are chosen from some ∑*, where ∑ is a particular alphabet is called a language. If ∑ is an alphabet and L C= ∑*, then L is a language over ∑.
However there are also many other languages that appear when we study automata. Some
are abstract examples such as
1.
The language of all strings consisting of n
0’s followed by n 1’s for some n>=0: { €, 01, 0011, 000111,….}
2.
The set of
strings of 0’s and 1’s and 1’s with an equal number of each { €, 01, 10, 0011,
0101, 1001, ….}
3.
The set of
binary numbers whose value is a prime {10, 11, 101, 111, 1011,….}
4.
∑* is
a language for any alphabet ∑
5.
Φ the empty language is a
language over any alphabet.
6.
{ € } the language consisting of only the empty string is also a
language over any alphabet. Notice that Φ
≠ { € }, the former has no
string and the latter has one string.
7.
The only important
constraint on what can be a language is that all alphabets are finite.
8.
Thus languages
although they can have an infinite number of strings are restricted to consist
of strings drawn from one fixed finite alphabet.
Problems : In automata theory a problem is the question of deciding whether a given string a member of some particular language. It turns out as we shall see that anything we more colloquially call a problem can be expressed as membership in a language. More precisely, if ∑ is an alphabet and L is a language over ∑ then the problem L is:
Given a
string w in ∑* decide whether or not w is in L
Very helpful!
ReplyDelete