Crazee Crazee Life: Structural Representation of Finite Automata || Formal Languages And Automata Theory

Structural Representation

There are two important notations that are not automation like but play an important role in the study of automata and their applications.

1. Grammar 2. Regular Expressions.

Grammars :

Are useful models when designing software that processes data with a recursive structure. The best known example is a “parser” the component of a compiler that deals with the recursively nested features of the typical programming language such as expressions, arithmetic conditional and so on.

For instance, a grammatical rule like EèE+E states that an expression can be farmed by taking any two expressions and connecting them by a plus sign.

This rule is typical of how expressions of real programming languages are formed. We introduce context free grammars, as they are usually called.

Regular Expressions :

Regular Expressions are also denote the structure of data, especially text strings. As we shall see, the patterns of strings they describe are exactly the same as what can be described by finite automata, the style of these expressions differs significantly from that of grammars and we shall content ourselves with a simple example here.

The UNIX style regular expressions, [A-Z][a-z] * [ ][A-Z][A-Z]1, represents capitalized words followed by a space and two capital letters. This expression represents patterns in text that could be a city and state, ex, Ithaca NY. It misses multiword city names such as Palo Alto CA, Which could be captured by the more complex expression.

[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z] when interpreting such expressions we only need to know that [A-Z] represents a range of characters from capital ‘A’ to Capital ‘Z’ (i.e. any capital letter), and [ ] is used to represent the blank character alone. Also the symbol * represents “Any number of” the preceding expression. Parentheses are used to group components of the expression; they do not represent characters of the text described.

Automata and complexity:

Automata are essential for the study of the limits of computation. As we mentioned in the introduction to the chapter, these are two important issues:

What can a computer do at all? This study is called “decidability”, and the problems that can be solved by computer are called “decidable”.
What can a computer do efficiently? This study can is called “intractability”, and the problems that can be solved by a computer using no more time than some slowly growing function of the size of the i/p are called “tractable”. Often we take all polynomial functions to be “slowly growing”, while functions that grow faster than any polynomial are deemed to grow too fast.

The central concepts of Automata Theory

In this section we shall introduce the most important definitions of terms that parade the theory of automata. These concepts include the

Alphabet ( a set of symbols)

Strings (a list of symbols from an alphabet)

Language(a set of strings from the same alphabet)

Alphabets: An alphabet is a finite, non empty set of symbols. Conventionally we use the symbol “ ∑“ for an alphabet. Common alphabets include

i. ∑= {0,1 }, The binary alphabet

ii. ∑={a, b, …z} the set of all lower case letters

iii. The set of all ASCII characters, or the set of all printable ASCII characters.

Strings: A string is a finite sequence of symbol chosen from a some alphabet. For ex, 01101 is a string from the binary alphabet ∑={0,1}. The string 111 is another string chosen from this alphabet.

The empty string: is the string with zero occurrences of symbols. This string, denoted , € is a string that may be chosen from any alphabet what so ever.

Length of a string : it is often useful to classify strings by their length , that is the number of positions for symbols in the string. For instance, 01101 has length 5. It is common to say that the length of a string is “the number of symbols” in the string.

The standard notation for the length of a string w is |w|. for ex |10010| =5, |€|=0.

Powers of an alphabet: If ∑ is an alphabet, we can express the set of all strings of a certain length from that alphabet by using an exponential notation. We define ∑^k to be the set of strings of length k, each of whose symbols is in ∑.

Ex: Note that ∑⁰= €, no matter what the alphabet ∑ is. In other words , € is the only string of length 0.

If ∑ = { a,b,c } then ∑¹ = {a,b,c

∑²= { aa, ab, ac, ba, bb, bc, ca, cb, cc}

∑³= { aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc }

Type coonvertion for symbols and strings:

Commonly we shall use lowercase letters at the beginning of the alphabet (or digits) to denote symbols, and lowercase letters near the end of the alphabet typically w,x,y and z to denote strings. You should try to get used to this convention to help remind you of the types of the elements being discussed.

The set of all strings over an alphabet ∑ is conventionally denoted ∑* for instances {0,1}* = { €, 0,1,00,01,10,11,000,….}

Put another way,

∑* = ∑¹U∑²U∑³U∑⁴……

Sometimes we wish to exclude the empty string from the set of strings. The set of non empty strings from alphabet ∑ is denoted . Thus two appropriate equivalences are

∑⁺ = ∑¹U∑²U∑³U∑⁴……

∑* = ∑⁺U { € }

Concatenation of strings :

Let x and y be strings, then xy denotes the concatenation of x and y, that is the string formed by making a copy of x and following it by a copy of y. more precisely, if x is the string composed of I symbols x=a1a2….ai and y is the string composed of j symbols y = b1b2b3…bj, then xy is the string of length i+j : xy=a1a2….aib1b2….bj

Ex :Let x=1101 and y=0011 then xy=11010011 and yx = 00111101. For any string w, the equations €w = w€ = w hold. That is, € is the identity for concatenation, since when concatenated with any string it yields the other string as a result.

Languages : A set of strings all of which are chosen from some ∑*, where ∑ is a particular alphabet is called a language. If ∑ is an alphabet and L C= ∑*, then L is a language over ∑.

However there are also many other languages that appear when we study automata. Some are abstract examples such as

1. The language of all strings consisting of n 0’s followed by n 1’s for some n>=0: { €, 01, 0011, 000111,….}

2. The set of strings of 0’s and 1’s and 1’s with an equal number of each { €, 01, 10, 0011, 0101, 1001, ….}

3. The set of binary numbers whose value is a prime {10, 11, 101, 111, 1011,….}

4. ∑* is a language for any alphabet ∑

5. Φ the empty language is a language over any alphabet.

6. { € } the language consisting of only the empty string is also a language over any alphabet. Notice that Φ ≠ { € }, the former has no string and the latter has one string.

7. The only important constraint on what can be a language is that all alphabets are finite.

8. Thus languages although they can have an infinite number of strings are restricted to consist of strings drawn from one fixed finite alphabet.

Problems : In automata theory a problem is the question of deciding whether a given string a member of some particular language. It turns out as we shall see that anything we more colloquially call a problem can be expressed as membership in a language. More precisely, if ∑ is an alphabet and L is a language over ∑ then the problem L is:

Given a string w in ∑* decide whether or not w is in L

Pages

Sunday, 11 February 2018

Structural Representation of Finite Automata || Formal Languages And Automata Theory

1 comment: