Data Structures

 

Introduction

Computers have very specific ways of storing and manipulating data. This chapter shows how individual items of data may be associated in various ways to form data structures. At this stage it may be worth mentioning that the individual forms mentioned above may have various shapes and sizes. These shapes and sizes range from Boolean variables that allow only either yes or no to various types of string type variables whereby one string being a strictly ascii string is going to be different and not compatible to a string allowing UTF-8 ascii encoding. These structures enable large and potentially unwieldy collections of data to be managed by relatively simple operations.

Six data structures are going to be discussed in this chapter: Strings, arrays, stacks, queues, lists and trees. These are the most important and widely used Data Structures used to store data on computers today and in the past.

The way in which data is structured creates a framework within which we as program developers can go about storing data in such a way that the structure (framework) gets preserved and that every time the data needs to be accessed both we and the system know what to expect.

One important fact that should be bore in mind is that: Data Structures + Algorithms = Programs

Pointers:

Pointers are a very useful aspect of data structures that mean exactly what it implies: they are a bit of data that point at a location where other data can be found. There are various ways in which these pointers are implemented.

Objects are generally a collection of variables that reside in a part of the computer’s memory. To access that data structure you would generally have to access it through a pointer. In most high level programming languages, if you declare a string (collection of characters), the variable identifier will refer to a pointer that points to the space that has been set aside for it in memory.

Further more pointers are used to build many data structures. They provide the links which join the elements of the structure. Pointers that are particularly significant are those that point to the beginning and end of data structure or elements within a data structure. Later on in the chapter we will discuss how this is done within different data structures.

Strings:

Strings are a collection of characters that are regarded as a single data item. Initially the length of a string would have to be assigned and then that length would be store either at the start of the string or at the end. These days the assignment s not necessary but the amount of characters in a string is still stored. This is very useful because we often have to know how long a string is before we can manipulate the data.

There are also various operations that can be performed on a string. The most traditional ones are the ability of one string to be joined (annotated) to another string creating one big string or a string being broken up into two smaller ones. An example of the latter would be if a string contains a full name delimited by a space we can then easily break that up into a first name and last name (two different strings). There are more operations that can done on strings which we will not cover at this stage.

Arrays:

Arrays are quite similar to strings in that they are a collection of similar data elements. Where a string is a collection of characters making up a word or sentence, an array can be made up of a collection of strings whereby each element of the array is taken up by a string.

Arrays can represent more than just strings or characters. You can also create an array on integers or floating point numbers – you can even go so far as creating an array of objects. Remember that an object can be a word (collection of characters – string) or a record containing in variety of fields that make up someone’s name, address, telephone numbers and email address. When using the latter you are in effect implementing polymorphism.

It is also possible to create a two-dimensional array. A two-dimensional array is an array where you introduce the idea of rows and columns like a matrix.

Arrays are generally static data structures and that means that when you declare them you must also know how may elements you plan on storing in it. Each element is then indexed using an integer. Please note that indexing is controlled by integers that start counting at 0. Similarly when creating a two-dimensional array you would have to know how may elements there are horizontally as well as vertically.

For example the way a string array with 50 elements named temp gets declared in Java is as follows:

String [] temp = new String [50];

In Python there is a more dynamic approach to declaring variables and so the same declaration is as follows:

>>> temp[]

There are a few things worth noticing here: Firstly there is a lot less typing involvedJ. Secondly, in Python, arrays are not static but dynamic data structures. Python offers a different approach to collections of similar elements. Python makes use of lists which are handled in a somewhat different way, which will be covered later in this chapter.  What that means is that in Python the size of the list will dynamically grow as is required at run time. You do not therefore have to determine the size of the list while you are writing the program.

The way in which ‘two-dimensional’ arrays are declared is fundamentally different. As I mentioned before in Python lists are used and so to create a ‘two-dimensional array’ you are actually declaring a list of lists. You can also call this approach a nested list. In this example we are going to consider a situation where there are 50 columns and 20 rows. Here is how it’s done in Java:

String[][] temp = new String [50][20];

As in the previous example the Python way is somewhat different and we must bear in mind the fact that Python uses lists. One other major difference between the way Python works compared to languages like Java is dynamic nature of Python. When writing a Python program you will mostly declare and instantiate your variables in one statement. See the example:
>>> temp1 = [1, 2, 3, 4]   # A list of integers
>>> temp2 = [5, 6]         # Another list of integers
>>> temp3 = [temp1, temp2] # A list containing 2 lists
>>> temp3                  # Output the contents
[[1, 2, 3, 4], [5, 6]]
>>> temp3[0][1]            # Output a specific element
2
In the last statement you can see how the indexing works, the integer in enclosed within the first square brackets tells Python to look in the 1st element of the list (this is a list within a list). The second number ([2]), tells Python that you want to see what the 3rd element of that list is and as expected Python returns the number 2.

Stacks:

The best way of explaining how a stack data structure works is by comparing it to the way plates are stacked in a restaurant that offers some kind of self service whereby you collect your plate from the recessed, warmed plate storage and proceed to the buffet and choose your food. When the waitresses are bringing clean plates from the kitchen they actually push the plates onto the stack. Similarly when you get a plate you are actually popping the plate off the stack. When discussing stacks in computer terms we use the same terminology.

Now, if you think about how the plates are pushed onto the stack and then popped off the stack you can see how the first plates that are pushed onto the stack are the last ones to be popped off. Again the same happens in computers. This order of adding and removing is known as a LIFO system. That acronym means last in first out.

Pointers also play an important role when using stacks in computers. Generally one pointer is used to indicate the top of the stack and the other one is used to indicate the bottom of the stack.

In Python lists can be handled as stacks by using two of the function associated to lists: To push data onto the stack you use the .append() function and then to pop data off it again you use the .pop() function.

The stack is one of the more important data structures in computers whereby they are often used in facilitating mathematical calculations.

ds3

Queues:

As the name suggests queues are based on the good old queues that we have to wait in when going to the post office or a concert. The basis on which it works is that data that gets entered into the queue first will be extracted first. This approach to handling data is known as FIFO which stands for first in first out.

As with stacks pointers are used to keep track of where (in memory) the beginning of the queue is and where the end of the queue is. If another element is added to the queue, the pointer pointing to the beginning of the queue is altered to accompany it.

A good example of where queues are used in Computer Architecture is the process of pipelining. Pipelining came about when the speed of CPUs became faster than the speed of the rest of the motherboard’s front side bus speed (FSB).

The way pipelining works is that instructions are queued up in the cache memory of the CPU (very fast static RAM) and then processed as the CPU becomes available. This process closely resembles an assembly line using a conveyer belt. Another example is the spooler that allows various print jobs to be queued as the printer (being a slower device) prints them out.

Lists:

Lists are a very useful and versatile data structure within computers as they allow options that stacks and queues are incapable of. As has been mentioned earlier in this chapter Python makes extensive use of Lists as a way of storing similar and related sets of data.

To get a better understanding of the fundamentals of lists we will begin by looking at the way pointers work within lists. As with stacks and queues there is a pointer at the beginning of a list as well as a special one at the end. The one at the end is generally referred to as being a ‘null’ pointer. The beginning and end pointers are not the only pointers used in lists though. Every element within a list has a pointer pointing to the next element within the list.

The fact that each element has its own pointer creates a few worthwhile advantages: Firstly, the data does not have to be stored in memory in a contiguous way it can be store wherever there is space available. Remember each element has its own pointer so, it doesn’t matter where the elements are located they can always find one another. It now also becomes possible to add elements and remove elements from anywhere within the list. What happens when you add an item is that the pointer pointing at the original element is altered to point at the new (inserted) item and that item’s pointer points at the element the previous element was pointing at prior to the insertion. Similarly, when deleting an item, all that happens is that the undesirable item gets removed and the pointer that was pointing at that item gets altered to point at the item that came after the removed item. In effect the pointer from the deleted item gets copied to the preceding element.

Lists as data structures are also known as linked lists and they can act like either stacks or queues – all that happens is that all the pointers are adjusted according to the implementation of the linked list.

Trees:

One of the better ways of envisaging a tree in a data structure context is to imagine an upside down tree where the root is high and the leaves are towards the bottom as seen in Fig. 4.

fig4

The type of tree that is used most commonly in computing is known as a binary tree and so in this chapter we are going to concentrate on what a binary tree is and how we can traverse the data within that tree.

Generally trees start off at the top and the top node in a tree is known as the root node. Each node can have variety of sub-trees ranging from none to as many as can be supported. In a binary tree each node can only have 0, 1 or 2 sub-trees (as seen in Fig. 4). The fact that the maximum amount of sub-trees coming off a node in a binary tree is 2 is evident through the use of the term binary – meaning two.

As with all other data structures discussed previously, pointers are an important consideration. I the case of binary trees, each node consist of a data item plus two pointers. One or both of their pointers may have null values if they have no sub-trees to point to.

There are a number of operations that can be carried out on trees. Firstly two binary trees can be joined to an additional node, which then becomes the root of the original trees which are now sub-trees.

A tree may also be traversed in several different ways. Traversing a tree is accessing its elements in a systematic way. There are three different methods of traversing a binary tree:

 

Pre-order Traversal:

1. Start at the root node
2. Traverse the left sub-tree
3. Traverse the right sub-tree

Applying this traversal method to the example given (Fig. 4) the output would be: 0, 2, 1, 3, 7, 5, 4, 6

 

In-order Traversal:

1. Traverse the left sub-tree
2. Visit the root node
3. Traverse the right sub-tree

In this example the output is: 1, 2, 3, 4, 5, 7, 6

 

Post-order Traversal:

1. Traverse the left sub-tree
2. Traverse the right sub-tree
3. Visit the root node

Finally, using this method produces: 1, 3, 2, 4, 5, 6, 7

pl

« Previous || Next »