In Chapter Two the idea that a program is made up of two parts data and instructions concerning what to do with the data was introduced. Until this chapter all of the emphasis has been placed on the instruction part of program, with the assumption that the data type needed would be obvious or didn't really matter. This is very unrealistic in that how to represent something as data inside a program is not only crucial to the simplicity of the program, it is usually very difficult. The trouble is that while there are good ways of organising the construction of a program i.e. structured programming and stepwise refinement there is no really effective theory or method of selecting and creating data types.
The main difference between the instruction and data parts of program construction is that most of the decisions about the data have to be made before the program is written. Stepwise refinement simplifies program construction for the very reason that it puts off the making of important implementation decisions until they are really essential. In this way progress can be made without having to have all of the details and principles of the program determined beforehand. In the case of data, there is no equivalent stepwise approach, you have to consider what it is you are trying to do and identify what should be represented by data and how it should be implemented. For example, if you want a program that will act as a computerised telephone directory what should be represented by data is obvious: names and telephone numbers. But the way to implement this representation is not quite as clear as you might think. Most BASIC programmers would immediately choose a string array for 'name' and a numeric array for 'telephone number'. but as will be explained later the process of adding a new name and number is greatly simplified if a more complex type of data, a linked list, is used. This is an example of the general principle that the more suitable the data representation is, the simpler the resulting program.
When it comes to data, the only real advantage that an advanced programmer has over a beginner is having more possibilities to choose from. If you only know about arrays you can't even consider linked lists as a method of implementing a telephone directory! You may think that even if you know about linked lists there would be no point in considering them because the BBC Micro doesn't support anything more complex than arrays. This is quite true but it misses the point that the BBC Micro has a good range of indirection operators that can be used to construct almost any type of data easily and efficiently. This chapter serves two purposes; firstly to introduce some of the more advanced data types and secondly to show how they can be implemented on the BBC Micro. The more difficult task of indicating how to go about selecting a data type is treated by way of examples in the later chapters of this book!
There is one similarity between program and data construction they are both 'hierarchical'. When stepwise refinement is used a program can be seen as being made up of smaller programs or modules, which in turn are also made up of other, smaller modules and so on. In the same way complex forms of data are made up from smaller, simpler forms of data. For example, a numeric array is 'built up' from simple numeric variables, The method used to organise the simple numeric variables into an array is an example of a 'data structuring' method. In other words, taking a collection of simple numeric variables and 'putting them together' so that any one of them can be specified using a name of the form A(I) (or whatever the array and index are called) is a data structuring method.
Once you have identified a data structuring method there is nothing to stop you from appjying it to other simple data types. Thus applying array structuring to simple strings gives the familiar string array. The string array is so well known that it seems unnecessary to describe it in such a sophisticated way, but as will be explained later even the humble BASIC string is quite a complicated data type!
Not only can a data structuring method be applied to simple data, it can also be appiied to data that has already been organised in some way. For example, applying the array structuring method to an existing array gives an array of arrays better known as a two-dimensional array! That is, a two-dimensional array can be thought of as a one-dimensional array, each element of which is itself a one-dimensional array. This corresponds exactly to the usual way of thinking of a one-dimensional array as a 'row' of simple numeric variables and a two-dimensional array as a table consisting of 'rows' and 'columns'. If this sounds like a very complicated way of describing something that is simple, it is because BASIC doesn't really provide the range of data types and structuring methods that would make an example credible! What is important at the moment is that the idea of organising simple data types to produce new data types is clear.
What are the simplest forms of data that can be organised to make new types of data? It is convenient to distinguish two types of elemental data static and dynamic data.
The difference between the two is simply that an element of static data is stored in a fixed amount of memory, whereas the amount required to store dynamic data can change while the program is running. For example, a simple numeric variable is an item of static data but a string is an item of dynamic data. Static data types are much easier to implement than dynamic types and, apart from strings, they are better known to BASIC programmers. In fact, static data types are so simple that there are broadly only two types, scalars and reals.
Scalars form the largest and most useful group of data types. Essentially a scalar is a whole number or an integer but often its real nature is well hidden. For example, a single ASCII character is a scalar. To see this all you have to do is recall that each character in the ASCII set is represented by a code, i.e. an integer given by ASC("c") where c is the character. In a sense a scalar is any sort of data that can be represented by a range of integers. For example, the days of the week, Monday to Sunday, can be represented by the integers 1 to 7, or 0 to 6, or l0 to 17, etc. What is most important to a scalar type is not so much which particular integers are used but the range of integers. Indeed some computer languages, notably Pascal, allow the definition of scalar types that effectively hide the fact that integers are used in the representation. For example, in Pascal you can define a type of variable that can be used to store one of the days of the week and then write statements such as
DAY:=sunday
FOR DAY:=monday TO friday
You can achieve the same effect in BBC BASIC by explicitly using integers to represent the days of the week. That is:
10 MONDAY=1:TUESDAY=2:WEDNESDAY=3:THURSDAY=4
20 FRIDAY=5:SATURDAY=6:SUNDAY=7
30 DAY=SUNDAY
40 FOR DAY=MONDAY TO FRIDAY
This use of integers to construct what are effectively new types of data is second nature to most BASIC programmers but it helps to see that it forms part of a larger pattern. The only scalar types that BBC BASIC provides are the integers themselves, characters and the Boolean values TR UE and FALSE.
Integers can be stored either in special integer variables (distinguished by having names ending in %) or in ordinary real variables. However, there are considerable advantages in using integer variables whenever you are only using integer values. The two advantages most often quoted are that integer arithmetic is fast, and integer variables occupy less memory. While these are indeed advantages, the biggest reason for using integer variables is that they clearly show when integer values are the only ones that are logically possible. If for some reason you my to store a real value in an integer variable the program will, quite justifiably, crash! If you had used a real variable to store the integer values then this bug would go undetected.
Character data can only be stored directly in strings or, after conversion to an integer using the ASC function, in numeric arrays. The conversion from integers back to characters is achieved using the familiar CHR$ function. The on]y other way that ASCII characters reveal that they are represented by integers is by their order relations. The order of the ASCII characters is produced by the underlying order of the codes that are used to represent them. In other words "A"<"B" is true only because ASC("A") <ASC("B"). This is simple enough to understand but often causes problems when a mixture of upper- and lower-case characters are being sorted into order. The trouble is that all of the upper-case letters come before the lower-case letters in the order and this results in all the words that begin with upper-case letters coming before all the words beginning with lower-case letters.
The Boolean data type is particularly restrictive as it consists of the two values TRUE and FALSE. In nearly all versions of BASIC, Boolean data is stored directly in either integer or real variables. (This should be contrasted with the approach adopted by other languages of creating special Boolean variables only capable of storing the values TR UE and FALSE.) BBC BASIC uses the value -1 to represent TRUE and 0 to represent F ALSE. Y on can easily verify this fact by trying:
PRINT TRUE,FALSE
This assignment of integers to TRUE and FALSE is by no means standard and you should avoid writing programs that make use of it in any way. For example, in BBC BASIC it is true that TRUE<FALSE but in other versions of BASIC it may be false! Boolean variables are considered in more detail in Chapter Twelve.
Perhaps the most confusing thing about real constants and real variables is the use of the word 'real'. In mathematics the term 'real number' has an exact technical meaning but it has been taken over by computer scientists to mean any sort of number that has a fractional part. A real variable can be used to store numbers with fractional parts. The only problem is that an integer like 3 can also be stored in a real variable and this confuses the use of real and integer variables. Indeed early versions of BASIC only provided real variables, taking the attitude that integer variables were an unnecessary and confusing luxury.
In most cases it doesn't really matter whether you use real or integer variables. However it is important to be aware of the fact that real constants are not always stored accurately. Real variables in BBC BASIC numbers with an incredible range (2*10^38 to 2*l0^-38). No matter how large or how small a number becomes, only nine digits are actually stored the rest are made up with zeros! So the number 1234567891234 would be stored as 1234567890000 i.e. smaller by 1234. This may seem like a large inaccuracy but in fact it is only a matter of around .0000001%! In other words, real variables may store numbers with large absolute errors but the percentage error is generally small enough to ignore. However, there are occasions when these small percentage errors cause large absolute errors in a final result. For example, adding very small numbers to very large numbers often gives the original large number as the result. Try:
10 A=12345678912
20 A=A+1
30 PRINT A
40 GOTO 20
and you will see that the value of A never changes! In other words in real arithmetic it is quite possible for A=A+C to be true without implying that C is zero! Other important sources of error are subtracting large numbers that are close in value and dividing by small values. The whole subject of accuracy and error in computation is difficult and beyond the scope of this book but it is important to be aware that some odd things can happen when using real variables.
Now that the two types of static variable scalars and reals have been described it is worth introducing the two best known ways of organising data arrays and records. Arrays are so familiar to the BASIC programmer that it is hardly worth spending much time on them. BBC BASIC provides three types of array real arrays, integer arrays and string arrays. In the next section a method of constructing arrays for any sort of data type using the indirection operators is described. An array is characterised by being a collection of identical types of data called the elements of the array under a single name. Some computer languages allow the use of the array name on its own to signify that all of the elements are to be used. For example, if A is an array then PRINT A would mean print all of the elements. However BASIC in genera] and BBC BASIC in particular only allow single elements of the arrays to be used. Any particular element is signified by the use of an index in the usual way.
The main difference between an array and a record is that the elements of a record do not have to be all of the same type. For example, to store someone's name and telephone number you would use a string for the name and a numeric variable for the telephone number. While these two data types form a single logical unit of data in a program that manipulates telephone entries in BASIC there is no way that they can be gathered together under a single name. In many other computer languages they could be turned into a 'record' called ENTRY (or whatever). This is similar to the way the elements of an array are collected together under a single name but instead of using an index variable the individual elements called-fields of a record are accessed by having additional subnames. In the case of the telephone directory entry for example, the field that stores the name might be called NAME and the field that stores the number might be called TEL-N U M. To make clear which field in which record you were referring to, most languages require the use of all the names that a field has. So to store a name in the record ENTR Y you would use:
ENTRY.NAME="Fred"
and so on. This should be compared to the way an element of an array is specified by giving both the array name and an index. The array name corresponds to the record name and the index corresponds to the field mame.
A field within a record is not restricted to being a simple data type, it can be an array or another record. For example, if you wanted to store a date you might define a record called DA TE with three fields, DA Y, MONTH and YEAR, which might be incorporated as a field within ENTR Y. Now to save a date along with the name and telephone number you would write:
ENTRY.DATE.YEAR=84
ENTRY.DATE.MONTH=4
ENTRY.DATE.MONTH=1
and this should be compared to the use of two index variables in a two-dimensional array.
Arrays and records are the two main data structuring methods found in other computer languages. The fact that BASIC includes arrays and not records is a reflection of the fact that while it is possible to do without records it is very difficult to do without arrays (but this is a debatable point!). Records are used to form tape and disk files of information that are the computer equivalents of traditional paper-filled filing cabinets:- Y ou might think that the use of arrays is equally obvious, but in fact they are often under-used!
The most obvious use is for storing a list of data that is going to be processed. In this role arrays are similar to the mathematical 'objects' vectors and matrices. Indeed much of the purely numerical work that computers perform involves the use of one- and two-dimensional arrays as vectors and matrices respectively. However this is not the only use for arrays within computer science. Arrays are always introduced by examples such as finding the mean of a list of numbers or reading in a list of numbers and printing them out in reverse order. This is understandable, but because of it many programmers never see an example of how to use an array as a look-up tabble.
Look-up tables are an excellent example of how the number of actions that a program has to perform can be reduced by using a more complex data structure. For example, suppose you need to use the SIN of a particular set of angles repeatedly; the most obvious thing to do is to work out the result each time it is needed but this often produces a very slow program! An alternative method is to work out the values that you need just once and then store them in an array for later use. In other words, use an array as a look-up table. In the case of simple functions such as SIN and COS there are always two ways of obtaining results by direct calculation or by using a look-up table. However there are plenty of 'functions' that cannot be summarised by a simple formula and in these cases the only choice is to use a look-up table. For example, there is no way that you can calculate the time of departure of the third flight to a particular destination but you can use a look-up table.
The use of look-up tables can become very complicated indeed, for a good example see Chapter Eleven. There are many occasions when a two-dimensional look-up table is necessary. For example, you could use a two-dimensional look-up table to hold the distance between pairs of cities. The biggest problem encountered with using look-up tables is the amount of memory that they take and the need to initialise them with the correct values. Sometimes you can take advantage of a pattern in the data to reduce the size of the table but this depends very much on the nature of the problem. For an example of how regularities in the data can be used to reduce a two-dimensional table to one dimension see Chapter Eleven.
The BBC Micro provides all the programming facilities necessary to construct new data types in the byte array and the three indirection operators '?', '!' and '$'. Although it is possible to construct new data types it is not something to be taken on lightly. Use of the indirection operators can easily produce programs that are very difficult to understand.
The byte array has already been introduced in Chapter Four. The statement:
DIM variable_name size
will create a byte array of 'size'+1 bytes and store its starting address in 'variable'. For example:
DIM N% 10
will create a byte array of 11 bytes and store the address of the first byte in N%. Although origignally introduced as a way of reserving space for the storage of machine code programs you can think of byte arrays as a way of reserving memory locations for any purpose, including the storage of data. Notice that while the memory location of a byte array will not change once created there is no guarantee that it will be created at the same location each time the program is run.
The action of the indirection operators is easy enough to understand. The byte indirection operator '?' performs the same functions as PEEK and POKE in other versions of BASIC. That is:
?address=value
stores 'value', an integer between 0 and 255, in the memory location at 'address' and:
?address
returns the contents of the memory location at 'address'. Similarly, the word indirection operator '!' can be used to store and retrieve four bytes of data in the format used to hold data in an integer variable. Thus any value that could be stored in an integer variable can be stored in four memory locations using '!'. The final string, indirection operator '$', will store and retrieve up to 256 characters starting at the address specified. When saving characters a carriage return is added to the end of the string and when retrieving characters a carriage return is used to signal the end of the string. For examples of the elementary use of the indirection operators see the User Guide, Section 39. Notice that when the indirection operators are written in front of a value that forms a valid address they behave just like variables. That is, an expression like ?address will return a value if used on the right-hand side of an equals sign and will store a value if used on the left. The expression:
A=123
will store 123 directly in the variable A but
?A=123
will store ]23 in the memory location whose address is stored in A, i.e. where the value is stored in determined 'indirectly' by A. Hence the use of the word 'indirection' as applied to the indirection operators. On a slightly more advanced level it is worth pointing out that each of the three operators can be used to store the results of expressions of the correct type. For example, you can use statements like:
?A=I*2
!A=I+C
and
$A=A$+CHR$(7)
If you try to store a value larger than 255 using the byte indirection operator it will be reduced to the range 0 to 255 using the MOD function. That is, if x is greater than 255
?A=x
has the same effect as
?A=x MOD 256
As already mentioned, the indirection operator stores and retrieves data in the same format as used for integer variables. Thus the numeric range that can be handled is:
-2,147,483,648 to +2,147,483,647
The second advanced feature of the indirection operators '?' and '!' is the way that an 'offset' can be specified. For example, the command
A?F=D
is equivalent to
?(A+F)=D
The way that the indirection operators work is straightforward enough. However, it is not so easy to see how to use them for anything other than single values. Perhaps the simplest data type to implement directly is the one-dimensional array, Although BBC BASIC provides integer and real arrays there are occasions when it would be useful to use one-dimensional arrays that use fewer bytes per element. For example, if you know that the values that you want to store in an array lie in the range 0 to 255 then theoretically you can get away with one byte per element, but even using an integer array takes four bytes per element. However, using a byte array and the byte indirection operator you can construct your own array that uses only one byte per element. If you need an array with N elements then use:
10 DIM ARRAY N-1
To store a value in the Ith element use:
ARRAY?I=value
and to retrieve the contents of the I th element use:
variable=ARRAY?I
Similarly, if you need an array that uses two bytes per element i.e. a numeric range of 0 to 65535, then use:
10 DIM ARRAY 2*N-1
to reserve enough memory,
ARRAY?I*2=value MOD 256
ARRAY?(I*2+1)=value DIV 256
to store a value and
variable=ARRAY?(I*2)+256*ARRAY?(I*2+1)
to retrieve a value from the Ith element.
Two-dimensional arrays can be implemented in the same way. The only complication is finding any given element. For example, if the array uses one byte per element then the space needed by an N by M array is:
l0 ARRAY N*M-1
and the address of the I,J element is:
ARRAY+N*J+I
Notice that each of these array definitions has two parts; the reservation of the correct amount of memory as a byte array, and a function that gives the location of any particular element This scheme can even be extended to user-defined records. For example, consider the telephone directory record given earlier. If the name field is restricted to a maximum of 20 characters and the telephone number is stored as an integer then a record can be defined as:
10 DIM ENTRY 23
20 NAME=0
30 NUMBER=20
where NAME and NUMBER are offsets that define the start of each field from the first location allocated to the record. Thus:
40 $(ENTRY+NAME)="fred"
will store a string in the name field and:
50 ENTRY!NUMBER=12345
will store an integer in the number field. Notice that the form of a field specification when using '?' and '!' is particularly neat in that the record name can be written next to the field name using the indirection operator as a separator.
Once implemented it is possible to manipulate the elements of the user-defined arrays and records just as if they were standard variables. The only real restriction is the lack of an indirection operator that will store a real variable. This can be overcome in a number of ways. For example you could convert the real value to a string of digits using STR$ and then store it using '$'. In general, user-defined data structures are more difficult to work with than the standard BASIC arrays. However there is one respect in which they are better than standard arrays. Although you cannot pass an array to a procedure as a parameter there is nothing to stop you from passing the address of the start of the array! For example, you could write a procedure that would add together elements I to N of an integer array and store the result in element zero:
1000 DEF PROCsum(NAME%,N)
1010 LOCAL I,SUM
1020 SUM=0
1030 FOR I=1 TO N
1040 SUM=SUM+NAME%!(I*4)
1050 NEXT I
1060 NAME%!0=SUM
1070 ENDPROC
where NAME% contains the address of the start of the byte array being used. Notice that this procedure can total any user-defined integer array. For example:
10 DIM ARRAY% 400
20 PROCsum(ARRAY%,100)
30 PRINT ARRAY%!0
will total ARRAY%. This is such a useful facility that it is worth exploring ways of extending it to the standard BASIC arrays.
All BASIC arrays are stored in a very simple format composed of two parts; a number of memory locations that hold part of the name and other information about the size of the array, and a data section that actually stores the array elements. If the start address of the array can be found then there is nothing to stop us from using it with the indirection operators to manipulate the contents of the data section of the array directly. In particular, knowing the address of the start of an array makes it possible to write procedures that will perform an operation on any array (as in the case of PROCsum in the previous section).
The most obvious way of finding the address of the start of an array (or any variable for that matter) is to write a function that searches the area of memory where the variables are stored. However this is not an easy function to write and there is a much simpler method that works just as well. In BBC BASIC the memory needed to store arrays and simple variables is allocated only when they are actually encountered during the running of a program. This means that arrays are stored in the order that they are defined by DIM statements and as long as no other variables are used in between such definitions they will occupy adjacent areas of memory. Thus the statement:
DIM name 0,name(n)
will create two arrays next to each other in memory. The first is a byte array consisting or a single memory location and the second is a standard array consisting of n+1 elements. As the two arrays are created next to each other you might expect the variable 'name' to contain the address of the start of the standard array minus one. However, for some reason BBC BASIC allocates three memory locations to the byte array even though you only requested one, so 'name' holds the address of the start of the standard array minus three. For example:
10 DIM ARRAY% 0,ARRAY%(100)
creates an integer array of l01 elements (i.e. ARRAY%(0) to ARRAY%(100)) and a simple integer variable that contains three less than the address of the start of the array. The only complication is that it is not good enough simply to add three to ARRA Y%, to give the start of the array because the start of the array is not the start of the area where the data is stored. The format used to store variables is extensively described in The BBC Micro: An Expert Guide but all that you need to know in this context is that the end of array's name is marked by a zero and the next memory location holds a constant that indicates the start of the data area. Therefore to alter the address stored in ARRA Yg$ so that it 'points' to the data area u be:
2000 DEF FNcorrect(START%)
20l0 START%=START%+2
2020 REPEAT
2030 START%=START%+1
2040 UNTIL ?START%=0
2050 =START%+START%?1
as in
ARRAY%=FNcorrect(ARRAY%)
Once the address has been corrected to point to the start of the data area of the array it can be treated exactly like the user-defined arrays given earlier. That is, you can use PROCsum to add up all the elements of any integer array. For example, try:
10 DIM ARRAY% 0,ARRAY%(100)
20 ARRAY%= FNcorrect(ARRAY%)
30 FOR I=1 TO l0
40 ARRAY%(I)=1
50 NEXT I
60 PROCsum(ARRAY%,100)
70 PRINT ARRAY%(0)
80 END
where PROCsum and FN correct have to be appended.
All of the variables and data structures that we have examined so far have been static, in the sense that the amount of memory that is needed to store their values doesn't change while a program is running. By contrast dynamic variables do change their size as a program runs. To a BASIC programmer the most familiar example of a dynamic variable is the humble string. Indeed most BASIC programmers use string variables without a second thought but strings, like all dynamic variables, are very difficult to implement efficiently. So much so that many other programming languages, Pascal for example, do not provide strings as a data type. What they do provide as an alternative is the character array. A character array can be thought of as a fixed length string, but it is more directly related to a one-dimensional array where each element can be used to store a single character. BBC BASIC doesn't have character arrays but it is easy to see how they could be created using byte arrays and the string indirection operator.
The details of how the BBC Micro tackles the difficult problem of implementing strings are also given in The BBC Micro: An Expert Guide but essentially what happens is that a certain amount of memory is allocated to a string variable when it is created. If the number of characters that it has to store exceeds this initial memory allocation a brand new and larger amount of memory is allocated to it. This process can be repeated any number of times during the course of a program, each time leaving behind a redundant area of memory that was once allocated to the string variable. In this way it is possible to use up all of the BBC Micro's memory by storing old copies of string data. If this is a particular problem within a program the solution is very simple. All you have to do is initialise all the strings to hold the largest number of characters that the program will ever want to store. In this way the initial amount of memory allocated to string variables will be large enough never to need increasing.
Although strings are probably the most useful form of dynamic variables, they are so well known to the BASIC programmer that it is more sensible to spend time explaining how other forms of dynamic variables work and can be implemented in BBC BASIC. After the string the most important dynamic data types are:
the stack
the queue
the linked list and
the tree
Each of these four, along with other types of dynamic variable, are closely related to the use of pointers. A pointer is simply a variable that is used to store a memory address. In this sense a pointer really does 'point' at a memory location. There is nothing stopping us from storing an address in any numeric variable and indeed this has been done many times to access the contents of a memory location in combination with the indirection operators. That is, in:
?A
the variable 'A' is being used as a pointer to a memory location. The art of implementing and using dynamic data types on the BBC Micro is in setting up and manipulating pointers and this is best explained by examples of each of the dynamic data types listed.
The stack, or more accurately, the Last In First Out (LIFO) stack is well known to assembly language programmers. In action it mimics the operation of a stack of coins or plates. If you make a stack of plates you can identify two operations, adding to the stack by placing a plate on top and, conversely, removing a plate from the top of the stack. If you think about it for a moment you will see that it is a characteristic of such a stack that the order that the plates are removed is the opposite of the order that they were added for the simple reason that the plates added to the stack first will be further toward the bottom of the stack! In computing a stack behaves in the same way. Storing data on the stack is known as 'pushing data onto the stack', a PUSH operation, and retrieving data is known as 'pulling data off the stack', a PULL operation. The best way to illustrate the operation of a stack is by giving an example.
There are two components to the implementation of a LIFO stack, an area of memory that is used to store the data (often referred to as the stack itself) and a pointer that marks the top of the stack. The most obvious way to reserve some memory for a stack is once again the byte array. The pointer can be implemented as a standard BASIC variable used in conjunction with the appropriate indirection operator. When the stack is initially created the pointer variable or 'stack pointer' is set to point to the first free memory location. A PUSH operation stores data in this free memory location and then adjusts the stack pointer so that it points to the next free memory location. A PULL operation first adjusts the stack pointer so that it points at the data item on the top of the stack and then retrieves it, so freeing the location.
The following program uses an integer stack to reverse a list of ten integers:
10 DIM STACK% 400
20 POINTER%=STACK%
30 FOR I=1 TO l0
40 INPUT A
50 !POINTER%=A
60 POINTER%=POlNTER%+4
70 NEXT I
80 FOR I=1 TO 10
90 POINTER%=POINTER%-4
100 PRINT !POINTER%
110 NEXT I
Line 10 reserves sufficient memory for a stack of up to 100 integer items (you will remember that an integer takes four bytes to store) and line 20 initialises the stack pointer POINTER% to the start of the area. The FOR loop at lines 30 to 70 reads in l0 integers and pushes them onto the stack (fines 50 and 60). The second FOR loop at fines 80 to I l0 pulls and prints each item off the stack in turn (lines 90 and 100).
You can create stacks using other data types as raw material. For example, you can use a standard BBC BASIC array indexed by a variable playing the role of the stack pointer. Stacks are used whenever the order of incoming data has to be reversed or just as a way of temporarily storing data until it can be dealt with.
Once you have seen how a stack works the queue is an obvious next step. Indeed a queue is often referred to as a First In First Out or FIFO stack. The action of a queue is exactly what you would expect from a consideration of the way people queue. The first person to join the queue is (usually!) the first person to be served. So it is with the data queue, the first item to join the queue is also the first item to leave it. Notice that the two fundamental queue operations have already been identified JOIN adds a data item to the queue and LEAVE removes an item.
The implementation of a queue involves an area of memory and a pair of pointers. One of the pointers marks the front of the queue and the other marks the end. Obviously data items are taken out of the queue using the FRONT pointer and added using the END pointer. That is, assuming each item is a single byte,
JOIN is
?END=value
END=END+1
and
LEAVE is
variable=?FRONT
FRONT FRONT+1
Initially the FRONT and END pointers point to the same memory location and as data is added to the queue the end moves up in memory and the front stays put. Similarly, as data is removed from the queue, the front moves up and the end stays put. The only trouble with this implementation of the two operations is that the front and end of the queue move ever upward! The solution is to make the queue circular by setting any pointer that reaches the top of the allocated memory back to the start of the area. For example, the following program reads in l0 numbers and adds (joins) them to a queue and then prints them out in the order that they arrived.
10 DIM QUEUE% 100
20 FRONT%=QUEUE%
30 REAR%=QUEUE%
40 FOR I=1 TO l0
50 INPUT A
60 ?REAR%=A
70 REAR%=REAR%+1
80 IF REAR%>QUEUE%+100 THEN REAR%=QUEUE%
90 NEXT I
100 FOR I=1 TO l0
110 PRINT ?FRONT%
120 FRONT%=FRONT%+1
130 IF FRONT%>QUEUE%+100 THEN FRONT%=QUEUE%
140 NEXT I
You should be able to recognise the JOIN and LEAVE operations in lines 60 to 80 and 1l0 to 130. Also notice that, in contrast to the stack in the last section, this queue stores single byte items.
Queues are used whenever data is generated too fast to be dealt with immediately but nevertheless needs to be processed in the order that they were generated. The BBC Micro uses many queues in its normal operations. For example, the sound queue, the printer buffer, the keyboard buffer, etc.
The queue and the stack are characterised by restrictions on where data can be added or removed. In the case of the stack, data can only be added to or removed from the top and in the case of the queue, data can only be added at the end and removed from the front. The linked list is a more versatile data type in that it is relatively easy to add and remove data from any position. In this sense the linked list is more like the everyday idea of a list of items made on paper.
The fundamental principle that lies behind a linked list is that each item in the list includes a pointer to the next item in the list. This, coupled with a pointer to the start of the list and some way of detecting the end of the list, is ail that there is to implementing a linked list. For example, a list of names can be built up using a byte array and the string and word indirection operators. If each item in the list is defined to consist of a name with a maximum of 20 characters and a single integer pointer to the next item, it is easy to work out that each item needs 24 bytes of storage. The following program will build up a list of up to l0 names:
10 DIM NAME_LIST 24*10
20 START%=NAME_LIST
30 CURRENT% NAME_LIST
40 CURRENT%!20=0
50 INPUT N$
60 IF LEN(N$)>19 THEN GOTO 50
70 $CURRENT%=N$
80 CURRENT%!20=CURRENT%+24
90 CURRENT%=CURRENT%+24
100 CURRENT%!20=0
1l0 GOTO 50
The format of the list is as shown in Fig. 5.1. Notice that each pointer points to the start of the name and the last item doesn't store a name and its pointer is set to 0.
Fig. 5.1 An example of a linked list.
Notice that in the above example the pointer CURRENT% is used to add items to the end of the list but new items can in fact be added anywhere. For example, if you wanted to insert an item to follow the first, i.e. to become the second item, all you would have to do is change the pointers as shown in Fig. 5.2. The important idea is that the order that a linked list is read is determined by the pointers, not by where the item is stored.
Fig. 5.2. Adding "RAY" to the linked list in Fig. 5.1. Notice that reading the list by 'following' the pointers gives the names in correct alphabetical order
The items that make up a linked list can be any data type as long as it includes a pointer to the next item. For example, the telephone directory given earlier in the chapter is best implemented as a linked fist with each item as a record that includes a pointer to the next record. Using a linked list for a telephone directory has the advantage that when new items are added they can be inserted into the list at the correct position to preserve alphabetic order. (To do this the list is read, by following the pointers, until the first item that the new name should precede is found, then the new item is inserted by manipulating the pointers (as shown in Fig. 5.2).) Without using a linked list each new item would need a complete sort of the list to make sure that the result was in the correct order.
There are so many uses for linked lists that it is impossible to summarise them. However, it is often the case that ignorance of the existence of the linked list forces a programmer to use an array instead. The result is always a program that is much more complicated and involves a great deal of data moving. A linked list should be used whenever a list of data needs to be repeatedly rearranged.
The final dynamic data type is the most complicated of all the tree but if you have followed the use of pointers to form a linked list it should be easy to understand. Each data item in a tree is associated with two pointers the left pointer and the right pointer. The best way to think about this is as shown in Fig. 5.3 which clearly reveals why this data type is called a tree!
Fig. 5.3. A tree.
To implement a tree in BBC BASIC all that you have to do is reserve some memory using a byte array and use pointer and indirection operators in exactly the same way as for the linked list. (Apart from the fact that there are two pointers per item and an increased choice of where to add an item, that is!)
Trees are used to represent any sort of data that has a hierarchical structure like a famjly tree, or the command structure of an organisation. Most of their uses are found in advanced computer topics such as artificial intelligence and are thus outside the scope of this book. (If you would like to know more about this application of trees, see Artificial Intelligence in BASIC by Mike James published by Newnes, 1984.) However, you should be able to see how the tree is a simple use of pointers to construct complicated data structures. In fact the type of tree that has been described is more correctly described as a binary tree, because each item has two pointers. It is quite possible to construct trees that contain more pointers per item and linked lists that contain pointers to both the next item and the previous item (sometimes called doubly linked lists). The range of variations is too great to go into but once you have mastered the idea of using a pointer to indicate where the next item is stored you should be able to see them all for yourself.
At the end of this long chapter you might feel that there are far tqo many data types and structures to cope with. However, if you make a list of the data types that have been introduced you might be surprised to discover how few there are. Simple static data reduces to nothing more than the scalars, based on integers, and the real numbers. The only two common data structuring methods are the array and the record. Even if you include the dynamic data types we only have four more: the stack, the queue, the linked list and the tree. What is amazing is that so many different problems can be tackled successfully with such a small number of types!