The previous chapter considered the problem of keeping the flow of control simple but even this is not enough to make a large program easy to write. The key to the problem of writing large programs as easily as small programs has already been partially introduced in the form of 'nesting' structuring elements. In this chapter it is developed further into a second component of programming method that goes naturally with structured programming.
The best way to write a large program is as a number of small programs! This seems like an obvious approach in the tradition of 'divide and conquer' but there are a large number of interesting questions concerning exactly how the divisions should be arrived at and then how these smaller program should be assembled into one large bug-free program.
If you look at a finished program, no matter how it has been written, you should be able to identify groups of statements that carry out particular tasks. For example a program might have an initialisation and data input section, followed by a section that performs calculations and finally a section that prints results. As the final program will have this sectional or 'modular' structure no matter how you go about writing it, then it seems like a good idea to try to recognise this structure and make use of it while constructing the program. This is the philosophy that lies behind 'modular programming' . The benefit from breaking down the construction of a large program into a number of smaller modules is easy to see. Each module can be treated as if it was a program in its own right and, of course, the module will be a smaller program than the one needed to solve the initial problem, and smaller programs are easier to write! If one of the modules turns out to be rather too large then the process can be repeated and it can be divided up into a number of smaller modules and so on. An important factor in the success of this repeated division strategy is that each module really can be treated as a program in its own right and in isolation from the other modules.
The degree to which real modules can be treated in this way will be discussed later. For the time being all that will be said is that there are considerable advantages in implementing modules as BBC BASIC procedures. This step of identifying modules with procedures is useful but not essential. It would, for example, be possible to write sections of program as subroutines to do particular tasks or even just as lists of BASIC statements marked out by REMs, but either of these methods would be severely under-using the facilities offered by BBC BASIC
To say that a program should be made up from modules is one thing; to identify those modules when all you have is an idea of what the program should do is quite another. Stepwise refinement is an excellent technique for solving all manner of problems, not just those that demand programs as solutions. One of the best features of stepwise refinement is that even if you cannot solve the whole problem it is possible to make a start and quickly identify what the real difficulties are. It also automatically takes account of the natural modular structure of a program and uses it to advantage.
The best way to explain stepwise refinement is via a simple example. Consider the problem of producing a program that automatically sets and marks questions in simple arithmetic.
The statement of the problem is easy to understand. The program is intended to give routine practice in simple arithmetic, but unless you have written such a program before, or are exceptionally talented, you will only have a vague idea of what it should be like. If, as is normal, the complete program isn't inside your head the traditional advice is to sit down with plenty of paper and plan the program using flow charts or some other method. This is a very wasteful way of designing a program because each time you change your mind about how it should work you have to throw away a lot that is good and works, along with the unsatisfactory bits. Stepwise refinement doesn't suffer from this defect.
After very little thought it is clear that the arithmetic test program must first construct a question, then ask it, mark it and finally repeat the whole procedure until it is finished. If you examine this rough description carefully you will find that it is not so rough after all. In fact, if each of the parts of the description are interpreted as procedures, then you almost have a program. In other words after very little thought about the problem you can write:
10 REM arithmetic test
20 REPEAT
30 PROCset_question
40 PROCask_question
50 PROCmark_question
60 UNTIL finished
which is perfectly good BBC BASIC apart from the fact that the PROCs do not exist as yet and the condition 'finished' in the UNTIL statement is not fully determined. You can even avoid the "No such FN/PROC at line x" error messages that trying to run this program would produce by including:
1000 DEF PROCset_question
1010 ENDPROC
2000 DEF PROCask_question
2010 ENDPROC
3000 DEF PROCmark_question
3010 ENDPROC
With these 'dummy' procedures present you can even begin the testing of your first attempt at the program!
The rough program given above is the first stage in the stepwise refinement process. Using it as a starting point, the second stage of stepwise refinement is to fill in the details of the dummy procedures. In this sense the dummy procedures introduced in the first stage are promises to write procedures. The second stage of refinement can also apply the same technique of using dummy procedures as promises to write the procedures needed to complete the procedures introduced in the first stage. So, for example, the devejopment of PROCset_question might produce something like:
1000 DEF PROCset_question
1010 Q$=""
1020 PROCchoose_op
1030 NUMl=RND(99)
1040 NUM2=RND(99)
1050 Q$=STR$(NUM1)+O$+STR$(NUM2)
1060 PROCfind_ans
1070 ENDPROC
Where the dummy procedure PROCchoose_op returns one of '+' ,'-', '*' or '/' in the string OS and PROCfind_ans works out the answer to the arithmetic expression held in Q$. Notice that the two dummy procedures are produced because at this stage we don't know how to go about generating one of the four arithmetic operators in O$ nor how to work out the answer to an arithmetic expression built up in Q$. These are problems that will be solved at the next stage of refinement.
This is all there is to stepwise refinement: at each stage you write as much of the program as you can using dummy procedures to put off solving any large task that has to be performed until the next stage. In practice it is sometimes better to leave a task to the next stage even when you know how to implement it at the current stage. This is the case when the task is similar to other tasks that are being deferred to the next stage, so that the resulting procedure 'fits into' the overall structure of the program. For example, some programmers would have deferred the construction of the two random numbers NUM1 and NUM2 to a procedure at the next stage (PROCnumbers say) because this fits in with the way a procedure PROCchoose_op is used to generate the operator part of the expression. There are no hard and fast rules about when to defer a task to the next level of refinement; only a feeling for the overall program can help you. However, it is worth saying that in most cases you should err on the side of using too many rather than too few procedures and steps. The advice often given concerning working out simple arithmetic holds for stepwise refinement never do at this stage what you can put off until the next!
In the last chapter the idea of 'nesting' one structuring element inside another was introduced as a way of constructing programs. In many ways stepwise refinement is a method of program construction that proceeds by nesting one program inside another! For example if you look at the program produced in the first step in the last section (i.e. lines I0to 60) you can see that if you treat the procedure calls as single BASIC instructions then this is a very simple but complete BASIC program. Its structure is clearly one of our simple structuring elements an until loop and therefore it is very easy to understand. However if you look at it a little closer you will see that some of the BASIC statements are themselves complete programs with a structure. For example, line 30, the call to PROCset_question, is a complete program in the form of the simplest of ail structuring element the default flow of control. In practice, the modules that make up a program will contain a number of structuring elements but never so many that it is difficult to see what is happening.
It is useful to think of this nesting of modules as a number of levels present in the program. The top level is often called the 'main program' and this calls modules or procedures in the next level down and so on. For example the arithmetic test program could be described as:
When you first look at a program constructed in this way all that you need see to understand its overall working is the main program. As you look closer and more carefully you work your way down through the levels, understanding more and more of the detail of how the program works. This is an exact reflection of the processes that were used to write the program and this also explains why stepwise refinement is referred to as a 'top down' method.
In practice the separation between the levels of a program is not always so clear. For example, a procedure in a lower level may be called by a number of procedures in higher levels. In the case of the arithmetic program PROCfind_ans might also be called by PROCset_question and PROCmark_question or by any procedure that needs to know the answer to the problem being set. It is better to try to build the levels in such a way that procedures in each level onjy call procedures at the very next level down but this is often impractical and existing procedures are often re-used. Indeed, by reducing the total number of lines in a program, re-using procedures is a simplifying action in its own right. There is nothing to stop a procedure from calling one at a higher !eve] but this is a very dangerous and sloppy programming practice. It not only destroys the clear nesting structure of the program, it introduces the risk of the first procedure being recalled at a later stage. For example, if PROCfind_ans for some reason called PROCset_question then eventually PROCfind_ans would be called again before the first call to it was completed! This recalling of a procedure is a roundabout way of a procedure calling itself! In BBC BASIC this is a perfectly permissible thing to do and it is called recursion. Recursion is still a controversial subject in computer science; some programmers like it and claim that it is a simplifying method, others just find it confusing and difficult. Whatever your opinion of recursion it is important that you know when you are using it, and the danger referred to in a procedure calling a procedure at a higher level is simply that you might produce a recursive program without noticing! (For more information about recursion see Chapter Eleven of The Complete Programmer.)
One of the advantages of using separate modules to construct a program is that each module can be treated as a program in its own right. However, as pointed out earlier it is only possible to work on a module in isolation if it really doesn't have any effect on any other modules. If you think about this statement for a moment you will realise that it is not quite accurate. If a module is going to work together with others to make up a program then it is essential that data is passed to it and returned from it. In other words, to produce a program the modules have to communicate. What is really required is that modules should only interact with one another in clear and strictly controlled ways.
As an example of how uncontrolled interaction can cause problems, consider the traditional BASIC subroutine. Any variable used in a BASIC subroutine is available to any other part of the entire program. This is fine as long as each subroutine uses its own names for variables that it doesn't want any other subroutine to use. In practice it is very difficult to keep track of a large number of variable names and when two subroutines end up using the same variable for different purposes the resulting bug is often very difficult to find.
The point is that modules shouldn't interact in ways that you never intended. Such accidental interactions are usually called the 'side effects' of a module. Fortunately, the BBC BASIC procedure (and, as explained later, function) has a perfect mechanism for ensuring that there are no side effects. Any variable that is named in a LOCAL statement at the start of a procedure will be the sole 'property' of the module. If an identically named variable already exists before the procedure is called then its old value is saved and the variable is re-initialised to zero. If the variable doesn't already exist then it is created and initialised to zero. Either way the procedure gets a nice new version of the variable that it can use without interfering with any procedures at a higher level. When the procedure has finished, the old values of all the local variables are restored.
Variables named in LOCAL statements are, not unreasonably, called local variables. By contrast any variable that is not named in a LOCAL statement is referred to as a global variable. Such a variable is global in the sense that its value can be used and altered by any part of the program. To make sure that procedures do not interact with one another all that is necessary is that every variable which a procedure uses should be named in a LOCAL statement. Of course, the strict application of this rule would eliminate all types of interaction including desirable communication between procedures.
A few minutes' thought indicates that the variables a procedure uses fall into four categories:
(1) Variables that pass data to the module, the so-called 'input' variables.
(2) Variables that pass data from the module, the so-called 'output variables'.
(3) Variables that pass data into and out of the module, the so-called 'input/output variables'.
(4) Variables that are used within the module and neither pass data in or out of the procedure, the so-called 'internal' variables.
It is clear that internal variables should all be named in LOCAL statements as they are only of interest to the procedure that uses them. Input variables correspond to BBC BASIC% definition of 'parameters'. One way to think about a parameter is as a local variable that is initialised to the value stored in another variable before the procedure is called. For example in the procedure:
1000 DEF PROCspace(LINES)
1010 LOCAL I
1020 FOR I=1 TO LINES
1030 PRINT
1040 NEXT I
1050 ENDPROC
both I and LINES are local variables, that is changing their values within the procedure will not change the values stored in any variables of the same name in procedures at higher levels. However, when the procedure is called, I will be initialised to zero but LINES will be initialised to the value of the variable or expression used in between the brackets of the call. For example:
PROCspace(l0)
will cause LINES to be initialised to 10 and:
PROCspace(A*l0+32)
causes LINES to be initialised to the result of A*10+32. Notice that in BBC BASIC because parameters are local variables, they cannot be used to pass data back to the calling program. In this sense they are 'input parameters'. Some languages provide in addition both 'output' and 'input} output' parameters but unfortunately these are absent from BBC BASIC. In addition, arrays cannot be named in LOCAL statements and cannot be passed as parameters. However, this doesn't mean that you cannot use individual array elements, such as A( I) as values to be passed to procedures, only that an entire array cannot be local.
All this raises the problem of how to pass data out of a procedure. Unfortunately, there is no satisfactory answer to this problem. The only practical way of passing values back is to use global variables with the resulting risk of producing unwanted side effects. There is also no alternative if you have to pass arrays in or out of a procedure arrays are always global variables.
Putting all this together produces the following guidelines:
(1) All simple data, real numbers, integers and strings should be passed into a procedure by using parameters.
(2) Arrays have to be passed into and out of procedures as global variables.
(3) Simple data has to be passed out of procedures by using global variables.
(4) All simple variables that are not being used to pass data out of a procedure should be named in the LOCAL statement.
This scheme is the best that can be achieved using BBC BASIC and after using it for a while you should find that its only serious defect is the difficulty in making plain which variables are being used to pass data back to the procedure that initially called it.
BBC BASIC does supply one method of passing a single value back without using global variables the user-defined function. A function can return a single numeric or string value without using a variable because it can be incorporated into an arithmetic or string expression. For example, you could write a procedure to square a number and return the result in the global variable SQ but it is much more sensible to use a function:
1000 DEF FNsq(X)
1010 =X*X
The last line of the function can be thought of as assigning a value to the function's name and terminating the function. An 'empty' assignment of this sort anywhere in a function will set the value that it returns and terminate it. This convention leads to some very odd looking lines such as:
IF A=0 THEN 0
IF A>0 THEN =1 ELSE =-1
which are perfectly correct as part of a function definition and give rise to a user-defined version of the supplied function SON. As an example of a string function consider:
1000 DEF FNinsert(S$,I$,I)
1010 IF I<0 THEN =S$
1020 LEFT$(S$,l)+I$+MID$(S$,I+1)
which will insert the string I$ into the string S$ following the Ith letter.
User-defined functions can define their own local variables (using the LOCAL statement). They can also use global variables to return more than one result but this is a feature that definitely should not be used. The importance of functions is that they only return one result - a function that uses global variables to return more than one result should be turned into a procedure! The reason why this one-result rule is so important is the way that functions are used within arithmetic and string expressions. For example FNsq(X) can be used in arithmetic expressions such as:
A= FNsq(B)*3
or even:
A=SQR(FNsq(B)+FNsq(C))
and most programmers do not expect expressions, or functions for that matter, to produce any side effects. In the same way, it is possible to include PRINT and INPUT statements within functions but it takes a very clever programmer to suspect an expression like:
A=FNsq(B)
of producing output or requiring input. In other words, data should only be passed to functions by parameters and they should only pass one result back in the standard manner. The main characteristic of a function is that it can be used as part of an expression and as such it should never cause an expression to do anything unpredictable.
It is often said that the use of the REM statement to explain what is happening in a program is the most important part of making a program understandable. While REMs and the comments they provide certainly do add to the readability of a program they are by no means the most important fact on A badly written program is still impossible to understand even when the REM statements outnumber the other statements many times over! On the other hand, a well structured modular program is almost self-explanatory if meaningful names are used for variables, procedures and functions. Of course, it is often the case that the occasional REM statement helps but, as most programmers admit, no matter how many good intentions you start olf with, REMs soon become a chore that is forgotten during program development, The general ru]e seems to be, if it is extra trouble then it tends not to be used! Structured modular programming isn't extra trouble; it is part of the process of program construction and it wen makes it easier! Of course, if you do find that you have no choice but to use a complicated trick then it is absolutely essential to document it within the program by using REMs, but otherwise a well written program should be almost self-documenting.
Another method of improving the readability of programs is to use a line numbering scheme of the sort that has been used without comment in all of the examples. That is the main program occupies lines 1 to 999, thus subsequent procedures occupy line numbers in the thousands. It is remarkably difficult to keep to a numbering scheme of this sort. As indicated in the previous chapter, line numbers are something that BASIC would be better without. Given that they are an evil that nothing can be done about, then using a numbering scheme is the best that can be done. However, the first use of RENUMBER destroys any scheme and so it really isn't worth worrying about inventing anything elaborate!
A very useful facility for making programs easier to read is indenting. Following LISTO 6 all FOR and UNTIL loops will be indented by two spaces in listings. This makes it easier to spot the start and end of these two structuring elements and is well worth using. However manual indenting can also be applied to other loops and blank lines can be inserted before and after procedure and function definitions either by using empty REM statements or a line that contains nothing but a colon.
Whatever you do to improve the look and readability of your programs, nothing can replace the reward of using a natural structuring method combined with the use of modules.