This chapter and the following three each present a large applications program. The first of these is a 'spelling checker' which can, in principle, be implemented using nothing but BASIC. In practice, however, it is worth using some assembly language to achieve a reasonable performance. The program in Chapter Nine an execution tracer needs to be implemented entirely in assembly language because of the way that it fits into and modifies the workings of the BBC Micro's system software. The program in Chapter Ten is another example of an all assembly language program that again fits into the existing system software but it is of a different character to the execution tracer in that it uses the BBC Micro's interrupt and event handler. Chapter Eleven is devoted to the implementation of a 6502 disassembler - written exclusively in BASIC.
None of these programs have been refined to the point where they could be considered completely finished. (Something that all advanced programmers realise is that no program is ever entirely finished!) But they are taken to a stage of development where they do something useful in thia sense they are finished all but the frills. However, if you have been following the programming philosophy outlined in this book you will soon realise how much time and effort programming the 'frills' takes!
The first thing that you should do before starting work on any program is to consider in as much detail as possible what the program is intended to do. This establishes a clear objective that your finished program can be judged against. Although it is often necessary to modify initial objectives in the light of what is possible in a reasonable amount of time, at least you will know what you are sacrificing for a working program. The second thing that you should do is to consider general ways of achieving your objective. In other words, you should try to work out rough outlines of how program would go about the task. The more ways of doing something that you can think of at this stage the better. It is also important not to get bogged down with detail at this early stage. What you should aim for is an English description of the way the program will work that contains enough information for you to identify the data structures involved.
The spelling checker is an especially large and complicated program but its full objective is easy to state:
Check each word in a text file against a list of correctly spelt words (a dictionary).
Words that are not found in the dictionary are either new words or misspellings.
To discover which, each word is presented to the user and a decision is requested.
New words can be added to the dictionary so that the program can 'learn' to spell them.
Misspelt words should be identified in the text file and the user given the opportunity of correcting them.
From this specification it is not difficult to see that a spelling check program has four actions to perform:
(1) look up the words in a dictionary
(2) ask the user to identify misspellings
(3) add new words to dictionary
(4) correct misspellings in the text
To make the program suitable as an example it makes sense to drop the last action and simply print out a list of misspellings in the text so that the user can use a text processor} editor to make corrections. This is not at all unreasonable and produces a spelling checker that is almost as useful as one that implements the full specification.
The next question is how to go about the three actions that we are going to implement. Clearly the major problem is how to look up each word in the text file in a dictionary file. The straightforward approach is to read in each word from the text file and search through the dictionary for it. However, this approach is very time inefficient in that very common words would be looked up each time they occurred in the text. A much better method would be to read the entire text file and build a list of all the words used. This list is almost certainly going to be much shorter than the text file because common words such as 'the', 'and', etc., would appear only once in the list irrespective of the number of times that they occurred in the text. This reduction of the number of words also makes it feasible to store the list of words in memory rather than as a file.
Once this list has been built each word can be looked up in the dictionary in turn. This is a task that is made much faster if both the list of words and the dictionary are sorted into alphabetical order. If they are not in order then the entire dictionary has to be searched for each word in the list. If they are in order then you can abandon the search for a word as soon as you have gone past the position that it should occupy in the dictionary. Each word that is found in the dictionary can be deleted from the list as we are only interested in the words that are not in the dictionary The rest of the program is just a matter of simple operations on the list of unidentified words.
It is important to realise that in a program of this sort it is vital to settle on a method that is efficient. As each word might have to be checked against a large dictionary file any slow operations will because of their repetition make the program too slow to use.
The spelling checker program is so large that it makes sense to tackle it in a two parts the construction of the sorted list of words and then the dictionary search and updating, This is not stepwise refinement but it does provide the opportunity to get something working before going on to extend it. Attempting a smaller problem that can be extended to include additional desired features is a good strategy if you are working alone because it provides some reward for your efforts in a shorter time than by trying to write the full program.
Before the first stage of implementation we have to decide on a data structure suitable for the storage of a list of words. The most obvious choice is a one-dimensional string array with each word stored in a different element of the array. The disadvantage of this is that the management of the string array is left to the BASIC interpreter which, for this application, would mean wasting both time and memory. Once a string array has been rejected the only other possibility is to construct a data structure using a byte array. Since the list needs to be sorted into order as words are read in, the most obvious contender is a linked list. Using a linked list new words could be inserted into their correct position without having to move existing words to make room. Attractive though it is for this application, a linked list has one important disadvantage it is slow to search for a particular word. The reason for this is that the only search method that can be used is to start at the first word and examine each entry in turn until the word is found or until you have searched beyond the point at which the word would appear if it was present in the list. This searching method is known as a linear search and it is very inefficient when compared to alternatives such as a binary search. (For details of searching methods, see Chapter Eight of The Complete Programmer, Mike James (Granada, 1983).) As the creation of a sorted fist of words represents only a small part of the work of the program, the linked list does not appear to be an attractive option. Although it is more difficult to construct a sorted list using a BASIC string array, it is a much easier data structure to apply advanced searching methods to. The obvious solution is to construct a special purpose string array using a byte array and keep the storage management within the program. To be more precise, the words can be stored in a byte array using the '$' indirection operator, If a word has to be inserted into the list all of the words that follow it have to be moved up to make room but this is more efficient than using BASIC to manage the array.
Now that the data structure has been decided upon the main program can be written:
10 MODE 7
20 SIZE%=5000
30 DIM WORD% SIZE%
40 FIN%=WORD%+SIZE%
50 FILE_END=FALSE
60 PROCinitialise
70 PROCopen_file
75 REPEAT
80 PROCread_word(F)
90 IF WORD$<>"" THEN PROCadd_word(WORD$)
100 UNTIL FILE_END
110 PROClist_words
120 STOP
Line 30 reserves 5000 memory locations as a byte array for the word list, i.e. a total of 5000 characters. The variable SIZE% is used to set the size of the array throughout the program so that this can be easily changed and FIN% is set to the address of the end of the array by line 40. PROCinitialise is included to take care of any initialisation that has to be done just once at the start of the program. PROCopen_file is responsible for getting the file name and opening the file. The resulting channel number is returned in F. The main structure of the program is a REPEAT . . . UNTIL loop formed by lines 75 to 100. This reads words from the file usfing PROCread_word and adds them to the list using PROCadd_word. Thd REPEAT . . . UNTIL loop is brought to an end by reaching the end of the file, signalled by FILELEND becoming TRUE (see Chapter Twelve for information on the use of Boolean variables). The IF statement in line 90 is necessary to allow for the possibility of the file coming to an end without PROCread word returning a word in WORD$. Finally PROClist_words is used to examine the list for correctness.
The second stage of refinement presents no real problems.
1000 DEF PROCinitialise
1010 WORDCNT%=0
1020 CURRENT_END%=WORD%
1030 ENDPROC
2000 DEF PROCopen_file
2010 CLS
2020 PRINT TAB(1,5);
2030 INPUT "File name of text to be"'
" proof read ",F$
2040 IF LEN(F$)>10 THEN
PRINT"Name too long":GOTO 2020
2050 PRINT
2060 F=OPENIN(F$)
2070 ENDPROC
3000 DEF PROCread_word
3010 LOCAL CHAR%
3020 WORD$=""
3030 REPEAT
3040 CHAR%=FNread_cap(F)
3050 UNTIL NOT(FNseparator(CHAR%))
OR FILE_END
3060 WORD$=WORD$+CHR$(CHAR%)
3065 IF FILE_END THEN GOTO 3110
3070 CHAR%=FNread_cap(F)
3080 IF FNseparator(CHAR%) OR
FILE_END THEN GOTO 3110
3090 WORD$=WORD$+CHR$(CHAR%)
3100 GOTO 3065
3110 IF WORD$<>"" THEN
WORDCNT%=WORDCNT%+1
3120 PRINT TAB(0,8);
"Number of words =";WORDCNT%
3130 ENDPROC
4000 DEF PROCadd_word(WORD$)
4020 PROCfind_word(WORD$)
4025 PROCtime(2)
4030 IF FOUND% THEN ENDPROC
4035 PROCstart
4040 PROCmake_space(WPOINT%,LEN(WORD$)+1)
4045 PROCtime(3)
4050 $WPOINT%=WORD$
4060 ENDPROC
6500 DEF PROClist_words
6505 CLS:VDU 14
6506 PROCprinter
6510 WPOINT%=WORD%
6520 IF WPOINT%=CURRENT_END% THEN
VDU 15,3:ENDPROC
6530 PRINT $WPOINT%
6540 WPOINT%=WPOINT%+LEN($WPOINT%)+1
6550 GOTO 6520
PROCinitialise and PROCopen_file are straightforward. The only real point of interest is that PROCinitialise sets CURRENT_END% to the start of the byte array. CURRENT_END% is used to mark the extent of the byte array that has been used to store words, i.e. it marks the division between the used and unused parts of the array.
The first procedure worth examination is PROCread_word. The main problem in implementing this procedure is the definition of a 'word'. You might think that a word was any group of letters enclosed by blanks. However, this is not a sufficiently wide definition to include words that are terminated by commas, full stops or other punctuation. The easiest way to define a word is to say that it is a group of letters enclosed by any of a number of legal separator characters. With this definition the algorithm of PROCread_word is:
(1) Read characters until the first non-separator is found and then store it in WORD$.
(2) Read characters until a separator is found and add each non- separator to WORD$.
Each of these steps can be clearly seen as loops in the procedure. The first is a REPEAT . . . UNTIL loop (lines 3030 to 3050) and the second is a additional loop with an exit point in the middle (lines 3070 to 3100). The exit point in the middle could be avoided and the loop turned into a REPEAT . . . UNTIL loop but in this case it is more natural to use the more complicated sort of loop. PROCread_word uses no additional procedures but it does use two functions. FNread_cap(F) returns the ASCII code of the next character read from the file after it has changed lower-case letters to upper-case. FNseparator(CHAR%) is an example of a predicate function in that it returns a value of TRUE if CHAR% is a separator and FALSE otherwise (predicate functons are described in Chapter Twelve). PROCadd_word first checks to see it WORD$ is already in the list (using PROCCfind_word). If it is then nothing else needs to be done. If the word does have to be added to the list then PROCmake_space creates a 'gap' in the list by moving up all of the existing entries. PROCfind_word returns two results, a Boolean variable FOUND% that is TRUE if the word is in the list and FALSE otherwise, and WPOINT% which gives either the position of the word in the byte array or the position that it should occupy if it is not in the array.
The final procedure is PROClist_words and this uses WPOINT% to record the address of each word in the byte array. IF WPOINT% is the address of the start of the first word then PRINT $WPOINT%. will print the first word and WPOINT%=WPOINT%+LEN($WPOINT%)+1 is the address of the second word. The only procedure that PROClist_words calls is PROCprinter which asks the user if the words should be listed on the printer or not.
The procedures to be defined at the third stage of refinement are PROCfind_word, PROCmake_space and PROCprinter. The functions yet to be defined are FNread_cap(F) and FNseparator. Of these PROCfind_word and PROCmake_space are the most difficult.
5000 DEF PROCfind_word(WORD$)
5020 FOUND%=FALSE
5030 WPOINT%=WORD%
5040 IF WPOINT%=CURRENT_END% THEN ENDPROC
5050 IF $WPOINT%>WORD$ THEN ENDPROC
5060 IF $WPOINT%=WORD$ THEN
FOUND%=TRUE:ENDPROC
5070 WPOINT%=WPOINT%+LEN($WPOINT%)+1
5080 GOTO 5040
6000 DEF PROCmake_space(FROM%,AMOUNT%)
6005 LOCAL I
6010 IF CURRENT_END%=FROM% THEN
GOTO 6050
6020 FOR I=CURRENT_END%-1 TO FROM% STEP -1
6030 I?AMOUNT%=?I
6040 NEXT I
6050 CURRENT_END%=CURRENT_END%+AMOUNT%
6060 IF ABS(FIN%-CURRENT_END%)<20 THEN
PROCnospace
6070 ENDPROC
6600 DEF PROCprinter
6605 LOCAL A%
6610 PRINT TAB(0,4);
6620 INPUT "Do you want to use
the printer (Y/N)",A$
6630 IF A$<>"Y" AND A$<>"N" THEN
GOTO 6620
6640 IF A$="Y" THEN VDU 2 ELSE VDU 3
6650 ENDPROC
9000 DEF FNseparator(C%)
9010 IF C%=ASC(" ") THEN =TRUE
9020 IF C%=ASC(",") THEN =TRUE
9030 IF C%=ASC(".") THEN =TRUE
9040 IF C%=ASC(":") THEN =TRUE
9050 IF C%=ASC(";") THEN =TRUE
9070 IF C%<32 THEN =TRUE
9080 IF C%>127 THEN =TRUE
9090 =FALSE
9100 DEF FNread_cap(F)
9110 LOCAL C%
9120 C%=BGET#F
9125 IF EOF#F THEN CLOSE#F:FILE_END=TRUE
9140 IF C%>96 THEN C%=C%-32
9150 =C%
PROCfind_word simply searches through the list of words until it reaches the end of the list (line 5040), finds the word (line 5060) or reaches the point in the list where the word should be stored (line 5050). The body of the procedure is in the form of a loop with multiple exit points (three to be exact) all grouped at the start of the loop. In fact, the first two exit points could be combined into one, that is:
5040 IF WPOINTf%=CURRENT_END% OR
$WPOINT%>WORD$ THEN ENDPROC
but the form used in the program is just as clear.
PROCmake_space is simple in principle. All it has to do is to use a FOR loop to move all of the elements of the array from FROM% to CURRENT_END% up by AMOUNT% bytes. However, in practice it is quite difficult to get all the details right. If you are writing procedures that perform complicated movements of data then it is a good idea to go through the steps involved on paper below trying to write any BASIC. It is possible for PROCmake_space to use up all of the byte array. This condition is checked for in fine 6060 but the real question is what to do when the program runs out of space, In the tradition of stepwise refinement, the answer to this difficult question is put off until the next stage of refinement by calling a procedure PROCnospace!
The functions FNseparator and FNread_cap are not difficult to implement but they both contain important points of detail. FNseparator simply checks C% against each possible type of separator character. The separators in lines 9010 to 9050 are obvious but some text processors embed control codes in the text in place of blanks and so any code lower than 32 also has to be considered a legal separator (9070). The same holds for codes above 127 but it is far less common for these to occur. Notice that the advantage of using a function to determine whether or not a character is a separator is that it is very easy to change or add to the set of characters that are considered separators and without having to delve into the inner areas of the program! For example, after you have run the program on a few samples of text you will soon discover that there are two omissions. The characters '(' and ')' have to be considered valid word separators but it is left as an exercise for the reader to add them to FNseparator. FNread_cap reads in a character code from the file and then immediately tests to see if the end of file has been reached. [fit has, then it sets the FILE_END flag so that the rest of the program knows that there is no more data and actually closes the file. Line 9140 converts all lower-case characters to upper-case. This function could protect the rest of the program from trying to read data from a closed file by including the line:
9115 IF FILE_END THEN =0
but it is probably better to know that the program is trying to read from a file that it shouldn't and deal with the problem higher up in the structure.
The fourth stage in refinement is now just one procedure, PROCnospace. It is often said that what a program should do when everything is working is easy it's when errors occur that things get tough! The temptation is to write a procedure that gives an error message "NO ROOM" and stops the program. This is quick and easy for the programmer but the user would feel that the whole experience was a waste of time. A reasonable compromise is to inform the user of the problem and then proceed to process as many words as can be held in memory. This is what PROCnospace does. It displays a message for a short time, closes the file and sets FILE_END to TRUE so that the rest of the program will think that there is no more text to process.
6700 DEF PROCnospace
6710 PRINT TAB(0,10);"Not enough
memory!!"
6720 PRINT "Closing file - but will
process"
6730 PRINT "words already read"
6740 FILE_END=TRUE
6750 CLOSE#F
6760 TIME=0
6770 REPEAT:UNTIL TIME>500
6780 ENDPROC
At the end of the last section the program was developed to the point where it would read a text file, construct a sorted list of words in the byte array starting at WORD% and then print the results either on the screen or additionally on the printer. After testing this version of the program on a number of texts you shouldn't be able to find any bugs (if you do then fix them!) although you should be able to find ways of crashing the program by specifying non-existent filenames, etc. The next stage in producing the program is to move on from this working, but extremely limited, version of the program by the addition of procedures to handle the dictionary file. Once again before committing anything to BASIC there are a few design decisions to be made. The main consideration is the form of the dictionary tile. Up to this point the program has been independent of the file system in use i.e. the program will work with either tape or disk. The problem with the dictionary becomes apparent as soon as you consider how it should be updated. If the dictionary file is to be kept in order then adding new words would mean some very complicated random access file handling. In fact it would be better to avoid the random access file handling by simply rewriting the entire dictionary file, adding the new words in their correct positions. Even this simpler method of adding words to the dictionary file requires two files to be open at the same time one for reading and one for writing and this is something that the tape system cannot usually handle.
As an alternative it seems worthwhile to drop the requirement that the dictionary file should be sorted into order. The advantage is simply that new words could be added on to the end of the file i.e. they could be appended. The disadvantage is that the search time is greater. However, this can be minimised by a number of changes to the search algorithm. Firstly the list of words stored in memory is smaller than the dictionary so it makes sense to read a word from the dictionary and search for it in the list nt words rather than vice versa. It also seems reasonable to physically remove each word that has been identified from the word list so that subsequent searches are on a smaller list. Neither of these two improvements produce a program that is as fast as one that uses a sorted dictionary but the advantages in terms of simplicity are great.
It would be possible to work out a scheme whereby new words could be added to the end of an existing cassette file but this would be fairly complicated. The cassette system not only doesn't support random access files, it doesn't allow existing files to be extended. The simplest method of adding data to a cassette file is to write an additional file following it. That is the dictionary data would be stored on tape stored on tape not in one file but in as many files as needed named DICT1, DlCT2 and so on. You would have to introduce your special 'end of all files' marker so that you would know when you had reached blank tape but apart from this there are no real problems. However, this scheme entails a complication unnecessary to the essence of the spelling checker and the remainder of the program will assume that the disk filing system is in use. Changing it to work on the tape system is left as an exercise for the reader.
The first stage in adding the dictionary handling to the spelling checker involves changes to the main program:
10 MODE 7
20 SIZE%=5000
30 DIM WORD% SIZE%
40 FIN%=WORD%+SIZE%
50 FILE_END=FALSE
60 PROCinitialise
70 PROCopen_file
75 REPEAT
76 PROCstart
80 PROCread_word
85 PROCtime(1)
90 IF WORD$<>"" THEN PROCadd_word(WORD$)
100 UNTIL FILE_END
110 PROCreport
120 PROClookup
130 PROClist_words
140 PROCadd_dict
150 END
PROCreport informs the user that reading the text file has finished and that the dictionary lookup is about to begin. PROClookup, as its name suggests, performs the actual look up operation and deletes any words that have been found from the word list. PROClist words is the procedure developed for the first version of the program that prints the list of words on the screen and additionally on the printer. Now that it follows PROClookup it gives the user a list of only those words that are not in the dictionary and are therefore possible misspellings. Finally PROCadd_dict performs the task of adding words to the dictionary.
In the second stage of refinement only PROClookup and PROCadd_dict are at all complicated:
7000 DEF PROCreport
7010 CLS
7020 PRINT TAB(0,5);"Text file has been
read."
7030 PRINT "A total of -";WORDCNT%;
" words"
7040 PRINT
7050 PRINT "Ready to check spelling
against"
7060 PRINT "dictionary file."
7070 ENDPROC
7100 DEF PROClookup
7110 F=OPENUP("DICT")
7115 FILE_END=FALSE
7120 REPEAT
7130 INPUT#F,WORD$
7135 IF WORD$="" THEN FILE_END=TRUE
7140 PROCfind_word(WORD$)
7150 IF FOUND% THEN
PROCremove(WPOINT%,LEN($WPOINT%)+1)
7160 UNTIL FILE_END
7180 PTR#F=PTR#F-2
7190 ENDPROC
8000 DEF PROCadd_dict
8005 LOCAL A$
8010 PRINT "Do you want to add correct"
8020 PRINT "words to dictionary (Y/N) ";
8030 INPUT A$
8040 IF A$<>"Y" AND A$<>"N" THEN
GOTO 8010
8050 IF A$="N" THEN CLOSE#F:ENDPROC
8060 WPOINT%=WORD%
8070 IF WPOINT%=CURRENT_END% THEN
PRINT#F,"":CLOSE#F:ENDPROC
8080 WORD$=$WPOINT%
8090 PRINT WORD$
8100 INPUT "A(dd) or I(gnore)",A$
8110 IF A$<>"A" AND A$<>"I" THEN
GOTO 8100
8120 IF A$="A" THEN PRINT #F,WORD$
8130 WPOINT%=WPOINT%+LEN($WPOINT%)+1
8140 GOTO 8070
PROClookup assumes that the dictionary file is always called DICT, opens it and proceeds to read words from it. Notice that the DICT file is read using INPUT and written using PRINT so that the BBC Micro's filing system looks after the internal format of the file. The actual search of the word list is performed by PROCfind_word which was developed for the first version of the program and so alrendy exists. The only new procedure needed is PROCremove which is the logical opposite of PROCadd_word in that it removes a word by shifting all of the entries in the word list down to close up the space that the word occupied.
PROCadd_dict is a very simple routine that asks the user if each word in the word list should be added to the dictionary. Notice that there is no need to open the dictionary file because PROC1ookup leaves the file open and positioned at the end of file marker ready for new words to be added.
The final stage of refinement involves writing PROCremove which is closely modelled on PROCadd_word.
7500 DEF PROCremove(FROM%,AMOUNT%)
7510 LOCAL I
7520 FOR I=FROM% TO CURRENT_END%-AMOUNT%
7530 ?I=I?AMOUNT%
7540 NEXT I
7550 CURRENT_END%=CURRENT_END%-AMOUNT%
7560 ENDPROC
This completes this version of the program. All that is necessary to 'start the ball rolling' is the creation of first dictionary file using:
10 *SAVE DICT 0000 8000
20 F=OPENUP("DICT")
30 PRINT#F,""
40 CLOSE#F
This results in a single entry, the word "A", being stored in the dictionary lie DICT. Following this the program can be run and the dictionary extended by the addition of correctly spelt words from a range of text files. in this way the dictionary is slowly built up rather than created by transferring a traditional paper dictionary to disk.
At this stage in program development it is always worth evaluating the program so far. In use the program shows no obvious problems apart from its tendency to crash if the correct files are not present on disk. This clearly nidicates the need for the addition of some error handling. The internal structure of the program is quite good although PROCadd_dict is a little on the large size for a single procedure and might better have been broken down into two smaller procedures. It would also have been better if PROClookup had used separate procedures to open and read the dictionary file this would make any future modifications to the file handling easier. However these criticisms are not so serious that any immediate action needs to be taken although you might want to disagree!
Perhaps the most obvious problem with this program is that it is slow. And it gets increasingly slow as words are added to the word list. There are two possible answers to this problem:
(1) Improve the search method.
(2) Use machine code.
It is quite obvious that a great saving in time could be made by replacing PROCfind_word with a procedure that uses a binary search (see The Complete Programmer). However, as the spelling checker is being used here as an example we will go straight to the machine code option as a way of illustrating how BASIC and machine code can be used together.
The main thing to remember about using assembly language is that it is better not to use it at all! Given that the bulk of any program is only executed a few times it is possible to implement it using BASIC without any serious problems of inefficiency. However, any section of a program that is executed repeatedly can very easily become a problem. Even a section of code that only takes a fraction of a second to execute in BASIC may add up to a considerable delay if it is carried out thousands of times. Obviously the best thing to do is to identify any such 'critical' pieces of program and replace them by assembly language versions. In this way BASIC and assembler can be used to produce efficient programs with the minimum of effort.
In the case of the spelling checker it is obvious that searching the word list and adding new words to it are the two most likely candidates for assembly language routines. However, for the sake of example we will suppose that this is not quite so obvious and try to identify where the inefficiencies lie. To do this we need to know how much of the program's running time is taken by each procedure. The following three procedures can be used to record the cumulative time that any procedure is used for:
10000 DEF PROCstart
10010 TIME=0
10020 ENDPROC
10100 DEF PROCtime(I)
10110 T(I)=T(I)+TIME
10130 TIME=0
10140 ENDPROC
10200 DEF PROCtresult(N)
10210 LOCAL I
10220 FOR I=1 TO N
10230 PRINT I,T(I)
10240 NEXT I
10250 IF INKEY$(0)="" THEN GOTO 10250
10260 ENDPROC
PROCstart zeros the clock. PROCtime(I) adds the time since it was last called or from the last call to PROCstart to (I). PROCtresult(N) prints the cumulative times stored in T(I) to T(N). To use these procedures to find out how long it takes to read all the words in a file, search for them in the word list and finally add them to the list all we have to do is add the lines:
1 DIM T(10)
76 PROCstart
85 PROCtime(1)
105 PROCtresult(3)
4025 PROCtime(2)
4035 PROCstart
4045 PROCtime(3)
Lines 76 and 85 record the time spent in PROCread_word, line 4025 records the time spent in PROCfind word and lines 4035 and 4045 record the time spent in PROCmake_space (on common sense grounds these are likely to be the most time-consuming procedures).
Running the spelling checker, a 150 (approx.) word text gave the following results:
PROCread_word 18.07 seconds
PROCfind_word 28.29 seconds
PROCmake_space 43.36 seconds
Clearly PROCmake_space is the first procedure that should be replaced by assembly language!
Using assembly language within a BBC BASIC program is easy but it is still worth minimising the total amount of assembly language used. An examination of PROCmake_space suggests that the most time-consuming section is the FOR loop that actually does the move of the words in the list. This suggests that we need an assembly language routine that will move all of the characters in the list from FROM% to CURRENT_END% up by AMOUNT% bytes. It is clear that we are going to have to pass FROM%, CURRENT_END% and AMOUNT% to the routine as parameters and this implies the use of the CALL instruction. That is, the new version of PROCmake_space using an assembly language routine MOVE% to implement the move is:
6000 DEF PROCmake_space(FROM%,AMOUNT%)
6010 IF CURRENT_END%=FROM% THEN
GOTO 6050
6020 CALL MVE%,FROM%,AMOUNT%,CURRENT_END%
6050 CURRENT_END%=CURRENT_END%+AMOUNT%
6060 IF ABS(FIN%-CURRENT_END%)<20 THEN
PROCnospace
6070 ENDPROC
Notice that only the FOR loop (lines 6020 to 6040 in the old version) has been replaced and BASIC is still used to check for a word added to the end of the byte array (line 6010). updating CURRENT_END% (line 6050) and checking that there is enough space to add more words (line 6060). These operations would all have taken a great deal of assembly language to implement and the resulting gain in speed would be negligible.
All that now remains is to write the assembly language that constitutes MVE%!
The 6502 has a reputation for being a fast and efficient microprocessor but it lacks any registers that can hold a full 16-bit number or address. This is a problem for this particular application in that it involves the manipulation oi 16-bit addresses. The solution lies in the use of 'indirect' addressing. For example,
LDY #0
LDA (POlNT%),Y
will load the A register from the memory location whose address is stored in the pair of locations POINT% and POINT%+1. In this sense POINT% and POINT%+1 act like a 16-bit index register i.e. they hold the full 16- bit address of a memory location that is involved in an instruction. The only complication is that this sort of indirect addressing only works with memory locations in page zero, that is &00 to &FF. Page zero memory locations are so important and in such short supply that most of them are already used by BBC BASIC and the MOS, but & 70 to &8F are set aside for use by application programs.
If POINT% is the address of the lowest pair of memory locations in page zero, then the basis of the main algorithm of the MVE% routine can be written:
LDY #0
LDA (POINT%),Y
LDY AMNT%
STA (POINT%),Y
The first two lines load A from the memory location pointed at by POINT% and the final two lines store A in the memory location AMNT% higher up in memory. This should be compared to
POINT%?AMNT%=?POINT%
which performs the same operation in BASIC.
Assembly language programming follows the same stepwise refinement principle used for BASIC and so the first stage of MVE% is:
8610 .MVE% JSR PARM_GET%
8620 JSR START_LOOP%
8630 .MLOOP% LDY #0
8640 LDA (POINT%),Y
8650 LDY AMNT%
8660 STA (POINT%),Y
8665 JSR TST_END%
8666 BEQ MEXIT%
8670 DEC POINT%
8675 LDA POINT%
8676 CMP #&FF
8680 BNE MLOOP%
8690 DEC POINT%+1
8700 JMP MLOOP%
8710 .MEXIT% RTS
Subroutine PARM_GET% moves the parameters passed to MVE% from the parameter block at &600 into page zero so that they can be used as indirect addresses. Notice that the CALL instruction stores the addresses of the parameters, not their values in the parameter block. START_LOOP% initialises POINT% to the address stored in CURRENT_END% and stores the value in AMOUNT% in AMNT%. Lines 8630 form the assembly language equivalent of the FOR loop in the original version of the procedure. Subroutine TST_END% compares POINT% to the address in FROM% and returns with the Z flag set accordingly. Finally lines 8670 to 8690 perform a 16-bit decrement of the value stored in POINT% notice that the 6502 has no 16-bit operations.
The next stage of refinement gives:
8740 .PARM_GET% LDX #0
8750 LDY #0
8760 .PLOOP% LDA &601,Y
8770 STA &70,X
8780 INX
8790 INY
8800 LDA &601,Y
8810 STA &70,X
8820 INX
8830 INY
8840 INY
8850 DEC &600
8860 BNE PLOOP%
8870 RTS
8880 \
8890 .START_LOOP% LDY #0
8900 LDA (CRNT_END%),Y
8910 STA POINT%,Y
8915 LDA (AMNT%),Y
8916 STA AMNT%
8920 INY
8930 LDA (CRNT_END%),Y
8940 STA POINT%,Y
8950 RTS
8955 \
8960 .TST_END% LDA POINT%
8965 LDY #0
8967 CMP (FRM%),Y
8969 BNE TEXIT%
8970 INY
8972 LDA POINT%+1
8974 CMP (FRM%),Y
8976 .TEXIT% RTS
PARM_GET simply transfers the parameter addresses from the parameter block at &600 to zero page locations starting at &70. To understand this subroutine all you have to know is that &600 contains the number of parameters and each parameter is in the form of a two byte (16-bit) address and a single byte giving the type of the parameter. As all the parameters used in MVE% are integer variables, the type bytes are ignored and at the end of PARM_GET%, &70 and &71 hold the address of FROM%, &72 and &73 hold the address of AMOUNT% and &74 and &75 hold the address of CURRENT_END%. That is after PARM_GET%.
LDY #0
LDA (&70), Y
would load A with the first (least significant) byte of FROM%. To avoid confusion the following labels are defined:
8545 FRM%=&70
8546 AMNT%=&72
8547 CRNT_END%=&74
START_LOOP% transfers the value stored in CURRENT_END% into POINT% and POINT%+1 and stores the first byte of AMOUNT% in AMNT%. Notice that following START_LOOP% AMNT% no longer contains the address of AMOUNT%, it contains the first byte of its value. (It is assumed that no word is longer than 256 characters!) Finally TST_END% is the assembly language equivalent of a predicate function. Instead of a truth value TST_END% returns with the Z condition code set according to whether the l6-bit value in POINT% is equal to the 16-bit value stored in FROM%. Notice that this is done in two stages; first the least significant bytes are compared (line 8967 and 8969) and then the most significant bytes are compared (lines 8972 and 8974). If either pair of bytes is different then the subroutine returns with the Z condition code clear. Whenever you have a complex condition to test for in assembler use a predicate subroutine like TST_END%.
After including all of the assembly language and the necessary BASIC to assemble it, the complete program is:
1 DIM T(10)
10 MODE 7
15 DIM CODE% 200
20 SIZE%=5000
30 DIM WORD% SIZE%
40 FIN%=WORD%+SIZE%
50 FILE_END=FALSE
60 PROCinitialise
70 PROCopen_file
75 REPEAT
76 PROCstart
80 PROCread_word
85 PROCtime(1)
90 IF WORD$<>"" THEN PROCadd_word
100 UNTIL FILE_END
105 PROCtresult(3)
110 PROCreport
120 PROClookup
130 PROClist_words
140 PROCadd_dict
150 END
1000 DEF PROCinitialise
1010 WORDCNT%=0
1020 CURRENT_END%=WORD%
1030 PROCasm(CODE%)
1040 ENDPROC
2000 DEF PROCopen_file
2010 CLS
2020 PRINT TAB(1,5);
2030 INPUT "File name of text to be"'
" proof read ",F$
2040 IF LEN(F$)>10 THEN
PRINT"Name too long":GOTO 2020
2050 PRINT
2060 F=OPENIN(F$)
2070 ENDPROC
3000 DEF PROCread_word
3010 LOCAL CHAR%
3020 WORD$=""
3030 REPEAT
3040 CHAR%=FNread_cap(F)
3050 UNTIL NOT(FNseparator(CHAR%))
OR FILE_END
3060 WORD$=WORD$+CHR$(CHAR%)
3065 IF FILE_END THEN GOTO 3110
3070 CHAR%=FNread_cap(F)
3080 IF FNseparator(CHAR%) OR
FILE_END THEN GOTO 3110
3090 WORD$=WORD$+CHR$(CHAR%)
3100 GOTO 3065
3110 IF WORD$<>"" THEN
WORDCNT%=WORDCNT%+1
3120 PRINT TAB(0,8);
"Number of words =";WORDCNT%
3130 ENDPROC
4000 DEF PROCadd_word
4020 PROCfind_word
4025 PROCtime(2)
4030 IF FOUND% THEN ENDPROC
4035 PROCstart
4040 PROCmake_space(WPOINT%,LEN(WORD$)+1)
4045 PROCtime(3)
4050 $WPOINT%=WORD$
4060 ENDPROC
5000 DEF PROCfind_word
5020 FOUND%=FALSE
5030 WPOINT%=WORD%
5040 IF WPOINT%=CURRENT_END% THEN ENDPROC
5050 IF $WPOINT%>WORD$ THEN ENDPROC
5060 IF $WPOINT%=WORD$ THEN FOUND%=TRUE
:ENDPROC
5070 WPOINT%=WPOINT%+LEN($WPOINT%)+1
5080 GOTO 5040
6500 DEF PROClist_words
6505 CLS:VDU 14
6506 PROCprinter
6510 WPOINT%=WORD%
6520 IF WPOINT%=CURRENT_END% THEN
VDU 15,3:ENDPROC
6530 PRINT $WPOINT%
6540 WPOINT%=WPOINT%+LEN($WPOINT%)+1
6550 GOTO 6520
6600 DEF PROCprinter
6605 LOCAL A%
6610 PRINT TAB(0,4);
6620 INPUT "Do you want to use
the printer (Y/N)",A$
6630 IF A$<>"Y" AND A$<>"N" THEN
GOTO 6620
6640 IF A$="Y" THEN VDU 2 ELSE VDU 3
6650 ENDPROC
6700 DEF PROCnospace
6710 PRINT TAB(0,10);"Not enough
memory!!"
6720 PRINT "Closing file - but will
process"
6730 PRINT "words already read"
6740 FILE_END=TRUE
6750 CLOSE#F
6760 TIME=0
6770 REPEAT:UNTIL TIME>500
6780 ENDPROC
7000 DEF PROCreport
7010 CLS
7020 PRINT TAB(0,5);"Text file has been
read."
7030 PRINT "A total of -";WORDCNT%;
" words"
7040 PRINT
7050 PRINT "Ready to check spelling
against"
7060 PRINT "dictionary file."
7070 ENDPROC
7100 DEF PROClookup
7110 F=OPENUP("DICT")
7115 FILE_END=FALSE
7120 REPEAT
7130 INPUT#F,WORD$
7135 IF WORD$="" THEN FILE_END=TRUE
7140 PROCfind_word
7150 IF FOUND% THEN
PROCremove(WPOINT%,LEN($WPOINT%)+1)
7160 UNTIL FILE_END
7180 PTR#F=PTR#F-2
7190 ENDPROC
7500 DEF PROCremove(FROM%,AMOUNT%)
7510 LOCAL I
7520 FOR I=FROM% TO CURRENT_END%-AMOUNT%
7530 ?I=I?AMOUNT%
7540 NEXT I
7550 CURRENT_END%=CURRENT_END%-AMOUNT%
7560 ENDPROC
8000 DEF PROCadd_dict
8005 LOCAL A$
8010 PRINT "Do you want to add correct"
8020 PRINT "words to dictionary (Y/N) ";
8030 INPUT A$
8040 IF A$<>"Y" AND A$<>"N" THEN
GOTO 8010
8050 IF A$="N" THEN CLOSE#F:ENDPROC
8060 WPOINT%=WORD%
8070 IF WPOINT%=CURRENT_END% THEN
PRINT#F,"":CLOSE#F:ENDPROC
8080 WORD$=$WPOINT%
8090 PRINT WORD$
8100 INPUT "A(dd) or I(gnore)",A$
8110 IF A$<>"A" AND A$<>"I" THEN
GOTO 8100
8120 IF A$="A" THEN PRINT #F,WORD$
8130 WPOINT%=WPOINT%+LEN($WPOINT%)+1
8140 GOTO 8070
8500 DEF PROCasm(START%)
8510 LOCAL PASS
8520 FOR PASS=0 TO 3 STEP 3
8530 P%=START%
8540 REM CALL MVE%,FRM%,AMNT%,CRNT_END%
8545 FRM%=&70
8546 AMNT%=&72
8547 CRNT_END%=&74
8548 POINT%=&76
8600 [OPT PASS
8610 .MVE% JSR PARM_GET%
8620 JSR START_LOOP%
8630 .MLOOP% LDY #0
8640 LDA (POINT%),Y
8650 LDY AMNT%
8660 STA (POINT%),Y
8665 JSR TST_END%
8666 BEQ MEXIT%
8670 DEC POINT%
8675 LDA POINT%
8676 CMP #&FF
8680 BNE MLOOP%
8690 DEC POINT%+1
8700 JMP MLOOP%
8710 .MEXIT% RTS
8720 \
8740 .PARM_GET% LDX #0
8750 LDY #0
8760 .PLOOP% LDA &601,Y
8770 STA &70,X
8780 INX
8790 INY
8800 LDA &601,Y
8810 STA &70,X
8820 INX
8830 INY
8840 INY
8850 DEC &600
8860 BNE PLOOP%
8870 RTS
8880 \
8890 .START_LOOP% LDY #0
8900 LDA (CRNT_END%),Y
8910 STA POINT%,Y
8915 LDA (AMNT%),Y
8916 STA AMNT%
8920 INY
8930 LDA (CRNT_END%),Y
8940 STA POINT%,Y
8950 RTS
8955 \
8960 .TST_END% LDA POINT%
8965 LDY #0
8967 CMP (FRM%),Y
8969 BNE TEXIT%
8970 INY
8972 LDA POINT%+1
8974 CMP (FRM%),Y
8976 .TEXIT% RTS
8990 ]
8995 NEXT PASS
8999 ENDPROC
9000 DEF FNseparator(C%)
9010 IF C%=ASC(" ") THEN =TRUE
9020 IF C%=ASC(",") THEN =TRUE
9030 IF C%=ASC(".") THEN =TRUE
9040 IF C%=ASC(":") THEN =TRUE
9050 IF C%=ASC(";") THEN =TRUE
9070 IF C%<32 THEN =TRUE
9080 IF C%>127 THEN =TRUE
9090 =FALSE
9100 DEF FNread_cap(F)
9110 LOCAL C%
9120 C%=BGET#F
9125 IF EOF#F THEN CLOSE#F:FILE_END=TRUE
9140 IF C%>96 THEN C%=C%-32
9150 =C%
10000 DEF PROCstart
10010 TIME=0
10020 ENDPROC
10100 DEF PROCtime(I)
10110 T(I)=T(I)+TIME
10130 TIME=0
10140 ENDPROC
10200 DEF PROCtresult(N)
10210 LOCAL I
10220 FOR I=1 TO N
10230 PRINT I,T(I)
10240 NEXT I
10250 IF INKEY$(0)="" THEN GOTO 10250
10260 ENDPROC
When this version of the program is run on the same text file of 150 (approx.) words the timing results are:
PROCread_word 18.65
PROCfind word 28.55
PROCmake_space 2.49
which for PROCmake_space is an improvement of more than a factor of twenty! This excellent result should convince you to change PROCfmd_word to an assembly language routine.
The spelling checker has proved to be a very large program indeed and a completely finished version is well beyond the scope of this book. You should, however, be able to extend and modify it by the continued use of modular and structured programming. In doing so you should be able to prove for yourself how much easier it is to program using an organised method. The second point that this example illustrates is how BASIC and assembler can be used together to make programs efficient. There are very few occasions where a program has to be written in nothing but assembler and BASIC is always easier!