Section three: BASIC program storage


The format in which BASIC programs are stored is as follows:

PAGE            &D -- 'return'
PAGE + 1        MSB of line number
PAGE + 2        LSB of line number
PAGE + 3        Length of line
.......         Text of line
PAGE + N        &D -- 'return'
PAGE + N + 1    MSB of line number
PAGE + N + 2    LSB of line number
PAGE + N + 3    Length of line
PAGE + N + 4    Start of text of next line
etc. . .

Each line of text is preceded by the sequence 'return'/line number/length of line. The line number is held high byte first, and the length byte counts the whole line, including these four bytes. The end of the program is indicated by a line number whose first byte is &FF.
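
As a rough sketch of how this layout can be walked, the following Python fragment steps through a copy of a program image from line to line. It is not from the original text; Python is used only so that a saved program can be examined on another machine. It assumes the byte order shown by the memory dump later in this section (high byte of the line number first) and that the length byte counts from the &D to the end of the line. The name `walk_lines` is hypothetical.

```python
def walk_lines(image):
    """Yield (line_number, text_bytes) for each line in a program image."""
    pos = 0
    while pos + 1 < len(image):
        if image[pos] != 0x0D:
            raise ValueError("expected &0D at offset %d" % pos)
        if image[pos + 1] == 0xFF:       # end-of-program marker
            return
        if pos + 3 >= len(image):
            raise ValueError("truncated line header at offset %d" % pos)
        number = image[pos + 1] * 256 + image[pos + 2]   # high byte first
        length = image[pos + 3]          # counts from the &0D onwards
        if length < 4:
            raise ValueError("bad length byte at offset %d" % pos)
        yield number, image[pos + 4:pos + length]
        pos += length


# The line "10 GOTO 12345" followed by the end marker.
image = bytes([0x0D, 0x00, 0x0A, 0x0B, 0x20, 0xE5, 0x20, 0x8D, 0x54, 0x79, 0x70,
               0x0D, 0xFF])
for number, text in walk_lines(image):
    print(number, text.hex())            # prints: 10 20e5208d547970
```

Because each header carries the length, the next line is found by a single addition, without scanning the text for the next 'return'.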

The text of the lines is stored in normal ASCII codes, except for a few special cases:

--   All keywords are stored as tokens. These are single byte abbreviations.

--   The line numbers in GOSUB/GOTO/RESTORE/ON . . . GOTO/ON . . . GOSUB are stored in a special binary format.

The tokens used are listed in the User Guide. A point to watch is that certain keywords are not totally tokenised. For example, TOP is tokenised as the keyword 'TO', as in FOR, followed by the ASCII letter 'P'.

The format used following a GOTO or GOSUB is particularly involved. The line number is replaced by a byte 141, followed by three bytes of code:

Bits --   7   6   5       4       3       2         1      0
Byte 1    0   1   128s    /64s    0       /16384s   0      0
Byte 2    0   1   32s     16s     8s      4s        2s     1s
Byte 3    0   1   8192s   4096s   2048s   1024s     512s   256s

to represent the line number.

Those bits marked with a stroke (/) before their values are one if the line number does not contain the value, and zero if it does. All the other value bits are one if the line number does contain the value. The format is thus basically binary, except that the order of the bits has been altered.
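
Read as a bit layout, the table suggests a decoder along the following lines. This is a minimal Python sketch, not the routine used by BASIC itself. It takes the 64s and 16384s bits of byte 1 as the inverted ones, an assumption checked here only against the GOTO 12345 example worked through in this section, and `decode_line_ref` is a hypothetical name.

```python
def decode_line_ref(b1, b2, b3):
    """Recover a line number from the three bytes that follow byte 141."""
    # XOR with &54 clears the fixed '01' prefix of byte 1 and flips the
    # two inverted bits back to their true values.
    top = b1 ^ 0x54
    lo = (b2 & 0x3F) | ((top & 0x30) << 2)   # 1s..32s, then 64s and 128s
    hi = (b3 & 0x3F) | ((top & 0x0C) << 4)   # 256s..8192s, then 16384s
    return hi * 256 + lo


print(decode_line_ref(0x54, 0x79, 0x70))     # prints: 12345
```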

As an example, the line GOTO 12345 will be 'hand tokenised': the code of GOTO is &E5, so this will be the first byte of the line.
A space follows, so the next byte is &20.
Then we get the code 141, or &8D (oddly enough, the double height code in teletext graphics).

The number 12345 in binary is "0011000000111001".

This can be better expressed as:

1 unit
0 twos
0 fours
1 eight
1 sixteen
1 thirty-two
0 sixty-fours
0 one-hundred-and-twenty-eights
0 two-hundred-and-fifty-sixes
0 five-hundred-and-twelves
0 one-thousand-and-twenty-fours
0 two-thousand-and-forty-eights
1 four-thousand-and-ninety-six
1 eight-thousand-and-one-hundred-and-ninety-two
0 sixteen-thousand-three-hundred-and-eighty-fours

Thus the binary format for the next three bytes is as follows:

Byte 1    01010100
Byte 2    01111001
Byte 3    01110000

In hexadecimal, this is:

Byte 1    &54
Byte 2    &79
Byte 3    &70
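
The hand tokenisation above can also be checked off the machine with a short Python sketch. The function name `encode_line_ref` is hypothetical, and the bit arithmetic is simply one way of expressing the layout described earlier. It is verified here only against this one example.

```python
def encode_line_ref(number):
    """Return byte 141 plus the three coded bytes for a line number."""
    if not 0 <= number <= 32767:
        raise ValueError("line number out of range")
    lo = number & 0xFF
    hi = number >> 8
    # Top two bits of each half go into byte 1; XOR with &54 sets the
    # fixed '01' prefix and inverts the 64s and 16384s bits.
    b1 = (((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)) ^ 0x54
    b2 = (lo & 0x3F) | 0x40
    b3 = (hi & 0x3F) | 0x40
    return bytes([0x8D, b1, b2, b3])


print(encode_line_ref(12345).hex())      # prints: 8d547970
```

Note that all three coded bytes always fall in the range &40 to &7F, so none of them can be mistaken for a token or for the 'return' byte.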

To check this, try this program. I have included a printout of a sample run to reassure you!

   10 GOTO 12345
12345 FOR T%=PAGE TO PAGE+20
12346 PRINT ~T%,~?T%
12347 NEXT T%
 RUN
       E00         D
       E01         0
       E02         A
       E03         B
       E04        20
       E05        E5
       E06        20
       E07        8D
       E08        54
       E09        79
       E0A        70
       E0B         D
       E0C        30
       E0D        39
       E0E        12
       E0F        20
       E10        E3
       E11        20
       E12        54
       E13        25
       E14        3D


The bytes I have described start at &E05.

The idea of using this peculiar code is to increase the speed of various operations concerning statements like GOTO/GOSUB/RESTORE/ON. . .GOTO. The most obvious advantage of this approach is that GOTO 1 occupies the same space as GOTO 32767. Thus, the command RENUMBER need only alter these three bytes, and the two bytes containing each line number to renumber the whole program. Actually, it renumbers the lines, and then looks for any byte 141s. When it finds one, the three bytes following it are renumbered. On other computers, the whole program text may need to be moved about, to accommodate the differing lengths of program lines as the GOTO and GOSUB destinations are altered.
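
The in-place nature of this patching can be sketched in Python as follows. This only illustrates the idea described above and is not the routine in the BASIC ROM. The helper names and the `mapping` dictionary of old to new line numbers are hypothetical, and the sketch naively treats every byte 141 in the text of a line as a marker.

```python
def encode_ref(number):
    """Three coded bytes for a line number (without the leading 141)."""
    lo, hi = number & 0xFF, number >> 8
    return bytes([(((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)) ^ 0x54,
                  (lo & 0x3F) | 0x40,
                  (hi & 0x3F) | 0x40])


def decode_ref(b1, b2, b3):
    """Line number held in the three coded bytes."""
    top = b1 ^ 0x54
    lo = (b2 & 0x3F) | ((top & 0x30) << 2)
    hi = (b3 & 0x3F) | ((top & 0x0C) << 4)
    return hi * 256 + lo


def patch_refs(text, mapping):
    """Rewrite every coded line reference in one line's text, in place."""
    i = 0
    while i + 3 < len(text):
        if text[i] == 0x8D:
            old = decode_ref(text[i + 1], text[i + 2], text[i + 3])
            text[i + 1:i + 4] = encode_ref(mapping.get(old, old))
            i += 4                       # skip the marker and its three bytes
        else:
            i += 1


text = bytearray([0x20, 0xE5, 0x20, 0x8D, 0x54, 0x79, 0x70])   # " GOTO 12345"
patch_refs(text, {12345: 20})
print(len(text), text.hex())
```

The text is the same length before and after, which is the point: nothing else in the program has to move.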

The other advantage occurs when the line is being interpreted -- the computer need not convert a string of ASCII digits into binary before acting on the command -- it has them in a form of binary already.

It should be noted that, of all these statements, the only one of use to a good programmer is RESTORE followed by a line number.

There is a table starting at address &806D in the BASIC ROM which contains all the keywords in ASCII, followed by their tokens. The table ends at address &8358.

The format of the table is: ASCII Characters/token/spare byte and so on. The end of the ASCII characters is gauged by when the next character is greater than 127, since all tokens are &80 or greater. The spare byte is used to show certain things about the keyword, which need not concern us here.
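
The scan just described can be sketched in Python over a copy of the table bytes. The sample bytes below are made up for illustration from three keyword and token pairs that appear in the sample run in this section, with the spare byte simply set to zero, since its real contents are not discussed here. The name `parse_keyword_table` is hypothetical, and the sketch assumes a well-formed table.

```python
def parse_keyword_table(table):
    """Return (keyword, token, spare) triples from a keyword table image."""
    entries = []
    pos = 0
    while pos < len(table):
        start = pos
        while table[pos] < 0x80:         # ASCII characters of the keyword
            pos += 1
        keyword = table[start:pos].decode("ascii")
        token, spare = table[pos], table[pos + 1]
        entries.append((keyword, token, spare))
        pos += 2                         # step over the token and spare byte
    return entries


sample = b"AND\x80\x00" + b"ABS\x94\x00" + b"CHR$\xbd\x00"
for keyword, token, spare in parse_keyword_table(sample):
    print(keyword.ljust(20, "."), "%X" % token, sep="")
```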

The program which follows prints out all legal keywords and their tokens, by accessing the table. I have included a sample run:

   10 VDU 14
   20 T%=&806D:REM &8071 for Basic 2
   30 REPEAT
   40 REPEAT
   50 PRINT CHR$(?T%);
   60 T%=T%+1
   70 UNTIL ?T%>127
   80 PRINT STRING$(20-POS,".");~?T%
   90 T%=T%+2
  100 UNTIL T%>&8358:REM &8366 for Basic 2
  110 VDU 15
 RUN
AND.................80
ABS.................94
ACS.................95
ADVAL...............96
ASC.................97
ASN.................98
ATN.................99
AUTO................C6
BGET................9A
BPUT................D5
COLOUR..............FB
CALL................D6
CHAIN...............D7
CHR$................BD
CLEAR...............D8
CLOSE...............D9
CLG.................DA
CLS.................DB
COS.................9B
COUNT...............9C
DATA................DC
DEG.................9D
DEF.................DD
DELETE..............C7
DIV.................81
DIM.................DE
DRAW................DF
ENDPROC.............E1
END.................E0
ENVELOPE............E2
ELSE................8B
EVAL................A0
ERL.................9E
ERROR...............85
EOF.................C5
EOR.................82
ERR.................9F
EXP.................A1
EXT.................A2
FOR.................E3
FALSE...............A3
FN..................A4
GOTO................E5
GET$................BE
GET.................A5
GOSUB...............E4
GCOL................E6
HIMEM...............93
INPUT...............E8
IF..................E7
INKEY$..............BF
INKEY...............A6
INT.................A8
INSTR(..............A7
LIST................C9
LINE................86
LOAD................C8
LOMEM...............92
LOCAL...............EA
LEFT$(..............C0
LEN.................A9
LET.................E9
LOG.................AB
LN..................AA
MID$(...............C1
MODE................EB
MOD.................83
MOVE................EC
NEXT................ED
NEW.................CA
NOT.................AC
OLD.................CB
ON..................EE
OFF.................87
OR..................84
OPENIN..............8E
OPENOUT.............AE
OPENUP..............AD
OSCLI...............FF
PRINT...............F1
PAGE................90
PTR.................8F
PI..................AF
PLOT................F0
POINT(..............B0
PROC................F2
POS.................B1
RETURN..............F8
REPEAT..............F5
REPORT..............F6
READ................F3
REM.................F4
RUN.................F9
RAD.................B2
RESTORE.............F7
RIGHT$(.............C2
RND.................B3
RENUMBER............CC
STEP................88
SAVE................CD
SGN.................B4
SIN.................B5
SQR.................B6
SPC.................89
STR$................C3
STRING$(............C4
SOUND...............D4
STOP................FA
TAN.................B7
THEN................8C
TO..................B8
TAB(................8A
TRACE...............FC
TIME................91
TRUE................B9
UNTIL...............FD
USR.................BA
VDU.................EF
VAL.................BB
VPOS................BC
WIDTH...............FE
PAGE................D0
PTR.................CF
TIME................D1
LOMEM...............D2
HIMEM...............D3


Notice how only those functions which take two or more arguments include the bracket in the token. This is because functions taking a single argument may have the brackets omitted. At the end of the table, the pseudo variables appear again. Their tokens here are used when the variable appears on the left-hand side of an assignment statement, that is, when it is being given a new value. You can see how this works in the list of keywords in the manual.

On the subject of pseudo variables, here is a list of the locations where TOP, HIMEM, PAGE, and LOMEM can be found:

Name     LSB    MSB
TOP      &12    &13
PAGE            &1D
HIMEM    &6     &7
LOMEM    &0     &1

Knowing these locations should only be useful to the machine language programmer, since BASIC programmers are already provided with the tools to alter and interrogate these locations. If you ever need to alter a BASIC program from within a BASIC program, I would be inclined to add the changes to the keyboard buffer, using programs given in the last section, rather than using the indirection operators. If you do this, remember that BASIC will accept lines of input which are still tokenised.