10.0 Introduction
10.1 The equation of a straight line
10.2 The correlation coefficient
10.3 Using PROCbstln
10.4 Activities
10.5 Setting the accuracy of the equation and the correlation coefficient
10.6 Activities
It is often useful to plot points in such a way that they lie on a straight line. There are several reasons for this. For example, it is easy to make predictions from a straight line: the line can be extended (or extrapolated) beyond the range of the measurements, and in-between values can be inserted (or interpolated).
When the points that you want to plot lie very close to a straight line, it is not difficult to draw the line. The problem comes when the points are scattered widely to either side of the line. Then it is difficult to decide on the most appropriate orientation of the line. In this chapter we help by providing a procedure PROCbstln and showing how to use it.
As the next sections explain, PROCbstln does not only draw the best straight line through a set of points. It also prints up other relevant information.
In practice PROCbstln does more than merely draw the best straight line through any set of points which you provide. It also prints up the equation of that line. This is always in the following form, where m is the slope of the graph and c is the intercept on the Y axis:
Y = mX + c
The slope is positive if the line slopes from bottom left to top right and negative if the line slopes from top left to bottom right. The intercept on the Y axis is positive if the line cuts the Y axis above the X axis and negative if the line cuts the Y axis below the X axis.
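The listing of PROCbstln is not reproduced in this chapter, so the following is only a minimal sketch of how m and c might be found. It is written in Python rather than BBC BASIC, and it assumes that 'best' means the usual least-squares fit. The function name best_straight_line is ours, for illustration only.

```python
def best_straight_line(xs, ys):
    """Return (m, c) for the least-squares line Y = mX + c."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept on the Y axis
    return m, c


# Four points lying exactly on Y = 2X + 1
m, c = best_straight_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"Y = {m:.2f}X + {c:.2f}")   # prints Y = 2.00X + 1.00
```

Once m and c are known, a value of Y can be predicted for any X as m * X + c, whether X lies inside the range of the measurements (interpolation) or beyond it (extrapolation).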
PROCbstln does not only print up the equation of the best straight line. It also indicates how well the points fit a straight line. It expresses this as the 'correlation coefficient'. The magnitude of a correlation coefficient can be between 0 and 1, i.e. it is 1 if the fit is perfect and 0 if the fit is non-existent. The sign of a correlation coefficient indicates which way the line slopes. Bottom left to top right is positive and top left to bottom right is negative.
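As an illustration, here is a minimal Python sketch of a correlation coefficient with the properties just described. It assumes the standard (Pearson) definition, which may or may not be exactly what PROCbstln evaluates. The function name correlation_coefficient is ours.

```python
import math


def correlation_coefficient(xs, ys):
    """Return the Pearson correlation coefficient of the (X, Y) pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("X or Y does not vary, so the coefficient is undefined")
    return sxy / math.sqrt(sxx * syy)


# Perfect fit, sloping from bottom left to top right
print(correlation_coefficient([0, 1, 2, 3], [1, 3, 5, 7]))   # prints 1.0
# Perfect fit, sloping from top left to bottom right
print(correlation_coefficient([0, 1, 2], [4, 2, 0]))         # prints -1.0
```

Points that show no straight-line trend at all, such as (-1, 1), (0, 0) and (1, 1), give a coefficient of 0.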
In order to use PROCbstln, all you have to do is to develop some 'driver' lines and feed in your data. We shall illustrate with some data about how the total fuel consumption of a car varies with total mileage. From the point of view of the owner of the car, this is the sort of relationship that is worth plotting, because it should be more or less a straight line: the greater the mileage, the correspondingly greater the fuel consumed. Any variation from a straight line suggests that the car is not working as efficiently as it might; and the magnitude of the slope gives the average fuel consumption in miles per litre.
Screen Display 10.1 shows a graph of the mileometer reading for a car against the number of litres of petrol put into it. Listing 10.1 gives the program. You see that the data is fed in and stored as DATA statements. Ours was originally taken from a log book: the mileometer reading was recorded each time the car was filled with petrol, as were the amount of petrol put in and the amount left in the tank at the time of filling.
The program of Listing 10.1 is probably self-explanatory. It sets the foreground and background colours for the display and it uses PROCgraph, PROCnamex and PROCnamey, as explained in Chapter 8. It expects the first item in the data to be the number of sets of entries to follow. The arrays X(n) and Y(n) are then dimensioned. Each set of data in the data list corresponds to the figures collected each time the car is filled with petrol, i.e. the mileometer reading, the petrol put in and an estimate of how much petrol was still in the tank at the time. As the petrol placed in the tank at any one filling is for miles still to be travelled, while the mileometer reading is for miles already travelled, the data for the arrays X(n) and Y(n) have to be read out of step, as you can see in line 50. Line 60 then sums the total petrol put in the tank, while line 70 compensates for any petrol still left in the tank. Errors in these measurements do not accumulate and, if necessary, the estimate for the fuel left in the tank when filling up can be kept at zero. This would give more scatter to the points on the graph, but the overall slope should give the same fuel consumption.
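The 'out of step' bookkeeping can be sketched as follows. This is a Python illustration, not a transcription of Listing 10.1: the function name fuel_and_mileage, the use of the first filling as the starting point and the three log-book entries are all our own assumptions, and the figures are invented purely for the example.

```python
def fuel_and_mileage(records):
    """Turn log-book entries into (X, Y) points for plotting.

    Each record is (mileometer reading, litres put in, litres estimated
    to be left in the tank just before filling).
    X is the petrol used since the first filling; Y is the mileometer reading.
    """
    if not records:
        raise ValueError("no log-book entries supplied")
    start_left = records[0][2]   # tank contents before the first filling
    xs, ys = [], []
    total_put_in = 0.0           # petrol put in at EARLIER fillings only
    for miles, put_in, left in records:
        # Petrol used so far = petrol available so far, less what is still left
        xs.append(total_put_in + start_left - left)
        ys.append(miles)
        # Petrol put in now is for miles still to come, so add it afterwards
        total_put_in += put_in
    return xs, ys


# Invented entries, for illustration only
log_book = [(10000, 30, 5), (10300, 28, 7), (10590, 31, 4)]
xs, ys = fuel_and_mileage(log_book)
print(xs)   # prints [0.0, 28.0, 59.0]
print(ys)   # prints [10000, 10300, 10590]
```

Each X value depends only on the running total and on the estimate made at that one filling, which is why an error in one estimate does not carry forward into later points.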
i. Run the program of Listing 10.1 and see if it behaves as you expect.
Screen Display 10.1
ii. If you run a car, try recording your mileage each time you fill up for petrol. Guess the petrol left in the tank at the time and record the petrol put in. You will now be able to keep an accurate track of how your car is performing. In recording this program, you automatically store your previous petrol consumption figures with it. So next time you fill up with petrol, you merely have to load in your existing program and add to the data. Any changes should be immediately obvious from the display.
iii. Modify the data for Listing 10.1 by neglecting the estimated petrol remaining in the tank, i.e. setting it to zero for all the data. How does this affect the equation of the straight line and the correlation coefficient?
When PROCbstln evaluates the equation of the best straight line and the correlation coefficient, it prints them out, corrected to two decimal places. We chose this accuracy quite arbitrarily to prevent the equation looking too clumsy, but we have made provision for you to alter it if you want. PROCbstln includes the following statement at line 10870:
@%=&20204
It sets the number of decimal places to 2, but you can vary this up to an accuracy of 9 significant figures. The number of decimal places - currently 2 - is given by the third digit after the ampersand &. If you increase this, you will accordingly have to alter the space within which the number is printed. This space - at present set to 4 - is given by the last digit.
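To see how the digits of &20204 divide up, the following Python sketch unpacks the value into its three parts. It assumes the usual BBC BASIC layout of @%, in which the lowest byte is the field width, the next byte is the number of digits and the byte above that is the format type (2 meaning a fixed number of decimal places). The function names are ours, and format_like_print only approximates how PRINT would lay out a number.

```python
def decode_print_format(value):
    """Split an @% value into (format type, digits, field width)."""
    width = value & 0xFF            # lowest byte: field width
    digits = (value >> 8) & 0xFF    # next byte: number of decimal places
    mode = (value >> 16) & 0xFF     # next byte: 2 = fixed decimal places
    return mode, digits, width


def format_like_print(number, value):
    """Approximate a fixed-format PRINT of number under the given @% value."""
    mode, digits, width = decode_print_format(value)
    if mode != 2:
        raise ValueError("this sketch only handles the fixed format, type 2")
    return format(number, f"{width}.{digits}f")   # right-justified in the field


print(decode_print_format(0x20204))                  # prints (2, 2, 4)
print(repr(format_like_print(3.14159, 0x20204)))     # prints '3.14'
print(repr(format_like_print(3.14159, 0x20307)))     # prints '  3.142'
```

The last line shows the effect of asking for three decimal places in a field of seven characters, which corresponds to a value of &20307.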
Screen Display 10.2 (first part)
Screen Display 10.2 (second part)
Screen Display 10.2 shows the dialogue and screen display produced by running the program of Listing 10.2. The data that we, as users of the program, have fed in is underlined to distinguish it from that part of the dialogue which comes from the computer. This program is similar to that of Listing 10.1, except that it allows data to be entered at the time the program is run. Enter this program and run it. Then try going into PROCbstln to vary the accuracy with which the equation and correlation coefficient are printed out.