Improving System Performance: BASIC Efficiency
Recently, this consultant was called in by a client who was having problems with a tape archive process. The client needed the results within a day, but the process was taking over 72 hours.
The company stored physical documents and data records, using a GA 5500 to keep track of the location and content of each document.
The task was designed to transfer geological survey data to tape for distribution. The data was kept in separate files according to type, including seismic section, well log, and well core data.
For compatibility with non-PICK systems, the tapes had to use a fixed-record-length format. BASIC was used to read the variable-length items, convert them to fixed-length records, block them (10 records per block), and write the blocks to tape.
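The conversion and blocking step can be sketched in Python as a minimal illustration, assuming each record is truncated or right-padded with spaces to the 110-byte length described below and that records are grouped 10 to a block. The space padding and the handling of a final short block are assumptions; the article does not say how the original BASIC program dealt with them, and the helper names are hypothetical.

```python
RECORD_LEN = 110        # fixed record length used for the seismic tape
RECORDS_PER_BLOCK = 10  # blocking factor: 10 records per tape block


def to_fixed(record: str, length: int = RECORD_LEN) -> str:
    """Truncate, or right-pad with spaces (assumed pad char), to exactly `length`."""
    return record[:length].ljust(length)


def block_records(records, per_block: int = RECORDS_PER_BLOCK):
    """Yield tape blocks, each the concatenation of up to `per_block` fixed records.

    A final partial block is yielded short (an assumption, not from the article).
    """
    block = []
    for rec in records:
        block.append(to_fixed(rec))
        if len(block) == per_block:
            yield "".join(block)
            block = []
    if block:
        yield "".join(block)


blocks = list(block_records(["A", "BB"] * 12))
print(len(blocks), [len(b) for b in blocks])  # 3 [1100, 1100, 440]
```

Twenty-four input records produce two full 1,100-byte blocks and one 440-byte remainder.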
Each tape routine was a separate process within the job and was initiated by its own BASIC program. The worst offender was the seismic section tape. It alone was taking 24 hours.
Each item represented a scroll or document stored elsewhere. The item-id was the location and reference number of a document. There were 30 attributes in each item, and every attribute had up to 100 multivalues. These were correlated lists of seismic data.
The program had to build a fixed-length, 110-byte record for each value position by following that value vertically through every attribute.
(Record 1 was all the first values, record 2 was all the second values, etc.)
The data item:
       #1     #2     #3     ...   #100
001    1data]2data]3data]...]Cdata
002    1data]2data]3data]...]Cdata
...
030    1data]2data]3data]...]Cdata
Tape records to generate:
#1     [1data 1data 1data ... 1data]
#2     [2data 2data 2data ... 2data]
...
#100   [Cdata Cdata Cdata ... Cdata]
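The transposition in this listing can be sketched in Python to make the record layout concrete. This version splits the item string once on its delimiters, which is a different technique from the per-reference extraction the original program used. It assumes the usual mark characters, CHAR(254) for attribute marks and CHAR(253) for value marks (displayed as "]" above), and that short values are space-padded. It builds only the data fields taken from the attributes; the article does not describe what else went into the 110-byte record. The function name is hypothetical.

```python
AM = chr(254)     # attribute mark (assumed standard value)
VM = chr(253)     # value mark, shown as "]" in the listing
FIELD_WIDTH = 3   # each value contributes 3 characters, as in VAL[1,3]


def transpose_item(item: str) -> list[str]:
    """Return one record per value position.

    Record n is built from the nth value of every attribute, in attribute
    order; each value is cut or space-padded to FIELD_WIDTH characters,
    and a missing value becomes blanks.
    """
    attributes = [attr.split(VM) for attr in item.split(AM)]
    n_values = max(len(a) for a in attributes)
    records = []
    for v in range(n_values):
        fields = (a[v] if v < len(a) else "" for a in attributes)
        records.append("".join(f[:FIELD_WIDTH].ljust(FIELD_WIDTH) for f in fields))
    return records


# Three attributes, two values each.
item = AM.join(VM.join(f"{v}{tag}" for v in (1, 2)) for tag in ("aa", "bb", "cc"))
print(transpose_item(item))  # ['1aa1bb1cc', '2aa2bb2cc']
```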
In the interest of clarity, consider only the program loop used to break apart the item. This algorithm was used by each tape program that the job ran:
STOP.LOOP = 0
LOOP
   * Get the next id from the active item-list.
   READNEXT ID ELSE STOP.LOOP = 1
UNTIL STOP.LOOP DO
   * Read the item into the dynamic array variable ITEM.
   READ ITEM FROM FILE,ID ELSE ITEM = ""
   FOR VALUE = 1 TO 100                        ;* for 100 values
      TAPE.RECORD = ""                         ;* init a new Tape Record
      FOR ATTRIBUTE = 1 TO 30                  ;* get each attribute
         VAL = ITEM<ATTRIBUTE,VALUE>           ;* scans ITEM from the start
         TAPE.RECORD = TAPE.RECORD : VAL[1,3]  ;* VAL is 3 long
      NEXT ATTRIBUTE
      *
      * Code to process the Tape Record
      *
   NEXT VALUE
REPEAT
Each pass through the inner loop addresses the attribute and value through a dynamic array reference. For every item, that reference is executed 3,000 times (100 values x 30 attributes).
Arrays come in two flavors, dynamic and dimensioned. The choice of the correct array type under the right circumstances can greatly affect program performance.
Dynamic means "in motion, changing." Dynamic arrays have a variable number of entries. The array is referenced as a single variable and occupies a single entry in the DESCRIPTOR table. It is actually a character string with embedded system delimiters marking the array entries. These delimiters may be any combination of attribute marks, value marks, and sub-value marks, and they are treated as part of the data string.
READ ARRAY FROM FILE,ID ELSE STOP
Array ARRAY equals the string:
BASIC contains intrinsic functions for handling dynamic arrays: array entries can be INSERTed, DELETEd, EXTRACTed, or REPLACEd. Dynamic array references take the general form:
ARRAY<attribute#,value#,sub-value#>
Each array reference must search for the addressed entry. On every request, the process scans from the beginning of the string, counting delimiters until it reaches the requested position. The larger the string, the more work each reference requires, and access can slow further once the string crosses frame boundaries.
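To make that cost concrete, here is a small Python model of a dynamic array reference ITEM<attr,value>. It is a sketch, not the Pick runtime: it assumes the usual mark characters CHAR(254) and CHAR(253), ignores sub-values and frame boundaries, and reports the number of characters skipped before the value begins as a stand-in for scan work. The function name extract_scanning is hypothetical.

```python
AM, VM = chr(254), chr(253)  # attribute and value marks (assumed standard values)


def extract_scanning(dyn: str, attr: int, value: int) -> tuple[str, int]:
    """Return (value, characters scanned) for dyn<attr,value>, 1-based.

    Like a dynamic array reference, every call starts at position 0 and
    counts delimiters until it reaches the requested attribute and value.
    """
    pos = 0
    for _ in range(attr - 1):                 # skip attr-1 attribute marks
        nxt = dyn.find(AM, pos)
        if nxt == -1:
            return "", len(dyn)               # attribute does not exist
        pos = nxt + 1
    attr_end = dyn.find(AM, pos)
    if attr_end == -1:
        attr_end = len(dyn)
    for _ in range(value - 1):                # skip value-1 value marks
        nxt = dyn.find(VM, pos, attr_end)
        if nxt == -1:
            return "", attr_end               # value does not exist
        pos = nxt + 1
    end = dyn.find(VM, pos, attr_end)
    if end == -1:
        end = attr_end
    return dyn[pos:end], pos                  # pos = characters skipped


# 30 attributes of 100 three-character values: "001" ... "100".
dyn = AM.join(VM.join(f"{v:03d}" for v in range(1, 101)) for _ in range(30))
print(extract_scanning(dyn, 2, 3))    # ('003', 408)
print(extract_scanning(dyn, 30, 1))   # ('001', 11600)
```

Reaching the first value of attribute 30 skips 11,600 characters, even though the value itself is only 3 characters long.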
The READ statement retrieves an item and assigns it as a dynamic array. No setup is required other than OPENing the file.
Dimensioned arrays can be referred to as "static" arrays because the number of array entries is constant. A static array must be declared in the program with the DIMENSION (DIM) statement.
DIM ARRAY(20)
DIM TABLE(20,50)
Each entry in these arrays is a "subscripted" variable that occupies its own entry in the DESCRIPTOR table. When an item is read with MATREAD, each attribute is assigned to its corresponding array entry. (MATREAD only works with one-dimensional arrays.)
DIM ARRAY(3)
MATREAD ARRAY FROM FILE,ID ELSE STOP
Because each attribute is a separate variable, retrieving it does not require scanning the item. Attributes are retrieved by using a subscript to address the variable.
ARRAY(2) is attribute 2 of the item read into ARRAY by MATREAD.
Each array entry can contain embedded value and sub-value marks, so each dimensioned array entry can itself be a dynamic array.
ARRAY(2)<1,3>
This addresses the third value in dimensioned array entry number 2. The value of this EXTRACT is "411".
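The dimensioned case can be modeled the same way in Python: split the item once, as MATREAD assigns one attribute per entry, then extract values from a single entry. In this sketch, matread and extract_value are hypothetical helpers, the mark characters are assumed to be CHAR(254) and CHAR(253), and an item with more attributes than the array has entries is simply rejected, since the article does not cover that case. Apart from "411", the sample values are placeholders. Python lists start at 0, so ARRAY(2) corresponds to array[1].

```python
AM, VM = chr(254), chr(253)  # attribute and value marks (assumed standard values)


def matread(item: str, size: int) -> list[str]:
    """Hypothetical MATREAD model: scan the item once, one attribute per entry."""
    attrs = item.split(AM)
    if len(attrs) > size:
        # Assumption: surplus attributes are rejected rather than handled.
        raise ValueError("item has more attributes than the array has entries")
    return attrs + [""] * (size - len(attrs))  # unused entries are empty


def extract_value(entry: str, value: int) -> tuple[str, int]:
    """Return (value, characters scanned) for ENTRY<1,value>.

    Only this one entry's string is scanned, never the whole item.
    """
    pos = 0
    for _ in range(value - 1):
        nxt = entry.find(VM, pos)
        if nxt == -1:
            return "", len(entry)   # fewer values than requested
        pos = nxt + 1
    end = entry.find(VM, pos)
    if end == -1:
        end = len(entry)
    return entry[pos:end], pos


# Placeholder item whose attribute 2, value 3 is "411".
item = "A" + AM + VM.join(["v1", "v2", "411"]) + AM + "C"
array = matread(item, 3)            # like DIM ARRAY(3) / MATREAD ARRAY
print(extract_value(array[1], 3))   # ARRAY(2)<1,3> -> ('411', 6)
```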
In the program, each array reference, ITEM<ATTRIBUTE,VALUE>, requires a scan from the beginning of ITEM. For simplicity, assume three characters per value, approximately 400 characters per attribute (including delimiters), and 12,000 characters per item. The approximate number of characters scanned for each pass (VALUE equal to 1, 2, 3, up to 100) is shown in the following charts.
How much overhead is involved when using the dynamic array under these conditions?
The approximate length of the scan per value using a dynamic array:
About 174K bytes are scanned on the first pass. Each subsequent pass scans slightly more. The total number of bytes scanned for the entire item is approximately 18M. That is a major amount of overhead for each item.
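These figures follow from a simple model, sketched here in Python under the article's assumptions: 30 attributes, 100 values, about 400 characters per attribute, and 4 characters per value (3 data characters plus a value mark). Each reference ITEM<ATTRIBUTE,VALUE> is charged for every character before the value it finds, counted from the start of the item. The function name is hypothetical.

```python
N_ATTRS = 30     # attributes per item
N_VALUES = 100   # values per attribute
ATTR_LEN = 400   # characters per attribute, including delimiters
VALUE_LEN = 4    # 3 data characters + 1 value mark


def dynamic_scan(value: int) -> int:
    """Characters scanned in one pass (fixed VALUE) when every reference
    ITEM<ATTRIBUTE,VALUE> starts at the beginning of the whole item."""
    return sum((attr - 1) * ATTR_LEN + (value - 1) * VALUE_LEN
               for attr in range(1, N_ATTRS + 1))


first_pass = dynamic_scan(1)
total = sum(dynamic_scan(v) for v in range(1, N_VALUES + 1))
print(first_pass, total)  # 174000 17994000
```

The attribute offset term, 400 x (0 + 1 + ... + 29), accounts for the 174,000 characters on every pass; the growing value offset adds 120 characters per pass.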
I decided to try reading each item in as a dimensioned array. The code looked like this:
DIM ITEM(30)
*
STOP.LOOP = 0
LOOP
   * Get the next id from the active item-list.
   READNEXT ID ELSE STOP.LOOP = 1
UNTIL STOP.LOOP DO
   * Read the item into the dimensioned array ITEM, one attribute per entry.
   MATREAD ITEM FROM FILE,ID ELSE MAT ITEM = ""
   FOR VALUE = 1 TO 100                        ;* for 100 values
      TAPE.RECORD = ""                         ;* init a new Tape Record
      FOR ATTRIBUTE = 1 TO 30                  ;* get each attribute
         VAL = ITEM(ATTRIBUTE)<1,VALUE>        ;* scans one attribute only
         TAPE.RECORD = TAPE.RECORD : VAL[1,3]  ;* VAL is 3 long
      NEXT ATTRIBUTE
      *
      * Code to process the Tape Record
      *
   NEXT VALUE
REPEAT
Notice the statement to extract VAL.
VAL = ITEM(ATTRIBUTE)<1,VALUE>
The dynamic reference to the value fields now applies to each attribute separately, so the longest string any single reference must scan is about 400 characters.
The approximate length of the scan per value using a dimensioned array:
On the first pass, each value sits at the very first position of its entry, so almost nothing has to be scanned. Each subsequent pass scans slightly more. The total number of bytes scanned for the entire item is approximately 594K.
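Under the same assumptions as the dynamic-array estimate (30 attributes, 100 values, 4 characters per value including its value mark), a Python sketch of the dimensioned case charges each reference ITEM(ATTRIBUTE)<1,VALUE> only for the characters before the value within its own attribute. The function name is hypothetical.

```python
N_ATTRS = 30     # attributes per item
N_VALUES = 100   # values per attribute
VALUE_LEN = 4    # 3 data characters + 1 value mark


def dimensioned_scan(value: int) -> int:
    """Characters scanned in one pass (fixed VALUE) when each reference
    ITEM(ATTRIBUTE)<1,VALUE> scans only its own attribute's string."""
    return N_ATTRS * (value - 1) * VALUE_LEN


first_pass = dimensioned_scan(1)
total = sum(dimensioned_scan(v) for v in range(1, N_VALUES + 1))
print(first_pass, total)  # 0 594000
```

The attribute offset term, which dominated the dynamic-array total, drops out entirely; only the value offset within each attribute remains.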
This is a vast improvement over the previous method: the seismic section process now completed in under 9 hours instead of 24.