Improving System Performance: BASIC Efficiency

Recently, I was called in by a client who was having problems with a tape archive process. The client needed the results within a day, but the process was taking over 72 hours.

The company was in the business of storing physical documents and data records, and it used a GA 5500 to keep track of the location and contents of each document.

The task was designed to transfer geological survey data to tape for distribution. The data was kept in separate files according to type, including seismic section, well log, and well core data.

For compatibility with non-Pick systems, the tapes had to be written in fixed-length record format. BASIC was used to read the variable-length items, convert them to fixed-length records, block them (10 records per block), and write the blocks to tape (WRITET).
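The fixed-length and blocking steps can be sketched as a small model. This is a minimal Python sketch, not the original BASIC: it assumes short records are space-padded, takes the 110-byte record length from the description of the data below, and fills a short final block with blank records so every tape block is the same size. The names `to_fixed` and `make_blocks` are hypothetical.

```python
RECORD_LENGTH = 110      # fixed record length used in this job
RECORDS_PER_BLOCK = 10   # blocking factor used in this job
PAD = " "                # assumption: short records are padded with spaces


def to_fixed(record: str, length: int = RECORD_LENGTH) -> str:
    """Truncate or space-pad a variable-length record to exactly `length` characters."""
    return record[:length].ljust(length, PAD)


def make_blocks(records: list[str], per_block: int = RECORDS_PER_BLOCK) -> list[str]:
    """Group fixed-length records into equal-size blocks.

    Assumption: a short final block is filled out with blank records so
    that every block written to tape has the same length.
    """
    fixed = [to_fixed(r) for r in records]
    blocks = []
    for start in range(0, len(fixed), per_block):
        chunk = fixed[start:start + per_block]
        chunk += [PAD * RECORD_LENGTH] * (per_block - len(chunk))
        blocks.append("".join(chunk))
    return blocks
```

With these settings, 23 records produce three blocks of 1,100 characters each, the last holding three real records and seven blank ones.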

Each tape routine was a separate process within the job and was initiated by its own BASIC program. The worst offender was the seismic section tape. It alone was taking 24 hours.

The Data

Each item represented a scroll or document stored elsewhere. The item-id was the location and reference number of a document. There were 30 attributes in each item, and every attribute had up to 100 multivalues. These were correlated lists of seismic data.

The program had to build a fixed-length, 110-byte record from each value position, reading down through every attribute: record 1 was built from all the first values, record 2 from all the second values, and so on.

The data item (each line is one attribute; ] represents a value mark):

ITEM-ID: reference number

       #1    #2    #3    ... #100
001    1data]2data]3data]...]Cdata
002    1data]2data]3data]...]Cdata
...
030    1data]2data]3data]...]Cdata

Tape records to generate:

#1    [1data  1data  1data  ...  1data]
#2    [2data  2data  2data  ...  2data]
...
#100  [Cdata  Cdata  Cdata  ...  Cdata]
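The transformation from the item layout to the tape records is a transpose of the correlated multivalues. The Python sketch below shows only what the records contain, not how the BASIC loop computes them: it assumes the conventional Pick delimiters CHAR(254) for the attribute mark and CHAR(253) for the value mark, pads each value to 3 characters, and uses a hypothetical function name, `item_to_records`.

```python
AM, VM = chr(254), chr(253)  # conventional Pick attribute and value marks


def item_to_records(item: str, width: int = 3) -> list[str]:
    """Turn an item's correlated multivalues into fixed-width records.

    Record n is built from value n of every attribute, in attribute order,
    each value truncated or space-padded to `width` characters (assumption).
    """
    columns = [attr.split(VM) for attr in item.split(AM)]
    n_values = max(len(col) for col in columns)
    records = []
    for n in range(n_values):
        fields = (col[n] if n < len(col) else "" for col in columns)
        records.append("".join(f[:width].ljust(width) for f in fields))
    return records
```

For a three-attribute item 221]321]432^211]311]411^555]777]888, this yields the records "221211555", "321311777", and "432411888". The sketch splits each attribute once instead of extracting values one at a time, so it says nothing about the cost of the BASIC loop discussed in the next section.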

The Program

In the interest of clarity, consider only the program loop used to break apart the item. This algorithm was used by each tape program that the job ran:

STOP.LOOP = 0
LOOP
  * Get the next id from the active item-list.
  READNEXT ID ELSE STOP.LOOP = 1
UNTIL STOP.LOOP DO
  * Read the item into the dynamic array variable ITEM.
  READ ITEM FROM FILE,ID ELSE ITEM = ""
  *
  FOR VALUE = 1 TO 100 ;* one tape record per value
    TAPE.RECORD = "" ;* init the tape record for this value
    FOR ATTRIBUTE = 1 TO 30 ;* get each attribute
      VAL = ITEM<ATTRIBUTE,VALUE> ;* extract from the dynamic array
      TAPE.RECORD = TAPE.RECORD : VAL[1,3] ;* assumes VAL is 3 characters long
    NEXT ATTRIBUTE
    *
    * Code to process the tape record
    *
  NEXT VALUE
REPEAT

Each pass through the inner loop extracts one attribute and value from the dynamic array ITEM. For each item, this extraction is performed 100 x 30 = 3,000 times.

Dynamic Arrays

Arrays come in two flavors, dynamic and dimensioned. The choice of the correct array type under the right circumstances can greatly affect program performance.

Dynamic means "in motion, changing." Dynamic arrays have a variable number of entries. The array is referenced as a single variable and occupies a single entry in the DESCRIPTOR table. The array is actually a character string containing embedded system delimiters that mark the array entries. These delimiters may be any combination of attribute marks, value marks, and sub-value marks. System delimiters are treated as part of the data string.

READ ARRAY FROM FILE,ID ELSE STOP

Variable ARRAY then contains the following string, where ^ represents an attribute mark and ] represents a value mark:

221]321]432^211]311]411^555]777]888

BASIC contains certain intrinsic functions for handling dynamic arrays. Array entries can be INSERTed, DELETEd, EXTRACTed or REPLACEd. Dynamic array references are of the general form:

Arrayname<attnumber,valnumber,subvalnumber>

Each array reference must search for the addressed entry. On every request, the process scans from the beginning of the string, counting delimiters until it reaches the requested position. As strings get larger, the work required for each reference increases. Addressing an entry in a dynamic array can slow down further once the string crosses frame boundaries.
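The start-of-string scan described above can be modelled in a few lines. This is a simplified Python sketch, not any Pick implementation's actual code: the function name `extract` is hypothetical, sub-value marks are treated as ordinary data, and the "cost" it reports is simply the number of characters passed over before the requested field begins.

```python
AM, VM = chr(254), chr(253)  # conventional Pick attribute and value marks


def extract(dyn: str, attr: int, value: int) -> tuple[str, int]:
    """Simplified model of dyn<attr,value> (attr, value >= 1).

    Walks the string from the first character, counting delimiters.
    Returns (field, start), where start is the number of characters
    passed over before the field begins (the scan cost).
    """
    a, v, pos, n = 1, 1, 0, len(dyn)
    while pos < n and (a < attr or v < value):
        ch = dyn[pos]
        if ch == AM:
            if a == attr:           # attribute ended before the requested value
                return "", pos
            a, v = a + 1, 1
        elif ch == VM and a == attr:
            v += 1                  # only value marks in the target attribute count
        pos += 1
    if a < attr or v < value:       # ran off the end of the string
        return "", pos
    start = pos
    while pos < n and dyn[pos] not in (AM, VM):
        pos += 1
    return dyn[start:pos], start
```

Using the example string above, ARRAY<1,1> is found after passing over 0 characters, ARRAY<2,3> ("411") after 20, and ARRAY<3,1> ("555") after 24: the further into the string the entry lies, the longer every single reference takes.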

The READ statement retrieves an item and assigns it to a dynamic array. No setup is required other than OPENing the file.

Dimensioned Arrays

Dimensioned arrays can be thought of as "static" arrays, because the number of entries is fixed. A dimensioned array must be declared in the program with the DIMENSION (DIM) statement.

DIM ARRAY(20)
DIM TABLE(20,50)

Each entry in these arrays is a "subscripted" variable that occupies its own entry in the DESCRIPTOR table. When an item is read with MATREAD, each attribute is assigned to the corresponding array entry. (MATREAD works only with one-dimensional arrays.)

DIM ARRAY(3)
MATREAD ARRAY FROM FILE, ID ELSE STOP

Each attribute is a separate variable and does not require the item to be scanned. Attributes are retrieved using a subscript to address the variable.

ARRAY(2) is attribute 2 of the item read into ARRAY by MATREAD.

An array entry may contain embedded value and sub-value marks, so each dimensioned array entry can itself be a dynamic array.

ARRAY(2)<1,3>

This addresses the third value in dimensioned array entry number 2. For the item shown earlier, the value of this EXTRACT is "411".
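The difference between the two array types can be modelled by splitting the item once on attribute marks, which is roughly what MATREAD does, and then addressing values within a single entry. This Python sketch is an assumption-laden model: `matread` and `value_of` are hypothetical names, index 0 is left unused to keep 1-based subscripts, and extra attributes are simply dropped (how a real MATREAD handles an item with more attributes than the array has entries depends on the implementation).

```python
AM, VM = chr(254), chr(253)  # conventional Pick attribute and value marks


def matread(dyn: str, size: int) -> list[str]:
    """Hypothetical model of DIM ARRAY(size) followed by MATREAD.

    Entry i holds attribute i; index 0 is unused. Missing attributes
    become empty strings; extra attributes are dropped in this sketch.
    """
    attrs = dyn.split(AM)[:size]
    attrs += [""] * (size - len(attrs))
    return [""] + attrs


def value_of(entry: str, value: int) -> str:
    """Model of ENTRY<1,value>: the search stays inside a single attribute."""
    values = entry.split(VM)
    return values[value - 1] if value <= len(values) else ""


ITEM = AM.join([VM.join(["221", "321", "432"]),
                VM.join(["211", "311", "411"]),
                VM.join(["555", "777", "888"])])
ARRAY = matread(ITEM, 3)
print(value_of(ARRAY[2], 3))  # models ARRAY(2)<1,3>
```

Run as written, the last line prints 411, matching the EXTRACT above. The point of the model is that locating ARRAY(2) costs nothing; only the values inside that one attribute are searched.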

A Solution

In the program, each array reference, ITEM<ATTRIBUTE,VALUE>, requires a scan from the beginning of ITEM. For simplicity, assume three characters per value field, which gives approximately 400 characters per attribute (100 values plus 99 value marks and the attribute mark) and 12,000 characters per item. The approximate number of characters scanned on each pass of the VALUE loop (VALUE = 1, 2, 3, up to 100) is shown in the following charts.

How much overhead is involved when using the dynamic array under these conditions?

The approximate length of the scan per value using a dynamic array:

About 174,000 characters for the first pass (VALUE = 1). Each subsequent pass scans slightly more, and the total for the entire item is approximately 18 million characters. That is a major amount of overhead for every item.
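The figures can be reconstructed under the simplified layout above. In this worked derivation the symbols are introduced here for convenience: L = 400 is the length of one attribute, w = 4 is one 3-character value plus its value mark, a is the attribute number, and v is the value number. Reaching ITEM<a,v> means skipping a - 1 whole attributes and then v - 1 values.

```latex
\begin{aligned}
\text{scan}(a,v) &\approx (a-1)\,L + (v-1)\,w, \qquad L = 400,\; w = 4 \\
S_v &= \sum_{a=1}^{30} \text{scan}(a,v) = 435\,L + 30\,w\,(v-1) = 174{,}000 + 120\,(v-1) \\
S_{\text{total}} &= \sum_{v=1}^{100} S_v = 100 \times 174{,}000 + 120 \times 4{,}950 = 17{,}994{,}000 \approx 18\ \text{million}
\end{aligned}
```

Almost all of the cost comes from the (a - 1)L term, the repeated skipping of whole attributes that lie in front of the one being addressed.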

I decided to try reading each item in as a dimensioned array. The code looked like this:

DIM ITEM(30)
*
STOP.LOOP = 0
LOOP
  * Get the next id from the active item-list.
  READNEXT ID ELSE STOP.LOOP = 1
UNTIL STOP.LOOP DO
  * Read the item into the dimensioned array ITEM, one attribute per entry.
  MATREAD ITEM FROM FILE,ID ELSE MAT ITEM = ""
  *
  FOR VALUE = 1 TO 100 ;* one tape record per value
    TAPE.RECORD = "" ;* init the tape record for this value
    FOR ATTRIBUTE = 1 TO 30 ;* get each attribute
      VAL = ITEM(ATTRIBUTE)<1,VALUE> ;* scan only this attribute
      TAPE.RECORD = TAPE.RECORD : VAL[1,3] ;* assumes VAL is 3 characters long
    NEXT ATTRIBUTE
    *
    * Code to process the tape record
    *
  NEXT VALUE
REPEAT

Notice the statement to extract VAL.

VAL = ITEM(ATTRIBUTE)<1,VALUE>

The dynamic-array reference now addresses each attribute separately, so the longest string any single scan can cover is one attribute, about 400 characters.

The approximate length of the scan per value using a dimensioned array:

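On the first pass (VALUE = 1), each value starts at the first position of its entry, so almost nothing is scanned. Each subsequent pass scans 4 more characters per attribute (one 3-character value plus its value mark). The total number of characters to scan for the entire item is approximately 594,000, roughly one-thirtieth of the 18 million required with the dynamic array.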
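Both estimates can be checked by simulation. This Python sketch builds a synthetic item matching the simplified layout (30 attributes of 100 three-character values, using the conventional Pick marks CHAR(254) and CHAR(253)) and counts the characters a start-of-string scan passes over for every reference, first against the whole item and then against each attribute on its own. All names are hypothetical, and the model ignores frame boundaries and any other system overhead.

```python
AM, VM = chr(254), chr(253)        # conventional Pick attribute and value marks
N_ATTR, N_VAL, WIDTH = 30, 100, 3  # layout assumed in the text


def build_item() -> str:
    """Synthetic item: 30 attributes, each with 100 three-character values."""
    attribute = VM.join("x" * WIDTH for _ in range(N_VAL))
    return AM.join(attribute for _ in range(N_ATTR))


def offset(dyn: str, attr: int, value: int) -> int:
    """Characters a scan from the start of `dyn` passes over to reach dyn<attr,value>.
    Assumes the field exists, which is true for the synthetic item."""
    pos = 0
    for _ in range(attr - 1):      # skip attr-1 attribute marks
        pos = dyn.index(AM, pos) + 1
    for _ in range(value - 1):     # then value-1 value marks
        pos = dyn.index(VM, pos) + 1
    return pos


def dynamic_cost(item: str, values=range(1, N_VAL + 1)) -> int:
    """Model of ITEM<ATTRIBUTE,VALUE>: every reference scans from the start of the item."""
    return sum(offset(item, a, v) for v in values for a in range(1, N_ATTR + 1))


def dimensioned_cost(item: str) -> int:
    """Model of MATREAD + ITEM(ATTRIBUTE)<1,VALUE>: each scan starts at its own attribute."""
    entries = item.split(AM)       # what MATREAD would place in ITEM(1)..ITEM(30)
    return sum(offset(entries[a - 1], 1, v)
               for v in range(1, N_VAL + 1) for a in range(1, N_ATTR + 1))


if __name__ == "__main__":
    item = build_item()
    print("item length:", len(item))
    print("dynamic, first pass:", dynamic_cost(item, values=[1]))
    print("dynamic, all passes:", dynamic_cost(item))
    print("dimensioned, all passes:", dimensioned_cost(item))
```

For this synthetic item the script reports an item length of 11,999 characters, 174,000 characters for the first dynamic-array pass, 17,994,000 for all dynamic-array passes, and 594,000 for the dimensioned version, in line with the estimates in the text.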

This was a vast improvement over the previous method: the seismic section process now completed in under 9 hours instead of 24.