Locking - Part 1: Locking Refresher
This short series was prompted by a posting on the U2 Users list, bemoaning the fact that junior programmers don't understand locking any more. Now whether or not the poster was right in this assertion, the fact is that the designs and rigours of today's systems place demands in terms of locking, contention, and resolution that simply did not exist when the locking model for MultiValue systems was first conceived. So it is probably worth returning to this most essential of database programming activities for a fresh look.
In this series we're going to start off with some basic coverage of locking before looking in detail at the standard MultiValue locking model in Part 2. Unfortunately this often proves too restrictive for modern systems, so in Part 3 we will look at some of the issues around the traditional techniques and investigate some alternative locking models.
I'll be basing most of this on UniVerse and to a lesser extent on UniData, but beyond the platform-dependent issues of lock promotion, isolation levels, and such like, most of this will translate across to other MultiValue platforms.
Back to Basics
When we look at locking within the context of a MultiValue database, we are really considering two different aspects of database management.
On the one hand, we have the low level requirement to ensure physical data integrity by managing individual processes that read and write to parts of a database file. This is typically the concern of the database engineers and DBAs, and most developers probably don't stop to think too carefully about this, though there are some caveats that they should be aware of.
The other is ensuring logical data integrity by managing the order in which application level updates are applied to prevent updates getting lost and the appropriate use of transactions, where supported. This should one of the highest concerns of any developer, and so it is this aspect that I'll be concentrating on most for the next two articles in this series.
Before we dive into programming practices in regard to application locking, a short look at how physical integrity is maintained can be useful as well, so let's start off with an overview of group management.
Ensuring Physical Integrity
UniVerse and UniData, in line with many other MVDBMS platforms, support a range of different file types to meet the variegated needs of data storage and retrieval within an application. Of these the most popular for managing regular data are static and dynamic (linear) hashed files. Whilst the internal organization of these differs from platform to platform, the fundamental principles are the same across all MVDBMS.
Hashed files distribute potentially huge numbers of records into a file structure by logically splitting the file space into a number of much smaller separately addressable buckets or "groups." Records are allocated to a particular group on the basis of a hashing algorithm that is applied to the key, limiting the area of the file that needs to be searched to retrieve or update the record by any read or write operation.
As well as being a very fast and efficient mechanism for managing large volumes of data, any hashed storage model (fig. 1) is particularly well suited to systems that support a high level of concurrency. Because records can be predictably allocated to specific sections within a file, a number of separate processes can safely read and write to different areas of a file at the same time — important as most MultiValue systems use direct file access rather than a memory cache and lazy writing used by some other database types.
Fig 1: multiple processes writing to a hashed file in parallel.
Even so, some level of contention is still likely even with well tuned hashed storage. Requests may be made for records that hash to the same group or for shared resources such as a free list to extend the file or blocks used for statistics and load factors. And, of course, not all data files use hashed storage anyway, so the database still needs a mechanism to block concurrent requests that would otherwise interfere with one another.
UniVerse maintains physical integrity using two structures both held in shared memory (under Windows, as a memory mapped file). The first of these is an array that holds the load statistics for all open dynamic files preventing the need to flush these to the file header on every update. The file header is only updated once the file is closed or in response to a dbpause instruction — a vital step for 24x7 sites that use mirrored file systems to source their backups. This is fixed, though tuneable, and once full any subsequent opens on new dynamic files will fail.
The second is the group lock table which manages file read and write operations, another tuneable shared memory structure. This is effectively split into two regions: a group lock table that manages physical reads and writes, and a record lock table that manages logical contention.
The group lock table permits or denies access to individual file groups. Before a process can read or write a group in a data file it must request permission to do so. The group lock table consists of a series of sets of slots each of which is protected by a semaphore (fig.2). A lock key is calculated from the group and file (inode) numbers and this is hashed to a particular set in a similar manner to the hashing on a record key. If the group lock requested is already taken, or the group lock set is full, the process must wait until the slot is available. The same set of semaphores is also used for the record lock table and the presence of a record lock is indicated at the group level to improve lookup speed.
Fig 2: Group lock table and semaphores
In the example in figure 3, taken from the UniVerse LIST.READU command, the Lmode shows the detail of an active group lock: a WRite lock protecting the group address 26000 of the file, using an entry in the lock table protected by the semaphore 41. Commands like this can give an indication of excessive contention leading to tuning the group lock table.
Active Group Locks: Record Group Group Group Device.... Inode..... Netnode Userno Lmode G-Address. Locks ...RD ...SH ...EX 1369043616 993978089 0 8148 41 WR 26000 0 0 0 0
At this point you are probably asking, apart from boring my friends down the pub, why should any of this affect me, since this is just the database looking after its own? Well, as good as this model is, there are some places where the physical locking model needs some thought even before we move on to look at application level locking.
For dynamic files, access to the load statistics block in shared memory under UniVerse (not UniData) is protected by a single semaphore and the size of the block is fixed (though tuneable). Whilst this works well, it is a signal to UniVerse developers to only to use dynamic files where necessary.
The B+Tree files used for indexing are also block based. But with a B+Tree, all processes must begin their traversal in the same place - with the root node of the tree - and so this is a candidate for plenty of contention: and unless the database locks the entire tree each page traversed will involve a separate group lock and release action. Adding more B+Tree indices or using B+Trees to handle data only exacerbates the hit on the lock table.
Directory files are much larger problem. Directories are not group based, and so the locking system can only consider these to be single units that can only be accessed by one process at a time, and will block the entire file for every read/write operation. That is in addition to the overheads involved in storing large number of files in an operating system directory. Under Windows, NTFS directories actually use a B-Tree structure for file indices: these speed sorted display but are slow to update. So once again, directory files should only be used to house data where absolutely necessary.
Knowing how the physical locking works can also lead to some common sense design decisions. If you have a central control file holding frequently accessed elements like accumulators (next number generators) it makes sense to oversize this so that you do not end up with a small number of groups getting frequently blocked, and to make sure all your next numbers are not held in the same record.
And finally we come to the perennial problem of file overflow. Everyone understands that poor file sizing or record distribution in regular hashed files can cause massive chaining of overflow groups with every read/write operation potentially needing to trudge through a set of these to find a given record. But don't forget that poorly sized files have another impact — on the lock table.
The longer a read or write operation takes, the longer the lock table must hold on to that lock. And since this is a shared resource the effects on the lock table from having badly sized files can quickly become significant.
That's probably enough time spent on the physical locking aspects of the database. In the next article we will move on to look at application locking from a traditional perspective.