QM and VFS
I had a chance last month to look at an aspect of the QM platform that I had not used before: The VFS or Virtual File System.
The background to this was an interesting request from a client to look into various options for integrating their application with SQL Server. Not just as a reporting base or data warehouse, which is bread-and-butter stuff, but to host their live data in SQL Server and have their application use it for transactional purposes. They have a very large, complex, mature and hard-working UniVerse application managing huge numbers of complicated transactions and talking to numerous upstream and downstream systems, and a MultiValue application is designed in a fundamentally different way to an SQL one. Having pointed out the obvious risks and performance issues to be expected in going that route, I duly set out to assess the current state of the various technologies that might allow it.
The actual findings are commercially sensitive. I'm not going to reproduce those here. Nor am I going to list out all the various options that I put before them, or recommend one option over another. The goal was to lay out alternatives so the client could decide which ones would merit further consideration.
I knew that QM had a virtual file system, but I had never had a reason to play with it. It is also largely undocumented territory, with only a couple of sample drivers to look at, neither of which was a good model for what I was attempting. I needed to develop a proof-of-concept driver that would talk, at reasonable speed, to SQL Server.
Since this was under the auspices of producing a report, development time and budget would be extremely limited. But I wanted to see whether it was technically feasible to create something that might perform similarly to, for example, MVON. I knew the experience might not be as smooth and streamlined as the On Group's approach to running MultiValue on SQL. The end result was likely to be more of a hybrid approach than MVON's seamless model.
As you can tell, I had already been highly impressed with the way that MVON 'just worked' straight out of the box and how quickly I had been able to get a demo system running on MVON. So, I knew that was a high benchmark to aim at.
The QM VFS is a means of redirecting file access. It works on a file by file basis, allowing you to mix and match VFS files with regular files. You can, for example, host a data file in VFS and the associated dictionary as a native QM file.
To access a file using VFS you need three things:
- A VFS driver.
- A VFS server entry.
- A modified file pointer.
The VFS driver is the fun part, but I'll come to that later in this article. For now it is enough to say that there are two types: (A) internal drivers written as QM Basic classes and (B) external drivers written in native code. For various reasons my driver is external. I did also model an internal driver but found it too limiting; in other situations your mileage may vary.
A VFS server is a logical collection of information that defines a particular data source. It combines the name of the VFS driver with connection and host details, so you could have a generic driver called, say, demovfs (for the want of anything original), and it could talk to a number of different databases or external systems for pulling and pushing data. It is similar in that regard to an ODBC data source definition. The VFS server definition is created using the SET.VFS.SERVER command, which is necessary as it encrypts the password for storage.
To add a VFS server called 'DEMO' with an external VFS driver of 'demovfs' accessing a remote data source named 'MY_DBSERVER', you would use the SET.VFS.SERVER command as seen below.
SET.VFS.SERVER DEMO EXT demovfs MY_DBSERVER my_username my_password
The password is stored, encrypted, and passed to the driver since you don't want it to prompt you for authentication information when it is actually running. It could be a dummy, as the driver can decide exactly what it wants to do with the parameters it is given.
The LIST.SERVERS command lists your available VFS servers.
:LIST.SERVERS
VFS Servers..... IP address.............. Port... Sec Remote user name....
DEMO            Handler: EXT demovfs MY_DBSERVER 0   No  demo
Once you have defined your server, actually making use of it is satisfyingly simple. Create a file pointer whose path name takes the form VFS:server_name:target_name, where server_name is the VFS server you defined above.
The target_name is just a string that gets passed through verbatim to the driver to identify the final data source. In my case I used the name of the SQL table or view, but it could be absolutely anything, so long as it is meaningful to the driver. You might, for example, create an IMAP driver that interrogates a mail box, passing the name of a folder or tag.
So to create a VFS file called MY_FILE that will instruct the DEMO server to access a thing called MY_TABLE (you get the naming convention), you would create a QM file as follows. Note that I'm still defining a local dictionary for the file.
:CREATE.FILE DICT MY_FILE
Created DICT part as MY_FILE.DIC
Added default '@ID' record to dictionary
:ED VOC MY_FILE
VOC MY_FILE
3 line(s)
----: 0001: F
----: 0002:
----: r VFS:DEMO:MY_TABLE
0002: VFS:DEMO:MY_TABLE
----: fi
'MY_FILE' filed in VOC
You can now access MY_FILE just like any other QM file: listing it, reading and writing records and generally having fun. All of these actions just get passed to the driver via the VFS API and it can decide what to do with them.
The VFS API
The nice thing about VFS is the API itself. The nice thing about the API is what it leaves out.
The VFS API is really, really simple, and makes absolutely no assumptions about what you want to do with your virtual files under the covers. Where other approaches might, for example, use a mapping schema that resolves into generic SQL statements being passed to an ODBC-like driver, VFS works at the logical MultiValue level and lets you get on with the job of doing your own mapping underneath. It knows that you want to open a file, read, write and delete some records, possibly clear the file from time to time and run selections, but how you do that is entirely up to you.
Now there is no doubt that if you want to access an SQL database and haven't got the knowledge (or time, or interest) to set up all the mapping and connection details yourself, having a mapping schema is a great benefit. But it also restricts you to whatever that schema will allow. If the schema is determined to turn your writes into generic UPDATE, INSERT and DELETE commands you don't have the choice to use alternatives such as SQL Server specific MERGE or TRUNCATE statements. If you have a tame TSQL guru who is willing to churn out highly optimized stored procedures, or even to make use of the CLR features inside SQL Server, then again you need a tool that will allow you to make full use of those.
Mapping the Data
In my case this was only a proof of concept, so I created a simple driver in C# that expected each target to be either a table or a view. The SQL Server .NET client exposes a schema API through which you can easily discover the layout of any table or view. I wanted to test two layout options: a simplified storage consisting of just a primary key and a varchar(MAX) field to hold the record body exactly as it would appear in a MultiValue file; and a normalized layout consisting of columns of different data types corresponding one to one, by ordinal position, with the fields in the record. The one requirement common to both layouts was a primary key, again discoverable through the schema API. The driver would be responsible for translating between the MultiValue representation and the columns in the table or view, and would handle such internal niceties as turning PICK format dates into their SQL Server equivalents [Figure 4].
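The translation layer can be sketched in a few lines. The following Python (a stand-in for the C# driver, not the driver itself) illustrates the two conversions described above: splitting a dynamic array into positional column values, and converting PICK internal dates, which count days from 31 December 1967. The chr(254) field mark is standard MultiValue practice, but the helper names are my own invention.

```python
from datetime import date, timedelta

FM = chr(254)  # MultiValue field mark

PICK_EPOCH = date(1967, 12, 31)  # PICK internal day 0

def pick_date_to_sql(internal):
    """Convert a PICK internal date (days since 31 Dec 1967) to a date."""
    return PICK_EPOCH + timedelta(days=int(internal))

def record_to_columns(record, n_columns):
    """Split a dynamic array into positional column values, padding with None."""
    fields = record.split(FM)
    fields += [None] * (n_columns - len(fields))
    return fields[:n_columns]

def columns_to_record(columns):
    """Rebuild a dynamic array from positional column values."""
    return FM.join("" if c is None else str(c) for c in columns)
```

A real driver would also apply per-column type conversions discovered from the schema, but the shape of the problem is the same.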
Writing to an SQL database is a more painful operation than writing to a MultiValue database, not least because of the differentiation between an insert and an update. The combination is often termed an upsert operation, which is normally handled either by (A) testing explicitly for the presence of a row with the given primary key before deciding whether to insert or update; or by (B) attempting an update and following it with an insert if no row was affected. Both of these are tried and tested methods and work across different database platforms.
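Method (B) is easy to sketch. This Python example shows the generic update-then-insert pattern against an in-memory SQLite database standing in for SQL Server; the table and column names are invented for the illustration, and note that without a transaction or lock the pattern can race under concurrent writers.

```python
import sqlite3

def upsert(conn, key, body):
    """Generic 'update first, insert on miss' upsert (method B)."""
    cur = conn.execute("UPDATE my_table SET body = ? WHERE pk = ?", (body, key))
    if cur.rowcount == 0:  # no existing row was touched, so insert
        conn.execute("INSERT INTO my_table (pk, body) VALUES (?, ?)", (key, body))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (pk TEXT PRIMARY KEY, body TEXT)")
upsert(conn, "REC1", "first")   # takes the INSERT path
upsert(conn, "REC1", "second")  # takes the UPDATE path
```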
The MERGE statement in SQL Server combines these into one and can be quicker in some circumstances, though with a verbose and ugly syntax. It is normally used for merging the content of two tables, but your statement can just select the input parameters as the merge source.
Also, like all SQL Server statements, it needs to be planned and optimized and so only comes into its own through reuse. SQL Server will cache and reuse the latest query plan where possible, but to ensure a level of performance you almost always need a stored procedure: so the driver discovers or creates such a procedure on the first write request it receives to a given VFS file. If the file is only ever read, no procedure is generated. Further, the driver caches the parameterized command objects so they are there ready for use and it doesn't have to waste time checking the parameters again. These are all small things, but the cumulative performance benefits can stack up.
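To make that concrete, here is a sketch of how a driver might generate such a stored procedure for the simplified key-plus-body layout. The T-SQL is schematic and the procedure and column names are my own; a real driver would build the column lists from the discovered schema.

```python
def build_merge_proc(table, key_col, body_col):
    """Build schematic T-SQL for an upsert procedure using MERGE.
    Selecting the parameters as the merge source avoids a second table."""
    return f"""
CREATE PROCEDURE dbo.vfs_write_{table}
    @key NVARCHAR(255), @body NVARCHAR(MAX)
AS
MERGE dbo.{table} AS tgt
USING (SELECT @key AS k, @body AS b) AS src
    ON tgt.{key_col} = src.k
WHEN MATCHED THEN
    UPDATE SET {body_col} = src.b
WHEN NOT MATCHED THEN
    INSERT ({key_col}, {body_col}) VALUES (src.k, src.b);
"""

sql = build_merge_proc("MY_TABLE", "pk", "body")
```

The generated procedure would be created once, on the first write to the VFS file, and called thereafter with cached, parameterized command objects.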
So having created (and unit tested!) the generic SQL Server actions, it is time to turn to the VFS part.
The VFS Driver
As I wrote above, a VFS driver can be implemented in two ways: as an external driver written in a native language such as C or C#, or as an internal QM Basic class. Both offer the same API, but they differ in one important respect, which I will come to shortly. For readers more familiar with other MultiValue databases, QM Basic has object oriented features similar to VB or VB.NET that sadly the other manufacturers have never picked up - the local functions in U2 are a poor alternative. QM classes are written in QM Basic and surface their methods as local subroutines and functions.
The internal driver, the one written in QM Basic, is much simpler to develop. There is a template class that you can copy, and from there you simply fill in the methods provided: they all carry reasonable comments and the parameters are meaningful. However, since QM Basic cannot talk directly to SQL Server in the way that I wanted, I could only make the internal driver work via an intermediate service application over a socket connection. Also, a new instance of the class is created for each separate VFS file being accessed, and the instance persists only for as long as that file is open, which means that unless the file is opened to a named common the driver is constantly being created and destroyed. For a staged architecture this made it very inefficient, as it had to continually reinitialize and reestablish its connection with the service.
The external driver, on the other hand, could be written as a .NET executable (or in any other language that can communicate through the API) and so did not need an intervening service to talk to SQL Server. More importantly, it is launched as a single instance for each logged-on user accessing files through a VFS server definition. Instead of associating each VFS file with a separate instance of the driver, as the internal driver does, a file identifier is simply passed to or from the API on each call and the same driver manages a whole group of files. This meant that the driver could itself maintain a single connection to SQL Server for that user for the lifespan of their login session.
The first request to open a VFS file causes QM to create the driver by starting the executable and passing to it the identifier of a named pipe. This can act like a TCP network socket, but will also route via shared memory for speed if both end points are on the same machine. The executable must open its end of the pipe and can then receive commands and send responses through the pipe to the QM session. One complication is that under .NET there is support for named pipes, but only in a specific configuration that does not match the way that QM uses them, and so a driver written using .NET needs to invoke the underlying Windows API to establish the connection [Figure 5].
A minimal driver needs to handle the following actions, which are identified by an action code passed at the start of each request and a set of parameters that are passed up and down the pipe in a simple encoded fashion [Figure 6].
Additional functions allow you to chunk large selections so that you do not pass huge lists of keys back through the pipe in a single hit; to work directly with indices; and to lock and unlock the file.
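To give a feel for the shape of such a driver, here is a toy Python dispatch loop over an in-memory store. The action codes and call signatures are invented placeholders (the real codes arrive over the named pipe and are defined by the QM VFS API); a real driver would translate each action into SQL against the backing table instead of a dictionary.

```python
# Hypothetical action codes -- the real values are defined by the VFS API.
OPEN, CLOSE, READ, WRITE, DELETE, CLEAR, SELECT = range(7)

class DemoDriver:
    """Minimal in-memory stand-in for a VFS driver's dispatch loop."""
    def __init__(self):
        self.files = {}    # file identifier -> backing store
        self.next_id = 0

    def handle(self, action, *args):
        if action == OPEN:
            self.next_id += 1
            self.files[self.next_id] = {}
            return self.next_id      # file identifier returned to QM
        fid = args[0]                # all other actions carry the identifier
        store = self.files[fid]
        if action == READ:
            return store.get(args[1])
        if action == WRITE:
            store[args[1]] = args[2]
        elif action == DELETE:
            store.pop(args[1], None)
        elif action == CLEAR:
            store.clear()
        elif action == SELECT:
            return list(store)       # full key list; chunking omitted here
        elif action == CLOSE:
            del self.files[fid]
```

Because one instance serves every file the user opens, the same loop can multiplex many files over a single back-end connection.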
Try it for Yourself
I can't share the code, as its creation was partly funded by the client for this report and so it is not mine to give. But hopefully this article provides sufficient clues to allow you to write your own: it could be an interesting side project.
There are a few niceties in regard to formatting the data returned that I will leave you to discover - I wouldn't want to deny you too much of the fun!