Compressing Traffic for UniObjects

One of my clients has been looking at a particular pinch point in their application: fetching potentially large volumes of data for presentation in a data-bound grid.

As well as the usual issues of building data, it lacked parallelism. It runs a selection, grabs the data, transfers it and then builds a DataSet: all serial, and with plenty of options for asynchronous building to improve the overall response.

Additionally, the physical architecture at one of their customers' sites means that transferring what can be large volumes of traffic over the UniObjects protocol from their server is slow. The data itself should be highly amenable to compression, so the question came: could I find some way to compress this?

If you look at the members of the old Session object, there is a compression flag. That might seem like a good starting point, but it is not settable. It is also conspicuously missing from the U2 Data Provider for .NET, suggesting that it was a hoped-for extension that never got built. So, no go there.

But in this case all that is really needed is to compress the data itself before it is sent over the wire, in a form that can be quickly and readily reinstated at the other end. The application makes a call to a UniBasic subroutine to generate and return the data, so if I can compress using a standard format before it leaves the subroutine and decompress it at the other end once it has been received, we should be in business.

It should be quick, of course, since there is no point adding time for the compression that would undo any performance improvements from reducing the traffic. I have a compression routine in UniBasic that I use for my installer packages, but in that situation I am not overly concerned about speed.
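To get a feel for that trade-off before committing to anything, a quick measurement is worthwhile. What follows is only a minimal sketch in Python, not the routine used in this article: the sample data and the helper name measure are made up for illustration, and the timings will differ from machine to machine.

```python
import time
import zlib

def measure(data, level):
    """Return (compressed size, elapsed seconds) for one zlib level."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed

# Hypothetical sample: repetitive, delimited text similar to grid data.
sample = ("1001\xfeSMITH\xfeLONDON\xfe125.50\n" * 20000).encode("utf-8")

if __name__ == "__main__":
    for level in (1, 6, 9):
        size, secs = measure(sample, level)
        print(f"level {level}: {len(sample)} -> {size} bytes in {secs:.4f}s")
```

Lower levels trade some compression for speed, which may be the better bargain when the aim is overall response time rather than the smallest possible packet.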

To GCI or Not To GCI

My first thought was to use a GCI function to compress the data. For those who are not familiar with GCI, it is a way of extending UniVerse with native C code. Writing GCI on Windows is simpler since it can call DLLs, but this needs to run on both Windows and Linux, and on Linux it is potentially painful. And, other than C#, I've never particularly enjoyed writing C code. Delphi is always an option on Windows, but I've had problems calling the equivalent FreePascal from GCI on Linux.

So GCI remains an option, but only if I can't find something simpler to administer.

Calling Python

So why not leverage the Python support? I've been looking for excuses to find things for Python to do, and this seems like a natural choice. Python supports zlib compression, and with a little bit of jiggery-pokery that can be reversed using the .NET System.IO.Compression.DeflateStream. ZLib uses the same encoding as deflate with the addition of a header, checksum and optional dictionary. The only difficulty is managing the encoding at each end to convert these to and from a string representation.

Simple enough? Well, nearly — it took me two goes to get it working. It's the old issue of Unicode rearing its ugly and malformed head.

UniVerse 11 introduces a PyCallFunction() call into UniBasic. This calls a function in a Python module from a nominated location and can pass zero or more arguments. The function return value is captured and assigned in the UniBasic program < Figure 1 >.

Value = PyCallFunction('myModule','myfunction','Arg1','Arg2'..)

Figure 1
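For readers who have not seen the Python side of such a call, the module being called is just ordinary Python. The sketch below is purely illustrative: the module and function names are the placeholders from Figure 1, and it assumes each argument arrives as a normal Python string.

```python
# myModule.py -- hypothetical module name, matching the call in Figure 1

def myfunction(arg1, arg2):
    """Receive two UniBasic arguments and hand a string back."""
    # Assumption: each argument arrives as an ordinary Python str.
    return arg1 + "/" + arg2
```

Whatever the function returns is what ends up assigned to Value on the UniBasic side.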

Which works well for regular data, and anything that can be represented as standard Unicode.

In Python 2 you could compress a string. A string was nothing more than a managed set of bytes, as it is on UniVerse, old-style Delphi and so on. A string could represent anything and we didn't care which of the 256 possible values filled each byte. Then along came Unicode, and everyone adopted it for their strings, and now you can't use strings for any kind of binary data. Delphi sensibly has a RawByteString to fill the gap, but for .NET and Python there is no equivalent string type. There is no Unicode formatting for old-style 8-bit data that works, since Unicode confuses the absence of representation with non-existence. It's a fundamental flaw in Unicode.

In Python 3 you can only compress something that is byte-able, like a series of bytes. Converting a string into a byte array is relatively straightforward so long as you can find an equivalent encoding format. UniVerse always passes strings in as Unicode, so a field mark (char 254), for example, gets translated to its UTF-8 equivalent pairing. Pass that in and out and you get to see the c3 be (195 190 decimal) pairing in Figure 2.

def pkt(origPacket):
    newBytes = origPacket.encode('utf-8')
    newPacket = ''.join(format(x, '02x') for x in newBytes)
    return newPacket

Figure 2

This snippet first converts the original packet to bytes, then those bytes into a hex encoding before passing back the hex encoded string. When passed a 4 field dynamic array from UniVerse, the resulting encoded string returned to UniVerse, and its MX0C conversion, are given in Figure 3.

496e207468652064656570206f662074686520736561c3be496e20746865206465657020776174657220626c7565c3be4c69766564206120666973682077686f20636f756c642077697368c3be416e642065616368207769736820776f756c6420636f6d652074727565
In the deep of the seaÃ¾In the deep water blueÃ¾Lived a fish who could wishÃ¾And each wish would come true

Figure 3
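The result in Figure 3 can be reproduced outside UniVerse. This is a sketch rather than the code that produced the figure: it assumes the field mark is passed as char 254, and it uses the built-in bytes.hex() in place of the join in Figure 2.

```python
FM = chr(254)  # UniVerse field mark

def pkt(orig_packet):
    """Same idea as Figure 2, using the built-in bytes.hex()."""
    return orig_packet.encode("utf-8").hex()

lines = [
    "In the deep of the sea",
    "In the deep water blue",
    "Lived a fish who could wish",
    "And each wish would come true",
]
hexed = pkt(FM.join(lines))

# Reading the same bytes back one character per byte (which is how the
# MX0C result is displayed) shows each field mark as a pair of characters.
as_single_bytes = bytes.fromhex(hexed).decode("latin-1")
```

Each field mark occupies two bytes in the encoded form, which is why it appears as two characters in the second line of the figure.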

You could choose to keep the original markers, which will be neatly handled on the client side (that also expects UTF-8), or convert them into low order characters such as char(1) to char(3) for field, value and subvalue markers if you don't want to double up. So in normal operations we can pass a string, encode it into bytes and compress it as in Figure 4.

import zlib

def zip(origPacket):
    bytes = origPacket.encode('utf-8')
    newBytes = zlib.compress(bytes)

Figure 4
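The alternative mentioned above, swapping the markers for low order characters so that each one stays a single byte in UTF-8, might look something like this. It is a sketch only: the function names are invented, and it assumes the data never contains chars 1 to 3 in its own right.

```python
# Map field, value and subvalue marks (chars 254, 253, 252) onto chars 1-3.
TO_LOW = str.maketrans({chr(254): chr(1), chr(253): chr(2), chr(252): chr(3)})
TO_MARKS = str.maketrans({chr(1): chr(254), chr(2): chr(253), chr(3): chr(252)})

def to_wire(packet):
    """Swap the marks for low order characters, then encode as UTF-8."""
    return packet.translate(TO_LOW).encode("utf-8")

def from_wire(raw):
    """Reverse of to_wire(): decode, then restore the original marks."""
    return raw.decode("utf-8").translate(TO_MARKS)
```

Whether this is worth the extra pass depends on how many markers the data holds; once the packet is compressed the saving is likely to be small.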

Returning the compressed result is more of a problem: UniVerse requires that this is returned as a string. UniVerse strings are not the same as Python strings, and whilst UniVerse encodes into Unicode on the call, to get the data back UniVerse automatically takes the platform encoding, which defaults to cp1252 (Windows-1252, a close relative of Latin-1). This is an extended ASCII encoding and one that does not include the whole range of 8 bit characters. Put simply, even if you encode stuff using an 8 bit ISO representation inside your Python function, you cannot directly return a string containing binary data to UniVerse.

From the Python perspective, once we've turned this into a set of bytes we should keep it there, and whilst this makes good sense all the time we're working in Python (the same would be true in .NET), in order to return it to UniVerse without going through an intermediate file I need to turn it back into a string that will fit with the Latin-1 encoding. I can't simply take those bytes and encode them using, for example, iso8859-1, which would preserve the content byte-for-char, since the subsequent UniVerse re-encoding on exit will trash that.
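The gap between the two encodings is easy to see from Python itself. This sketch only probes Python's own codec tables, so treat it as an illustration of the principle rather than a statement of exactly what UniVerse does internally; the function name is made up.

```python
def undefined_bytes(codec):
    """List the byte values that the named codec cannot decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            missing.append(value)
    return missing

if __name__ == "__main__":
    print("cp1252 gaps:", [hex(v) for v in undefined_bytes("cp1252")])
    print("latin-1 gaps:", undefined_bytes("latin-1"))
```

In Python's tables, iso8859-1 maps all 256 byte values while cp1252 leaves a handful unassigned, and compressed output can contain any byte value at all.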

Since this is only in-memory and UniVerse is sharing the same process space, it's not too much of an overhead to simply hex encode it as above; not least because whilst it is wasteful there is a single conversion code on the UniVerse side to decode it back and if the engineers have been sensible it should only need one memory reallocation (since you know the new string will be exactly half the size). It might also be possible to base64 encode it, but Python sulks at that < Figure 5 >.

import zlib

def zip(origPacket):
    bytes = origPacket.encode('utf-8')
    newBytes = zlib.compress(bytes)
    newPacket = ''.join(format(x, '02x') for x in newBytes)
    return newPacket

Figure 5
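The hex step can also be written with the built-in bytes.hex(), and the whole thing checked end to end without leaving Python. This is a variant sketch, not the module used in this article: the function names are invented, and unzip_hex merely stands in for the receiving side.

```python
import zlib

def zip_hex(orig_packet):
    """Variant of Figure 5 using bytes.hex() for the hex step."""
    return zlib.compress(orig_packet.encode("utf-8")).hex()

def unzip_hex(hex_packet):
    """Stand-in for the receiving side: hex -> bytes -> original text."""
    return zlib.decompress(bytes.fromhex(hex_packet)).decode("utf-8")
```

A round trip through both functions should hand back exactly the string that went in, markers included.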

Now I can call this and fetch my compressed data on the UniVerse side < Figure 6 >.

ModuleName = 'u2zip'
Zipped = PyCallFunction(ModuleName,'zip', InData)
If @PYEXCEPTIONTYPE NE '' Then
   ErrText = @PYEXCEPTIONMSG
   ErrText<-1> = @PYEXCEPTIONTRACEBACK
End Else
   RealZipped = IConv(Zipped,'MX0C')
End

Figure 6

That all looks reasonable and it's saving some space now, so time to build this into my subroutine call from the client.

Note that the module needs to be in a path where UniVerse will discover it: this is held in a *.pth file. See the U2 Python guide for details. Note also that UniVerse caches Python calls for a session for speed, so any modifications to the Python routine require that you log out and in again.

Calling from .NET

I need to get that same set of bytes generated from Python back in .NET. This runs into exactly the same problem - if you try to get that as a string using the UniSubroutine.GetArg() method, it will try to encode it as Unicode and the result will be a horrible mess that will not decode correctly. Again I could simply base64 encode it < Figure 7 >.

Temp = encode("BASE64A", 1, RealZipped, 1, OutData, 1)

Figure 7

And then convert it on the .NET side < Figure 8 >.

byte[] bytes = Convert.FromBase64String(OutData);

Figure 8

But now I've stuck a load more bytes into the traffic for the encoding scheme when I'm trying to reduce them. There is still a net gain — base64 encoding is quite lean — but that is obviously not what I want.
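The cost of each text encoding is simple arithmetic: hex doubles the byte count, while base64 adds roughly a third. A sketch with made-up sample data (real ratios depend entirely on the content):

```python
import base64
import zlib

def wire_sizes(text):
    """Return byte counts for raw, compressed, base64 and hex forms."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw)
    return {
        "raw": len(raw),
        "compressed": len(packed),
        "base64": len(base64.b64encode(packed)),
        "hex": len(packed.hex()),
    }

# Hypothetical sample data, delimited with field and value marks.
sizes = wire_sizes("1001\xfeSMITH\xfeLONDON\xfe125.50\xfd" * 5000)
```

For data that compresses well, even the base64 form should come out far smaller than the raw packet, but those extra bytes are still pure overhead.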

Fortunately, all is not lost. You can fetch a subroutine argument as a string, or as a dynamic array object (a UniDynArray). The latter exposes a useful method: ToByteArray(). So I can return the underlying bytes that got transferred over the wire from my UniVerse subroutine < Figure 9 >.

UniDynArray outData = subr.GetArgDynArray(2);
byte[] bytes = outData.ToByteArray();

Put these together and I can decode the compressed data to get the original results:

string decode(byte[] bytes) {
    string result = string.Empty;
    MemoryStream s = new MemoryStream(bytes);
    DeflateStream z = new DeflateStream(s, CompressionMode.Decompress);
    const int size = 4096;
    byte[] buffer = new byte[size];
    using (MemoryStream memory = new MemoryStream()) {
        int count = 0;
        do {
            count = z.Read(buffer, 0, size);
            if (count > 0) {
                memory.Write(buffer, 0, count);
            }
        }
        while (count > 0);
        // now convert these back into a string
        memory.Position = 0;
        StreamReader r = new StreamReader(memory);
        result = r.ReadToEnd();
    }
    return result;
}

Figure 9

Is that all there is to it? Not quite - Deflate and ZLib compression are not quite the same. ZLib wraps the deflate data in a two-byte header, an optional preset dictionary identifier and a trailing four-byte Adler-32 checksum. These can be stripped off easily, or there are other zlib libraries that can be leveraged: but this is enough for what I need.
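The difference between the two formats can be demonstrated in a few lines. This is a sketch in Python of the stripping idea, not the code used on the client: the function names are invented, and it follows the layout given in RFC 1950 while declining to handle a preset dictionary.

```python
import zlib

def strip_zlib_framing(packed):
    """Return the raw deflate body of a zlib stream (RFC 1950 layout)."""
    cmf, flg = packed[0], packed[1]
    # Low nibble 8 means deflate; the two header bytes, read as a
    # big-endian number, must be a multiple of 31.
    if (cmf & 0x0F) != 8 or ((cmf << 8) | flg) % 31 != 0:
        raise ValueError("not a zlib stream")
    if flg & 0x20:
        raise ValueError("preset dictionary not handled in this sketch")
    return packed[2:-4]  # drop 2-byte header and 4-byte Adler-32 trailer

def inflate_raw(body):
    """Decompress a headerless deflate stream (negative wbits)."""
    return zlib.decompress(body, -15)
```

On the .NET side the equivalent would presumably be to skip the first two bytes of the array before wrapping it in the MemoryStream that feeds DeflateStream.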

BRIAN LEACH

Featured:

Jan/Feb 2018