Migrating NorthWind Database to a MultiValue Database
The battle between traditional database (RDBMS) and MultiValue databases have raged for years, and will likely continue for the foreseeable future. As NoSQL-style databases are starting to become more accepted by CEOs and CIOs for enterprise-level datastores, the MultiValue model is being looked at very closely again. The old, unfair, stigmas are gone.
RDBMS were never really designed for high volume CRUD (Create, Read, Update, and Delete) business transactions, where the MultiValue Database was designed specifically to handle and store business data. Other NoSQL databases are gaining ground, but they don't have the enterprise level locking, CRUD management, transaction boundaries, and general OLAP transaction ability that MultiValue Systems have implemented over the last 30 years of their existence.
This flexibility and ease which is core to the MultiValue datastore can be confusing to outside developers. Time working in other models creates expectations that don't match the MultiValue experience. They find themselves trying to solve problems which don't exist in our systems. And, without properly understanding how to construct data models effectively, they won't get all the benefits.
To demonstrate the differences, I decided to use Microsoft's sample database, NorthWinds, as an example. Why? Because most developers will have run across it while learning how to interface with an RDBMS. Giving them a familiar point of reference will help them compare and contrast the techniques needed to optimize in each environment.
Good programmers don't just know code, of course. As with any development project, you have to understand the business — the what, why and how — which informs the need of the software. If you don't, your project will fail. For those not familiar with NorthWinds data, it is a fictitious company called NorthWind Traders that creates and processes orders. This collection of data has been around for years and has been replaced with more complex data models like AdventureWorks but it stil stands as a clean and clear example.
NorthWinds Business Model
Figure 1 shows the schema of the NorthWinds database as found in Microsoft SQL server. Let's assume the company creates new orders every day. The number of orders created could be in the tens of thousands per hourly. It all depends on how many users are entering order. With this assumption of high volume, we can further assume the Orders entity is write intensive. Reads, on the other hand, will not be as voluminous as writes. In other words, Orders is, in MV-speak, a transaction file.
Next, let's consider the entities of Customers, Employees, Products, Shippers, and Suppliers. These entities will be created, updated, or removed at a low frequency. These are, in MV-speak Master Files. Product reads will be particularly high against every Order. An order usually has only one Shipper or Supplier but many Products.
Building the Database in MultiValue
Now that we know the business expectations, let create the basic files and dictionaries that match the SQL Structure above. I'll lay the structure out using standard MultiValue LIST-DICT syntax [Figure 2].
/* Table: orders */ CREATE TABLE orders ( id INT NOT NULL, employee_id INT , customer_id INT , order_date DATETIME , shipped_date DATETIME , shipper_id INT , ship_name VARCHAR(50) , ship_address VARCHAR(250) , ship_city VARCHAR(50) , ship_state_province VARCHAR(50) , ship_zip_postal_code VARCHAR(50) , ship_country_region VARCHAR(50) , shipping_fee DECIMAL(19,4) NULL DEFAULT '0.0000', taxes DECIMAL(19,4) NULL DEFAULT '0.0000', payment_type VARCHAR(50) , paid_date DATETIME , notes VARCHAR(250) , tax_rate DOUBLE NULL DEFAULT '0', tax_status_id TINYINT , status_id TINYINT NULL DEFAULT '0', PRIMARY KEY (id) ); /* Table: order_details */ CREATE TABLE order_details ( order_id INT NOT NULL, product_id INT , quantity DECIMAL(18,4) NOT NULL DEFAULT '0.0000', unit_price DECIMAL(19,4) NULL DEFAULT '0.0000', discount DOUBLE NOT NULL DEFAULT '0', order_detail_status VARCHAR(25), date_allocated DATETIME , PRIMARY KEY (order_id, product_id) );
File: Order [Figure 3]
As you can see, this is pretty straightforward. You can do similar conversions with other tables, but there is something important we have to address: OrderDetail. This table isn't a primary table like most of the tables in the Northwinds database.
It is what SQL administrators like to call a subtable, support table, or detail table. Basically, it is a table that can't live on its own, and requires the Order table to help provide additional context to the data found in OrderDetail table.
In SQL, you would have to do a JOIN to get the OrderDetail information along with the Order information [Figure 4].
SELECT Orders.orders.id, Orders.customer_id, Orders.Order_date, OrderDetails.Quantity, OrderDetails.unit_price FROM OrderDetials WHERE Orders.Id = "1234" INNER JOIN Orders ON OrderDetail.OrderID = Orders.IDFigure 4
Since the OrderDetail information is not likely to be accessed independently from the Orders file, now would be a good time to take advantage of the MultiValue database model to store the OrderDetail data.
There are a few ways to handle this in the MultiValue database:
- Create it exactly the same way, we can do anything architecturally that SQL can.
- Create a group of MultiValue fields inside the Orders file
- Create a Group of MultiValue fields outside the Orders File
There are several design arguments for and against using both ways of doing this. This is where MultiValue developers have options denied to SQL developers. We get to choose.
OrderDetail - MultiValue Fields Inside Orders
One of the benefits of turning the OrderDetail data into a group of MultiValue fields inside the Order File is "Referential Integrity".
Another benefit has to do with data retrieval and update optimization. If the OrderDetail is included with the Order data, then one IO read will pull all the data that is associated with the Order. In turn, one IO write will update all the data associated with the Order.
This differs from the SQL model which would require an IO read/write for every OrderDetail record associated with the Order. No matter how fast your hardware is, one write will be faster than fifty writes on that same hardware.
File: ORDER [Figure 5].
When you are ready to see the information you would enter a command as seen in Figure 6.
LIST ORDERS "1234" ORDER_ID CUSTOMER_ID ORDER_DATE QTY UNIT_PRICE
The drawback to this approach starts to show up when you have Orders that contain tens of thousands of lines of OrderDetail. While a single MultiValue record can easily handle tens of thousands of MultiValue, it can slow down filtering and selections.
For example, if the user needed to report on all the Orders that contain part ABC, then the following select would look something like Figure 7.
LIST ORDERS BY-EXP PRODUCT_ID = "ABC" ORDER_ID CUSTOMER_ID ORDER_DATE QTY UNIT_PRICE
This looks straight forward, but it would cause the selection to read each record in the Orders file and parse the multi-valued field in memory to find the part number. The selection process then returns the indexed position - the multi-value position - of the part number.
Once the select is complete, the list statement would have to parse each of the associated multi-valued field, QTY and UNIT_PRICE, to return the particular value found at that relative position.
While parsing and processing multi-valued fields that are only thousands of values big is very efficient, when you get into the ten thousand, the overhead of doing an additional IO read may be better.
In later articles, I will look advantages and disadvantages of the keeping the OrderDetail as a single file, as well as, grouping OrderDetail as multi-value, but keeping them in a separate file.