Migrating NorthWind Database to a MultiValue Database

The battle between traditional database (RDBMS) and MultiValue databases have raged for years, and will likely continue for the foreseeable future. As NoSQL-style databases are starting to become more accepted by CEOs and CIOs for enterprise-level datastores, the MultiValue model is being looked at very closely again. The old, unfair, stigmas are gone.

RDBMS were never really designed for high volume CRUD (Create, Read, Update, and Delete) business transactions, where the MultiValue Database was designed specifically to handle and store business data. Other NoSQL databases are gaining ground, but they don't have the enterprise level locking, CRUD management, transaction boundaries, and general OLAP transaction ability that MultiValue Systems have implemented over the last 30 years of their existence.

This flexibility and ease which is core to the MultiValue datastore can be confusing to outside developers. Time working in other models creates expectations that don't match the MultiValue experience. They find themselves trying to solve problems which don't exist in our systems. And, without properly understanding how to construct data models effectively, they won't get all the benefits.

To demonstrate the differences, I decided to use Microsoft's sample database, NorthWinds, as an example. Why? Because most developers will have run across it while learning how to interface with an RDBMS. Giving them a familiar point of reference will help them compare and contrast the techniques needed to optimize in each environment.

Good programmers don't just know code, of course. As with any development project, you have to understand the business — the what, why and how — which informs the need of the software. If you don't, your project will fail. For those not familiar with NorthWinds data, it is a fictitious company called NorthWind Traders that creates and processes orders. This collection of data has been around for years and has been replaced with more complex data models like AdventureWorks but it stil stands as a clean and clear example.

NorthWinds Business Model

Figure 1 shows the schema of the NorthWinds database as found in Microsoft SQL server. Let's assume the company creates new orders every day. The number of orders created could be in the tens of thousands per hourly. It all depends on how many users are entering order. With this assumption of high volume, we can further assume the Orders entity is write intensive. Reads, on the other hand, will not be as voluminous as writes. In other words, Orders is, in MV-speak, a transaction file.

Northwind Figure 1

Figure 1

Next, let's consider the entities of Customers, Employees, Products, Shippers, and Suppliers. These entities will be created, updated, or removed at a low frequency. These are, in MV-speak Master Files. Product reads will be particularly high against every Order. An order usually has only one Shipper or Supplier but many Products.

Building the Database in MultiValue

Now that we know the business expectations, let create the basic files and dictionaries that match the SQL Structure above. I'll lay the structure out using standard MultiValue LIST-DICT syntax [Figure 2].

/* Table: orders */
CREATE TABLE orders (
  id              INT NOT NULL,
  employee_id     INT ,
  customer_id     INT ,
  order_date      DATETIME ,
  shipped_date    DATETIME ,
  shipper_id      INT ,
  ship_name       VARCHAR(50) ,
  ship_address    VARCHAR(250) ,
  ship_city       VARCHAR(50) ,
  ship_state_province  VARCHAR(50) ,
  ship_zip_postal_code VARCHAR(50) ,
  ship_country_region  VARCHAR(50) ,
  shipping_fee    DECIMAL(19,4) NULL DEFAULT '0.0000',
  taxes           DECIMAL(19,4) NULL DEFAULT '0.0000',
  payment_type    VARCHAR(50) ,
  paid_date       DATETIME ,
  notes           VARCHAR(250) ,
  tax_rate        DOUBLE NULL DEFAULT '0',
  tax_status_id   TINYINT ,
  status_id       TINYINT NULL DEFAULT '0',
  PRIMARY KEY (id)
);
/* Table: order_details */
CREATE TABLE order_details (
  order_id            INT NOT NULL,
  product_id          INT ,
  quantity            DECIMAL(18,4) NOT NULL DEFAULT '0.0000',
  unit_price          DECIMAL(19,4) NULL DEFAULT '0.0000',
  discount            DOUBLE NOT NULL DEFAULT '0',
  order_detail_status VARCHAR(25),
  date_allocated      DATETIME ,
  PRIMARY KEY (order_id, product_id)
);

Figure 2

File: Order [Figure 3]

Northwind Figure 2

Figure 3

As you can see, this is pretty straightforward. You can do similar conversions with other tables, but there is something important we have to address: OrderDetail. This table isn't a primary table like most of the tables in the Northwinds database.

It is what SQL administrators like to call a subtable, support table, or detail table. Basically, it is a table that can't live on its own, and requires the Order table to help provide additional context to the data found in OrderDetail table.

In SQL, you would have to do a JOIN to get the OrderDetail information along with the Order information [Figure 4].

SELECT Orders.orders.id, Orders.customer_id, Orders.Order_date, OrderDetails.Quantity, OrderDetails.unit_price 
FROM OrderDetials 
WHERE Orders.Id = "1234" 
INNER JOIN Orders ON OrderDetail.OrderID = Orders.ID

Figure 4

Since the OrderDetail information is not likely to be accessed independently from the Orders file, now would be a good time to take advantage of the MultiValue database model to store the OrderDetail data.

There are a few ways to handle this in the MultiValue database:

  1. Create it exactly the same way, we can do anything architecturally that SQL can.
  2. Create a group of MultiValue fields inside the Orders file
  3. Create a Group of MultiValue fields outside the Orders File

There are several design arguments for and against using both ways of doing this. This is where MultiValue developers have options denied to SQL developers. We get to choose.

OrderDetail - MultiValue Fields Inside Orders

One of the benefits of turning the OrderDetail data into a group of MultiValue fields inside the Order File is "Referential Integrity".

Another benefit has to do with data retrieval and update optimization. If the OrderDetail is included with the Order data, then one IO read will pull all the data that is associated with the Order. In turn, one IO write will update all the data associated with the Order.

This differs from the SQL model which would require an IO read/write for every OrderDetail record associated with the Order. No matter how fast your hardware is, one write will be faster than fifty writes on that same hardware.

File: ORDER [Figure 5].

Northwind Figure 3

Figure 5

When you are ready to see the information you would enter a command as seen in Figure 6.

LIST ORDERS "1234" ORDER_ID CUSTOMER_ID ORDER_DATE QTY UNIT_PRICE
Figure 6

The drawback to this approach starts to show up when you have Orders that contain tens of thousands of lines of OrderDetail. While a single MultiValue record can easily handle tens of thousands of MultiValue, it can slow down filtering and selections.

For example, if the user needed to report on all the Orders that contain part ABC, then the following select would look something like Figure 7.

LIST ORDERS BY-EXP PRODUCT_ID = "ABC" ORDER_ID CUSTOMER_ID ORDER_DATE QTY UNIT_PRICE
Figure 7

This looks straight forward, but it would cause the selection to read each record in the Orders file and parse the multi-valued field in memory to find the part number. The selection process then returns the indexed position - the multi-value position - of the part number.

Once the select is complete, the list statement would have to parse each of the associated multi-valued field, QTY and UNIT_PRICE, to return the particular value found at that relative position.

While parsing and processing multi-valued fields that are only thousands of values big is very efficient, when you get into the ten thousand, the overhead of doing an additional IO read may be better.

In later articles, I will look advantages and disadvantages of the keeping the OrderDetail as a single file, as well as, grouping OrderDetail as multi-value, but keeping them in a separate file.

Nathan Rector

Nathan Rector, President of International Spectrum, has been in the MultiValue marketplace as a consultant, author, and presenter since 1992. As a consultant, Nathan specialized in integrating MultiValue applications with other devices and non-MultiValue data, structures, and applications into existing MultiValue databases. During that time, Nathan worked with PDA, Mobile Device, Handheld scanners, POS, and other manufacturing and distribution interfaces.

In 2006, Nathan purchased International Spectrum Magazine and Conference and has been working with the MultiValue Community to expand its reach into current technologies and markets. During this time he has been providing mentorship training to people converting Console Applications (Green Screen/Text Driven) to GUI (Graphical User Interfaces), Mobile, and Web. He has also been working with new developers to the MultiValue Marketplace to train them in how MultiValue works and acts, as well as how it differs from the traditional Relational Database Model (SQL).

View more articles

Featured:

Sep/Oct 2018

menu
menu