Understanding Semantic Web - Experts Q&A

Just as XML gave us a common framework for exchanging data, Semantic Web seeks to create a common way of seeing information so that disparate data can be exchanged without custom interpretation being required for every pair of systems. In other words, Semantic Web is about semantics — the consistent usage of words — more than it is about data, per se.

XML is, as those of us who work with it know, still evolving. Semantic Web, likewise, is not a finished, packaged entity. We are still in the early stages of this technology. Despite that, it is already being deployed to great advantage.

The original definition comes from Tim Berners-Lee: "I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers. A 'Semantic Web', which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy, and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize." [Source: Wikipedia]

For a better understanding of this emerging technology, we spoke with David Amerland. David is a Fortune 50 consultant on Semantic Search and has authored several books, including Google Semantic Search <intl-spectrum.com/s/19z18>.

Outside Experts Interview

IS: Semantic Web is just about search? Or is there more to it?

David: There's another layer to the Semantic Web. It's happening primarily behind corporate doors. Because they are beginning to see the value of understanding better what's happening within [the data]. When the great credit crunch happened, we realized a couple of things. Despite all the safeguards, no one knew who owned whom or what or where. Since then, the finance industry has launched an initiative, which is quite broad, in which it applies new abstraction data formatting to the information of company entities; which allows us to identify other relational connections amongst them. So, three years from now, if we have another credit crunch, we would actually know exactly what company X owns in terms of other companies, and who owns whom, and how the connections are actually being made. Which is a big positive thing. And this is quite opaque. It's happening behind finance corporate silos.

IS: Let me stop you there for a moment, because while I understand the idea of integrating data to find those kinds of connections, I'm not sure what makes Semantic Web a specific technique for doing that. Could you talk about what makes that a particular approach or methodology? What defines it as Semantic Web?

David: Well, we have always had the ability to interrogate a database. What we didn't have was the ability to interrogate databases across the entire industry sector, or even the entire world. And this is exactly the direction that Semantic Web is walking towards. And I say walking, because it is not very fast, and it is an incredibly painful process. But, it's getting to the stage where we're getting to that direction, where the moment it happens we see a couple of things. First of all, the traditional opaqueness which happens behind - which is created by compartmentalization of silos - which keeps information locked up in certain silos, will go. Information travels a lot more freely. And it actually makes sense as it gets out of where it is and goes somewhere else. And the other thing which we will see is that the moment we get that, we actually begin to see a meta-layer of connections. If we see, for instance, a piece of information which is held in place A, which then has an impact on another piece of information which appears to be separate on point B, suddenly that connection begins to mean something which we can infer from that. So, by joining these dots, by joining these data points, we begin to see how the world works in a much more granular way; which allows us to then operate in a different manner.

IS: So then the thing that makes Semantic Web interesting and important is its ability to take unstructured data and make clean assumptions about it? So that it can link two disparate pieces of data? Is that a correct summary?

The full interview can be found here <intl-spectrum.com/s/19z1e>.

Inside Experts

While we interviewed David as an expert from outside the MultiValue Industry, we also sent out a survey to our MultiValue software providers and VARs, soliciting their input.

Question: Semantic Web starts by developing rules for common definitions. What 'odd duck' cases have you encountered? For example: one system counts dollars as all monies received, but another system counts taxes separately, so that the word "dollar" actually has a different meaning on each system.

HDWP: We had a system where they thought they were storing all times as US Eastern. It turned out that they were storing time based on the location of the data entry. So, if New York put in a Dallas ticket, it was US Eastern, but if Dallas did their own data entry, it was showing as one hour earlier. The more offices they added, the worse the mess got. We finally had to split the entries by office and cross them by data entry person. Where office and entry person agreed, we auto-fixed it to UTC. Where they contradicted, we checked the records and fixed them by hand.
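
The cleanup rule HDWP describes can be sketched roughly as follows. This is an illustration only, not their actual code: the office codes, the OFFICE_ZONES table, and the function name normalize_to_utc are all assumptions made for the example. Where the ticket's office and the data entry person's office share a time zone, the stored local time is converted to UTC; where they disagree, the record is left for manual review.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical office codes and their time zones (not from the interview).
OFFICE_ZONES = {
    "NYC": "America/New_York",
    "DAL": "America/Chicago",
}


def normalize_to_utc(local_time, ticket_office, entry_office):
    """Convert a naive local timestamp to UTC when it is safe to do so.

    Returns an aware UTC datetime if the ticket's office and the office of
    the person who keyed it share a time zone. Returns None if they
    contradict each other, which signals that a person must check the record.
    """
    ticket_zone = OFFICE_ZONES[ticket_office]
    entry_zone = OFFICE_ZONES[entry_office]
    if ticket_zone != entry_zone:
        return None  # ambiguous: fix by hand
    aware = local_time.replace(tzinfo=ZoneInfo(entry_zone))
    return aware.astimezone(timezone.utc)


# A Dallas ticket keyed in Dallas can be fixed automatically.
fixed = normalize_to_utc(datetime(2014, 9, 15, 9, 30), "DAL", "DAL")
# A Dallas ticket keyed in New York is flagged for review.
flagged = normalize_to_utc(datetime(2014, 9, 15, 9, 30), "DAL", "NYC")
```

Returning None rather than guessing keeps the automatic pass from silently making the data worse, which matches the split between auto-fixed and hand-checked records described above.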

Precision Solutions: You bring up a good point about the common definitions. Heck, even getting people to agree on the standard abbreviations for units of measure, currency, or country codes can be a challenge - and these are established ISO standards! We deal with this in EDI all the time; our customers call a case of materials a "CS" and the trading partner calls it a "CT" or even worse "A0". So it's more fundamental even than "dollar", which is of course one of many currencies with different decimal precision and valuation rules. (See Euro vs. GBP for a case study in madness.)

Question: Semantic Web builds on these 'a dollar means a dollar' algorithms by adding an intelligent search layer. What sort of tricks have you used to enhance search on systems you work with or develop?

HDWP: We built a pre-processor to create a soundex-variant of the descriptive data. Then, when someone typed a query, we ran their string through the same soundex-variant before we did the compare. Someone looking for the words "bread crumbs" would have their search string converted to "BRBSRMS" (see chart below), which would match the pre-processed search strings.

b → B
r → R
e → <null>
a → <null>
d → B
c → S
r → R
u → <null>
mb → M
s → S
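
A minimal sketch of how such an encoder might look is given here. Only the substitutions shown in the chart above come from HDWP; the function name encode, the handling of letters the chart does not mention (passed through in upper case), and the dropping of spaces and punctuation are assumptions made so the example is complete.

```python
# Substitutions taken from the chart. Vowels map to the empty string.
RULES = {
    "mb": "M",
    "b": "B",
    "r": "R",
    "d": "B",
    "c": "S",
    "s": "S",
    "e": "",
    "a": "",
    "u": "",
}


def encode(text):
    """Encode text with the chart's rules, trying two-letter rules first."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RULES:
            out.append(RULES[pair])
            i += 2
            continue
        ch = text[i]
        if ch in RULES:
            out.append(RULES[ch])
        elif ch.isalpha():
            # Assumption: letters not covered by the chart pass through.
            out.append(ch.upper())
        # Spaces and punctuation are dropped (also an assumption).
        i += 1
    return "".join(out)


# Pre-process the stored descriptions once, then encode each query the same way.
descriptions = ["Bread Crumbs", "Corn Meal"]
index = {encode(d): d for d in descriptions}
match = index.get(encode("bread crumbs"))
```

Because both the stored text and the query pass through the same function, the comparison itself stays a plain key lookup.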

Precision Solutions: Assuming there are established and accepted standards for literally everything, searching for a "dollar" is a remarkably American search, just as regional as searching for "florin". Searching for a Euro is broader, and I suppose at some level if we classified dollar, euro, and florin as currencies, a system could automatically look up the currency exchange to present a "value" in a local currency or a variety of currencies. This would cause the user to see the current - virtual - value of their currency, not the stored value.
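
The idea of presenting a current, virtual value instead of the stored one might be sketched like this. Everything here is hypothetical: the function name present_value, the rate table, and the rates themselves are made-up placeholders, and a real system would look rates up from a live source. The sketch also assumes two decimal places for every currency, which the earlier answer about decimal precision suggests is not always true.

```python
from decimal import Decimal

# Placeholder rates: units of a common base per one unit of each currency.
# These numbers are invented for the example and are not real quotes.
EXAMPLE_RATES = {
    "USD": Decimal("1.00"),
    "EUR": Decimal("1.25"),
    "GBP": Decimal("1.60"),
}


def present_value(amount, stored_currency, local_currency, rates):
    """Convert a stored amount into the viewer's local currency."""
    in_base = amount * rates[stored_currency]
    converted = in_base / rates[local_currency]
    # Assumption: round to two decimal places for display.
    return converted.quantize(Decimal("0.01"))


# A value stored as 100 EUR, shown to a user whose local currency is USD.
shown = present_value(Decimal("100"), "EUR", "USD", EXAMPLE_RATES)
```

The stored record is never changed; only the displayed figure depends on the rate in effect when the user looks.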

Question: Semantic Web is envisioned as an underlayer for Expert Systems or Decision Support Systems. What sort of work have you done with Expert Systems or Decision Support Systems?

HDWP: We had a client who had an array of different sized boxes available for shipping. In their rush to get orders out, the clerks would often grab the nearest boxes instead of looking for the most efficient fit. We wrote an algorithm to determine the optimum fit for each order. We printed the packing slip and the box labels based on it. Taking the guesswork out of the packing made the job faster, so the clerks used the pre-generated rules. The end result was fewer boxes going out half empty and orders going out with the smallest number of separate boxes. This made packing up the truck more efficient. It also made delivery faster because the driver had to locate fewer boxes at each delivery stop.
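
HDWP does not say how their algorithm worked, so the following is only one plausible approach, not theirs: a greedy first-fit-decreasing heuristic that packs by volume alone. It assumes every item and box can be described by a single whole-number volume and ignores shape and weight. The function name pack_order and the sample figures are invented for the example. A heuristic like this gives a good fit quickly but does not guarantee the true optimum.

```python
def pack_order(item_volumes, box_sizes):
    """Assign items to boxes, then shrink each box to the smallest that fits.

    Returns a list of (box_size, [item volumes]) pairs.
    """
    sizes = sorted(box_sizes)
    largest = sizes[-1]
    boxes = []  # each entry is the list of item volumes in one box

    # Place big items first; small ones fill the gaps left behind.
    for volume in sorted(item_volumes, reverse=True):
        if volume > largest:
            raise ValueError(f"item of volume {volume} fits no available box")
        for contents in boxes:
            if sum(contents) + volume <= largest:
                contents.append(volume)
                break
        else:
            boxes.append([volume])  # nothing had room, so open a new box

    # Downsize: pick the smallest box size that holds each group of items.
    return [
        (next(size for size in sizes if size >= sum(contents)), contents)
        for contents in boxes
    ]


# Three box sizes on the shelf, one order of five items.
plan = pack_order([6, 5, 4, 3, 2], [8, 12, 20])
for box_size, contents in plan:
    print(f"box {box_size}: items {contents}")
```

The returned plan is the kind of data a packing slip and box labels could be printed from, so the clerk never has to choose a box.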

Precision Solutions: The problem with AI, however, and the problem I see with anything resembling AI, is that we human types tend to think in terms of language when intelligence is iconic. Take for instance the word "hand". Is it a noun? Is it a verb? One needs to know what it is to know how it relates to its surroundings. Even as a noun there are multiple independent concepts that the word might describe. Each of these concepts is iconic unto itself, and how that icon relates to its surrounding icons is itself iconic. However, when we think in terms of language, words like "has", "is", "near", and "far" are too ambiguous to define atomically and therefore establishing any level of definiteness is problematic. Without a concrete foundation, the model collapses under the ambiguity.

Simply providing machine readable information and standards is one heck of a good start. But it does little to establish context and iconic relations, which are essential for comprehension. Without comprehension, there is no understanding, and without understanding there is no intelligence, machine or otherwise. Maybe I'm just a pessimist, but from all I've read, it seems that people have approached this problem all wrong for a good long while now.


Sep/Oct 2014