Experts - Data Mining
Modern businesses rarely have just one database, one programming language, or one reporting tool. We've started a new column called Outside Experts to help us explore non-MultiValue technologies to see what lessons we can learn. To kick the new column off, we decided to look at data mining, specifically how Nod3x does data mining.
Nod3x: What Does It Do?
Let's look at some of the key features of this product. The interface (figure 1, figure 2) might be bewildering to the new user, but everything has its purpose. The vast array of dots represents every Google+ post which matches the profile #foodanddrink, made since March 28th. We could do the same thing with Twitter, Facebook, or any combination of the three.
If you were running a campaign online and used a hashtag for your marketing, like #is2015, for the International Spectrum 2015 conference, you would see how active the tag was, i.e. how many posts and tweets used that marker.
That's the big picture; we can get into the detail by focusing on any dot. We've highlighted one in figure 1. Hovering over it will tell you the identity of the person who posted it. The size and color of the dot tell you, once you are trained to see it, which are originals, and which are shares and retweets.
Finally, on the right there's a settings menu. Two tabs are particularly useful: Save, which is open on figure 1, and People, which allows you to search for posts and tweets by a specific person. Save lets us pull the data as a CSV, for analysis in other tools. People lets you determine which posts came from your staff vs. the posts representing customers or other interested parties.
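To make the "analysis in other tools" step concrete, here is a minimal Python sketch of splitting an exported CSV into staff posts versus everyone else, and originals versus shares. The column names (author, type, text), the sample rows, and the staff list are all assumptions made for illustration; the actual layout of a Nod3x export is not described here and may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical export: these column names and rows are illustrative only.
SAMPLE_CSV = """author,type,text
alice,original,Join us at #is2015
bob,share,Join us at #is2015
carol,original,Looking forward to #is2015
"""

# Hypothetical list of accounts that belong to your own staff.
STAFF = {"alice"}

def summarize(csv_file, staff):
    """Count posts by source (staff vs. other) and by type (original vs. share)."""
    by_source = Counter()
    by_type = Counter()
    for row in csv.DictReader(csv_file):
        by_source["staff" if row["author"] in staff else "other"] += 1
        by_type[row["type"]] += 1
    return by_source, by_type

by_source, by_type = summarize(io.StringIO(SAMPLE_CSV), STAFF)
print(dict(by_source))
print(dict(by_type))
```

With a real export, you would pass an opened file instead of the in-memory sample and adjust the column names to match.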
Figure 2 is the dashboard view. We get top influencers, the people who promoted that hashtag the most. We also get a word cloud of related tags, where the words are bigger or smaller based on how often they were linked to the target hashtag(s).
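The two dashboard ideas above, top influencers and related tags, come down to counting. The following Python sketch shows one way that counting could work, assuming each post is a simple record with an author and a text field. The record layout, function names, and sample posts are hypothetical, and this is not a description of how Nod3x itself computes its dashboard.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def tags_in(text):
    """Return the set of hashtags in a post, lowercased and without the '#'."""
    return {tag.lower() for tag in HASHTAG.findall(text)}

def related_tags(posts, target):
    """Count hashtags that appear in the same post as the target hashtag."""
    target = target.lower().lstrip("#")
    counts = Counter()
    for post in posts:
        tags = tags_in(post["text"])
        if target in tags:
            counts.update(tags - {target})
    return counts

def top_influencers(posts, target, n=3):
    """Rank authors by how many of their posts used the target hashtag."""
    target = target.lower().lstrip("#")
    authors = Counter(
        post["author"] for post in posts if target in tags_in(post["text"])
    )
    return authors.most_common(n)

# Hypothetical sample data.
posts = [
    {"author": "alice", "text": "Great session at #is2015 #multivalue"},
    {"author": "bob", "text": "Heading to #IS2015 #MultiValue #bigdata"},
    {"author": "alice", "text": "Slides are up #is2015"},
    {"author": "carol", "text": "Lunch time #foodanddrink"},
]

print(related_tags(posts, "#is2015"))
print(top_influencers(posts, "#is2015"))
```

The counts returned by related_tags are the kind of weights a word cloud would use to size each word.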
Why Do We Care?
Just like every other BI dashboard, Nod3x attempts to take massive amounts of data and make it quickly scannable by humans. Just the fact that this is one more tool available to us makes it worth hearing about. However, there's more. Unlike most dashboards, Nod3x is all about unstructured data. As MultiValue technologists, we are constantly working with data deemed 'too messy' for SQL. This makes the ideas behind Nod3x more interesting to us. Lee Smallwood and his team are wrestling with the same sort of challenges as we are: how to abstract data and patterns from free-form information.
A good example of this is the Sentiment Analysis tool, which is also part of Nod3x. It attempts to algorithmically quantify posts and tweets into three categories: Positive, Negative, and Neutral.
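As a rough illustration of what sorting text into those three categories involves, here is a minimal word-list scoring sketch in Python. It is not the Nod3x algorithm, which is not described here; the word lists and the scoring rule are assumptions chosen only to show the idea, and real sentiment tools are considerably more sophisticated.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "hate", "terrible", "unhappy", "broken"}

def classify(text):
    """Label text Positive, Negative, or Neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

for post in [
    "I love the new release, great work",
    "The update is broken and support is terrible",
    "Conference starts Monday",
]:
    print(classify(post), "-", post)
```

A simple count like this misses negation ("not good") and sarcasm, which is part of why quantifying free-form text is a hard problem.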
We did a video interview with Lee Smallwood from Nod3x. An excerpt appears below.
Lee: Where Google collects website posts, we actually bring together what people are talking about from Google+, Facebook, Twitter, and others.
IS: You reduce the data into something more visible. Can you talk a bit about that?
Lee: One of the main challenges with social media data is that it is vast. It's volume heavy. It's unstructured. We look at... try to bring in similarities between social media networks: a link, hopefully people are getting the hang of images... a slide show...
IS: You mentioned the graphic element. How important is it?
Lee: Being visual people, we react more quickly to an image than we do to a body of text. As content creators, we have to buy a person's time. That first impact of that image is the key.
The full interview, including Lee's reaction to the ideas behind MultiValue, can be found here: (intl-spectrum/s1067)
While we interviewed Lee as an expert from outside on data mining, and specifically on social media data mining, we also sent out a survey to our MultiValue software providers and VARs soliciting their input.
Question: What tools do you have or use for data mining MultiValue data?
Pick Programmer's Shop: mv.SSRS when SQL was a requirement. Informer seems to be our go-to if the client has a strictly MultiValue skillset. Easy to install, and easier to use. However, there is nothing like exposing data to show the client how "bad" their data is (file structures, missing key attributes, mixed use of a single data field)!
Kore Technologies: Most companies use multiple applications within their organization in addition to an ERP system such as: customer relationship management (CRM) systems, eCommerce websites, and other third-party solutions, which in turn generate high volumes of data. To simplify analysis and data mining, this information needs to be consolidated and aggregated into one central location: an enterprise data warehouse.
Products like Kore Technologies' Kourier Integrator Release 4.1.5 can be used to connect and extract data from multiple data sources to build an enterprise data warehouse. Kourier consolidates information from UniData, UniVerse and other non-U2 sources (e.g., Oracle, MySQL, Microsoft SQL Server and Microsoft Access) into a Microsoft SQL Server data warehouse in near-real time, enabling more timely analysis and decision making.
Business intelligence and analytics products can then be used to mine and analyze their data. Kore is agnostic to the actual reporting solution, but often recommends its partners' products: CorVu NG from Rocket Software and SSRS (SQL Server Reporting Services) from Microsoft. Both tools are used in the marketplace to make data-driven decisions on a day-to-day basis.
International Spectrum: Depending on the amount of data that is being processed, we use a combination of SELECTS and REFORMATS along with Excel and Google Prediction APIs to generate reporting solutions.
We have done proofs of concept on a few other tools, including the 'R' programming environment. Many times, MultiValue BASIC subroutines are used with the SELECTS and REFORMATS to generate the raw dataset, which is then passed into other reports and analytic engines to get additional results.
Question: How are your customers data mining social media for use in their enterprise?
Kore Technologies: Unfortunately, the adoption of social media in the MultiValue industry is lagging and we are not yet seeing much use of social media as a primary means of research, education, information sharing and analysis. We are also not seeing much social interaction, which is its real power. We believe increased social media education or training is needed to inform our audience about how to unleash the true power and value that social media offers. Social media is still relatively new; however, it's evolving at a rapid rate and we, as a community, shouldn't be left behind.
On the other hand, some of our clients are active on social media and sometimes interact with our social profiles (Twitter, LinkedIn and Facebook). Their data mining strategies are unknown, but we do know they use it for customer service, researching company information, learning about product updates, etc. Through other research and communications we've found that in some cases the success (or lack thereof) of a company's social media strategy is unknown to the executives. Many times it's handled by the marketing or public relations person, which leaves everyone else by the wayside. It's interesting because these same people are aware of current marketing and branding initiatives but are left in social darkness.
It's important to note that you should understand your audience before you create a social strategy of how and when to use social media. There are numerous tools, blogs and platforms to use as references and most are free! All in all, it's time for the MultiValue community to overcome the hurdle to embrace the influence and clarity that social media offers.
International Spectrum: The developers and users that we have talked to use social media data mining to predict mood and find customers that are unhappy. The social media data also helps the enterprise predict what sales may be for specific products based on how much people are talking about them.
Question: What is your future roadmap for data mining and big data analytics?
Pick Programmer's Shop: Continued exploration of tools available, including those pushing the limits of analytics (InterSystems Caché with their "iKnow" technology). I'd love to move us toward the difficult data repositories (email as an example) and be able to make analytical/reporting sense of that data.
Kore Technologies: As an industry leader in data management and integration, Kore Technologies is committed to enhancing Kourier Integrator to support the evolving needs for advanced business intelligence, information gathering and data mining requirements that will be needed by the industry. Kourier will continue to extend its architecture to support additional data sources, while continuing to improve the product's ease of use and performance.
International Spectrum: As more and more open source tools and programs are provided for the analysis of large data sets, we will provide articles and code samples that will enable MultiValue Databases and LOB/ERP applications to take advantage of the information.
The International Spectrum 2015 conference will also have topics on Big Data and analytics using MultiValue Databases. We will continue to provide MultiValue companies options and solutions using these technologies.