What’s happening to the term “big data”? Gartner has dropped big data from its hype cycle, citing other, more specific terms that have replaced it. Gartner’s official position is:

“Insofar as big data is a megatrend that touches so many aspects of our interactions with computers–from the Internet of Things and content analytics to cloud computing and virtual reality (all hype-full categories in Gartner’s eyes)”1

But the situation is perhaps more serious than Gartner states. The early NoSQL solutions have been removed from the hype cycle entirely: Hadoop sits in the dreaded “Obsolete before plateau” category, as do “Key-Value DBMS” systems. It would seem that large portions of the NoSQL environment are out of favor.

Comparing Key-Value and NoSQL Solutions

Here are the Gartner 2017 and 2018 charts, which show Hadoop and Key-Value DBMSs as obsolete before the plateau and, notably, Document Store DBMSs positioned apart from the Key-Value space:

So the first question is: what is happening with Key-Value DBMSs versus Document Store DBMSs, given that they are technically the same thing? Both databases use key-value pairs, as in this example JSON record:

{name: "Douglas Adams",
street: "782 Southwest St.",
city: "Austin",
state: "TX"}

The only appreciable difference is that document databases encapsulate these records in larger records (see the sketch below). Key-Value DBMSs are built on open source and include Elasticsearch and Splunk. Splunk has just had a very telling year: in 2018 it took in over $1.5 billion in revenue. This was an awesome year for Splunk, except for the 20% loss, which exceeds their available cash. That leaves the question: can these databases be practical, even at scale?
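To make the encapsulation point concrete, here is a minimal sketch of how a document store might nest the same key-value pairs inside a larger document. The wrapper fields (customerId, orders, and so on) are our illustration, not any particular product’s schema:

{
  "customerId": "C-1024",
  "customer": {
    "name": "Douglas Adams",
    "street": "782 Southwest St.",
    "city": "Austin",
    "state": "TX"
  },
  "orders": [
    { "orderId": "A-1001", "total": 42.00 }
  ]
}

A pure key-value store would hold the same data flat, one value per key; the document store’s only real addition is this nesting.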

Open Source’s Less Obvious Costs

The lesson Gartner is presenting here is that do-it-yourself open source is too expensive: it does not lead to productive solutions but instead to extended, open-ended engineering engagements with extensive budgets and unclear goals.

Gartner and Forrester clearly favor the Master Data Management approach of using commercial tools to develop data science projects.2 Notably, the ‘old guard’ of integration and logical models is not achieving business value and relevance:3

An industry moving away from generic models and toward custom-fit solutions is a great place for a consultant to be.

Providing Better Long-Term Value

Consultants provide business clarity in the fog of inflated expenses and confusing technology. As consultants, our expertise becomes the glue for a “high-speed analytical solution”: higher upfront costs but better long-run return on investment. To capitalize here, we need a database that merges the relational performance of SQL with the ease of use of a document database.

Introducing Painted Streams

A solution exists in a new development called Painted Streams from Painted Intelligence, Inc. Painted Streams:

  1. Achieves a better-than-10X import rate into relational database structures with full ACID support.
  2. Maintains the efficiency of full RDBMS systems.
  3. Maintains the ease of use of document databases.
  4. Controls cost of engineering resources through simplicity.
  5. Controls costs of data cluster resources through performance.

Because it requires fewer resources, provides more reliable data availability, and makes costs more predictable, your engineering schedules become more manageable.

Types of High-Speed Analytics

Analytics can be described in both technical and customer-facing terms. The following pairs summarize the two categories.

Technical Term: Descriptive Analytics
Tells a customer what happened, as close to the event as possible, and involves data processing and analysis. Roughly 90% of analytics falls into this category.

Customer Term: Marketing Analytics
Uses Descriptive Analytics (what happened) to identify the highest-value actions you can take in the future (Prescriptive Analytics). Examples are customer responses to ads and click-throughs, A/B testing, and other sales, software download, and customer purchase events.

Technical Term: Predictive Analytics
Tells a customer what is most likely to happen next. It takes Descriptive Analytics, described above, and applies machine learning algorithms to it to provide a probability of future events.

Customer Term: In-App Analytics
Uses a continuous data stream of application activity across a constellation of installed software and hardware to give a company insight into how its software is being used, when NPS (Net Promoter Score) events are occurring, and when the company is gaining or losing customer engagement. It uses Descriptive Analytics, often analyzing records from thousands or millions of users once a minute or more. This is the IoT (Internet of Things) case, where a company has deployed software or hardware to the field.

Technical Term: Prescriptive Analytics
Uses Predictive Analytics, described above, to recommend the action predicted to produce the optimal outcome.

Customer Term: Business Analytics (sometimes called Management Dashboards)
Gives a business manager actionable insight into the functioning of a business. It uses Descriptive Analytics to tell a business manager what is happening in the business (ideally in near real time) and may contain elements of Marketing and other analytics.

Examples of Business Analytics include a sales organization tracking sales across an enterprise in near real time; supply chains for perishable goods (restaurant chains), which often need to know where their goods are, what condition they are in, and when they will arrive; and complex manufacturing processes, where events on the factory floor need to be tracked and responded to in order to keep production facilities running smoothly.

High-Speed Analysis Tools

While there is an opportunity for the consultant to specialize in high-speed analysis, there is also a problem. Integration models and logical models are not very effective, per Gartner and Forrester (see the graphic above), and open-source projects like Elasticsearch and Hadoop are largely considered too generic to be useful. Meanwhile, high-speed analytic platforms like Reltio and Novetta sit in a walled garden that consultants do not have access to.

Making the Internet the Database with Painted Streams

Painted Streams is a new development from Painted Intelligence that uses a new concept in database management to perform analytics faster and more efficiently than anything else on the market. By taking the approach that the Internet is the database, Painted Streams can take this data and analyze it 10 to 500 times faster than existing systems.

Analytic systems all take a massive number of events and siphon them into a database, creating a bottleneck. This bottleneck can be addressed in a variety of ways.

A Painted Stream Example

For example, Oracle demonstrated a $30,000,000 server able to handle 30 million transactions a minute. Earlier attempts were based on Hadoop or Elasticsearch systems, which have since fallen off the Gartner hype curve.

Painted Streams offers a new system demonstrating 18 million transactions per minute on a laptop: a price-performance improvement of about 10,000 to 1. This is coupled with ease of setup and use and with relational efficiency, giving the astute businessperson a quickly built solution and near-real-time access to data.
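The rough arithmetic behind that ratio, assuming a laptop in the $2,000 range (the laptop price is our assumption for illustration, not a published figure):

  Oracle server: $30,000,000 / 30,000,000 transactions per minute = $1.00 per transaction-per-minute
  Laptop: $2,000 / 18,000,000 transactions per minute ≈ $0.00011 per transaction-per-minute
  Ratio: $1.00 / $0.00011 ≈ 9,000, or roughly 10,000 to 1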

Painted Streams uses a revolutionary technique in which the stream is the authority. A traditional RDBMS treats the disk as the authority: locking the disk before a write, unlocking it after the write, and confirming that the write was successful. By moving reads, writes, and locks to the stream, and by changing the metrics on how transactions are actually conducted, Painted Streams achieves spectacular performance improvements.
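A minimal sketch of the stream-as-authority idea, in Python (our illustration, not Painted Streams’ actual implementation): a write is acknowledged as soon as it is appended to the stream, and a separate writer replays the stream into the table, so the table can always be rebuilt from the stream:

import json
import queue
import threading

# The stream is the authority: an append-only log is the system of record.
stream_log = []               # ordered, append-only event log
apply_queue = queue.Queue()   # hands events to the table writer

def write(event):
    # Acknowledge once the event is appended to the stream,
    # not once the disk-backed table has been locked and updated.
    stream_log.append(json.dumps(event))
    apply_queue.put(event)
    return "ack"              # caller continues; the table catches up asynchronously

def table_writer(table):
    # Replays stream events into the table. If the table is lost,
    # it can be rebuilt by replaying stream_log from the beginning.
    while True:
        event = apply_queue.get()
        if event is None:     # sentinel: stop the writer
            break
        table[event["asset"]] = table.get(event["asset"], 0) + event["amount"]

holdings = {}
writer = threading.Thread(target=table_writer, args=(holdings,))
writer.start()
write({"asset": "AAPL", "amount": 100})
write({"asset": "AAPL", "amount": -40})
apply_queue.put(None)
writer.join()
print(holdings)               # {'AAPL': 60}

The design point is that the append to stream_log is the commit; the table is a downstream view that can lag, fail, and recover without losing writes.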

Streams also offer efficient distribution, replication, and recovery: point a stream at multiple servers, repeat the efficient processing, and replication and distribution are achieved. Painted Streams techniques challenge the CAP Theorem (Consistency, Availability, and Partition Tolerance). From the perspective of the system of record, which is the stream, all three can happen at once. The state of a remote system is guaranteed: it can be recovered from the stream even in the worst conditions, so its consistency, availability, and partition tolerance are no longer in question. Maintain the stream until the server is back, and replicate it on demand. Painted Streams makes the stream the authority, not the disk.

To create a Painted Stream, you must create the:

  1. Database structure (RDBMS currently supported)
  2. Stream’s record types (a sample record is sketched after this list)
  3. Queries you want to process out of the system (defined at the end of the JSON file in Appendix A, below)
  4. Configuration file that links the records to the database. A configuration file that processes 18 million records a minute into 5 database tables requires about 180 lines of JSON. The JSON defines the source, destination, and associations of each data element. For an example, see Appendix A: JSON File with Stream & Query Definitions, below.
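For illustration, the Appendix A configuration parses records of type “Trade” carrying traderName, brokerName, asset, amount, unitCost, and totalCost fields. A stream record of that type might look like the following (the values and the exact record envelope are our assumptions, not a published record format):

{
  "recordType": "Trade",
  "traderName": "Douglas Adams",
  "brokerName": "Acme Brokerage",
  "asset": "AAPL",
  "amount": 100,
  "unitCost": 4.20,
  "totalCost": 420.00
}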

Further, cloud deployment allows each stream and query to run as an independent lambda service for scalability. Queries that run against a combination of stored data and stream data, with instant availability, are also under development. Aspects of the system will continue to develop as customer needs present themselves and as the product advances toward more complex scenarios.

1 https://www.datanami.com/2015/08/26/why-gartner-dropped-big-data-off-the-hype-curve/

2 https://reprints.forrester.com/#/assets/2/87/RES119980/reports

3 https://reprints.forrester.com/#/assets/2/87/RES119980/reports (Figure 3)

Appendix A: JSON File with Stream & Query Definitions

The following JSON example defines the source, destination, and associations of each data element, as described under A Painted Stream Example, above.

{
  "StreamParser": {
    "RecordCount": "1",
    "Streams": [
      {
        "tableStream": "Trader",
        "recordType": "Trade",
        "pkMethod": "insert",
        "Keys": [
          {
            "streamName": "traderName",
            "dbTable": "Trader",
            "dbColumn": "name",
            "dbPKey": "idTrader"
          }
        ],
        "DestinationTable": {
          "dbTable": "Trader",
          "dbPKey": "idTrader",
          "onFail": "confirmDependency",
          "DestinationValues": [
            {
              "streamName": "traderName",
              "lookup": "Query:getTraderId",
              "dbColumn": "idTrader",
              "action": "primaryKey",
              "type": "int"
            },
            {
              "streamName": "totalCost",
              "dbColumn": "cash",
              "action": "sum",
              "type": "USD",
              "initValue": "1000.00"
            },
            {
              "streamName": "brokerName",
              "lookup": "Table:Brokerage",
              "dbColumn": "idBrokerage",
              "action": "primaryKey",
              "type": "int"
            }
          ]
        }
      },
      {
        "tableStream": "Brokerage",
        "recordType": "Trade",
        "pkMethod": "insert",
        "Keys": [
          {
            "streamName": "brokerName",
            "dbTable": "Brokerage",
            "dbColumn": "name",
            "dbPKey": "idBrokerage"
          }
        ],
        "DestinationTable": {
          "dbTable": "Brokerage",
          "dbPKey": "idBrokerage",
          "DestinationValues": [
            {
              "streamName": "brokerName",
              "lookup": "Query:getBrokerageId",
              "dbColumn": "idBrokerage",
              "action": "primaryKey",
              "type": "int"
            },
            {
              "streamName": "totalCost",
              "dbColumn": "cash",
              "action": "sum",
              "type": "USD",
              "initValue": "10000.00"
            }
          ]
        }
      },
      {
        "tableStream": "asset",
        "recordType": "Trade",
        "pkMethod": "lookup",
        "Keys": [
          {
            "streamName": "asset",
            "dbTable": "Asset",
            "dbColumn": "name",
            "dbPKey": "idAsset"
          }
        ],
        "DestinationTable": {
          "dbTable": "Asset",
          "dbPKey": "idAsset",
          "DestinationValues": [
            {
              "streamName": "asset",
              "lookup": "Query:getAssetId",
              "dbColumn": "idAsset",
              "action": "primaryKey",
              "type": "int"
            },
            {
              "streamName": "amount",
              "dbColumn": "sold",
              "action": "sum",
              "type": "int",
              "initValue": "0"
            },
            {
              "streamName": "unitCost",
              "dbColumn": "value",
              "action": "replace",
              "type": "USD",
              "initValue": "100"
            }
          ]
        }
      },
      {
        "tableStream": "traderAsset",
        "recordType": "Trade",
        "Keys": [
          {
            "streamName": "traderName",
            "dbTable": "Trader",
            "dbColumn": "name",
            "dbPKey": "idTrader"
          },
          {
            "streamName": "asset",
            "dbTable": "Asset",
            "dbColumn": "name",
            "dbPKey": "idAsset"
          }
        ],
        "DestinationTable": {
          "dbTable": "TraderAssets",
          "dbPKey": "idTraderAssets",
          "DestinationValues": [
            {
              "streamName": "amount",
              "dbColumn": "amount",
              "action": "sum",
              "initValue": "0",
              "type": "int"
            }
          ]
        }
      },
      {
        "tableStream": "brokerageAsset",
        "recordType": "Trade",
        "Keys": [
          {
            "streamName": "brokerName",
            "dbTable": "Brokerage",
            "dbColumn": "name",
            "dbPKey": "idBrokerage"
          },
          {
            "streamName": "asset",
            "dbTable": "Asset",
            "dbColumn": "name",
            "dbPKey": "idAsset"
          }
        ],
        "DestinationTable": {
          "dbTable": "BrokerageAssets",
          "dbPKey": "idBrokerageAssets",
          "DestinationValues": [
            {
              "streamName": "amount",
              "dbColumn": "amount",
              "action": "sum",
              "initValue": "0",
              "type": "int"
            }
          ]
        }
      }
    ],
    "Queries": [
      {
        "queryName": "getAssetId",
        "resultType": "Item",
        "criteria": [
          {
            "streamName": "asset",
            "dbTable": "Asset",
            "dbColumn": "name"
          }
        ],
        "selection": [
          {
            "dbTable": "Asset",
            "dbColumn": "idAsset",
            "type": "int"
          }
        ]
      },
      {
        "queryName": "getBrokerageId",
        "resultType": "Item",
        "criteria": [
          {
            "streamName": "brokerName",
            "dbTable": "Brokerage",
            "dbColumn": "name"
          }
        ],
        "selection": [
          {
            "dbTable": "Brokerage",
            "dbColumn": "idBrokerage",
            "type": "int"
          }
        ]
      },
      {
        "queryName": "getTraderId",
        "resultType": "Item",
        "criteria": [
          {
            "streamName": "traderName",
            "dbTable": "Trader",
            "dbColumn": "name"
          }
        ],
        "selection": [
          {
            "dbTable": "Trader",
            "dbColumn": "idTrader",
            "type": "int"
          }
        ]
      }
    ]
  }
}