How to work on a BI Modernization strategy for your company — Part 2

In Part 1 of this article, I wrote about the goals of a BI Modernization strategy and how grouping your existing BI use cases by data latency requirements can help clarify your BI strategy. The three groupings by data latency are Real-Time, Operational, and Analytical Reporting. In this article, I will cover possible architectures for each of these use models.

Design Pattern

The key design assumptions for these architectures are the following –

1. Companies will have multiple visualization tools

2. A single architecture and set of tools might not be able to satisfy all three data latency use cases.

Real-Time Reporting

The defining characteristic of real-time reporting is that users need the data exactly as it is in the source system at the moment they run the report. The data typically comes from a single source such as Oracle EBS, SAP, or Salesforce; the volume of data required for each report is relatively small (a few tens of thousands of rows at most); and the UI needs are simple, usually a table-like report. Since multiple source systems need real-time reporting, each might have a different way of accessing the data: some provide JDBC/ODBC interfaces where you can run SQL queries, while others expose an API. The other consideration is that multiple visualization tools may be used for real-time reporting. So, given that there are multiple back-end systems and multiple visualization platforms, and there is no need to join one system to another (no need for query federation), a central data modeling tool is ideal. This avoids recreating data models in different data visualization tools and gives you the ability to restrict data volumes and control query performance on the source systems, so the reporting platforms do not impact source system resources. A leading data virtualization tool like Denodo is a good solution for this use case, and I have written about it here in detail. The data virtualization tool provides the semantic layer that the various BI tools connect to, so if a new BI tool is introduced there is no need to redo the data model. A simplified sketch of such a semantic-layer view follows below.
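To make the idea concrete, here is a minimal sketch of what a semantic-layer view in a virtualization tool might look like. This is illustrative only: it uses generic SQL rather than Denodo's actual VQL syntax, and the schema, table, and column names are hypothetical.

-- Hypothetical virtual view exposing open sales orders from an ERP source
-- (the status filter keeps the result set small so the source is not overloaded)
CREATE VIEW rt_open_sales_orders AS
SELECT order_number,
       customer_name,
       order_status,
       ordered_amount,
       last_update_date
FROM erp_source.sales_order_headers
WHERE order_status = 'OPEN';

Because the BI tools all query this one view, the same filtering and column renaming does not have to be rebuilt in each visualization platform.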

Operational Reporting

The data needed for operational reporting is usually spread across multiple systems, and the reporting solution should be able to join this data together. Each of these systems can hold data in a different shape and format, and hence it is important to bring the required data into a central database. If you have many data sources and millions of rows that need to be brought in, then query federation is usually not an option: there will be significant performance differences between the sources, and the user experience will not be optimal. So you will need to decide on an ELT/ETL tool to move the data and a database to host it. Once the data is in the central database, the data modeling exercise begins, where the data is reshaped and joined appropriately based on the reporting requirements. After the data modeling is done, business-friendly labeling can be applied and calculated columns such as currency translations and gross margin can be added (see the sketch below). Once this is complete, the reporting and data visualization activity can begin.
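As an illustration, the modeling step often ends in a reporting view that renames columns and adds derived measures. The sketch below is hypothetical: the staging tables, FX rate table, and column names are made up, and the exact SQL would depend on your warehouse.

-- Hypothetical reporting view: business-friendly labels plus calculated columns
CREATE VIEW rpt_order_margin AS
SELECT o.order_number                      AS "Order Number",
       c.customer_name                     AS "Customer",
       o.revenue * fx.usd_rate             AS "Revenue (USD)",
       (o.revenue - o.cost) * fx.usd_rate  AS "Gross Margin (USD)"
FROM   stg_orders    o
JOIN   stg_customers c  ON c.customer_id    = o.customer_id
JOIN   stg_fx_rates  fx ON fx.currency_code = o.currency_code;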

There are two approaches companies can take –

  1. Choose one tool which does all the tasks: ETL/ELT, database, data modeling, and data visualization. There is a new industry term for these kinds of tools: Unified Data and Analytics Platforms (UDAP). These UDAP tools don't have best-of-breed features in any of the four domains, but the advantage is that a single stack gives you a well-integrated set of capabilities, which should lead to faster time to value. Examples of these tools include Incorta, Birst, Domo, Pentaho, and others. I have had good experience with Incorta in particular, which has worked very well for operational reporting from large, complicated on-premise and cloud data sources and for joining them at scale. I have written about the Incorta architecture and capabilities here.

2. Use best-of-breed software to create the operational reporting stack

A best-of-breed approach can also be taken to create the operational reporting stack. In this case, you will have to choose an ETL/ELT tool with a CDC (change data capture) mechanism which can get near real-time data into the target database quickly and without much transformation along the way. I have talked about how to enable real-time analytics on Snowflake here in detail. The transformation happens after the data is loaded into the target database. An ETL/ELT tool like Matillion, Stitch, Fivetran, or others can be used to move the data into the target database. In today's world the target database should be one of the major cloud platforms like Snowflake, Redshift, or Google BigQuery. If you are using a tool like Stitch or Fivetran, which are primarily data ingestion tools, then using dbt to transform the data will be very useful (a minimal dbt model sketch follows below). Matillion does both ingestion and transformation as well as orchestration.
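For example, a dbt transformation in this stack might be written as an incremental model so that only changed rows are reprocessed on each run. The sketch below is a minimal, hypothetical example: the source named 'erp', its 'orders' table, and the column names are assumptions, not a reference to any specific system.

-- models/stg_orders.sql: hypothetical incremental dbt model
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_status,
    amount,
    last_update_date
from {{ source('erp', 'orders') }}
{% if is_incremental() %}
-- on incremental runs, only pull rows newer than what is already loaded
where last_update_date > (select max(last_update_date) from {{ this }})
{% endif %}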

This stack works best if all the major data sources are in the cloud. If there are large volumes of on-premise data and the data needs to be loaded several times a day, I don't think this will be a very robust data pipeline. The biggest advantage is that you can use your BI tool of choice for operational, analytical, and real-time reporting. You will have to evaluate your data sources, volumes, and frequency of loads to make sure this is the best architecture for your operational reporting.

Analytical Reporting

Analytical reporting has the following characteristics –

1. Analyze data trends over months, quarters or years.

2. Frequency of data loads can be daily, weekly or monthly from various sources. No need for loading several times a day.

3. The data is usually stored for the long term in the form of snapshots.

4. Data is summarized into a more aggregated form.

5. Data can come from multiple sources.

6. Analysts and users prefer to use their favorite BI tools. The end users skew more towards senior management.

The proposed architecture for analytical reporting is not too different from the best-of-breed stack for operational reporting. This architecture should address all six major characteristics of analytical reporting.

The important distinctions between operational reporting and analytical reporting are the reduced frequency of data loads, the use of snapshots and aggregations to store data for the long term, and the use of the best self-service data visualization tool your organization has access to. The modern cloud data warehouses like Snowflake, Redshift, and Google BigQuery are well suited for this purpose: they perform fast, provide cheap storage for snapshots and long data retention, and can be accessed by a wide variety of BI tools for visualization. A sketch of a typical snapshot load is shown below.
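As an illustration, an analytical snapshot is often just a scheduled aggregation that appends one summarized slice of history per period. The sketch below is hypothetical: the table and column names are made up, and date functions such as DATE_TRUNC and interval arithmetic vary slightly between Snowflake, Redshift, and BigQuery.

-- Hypothetical monthly snapshot: append last month's summarized revenue
INSERT INTO analytics.monthly_revenue_snapshot
SELECT DATE_TRUNC('month', order_date)  AS snapshot_month,
       region,
       product_line,
       SUM(revenue)                     AS total_revenue,
       COUNT(DISTINCT order_id)         AS order_count,
       CURRENT_DATE                     AS snapshot_taken_on
FROM   analytics.order_detail
WHERE  order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
AND    order_date <  DATE_TRUNC('month', CURRENT_DATE)
GROUP BY 1, 2, 3;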

Summary

When you look to modernize the data architecture for your company, it might be helpful to classify your current BI landscape into real-time, operational, and analytical reporting buckets. This will give you clarity on how significant each of these use models is for your company. Once that is done, you can design an architecture for each of them, migrate your existing reports and analyses into one of these three architectures, and then continue using these platforms for your future requirements.

I would love feedback on whether thinking about your BI landscape in this way, using data latency requirements, is useful to you and your organization. I look forward to your comments.

How Incorta works and how it can fit into your enterprise BI strategy

Incorta is the new kid on the block in a very competitive enterprise business intelligence tools space. The company started around six years ago, and we have been using it successfully for operational reporting for the last 2.5 years.

This blog was written in October 2019 and represents Incorta features as of their 4.3.x version. Please keep in mind that Incorta builds significant new capabilities into each of their quarterly releases, so the product changes over time. I will try to add new articles to cover significant new capabilities in the platform.

Incorta has taken a very different approach to solving BI and analytics problems than any other vendor in the market. It is essentially four different BI tools integrated into one.

These are the following –

1. Data Extraction Layer

2. Database layer

3. Data Modeling Layer

4. Data Visualization Layer

In addition to the above, Incorta has a very close integration with Spark, which comes with the same installation package. The Spark integration provides two functions for Incorta –

1. Advanced calculations and joins

2. SQL Interface (SQLi) or Datahub for external BI tools to connect with Incorta

How Incorta Works

In a traditional BI environment, you have at least three tools (four if you have a separate semantic layer tool) to do the same set of tasks. You will have an ETL tool like Informatica or SSIS to load data into a database like Oracle or SQL Server, and then you model and visualize the data in a BI tool like MicroStrategy, OBIEE, or Tableau. Keep in mind that these are best-of-breed tools which have been around for a long time and have tons of features.

Incorta does many of these tasks very well, but it cannot match the full feature set of these best-of-breed solutions in each category. As a combined package, however, it delivers a lot of capability for BI developers and users. But first, let's take a look at the various components of the Incorta platform.

Components of Incorta platform

Here is a very brief and simplified description of the various components of the Incorta platform and what is special or unique about them –

1. Data Extraction Layer

The first layer of the Incorta platform is the Data Extraction layer. Here they have built a technology called Direct Data Mapping, where you extract data table by table from the source system, either as a full load or as an incremental load. The data then goes into a parquet file in the Incorta platform and is loaded into memory. Each table can have a full load SQL and an incremental load SQL.

Full Load SQL can look as simple as the following –

SELECT COL_A, COL_B, COL_C FROM TABLE_1 WHERE CREATED_DATE >= '01-01-2016'

Incremental Load SQL can look like the following –

SELECT COL_A, COL_B, COL_C FROM TABLE_1 WHERE LAST_UPDATE_DATE > ?

When the load is kicked off, the table is populated by the full load, and each subsequent incremental run pulls only the new and changed data from the source. Incorta can pull data from almost all traditional databases like Oracle and SQL Server, cloud applications like Salesforce, or files stored on-premise or in cloud storage. The number of source applications supported increases with each release of the Incorta platform.

After data is loaded from source systems into parquet files, materialized views (MVs) can be created using the Incorta UI and can be written in SQL or PySpark. At run time, Incorta sends the request to Spark, which reads the data from the parquet files, computes the results, and writes them to a separate parquet file that is then loaded into memory during the loading stage. The Spark component comes with the Incorta package but does involve some setup for it to work properly. A simplified example of the kind of logic an MV might contain is shown below.
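For illustration, here is a minimal sketch of the kind of derivation an SQL-based MV could perform, in this case computing invoice aging buckets. The schema, table, and column names are hypothetical, and the exact date functions available depend on the Spark SQL version bundled with your Incorta installation.

-- Hypothetical MV logic: derive invoice aging buckets from extracted tables
SELECT i.invoice_id,
       i.customer_id,
       i.invoice_amount,
       DATEDIFF(CURRENT_DATE, i.due_date) AS days_past_due,
       CASE
         WHEN DATEDIFF(CURRENT_DATE, i.due_date) <= 0  THEN 'Current'
         WHEN DATEDIFF(CURRENT_DATE, i.due_date) <= 30 THEN '1-30 Days'
         ELSE 'Over 30 Days'
       END AS aging_bucket
FROM AR.invoices i;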

The advantage of pulling data table by table is that load performance is very fast, as there is no need to perform joins in the source system. The second advantage is that it becomes very easy to add new columns into Incorta: all you need to do is add the column to the extract SQL and run a full load. Hence, adding a new column in Incorta can be done in a few minutes, compared to days or months using traditional ETL methods.

2. Database Layer

Once the data is extracted into parquet files, it is loaded into the in-memory database engine, and the joins defined in the schema are precomputed at the end of the load. This precomputation of the joins is what leads to the revolutionary report performance in Incorta. Even if a report built in Incorta has 30 or 40 joins across large tables containing millions of rows, the data still comes back in a few seconds. This kind of performance is unheard of in a traditional database.

3. Data Modeling Layer

The Incorta data modeling layer comes in two forms: physical schemas and business schemas.

In the physical schema you can create aliases to base tables and materialized views and create joins between various tables and MVs. Business schemas can be created to present a flat, user friendly representation of the physical schema to the user building the reports. Columns can be brought in from one or more tables in different physical schemas and renamed. Formula columns can be added either in physical or business schemas.

4. Visualization Layer

Incorta comes with its own visualization layer. Insights (reports) and dashboards can be created on top of physical or business schemas. Analyzer users can create insights with built-in filters, prompts, and bookmarks that end users can then consume. Multiple types of visualizations, such as charts and pivot tables, as well as drill-downs, are available.

Incorta visualization is good enough for most purposes, but compared to best-of-breed products like Tableau, Power BI, or MicroStrategy it does fall a bit short. The Incorta development team is rapidly adding features in each release, and they should be able to catch up to the other tools in terms of UI capabilities sooner rather than later.

In addition, Incorta provides a SQL Interface through which other BI tools like Tableau or Power BI can connect to the business schemas, physical schemas, or the parquet layer.

Where does Incorta fit in your BI architecture

Incorta comes with several distinct advantages compared to traditional BI tools –

– revolutionary performance of reports

– no tuning or DB maintenance needed

– can hold massive amounts of data (billions of rows of data) in memory

– complex data modeling is possible

– simple to develop and make changes as reshaping of data is not needed

– easy to learn and get started for any developer

– a platform where all the four layers of BI are available

– cross data source joins are very easy to achieve

There are three kinds of reporting in any company: Operational Reporting, Real-Time Reporting, and Analytical Reporting (snapshots, period-over-period analysis). Incorta's sweet spot is operational reporting, where data can be pulled from various transaction systems and joined together to provide lightning-fast reports. You can load data into Incorta several times a day from your source systems and provide the operational insights that users need to make daily decisions to run their business.

In summary, if you are looking for a solution that delivers lightning-quick operational reporting with data joined from multiple complex source systems, then Incorta is one of the best options out there today.

How Incorta enables Operational Analytics without cumbersome ETL or star schema

Link to the demo and to the full webinar

The key benefits we realized with Incorta for operational reporting are –

  • Delivered real-time operational reports against data aggregated from complex data models of Oracle EBS, Oracle Siebel, and Oracle Agile
  • Empowered business users to self-serve even the most complex reports and analytics
  • Redeployed 13 of 15 BI support resources to work on more innovative projects
  • Reduced new report delivery time from months to mere hours
  • Joined data from multiple sources like Oracle EBS and Siebel, which was very difficult to do in the past other than through a slow dblink between two Oracle databases
  • Reduced the run time of complex 40-table reports from 6 hours to seconds