How Incorta works and how it can fit into your enterprise BI strategy

Incorta is the new kid on the block in a very competitive enterprise business intelligence space. The company was founded about six years ago, and we have been using the product successfully for operational reporting for the last 2.5 years.

This blog was written in October 2019 and reflects Incorta features as of version 4.3.x. Please keep in mind that Incorta adds significant new capabilities in each of its quarterly releases, so the product changes over time. I will try to add new articles covering major new capabilities in the platform.

Incorta has taken a very different approach to solving BI and analytics problems than any other vendor in the market. It is essentially four different BI tools integrated into one.

These four layers are –

1. Data Extraction Layer

2. Database layer

3. Data Modeling Layer

4. Data Visualization Layer

In addition to the above, Incorta integrates very closely with Spark, which ships in the same installation package. The Spark integration provides two functions for Incorta –

1. Advanced calculations and joins

2. SQL Interface (SQLi) or Datahub for external BI tools to connect with Incorta

How Incorta Works

In a traditional BI environment, you need at least three tools (four if you have a separate semantic layer tool) to do the same set of tasks. You will have an ETL tool like Informatica or SSIS to load data into a database like Oracle or SQL Server, and then model and visualize the data in a BI tool like MicroStrategy, OBIEE or Tableau. Keep in mind that these are best-of-breed tools which have been around for a long time and have tons of features.

Incorta does a lot of these tasks very well, but it cannot match the full feature set of these best-of-breed solutions in each category. As a combined package, however, it delivers a lot of capability for BI developers and users. But first, let’s take a look at the various components of the Incorta platform.

Components of Incorta platform

Here is a very brief and simplified description of the various components of the Incorta platform and what is special or unique about them –

1. Data Extraction Layer

The first layer of the Incorta platform is the Data Extraction layer. Here Incorta has built a technology called Direct Data Mapping, where you extract data table by table from the source system, either as a full load or as an incremental load. The data first lands in a Parquet file in the Incorta platform and then gets loaded into memory. Each table can have both a full load SQL and an incremental load SQL.

Full Load SQL can look as simple as the following –

SELECT COL_A, COL_B, COL_C FROM TABLE_1 WHERE CREATED_DATE >= '01-01-2016'

Incremental Load SQL can look like the following –

SELECT COL_A, COL_B, COL_C FROM TABLE_1 WHERE LAST_UPDATE_DATE > ?

When the load is kicked off, the table is first populated by a full load, and each subsequent incremental run then pulls only the changed data from the source (the ? placeholder in the incremental SQL typically resolves to the timestamp of the last successful load). Incorta can pull data from almost all traditional databases like Oracle and SQL Server, from cloud applications like Salesforce, and from files stored on-premise or in cloud storage. The number of supported sources increases with each release of the Incorta platform.

After data is loaded from source systems into Parquet files, materialized views (MVs) can be created through the Incorta UI and written in SQL or PySpark. At load time, Incorta sends the request to Spark, which reads the data from the Parquet files, computes the result and writes it into a separate Parquet file that then gets loaded into memory during the loading stage. The Spark component ships with the Incorta package but does require some setup to work properly.
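
As a rough illustration, a SQL-based MV is simply a SELECT statement over tables that have already been extracted into Parquet; the schema, table and column names below are hypothetical:

SELECT o.ORDER_ID,
       o.ORDER_DATE,
       c.CUSTOMER_NAME,
       o.ORDER_AMOUNT * o.EXCHANGE_RATE AS ORDER_AMOUNT_USD
FROM   SALES.ORDERS o
JOIN   SALES.CUSTOMERS c
  ON   o.CUSTOMER_ID = c.CUSTOMER_ID

Incorta hands this query to Spark, which writes the result to its own Parquet file; the MV is then loaded into memory and can be joined and queried like any other table.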

Pulling data table by table has two advantages. First, loads are very fast because no joins have to be performed in the source system. Second, it becomes very easy to add new columns into Incorta: all you need to do is add the column to the extract SQL and run a full load. A new column can therefore be added to Incorta in a few minutes, compared to days or months with traditional ETL methods.

2. Database Layer

Once the data is extracted into Parquet files, it is loaded into the in-memory database engine, and the joins defined in the schema are precomputed at the end of the load. This precomputation of joins is what delivers the revolutionary report performance in Incorta. Even if a report has 30 or 40 joins across large tables containing millions of rows, the data still comes back in a few seconds. This kind of performance is unheard of in a traditional database.

3. Data Modeling Layer

Incorta data modeling layer comes in two forms — physical schemas and business schemas.

In the physical schema you can create aliases to base tables and materialized views and define joins between the various tables and MVs. Business schemas can then be created to present a flat, user-friendly representation of the physical schema to the user building reports. Columns can be brought in from one or more tables in different physical schemas and renamed, and formula columns can be added in either physical or business schemas.

4. Visualization Layer

Incorta comes with its own visualization layer. Insights (reports) and dashboards can be created on top of physical or business schemas. Analyzer users can create insights with built-in filters, prompts and bookmarks that end users can then consume. Multiple visualization types are available, such as charts and pivot tables, along with drill-downs.

Incorta’s visualization is good enough for most purposes, but compared to a best-of-breed product like Tableau, Power BI or MicroStrategy it does fall short a bit. The Incorta development team is rapidly adding features in each release and should be able to catch up to the other tools in terms of UI capabilities sooner rather than later.

In addition, Incorta provides a SQL Interface through which other BI tools like Tableau or Power BI can connect to the business schemas, physical schemas or the Parquet layer.
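
For example, once Tableau or Power BI connects through the SQL interface, an ordinary query like the one below (the schema and column names are made up for illustration) is served by Incorta rather than by the original source systems:

SELECT REGION, SUM(REVENUE) AS TOTAL_REVENUE
FROM   SALES_BUSINESS_SCHEMA.REVENUE_DETAILS
GROUP BY REGION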

Where does Incorta fit in your BI architecture?

Incorta comes with several distinct advantages compared to traditional BI tools –

– revolutionary performance of reports

– no tuning or DB maintenance needed

– can hold massive amounts of data (billions of rows of data) in memory

– complex data modeling is possible

– simple to develop and make changes as reshaping of data is not needed

– easy to learn and get started for any developer

– a platform where all the four layers of BI are available

– cross-data-source joins are very easy to achieve

There are three kinds of reporting in any company — operational reporting, real-time reporting and analytical reporting (snapshots, period-over-period analysis). Incorta’s sweet spot is operational reporting, where data can be pulled from various transaction systems and joined together to provide lightning-fast reports. You can load data into Incorta several times a day from your source systems and deliver the operational insights that users need to make daily decisions and run their business.

In summary, if you are looking for a solution that delivers lightning-quick operational reporting with data joined from multiple complex source systems, then Incorta is one of the best options out there today.

What is Data Virtualization and how it can unlock real-time insights directly from source systems

In today’s fast-moving business environment there are a lot of reporting requirements which cannot wait for a batch process to load data from operational systems. The business users need the data as it currently stands in their on-premise ERP, CRM or any of their cloud systems. For example, towards the end of a fiscal quarter, sales and finance leaders want to know the order bookings in real time so that they have a good idea of whether they are going to “make” the quarter in terms of revenue. They cannot wait 4–8 hours for an ETL job to finish.

Data virtualization (DV) is a method of building a “logical” warehouse by connecting “virtually” to various source systems and combining or joining them “virtually” to provide a unified data layer for BI systems or enterprise applications to query from.

Fig 1: High Level Architecture of Data Virtualization

The alternative to the data virtualization approach of providing a unified layer is the traditional ETL approach: moving data from the various source systems at periodic intervals and loading it into a physical data warehouse, where it is combined into a target star schema or snowflake data model. ETL loads are usually batch processes run once a day or a few times a day, depending on how long they take and on the data latency requirements.

How DV Works

At its core, DV is a SQL generation tool with an easy-to-use UI for building a semantic layer across various data sources. Instead of loading data into a physical warehouse by extracting, transforming and loading (ETL) from various source systems, with DV the data primarily remains in the source systems and the DV tool loads only the metadata of the tables (or other objects like views) from those systems. We can call these virtual tables “Physical Views”. Once the metadata is loaded, the DV tool allows the developer to manipulate these physical views, adding joins and formula columns to enrich them with the required business logic. We can call these “Logical Views”. With the joins and formula columns in place, the developer can then build “BI Datasets” on top of these logical views: flattened views of the data for a specific business purpose. These datasets can include fixed filters if they are always needed.
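
To make the layering concrete, here is a rough sketch, in plain SQL terms, of what the three levels look like; the object and column names are hypothetical, and in a real DV tool these objects are defined through its UI rather than hand-written DDL:

-- Physical views: metadata-only mirrors of the source objects
--   pv_orders   -> ORDERS table in the on-premise ERP database
--   pv_accounts -> Account object in Salesforce (reached through its API)

-- Logical view: joins and business logic defined on top of the physical views
CREATE VIEW lv_order_detail AS
SELECT o.order_id,
       o.order_date,
       o.order_status,
       a.account_name,
       o.amount * o.exchange_rate AS amount_usd
FROM   pv_orders o
JOIN   pv_accounts a ON a.account_id = o.account_id;

-- BI dataset: a flattened, purpose-built view exposed to the BI tools,
-- optionally with a fixed filter baked in
CREATE VIEW ds_open_orders AS
SELECT order_id, order_date, account_name, amount_usd
FROM   lv_order_detail
WHERE  order_status = 'OPEN';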

These flattened datasets are then imported into a BI tool like MicroStrategy, Power BI or Tableau. In these BI tools, users can add further business formula columns and filters, or even join two “BI Datasets” from the DV tool. The DV tool takes the incoming call from the BI tool and translates it into the SQL appropriate for the specific backend data source, be it a database like Oracle, SQL Server or Teradata, or a cloud system like Salesforce accessed through its API.

Fig 2: How Data Virtualization Works

The best DV tools perform SQL tuning before sending queries to the backend data sources. Instead of just passing the base SQL through, the DV tool introspects the incoming request, optimizes the query for the specific backend data source and then sends the SQL or API call to that source.
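
As a simplified sketch of what that optimization can look like (reusing the hypothetical views from the previous example), the DV tool takes one incoming query and turns it into a pruned request per source:

-- Query received from the BI tool
SELECT account_name, SUM(amount_usd) AS open_amount
FROM   ds_open_orders
WHERE  order_date >= DATE '2019-01-01'
GROUP BY account_name;

-- Pushed down to the ERP database: filter and pre-aggregate before moving data
SELECT account_id, SUM(amount * exchange_rate) AS amount_usd
FROM   orders
WHERE  order_status = 'OPEN'
  AND  order_date >= DATE '2019-01-01'
GROUP BY account_id;

-- Fetched from Salesforce through its API: only the join and display columns,
-- e.g. the Id and Name fields of the Account object

-- The DV engine then joins the two small result sets in memory and returns
-- the final rows to the BI tool.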

In addition, the leading DV tools can cache any of the physical views, logical views or BI Datasets for better performance when the source is slow, either in a native database or in a commercial database like Oracle or Snowflake.

Advantages of DV

Data virtualization comes with a lot of benefits –

1. Simplified view of complex source system schemas

If the semantic layer is built in the DV tool and the “BI Datasets” are exposed as flat tables/views to be imported into a BI tool like Tableau or Power BI, the complexity of a source schema like Oracle EBS or SAP is hidden from the BI developers. BI developers are usually not very familiar with source system tables, struggle to build a semantic layer on top of them, and instead depend on database views in the source system. Building the semantic layer in the DV tool not only simplifies a complex schema, it also avoids having to maintain complex and hard-to-change views in the source databases.

2. Speed of development

Since DV just imports the metadata of tables and creates virtual tables that mirror the source objects, the complex ETL normally required for real-time reporting can be avoided. Adding new fields or tables in ETL can take several weeks to months because star schemas have to be designed and built, but in DV the shape of the data is preserved as it is in the source, so adding new tables or fields is much easier and quicker.

3. Real time reporting need

Since DV points to the source system directly, either through ODBC/JDBC to the database or through an API to cloud systems, data latency drops to nearly zero. The data available to the user is real-time, and for use cases where users need data immediately as it changes, DV really comes through as a great option.

4. Source system changes can be hidden from BI tools

A big benefit of DV is that when a source system is upgraded, replaced with another system, or simply moved to a different database (say from Oracle to Snowflake), virtualizing the semantic layer helps hide these complex environment changes from the various BI tools. To accommodate such changes, the data source connection in the DV tool is re-pointed to the new database if the schema is the same, or the relevant physical views are adjusted if there are schema changes like column additions. This way the reports and visualizations built in the various BI tools do not have to change and users are not impacted.

5. Performance optimizations and caching

Most leading DV tools provide the ability to cache certain tables from the source database, or to cache entire datasets. Caching significantly improves the performance of reports in the BI tools. The best DV tools tune the queries they receive from the BI tools and push them down to the source databases. They also find ways to extract and move the minimum amount of data from the different data sources before joining it in the DV tool and sending the result to the BI tools.

Data Virtualization Tools

There is a lot of confusion in the market as to what Data Virtualization is. For Data Virtualization as described here, these are tools that companies can evaluate for their needs –

Commercial Products

1. Denodo

2. Tibco Data Virtualization

3. DataVirtuality

4. AtScale

5. Azure Analysis Service

Open Source Alternatives

1. Dremio

2. Teiid

We personally selected Denodo, and we really like it for working with on-premise databases and cloud data sources, merging data from multiple sources, and presenting a simplified view of the data to multiple BI tools. I will write a more detailed article on Denodo and how it works in the future.

Is Traditional ETL dead?

For many years, traditional Extract, Transform and Load (ETL) was the only way to extract and transform data out of operational systems and load it into a data warehouse for analysis by a business intelligence tool like Business Objects, MicroStrategy or OBIEE. Traditional ETL works by extracting data out of one or more transactional systems, merging and reshaping the data using staging tables, and populating dimension and fact target tables in a data warehouse. These data warehouses were usually built on Oracle, SQL Server or Teradata databases. To load the data warehouse using the traditional ETL method, the following steps need to be accomplished –

  1. Understand the business requirements – you must know all the ways the users will slice and dice the data and what the grain of the data is (is it at the customer level or the order line level, etc.)
  2. Data model the star and snowflake schemas including all facts and dimensions, their keys and how they all join
  3. Figure out where the data to populate the facts and dimensions is coming from in the source systems, and then write the SQL that joins the various source tables to populate these individual facts and dimensions with the appropriate keys
  4. Figure out the load order and write a workflow to populate the dimensions and facts in the appropriate order. This is important because the dimensions need to be loaded before the fact tables so that the dimensional keys get correctly looked up when the facts are loaded.
  5. Figure out which tables will be populated with incremental loads and which with full loads

Let’s go over each of these steps and see what is involved –

Business Requirements

This is probably one of the most difficult parts of any traditional BI project. Net-new BI and data warehousing projects are usually started when new operational systems are put in place, for example when the company implements a new ERP, CRM or cloud HR system. To give accurate business requirements for a traditional ETL/BI project, the business users need to come up to speed on the new source system and its data structures, try to guess all the potential reports and ways the data will be sliced in the future, and define a set of requirements which IT developers can understand.

The probability of getting the requirements right is very low. But the expectation of IT and business leadership is that proper requirements, written in stone, are provided upfront before implementation begins. This is because the ETL developers need to know what data to pull from the source to populate the data warehouse, whose data model has in turn been designed based on the business requirements.

Data Modeling

Data modeling entails working backwards from the questions the business users will ask and designing the schema to answer those questions. The metrics or measures go into fact tables, and the attributes by which these metrics can be sliced and diced are modeled as dimensions around those fact tables. The fact tables then hold keys to each of these dimensions.

With this method of building a warehouse, each set of questions needs a new fact table. For example, if the business users want to analyze opportunities, revenue, orders, COGS or GL data, a different fact table and associated dimensions will need to be designed for each of them. Detailed analysis needs to be done upfront to understand the grain (aggregation level) of each fact table, and the dimensions are then designed to match this grain.
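
As a small illustration, a revenue star schema at order line grain might look roughly like the following; the table and column names are made up, and a real model would have many more dimensions (date, currency, geography and so on):

CREATE TABLE dim_customer (
  customer_key        INT PRIMARY KEY,  -- surrogate key generated during the load
  source_customer_id  VARCHAR(50),      -- natural key from the source system
  customer_name       VARCHAR(200),
  customer_segment    VARCHAR(50)
);

CREATE TABLE dim_product (
  product_key        INT PRIMARY KEY,
  source_product_id  VARCHAR(50),
  product_name       VARCHAR(200),
  product_family     VARCHAR(100)
);

CREATE TABLE fact_revenue (
  order_line_id   INT,            -- grain: one row per order line
  date_key        INT,            -- joins to a dim_date table
  customer_key    INT,            -- joins to dim_customer
  product_key     INT,            -- joins to dim_product
  quantity        NUMERIC(18,2),
  revenue_amount  NUMERIC(18,2)
);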

Build source SQL or API data pull

Once the data model of the data warehouse is prepared, it is handed over to the ETL developers. The ETL developers need to understand where the data resides in the source system and write SQL (or API calls against cloud systems, or flat file extracts) to pull the data into the dimensions and facts of the data warehouse. Staging tables are usually used to hold the data in an intermediate form before it is loaded into the target tables. Incremental and full loads also need to be designed.

Here are the challenges with writing SQL/flat files/API from the source system –

  1. ETL developers rarely understand the source system well. For example, ETL developers usually have very little knowledge of SAP or Oracle ERP tables. They must rely on source system experts who normally don’t have too much time for them.
  2. Once they write the SQL and populate the staging and target tables, they must rely on the BI team to complete the semantic data model and build out the reports before any business user can test the results. By the time all this happens the project is almost at its end, and the data is usually very different from what was expected. The whole BI/ETL project team must go back to the drawing board several times to fix the issues, which causes a lot of stress and delays in the project.
  3. The SQL that is written is often very expensive in terms of system resources on the source database and must be rewritten for performance reasons. This sometimes requires significant logic changes, which in turn can lead to data errors.

Load Order

The ETL developers also need to figure out the correct order in which to load the dimensions and facts. This can get very tricky when there are many interdependent tables being pulled, especially if the data is coming from multiple source systems. Also, the right set of dimensional keys needs to be populated in all the fact tables with the right lookup logic.
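
A simplified sketch (Oracle-style syntax, continuing the hypothetical revenue star schema from earlier, with made-up staging table names) shows why the order matters: the fact load looks up surrogate keys that the dimension load must already have created:

-- Step 1: insert any new customers from staging into the dimension
INSERT INTO dim_customer (customer_key, source_customer_id, customer_name, customer_segment)
SELECT dim_customer_seq.NEXTVAL, s.customer_id, s.customer_name, s.customer_segment
FROM   stg_customers s
WHERE  NOT EXISTS (SELECT 1 FROM dim_customer d
                   WHERE  d.source_customer_id = s.customer_id);

-- Step 2: load the fact, looking up the dimension keys loaded in step 1
INSERT INTO fact_revenue (order_line_id, date_key, customer_key, product_key,
                          quantity, revenue_amount)
SELECT s.order_line_id,
       TO_NUMBER(TO_CHAR(s.order_date, 'YYYYMMDD')),
       d.customer_key,
       p.product_key,
       s.quantity,
       s.revenue_amount
FROM   stg_order_lines s
JOIN   dim_customer d ON d.source_customer_id = s.customer_id
JOIN   dim_product  p ON p.source_product_id  = s.product_id;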

Incremental and Full Loads

Incremental loads are needed in many cases where a lot of rows get updated in the source system. But getting the logic right as to which rows have changed can get very complicated when there are no proper last-update-date columns in the source data. Sometimes complicated OR conditions need to be written if a dimension or fact is populated from multiple source tables.
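
For instance, if an order fact is populated by joining an order header table and an order line table, the incremental extract typically has to check the last-update timestamp on both sides; the table names are illustrative and :last_run_date stands for the timestamp of the previous successful run:

SELECT h.order_id, l.order_line_id, l.quantity, l.amount
FROM   order_headers h
JOIN   order_lines   l ON l.order_id = h.order_id
WHERE  h.last_update_date > :last_run_date
   OR  l.last_update_date > :last_run_date;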

In addition, when a new column is added to a target dimension or fact, a full load needs to be triggered or an update SQL needs to be written to populate the column for older rows. Deleted rows in the source system also need to be taken care of, as there might not be a clear flag or trigger in the source to indicate which rows were deleted.

Speed of Change

Given all the above complexities, getting a change done – for example adding a new dimension, a new fact or even new columns – becomes very difficult when traditional ETL is used. The skill level required of ETL developers to perform all the above steps is very high, and such developers are very difficult to find. In addition, testing a change in the traditional ETL approach gets quite complicated because the data is reshaped during the load from an OLTP data model to an OLAP data model. It can take weeks to add a new column, or months to add a new dimension or fact table, test it and move the change to production.

Can the business really wait that long for a simple change? The answer is clear – most business users expect changes to their BI systems to be made in a matter of days, if not less.

So, is traditional ETL the best way moving forward for new BI projects?

The answer is NO. With today’s technology, including cloud data warehouses, replication methods and powerful self-service visualization tools, there is very little reason to go with the traditional ETL and legacy BI tools approach for any new BI project.

There are several other new techniques to solve the problem of getting the data out of the source system and presenting the data in a meaningful manner to the end user.

  1. Replication – Replicate the source data to the target database (increasingly, cloud databases like Snowflake or Amazon Redshift are becoming attractive targets) and build the business logic post replication
  2. Data Virtualization with a semantic layer and caching in a fast database

I will cover more about these two approaches in future blogs.

What about existing ETL driven data warehouses and BI solutions?

If the existing ETL+BI solution does not require frequent changes and the cost of maintenance is low, then my suggestion is to let the users make use of the system until it loses its business value. If there is a significant change in the source system, like a move to the cloud or an upgrade that requires a lot of ETL rework, that might be the time to replace the legacy solution with the new architecture.

You don’t have to move the whole solution to the new architecture at once. You can start with the business-critical reports and then consider stopping software maintenance support for the legacy BI and ETL tools. Then you can take your time migrating the other valuable content over to the new platforms and shut down the old ETL/BI solution in a phased manner.

Why MicroStrategy is a great BI tool for governed self-service

In the previous blog post I explained what governed self-service is and how to set it up in your company. To be successful, governed self-service needs software that supports different sets of features and capabilities for different levels of users.

In a governed self-service model, IT’s role is to build the data pipelines and ETL from the various source systems to the data warehouse and to build out the semantic model with the help of the super user.

The super user will primarily be building the master datasets for consumption by the reports and dashboards. The master dataset hides the complexity of the various joins and of where the data came from. It has all the attributes and metrics for a specific business need. The master dataset is validated for data accuracy by the super users and is deemed a trusted dataset for the power users. It can be cached for high performance in an in-memory or cube format.

The power users use the master dataset to analyze, slice and dice the data the way they want. They can build story boards, pivot the data or mash it up with other local data sources to provide the analysis that their end users need. The end user is the consumer of the data and makes decisions based on the analysis provided to them.

What is also critical is that the BI software in the environment needs to support the governed self-service model. While governed self-service can be achieved to some extent in different tools, MicroStrategy seems to be the tool best built from the ground up to support this model.

The various components in MicroStrategy that make up governed self-service are as follows –

IT ownership

Data warehouse connection – Connecting MicroStrategy Developer to the database and importing the relevant tables

Logical Tables – This is where the developer defines the underlying table or view definitions

Attributes, Facts and other schema objects – The next step in the development process where the attributes, facts, joins and other data model components are built by the IT developer.

Super User Ownership

Once the attributes and metrics are built, the super user can build the prompts, filters, hierarchies and finally the grid report. The super user can include all the necessary attributes, metrics and filters in this master grid report, which can then support one or multiple dossiers. The idea is that the super user uses her knowledge of the data and the business need to create a master report which has all the information needed for analysis, so that the users of this report are confident the data is accurate and in a useful format.

The grid report in MicroStrategy is the basic component on which the Master Dossiers can be built.

In MicroStrategy Web, the super user can right-click the grid report and create a master dossier. Once the master dossier is created, the grid report can be hidden from the power users and end users.

The master dossier can then be copied by the power users for further analysis.

Power User/End User Ownership

The power users can copy the master dossier into their own folders and then create formula columns, add or remove columns, pivot the data and create visualizations as needed for their end users. They can be confident that the master report has the data they need and that it is accurate.

The biggest advantage of this model of operation is that if a new field, prompt or any other change is made to the master grid report by the super user, it automatically propagates to all the dossiers built on top of it. This way the “master dataset” on which all the analyses are built is owned by the super user, who has the best knowledge of the data and the closest relationship with the IT developers.

If the BI structure (both technical and organizational) is set up in the above fashion, then the power of governed self-service can be achieved in your company using MicroStrategy. This is a great way to utilize the skills and expertise of the various teams while getting accurate data to the analysts and end users.

What is Governed Self Service and how to set it up in your company?

After working on many different BI projects, tools and teams, I have come to strongly believe in the concept of a “Governed Self-Service” model. In this article I will explain how to set this up technically and how to structure your BI teams (both IT and business users) to make this model successful. Before we go into what governed self-service is, let’s look at the two extreme models for making BI operational in your company – 100% self-service and 100% governed. By analyzing the strengths and weaknesses of these models, the need for a governed self-service model becomes clear.

In any business intelligence/analytics model there are several different applications and people. I want to make sure the definitions are clear before I delve into the various operating models for BI. These are the roles that most commonly occur in different companies; I’m sure there are some minor variations from one company to another.

Systems

  1. Source Systems – These are the source systems that essentially create the data to be analyzed – ERP (Oracle/SAP/Netsuite), CRM (Salesforce, Siebel), Google Analytics, Marketing (Eloqua, Marketo and others), supply chain and plenty of other custom operational and transactional systems.
  2. Data Warehouse – This is the curated, predefined schema designed, loaded and maintained by IT developers in a database of choice – Oracle, SQL Server, Teradata, etc.

People

  1. IT Developer – ETL/BI tool – This is the traditional IT role; these developers have database logins to the source systems and load the warehouse from them periodically.
  2. Super User – These users reside in the business and understand the data very well and are technical. They interact with IT and work with power users and end users.
  3. Power User/data scientists – These users work very closely with the end users (CFO, VP of Sales) and use the reports provided by the super users to further analyze and create stories out of the data.
  4. End User/Consumer – These are the end customers of the data and analysis and they and their teams take business actions out of the data.

Now, let’s get back to the 100% self service and 100% governed models.

100% Self Service Model

In simple terms, this model operates like this – business users get direct database access to the on-premise source systems (like the ERP database) or “extract all” access to cloud systems like Salesforce. They then take this data, create a data mart on a small server and mash up the data. These department users will have bought Tableau or Power BI licenses, and they wrangle the data and create dashboards and insights out of this data mart.

The advantages of this approach are –

  1. Nimble – This is very nimble as it is done by the super users and they know the data and requirements well and can create a quick and dirty solution fast
  2. Cheap – They can circumvent the enterprise IT processes and rigor which tend to add cost to the implementation.

The disadvantages of this approaches are plenty –

  1. Very people-dependent – These solutions tend to stick around for a long time, and the person who built the solution is stuck maintaining it for longer than they want. If they leave, the department struggles to maintain the solution and goes back to IT for help maintaining and enhancing it.
  2. Security – Direct database logins to back-end transaction databases lead to big security risks. In addition, these small business-owned databases and systems don’t get patched regularly and pose further security risks for the company.
  3. Data discrepancies – Since there can be different interpretations of the same dataset, it’s very common to see two different answers to the same question when solutions are built on different platforms.

Self-service systems are very useful when the data is new and not yet understood. Such data needs further discovery and analysis by business users to establish which parts of it are useful for the organization. Once the useful data is identified and it is determined that analysis of this data is needed for the long term, a new long-term model for it is needed. It is too dangerous to leave this in the hands of a business super user, with the data sitting on a server under their desk.

100% Governed Model

This is the opposite of the 100% self-service model. Here all the requirements are fed to a central IT team, which builds the ETL and data pipelines from the data sources into a data model in a central data warehouse. IT then creates a semantic layer in a BI system like OBIEE or Business Objects and builds reports and dashboards on top of it.

In this model, the business team is primarily responsible for providing requirements, testing and consuming the reports and dashboards that IT produces.

The advantages of this approach are –

  1. Very tight governance and security
  2. Centrally built and delivered data – everybody gets the same trusted data

The disadvantages of this approach are –

  1. Slow delivery cycles – IT controls everything and always has competing priorities. Any new project must go through the rigor of development, test and release cycles. Even small changes might take months to implement. The business cannot wait that long for data anymore.
  2. Knowledge/know-how is underutilized – There are a lot of business people with decent technical knowledge and very good knowledge of the data. Their capabilities are not used in this model.

This kind of 100% governed model is on the decline because of the volume and velocity of data these days. Decision makers cannot wait months for data to do their analysis.

Hence, most companies are going down the 100% self-service path, bypassing IT. There is a better way to do this – governed self-service.

Governed Self-Service

Pic 1: Steps involved in the Governed Self-Service model

In the Governed Self Service model, there are distinct parts of the analytics workflow and distinct roles for each of the user groups in the process.

IT’s Role

In this model, based on business needs, the IT team builds the data pipelines and ETL from the various source systems to the data warehouse or data lake. On top of this data warehouse, in the business intelligence system, they build the semantic data model with the help of the business super user. To build the joins, dimensions and metrics in the semantic layer, IT developers need the help of the super users who understand the data and the business requirements.

Super User’s role and the concept of the “Master Dataset”

In addition to helping IT build the semantic model, the super user’s primary job is building the master datasets for consumption by the reports and dashboards. The master dataset is a crucial component of the governed self-service model. It hides the complexity of the various joins and of where the data came from, and it has all the attributes and metrics for a specific business need. For example, a master dataset might have all the attributes, metrics and calculated formula fields necessary for revenue analysis in a company; for this analysis to be complete, the dataset needs to include time, product, customer, order, pricing, quantity, currency and various other information. The super user’s role is to build this master dataset so that it is easy for the power users to further analyze the data for their end users. The master dataset is validated for data accuracy by the super users and is deemed a trusted dataset for the power users. It can be cached for high performance in an in-memory or cube format.

Power User’s Role

The power user is less technical than the super user and normally doesn’t understand the intricacies of the data model behind the master dataset. They are the analysts that executives and decision makers go to for their data. They could also be data scientists who need the data prepared and ready for analysis. The power users go to the super users to get their master dataset ready for analysis.

Once they have the master dataset, the power users will be empowered to analyze, slice and dice the data the way they want. They can do this in Excel or in the BI tool of their choice like Tableau, Qlik or Power BI. They can build story boards, pivot the data or mash it up with other local data sources to provide the analysis that their end users need.

End User’s Role

The end user is the consumer of the data and makes decisions based on the analysis provided to them. They could also learn the tools and, with the help of the power users, do a lot of their own slicing and dicing of the master data. Their primary role in the governed self-service model is to provide accurate and timely requirements to the power users as to what data and analysis they need to do their job. The power users in turn work with the super users, who prepare the master dataset if the data is already available in the semantic model; if not, the super users work with IT to extract and prepare the data as needed.

It’s clear that for the governed self-service model to work, the right people need to be engaged in IT and the various business groups. Different departments within the company need to identify who their super users and power users are. A BI Center of Excellence needs to be established where all these users can work together with IT to get the data that their end users need. The governed self-service model will work only with the right sponsorship at the executive level of the company. It will be a long road to establish this in your company, but in the end the reward is that the right data and analysis will be delivered to the decision makers at the right time!