After working on many different BI projects, tools and teams, I have come to strongly believe in the concept of a “Governed Self-Service” model. In this article I will explain how to set it up technically and how to structure your BI teams (both IT and business users) to make it successful. Before we go into what Governed Self-Service is, let’s look at the two extreme models for making BI operational in your company: 100% Self Service and 100% Governed. Analyzing the strengths and weaknesses of these models makes the need for a governed self-service model clear.
Any business intelligence/analytics model involves several different systems and people. Before I delve into the various operating models for BI, I want to make sure the definitions are clear. These are the most common roles across companies, although I’m sure there are minor variations from one company to another.
- Source Systems – These are the systems that create the data to be analyzed: ERP (Oracle/SAP/NetSuite), CRM (Salesforce, Siebel), Google Analytics, marketing (Eloqua, Marketo and others), supply chain, and plenty of other custom operational and transactional systems.
- Data Warehouse – This is the curated, predefined schema that IT developers design, load and maintain in a database of their choice, such as Oracle, SQL Server or Teradata.
- IT Developer – ETL/BI tool – This is the traditional IT role: developers have database logins to the source systems and use them to load the warehouse periodically.
- Super User – These users sit in the business, are technical, and understand the data very well. They interact with IT and work with power users and end users.
- Power User/Data Scientist – These users work very closely with the end users (CFO, VP of Sales) and use the reports provided by the super users to analyze the data further and build stories from it.
- End User/Consumer – These are the end customers of the data and analysis; they and their teams take business actions based on it.
Now, let’s get back to the 100% self service and 100% governed models.
100% Self Service Model
In simple terms, this model works like this: business users get direct database access to on-premises source systems such as the ERP database, or “extract all” access to cloud systems such as Salesforce. They load this data into a data mart on a small server and mash it up there. These departments will have bought their own Tableau or Power BI licenses, which they use to wrangle the data and create dashboards and insights from the data mart.
The advantages of this approach are –
- Nimble – The work is done by super users who know the data and requirements well, so they can build a quick-and-dirty solution fast.
- Cheap – They can bypass enterprise IT processes and rigor, which tend to add cost to the implementation.
The disadvantages of this approach are many –
- Very people-dependent – These solutions tend to stick around for a long time, and the person who built one is stuck maintaining it for longer than they want. If that person leaves, the department struggles to maintain the solution and has to go back to IT for help maintaining and enhancing it.
- Security – Direct logins to back-end transactional databases create big security risks. In addition, these small, business-owned databases and systems don’t get patched regularly, which poses further security risks for the company.
- Data Discrepancies – Since the same dataset can be interpreted in different ways, it’s very common to get two different answers to the same question from solutions built on different platforms.
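To make the discrepancy problem concrete, here is a small, hypothetical illustration in Python: two departments compute “revenue” from the same order records, but one counts every booked order and the other counts only shipped orders. The records and both definitions are invented for the example; the point is only that nothing forces the two self-service solutions to agree.

```python
# Hypothetical order records shared by two departments: (order_id, amount, status)
ORDERS = [
    (1, 100.0, "shipped"),
    (2, 250.0, "shipped"),
    (3, 75.0, "cancelled"),
    (4, 40.0, "returned"),
]


def revenue_team_a(orders):
    """Team A's (hypothetical) definition: every booked order counts as revenue."""
    return sum(amount for _, amount, _ in orders)


def revenue_team_b(orders):
    """Team B's (hypothetical) definition: only shipped orders count as revenue."""
    return sum(amount for _, amount, status in orders if status == "shipped")


print(revenue_team_a(ORDERS))  # 465.0
print(revenue_team_b(ORDERS))  # 350.0
```

Both numbers are “revenue” to the team that produced them, which is exactly why a single, agreed definition matters.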
Self-service systems are very useful when the data is new and not yet understood. Such data needs discovery and analysis by business users to determine which parts of it are useful to the organization. Once the useful data is identified and it is clear that its analysis will be needed over the long term, the data needs a new long-term model. It is too dangerous to leave it in the hands of a single business super user, with the data sitting on a server under their desk.
100% Governed Model
This is the opposite of the 100% self-service model. Here, all requirements go to a central IT team, which builds the ETL and data pipelines from the data sources into a data model in a central data warehouse. IT then creates a semantic layer in a BI system such as OBIEE or Business Objects and builds reports and dashboards on top of it.
In this model, the business team is primarily responsible for providing requirements, testing and consuming the reports and dashboards that IT produces.
The advantages of this approach are –
- Very tight governance and security
- Centrally built and delivered data – everybody gets the same trusted data
The disadvantages of this approach are –
- Slow delivery cycles – IT controls everything and always has competing priorities. Any new project must go through the rigor of development, test and release cycles, so even small changes can take months to implement. The business can no longer wait that long for data.
- Business know-how is underutilized – Many business people have decent technical skills and very good knowledge of the data, but this model does not use their capabilities.
The 100% governed model is in decline because of the volume and velocity of today’s data. Decision makers cannot wait months for data to do their analysis.
As a result, most companies are moving to the 100% self-service model and bypassing IT. There is a better way to do this: Governed Self-Service.
Governed Self-Service Model
Pic 1: Steps involved in the Governed Self-Service model
In the Governed Self-Service model, the analytics workflow has distinct stages, and each user group has a distinct role in the process.
In this model, the IT team builds the data pipelines and ETL from the various source systems to the data warehouse or data lake, based on business needs. On top of the warehouse, IT builds the semantic data model in the business intelligence system with the help of the business super users: to define the joins, dimensions and metrics in the semantic layer, IT developers need super users who understand the data and the business requirements.
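Every BI platform (OBIEE, Business Objects and others) has its own way of defining a semantic layer, so the following is only a tool-neutral sketch of the idea, not any product’s API. It assumes a hypothetical `SemanticModel` class that records a fact table, the joins to its dimension tables, and business-friendly names for dimensions and metrics, then generates SQL from those names. The table and column names (`fact_orders`, `dim_product`, `product_line`) are also made up.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticModel:
    """Hypothetical, minimal semantic layer: a fact table, the joins to its
    dimension tables, and business-friendly names for dimensions and metrics."""
    fact_table: str
    joins: list[str] = field(default_factory=list)
    dimensions: dict[str, str] = field(default_factory=dict)  # name -> column
    metrics: dict[str, str] = field(default_factory=dict)     # name -> aggregate

    def to_sql(self, dims: list[str], mets: list[str]) -> str:
        """Generate SQL for the requested dimensions and metrics by name."""
        columns = [f"{self.dimensions[d]} AS {d}" for d in dims]
        columns += [f"{self.metrics[m]} AS {m}" for m in mets]
        sql = f"SELECT {', '.join(columns)} FROM {self.fact_table}"
        for join in self.joins:
            sql += f" {join}"
        if dims:
            # Group by the underlying column expressions of the chosen dimensions
            sql += " GROUP BY " + ", ".join(self.dimensions[d] for d in dims)
        return sql


# IT supplies the tables and joins; the super user supplies which columns
# matter and how a metric such as revenue is calculated.
sales = SemanticModel(
    fact_table="fact_orders f",
    joins=["JOIN dim_product p ON f.product_id = p.product_id"],
    dimensions={"product_line": "p.product_line"},
    metrics={"revenue": "SUM(f.quantity * f.unit_price)"},
)

print(sales.to_sql(["product_line"], ["revenue"]))
```

The design point is the split of knowledge: the join logic lives in one place, and anyone asking for “revenue by product line” gets the same calculation.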
Super User’s Role and the Concept of the “Master Dataset”
In addition to helping IT build the semantic data model, the super user’s primary job is to build the master datasets that the reports and dashboards consume. The master dataset is a crucial component of the governed self-service model. It hides the complexity of the various joins and of where the data came from, and it contains all the attributes and metrics for a specific business need. For example, a master dataset for revenue analysis would contain all the attributes, metrics and calculated fields needed to analyze revenue in the company; to be complete, it needs time, product, customer, order, pricing, quantity, currency and various other information. By building this master dataset, the super user makes it easy for power users to analyze the data further for their end users. The super users validate the master dataset for accuracy, and power users can then treat it as a trusted dataset. For high performance, the master dataset can be cached in memory or stored in a cube format.
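One simple way to picture a master dataset is as a single wide view over the warehouse tables. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database; the tables, columns, sample rows and the `master_revenue` view are all hypothetical. A database view is only one possible implementation: as noted above, the master dataset could equally be cached in memory or held in a cube.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse tables built and loaded by IT
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);
    CREATE TABLE fact_orders  (order_id INTEGER PRIMARY KEY, order_date TEXT,
                               customer_id INTEGER, product_id INTEGER,
                               quantity INTEGER, unit_price REAL, currency TEXT);

    INSERT INTO dim_customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
    INSERT INTO dim_product  VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO fact_orders  VALUES
        (100, '2024-01-15', 1, 10, 5, 20.0, 'USD'),
        (101, '2024-02-03', 2, 20, 2, 150.0, 'USD'),
        (102, '2024-02-20', 1, 20, 1, 150.0, 'USD');

    -- The "master dataset": one wide view that hides the joins and carries
    -- the calculated fields needed for revenue analysis.
    CREATE VIEW master_revenue AS
    SELECT o.order_id,
           o.order_date,
           strftime('%Y-%m', o.order_date) AS order_month,
           c.customer_name,
           c.region,
           p.product_name,
           o.quantity,
           o.unit_price,
           o.currency,
           o.quantity * o.unit_price AS revenue
    FROM fact_orders o
    JOIN dim_customer c ON o.customer_id = c.customer_id
    JOIN dim_product  p ON o.product_id  = p.product_id;
""")

# A power user only ever sees the view, never the joins behind it.
query = "SELECT order_month, region, product_name, revenue FROM master_revenue ORDER BY order_id"
for row in conn.execute(query):
    print(row)
```

If the super user later fixes a join or a calculation, every report built on `master_revenue` picks up the correction.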
Power User’s Role
The power user is less technical than the super user and normally doesn’t understand the intricacies of the data model behind the master dataset. Power users are the analysts whom executives and decision makers go to for their data. They could also be data scientists who need the data prepared and ready for analysis. Power users go to the super users to get a master dataset ready for analysis.
Once they have the master dataset, power users are free to analyze, slice and dice the data however they want, in Excel or in a BI tool of their choice such as Tableau, Qlik or Power BI. They can build storyboards, pivot the data, or mash it up with other local data sources to provide the analysis their end users need.
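Power users would normally do this in Excel or a BI tool, but the operation itself is simple. As a hedged, tool-neutral sketch, the Python below pivots a few hypothetical rows from a revenue master dataset by region and month. The rows and the `pivot` helper are invented for illustration; note that the power user never touches a join, because the master dataset has already flattened the data.

```python
from collections import defaultdict

# Hypothetical rows as a power user might receive them from a master dataset
# (already joined, validated, and carrying the calculated "revenue" field).
MASTER_ROWS = [
    {"order_month": "2024-01", "region": "EMEA", "product": "Widget", "revenue": 100.0},
    {"order_month": "2024-02", "region": "AMER", "product": "Gadget", "revenue": 300.0},
    {"order_month": "2024-02", "region": "EMEA", "product": "Gadget", "revenue": 150.0},
]


def pivot(rows, row_key, col_key, value):
    """Sum `value` into a nested dict: result[row][col] -> total."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value]
    # Convert to plain dicts so the result prints and compares cleanly
    return {k: dict(v) for k, v in table.items()}


print(pivot(MASTER_ROWS, "region", "order_month", "revenue"))
# {'EMEA': {'2024-01': 100.0, '2024-02': 150.0}, 'AMER': {'2024-02': 300.0}}
```

Re-slicing is just a different choice of keys, for example `pivot(MASTER_ROWS, "product", "region", "revenue")`.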
End User’s Role
The end user is the consumer of the data and makes decisions based on the analysis provided. End users can also learn the tools and, with help from the power users, do much of their own slicing and dicing of the master dataset. Their primary role in the governed self-service model is to give the power users accurate and timely requirements for the data and analysis they need to do their job. Power users in turn work with the super users, who prepare the master dataset if the data is already available in the semantic model; if it is not, the super users work with IT to extract and prepare the data as needed.
For the governed self-service model to work, the right people must be engaged in IT and in the various business groups. Each department within the company needs to identify its super users and power users. A BI Center of Excellence needs to be established where all these users can work together with IT to get the data their end users need. The Governed Self-Service model will only work with the right sponsorship at the executive level. It will be a long road to establish it in your company, but the reward is that the right data and analysis reach the decision makers at the right time!