BI on the fly – coping with dirty, ill-prepared data

There’s a classic problem that almost every business intelligence (BI) project runs into at one stage or another: Either the data is in a bad way or it hasn’t been prepared for presentation at the BI layer.

An integrator might plan a one-week proof of concept, only to spend the first two days getting at data that is locked up behind firewalls or assessing data preparation tools before getting around to running queries.

Taking you as you are

For many, it’s a major inhibitor. But the right partner won’t expect you to doll up your data for the journey when it can easily be readied along the way. In other words, rather than expect customers to prepare data in a separate process prior to analysis, BI providers should offer data preparation as an integral part of the analytic platform.

How does that work, and why is it better?

The old way

Most BI platforms and tools require source data to be fixed and presented in a specific format for reporting and analysis.

Called data preparation, this stage of pre-analysis involves extracting data from sources such as enterprise resource planning applications, transforming it into the desired format and loading it into cubes, models or other staging environments.

A key challenge with this is that it is a disjointed process. Typically, data preparation and analytics happen in separate software applications, making it a slow, hard and expensive undertaking.

Integration overcomes inefficiency

When data preparation is performed in a virtual environment that is integrated into the analytic platform, the data doesn’t have to move, which removes the delays in provisioning data to analysts. Rules are set within the BI platform and executed automatically when the query runs.
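To make the idea concrete, here is a minimal sketch of preparation rules that run at query time. It uses Python's built-in sqlite3 module as a stand-in for a source system; the table raw_orders, the view prepared_orders and the cleaning rules themselves (trimming and upper-casing region names, treating a missing amount as zero) are assumptions made for illustration, not a description of any particular BI product.

```python
import sqlite3

# In-memory database standing in for a source system (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount TEXT);
    INSERT INTO raw_orders VALUES
        (1, ' north ', '100.50'),
        (2, 'NORTH',   '200'),
        (3, 'south',   NULL),
        (4, 'South ',  '50.25');

    -- Preparation rules live in a view: no data is copied or moved.
    CREATE VIEW prepared_orders AS
    SELECT order_id,
           UPPER(TRIM(region))                 AS region,
           CAST(COALESCE(amount, '0') AS REAL) AS amount
    FROM raw_orders;
""")


def revenue_by_region(connection):
    """Run the report; the view's rules are applied as the query executes."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM prepared_orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


totals = revenue_by_region(conn)
print(totals)

# A new dirty row at the source is cleaned by the same rules on the next query.
conn.execute("INSERT INTO raw_orders VALUES (5, ' south', '9.75')")
totals_after_insert = revenue_by_region(conn)
print(totals_after_insert)
```

Because the rules sit in a view rather than in a separate extract, there is no staging copy to refresh: each query sees whatever is in the source at that moment.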

In addition, data preparation in a virtual environment uses the computing power of the server on which the data is stored. This approach reduces the cost and complexity of managing data migration processes. It also draws on computing capacity that can be scaled up quickly and at lower cost.

Lastly under the banner of efficiency, integrating data preparation in the BI tool allows organisations to realise immediate value, questionable data or not. Presenting data straight from the source’s mouth, as it were, gives organisations the opportunity to highlight where problems may exist. Users can simultaneously analyse and clean data, which means BI projects don’t need to be hamstrung by dodgy data.
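As a rough illustration of analysing and cleaning in the same pass, the sketch below totals amounts by region while recording every row it could not use. The function name analyse_with_flags, the field names and the two quality checks are hypothetical; a real platform would have its own rules.

```python
from collections import defaultdict


def analyse_with_flags(rows):
    """Aggregate usable rows and report the rest instead of failing."""
    totals = defaultdict(float)
    issues = []
    for row in rows:
        # Normalise the region; a missing value becomes an empty string.
        region = (row.get("region") or "").strip().upper()
        if not region:
            issues.append((row["id"], "missing region"))
            continue
        try:
            value = float(row.get("amount"))
        except (TypeError, ValueError):
            issues.append((row["id"], "unparseable amount"))
            continue
        totals[region] += value
    return dict(totals), issues


sample = [
    {"id": 1, "region": "north", "amount": "100"},
    {"id": 2, "region": " North ", "amount": "n/a"},
    {"id": 3, "region": None, "amount": "40"},
    {"id": 4, "region": "south", "amount": "60.5"},
]

region_totals, problems = analyse_with_flags(sample)
print(region_totals)
print(problems)
```

The report is produced from the rows that are usable, and the list of problems tells the data owner exactly which source records need attention.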

Integration improves governance

Disjointed data preparation processes also suffer from poor data governance, since a measure of self-service is involved.

Usually, it leads to a proliferation of ad hoc data sets across the organisation (one for each analytics project), which increases the cost and risk of data preparation, reduces its reliability, and limits scalability, security and governance.

With integrated, virtualised data preparation, by contrast, data doesn’t leave the source application. IT can maintain data governance and control, and data analysts can integrate more data sources directly into the BI environment in less time. An important technical requirement for this is a comprehensive metadata layer in the BI platform, with the data preparation process executed at the metadata level.

With this in place, any changes made will be uniformly reflected across all content based on that metadata layer throughout the enterprise – from reports and charts to dashboards.
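The effect of a shared metadata layer can be sketched in a few lines. Here the "layer" is simply a dictionary of field definitions that two pieces of content both look up by name; METADATA, net_revenue, report_total and dashboard_average are all hypothetical names chosen for the example.

```python
# One shared definition per business field (the illustrative "metadata layer").
METADATA = {
    "net_revenue": lambda row: row["gross"] - row["discount"],
}

ROWS = [
    {"gross": 100, "discount": 10, "refund": 5},
    {"gross": 200, "discount": 20, "refund": 0},
]


def report_total(rows, field):
    """A report that sums a field defined in the metadata layer."""
    return sum(METADATA[field](row) for row in rows)


def dashboard_average(rows, field):
    """A dashboard tile that averages the same field."""
    return sum(METADATA[field](row) for row in rows) / len(rows)


before = (report_total(ROWS, "net_revenue"), dashboard_average(ROWS, "net_revenue"))

# Change the definition once, in one place: refunds now reduce net revenue.
METADATA["net_revenue"] = lambda row: row["gross"] - row["discount"] - row["refund"]

after = (report_total(ROWS, "net_revenue"), dashboard_average(ROWS, "net_revenue"))

print(before)
print(after)
```

Neither the report nor the dashboard was edited, yet both pick up the new definition, which is the behaviour the metadata layer is meant to guarantee.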


Going from data preparation to decision-making all in one solution allows organisations to maintain data lineage, visibility and control.

The upshot is greater organisational trust in the validity of data and accuracy of its decision-making.
