Thursday, June 25, 2009

BI and the Cloud

Wayne Eckerson from The Data Warehousing Institute (TDWI) has an interesting post about Implementing BI in the Cloud. He mentions that BI in the Cloud faces four constraints:

1) Customization or application fit
2) Ongoing cost of transferring data to the Cloud
3) Data Security
4) Vendor viability

Wayne wraps up his post with the following conclusion:
BI for SaaS offers a lot of promise to reduce costs and speed deployment but only for companies whose requirements are suitable to cloud-based computing. Today, these are companies that have limited or no available IT resources, little capital to spend on building compute-based or software capabilities inhouse, and whose BI applications don’t require significant, continuous transfers of data from source systems to the cloud.
I tend to agree with the following high level thoughts:

1) BI on the Cloud is not for everybody (yet)
2) Due diligence is necessary to reduce risks on data security and vendor viability

But Wayne's post raised several questions in my mind:

Integration. There are several ways to customize an application. For example, a multi-tenant architecture like Salesforce.com offers endless possibilities to customize and extend every single instance. Are these customizations unprofitable? No, they are not: they are part of the application and do not require changes to the underlying code. Can the same level of customization apply to BI? Absolutely. Wayne briefly mentions Platform as a Service, but his focus is on custom application development (although his chart shows "DW as a Service"). An intriguing approach to offering a BI platform as a service would be to set up something like MicroStrategy and configure it in a multi-tenant fashion. The underlying IaaS layer would support the data repository, while the top SaaS layer could support ad-hoc reporting, vertical applications or full customizations on top of the platform's API. Would this be unprofitable? Not at all. Wayne makes another good point regarding integration:
So, unless the SaaS vendor supports a broad range of integrated functional applications, it’s hard to justify purchasing any SaaS application.
But in my experience, successful enterprise-wide deployments need to focus on integrating subject areas at the data level. This is an architecture and design challenge. A well-integrated data repository will support integrated functional applications seamlessly. It is about the underlying data, not only the application.

Ongoing Data Transfer Costs. Is this really a significant constraint? How much data does the typical Data Warehouse have to incorporate on a daily basis? The cost to transfer data to Google's App Engine is $0.10 per GB. Moving a TB a day would cost around $3,000 per month (I'm not suggesting using BigTable as a DWH repository yet). As Data Warehouse costs go, this does not seem unreasonable. Amazon is running a promotion right now that would bring that cost down to $1,000; hardly a deal breaker. Latency and complexity can complicate this data transfer. This is to be expected because 99% of existing systems were not designed with the Cloud in mind. Which brings me to my final point.
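The arithmetic behind those figures can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the $0.10 per GB rate and the $1,000 monthly figure come from the paragraph above, while the function names, the 1 TB = 1,000 GB conversion and the 30-day month are my own simplifying assumptions.

```python
# Back-of-the-envelope estimate of monthly data transfer cost.
# Assumptions (not from any vendor price list): 1 TB = 1,000 GB, 30-day month.

def monthly_transfer_cost(gb_per_day: float, price_per_gb: float, days: int = 30) -> float:
    """Cost of moving `gb_per_day` gigabytes every day for `days` days."""
    return gb_per_day * price_per_gb * days


def implied_price_per_gb(monthly_cost: float, gb_per_day: float, days: int = 30) -> float:
    """Per-GB rate implied by a monthly bill at a given daily volume."""
    return monthly_cost / (gb_per_day * days)


if __name__ == "__main__":
    tb_per_day_in_gb = 1_000  # one terabyte a day, expressed in GB

    cost = monthly_transfer_cost(tb_per_day_in_gb, price_per_gb=0.10)
    print(f"1 TB/day at $0.10/GB: about ${cost:,.0f} per month")

    # The $1,000 promotional figure implies a rate; it is derived here,
    # not a quoted price.
    rate = implied_price_per_gb(1_000, tb_per_day_in_gb)
    print(f"$1,000/month for 1 TB/day implies about ${rate:.3f} per GB")
```

Plugging in your own daily load volume is the quickest way to see whether transfer cost is really a constraint for your warehouse.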

I mentioned using MicroStrategy as a BI platform on the cloud as an example to make a point. I believe that successful Cloud applications need to do more than just clone their on-premise counterparts. They need to leverage the Cloud's inherent qualities, for example elastic computing power. The nature of the Cloud can enable ongoing ETL: receive a copy of the transaction on the fly via a web hook, then cleanse, transform and aggregate in real time, or at least a few times a day. How about MapReduce? I think this technique will allow us to create more powerful analyses over more data, faster and more easily.
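To make the ongoing ETL idea a bit more concrete, here is a minimal single-process sketch in Python. It is not a real web hook receiver or a distributed MapReduce job: the event fields ("region", "amount") and every function and class name are hypothetical, and the map and reduce steps only illustrate the pattern that a framework would spread across many machines.

```python
from functools import reduce


def cleanse(raw_event):
    """Normalize one raw transaction; return None if it is unusable."""
    try:
        region = str(raw_event["region"]).strip().upper()
        amount = float(raw_event["amount"])
    except (KeyError, TypeError, ValueError):
        return None
    if not region:
        return None
    return {"region": region, "amount": amount}


def map_event(event):
    """Map step: emit a (key, value) pair for one cleansed event."""
    return (event["region"], event["amount"])


def reduce_pairs(totals, pair):
    """Reduce step: fold one (key, value) pair into the running totals."""
    key, value = pair
    totals[key] = totals.get(key, 0.0) + value
    return totals


class RunningAggregate:
    """Keeps totals current as events arrive, e.g. from a web hook handler."""

    def __init__(self):
        self.totals = {}

    def on_event(self, raw_event):
        clean = cleanse(raw_event)
        if clean is not None:
            reduce_pairs(self.totals, map_event(clean))


def batch_totals(raw_events):
    """Same aggregation run as a periodic batch over a list of events."""
    cleaned = (c for c in map(cleanse, raw_events) if c is not None)
    return reduce(reduce_pairs, map(map_event, cleaned), {})


if __name__ == "__main__":
    events = [
        {"region": " east ", "amount": "10.5"},
        {"region": "EAST", "amount": 4.5},
        {"region": "west", "amount": "2"},
        {"amount": 3},                        # dropped: no region
        {"region": "west", "amount": "n/a"},  # dropped: bad amount
    ]
    stream = RunningAggregate()
    for e in events:
        stream.on_event(e)
    print(stream.totals)
    print(batch_totals(events))
```

The same cleanse, map and reduce functions serve both the event-at-a-time path and the batch path, which is the point: whether you aggregate in real time or a few times a day becomes a deployment decision rather than a redesign.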

Rigid applications built with yesterday's patterns will struggle to survive, in the Cloud or elsewhere. The Cloud is an open environment by definition; its openness will facilitate the integration of multiple data sources from inside and outside the corporate firewall. This integration will support a next generation of cross-functional applications. Bandwidth and storage costs continue to drop rapidly and will cease to be a major consideration in the near future. New design principles (e.g. scale out vs. scale up) will enable more sophisticated analysis over ever larger datasets (Google analyzes over a petabyte of data every day). With over $1B in sales, Salesforce.com is the most successful SaaS provider. It hosts more than 55k customers and well over 1M users, and executes more than 30M lines of customer code every day. If they can do it, I'm convinced the next BI leader in the Cloud will do it as well. That is how I see it.



1 comment:

  1. Hi Manuel,

    Nice writeup. Thanks for the info about the data transfer pricing. You're right: that and latency will become less of an issue over time.

    One clarification: To me, MSTR is a platform for building "applications", not an application itself. You use MSTR to build reports for a particular domain, like pipeline analysis. These reports are, in essence, applications in the BI world. SaaS BI vendors like LucidEra offered this.

    I consider other BI SaaS tools, like PivotLink (with whom I did a webinar yesterday), as platform service providers (PaaS). They let you build a BI application in the cloud for the cloud. In some respects, they are just like a BI consultancy that will help you build a data mart, create ETL maps, and create the reports. The only difference is that they do it in the cloud with their own tools.

    Is this better than doing the same thing on premise? I guess if all your data is on premise and you have an IT team and HW infrastructure in place, then building BI apps in the cloud isn't necessary. In fact, it will probably become another analytic silo, a renegade spreadmart if you will. But if you don't have IT resources, then building a BI infrastructure and applications in the cloud might make sense. Unless all your business runs in the cloud, you really need to create an architectural roadmap to show how your on-premise and cloud-based resources are going to work together and evolve.
