Jun 4, 2015

7 Tips to Prevent Meltdown of Big-Data Projects

Big-Data projects need to commence on a foundation of business benefit and business value. More than one Big-Data project has commenced as a technology installation, followed by collecting all the data, and only then proceeded to the first real step: let's see what we can extract out of this baby. Our motto with SAP HANA and with all Big-Data projects is 'These are strategic business-benefit programs and not simply a technical install'. Here are 7 tips on how to avoid some of the pitfalls in Big-Data projects.

  1. Security: Pause to plan for security before jumping headlong into Big Data and IoT projects.
  2. Business Value Focus: The same data can yield very different answers, and unless you know what the end result needs to be you might as well be collecting grains of sand on a beach. If your business needs a 4-seater aircraft, don't try to acquire and build yourself a Dreamliner hangar. Lack of clear business-benefit focus is the #1 reason Big-Data projects tend to go south.
  3. Different sources, different quality: As we source data from heterogeneous systems, we need a systematic approach to harmonizing and transforming master and transactional data. Factor time granularity into this equation, e.g. monthly data from one source and hourly data from another.
  4. Data Quality & Cleansing: When data is streaming at different velocities, volumes and varieties, and real-time transformation is being undertaken in-flight, a small error can produce massively different results.
  5. Customizing and Algorithms: Big data needs patterns, and patterns come from algorithms. Take the well-known LDA (Latent Dirichlet Allocation), used to discover the topics running through collections of text. According to reports it is only about 90% accurate, and its repeat accuracy is only around 80%. The reason for that is its simplicity. The takeaway is that we must know about this margin of error going in.
  6. Data Models and flows differ: If we give the same big data to five data developers and ask them a single question, we will most probably get five different answers. Like Mother Nature, data is a continuum. Some of the data is very precise; other data is a human quantification of processes and events. After that we filter the data, model it, grid it and then apply special subsets to analyze it. This area goes right back to point #2 as the fulcrum of Big-Data analytics: what is it you are looking for? The surgical focus on defining and delivering business benefit.
  7. Simplicity vs. Complexity: The goal is simplicity of data access, which must often be achieved through complexity in the underlying design. Leonardo da Vinci is reported to have stated that "under everything that looks simple, there is an underlying web of complexities". Models too can be simple or complex. When models are too complex they can create unnecessary noise; when too simple they can miss critical data points. Choosing the right model is one of the biggest challenges. The big thing in Big Data is the sheer volume of data, so encourage different models, algorithms, outcomes and views. Get out of the self-blindness zone. You need your assumptions to fail to find the truth. If everything looks right the first time, check again.
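To make tip #3 concrete, here is a minimal Python sketch (the readings and source names are invented for illustration) of rolling hourly data up to a monthly grain so that two sources with different time granularities can be compared like-for-like:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly meter readings (timestamp, kWh) from one source.
hourly = [
    ("2015-01-01 00:00", 1.2),
    ("2015-01-01 01:00", 0.8),
    ("2015-02-01 00:00", 2.0),
]

# Hypothetical monthly billing totals from another source.
monthly_billing = {"2015-01": 2.0, "2015-02": 2.0}

def rollup_to_month(rows):
    """Aggregate fine-grained (hourly) readings up to the coarser monthly grain."""
    totals = defaultdict(float)
    for ts, value in rows:
        month = datetime.strptime(ts, "%Y-%m-%d %H:%M").strftime("%Y-%m")
        totals[month] += value
    return dict(totals)

rolled = rollup_to_month(hourly)

# Now both sources sit at a common (monthly) grain and can be reconciled.
for month, billed in sorted(monthly_billing.items()):
    metered = rolled.get(month, 0.0)
    status = "match" if abs(billed - metered) < 0.01 else "MISMATCH"
    print(month, "billed:", billed, "metered:", metered, status)
```

The point is simply that comparisons must happen at the coarsest common grain; rolling up, rather than trying to invent hourly detail for monthly data, is usually the safe direction.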
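For tip #4, a toy illustration of how a single bad record (here, a hypothetical unit mismatch, kilometres arriving as metres) can wreck an in-flight aggregate, plus the kind of simple range check that catches it before it reaches the result:

```python
# Five clean distance readings in kilometres (invented data).
clean = [1.2, 1.1, 1.3, 1.15, 1.25]
# The same stream with one record that arrived in metres by mistake.
dirty = clean + [1200.0]

avg_clean = sum(clean) / len(clean)   # 1.2 km
avg_dirty = sum(dirty) / len(dirty)   # the average explodes to ~201 km
print(round(avg_clean, 2), round(avg_dirty, 2))

def validate(rows, lo=0.0, hi=100.0):
    """Split rows into plausible and out-of-range values before aggregating."""
    good = [r for r in rows if lo <= r <= hi]
    bad = [r for r in rows if not (lo <= r <= hi)]
    return good, bad

good, bad = validate(dirty)
print("rejected:", bad)  # the 1200.0 outlier is quarantined, not averaged
```

One record out of six moved the average by two orders of magnitude, which is exactly the "small error, massively different results" risk when transformations run in real time.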
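And for tip #7, a small sketch of the simple-versus-complex trade-off: a too-simple model (a single mean) smooths away noise but also any real trend, while a too-complex one (a polynomial passing through every noisy point exactly, via Lagrange interpolation) reproduces the noise and swings wildly between points. The observations are invented for illustration:

```python
import statistics

# Noisy observations of a roughly flat signal (hypothetical sensor data).
xs = [0, 1, 2, 3, 4]
ys = [5.0, 5.4, 4.8, 5.2, 5.1]

# Too-simple model: a single constant, the mean of all observations.
simple = statistics.mean(ys)  # 5.1

# Too-complex model: the degree-4 polynomial through every point exactly.
def lagrange(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# On the training points the complex model looks "perfect" (zero error)...
print(round(lagrange(2), 2))
# ...but between/beyond points it swings far outside the observed range,
# while the simple model stays put.
print(round(simple, 2), round(lagrange(4.5), 2))
```

Neither extreme is the right model; the exercise is only meant to show why comparing several candidate models, as the tip suggests, beats trusting the first one that fits.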
