Jan 13, 2013

Data Scientist- Critical Attributes for BI - 1 (ver Jan 2013)

I have received many emails on this topic and met some people who claimed to be data scientists.

Here is an executive brief of what it takes to make one in the SAP BI environment.


Just like IT, BI and HANA, data science is not a technical-only skill. A data scientist keeps 'Meet Business Expectations' as the center of this universe.
Read the independent statements under Level 0, item 4, and align your thinking with them. This single focus could save your company millions of dollars.

Here is a brief of what you need to know and understand to become one, or to claim to be one.

LEVEL 0

1. Understanding that Business Intelligence has been and is about two factors. The first being business and the second intelligence.

2. Resonating with Gartner's 2010 comment: 'Without business in business intelligence, BI is dead'

3. Working on a 'Business First and Business Last' focus in all things business

4. Knowing that
   4.1 'Less than 50% of BI projects will meet business expectations' Gartner in 2003
   4.2 'Less than 30% of BI projects will meet business expectations by 2014' Gartner 2012
   4.3 '98% of BI projects declare success in week 1 of their go-live, yet less than 50% remain so by week 10' BI Valuenomics 2010

LEVEL A
1. Conceptual: As a data scientist, it is critical that your conceptual understanding of Business Intelligence is solid, not only from a technology standpoint but also from the business side of BI.
2. Technical: It is critical to understand the technology that one is dealing with, along with its capabilities and limitations. It is also critical to understand the alternatives for the business need.
3. Judgmental: Understand Data Architecture, TDQM, and the data impact from metadata to master data. Understand the various statistical methodologies and have the ability to build algorithms.

LEVEL B

SAP BI: Understand the fundamentals of SAP BI. This includes a deep understanding of SAP BW, BW Accelerator, BusinessObjects, BO Explorer and the recent HANA. It also includes understanding how each of these applications manages data, along with the strengths and weaknesses of each application area.
Global BI Architecture: Understanding Data Marts and Enterprise Data Warehouses with the singular purpose of building global FEDW environments (Federated Enterprise DWs) that allow global standards and 100% local independence.
Automated Modeling optimization: Just as most of us cannot solve a Rubik's Cube in any optimized time, we need to accept that we certainly cannot manually model a cube with 10 dimensions and 60 characteristics under any circumstance. Use automated modeling tools.
Data Flow optimization: A clear understanding of the impact of differential data flows, the maintenance of Data Quality within each data element, and the impact it has on the entire data warehouse.

LEVEL C

Basic Understanding:
0. Matching Algorithms: Research in this area won the 2012 Nobel Prize in economics. Stable matching, Optimal pairing, Incentive compatibility, one- and two-sided matching, medical markets, experimental evidence, market designs, etc.
1. Mathematical basics like an understanding of exponentials, logs, distribution types, continuous and random variations of data sources and elements
2. Econometrics and Modeling: The economics of language based system commands, descriptive statistics, Brownian motion in data, ARCH/GARCH modeling, Monte Carlo simulations, autoregressive modeling, etc.
3. Mean variance optimization: Quadratic optimization, Tracing out efficient frontiers, Covariance or combinations of portfolios, and other portfolio analytics
4. Textual data management: Extracting information from news and blogs, framework of textual data management, word count classifiers, vector distance classifiers, confusion matrix, accuracy, etc.
5. Bayesian modeling: joint probability administration, correlated default applications, Bayes net, Accounting fraud, etc.
6. Predictive modeling: Predicting growth in markets, products and services, Bass modeling, Peak growth calibration, artificial intelligence algorithms, organic growth modeling, etc.
7. Large data extractions: Discriminant analysis, Eigensystems, Factor analysis
8. Auction financial models: Auction methodology, Theory of auctions, Auction and bidder types, Optimization of bids, Discriminatory pricing, Collusion in auctions, Advertising by auctions, Next-price auctions, etc.
9. Network financial modeling: Graph theory, Strongly connected components, Shortest path algorithms, VC Web, Centrality, etc.
10. Financial Neural networks: Nonlinear regression, Perceptrons, Squashing functions, Feedback/backpropagation, neural nets, etc.
11. Mathematical Speculation: Gambling, Odds, Edge, Bookmakers, Kelly criterion, Entropy analysis, Casino games, day trading
12. Cluster analysis and Prediction trees: K-means clustering, Hierarchical clustering, Prediction classification and regression trees, etc.
13. Storage and speed in big data: Distributed computing with Hadoop, MapReduce concepts, Parallel processing engines, Prototyping, advanced language usage
14. Misc: Dynamic programming, Fourier analysis, artificial intelligence clustering, stable matching, optimal pairing, Incentive compatibility
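
To make one item from the list above concrete, here is a minimal sketch of stable matching using the Gale-Shapley deferred acceptance procedure. It is only an illustration, not a production design: it assumes both sides are the same size and that everyone ranks everyone on the other side. The residents, hospitals and preference lists are made-up examples.

```python
from collections import deque


def stable_matching(proposer_prefs, reviewer_prefs):
    """Gale-Shapley deferred acceptance. Returns {proposer: reviewer}.

    Assumes equal-sized sides and complete preference lists.
    """
    # rank[r][p] = position of proposer p in reviewer r's list (lower is better)
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in reviewer_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next reviewer index to try
    engaged_to = {}                               # reviewer -> current proposer
    free = deque(proposer_prefs)                  # proposers still unmatched

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # reviewer was unmatched
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # reviewer trades up
            free.append(current)
        else:
            free.append(p)                        # rejected, tries next choice

    return {p: r for r, p in engaged_to.items()}


# Hypothetical data: two residents and two hospitals.
residents = {"ana": ["city", "mercy"], "raj": ["city", "mercy"]}
hospitals = {"city": ["raj", "ana"], "mercy": ["ana", "raj"]}

matching = stable_matching(residents, hospitals)
print(matching)
```

Both residents prefer "city", but "city" prefers "raj", so "ana" ends up at "mercy". No resident and hospital would both rather have each other than their assigned partner, which is what makes the result stable.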

So welcome to the world of the data scientist in the new world of too much data and very little information.



17 comments:

  1. Excellent Hari!! Should we not consider the other needed traits are knowledge over Database Optimization techniques, expertise on Data Architectures too are important aspects of data Scientist?

    Shyam
