I wrote my initial data scientist (DS) attributes blog in Jan 2013, which I call DS version 1. This blog is version 2, in which we share our experience and review what has changed over the last year and a half.
Note added on Aug 14th: based on the recent ASUG poll of SAP users on HANA, here are some additional items I would like to add.
1. Make sure your data scientist does not pursue a technocratic solution in total isolation from business needs, business benefits or business inclusion.
2. Your data scientist must have a Business Value Architect background with extensive business interaction, or you could end up with a model that is technically perfect but has little business relevance.
3. Your perfect data scientist is one who can model a business benefit, has extensive experience modeling to meet business expectations, has extensive experience co-innovating with business users, and has a pure business benefit focus (I can't say this often enough, for it seems to simply float by technocratic ears).
Take Chris Farrell, who spent five years mining data from a giant particle accelerator and now spends his time analyzing ratings from Yelp (WSJ 8/9/14). The job title of Data Scientist barely existed three years ago and today has become one of the hottest in the hi-tech market. Retailers, banks, heavy equipment manufacturers, ministries of defense, armed forces, match-makers, Netflix, and almost everyone we talk to today wants to extract the diamonds, i.e. EDGE data ERI (extraction, reduction → interpretation), from the great explosion of data that surrounds us.
|Your big data vortex|
This explosion of data is like a Data Vortex consisting of exabytes and yottabytes of free-flowing data: a vortex that never touches the ground, is not destructive, and that most humans are not even aware of even though they are completely surrounded by it. It's a vortex that we first need to visualize, and then place a device into its stream to get direct access to the nuggets that are important to us. In our image the cloud above is the data that surrounds us, and the slim vortex represents the data that we need to extract and interpret into real-time decision enhancers. This Big Data is disruptively beneficial and needs EDGE thinking for the data scientist to have even a remote chance of success.
At a very shallow level this data growth is being propelled by your smartphone, your likes and dislikes on Facebook or Twitter, internet clicks, upstream appliances, machines and devices, downstream appliances, machines and devices, the media, blogs like this one, the blogosphere and the world-wide web itself.
According to Jonathan Goldman, who ran LinkedIn's data science team (WSJ 8/11), good data scientists are already being referred to as ‘Unicorns’ due to their extreme rarity. As the field matures, we realize that the combination of skills required is so rare that these people are worth their weight in potential value explosion (way beyond platinum).
Data scientists need to have more than technical intelligence. Ideal candidates must possess a surgical passion that goes beyond traditional market-research skills. They need a business benefit focus, and must then use that as a foundation to identify patterns in millions of available data elements from different data sources and infer how those patterns interact, before building statistical models that pinpoint the desired triggers.
Just as an example (actual business use), a biostatistician who earlier spent years mining medical records to identify patterns for early identification of breast cancer now writes statistical models to figure out the terms people use when they search the Etsy website for a new fashion they saw on the street. At Square, the new e-payment application, a cognitive scientist with a Ph.D. who built statistical models to identify triggers for how people change political affiliations is today working on identifying which customers are more likely to have clients who will demand their money back. Another Ph.D. at Yelp, who worked on genetic mapping, is today building models to measure the effect on consumers when multiple small changes are made to online advertisements.
The key in all this big data analysis is the ability to make and measure small changes that cumulatively have very big impacts.
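To make the arithmetic behind that claim concrete, here is a minimal sketch of how small changes can add up. It rests on a loud assumption that is not from the examples above: each change is independent and scales the metric multiplicatively. The function name `cumulative_uplift` and the 2% figures are hypothetical, chosen only for illustration.

```python
from math import prod


def cumulative_uplift(relative_changes):
    """Combine relative changes into one overall relative change.

    Assumption (illustrative only): each change is independent and
    multiplies the metric, so a +2% change scales it by 1.02.
    """
    return prod(1.0 + change for change in relative_changes) - 1.0


# Hypothetical scenario: ten separate tweaks to an online advertisement,
# each measured to improve the response rate by 2%.
small_changes = [0.02] * 10
total = cumulative_uplift(small_changes)
print(f"Combined uplift from ten 2% changes: {total:.1%}")
```

Under these assumptions, ten changes of 2% each compound to roughly 21.9% overall, a little more than the 20% a simple sum would suggest. Real changes often interact with each other, which is exactly why each one has to be measured rather than assumed.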
On the negative side, Facebook and dating sites are being accused of making small changes (data manipulation and alterations) to increase connections and the likelihood of two people getting together, so that these consumers use the service more frequently.
While a six-figure salary is becoming rather common in Silicon Valley, anyone with ‘Data Scientist’ skills and just two years of data experience can easily earn between $200,000 and $300,000 per year. Anyone with Data Scientist on their LinkedIn resume can expect to get around 100 or so emails a day from potential recruiters.
This scarcity is muddied by some potential candidates using ‘Data Scientist’ inappropriately, either out of ignorance or by intent. However, once the spin is isolated we are left with pure experience and skills. Just as an example, LinkedIn today has around 24,000 to 36,000 positions open for data scientists. In 2012 there were approximately 2,500 doctoral degrees awarded in biostatistics, statistics, particle physics and computer science, all of which are on the trajectory to finding your data scientist. Over the last year many universities have launched certificate and master's programs in data science to fill this new demand-supply gap. Taking it one step further, a new program close to Stanford University, the Insight Data Science Fellows program, takes doctoral candidates and funnels them into data science programs. This program is funded by nearby tech companies and has a 100% placement record.
Statisticians and data scientists who five years ago would have gone into banking or become Wall Street quants now feel this new pull for their skills from companies as diverse as Airbnb, Palantir, Jawbone, Capital One Finance Group, the New York Times and a host of others that plan to build applications to meet new demands in this world of micro-segmentation and B2P direct solutions. The biggest apps are the ones that are consumer facing. We all need to go out and build applications that directly affect people's lives and make their day-to-day living a lot easier.