Part 1 of this discussion covered what makes a data scientist, and part 2 covered the critical attributes of a data scientist. This is the third blog on the topic, written as we learn more and mature in our understanding of what works and what to avoid.
Most corporations continue to be on the hunt for good data scientists, the modern ‘unicorns’ of Big Data. Some people have not heard of them, others claim to have read about them in articles, others seem to have seen them at Google, LinkedIn and Facebook, while some have actually worked with them. Either way, the internet is filled with ‘Seeking Data Scientist’ discussions. The reason is that data science is the most sought-after skill in the new data-driven digital economy, where companies like Google and Amazon are the new digital alchemists, turning data into gold. Alchemy today is no longer a chemical concept but a digital reality.
One of the biggest changes we have recently seen in LinkedIn profiles is the addition of the words ‘Data Scientist’. Suddenly we are finding a bunch of members who morphed into data scientists overnight simply by adding the title to their profile. It is a demand-supply reality. With current demand exceptionally high and salaries anywhere from 30% to 100% above the norm, it automatically becomes beneficial to attach the tag of ‘data scientist’ to your profile in order to get a bunch of emails from the other side of the world asking if you are looking for a job, often from recruiters who may themselves not understand the basics of data science. So the first task is to filter out the nouveau data scientists from the true ones. True data scientists are thus becoming not only rarer but also mysterious creatures on the periphery of our needs. And because everyone wants to compete in the new data science economy and mine their own gold, this is a rarefied but still crowded arena where employers have to tread very carefully.
Big Data has exactly the same issues that the world of BI had in the past. In BI we had the technical folks who believed, and still believe, that technology can answer all the questions. Gartner places the success rate of their methodology at 30%. We also had our business value architects, who sought business benefits above all, the approach recommended by Gartner since 2009. Similarly, data scientists also come in two varieties.
The first and most common variety are the technocrats: wizards of statistics, math and Big Data, as described in my earlier blog on the critical attributes of a data scientist, but with little actual experience of the business or its needs. This group understands math, statistics and data but still needs experience in applied sciences. They currently lack the business benefit and end-user UI parts of data science.
The second is the Business Solution Architect: people who understand the business and its needs along with BI, data science and Big Data technology. They become the critical glue between (a) the business stakeholders, with their needs and benefits, and (b) the technical data scientists, who are akin to the BI developers of the past. The latter can build all the algorithms and code, but lack adequate experience turning data into actionable gold.
Big Data projects are like cavernous black holes, with so much data that traditional modelers could easily hyperventilate just grasping the magnitude of the Volume, Variety and Velocity of these domains. Traditional SAP BW developers are used to a few terabytes of data (the largest SAP BW system I have personally worked with was a 107 TB single instance), whereas the world of Big Data dwells in petabytes and zettabytes. The NSA datacenter is starting to use words like yottabytes today. This is like comparing a tablespoon of water with a swimming pool. In this environment the data scientist is envisioned as a wizard of the data mists. Where everybody else sees a humongous, senseless, misty cloud of data, the data scientist waves an algorithmic wand, combines it with statistical chants and, presto, the mist clears and you get a glimpse of true business benefits. Without a data scientist we could be lost in the misty black hole of Big Data forever. However, without a Business Value Architect your projects could produce a lot of islands of information that your business does not need, a fate that Gartner’s reports put at 70%.
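To put the storage scales mentioned above in perspective, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where each step from terabyte to petabyte and so on is a factor of 1,000, and it uses the 107 TB SAP BW instance from the text as the yardstick. The function name `instances_per_unit` and the choice to compare against exactly one petabyte, zettabyte and yottabyte are illustrative assumptions, not measurements of any real system.

```python
# Decimal (SI) storage units expressed in bytes. Assumption: SI prefixes,
# not the binary (1024-based) variants.
UNITS = {
    "TB": 10**12,
    "PB": 10**15,
    "ZB": 10**21,
    "YB": 10**24,
}

def instances_per_unit(unit, instance_tb=107):
    """Return how many instances of `instance_tb` terabytes fit in one `unit`."""
    return UNITS[unit] / (instance_tb * UNITS["TB"])

# Compare the 107 TB instance from the text against one of each larger unit.
for unit in ("PB", "ZB", "YB"):
    print(f"1 {unit} holds about {instances_per_unit(unit):,.0f} instances of 107 TB")
```

Under these assumptions a single zettabyte holds roughly nine million copies of that 107 TB system, which is the kind of gap the tablespoon and swimming pool comparison is pointing at.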
Over the last three years of working with a lot of data scientists, we have come up with three definitions of the data scientist. The first is the Silicon Valley data scientist: this is where some of the world’s greatest data scientists are reputed to exist, in companies like Palantir, Google, Facebook, Target, etc. The second is the technocratic data scientist: these are analysts with degrees in math and statistics and all the algorithmic knowledge under their belt. The third is the Business Benefit data scientist: these, in our minds, are the true unicorns. They are people who have solid knowledge of BI, KPIs and metrics, but more importantly have dealt with very large volumes of data and understand the business benefit side of data optimization more than the technology variants. They are the data scientists who sit at the cusp between the second group and the business stakeholders. Our research indicates that data scientists and technology leaders alone cannot deliver high business value, and neither can the Business Benefit experts alone. This is the time for co-innovation and teaming to consistently deliver exceptional value, in the form of digital gold, to your information consumers.
When we search for what the world is looking for, we see the following rankings in a word cloud.
This word cloud is made by analyzing recent job postings and what companies are looking for in data scientists. What I see clearly missing, though we see a glimpse of it at the top, is Business Benefits. As I have been writing since 2009, there are two ways to build BI solutions: [1] the technocratic way, with a firm belief that technology alone can answer all business questions; and [2] business-benefit-focused architecture and design, leveraging internal company resources who know the needs, the data and the business relevance in order to research, discover, model, filter and analyze.
There are far more global resources who know Hadoop, ABAP, SQL and Python than there are people who understand your enterprise and your company’s business. There are a few people who have worked on the Business Benefit side of BI. There are fewer true data scientists, and they still remain a rarity. The key is in teaming up people who understand your business goals, can handle Big Data volumes, and can help build your enterprise’s competitive differentiators: your true gold from data.
So what we are left with is co-innovation. Find partners that can build the critical glue for strategic success by creating a team of your internal business stakeholders, combined with technical Business Value experts and IT technical resources whose precise skills fit the roles identified to meet business expectations. Team members with such diverse skills will inspire one another and enrich the overall capabilities of the team through discovery, design and realization, bringing new capabilities and insights that deliver true business benefits to the enterprise decision machine.
What we say to SAP HANA customers is the following:
- The strategic goal of HANA is big-data analytics; plan accordingly
- HANA is a business solution and not just another SAP technical install
- Without business in business intelligence, BI is dead (Gartner 2010)
- Do not start your HANA journey without a professional ‘Road to HANA’ workshop
What we say to Big Data customers (Big Data being the strategic goal of all HANA customers) is the following:
- Big Data is a business solution and not just another technical install
- Plan your work and only then work your plan
- Do not start your Big Data journey without first identifying business goals and benefits. Think about security very carefully
- Design your Tactical, Mid-Term and Strategic Goals, then align every step to the long-term goal