What if anyone could do data prep?
What if everything you ever wanted to do in terms of enriching, transforming and merging (or, in current Gartner parlance, ‘wrangling’) disparately sourced data could be distilled down to just 7 functions? Is that an optimistic question? When you survey the established players in the Data Wrangling space, it unfortunately appears so.
A couple of leading Data Wrangling vendors use what I call a ‘node canvas’ approach to query assembly in the UI: the user establishes the ‘data flow’ from multiple inputs to multiple outputs, with a series of enriching and transforming operations in between, all via a fairly intuitive drag-and-drop methodology. I’ve been struck, however, by just how technical these products are, with vast nested libraries of data treatment operators to get to know. All very powerful, I’m sure, but it takes a technical persona to operate them.
What’s also noteworthy is how much interest there is in this space. Gartner has it pegged as a key emerging tech trend, and that interest is most likely driven (in my view) by the clear ‘data wrangling deficit’ apparent in leading self-serve BI products. This has opened up an opportunity: take disparate data and ‘wrangle’ it into end-user-oriented outputs for consumption in self-service BI. However, the current leading products are technical in nature, and this doubtless increases the design, deployment, support and training burden when they are implemented.
So I ask again – What if everything you ever need to do in Data Wrangling could be distilled down to just these 7 functions for use on your node canvas?
- Cross tab
- Normalise (pivot)
- Calculated fields
I’ve been working full time in the data wrangling space for over a decade now and thinking back over all those queries I created – as well as those I’ve seen from colleagues, customers and partners – I can’t think of anything we couldn’t do using only the above.
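To make the functions named in the list a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the sample data, the function names (`calculated_field`, `cross_tab`, `normalise`) and my reading of ‘normalise’ as turning a wide grid back into one row per value. It is not how any particular wrangling product implements these operations.

```python
from collections import defaultdict

# Hypothetical sample data: one row per sale.
sales = [
    {"region": "North", "quarter": "Q1", "units": 10, "price": 2.5},
    {"region": "North", "quarter": "Q2", "units": 4, "price": 2.5},
    {"region": "South", "quarter": "Q1", "units": 7, "price": 3.0},
    {"region": "North", "quarter": "Q1", "units": 5, "price": 2.5},
]


def calculated_field(rows, name, fn):
    """Return new rows with an extra column computed from each row."""
    return [{**row, name: fn(row)} for row in rows]


def cross_tab(rows, row_key, col_key, value_key):
    """Sum value_key into a row_key x col_key grid (long to wide)."""
    grid = defaultdict(lambda: defaultdict(float))
    for row in rows:
        grid[row[row_key]][row[col_key]] += row[value_key]
    return {r: dict(cols) for r, cols in grid.items()}


def normalise(table, row_key, col_key, value_key):
    """Turn a wide grid back into one row per cell (wide to long)."""
    return [
        {row_key: r, col_key: c, value_key: v}
        for r, cols in table.items()
        for c, v in cols.items()
    ]


# Chain the three operations, much as nodes would be chained on a canvas.
with_revenue = calculated_field(sales, "revenue", lambda r: r["units"] * r["price"])
wide = cross_tab(with_revenue, "region", "quarter", "revenue")
long = normalise(wide, "region", "quarter", "revenue")
```

Each function takes rows in and hands rows (or a grid) out, so the output of one step can feed the next. That is roughly the property a node canvas relies on.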
To me it makes sense. There is real scope to simplify this space, and doing so will bring new non-technical users to the discipline: end users who understand their own data. There is an opportunity to establish a whole new usability paradigm within the data wrangling space.
This is just my take on it though, and you may disagree. If you do, feel free to let me know what crucial component I’ve missed out.