January 18, 2019

What if anyone could do data prep?


What if everything you ever wanted to do in terms of enriching, transforming and merging (or, in current Gartner parlance, ‘wrangling’) disparately sourced data could be distilled down to just 7 functions? Is this an optimistic question? Well, when you survey the established players in the Data Wrangling space, it unfortunately appears so, or at least it seems that way.

A couple of leading Data Wrangling vendors use what I call a ‘node canvas’ approach to query assembly in the UI: the user builds a ‘data flow’ from multiple inputs to multiple outputs through a series of enriching and transforming operations, all via a fairly intuitive drag-and-drop interface. I’ve been struck, however, by just how technical these products are, with vast nested libraries of data treatment operators to learn. They are all very powerful, I’m sure, but they require a technical persona to operate.

What’s also noteworthy is how much interest there is in this space; Gartner has it pegged as a key emerging tech trend. In my view, that interest is most likely driven by the clear ‘data wrangling deficit’ in leading self-service BI products. This has opened up a gap of opportunity: take disparate data and ‘wrangle’ it into end-user-oriented outputs for consumption in self-service BI. However, the current leading products are technical in nature, and this doubtless increases the design, deployment, support and training burden when they are implemented.

So I ask again: what if everything you ever need to do in Data Wrangling could be distilled down to just these 7 functions for use on your node canvas?

  • Sort/Group/Sum
  • Filter
  • Join
  • Append
  • Cross tab
  • Normalise (pivot)
  • Calculated fields
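To make the list a little more concrete, here is a minimal sketch of the seven functions as plain Python operations over rows held as lists of dictionaries, chained the way nodes might be wired on a canvas. The sample tables, column names and function names are all hypothetical illustrations, not any vendor's API. ‘Normalise’ is read here as turning columns back into rows (the inverse of the cross tab); that reading is an assumption about the intended meaning.

```python
from collections import defaultdict

# Hypothetical sample inputs: two regional sales extracts and a region lookup.
SALES_NORTH = [
    {"region": "N", "product": "A", "qtr": "Q1", "amount": 100},
    {"region": "N", "product": "B", "qtr": "Q1", "amount": 50},
    {"region": "N", "product": "A", "qtr": "Q2", "amount": 70},
]
SALES_SOUTH = [
    {"region": "S", "product": "A", "qtr": "Q1", "amount": 30},
    {"region": "S", "product": "B", "qtr": "Q2", "amount": 0},
]
REGIONS = [{"region": "N", "manager": "Kim"}, {"region": "S", "manager": "Lee"}]


def append(*tables):
    """Append: stack tables that share the same columns, one after another."""
    return [dict(row) for table in tables for row in table]


def filter_rows(rows, predicate):
    """Filter: keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Join: inner join on a single key column, merging right-side columns in."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def calculate(rows, name, formula):
    """Calculated field: add a column computed from each row."""
    return [{**row, name: formula(row)} for row in rows]


def group_sum(rows, keys, value):
    """Sort/Group/Sum: total `value` per combination of `keys`, sorted by group."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return [{**dict(zip(keys, group)), value: total}
            for group, total in sorted(totals.items())]


def cross_tab(rows, row_key, col_key, value):
    """Cross tab: one output row per row_key, one column per distinct col_key."""
    cols = sorted({row[col_key] for row in rows})
    table = {}
    for row in rows:
        out = table.setdefault(row[row_key], {row_key: row[row_key], **{c: 0 for c in cols}})
        out[row[col_key]] += row[value]
    return [table[k] for k in sorted(table)]


def normalise(rows, id_key, name_key, value_key):
    """Normalise (unpivot): turn every non-id column back into a name/value row."""
    return [
        {id_key: row[id_key], name_key: col, value_key: val}
        for row in rows
        for col, val in row.items()
        if col != id_key
    ]


if __name__ == "__main__":
    sales = append(SALES_NORTH, SALES_SOUTH)                    # Append
    sales = filter_rows(sales, lambda r: r["amount"] > 0)       # Filter
    sales = join(sales, REGIONS, "region")                      # Join
    sales = calculate(sales, "amount_k", lambda r: r["amount"] / 1000)  # Calculated field
    print(group_sum(sales, ["manager"], "amount"))              # Sort/Group/Sum
    # [{'manager': 'Kim', 'amount': 220}, {'manager': 'Lee', 'amount': 30}]
    wide = cross_tab(sales, "product", "qtr", "amount")         # Cross tab
    print(wide)
    # [{'product': 'A', 'Q1': 130, 'Q2': 70}, {'product': 'B', 'Q1': 50, 'Q2': 0}]
    print(normalise(wide, "product", "qtr", "amount"))          # Normalise
```

Each function takes the previous step's output as its input, which is roughly what wiring one node to the next on a canvas amounts to; a real product would of course work over larger data sources rather than in-memory lists.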

I’ve been working full time in the data wrangling space for over a decade now and thinking back over all those queries I created – as well as those I’ve seen from colleagues, customers and partners – I can’t think of anything we couldn’t do using only the above.

To me it makes sense. There is real scope to simplify this space, and doing so would bring new non-technical users to the discipline: end users who understand their own data. There is an opportunity to establish a whole new usability paradigm within the data wrangling space.

This is just my take on it, though, and you may disagree. If you do, feel free to let me know which crucial component I’ve missed.

Alan Brown

Alan is a Product Manager in the Rocket CorVu lab.

4 Comments

  • Steve Hitchman

    October 22, 2014 at 8:16 pm

    Sounds like Alteryx

  • Scott

    October 23, 2014 at 2:13 am

    In order to be truly effective/efficient, you’ll want a formula tool in there as well for chopping/parsing concatenated fields, which are so common in data extracts.

  • Cari

    October 24, 2014 at 3:47 pm

    I think you are on to something here – and within a broad term like “wrangling,” the data quality piece can be categorized really nicely into the functions you listed (I would add finding/removing blanks/null, filling down/bulk editing, plus a few shaping things like de-pivoting, etc.) as long as the basic concept is that the machine is helping identify where quality issues are, without needing a human eye or lots of scripting to do the discovery.

    That said, the data prep category is so much more than just the quality piece. It should include the ability to reuse data prep steps on new data sets, as well as an “audit” of what was done so we don’t fight over how the data was prep’d. It should also allow people to work together on a data prep project. Many times sales/marketing ops prepare data differently and then get different visualization/analytic results because of some step taken in the data prep process.

    Finally – and this is a biggie: the data prep process should assume that you never have all the data in one place. You need machine-enhanced data integration and enrichment to bring together as many data sets as needed without having a human requirement to understand how best to do the joins or combines.

  • Announcing Rocket Discover: Business Intelligence for business users

    April 30, 2015 at 1:31 pm

    […] transformation, merge operations to just seven functions. To learn more, read Alan Brown’s post “What if anyone could do data prep?” for the […]
