June 25, 2022

There’s No Such Thing as Unstructured Data

Unstructured data usually refers to information that either lacks a predefined data model or is not organized in a way that traditional software applications can easily access and interpret. Text-heavy documents such as journals and books are typical examples. Prevailing wisdom holds that 80 to 90 percent of the world’s data is unstructured.

My assertion is that all data has structure. Just because data is not stored in the fields and columns of a relational database and accessed with the Structured Query Language (SQL), which relational database vendors popularized over 40 years ago, does not mean that it lacks structure. In fact, data that is free of the rigid schemas dictated by relational database architectures is more flexible. Tagging data with markup languages such as XML and HTML provides enough information for modern software applications to read it and extract whatever a user needs. Whatever meaning is lost can be recovered by search engines that index non-relational data types and use semantic processors to infer meaning through sophisticated models.
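To make the point about markup concrete, here is a minimal sketch of how a program can navigate a "text-heavy" document once it is tagged. The XML sample and its element names (`article`, `title`, `author`, `section`, `para`) are hypothetical, invented for illustration; the parsing uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# A hypothetical journal article marked up with XML tags.
# The element and attribute names are invented for this example.
ARTICLE = """
<article>
  <title>Notes on Data</title>
  <author>Jane Doe</author>
  <section heading="Introduction">
    <para>All data has structure.</para>
    <para>Tags make that structure explicit.</para>
  </section>
  <section heading="Conclusion">
    <para>Software can navigate the tags.</para>
  </section>
</article>
"""

def outline(xml_text: str) -> dict:
    """Return the title, author, and paragraph count of each section."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        # Map each section's heading attribute to its number of <para> children.
        "sections": {
            sec.get("heading"): len(sec.findall("para"))
            for sec in root.findall("section")
        },
    }

if __name__ == "__main__":
    print(outline(ARTICLE))
```

Nothing here required a table definition up front: the tags themselves carry the structure, and the program simply follows them.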

All data has structure, and new standards and technologies are liberating our software developers to create amazing new applications. JSON, or JavaScript Object Notation, is a relatively new open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. JSON is used primarily to transmit data between a server and a web application as an alternative to XML, and it has been popularized by web services built on REST principles. JSON and REST are used by virtually all modern application developers who want flexible architectures with less overhead than heavier, more rigidly structured formats such as XML.
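As a small illustration of those attribute–value pairs, the sketch below shows a record being turned into JSON text, as a server might do before sending it to a web application, and parsed back on the receiving side. The record and its field names are hypothetical; `json.dumps` and `json.loads` are from Python's standard library.

```python
import json

# A hypothetical record a server might send to a web application.
book = {
    "title": "Notes on Data",
    "authors": ["Jane Doe"],
    "pages": 212,
    "in_print": True,
}

# Serialize: Python dict -> human-readable JSON text for transmission.
payload = json.dumps(book)

# Deserialize: JSON text -> Python dict on the receiving side.
received = json.loads(payload)

if __name__ == "__main__":
    print(payload)
    print(received["authors"][0])
```

The payload is plain text that a person can read, yet each value stays paired with its attribute name, so the receiver recovers the same structure without a shared database schema.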

Lastly, search engines are becoming very sophisticated. Business users and consumers expect to type any sequence of words that is meaningful to them and find exactly the information they are looking for. Search engine vendors are building technology that scales to multiple petabytes and can easily connect to virtually any data source, regardless of the structure (or lack thereof) of the data being discovered.
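The core idea that lets a search engine work across sources with different formats can be sketched with a toy inverted index: every word is mapped to the documents that contain it, whether the source is a memo, a JSON record, or an HTML page. This is a minimal keyword-matching sketch, not the semantic processing described above and not how any particular vendor's engine works; the document IDs and contents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents from sources with very different formats.
DOCS = {
    "memo.txt": "Quarterly revenue grew in the northeast region",
    "order.json": '{"customer": "Acme", "region": "northeast", "status": "shipped"}',
    "page.html": "<p>Acme shipped the revenue report</p>",
}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

if __name__ == "__main__":
    index = build_index(DOCS)
    print(sorted(search(index, "acme shipped")))
    print(sorted(search(index, "northeast revenue")))
```

Because tokenizing treats every source as text, the same index answers queries across all three formats; real engines layer ranking, scale-out storage, and language models on top of this basic structure.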

So the next time you hear the words “Unstructured Data,” politely say “There’s no such thing.”

Bob Potter

Bob Potter is Senior Vice President and General Manager of the Business Information and Analytics Business Unit at Rocket.


