The Fourth V: Veracity

"There are three kinds of lies: lies, damned lies and statistics." - popularized by Mark Twain, who attributed it to Benjamin Disraeli

Various vendors verbalize the four Vs of data bigness: Volume, Velocity, Variety, and Veracity. It seems to be in vogue to add Vs of our own to the conversation. Let's really get to know the last one: Veracity.

According to a recent KPMG survey of 400 top executives, two out of three CEOs do not have a high level of trust in the accuracy of their data and analytics. That same study also reports that CEOs named data and analytics as a top-three investment priority for the next three years. It is ironic that something so high on the priority list is so little trusted.

That distrust typically concerns the source data itself. But even assuming the source data is true and accurate, how do we know that the reports built on it are accurate too?

Trust but Verify.

The only real way to validate that the reports you receive are accurate is to look for yourself. As the typical executive is not well versed in SQL, that means there must be a way to view the data directly, through a user interface tailored to the enterprise viewer.

There are tools in the visualization space, like QlikView and Tableau, that help managers see the data, but the data is often in formats designed by engineers, for engineers. Whenever someone wants to check a question for themselves, it becomes a large knowledge-transfer task. Data quality rises and falls with the ability to explore the data independently. When data carries only cryptic field names, with no descriptions or documentation, it is impenetrable to all but the schema creators, which further reinforces silos.
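One way to picture the annotation problem is a small data dictionary that travels with the data. The sketch below is only an illustration under stated assumptions: the field names, labels, and descriptions are hypothetical, and a real organization would keep this metadata in whatever catalog or tool it already uses.

```python
# Hypothetical data dictionary: cryptic field name -> (readable label, description).
# Every name and description here is invented for illustration.
DATA_DICTIONARY = {
    "cust_ltv_90d": (
        "Customer value (90 days)",
        "Revenue from the customer over the last 90 days",
    ),
    "acq_chnl_cd": (
        "Acquisition channel",
        "Marketing channel that brought the customer in",
    ),
}


def annotate(row):
    """Relabel documented fields and collect the names of undocumented ones."""
    labeled = {}
    undocumented = []
    for field, value in row.items():
        if field in DATA_DICTIONARY:
            label, _description = DATA_DICTIONARY[field]
            labeled[label] = value
        else:
            # Keep the raw name so the gap stays visible to the reader.
            labeled[field] = value
            undocumented.append(field)
    return labeled, undocumented


row = {"cust_ltv_90d": 1250.0, "acq_chnl_cd": "EM", "x_flg2": 1}
labeled, undocumented = annotate(row)
print(labeled)
print(undocumented)
```

The list of undocumented fields is the useful part: it shows exactly where a non-engineer would still have to ask the schema creators what a column means.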

Many organizations record data, but making that data intelligible is even more important. Data that isn't intelligible is just a cost center. And much like code, data that is difficult to understand is harder to check for bugs, and therefore harder to trust.

Open source code is often more reliable and bug-free because of the many eyeballs looking at it. The same argument applies to data. It is hard to trust insights and answers from data when the data team is the single point of contact (and the single point of failure). We believe making data more accessible to the whole organization is the only way you can begin to trust your data and solve the veracity problem.

The answer to Veracity is Visibility.

Ryan Braley

Founder and Chief Architect at Designed the world's first distributed genetic algorithm in the cloud using Hadoop, built neural interface, behavioral analytics pioneer.
