Massively Small Data

By Covatic

The phrase ‘data is the new oil’ could be restated as ‘data is the feedstock of AI’, and in the commercial world the personal data of consumers is the oil that drives business. It is increasingly apparent that the privacy of personal data is now a mainstream issue. The scale of personal data breaches has grown from millions to billions of users, personal data is routinely sold on the dark web, and the misuse of social media data has been the subject of a parliamentary inquiry. In response, I would like to suggest a radical solution to the problem of data privacy.

Data Trade

To paraphrase an oft-repeated internet meme: if you're not paying for a web service, you're the product, not the customer. In the case of search, email, messaging or social media, this leads to a transaction to which we all turn a blind eye: trading our personal data for the convenience of a free, pervasive and personalised service. The choice presented to consumers is to accept this trade or go without the benefits of the service on offer, which is really no choice at all.

But is this trade really necessary? Is it the only possible model for AI-powered services? At Covatic we don’t think it is. But before I talk about our vision, I’d like to discuss the term ‘Big Data’ and what it means in this context.

Big Data

From a business viewpoint, ‘Big Data’ typically describes the process of deriving insight into consumer behaviour by processing vast quantities of warehoused data with ever more sophisticated algorithms. Some of this data is collected from third-party sources, but most of it is collected directly from consumers via their web interactions. This approach generates analytic insights that are now key to driving the global digital marketing campaigns of the world’s largest companies.

From an engineering point of view, the term ‘Big Data’ describes the situation you find yourself in when all the customer data you’ve collected will no longer fit on your biggest computer. In 2004, Google published a paper describing a technique to solve this problem. Their solution was to move the analytic algorithms to the data, rather than the data to the algorithms. This is an important concept that I’ll return to shortly.
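To make the idea of moving the algorithm to the data concrete, here is a minimal sketch in Python. It is an illustration only, not the technique from the Google paper or anything from our own code: the partitions, the category labels and the function names are all hypothetical. Each partition stands in for data held on a separate machine; the counting step runs where the data lives, and only a small summary is passed on to be merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands in for records held on a separate machine.
partitions = [
    ["news", "sport", "news"],
    ["sport", "music"],
    ["news", "music", "music"],
]


def summarise_partition(records):
    """Runs where the data lives and returns only a small summary (counts per label)."""
    return Counter(records)


def merge_summaries(left, right):
    """Combines two summaries; the raw records never reach this step."""
    return left + right


def count_by_category(parts):
    """Summarise each partition locally, then merge the summaries centrally."""
    summaries = [summarise_partition(part) for part in parts]
    return reduce(merge_summaries, summaries, Counter())


totals = count_by_category(partitions)
print(dict(totals))
```

The point of the design is that the central step only ever sees counts, so the volume of data that has to move is set by the size of the summary rather than the size of the raw records.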

Big data analytics have been highly successful, but in such a competitive space there is constant pressure to deliver improved performance. This in turn requires that AI-driven algorithms have access to ever greater volumes of personal data in order to deliver improved insights.


However, with the introduction of the GDPR, businesses now face a paradox: how can they feed their insight algorithms with more data while reducing the capture and storage of that data? We believe that this paradox cannot be addressed by incremental changes to existing approaches but requires fundamental change.

At this point it seems reasonable to ask whether big data can survive in a privacy-first world and, if not, what the alternative is.

Our Vision

Returning to the concept I mentioned earlier, that of moving the algorithms to the data, we asked ourselves, ‘why not move them all the way?’ Instead of capturing and harvesting personal information that then has to be brought to a back-end data warehouse, why not leave it where it is and run the algorithms on it in place? If you are wondering where that data is, you’re actually carrying most of it around with you right now. For most people the smartphone has become the central repository of their personal information. In addition to everything we enter into mobile apps, the sensor technology integrated into modern phone hardware generates a wealth of information on our day-to-day habits and behaviour.

It’s also worth considering that the worldwide adoption of the smartphone has created the largest distributed computing platform in history. The combined computing power of the world’s smartphones now far exceeds that of the most powerful supercomputer.

Our vision is one of AI deployed to the computing edge. With local access to secured personal data, algorithms can run in real time and provide a range of insights not possible with conventional back-end analytics. Deploying our algorithms at the edge lets customers, in effect, process their own data on their own computing platform: their smartphone. This combination of AI, smartphone computing power and inherently secure access to local data allows us to process individual customer data in place and to derive real-time insights specifically tailored to that individual.

This anticipation of customer needs is the core capability of our architecture. An example application from the media space is predicting an individual audience member’s needs for news and entertainment during their daily commute to work. In this situation the broadcaster doesn’t need to know where the person lives or works, their social security number or the name of their dog; it just needs to know the duration of their journey and the time at which suitable content should be delivered to their device for consumption, for example while they are still at home and connected to WiFi.
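The commute example can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not our production code: the trip log, the 30-minute lead time and the names DeliveryHint and commute_hint are all hypothetical. The trip history stays on the device, and the only values that would ever need to leave it are a typical journey duration and a suggested delivery time.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class DeliveryHint:
    """The only values shared off-device (hypothetical structure)."""
    duration_minutes: int
    deliver_by: str  # local time, "HH:MM"


def commute_hint(trips, lead_minutes=30):
    """Derive a delivery hint from (departure, arrival) pairs kept on the device.

    lead_minutes is an assumed buffer so content arrives before the user leaves home.
    """
    durations = [(arrive - depart).total_seconds() / 60 for depart, arrive in trips]
    departures = [depart.hour * 60 + depart.minute for depart, _ in trips]
    typical_duration = round(median(durations))
    typical_departure = int(median(departures))
    deliver_at = max(typical_departure - lead_minutes, 0)
    return DeliveryHint(
        duration_minutes=typical_duration,
        deliver_by=f"{deliver_at // 60:02d}:{deliver_at % 60:02d}",
    )


# Hypothetical on-device trip log; no locations are recorded or needed.
trips = [
    (datetime(2019, 3, 4, 8, 0), datetime(2019, 3, 4, 8, 40)),
    (datetime(2019, 3, 5, 8, 10), datetime(2019, 3, 5, 8, 45)),
    (datetime(2019, 3, 6, 7, 50), datetime(2019, 3, 6, 8, 35)),
]

hint = commute_hint(trips)
print(hint)
```

Using the median rather than the mean is a design choice for the sketch: a single unusually long journey does not distort the typical duration.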

What does this mean for Big Data? We believe that the future of data analytics is private-by-design edge AI, processing individual customer data. While each data instance is small, an architecture deployed across the world’s smartphone population would be a distributed computing platform of unparalleled size.

Welcome to the age of massively small data.