Data Smashing

The availability of huge data streams is pushing researchers to find innovative ways to analyse their content, compare it and derive meaning. This is not easy at all.
So far, data mining approaches have been based on a hunch from a human being that is then coded into an algorithm to see if it holds. Data are analysed to prove or disprove a certain hypothesis. As an example, you may want to see if the pattern of buying a product is related to ads broadcast on different media and in different places. This works pretty well, and many market research studies are based on such algorithms.
The same goes for the effectiveness of certain drug combinations in fighting a disease, or for capturing the first signs of an epidemic.
However, we don’t have algorithms that can point out something they were not programmed to look for. They miss the serendipity of our eyes (and brain).
This is what researchers at Cornell and the University of Chicago set out to change.
In a nice paper they have proposed a new technique, which they call data smashing, for discovering unexpected information.
The approach is based on the idea that if you have recurring data in different streams you can annihilate them, and what you are left with is genuinely new information. By applying data smashing along with the classical analyses used in big data (where the point is to look for patterns) you get both the similarities and the differences. Out of this, unexpected patterns may emerge.
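The authors' actual algorithm is more involved (it "collides" one quantized stream against the inverse of another and checks whether only flat noise remains), but the underlying idea of comparing raw streams without hand-coded features can be sketched in a few lines. The sketch below is an illustrative toy, not the paper's method: the quantile-based quantization, the transition-matrix distance, and all function names are my own assumptions.

```python
import numpy as np

def quantize(stream, n_symbols=3):
    """Discretize a real-valued stream into symbols via quantile bins."""
    edges = np.quantile(stream, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(stream, edges)

def transition_matrix(symbols, n_symbols=3):
    """Empirical first-order symbol-transition probabilities."""
    counts = np.zeros((n_symbols, n_symbols))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1  # avoid division by zero for unseen symbols
    return counts / row_sums

def stream_distance(x, y, n_symbols=3):
    """Distance between two streams' inferred symbol dynamics
    (a crude stand-in for the paper's causal-similarity metric)."""
    tx = transition_matrix(quantize(x, n_symbols), n_symbols)
    ty = transition_matrix(quantize(y, n_symbols), n_symbols)
    return np.abs(tx - ty).sum()

rng = np.random.default_rng(0)
t = np.linspace(0, 40, 2000)
a = np.sin(t) + 0.1 * rng.standard_normal(t.size)  # noisy oscillator
b = np.sin(t) + 0.1 * rng.standard_normal(t.size)  # same hidden process
c = rng.standard_normal(t.size)                    # structurally different

print(stream_distance(a, b) < stream_distance(a, c))  # prints True
```

The point of the sketch is that nothing in it encodes what a "sine wave" or "noise" is: streams generated by the same hidden process end up closer to each other than to a structurally different stream, purely from their symbol dynamics.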
They have tested their hypothesis on a number of data streams and have indeed been able to discover patterns that were not coded into the algorithms. The figure presents three examples.
We can expect that in the coming years radically new approaches to mining huge data sets will be discovered (or invented). Some of these may well derive from advances in our understanding of the serendipity that so often seems to guide our discoveries when we just look around…

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node, and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.