Watch YouTube to learn your way…

It is not unusual for me to turn to YouTube to learn how to do some specific task, be it upgrading the RAM on my Mac or following a recipe. And I bet that goes for you as well.
What is more unusual is to imagine a robot accessing YouTube to learn how to do things. On the other hand, we have so much knowledge on the Web that it just makes sense to let robots tap into it and learn by themselves.
We have seen robots built to learn by being shown how to do a specific task; one of the first examples is probably Baxter. But so far there has always been a person showing the robot the way (by demonstrating the task or by programming it).
RoboBrain made the first leap in learning by accessing the Web to assimilate "concepts". So far it has downloaded one billion images, 120,000 YouTube clips and 100 million how-to documents and manuals (finally someone is reading instruction manuals!).
Now, a team of researchers at the University of Maryland, in cooperation with a team at NICTA, Australia, is using the latest advances in "deep neural networks" to ease the analysis of the material that can be seen on YouTube. This is much trickier than it might seem at first glance. Just because you don’t even have to think when you see a person cracking an egg shell by hitting it on the edge of a bowl, it does not mean that a robot seeing the same thing arrives at a straightforward interpretation. It could be the way that person is holding the egg that matters; it might even be that oval-shaped objects break their outer shell whenever they touch a surface. Of course to us these interpretations are pure nonsense, and we won’t even consider them, but that is because we have been trained by life experience as we grew up… and that is not the case for a robot!
The first step, of course, is to recognise the various objects, then to recognise the action and the way that action involves several objects (the hand, the arm, the edge of the bowl and the egg). This image recognition is achieved through deep neural networks. Then the robot needs to extract a meaning and accumulate that knowledge, and that is not the end of it. Once you have the knowledge you need to learn when to apply it… So many layers in what seems trivial to us.
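To make the last two steps a little more concrete, here is a minimal Python sketch of accumulating observations and deciding which ones to trust. It is purely illustrative and is not the researchers' system: the Observation tuple, the KnowledgeBase class and the min_support threshold are hypothetical names, and the recognition step is assumed to have already turned each clip into an (action, object, tool) triple.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    """One recognised event from a clip (hypothetical output of the vision step)."""
    action: str  # e.g. "crack"
    obj: str     # e.g. "egg"
    tool: str    # e.g. "bowl edge"


class KnowledgeBase:
    """Counts how often each (action, object, tool) triple has been observed."""

    def __init__(self, min_support: int = 2) -> None:
        # Assumption: a triple seen fewer than min_support times is not trusted.
        self.min_support = min_support
        self.counts: Counter = Counter()

    def add(self, observation: Observation) -> None:
        self.counts[observation] += 1

    def best_tool(self, action: str, obj: str) -> Optional[str]:
        """Return the most frequently seen tool for this action on this object,
        or None if nothing has been seen often enough."""
        candidates = {
            o.tool: n
            for o, n in self.counts.items()
            if o.action == action and o.obj == obj and n >= self.min_support
        }
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


kb = KnowledgeBase(min_support=2)
clips = [
    Observation("crack", "egg", "bowl edge"),
    Observation("crack", "egg", "bowl edge"),
    Observation("crack", "egg", "table"),
    Observation("crack", "egg", "bowl edge"),
]
for clip in clips:
    kb.add(clip)

print(kb.best_tool("crack", "egg"))  # bowl edge
print(kb.best_tool("peel", "egg"))   # None
```

Counting is a deliberately crude stand-in for "learning when to apply it": a one-off observation (the egg cracked on the table) is ignored until it has been seen often enough, which is one simple way a robot could discount the odd interpretations a single clip might suggest.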
I find it fascinating to observe how much insight we are gaining into our own thinking processes by attempting to have a robot perform what seems to be a very simple action, like picking up an egg and cracking it open to use it in a recipe…

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master's course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.