It is so easy for us to look around, see what is out there and make sense of it. This happens every single moment, and we can make sense even of things we are seeing for the very first time (well, 99.99% of the time!).
Our brain is also very good at creating abstractions and at using them to identify new “things” that fit the abstraction (by the way, this is why we look at clouds and “see” a face, a dog, a hammer: the brain finds in the abstraction of that cloud a match with the abstraction of a face…).
This is not so for computers. They can easily distinguish two objects because one is a millimetre larger than the other or its colour hue is a tad different (something our brain would not notice; that’s why we play “find the 7 differences”).
Not so anymore! A team of researchers at MIT’s CSAIL has equipped a robot with a computer vision system capable of learning object characteristics on its own, creating abstractions and using them to identify new objects.
The trick is done by … DON: Dense Object Nets. When looking at an object, DON creates a dense representation made of points in a three-dimensional space. These points form a model that is then used to abstract some visual characteristics. As you can see in the clip below, DON is able to form the abstract concept of “shoe” and of the tongue that each shoe has. You can place very different kinds of shoes in front of it, but since DON has grasped the meaning of “shoe” it will recognise any kind of shoe, look for its tongue and pick the shoe up if asked to do so. Since the model abstraction is three-dimensional, the robot can recognise a shoe even if it is upside down or lying on its side.
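To give a feel for the core idea, here is a minimal sketch of descriptor matching. It assumes we already have a dense per-pixel descriptor map (in the real system these descriptors come out of a trained neural network); the function and the toy data below are purely illustrative, not the researchers’ actual code. Given the descriptor of a point of interest on one object (say, a shoe’s tongue), we find the pixel in a new image whose descriptor is closest to it:

```python
import numpy as np

def best_match(descriptor_map, reference):
    """Find the pixel whose descriptor is closest (in L2 distance) to `reference`.

    descriptor_map: (H, W, D) array of per-pixel descriptors for a new image.
    reference: (D,) descriptor of the target point on a previously seen object.
    Returns the (row, col) of the best-matching pixel.
    """
    dists = np.linalg.norm(descriptor_map - reference, axis=-1)  # (H, W) distances
    return np.unravel_index(np.argmin(dists), dists.shape)

# Toy example: a 4x4 "image" with 3-dimensional descriptors.
rng = np.random.default_rng(0)
dmap = rng.normal(size=(4, 4, 3))
ref = dmap[2, 1] + 0.01  # slightly perturbed copy of the descriptor at (2, 1)
print(best_match(dmap, ref))  # → (2, 1)
```

The point of a *dense* representation is exactly this: because every pixel gets a descriptor, and matching descriptors mean “the same part of the same kind of object”, the tongue of a never-seen shoe lands near the tongue descriptors learned from other shoes, and the robot knows where to grasp.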
This is a very important evolution on the path toward systems that are autonomous and self-teaching. Many further steps are still needed, but at least this one has been taken.