Seeing is straightforward. You just need to take a look and your eyes will report to your brain what’s “outside”. Actually, it is a tad more complicated than this. If you have looked at the photo, you have probably identified something green that looks like a toothbrush, but you might have missed the big toothbrush stuck on the wall at the left of the photo.
The reason is that your eyes take a glimpse and communicate a bare sketch of what they saw to your brain. It is your brain that actually “sees” and “identifies” objects, and the identification is done through reasoning. Where do you expect to find a toothbrush? How big is a toothbrush? The answers to these questions are based on experience, and the brain “orders” the eyes to take a second look, pointing specifically at those areas where a toothbrush is most likely to be found.
This does not happen to computers. The deep neural networks used today in image detection and processing are not as “sophisticated” as our brain, and because of that they don’t fall into the traps that mislead us.
A team of psychologists and brain scientists at the University of California, Santa Barbara has carried out a study to identify the differences between our way of “seeing” things and a computer’s. It turns out that we are more efficient in processing an image because we take into account the semantics and the overall relations among objects in a picture, but at the same time we might be less accurate in some situations. Psychologists are also looking into how images are processed differently in atypical cases, such as autism, where people seem less inclined to take an overall view of an image and focus instead on individual aspects. As such, they are less likely to miss something that does not fit, like the big toothbrush in the photo.
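To make the trade-off concrete, here is a toy sketch in Python. It is not the model used in the study, nor a claim about how the brain actually works: the names, the expected toothbrush length and the threshold are all made up for illustration. A “detector” reports a confidence score and an estimated size for each candidate object, and a “context” step multiplies that score by how plausible the size is.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float      # raw detector confidence, between 0 and 1
    length_cm: float  # estimated real-world length of the object


# Hypothetical "experience": the typical length of an object, in cm.
EXPECTED_LENGTH_CM = {"toothbrush": 18.0}


def size_prior(label: str, length_cm: float, tolerance: float = 0.5) -> float:
    """Plausibility of the observed size, between 0 and 1.

    A Gaussian on the log-ratio between observed and expected length,
    so being 2x too big is penalised as much as being 2x too small.
    """
    log_ratio = math.log(length_cm / EXPECTED_LENGTH_CM[label])
    return math.exp(-0.5 * (log_ratio / tolerance) ** 2)


def found(detections, threshold: float = 0.5, use_context: bool = True):
    """Return the detections that survive the threshold.

    With use_context=True the raw score is weighted by the size prior
    (the "human-like" shortcut); with False only the raw score counts.
    """
    kept = []
    for det in detections:
        score = det.score
        if use_context:
            score *= size_prior(det.label, det.length_cm)
        if score >= threshold:
            kept.append(det)
    return kept


detections = [
    Detection("toothbrush", 0.80, 19.0),   # ordinary toothbrush
    Detection("toothbrush", 0.85, 150.0),  # giant toothbrush on the wall
]

print(len(found(detections, use_context=True)))   # 1: the giant one is missed
print(len(found(detections, use_context=False)))  # 2: both are reported
```

With the size expectation switched on, the giant toothbrush is discarded as implausible, much as we tend to overlook it in the photo; with it switched off, both are reported. In this sketch the prior is applied after the fact, so it saves no work; the efficiency gain would only appear if the expectation were used to decide where to look in the first place.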
A deep neural network learns over time what to expect, but it remains rooted in facts. The study aims at improving the way neural networks work by equipping them with more insight… (if you’ll pardon the pun). That would make them more effective and decrease the processing power they need, which remains, at least for the coming years, a limiting factor in certain application areas, such as safety video cameras, where processing power is limited.