Computers might, eventually, understand Italians …

Recognising individual finger movement in a complex scene is now becoming possible. Credit: CMU

Italians are well known for speaking with their hands. Gesturing is an integral part of our speaking process.

Computers have become pretty good at understanding human speech: think of Alexa, Cortana or Siri, to name a few examples.

Recognising gestures has made progress too, but not on a par with computers' capability to understand speech. Now this may change thanks to research at Carnegie Mellon University, where a set-up consisting of 500 video cameras arranged in a sphere-like dome two storeys high can pick up movement and gestures from many people at the same time, and software can decode the images and recognise individual gestures down to finger level.

This is an amazing result. Although it seems quite straightforward to us to recognise the movement of several people at the same time, and no big deal at all to see their fingers moving, for a machine this is extremely tricky. Our brain performs all the “computations” required to sort out the different parts making up the overall scene, identifying each of them and associating them to create a coherent representation of motion. The processing application developed at CMU does something similar, and the researchers have created a library of elementary instruction codes that others can use to sort out movement in complex scenes.
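To give a feel for the kind of “association” problem the software has to solve, here is a deliberately simplified sketch (not the CMU code, whose details are not described here): detections of hands or fingers arrive in each frame in no particular order, and the tracker must associate each detection in the new frame with the right one from the previous frame to build a coherent motion track. A basic nearest-neighbour matching step, in Python, might look like this:

```python
import math

def match_keypoints(prev, curr, max_dist=50.0):
    """Greedily associate keypoints from the previous frame with those in
    the current frame by nearest-neighbour distance (in pixels).
    Returns a list of (prev_index, curr_index) pairs."""
    pairs = []
    used = set()
    for i, (px, py) in enumerate(prev):
        best_j, best_d = None, max_dist
        for j, (cx, cy) in enumerate(curr):
            if j in used:
                continue  # each current detection matches at most one track
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
            used.add(best_j)
    return pairs

# Two people's hands detected in consecutive frames (x, y in pixels);
# detections arrive in a different order in the second frame.
frame1 = [(100.0, 200.0), (400.0, 210.0)]
frame2 = [(405.0, 212.0), (103.0, 198.0)]
print(match_keypoints(frame1, frame2))  # → [(0, 1), (1, 0)]
```

The real system has to do this for dozens of body and finger keypoints per person, across hundreds of camera views, while handling people who touch, cross or occlude each other, which is where the complexity explodes.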

The set-up, as I mentioned, is quite complex (500 video cameras is quite a lot!), but the software developed is even more complex. It really opens the door to a variety of applications, and the CMU team is in talks with over 20 companies in different market areas that are interested in using this technology, including automotive companies.

Yesterday I was at the Ericsson research lab in Budapest, talking to students who are part of the EIT Digital Industrial Doctoral School. They showed me, as work in progress, their study of algorithms to sort out the movement of objects in 3D space using multiple cameras (in the demo there were 4 of them), with Augmented Reality applications in mind. By identifying, and understanding, the movement of objects, the computer can also “understand” that an object is no longer visible because it has moved behind another object blocking the line of sight, but that it is still there. This augmented understanding of a scene is crucial for Augmented Reality (like letting you see behind objects, as I mentioned in a previous post discussing augmented humans).
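The “still there even if hidden” reasoning can be sketched very simply. This is my own toy illustration of the idea, not the students' algorithm: when a tracked object stops producing detections, the tracker assumes occlusion rather than disappearance, and coasts the object along its last observed velocity until it reappears.

```python
class TrackedObject:
    """Keeps an object 'alive' while it is temporarily hidden,
    extrapolating its position from the last observed velocity."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.vx, self.vy = 0.0, 0.0
        self.visible = True

    def update(self, observation):
        """observation is an (x, y) tuple, or None if no camera sees it."""
        if observation is not None:
            ox, oy = observation
            self.vx, self.vy = ox - self.x, oy - self.y  # velocity per frame
            self.x, self.y = ox, oy
            self.visible = True
        else:
            # No detection: assume the object is occluded, not gone,
            # and coast along its last known velocity.
            self.x += self.vx
            self.y += self.vy
            self.visible = False

obj = TrackedObject(0.0, 0.0)
obj.update((1.0, 0.0))  # seen moving right
obj.update(None)        # hidden behind another object
obj.update(None)        # still hidden
print(round(obj.x, 1), obj.visible)  # → 3.0 False
```

A real multi-camera system would fuse several viewpoints and use a proper motion model (e.g. a Kalman filter) instead of this constant-velocity guess, but the principle is the same: the scene model outlives the line of sight.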

All this progress in technology is enabled by the increased processing capacity available today. A few more steps and computers may even end up understanding Italians. I mean Italian lay people; understanding Italian politicians is still in the realm of science fiction…

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master's course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.