Computers might, eventually, understand Italians …

Recognising individual finger movement in a complex scene is now becoming possible. Credit: CMU

Italians are well known for speaking with their hands. Gesturing is an integral part of our speaking process.

Computers have become pretty good at understanding human speech: think of Alexa, Cortana or Siri, to name a few examples.

Recognising gestures has made progress too, but not on a par with computers' capability to understand speech. Now this may change thanks to research at Carnegie Mellon University, where a set-up consisting of 500 video cameras arranged in a sphere-like dome two storeys high can pick up movement and gestures from many people at the same time, and software can decode the images and recognise individual gestures down to finger level.

This is an amazing result. Although it seems quite straightforward to us to recognise the movement of several people at the same time, and no big deal at all to see their fingers moving, for a machine this is extremely tricky. Our brain performs all the “computations” required to sort out the different parts making up the overall scene, identifying each of them and associating them to create a coherent representation of motion. The processing application developed at CMU does something similar, and the researchers have created a library of elementary instruction codes that others can use to sort out movement in complex scenes.
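To give a feel for the kind of “association” problem the software has to solve, here is a deliberately simplified sketch (not the CMU code, whose details are not described here): detections of hands or fingers arrive in each frame in no particular order, and the tracker must associate each detection in the new frame with the right one from the previous frame to build a coherent motion track. A basic nearest-neighbour matching step, in Python, might look like this:

```python
import math

def match_keypoints(prev, curr, max_dist=50.0):
    """Greedily associate keypoints from the previous frame with those in
    the current frame by nearest-neighbour distance (in pixels).
    Returns a list of (prev_index, curr_index) pairs."""
    pairs = []
    used = set()
    for i, (px, py) in enumerate(prev):
        best_j, best_d = None, max_dist
        for j, (cx, cy) in enumerate(curr):
            if j in used:
                continue  # each current detection matches at most one track
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
            used.add(best_j)
    return pairs

# Two people's hands detected in consecutive frames (x, y in pixels);
# detections arrive in a different order in the second frame.
frame1 = [(100.0, 200.0), (400.0, 210.0)]
frame2 = [(405.0, 212.0), (103.0, 198.0)]
print(match_keypoints(frame1, frame2))  # → [(0, 1), (1, 0)]
```

The real system has to do this for dozens of body and finger keypoints per person, across hundreds of camera views, while handling people who touch, cross or occlude each other, which is where the complexity explodes.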

The set-up, as I mentioned, is quite complex (500 video cameras is quite a lot!), but the software developed is even more complex. It really opens the door to a variety of applications, and the CMU team is in talks with over 20 companies in different market areas that are interested in using this technology, including automotive companies.

Yesterday I was at the Ericsson research lab in Budapest, talking to students who are part of the EIT Digital Industrial Doctoral School. They showed me, as work in progress, their study of algorithms to sort out the movement of objects in 3D space using multiple cameras (in the demo there were 4 of them), with Augmented Reality applications in mind. By identifying, and understanding, the movement of objects, the computer can also “understand” that an object is no longer visible because it has moved behind another object blocking the line of sight, but that it is still there. This augmented understanding of a scene is crucial for Augmented Reality (like letting you see behind objects, as I mentioned in a previous post discussing augmented humans).
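The “still there even if hidden” reasoning can be sketched very simply. This is my own toy illustration of the idea, not the students' algorithm: when a tracked object stops producing detections, the tracker assumes occlusion rather than disappearance, and coasts the object along its last observed velocity until it reappears.

```python
class TrackedObject:
    """Keeps an object 'alive' while it is temporarily hidden,
    extrapolating its position from the last observed velocity."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.vx, self.vy = 0.0, 0.0
        self.visible = True

    def update(self, observation):
        """observation is an (x, y) tuple, or None if no camera sees it."""
        if observation is not None:
            ox, oy = observation
            self.vx, self.vy = ox - self.x, oy - self.y  # velocity per frame
            self.x, self.y = ox, oy
            self.visible = True
        else:
            # No detection: assume the object is occluded, not gone,
            # and coast along its last known velocity.
            self.x += self.vx
            self.y += self.vy
            self.visible = False

obj = TrackedObject(0.0, 0.0)
obj.update((1.0, 0.0))  # seen moving right
obj.update(None)        # hidden behind another object
obj.update(None)        # still hidden
print(round(obj.x, 1), obj.visible)  # → 3.0 False
```

A real multi-camera system would fuse several viewpoints and use a proper motion model (e.g. a Kalman filter) instead of this constant-velocity guess, but the principle is the same: the scene model outlives the line of sight.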

All this progress in technology is enabled by the increased processing capacity available today. A few more steps and computers may even end up understanding Italians. I mean Italian lay people; understanding Italian politicians is still in the realm of science fiction…

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master's course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.