I didn’t say that!

Editing sound, digitally, has been possible for many years, yet editing words has proven extremely difficult. Credit: Princeton University

Technology is opening up unexpected possibilities that in turn may lead to undesired results.

This is surely the case with the results obtained in a collaboration between Princeton and Adobe: the editing of voice messages.

The joint team has created a voice editor, VoCo, that works exactly like a text editor. With a text editor you can scroll to, or search for, a specific word and then delete it or replace it with another word. VoCo lets you manipulate spoken messages in the same way. Take a look at the clip and see what can be done.

Although it has been possible for many years to edit digital sound, editing voices has proven challenging, probably because we are so good at detecting nuances in voices that any glitch, no matter how tiny, is immediately noticed. The problem is that there is no single “sound” corresponding to a given word. The sound depends on the other words in the sentence and on the emotion the speaker associates with it. Basically, every sentence we utter has its own “music”, and this is reflected in the words composing it. Hence editing a voice message requires the capability to extract, in a way, the emotion impressed by the speaker and reinsert that “emotion” in the word you are substituting. Likewise, you cannot delete a word by just deleting its sound: anybody would immediately spot that there is something wrong in that sentence.

VoCo is based on a sophisticated algorithm that learns the characteristics of the speaker’s voice in that specific sentence and replicates them in the new word, joining it seamlessly to the preceding and following words.

Now, all this is great. At the same time, it is not difficult to imagine some bad guy hacking your voice and introducing changes that completely reverse the sense of the sentence and the intention of the speaker. And yet the voice is that of the person you know; as far as you can tell, it is he who said that.

In the US people sometimes say “read my lips”. Well, that might be the way to detect that someone has tampered with a voice: the lips will no longer be in sync… Unfortunately, a program developed at Disney Research, FaceDirector, can take care of that too, changing the facial expression and the lip movements to synchronise the video with the sound!

We are really entering a new dimension where the line between the virtual and the real gets blurred and trust is challenged ever more.

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node, and he was then head of the Industrial Doctoral School of EIT Digital until September 2018. Previously, until December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master’s course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.