It speaks like me! … and you

A spectrogram for the word “whoa”. By the way, does it look to you like a mouth voicing a sound? It surely does to me. Credit: Lorenzo Tlacaelel / CC BY 2.0

I do not know how 2017 will be remembered in terms of technological advances. If it were up to me, I would associate it with the pervasive uptake of AI, Artificial Intelligence.

I posted quite a number of news items on “applied AI”, i.e. the application of Artificial Intelligence in many fields, some of them really unexpected (like a smart tripod), others really pervasive (like digital photography, from taking photos with smarter and smarter cameras to editing them with AI assistance). Artificial intelligence is proving better at some (and a growing number of) medical diagnoses, at painting, at driving cars and drones; it is making real-time language translation a commodity and language understanding the preferred way to interact with your television and appliances, and so on. If I counted correctly, 52 out of the 365 posts I published this year involved artificial intelligence, the vast majority of them dealing with its applications.

Hence, I find it appropriate to close the year with news of a paper published by Google researchers reporting on the success of Tacotron 2, an artificial intelligence based system that can generate speech that is indistinguishable from human speech (you can hear a few examples here). I remember my wonder, and that of many others, at the first speech synthesis by computer (the computer talks!!!) back in the last century. In the clip below you can hear the voice synthesis made by MUSA, an application developed back in 1977 (the recording is from 1978) by a team of researchers at CSELT, the research centre of the Italian Telecommunications company, that was considered particularly advanced at that time (both the team and the application!).

At that time processing power, as well as storage capacity, was limited. Those very crude results were considered phenomenal.

With the exponential growth of processing and storage capacity the quality of the speech increased significantly, and we can hear the results today in the announcements at airports and railway stations. Some say this is cheating, that it is not real speech synthesis: by leveraging the virtually unlimited processing and storage capacity now available, today's applications fetch pre-recorded human words and sentences and assemble them into a coherent whole, taking care to smooth the points where different segments are pasted together. The result is quite good.
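
To give a feel for what “pasting segments together and smoothing the joins” can mean in practice, here is a minimal sketch in Python. It is not how any specific announcement system works: the function name `crossfade_concat`, the linear blend and the toy sample values are all assumptions made for illustration. Each recorded segment is a plain list of audio samples, and at every seam the last few samples of one segment are blended with the first few samples of the next.

```python
from typing import List, Sequence


def crossfade_concat(segments: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join audio segments, blending `overlap` samples at each seam.

    Illustrative only: real systems choose and smooth segments in far
    more sophisticated ways. Samples are plain floats.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    out: List[float] = []
    for segment in segments:
        seg = list(segment)
        # Blend only as many samples as both sides can actually provide.
        n = min(overlap, len(out), len(seg))
        for i in range(n):
            # Weight of the incoming segment rises linearly across the seam.
            w = (i + 1) / (n + 1)
            idx = len(out) - n + i
            out[idx] = (1.0 - w) * out[idx] + w * seg[i]
        # Append whatever part of the new segment was not blended in.
        out.extend(seg[n:])
    return out


# Hypothetical "recordings": a loud segment followed by a silent one.
loud = [1.0, 1.0, 1.0, 1.0]
silent = [0.0, 0.0, 0.0, 0.0]
announcement = crossfade_concat([loud, silent], overlap=2)
```

With an overlap of zero the segments are simply glued end to end, which is where audible clicks tend to come from; a larger overlap trades a little of each recording for a gentler transition.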

Reading a book is a different story, because it requires the application to “understand” what it reads if you expect empathy in the artificial voice, and this “understanding” is the domain of artificial intelligence. In this last year progress has been significant. The results obtained by Google researchers are remarkable: I personally cannot distinguish the artificial voice from a human voice (which, by the way, the application also mimics in tone and pitch, so that the speech appears to come from the same person). However, the examples provided relate to single sentences. It would be interesting to hear a book being read. There you could appreciate whether the quality is good enough to trick you into believing it is a human reader. And of course, the next step would be to have the artificial voice impersonate an actor who can infuse the speech with feeling (a sad voice, a thrilled one…) based on what the situation demands.

All in all, this year has brought an avalanche of applications leveraging artificial intelligence (in its various hues: deep learning, deep neural networks, convolutional networks, …) and has brought machines a step closer to us. This is a path that the FDC Initiative on Symbiotic Autonomous Systems has started to explore.

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master's course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.