The development of AI adversaries continues apace: a paper by Nicholas Carlini and David Wagner of the University of California, Berkeley has shown off a technique to trick speech recognition by altering the source waveform by just 0.1 per cent. In the paper, posted at arXiv, the pair wrote that their attack achieved a first: not merely making a speech recognition (SR) engine fail, but making it return a result chosen by the attacker.

In other words, because the attack waveform is 99.9 per cent identical to the original, a human wouldn’t notice anything wrong with a recording of “it was the best of times, it was the worst of times”, but an AI could be tricked into transcribing it as something else entirely: the authors say it could produce “it is a truth universally acknowledged that a single” from a slightly altered sample. One of these things is not quite like the other.
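The general idea behind such targeted attacks can be sketched in miniature. The following is an illustrative toy only, and not the authors' actual method: Carlini and Wagner optimise a perturbation against a full end-to-end speech recognition network, whereas this sketch runs gradient descent on a tiny linear "recogniser", nudging the input until the model outputs an attacker-chosen class while the change stays small. All names and parameters here are invented for illustration.

```python
# Toy sketch of a targeted adversarial perturbation. NOT the paper's attack:
# a stand-in linear model replaces the real speech recognition network.
import numpy as np

rng = np.random.default_rng(0)

# Toy "recogniser": scores = W @ x; predicted class = argmax of the scores.
n_features, n_classes = 64, 4
W = rng.normal(size=(n_classes, n_features))
x = rng.normal(size=n_features)            # stand-in for a waveform
original = int(np.argmax(W @ x))
target = (original + 1) % n_classes        # attacker-chosen output

# Gradient descent on the gap between the current top score and the
# target's score, applied to a small additive perturbation `delta`.
delta = np.zeros_like(x)
lr = 0.01
for _ in range(500):
    scores = W @ (x + delta)
    if int(np.argmax(scores)) == target:
        break
    # Strongest competing class (argmax over all classes except the target).
    runner_up = int(np.argmax(np.delete(scores, target)))
    runner_up += runner_up >= target       # map index back into full vector
    # Gradient of (score[runner_up] - score[target]) w.r.t. delta.
    grad = W[runner_up] - W[target]
    delta -= lr * grad

adv = x + delta
print("original class:", original, "-> adversarial class:", int(np.argmax(W @ adv)))
print("relative perturbation: %.2f%%" % (100 * np.linalg.norm(delta) / np.linalg.norm(x)))
```

The real attack swaps the linear scores for the network's loss against a target transcription and adds an explicit penalty keeping the distortion inaudible, which is how the perturbation stays down near 0.1 per cent.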

[Image from Carlini and Wagner’s paper]

It works every single time: the pair claimed a 100 per cent success rate for their attack and, frighteningly, an attacker can even hide a target waveform in what (to the listener) appears to be silence. Such attacks against image processors became almost routine in 2017.

There was a single-pixel image attack that made a deep neural network recognise a dog as a car; MIT students developed an algorithm that made Google’s AI think a 3D-printed turtle was a rifle; and on New Year’s Eve, Google researchers took adversarial imaging into the real world, creating stickers that confused vision systems trying to recognise objects (making a banana register as a toaster). Speech recognition systems have proven harder to fool.

As Carlini and Wagner wrote in the paper, “audio adversarial examples have different properties from those on images”. Untargeted attacks are simple, they explained, since “simply causing word-misspellings would be regarded as a successful attack”; forcing the engine to emit a specific chosen transcription is far harder.