Big Brother is listening. Companies use "bossware" to monitor employees whenever they are near a computer. Some "spyware" apps can record phone calls. And home devices such as Amazon's Echo can record everyday conversations. A new technology called Neural Voice Camouflage now offers a defense: it generates custom audio noise in the background as you talk, confusing the artificial intelligence (AI) that transcribes recorded speech.
The new system uses "adversarial attacks." The strategy employs machine learning, in which algorithms find patterns in data, to tweak sounds so that an AI, but not a person, mistakes them for something else. Essentially, you use one AI to fool another.
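Loosely sketched, an adversarial attack perturbs an input just enough that a model changes its answer while a human notices little. The toy linear "classifier" below stands in for a speech model and is invented purely for illustration; it is not the system described in the study.

```python
import numpy as np

# Toy stand-in for a speech model: a fixed linear scorer over a 16-sample
# frame. score > 0 -> "yes", otherwise "no". (Invented for illustration.)
rng = np.random.default_rng(0)
w = rng.standard_normal(16)

def classify(x):
    return "yes" if float(w @ x) > 0 else "no"

def adversarial_perturb(x, margin=1.0):
    """Gradient-sign perturbation against the linear scorer.

    For score = w @ x the gradient with respect to x is just w, so
    stepping each sample by -eps * sign(w) (in the direction that lowers
    |score|) flips the decision once eps * sum(|w|) exceeds |score|.
    Here eps is chosen as the smallest uniform step that guarantees the
    flip, keeping the per-sample change small.
    """
    score = float(w @ x)
    eps = (abs(score) + margin) / np.abs(w).sum()
    return x - eps * np.sign(w) * np.sign(score)

x = rng.standard_normal(16) * 0.1         # a "clean audio" frame
x_adv = adversarial_perturb(x)            # the camouflaged frame
print(classify(x), "->", classify(x_adv)) # the model's decision flips
```

The same principle, scaled up to real audio and a neural ASR model, is what lets a quiet perturbation derail a transcription while a listener barely registers it.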
The process is not as simple as it sounds, however. The machine learning AI needs to process an entire sound clip before it knows how to tweak it, which doesn't work when you want to camouflage speech in real time.
So in the new study, researchers taught a neural network, a machine learning system inspired by the brain, to effectively predict the future. They trained it on many hours of recorded speech so it could constantly process 2-second clips of audio and disguise what was likely to be said next.
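A rough sketch of the real-time constraint: because the noise for the next moment must be ready before that moment arrives, the system can only condition on audio it has already heard. Everything below (the sample rate, the chunk length, the "predictor" that just scales random noise) is a simplified stand-in, assumed for illustration, for the trained neural network in the study.

```python
import numpy as np

SR = 16_000          # assumed sample rate (Hz)
CHUNK = 2 * SR       # the system conditions on 2-second chunks

def predict_camouflage(past_chunk):
    """Stand-in for the trained predictor: given the chunk just heard,
    emit noise aimed at whatever is likely to be said NEXT. This toy
    version just scales random noise to the past chunk's energy; the
    real system uses a neural network trained on hours of speech."""
    seed = int(np.abs(past_chunk).sum() * 1e6) % 2**32
    rng = np.random.default_rng(seed)
    level = float(np.sqrt(np.mean(past_chunk**2))) or 1e-3
    return 0.1 * level * rng.standard_normal(CHUNK)

def stream_with_camouflage(speech):
    """Mask speech chunk by chunk: the noise added to chunk t+1 was
    computed from chunk t, so it is available with zero lookahead."""
    out = np.copy(speech)
    n = len(speech) // CHUNK
    for t in range(n - 1):
        past = speech[t * CHUNK:(t + 1) * CHUNK]
        out[(t + 1) * CHUNK:(t + 2) * CHUNK] += predict_camouflage(past)
    return out
```

Note that the very first chunk goes out unmasked in this sketch: prediction needs something to condition on, which is exactly why forecasting, rather than reacting, is the crux of the approach.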
For instance, if someone has just said "have a great feast," there is no way to predict exactly what will come next. But by taking into account what was just said, as well as characteristics of the speaker's voice, the system produces sounds that disrupt a range of possible phrases that could follow, including what actually comes next; here, the same speaker says "in preparation." To human listeners, the camouflage sounds like background noise, and they understand the spoken words without trouble. But machines stumble.
The scientists overlaid the system's output on recorded speech as it was fed directly into one of the automatic speech recognition (ASR) systems that eavesdroppers might use to transcribe it. The camouflage raised the ASR software's word error rate from 11.3% to 80.2%. For example, "I almost starved to death because the conquest of this kingdom is a daunting task" was transcribed as "as a mercenary for a reason, I immediately start looking for the dangers of conquering this kingdom."
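The word error rate behind figures like 11.3% and 80.2% is a standard edit-distance metric over words (substitutions, insertions, and deletions, divided by the number of reference words). A minimal generic implementation, not the study's evaluation code, looks like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference words,
    computed via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("i got a new monitor", "i got a new monitor"))  # 0.0
print(word_error_rate("i got a new monitor", "they got new toscats and maniters"))
```

Note that WER can exceed 100% when the transcript contains many spurious insertions, which is why heavily camouflaged speech can score close to, or even above, 80%.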
Error rates for speech obscured by white noise and by a competing adversarial attack (which, lacking predictive capability, masked only what it had just heard, applying noise half a second too late) were only 12.8% and 20.5%, respectively. The work was presented in a paper last month at the International Conference on Learning Representations, where submitted manuscripts are peer reviewed.
Even when the ASR system was trained to transcribe speech perturbed by Neural Voice Camouflage (a countermeasure eavesdroppers could conceivably adopt), its error rate remained 52.5%. In general, the hardest words to disrupt were short ones, such as "the," but those are the least revealing parts of a conversation.
The researchers also tested the method in the real world, playing a voice recording combined with the camouflage over a set of speakers in the same room as a microphone. It still worked. For example, "I also got a new monitor" was transcribed as "because they are also new toscats and maniters."
This is just a first step in protecting privacy in the face of AI, says Mia Chiquier, a computer scientist at Columbia University who led the research. "Artificial intelligence collects data about our voices, our faces, and our actions. We need a new generation of technology that respects our privacy," she says.
Chiquier adds that the predictive part of the system has great potential for other applications that require real-time processing, such as autonomous vehicles. "You have to anticipate where the car will be next, where the pedestrian might be," she says. Brains also operate through anticipation; you feel surprised when your brain predicts something incorrectly. In that regard, Chiquier says, "we're emulating the way humans do things."
"There's something nice about combining the classic machine learning problem of forecasting the future with another problem from adversarial machine learning," says Andrew Owens, a computer scientist at the University of Michigan, Ann Arbor, who studies visual camouflage and was not involved in the work. Bo Li, a computer scientist at the University of Illinois, Urbana-Champaign, who has worked on adversarial audio attacks, was impressed that the new approach worked even against the fortified ASR system.
Jay Stanley, senior policy analyst for the American Civil Liberties Union,