How AI voice cloning has opened the door to a horrific new type of phishing scam

26 Apr 2023, by Slade Baylis

In recent months we’ve covered the new language-model AI from OpenAI – called ChatGPT – and the potential implications for both cyber and job security.  We’ve also covered the history of phishing, discussing how it developed, how it works, and what you can do to avoid it.  However, this month we’re looking at where these two technologies overlap.  New developments on the AI voice synthesis front – more specifically with “voice cloning” – are having a drastic impact on both the success rate of phishing attacks and the harm they cause.

One of the most commonly known developments with AI recently has been the creation and propagation of deepfakes.  For those not already familiar, deepfakes are synthetic media – which refers to artificially produced or modified media – where a person’s likeness in visual content is replaced with that of another.  In much the same way, voice cloning is a technique used to re-create a person’s voice by training AI systems on samples of their actual voice. 

This technology has been around for a while, though in the past it was easier to spot and required a large amount of training data to generate a convincing voice replica.  It would take thousands of hours of source data to adequately train an AI to re-create someone’s voice – which is why public figures with vast amounts of recorded audio, such as podcast host Joe Rogan, were among the earliest examples of this technology.  However, the most concerning development is that with newer AI models, the amount of source data required has been reduced to as little as just three seconds!

A low bar to entry and more traumatic attacks – An increased risk of phishing attacks and targeted scams 

With such a low bar to clear – given that a malicious actor might only need a few seconds – the potential for enhanced phishing attacks is clear.  Potentially, even a single answered phone-call might be all they need to create a convincing replicated voice!

Currently it’s already relatively common for people to receive fake messages claiming to be from friends, family, or colleagues, requesting that they assist them in some way – either through sending them money or giving them information or access to things that they shouldn’t have.  In this concerning new world, scammers could potentially call you with the voice of that trusted family member or friend, making it even harder to know that the request is fraudulent.

This poses such a large risk that the Federal Trade Commission1 in the US has put out a consumer alert warning people to watch out for fake emergency calls from family members.  They warn that fake emergency calls that utilise voice cloning could be used to create a heightened sense of urgency in order to spur people into acting before they realise the request is a scam.

In their example, they walk through a fake call from a grandson, urging a grandparent to help pay for bail as he’s been in a car crash and landed in jail – even going so far as to urge them not to contact other family members, as they’ll “get in trouble”.  With this demographic already being one of the biggest targets for this sort of scam, this new development just adds more fuel to the fire due to the heightened sense of urgency that scammers are able to create.  Even more concerning, the FBI has also warned about more malicious uses of this technology, such as virtual kidnapping scams.

As a horrific real-world example, a mother in the US reportedly received a call from a scammer who used a cloned version of her daughter’s voice to fake her kidnapping.  As reported by Arizona’s Family2, the woman received a call from an unlisted number; when she answered it, she heard her 15-year-old daughter crying and saying things like “Mum, I messed up…”, before a man’s voice cut in, stating that he had her daughter and demanding a one-million-dollar ransom – all whilst she could hear her daughter's voice in the background crying and begging for help.

She said that in her mind there was no question that her daughter was in trouble, stating “It was never a question of who is this? It was completely her voice. It was her inflection. It was the way she would have cried. … That’s the freaky part that really got me to my core”. 

As an example of the quality of these AI-created voices, here is the three-second voice prompt that Microsoft used to replicate another person’s voice…

Transcription: “Well satisfied with my cabin, which was located in the…”

…and here is the entirely new phrase they were able to create using that single sample above.

Transcription: “The others resented postponement, but it was just his scruples that charmed me.”

As you can see, audio forgeries of this quality are hard enough to detect even without the added pressure of a targeted scam – with emotions running high in such situations, these types of attacks are even more likely to succeed.  With it also being possible for scammers to spoof their outgoing phone numbers, these calls can even appear to come from the "right" phone number, so extra vigilance will be required from everyone to prevent these sorts of scams moving forward.

Voice-based verification thrown into question – The “voiceprint” used by Centrelink and the ATO can be fooled

As if the development of a new kind of emotionally-charged phishing scam wasn’t enough, this voice cloning technology is also exposing security flaws in verification checks used by large institutions, such as Government agencies and banks.  For example, Centrelink and the Australian Taxation Office (ATO) both allow people to use a “voiceprint”, along with other information, to verify their identity over the phone.  This service is described as using the “rhythm, physical characteristics, and patterns” of their client’s voices to help identify them.

As reported by Guardian Australia3, using an AI-generated voice trained on just four minutes of audio of one of their journalists, they were able to gain access to that journalist’s own Centrelink self-service account.  Having this form of verification bypassed is concerning, especially given that the voiceprint service was used by around 3.8 million Centrelink clients as of the end of February, with more than 7 million using the same service with the ATO.

The Centrelink verification process does require additional information, as the caller must also know the account-holder’s customer reference number; however, this information isn’t treated as securely as a password.  Guardian Australia reports that this information is often “included in correspondence from Centrelink and other service providers, such as childcare centres”, making it much more accessible to malicious third parties intent on stealing a person’s information.

In a similarly concerning development, this same form of voice verification is used by banks across the US and Europe, and these too appear to be vulnerable to the same type of forgery.  As reported by Vice4, by using an entirely AI-generated voice, they were able to break into a bank account and access its information, including balances and lists of recent transactions and transfers.

Whilst some banks claim that voice identification is equivalent to a fingerprint, Rachel Tobac, the CEO of the social engineering-focused firm SocialProof Security, told Vice: “I recommend all organizations leveraging voice ‘authentication’ switch to a secure method of identity verification, like multi-factor authentication, ASAP.”  She noted that this sort of voice replication can be “completed without ever needing to interact with the person in real life.”

How can you protect yourself against these sorts of attacks?

As reported by the ABC5, experts have started to suggest a few basic tactics that people can employ to protect themselves from voice-based attacks:

  • Calling friends or family directly to verify if the call is real if you suspect it to be fraudulent;
  • Coming up with a shared safe word or password to use with those close to you in cases of emergency;
  • Treating any unexpected phone call as potentially fake, as even caller ID can be faked; and
  • Being careful with who you share personal identifying information with, ideally only sharing those details if absolutely necessary.

In addition, when it comes to providing sensitive information, we recommend only doing so when you have called the person or organisation yourself, rather than on any call you have received – this could even mean ending a call and then calling them back immediately, just to confirm that the request is genuine!

On the identity verification side there are also lessons to be learnt.  As demonstrated in the cases of Centrelink and the ATO, verification methods that rely on voice recognition have been thrown into question.  Verification methods that require specific knowledge to be provided are likely to be far more reliable and secure.  In addition, using multi-factor authentication – for example, systems where a user needs to provide a code that is SMS’d through to their mobile – is likely to be an important step in ensuring that access and details are only provided to authorised individuals.
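To make that last point concrete, below is a minimal sketch – written in Python purely for illustration, and not based on any particular provider’s implementation – of how a one-time code of the kind sent via SMS might be generated and checked.  The function names and the five-minute expiry window are assumptions made for this example.

import hashlib
import hmac
import secrets
import time

CODE_LIFETIME_SECONDS = 300  # assumed five-minute validity window for this example

def generate_one_time_code():
    """Generate a random six-digit code: the code itself is sent to the user
    (e.g. via an SMS gateway), while only its hash and issue time are stored."""
    code = f"{secrets.randbelow(1_000_000):06d}"             # cryptographically random six digits
    stored_hash = hashlib.sha256(code.encode()).hexdigest()  # never store the raw code
    return code, stored_hash, time.time()

def verify_one_time_code(submitted, stored_hash, issued_at):
    """Return True only if the submitted code matches the stored hash and
    was issued within the validity window."""
    if time.time() - issued_at > CODE_LIFETIME_SECONDS:
        return False                                         # code has expired
    submitted_hash = hashlib.sha256(submitted.encode()).hexdigest()
    return hmac.compare_digest(submitted_hash, stored_hash)  # constant-time comparison

# Example usage
code, stored_hash, issued_at = generate_one_time_code()
print(verify_one_time_code(code, stored_hash, issued_at))      # True
print(verify_one_time_code("000000", stored_hash, issued_at))  # almost certainly False

A real system would pair this with an SMS gateway to deliver the code and would rate-limit verification attempts, but the key point is that the check relies on something the caller possesses (their phone) rather than on how they sound.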

In general, the more levels of security and verification that are in place, the more secure that system or data will be.  However, it’s important to note that any procedure you use will require a line to be drawn between security and ease-of-use for your clients.  As an example, even though it would be more secure, most people aren’t likely to tolerate a thirty-minute-long verification procedure!

Have any questions about how to increase your security posture?

If you’re concerned about your own vulnerability to these sorts of phishing attacks, or are looking for ways to increase the security of your systems, let us know!  We have services to help upskill your staff on how to spot and respond to phishing attacks, and we can provide advice on how to secure your systems more generally!

Call us on 1300 769 972 (Option #1) or email us at sales@micron21.com and we’ll be able to help you address any security concerns that you have!

Sources

1, Federal Trade Commission, Scammers use AI to enhance their family emergency schemes, <https://consumer.ftc.gov/consumer-alerts/2023/03/scammers-use-ai-enhance-their-family-emergency-schemes>
2, Arizona’s Family, ’I’ve got your daughter’: Scottsdale mom warns of close call with AI voice cloning scam, <https://www.azfamily.com/2023/04/10/ive-got-your-daughter-scottsdale-mom-warns-close-encounter-with-ai-voice-cloning-scam/>
3, Guardian Australia, AI can fool voice recognition used to verify identity by Centrelink and Australian tax office, <https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai>
4, Vice, How I Broke Into a Bank Account With an AI-Generated Voice, <https://www.vice.com/en/article/dy7axa/how-i-broke-into-a-bank-account-with-an-ai-generated-voice>
5, ABC News, Experts say AI scams are on the rise as criminals use voice cloning, phishing and technologies like ChatGPT to trick people, <https://www.abc.net.au/news/2023-04-12/artificial-intelligence-ai-scams-voice-cloning-phishing-chatgpt/102064086>
