

21 Nov 2025, by Slade Baylis
Last month we spoke about the dangers of Shadow IT – the practice of staff using unapproved software or other IT resources whilst performing their jobs. Not only is this a risk to organisations due to the lack of oversight, it also exposes those users – and in turn, the organisation as a whole – to threats such as malicious software, data leakage, and compliance issues. In this month’s article, we’ll delve deeper into this topic and focus on the newest and most prominent form of Shadow IT: the growing use of unapproved AI tools within the workplace.
The adoption of these new AI tools poses different risks to those of other forms of Shadow IT. Instead of the previously more common threats, such as malware, it’s now the risk of unintentional data loss that has many concerned. As an example, research out of Hong Kong has shed light on a new vulnerability within public LLMs (Large Language Models) such as GitHub Copilot. The researchers built a prompt-construction algorithm designed to coax credentials out of these models and, in their experiment, were able to extract 2,702 hard-coded credentials. What this means is that if credentials are hard-coded within your software, it’s theoretically possible for bad actors to steal them via maliciously-crafted prompts.
That’s why this month we’ll be talking about “Shadow AI”. We’ll explain what it is, what unique risks come with it, what information is potentially vulnerable to this new threat, as well as what can be done to mitigate it.
Much like Shadow IT – which is the unsanctioned use of software, hardware, or other IT tools by employees or end-users – Shadow AI is simply a subset of that category, referring specifically to the unsanctioned use of AI tools.
With many different tools available – especially at low or no cost, such as ChatGPT, Microsoft Copilot, Grok, and Google Gemini – the likelihood that staff will use them has increased dramatically. The reason is simple: these tools can greatly increase productivity in certain areas. The problem is that they can and do pose a significant data security risk, with users potentially sharing proprietary, confidential, and sensitive data with these tools – data which could be the source of a future leak.
Because these platforms store this data and potentially use it for training, there is a growing and very real risk of that data being exposed through sophisticated prompt engineering. As reported by GitGuardian1, research out of a university in Hong Kong exposed a “significant privacy risk posed by code completion tools like GitHub Copilot and Amazon CodeWhisperer”. What they found was that “these models not only leak the original secrets present in their training data, but also suggest other secrets that were encountered elsewhere in their training”.
Whilst it’s against best practice to include hard-coded secrets within your codebase – such as those exposed by this research – unfortunately it’s also been found that AI-generated code is more likely to contain them. As reported by CSO Online2, Copilot-enabled repositories are 40% more likely to contain hard-coded API (Application Programming Interface) keys, passwords, and tokens. This means that AI coding assistants are potentially contributing to the very problem that leaves those details and secrets vulnerable to being stolen.
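To make that risk concrete, below is a minimal illustrative sketch (in Python) of the anti-pattern described above – a credential hard-coded in source – alongside the safer approach of reading it from the environment at runtime. The key, the environment variable name, and the API endpoint are all hypothetical, included purely for illustration.

```python
import os

import requests

# Anti-pattern: a secret hard-coded into source code. Anything with visibility
# of the codebase - including an AI coding assistant integrated into an IDE -
# can see it, and a model trained on that code could later reproduce it.
# API_KEY = "sk_live_EXAMPLE_DO_NOT_USE"  # hypothetical key, shown only as the anti-pattern

# Safer: read the credential from the environment (or a secrets manager) at
# runtime, so it never appears in the codebase at all.
API_KEY = os.environ["PAYMENTS_API_KEY"]  # hypothetical variable name

# Hypothetical API call that uses the credential.
response = requests.get(
    "https://api.example.com/v1/charges",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
print(response.status_code)
```

The same principle applies to database passwords, tokens, and any other secret: if the value never appears in your source code, an AI assistant that sees or trains on that code has nothing to memorise or suggest.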
This is especially concerning when you consider the growing use of AI tools within the software development space. With AI tools being directly integrated into IDEs – short for “Integrated Development Environments”, the software development tools used by developers to improve productivity – they can and often do have full visibility over your codebase whilst it’s in development. That visibility means they have access to all of your proprietary code. If those AI tools use this access – to what would otherwise be entirely private code – to train their models, that data could be stolen in the future through a maliciously-crafted prompt.
Unfortunately for organisations everywhere, this means that keeping your codebase out of publicly hosted repositories isn’t enough to limit your risk on this front. All it would take for your data to be leaked in this way is a single developer with access to your code using an IDE with an unapproved, integrated AI tool. That’s why governance around AI tool usage is critical, and why each organisation should look to establish policies around which tools are and are not allowed to be used.
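Alongside a policy on which tools are allowed, many teams also scan their codebases for hard-coded secrets before that code is ever shared with an AI assistant or pushed to a repository. The sketch below shows the general idea in Python using a couple of simple regular expressions – it’s illustrative only and no substitute for a dedicated secret-scanning tool, and the patterns and file extensions are assumptions chosen for the example.

```python
import re
from pathlib import Path

# A couple of illustrative patterns - real secret scanners ship hundreds of these.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic credential assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}

# File types to check - adjust to suit your codebase.
EXTENSIONS = (".py", ".js", ".ts", ".yaml", ".json")


def scan(root: str) -> None:
    """Walk a directory tree and flag lines that look like hard-coded secrets."""
    for path in Path(root).rglob("*"):
        if not (path.is_file() and path.suffix in EXTENSIONS):
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"{path}:{lineno}: possible {label}")


if __name__ == "__main__":
    scan(".")  # check the working directory before code is shared with any AI tool
```

Run as a pre-commit step or scheduled check, something like this gives a cheap early warning before secrets ever leave a developer’s machine.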
It’s not just software code that could be exposed via this sort of threat vector – it’s any information that has been entered into AI tools. If that information is used to train the model further – something the developers of such tools have a strong incentive to do in order to keep up with the ongoing AI arms race – then anything you enter into these tools could potentially be exposed.
This is why, within Google’s Gemini Apps Privacy Hub3 – where they list how the information you enter into their tools is used – they advise users not to enter “confidential information that you wouldn’t want a reviewer to see or Google to use to improve our services, including machine-learning technologies”. Whilst not stated directly, the implication is that the information you enter into AI tools such as these could potentially be used to train the models even further, either today or at some point in the future. It’s for this reason that many people familiar with AI choose not to enter any private or personal information into these tools, just in case it is later used by these organisations for training.
As you can imagine, this sort of threat is taken quite seriously within government too, which is why most governments have provided guidelines on how their agencies should approach it.
For example, the Queensland Government4 in their “Use of generative AI in Queensland Government” guideline laid out several key takeaways, including:
That second point is especially pertinent, as one of the main solutions for mitigating the risk of data leaking from one user to another is the use of privately hosted LLMs, and privately hosted AI tools more generally. By using privately hosted and managed solutions, both governments and organisations are able to ensure that the data entered into these systems doesn’t escape the confines of the organisations using them.
The good news is that, depending on the options chosen, running your own dedicated AI tooling doesn’t have to be a tremendously expensive exercise. Of course, any solution deployed for a government department is going to be a large and likely expensive system, but for smaller organisations, the options for running your own LLMs or other generative AI tools affordably have increased substantially.
For example, here at Micron21 we have GPU Dedicated Server plans, as well as GPU Cloud Server plans, that are quite affordable even for smaller organisations. We’ve even introduced “Contended GPU” cloud server plans that give you access to a portion of a GPU rather than a full dedicated GPU, bringing costs down even further. Any of these options can be used to get your own custom AI tools running locally on a system you control, without the concerns of future data leaks affecting you or your business.
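To give a sense of what “running your own AI tool” can look like in practice, here’s a minimal sketch assuming a GPU server with the open-source Hugging Face transformers library installed; the model name below is just one example of an open-weights model, and everything runs on hardware you control, so no prompts or data leave the machine.

```python
from transformers import pipeline

# Load an open-weights, instruction-tuned model onto the local GPU (device=0).
# The model name is only an example - choose one that fits your GPU's memory.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device=0,
)

# Prompts and outputs stay on this server - nothing is sent to a third-party API.
prompt = "Summarise the main risks of Shadow AI for a small business in three bullet points."
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```

A setup like this can then be wrapped in an internal web interface or API, giving staff the productivity benefits of generative AI without any data leaving your environment.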
For those interested in further reading on our Dedicated and Contended GPU options, I recommend checking out our “Dedicated GPU vs Contended GPU - Get the benefits of GPU compute without the associated costs” article from a few months back.
If you have any questions about Shadow AI, Shadow IT, or even where to begin if you’re looking to get your own locally running AI, let us know! We’re happy to answer any questions that you have and get you started on the right foot.
You can reach us via email at sales@micron21.com and via phone on 1300 769 972 (Option #1).
1, GitGuardian, “Yes, GitHub's Copilot can Leak (Real) Secrets”, <https://blog.gitguardian.com/yes-github-copilot-can-leak-secrets/>
2, CSO Online, “AI programming copilots are worsening code security and leaking more secrets”, <https://www.csoonline.com/article/3953927/ai-programming-copilots-are-worsening-code-security-and-leaking-more-secrets.html>
3, Google, “Gemini Apps Privacy Hub”, <https://support.google.com/gemini/answer/13594961?hl=en#pn_data_usage>
4, Queensland Government, “Use of generative AI in Queensland Government”, <https://www.forgov.qld.gov.au/information-technology/queensland-government-enterprise-architecture-qgea/qgea-directions-and-guidance/qgea-policies-standards-and-guidelines/use-of-generative-ai-in-queensland-government>