
The Dual-Edged Security Sword: Leveraging Open-Source AI and Data Tools

By Slawomir Ligier, VP of Product Management, Protegrity

By leveraging open-source artificial intelligence (AI) and data tools, we are accelerating innovation, reducing overall solution costs and bringing transparency and collaboration to our tech stacks.

However, these gains do not come without risk. While these tools can make organizations more competitive, they also present significant data security challenges.

Organizations must address security, data privacy and compliance issues before implementing their AI strategies, or they risk being at the center of the next security scandal.

A Cautionary Open-Source Tale

Open-source AI and data tools have gained popularity lately because they spread the wealth of AI. Organizations now have a cheaper and easier way to access AI’s benefits at scale.

What we are also seeing with these tools, however, is a cautionary tale.

It began in 2023 with the open-source AI developer platform Hugging Face, where outside researchers found that more than 700 organizations had inadvertently exposed their API tokens in public code repositories.

In just the first months of 2025, we saw the rise and fall of open-source AI and data analytics firm DeepSeek. Initially hailed as a cheaper alternative to more expensive tools, DeepSeek soon drew attention instead for the more than 1 million sensitive records it had put at risk by February.

Even more recently, AI aggregator OmniGPT (self-proclaimed as the “most affordable ChatGPT alternative”) allegedly suffered its own security breach, one that included the personally identifiable information (PII) of 30,000 people.

Open-source tools will undoubtedly propel innovation in the future. In a recent IBM study, 61% of surveyed IT decision-makers indicated that their companies are already using open-source AI models to create their tools.

Cybercriminals are clearly paying attention to and targeting open source’s inherent flaws. Organizations must act to prevent further security risk while preserving the benefits.

The Challenges of Open Source

Organizations have four initial hurdles to overcome before they can securely adopt open-source AI and data tools:

  1. Vulnerabilities and Exploits: Because open-source code is publicly available, threat actors can readily identify and exploit its weaknesses. This susceptibility to cyberattack means organizations must stay vigilant about applying software updates.
  2. Data Privacy: Open-source AI tools often process substantial amounts of data, some of it sensitive. How does your organization plan to adhere to major data compliance regulations like the GDPR while using these tools?
  3. Supply Chain Risks: Open-source projects are deeply interconnected, so one compromised piece of the web can compromise an entire system. According to a 2024 Sonatype report, attacks on open-source ecosystems rose 156% last year.
  4. Lack of Support: Open-source tools come with no vendor safety net. How will your organization support and maintain them to combat rising supply chain threats?

Balancing Innovation and Security in Open Source

To overcome these challenges, organizations can take five proactive steps in their AI strategies:

  1. Risk Assessment: Take the time to thoroughly assess the potential risks and vulnerabilities that could impact your organization, including the security of the open-source tools you plan to use. One key consideration is where sensitive training data resides and where AI prompts and responses are stored.
  2. Governance Framework: Establish a clear governance framework that outlines the policies and procedures for using open-source software. This framework should include guidelines for code contributions, version control, and security practices.
  3. Regular Audits and Updates: Schedule regular security audits and updates. Keeping your open-source tools current with the latest security patches is essential to mitigating risk, and automated dependency checks can help (see the first sketch after this list).
  4. Data Encryption / Tokenization: Ensure that all sensitive data is encrypted both in transit and at rest, and tokenize fields that must remain usable for analytics so raw values never circulate. Implement strict access controls to limit who can access and modify data (see the second sketch after this list).
  5. Monitoring and Redaction of Prompts and Responses: Sensitive data can appear in prompts delivered to a large language model (LLM) and resurface in future responses. That information must be redacted according to set policies before it reaches the AI system, and responses must likewise be sanitized so sensitive information never reaches unauthorized individuals (see the final sketch after this list).
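
To make step 3 concrete, here is a minimal Python sketch of an automated dependency audit against the public OSV.dev vulnerability database. The package names and versions are illustrative placeholders; in practice they would come from your project’s lockfile or software bill of materials.

    import requests

    OSV_QUERY_URL = "https://api.osv.dev/v1/query"

    # Hypothetical snapshot of an open-source AI stack's Python dependencies.
    dependencies = {
        "transformers": "4.30.0",
        "langchain": "0.0.200",
    }

    def known_vulnerabilities(name, version):
        """Ask OSV.dev for advisories affecting this exact package version."""
        response = requests.post(
            OSV_QUERY_URL,
            json={"version": version,
                  "package": {"name": name, "ecosystem": "PyPI"}},
            timeout=10,
        )
        response.raise_for_status()
        return response.json().get("vulns", [])

    for name, version in dependencies.items():
        vulns = known_vulnerabilities(name, version)
        if not vulns:
            print(f"{name}=={version}: no known advisories")
        for vuln in vulns:
            print(f"{name}=={version}: {vuln['id']}")

Run on a schedule, a check like this turns “regular audits” from a policy statement into an enforced practice.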
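
For step 4, the sketch below, which assumes the open-source cryptography package, contrasts the two protections: reversible encryption for data at rest, and deterministic tokenization that swaps a value for a stable surrogate so records can still be joined for analytics. Key handling is deliberately simplified; production keys belong in a KMS or HSM.

    import hmac
    import hashlib
    from cryptography.fernet import Fernet

    # Simplified key handling; real deployments pull keys from a KMS/HSM.
    fernet = Fernet(Fernet.generate_key())
    tokenization_key = b"hypothetical-tokenization-secret"

    def encrypt_at_rest(value):
        """Reversible: authorized services can decrypt with the same key."""
        return fernet.encrypt(value.encode())

    def tokenize(value):
        """Deterministic and one-way: same input always yields the same token."""
        digest = hmac.new(tokenization_key, value.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    ssn = "123-45-6789"
    ciphertext = encrypt_at_rest(ssn)  # stored in place of the raw value
    token = tokenize(ssn)              # safe surrogate for joins and analytics
    assert fernet.decrypt(ciphertext).decode() == ssn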
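
Finally, for step 5, here is a minimal sketch of policy-based redaction applied on both sides of an LLM call. The regular expressions and the send_to_llm() stub are placeholders; a production deployment would use a dedicated PII-detection service and far richer policies.

    import re

    # Toy policy: label -> pattern for values that must not reach the model.
    REDACTION_POLICY = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(text):
        """Replace every policy-matched value with a labeled placeholder."""
        for label, pattern in REDACTION_POLICY.items():
            text = pattern.sub(f"[REDACTED-{label}]", text)
        return text

    def send_to_llm(prompt):
        # Stub standing in for a real model call.
        return f"model output echoing: {prompt}"

    user_prompt = "Email jane.doe@example.com; her SSN is 123-45-6789."
    safe_prompt = redact(user_prompt)                 # sanitize going in
    safe_response = redact(send_to_llm(safe_prompt))  # and coming out
    print(safe_response)

Sanitizing both directions matters because a model can repeat sensitive values it absorbed earlier, even when the current prompt is clean.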

Open source’s rise presents a unique opportunity to drive innovation and competition, but organizations must address long-standing security concerns at the same time. Most things can be good in moderation or with the proper guardrails in place, and we must strike this balance now, while adoption is still early.

The double-edged sword does not mean we cannot leverage open-source tools; it means we must recognize, understand and address the critical issues to safeguard ourselves and our data.

Organizations that take the time to properly evaluate and secure their open-source tools will see their AI strategies benefit in the long run.
