AI and Privacy: Striking the Balance Between Innovation and Data Protection

By Danny Jenkins, ThreatLocker CEO & Co-Founder, Cybersecurity Expert

The use of artificial intelligence (AI), particularly large language models (LLMs), now permeates modern industries, where it is used primarily for data analysis. This raises significant questions about data protection. We have seen a similar transition before with the initial adoption of digital record keeping, when businesses moved from physical filing cabinets to electronic database management systems with the primary goal of improving efficiency and operations.

With AI, we see similar challenges around data management and data security. Regulatory and compliance requirements address some of them, but they are not always comprehensive enough to keep pace with the ever-changing AI landscape. Even if the shift is not as all-encompassing as the move to digital record keeping, the companies developing these LLMs have a responsibility to provide data protection, especially because the models rely on large amounts of data to operate, some of which is supplied by their users. Glaring concerns remain around data management and data security, and only time will tell how regulatory and compliance requirements will address them.

Data management 

Data management is a major factor when considering the integration of AI and LLMs into everyday workflows and applications. These systems require large amounts of data to learn from and to produce useful output. The issue lies not only with whether data is fed in by an end user or scraped from publicly available sources, but also with how that data is eventually stored and managed. Companies approach data management through data collection and consent agreements, data encryption and secure storage, and data anonymization practices.

When it comes to data collection and consent, companies want collection methods to be legal, ethical, and transparent. Obtaining explicit consent can be difficult because AI systems may use publicly available content that the author or forum host never explicitly agreed to have used for AI training. The difficulty of tracking consent makes this worse: there is no centralized consent management system, and given the sheer scale of the data involved, it is impractical to verify whether each source consented to its data being used in AI training.

This lack of tracking, combined with the fact that people repost content all the time, highlights the need for data anonymization and, above all, data minimization. Collecting only the data the LLM actually needs to function is key.
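As a rough illustration of what data minimization can look like in practice, the sketch below strips obvious personal identifiers from text before it is stored or submitted to an LLM. The patterns and the minimize helper are illustrative assumptions, not a production-grade approach; a real deployment would rely on a vetted PII-detection tool.

```python
import re

# Illustrative patterns only; a production system would use a vetted
# PII-detection library rather than hand-rolled regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def minimize(text: str) -> str:
    """Replace obvious personal identifiers with placeholder tokens
    before the text is logged, stored, or sent to an LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

if __name__ == "__main__":
    prompt = "Contact Jane at jane.doe@example.com or 555-867-5309 about the renewal."
    print(minimize(prompt))
    # Contact Jane at [EMAIL] or [PHONE] about the renewal.
```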

In addition to issues with consent and anonymization, there are concerns surrounding the encryption and storage of this data. Many LLM providers encrypt data in transit and at rest, which is especially important when the data is likely to contain confidential information. However, this raises questions about the storage procedures these companies implement. As with all data, access control methods need to be in place, and leakage risks always exist: data stored or processed by an LLM could unintentionally include private or sensitive information, which ties directly into the broader data security concerns of using LLMs.
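To illustrate these two controls together, the minimal sketch below encrypts a record before storage and gates decryption behind a simple allow list. The key handling, service name, and ALLOWED_READERS list are assumptions made for the example; they do not describe how any particular LLM vendor actually stores data.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Assumption for the example: in practice the key would live in a managed
# key store (KMS/HSM), not be generated and held next to the data.
key = Fernet.generate_key()
cipher = Fernet(key)

# Crude stand-in for a real access-control policy.
ALLOWED_READERS = {"analytics-service"}

def store(record: str) -> bytes:
    """Encrypt a record before writing it to disk or a database."""
    return cipher.encrypt(record.encode("utf-8"))

def read(token: bytes, caller: str) -> str:
    """Decrypt only for callers on the allow list."""
    if caller not in ALLOWED_READERS:
        raise PermissionError(f"{caller} is not authorized to read this data")
    return cipher.decrypt(token).decode("utf-8")

if __name__ == "__main__":
    blob = store("customer note: renewal due next quarter")
    print(read(blob, "analytics-service"))
```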

Data security 

With data security in mind, we must address the vast amount of data these models process and the risk of leakage. Past cases have shown models unintentionally revealing private details from their training data when prompted. This often happens through attribute inference attacks, in which sensitive or personal information is deduced by analyzing a machine learning model's behavior or responses. A study of such attacks, using real Reddit profiles, showed that LLMs can infer personal attributes with high accuracy.

Other data security concerns include user-, network-, and software-level attacks. User-level attacks involve tools developed by cybercriminals, such as FraudGPT and WormGPT. These tools lack the safeguards implemented in ChatGPT (which, it should be noted, are not flawless themselves), and their main purpose is to automate cyberattacks. FraudGPT forges documents, crafts personalized phishing messages, and generates fake reviews for scam products online, while WormGPT enables large-scale phishing campaigns.

Network-level attacks also involve phishing but may not rely on malicious LLMs at all. Instead, they leverage legitimate tools like ChatGPT to craft more personalized emails without triggering its built-in safeguards, avoiding the telltale tooling of user-level attacks. These personalized phishing messages have a higher click-through rate than non-personalized ones, so threat actors can use such tools to push out spear-phishing campaigns en masse.

Lastly, software-level attacks happen when general users exploit legitimate versions of ChatGPT to create malware. This can include ransomware, worms, keyloggers, and even fileless malware. Given these risks, regulatory and compliance measures are essential in mitigating these data security concerns related to LLMs.

Compliance and regulation 

Regulation and compliance play a crucial role in addressing some of the data protection concerns around LLMs. While internal controls are generally more effective, existing measures such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) already address these concerns, requiring informed consent, data minimization, transparency about how data is used, and data-subject rights.

Other measures include the AI Act proposed by the European Union, which places LLMs in high-risk categories. As a result, LLM developers must adhere to accountability controls such as mandatory human oversight and impact assessments, alongside transparency and data security requirements.

Additionally, ensuring data security involves requiring compliance with security frameworks for encryption and secure data storage. As LLMs move into new areas, however, existing cybersecurity laws may need to be updated. Ultimately, the ever-evolving regulatory landscape reflects the balance between these innovative tools, now used by almost every industry, and the data protection required to justify their use safely.

Conclusion 

Data protection concerns surrounding the use of LLMs in business environments are intensifying, with significant risks related to consent, anonymization, storage, and security. These concerns are further compounded by the potential for data leakage, which can be exploited to obtain sensitive or personalized information. Additionally, LLMs can be weaponized to automate phishing attacks and even generate malware, often masked under the pretense of ethical applications.

Most of these concerns can be mitigated through good security practices within a company, together with the regulations and compliance standards applied to LLMs overall. Ultimately, while the evolving regulatory landscape for LLMs presents important challenges, proactive internal policies and adherence to robust security practices are crucial in mitigating risks. By establishing clear guidelines around data protection and limiting LLM usage to only when necessary, businesses can safeguard sensitive information and maintain compliance, ensuring a secure and ethical approach to AI deployment.
