The more Artificial Intelligence (AI) is hyped, the more people start wondering if their personal data might end up as training material for AI. The big question is: how can I stop not only my data but also my content from falling into AI’s hands? Well, you can start addressing this issue by covering the basics. Assume that anything you put online could become part of AI’s training material unless you explicitly protect your data and content.
Few people benefit from accepting the cookies that websites ask them to approve. The primary beneficiary is the site owner, typically a commercial entity. Cookies are designed to collect information about users to tailor marketing and gather behavioral data. This data often ends up being analyzed by AI.
As a rule of thumb, it’s best to block all cookies. Similarly, avoid logging into websites unless absolutely necessary: the more anonymous you are, the better protected you are. The same principle applies to browsers. Google and Microsoft encourage you to sign into their browsers so your data can flow seamlessly between devices, from desktop to phone and back. While convenient, this also hands over valuable information about your habits.
Social media is a significant focus area. Recently, Meta, which owns Facebook, Instagram, and WhatsApp, asked users for consent to use their data for training AI models. Meta intends to leverage public content such as images, texts, and comments to develop and improve generative AI models.
This request primarily targets European users, influenced by the EU’s General Data Protection Regulation (GDPR), which mandates transparency about data use. GDPR also gives users the right to oppose such use.
Meta’s notifications include a link to a form where users can object to data usage. If users do not respond, Meta assumes consent. This request stems from growing scrutiny by European authorities and advocacy groups on how corporations use user-generated data in AI development.
If you haven’t received such a notification, you can proactively deny consent. Facebook settings are notoriously complex, often buried deep within menus.
Because WhatsApp, also owned by Meta, encrypts all messages end to end, AI cannot read their contents. However, this protection does not extend to interactions with Meta’s AI through WhatsApp.
Consider setting your social media accounts to private and restricting unknown users from accessing your data. Review your follower list and decide whether each post should be public or private.
An alternative to platforms like X (formerly Twitter) is Bluesky. Currently, Bluesky does not feed user data or content into AI systems and has publicly stated it has no such plans. However, this does not mean data cannot be scraped by third parties for AI training.
Bluesky is an open, decentralized platform. All posts are public by default, making them vulnerable to third-party data scraping. Significant volumes of data from Bluesky are already believed to have been used by AI. The platform has pledged to enhance users’ ability to protect their data.
On mobile devices, it’s better to use social media platforms via browsers rather than their apps. Apps collect considerably more data about users.
Beyond social media, protect your creative works proactively. For images, the simplest method is to add a watermark or a digital, preferably cryptographic, signature. Tools like Photoshop or the free service ArtShield can handle this.
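If you prefer to script it yourself, a visible watermark takes only a few lines. The sketch below uses the Pillow imaging library (my choice for illustration; the article itself names Photoshop and ArtShield) to tile semi-transparent text across a copy of an image:

```python
from PIL import Image, ImageDraw, ImageFont

def add_watermark(image, text):
    """Tile semi-transparent watermark text over an RGBA copy of the image."""
    base = image.convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()
    step = 80  # spacing between repeated watermarks
    for y in range(0, base.height, step):
        for x in range(0, base.width, step * 2):
            # White text at roughly 40% opacity: visible but not obtrusive.
            draw.text((x, y), text, font=font, fill=(255, 255, 255, 96))
    return Image.alpha_composite(base, overlay)

# Stand-in for a real photo; replace with Image.open("your_photo.jpg").
photo = Image.new("RGB", (320, 240), (60, 120, 180))
marked = add_watermark(photo, "(c) Your Name - no AI training")
marked.save("watermarked.png")
```

Because the watermark is repeated across the whole frame, cropping it out is harder than removing a single corner mark.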
Another technique is to “poison” images by making imperceptible alterations that confuse AI systems. This can lead to incorrect outputs, such as misinterpreted images, making AI models less reliable and deterring unauthorized use. Effective tools for this include Nightshade and Glaze.
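Nightshade and Glaze compute carefully targeted adversarial perturbations, which is far more sophisticated than anything a short script can do. The sketch below only illustrates the underlying idea of sub-perceptual pixel changes, using random noise rather than a real attack:

```python
import numpy as np
from PIL import Image

def add_perturbation(image, epsilon=2):
    """Add small random pixel noise, bounded so the change stays invisible.

    Real poisoning tools (Nightshade, Glaze) optimize the perturbation to
    mislead specific models; random noise merely demonstrates that pixel
    values can change while the picture looks identical to a human.
    """
    arr = np.asarray(image.convert("RGB"), dtype=np.int16)
    noise = np.random.randint(-epsilon, epsilon + 1, arr.shape, dtype=np.int16)
    perturbed = np.clip(arr + noise, 0, 255).astype(np.uint8)
    return Image.fromarray(perturbed)

original = Image.new("RGB", (64, 64), (128, 128, 128))  # stand-in image
poisoned = add_perturbation(original)
# Every pixel moved by at most epsilon, so the image appears unchanged.
diff = np.abs(np.asarray(poisoned, np.int16) - np.asarray(original, np.int16))
print(int(diff.max()))
```

For actual protection, use the dedicated tools; they are trained against the models they target.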
Despite applying these protections, regularly use reverse image search tools like ArtShield to monitor your content’s usage.
Text is arguably the hardest to protect, given how easily it can be copied and the mixed effectiveness of available countermeasures. Adding a notice prohibiting the use of your text for AI training can help. You can also embed invisible characters and spaces to disrupt AI processing.
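The invisible-character trick is simple to demonstrate. This minimal Python sketch appends a zero-width space to each word, so the text renders identically but no longer matches the original byte for byte. Treat it as a speed bump, not a lock: many scraping pipelines strip such characters before training.

```python
ZWSP = "\u200b"  # zero-width space: renders as nothing on screen

def obfuscate(text: str) -> str:
    # Append an invisible character to every word; a human reader sees no
    # difference, but the raw string a scraper copies is altered.
    return " ".join(word + ZWSP for word in text.split(" "))

plain = "Please do not train on this text"
hidden = obfuscate(plain)
print(hidden == plain)               # False: the strings differ
print(len(hidden) - len(plain))      # one extra character per word
```

Stripping the zero-width spaces recovers the original text exactly, which is also why the protection is weak.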
Some suggest making text highly personal, unpredictable, and human-like, which might render it less useful for general AI purposes. Ironically, this aligns with the qualities of good writing, especially in creative fields.
However, it remains unclear how effective such strategies are in deterring AI. Entire categories of tools, often called “AI humanizers,” have emerged to address these challenges.
AI has already mastered generating music that sounds deeply human. This wouldn’t be possible without feeding it human-created music. In Finland, Teosto is working to protect musical works from unauthorized AI use, mainly by influencing legislation. The goal is to require permission from creators, offer fair compensation, and ensure clear and transparent rules.
Add metadata to your works specifying they must not be used for AI training. Even a simple tag like “no AI training” is better than nothing. Watermarks, especially for audio content, are also effective. Tools like AG Watermark and StirMark for Audio can help.
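For image files, such a notice can be written directly into the file’s own metadata. The sketch below uses the Pillow library to add text chunks to a PNG; note that the key names are my own invention, since no standard “no AI training” metadata field exists yet, and nothing forces a scraper to honor them.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

meta = PngInfo()
# Hypothetical key names; any tool that inspects the PNG's text chunks
# will see the notice, but compliance is voluntary.
meta.add_text("ai-training", "not permitted")
meta.add_text("copyright", "(c) Your Name")

img = Image.new("RGB", (16, 16), "white")  # stand-in for your artwork
img.save("tagged.png", pnginfo=meta)

# Verify the notice survives a round trip through the file.
reread = Image.open("tagged.png")
print(reread.text["ai-training"])  # prints "not permitted"
```

Be aware that many platforms strip image metadata on upload, so a tag that survives on your own site may not survive on social media.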
Being proactive is crucial. Once a creation becomes part of AI training material, it is nearly impossible to remove it.
Despite these measures, it’s impossible to completely prevent AI from collecting personal data. That ship has sailed. However, by making it harder to exploit your data, you can still protect yourself and your creations to some extent.