
General Discussion

In reply to the discussion: Bernie vs. Claude [View all]

scipan

(3,035 posts)
7. I got more specific answers for some reason, and I think it's interesting.
Tue Mar 24, 2026, 05:31 PM

(bold is mine)
Me: I want to know just how much information AI collects is being used, and what would surprise most Americans about how that information is being used

This is a genuinely important topic, and the gap between what's actually happening and what most people assume is significant. Let me break it down.
What's being collected
The scope is wider than most people realize. Web scrapers vacuum up data from blogs, personal web pages, LinkedIn profiles, voter-registration databases, government pages, Wikipedia, Reddit, and academic institutions — plus pirated-content compilations and web archives that often contain data removed from its original source. (Scientific American) One striking example: an artist discovered that a private diagnostic medical image of hers had been included in the LAION training database, and the same dataset contained medical-record photographs of thousands of other people. (Scientific American)
Some crawlers can even get past paywalls by disguising themselves behind paid accounts. (Scientific American)
Your conversations and prompts
Many platforms harvest user inputs to refine their models. Users, for the most part, don't understand what data is being used, how it's stored, or who might have access to it. (Captain Compliance)
Most platforms share prompts with third parties — including service providers, legal authorities, and affiliated companies. Microsoft and Meta allow sharing with advertisers or affiliates under broader terms. No platform offers a way for users to remove their personal data from existing training sets. (Dataconomy)
On opt-outs: platforms like Gemini, DeepSeek, Pi AI, and Meta AI do not appear to provide a way to opt out of training use. (Dataconomy)
The "big tech" platforms are the worst offenders
Meta AI landed at the bottom of privacy rankings, with Gemini (Google) and Copilot (Microsoft) not far behind. Key concerns included that these platforms collect precise location, contact details, and usage data — particularly through their mobile apps. (Captain Compliance)
Meta's AI tool provides no means for its users to opt out, while Google and LinkedIn at least offer an opt-out mechanism. (Al Jazeera)
What would surprise most Americans
A few things stand out:
1. It extends far beyond chatbots. AI-powered predictive systems are being used to screen job candidates, with employers using them to decide who gets interviewed — with documented bias problems, including Amazon's own hiring tool that turned out to discriminate against women. (Stanford HAI)
2. Facial recognition is now in your doorbell. Amazon rolled out AI-powered facial recognition for Ring doorbells, allowing users to identify frequent visitors through stored facial profiles — though it's unavailable in states with strict biometric laws like Illinois and Texas. (Crescendo)
3. Most people already sense something is wrong. 70% of Americans have little to no trust in companies to make responsible decisions about how they use AI in their products, and 80% feel uneasy about AI being used to give medical or health advice. (Digital Silk) Yet the regulatory framework hasn't caught up.
4. Deletion isn't really deletion. Scraped databases do not go away (Stanford HAI) — even if you delete a post or a photo, if it was already crawled into a training dataset, it persists in the model's weights in ways that can't be undone.
The honest bottom line: it is basically impossible for people using online products or services to escape systematic digital surveillance across most facets of life — and AI may make matters even worse. (Stanford HAI) The data collection problem predates AI, but AI massively amplifies it by making that data operationally useful in ways it never was before.


I asked it about its seemingly contradictory statements regarding which AIs offer an opt-out, and it replied that they came from two different sources, probably referring to different things. You can't opt out if your data is already in the training set. (A good example of why you should be careful about what it tells you!)

I just thought it was interesting how I got such a different answer. But Claude doesn't seem to sugarcoat the problem, whether answering Bernie or me.

