Zuckerberg’s AI announcement raises privacy and toxicity red flags

Meta CEO Mark Zuckerberg’s AI announcement has raised major concerns, after he said that the company had more user data than was used to train ChatGPT – and would soon be using it to train its own AI systems.

The company’s plan to use Facebook and Instagram posts and comments to train a competing chatbot raises concerns about both privacy and toxicity …

Zuckerberg announced the plan after Meta released its latest earnings report, according to Bloomberg.

For many people, Facebook is the internet, and the number of its users is still growing, according to Meta Platforms Inc.’s latest financial results. But Mark Zuckerberg isn’t just celebrating that continuing growth. He wants to take advantage of it by using data from Facebook and Instagram to create powerful, general-purpose artificial intelligence […]

[Zuckerberg said] “The next key part of our playbook is learning from unique data and feedback loops in our products… On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”

Common Crawl refers to a huge archive of 250 billion webpages, representing the bulk of the text used to train ChatGPT. By calling on an even larger dataset, Meta could be in a position to build a smarter chatbot.

As Bloomberg notes, it’s not just the sheer volume of data that might give Meta an advantage – it’s the fact that so much of it is interactive.

The pile of data he’s sitting on is especially valuable because so much of it comes from comment threads. Any text that represents human dialogue is critical for training so-called conversational agents, which is why OpenAI heavily mined the internet forum Reddit Inc. to build its own popular chatbot.

But the piece also points to the two big red flags here. First, Meta would effectively be training its AI on what may be quite personal posts, and conversations between friends in Facebook comments. That raises major privacy alarms.

Second, anyone who has ever read the comments section anywhere on the Internet knows that the percentage of toxic content is high. While thoughtful users debate the issues, there’s no shortage of commenters resorting to personal attacks and crude insults – and a worrying proportion of that is racist and sexist.

Any chatbot training system has to filter out that kind of content, and Apple is likely being more cautious than anyone else in its own chatbot development work, which is contributing to a very late Siri relaunch. But the situation at Meta may be particularly bad.

Some of the content on Facebook that gets flagged as toxic doesn’t get reviewed by a human anymore and is left on the site. Worse: When Zuckerberg said that Meta’s data was bigger than that of Common Crawl, he was likely lumping in the company’s historic archive that would include all the hyperbolic political content and fake news that were on the site before Zuckerberg took pains to clean it up.

And this is the company that, just a few days ago, said that a fake video of President Biden should be allowed to remain on the platform because it was edited by a human and not by an AI system, so its standards aren’t exactly high even today.

Author

Ben Lovejoy

Ben Lovejoy is a British technology writer and EU Editor for 9to5Mac. He’s known for his op-eds and diary pieces, exploring his experience of Apple products over time, for a more rounded review. He also writes fiction, with two technothriller novels, a couple of SF shorts and a rom-com!

