The unintended consequence: How our online history built the AI industry, KNEWS

Opinion

24 APRIL 2023 - 14:29

by Scott Rosenberg - AXIOS

The AI boom is built on data, the data comes from the internet, and the internet came from us.

Driving the news: A Washington Post analysis of one public data set widely used for training AIs shows how broadly today's AI industry has sampled the 30-year treasury of web publishing to tutor its neural networks.

Why it matters: Ever written a blog? Built a web page? Participated in a Reddit thread? Chances are your words have contributed to the education of AI chatbots everywhere.

The big picture: While this massive verbal repurposing is triggering an important legal brawl over whether it should be treated as fair use or theft, it's also inspiring a personal reckoning for many of the millions whose postings built today's online world.

We thought we were sharing our hearts and minds, and of course, we were.

But without realizing it we were also creating a database, incomplete but rich, of human expression.

That database makes the uncannily adept sentence-completion gymnastics of ChatGPT and its competitors possible.

Because visual AI tools like Dall-E, Midjourney and Stable Diffusion got popular before verbal chatbots like ChatGPT took off, visual creators (photographers, illustrators and fine artists) were the first to grapple with this realization.

Musicians face the same kind of epiphany, as they encounter multiplying AI-conjured facsimiles of their works — like last week's (never-happened) collaboration between Drake and the Weeknd, "Heart on My Sleeve."

But far more of us have typed a few words on the internet than have ever recorded songs or drawn pictures.

The Washington Post project lets you enter any internet domain name to see whether and how much it contributed to one AI training database. (This isn't the same one OpenAI used for ChatGPT or its other projects; OpenAI has not disclosed its training-data sources.)

"The data set contained more than half a million personal blogs, representing 3.8 percent" of the total "tokens," or discrete language chunks, in the data, the Post team found. (Postings on proprietary social media platforms like Facebook, Instagram and Twitter don't show up — those companies have kept access to their data to themselves.)

Of note: These training databases are enormous but hardly representative. Some cultures, groups and subjects are oversampled; many others are unfairly neglected. And all the biases, limitations and toxic aspects of internet culture show up in the AI training data.

My thought bubble: The personal blog I wrote fairly consistently for 15 years is well represented in the Post data set — along, it seems, with most of the other writing I contributed for ten years to the web magazine I helped create.

If you have any kind of online history, the self-lookup opportunity the Post's research provides is irresistible, like Googling your own name. (There's a similar lookup tool called "Have I Been Trained?" for visuals.)

When you do find your work listed, you're probably going to ask yourself, as I did, "Is this what I wanted?" and "Why wasn't I consulted?" and "What if I'd known this was coming?"

Be smart: AI's hunger for training data casts the entire 30-year history of the popular internet in a new light.

Today's AI breakthroughs couldn't happen without the availability of the digital stockpiles and landfills of info, ideas and feelings that the internet prompted people to produce.

But we produced all that stuff for one another, not for AI.

From this vantage, the existence of these vast "corpuses" of data was a profoundly important unintended consequence of the rise of the web itself.

In 1995, when a generation fell in love with the "www" and the browser, or ten years later, when another generation celebrated the advent of blogs and the "wisdom of the crowd," this outcome was hidden from view.

By the early 2010s, the stirrings of the machine-learning revolution had begun to make some far-seeing experts uneasy. But it took a very long gaze to sense that the entire web might be about to turn into AI training fodder.

Today, this unintended consequence is front and center in our online experience — reminding us that everything we're doing right now with, and to, AI will in turn shape the future in ways we can't foresee.

For instance: If we unleash a flood of simulacra on our public networks, we risk discouraging people from continuing to share or even make their own original work.

That might leave future AI models stuck forever with the frozen output of humanity circa 2000-2020, with nothing newer to learn from.


