If you’ve used ChatGPT, Claude or DeepSeek lately, you’ve probably noticed something: everyone refers to them as “AI”. Tech companies do it. Marketing teams do it. Even the news does it. But here’s the thing – they’re not really artificial intelligence in the way most people imagine.
What We’re Actually Talking About
Those interfaces you’re using? They’re Large Language Models (LLMs) – or, more precisely, applications built on top of LLMs. The distinction matters more than you’d think.
According to the International Organization for Standardization, which establishes the official terminology for AI, true artificial intelligence refers to computer systems designed to perform tasks that would require human intelligence – things like reasoning, learning, and perception. But here’s what the ISO definition doesn’t guarantee: consciousness, genuine comprehension, or awareness of context.
LLMs are sophisticated pattern-recognition machines trained on massive datasets. They predict the next word in a sequence with impressive accuracy, but they don’t “understand” what they’re saying any more than your autocorrect understands your texts. As Yann LeCun, Meta’s Chief AI Scientist and Turing Award winner, put it: “That’s not understanding – that’s imitation”.
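To make that “next word in a sequence” point concrete, here’s a deliberately crude sketch in Python – a bigram counter, nothing remotely like a real transformer – that “writes” by always picking the statistically most common next word. The corpus and function names are mine, invented purely for illustration:

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then predict by picking the most frequent follower. No grammar, no meaning –
# just statistics over observed sequences.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def predict_next(word: str) -> str:
    # Return the most common continuation seen in the "training data".
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat": a plausible continuation, zero understanding
```

Scale that basic idea up by hundreds of billions of parameters and trillions of training tokens and you get something that produces fluent paragraphs – still without a shred of comprehension anywhere in the loop.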
So when we call an LLM “AI”, we’re making the same mistake as calling a car a driver. It’s like calling your mee-krow-wah-vay a chef, autocorrect a writer, or your calculator a mathematician.
Sure, the car is sophisticated machinery. But sophisticated doesn’t mean sentient. It’s not the thing making decisions behind the wheel. At least, not yet.
The Marketing Scheme
Given the hype around LLMs being marketed as “AI”, you’ve probably tested whether these tools could handle complex tasks – the kind that requires your professional training and expertise to complete. And if you did, you likely discovered what the rest of us found: they’re not as impressive or precise as the marketing teams made them out to be.
The more you experimented, the clearer it became: “AI-powered” often functions as a marketing strategy to sell tools that demand significant investment without necessarily delivering returns. Even when these tools produce something usable, it’s only after close human oversight, constant evaluation, and multiple rounds of corrections that you get anywhere near the results you expected in the first place.
And it’s not just anecdotal frustration. A December 2025 study of LLM agents in production environments confirmed what many professionals already suspected. The research found that 68% of production agents require human intervention within just 10 steps of attempting a task, and 74% depend primarily on human evaluation rather than autonomous decision-making to function properly. These aren’t experimental prototypes – these are deployed, real-world systems that companies are actively using.
In other words, even when these tools are supposedly “working”, they’re working with a human babysitter standing right behind them. This doesn’t sound like “complete automation” to me.
Here’s the thing that makes this particularly irritating: people with zero professional experience in a field will rave about the “amazing performance” of generative output in domains they’ve never actually worked in. We’re looking at you, self-proclaimed “AI artists” heralding “the next era of cinematography” while never having touched a professional camera, studied the theory of storytelling, or learned composition. It’s easy to be impressed by something when you don’t know what good actually looks like. After all, even yearly iPhone upgrades haven’t turned you into professional photographers.
Stop Comparing LLMs to Pencils
In everyday conversations and, more importantly, in serious debates about LLMs, people inevitably reach for tool comparisons. It makes sense: LLMs are tools, after all. But here’s where the conversation goes wrong: the comparisons people default to are almost always pencils, hammers, or calculators. And that’s not just oversimplified – it’s intellectually lazy and misleading.
LLMs didn’t appear overnight. They’re a product of decades of research in machine learning, natural language processing, and computational linguistics. Building a well-functioning LLM requires massive datasets, specialised hardware consuming enormous amounts of energy, and code built on complex mathematical foundations from statistics, calculus, and linear algebra. And that’s before you get to the continuous refinement and maintenance once it’s deployed.
So here’s the actual standard we should be using: if we’re going to compare LLMs to anything, compare them to tools of similar complexity. Tools that took decades to develop. Tools that require constant maintenance and upgrades. Tools that changed entire industries but also came with serious risks.
And that brings us to the car comparison.
Why Cars Are the Right Comparison
Whilst I’m no car expert and am relying on what I remember from high school, let’s break down what makes a car a complex tool and how LLMs compare on each of those aspects:
- It’s a multi-component system. A car is made of thousands of interconnected parts: engine, transmission, electrical system, safety features, just off the top of my head. LLMs similarly comprise multiple complex components: neural networks, data collection and preparation, attention mechanisms, transformers, optimization routines, training pipelines, and deployment infrastructure (one of these components is sketched in code just after this list).
- A car requires continuous maintenance. It needs regular servicing and repairs. LLMs, in turn, require ongoing updates, fine-tuning, and infrastructure maintenance across data centers filled with GPUs. Those GPUs, churning through millions of high-intensity computations, account for about 40% of a data center’s total energy use.1
- Cars have a severe environmental impact. They generate carbon emissions. So do LLMs, and massively, at that. Training GPT-4 alone consumed an estimated 28.8 GWh of electricity over 100 days, amounting to 6,912,000 kilograms (6,912 metric tons) of CO2-equivalent emissions.2 Meanwhile, GPT-3’s training phase consumed an estimated 700,000 liters, or 700 kL, of water.3
- Cars are incredibly useful, but potentially harmful. Cars get you from point A to point B efficiently, but unregulated cars caused deaths, pollution, and traffic chaos. LLMs can assist with tedious tasks when used correctly, but unregulated use creates its own set of harms (see “Some of the Documented Harms” below).
- Cars forced us to introduce global regulatory frameworks. We have international standards for driving – licenses, road rules, emission standards. You can drive a car in most countries without relearning everything because we’ve achieved remarkable global cooperation on regulation.
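Side note for the technically curious: to give a taste of just one component from that list, here’s a minimal, self-contained sketch of scaled dot-product attention – the operation at the heart of every transformer layer. This is a bare-bones illustration of the math, not the implementation any particular model uses:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, dim) arrays of token vectors.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how strongly each token attends to every other
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # each token becomes a weighted mix of values

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional embeddings
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention: Q = K = V
print(out.shape)                  # (4, 8) – one updated vector per token
```

A production model stacks dozens of these layers, with learned projection matrices, multiple attention heads, and everything in between – which is exactly the point: this is closer to an engine block than to a pencil.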
Which Brings Us to Regulation
If LLMs are as complex as cars, and in some ways we’re striving for them to become the “driver” rather than just the vehicle, we need similarly robust regulation.
Consider how we handle cars: there are age restrictions; you must pass exams proving competence; your license can get suspended or revoked; there is global cooperation to regulate the environmental impact of cars; and, finally, we have safety requirements – the mandatory features that protect both users and bystanders.
Now consider the current state of LLMs: as of today, anyone of any age can access them with virtually no training, no understanding of how they work, and no accountability for misuse.
Some of the Documented Harms
The consequences of unregulated access are already visible.
A 2025 MIT Media Lab study (not yet peer-reviewed at the time of this writing) found that students using ChatGPT showed the lowest brain engagement and consistent under-performance at neural, linguistic, and behavioral levels compared to those using search engines or their own knowledge. Dr Zishan Khan warned: “From a psychiatric standpoint, over-reliance on these LLMs can have unintended psychological and cognitive consequences, especially for young people whose brains are still developing.”
A comprehensive 2026 Brookings Institution report, covering 50 countries, found that using LLMs in education can “undermine children’s foundational development” and can lead us down a doom path of AI dependence: students come to rely heavily on the technology and offload much of their own thinking, which, if you ponder it for even a moment, most likely leads to cognitive decline.
Recently, more and more controversies have come to light, all tied to the unregulated use of LLMs. Here is an article that captures the 27 most recent scandals, ranging from authoritarian use of AI for surveillance, to digitally undressing children and women, to layoffs of thousands of people because “AI” is meant to make us super-productive (just what we need in this day and age – more productivity…).
What Regulation Should Look Like
As with cars, regulation should be multi-faceted.
- Age-appropriate access. We must implement meaningful age verification and parental controls that actually work, not just terms of service that kids click through.
- Mandatory education. Just as we study and practice in a controlled environment before operating machinery as complex as a car, users of every age and background should understand how these tools work, their limitations, and their risks before using LLMs. This doesn’t mean everyone needs a computer science degree, but basic “LLM literacy” should be as standard as driver’s education.
- Environmental accountability. Data centers must face the same emissions scrutiny as automotive manufacturers. Users should be informed about the environmental cost of their queries every single time.
- Transparency requirements. Companies should disclose what data was used for training, how it was obtained, how their systems work, and where they fall short.
- Usage standards. There should be clear guidelines on appropriate vs. inappropriate use cases, particularly in education, healthcare, and other high-stakes domains.
Side note: In an ideal world, LLMs wouldn’t have been rushed to market with billion-dollar marketing campaigns and zero regulatory oversight. They would have been introduced thoughtfully, with careful consideration of long-term consequences. But that’s not what happened.
Now we’re in a position where some people argue we should stop using generative AI altogether. And while I understand the impulse, that’s simply not realistic. These tools are publicly accessible, often free, and genuinely help people manage the insane productivity expectations of modern work, even if they create even more work in the process. You can’t put that genie back in the bottle.
Consider the global reality: if you’re struggling economically, living paycheck to paycheck, competing for underpaid work in your local market, supporting a family with limited resources, with zero access to “progressive” opportunities, and suddenly you have access to a free tool that could make you exponentially more productive, would you really refuse to use it? Would you prioritize abstract long-term societal concerns over immediate, tangible benefits to your survival?
This isn’t about morality. It’s about economic pressure and rational self-interest. Expecting billions of people worldwide to collectively refuse a powerful, accessible tool because of potential future harms is not only unrealistic – it ignores the very real, very immediate problems people are trying to solve right now.
Moving Forward
None of this means LLMs are inherently bad or should be banned. Cars aren’t bad either – they’ve transformed society for the better in countless ways. But we recognize that cars require careful regulation to maximize benefits and minimize harms.
The same principle should be applied to LLMs. We’re past the point where we can treat them as simple tools like pens or calculators. They’re powerful, complex systems with significant societal impact, for better and worse.
The question isn’t whether to regulate, but how to do it effectively before the harms become even more entrenched. It took decades of catastrophic road crashes before seat belts were mandated – we shouldn’t repeat that mistake by waiting for an entire generation of cognitively impaired young people, or for non-consensual sexualized deepfakes to become so normalized that we can’t put the technology back in the box. The damage is already happening.
LLMs are as complex as cars. They can be as beneficial as cars. And, left unregulated, they can cause as much harm as cars, on a much bigger scale. Just as we achieved global cooperation on traffic laws, emission standards, and safety requirements, we need that same level of international coordination now. Before the costs become irreversible. This isn’t something governments can fix alone. It requires public pressure, sustained advocacy, and a collective refusal to accept “move fast and break things” as an acceptable approach to technology that affects billions of lives.
- If we want to go further down the energy-consumption rabbit hole: according to this McKinsey report (based on October 2024 data), data centers in the USA currently require about 60 gigawatts (GW) of power. Here is a fun article that shows what 1 GW amounts to in various representations. But if you want it in more relatable terms, this article tells you precisely how many US households just 1 GW can power throughout the year. Meaning 60 GW running year-round comes to 60 GW × 8,760 h = 525,600 GWh – enough to power 52,560,000 households for a year, assuming each uses about 10,000 kWh per year (which is what the average tells us). 52 million, Karl. ↩︎
- An average US household emits about 48 tons of CO2 per year. So it took roughly 144 households’ worth of annual carbon dioxide emissions just to train GPT-4 for 100 days. ↩︎
- In comparison, an average US household consumes about 9,400 gallons of water annually – roughly 35.6 kL per household per year – meaning it took about 20 households’ worth of annual water consumption to train GPT-3. ↩︎
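And if you’d rather check the footnote arithmetic than take my word for it, here it is as a few lines of Python. The input figures come from the sources cited above; only the unit conversions (hours per year, liters per US gallon) are mine:

```python
# Sanity-checking the footnote arithmetic. Input figures are from the cited
# sources; only the standard unit conversions are mine.

HOURS_PER_YEAR = 24 * 365                     # 8,760 hours

# Footnote 1: 60 GW of continuous demand vs households at ~10,000 kWh/year
energy_gwh = 60 * HOURS_PER_YEAR              # 525,600 GWh
households = energy_gwh * 1_000_000 / 10_000  # GWh -> kWh -> households
print(f"{households:,.0f}")                   # 52,560,000

# Footnote 2: 6,912 t CO2e for GPT-4 vs ~48 t CO2 per household per year
print(6_912 / 48)                             # 144.0

# Footnote 3: 700 kL of water for GPT-3 vs 9,400 US gallons per household per year
household_water_kl = 9_400 * 3.785 / 1_000    # ~35.6 kL
print(round(700 / household_water_kl))        # ~20
```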