Lessons for Agentic Coding: What should we do when code is cheap? (dbreunig.com)

269 points by ingve 4 days ago | 238 comments

spicyusername 4 days ago [-]

A lot of people down on AI in this thread, but I'm watching the industry slip over the line of trust with these latest frontier models. GPT 5.5 is the first model good enough for me to just let rip.

Every jira ticket I see now has acceptance criteria, reproduction steps, and detailed information about why the ticket exists.

Every commit message now matches the repo style, and has detailed information about what's contained in the commit.

Every MR now has detailed information about what's being merged.

Every code base in the teams around me now has 70 to 90%+ code coverage.

Every line of code now comes with best practices baked in, helpful comments, and optimized hot paths.

I regularly ship four features at a time now across multiple projects.

The MCP has now automated away all of the drudgery of programming, from summarizing emails, to generating confluence documentation, to generating slide decks.

People keep screaming that tech debt is going to pile up, but I think it's going to be exactly the opposite. Software is going to pile up because developing it is now cheap.

Most code before LLMs sucked. Most projects I on-boarded to were a massive ball of undocumented spaghetti, written by humans. The floor has been raised significantly as to what bad code can even look like, and fixing issues is now basically free if your company is willing to shell out for tokens.

HarHarVeryFunny 4 days ago [-]

> Software is going to pile up because developing it is now cheap.

Software to do what, though ?!

Coding, maybe 10% of a developer's job (Brooks "Silver Bullet" estimates 1/6), was never the bottleneck, and even if you automated that away entirely then you've only reduced development time by 10% (assuming you are not doing human code review etc).

I would also argue that software development as a whole (not just the coding part) was also typically never the bottleneck to companies shipping product faster, maybe also not for automating their business faster (internal IT systems), since the rest of the company is not moving that fast, business needs are not changing that fast, and external factors that might drive change are not moving that fast either.

I think that when the dust settles we'll find that LLM-assisted coding has had far less impact than those trying to sell it to us are forecasting. There will be exceptions of course, especially in terms of what a lone developer can do, or how fast a software startup can get going, but in terms of impact to larger established companies I expect not so much.
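The arithmetic behind the 10% and 1/6 figures above is essentially Amdahl's law. As a rough sketch (the function names are made up for illustration, and the fractions are the estimates quoted in the comment, not measurements):

```python
import math


def remaining_time(coding_fraction: float, coding_speedup: float) -> float:
    """Fraction of the original schedule left after speeding up only the coding part."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Non-coding work is untouched; coding work shrinks by the speedup factor.
    return (1.0 - coding_fraction) + coding_fraction / coding_speedup


def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only one part of the work gets faster."""
    return 1.0 / remaining_time(coding_fraction, coding_speedup)


# Coding is 10% of the job and becomes instantaneous: about 1.11x overall.
print(f"{overall_speedup(0.10, math.inf):.2f}")
# Brooks' 1/6 estimate, coding instantaneous: 1.20x overall.
print(f"{overall_speedup(1 / 6, math.inf):.2f}")
# Coding is 10% of the job and gets 10x faster: about 1.10x overall.
print(f"{overall_speedup(0.10, 10):.2f}")
```

Even with an infinite speedup on the coding part, the overall gain is capped at 1 / (1 - coding_fraction), which is the point being made about the bottleneck.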

rafterydj 4 days ago [-]

+1 for any mention of Fred Brooks. I like your point about software as a whole not being a bottleneck. In the 1970s the hardware was co-evolving with business uses (it still is, but constraints were much more severe) leading to large headcounts on software projects that _absolutely_ had to work and _absolutely_ required uncommon expertise. Most people had no concept of a computer's capabilities, computer science was not as widely distributed.

One thing that I would point to today to show that the landscape is different - the average programmer/engineer/developer today has no actual admin staff. Fred Brooks' example team setup of "The Surgical Team" has more support staff than programmers. Anyone who responds to the questions like "who manages the calendar" and "who manages the documentation" will state that the engineers doing it themselves offer the best results. Same goes for designing test cases, performing rollbacks, etc.

The fact of the matter is that any self respecting engineer today works in an environment where pro-activity and self-sufficiency are prerequisites. Managing your calendar and workload, communicating to leadership and users, these are all common tasks that would have been another person a generation ago.

So when discussing writing code more efficiently and aiding in software development, what I am essentially seeing is more people trying everything they can to offload work that used to be another person's job anyway. If you care about communication - you offload coding standards. If you care about security - you offload feature refactors, and so on.

In my opinion, I think that at some point we'll either realize that we need highly competent people _and also_ regular people to help us ensure the work gets done to a good standard. Or, we will each eventually survive by working alone in a room with a suite of AI tools, and wonder why we're still making software in the first place.

worik 4 days ago [-]

> Most people had no concept of a computer's capabilities, computer science was not as widely distributed.

I am not sure that has changed....

philwelch 4 days ago [-]

As I recall, “No Silver Bullet” fundamentally rested on the assumption that the subroutine was the last word in abstractions to make programming more efficient, which probably wasn’t even defensible at the time because Lisp had already been invented, and is even less defensible after the past several decades of programming language research. Brooks was still onto something when it came to irreducible complexity, but offloading complexity an LLM can tackle to the LLM still saves time.

One of the lesser discussed Brooks essays is actually the best description of AI-first development: the “surgical team”. It just turns out the surgeon is the only human, and like many modern surgeries, the surgeon is controlling a robot instead of operating by hand.

It would be interesting to reread The Mythical Man-Month and see how each essay applies to AI-first development.

skydhash 4 days ago [-]

The sibling comment to yours (by rafterydj) makes a good point that the surgical team is necessary, but we have eliminated the positions and put the roles on the same person. It’s like the writer being the subject expert, the reviewer, the editor,… which we all know leads to mediocre work.

g42gregory 4 days ago [-]

> Software to do what, though ?!

Replace all Oracle Applications in the Enterprise, for example. That will keep Corporate IT/Dev teams busy for quite a while.

Of course, this does not involve Oracle infrastructure, such as Database.

pojzon 4 days ago [-]

We can all agree that a very big portion of the time needed during product engineering is.. syncing progress, requirements, plans, etc etc. And we have to do it over and over due to how big teams are.

Fast forward, fire half of those ppl, for sure fire all middle managers, scrum masters, coaches, wooden-architects.

Suddenly you save up so much time on syncing, you can ship twice as fast.

And NO, quality and impact don't go down. They actually go up.

This is probably something you did not want to hear :)

A few competent ppl with AI are much much much better than dozens of mediocre teams.

We need now „Product Builders” and „Product maintainers”. All of the other roles lost value.

tracker1 4 days ago [-]

That's kind of the point in GP... everything around the code has improved... the workflows, definitions, documentation, process. I'd say that all of those things are improving and expanding at a rate faster than the improvements in code output, which are also happening at a faster turn around than actual people.

I've said several times that when I use an Agent, I'm getting about 2-4x the value and about 10x the output... the "value" is features landing in code and the difference to the 10x is documentation and testing. While a lot of that may not get reviewed by every person that touches a product, it helps with further ai based feature development.

I'm not a big fan of running many agents or outright vibe coding slop... but you can definitely leverage the coding agents and get a lot of improved output.

HarHarVeryFunny 4 days ago [-]

I'm not talking about what the developer is doing - I'm talking about what the company is doing in terms of initiating new development work. Again, startups and one-man shops are different because you control your own pace, but in many large corporations you may sit around just minding shop until the next big product development comes along (I would use this time to start my own initiatives to build tools and libraries to help the team), and that company pace is not being determined by how long development takes.

This is especially true if like most developers you are not working at a company where software is the product, but rather where software is part of the product, or where you are part of IT working on internal systems, not part of product development at all.

pryelluw 4 days ago [-]

Then is it a real 10x increase in output if the output is in supporting areas and not the code/feature delivery? It seems like you’re saying that you are now able to maintain documentation at a faster rate and increase testing but not the actual development speed of the feature itself.

tracker1 4 days ago [-]

I said 2-4x on value.. which would be development of the feature itself in terms of direct output... not even considering setting up test harnesses and doing more adventurous changes that would take me a lot longer to do.

pryelluw 4 days ago [-]

I was referring to the sentence: I'm getting about 2-4x the value and about 10x the output... the "value" is features landing in code and the difference to the 10x is documentation and testing.

What does the 10x imply?

And are you saying that you are outputting 2-4x as in: value * value * value * value in the case of 4x? That seems rather high.

tracker1 3 days ago [-]

10x is 10x the output I could have done by myself. 2-4x is between 2 and 4 times the output I could have done myself. I distinguish "value" as that which is strictly a concern for the users/stakeholders, who themselves don't necessarily consider robust testing harnesses or internal-only documentation a value.

I said 4x as a cap for value, I don't know how you interpret that as x^4 ...

4x is 4 * x, x^4 would be your x*x*x*x ...
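To make the distinction concrete, here is a tiny sketch; the function names and the baseline of 3 features are made up for illustration:

```python
def times(baseline: float, multiplier: float) -> float:
    """'4x' in the usual sense: a linear multiple of the baseline."""
    return multiplier * baseline


def to_the_power(baseline: float, exponent: int) -> float:
    """x^4: the baseline multiplied by itself, which is a different claim entirely."""
    return baseline ** exponent


baseline_features = 3  # hypothetical: features one person ships unaided
print(times(baseline_features, 4))         # 12
print(to_the_power(baseline_features, 4))  # 81
```

The "2-4x value" claim is the first function; the second one also gives nonsense at a baseline of 1, where it would imply no gain at all.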

nsxwolf 4 days ago [-]

Everybody’s cooking, nobody’s eating.

whattheheckheck 4 days ago [-]

Pivot to be a user and demand requirements

ikrenji 4 days ago [-]

you can't continue shipping code the same way pre-LLM vs post-LLM and expecting a huge speed gain. the trick is abandoning the old models and bottlenecks and embracing the new possibilities enabled by LLMs. requires a high trust environment

HarHarVeryFunny 4 days ago [-]

So, get rid of the marketing department, get rid of business owners, agile, budgets, and everything that is stopping developers from creating their self-conceived vibe-coded creations as fast as they can ?

butlike 4 days ago [-]

A hard dependency like that doesn't seem flexible. There's no such thing as a high-trust environment, abstractly.

ikrenji 4 days ago [-]

there are workplaces that trust their employees to do the right thing to a greater or lesser degree. places that empower the employees the most will see the highest gains from LLMs

LtWorf 4 days ago [-]

After having a job, I do not trust my coworkers. I might trust a few selected ones, but I certainly do not trust all of them.

Synthetic7346 2 days ago [-]

Just fire them and replace them with agents /s

RealityVoid 4 days ago [-]

High... Ugh, trust... In the... LLM? Hah!

ikrenji 4 days ago [-]

in the employees...

banannaise 4 days ago [-]

The ticket has subtle errors in its description that are only caught by someone experienced with the codebase.

The code hides an exception behind an if-then-else that defaults to the most common state, which isn't caught until it breaks things for the 1% of users who don't have that state.

The new feature quietly breaks a feature not covered by the acceptance tests.

The documentation is four times as long and nobody who relies on it can read it.

And I'm stuck spending my time going over tickets with a fine-toothed comb, reviewing PRs, and mentoring contributors to prevent all of this garbage from ending up in the live code.
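For readers who haven't run into the second failure mode above (an if-then-else that defaults to the most common state), a minimal sketch of what it tends to look like follows. Every name here (`plan`, `KNOWN_PLANS`, the two functions) is hypothetical and not taken from any real codebase:

```python
KNOWN_PLANS = {"free", "enterprise"}  # hypothetical set of valid states


def plan_for_silent(user: dict) -> str:
    """Anti-pattern: anything that isn't 'enterprise' silently becomes 'free'."""
    if user.get("plan") == "enterprise":
        return "enterprise"
    else:
        # Missing, misspelled, or brand-new plan values all land here unnoticed.
        return "free"


def plan_for_strict(user: dict) -> str:
    """Alternative: fail loudly on states the code was never written to handle."""
    plan = user.get("plan")
    if plan not in KNOWN_PLANS:
        raise ValueError(f"unknown or missing plan: {plan!r}")
    return plan


legacy_user = {"name": "sam"}  # hypothetical record from the 1% without a plan field
print(plan_for_silent(legacy_user))  # free
try:
    plan_for_strict(legacy_user)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Both versions pass a test suite that only covers the common states, which is why this tends to surface in production rather than in review.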

elfly 4 days ago [-]

I will give you 4.

1, 2 and 3 happened a ton in the good old times before AI. If anything, we can make the code be more tested than before, but that requires a lot more engineering, that is made easier by LLMs.

It's just we haven't adapted to do them.

chorsestudios 4 days ago [-]

People noted similar issues ever since LLMs came out, but the rate at which they have been rapidly improving on all of these is significant. Documentation being 4x too long could probably be fixed with a rule instructing the agent to keep it concise and no longer than 2-3 paragraphs.

banannaise 4 days ago [-]

They seem to be converging toward an asymptotic accuracy level that is not particularly close to 100%. That is not good enough when you're trying to instruct engineers, particularly junior ones.

skydhash 4 days ago [-]

Conciseness is variable. Sometimes a paragraph is enough, some other time you need multiple chapters and a glossary to get the point across. 4x too long may be 4 paragraphs in the first case and a 400-page book in the second case.

Adding a rule like yours is not the solution.

glial 4 days ago [-]

Definitely, but the first 3 issues are also created by human co-workers.

oliveralbertini 4 days ago [-]

Do you use Microsoft Copilot ?

neya 4 days ago [-]

What you are describing is the role of a manager, not a software engineer. Software engineering has very little to do with writing code, but more on architecting at the higher level on what needs to be done. The code is just the executional part. LLMs can code? Ok good. Without a clear architectural pathway / direction, that code is just useless. It's not tech debt. It's just a bunch of random strings. You can argue that Claude Code and others do create a plan of attack - but still, it's not at the architectural level, but rather executional level.

To me, architecture starts all the way from the top - even before you write a single line of code, you do the DDD (Domain-Driven Design) and then create a set of rulesets (e.g. use the domain name as table prefix) and contexts and then define the functionality w.r.t. that architecture. LLMs can do all this - only if you ask them to explicitly. So, they are pretty useful to brainstorm with, but not to autonomously design reliably, push to production with your eyes closed, and support a 100,000-user base. It's a far cry from that.

But sure, you can upsell to management about the vanity metrics like lines of code and get that promotion with LLM. But, it's still not software engineering.

threepts 4 days ago [-]

That is why we have SWE bench pro, they test architecture design too, turns out 1000 dollars of tokens outperform 10k dollars of labor in meta design.

SpicyLemonZest 4 days ago [-]

That's just not accurate. I haven't studied SWE Bench Pro in detail, so I can't tell you exactly what the flaw is, but SOTA models routinely make bad architectural choices I have to intervene to fix.

threepts 4 days ago [-]

You can read the paper here: https://labs.scale.com/papers/swe_bench_pro

TL;DR it's very effective as it directly tests models on REAL codebases: "The benchmark is constructed from GPL-style copyleft repositories and private proprietary codebases". The use case is very real.

SpicyLemonZest 4 days ago [-]

It doesn't sound to me like this benchmark is attempting to measure architecture design. As far as I see in the paper, they do not evaluate the architectural quality of a task completion, only whether the model is capable of completing it at all.

dawnerd 4 days ago [-]

1000 dollars of subsidized tokens.

margalabargala 4 days ago [-]

Eh.

It's "not software engineering" but neither was what most people writing code did before LLMs.

> Without a clear architectural pathway / direction, that code is just useless. It's not tech debt. It's just a bunch of random strings

This is pretty clearly false. It's a bunch of random strings that you can compile and run to do what you want. It's more akin to a black box. A compiled closed source dependency.

xXSLAYERXx 4 days ago [-]

Agreed. I never considered myself an "engineer". Honestly just a regular code monkey. Software Engineer was just my job title. Folks higher up the ladder did engineer software. You know what? It sucked. Was always broken, we were always patching, we never saw around corners. But hey - they software engineered it.

alrtkh 4 days ago [-]

For people who like to tick boxes, which is essentially most of the above, AI is welcome. That includes managers.

It still has nothing to do with software engineering. All good code was written by humans. AI takes it, plagiarizes it, launders it and repackages it in a bloated form.

Whenever I look deeply at an AI plagiarized mess, it looks like it is 90% there but in reality it is only 50%. Fixing the mess takes longer than writing it oneself.

peab 4 days ago [-]

How can you say it has "nothing to do with software engineering" with a straight face?

I think you might be in serious denial.

Of course writing code isn't the only task of a software engineer, but it's an important one.

There wouldn't be so much controversy if it wasn't the case

zozbot234 4 days ago [-]

"Writing code" as a task of its own is called cowboy coding. It's neat that AI can do this now, but that has nothing to do with proper software engineering which always starts from a careful, human-led design.

philwelch 4 days ago [-]

Yes and every AI-first development workflow worth its salt does exactly this, and it does it much more thoroughly than I’ve ever seen a team of meatbags do it.

My workflow, at a high level, is:

1. I write a high level spec. Not as high level as a single-sentence prompt, but high level enough to capture my top requirements.

2. I prompt the AI to interview me about the spec to clear up any ambiguity or open questions, then when I’m satisfied, the AI writes a longer spec, which I then review.

3. Then I prompt the AI to write an implementation plan based on the spec. I might just skim this, and by this point I might be asking the LLM more questions than it’s asking me.

4. Now I hand it off to the implementer agent.

This isn’t cowboy coding, it’s not even agile. It’s waterfall. The problem with doing waterfall was that it’s too slow, especially with the deserialization/serialization cost of routing all of this documentation through meatbrains. The LLM is doing just as much work, true, but faster.

The thing I found surprising was that, while LLM’s are still pretty awful at writing as an art form, they are better technical writers than I have the time to be, especially when writing for an audience of other LLM’s.

skydhash 4 days ago [-]

Is this project in production and for how long? How many users?

peab 4 days ago [-]

"has nothing to do with proper software engineering"

So you're saying software engineers don't write code? Just because there are other things that SWEs do, does not mean it has nothing to do with it.

It's arguably a pretty important part. Would you really hire a software engineer who can't code?

shimman 4 days ago [-]

Writing code and copying the output of an LLM is absolutely not the same.

You wouldn't call someone an author that takes LLM outputs and shoves it in a book. IDK why this distinction doesn't apply to devs too.

peab 4 days ago [-]

You call someone an author when they use a ghostwriter. They're giving inputs that are core to the output, even though they aren't doing all the writing. Same thing.

shimman 4 days ago [-]

I can assure you a sizable amount of people in the writing community look down on "authors" that only use ghostwriters.

Why do tech workers act shocked that people hate this junk being force fed to them, to the point that they are now resorting to violence to reject said junk?

You think telling humans with specialized crafts that they don't matter is good politics? Good grief.

peab 4 days ago [-]

Of course.

I'm not surprised at all that devs are upset.

>You think telling humans with specialized crafts that they don't matter is good politics? Good grief.

Yeah, of course not. There are lots of historical examples of this. That being said, those historical examples don't play out well for the craftsmen, either.

Look, I'm a SWE myself. I see my job drastically changing right in front of my eyes. I know there's nuance to it, too, that's hard to articulate in these comment threads.

But I think a lot of people here are biased toward thinking that they are irreplaceable - I've definitely been in that camp. I don't think that it's wise, however.

QuercusMax 4 days ago [-]

Or even more appropriate: a movie director is almost never on-screen but the actors aren't the ones determining the shots to use or writing the script.

fernandotakai 4 days ago [-]

>You call someone an author when they use a ghostwriter.

i don't know about you, but i absolutely don't. either you write the book yourself or you are not the author.

as kendrick lamar wrote:

I can dig rappin', but a rapper with a ghostwriter?

What the fuck happened? (Oh no)

jf22 4 days ago [-]

What's a good example of human-led design?

zozbot234 4 days ago [-]

The hard part of software engineering is turning a vague problem description into a set of box-ticking exercises. If ticking boxes became genuinely easier, the software engineering part is now a lot more valuable.

philwelch 4 days ago [-]

You’re reminding me a lot of those old assembly hackers who thought compilers were bullshit because they could hand-write better assembly. And I don’t mean that as an insult; those guys were probably right about their assembly code, just like an Amish craftsman will make better furniture than a factory in China. The problem is that the world needs more furniture and more software than skilled craftsmen can produce, and the skill gap between the craftsman and the mass production process is diminishing fast.

We’re still going to have handwritten software, just like we still have handwritten assembly. It just won’t be the norm.

readitalready 4 days ago [-]

No, fixing the mess definitely does not take longer than writing it oneself.

Your linter should identify all issues - including architectural and stylistic choices - and the AI agents will immediately repair them.

It's about 1000x faster than a human coder at repairing its own mess.

applfanboysbgon 4 days ago [-]

> Your linter should identify all issues - including architectural

If a linter could deterministically identify bad architecture, you wouldn't need an LLM, your linters could just write your code for you. The vibe coding takes are just getting more and more empty-headed...

readitalready 4 days ago [-]

Your custom linters don't check architectural design?

linters statically check code and provide deterministic recommendations. LLMs are used to make judgement. I specifically write my linters for my project to make recommendations for LLMs.

This is how you save on token usage, so your LLMs aren't wasting tokens on static analysis that a linter could do for free.

That's at least how I make my linters.
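The commenter doesn't show their linters, so the following is only a guess at the general shape of a project-specific architectural check: a minimal sketch using Python's standard `ast` module. The layering rule, the layer names, and `check_layering` itself are all assumptions made up for illustration:

```python
import ast

# Hypothetical rule: code in the "domain" layer may not import these top-level packages.
FORBIDDEN_IMPORTS = {"domain": {"infrastructure", "web"}}


def check_layering(source: str, layer: str) -> list[str]:
    """Return one deterministic finding per import that violates the layering rule."""
    banned = FORBIDDEN_IMPORTS.get(layer, set())
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in banned:
                findings.append(
                    f"line {node.lineno}: layer '{layer}' must not import '{name}'"
                )
    return findings


example = "import infrastructure.db\nfrom web import views\nimport json\n"
for finding in check_layering(example, "domain"):
    print(finding)
```

A check like this only enforces rules someone has already decided on and written down. Choosing the layering in the first place is the judgement part, which is roughly where the two sides of this subthread disagree.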

hansmayer 4 days ago [-]

> If a linter could deterministically identify bad architecture, you wouldn't need an LLM,

a) that's not what a linter is built for, it's a tool with a very specific role

b) You must've never seen an LLM expose secrets in plain text or use the most convoluted scenarios you can think of.

wilkystyle 4 days ago [-]

I think you missed the point of the person you are replying to.

duskdozer 4 days ago [-]

>I regularly ship four features at a time now across multiple projects.

Well, this explains why so much software nowadays is so slow, buggy, and chaotic.

agency 4 days ago [-]

Unlike 3 years ago, when nobody complained about software being slow, buggy and chaotic

kusokurae 4 days ago [-]

Incredibly impressive how, the moment AI becomes the topic of conversation, trivial things such as speaking in relative terms become incredibly difficult for the more addled of the prompting users.

ariedro 4 days ago [-]

Hell has no true bottom.

p2detar 4 days ago [-]

> I regularly ship four features at a time now across multiple projects.

Can that happen without you? I would assume this is the next step. I don't find it either good or bad, but I'm genuinely curious where this all goes.

gom_jabbar 4 days ago [-]

> I'm genuinely curious where this all goes

Maybe toward autonomous/sovereign capital with no humans in the loop, not even at the level of (asset) ownership.

spicyusername 3 days ago [-]

It can't happen without someone, but certainly it can happen with a lot less people, which is what's going to happen to the industry. Some days I'm shook to my core about how much I did relative to how long it would have taken just a year ago.

All software engineers will become product managers as the agents take over doing the bulk of the work.

Companies will either do the same with less or more with the same.

My opinion is that any company whose business model is selling software is going to go out of business.

onion2k 4 days ago [-]

> Software is going to pile up because developing it is now cheap.

It won't, because right now we're busy exhausting the vein of good-ideas-we-wanted-to-build, and that's the source of all the good stuff you listed. When that runs out you'll see teams building any old crap because building is cheap, and learning that experimenting by putting any old crap in front of users is a fast way to burn goodwill and brand loyalty.

You still need good ideas and the taste to choose which to put out there over the bad ideas that people actively dislike.

onlyrealcuzzo 4 days ago [-]

> I regularly ship four features at a time now across multiple projects.

Many people are missing the fact that LLMs allow ICs to start operating like managers.

You can manage 4 streams now. Within a couple years, you may be able to manage 10 streams like a typical manager does today.

IME, LLMs don't speed you up that much if 1) you're already an expert at what you're doing (inherently not scalable), 2) you're only working on one thing (doesn't make sense when you can manage multiple streams), or 3) you're doing something LLMs are particularly bad at (not many remaining coding tasks, but definitely still some).

zozbot234 4 days ago [-]

A manager doesn't have to look at the code that's being shipped. An IC will still need to do that, and this will eventually take up much of their work. It can be addressed by moving up the stack to higher level and more strictly checked languages, where there's overall less stuff to review manually.

onlyrealcuzzo 4 days ago [-]

People typically think it's not a new person's fault if they come in to a team and bring down production.

That's a failure of the existing infrastructure to allow someone to do this.

LLM coding will work like this.

If you're letting LLMs go wild with no system in place to automatically know they're moving in the right direction and "shipping" things up to your standards, the failure is you, not the LLM.

girvo 4 days ago [-]

The dirty secret is all the people talking about shipping 4 features a day etc are just lying about reviewing anything. They don’t review it at all.

spicyusername 3 days ago [-]

I didn't say shipping a day. I said shipping at the same time.

The review comes at the end, though I truly believe this will go away as well. Agents will also get better at review until they're good enough that no one will want to do it anyways. Good enough is good enough.

swader999 4 days ago [-]

I review more thoroughly and faster with Claude than without.

Salgat 4 days ago [-]

Claude absolutely improves code review quality, but it still misses a lot. It's a second pair of eyes, it doesn't replace/remove the work you have to put in to fully review the code yourself.

It's like saying that you code reviewed faster just because someone else also reviewed the code, that's not how it works.

swader999 4 days ago [-]

Agree, and with CC my volume and quality of PR review has substantially increased since 4.5. Without CC for review we would have a ridiculous bottleneck in our dev/qa pipeline.

girvo 4 days ago [-]

I'm faster, sure, but more thorough, no. The same, because I was already very careful. But it's not a massive win either; 4.7 misses too much still because it would need to read too much of the context each time to understand the architectural problems I'm catching.

It's nice to not have to care about nits and other things that we don't have lints for though, so that's useful.

hansmayer 4 days ago [-]

Spot on. When will the cretins understand, it's not about how much code you can generate.

jnwatson 4 days ago [-]

Just like a manager, you don't need to look at the code. You need to set up quality systems to provide evidence the code does what it is supposed to do, just like a manager.

skydhash 4 days ago [-]

I’ve never met a manager who has set up “quality” systems to ensure that the job is done correctly. Their actions are always retroactive. And not pertaining to code at all. The overarching contract is “You do a bad job, you will be fired”.

SpicyLemonZest 4 days ago [-]

Code review has a number of important purposes beyond merely verifying functionality. It's true that some managers don't recognize this, fail to allocate time for anything but feature work, and then wonder a few years later why the software is so buggy and new feature development is so hard.

AtlasBarfed 4 days ago [-]

A software engineer was always a manager.

Software engineers were always creating, maintaining and updating automated business processes. In olden days we would have computers, that is rows of people computing things. That room of people is replaced with code in von Neumann machines.

The economic tension has always been a resistance to grant programmers status and class of management. Instead management wants to treat programmers like labor.

yodsanklai 4 days ago [-]

Sometimes I wonder if people praising AI work on the same type of code as I do.

Just now, I was working on a bug report. I had Claude write the code. Perfect, CI is green, new tests, everything seems fine. Took me 5 minutes. Then looking closer, I can see that there may be a performance regression and that the code seems pretty verbose. I iterate on the prompt "of course, you're right, let me fix this". New code is even more verbose, lots of comments that shouldn't be there, the code is more intricate, it takes me some time to understand what's going on. Plus new test cases to review.

After a day of asynchronous iterations on this, I finally sit down to look at this problem. There was a one line fix that Claude couldn't find on its own.

I lost time, reviewer lost time, and if this had been shipped as is, the system would have been worse. I could go on and on because this happens daily. And the worst part is teammates submitting slop.

altruios 4 days ago [-]

> and fixing issues is now basically free if your company is willing to shell out for tokens.

Does "basically free" just mean to you that someone else is paying the cost? That's a mentality that has only made the world worse when applied to a wider range of things. Be hesitant in that line of thinking, I suggest, and consider the future.

nyxtom 4 days ago [-]

I agree with most of this, I just have sort of turned a blind eye to what the code actually probably looks like. Reviews are rapid, and I’ll admit I do feel like I’m betraying my inner programmer by just optimizing directly against the claims of token bot. But the way I see it, as long as the numbers don’t lie I’m okay with the process.

oblio 4 days ago [-]

> Software is going to pile up because developing it is now cheap.

https://somehowmanage.com/2020/10/17/code-is-a-liability-not...

j16sdiz 4 days ago [-]

Kind of like a credit card.

Every American learns how to live with debt :)

oblio 4 days ago [-]

I don't feel so good, Mr. Stark:

https://www.federalreserve.gov/releases/z1/dataviz/z1/nonfin...

reus09 3 days ago [-]

I'm seeing the exact opposite with LLMs. So much unmaintainable, brittle code is being generated, since devs are not even looking at the code and LLMs are dumb like 75% of the time.

BlueRock-Jake 4 days ago [-]

Agreed on the floor being raised. The part I'd push back on is "fixing issues is now basically free." That's true for the issues that surface in code review or a failing test. The new class of issue is good-looking code that does something unexpected at runtime, usually through chains of tool calls that each looked fine in isolation. Those don't fail your tests. They fail in prod, sometimes quietly.

shakabrah 2 days ago [-]

5.5 came out two weeks ago. Maybe wait a bit before declaring victory?

kiba 4 days ago [-]

Everyone talks about productivity as if that is the only metric that matters in the business.

> The MCP has now automated away all of the drudgery of programming, from summarizing emails, to generating confluence documentation, to generating slide decks.

I wonder about the hallucination. Reading someone's writing doesn't take all that long.

xantronix 4 days ago [-]

> the drudgery of programming

Is programming supposed to suck all the time? Am I doing it wrong? I mean yeah, sure, it sucks sometimes, but overcoming that "suck" is where I feel progress and growth. If we decide to optimise that away...What the fuck am I doing here? No offence to managers, but if everybody is a manager, is anybody?

spicyusername 3 days ago [-]

I'm referring to things like spending a whole day pruning a jira backlog and cleaning up stale git branches.

Classic drudgery that was part of the day-to-day but wasn't directly writing code.

Forgeties79 4 days ago [-]

Feels kind of like the problem of everybody wanting to be an entrepreneur in the 2010s. Just led to people basically trying to get paid to be middleman companies skimming from others that don’t really need them, or worse, selling supplements and life coaching or whatever on social media and other grifts.

RealityVoid 4 days ago [-]

Bingo! Nobody wants to build actual stuff. They all want to be intermediaries.

pryelluw 4 days ago [-]

This is better stated as: use of agents has forced teams to adopt best practices and style guides.

Which is my experience. Once you get into the actual development process, the code itself produced by the agents is not good enough. Still needs editing and rewriting.

mhitza 4 days ago [-]

> GPT 5.5 is the first model good enough for me to just let rip.

You know this is the exact same thing said during Opus 4.6, right?

That makes it hard to believe because it's the same "last week's model was so much behind you can't even comprehend" meme that's been going on throughout last year.

More info dumped into tickets and projects is great for understanding, for both people and LLMs. But hopefully it's not LLM-generated.

raincole 4 days ago [-]

> You know this is the exact same thing said during Opus 4.6, right?

Yeah, and for Sonnet 3.5 or even GPT4o. Because it was true for many. Different people have different timing to reach acceptance stage.

kusokurae 4 days ago [-]

It's just cope. I'm so close to just never coming back to HN because the quality of thought has just gone through the floor. Anything whatsoever to hedge one's way to fellating a phallusless chatbot

john_strinlai 4 days ago [-]

>You know this is the exact same thing said during Opus 4.6, right?

spicyusername said this exact same thing about Opus 4.6?

or is there more than one person on HN, and perhaps they have different opinions?

mhitza 4 days ago [-]

There wasn't any personal mention in my post. It was a snarky remark about the fact that this cycle keeps continuing and every new release is a game changer, except in the benchmarks, where there is generally only a slight change of a couple percent.

anthonyrstevens 4 days ago [-]

You're missing the point that it's (conceivably, and probably) different people making the comments. Each model release has a few new converts, which is expected if the models are in fact getting better at agentic coding.

You're implying it's a hype train when in fact it's an adoption curve.

maccard 4 days ago [-]

> which is expected if the models are in fact getting better at agentic coding

Is it? Or could it also be explained by the models not getting better while people keep adopting them anyway?

If the models were getting better, we'd be seeing mobile apps with new features at 10x the previous rate, or websites with 4 times the number of features. But we're not.

happytoexplain 4 days ago [-]

I think numerically this is the exception - and it's a fantastic exception! But in practice what I've seen is things getting worse because people still just aren't very good at thinking, so the great-looking Jira ticket actually turns out to be nonsensical in some subtle way, whereas before it was just lacking in some obvious way that could immediately be called out and had an obvious solution.

I.e. it's making good output better, but it's making mediocre output (which is most output) worse by adding volume and the appearance of quality, creating a new layer of FUD, stress, tedium, and unhappiness on top of the previously more-manageable problems that come with mediocre output.

I'm still seeing this even with the newest models, because the problem is the user, not the model - the model just empowers them to be even worse, in a new and different way.

skywhopper 4 days ago [-]

Yes, the software that piles up literally is the tech debt. Every automation and tool that was vibe-coded has to be maintained as well. If software is 100x easier to write and you write 100x as much of it, then taking into account network effects, your tech debt is now 100x worse. Congrats!

acedTrex 4 days ago [-]

> The floor has been raised significantly as to what bad code can even look like

It's hard for me to disagree with this take more, wow. LLM slop code is TERRIBLE and verbose.

globnomulous 4 days ago [-]

I was an LLM naysayer for a very long time. I continue to have serious reservations about the ethics of LLM use and the likely economic effects (these tools are likely to empower the owners of capital and disempower labor). On the other hand, I had a rather striking experience the other day that convinced me that the future in which these tools write software may not be so bad:

I had an idea to improve performance in one of the slowest but also one of the most critical parts of the codebase I own, so I asked Claude to re-write it. I gave it exact instructions. It got most things right but key things wrong. I caught the bugs and then asked it for some optimizations, and it came up with a number that were quite good. As I read the code, I saw more and more opportunities for improvement. To make a long story short, code that used to require upwards of 30 seconds in a particularly heinously ugly stress test now finishes in about 8ms.

My original code was terrible. That's indisputable. Maybe the bar for improvement was low. Still, the algorithms and optimizations that I was able to devise while using Claude Opus 4.6 surprised me. I don't often feel pleased with the cleverness of my work, but in this case the work really is stellar -- or at least enough of an improvement that it feels stellar.

Could I have written it without Claude? Yes, definitely. But I was able to produce the code in a few days while having a fever of 100-102, which I definitely couldn't have done on my own.

Moreover, it was plainly apparent to me, while I worked, that I was better able to think about high-level architecture and design because I wasn't stuck on the details of actually writing the code. The code itself, line by line, isn't difficult if you have familiarity with bitwise operations, but there's enough of it, with enough branches, that it's difficult as a whole and the work of writing it would have consumed much of my attention and energy.

Claude missed a huge amount. I improved performance by more than 95% after it told me there were no other opportunities for major optimizations.

Using the tool freed me, I found, to think more clearly, more deeply, and more effectively. Does the result create tech debt? I don't think so. I've pored over it and can't find anything lacking in style, design, or architecture. It's very well documented. Claude wrote tests, as I requested, for everything, including all the bugs that Claude missed and I caught. Test coverage is probably 100%, but, much more importantly, tests exhaustively cover cases, including edge cases, that would have, again, been difficult to enumerate and write by myself.

I doubt Claude could have done all this as well if the codebase and tests weren't already as mature as they are. I really wonder about the feasibility and advisability of greenfield software development with these tools. And a junior developer absolutely couldn't have accomplished what I did. The tool would have produced far worse work in the hands of someone who doesn't know what they are doing.

So I agree with you and disagree: I'm turning a corner on these tools, but I absolutely could not just let rip and trust it to do anything correctly. Moreover, I could not be less impressed by the MCPs written by people in my company. The bare tool by itself is better, though maybe that says more about my company, and my regard for the people I work with, than the tools.

RealityVoid 4 days ago [-]

I think this is the way. Human-machine co-design worked great for me so far. Hell, even the test writing alone is great, because I can have more confidence in my code. And test writing was mostly drudgery. On the other hand, you _must_ have a good mental model of the thing in your head else this will not work. And it's much easier to believe you have it and not really have it if you don't engage with the codebase.

arnitdo 4 days ago [-]

> Could I have written it without Claude? Yes, definitely. But I was able to produce the code in a few days while having a fever of 100-102, which I definitely couldn't have done on my own.

While I admire your strength in attempting it, this just adds one more brick to the wall of precedents that "what's stopping you from just sending one prompt, it'll just take 30 seconds and you can do it in bed!"

You could sum it up into a simple equation as Features Shipped = Features/Hour * Developer Hours

Developer hours have remained constant, and F/H has gone up. I am of the opinion that the ideal is the inverse.

globnomulous 4 days ago [-]

That's an excellent point. To be fair, I allowed myself to work on this while I was sick only because it was fun. This was a bit like scratching an itch, because I'd had the germ of an idea for a while and wanted to get it out of my head. You're absolutely right that it sets a dangerous precedent: the easier it becomes to do work, the easier it is to demand work of the people doing it. The boundaries need to be firm.

On the other hand, this was also a case where Claude really did help me finish something more quickly than I could have without it. So in this case I do think it lowered the number of developer hours per feature.

Tade0 4 days ago [-]

> fixing issues is now basically free if your company is willing to shell out for tokens.

Yeah, about that: I looked into Cursor's usage stats and daily I'm going through the equivalent of a bacon sandwich in my cantina, so not much, but this is at today's prices and very light usage of Sonnet.

I was for a time using Opus 4.6 for a heavier task and even then I think the cost was well into the double digit percentages of my salary.

Opus 4.7 reportedly uses more tokens overall and while they reportedly kept rates stable, that is not a given.

Just wait until, with increasing costs, the first company figures that they'll offer this as a benefit and then maybe scrap it altogether in the name of cost cutting.

arnitdo 4 days ago [-]

Watch token budget be included as part of employee TC figures - I feel this is an eventuality due to rising costs and "true pricing" slowly creeping in.

Current ventures feel more like a pilot program (you bought a private jet, now get a couple of your pilots to actually fly it) versus having an entire fleet of jets, and having to pay salaries to all those pilots, plus account for their fuel charges.

Right now all expenses are relatively "someone else's {problem,money,infra}".

qazxcvbnmlp 4 days ago [-]

The gap between the ai haves and have-nots is starting to appear. 6 months ago a developer with copilot was about on par with one without. The AI code required a lot of review, about the same amount of time as writing the code manually.

Now.. the AI first engineer might still have to deal with hallucinated things. But.. they can also use the newfound cheapness of code to improve their workflow. Instead of just testing on localhost and manually deploying to prod, you can have a full dev, staging, prod pipeline for free. Tech debt can be one command from being refactored. The open source package that doesn’t quite do what you need it to do? Fork it and write a patch. The ai will be able to maintain the patch. Oh.. you need that bespoke feature for management? Np, done in a 1hr ai session.

Each of these things might arguably be insignificant on its own, but net over a project's lifetime they really build up.

maccard 4 days ago [-]

This is what people were telling me when opus 4.6 was released 3 months ago, that this time it’s different.
