
Changelog Interviews – Episode #673

The era of the Small Giant

with Damien Tanner

All Episodes

Damien Tanner (founder of Pusher, now building Layercode) is back for a reunion 17 years in the making. Damien officially returns to The Changelog to discuss the seismic shift happening in software development. From the first sponsor of the podcast to frontline builder in the AI agent era, Damien shares his insights on why SaaS is dying, why code review is a bottleneck (and non-existent for some), and how small teams can now build giant things.

Featuring

Sponsors

Depot – 10x faster builds? Yes please. Build faster. Waste less time. Accelerate Docker image builds and GitHub Actions workflows. Easily integrate with your existing CI provider and dev workflows to save hours of build time.

Tiger Data – Postgres for developers, devices, and agents. The data platform trusted by hundreds of thousands, from IoT to Web3 to AI and more.

Notion – Notion is a place where any team can write, plan, organize, and rediscover the joy of play. It’s a workspace designed not just for making progress, but for getting inspired. Notion is for everyone, whether you’re a Fortune 500 company or a freelance designer, starting a new startup or a student juggling classes and clubs.

Fly.io – The home of Changelog.com. Deploy your apps close to your users: global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.

Notes & Links


Chapters

1 00:00 This week on The Changelog 01:26
2 01:26 Sponsor: Depot 02:18
3 03:44 Start the show! 01:47
4 05:31 AI Engineer (AIE) Code Summit 2025 03:58
5 09:29 What is/was Pusher? 03:34
6 13:03 How are today's days different? 02:17
7 15:20 SaaS is dead!? 13:52
8 29:12 Sponsor: Tiger Data 02:30
9 31:41 No code review? What's replacing it? 02:52
10 34:33 Opus 4.5 changed things (really Sonnet 4.5 first) 03:16
11 37:49 Is SaaS REALLY dead? Hmm... 04:38
12 42:27 Inviting non-technical folks to Terminal 05:18
13 47:44 What if everything was JIT? 03:40
14 51:24 It's Layercode time 12:31
15 1:03:55 Sponsor: Notion 02:09
16 1:06:04 Set on Cloudflare Workers (and TypeScript) 04:00
17 1:10:04 Why not Go (or...)? 03:53
18 1:13:58 Directing the interrupt 02:41
19 1:16:39 API vs local models - latency and reliability 10:09
20 1:26:47 The era of the small giant 07:09
21 1:33:56 What's next? What's over the horizon? 02:23
22 1:36:19 Bye friends! 00:38
23 1:36:57 Closing thoughts and stuff 01:15

Transcript



[00:00] Well, friends, I’m here with a long-time friend, first-time sponsor of this podcast, Damien Tanner. Damien, this has been a journey, man. Like, this is the 18th year of producing The Changelog. As you know, when Wynn Netherland and I started this show back in 2009… I corrected myself recently. I thought it was November 19th. It was actually November 9th, the very first episode, the birthday of The Changelog. November 9th, 2009. And back then you ran Pusher, Pusher.app. And that’s kind of when sponsoring a podcast was almost charity. Like, you didn’t get a ton of value because there wasn’t a huge audience, but you wanted to support the makers of the podcast. And, you know, we were learning, and obviously open source was moving fast and we were trying to keep up, and GitHub was one year old. I mean, like, this is a different world. But I want to start off by saying, you were our first sponsor of this podcast. I appreciate that, man. Welcome to the show.

So kind of you. I, you know, reflecting on Pusher, we kind of just ended up creating a lot of great community, especially around London and also around the world with Pusher. And I really love everything we did. And we started an event series. And in fact, another kind of like coming back around, Alex MacCaw, who works at Mastra, he’s coming to speak at the AI Engineer London meetup branch that I run. And he started and ran the Pusher Sessions, which became a really well known talk series in London.

Okay. Were you at the most recent AI conference? I was in SF.

What was that like? I can always kind of jump in the shark a little bit because I kind of want to talk. I want to juxtapose like Pusher then timeframe developer to like now, which is drastically different. So don’t, let’s not go too far there, but how was AI in SF recently?

It was a good experience, always a good injection of energy going to SF. I live just outside London. But, you know what, the venue was quite big and it didn’t have that together feel as much as some competitors. But it was the first time, though, I sat in a huge conference hall, and I think it was Windsurf or someone chatting, and I was like, this is really like, we’re all miners at a conference about mining automation. And we’re engineers, so we’re super excited about it, but, right, it’s kind of weird. It’s going to change all of our jobs. It’s like, I’m working right now to change everything I’m doing tomorrow, right? I mean, that’s kind of how I viewed it.

I was watching a lot of the playback. I wasn’t there personally, this time around, but I would want to make it the next time around. But, you know, just the Swyx, the content coming out of there, everybody’s speaking, I know a lot of great people are there. Obviously pushing the boundaries of what’s next for us, the frontier, so to speak. But a lot of the content, I mean, almost all the content was like, top, top notch. And I feel like I was just watching the tip of humanity, right? Like just experiencing what’s to come because in tech, you know this as being a veteran in tech, we shape, we’re shaping the future of humanity. In a lot of cases, technology drives that. Technology is a major driver of everything. And here we are at the precipice of the next, the next, next thing. And it’s just wild to see what people are doing with it, how it’s changing everything we know. Everything. I feel like it’s like a flip. It’s a complete, not even a one eighty, like a 720, you know what I mean? Like it’s three spins or four spins, it’s not just one spin around to change things. I feel like it’s a dramatic forever. Don’t even know how it’s going to change things, changing things thing.

[04:05] I mean, you know, bringing it back to the Pusher days, it’s the vibe we had then. There was this period around just before Pusher and the first half of Pusher, maybe call it Web 2.0, where there was a lot of great software being built. And a lot of community, and the craft that went into it, especially the Rails community. We were just able to build incredible web-based software. And then we’ve gone through the commercialization, the industrialization of SaaS. And what gets me really excited now is, we run this AI Engineer London branch, and incredible communities come together, and it’s got that energy again. There’s new stuff. Everyone can play a part in it. And we’re also just all completely working it out. You’ve got the folks on the main stage of the conference, and then you’ve got, we’ll chat about it later, maybe, Geoffrey Huntley posting his meme blog posts. The crazy ideas and innovation are coming from anywhere, which is brilliant.

Yeah, there was some satire too. I think there was a talk that was quite comedic. I can’t remember who the talk was from, but I was really enjoying the fun nature of what’s happening, having fun with it, not just being completely serious all the time.

For those who are uninitiated, and I kind of am to some degree, because this has been a long time, remind me and our listeners what exactly was Pusher. And I suppose the tail end of that, how are things different today than they were then?

Pusher was basically a WebSockets push API. So you could push anything to your web app in real time, things like notifications into your application. We ended up having a bunch of customers in finance or crypto or any kind of area where you need live updating pricing. In the early days, at one point, Uber was using Pusher to update the cars in real time before they built their own infrastructure. And it was funny. I remember the standout because we ran a consultancy where we were chatting about WebSockets in browsers, and we were like, oh, this is cool, how can we use this? And the problem is, we were all building Rails apps. So, okay, we need a separate thing which manages all the WebSocket connections to the client. And then we can just post an API request and say, push this message to all the clients. It was a simple idea, and we took it seriously and built it into a pretty formidable dev tool used by millions of developers. And still used a lot today. And we eventually exited the company to MessageBird, who are a kind of European Twilio competitor. Actually, at one point, we nearly sold the company to Twilio. That would have been a very different timeline.
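The architecture Damien describes (a dedicated service holding the WebSocket connections, while the Rails app just makes an API request to fan a message out) can be sketched roughly like this. This is a toy illustration, not Pusher's actual API; `ChannelHub`, `subscribe`, and `trigger` are invented names:

```typescript
// Minimal sketch of the "push API" pattern: a hub holds live
// connections grouped by channel; the web app never touches
// sockets itself, it just asks the hub to broadcast an event.

type Socket = { send: (msg: string) => void };

class ChannelHub {
  private channels = new Map<string, Set<Socket>>();

  // Called when a browser's WebSocket subscribes to a channel.
  subscribe(channel: string, socket: Socket): void {
    if (!this.channels.has(channel)) this.channels.set(channel, new Set());
    this.channels.get(channel)!.add(socket);
  }

  // Called by the app server (in Pusher's case via an HTTP API request):
  // fan the event out to every connected client on that channel.
  trigger(channel: string, event: string, data: unknown): number {
    const sockets = this.channels.get(channel) ?? new Set<Socket>();
    const payload = JSON.stringify({ event, data });
    for (const s of sockets) s.send(payload);
    return sockets.size; // number of clients reached
  }
}

// Usage: two browsers subscribed to "prices", one push reaches both.
const hub = new ChannelHub();
const inbox: string[] = [];
hub.subscribe("prices", { send: (m) => inbox.push(m) });
hub.subscribe("prices", { send: (m) => inbox.push(m) });
hub.trigger("prices", "tick", { symbol: "BTC", usd: 97000 });
```

The point of the split is that the Rails app stays stateless: all long-lived connection state lives in the hub, reachable through one plain API call.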

According to my notes, you raised $9.2 million, which is a lot of money back then. I mean, it’s a lot of money now, but like, that was tremendous. That was probably 2010, right? 2011?

The bulk of that we raised later on from Balderton. Okay. The first round was maybe half a million. Very, very small. And it started out in the agency. So we built the first version in the agency. Just for fun, I suppose.

And maybe some tears on your part. Juxtapose the timeline, right? You got an acquisition ultimately, but you mentioned Twilio as an opportunity. How would that have been different, if you can, like, branch the timeline?

[08:09] It would have been a great experience to work with the team. There are incredible people who worked at Twilio and moved through Twilio. I don’t know, I haven’t calculated it, but we didn’t sell because the offer wasn’t good enough in our minds. It was a bit of a lowball, and it was stock. In hindsight, the stock hasn’t done very well, so it turns out it was a good financial decision, but I would have loved that experience. I think Twilio became the kind of OG of DevRel, right? And dev community. And how we got to know them is we did a lot of combined events and hackathons with them. That was a fun time.

Yeah, they were like the origination. Jeff Lawson was, you know, very much quintessential in that process of a whole new way to market to developers. And I think that might have been the beginning of what we call DevRel today. Would you agree with that? I mean, it’s like, if there was a seed, that was one of many, probably, but I think one of the earliest seeds to plant of what DevRel is today.

Crazy times, man.

So how do you think about those times of Pusher and the web and building APIs and building SaaS services, et cetera, and, you know, pushing messages to Rails apps. How are today’s days different for you?

It’s exciting, because the web and software is just completely changing again. Like, I feel like we had that with Web 2.0, right? That was the birth of software on the internet, hosted software on the internet. And it’s such an embedded thing in our culture, in our business; as developers, a lot of us work on that kind of software. But most businesses run on SaaS software now. And I have to remind myself: there was a time before SaaS. And therefore, there can be a time after SaaS, and there can be a thing that comes after SaaS. It’s not a given that SaaS sticks around. I mean, like any technology, we tend to go in layers, right? We still have a bunch of copper phone lines around the place, and we use them for other things, and we’re slowly replacing them. These changes, in the aggregate, take a lot of time. But I guess the thing that can shift more quickly is the direction things are going. And really in the last few months, I think I’ve been more and more convinced by my own experiences and things I’ve seen playing with stuff, that it’s entirely possible, and probably pretty likely, that there is a post-SaaS. And I don’t know if everyone realizes it, or is doing it with that intention, but all of us playing with agents and LLMs, whether it’s to build software or to do things, we are doing that. We’re probably doing that instead of using a SaaS, or we’re using it to build a SaaS, right? It’s already playing out amongst developers.

Yeah, it’s an interesting thought experiment to think about the time before SaaS and the potential, as you may say, the potential time after SaaS. I’m curious because I hold that opinion to some degree. I think there’s, you know, what SaaS stays and what SaaS goes if it dies. And you said in the pre-call, burst the bubble a little bit here, you did say, and I quote, all SaaS is dead. Can you explain your homework? All SaaS is dead.

I think I should probably go through my journey to here, to kind of illustrate it, because…

Give us the TLDR first, though. Give us the, the clip, and then go into the journey.

Okay, okay. The TLDR is this. There are a few layers to software. There’s the building of software. And then there’s the operating of software to get something done. And I think most developers are very familiar with how the building of software is changing now. But the operating of software, the operating of work, the doing of work in all industries and all knowledge work, can change like we’ve changed software. And SaaS is made for humans, slow humans, to use. The SaaS UI is made for a puny human to go in and, you know, work at this complex thing, and it has to be in a nice UI. If it’s not a human actually doing the work that they do in the SaaS, if it’s an AI doing that work, why is there a SaaS tool? The AI doesn’t need a SaaS tool to get the work done. It might need a little UI to tell you what it’s done. But the whole idea of humans using software, I think, is going to change. It can.

Yeah. Well, and I still want to hear your journey, but I’m going to step in one second. You’ve been steeped in APIs and SaaS for a while. So I hold a version of that opinion: if the SaaS exists as a UI for humans, that’s definitely changing. I agree with that. Where I’m not sure, and I’m still questioning myself, is: what is the true solution here? There are SaaS services that can simply be an API. You know this, you built them. I don’t really need the web UI. Actually, I kind of just prefer the CLI. I kind of prefer just JSON for my agents. I kind of prefer Markdown for me, because I’m the human; I want that good prose. I want all of it local, so my agents can mine it and create sentiment analysis, and all this fun stuff you could do with DuckDB and Parquet, just super fast stuff across embeddings and vectors, pgvector, all those fun things you could do with your own data. But that’s where I stop: I do agree that the web UI will go, or some version of it will. Maybe there’s just a dashboard for those who don’t want to play in the dev world with CLIs and APIs and MCP and whatnot. But I feel like SaaS shifts. My take is CLI is the new app. SaaS will shift, but I think it will shift into a CLI for a human to instruct an agent, and an agent to do the work. And it’s largely based on APIs, JSON, clear defined endpoints, great specifications, things that get more and more mature as a result of that.

Yeah, I guess we should probably tease apart SaaS the business and SaaS the software. Okay. Because yeah, I agree that the interface is changing. The interface that we use, whether it’s visual, a CLI, or a chat conversation or something. The way we communicate with the software is changing, right? It’s a much more natural language thing. We don’t have to dig in the UI to find the thing to click. But also, so much of the software we use that we call SaaS, we access remotely. If you can just magic that SaaS locally or within your company, there’s no need to access that SaaS anymore, right? You just have that functionality. You just ask for that functionality and it’s being built. But yeah, SaaS the business, I guess this is the challenge for companies today: if they want to stay in business, they’re going to have to shift somehow. Because, I mean, there’s still got to be some harness (harness is the wrong word because we use that in coding agents), but some infrastructure, some cloud, some coordination, authentication, data storage. There’s still a lot to do. And I think there are going to be some great opportunities for companies to do that. And maybe a CRM, a Salesforce or something, manages to say, hey, we are the place to run your sales agents, right? People like Salesforce are trying to do that: your magically instantiated CRM code that you want just for your business. Maybe there’ll be some winners there. But the thing that’s going to change SaaS the business and SaaS the software is the idea that everyone has to go and buy the same version of some software, which they access remotely and can’t really change.

Okay, I’m feeling that for sure. Take us back into the journey then because I feel like I cut you off and I don’t want to disappoint you, but not let you go and give the context, the key word for most people these days, the context for that blanket statement that SaaS is dead or dying.

Yeah, okay, I’ll give you a bit of the story. So my company Layercode, I’ll just give you a little short on that: we provide a voice agents platform. So anyone can add voice to their agent. It’s a developer tool, a developer API platform for that. And we’re now ramping up our sales and marketing. And we kind of started doing it the normal ways. We got a CRM. We got some marketing tools. And I was just finding, we went through a CRM or two, and these are like the new CRMs that are supposed to be good, they were just really, really slow. And then I just couldn’t work out how to do stuff. It was like, I had to go and set up a workflow. I needed training to use this CRM tool. And I’d been having a lot of fun with Claude Code and Codex, kind of flipping between them, getting a feel for them. And so I just said, build me a… I just voice dictated a brain dump for like 10, 15 minutes. Here’s the CRM I need. And it wasn’t just a boring CRM; it was like, I need you to make a CRM that engages me as a developer who doesn’t wake up and go, let’s do sales. You know, gamify it for me. And here are the ways I want you to do that. And it just did it. That was my coding agents moment. And I think you have that moment when you do a new project. You use an LLM on a completely greenfield project, and there’s no existing code it’s going to mess up or get wrong, and the project’s not too big. It just built the whole freaking CRM. And it was really good. It was a good CRM and it worked really well. And so that was like my level one awakening: this idea that you can just have the SaaS you want, instantly. It suddenly felt true, because I had done it. And I have canceled the old CRM system now. And there’s a bunch of other tools I plan to cancel.
Not because they’re all crap, but because it’s harder to use them than it is to just say what I want. Because I kind of have to learn how to use those tools. Whereas I can just say, make me the thing: make me the website I want instead of using a website builder tool, or make me the CRM that I want to use. And then there’s this different cycle that you have, the loop of improvement, where it’s not build and then use the software. It’s: as you’re using the software, you can improve the software at any time. And we’ve still got to work out how this works. Like, who has the power to change the software? And how do you share that amongst a team, right? Do I have a branch of the software, or do I have my own views or something in the CRM that I can mess around with? But just within our team of three doing this stuff in the company, it was like, oh, you’re annoyed with this part of the software? Just change it. Just change it.

Yeah. When it annoys you, it’s the exact point of time and then continue with the work. Right. And I assume you’re probably still doing like a GitHub or some sort of like primary GitHub, not literally like GitHub, but git repository as a hub for your work, right? And you probably have pull requests or merge requests. So even if your teammate is frustrated, improves the software, pushes it back, you’re still using the same software and you’re still using the same traditional developer tooling, which is pull requests, code review, merging.

Yeah. That’s going to have to change as well.

Okay. Take me there. I woke up this morning with that feeling.

Okay. That’s changing too.

How’s it changing?

With the CRM and with something we’ve been building this week, these were new pieces of software. There weren’t existing code bases. I didn’t have any prior ideas and taste and requirements about what the code should look like. I think this is the thing that slows people down with coding agents. You use it on an existing repo, and LLMs have bad taste. They just give you the most common denominator, bad taste version of anything, whether it’s writing a blog post or coding, right? And so when you use it on an existing project and then you review the code, you just find all these things wrong with it. Like, right now they love doing all this really defensive try-catch in JavaScript, or really verbose stuff, or rewriting a utility function that exists in a library already. But when you start on a new project and you just use YOLO mode, and you’re building something for yourself as well, and it works… like, where’s the code? Why review the code? I think we’re only in this temporary, weird thing where we’re trying to jam in these existing software processes that ensure we deliver high quality software, secure software, good software. It’s hard; we’ve got SOC 2, we can’t throw those out the window for everything that exists today. But for everything new that you’re building, you’ve got an opportunity to pull apart, question, and collapse down all of these processes we built for ourselves, processes that were built to ensure humans don’t make mistakes, right? And help humans collaborate and help humans manage change in the repository and everything. If the humans aren’t writing the code anymore, we need to question these things.
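For a concrete sense of the "defensive try-catch" taste Damien is describing, here is an invented TypeScript example of the verbose style an LLM might emit, next to the tighter version a reviewer with taste would usually prefer. Both behave the same; the example is illustrative, not real model output:

```typescript
// The over-defensive style: redundant null checks on a typed value,
// a try-catch around code that cannot throw, and one guard per line.
function parsePriceVerbose(input: string | null | undefined): number | null {
  try {
    if (input === null || input === undefined) return null;
    const trimmed = input.trim();
    if (trimmed.length === 0) return null;
    const value = Number(trimmed);
    if (Number.isNaN(value)) return null;
    return value;
  } catch {
    return null;
  }
}

// The same behavior, written the way a human reviewer would want it:
// trust the type system, keep only the checks that do real work.
function parsePrice(input: string): number | null {
  const trimmed = input.trim();
  if (!trimmed) return null;
  const value = Number(trimmed);
  return Number.isNaN(value) ? null : value;
}
```

Nothing in the verbose version is wrong, exactly; it is the accumulation of needless guards across a whole codebase that makes review slow.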

Are you moving into the land of agent first then? It sounds like that’s where you’re going.

I feel like I’m being pulled into it by, yeah, I’m slight, I’m kind of like, there is a tide. I can’t resist. I’m falling in the hole. And we’re kind of like, we’re dipping our toes in, right? Trying to try out Cursor and Tab. And then we’re kind of in there and we’re swimming, trying to swim the way we normally swim the way we want to go. And suddenly I’ve just gone, just like relax and just let the tide, let the river take you. Just let it go, man. Just let it go.

It’s scary.

It feels kind of terrifying, and I don’t have the answers to how we do code review. But if you look at a lot of teams talking about using AI coding agents on their existing projects, everyone’s big problem now is code review, right? Because everyone using coding agents is producing so many PRs, it’s piling up in this review process that has to be done. The new teams that don’t have that process in place are going multiple times faster right now.

Okay. What is replacing code review if there’s no code review? Is it just nothing?

For these teams you’re describing: us as developers, we need to put ourselves in the shoes of PMs, designers, managers. Because they don’t look at the code, right? They say, we need this functionality. We build it. We do our code reviews. We ensure it works. And the PM, whoever, goes, oh yeah, great, I’ve used it and it meets the requirement. It’s great, right? They’re comfortable not looking at the code. They’re a long way from the code. They’re closing the deal. They’re with the customer. They’re integrating. They’re like, I am confident that the intelligent being that created this code did a good job. Now, I think the only reason we’re stuck in this old process is because many of these processes are set in stone, but also because LLMs aren’t quite smart enough yet. They still make stupid mistakes. Right. You still need a human in the loop and on the loop.

Yeah, I mean, they’re still a bit dumb and they do silly things. Oh, look, they’re going the wrong direction for a while. And I’m like, no, hang on a second. That’s a great thought here. But let’s get back on track. This is the problem we’re solving. And you’ve sidequested us. It’s a fun sidequest, but that’s not the point.

But this is going to change. Right. And this is one of the hard things is trying to put ourselves in the mindset of what it’s going to be like in a year. And I think I’ve only been, you know, after us being able to play with LLMs for several years, it feels like I can feel the velocity of it now. Right. Because I’ve felt GPT-3, 4, 5, Claude, Codex. And now I can go, oh, okay, that’s what it feels like for it to get better. And it’s going to keep getting better for a few more years. So it’s kind of like self-driving cars, right? They’re like not very useful while they’re worse than humans. But suddenly when they’re safer than a human, like, why would you have a human?

And I think it’s the same with coding. Like all this process is to stop humans making mistakes. We make mistakes. Like our mistakes are not special, better mistakes. They’re still like, we stuff up code. We cause security incidents. And so I think as soon as the LLMs are twice as good, five times as good, 10 times better at outputting good code that doesn’t cause these issues. We’re going to start to let go of this concern, like these things, right? We’re going to start to trust them more.

Something I leaned on recently, and it was really with Opus 4.5. I feel like that’s when things sort of changed, because I’m with you on the trend from GPT-3 on to now and feeling the incremental change. I feel like Opus 4.5 really changed things. And I think I heard it in an AI talk, or at least in the intention of it if it wasn’t verbatim: trust the model. Just trust the model. As a matter of fact, I think it was one of the guys building an agent, and the name was maybe Agent Layer, Layer Agent, something like that. Maybe borrowed something from your name, Layercode. I have to look it up. I’ll get the talk and put it in the show notes. But I think it was that talk. And I was like, okay, the next time I play, I’m going to trust the model. I will sometimes stop it from doing something because I think I’m trying to direct it in a certain direction. And now I’ve been like, wait a second, this code’s free, basically. It’s just going to generate anyways. Let’s see what it does. Worst case, I roll back, or worst case, it just generates better. I mean, like, ultra think, right? What’s the worst that could happen? Because it’s going faster than I can anyways. Let’s see. Even if it’s a mistake, let’s see the mistake. Let’s learn from the mistake, because that’s how we learn even as humans. I’m sure LLMs are the same. And so I’ve come back to this philosophy, almost the way you describe it: falling into this hole, slipping in via gravity. Not excited at first, but then kind of excited, because, well, it’s good in there. Let’s just go. Just trust the model, man. Just…

[30:03] Trust the model. And it can surprise you. And I think that still gives me that dopamine hit that I would have coding, right? When I was coding manually, you’d get a function right and you’d be like, “Ah, it works.” And now you’ve got the whole application and you’re like, “Ah, I did one prompt and the whole thing works.” That’s right.

Yeah, it’s really exciting. And yeah, it’s fun right now. I mean, it’s going to keep changing. This is just a bit of a temporary phase here and now. But I think for many of us building software, we love the craft of it, which you can still do. But the making of a thing is also one of the exciting bits of it. And the world is full of software still. Like, think about so many interactions you have with, say, a government service or whatever. Not saying that they’re going to adopt coding agents, particularly, but there is a lot of bad software on the web. And software has been expensive to build. And that’s because it’s been in high demand. So I don’t think we’re going to run out of stuff to build. I think even if we get 10 times faster, 100 times faster, there’s so much useful software and products and things and jobs to be done.

Close this loop for me then. You said SaaS is dead, or dying. I’m paraphrasing, because you didn’t say “or dying.” That’s my parenthetical; I’ll add it to your statement. How is it going to change then? So if we’re making software, there’s still tons of software to write, but SaaS is dead. What exactly are we making then, if it’s not SaaS? I know that not all software is SaaS, but you do build something, a platform, and people buy the platform. Is that SaaS? What changes? You mentioned interfaces. Where do you see that going?

I think we’re moving. And so this is the next level. The next kind of revelation I had was I started using the CRM. I was like, this is cool. This is super fast. This is better than the other CRM. And I can change it. Cool. I’m doing some important sales work. I’m enriching leads. And then I kind of woke up a few days later. I was like, why am I doing the work? What’s going on here? I create an interface for me to use, right? Why can’t Claude Code just do the work that I need to do for me? I know it’s not going to be with the same taste that I have. And I know it’s going to make mistakes. But I can have 10 of them do it at the same time.

And it’s not a particularly fun idea, fully automated sales, and what that means for the world in general, but it’s the particular vertical where I had this revelation, right? Well, the enriching certainly makes sense for the LLM to do, right? The enriching is like, come on, that’s just API calls and copying things over. And a lot of it is so manual still. And so the revelation was just waking up and going, okay, Claude Code’s going to do the work for me today. Like it does for software. It builds the software for me. I’m going to give it a Chrome browser connection. That’s still an unsolved problem; there’s a lot of pain in an LLM chatting to the browser, but there are a few good ones. And I’m going to let it use my LinkedIn. I’m going to let it use my X. And I’m going to connect it to the APIs that I need that aren’t pieces of software but data sources, right? I can get enrichment and search things. And then I just started getting it to do it.

And it was slow, but it was really quite good. And that was like that moment when we first typed "build this feature" into Claude Code. It was suddenly: this thing can just do anything a human can do on a computer. The only thing holding it back right now is access to tools and good integrations with the interfaces, the old software it still needs to use to do what a human does. And a bigger context window, and it'd be great if it were faster. But I can run them in parallel, so speed isn't a massive problem.

And in the space of a week, I built the CRM. And then I got Claude Code to just do the work, but I didn't tell it to use the CRM. I just told it to use the database. And I ended up throwing away the CRM. Now we have this little Claude Code harness that overrides the Claude Code system prompt, sets up all the tools, and gives it a Postgres database. I still need to vibe code a new CRM UI, but for now I've just got a database viewer that the non-technical team uses to look at the leads and things like that, a Beekeeper-style database viewer. And now Claude Code is just doing the work. We've only applied it there, but Claude Code is this little innovation, an agent that can do work for a long time. And we already know people use ChatGPT for all sorts of things beyond coding, right? So suddenly I think these coding agents are a glimpse of how all knowledge work can be sped up or replaced.

Administration can be replaced with these things now.

Yeah, these non-technical folks, why not just invite them to the terminal and give them CLI outputs that they can easily run and just up arrow to repeat or just teach them certain things that maybe they weren’t really comfortable with doing before. And now they’re also one step from being a developer or a builder because they’re already in the terminal. And that’s where Claude’s at.

Yeah, I mean, that's what we've done now. I've seen some unexpected teething issues with that. A terminal just feels a bit scary to non-technical people. Even if you explain how to use it, when they quit Claude Code or something, they're just kind of lost. They're like, "Oh my gosh, where did Claude Code go?"

Yeah. And I was onboarding the team, and it was like, open the terminal. And then I'm like, okay, we've got to cd into the directory. What if the terminal was just Claude Code? What if you built your own terminal that was just—

I think when I actually think about the specific UI, whether it's terminal or web UI, that's kind of not the point. The magic is a thing that can access everything on your computer, right? And they're doing that with, I think it's called Cowork. Have you seen Cowork yet?

So I haven't played with it enough to know what it can and can't do. I unleashed it on a directory with some PDFs I had collected around business structure. It was an idea I had like four months ago, a different business structure that would just make more sense, primarily for tax purposes. And I was like, hey, revisit this idea, I haven't touched it in forever. And it was a directory. And I think it went and did a bunch of stuff, but then it was like, come up with ideas, and I'm like, nah, those aren't good ideas. I don't know if it's less smart than Claude Code in intent or whatever, but I think that's what they're trying to do with Cowork. But you could just drop people into essentially a directory, which is where Claude Code lives. And it lives in a directory of files that maybe is an application, or knows how to talk to the database, as you said your CRM does. And they can just be in the Claude Code instance, asking questions. Show me the latest leads.

Yeah. That could use a skill, if you want to go that route, or it can just be smart enough to go, well, I have a Neon database here, the neonctl CLI is installed, I'm just going to query it directly. It might write some Python to make it faster. Maybe I'll store some of this stuff locally, and it'll do it all behind the scenes. But then it gives this non-technical person a list of leads. All I had to do was be like, give me the leads, man. You know? And then you mentioned enabling them as builders. I think it's a window into that, because then when they want something, they get curious, right? They'll be like us. They're just going to be like, hey, build me a report for this. Build me a web app for this. Help me make this easier.

Yeah. You'd be surprised how well that works; "help me make it easier" is one of those weird ones. And Claude Code will also autocomplete and just let you tab and enter. And I've noticed those completions have gotten more terse. The last one I did was interesting; it was super short. It was just, "I like it, implement it." That was the completion.

"I like it, implement it." I was like, okay, is that how easy it's gotten now? To just spit out a feature that we were riffing on, where you know the problem, you understand the bug we just got over. And now your suggested response, the thing you tell me to say because you need me, the human, to get you back in the loop, at least in today's REPL, is "I like it, implement it." You know what I mean?

I found myself just responding with the letter Y. I know a lot of the time it just knows what to do, right? Even if it kind of like is a bit ambiguous, you’re kind of like you’ll work it out.

So I think it’s very exciting that Anthropic released this Co-Work thing because they’ve obviously seen that inside Anthropic. All sorts of people using Claude Code. And you know, when we think about, okay, someone starts there for non-coding purposes, but stuff is done with code and CLI tools and some MCPs or whatever, APIs. And then the user says, well, make me a UI to make this easier. So for instance, I had to review a bunch of draft messages that I wrote. I was like, okay, this is kind of janky in the terminal. Make me a UI to do the review. It just did it.

And I think this is where software is changing, because the LLM is getting 10 times faster. I mean, if you use Groq with a Qwen model, they're insanely fast. It's going to be fast. Then if you can have any interface you want within a second, why have static interfaces, right?

Yeah, I'm camping out there with you. What if everything was just-in-time, including the interface? What if I didn't need to share it with you? You're my teammate, but what if you could do the same thing for yourself, and it solves your problem? You're in your own branch, and what you do in your branch, it's like Vegas: it stays there. It doesn't have to be saved anywhere else, right? What if, in your own branch, in your own little world, as a sales development representative, for example, an SDR who's trying to help the team and the organization grow, all you need is an interface, and it's just-in-time, for you only? And it didn't matter if it was maintainable. It didn't matter how good the code was. All that mattered was that it solved the problem, seized the opportunity, and enabled them to do their job. And you just take that and multiply it, copy and paste it across every role where that just-in-time approach makes sense. What? It completely changes the idea of what software is. It also completely changes how we interact with the computer, and what a computer does and is for. I just love this notion that every user can change the computer, can change the software as they're using it, however they like.

I think that's it, essentially: everyone's a developer. Yeah, I mean, it's the ultimate way to use a computer; all the gates are down, right? There's no geeky prerequisite anymore. I can have software the way I want software; so long as I have authentication and authorization, I've got the keys to the kingdom to make whatever I want.

And I think the agents can also preempt, right? I haven't tried this yet, but I was thinking of giving the little sales thing a prompt that says, if a web UI is going to be better for the user to do this review, then just build it. So then instead of you asking for it, you ask it to do some work and it comes back and says, oh, I've made you this UI where I've displayed it all for you. Have a look at it and let me know if you're happy with it.

I mean, this idea is getting kind of wild, but it's kind of how we can think about how we communicate with each other as humans, as employees, right? We have back-and-forth conversations. We have email, which is a bit more asynchronous. We put up a preview URL or something. I think all those communication channels can be enabled in the agent you're chatting to. Product companies have been selling this initial messaging about digital employees, right? But something like that is going to happen.

And the exciting bit for me is the human-computer interaction, right? This is how it's exciting in the context of Layercode, and why we love voice: voice is the OG communication method. As humans, we've been speaking since before we were writing, and it's quite a rich communication medium. And it's terrific if your agents can be really multi-medium: you're doing voice with them, text with them, they create a web UI for you, you interact with the UI with them. There don't have to be these strict modes or delineations between those things.

Well, let's go there. I hadn't taken us there yet, but I do want to talk to you about what you're doing with Layercode. I obviously produce a podcast, so I'm interested in speech-to-text to some degree, because transcripts, right? And then you have the obvious version, where you start out with speech and you get something out, or even a voice prompt. What exactly is Layercode? We've been 51 minutes deep nerding out on AI, and not at all on your startup and what you're doing, which was sort of the impetus for even getting back in touch: you had something new you were doing. And I'm like, well, I haven't talked to Damien since he sponsored the show almost 17 years ago. It's probably a good time to talk, right? So there you go. That's how it works out.

Has your excitement, your daily or even minute-by-minute dopamine hits, changed how you feel about what you're able to do with Layercode? And what exactly are you trying to do with it?

[46:29] Well, we've talked a lot about the building of a company and the building of software now. And I think for founders today, how they build is as important as the thing they're building, right? Because if you just head into your company and operate it like you did a few years ago, using no AI, using all your slow development practices, your slow sales and marketing practices, you're really going to get left behind.

And so there's a lot to be done in working out and exploring what the company of the future looks like, what the software company of the future looks like. I'm very excited about the idea that we can build large companies with small teams. There's a lot of HR and politics and culture change that happens when teams and companies get truly large. And one of the founding principles when we started our startup was: let's see how big we can make this while staying a small team. And that's very exciting, because I think you can move fast and you can keep a great culture.

And so that's why we invest a lot of our energy into the building of the company. And what we provide right now, our first product, is voice infrastructure, a voice API for building real-time voice AI agents. This is currently a pretty hard problem. We focus a lot on the real-time conversational aspect, and there are a lot of weird problems in that, right? Conversations are dynamic things, and there's a lot of state changes and interruptions and backchanneling and everything that happens.

And if you're a developer building an agent, whether it's a sales agent or a coding agent, and you want to add voice AI, there's a bunch of stuff you're going to bump into when you start building. It's interesting: we can kind of predict where our customers are in that journey, because there's a bunch of problems that you don't preempt, and then you just quickly slam into them.

And so we've solved a lot of those problems. With Layercode you can just take our API and plug it into your existing agent backend, so you can use any backend you want, any agent LLM library you want, and any LLM you want. The basic example is a Next.js application that uses the Vercel AI SDK; we've got Python examples as well. You connect up to the Layercode voice layer, drop in our browser SDK, and then you get a little voice agent microphone button and everything in the web app. We also connect over the phone.

And then for every turn of the conversation, whenever the user’s finished speaking, we ship your backend that transcript. You call the LLM of your choice, you do your tool calls, everything you need to do to generate a response like you normally do for a text agent. Then you start streaming the response tokens back to us. And then as soon as we get that first word, we start converting that text to speech and start streaming back to the user.
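The per-turn flow just described, a finished utterance in and LLM tokens streamed straight back out, can be sketched roughly like this. Note that `callLLM` and `sendToVoiceLayer` are hypothetical stand-ins for your model call and the voice layer's streaming channel, not Layercode's actual API:

```typescript
// Sketch of one conversation turn: receive the finished user utterance,
// stream LLM tokens, and forward text to the voice layer as tokens arrive.

type Token = string;

// Stand-in for a streaming LLM call (e.g. via the Vercel AI SDK).
async function* callLLM(_utterance: string): AsyncGenerator<Token> {
  for (const tok of ["Sure, ", "I can ", "help ", "with that."]) {
    yield tok;
  }
}

async function handleTurn(
  utterance: string,
  sendToVoiceLayer: (text: string) => void
): Promise<void> {
  // Forward each token immediately: the listener's wait is dominated by
  // the first audible word, so we never hold back for the full response.
  for await (const token of callLLM(utterance)) {
    sendToVoiceLayer(token);
  }
}
```

The key property is that nothing in the loop accumulates the whole response; text-to-speech can begin as soon as the first word lands.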

And so there’s a bunch of stuff you have to do to make that really low latency, make that a real time conversation where you’re not waiting more than a second or two for the agent to respond. So we put a lot of work into refining that. And there’s also a lot of exciting innovation happening in the model space for voice models, whether it’s transcription or the text to speech.

And so we give you the freedom to switch between those models, right, so you can try out some of the different voice models, some that are really really cheap and really you know got really casual voices and some like ElevenLabs, they’re a much more expensive but they’re very professional clean voices and you can find the right fit for your kind of experience that you want, trade off. There’s a lot of trade-offs, right, in voice between latency, price, quality, so we let users explore that and find the right fit for their voice agent.

That is interesting. So Next.js SDK, streaming, latency, is it, you’re meant to be the middleware between implementation and feedback to user.

Yeah, we handle everything related to the voice basically and we let you just handle text like a text chatbot basically.

No heavy MP3 or wave file coming down, just—

Yeah, and everything’s streaming. And so it’s a very interesting problem to solve because the whole system has to be in real time. So the whole thing, we call it a pipeline. I don’t know if that’s a great name for it because it’s not like an ETL loading pipeline, but the real time agent system, our backend, when you start a new session, it runs on Cloudflare Workers. So it’s running right near the user who clicked to chat with your agent with voice.

And then from that point on everything is streaming. So the microphone input from the user’s browser streaming in, that is then getting streamed to the transcription model in real time. The transcription model is spitting out partial transcripts. We send that partial transcript back to you so you can show what you’re saying if you want to show them that.

And then the hardest bit in this whole thing is working out when the user is finished speaking. It’s so difficult because we pause, we make sounds, we pause and then we start again and conversation is such a dynamic kind of, it’s like a game almost, right?

So we have to do some clever things, use some other AI models, to detect when the user stops speaking. And when we have enough confidence (there's no certainty here, just enough confidence that the user finished their thought), then we finalize that transcript, finish transcribing that last word, and ship you the whole user utterance, whether it's a word, a sentence, or a paragraph the user has spoken.
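As a toy illustration of that confidence call, here is one way to combine silence duration with whether the partial transcript looks like a completed thought. Real turn detection uses dedicated audio models; the thresholds and filler-word list below are invented for illustration:

```typescript
// Toy end-of-turn heuristic, not Layercode's actual model-based detector.

interface TurnSignal {
  silenceMs: number; // time since speech was last detected
  partial: string;   // running partial transcript
}

function looksFinished(partial: string): boolean {
  // A trailing filler word suggests the speaker is still mid-thought.
  const fillers = ["um", "uh", "so", "and", "like"];
  const lastWord = partial.trim().toLowerCase().split(/\s+/).pop() ?? "";
  return !fillers.includes(lastWord.replace(/[^a-z]/g, ""));
}

function endOfTurn({ silenceMs, partial }: TurnSignal): boolean {
  if (silenceMs > 1200) return true; // long pause: commit either way
  if (silenceMs > 500 && looksFinished(partial)) return true;
  return false;
}
```

Even this toy version shows the shape of the trade-off: commit too early and you cut people off mid-thought; wait too long and the agent feels sluggish.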

The reason we can't stream at that point, why we have to bundle up this user utterance and choose an end, is because LLMs don't take streaming input. I mean, you can stream the input in, but you need the complete thing, the complete question, to make a request to the LLM and generate a response, right? There's no duplex LLM that takes input and generates output at the same time.

Technically, what if you constantly wrote to a file locally, or wherever the system is, and then at some point it just ends and you send a call that marks the end, versus packaging it up and sending the whole thing once it's done? You write incrementally, line by line. I'm not sure how to describe it, but that's how I think about it: what if you constantly wrote to something, and then you just said, okay, it's done, and whatever was there is the done thing?

Yeah, we can do that with the partial transcripts: you can stream the partial transcripts in, and then say, okay, now it's done, now make the LLM call, and then you make the LLM call.

But interestingly, sending text is actually super fast. And the default example, this is crazy, I didn't think this would work until we tried it, just uses a webhook. When the user finishes speaking, the basic example sends your Next.js API route a webhook with the user's text. And it turns out sending a webhook with a few sentences in it is fine, it's fast. It's all the other stuff, like waiting for the LLM to respond.

Yeah, that's actually not the hard part. I mean, it adds a millisecond or a few milliseconds, but it's not going to be a dramatic shift the way I described it versus how you do it.

Yeah, and we've got a WebSocket endpoint now, so we can shave off that HTTP connection and everything. But then the big heavy latency items come in. So, generating an LLM response: most LLMs we use right now, the ones we use in coding agents, are optimized for intelligence, not speed. And when the LLM labs do optimize for speed, they tend to optimize for raw token throughput. Very few optimize for time to first token. And that's all that matters in voice: I give you the user utterance, how long does the user have to wait before I can start playing back an agent response?

And time to first token is exactly that: how long before I get the first word or two that I can turn into voice so they can start hearing it. The only major LLM lab that actually optimizes for this, that maintains a low TTFT, is Google with Gemini Flash. Most voice agents doing it this way are using GPT-4o or Gemini Flash, and the OpenAI endpoints have some annoying inconsistencies in latency. That's the killer in voice, right? It's a bad user experience if the first few turns of the conversation are fast and then suddenly the next turn the agent takes three seconds to respond. Is the agent wrong? Is the agent broken?
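Since TTFT, not total throughput, is the number that matters here, a small helper for measuring it on any token stream might look like this. The `demo` stream in the test is of course hypothetical:

```typescript
// Measure time-to-first-token on any streaming response: the wall-clock
// time from starting to consume the stream until the first chunk arrives.

async function timeToFirstToken<T>(
  stream: AsyncIterable<T>
): Promise<{ first: T; ttftMs: number }> {
  const start = Date.now();
  for await (const chunk of stream) {
    // Return on the very first chunk; the rest of the stream is the
    // throughput story, which is a separate metric.
    return { first: chunk, ttftMs: Date.now() - start };
  }
  throw new Error("stream produced no tokens");
}
```

In practice you would run this against the provider's streaming API per turn and watch the tail latencies, since (as described above) the occasional three-second turn is what breaks the illusion of conversation.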

But then once you get that first token back, you're good, because you can start streaming the text to us, and we can start turning it into voice. And then again we hit this batching problem. The voice models that do text-to-speech don't stream the input either. They require a full sentence of input before they can start generating any output, because again, how things are pronounced depends on what comes later.

And so you have to buffer the LLM output into sentences and ship it sentence by sentence to the voice model. Then, as soon as we get that first 20-millisecond chunk of audio, we stream it straight back down WebSockets from the Cloudflare Worker into the user's browser, and we can start playing the agent response.
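The sentence-buffering step described above can be sketched as an async generator that sits between the LLM token stream and the TTS call. The sentence-boundary regex here is deliberately naive; production systems handle abbreviations, numbers, and so on far more carefully:

```typescript
// Buffer a token stream into sentences before text-to-speech, since TTS
// models need a full sentence to pronounce it correctly.

async function* bufferSentences(
  tokens: AsyncIterable<string>
): AsyncGenerator<string> {
  let buf = "";
  for await (const tok of tokens) {
    buf += tok;
    // Flush every complete sentence currently sitting in the buffer.
    let m: RegExpMatchArray | null;
    while ((m = buf.match(/^(.*?[.!?])\s+/))) {
      yield m[1];
      buf = buf.slice(m[0].length);
    }
  }
  if (buf.trim()) yield buf.trim(); // flush the trailing partial sentence
}
```

Each yielded sentence would be shipped to the voice model immediately, so speech synthesis of sentence one overlaps with the LLM still generating sentence two.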

You chose TypeScript to do all this? I understand you were pretty set on Cloudflare Workers from day one, okay.

And it just solves so many infrastructure problems that you'd otherwise run into later on. I don't think we'll ever need a DevOps person. It's such a wonderful platform. There are constraints you have to build to: you're using V8 JavaScript, browser JavaScript, in a Cloudflare Worker. Tons of Node APIs don't work; there is a bit of a compatibility layer, and you do have to do things a bit differently. But what do you get in return? Your application runs everywhere, at 330 locations around the world. There's essentially zero cold start: a Cloudflare Worker starts up while the SSL negotiation is happening, so by the time that's done, the worker is already running. And you have very few limits on your scaling, extremely high concurrency, and every instance is very isolated, which is really important for voice as well. There are often quite big spikes: at 9 a.m. everyone's calling up somewhere that's got a voice agent, asking to book an appointment or something. You get these big spikes, and you need to scale very quickly, because you don't want people waiting around.

And if you throw tons of users on the same system and start overloading it, then suddenly people get this problem where the agent responds in three seconds instead of one second. It sounds weird, but Cloudflare gives you an incredible amount of that for no effort. And compared to Lambda and the like, the interface is nicer: it's just an HTTP interface to your worker, there's nothing in front of it, and you can do WebSockets very nicely. And there's this crazy thing called Durable Objects, which I think is a bad name—

[60:08] And it's also kind of a weird piece of technology, but it's a little JavaScript runtime that is persistent, basically, and has a little SQLite database attached to it. And it is, I don't know what the right word is; "thread safe" isn't quite the right word for JavaScript, but basically think of it like thread safe. So you can have it take a bunch of WebSocket connections and do a bunch of SQL writes to the SQLite database it has attached, and you don't have to do any special work dealing with concurrency and atomic operations.

So you know, the simple example is to implement a rate limiter or a counter, something you can do very simply in Durable Objects. You can have as many Durable Objects as you want. Each one has a SQLite database attached to it, up to 10 gigabytes per object. And then you can shard however you want, right? You could have a Durable Object per customer that tracks something you need done in real time. You could have a Durable Object per chat room.
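As a rough illustration of why that single-owner model is convenient, here's a sliding-window rate limiter written the way you'd structure Durable Object logic, with the DO runtime and its attached SQLite storage replaced by a plain in-memory class for the sketch:

```typescript
// Sliding-window rate limiter: one instance owns its own state, so (as in
// a Durable Object) requests routed to it are handled serially and need
// no locking or atomic-operation ceremony.

class RateLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(now: number): boolean {
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

In a real Worker you'd route each customer to their own Durable Object instance (for example via the namespace's `idFromName(customerId)`), so every call for that customer hits one single-threaded owner and the counter is trivially consistent.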

As long as you don't exceed it (a Durable Object does have a set amount of compute), you can use it for all sorts of magical things. And I think it's a really underappreciated thing that Cloudflare has. Coming from Pusher, it's the real-time primitive now. A lot of the stuff we'd have reached for something like Pusher for, Durable Objects is really great for, especially when you're building a fully real-time system.

Yeah. You chose TypeScript based on Cloudflare Workers, it sounds like, because that gave you three hundred locations across the world, Durable Objects, great ecosystem, no DevOps. For those who choose Go — or I don’t think you choose Rust for this because it’s just not the kind of place you put Rust — but Go would compete for the same kind of mind share for you. How would the system have been different if you chose Go, or can you even think about that?

[62:10] I haven’t actually written any Go, so I don’t know if I can give a good comparison. But from the perspective of what we do have out there — there are similar real-time voice agent platforms in Python. And I think because a lot of the people building the models, the voice models, then built coordination systems like Layercode for coordinating the real-time conversations, Python was the language they chose.

And I think what’s more important is the patterns rather than the specific languages. And so we actually wrote the first implementation with RxJS, and that has an implementation in most popular languages. I hadn’t used it before, but we chose it because it was for stream processing. It’s not really for real-time systems, but it gives you subjects, channels — these kinds of, it has its own names for these things — but basically it’s like a pub-subby kind of thing. And then it’s got these kind of functional chaining things where you can then kind of pipe things and filter things and filter messages and split messages and things like that.

And that did allow us to build the first version of this quite dynamic system. We didn't touch on it, but interruptions are this other really difficult dynamic part: whilst the agent is speaking its response to you, if the user starts speaking again, you need to decide in real time whether the user is interrupting the agent, or just going "yeah" and agreeing with the agent, or trying to say "oh, stop." That's one of the hard problems: we still have to be transcribing audio even while the agent is speaking, and we've got to deal with background noise and everything.

And then when we’re confident the user is trying to interrupt the agent, we’ve then got to do this whole kind of state change where we tear down all of this in-flight LLM request, in-flight voice generation request, and then as quickly as possible start focusing on the user’s new question. And especially if their interruption is really short, like “stop” — suddenly you’ve got to tear down all the old stuff, transcribe the word “stop,” then ship that as a new LLM request to the backend, generate the response, and then get the agent speaking back as quickly as possible.

And it’s all happening down one pipe, as it were, at the end of the day, right? It’s like audio from the browser microphone and then audio replaying back. And we would have bugs like you’d interrupt the agent, but then when it started replying there’d still be a few chunks of 20 millisecond audio from the old response snuck in there, or the old audio would be interleaved with the new audio from the agent back. And you’re kind of in Audacity or something, some audio editor, trying to work out like what’s going — why does it sound like this? And you’re rearranging bits of audio going, “Okay, the responses are taking turns every 20 milliseconds, it’s interleaving the two responses,” trying to work out what’s going on. Real pain in the bottom, yeah.

When you solve that problem with the interruption, do you focus on the false examples, the true examples? So do you like have these — if it is an interruption you can tell it’s an interruption by these 17 known cases? Like how do you direct that, the interrupt?

It really depends on the use case. How you configure the voice agent really depends on how the voice agent is being used, right? Like a therapy voice agent needs to behave very differently than a vet appointment booking answering phone agent.

[66:21] Yeah. What about dogs barking in the background?

Yeah, there's that. We call that audio environments, and it's often an early issue users have. It's interesting, they're like, "Well, my users call from cafes and it really misunderstands them." And a big problem with audio transcription is that it just transcribes any audio it hears, right? So if someone's talking behind you, the model doesn't quite know that's irrelevant conversation. It's just transcribing it all.

But if you imagine the therapy voice agent, it needs to actually not respond too quickly to the user and allow the user to have long pondering thoughts, long sentences, big pauses. Maybe tears, they’re crying, or just some sort of human — you know, interrupt, but it’s not a true interrupt. It’s something that you should maybe even capture and process.

And so you can choose a few different levels of interruption, right? You can just interrupt when you hear any word. By default, we interrupt when we hear any word that’s not a filler word, so you filter out things like that. And then if you need some more intelligence, you can actually just ship off the partial transcripts to an LLM in real time.

So let's say the user starts speaking and interrupting while the agent is talking. For every word, or every few words, you get, you fire off a request to Gemini Flash and say, "Here's the previous thing the user said, here's what the agent said, here's what the user just said. Respond yes or no: do you think they're interrupting the agent?" And you get that back in about 250 to 300 milliseconds. As you get new transcripts, you cancel the old requests; you just constantly remake that request until the user stops speaking. Then you take the response from the latest one, and you can make quite an intelligent decision.
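That check can be sketched as a prompt builder plus a latest-wins canceller. The prompt wording and the `askFastLLM` signature are assumptions for illustration, not Layercode's actual interface:

```typescript
// Fire one yes/no classification per new partial transcript, aborting any
// stale in-flight request so only the newest answer is ever acted on.

function buildInterruptPrompt(
  prevUser: string,
  agent: string,
  newPartial: string
): string {
  return [
    `Previous user message: "${prevUser}"`,
    `Agent is currently saying: "${agent}"`,
    `User just said: "${newPartial}"`,
    `Respond yes or no: is the user interrupting the agent?`,
  ].join("\n");
}

class LatestWins {
  private controller: AbortController | null = null;

  // Abort any in-flight check and start a new one for the newest partial.
  async check(
    prompt: string,
    askFastLLM: (prompt: string, signal: AbortSignal) => Promise<string>
  ): Promise<string | null> {
    this.controller?.abort();
    this.controller = new AbortController();
    try {
      return await askFastLLM(prompt, this.controller.signal);
    } catch {
      return null; // superseded by a newer partial transcript
    }
  }
}
```

The `AbortSignal` would be passed through to the underlying HTTP call, so a stale classification is actually cancelled on the wire rather than just ignored.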

But these things feel very hacky, but they actually work very well.

[68:26] Well, the first thing I think about there is that Gemini Flash is not local. So you do have to deal with an outage or latency or downtime. Or in Cloudflare's case, most recently, a lot of downtime because of usage, like really heavy usage. I've had more trepidation with Cloudflare than ever, and I'm like, "Okay, cool, I get it." You know, I'm not upset with you, because I empathize with how in the world you scale those services.

Yeah, it’s the Ralph effect.

It’s the Ralph effect. And I’m like — so why does your system not allow for a local LLM to be just as smart as Gemini Flash might be to answer that very simple question? Like an interrupt, it’s a pretty easy thing to determine.

Yeah, yeah. I think smaller LLMs can do that. Gemini is just incredibly fast, and I think because of their TPU infrastructure they've got an incredibly low TTFT, time to first token, which is the most important thing. But I agree that there are smaller LLMs, and actually one of the models on Groq, a Llama maybe, might even be a bit faster. We should try that.

But you make a point about reliability. People really notice it in voice agents when it doesn’t work, for sure. And especially for businesses relying on it to collect a bunch of calls for them. And so that is one of the other helpful things that platforms like us provide as well.

Or even just cost. I imagine over time, cost — I mean, right now you’re probably fine with it as you’re innovating and maybe you’re finding out customer fit, ability, reliability, all those things. And you’re sort of just-in-time building a lot of the stuff and you’re maybe okay with the inherent cost of innovation. But at some point you may flatten a little bit and you’re like, “You know what, if we had been running that locally for the last little bit, we just saved 50 grand,” you know?

I don’t know what the number is, but the local model becomes a version of free when you own the hardware and you own the compute and you own the pipe to it, and you can own the SLA latency to it as well. The reliability comes from that.

And there’s some cool stuff — there’s a new transcription model from NVIDIA, and they’ve got some voice models as well. And so there was a great demo of a fully open source voice agent platform that was done with Pipecat — the open source Python agent orchestration project I was mentioning. And they’ve got a really great pattern. They have a plugin pattern for the voice agent, and I think that’s the right pattern.

And we’ve adopted a similar pattern — other frameworks have done that too. When we rebuilt it recently, the important thing was that the plugins are independent things you can test in isolation. That was the biggest problem we had with RxJS — the whole thing was like an audio mixing desk with cables going everywhere, with RxJS subjects going absolutely everywhere. It was hard for us as humans to understand. It was the kind of code where you come back a week later and go, “What was happening here?”

And oftentimes we’d end up writing code where the code at the top of the file was actually the thing that happened last in the execution. You know, basic stuff like that, just because that’s how RxJS was kind of guiding us and how we had to initialize things.

But that was one of the key things — we moved to a plugin architecture. We went very basic — there’s no RxJS-style stream processing anymore. It’s all very simple JavaScript with async iterables, and we just pass a waterfall of messages down through the plugins. And it’s so much better. We can take out a plugin if we need to, we can unit test a plugin, and we can write integration tests and mock out plugins up and down the chain. We’re about to launch that, and it’s just such a game changer.
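A minimal sketch of that kind of async-iterable plugin waterfall — the message shape and the two example plugins here are illustrative inventions, not Layercode’s actual API, but they show why each stage can be unit tested in isolation:

```typescript
// Hypothetical message type flowing down the waterfall; not a real API.
interface Msg {
  type: string;
  data: string;
}

// A plugin is just an async-iterable transform: messages in, messages out.
type Plugin = (input: AsyncIterable<Msg>) => AsyncIterable<Msg>;

// Example plugin: uppercases transcript text, passes everything else through.
const uppercase: Plugin = async function* (input) {
  for await (const msg of input) {
    yield msg.type === "transcript"
      ? { ...msg, data: msg.data.toUpperCase() }
      : msg;
  }
};

// Example plugin: drops empty transcripts.
const dropEmpty: Plugin = async function* (input) {
  for await (const msg of input) {
    if (!(msg.type === "transcript" && msg.data === "")) yield msg;
  }
};

// Compose plugins into one pipeline; each stage is testable on its own.
function pipeline(plugins: Plugin[], source: AsyncIterable<Msg>): AsyncIterable<Msg> {
  return plugins.reduce((stream, plugin) => plugin(stream), source);
}

// A toy message source standing in for live transcript/audio events.
async function* source(): AsyncIterable<Msg> {
  yield { type: "transcript", data: "hello" };
  yield { type: "transcript", data: "" };
  yield { type: "audio", data: "<pcm>" };
}
```

Because each plugin is a plain function over an async iterable, you can swap one out, feed it a scripted iterable in a unit test, or mock its neighbors up and down the chain.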

And then interestingly, tying back to LLMs, we ended up here because with the first implementation we found it hard as developers to understand the code. We ran the LLMs on it — they were hopeless. They just could not hold the state of this dynamic, crazy, multi-subject stream system in their head.

Context was everywhere, right? Like it was all — it was here, it was there.

Yeah. I would do things like take the whole file — I was copying and pasting files into ChatGPT Pro, being like, “You definitely have all the context here. Fix this problem.” And they couldn’t solve the problem.

And part of the problem was that complexity. Not having the ability to test things in isolation meant we couldn’t have a TDD loop, whether with a human or with an agent. And because we couldn’t use agents to add features to the core of the platform, it was slowing us down.

And so that’s when we really started to use coding agents — Claude Code, Codex — like really properly and hard. I spent like two weeks just in Claude Code and Codex, and the mission was: if I can get the coding agent to write the new version of this — it wasn’t even a refactor; it had to be rewritten from scratch, first principles — then by virtue of writing it, it will understand it. And then I’ll be able to use coding agents to add features.

And I started with literally the API docs for our public API, because I didn’t want to change that, and the API docs of all of the providers and models we implement with — like the speech-to-text and text-to-speech model provider endpoints — and just some ideas about, “I think we should just use a simple waterfall pipe, like pass messages through the plugins.”

And that experience was really interesting, because it felt like molding clay. I really cared about how the code looked, because I wanted humans to read it as well — the agents aren’t quite good enough to build this whole thing from a prompt, but I think they will be in a year or two, right? It did an okay job, and it needed a lot of reprompting — “refactor this, re-architect this” — but it felt like clay in one sense because, and you mentioned this earlier, you can just write some code, and even if it’s wrong, you’ve gained some experience.

I was able to just say, “Write this whole plugin architecture and do it,” and it would do it, and I’d be like, “Oh, that seems a bit wrong, that’s hard to understand.” I was like, “Write it again like this. Write it again like this.” And I suddenly got that experience of throwing away code because it hadn’t taken me weeks and weeks to write this code.

It had taken you 10 minutes.

And I was there, just threw it away. And you still have your chat session too, so even if you had to scroll back up a little bit, or maybe even copy that out to a file for long-term memory if you needed to, you still have that there as a reference point.

Yeah. I find myself doing similar things, which is like just trust the model, throw it away and do it again if you need to. Learn the mistake, go down the wrong road for the learning, and make the prompt better.

And it did a terrific job. And then the bit that really got it over the finish line was then I said — I gave it this script that we used to have to do manually to test our voice agent. You know, it’s like: connect to the voice agent, say this to the voice agent, tell it to tell you a long story, now interrupt the story, you shouldn’t hear any leftover audio from the long story — like all these things, there’s like 20 different tests you had to do.

I gave it that script and I was like, “Write the test suite for all of these tests.” And then it did. And I gave it all these bugs we had in our backlog. I was like, “Write tests for this.” And I just started doing TDD on our backlog, and it was great.

Then I did like a chaos monkey thing. I was like, “Write a bunch of tests for crazy stuff users could do with the API.”

It found a bunch of bugs and issues — security issues. And then it got it working, got a bunch of unit tests. And I was still having to do a bit of manual testing. And then one day I was like, you know what — no one’s made an integration test thing for voice agents. There are a few observability platforms and eval platforms.

So I was like, I just wanted to simulate conversations. And it’s so — that’s part of the magic, is trying something that you’re like, “This is a pain in the ass to build,” or like, “How is this even going to work?” Well, I just got it to build it.

And I recorded some WAV files of me saying things, and I gave them to it. I was like, “Make an integration test suite for this and feed in the WAV files like you’re having a conversation, and check the transcripts you get back.” Wow. And it did a great job. And then it was actually able to fully simulate those conversations and do all the tests.

And then that — I mean, we’ve got these practices like TDD which are going to hold value, right? It was so valuable for the model, for the agent, to be running the test, fixing the test, running the test, fixing the test. And that feels a bit like magic when you get that working.

So much to cover in this journey. I’m so glad we had this conversation. I kind of feel like a good place to begin the end — not actually end — is back to this idea that is on your about page.

And I just got a reMarkable, because I love to write and I really hate paper — and because this thing has Linux on it. And I wrote an API, so I can now talk to my reMarkable Pro tablet programmatically. So amazing. I’m loving this.

Behind me?

So you’ll be able to Claude Code or Codex to your tablet?

That’s next.

I just got it. I just got it. So that’s the next thing. It’s a little playground for me, basically. But it’s real time. So if you see me looking over here writing — audience, or even you, Damien — I’m not not paying attention. I’m writing things down.

And the one thing I wrote down earlier from your about page was “the era of the small giant,” which you alluded to, but you didn’t say those exact words. And the reason I think it might be a good place to begin to end is that I think you might be able to encourage the single developer — maybe in the last couple of months they’ve just begun to touch this and resist falling into this gravity hole, or however we describe this resistance we’ve had as developers, loving to read our own code, and code review, and all the things we do as humans — to not resist as much, or at all, and just trust the model.

To give them this word of encouragement: “Hey, you’re a single developer” — and in your case, Damien, you don’t need a DevOps team. It’s not that they’re not valuable or useful, but you chose a model, a way to develop your application and solve your problem, that didn’t require a DevOps team. Give them that encouragement. What does it mean to be in this era of the small giant?

[80:41] I think the hardest thing is our own mindset, right? I’ve found this with coding agents. You start off putting in things where you have an idea and you know what to expect out of it. And then you start putting in stuff that just seems a bit ridiculous and ambitious. And oftentimes it fails, but more and more it’s working. That’s a very magical feeling, and it’s a very revealing kind of experience.

And so I think we can all be more ambitious now. And especially as engineers, we know how the whole thing works, right? There is a lot of power everyone’s been given with vibe coding — there are a lot of security issues that I think will be solved over time — but as engineers, we have the knowledge to take things all the way through: deploy them, scale them, fix the issues that the LLMs can still get stuck on.

But we can do so much more now. We can be so much more ambitious now. And I think the thing every engineer should be doing now is not only trying out Claude Code and Codex, but doing something new and fun with them. The great thing is it’s so low risk, so easy to do, that you can build something ridiculous and fun that you’ve always wanted to build.

You can just — and you can build something for a friend, for your wife. And that’s really exciting.

And I think this Ralph Wiggum thing — a very basic idea — is: you give it a spec.md and a todo.md, just an ambitious task or a long list of tasks in a markdown file, and you run a shell script. All it does is say to Claude Code, “Do the next bit of work. When there’s no more work to do, return complete.” And the shell script just greps for “complete” in some XML tags, and if it hasn’t seen that word, it just calls Claude Code again.
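A hedged sketch of that loop as a shell script — here `run_agent` is a mock that stands in for the real coding-agent invocation (something like `claude -p "$PROMPT"`), so the script runs as-is without any agent installed:

```shell
#!/bin/sh
# Sketch of the "Ralph Wiggum" loop: keep re-invoking a coding agent
# until it reports there is no work left. run_agent below is a MOCK;
# in real use, replace its body with your agent CLI call.
PROMPT='Read spec.md and todo.md. Do the next bit of work.
When there is no more work to do, output <complete/>.'

echo 3 > .todo_count   # mock state: pretend three tasks remain

run_agent() {
  # MOCK: decrements a task counter; a real script would call the agent here.
  N=$(cat .todo_count)
  if [ "$N" -gt 0 ]; then
    echo $((N - 1)) > .todo_count
    echo "Did one bit of work ($((N - 1)) tasks remaining)."
  else
    echo '<complete/>'
  fi
}

ITERATIONS=0
while true; do
  OUTPUT=$(run_agent "$PROMPT")
  ITERATIONS=$((ITERATIONS + 1))
  echo "run $ITERATIONS: $OUTPUT"
  # Stop once the agent emits the completion marker.
  if echo "$OUTPUT" | grep -q '<complete/>'; then
    break
  fi
done
```

The whole trick is just the loop-and-grep: the agent gets re-prompted until its output contains the completion marker.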

And like many of these things, it seems like a terrible idea, it seems ridiculous, but it is also incredible what it can do. And so I think that’s probably — to feel what the future is going to be like, I feel like you write down something very ambitious in a markdown file, or transcribe an idea you have that you’ve been thinking about for a while, and you set a Ralph Wiggum script off on it, and you just go for a long walk or go and have lunch. And when you come back — I mean, it’s a very exciting feeling.

And as a developer, it’s very fun because then you get to go through all this code and be like, “Why did it do it?” And you’re like, “Oh, that was pretty smart. But I didn’t like that. Okay, that was quite a good idea. It messed up this bit.” But that’s — I just feel like that’s a very, very exciting experience.

Very cool. I definitely agree with that. I’m looking forward to writing that todo.md or spec.md and just going for that walk. Because I haven’t done it yet. I’ve only peeked at some of the videos and some of the demos, but I haven’t tried the Ralph Wiggum loop yet.

I’m gonna post on X a one-liner as well, because I think you can just then copy and paste the thing.

There’s their blog post to read, yeah.

Well, I feel like with everything, I want to make it more ceremonious. Not because it needs to be, but because I want to give myself space to think of something that will be challenging even for me, you know? And then give it to the thing and go away, like you said, and come back happy.

I want to save space to do that when I can give it full mind share, versus the incremental 20 minutes or 10 minutes or whatever it might be that I have available to give it. I kind of want to give it a bit more ceremony — not because it deserves it, but because I want to actually do it for myself.

So I’m just in this constant learning scenario. It’s a pretty wild era to be a developer — an enabled developer. These non-technical folks may get introduced to a terminal-like thing that’s basically just Claude in a directory, really, and ask questions and get a just-in-time interface generated just for them — that’s a really, really, really cool world to be in.

And it doesn’t mean that software goes away. It just means there’s gonna be a heck of a lot more of it out there. And I do concur that maybe code doesn’t matter anymore. Maybe it won’t in a year. Maybe it won’t in six weeks. I don’t know how many weeks it’ll take.

Let’s aim at the horizon with this. What’s over the horizon for you? What’s over the horizon for Layercode? What is coming?

So the show releases next Wednesday — you’ve got a week, given that horizon. And no one’s listening yet; it’s a week from now.

What’s on the horizon for you that you can give us a peek at? Is there anything?

We are working really hard to bring down the cost of voice agents. There is a magic number of one dollar an hour for running a voice agent, where suddenly a huge, huge number of use cases open up — whether it’s consumer applications, gaming. There are so many places where voice AI will be super valuable, super fun, and isn’t implemented yet.

And with the choices we made being on Cloudflare, with the system we’ve built, we’re going to be able to bring out the lowest cost platform. I’m very excited for that.

And most of all, very excited just to see voice AI everywhere. Voice is just such a wonderful interface, right? I find myself dictating to Claude Code all the time, and you can get your thoughts out so much better. And I’m excited to see how many applications we can enable to add voice AI.

And then we get an insight into the future of voice AI as well, with the companies that are building — a lot of them startups — and they’re building some crazy, crazy new things with voice AI on our platform. So there’s going to be some amazing stuff with voice coming out this year.

What’s the sweet spot for Layercode right now that you can invite folks to come and try?

Well, the great thing is we’ve got a CLI — a single command you can run — and you’ll get a Next.js demo app all connected to a live voice agent. You can get a voice agent up and running within a minute. So it’s super fun, worth trying. And from that point, you can use Claude Code or Codex and just start building.

Well friends, right here at the last minute of the very last question, Damien’s internet dropped off or something happened. I’m not sure. But it was a fun conversation with Damien.

Kind of wild to be talking to somebody 17 years later after being one of the first — if not the first, I’m pretty sure the first — sponsor of this podcast. What a wild world it is to be this deep in years and experience and history in software and to just still be enamored by the possibilities.

I hope you enjoyed today’s conversation with Damien, and we’ll see you next time.

Changelog

Our transcripts are open source on GitHub. Improvements are welcome. 💚
