
Episode 6

May 29, 2025

In this episode, Quinn and Thorsten discuss Claude 4, sub-agents, background agents, and they share "hot tips" for agentic coding.


Transcript

Thorsten: To me, it just seems like there are different philosophies in these houses of, you know, what an agent should be. And I think Anthropic so far seems to be the most on the path of this practical coding agent that can go in and figure stuff out and not just build apps, you know?

Quinn: Yeah. I don't understand why, because that seems like it is strictly better to be able to get that kind of feedback from the environment to iterate. It feels like that's better from a business model point of view, because I mean, think of all the inference you're doing if you have a long tool calling thread going on.

Thorsten: Welcome to another episode of Raising an Agent. This time, episode six, last time recorded in San Francisco. Now we're back. One of us is in Germany and the other one is in San Francisco. Here's Quinn, CEO of Sourcegraph. Hi.

Quinn: Hello. How's the coding been?

Thorsten: It's been really good. It's been so good yesterday, I'm not making this up, so good that in the evening I thought, "how can I feel this good? There has to be something wrong. Did I miss anything today? Did I forget to catch up on anything?" But no, it's been nice. Yesterday I wrote a lot of code by hand again, after letting Claude 4, or Sonnet 4, rip for a few days. And it was nice.

Quinn: So why did you need to write it by hand?

Thorsten: Good question. So the previous two days, I had it, you know, zero-shot features that go all through the stack and build a new thing. And that kind of gave me an overview of, okay, this is not what I want, or these are the files involved. And then the day before yesterday, I had this thought of, I could build it this way. And then yesterday morning, I had Sonnet 4 take my idea and implement a rough, easy version. And then I just spent the rest of the day exploring the code and figuring out little bits and pieces and moving them around. It's like moving the guardrails around. I don't know how to even express this in a prompt. It's more like, "let me see how this works and let me understand the invariants involved here. And then let's put them in code." And then I had Sonnet 4 do what we always do: write a new storybook thing, or, okay, here's how I want it to work, Sonnet 4, you now go and implement the front-end component for this, or just take care of the typing for me. And that felt really good yesterday. But it's been a while since I wrote this much code by hand. Nice. How has coding been for you?

Quinn: Well, ever since we opened up the wait list with Amp, it's been easy to get distracted by all these people sending in their feedback on the Discord and X and email and all that, which I love. It's so motivating. And we never want to forget the basics, like fixing bugs, making things fast, all that. And I've had my time consumed by a lot of that. And Amp has been really good at those kinds of things. But there have been some big, bigger things I've wanted to get to that I haven't. Like improving how we have the internal API communication from the client to the server. Right now, it's sort of like an ad hoc REST API. And then background agents. And I've not gotten a big chunk of free time to work on either of those.

Thorsten: Yeah.

Quinn: So I want to get to it. But Claude 4 has also just been mind-blowing. And it was five days ago that it came out. It made us crazy.

Thorsten: Yeah. So, okay, three things we got to talk about. Claude 4, background agents, which you just mentioned. And then maybe let's start again by opening it up for everybody, like removing the wait list. We did this three weeks ago, two, three weeks ago. What's your main...

Quinn: Well, it was a result of meticulous planning, where I believe you and I were talking and we're like, "should we do it in five days? No, what about now? Right?"

Thorsten: What about now? It was, okay, let's do it in a few days. Let's add more people from the wait list. And we both started clicking to let people in from the wait list. And then we said, "how about we do it today?" And then, "oh yeah, tomorrow. I think tomorrow." And then I wrote this post and then we just did it the next day. But yeah, my impression ever since opening it up is: of course there are bugs and we had to fix bugs, but no big surprises. No nasty bugs that made me sweat, like, oh my god, I really have to get this fixed today. Which, you know, speaks to having the wait list concept and fixing bugs as we went. It speaks to Geoff, right, testing this stuff out. But overall I'm quite happy with how much what we want to put out in the world resonates with others, when they say it's super simple, some say it's the best agent, that it's focused on quality and on getting out of the way. That was nice. You never know how it's going to resonate, really, but it's nice now seeing this feedback.

Quinn: Yeah, it was so fun to see all the feedback, and it's still coming in. I think it's really nice and validating to see that in this crazy busy space, where it feels like everyone is building some code AI tool, there is space for one that tried to be radical and just focus on being the best, with unconstrained tokens and no model selector and these radical things. I mean, there's always space for that.

Thorsten: Yeah, yeah. I don't want to say we're thought leaders, because if I do, then there's like a bucket of slime above us, you know, that's gonna crash down. But how outdated does it look now, this model selector and different modes and just one more button and you can configure everything? I don't know if this is just me being used to Amp now, but it just seems like that was from before. Why would you let the user switch to an obviously worse model? Why put mini or, you know, Haiku in there? That doesn't make sense. Sure, you can build it. But it seems like, since we started recording the first episode even, the world has changed even still.

Quinn: Yeah, well, we know how much work it is to make a really great agentic coding tool. And the other teams that are building these things, it's not like they have significantly bigger teams. It's hard to even do it for one model. And we had the benefit of being able to work with some of the new models well before they're out. Making a new model work well takes a ton of work, and, you know, we should also talk about our investigation of Gemini 2.5 Pro and why we ultimately decided, at least for now, not to move forward with it. But I can understand why a new model comes out and it doesn't work well in some of these tools.

Thorsten: Yeah

Quinn: It is really hard to make it work. It takes a lot of thought. But it's so easy to get the vibes of, hey, we got a new model, you know, it's out there.

Thorsten: "Internal benchmarks show..."

Quinn: Yeah exactly yeah

Thorsten: So, okay, let's talk about Sonnet 4 then. Which still trips me up, by the way. Let's start with this: that they called it Claude Sonnet 4 and not, you know, following on from Claude 3.5 Sonnet. It's a weird thing. But yeah, we switched to it last... I think it came out last Thursday, and on Friday we switched to it. We didn't immediately switch the moment all the other 18 AI agent builders published their "it's now available" posts. We also tried it out for a bunch of weeks before. But yeah, what do you think? What are your impressions of Sonnet 4?

Quinn: I think it's fantastic so I do not have evals for my personal use just to be clear but it does a much better job of doing a full end-to-end feature in the Amp repo, where we've got a server, we've got a client, we've got tests, we've got like a core kind of shared library package. It does a great job of that. And I really don't have complaints. And I know that I'll find complaints as I start to get used to it, but it's doing so much better than previous models that I need to change how I work with it to find the limits.

Thorsten: Yeah, that's, I think, the end-to-end feature thing. It can take on tasks of bigger complexity. That's the most striking thing for me. The first time you use it, you start to be more aggressive and throw bigger things at it. And I realized, after a week of heavy daily use, that I've adjusted my internal instinct of "this is probably too big for the model, let's cut the scope of this," which we've developed over the last few months, right? Where we would tell users: this is too big for Sonnet 3.7, just use smaller threads and whatnot. And I still think that's valid advice, you should use small threads, but still: the average task can now be, it's hard to put into words, but it can be more complex and it still gets it. And I'm surprised when I hear the ding sound and it is done. And then I switch tabs and look at what it did, and I'm like, "you actually did it. You actually went all the way through the stack." Good example. I asked it something really complex, which we can get into in a second: we have sub-agents, which are agents running inside the agent. The agent can call a tool, and that tool is, surprise, surprise, another agent. And what I wanted is a UI element to show the progress of this sub-agent. I basically gave it a prompt and said, "here's the component. Right now, it only shows 'in progress' or 'agent is working' or something. But here on the other end, when the agent is working, we can collect all the progress it's making." And just to paint a picture here: it's not like an enum that says queued, in progress, done. The agent is working, and every time it goes into a loop, it has a message, and it has potentially tool uses, and then those tool uses update. So it's basically an array of a message plus possible tool uses. And the tool uses update because they go from queued to in progress to done. And it's a fairly compact thing.
So I did a bad job of just explaining it. And I kind of sketched this out for Sonnet 4. And then it went and did it. And it worked. It worked, I think, not on the first try, but it got one little thing wrong, which is basically it forgot to hook it up in the front end component, which is also on me. How would it know? But I was surprised. That's complex stuff. And I could see the old Claude messing this up. It was nice. That's cool.
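
To make the shape of that progress feed concrete, here is a rough TypeScript sketch of the structure Thorsten describes. The names are illustrative, not Amp's actual types: an array of chunks, each with a message and tool uses that are updated in place as they move through their states.

```typescript
// Illustrative sketch of the sub-agent progress model described above.
// Names are hypothetical, not Amp's actual types.

type ToolUseStatus = "queued" | "in-progress" | "done";

interface ToolUse {
  name: string;
  status: ToolUseStatus;
}

// Each loop iteration produces a message plus any tool uses it started.
interface ProgressChunk {
  message: string;
  toolUses: ToolUse[];
}

// The whole progress feed is an array of chunks; tool uses are mutated
// in place as they go from "queued" to "in-progress" to "done".
type AgentProgress = ProgressChunk[];

function startToolUse(progress: AgentProgress, message: string, tool: string): ToolUse {
  const use: ToolUse = { name: tool, status: "queued" };
  progress.push({ message, toolUses: [use] });
  return use;
}
```

A UI component subscribing to this array only needs to re-render chunks whose tool-use statuses changed, which is what makes the "not just an enum" part matter.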

Quinn: And that's for the task tool that you introduced?

Thorsten: Exactly, yeah.

Quinn: Yeah, well, that's another new thing that Claude 4 seems a lot more eager to use. You want to talk about that?

Thorsten: Yeah. Yeah, so again, we have a task tool, or sub-agent tool. And the idea is, just like edit file or list files, you give the agent a tool called the task tool, and you say: this is an agent you can use to finish small tasks, and it will return a result to you, or it will do stuff for you. And we've had one agent for a long time in Amp, which is our codebase search, which is just another LLM, or agent, going through the codebase and finding results. But that's really specific. It's not generic for all tasks. So I added another sub-agent a while ago, but Claude 3.7 was not that eager to use it. But with Sonnet 4, once you add the task thing, and users reported this on Discord too, Sonnet 4 is really happy to use it. The first example that blew my mind was when I tested it out and said, "take all of the blog posts in here and remove the 'comments: true' thing from the YAML front matter, because I don't have comments anymore." What it did was, I think, a ripgrep or glob. So it knew that there are like 36 blog posts, and then it divided them up and spawned four different agents and told each agent, "you go and remove the comments thing from these blog posts," and off they went.
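
A minimal sketch of that fan-out, under the assumption that something like a `runSubAgent` function spawns a task sub-agent with its own context window. The function names here are made up for illustration; only the pattern (glob, partition into batches, one sub-agent per batch) comes from the episode.

```typescript
// Hypothetical sketch of the fan-out behavior described above: list the
// files, split them into batches, and hand each batch to its own sub-agent.
// `runSubAgent` stands in for whatever actually spawns a task sub-agent.

async function runSubAgent(prompt: string): Promise<string> {
  // Placeholder: a real implementation would start a fresh agent loop
  // with its own context window and return its final summary.
  return `done: ${prompt.slice(0, 40)}`;
}

function partition<T>(items: T[], batches: number): T[][] {
  const out: T[][] = Array.from({ length: batches }, () => []);
  items.forEach((item, i) => out[i % batches].push(item));
  return out;
}

async function fanOut(posts: string[], batches = 4): Promise<string[]> {
  const groups = partition(posts, batches).filter((g) => g.length > 0);
  return Promise.all(
    groups.map((group) =>
      runSubAgent(`Remove 'comments: true' from the front matter of: ${group.join(", ")}`)
    )
  );
}
```

The interesting part is that the model decided to do this partitioning itself; the harness only has to expose the task tool.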

Quinn: That's cool. And was that using the batch one to do it in parallel too?

Thorsten: No, no, no, it was just spawning four different ones, because sub-agents can't use parallel tool calls, right? Which is another thing: Claude 4 seems more eager to use parallel tool calls. It seems faster than 3.7. So yeah, and now I'm working on the task, or sub-agent, again, to make it look better in the UI, to give it more tools, to streamline the whole thing. And the other thing, because we get this question a lot: why use a sub-agent? What's the advantage? With the codebase search agent, it's not that apparent, I think. But what everybody has been bumping into is that you only have a limited context window, right? Right now we have 168k input tokens, because we reserve 32k output tokens for Claude, and at some point you run into this limit. And it's especially not nice when you run into this limit after the agent went off the rails and did something bad, you know? Like when it fails to edit a file 18 times, or when it creates a new test file and spends a bunch of tokens and whatnot. The funny thing is, with these sub-agents and Claude 4 being much more eager to call them: each sub-agent has its own context window, right? So if you say, "hey, sub-agent, take these eight blog posts and remove the YAML thing," that agent gets its own 168k tokens, right? And if it fails to edit that file, sure, it might fill up that context window. But your main agent, after that task is done, has still only used, what, like 20% of its tokens or something. It's not a lot.
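
The context-window argument can be made concrete with a toy accounting sketch. All numbers here are made up for illustration; the point is only that the parent thread pays for the task prompt and the sub-agent's summary, while the sub-agent's noisy transcript is billed to its own separate window.

```typescript
// Toy token accounting to illustrate the point above: the parent thread
// pays only for the task prompt and the sub-agent's final result, while
// the sub-agent's own transcript (e.g. 18 failed edits) is billed to its
// separate context window. All numbers are made up.

const tokens = (text: string) => Math.ceil(text.length / 4); // rough heuristic

interface Budget {
  used: number;
  limit: number;
}

function charge(budget: Budget, text: string): void {
  budget.used += tokens(text);
}

const parent: Budget = { used: 0, limit: 168_000 };
const subAgent: Budget = { used: 0, limit: 168_000 };

// The sub-agent burns tokens on a noisy transcript of its own...
charge(subAgent, "x".repeat(120_000));
// ...but the parent only ever sees the prompt and a short summary.
charge(parent, "Sub-agent: remove comments from 8 blog posts");
charge(parent, "Result: removed 'comments: true' from 8 files");
```

This is also why a derailed sub-agent is cheap to throw away: its wasted tokens never pollute the parent thread.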

Quinn: Yeah. And so the tasks complete faster because they're not saddled with all the context of your parent thread. And if it goes off and does the wrong thing, the parent thread won't get confused.

Thorsten: Yes.

Quinn: And also, yeah, it can just go. So there are a lot of benefits to this.

Thorsten: Okay, so the mental model is right. Like if you have to do something of complexity, if you delegate to five other people, you don't have to keep a lot of stuff in your head. You just have to know they will come back to me.

Quinn: Yeah.

Thorsten: Yeah.

Quinn: Yeah. And you don't want to know everything they did necessarily.

Thorsten: Yeah.

Quinn: Yeah. All right. I think there are a lot of people wondering: why does Claude 4 support this? Is it because it's a smarter model and therefore it can do this? And I think this is a really important point: these models are trained well in advance. I don't know the exact timeframe, the model houses don't release it, but they were trained, let's say, six or nine months ago or something like that. And at that time, they surveyed the landscape. They looked at what people were doing internally and all across the world, and they said: how are people doing tool calling, and what can we do to anticipate where they're going to be six to nine months from now? And they trained the models to do well with that. And there's post-training too, I'm sure, which can be closer in time, but it's an intentional choice by the model creators to train these kinds of behaviors into the model. Absolutely there's emergent behavior, but no, this kind of stuff is intentional. And this is what we mean when we talk about going with the grain of the model: trying to understand the model's capabilities, building deeply into a single model's capabilities, instead of trying to support 17 models in a dropdown. It's really important to get a feel for how the model was trained. What's it good at? What does it want to do?

Thorsten: Yes.

Quinn: So this is a perfect example of that.

Thorsten: Yeah. What I'm starting to realize over the last... you know, we've had a bunch of releases since we recorded the last episode. OpenAI came out with Codex: before that they had Codex the CLI, now they have Codex the model and Codex the background agent. And Google came out with Jules, which is their background agent. Obviously, Claude Code had new releases, and Claude 4, Sonnet 4 and Opus 4. And there's a new version of Gemini, I'm not sure. But I think what's emerging now, compared to say a year ago... a year ago, it seemed like for all of them, maybe not Anthropic, the big goal was to build a consumer LLM, like a chatbot with which you can talk and ask about stuff and whatnot. But now, when it comes to coding, I think there are different philosophies emerging in the models. For example, Gemini and even OpenAI, it seems like they're really going after this: you tell the model, "I want to build a to-do application in Swift," and then it goes and one-shots it, or zero-shots it. I think in their mind the vision is that it takes a few steps to get there, and it can go for a long, long time and build something. And I think Anthropic's vision is more that the model reacts to the environment as necessary. The example I always use is: I can now build you an agent that I can ask, "restart this Nginx server." And Claude will go and see, oh, is Nginx running? It will run ps. It will try every binary location it knows of from its training data. It might even do ps, find the running process, and from that find the binary location and whatnot. It can wiggle itself out of these problems that it runs into. And it feels like Gemini 2.5, for example, doesn't have that as much. When Gemini 2.5 runs into an error, it's like, "here we are. How about you do something, and then I can go and do my job again?"
And to me, it just seems like there are different philosophies in these houses of what an agent should be. And I think Anthropic so far seems to be the most on the path of this practical coding agent that can go in and figure stuff out, and not just build apps, you know.

Quinn: Yeah, I don't understand why, because it seems like it is strictly better to be able to get that kind of feedback from the environment to iterate. It feels like that's better from a business model point of view, because, I mean, think of all the inference you're doing if you have a long tool-calling thread going on. And it also feels like a simpler model than some of the other kinds of setups we've seen, with code execution on the inference-server side. So why do you think Anthropic is pursuing a different path? Why do you think the others are not pursuing that same path?

Thorsten: I don't know. I can only guess, but I would say it's really hard to build stuff in a big org with a coherent vision, and to take an idea and turn it into reality. An idea that's, in some sense, really subtle. Like, what's the difference between the agent philosophy of Gemini 2.5 Pro and Claude, right? And taking that and making the whole org turn that idea into reality, through data collection, through reinforcement learning, through evaluation of reinforcement learning, through testing this in production, I think that's really hard. And it seems like Anthropic just might approach reinforcement learning in a different way than maybe Gemini. I haven't done reinforcement learning, but the way I understand it works is that you basically let a model do something in different ways. And then, depending on what it did, you give it a reward and have it try again, right? So if you say, "create me a to-do list," you have it go and do this five times. And then, whatever its first step is, if it's a good step, you give it a little bit of a reward. You reinforce that, and then it goes and does a different thing. And so you guide it through this. And if you want to zero-shot an app, it's relatively easy to go from the end state, like a fully formed to-do list app in Swift, back to the start state and say: I want you first to create this file, then this file, then do this. You take the end result, split it up, and then you give rewards for getting back to it. But if we're talking about: run this terminal command, look at this output, it failed, now, based on this, try this other thing, that's really hard to train for, I think, or at least in my mind.

Quinn: Yeah.

Thorsten: Maybe I'm completely off here, but it just seems like they have a different vision of what an agent should do. And they managed to translate this vision into reality across multiple different things in their org.

Quinn: Yeah. And the decision to go in this direction rather than others had to be made a while ago.

Thorsten: Yes.

Quinn: So, you know, it's kind of like with the flu vaccine: they look at the strains and they have to guess, and maybe someone guesses right. There's a conundrum, though. When you look at some of these background agents that were released over the last week and a half, you have Jules and Codex, from Google and OpenAI respectively, both very much taking the "let's run a full build environment" approach. In OpenAI's case it doesn't have network access, but it's a full build environment. It can run shell commands arbitrarily. It has the full power. And then you have Anthropic's approach, which is much lighter weight, and primarily uses something like GitHub Actions, or CI, for feedback. So if anything, you see the two models that are not as good at arbitrary tool calling using a full background-agent build environment, and Anthropic actually going for the more one-shot approach, with longer iteration cycles before it gets feedback. So I think it's interesting that you see that inverted there.

Thorsten: Yeah, let's talk about our ideas for background agents. I mean, that's a good topic.

Quinn: Yeah, I love a topic where it feels wrong, where you can tell people about it and they argue, and you can't really convince them. I think that's when it's something you really just want to build and try. And if we can do it, then it means that maybe we have some time advantage.

Thorsten: Yeah.

Quinn: So the idea with background agents is: okay, I love using the agentic coding tool in my editor or on the CLI, but what if I'm at my kid's soccer game and I want to do something from my phone? That's the general case: it runs in the background.

Thorsten: Yeah, for everybody listening, I'm at my kid's soccer game. This has been said 50 times over the last five weeks, you know?

Quinn: Yeah.

Thorsten: Why do you need a background agent? When am I at my kid's soccer game and want to keep coding? But it's a valid thing with agents. You started some work, or you're outside, and you know you just want to try out an idea or something. And I can do this from my phone. Now that you have these things that can go and run for 10, 15 minutes, it's nice to be able to start them from anywhere, right? So that's the whole idea.

Quinn: Yeah, that's right. And there are a lot of different approaches. I think Devin was one of the first to basically spin up a whole VM or container. And that's similar to what Jules is doing, what OpenAI is doing. And this, I think, is very similar to the whole idea of cloud IDEs. Cloud IDEs are such a good idea. Who wouldn't want to be able to just spin up something in their web browser, have the same build environment that they have in their local desktop editor, make a change, not even touch their local state? It's so beautiful and clean. And yet there are basically maybe two companies in the world that have really gotten that right.

Thorsten: And that's Meta and Google.

Quinn: And they put incredible investment into making a cloud IDE that works perfectly. And for everyone else, cloud IDE adoption is so nascent. I mean, even with VS Code, where it is literally using web technologies that can run in the browser, it's the same experience: it's the long tail of little things that just doesn't quite work as well. None of the extensions really work in that scenario. None of the language servers work. All your little tools, they don't work. Things like GitHub Codespaces, and other cloud IDEs that really smart people I know have worked on, have just not gotten that much adoption, because it just never is good enough. It's really hard to maintain the CI environment, which you need to have, and the local dev environment, and a cloud IDE environment. The third one, you just neglect. And you could disagree with me. You could say it shouldn't be that way, but it is that way. I'm stating the facts. All right. So the hypothesis is that if you want background agents, then they should use CI for feedback, for seeing if the tests passed, if the linter worked, if all these things are correct, instead of having a build environment. And it's not that a build environment is bad, but that is an optimization. First, build it to use CI. Build it for the whole class of problems for which CI is good enough. Another way to define that is: build it for the whole class of problems where the model can actually get pretty damn far one-shotting it, where it doesn't need a lot of incremental make-a-little-edit, get-some-feedback cycles, but can go for a longer iteration cycle. There are a lot of problems that can be solved with that kind of agent.
And then only when CI is too slow or not granular enough, then you look to optimize and then you might want to bring in a build environment or you might want to just make your CI better at only running the subset of tests that need to be run given a change. And hey, by the way, if you do that, that's going to benefit human devs as well.

Thorsten: Yeah. Yeah, so for everybody listening, the distinction is this: if you've used OpenAI Codex, for example, when you set it up, you have to say, "I want to run Codex in this repository," and then you get a page with forms where you have to put in the environment variables. You have to specify, I don't know, npm install, bundle install, go build, go mod tidy, whatever it is. You have to set up the dependencies, because the goal is that it boots up a VM, and inside of that VM it has access to your build tools. And if you've been listening to the other episodes, we're big fans of giving these agents feedback. We really like giving the agent diagnostics or build feedback, or having it be able to run build commands. But we think, and this is what you just said, that setting this up in the cloud in a VM is really hard, and it might not be worth it. We can just run this without access to the build tools, and it can use CI as feedback, right? It pushes a commit, gets CI feedback, and then continues to iterate. So you only have to set it up once, in CI, which you most likely already have. And the other thing to add here is that latency is not that big of an issue. The latency of pushing a commit and getting feedback from CI might not matter much, because you're doing other stuff anyway, you know. And I would add to what you just said that for most of the use cases I have in mind, where I want to kick off an agent and have it do stuff, I don't want to give it tasks where I know it will need a lot of compiler feedback, because then you're working one layer removed and you cannot really touch it, which is the same problem as with cloud IDEs. I would rather have it do stuff where I know it can just one-shot it, maybe some crazy stuff where it doesn't need the build environment.
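
The push-commit, wait-for-CI, iterate loop they describe can be sketched as a simple control flow. Everything here is hypothetical plumbing, `pushCommit`, `waitForCi`, and the agent turn are stand-ins, but it shows why latency is tolerable: the loop has no human waiting inside it.

```typescript
// Hypothetical sketch of the CI-feedback loop described above. The push,
// CI, and agent functions stand in for real git/CI/model plumbing.

interface CiResult {
  passed: boolean;
  log: string;
}

type PushCommit = (branch: string, patch: string) => Promise<void>;
type WaitForCi = (branch: string) => Promise<CiResult>;
type RunAgentTurn = (feedback: string) => Promise<string>; // returns a patch

async function backgroundAgent(
  task: string,
  push: PushCommit,
  waitForCi: WaitForCi,
  agentTurn: RunAgentTurn,
  maxRounds = 3
): Promise<boolean> {
  let feedback = task;
  for (let round = 0; round < maxRounds; round++) {
    const patch = await agentTurn(feedback);
    await push("agent/task", patch);
    const ci = await waitForCi("agent/task");
    if (ci.passed) return true;
    // Latency is fine here: nobody is sitting at a keyboard waiting,
    // and CI is the one environment the repo already maintains.
    feedback = `CI failed. Fix the following and try again:\n${ci.log}`;
  }
  return false;
}
```

The only repo-specific setup this needs is the CI pipeline itself, which, as Thorsten notes, most projects already have.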

Quinn: Yeah, that's right. It's a false promise for a background agent to say, oh, you can do all the same things that you can do locally.

Thorsten: Yeah.

Quinn: And it's a false promise because you don't have a background-agent build environment that accurately replicates your local dev environment. And if you're in the tiny subset of companies that do, congratulations. If you just yesterday set up Codex, or the Cursor background agent, which works the same way, and it matches your local dev environment today, well, just wait until someone changes something tomorrow. It's going to drift, and then, you know, you're in a lot of pain. But there are a lot of optimizations you can make. And this whole model of tool calling is really great, because I think one of the most common ways this is going to fail is linting and formatting and whitespace, these dumb fixes. And we've observed that a lot of people are adapting how their code is structured to work better with LLMs. So here's one optimization, for example. You find that the whitespace and formatting is getting messed up. Well, just have the agent run a tool that fixes formatting and whitespace in a way that doesn't require arbitrary execution of, like, Prettier plugins. Or gofmt, which is a great example because it's very standardized in the Go community. That's the kind of thing that can be run safely, and you don't need a build environment. So if you compromise on the configurability of some of these linting and formatting tools, you can actually get a really long way. And by the way, don't bikeshed that. Those tools should be in service of moving fast, not of some idiosyncratic preference.
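
One way to read the "compromise on configurability" point is as an allowlist: the agent may only invoke a fixed set of standardized formatters, never arbitrary commands. A minimal sketch, with made-up command names standing in for whatever a real tool would allow:

```typescript
// Illustrative sketch of a "safe formatter" tool: instead of arbitrary
// command execution, the agent may only pick from a fixed allowlist of
// standardized formatters. The command table is hypothetical.

const FORMATTERS: Record<string, string[]> = {
  go: ["gofmt", "-w"],
  typescript: ["prettier", "--write"],
};

function formatterCommand(language: string, file: string): string[] {
  const base = FORMATTERS[language];
  if (!base) {
    throw new Error(`no allowlisted formatter for ${language}`);
  }
  // The agent supplies only the language and the file; it cannot inject
  // arbitrary flags or plugins, which keeps this safe without a build env.
  return [...base, file];
}
```

Giving up per-repo formatter plugins is the cost; running safely without a full build environment is the payoff.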

Thorsten: Yeah. So that means you're working on background agents. We're working on background agents. We haven't made as much progress because we've been busy with everything else. There's so much stuff to build. I'm looking at my to-do list here, which is growing by the hour.

Quinn: And we love getting feedback. So if you tell us about bugs or anything else, we will drop everything else to work on those bugs. So keep the feedback coming.

Thorsten: Yes, keep the feedback coming. Keep sending in bugs. If you run into some, we love to hear about them and fix them. So yeah, let's wrap it up. One last thing I've got to ask. Claude Sonnet 4, amazing model. It's awesome. But it uses a lot of emojis, like in the summaries. What do you think about that, as a last question? Does it not bother you when it comes back?

Quinn: Over on my screen here, I was doing something and, yeah, I was just looking at the summary, and it's got some of those green check mark emojis, which is a little bit more professional. And the summaries, you know, okay, I'm still getting used to them. It gives long summaries about what it did. And we've tried prompting that out in the system prompt, saying, "don't give summaries," but you can't prompt that out that well. This is the idea of going with the grain of the model: if the model has been trained to do that, you're going to, I don't know, waste a lot of its IQ trying to get it to do something different.

Thorsten: Yeah.

Quinn: But one thing that's really cool, and I believe it was Camden on our team who realized this, is that the summaries are actually a great place for citations. So now, in the summaries, all the files and symbols link: they have little underlined Amp links and you can click on them. And I've actually found that to be useful.

Thorsten: Yeah.

Quinn: So maybe lemonade out of lemons.

Thorsten: Yeah. So yesterday, what was it? The last thing I did yesterday evening was a storybook thing. And then it comes back with the summary, but now with the clickable links. So now I use this to check on what it's doing and just click on that stuff. And I'll take some emojis for that. But yeah, we try to get the emojis out. There's stuff in the system prompt to get the emojis out. But if you run into emojis: we tried our best.

Quinn: Yeah. Maybe the summaries will help address some user feedback, which is: I want a better kind of diff view, a summary of what it did. That is a common piece of feedback. So maybe that's it, yeah. Actually, the thing you mentioned about storybooks: there's a little tip I want to share that we found really useful for having the agent iterate, which is using the Playwright MCP server. It lets the agent take a screenshot of the browser and then iterate on that. If you go into your settings tab in Amp, that's one of the recommended ones. Works really well. For storybooks, you can use the official Storybook library, but we just have a page at /storybook. I think you can go to ampcode.com/storybook and you'll see it. It's just pages, not any kind of formal framework. And one other tip that people find really helpful: if your app has authentication and you want the agent to be able to browse it, you don't want to have to put in a password and go through OAuth or whatever. So what we do is set an environment variable, only activated in local dev, that essentially lets the agent bypass auth by logging in as a special auth-bypass user. This means it can just open the app in a browser, no auth needed, and navigate the whole app and see it. That's a really useful pattern that we'll document, but you got it here first.
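
One way that dev-only auth bypass could look, sketched as a framework-agnostic check. The environment variable name, the bypass username, and the function are all hypothetical; the important properties are that the bypass requires an explicit opt-in and can never fire in production.

```typescript
// Hypothetical sketch of the local-dev auth bypass described above: if a
// dev-only environment variable is set, requests resolve to a special
// bypass user; otherwise the normal login flow applies. Names are made up.

interface User {
  username: string;
}

function resolveUser(
  env: Record<string, string | undefined>,
  sessionUser: User | null
): User | null {
  // Only honor the bypass outside production, and only when explicitly set.
  if (env.NODE_ENV !== "production" && env.DEV_AUTH_BYPASS === "1") {
    return { username: "auth-bypass" };
  }
  return sessionUser; // fall back to the real login flow
}
```

Wired into a middleware, this lets a browsing agent (for example, via the Playwright MCP server) walk every authenticated page without ever touching real credentials.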

Thorsten: Yeah. And the other tip I would add is that you can just ask the agent to create seed data in your dev environment. When I was testing out some things, I didn't have a lot of data in my database, or didn't have data for other users in my local database. So I just told the agent, "hey, here's psql, look at this database and give me some more data, distributed over other users." And it just goes and does it. It figures out the schema. Even if you're good with SQL, if you haven't tried giving the agent access to your database, you should try it. It's really nice to see how it figures out the schema, writes the SQL queries, and returns results. It's really cool. All right. So we're ending with hot tips. I think that's a good wrap-up, right?
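
For a sense of what the agent ends up producing in that seed-data workflow, here is a tiny sketch that generates INSERT statements spread across users, the kind of output you could pipe to psql. The table and column names are invented for illustration.

```typescript
// Hypothetical sketch of the seed-data tip above: generate INSERT
// statements distributed over a set of users, suitable for piping to psql.
// Table and column names are made up for illustration.

function seedThreads(users: string[], perUser: number): string[] {
  const statements: string[] = [];
  for (const user of users) {
    for (let i = 0; i < perUser; i++) {
      statements.push(
        `INSERT INTO threads (owner, title) VALUES ('${user}', 'Seed thread ${i} for ${user}');`
      );
    }
  }
  return statements;
}
```

In practice the agent inspects the real schema first and writes SQL matching it, which is exactly the part that's fun to watch.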

Quinn: Yeah.

Thorsten: Yeah.

Quinn: Happy hacking, everyone.

Thorsten: Happy hacking. Bye-bye.