
  • 6 Posts
  • 203 Comments
Joined 3 years ago
Cake day: July 1st, 2023




  • Finland and West Germany were bigger trading partners with the USSR, and more than half of Yugoslavia’s trade was with the OECD.

    Yugoslavia’s economy was destabilized first by the 1970s oil crises and then by IMF loans tied to requirements to privatize industries. Many of those loans were taken on the premise that the USSR might invade and the funds were necessary for defense.








  • It seems they may have agreed, but there’s disagreement over how far the scope extends. It’s not clear what the actual contents of the agreement were, but Pakistan, which mediated the deal, said the two-week pause in fighting did extend to Lebanon. Iran has previously included that as a term in talks, but hasn’t confirmed whether it kept it this time. They likely did, though; I don’t see why Pakistan would lie about that.

    Israel, unsurprisingly, says the ceasefire does not extend to Lebanon, and has not stopped attacking Lebanon. In fact, they just launched the largest attack on Lebanon in decades. It also sounds like an oil refinery in Iran was hit after the US ceasefire announcement, but there are no details yet on who hit it.



  • “…a more enduring threat: asymmetric warfare, in which individuals or small groups of militants can pose threats strategic to the American military.”

    You know what US military targets an Iranian “shoulder-fired missile” can’t hit? The ones in the US. The only place the US military should be. Invading forces aren’t entitled to an easy time stealing another country’s resources.

    NBC can fuck right off with this war-crime apologia masquerading as a news article containing a mild warning. It’s not a Saving Private Ryan reboot, it’s modern exploitation colonialism.





  • I did read it, yes; that is why I still had the link open. The title and byline heavily suggest that SA has taken action or made statements which demonstrate actively moving away from the US. This is contradicted inside the article itself, as quoted earlier.

    Has their position likely changed? Sure. Is analysis of the timeline worthwhile? Also yes. But there’s literally nothing there which supports the position that any form of action has occurred that alters existing or future US-Saudi agreements. A step towards Ukraine is something noteworthy, but that doesn’t require a step away from the US, and the step away is still speculative.

    That’s not to say I’m against speculation. It’s just that it’s not the same content described by: “BREAKING: Trump Just Lost Saudi Arabia. Trump told the man who controls 12% of the world’s oil to kiss his ass. That man just restructured Middle Eastern security with Ukraine, telling Trump, ‘It’s over.’”




  • You’re right, but Johnny rightly also identified the issue where Claude creates complex trash code to work around user-provided constraints while not actually changing approach at all (see the part about tool denial workarounds).

    I think Anthropic optimized for appended system prompt character count, and measured it in isolation - at least in the project’s beginning stages, if it’s not still in the code. I assume the inefficiencies have come from the agent working with and around that requirement, backfiring horribly in the spaghetti you see now. Not only is the resulting trash control flow less likely to be caught as a problem by agents, especially compared to checking a character count occasionally, but it’s more likely the agent will treat the trash code as an accepted pattern it should replicate.

    Claude will also not trace a control flow to any kind of depth unless asked, and if you ask and it encounters more than one or two levels of recursion or abstraction, it will choke, probably because it’s so inefficient. But then they’re getting the inefficient tool to add more to itself, and there’s no way to recover from that loop without human refactoring. I assume that’s a taboo at Anthropic too.

    A type of fix I was imagining would be something like an extra call along the lines of “after editing, evaluate changes against this large collection of terrible choices that should not occur, for example, the agent’s current internal code”. That would obviously increase short-term token consumption and context-window overhead, and make an Anthropic project manager break out in a cold sweat. But it would reduce the gradient of the project death spiral by providing more robust code that future agents can copy-paste and that can be evaluated more cheaply, requiring fewer user prompts overall to rectify obviously bad code.
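    Sketched in code, the kind of post-edit check I mean might look like the following. Every name here is hypothetical; `evaluate` stands in for a second model call that returns True when a diff exhibits a given anti-pattern, which is exactly where the extra token cost would come from.

```python
# Hypothetical sketch of an "evaluate after editing" pass. None of these
# names are real Anthropic APIs; they are stand-ins for whatever the agent
# harness exposes.

ANTI_PATTERNS = [
    "duplicated control flow instead of refactoring",
    "working around a denied tool instead of changing approach",
    "copy-pasting an existing workaround as an accepted pattern",
]

def review_edit(diff: str, evaluate) -> list[str]:
    """Flag a diff against a collection of known-terrible choices.

    `evaluate(diff, pattern)` stands in for an LLM call that returns True
    when the diff exhibits the given anti-pattern.
    """
    return [p for p in ANTI_PATTERNS if evaluate(diff, p)]

# Usage with a fake keyword-based evaluator instead of a real model call:
flags = review_edit(
    "added a retry wrapper around the denied file-write tool",
    lambda diff, pattern: "denied" in diff and "denied" in pattern,
)
# flags == ["working around a denied tool instead of changing approach"]
```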

    They would never go for that type of long game, because they’d have to do some combination of:

    1. listen to all the users complaining that they ran out of tokens too soon while creating the millionth token-dashboard project, or
    2. increase the limits for users at company cost, or
    3. increase prices, or
    4. sacrifice feature-development velocity by getting humans to fix the mess and implement no- or low-agent client-side tooling for common checks.

    They should just set it all on fire, the abomination can’t salvage the abomination.


  • Sorry, this was more of a rant than I thought it would be; I hit one of my own nerves while writing it. This is what happens when you’re not in a good position to escape enforced-AI-usage hell. Tl;dr at the end.


    I can think of several practical measures, because I’ve tried them myself in an effort to make my coerced work with LLMs less painful, and because in the process I’ve previously fallen into the gambling trap Johnny outlined.

    The less novel things I tried are things they’ve already half-assed themselves as “features”. For example, Johnny found one of the things I had spotted in the wild a while back: the “system_reminder” injection. This periodically injects a small line into the conversation in an effort to keep it within the context window. In my case, I tried the same thing with a line that amounted to “reread the original fucking context and assess whether the changes make a shred of sense against the task, because what the fuck”. It didn’t work, because I had no way to realistically enforce it within their system, and they recently included the “team lead” skill which (as I rightly assumed) tries to do exactly the same thing. The implementation suggests they will have been only marginally more successful than my attempt; it didn’t look like they tried very hard. This could be implemented better and extended to even a little more than “reread the original context”.
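    A minimal sketch of what that injection amounts to, assuming a plain list-of-messages transcript. The reminder text and interval here are my stand-ins, not whatever Anthropic actually injects.

```python
# Sketch of periodic "system_reminder" injection: re-insert a reminder
# line every few messages so it stays inside the model's context window.
# Both the reminder wording and the interval are assumptions.

REMINDER = (
    "<system_reminder>Reread the original task context and assess whether "
    "the changes still make sense against it.</system_reminder>"
)

def inject_reminders(messages: list[str], every: int = 5) -> list[str]:
    """Return the transcript with REMINDER appended after every `every` messages."""
    out: list[str] = []
    for i, msg in enumerate(messages, start=1):
        out.append(msg)
        if i % every == 0:
            out.append(REMINDER)
    return out

transcript = [f"turn {i}" for i in range(1, 12)]
augmented = inject_reminders(transcript)
# reminders land after turns 5 and 10
```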

    For this leak, some of the very easy things they could have done were to verify their own code against best practices, implement the most basic of tests, or attempt to measure the consistency of their implementation. Shipping source maps in production is a ridiculously easy-to-prevent rookie error. Such checks should already run automatically in multiple stages of their coding, merging, and deployment pipelines, with varying degrees of redundancy and thoroughness, the same way they do at any tech company with more than maybe ten developers. There is just no reason they shouldn’t have prevented huge chunks of the now-visible code issues if they were pointing their own trash bots at their codebase with even the simplest prompt of “evaluate against good system design and architecture principles”. This implies that they either weren’t doing it at all, or, maybe worse, ignored all the red flags it is capable of identifying after ingesting every system architecture guide and textbook ever published online.

    Anthropic is constrained in that some of the fixes which should be pushed to users are things which would have significant trade-off in the form of cost or context window, neither of which are palatable to them for reasons this community has discussed at length. But that constraint doesn’t prevent them from running checks or applying fixes to their own code, which reveals the root cause: The problems Anthropic are facing are clearly cultural. They’re pushing as much new shit as they can as quickly as possible and almost never going back to fix any of it. That’s a choice.

    I saw a couple of signs that there are at least a few people there who are capable, and who are trying to steer an out-of-control Titanic away from the iceberg, but the codebase stinks of missing architectural plans being retrofitted piecemeal long after they were needed. That aligns with Anthropic’s origin story, where OpenAI researchers accurately gauged how gullible venture capitalists are, but overestimated how much smarter they are than the rest of the world, and underestimated the value of practical experience building and running complex systems.

    With the resources they have, even for a codebase of this unreasonable size, they could and should vibe code a much better version within a couple of months. That is not resounding praise for Claude, only a commentary on the quality of the existing code. Perhaps as a first step they could use their own “plan mode” which just appends a string that says not to make any edits, only to investigate and assess requirements…

    Were I happy to watch the world burn, I’d start my own damn AI company that would do a much better job at this, because holy shit, people actually financed this trash.

    Tl;dr, you’re right that it doesn’t bode well for their prospects of improvement, but it’s not because there aren’t many things they could be doing practically. It’s because they refuse to point the gun somewhere other than their own feet.