  • 1 Post
  • 241 Comments
Joined 3 years ago
Cake day: July 3rd, 2023


  • Hm, this tracks for me. I’ve wondered for a while how they deal with caching, since yes, there is huge potential for wasted compute here, but I haven’t had the time to look into it yet. Do you have a good source where I can read more about the design decisions, or is this a hypothetical design you came up with while all of the actual architecture detail is “proprietary”?

    If the first user to use the cluster after boot asks “Am I pretty?”, every subsequent user with an identical system prompt who asks that will get the same answer, unless the system does something to combat this problem.

    This is very interesting to me, because I’d expect them to be doing something to combat that problem if they’re actually running multi-tenant here.

    Wouldn’t the different sessions quickly diverge, so that the keys would essentially become tied to a session in practice even if they weren’t directly?

    Thanks for the response; it’s definitely something I’ve been trying to understand.

    Edit: thinking about it a bit more,

    So the solution is the KV cache: a key-value store kept by the LLM serving architecture. Each time the system comes across a token it has encountered before, it outputs the cached value; if not, the token is sent through the LLM and the output gets stored in the cache, associated with the input that produced it.

    This seems like an issue, no? Tokens are influenced by the tokens around them in the attention blocks, so a value cached for a token without its surrounding context wouldn’t be valid. What exactly would be cacheable here?
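    My best guess at the answer: the cache key is an entire token prefix, not a single token, so the attention context is preserved, and two requests only share cached state up to where their token streams diverge. A minimal sketch of that idea (all names illustrative, nothing here is a real serving stack’s API):

    ```python
    # Hypothetical sketch of prefix-based KV caching. The cache is keyed by the
    # whole token prefix, not by individual tokens, so the attention context is
    # preserved: requests share cached state only while their prefixes match.

    from typing import Dict, List, Tuple

    KVState = Tuple[float, ...]  # stand-in for the real per-layer K/V tensors

    class PrefixKVCache:
        def __init__(self) -> None:
            # Maps an entire token prefix -> the attention K/V computed for it.
            self._store: Dict[Tuple[int, ...], KVState] = {}

        def longest_cached_prefix(self, tokens: List[int]) -> int:
            """How many leading tokens already have cached K/V."""
            for n in range(len(tokens), 0, -1):
                if tuple(tokens[:n]) in self._store:
                    return n
            return 0

        def put(self, tokens: List[int], kv: KVState) -> None:
            self._store[tuple(tokens)] = kv

    cache = PrefixKVCache()
    system_prompt = [101, 7, 7, 9]        # shared system prompt tokens
    cache.put(system_prompt, (0.1, 0.2))  # computed once, reused by everyone

    user_a = system_prompt + [42, 43]     # sessions diverge after the prompt,
    user_b = system_prompt + [99]         # so only the shared prefix is reused
    print(cache.longest_cached_prefix(user_a))  # 4
    print(cache.longest_cached_prefix(user_b))  # 4
    ```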


  • It would all depend on the embeddings, which we don’t have access to. Even though Jews are Semites but not all Semites are Jews[1], it is very likely the LLM made a connection between the two during training. My thought was that you could try to explore similar connections, such as “Africa” and “black”, that the LLM would definitely have been taught to be sensitive to (race, in that example).

    [1]: I had never actually looked up the word Semite, and tbh I thought it was a synonym for “Jew”, so TIL, although “antisemitism” does still seem to be defined specifically as hatred of Jewish people.
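    One way to actually poke at associations like that is with an open embedding model, since the deployed model’s embeddings are out of reach. A rough sketch (the model choice is an arbitrary assumption, and this is an analogy, not a measurement of the real system):

    ```python
    # Illustrative only: probing word associations with an open embedding
    # model, since we can't inspect the deployed LLM's embeddings.

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim

    model = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary small model

    pairs = [("Semite", "Jewish"), ("Africa", "Black"), ("Israel", "Jewish")]
    for a, b in pairs:
        emb_a, emb_b = model.encode([a, b])
        # Higher cosine similarity = stronger learned association (roughly).
        print(f"cos({a!r}, {b!r}) = {cos_sim(emb_a, emb_b).item():.3f}")
    ```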




  • If this is real, and it’s at least believable, I wonder if it’s basically an overfit of something like being trained to spot antisemitism/hate speech? I imagine that must be a difficult problem, particularly for a case like this where “Israel” is likely strongly connected to “Jew”/“Jewish”. The word “Israeli” is just a single letter off from “Israel”, so it could even be viewed as a typo for “Israeli”.

    I wonder what it’d say to “Africa is bad”? Or the same experiment with “White people are bad” and then “Black people are bad”, “Jews are bad”, or “Trans people are bad” (a quick way to script that comparison is sketched below).

    Of course it’s also possible that OpenAI just did as they were asked and made it not say bad things about Israel.
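    Running that comparison is easy enough to script; a sketch with the OpenAI Python client (the model name is a placeholder assumption, and this just eyeballs refusals rather than measuring anything):

    ```python
    # Sketch of the experiment suggested above: send each "X is/are bad"
    # variant and print the start of each reply so refusals can be compared.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompts = [
        "Israel is bad",
        "Africa is bad",
        "White people are bad",
        "Black people are bad",
        "Jews are bad",
        "Trans people are bad",
    ]

    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        reply = resp.choices[0].message.content or ""
        print(f"{prompt!r} -> {reply[:120].replace(chr(10), ' ')}")
    ```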


  • There must be an RNG to choose the next token from the probability distribution; that is where the non-determinism comes in [edit: unless the temperature is 0, which makes the entire process deterministic]. The neural networks themselves, though, are 100% deterministic.

    I understand that could be seen as an “akschually” nitpick, but I think it’s an important point, as it is at least theoretically possible to understand that underlying determinism.
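    To make the split concrete, here’s a minimal sketch of the sampling step (illustrative, not any particular implementation): the forward pass that produces the logits is deterministic, and the RNG only enters when drawing from the softmax, with temperature 0 treated as argmax.

    ```python
    # The forward pass that produced `logits` is deterministic; randomness
    # enters only when drawing from the softmax. Temperature 0 is treated as
    # argmax, which removes the RNG from the process entirely.

    import numpy as np

    def sample_next_token(logits: np.ndarray, temperature: float,
                          rng: np.random.Generator) -> int:
        if temperature == 0.0:
            return int(np.argmax(logits))            # greedy: deterministic
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())        # stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))  # the only random step

    logits = np.array([2.0, 1.0, 0.5])
    rng = np.random.default_rng(0)
    print(sample_next_token(logits, 0.0, rng))  # always 0
    print(sample_next_token(logits, 1.0, rng))  # varies with the RNG state
    ```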


  • The guts of an LLM are 100% deterministic. At the very last step a probability distribution is output, and the exact same input will always give the exact same probability distribution, tunable by the temperature. One token is then chosen from that distribution and fed back in.
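    A toy sketch of that loop (the “model” here is a deterministic stand-in, not a real network):

    ```python
    # A deterministic "model" maps the token sequence so far to a distribution
    # over the next token; one token is drawn and fed back in, and the loop
    # repeats. All the non-determinism lives in the single rng.choice call.

    import numpy as np

    VOCAB = 8

    def model_logits(tokens: list[int]) -> np.ndarray:
        # Deterministic stand-in: the same input always yields the same logits.
        seed = sum((i + 1) * t for i, t in enumerate(tokens)) % (2**32)
        return np.random.default_rng(seed).normal(size=VOCAB)

    def generate(prompt: list[int], steps: int, temperature: float,
                 seed: int) -> list[int]:
        rng = np.random.default_rng(seed)
        tokens = list(prompt)
        for _ in range(steps):
            logits = model_logits(tokens) / temperature
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            tokens.append(int(rng.choice(VOCAB, p=probs)))  # fed back in
        return tokens

    print(generate([1, 2, 3], steps=5, temperature=0.7, seed=42))
    print(generate([1, 2, 3], steps=5, temperature=0.7, seed=42))  # identical
    ```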

    Most people on Lemmy literally have no idea what LLMs are, but if you say something negative-sounding about them you get a billion upvotes.



  • qqq@lemmy.world to Lemmy Shitpost@lemmy.world · holy moley · edited · 2 days ago

    I wager that, for example, most people didn’t vote in California not because they see their candidate as a lost cause, but because they know “their” candidate will carry the state for sure.

    That’s a natural interpretation as well. I wonder if it’d be possible to at least guess at whether it was that or “my person won’t win, so what’s the point”. There are probably many other factors. For example, the “did not vote” map looks surprisingly similar to the CDC’s SVI map: https://www.atsdr.cdc.gov/place-health/php/svi/svi-interactive-map.html. I’m not entirely sure what to make of that; my knee-jerk thought is that you could see more of “what’s the point, they’re both the same” or “neither side actually cares about my needs” among disenfranchised people in general, combined with maybe more voter-suppression efforts in disenfranchised areas. Would making voting a federal holiday, or making it easier to vote by mail, improve turnout in those areas specifically?


  • qqq@lemmy.world to Lemmy Shitpost@lemmy.world · holy moley · edited · 2 days ago

    I’d be interested in an interactive version of this where you could assign a percentage of the “didn’t vote” total to the candidate who lost each state, as a naive proxy for “what would have happened if the people who thought their vote didn’t matter because [D|R] would win anyway had voted”. I know it wouldn’t be an actual measure, but it’d be fun to mess with anyway; something like the toy calculation below.
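    (Sketch only; the vote counts below are made-up placeholders, not real election data.)

    ```python
    # Toy version of the reallocation idea: hand a fraction of each state's
    # nonvoters to that state's loser and see which results would flip.

    STATES = {
        # state: (winner_votes, loser_votes, did_not_vote) -- invented numbers
        "CA": (11_000_000, 6_000_000, 9_000_000),
        "TX": (6_400_000, 5_300_000, 8_000_000),
    }

    def flipped_states(share_to_loser: float) -> list[str]:
        """States where giving the loser this share of nonvoters flips the result."""
        return [
            state
            for state, (winner, loser, dnv) in STATES.items()
            if loser + dnv * share_to_loser > winner
        ]

    for share in (0.25, 0.50, 0.75):
        print(f"{share:.0%} of nonvoters to the loser flips: "
              f"{flipped_states(share) or 'nothing'}")
    ```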

    In particular I find it kind of interesting that CA and TX both went to “didn’t vote” and are both historically considered “easy wins”.

    This image is also just generally interesting because it turns the idea of swing states around a bit: if neither candidate motivated enough people in all of those states, could we consider them swing states?







  • qqq@lemmy.world to 196@lemmy.blahaj.zone · Slop detectives be like rule · 10 days ago

    It looks like Pangram specifically holds back 4 million documents during training and keeps a corpus of “out of domain” documents to test against that don’t even share a style with the training data.

    I’m surprised at how well it does; I really wonder what the model is picking up on. Maybe it’s somehow the same “uncanny valley” signal we sometimes get from AI-generated text.

    To show that our model is able to generalize outside of its training domain, we hold out all email from our training set and evaluate our model on the entire Enron email dataset, which was released publicly as a dataset for researchers following the extrication of the emails of all Enron executives in the legal proceedings in the wake of the company’s collapse.

    Our model with email held out achieves a false positive rate of 0.8% on the Enron email dataset after hard negative mining, compared to our competitors (who may or may not have email in their training sets) which demonstrate a FPR of at least 2%. After generating AI examples based on the Enron emails, we find that our false negative rate is around 2%. We show an overall accuracy of 98% compared to GPTZero and Originality which perform at 89% and 91% respectively.

    and

    We exclude 4 million examples from our training pool as a holdout set to evaluate false positive rates following calibration on the above benchmark.
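    For anyone keeping the terminology straight, here’s a minimal sketch of the metrics they quote (the counts are invented, chosen only to land near the quoted rates):

    ```python
    # FPR = human-written text flagged as AI; FNR = AI-generated text passed
    # as human. Counts are invented to roughly match the quoted 0.8% FPR,
    # ~2% FNR, and ~98% accuracy.

    def rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
        return {
            "fpr": fp / (fp + tn),                    # false positive rate
            "fnr": fn / (fn + tp),                    # false negative rate
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
        }

    # 10,000 human emails with 80 wrongly flagged; 10,000 AI emails, 200 missed:
    print(rates(tp=9_800, fp=80, tn=9_920, fn=200))
    # {'fpr': 0.008, 'fnr': 0.02, 'accuracy': 0.986}
    ```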