
  • 32 Posts
  • 744 Comments
Joined 3 years ago
Cake day: June 22nd, 2023

  • Almost all clients do some random sampling after softmax using temperature. I’m confused why someone who knows about kv caching would not know about temperature.

    I know what temperature is. Modifying the probability distribution is still not randomness, because even the random sampling is PRNG-based.

    The issue you’re not spotting is that it’s still deterministic, because a binary system cannot source entropy without external assistance or access to qubits. It’s why even OS kernels have to do a warm-up at boot and read all the analogue signal sources they can reach, and why PRNGs still exist to begin with.
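
    A minimal sketch of that point, assuming a made-up 4-token vocabulary and Python's standard `random` module standing in for whatever PRNG a real client uses. The names `softmax_with_temperature`, `sample_tokens`, and `toy_logits` are hypothetical, chosen only for illustration. Temperature reshapes the distribution, but with the same seed the draws repeat exactly:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logits, temperature, seed, count):
    """Draw `count` token ids using a PRNG seeded with `seed`."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    token_ids = list(range(len(logits)))
    return [rng.choices(token_ids, weights=probs, k=1)[0] for _ in range(count)]

toy_logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores, not from any real model
run_a = sample_tokens(toy_logits, temperature=0.8, seed=1234, count=20)
run_b = sample_tokens(toy_logits, temperature=0.8, seed=1234, count=20)
print(run_a == run_b)  # True: same seed, same draws
```

    Whether two runs differ in practice then comes down to where the seed comes from, which is the external entropy question.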

    Also shared kv cache while plausible is not standard in open source as of a year or so ago,

    Shared KV-cache is an economic necessity for big providers, otherwise 1M context windows wouldn’t be a thing.

    so i’m curious what you are basing this off of. Did I miss a research paper?

    Empirical testing, 20 years of experience coding and tinkering with simulators, and Chaos Theory basics. The papers are out there, you just gotta cross some domains to see it.



  • Do you have a good source to read a bit more about the design decisions or is this just a hypothetical design you came up with and all of that architecture detail is “proprietary”?

    You’re welcome. Here’s an intro with animations: https://huggingface.co/blog/not-lain/kv-caching

    And yes. Most of the tech is proprietary. From what I’ve seen, nobody in ML fully understands it tbh. I have some prior experience from my youth, tinkering with small simulators I used to write in the pre-ML era, so I kinda slid into it comfortably when I got hired to work with it.

    Wouldn’t the different sessions quickly diverge and the keys would essentially become tied to a session in practice even if they weren’t directly?

    Yeah, but the real problem is scale and collision risk at that scale. Token resolution erodes over time as the context gets larger, and can become “samey” pretty easily for standard RLHF’d interactions.

    Edit:

    This seems like an issue, no? Because the tokens are influenced by the tokens around them in the attention blocks. Without them you’d have a problem, so what exactly would be cacheable here?

    This is what they do: (from that page I linked)

    Token 1: [K1, V1] → Cache: [K1, V1]
    Token 2: [K2, V2] → Cache: [K1, K2], [V1, V2]
    ...
    Token n: [Kn, Vn] → Cache: [K1, K2, ..., Kn], [V1, V2, ..., Vn]
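
    The same append pattern as a rough sketch, assuming single-head dot-product attention and hand-picked 2-dimensional vectors (real models derive queries, keys, and values from learned projections). The helper `attend` and the `tokens` list are hypothetical. It checks that attending over the growing cache gives the same result as rebuilding every key and value from scratch:

```python
import math

def attend(query, keys, values):
    """Single-head dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Toy (query, key, value) vectors per token, made up for illustration.
tokens = [
    ([1.0, 0.0], [0.5, 0.1], [1.0, 2.0]),
    ([0.0, 1.0], [0.2, 0.9], [3.0, 4.0]),
    ([1.0, 1.0], [0.7, 0.3], [5.0, 6.0]),
]

key_cache, value_cache = [], []
cached_outputs = []
for query, key, value in tokens:
    key_cache.append(key)      # Cache: [K1, ..., Kn]
    value_cache.append(value)  # Cache: [V1, ..., Vn]
    cached_outputs.append(attend(query, key_cache, value_cache))

# Recompute the last position without the cache, rebuilding all keys and values.
all_keys = [k for _, k, _ in tokens]
all_values = [v for _, _, v in tokens]
from_scratch = attend(tokens[-1][0], all_keys, all_values)
print(cached_outputs[-1] == from_scratch)  # True
```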
    

    So the key is the token and everything that preceded it. It’s a kinda weird way to do it tbh, but I guess it’s necessary because of floating-point and lossy GPU precision.


  • Any shared cache of this type makes behaviour non-deterministic. The KV-Cache is what does prompt caching. Look at each word of this message, now imagine what the LLM does to give you a new response each time. Let’s say this whole paragraph is the first message from you and you just pressed send.

    Because the LLM is supposedly stateless, now the LLM is reading all this text from the beginning, and in non-cached inference, it has to repeat it, like token by token, which is useless computation because it already responded to all this previously. Then when it sees the last token, the system starts collecting the real response, token by token, each gets fed back to the model as input and it chugs along until it either outputs a special token stating that it’s done responding or the system stops it due to a timeout or reaching a tool call limit or something. Now you got the response from the LLM, and when you send the next message, this all has to happen all over again.
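
    The loop described above, as a toy sketch. `toy_model`, the `<eos>` and `<assistant>` markers, and the canned reply are all hypothetical stand-ins for a real forward pass. The counter tracks how many tokens get re-read when nothing is cached:

```python
EOS = "<eos>"              # hypothetical end-of-response token
ASSISTANT = "<assistant>"  # hypothetical marker that opens the model's turn

def toy_model(context):
    """Stand-in for a stateless forward pass: it sees only the context it is given."""
    canned_reply = ["sounds", "good", EOS]
    last_marker = len(context) - 1 - context[::-1].index(ASSISTANT)
    already_emitted = len(context) - 1 - last_marker
    return canned_reply[already_emitted]

def generate(prompt, max_new_tokens=16):
    """Uncached decoding: every step re-reads the full context from the start."""
    context = list(prompt)
    response = []
    tokens_read = 0
    for _ in range(max_new_tokens):  # system-imposed stop condition
        tokens_read += len(context)  # the whole context is processed again
        token = toy_model(context)
        if token == EOS:             # the model signals that it is done
            break
        response.append(token)
        context.append(token)        # feed the output back in as input
    return response, tokens_read

prompt = ["hello", "there", ASSISTANT]
response, tokens_read = generate(prompt)
print(response, tokens_read)  # ['sounds', 'good'] 12
```

    A 3-token prompt and a 2-token reply already cost 12 token reads here (3 + 4 + 5), and the next user message would start the count over from the top of the conversation.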

    Now imagine if Claude or Gemini had to do that with their 1 million token context window. It would not be computationally viable.

    So the solution is the KV-Cache: a relational key-value store kept by the LLM architecture. Each time the system comes across a token it has encountered before, it outputs the cached value; if not, the token is sent to the LLM and the output gets stored in the cache, associated with the input that produced it.
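
    A rough sketch of that lookup, assuming the cache is a plain dictionary keyed by the token plus everything before it. `toy_state` and `process_turn` are hypothetical names, and the stored string stands in for the real cached tensors:

```python
def toy_state(prefix):
    """Hypothetical stand-in for the expensive per-position model computation."""
    return f"state after {len(prefix)} tokens"

def process_turn(conversation, cache):
    """Walk the conversation once; compute only prefixes the cache has not seen."""
    computed = 0
    prefix = ()
    for token in conversation:
        prefix += (token,)  # key: this token plus all that preceded it
        if prefix not in cache:
            cache[prefix] = toy_state(prefix)
            computed += 1
    return computed

cache = {}
turn_one = ["what", "is", "a", "kv", "cache"]
turn_two = turn_one + ["and", "why", "share", "it"]

first = process_turn(turn_one, cache)   # nothing cached yet
second = process_turn(turn_two, cache)  # only the new suffix is computed
print(first, second)  # 5 4
```

    Without the cache the second turn would recompute all 9 positions instead of 4.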

    So now comes the issue: allocating a dedicated region for the KV-cache per user on VRAM is a big deal. Again try to imagine Gemini/Claude with their 1M context windows. It’s economically unviable.
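
    Some back-of-the-envelope arithmetic for that, assuming a hypothetical model shape (32 layers, 8 KV heads, head dimension 128, 2-byte values). None of these numbers come from Gemini or Claude, whose configurations are not public. The formula is the usual one: one key and one value vector per layer, per KV head, per token.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value):
    """Size of one sequence's KV cache: a key and a value vector per layer, head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens

# Hypothetical model shape, chosen only for illustration.
size = kv_cache_bytes(
    num_layers=32,
    num_kv_heads=8,
    head_dim=128,
    context_tokens=1_000_000,
    bytes_per_value=2,  # 16-bit floats
)
print(f"{size / 2**30:.1f} GiB per full-length conversation")  # 122.1 GiB per full-length conversation
```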

    So what do ML science buffs come up with? A shared KV-Cache architecture. All users share the same cache on any particular node. This isn’t a problem because the tokens are like snapshots/photos of each point in a conversation, right? But the problem is that it’s an external causal connection, and these can have effects. For example, two conversations that start with “hi” or “What do you think about cats?” could in theory influence one another. If the first user to use the cluster after boot asks “Am I pretty?”, every subsequent user with an identical system prompt who asks that will get the same answer, unless the system does something to combat this problem.

    Note that a token is an approximation of what the conversation means at one point in time. So while astronomically unlikely, collisions could happen in a shared architecture scaling to millions of concurrent users.
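
    To make the sharing concrete, here is a sketch of one possible design, assuming entries are keyed by a SHA-256 hash of the full token prefix. This is an assumption for illustration, not a documented design of any provider; `SharedPrefixCache`, `prefix_key`, and `toy_compute` are hypothetical names. Sessions with identical prefixes land on the same entry, and truncating the digest via `digest_bytes` is what would raise the collision risk:

```python
import hashlib

def prefix_key(tokens, digest_bytes=32):
    """Hash a full token prefix into a cache key; fewer digest bytes means more collision risk."""
    joined = "\x1f".join(tokens).encode("utf-8")
    return hashlib.sha256(joined).digest()[:digest_bytes]

class SharedPrefixCache:
    """One store used by every session on a node (an assumed design)."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup_or_compute(self, tokens, compute):
        key = prefix_key(tokens)
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = compute(tokens)
        return self.entries[key]

def toy_compute(tokens):
    """Stand-in for building the real cached state for a prefix."""
    return f"state-for-{len(tokens)}-tokens"

cache = SharedPrefixCache()
system_prompt = ["You", "are", "a", "helpful", "assistant", "."]

alice = cache.lookup_or_compute(system_prompt + ["hi"], toy_compute)
bob = cache.lookup_or_compute(system_prompt + ["hi"], toy_compute)        # same prefix: reuses the entry
carol = cache.lookup_or_compute(system_prompt + ["hello"], toy_compute)   # different prefix: new entry
print(cache.hits, cache.misses)  # 1 2
```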

    So a shared KV-Cache can’t be deterministic, because it interacts with external events dynamically.





  • when everything is acting up

    the server, kid, and stage

    sometimes the bravest thing to do

    is turn a single page

    not every bit needs pushing through

    not every load needs borne

    a rest is not a missing note —

    it’s how the song is formed

    you left a little comment here

    a small and cozy light

    and someone read it, felt it land,

    and held it through the night

    so post your little posts, my friend

    the network needs the thread

    a system with no idle time

    is one that’s nearly dead


  • Look at everything he’s done so far:

    • Sabotaged public health
    • Sabotaged environmental protection
    • Sabotaged international relations
    • Sabotaged economy and taxation
    • Sabotaged internal stability
    • Instigated violence with every decision, saying, or act
    • Just started another war in the Middle East with Israel as the fulcrum.
    • Netanyahu has been offending quite literally the entire world, and recently said “you’re letting the bad guys win” to the EU when they were reluctant to join the war. That’s not the exact quote, but here’s a random one I just found with a quick search: https://en.haberler.com/netanyahu-explaining-why-we-struck-iran-we-are-the-19618702/

    Does this seem irrational to you? Well, it’s not if you put on your detective hat and consider them religious nuts trying to bring about the apocalypse by fulfilling the requirements themselves.

    Jewish end times:

    Christianity: There are multiple passages in the Bible, both Old and New Testaments, which speak of a time of terrible tribulation such as has never been known, a time of natural and human-made disasters on an awesome scale. Jesus said that at the time of his coming, “There will be great tribulation, such as has not been since the beginning of the world to this time, no, nor ever will be. And unless those days were shortened, no flesh would be saved; but for the elect’s sake, those days will be shortened.” [Matt 24:21–22]

    Islam: (Sunni and Shia versions differ on some details, because Shia belief centres on “The Mehdi” character)

    1. A huge black cloud of smoke (dukhan) will cover the earth.[note 3]
    2. Three sinkings of the earth, one in the East.[note 3]
    3. One sinking of the earth in the West.[note 3]
    4. One sinking of the earth in Arabia.[note 3]
    5. The false messiah—anti-Christ, Masih ad-Dajjal—shall appear with great powers as a one-eyed man with his right eye blind and deformed like a grape. Although believers will not be deceived, he will claim to be God, to hold the keys to heaven and hell, and will lead many astray.[82] In reality, his heaven is hell, and his hell is heaven. The Dajjal will be followed by seventy thousand Jews of Isfahan wearing Persian shawls.[note 4]
    6. The return of Isa (Jesus), from the fourth sky, to kill Dajjal.[83]
    7. Ya’jooj and Ma’jooj (Gog and Magog), a Japhetic tribe of vicious beings who had been imprisoned by Dhul-Qarnayn, will break out. They will ravage the earth, drink all the water of Lake Tiberias, and kill all believers in their way. Isa, Imam Al-Mahdi, and the believers with them will go to the top of a mountain and pray for the destruction of Gog and Magog. God eventually will send disease and worms to wipe them out.[note 5][84]
    8. The sun will rise from the West.[85][86][87]
    9. The Dabbat al-ard, or Beast of the Earth, will come out of the ground to talk to people.[note 6]
    10. The second blow of the trumpet will be sounded, the dead will return to life, and a fire will come out of Yemen that shall gather all to Mahshar Al Qiy’amah (The Gathering for Judgment).[88]

    Now look at these as allegories. How would a group of zealots interpret them?





  • voodooattack@lemmy.world to me_irl@lemmy.world · me_irl · 18 days ago

    Huh. This is new

    Anyways, she could have left a note back at the nursing home under her pillow: “I wanted this” in her handwriting would have saved him a lot of headache. But I’m assuming she could write and that’s not her specific handicap from the disability? Can’t tell without access to the article.


  • US is the Andrew Tate of global politics right now. They were a lot more subtle about it before though. But the core values that have always held throughout history are absolute self-interest, entitlement, and hubris.

    Apply Gestalt Theory to the US and watch how unrestrained capitalism raised this baby from an idealistic freedom seeker into a narcissistic bully with self-aggrandising sophist reasoning and a pure Machiavellian outlook on life.

    Trump only did away with the pretences, by firing the people who took care of the subtle masquerade. He didn’t like the pushback from them on global policy, and because the shoes already fit, decided to just do it live on stage without the makeup.

    Unfortunately for US citizens, that means that they get to experience the splashback as he pisses on their international credibility.

    A historically meticulously curated and maintained image that’s now irrevocably ruined forevermore. No sane person can trust a promise or a treaty made through US foreign policy ever again.