
  • 1 Post
  • 282 Comments
Joined 2 years ago
Cake day: January 1st, 2024

  • Look, I’m not trying to argue against your moral stance. I’m neither saying it’s wrong nor that it’s outweighed by any usefulness, real or not. What I’m trying to do is get you to see that your claims about uselessness are undermining your moral argument, which would be a hell of a lot stronger if you were not hell-bent on denying any kind of utility! Because in the eyes of people who do perceive LLMs as useful (which is exactly the kind of people that need to hear about the moral issues), that just makes you seem out of touch and not worth listening to.

    It’s useless for security analysis.

    Have you looked at any of the four links I provided? You might be working on old data here because it’s a very recent development, but a lot of high profile open source maintainers are saying that AI-generated security reports are now generally pretty good and not slop anymore. They’re fixing actual bugs because of it, and more than ever. How can you call that useless?

    Surely, the energy cost to verify the translation would be the same as translating it?

    Uh, no? Have you ever translated something? Verifying a translation happens mostly at attentive reading speed. Double that for reading it twice overall (once focusing on content, once on grammar), plus some overhead for correcting the occasional flaw and checking one or two things I’m unsure about off the top of my head, so for the sake of argument let’s say three times slower than just reading normally. I don’t know about you, but three times slower than reading is still a lot faster than I could produce a translation from scratch, weighing different word options against each other, working out how to get some flow into the reading experience, and so on. If I’m translating into a language that I’m fluent but not native in, it takes even longer, because the ratio between my passive and active vocabulary is worse. I can read (and thus verify) English at a much more sophisticated level than I’m able to speak or write, because the words and native idioms just don’t come to me as naturally, or sometimes not at all without a lot of mental effort and a thesaurus. LLMs are just plain better at writing English than I have any hope of becoming in my lifetime, and I can still easily and fully verify the factual, orthographic and grammatical correctness of what they output. Those two things are not mutually exclusive.

    It’s useless for rhyming (I notice you didn’t mention that one)

    Yeah, because I’m focusing on the more relevant things. I disagree that it’s completely useless for rhyming, but it is a much weaker and more contrived point than the others, and going into that discussion would just derail things more for no added value. Also, funny that you call me out for that, when you just fully ignored two use cases I mentioned in my initial comment (LLM proofreading texts, and answering questions about unfamiliar code bases). Those have a lot of legitimate utility for someone who’s not aware of or doesn’t care about the moral issues. And once again, that’s my point here - those people will not listen if they perceive you as talking about a fictional world where LLMs are completely useless, which fails to match up with their experience.


  • If you know enough to verify a translation as accurate, or you have the tools to figure out an accurate translation through dictionaries or some such, then you know enough to do the translation yourself.

    Correct. But it’s going to take me a lot more work and time, possibly to the point of not being feasible and probably even matching the energy cost of using the LLM over the entirety of the task.

    why would you trust something like system security to an LLM?

    I wouldn’t. I don’t know where you got that. Adding LLM-based analysis to your toolkit to spot important issues that otherwise might not have been found is just that: an addition. Not replacing anything. And it is demonstrably useful for that at this point, there’s just no denying that.

    Once again, even if the billionaire’s toxic Nazi plagiarism machine was useful, it is so morally repugnant that it should never be used, which makes it functionally useless.

    My point is that if you are this confidently wrong about the capabilities of LLM-based tools, then why should I believe you to be any less wrong about the moral and ethical issues you’re raising? It looks like you’re either completely misinformed or deliberately fighting a strawman for a part of your argument, so it gives anyone on the other side an easy excuse to just not engage with the rest of it and just dismiss it entirely. That’s what I’m trying to get across here.


  • The only way to know if LLM output is accurate is to know what an accurate output should look like, and if you know that, you don’t need an LLM

    I empathize with your overall standpoint, but that’s just plain wrong. There are a lot of problems where verifying an answer is much easier for a human (or non-LLM computer program) than coming up with a correct answer.

    Anything that involves language manipulation, for example. I’ll have a much easier time checking a translation from English to German for accuracy than doing the full translation myself, assuming the model gets most of it correct and I don’t have to rewrite anything major (which is generally the case with current models). Or letting an LLM proof-read a text I wrote - I can’t be sure it got everything, but the things it does find are trivial for me to verify, and will often include things that slipped past me and three other people who proof-read the same text. Less useful, but still applicable to the premise: Producing a set of words that rhyme with a given one. Coming up with new ones after the first couple that pop into your head gets pretty hard, but checking if new candidates actually do rhyme is trivially easy.

    Moving on from language-stuff, finding security issues in software is a huge one - finding those is often extremely hard, but verifying them is mostly pretty straightforward if the report is well prepared. Models are just now getting good enough to reliably produce good security reports for actual issues.

    Answering questions about a big codebase, where the actual value doesn’t lie in the specific response the model gives, but pointing me to the correct places in the code where I can check for myself.

    Producing code or entire programs is a bit more debatable; whether complete verification is actually easier than doing it yourself depends heavily on the goal and on the skill level of the operator.

    Just a couple of examples. As I said, I get where you’re coming from, but completely denying any kind of utility does not help your cause at all, it just makes you look like an absolutist who doesn’t know what they’re talking about.
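    To make the verify-vs-generate asymmetry concrete, here’s a toy sketch of the rhyme example in Python. The check is a deliberately crude spelling heuristic (real rhyme is phonetic) and the names are made up for illustration; the point is only how cheap checking candidates is compared to producing them.

```python
def crude_rhyme_check(candidate: str, target: str, tail: int = 5) -> bool:
    """Crude spelling-based heuristic: treat two words as rhyming if their
    last few letters match. Real rhyme is phonetic, so this is only an
    illustration of cheap verification, not a real rhyme detector."""
    candidate, target = candidate.lower(), target.lower()
    return candidate != target and candidate[-tail:] == target[-tail:]

# Generating fresh candidates is the hard part (the part you might hand to
# an LLM); checking a proposed list is a trivial filter:
candidates = ["station", "nation", "fashion", "creation"]
accepted = [w for w in candidates if crude_rhyme_check(w, "translation")]
print(accepted)  # "fashion" is rejected: "shion" != "ation"
```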


  • Quick note on terminology: there’s no such thing as a “math engine”. Most models have the ability to run custom computer code in some way as one of the “tools” they have available, and that’s what’s used if a model decides to offload the calculation rather than answer directly.

    This is what that looks like in Claude Code:

    Notice the lines starting with a green dot and the text Bash(python3...). Those are the model calling the “Bash” tool to run Python code to answer the second and third questions. The first question it answered (correctly, btw) without any tool call; that’s just the LLM itself getting it right in a straight shot, similar to DeepSeek in your example. Current models are actually good enough to generally get this kind of simple math correct on their own. I still wouldn’t want to rely on that, but I’m not surprised it got it right without any tool calls.

    So I tested my more complex calculations against DeepSeek, and it seems like (at least in the Web UI) it doesn’t have any access to a math or code running tool. It just starts working through it in verbose text, basically explaining to itself how to do manual addition like you learn in school, and then doing that. Incredibly wasteful, but it did actually arrive at the correct answers.

    Gemini is the only web-based AI app I thought to test right now that seems to have access to a code-running tool; here’s what that looks like:

    It’s hidden by default, but you can click on “Show code” in the top right to see what it did.

    This is what I mean when I say the harness matters. The models are all pretty similar, but the app you’re using to interact with them determines what tools are even made available to the LLM in the first place, and whether/how you’re shown when it calls those tools.
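    For the curious, the shape of that harness loop is simple. Here’s a minimal Python sketch with an invented stand-in for the model (fake_model, run_tool, and the message format are all made up for illustration; real harnesses like Claude Code do the same dance against an actual LLM endpoint):

```python
import contextlib
import io


def fake_model(messages):
    # Stand-in for a real LLM API call. This fake always offloads
    # arithmetic to a tool, then echoes the tool result as its answer.
    last = messages[-1]["content"]
    if last.startswith("tool_result:"):
        return {"type": "answer", "content": last.removeprefix("tool_result:")}
    return {"type": "tool_call", "tool": "python", "code": f"print({last})"}


def run_tool(call):
    # The harness, not the model, actually executes the code and
    # captures whatever it prints.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(call["code"])
    return buf.getvalue().strip()


def harness(question):
    # The loop: ask the model; if it emits a tool call, run the tool,
    # feed the result back, and repeat until it gives a final answer.
    messages = [{"role": "user", "content": question}]
    while True:
        reply = fake_model(messages)
        if reply["type"] == "answer":
            return reply["content"]
        result = run_tool(reply)  # this is the step a UI shows as "Bash(python3 ...)"
        messages.append({"role": "user", "content": f"tool_result:{result}"})


print(harness("123456789 * 987654321"))  # product computed by the tool, not the model
```

    Whether the app exposes that tool at all, and whether it shows you the call, is exactly the harness difference described above.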


  • Gave it a quick shot right now, and gonna be honest - while the premise seems nice, the sample project is very transparently AI slop generated with a prompt that, I can only assume, included an instruction like “for every sentence that doesn’t include a whimsical quip, I’m gonna kill a kitten”. It is absolutely grating to read. I don’t care if you do that in your marketing copy, but keep that shit out of technical documentation, it’s annoying, it’s distracting, and it’s turning me off the entire project. Like wtf is this:



  • So I clicked through, and behind it was a perfectly recreated SIMon Mobile login page. Entered my credentials.

    Since it hasn’t been mentioned in the comments yet, even though it’s one of the most important safeguards against this kind of thing: everyone, absolutely everyone, should consistently use a password manager with autofill, and then get very, very suspicious whenever autofill doesn’t work. Usually that means you’re not on the site you think you’re on.

    Password managers really are a win-win in every respect, with no trade-offs: logging in anywhere becomes easier and more secure at the same time. You only have to remember and type a single password; the password manager does everything else for you, and on top of that it makes sure you use different, strong passwords everywhere.
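    The reason this tripwire works, in a nutshell: the password manager matches the page’s exact origin against the stored entry, while a human matches a look-alike page by eye. A minimal sketch, assuming a made-up vault and the placeholder domain simon-mobile.example (real managers are more sophisticated about subdomains and so on):

```python
from urllib.parse import urlparse

# Hypothetical vault: credentials keyed by the exact origin they were saved on.
VAULT = {"https://simon-mobile.example": ("alice", "correct horse battery staple")}


def autofill(page_url: str):
    """Fill credentials only when scheme + hostname match the stored origin.
    Returns None on any mismatch, which is the user's cue to get suspicious."""
    parts = urlparse(page_url)
    origin = f"{parts.scheme}://{parts.hostname}"
    return VAULT.get(origin)


autofill("https://simon-mobile.example/login")        # fills
autofill("https://simon-mobile-login.example/login")  # look-alike domain: no fill
```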


  • I agree they probably should’ve addressed that in the main post, but at least it’s in the caveats below:

    Fine, maybe country first. The purists in the comments are technically correct — postal codes aren’t globally unique. You could do country first (pre-filled via IP), then postal code, then let the magic happen. The point was never “skip the country field.” The point is: stop making me type things you already know.



  • I mean I get your point, but it seems like at the current point in time, “Gaming” distros also happen to be the distros that produce the least amount of weird issues and headaches for someone new to Linux, especially if you’re on Nvidia. Bazzite in particular has been incredibly smooth sailing in a way I’ve seen no other distro achieve so far. And it does have a non-Gaming sibling distro if you don’t want that stuff.