Yes, I understand that; you are not understanding what I’m describing. I am not talking about taking an average of numerical data. LLMs take something that can be thought of as an “average” of text: the model effectively asks, “given all the text I have seen, and this new text input, what’s the most likely output?” In some numerical contexts the expected value is an average; LLMs produce a similar kind of result, and that is the parallel I am drawing here.
Yes, it can obviously never entirely replace real surveys. I would assume that survey results forming a part of the training set is a big part of why they’re able to get good results in the first place, and as I said I think there’s a significant risk that when the evaluation is done, it performs well because the data being evaluated against are (unbeknownst to the researchers) already present in the training set.
I think I was overall pretty critical of the idea? I just find it interesting.
That’s a good point, although I imagine a dedicated company could refine a model using more recently sampled general data to improve the recency.
I’m not talking about numerical data; the way LLMs work is to find a “most likely response” based on the input text. There is absolutely maths happening inside the model; how else do you think they work? I’m not saying they take numbers and find an average.
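To make the "maths inside the model" point concrete, here's a minimal sketch of the final step of next-token prediction: the model scores every token in its vocabulary and softmax turns those scores into probabilities. The tiny vocabulary and logit values below are entirely made up for illustration; real models do this over tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores the model might assign after "the cat sat on the"
vocab = ["cat", "dog", "mat", "ran"]
logits = [1.0, 0.5, 3.2, -1.0]

probs = softmax(logits)
most_likely = vocab[probs.index(max(probs))]
print(most_likely)  # "mat" has the highest score, so it gets the highest probability
```

So the "average of text" I'm gesturing at is really this: a probability distribution over possible continuations, shaped by everything in the training data.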
I was interested in this idea, because although LLMs are not good at many things, what they absolutely are good at is taking large data sets of writing and finding a kind of “average” of that data. I can understand why this would make sense. I think it’s a situation where the further you go from the training set the less reliable your “silicon sample” will be, because it has less and less relevant information to draw from, but I can also kind of see it working in some circumstances.
So, anyway, I have done a little research into this and the concept does show some definite promise. I think this is the study that kicked off the concept, and their results are quite impressive. GPT-3 manages to be close to human respondents on a variety of topics and in a variety of contexts (guessing preferences, tone, word choices, etc).
There are some issues I don’t see addressed:
- The evaluation is necessarily on data that is available, and it’s unclear whether they’ve determined if that data existed in GPT-3’s training set. Obviously if it did, this would somewhat poison the results as it would “know” the answers ahead of time.
- The evaluation is limited to the US and covers only “public opinion” topics; outside those I can’t find further evidence that this works at all. While the paper does include methods they used to correct for default biases in GPT-3, this remains within that fairly narrow context.
- Because much of the data is qualitative, some of the methods used to evaluate the fidelity of the model are somewhat unreliable (e.g. surveying humans and having them gauge the model’s output). To be fair, this is in many cases inherent to the nature of psychological research rather than LLMs, but it makes trusting the results more difficult.
One important part from the article:
> These studies suggest that after establishing algorithmic fidelity in a given model for a given topic/domain, researchers can leverage the insights gained from simulated, silicon samples to pilot different question wording, triage different types of measures, identify key relationships to evaluate more closely, and come up with analysis plans prior to collecting any data with human participants.
“Algorithmic fidelity” is a term I think they coined in this paper; it refers to how accurately the model reflects the population you are sampling. Roughly what they suggest is: take a known dataset of the population you want to assess, in the general area you are researching, and compare the real results with the LLM’s results. If this is successful you have an indication that the model can predict the population/area of interest, and you can then adjust your questions to your specific topic. They don’t really highlight enough that without this step your results could just be completely bogus. Who knows what this company Aaru are doing.
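The validation step they describe could be sketched like this: take a survey question with known human answers, collect the model's answers to the same question, and measure how far apart the two answer distributions are. Everything below is invented for illustration (the answers, the counts, and the choice of total variation distance as the disagreement measure, which is just one reasonable option); the paper itself does not prescribe this exact calculation.

```python
from collections import Counter

def answer_distribution(answers):
    """Turn a list of categorical answers into a proportion per category."""
    counts = Counter(answers)
    n = len(answers)
    return {k: v / n for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance between two distributions: 0 = identical, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical responses to the same survey question
human = ["agree"] * 60 + ["disagree"] * 30 + ["unsure"] * 10
silicon = ["agree"] * 55 + ["disagree"] * 35 + ["unsure"] * 10

tvd = total_variation(answer_distribution(human), answer_distribution(silicon))
print(f"distance = {tvd:.2f}")  # a small distance is weak evidence of fidelity
```

Only after seeing small distances across several known questions in your domain would you have any grounds to trust the model's answers on a new, unmeasured question.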
I do think this is quite an interesting and potentially promising use of the technology. Despite the fact it might on the surface seem to be just “inventing” data, in a way the LLM has already surveyed many more heads than any “real” survey could ever hope to. I would like to see more research before being sure of any of this, though; I’m certainly going to continue reading about it to see what limitations there are beyond my first assumptions. GPT-3 is not the latest model, and I wonder about how much AI generated content is out there now… Are the later generations of models starting to eat their own tails? There’s obvious manipulation of online conversations through bots, could someone poison the well in this way and cause these “surveys” to produce skewed results?
I think what makes time really fly is boredom. Not necessarily like, waiting for a train or whatever, but more like “I do the same thing every day” exhaustion with life. The past couple of years have flown by for me but it’s 100% because I don’t like my current job and I don’t do a lot else recently. The more I like my 9-5, and the more the rest of my life excites me, the more memories I get and the longer time seems to take to pass.
It’s no surprise that for most people this happens in your late teens/early 20s: you’re meeting new people all the time, you maybe go to university, you go to parties, and so on. If you stop doing that as you get older and don’t start anything else, it’s inevitable that time will just start to get away from you.
On “Dubai has ten days of fresh food left”: I propose the slaves eat their masters
I would argue it is either 1) neither necessary nor sufficient, or 2) that it is sufficient but not necessary. Which line depends on whether we are talking about something literally BEING a meme, or something merely having the potential to be a meme.
In the former case - whether something is a meme or not - having text is neither necessary nor sufficient. Adding text to an image does not a meme make, and equally some memes do not have text (or aren’t images at all).
In the latter - whether something could be a meme - adding text is sufficient to provide that potential. It may not be necessary (6-7 for example) but it could be enough.
“I have no memory of this plashe…”
That role was Allan Quatermain in The League of Extraordinary Gentlemen. After that movie bombed he retired from acting altogether.
Which is a shame really because I remember it being… Kinda good? Clearly not a work of great art, but fun action silliness. Admittedly I watched it as a kid and never again so who knows really.
Jesus can you actually imagine if LOTR had Sean Connery instead of Ian McKellen? Hideous prospect. No shade to Connery, he’s played some of my favourite characters over the years, but christ that would have been a terrible mistake.
Miserable, pitiable folk who do not know what it means to love or be loved.
Well it kinda is? Most of the time I’ve been using VR has either been on the headset itself or wirelessly streaming from a PC. My headset is old so the quality isn’t amazing, but by and large it’s always worked well.
Great video from Life Take Two (ex-Mormon/MAGA woman) about this phenomenon and the reasons behind it. In short: it’s misogyny.
On “surely your hobby can’t be that expensive”: You are wiser than I
On “surely your hobby can’t be that expensive”: My modular synth is sitting next to me just begging me to pour more money into its gaping holes
Antisemitism has been co-opted and applied to any and all criticism of Israel, as opposed to its previous meaning, hatred of Jews/Judaism. This isn’t strictly because the meaning of the word is being used differently as much as it is that proponents of Israel like to conflate Israel with all of Judaism, or even more broadly with all Jews (as an ethnic group as opposed to a religious one). Since Israel takes any criticism to be hatred, the inevitable consequence is that criticism of Israel becomes antisemitism. I’m splitting hairs here and probably making things more complicated than they need to be… But hopefully you understand what I’m getting at.
Incidentally, even in its more broadly accepted definition, “antisemitism” itself is a bit of an etymological oddity, because “Semites”, or the Semitic peoples, include both Jews and Arabs, among others… Judaeophobia is an alternative that is unquestionably specific to Jews/Judaism.
The tie is the most egregious part, if you zoom in the pattern makes no sense at all.
I agree, which is why I find their results interesting. The fact that the initial study was able to predict real results, arguably quite correctly (though, as I pointed out, their results are not the easiest to evaluate), is pretty impressive.