Ashktorab et al., 2024 - Google Patents

Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-Assisted Assessment Strategy Preferences

Document ID: 420548654973346704
Author: Ashktorab Z; Desmond M; Pan Q; Johnson J; Cooper M; Daly E; Nair R; Pedapati T; Do H; Geyer W
Publication year: 2024
Publication venue: arXiv preprint arXiv:2410.00873

Snippet

Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as evaluators to filter training …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition

Similar Documents

Publication	Publication Date	Title
DeVos et al.	2022	Toward User-Driven Algorithm Auditing: Investigating users’ strategies for uncovering harmful algorithmic behavior
Dai et al.	2024	Bias and unfairness in information retrieval systems: New challenges in the LLM era
US11875118B2 (en)	2024-01-16	Detection of deception within text using communicative discourse trees
JP6510624B2 (en)	2019-05-08	Method and system for intentional computing
US10839154B2 (en)	2020-11-17	Enabling chatbots by detecting and supporting affective argumentation
Bauman et al.	2017	Online consumer trust: Trends in research
Radziwill et al.	2017	Evaluating quality of chatbots and intelligent conversational agents
Wang et al.	2024	Understanding user experience in large language model interactions
US12141535B2 (en)	2024-11-12	Techniques for maintaining rhetorical flow
Ashktorab et al.	2024	Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-Assisted Assessment Strategy Preferences
Felin et al.	2024	A scientific method for startups
US12001804B2 (en)	2024-06-04	Using communicative discourse trees to detect distributed incompetence
Abdulqader et al.	2022	Fake online reviews: A unified detection model using deception theories
Golder	2017	Social science with social media
TWI524719B (en)	2016-03-01	A system and method for identifying and linking users having matching confidential information
Rathi et al.	2022	Psychometric profiling of individuals using Twitter profiles: A psychological Natural Language Processing based approach
Lee et al.	2025	AI-generated news content: The impact of AI writer identity and perceived AI human-likeness
Kuutila et al.	2024	What Makes Programmers Laugh? Exploring the Submissions of the Subreddit r/ProgrammerHumor
Zaki et al.	2025	Leveraging Machine Learning to Analyze Influencer Credibility’s Impact on Brand Admiration and Consumer Purchase Intent in Social Media Marketing
Lubis	2024	User sentiment analysis towards Islamic banking applications in Indonesia
Santana et al.	2025	Can LLMs Recommend More Responsible Prompts?
Akolkar	2024	Examining the impact of artificial intelligence on customer satisfaction in the banking sector: A quantitative analysis
Ahonen et al.	2024	Gender biases in AI-Mitigation strategies contributing to fairness
Chandra	2025	Using the Power of Artificial Intelligence (AI) for Fraud Detection and Prevention in E-Commerce/Online Retail
Plohl et al.	2025	Development and Validation of the perceived deepfake trustworthiness questionnaire (PDTQ) in three languages