Ashktorab et al., 2024 - Google Patents
Aligning human and LLM judgments: Insights from EvalAssist on task-specific evaluations and AI-assisted assessment strategy preferences
- Document ID
- 420548654973346704
- Author
- Ashktorab Z
- Desmond M
- Pan Q
- Johnson J
- Cooper M
- Daly E
- Nair R
- Pedapati T
- Do H
- Geyer W
- Publication year
- 2024
- Publication venue
- arXiv preprint arXiv:2410.00873
Snippet
Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as evaluators to filter training …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DeVos et al. | | Toward User-Driven Algorithm Auditing: Investigating users’ strategies for uncovering harmful algorithmic behavior |
Dai et al. | | Bias and unfairness in information retrieval systems: New challenges in the LLM era |
US11875118B2 (en) | | Detection of deception within text using communicative discourse trees |
JP6510624B2 (en) | | Method and system for intentional computing |
US10839154B2 (en) | | Enabling chatbots by detecting and supporting affective argumentation |
Bauman et al. | | Online consumer trust: Trends in research |
Radziwill et al. | | Evaluating quality of chatbots and intelligent conversational agents |
Wang et al. | | Understanding user experience in large language model interactions |
US12141535B2 (en) | | Techniques for maintaining rhetorical flow |
Ashktorab et al. | | Aligning human and LLM judgments: Insights from EvalAssist on task-specific evaluations and AI-assisted assessment strategy preferences |
Felin et al. | | A scientific method for startups |
US12001804B2 (en) | | Using communicative discourse trees to detect distributed incompetence |
Abdulqader et al. | | Fake online reviews: A unified detection model using deception theories |
Golder | | Social science with social media |
TWI524719B (en) | | A system and method for identifying and linking users having matching confidential information |
Rathi et al. | | Psychometric profiling of individuals using Twitter profiles: A psychological Natural Language Processing based approach |
Lee et al. | | AI-generated news content: The impact of AI writer identity and perceived AI human-likeness |
Kuutila et al. | | What Makes Programmers Laugh? Exploring the Submissions of the Subreddit r/ProgrammerHumor. |
Zaki et al. | | Leveraging Machine Learning to Analyze Influencer Credibility’s Impact on Brand Admiration and Consumer Purchase Intent in Social Media Marketing |
Lubis | | User sentiment analysis towards Islamic banking applications in Indonesia |
Santana et al. | | Can LLMs Recommend More Responsible Prompts? |
Akolkar | | Examining the impact of artificial intelligence on customer satisfaction in the banking sector: A quantitative analysis |
Ahonen et al. | | Gender biases in AI-Mitigation strategies contributing to fairness |
Chandra | | Using the Power of Artificial Intelligence (AI) for Fraud Detection and Prevention in E-Commerce/Online Retail |
Plohl et al. | | Development and Validation of the perceived deepfake trustworthiness questionnaire (PDTQ) in three languages |