Ashktorab et al., 2024

Aligning human and LLM judgments: Insights from EvalAssist on task-specific evaluations and AI-assisted assessment strategy preferences

Document ID
420548654973346704
Author
Ashktorab Z
Desmond M
Pan Q
Johnson J
Cooper M
Daly E
Nair R
Pedapati T
Do H
Geyer W
Publication year
2024
Publication venue
arXiv preprint arXiv:2410.00873

Snippet

Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as evaluators to filter training …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/02 Knowledge representation
    • G06N5/022 Knowledge engineering, knowledge acquisition

Similar Documents

Publication Publication Date Title
DeVos et al. Toward User-Driven Algorithm Auditing: Investigating users’ strategies for uncovering harmful algorithmic behavior
Dai et al. Bias and unfairness in information retrieval systems: New challenges in the LLM era
US11875118B2 (en) Detection of deception within text using communicative discourse trees
JP6510624B2 (en) Method and system for intentional computing
US10839154B2 (en) Enabling chatbots by detecting and supporting affective argumentation
Bauman et al. Online consumer trust: Trends in research
Radziwill et al. Evaluating quality of chatbots and intelligent conversational agents
Wang et al. Understanding user experience in large language model interactions
US12141535B2 (en) Techniques for maintaining rhetorical flow
Ashktorab et al. Aligning human and LLM judgments: Insights from EvalAssist on task-specific evaluations and AI-assisted assessment strategy preferences
Felin et al. A scientific method for startups
US12001804B2 (en) Using communicative discourse trees to detect distributed incompetence
Abdulqader et al. Fake online reviews: A unified detection model using deception theories
Golder Social science with social media
TWI524719B (en) A system and method for identifying and linking users having matching confidential information
Rathi et al. Psychometric profiling of individuals using Twitter profiles: A psychological Natural Language Processing based approach
Lee et al. AI-generated news content: The impact of AI writer identity and perceived AI human-likeness
Kuutila et al. What Makes Programmers Laugh? Exploring the Submissions of the Subreddit r/ProgrammerHumor.
Zaki et al. Leveraging Machine Learning to Analyze Influencer Credibility’s Impact on Brand Admiration and Consumer Purchase Intent in Social Media Marketing
Lubis User sentiment analysis towards islamic banking applications in Indonesia
Santana et al. Can LLMs Recommend More Responsible Prompts?
Akolkar Examining the impact of artificial intelligence on customer satisfaction in the banking sector: A quantitative analysis
Ahonen et al. Gender biases in AI-Mitigation strategies contributing to fairness
Chandra Using the Power of Artificial Intelligence (AI) for Fraud Detection and Prevention in E-Commerce/Online Retail
Plohl et al. Development and Validation of the perceived deepfake trustworthiness questionnaire (PDTQ) in three languages