Iteration 3: Experiment with different prompts and compare how they perform on user satisfaction with responses
Goal of this issue
- Before releasing Explain Code as GA, we should reach a state where user feedback is `helpful` in at least x% of cases and `wrong` in less than y% of cases. The values for x and y remain to be determined (a sketch of this release gate follows below).
- We are likely going to switch to a different AI vendor. We should compare how the initial vendor performs against the new vendor, to inform the business decision regarding the consequences for user satisfaction when switching vendors.
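For illustration, a minimal sketch of how this release gate could be evaluated once the feedback counts proposed below are collected. Everything here is an assumption: the `FeedbackCounts` shape and the `meetsGaCriteria` helper are hypothetical, not existing GitLab code, and x/y are placeholders until the thresholds are decided.

```typescript
// Hypothetical sketch (not existing GitLab code): evaluating the GA release gate
// from aggregated feedback counts. Field and function names are illustrative.
interface FeedbackCounts {
  helpful: number;
  unhelpful: number;
  wrong: number;
}

function meetsGaCriteria(counts: FeedbackCounts, x: number, y: number): boolean {
  const total = counts.helpful + counts.unhelpful + counts.wrong;
  if (total === 0) return false; // no feedback yet, nothing to decide on

  const helpfulRate = counts.helpful / total; // share of `helpful` ratings
  const wrongRate = counts.wrong / total; // share of `wrong` ratings

  // GA gate: helpful in at least x% of cases AND wrong in less than y% of cases.
  return helpfulRate >= x && wrongRate < y;
}

// Example with placeholder thresholds (x = 80%, y = 5%):
// meetsGaCriteria({ helpful: 850, unhelpful: 110, wrong: 40 }, 0.8, 0.05) === true
```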
Proposal
Enhance the metrics for collecting user satisfaction with (a sketch of a possible event payload follows this list):
- `helpful`, `unhelpful`, `wrong` (already available as a result of Iteration 2: Collecting prompt and user satisfa... (#404272 - closed))
- number of lines (or characters or tokens) selected for explanation
  - this will help us understand whether satisfaction is a function of the length of the code selected
- number of characters (or tokens) of the answer
  - this will help us understand whether satisfaction is a function of the length of the answer
- language of the code selected
  - this will help us understand whether satisfaction is a function of the code language
- the prompt used, and whether the selected code was placed before or after the prompt
  - this will help us understand how different prompt designs perform
- do not collect the code itself or the answer from the AI, to prevent collecting customer or user data.
- allow users to add a free-text message to explain their sentiment about the response or the feature itself.
- count the total number of times that:
  - users have received an AI answer vs.
  - the times they also choose to provide feedback
  - the times they asked a follow-up question
    - and did not give feedback
    - did give feedback
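To make the proposed fields concrete, here is a hedged sketch of what the enhanced feedback event and usage counters could look like. All type and field names are assumptions for illustration, not an existing GitLab schema; note that the payload deliberately carries only metadata, never the selected code or the AI answer.

```typescript
// Illustrative sketch of an enhanced feedback event; all names are hypothetical.
// Deliberately excludes the selected code and the AI answer (no customer/user data).
type Sentiment = 'helpful' | 'unhelpful' | 'wrong';

interface ExplainCodeFeedbackEvent {
  sentiment: Sentiment;          // already collected since Iteration 2
  selectedLineCount: number;     // lines (or characters/tokens) selected for explanation
  answerCharCount: number;       // length of the AI answer in characters (or tokens)
  codeLanguage: string;          // language of the selected code, e.g. 'ruby'
  promptVariantId: string;       // which engineered prompt was used
  codePlacement: 'before_prompt' | 'after_prompt'; // where the selected code sat relative to the prompt
  userComment?: string;          // optional free-text explanation of the sentiment
}

// Counters tracked independently of individual feedback events:
interface ExplainCodeUsageCounters {
  answersServed: number;           // users received an AI answer
  feedbackGiven: number;           // ...and also chose to provide feedback
  followUpWithFeedback: number;    // asked a follow-up question and gave feedback
  followUpWithoutFeedback: number; // asked a follow-up question, no feedback
}
```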
Play with different prompts
- Use guidance like https://www.promptingguide.ai/ to engineer a handful of prompts.
- Randomly use the different prompts and different providers (see the sketch after this list).
- Present the results in Sisense.
  - we intend to keep measuring user satisfaction beyond GA as well, to be able to adjust prompts when needed
- Use the best-performing prompt going forward.
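A minimal sketch of the random prompt/provider assignment mentioned above. The variant list, provider names, and helper functions are assumptions for illustration, not existing code.

```typescript
// Hypothetical sketch: assign a random prompt variant and AI provider per request,
// so satisfaction metrics can be compared across both dimensions.
interface PromptVariant {
  id: string;       // logged as promptVariantId in the feedback event
  template: string; // prompt text with a placeholder for the selected code
  codePlacement: 'before_prompt' | 'after_prompt';
}

const PROMPT_VARIANTS: PromptVariant[] = [
  { id: 'v1-plain', template: 'Explain the following code:\n{{code}}', codePlacement: 'after_prompt' },
  { id: 'v2-role', template: '{{code}}\nYou are a senior engineer. Explain the code above.', codePlacement: 'before_prompt' },
];

const PROVIDERS = ['vendor_a', 'vendor_b'] as const;

function pickRandom<T>(items: readonly T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

// Each Explain Code request gets a uniformly random (prompt, provider) pair;
// both choices are recorded with the feedback event so results can be grouped in Sisense.
function assignExperimentArm() {
  return { prompt: pickRandom(PROMPT_VARIANTS), provider: pickRandom(PROVIDERS) };
}
```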