Investigate code generation experience for small files and empty functions
Problem to solve
We have received feedback that the code generation experience in empty functions and small files is not what users expect: they expect something closer to code completion in these cases. We should investigate how we can improve the code generation experience in these situations, or whether we should use code completion instead.
This is a follow up to Intent detection overrides server intent detect... (gitlab-org/editor-extensions/gitlab-lsp#210 - closed), based on feedback around the UX in specific use cases.
Proposal
Since that fix was applied to the language server, we detect intent on empty functions and small files and trigger a code generation request. Because of this, even with streaming enabled, the latency before anything shows up in the IDE is much higher than it would be if code completion were invoked. When users are working in a file that doesn't have a comment asking the model to generate code, they expect a suggestion to show up quickly and in the same format they see when working in a large file or a function that isn't empty.
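For discussion purposes, the kind of heuristic described above could look roughly like the sketch below. All names, thresholds, and the regex check are illustrative assumptions, not the actual gitlab-lsp implementation (which has access to a real parse tree).

```typescript
// Hypothetical sketch of intent detection: prefer generation for small
// files and empty function bodies, completion otherwise.
type Intent = 'completion' | 'generation'

interface DocumentContext {
  text: string             // full document text
  textBeforeCursor: string // text from start of document to the cursor
}

// Placeholder threshold; the real value would need tuning.
const SMALL_FILE_LINE_THRESHOLD = 5

function isEmptyFunctionBody(textBeforeCursor: string): boolean {
  // Naive check: the cursor sits right after an opening brace or colon
  // with only whitespace in between. A real implementation would use
  // the language server's parser rather than a regex.
  return /[{:]\s*$/.test(textBeforeCursor)
}

function detectIntent(ctx: DocumentContext): Intent {
  const lineCount = ctx.text.split('\n').length
  if (lineCount <= SMALL_FILE_LINE_THRESHOLD) return 'generation'
  if (isEmptyFunctionBody(ctx.textBeforeCursor)) return 'generation'
  return 'completion'
}
```

The point of the sketch is that both triggers fire without any comment being present, which is exactly the case where users report the latency mismatch.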
We should explore a few things:
- See if there is a way to improve code generation to reduce latency.
  - Have we considered making a direct connection for code generation to the AI Gateway and streaming directly? Then, could we build the prompts on the AI Gateway side? (suggestion from Angelo Rivera)
  - Improving code gen latency is something that we need to always be working on anyway, so regardless of the solution for this use case, we'll keep exploring ways to reduce it.
- Introduce a way to switch between code generation and code completion (e.g. a UI element, hotkey, or slash command) and allow the user to choose what they want to invoke, with code completion being the default. (aggregated suggestion from Taylor Vanderhelm and Dasha Adushkina)
- Could we request a code generation before the user stops typing (i.e. every X characters) and only display the result when they stop typing, keeping a rolling window of code generation results? (suggestion from Allen Cook)
- Another option is to only trigger code generation for empty functions if the class is "big" enough. For instance, we could do a little bit of static analysis to check the number of methods or functions in a particular file and switch to code generation since the code is deemed to be more "complex". This is similar to the DaVinci approach, but for determining intent rather than context. (suggestion from Angelo Rivera)
- Is there a situation where we could invoke code completion first, to provide a quick response, but also start the process to get a code generation response in case the code completion suggestion is not accepted?
  - We would need to work through the UX of how that would work and how it would be presented to the user, but it could sidestep the complaints by initially using code completion and falling back to code generation if no suggestion was accepted.
  - Likely not worth exploring, given the potential effort.
- Investigate whether we should switch these use cases to code completion completely.
  - If I remember correctly, we had a problem with `code-gecko` sending empty responses when asked to complete code in these use cases. If this is the case, we should see if we could include more context, like open tabs and X-Ray artifacts, to try to improve the responses and mitigate receiving empty suggestions.
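The static-analysis gate suggested above could be prototyped cheaply before investing in anything parser-based. The sketch below counts function-like declarations with regexes; the patterns, threshold, and function names are all placeholder assumptions, not existing gitlab-lsp code, and a real version would use a proper parse tree.

```typescript
// Rough sketch: only allow generation for an empty function when the
// file already contains "enough" other functions to be deemed complex.
const METHOD_COUNT_THRESHOLD = 3 // placeholder value, would need tuning

function countFunctionLikeDeclarations(source: string): number {
  // Matches common JS/TS forms: `function foo` and `foo = (...) =>`.
  // Deliberately naive; false positives/negatives are expected.
  const patterns = [
    /\bfunction\s+\w+/g,
    /\w+\s*=\s*(?:async\s*)?\([^)]*\)\s*=>/g,
  ]
  return patterns.reduce((n, re) => n + (source.match(re) ?? []).length, 0)
}

function shouldGenerateForEmptyFunction(source: string): boolean {
  return countFunctionLikeDeclarations(source) >= METHOD_COUNT_THRESHOLD
}
```

Even this crude version would let simple files fall through to the fast code completion path, which is the behavior users are asking for.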
This issue is for discussion to figure out the best way forward. There is no currently proposed fix or implementation, but we should figure out how to improve the experience in these cases.
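For the completion-first fallback option discussed above, the control flow could look roughly like this. `fetchCompletion`, `fetchGeneration`, and `wasAccepted` are hypothetical stand-ins for the real request and telemetry plumbing, not actual APIs.

```typescript
// Sketch: request a fast completion immediately, start generation in
// parallel, and only surface the generation result if the completion
// suggestion is not accepted.
type Suggestion = { source: 'completion' | 'generation'; text: string }

async function suggestWithFallback(
  fetchCompletion: () => Promise<string>,
  fetchGeneration: () => Promise<string>,
  wasAccepted: (s: Suggestion) => Promise<boolean>,
): Promise<Suggestion> {
  // Kick off generation up front so its result is warm by the time we
  // learn whether the completion was rejected.
  const generationPromise = fetchGeneration()
  const completion: Suggestion = {
    source: 'completion',
    text: await fetchCompletion(),
  }
  if (await wasAccepted(completion)) return completion
  return { source: 'generation', text: await generationPromise }
}
```

The main design question this surfaces is cost: every invocation pays for a generation request even when the completion is accepted, which is part of why the issue flags this option as possibly not worth the effort.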
Feature Usage Metrics
We should explore adding metrics to report how intent was decided. Since code generation requests are still a very small share of our total code suggestions, it would be helpful to know how many times generation was invoked by a comment, a small file, or an empty function. While we've heard this experience is less than ideal, knowing how often users run into it would help us make decisions.
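The metric itself could be as simple as tagging each generation request with the trigger that selected it and counting per trigger. The trigger names and functions below are illustrative assumptions, not an existing telemetry schema.

```typescript
// Sketch: count how each code generation request's intent was decided.
type GenerationTrigger = 'comment' | 'small_file' | 'empty_function'

const triggerCounts = new Map<GenerationTrigger, number>()

function recordGenerationTrigger(trigger: GenerationTrigger): void {
  triggerCounts.set(trigger, (triggerCounts.get(trigger) ?? 0) + 1)
}

function triggerReport(): Record<string, number> {
  // Flatten the counts for reporting, e.g. attaching to a tracking event.
  return Object.fromEntries(triggerCounts)
}
```

With something like this in place, we could compare how often `small_file` and `empty_function` fire relative to `comment` and decide whether the non-comment paths justify further investment.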