API improvements to diff retrieval.
- Dev: https://dev.gitlab.org/gitlab/gitlabhq/issues/2131
- ZenDesk: https://gitlab.zendesk.com/agent/tickets/2103
We were approached via Twitter by the maintainers of a new product that would like to integrate with GitLab. They are having problems with the API. This is what they need:
I'm the maintainer of the Review Board code review project. We have some basic integration with GitLab that we're working to improve, and have hit some issues with the representation of diffs in the API.
We need to be able to fetch a valid Git diff from GitLab, given a commit SHA1. We can sort of do this with the Commits API, using /projects/:id/repository/commits/:sha/diff.
However, the representation is missing all the information commonly found in a Git diff, including the "diff --git" line and any Git metadata.
Without this information, we can't cleanly apply a diff to a file fetched from the repository. In particular, we can't identify the blob SHA1 of the original or modified file (the original being the most important part).
We have found that we can use http://///commit/.diff to fetch a diff, and we fortunately can access this with the API token. However, it's not part of the API, which makes me a little nervous, and it only contains short SHAs, not the full SHAs. This means we can't do some of the matching we need to do.
So I'd like to see what options you guys think would make the most sense here.
At a minimum, we'd want that above view to use a full SHA. Ideally, the Commits API for fetching a diff would return the full, unaltered git diff for that commit, with full SHAs.
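To illustrate the request above: in a standard Git diff, the blob SHA1s of the original and modified file sit on the per-file "index" header line, which is exactly the metadata reported as missing. The following is a minimal Python sketch of how a client might read them. The names blob_shas and is_full_sha and the sample diff are hypothetical, and the path handling assumes file names without spaces or quoting.

```python
import re

# Matches the "index <old>..<new>[ <mode>]" header line Git emits for each file.
INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: (\d+))?$")


def blob_shas(diff_text):
    """Return (path, old_blob, new_blob) tuples found in a Git diff."""
    results = []
    path = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Simplification (assumption): paths contain no spaces or quoting.
            path = line.split(" b/", 1)[-1]
        else:
            match = INDEX_RE.match(line)
            if match and path is not None:
                results.append((path, match.group(1), match.group(2)))
    return results


def is_full_sha(sha):
    """A full SHA1 is 40 hex characters; abbreviated ones are shorter."""
    return len(sha) == 40


# Illustrative sample only, shaped like `git diff --full-index` output.
SAMPLE = """\
diff --git a/README.md b/README.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..8baef1b4abc478178b004d62031cf7fe6db6f903 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1 @@
+abc
"""

entries = blob_shas(SAMPLE)
all_full = all(is_full_sha(old) and is_full_sha(new) for _, old, new in entries)
```

With abbreviated SHAs on the index line, is_full_sha returns False, and the client cannot match the blob exactly against the repository.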
Patricio
I think it would be cool to have more products integrated with GitLab and I think the improvements to the diff API make sense.
Marin
What I don't understand from their ticket is whether they have a proposal, want to discuss things with us, or something else?
Patricio
I haven't asked, but I assume they have no proposal; otherwise they would have said so already. They might be open to discussing the best approach, and I believe they want us to implement it.
I asked to see how they want to discuss this and if they already have something in mind on how to do it best.
I asked him what he had in mind and this is his reply:
Git diff improvements
The git diffs currently shown in the API can't be easily used to rebuild a commit locally, for three reasons:
- They're missing the Git diff metadata, just providing the diff hunks. There are no "diff --git" lines or "similarity index" or anything. This data is generally provided by "git show", "git diff", and "git format-patch", so I'd imagine whatever API is being used would have an equivalent of this that can be used to output the full data.
- The SHAs listed are all short SHAs instead of full SHAs, which is a lot harder to work with. This would probably be an easy fix. Assuming there's a way to pass the equivalent of --full-index to these calls, I'd suggest updating all endpoints that show diffs to pass the necessary argument.
- The diffs have a "signature" line after the diff data that looks like:
--
libgit2 <version>
Hopefully that can be suppressed, because it causes patch failures. patch thinks that the "--" line is trying to remove a line consisting solely of "-" (the first "-" being "remove a line", and the second being the line contents). If there isn't an option for this, some post-processing to remove these so the caller can consume it safely would be nice.
Those three things would make a lot of our problems go away.
Update the API endpoint for fetching full diffs for a commit
We can currently use http://///commit/.diff to fetch a diff, but there are no guarantees about how error responses would be handled, and I don't know if it's even intentional that we can access that with an API token.
You have an API resource that would be ideal here: https://github.com/gitlabhq/gitlabhq/blob/master/doc/api/commits.md#get-the-diff-of-a-commit. If this can return a diff with the above fixes, equivalent to http://///commit/.diff, we'd be able to use that. Something that is the equivalent of doing 'git show '.
Note also that 'git show ' returns the commit message, author, etc. This information would be very handy to have in these results as well. So my recommendation there would be to have that API return the exact equivalent of what 'git show' returns.
Capabilities
The idea here is to have one API we can query that tells us information on the GitLab API: The server version and flags representing certain types of features/fixes. For instance, maybe we'd access /api/v3/capabilities/ (or maybe just /api/v3/) and see something along the lines of:
{
'gitlab_version': '1.2.3',
'diffs': {
'full_git_diffs': True
}
}
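A client could test a response shaped like the one above roughly as follows. This is only a sketch in Python: the capabilities resource is a proposal in this thread, not an existing GitLab API, so the key names are copied from the example and the helper names supports_full_git_diffs and describe_support are hypothetical.

```python
def supports_full_git_diffs(capabilities):
    """Return True only when the server explicitly advertises the flag."""
    diffs = capabilities.get("diffs")
    if not isinstance(diffs, dict):
        # Older servers (assumption) would simply omit the "diffs" section.
        return False
    return diffs.get("full_git_diffs") is True


def describe_support(capabilities):
    """Pick a user-facing message based on the advertised capabilities."""
    if supports_full_git_diffs(capabilities):
        return "full Git diffs available"
    version = capabilities.get("gitlab_version", "unknown")
    return "full Git diffs not available on GitLab %s" % version


# Parsed form of the example response shown above.
example = {
    "gitlab_version": "1.2.3",
    "diffs": {"full_git_diffs": True},
}
message = describe_support(example)
```

Treating a missing flag as "not supported" lets one client work against both older and newer servers without version comparisons.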
As you add new things or change behavior, new flags could be added that we can test against. The use case here is that we can perform this one API call and see whether these flags are set. If they are, great, we can enable the feature in our product and make the rest of the API calls. If they're missing, we can instead provide some alternative UI or a message to the user. By including the version, we can also say "This feature is not supported for GitLab . Please upgrade to ."
Use cases
Okay, so those are the behaviors we'd love to see. I'll go into our use case more to help tie that all together.
In Review Board, we have a UI for browsing through commits on a repository. When clicking a commit, we offer the option to post it for review. To do this effectively, we need a full, standard Git diff we can parse and apply. This diff needs full SHAs and all the metadata provided in a typical Git diff, so that we can sanely represent those changes to the user and fetch additional information on the full contents of those files (depending on that metadata). This also needs to be something that the user can download and apply to their local clone using 'git apply'. Currently, we can't do any of this reliably.
More and more, we're hearing from companies who are long-time Review Board users who are moving many repositories away from GitHub or other services to GitLab, but are still making use of non-Git repositories (Subversion, Perforce, etc.), hence the usage of Review Board over GitLab's built-in code review support in these cases. These users are used to this feature for GitHub, and have been asking about the GitLab equivalent.
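Until the trailing libgit2 signature described earlier in this message can be suppressed server-side, the post-processing workaround it mentions could look something like the Python sketch below. The function name strip_libgit2_signature and the version string in the sample are hypothetical, and the code assumes the signature is always the last non-blank content in the response.

```python
def strip_libgit2_signature(diff_text):
    """Drop a trailing "--" / "libgit2 <version>" pair, if one is present."""
    lines = diff_text.splitlines(keepends=True)

    # Skip any blank lines that follow the signature.
    end = len(lines)
    while end > 0 and lines[end - 1].strip() == "":
        end -= 1

    # The signature is two lines: "--" (possibly with trailing whitespace)
    # followed by "libgit2 <version>". A real hunk line could never start
    # with "libgit2", so a removed line whose content is "-" is left alone.
    if (
        end >= 2
        and lines[end - 1].startswith("libgit2 ")
        and lines[end - 2].rstrip() == "--"
    ):
        return "".join(lines[: end - 2])
    return diff_text


# Illustrative input only; the version number is made up.
raw = "@@ -1 +1 @@\n-old\n+new\n-- \nlibgit2 0.21.0\n\n"
cleaned = strip_libgit2_signature(raw)
```

Checking both lines together, rather than dropping any "--" line, avoids damaging hunks that really do remove a line containing a single "-".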
Dmitriy
As I understand it, they need to get a plain diff for a commit via the API. This is something we can implement in the API, or they can contribute it if they know Ruby.
Another thing they want is for the full SHA of a commit to be returned. But that is already possible: the id field returns the full SHA according to http://doc.gitlab.com/ce/api/commits.html