Use GraphQL in GitHub Import
Using GitHub's GraphQL API in GitHub Import could significantly reduce the number of requests made to the GitHub API. With a single GraphQL request, it would be possible to retrieve all the information required to import a pull request or issue, which in turn would reduce the number of times the migration process is rate limited.
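As a rough illustration of the idea, the sketch below builds the payload for one GraphQL request that asks for a pull request together with its comments, reviews, review comments and a few timeline events. The field names follow GitHub's public GraphQL schema, but the selection of fields, the page sizes and the helper name build_pull_request_payload are assumptions for illustration, not what GitHub Import actually requires.

```python
import json

# Assumed field selection and page sizes; the real importer may need a different set.
PULL_REQUEST_QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      number
      title
      body
      state
      author { login }
      comments(first: 100) {
        nodes { body createdAt author { login } }
        pageInfo { hasNextPage endCursor }
      }
      reviews(first: 100) {
        nodes { state body author { login } }
        pageInfo { hasNextPage endCursor }
      }
      reviewThreads(first: 50) {
        nodes {
          comments(first: 50) {
            nodes { path line startLine body author { login } }
          }
        }
        pageInfo { hasNextPage endCursor }
      }
      timelineItems(first: 100, itemTypes: [CLOSED_EVENT, REOPENED_EVENT, LABELED_EVENT]) {
        nodes {
          __typename
          ... on ClosedEvent { createdAt }
          ... on ReopenedEvent { createdAt }
          ... on LabeledEvent { createdAt label { name } }
        }
        pageInfo { hasNextPage endCursor }
      }
    }
  }
  rateLimit { cost remaining resetAt }
}
"""


def build_pull_request_payload(owner, name, number):
    """Return the JSON body for a POST to https://api.github.com/graphql.

    Hypothetical helper: it only builds the request body and sends nothing.
    """
    return json.dumps({
        "query": PULL_REQUEST_QUERY,
        "variables": {"owner": owner, "name": name, "number": number},
    })
```

The rateLimit field is included so that each response also reports how much of the rate limit budget the query consumed.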
Challenges
The GitHub GraphQL API has a more liberal breaking change policy than the REST API, and any breaking change to a schema used by GitHub Import would make the import unusable. (Kudos to Luke for finding this info.)
We'll announce upcoming breaking changes at least three months before making changes to the GraphQL schema, to give integrators time to make the necessary adjustments. Changes go into effect on the first day of a quarter (January 1st, April 1st, July 1st, or October 1st). For example, if we announce a change on January 15th, it will be made on July 1st.
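To get a feel for how much lead time this policy leaves, here is a minimal sketch that computes the earliest date a change could take effect for a given announcement date. It is one literal reading of the quoted policy (first quarter start that is at least three months after the announcement), and the function names are hypothetical.

```python
from datetime import date

QUARTER_MONTHS = (1, 4, 7, 10)


def quarter_start_on_or_after(day):
    """Return the first quarter start (Jan/Apr/Jul/Oct 1st) that is not before `day`."""
    for month in QUARTER_MONTHS:
        candidate = date(day.year, month, 1)
        if candidate >= day:
            return candidate
    return date(day.year + 1, 1, 1)


def following_quarter_start(quarter_start):
    """Return the quarter start three months after the given quarter start."""
    if quarter_start.month == 10:
        return date(quarter_start.year + 1, 1, 1)
    return date(quarter_start.year, quarter_start.month + 3, 1)


def earliest_effective_date(announced):
    """Earliest quarter start that is at least three months after the announcement."""
    return following_quarter_start(quarter_start_on_or_after(announced))


# The example from the policy: announced on January 15th, effective on July 1st.
print(earliest_effective_date(date(2024, 1, 15)))  # 2024-07-01
```

Under this reading, the lead time is between three and six months, depending on where in the quarter the announcement falls.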
For instance, on 2023-10-01, the GitHub GraphQL API introduced a breaking change to the DiffNotes schema, which would have broken GitHub Import if it had been using GraphQL:
Breaking: A change will be made to PullRequestReviewComment.position.
Description: position will be removed. Use the line and startLine fields instead, which are file line numbers instead of diff line numbers.
For SaaS, such breaking changes wouldn't be a problem, as we would be able to adjust the GraphQL query as soon as the breaking change is announced. However, that wouldn't apply to self-managed instances.
A possible solution to the issue at hand is to provide support for both REST and GraphQL methods. This approach would enable users to switch to REST if required. However, implementing this solution would add some complexity to the importer even though most of the code would be reusable.
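A minimal sketch of what such a switch could look like, assuming both paths are normalized into the same internal shape so the rest of the importer stays shared. The api_mode setting, the client callables and the normalized keys are all hypothetical; only the REST user.login and GraphQL author.login field names come from GitHub's APIs.

```python
def normalize_rest(pull_request):
    """Map a REST pull request payload to the shared internal shape (assumed)."""
    return {
        "number": pull_request["number"],
        "title": pull_request["title"],
        "author": pull_request["user"]["login"],
    }


def normalize_graphql(pull_request):
    """Map a GraphQL pullRequest node to the same internal shape (assumed)."""
    return {
        "number": pull_request["number"],
        "title": pull_request["title"],
        "author": pull_request["author"]["login"],
    }


def fetch_pull_request(number, api_mode, rest_client, graphql_client):
    """Fetch one pull request through the API selected by the user.

    `rest_client` and `graphql_client` are callables that take a pull request
    number and return the raw payload; both are placeholders for real clients.
    """
    if api_mode == "graphql":
        return normalize_graphql(graphql_client(number))
    if api_mode == "rest":
        return normalize_rest(rest_client(number))
    raise ValueError(f"unknown api_mode: {api_mode!r}")
```

Everything downstream of fetch_pull_request only sees the normalized hash, so a self-managed instance hit by a schema change could flip api_mode to "rest" without touching the import logic.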
How fast GitHub Import would be with GraphQL
The speed gain depends on how much we will change GitHub Import's architecture.
A simple change that could significantly reduce the migration time of a project with 100,000 pull requests would be to use GraphQL to query DiffNotes and Events information in the same request. This would save around 20 hours of migration time and wouldn't require any major architectural changes.
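One way to arrive at a figure of this size is a back-of-the-envelope calculation. It assumes the migration is bound by GitHub's primary REST rate limit of 5,000 requests per hour for an authenticated user, and that merging the DiffNotes and Events requests saves one request per pull request. The issue does not state how the 20 hours were derived, so treat both inputs as assumptions.

```python
PULL_REQUESTS = 100_000
# Assumption: the import is throttled by the 5,000 requests/hour REST limit.
REST_REQUESTS_PER_HOUR = 5_000
# Assumption: combining DiffNotes and Events saves one request per pull request.
REQUESTS_SAVED_PER_PULL_REQUEST = 1


def hours_saved(pull_requests, requests_saved_per_pr, requests_per_hour):
    """Hours of rate-limited waiting avoided by dropping requests."""
    return pull_requests * requests_saved_per_pr / requests_per_hour


print(hours_saved(PULL_REQUESTS, REQUESTS_SAVED_PER_PULL_REQUEST, REST_REQUESTS_PER_HOUR))  # 20.0
```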
A more ambitious improvement would be to retrieve data on 100 pull requests in a single request. Although this would be challenging due to the nested pagination, it would enable the retrieval of information on all 100,000 pull requests in less than one hour.
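The nested pagination problem can be sketched as follows: each page of 100 pull requests contains nested connections (comments, reviews and so on), and any of them may have more pages of its own that need follow-up requests. The response shape below mirrors GitHub's nodes/pageInfo connection pattern; the function names and the list of nested fields are hypothetical.

```python
import math


def top_level_requests(pull_requests, page_size=100):
    """Number of requests needed to page through all pull requests once."""
    return math.ceil(pull_requests / page_size)


def pending_nested_pages(page, nested_fields):
    """List the nested connections in one page that still have more data.

    `page` is the `pullRequests` connection from a GraphQL response. Each
    returned tuple is (pull request number, field name, cursor to resume from).
    """
    pending = []
    for pull_request in page["nodes"]:
        for field in nested_fields:
            info = pull_request[field]["pageInfo"]
            if info["hasNextPage"]:
                pending.append((pull_request["number"], field, info["endCursor"]))
    return pending


print(top_level_requests(100_000))  # 1000
```

The 1,000 top-level requests are only a lower bound: every entry returned by pending_nested_pages costs at least one extra request, which is where most of the complexity would come from.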