Investigate latency with CustomersDot overage checks at larger volumes
In https://gitlab.com/gitlab-org/customers-gitlab-com/-/work_items/14791, I ran load tests against stgsub using the following k6 script:
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  vus: 50,
  duration: '300s',
};

// One SaaS and one self-managed consumer lookup.
const urls = [
  'https://customers.staging.gitlab.com/api/v1/consumers/resolve?realm=saas&user_id=1675938&root_namespace_id=10315565&namespace_id=10315565',
  'https://customers.staging.gitlab.com/api/v1/consumers/resolve?realm=self-managed&user_id=111&instance_id=a9f5c4c3-26fa-46be-8f66-19407f3ab7ee&unique_instance_id=a9f5c4c3-26fa-46be-8f66-19407f3ab7ee',
];

const params = {
  headers: {
    'accept': 'application/json',
    'content-encoding': 'application/json',
    'content-type': 'application/json',
    'user-agent': 'Gotten from 1password',
  },
};

export default function () {
  // Each iteration sends a HEAD request to every URL, pausing 0.5s after each.
  for (const url of urls) {
    const res = http.head(url, params);
    check(res, { 'status is 200': (r) => r.status === 200 });
    sleep(0.5);
  }
}
Runs with up to 100 VUs worked fine, but at 1000 VUs the average response time rose to 5 seconds, even though infrastructure resources were adequate.
During that period the environment also stopped receiving metrics from the CustomersDot application, although metrics from the Stackdriver exporters kept arriving.
We should figure out why latency jumped so significantly and whether there are any easy fixes. This can happen post-GA.
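One possible way to narrow down where latency starts to climb between the 100 VU and 1000 VU runs is to step through intermediate VU counts in a single run. Below is a minimal sketch, assuming k6's `stages` option is used instead of a fixed `vus`/`duration`. The helper name `buildRampStages`, the intermediate VU levels, and the durations are illustrative assumptions, not values from the original tests:

```javascript
// Build a k6-style "stages" array that ramps to each VU level and then holds
// there, so latency can be compared plateau by plateau.
// Plain JavaScript: no k6 imports are needed to construct the array.
function buildRampStages(vuLevels, rampDuration, holdDuration) {
  const stages = [];
  for (const target of vuLevels) {
    stages.push({ duration: rampDuration, target }); // ramp to this level
    stages.push({ duration: holdDuration, target }); // hold at this level
  }
  stages.push({ duration: rampDuration, target: 0 }); // ramp back down
  return stages;
}

// Hypothetical levels between the 100 VU and 1000 VU data points.
const stages = buildRampStages([100, 250, 500, 750, 1000], '30s', '120s');
```

The resulting array could be passed as `stages` in the script's `options` in place of the fixed `vus` and `duration`, which might show whether latency degrades gradually or jumps at a particular VU count.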