[go: up one dir, main page]

Intermittent GRPC::ResourceExhausted: 8:stream terminated by RST_STREAM with error code: FLOW_CONTROL_ERROR messages with in CI

In gitlab-org/quality/engineering-productivity/master-broken-incidents#8333 (closed) after we updated to grpc v1.66.0 on Workhorse, we started seeing more of these GRPC::ResourceExhausted messages.

We reverted that change, but still see this failure intermittently. In https://gitlab.com/gitlab-org/gitlab/-/jobs/7797774422#L8777,

Git::ProcessRefChangesService (./spec/services/git/process_ref_changes_service_spec.rb:5) – 00:00.507 (304 / 1) of 00:01.544 (32.86%)
Failures:
  1) Integrations::Slack behaves like Integrations::SlackMattermostNotifier Note events when commit comment event executed behaves like triggered Slack integration notifies about commit comment events
     Failure/Error: stub(service, storage).__send__(rpc, request, kwargs) # rubocop:disable GitlabSecurity/PublicSend
     GRPC::ResourceExhausted:
       8:stream terminated by RST_STREAM with error code: FLOW_CONTROL_ERROR. debug_error_string:{UNKNOWN:Error received from peer unix:tmp/tests/gitaly/praefect.socket {grpc_message:"stream terminated by RST_STREAM with error code: FLOW_CONTROL_ERROR", grpc_status:8, created_time:"2024-09-11T12:10:17.749769723+00:00"}}
     Shared Example Group: "triggered Slack integration" called from ./spec/support/shared_examples/models/concerns/integrations/slack_mattermost_notifier_shared_examples.rb:481
     Shared Example Group: Integrations::SlackMattermostNotifier called from ./spec/models/integrations/slack_spec.rb:6
     # ./lib/gitlab/gitaly_client.rb:292:in `execute'
     # ./lib/gitlab/gitaly_client/call.rb:18:in `block in call'
     # ./lib/gitlab/gitaly_client/call.rb:60:in `recording_request'
     # ./lib/gitlab/gitaly_client/call.rb:17:in `call'
     # ./lib/gitlab/gitaly_client.rb:281:in `call'
     # ./lib/gitlab/gitaly_client/with_feature_flag_actors.rb:31:in `block in gitaly_client_call'
     # ./lib/gitlab/gitaly_client.rb:644:in `with_feature_flag_actors'
     # ./lib/gitlab/gitaly_client/with_feature_flag_actors.rb:25:in `gitaly_client_call'
     # ./lib/gitlab/gitaly_client/repository_service.rb:383:in `gitaly_repo_stream_request'
     # ./lib/gitlab/gitaly_client/repository_service.rb:210:in `create_from_bundle'
     # ./lib/gitlab/git/repository.rb:954:in `create_from_bundle'
     # ./spec/factories/projects.rb:505:in `block (4 levels) in <main>'
     # ./spec/support/shared_examples/models/concerns/integrations/slack_mattermost_notifier_shared_examples.rb:454:in `block (3 levels) in <main>'

In gitlab-org/quality/engineering-productivity/master-broken-incidents#8333 (comment 2098792552), I looked at the job logs for one instance. I'm not seeing anything in the test logs that suggests that Gitaly raised ResourceExhausted. For example, if I look at this CreateRepositoryFromBundle RPC, we see that it attempted to run create temp bundle, but then just got canceled:

gitaly-test.log

time="2024-09-09T11:44:09.773Z" level=info msg="finished unary call with code OK" component=gitaly.UnaryServerInterceptor correlation_id=ce5bd736acb575e33defb8a02b2cd2cf feature_flags="gpg_signing mailmap_options use_resizable_semaphore_in_concurrency_limiter" grpc.code=OK grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=accessor grpc.meta.method_scope=repository grpc.meta.method_type=unary grpc.method=RepositoryExists grpc.request.deadline="2024-09-09T11:44:19.002" grpc.request.fullMethod=/gitaly.RepositoryService/RepositoryExists grpc.request.glProjectPath= grpc.request.glRepository= grpc.request.payload_bytes=95 grpc.request.repoPath=@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git grpc.request.repoStorage=default grpc.response.payload_bytes=0 grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:09.773" grpc.time_ms=0.239 pid=287 span.kind=server system=grpc
time="2024-09-09T11:44:10.234Z" level=info msg="diskcache state change" component=gitaly.StreamServerInterceptor correlation_id=ce5bd736acb575e33defb8a02b2cd2cf diskcache=38ef3cf1-a29c-49ff-99c0-42e904109634 grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=mutator grpc.meta.method_scope=repository grpc.meta.method_type=client_stream grpc.method=CreateRepositoryFromBundle grpc.request.deadline="2024-09-09T17:44:10.197" grpc.request.fullMethod=/gitaly.RepositoryService/CreateRepositoryFromBundle grpc.request.glProjectPath=namespace1/project-1 grpc.request.glRepository=project-1 grpc.request.repoPath=@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git grpc.request.repoStorage=default grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:10.198" pid=287 span.kind=server system=grpc
time="2024-09-09T11:44:10.234Z" level=info msg="finished streaming call with code Canceled" component=gitaly.StreamServerInterceptor correlation_id=ce5bd736acb575e33defb8a02b2cd2cf error="creating repository: cloning bundle: create temp bundle: rpc error: code = Canceled desc = context canceled" feature_flags="gpg_signing mailmap_options use_resizable_semaphore_in_concurrency_limiter" grpc.code=Canceled grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=mutator grpc.meta.method_scope=repository grpc.meta.method_type=client_stream grpc.method=CreateRepositoryFromBundle grpc.request.deadline="2024-09-09T17:44:10.197" grpc.request.fullMethod=/gitaly.RepositoryService/CreateRepositoryFromBundle grpc.request.glProjectPath=namespace1/project-1 grpc.request.glRepository=project-1 grpc.request.payload_bytes=2228420 grpc.request.repoPath=@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git grpc.request.repoStorage=default grpc.response.payload_bytes=0 grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:10.198" grpc.time_ms=36.071 pid=287 span.kind=server system=grpc
time="2024-09-09T11:44:10.626Z" level=info msg="finished unary call with code OK" component=gitaly.UnaryServerInterceptor correlation_id=57b4deab047841b38bb3576bbe55446a feature_flags="gpg_signing mailmap_options use_resizable_semaphore_in_concurrency_limiter" grpc.code=OK grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=accessor grpc.meta.method_scope=repository grpc.meta.method_type=unary grpc.method=RepositoryExists grpc.request.deadline="2024-09-09T11:44:20.005" grpc.request.fullMethod=/gitaly.RepositoryService/RepositoryExists grpc.request.glProjectPath= grpc.request.glRepository= grpc.request.payload_bytes=95 grpc.request.repoPath=@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git grpc.request.repoStorage=default grpc.response.payload_bytes=0 grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:10.626" grpc.time_ms=0.282 pid=287 span.kind=server system=grpc

praefect-test.log

time="2024-09-09T11:44:09.773Z" level=info msg="finished streaming call with code OK" component=praefect.StreamServerInterceptor correlation_id=ce5bd736acb575e33defb8a02b2cd2cf grpc.code=OK grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=accessor grpc.meta.method_scope=repository grpc.meta.method_type=bidi_stream grpc.method=RepositoryExists grpc.request.deadline="2024-09-09T11:44:19.001" grpc.request.fullMethod=/gitaly.RepositoryService/RepositoryExists grpc.request.payload_bytes=95 grpc.response.payload_bytes=0 grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:09.771" grpc.time_ms=1.807 pid=304 relative_path=@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git route="map[replica_path:@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git storage:default]" span.kind=server system=grpc virtual_storage=default
time="2024-09-09T11:44:10.227Z" level=info msg="transaction completed" component=praefect.StreamServerInterceptor correlation_id=ce5bd736acb575e33defb8a02b2cd2cf grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=mutator grpc.meta.method_scope=repository grpc.meta.method_type=bidi_stream grpc.method=CreateRepositoryFromBundle grpc.request.deadline="2024-09-09T17:44:10.196" grpc.request.fullMethod=/gitaly.RepositoryService/CreateRepositoryFromBundle grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:10.196" pid=304 relative_path=@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git route="map[additional_replica_path: primary:{default 0xc002ae3808} replica_path:@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git replication_targets:[] repository_id:0 secondaries:[]]" span.kind=server system=grpc transaction.committed=0/1 transaction.id=1 transaction.subtransactions=0 virtual_storage=default
time="2024-09-09T11:44:10.227Z" level=warning msg="finished streaming call with code ResourceExhausted" component=praefect.StreamServerInterceptor correlation_id=ce5bd736acb575e33defb8a02b2cd2cf error="stream terminated by RST_STREAM with error code: FLOW_CONTROL_ERROR" grpc.code=ResourceExhausted grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-web grpc.meta.method_operation=mutator grpc.meta.method_scope=repository grpc.meta.method_type=bidi_stream grpc.method=CreateRepositoryFromBundle grpc.request.deadline="2024-09-09T17:44:10.196" grpc.request.fullMethod=/gitaly.RepositoryService/CreateRepositoryFromBundle grpc.request.payload_bytes=2893094 grpc.response.payload_bytes=0 grpc.service=gitaly.RepositoryService grpc.start_time="2024-09-09T11:44:10.196" grpc.time_ms=31.378 pid=304 relative_path=@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git route="map[additional_replica_path: primary:{default 0xc002ae3808} replica_path:@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git replication_targets:[] repository_id:0 secondaries:[]]" span.kind=server system=grpc virtual_storage=default

This makes me wonder if something isn't being cleaned up on the Praefect side in between tests.