Analyze job duration and adjust queue weights based on execution time

Problem

Queue weights should consider job execution time to prevent long-running jobs from starving other queues. Currently, we don't have a systematic approach to factoring job duration into weight assignments.

Potential issues:

Long-running jobs with high weights can monopolize worker threads
Quick jobs with low weights may experience unnecessary delays
No documented relationship between job duration and appropriate weight

Proposal

Analyze job execution times across all queues and adjust weights to balance throughput and fairness.

Analysis Needed

For each queue, gather metrics on:

Job duration (P50, P95, P99 percentiles)
Queue depth during normal operations
Job frequency (jobs per hour/day)
Failure rates and retry patterns
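
As a rough sketch of how the duration percentiles could be computed once raw samples are available, the snippet below uses the nearest-rank method. The `percentile` and `duration_summary` helpers and the sample data are hypothetical, and a metrics backend may already provide percentiles directly.

```ruby
# Nearest-rank percentile over a list of job durations (in seconds).
# Hypothetical helper; not part of Sidekiq or the existing codebase.
def percentile(values, pct)
  raise ArgumentError, "values must not be empty" if values.empty?

  sorted = values.sort
  # Nearest-rank: the smallest value with at least pct% of samples at or below it.
  rank = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

def duration_summary(durations)
  {
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99)
  }
end

# Made-up sample: mostly quick jobs with two slow outliers.
durations = [0.2, 0.3, 0.25, 0.4, 12.0, 0.35, 0.3, 0.28, 0.31, 45.0]
puts duration_summary(durations)
```

With a sample like this, the P50 stays low while P95 and P99 are dominated by the outliers, which is why the average alone can be misleading for weight decisions.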

Queues to Investigate

Potentially long-running (may need lower weights):

usage_billing (weight 2): ClickHouse operations, data processing
- Billing::Usage::ConsumptionJob
- Billing::Usage::EnrichmentJob
- ExportChDataToS3Job
salesforce (weight 4): External API calls with potential timeouts
- Salesforce::CreateOpportunityJob
- Salesforce::CreateQuoteForReconciliationJob
zuora (weight 4): Complex synchronization operations
- Zuora::RefreshLocalSubscriptionsJob
- Zuora::SyncResourceJob

Potentially quick (could have higher weights):

mailers (weight 2): Email delivery (usually fast)
expiration (weight 3): Simple status updates
health_check (weight 4): Quick health checks
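
To make the current weights easier to reason about, the sketch below estimates how often each listed queue would be checked first, assuming Sidekiq's weighted random queue ordering. It covers only the queues named in this issue; the real configuration may contain more queues, which would lower every share. The `first_check_share` helper is hypothetical.

```ruby
# Weights as listed in this issue. Any queues not listed here are ignored,
# so the shares below are an upper bound on the real values.
LISTED_WEIGHTS = {
  "usage_billing" => 2,
  "salesforce"    => 4,
  "zuora"         => 4,
  "mailers"       => 2,
  "expiration"    => 3,
  "health_check"  => 4
}.freeze

# Fraction of fetches in which each queue is checked first, assuming
# weighted random ordering across only the queues given.
def first_check_share(weights)
  total = weights.values.sum.to_f
  weights.transform_values { |weight| (weight / total).round(3) }
end

first_check_share(LISTED_WEIGHTS).each do |queue, share|
  puts format("%-14s %5.1f%%", queue, share * 100)
end
```

Note that this share describes how often a queue is polled first, not how much worker time it consumes. A queue with long jobs occupies a thread for much longer per fetch, which is the imbalance this issue aims to quantify.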

Weight Assignment Guidelines

Based on analysis, establish guidelines like:

Quick jobs (< 1 second average):

Can have higher weights (7-10) without blocking
Examples: Health checks, simple notifications, status updates

Medium jobs (1-10 seconds average):

Moderate weights (4-6) appropriate
Examples: API calls, database operations, email sending

Long jobs (10-30 seconds average):

Lower weights (2-3) to prevent starvation
Examples: Bulk data processing, complex synchronization, report generation

Very long jobs (> 30 seconds average):

Lowest weights (1-2) or consider breaking into smaller jobs
Examples: Large data exports, comprehensive audits
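
The draft bands above could be encoded as a small lookup so that proposed weights can be checked against measured averages. This is a minimal sketch: the `suggested_band` helper is hypothetical, the thresholds are the starting points from this proposal rather than measured values, and a 10-30 second range is assumed for the "long" band so that it does not overlap with "very long".

```ruby
# Maps an average job duration (seconds) to the draft weight band from the
# guidelines. Boundaries: < 1 quick, 1-10 medium, over 10 up to 30 long,
# over 30 very long.
def suggested_band(average_seconds)
  raise ArgumentError, "duration must be non-negative" if average_seconds.negative?

  if average_seconds < 1
    { label: :quick, weights: 7..10 }
  elsif average_seconds <= 10
    { label: :medium, weights: 4..6 }
  elsif average_seconds <= 30
    { label: :long, weights: 2..3 }
  else
    { label: :very_long, weights: 1..2 }
  end
end

# Example: check whether a queue's current weight falls inside its band.
band = suggested_band(12.5)
current_weight = 4
puts "band=#{band[:label]} in_range=#{band[:weights].cover?(current_weight)}"
```

Business priority is deliberately left out of this lookup; it would still need to be weighed by hand alongside the band.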

Implementation Steps

Gather production metrics (last 30 days):

# Example query for Sidekiq metrics
# - Job duration by queue
# - Queue depth over time
# - Job throughput
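
If per-queue durations are not already being collected, one possible approach is a Sidekiq server middleware that times each job. The sketch below is an assumption-laden illustration: `JobDurationRecorder` and its in-memory `SAMPLES` store are hypothetical, and a real setup would more likely emit to an existing metrics backend.

```ruby
# Hypothetical Sidekiq server middleware that records per-queue job durations.
# The in-memory store is for illustration only; it is per-process and unbounded.
class JobDurationRecorder
  SAMPLES = Hash.new { |hash, queue| hash[queue] = [] }
  LOCK = Mutex.new

  # Sidekiq calls server middleware with the job instance, the job payload
  # hash, and the queue name, and expects the middleware to yield.
  def call(_job_instance, _job_payload, queue)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    # Runs for failed jobs too, so durations of failures are included.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    LOCK.synchronize { SAMPLES[queue] << elapsed }
  end
end

# Registration is skipped unless Sidekiq is loaded.
if defined?(Sidekiq)
  Sidekiq.configure_server do |config|
    config.server_middleware { |chain| chain.add JobDurationRecorder }
  end
end
```

Using the monotonic clock avoids skew from wall-clock adjustments during a job's run.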

Analyze patterns:
- Identify queues with high variance in job duration
- Find queues where long jobs block quick jobs
- Look for correlation between queue depth and job duration
Propose weight adjustments:
- Document current vs. proposed weights
- Explain rationale based on metrics
- Consider business priority alongside duration
Test in staging:
- Simulate production load
- Measure impact on queue latency
- Verify no unintended consequences (for example, increased latency or starvation in other queues)
Monitor after deployment:
- Track queue depth changes
- Monitor job latency (time from enqueue to start of execution)
- Watch for customer-reported issues
Document findings:
- Create guidelines for future queue weight assignments
- Include typical job durations for each queue
- Establish process for periodic review
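
For the "Analyze patterns" step, one way to identify queues with high variance in job duration is the coefficient of variation (standard deviation divided by mean). The helpers, the sample data, and the 1.0 threshold below are all hypothetical and would need tuning against real metrics.

```ruby
# Coefficient of variation (population standard deviation / mean) for a
# list of durations in seconds. Returns 0.0 when there is too little data.
def coefficient_of_variation(durations)
  return 0.0 if durations.length < 2

  mean = durations.sum / durations.length.to_f
  return 0.0 if mean.zero?

  variance = durations.sum { |d| (d - mean)**2 } / durations.length
  Math.sqrt(variance) / mean
end

# Names of queues whose durations vary more than the given threshold.
# The default threshold is an arbitrary illustration, not a recommendation.
def high_variance_queues(samples_by_queue, threshold: 1.0)
  samples_by_queue
    .select { |_queue, durations| coefficient_of_variation(durations) > threshold }
    .keys
end

# Made-up samples for two of the queues named in this issue.
samples = {
  "health_check"  => [0.05, 0.06, 0.05, 0.07],
  "usage_billing" => [2.0, 3.0, 90.0, 1.5]
}
puts high_variance_queues(samples).inspect
```

Queues flagged this way are candidates for splitting into separate queues for quick and slow jobs rather than for a single weight change.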

Success Criteria

All queues have documented job durations (average plus P50, P95, and P99)
Weight assignments consider both business priority and execution time
No queue experiences starvation due to long-running jobs in higher-weighted queues
Clear guidelines exist for assigning weights to new queues

Parent epic: &19587
Related: #14268 (weight granularity)
Related: #14270 (user-facing vs internal)