diff --git a/tracing/README.md b/tracing/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d014b84f406946c80739f26d5b9a2155e04626c0 --- /dev/null +++ b/tracing/README.md @@ -0,0 +1,621 @@ +# Tracing Package + +The `tracing` package is LabKit's primary entrypoint for distributed tracing functionality. It provides a unified interface for instrumenting Go applications with distributed tracing capabilities while abstracting away the underlying tracing implementation details. + +## Overview + +This package uses OpenTracing internally but avoids leaking this abstraction, allowing LabKit to potentially replace OpenTracing with other distributed tracing interfaces (like Zipkin or OpenCensus) without requiring application changes. + +## Key Features + +- **Environment-based configuration** via `GITLAB_TRACING` environment variable +- **HTTP middleware** for instrumenting incoming HTTP requests +- **HTTP RoundTripper** for instrumenting outbound HTTP requests +- **gRPC interceptors** for both client and server-side tracing +- **Process-to-process tracing** via environment variable injection/extraction +- **Multiple tracer backends** support (Jaeger, Datadog, Lightstep, Stackdriver) +- **Correlation ID integration** for request tracking +- **Sampling status checking** for conditional instrumentation + +## Installation and Compilation + +### Build Tags + +Go's OpenTracing interface requires tracing implementations to be compiled into the application using build tags: + +- `tracer_static` - Enables static plugin registry support +- `tracer_static_[DRIVER_NAME]` - Compiles support for a specific driver + +**Examples:** + +```bash +# Compile with Jaeger support +go build -tags "tracer_static,tracer_static_jaeger" + +# Compile with multiple drivers +go build -tags "tracer_static,tracer_static_jaeger,tracer_static_lightstep,tracer_static_datadog,tracer_static_stackdriver" +``` + +If the `GITLAB_TRACING` environment variable references an unknown or 
unregistered driver, the application will log a message and continue without tracing (graceful degradation). + +## Configuration + +### Connection String Format + +Tracing is configured via the `GITLAB_TRACING` environment variable using a connection string format: + +``` +opentracing://[DRIVER]?[PARAMETERS] +``` + +**Supported Drivers:** + +#### Jaeger +```bash +export GITLAB_TRACING="opentracing://jaeger?udp_endpoint=localhost:6831" +export GITLAB_TRACING="opentracing://jaeger?http_endpoint=http://localhost:14268/api/traces" +export GITLAB_TRACING="opentracing://jaeger?sampler=probabilistic&sampler_param=0.1" +``` + +**Parameters:** +- `service_name` - Service name (overrides initialization option) +- `debug` - Enable debug logging +- `sampler` - Sampler type (const, probabilistic, ratelimiting, remote) +- `sampler_param` - Sampler parameter (float) +- `http_endpoint` - HTTP collector endpoint +- `udp_endpoint` - UDP agent endpoint + +#### Datadog +```bash +export GITLAB_TRACING="opentracing://datadog" +``` + +**Parameters:** +- `service_name` - Service name + +#### Lightstep +```bash +export GITLAB_TRACING="opentracing://lightstep?access_token=YOUR_TOKEN" +``` + +**Parameters:** +- `service_name` - Service name +- `access_token` - Lightstep access token (required) + +#### Stackdriver +```bash +export GITLAB_TRACING="opentracing://stackdriver?project_id=my-project&sampler_probability=0.001" +``` + +**Parameters:** +- `project_id` - GCP project ID +- `service_name` - Service name +- `location` - GCP location +- `sampler_probability` - Sampling probability (0.0 to 1.0) +- `bundle_delay_threshold` - Delay threshold (duration string) +- `bundle_count_threshold` - Count threshold (integer) +- `trace_spans_buffer_max_bytes` - Buffer size (integer) +- `timeout` - Timeout (duration string) +- `number_of_workers` - Number of workers (integer) + +## Core Functions + +### Initialize + +Initializes the global distributed tracer. 
+ +```go +func Initialize(opts ...InitializationOption) io.Closer +``` + +**Options:** +- `WithServiceName(serviceName string)` - Sets the service name for the tracer +- `WithConnectionString(connectionString string)` - Overrides the `GITLAB_TRACING` environment variable + +**Example:** + +```go +import "gitlab.com/gitlab-org/labkit/tracing" + +func main() { + closer := tracing.Initialize( + tracing.WithServiceName("gitlab-workhorse"), + ) + defer closer.Close() + + // Application code... +} +``` + +### HTTP Handler Middleware + +Instruments incoming HTTP requests by extracting tracing information from headers and creating spans. + +```go +func Handler(h http.Handler, opts ...HandlerOption) http.Handler +``` + +**Options:** +- `WithRouteIdentifier(routeIdentifier string)` - Sets a custom route identifier for the operation name + +**Features:** +- Extracts tracing context from incoming request headers +- Creates server-side spans +- Automatically skips health check endpoints (`/-/liveness`, `/-/readiness`) +- Integrates with correlation IDs + +**Example:** + +```go +http.Handle("/api/users", + tracing.Handler( + http.HandlerFunc(handleUsers), + tracing.WithRouteIdentifier("/api/users"), + ), +) +``` + +### HTTP RoundTripper + +Instruments outbound HTTP requests by injecting tracing headers. + +```go +func NewRoundTripper(delegate http.RoundTripper, opts ...RoundTripperOption) http.RoundTripper +``` + +**Features:** +- Injects tracing headers into outbound requests +- Creates client-side spans +- Tracks HTTP client events (connection, TLS handshake, headers, etc.) 
+- Logs request completion and errors + +**Example:** + +```go +client := &http.Client{ + Transport: tracing.NewRoundTripper(http.DefaultTransport), +} + +req, _ := http.NewRequest("GET", "http://api.example.com/data", nil) +req = req.WithContext(ctx) // Important: pass context for span propagation +resp, err := client.Do(req) +``` + +### gRPC Interceptors + +Located in the `tracing/grpc` subpackage, these interceptors provide tracing for gRPC services. + +```go +func UnaryServerTracingInterceptor() grpc.UnaryServerInterceptor +func StreamServerTracingInterceptor() grpc.StreamServerInterceptor +func UnaryClientTracingInterceptor() grpc.UnaryClientInterceptor +func StreamClientTracingInterceptor() grpc.StreamClientInterceptor +``` + +**Features:** +- Automatically filters out health check calls (`/grpc.health.v1.Health/Check`) +- Supports both unary and streaming RPCs +- Integrates with OpenTracing middleware + +**Example:** + +```go +import grpccorrelation "gitlab.com/gitlab-org/labkit/tracing/grpc" + +// Server +server := grpc.NewServer( + grpc.UnaryInterceptor(grpccorrelation.UnaryServerTracingInterceptor()), + grpc.StreamInterceptor(grpccorrelation.StreamServerTracingInterceptor()), +) + +// Client +conn, err := grpc.Dial( + address, + grpc.WithUnaryInterceptor(grpccorrelation.UnaryClientTracingInterceptor()), + grpc.WithStreamInterceptor(grpccorrelation.StreamClientTracingInterceptor()), +) +``` + +## Process-to-Process Tracing + +### Environment Injection (Parent Process) + +Injects tracing information into environment variables for child processes. + +```go +func NewEnvInjector(opts ...EnvInjectorOption) EnvInjector +``` + +**Returns:** An `EnvInjector` function that takes a context and environment slice, returning an updated environment with tracing data. 
+ +**Example:** + +```go +envInjector := tracing.NewEnvInjector() + +cmd := exec.Command("git", "clone", "...") +env := []string{"PATH=/usr/bin"} + +// Inject tracing and correlation ID into environment +cmd.Env = envInjector(ctx, env) + +if err := cmd.Run(); err != nil { + log.Fatal(err) +} +``` + +**Injected Variables:** +- `CORRELATION_ID` - Current correlation ID +- `GITLAB_TRACING` - Tracing configuration +- Trace and span identifiers (implementation-specific) + +### Environment Extraction (Child Process) + +Extracts tracing information from environment variables in a child process. + +```go +func ExtractFromEnv(ctx context.Context, opts ...ExtractFromEnvOption) (context.Context, func()) +``` + +**Returns:** +- Updated context with span information +- Cleanup function (should be deferred) + +**Example:** + +```go +func main() { + // Keep the closer so the tracer is shut down when the process exits + closer := tracing.Initialize(tracing.WithServiceName("git-subprocess")) + defer closer.Close() + + ctx, finished := tracing.ExtractFromEnv(context.Background()) + defer finished() + + // Process execution with tracing context... +} +``` + +## Sampling + +### IsSampled + +Checks whether a span is being sampled (will be sent to the tracing backend). + +```go +func IsSampled(span opentracing.Span) bool +``` + +**Use Cases:** +- Conditionally enable expensive instrumentation (e.g., Git Trace2) +- Optimize performance by skipping data collection for unsampled spans +- Make informed decisions about additional telemetry + +**Example:** + +```go +span := opentracing.SpanFromContext(ctx) +if tracing.IsSampled(span) { + // Enable expensive Git Trace2 instrumentation + enableGitTrace2() +} +``` + +## Correlation Integration + +The tracing package integrates with LabKit's correlation package to propagate correlation IDs. + +### Baggage Handler + +Sets OpenTracing baggage items with the current correlation ID. 
+ +```go +import tracingcorrelation "gitlab.com/gitlab-org/labkit/tracing/correlation" + +http.Handle("/api", + tracingcorrelation.BaggageHandler( + tracing.Handler(handler), + ), +) +``` + +**Features:** +- Automatically sets correlation ID as baggage on spans +- Extracts correlation ID from span baggage if not in context +- Bidirectional correlation ID synchronization + +## Complete Example + +```go +package main + +import ( + "context" + "fmt" + "io" + "net/http" + "time" + + "gitlab.com/gitlab-org/labkit/tracing" +) + +func main() { + // Initialize tracing + closer := tracing.Initialize( + tracing.WithServiceName("gitlab-example"), + ) + defer closer.Close() + + // Create HTTP client with tracing + tr := &http.Transport{ + MaxIdleConns: 10, + IdleConnTimeout: 30 * time.Second, + } + client := &http.Client{ + Transport: tracing.NewRoundTripper(tr), + } + + // Set up HTTP handlers with tracing + http.Handle("/foo", + tracing.Handler( + http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + // Make traced outbound request + req, err := http.NewRequest("GET", "http://localhost:8080/bar", nil) + if err != nil { + w.WriteHeader(http.StatusInternalServerError) + return + } + + // Important: pass context for span propagation + req = req.WithContext(r.Context()) + + resp, err := client.Do(req) + if err != nil { + w.WriteHeader(http.StatusInternalServerError) + return + } + defer resp.Body.Close() + + io.Copy(w, resp.Body) + }), + tracing.WithRouteIdentifier("/foo"), + ), + ) + + http.Handle("/bar", + tracing.Handler( + http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + fmt.Fprintf(w, "bar") + }), + tracing.WithRouteIdentifier("/bar"), + ), + ) + + http.ListenAndServe(":8080", nil) +} +``` + +## Architecture + +### Package Structure + +``` +tracing/ +├── doc.go # Package documentation +├── initialization.go # Global tracer initialization +├── initialization_options.go # Initialization configuration +├── inbound_http.go # HTTP server middleware 
+├── inbound_http_options.go # HTTP handler options +├── outbound_http.go # HTTP client RoundTripper +├── outbound_http_options.go # RoundTripper options +├── env_injector.go # Parent process environment injection +├── env_injector_option.go # Injector configuration +├── env_extractor.go # Child process environment extraction +├── env_extractor_option.go # Extractor configuration +├── sampling.go # Sampling status checking +├── errors.go # Error definitions +├── connstr/ # Connection string parser +│ └── connection_string_parser.go +├── correlation/ # Correlation ID integration +│ └── baggage_handler.go +├── grpc/ # gRPC interceptors +│ ├── client_interceptors.go +│ ├── server_interceptors.go +│ └── healthcheck_filter.go +└── impl/ # Tracer implementations + ├── tracer_registry.go # Registry for tracer factories + ├── static_tracer.go # Static tracer loader + ├── null_tracer.go # No-op tracer (no build tags) + ├── jaeger_tracer.go # Jaeger implementation + ├── datadog_tracer.go # Datadog implementation + ├── lightstep_tracer.go # Lightstep implementation + ├── stackdriver_tracer.go # Stackdriver implementation + ├── default_sampling.go # Default sampling checker + ├── jaeger_sampling.go # Jaeger sampling support + └── lightstep_sampling.go # Lightstep sampling support +``` + +### Design Principles + +1. **No Leaky Abstractions** - The package hides OpenTracing implementation details +2. **Graceful Degradation** - Missing or misconfigured tracers don't crash the application +3. **Low Overhead** - Without initialization, tracing has minimal performance impact +4. **Consistent Configuration** - Same configuration format across Go and Ruby components +5. **Conditional Compilation** - Only compile the tracer backends you need + +## Implementation Details + +### Tracer Registry + +The `impl` package maintains a registry of tracer factory functions. 
Each tracer implementation registers itself during initialization: + +```go +func registerTracer(name string, factory tracerFactoryFunc) +``` + +Tracer factories are conditionally compiled based on build tags and automatically register themselves in `init()` functions. + +### HTTP Client Tracing + +The `tracingRoundTripper` wraps an HTTP transport and: +1. Extracts the parent span from the request context +2. Creates a child span for the outbound request +3. Injects tracing headers into the request +4. Tracks HTTP client events using `httptrace.ClientTrace`: + - Connection start/done + - TLS handshake start/done + - Headers written + - Request written + - First response byte received +5. Logs errors and completion status + +### HTTP Server Tracing + +The `Handler` middleware: +1. Skips health check endpoints (`/-/liveness`, `/-/readiness`) +2. Extracts tracing context from incoming request headers +3. Creates a server-side span +4. Adds correlation ID as a span tag if available +5. Passes the span through the request context + +### Operation Naming + +**Default Behavior:** +- **Inbound HTTP:** `{METHOD} {PATH}` (e.g., `GET /api/users`) +- **Outbound HTTP:** `{METHOD} {SCHEME}://{HOST}` (e.g., `GET https://api.example.com`) + +**Custom Naming:** +Use `WithRouteIdentifier()` to set consistent operation names for routes with dynamic parameters: + +```go +// Instead of: GET /users/123, GET /users/456 +// Use: GET /users/:id +tracing.Handler(handler, tracing.WithRouteIdentifier("/users/:id")) +``` + +### Correlation ID Propagation + +The package integrates with LabKit's correlation package: + +1. **HTTP Requests:** Correlation IDs are added as span tags +2. **Environment Variables:** Correlation IDs are passed via `CORRELATION_ID` env var +3. 
**Baggage:** The `BaggageHandler` synchronizes correlation IDs with span baggage + +## Error Handling + +The package defines: +- `tracing.ErrConfiguration` - Returned when the tracer is not properly configured +- `impl.ErrConfiguration` - Base error for implementation-specific configuration errors + +All configuration errors are logged but don't prevent the application from starting. The tracer falls back to a no-op implementation. + +## Advanced Usage + +### Sampling-Based Instrumentation + +Use `IsSampled()` to conditionally enable expensive instrumentation: + +```go +import ( + "context" + + "github.com/opentracing/opentracing-go" + "gitlab.com/gitlab-org/labkit/tracing" +) + +func processGitCommand(ctx context.Context) { + span := opentracing.SpanFromContext(ctx) + + if tracing.IsSampled(span) { + // Enable Git Trace2 only for sampled requests + // This provides deeper Git process insights without + // overwhelming the system + enableGitTrace2() + } + + // Execute Git command... +} +``` + +### Multi-Service Tracing + +When spawning child processes that should participate in the same trace, inject the trace context in the parent and extract it in the child: + +**Parent Process:** +```go +envInjector := tracing.NewEnvInjector() + +cmd := exec.Command("./child-service") +cmd.Env = envInjector(ctx, os.Environ()) +if err := cmd.Run(); err != nil { + log.Fatal(err) +} +``` + +**Child Process:** +```go +func main() { + // Keep the closer so the tracer is shut down when the process exits + closer := tracing.Initialize(tracing.WithServiceName("child-service")) + defer closer.Close() + + ctx, finished := tracing.ExtractFromEnv(context.Background()) + defer finished() + + // Child process work with inherited trace context... +} +``` + +### Custom Operation Names + +Provide custom operation names for better trace organization: + +```go +// For HTTP handlers +handler := tracing.Handler( + myHandler, + tracing.WithRouteIdentifier("/api/v1/projects/:id"), +) + +// Operation name will be: "GET /api/v1/projects/:id" +// instead of: "GET /api/v1/projects/123" +``` + +## Best Practices + +1. **Always pass context** - Ensure `req.WithContext(ctx)` is called for outbound requests +2. 
**Use route identifiers** - Avoid high-cardinality operation names with dynamic IDs +3. **Defer cleanup** - Always defer the closer returned by `Initialize()` and the cleanup function returned by `ExtractFromEnv()` +4. **Compile selectively** - Only include tracer backends you actually use +5. **Test without tracing** - Ensure your application works when tracing is disabled +6. **Check sampling** - Use `IsSampled()` before enabling expensive instrumentation +7. **Health checks** - The package automatically excludes health checks from tracing, so they need no special handling + +## Compatibility + +This package is designed to work consistently across GitLab's polyglot architecture: +- **Go services:** Workhorse, Gitaly, GitLab Shell +- **Ruby services:** GitLab Rails (via `Gitlab::Tracing`) + +The shared `GITLAB_TRACING` environment variable allows unified configuration across all services, making it easy to enable tracing for an entire GitLab installation with a single configuration change. + +## Troubleshooting + +### Tracing Not Working + +1. **Check build tags** - Ensure you compiled with `tracer_static` and the appropriate driver tag +2. **Verify environment variable** - Check that `GITLAB_TRACING` is set correctly +3. **Check logs** - Look for initialization messages or configuration errors +4. **Test connection string** - Verify the connection string format is valid + +### No Spans Appearing + +1. **Check sampling** - The tracer may be configured with low sampling rates +2. **Verify context propagation** - Ensure contexts are passed through request chains +3. **Check backend connectivity** - Verify the tracing backend is reachable +4. **Review health checks** - Health check endpoints are automatically excluded + +### Performance Issues + +1. **Adjust sampling rates** - Lower the sampling probability +2. **Use conditional instrumentation** - Check `IsSampled()` before expensive operations +3. **Review buffer settings** - Adjust backend-specific buffer configurations +4. 
**Monitor overhead** - Tracing should have minimal impact when properly configured + +## Related Packages + +- `gitlab.com/gitlab-org/labkit/correlation` - Request correlation ID management +- `gitlab.com/gitlab-org/labkit/log` - Structured logging with tracing integration +- `github.com/opentracing/opentracing-go` - OpenTracing API (internal dependency)