LGTM Stack POC (Loki, Grafana, Tempo, Mimir)

An observability stack based on OpenTelemetry (OTel) and the Grafana "LGTM" suite.

Architecture

Loki: Log aggregation.
Grafana: Visualization and dashboards.
Tempo: Distributed tracing.
Mimir: Scalable long-term storage for Prometheus metrics.
OTel Collector: Central gateway for receiving and routing telemetry data.
MinIO/S3: Object storage backend for long-term data retention.

Quick Start

1. Prerequisites

Docker and Docker Compose.
uv (for running the example app).

2. Environment Setup

Copy the example environment file and adjust if necessary:

cp .env.example .env

2.1 MinIO Setup (Optional)

Follow the MinIO setup instructions below if you want to use MinIO for local development.

3. Start the Stack

docker-compose up -d

This starts Loki, Tempo, Mimir, Grafana, the OTel Collector, and a local MinIO instance.

4. Run the Example Application

cd example/fastapi-app
uv sync
uv run python main.py

Trigger some data by visiting http://localhost:8000/process.
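
A single browser visit produces very little data for the dashboards. As a minimal sketch, the standard-library script below requests the endpoint repeatedly and tallies the status codes. The helper names `fetch_status` and `generate_traffic` and the request count are illustrative assumptions, not part of the example app.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are still useful traffic for the error panels.
        return err.code


def generate_traffic(url: str, count: int, fetch=fetch_status) -> Counter:
    """Call `url` `count` times and tally the status codes seen."""
    statuses = Counter()
    for _ in range(count):
        statuses[fetch(url)] += 1
    return statuses


# Example call once the app is running (50 is an arbitrary count):
# print(generate_traffic("http://localhost:8000/process", 50))
```

The `fetch` parameter is injectable so the loop can be exercised without a running server.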

Configuration: MinIO vs. AWS S3

The stack is currently configured to use MinIO for local development.

Using Local MinIO (Default)

In your .env file:

S3_ENDPOINT=host.docker.internal:9000
S3_INSECURE=true
S3_FORCE_PATH_STYLE=true
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin

Run docker compose up -d from the example/minio directory to start the MinIO instance.

Once started, MinIO is available at http://localhost:9000 with the default credentials minioadmin/minioadmin.

Go to the MinIO dashboard, and create the buckets loki-logs, tempo-traces, and mimir-metrics.

Using AWS S3

To switch to production AWS S3:

Update .env:
- S3_ENDPOINT: s3.us-east-1.amazonaws.com (or your region's endpoint).
- S3_INSECURE: false.
- S3_FORCE_PATH_STYLE: false.
- AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY: Your AWS credentials.
Ensure the buckets (loki-logs, tempo-traces, mimir-metrics) exist in your AWS account or update the bucket name variables in .env.
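
When switching between the two modes it is easy to leave a MinIO value behind. The sketch below is a hypothetical helper, not something shipped in this repository. It parses .env-style text and flags combinations that contradict the settings listed above. The variable names come from this README; the rules themselves are an assumption about what counts as a mistake.

```python
REQUIRED_KEYS = (
    "S3_ENDPOINT",
    "S3_INSECURE",
    "S3_FORCE_PATH_STYLE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_s3_settings(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    # Assumption: any endpoint ending in amazonaws.com means "AWS mode".
    if env.get("S3_ENDPOINT", "").endswith("amazonaws.com"):
        for flag in ("S3_INSECURE", "S3_FORCE_PATH_STYLE"):
            if env.get(flag, "").lower() == "true":
                problems.append(f"{flag} should be false for AWS S3")
        if env.get("AWS_ACCESS_KEY_ID") == "minioadmin":
            problems.append("MinIO default credentials used with an AWS endpoint")
    return problems
```

For example, `check_s3_settings(parse_env(open(".env").read()))` returns an empty list when nothing looks inconsistent.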

Mimir Metrics & Dashboards

Use the following table to set up your primary observability dashboard. These metrics are exported by the FastAPI application.

Panel Name	Visualization	Query (PromQL)	Description
Total Request Rate	Time series	`sum(rate(http_requests_total[$__rate_interval])) by (http_target)`	Real-time traffic per endpoint (Requests/sec).
Error Rate (%)	Stat	`100 * sum(rate(http_errors_total[$__range])) / sum(rate(http_requests_total[$__range]))`	Percentage of requests resulting in 4xx/5xx errors over the selected time range.
P95 Latency	Time series	`histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[$__rate_interval])) by (le))`	95th percentile response time for all endpoints.
Active Requests	Gauge	`sum(http_server_active_requests)`	Number of concurrent requests being processed.
Errors by Endpoint	Bar chart	`sum(increase(http_errors_total[$__range])) by (http_target)`	Total errors grouped by path over the selected time range.
Top 5 Slowest Paths	Table	`topk(5, sum(rate(http_request_duration_seconds_sum[$__range])) by (http_target) / sum(rate(http_request_duration_seconds_count[$__range])) by (http_target))`	List of endpoints with the highest average latency.
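
The P95 Latency value is an estimate, not an exact measurement: `histogram_quantile` interpolates linearly inside the bucket that contains the requested rank. The sketch below reproduces that idea in plain Python for cumulative `(upper_bound, count)` pairs. It is a simplified illustration with made-up bucket boundaries, not Prometheus or Mimir source code.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile `q` from cumulative (upper_bound, count) buckets.

    The bucket list must include the +Inf bucket, mirroring the `le` label.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for index, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[index - 1][0] if index > 0 else math.nan
            in_bucket = count - lower_count
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical bucket counts for illustration only.
example = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, example)
```

Because the result depends on bucket boundaries, wide buckets around the typical latency make the P95 panel coarse.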

How to Add a Panel

Click + Add in the top right of your dashboard -> Visualization.
Select Mimir as the data source.
Paste the Query from the table above.
Set the Title to the Panel Name.
Select the Visualization type from the right sidebar.
Click Save or Apply.

Loki Logs & Analysis

Loki allows you to query logs using LogQL. The stack is configured to automatically label logs with metadata like service_name and deployment_environment.

Key Queries

Panel Name	Visualization	Query (LogQL)	Description
Application Logs	Logs	`{service_name="fastapi-service"}`	Live stream of all logs from the FastAPI app.
Error Log Stream	Logs	`{service_name="fastapi-service"} |~ "(?i)error"`	Filtered stream showing only lines containing "error" (case-insensitive; a plain `|= "error"` filter is case-sensitive).
Log Volume	Time series	`count_over_time({service_name="fastapi-service"}[$__interval])`	Number of log lines produced per interval.
Severity Distribution	Pie chart	`sum by (level) (count_over_time({service_name="fastapi-service"}[$__range]))`	Breakdown of log levels (INFO, ERROR, WARN) for the selected time range.
Error Frequency	Time series	`count_over_time({service_name="fastapi-service"} |~ "(?i)error" [$__interval])`	Number of log lines containing "error" per interval.
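
The same LogQL can be run outside Grafana through Loki's `query_range` HTTP endpoint, which helps when checking whether logs arrived at all. The sketch below only builds the request URL. The base URL `http://localhost:3100` is an assumption (Loki's usual default port) and may not match the port mapping in this stack's docker-compose.yml. Grafana variables such as `$__interval` are not understood by the API, so use a literal range like `[5m]` there.

```python
from urllib.parse import urlencode


def loki_query_range_url(
    base_url: str, logql: str, start_ns: int, end_ns: int, limit: int = 100
) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint.

    `start_ns` and `end_ns` are Unix timestamps in nanoseconds.
    """
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


url = loki_query_range_url(
    "http://localhost:3100",  # assumed address, adjust to your setup
    '{service_name="fastapi-service"}',
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_300_000_000_000,
)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` or pasted into `curl`.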

How to Add a Log Panel

Click + Add -> Visualization.
Select Loki as the data source.
Paste one of the Queries above.
Select the matching Visualization type from the right sidebar.

Trace Correlation (Loki -> Tempo)

When viewing logs in the Explore tab or a Logs panel:

Click on a log line to expand it.
Look for the trace_id field.
Click the Tempo button next to the ID to instantly see the full distributed trace for that specific log entry.
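
This correlation only works if each log record carries the ID of the trace that was active when it was written. In this stack the OpenTelemetry instrumentation is expected to handle that. The standard-library sketch below is a stand-in that shows the mechanism without OTel: a context variable holds the current trace ID and a logging filter copies it onto every record. All names here (`current_trace_id`, `TraceIdFilter`, `build_logger`) are hypothetical.

```python
import contextvars
import logging
import sys

# Stand-in for the active span context that OTel would normally provide.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get() or "-"
        return True  # never drop the record, only enrich it


def build_logger(stream=None) -> logging.Logger:
    """Create a logger whose output includes a trace_id field."""
    handler = logging.StreamHandler(stream or sys.stderr)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    logger = logging.getLogger("trace-demo")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Setting `current_trace_id` at the start of a request and logging inside it yields lines with a `trace_id=...` field, which is the value Grafana links to Tempo.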

Advanced Observability Patterns

Beyond basic metrics, you can leverage the full power of the LGTM stack with these advanced patterns:

Pattern / Metric	Visualization	Query	Description
RED: Rate	Time series	`sum(rate(http_requests_total[$__rate_interval]))`	Request rate per second.
RED: Errors	Time series	`sum(rate(http_errors_total[$__rate_interval]))`	Error rate per second.
RED: Duration	Time series	`histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[$__rate_interval])) by (le))`	90th percentile response time.
Latency Heatmap	Heatmap	`sum(rate(http_request_duration_seconds_bucket[$__rate_interval])) by (le)`	Visual distribution of latency buckets.
Log Severity	Time series / Bar gauge	`sum by (level) (count_over_time({service_name="fastapi-service"} [$__range]))`	Monitor log health by severity over time.
Apdex Score	Stat	`(sum(rate(http_request_duration_seconds_bucket{le="0.5"}[$__range])) + sum(rate(http_request_duration_seconds_bucket{le="1.0"}[$__range]))) / 2 / sum(rate(http_request_duration_seconds_count[$__range]))`	Single score (0-1) for user satisfaction, with 0.5s as the satisfied threshold and 1.0s as the tolerated threshold.
Resource Grouping	Time series	`sum(rate(http_requests_total[$__rate_interval])) by (service_version, deployment_environment)`	Compare performance across versions/environments.
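
Apdex counts satisfied requests fully and tolerated requests at half weight. Histogram buckets are cumulative, so the `le="1.0"` bucket already contains everything in the `le="0.5"` bucket, and averaging the two bucket values gives the same result as "satisfied plus half of tolerated". The sketch below works through the arithmetic with made-up counts.

```python
import math


def apdex(le_satisfied: float, le_tolerated: float, total: float) -> float:
    """Apdex from cumulative histogram counts.

    le_satisfied: requests at or below the satisfied threshold (e.g. le="0.5").
    le_tolerated: requests at or below the tolerated threshold (e.g. le="1.0");
                  cumulative, so it includes the satisfied requests.
    total:        all requests (the _count series).
    """
    if total == 0:
        return math.nan
    return (le_satisfied + le_tolerated) / 2 / total


# Hypothetical counts: 80 fast, 15 more that are tolerable, 5 slow.
score = apdex(le_satisfied=80, le_tolerated=95, total=100)

# Same result using the textbook form: satisfied + tolerating / 2.
textbook = (80 + (95 - 80) / 2) / 100
```

A score near 1 means almost all requests finished within the satisfied threshold.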

Tip

Replace fastapi-service in the queries above with your own application's service name.

Dynamic Time Ranges: Instead of hardcoding [5m], use Grafana global variables:
[$__range]: Adjusts to the exact time period selected in the dashboard picker (e.g., Last 1 hour). Use this for total counts (with increase()) or "Stat" panels.
[$__rate_interval]: Automatically calculates the best interval for rate() based on the graph's time range and resolution. Use this for Time series graphs.
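
As a rough illustration of why `$__rate_interval` is safer than `$__interval` for `rate()`: Grafana documents it as the larger of `$__interval` plus the scrape interval and four times the scrape interval, which keeps enough samples in each window. The sketch below assumes a 15-second scrape interval, which is Grafana's default data source setting and may differ from this stack's configuration.

```python
def rate_interval(interval_s: float, scrape_interval_s: float = 15.0) -> float:
    """Approximate Grafana's $__rate_interval in seconds.

    interval_s:        the panel's $__interval (from time range and resolution).
    scrape_interval_s: the data source's scrape interval (assumed 15s here).
    """
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Zoomed in: $__interval is small, so the floor of 4 scrapes applies.
zoomed_in = rate_interval(5)

# Zoomed out: $__interval dominates.
zoomed_out = rate_interval(120)
```

With a very short `$__interval`, a plain `rate(...[$__interval])` can span fewer than two samples and return no data; the four-scrape floor avoids that.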

Debugging Tips

Unhealthy Ring: If Mimir/Loki report ring issues, ensure replication_factor is set to 1 in the YAML configs for single-node setups.
Log Ingestion: Check the OTel Collector logs (docker logs otel-collector) to see if data is being received and exported correctly.
S3 Connectivity: Ensure the S3 endpoint is reachable from within the Docker containers. On macOS, host.docker.internal is used to reach the host's port 9000.
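
For the S3 connectivity check, a plain TCP probe is often enough to tell a network problem from a credentials problem. This is a minimal sketch using only the standard library. Running it from inside a container assumes that container has Python available, which is not guaranteed for the images in this stack.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call with the default MinIO endpoint from .env:
# print(can_connect("host.docker.internal", 9000))
```

A `False` result points at networking or name resolution; a `True` result with failing uploads points at credentials, bucket names, or path-style settings.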
