# Access URLs & Monitoring New Applications Guide

## 🌐 Access URLs

### Required (Already Configured)

**Grafana - Main Dashboard**
- **URL**: https://grafana.betelgeusebytes.io
- **DNS Required**: Yes - `grafana.betelgeusebytes.io` → your cluster IP
- **Login**: admin / admin (change on first login!)
- **Purpose**: Unified interface for logs, metrics, and traces
- **Ingress**: Already included in deployment (20-grafana-ingress.yaml)

### Optional (Direct Component Access)

You can optionally expose these components directly:

**Prometheus - Metrics UI**
- **URL**: https://prometheus.betelgeusebytes.io
- **DNS Required**: Yes - `prometheus.betelgeusebytes.io` → your cluster IP
- **Purpose**: Direct access to the Prometheus UI to query metrics and check targets
- **Deploy**: `kubectl apply -f 21-optional-ingresses.yaml`
- **Use Case**: Debugging metric collection, advanced PromQL queries

**Loki - Logs API**
- **URL**: https://loki.betelgeusebytes.io
- **DNS Required**: Yes - `loki.betelgeusebytes.io` → your cluster IP
- **Purpose**: Direct API access for log queries
- **Deploy**: `kubectl apply -f 21-optional-ingresses.yaml`
- **Use Case**: External log forwarding, API integration

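Loki's HTTP API serves range queries at `/loki/api/v1/query_range`. As a sketch of the API-integration use case, here is how a request URL can be built with the Python standard library (the hostname is the example domain above; the actual fetch must run from a network that can reach it):

```python
from urllib.parse import urlencode

# Build a query_range request against the (example) Loki ingress.
base = "https://loki.betelgeusebytes.io/loki/api/v1/query_range"
params = urlencode({
    "query": '{namespace="my-namespace"} |= "error"',  # LogQL, URL-encoded
    "limit": 100,
})
url = f"{base}?{params}"
print(url)
# Fetch with urllib.request.urlopen(url) or curl, from inside the cluster
# or anywhere the ingress hostname resolves.
```

The same endpoint is what Grafana's Loki data source calls under the hood, so any query that works in Explore can be reused here verbatim.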
**Tempo - Traces API**
- **URL**: https://tempo.betelgeusebytes.io
- **DNS Required**: Yes - `tempo.betelgeusebytes.io` → your cluster IP
- **Purpose**: Direct API access for trace queries
- **Deploy**: `kubectl apply -f 21-optional-ingresses.yaml`
- **Use Case**: External trace ingestion, API integration

### Internal Only (No DNS Required)

These are ClusterIP services accessible only from within the cluster:

```
http://prometheus.observability.svc.cluster.local:9090
http://loki.observability.svc.cluster.local:3100
http://tempo.observability.svc.cluster.local:3200
http://tempo.observability.svc.cluster.local:4317   # OTLP gRPC
http://tempo.observability.svc.cluster.local:4318   # OTLP HTTP
```

## 🎯 Recommendation

**For most users**: Just use Grafana (grafana.betelgeusebytes.io)
- Grafana provides unified access to all components
- No need to expose Prometheus, Loki, or Tempo directly
- Simpler DNS configuration (only one subdomain)

**For power users**: Add the optional ingresses
- Direct Prometheus access is useful for debugging
- Helps verify targets and scrape configs
- Deploy with: `kubectl apply -f 21-optional-ingresses.yaml`

## 📊 Monitoring New Applications

### Automatic: Kubernetes Logs

**All pod logs are automatically collected!** No configuration needed.

Alloy runs as a DaemonSet and automatically:
1. Discovers all pods in the cluster
2. Reads logs from `/var/log/pods/`
3. Sends them to Loki with labels:
   - `namespace`
   - `pod`
   - `container`
   - `node`
   - All pod labels

**View in Grafana:**
```logql
# All logs from your app
{namespace="your-namespace", pod=~"your-app.*"}

# Error logs only
{namespace="your-namespace"} |= "error"

# JSON logs parsed
{namespace="your-namespace"} | json | level="error"
```

**Best Practice for Logs:**
Emit structured JSON logs from your application:

```python
import json
import logging

# Python example
logging.basicConfig(
    format='%(message)s',
    level=logging.INFO
)

logger = logging.getLogger(__name__)

# Log as JSON
logger.info(json.dumps({
    "level": "info",
    "message": "User login successful",
    "user_id": "123",
    "ip": "1.2.3.4",
    "duration_ms": 42
}))
```

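One optional refinement, sketched here (it is not part of the deployment files): wrap the JSON encoding in a `logging.Formatter` so call sites don't build `json.dumps` payloads themselves. The `fields` key passed via `extra` is a convention invented for this example:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON line (the shape Loki's
    `| json` parser expects)."""
    def format(self, record):
        entry = {
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Merge structured fields passed via extra={"fields": {...}}
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("my-app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User login successful", extra={"fields": {"user_id": "123"}})
```

Each record then reaches Loki as one JSON object per line, with `level`, `message`, and any structured fields as top-level keys.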
### Manual: Application Metrics

#### Step 1: Expose Metrics Endpoint

Your application needs to expose metrics at `/metrics` in Prometheus format.

**Python (Flask) Example:**
```python
from flask import Flask
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
metrics = PrometheusMetrics(app)

# The /metrics endpoint is now available, with automatic
# metrics such as request count and duration.
```

**Python (FastAPI) Example:**
```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
Instrumentator().instrument(app).expose(app)

# The /metrics endpoint is now available
```

**Go Example:**
```go
import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}
```

**Node.js Example:**
```javascript
// Assumes an Express app, as implied by app.get below
const express = require('express');
const promClient = require('prom-client');

const app = express();

// Create default metrics
const register = new promClient.Registry();
promClient.collectDefaultMetrics({ register });

// Expose /metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
```

#### Step 2: Add Prometheus Annotations to Your Deployment

Add these annotations to your pod template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # Enable scraping
        prometheus.io/port: "8080"     # Port where metrics are exposed
        prometheus.io/path: "/metrics" # Path to metrics (optional; /metrics is the default)
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - name: http
              containerPort: 8080
```

#### Step 3: Verify Metrics Collection

**Check in Prometheus:**
1. Access the Prometheus UI (if exposed): https://prometheus.betelgeusebytes.io
2. Go to Status → Targets
3. Look for your pod under "kubernetes-pods"
4. It should show as "UP"

**Or via Grafana:**
1. Go to Explore → Prometheus
2. Query: `up{pod=~"my-app.*"}`
3. It should return value=1

**Query your metrics:**
```promql
# Request rate
rate(http_requests_total{namespace="my-namespace"}[5m])

# Request duration, 95th percentile
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate
rate(http_requests_total{namespace="my-namespace", status=~"5.."}[5m])
```

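For intuition about `histogram_quantile`: Prometheus histograms expose cumulative bucket counters (the `le` label is each bucket's upper bound), and the function linearly interpolates within the bucket that contains the requested rank. A simplified Python sketch of that calculation, with made-up bucket values:

```python
def approx_histogram_quantile(q, buckets):
    """Interpolate the q-quantile from cumulative (upper_bound, count)
    buckets, sorted by bound and ending with (inf, total) - mirroring a
    Prometheus histogram's _bucket series."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # rank falls in the open-ended bucket
            # Linear interpolation within this bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# Made-up data: 50 requests took <=0.1s, 90 took <=0.5s, all 100 took <=1.0s
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
print(approx_histogram_quantile(0.95, buckets))  # 0.75
```

Prometheus applies extra edge-case handling (e.g. for the lowest and the `+Inf` bucket), so treat this only as a mental model of the interpolation.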
### Manual: Application Traces

#### Step 1: Add OpenTelemetry to Your Application

**Python Example:**
```python
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource

# Configure resource
resource = Resource.create({"service.name": "my-app"})

# Set up the tracer
trace_provider = TracerProvider(resource=resource)
trace_provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="http://tempo.observability.svc.cluster.local:4317",
            insecure=True
        )
    )
)
trace.set_tracer_provider(trace_provider)

# Auto-instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

# Manual spans
tracer = trace.get_tracer(__name__)

@app.route('/api/data')
def get_data():
    with tracer.start_as_current_span("fetch_data") as span:
        # Your code here
        span.set_attribute("rows", 100)
        return {"data": "..."}
```

**Install dependencies:**
```bash
pip install opentelemetry-api opentelemetry-sdk \
    opentelemetry-instrumentation-flask \
    opentelemetry-exporter-otlp-proto-grpc
```

**Go Example:**
```go
import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/trace"
)

func initTracer() (*trace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(
        context.Background(),
        otlptracegrpc.WithEndpoint("tempo.observability.svc.cluster.local:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}
```

**Node.js Example:**
```javascript
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');

const provider = new NodeTracerProvider();
const exporter = new OTLPTraceExporter({
  url: 'http://tempo.observability.svc.cluster.local:4317'
});
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();
```

#### Step 2: Add Trace IDs to Logs (Optional but Recommended)

This enables clicking from logs to traces in Grafana!

**Python Example:**
```python
import json
from opentelemetry import trace

def log_with_trace(message):
    span = trace.get_current_span()
    trace_id = format(span.get_span_context().trace_id, '032x')

    log_entry = {
        "message": message,
        "trace_id": trace_id,
        "level": "info"
    }
    print(json.dumps(log_entry))
```

#### Step 3: Verify Traces

**In Grafana:**
1. Go to Explore → Tempo
2. Search for service "my-app"
3. Click on a trace to view details
4. Click "Logs for this span" to see correlated logs

## 📋 Complete Example: Monitoring a New App

Here's a complete deployment with all monitoring configured:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-namespace
data:
  app.py: |
    from flask import Flask
    import logging
    import json
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.instrumentation.flask import FlaskInstrumentor
    from opentelemetry.sdk.resources import Resource
    from prometheus_flask_exporter import PrometheusMetrics

    # Setup logging
    logging.basicConfig(level=logging.INFO, format='%(message)s')
    logger = logging.getLogger(__name__)

    # Setup tracing
    resource = Resource.create({"service.name": "my-app"})
    trace_provider = TracerProvider(resource=resource)
    trace_provider.add_span_processor(
        BatchSpanProcessor(
            OTLPSpanExporter(
                endpoint="http://tempo.observability.svc.cluster.local:4317",
                insecure=True
            )
        )
    )
    trace.set_tracer_provider(trace_provider)

    app = Flask(__name__)

    # Setup metrics
    metrics = PrometheusMetrics(app)

    # Auto-instrument with traces
    FlaskInstrumentor().instrument_app(app)

    @app.route('/')
    def index():
        span = trace.get_current_span()
        trace_id = format(span.get_span_context().trace_id, '032x')

        logger.info(json.dumps({
            "level": "info",
            "message": "Request received",
            "trace_id": trace_id,
            "endpoint": "/"
        }))

        return {"status": "ok", "trace_id": trace_id}

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8080)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Enable Prometheus scraping
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-app
          image: python:3.11-slim
          command:
            - /bin/bash
            - -c
            - |
              pip install flask opentelemetry-api opentelemetry-sdk \
                opentelemetry-instrumentation-flask \
                opentelemetry-exporter-otlp-proto-grpc \
                prometheus-flask-exporter && \
              python /app/app.py
          ports:
            - name: http
              containerPort: 8080
          volumeMounts:
            - name: app-code
              mountPath: /app
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
      volumes:
        - name: app-code
          configMap:
            name: my-app-config

---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: my-app
```

## 🔍 Verification Checklist

After deploying a new app with monitoring:

### Logs ✓ (Automatic)
```bash
# Check logs appear in Grafana:
# Explore → Loki → {namespace="my-namespace", pod=~"my-app.*"}
```

### Metrics ✓ (If configured)
```bash
# Check Prometheus is scraping:
# Explore → Prometheus → up{pod=~"my-app.*"}
# Should return 1

# Check your custom metrics:
# Explore → Prometheus → flask_http_request_total{namespace="my-namespace"}
```

### Traces ✓ (If configured)
```bash
# Check traces appear in Tempo:
# Explore → Tempo → Search for service "my-app"
# Should see traces

# Verify log-trace correlation:
# Click on a log line with trace_id → should jump to the trace
```

## 🎓 Quick Start for Common Frameworks

### Python Flask/FastAPI
```bash
pip install opentelemetry-distro opentelemetry-exporter-otlp prometheus-flask-exporter
opentelemetry-bootstrap -a install
```

```bash
# Set these environment variables in your deployment:
OTEL_SERVICE_NAME=my-app
OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc

# Then run with auto-instrumentation:
opentelemetry-instrument python app.py
```

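When running under Kubernetes rather than a local shell, these variables go into the container spec; a sketch of the equivalent `env` fragment for a Deployment like the one above:

```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "my-app"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://tempo.observability.svc.cluster.local:4317"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
```

The auto-instrumentation entrypoint reads these at startup, so no code changes are needed beyond the `opentelemetry-instrument` wrapper.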
### Go
```bash
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
```

### Node.js
```bash
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-grpc prom-client
```

## 📚 Summary

| Component | Automatic? | Configuration Needed |
|-----------|-----------|----------------------|
| **Logs** | ✅ Yes | None - just deploy your app |
| **Metrics** | ❌ No | Add a /metrics endpoint + annotations |
| **Traces** | ❌ No | Add the OpenTelemetry SDK + configure the endpoint |

**Recommended Approach:**
1. **Start simple**: Deploy your app; logs work automatically
2. **Add metrics**: Expose /metrics and add the annotations
3. **Add traces**: Instrument with OpenTelemetry
4. **Correlate**: Add trace IDs to logs for full observability

## 🔗 Useful Links

- OpenTelemetry Python: https://opentelemetry.io/docs/instrumentation/python/
- OpenTelemetry Go: https://opentelemetry.io/docs/instrumentation/go/
- OpenTelemetry Node.js: https://opentelemetry.io/docs/instrumentation/js/
- Prometheus Client Libraries: https://prometheus.io/docs/instrumenting/clientlibs/
- Grafana Docs: https://grafana.com/docs/

## 🆘 Troubleshooting

**Logs not appearing:**
- Check Alloy is running: `kubectl get pods -n observability -l app=alloy`
- Check that pod logs are being written to stdout/stderr
- View in real time: `kubectl logs -f <pod-name> -n <namespace>`

**Metrics not being scraped:**
- Verify the annotations are present: `kubectl get pod <pod> -o yaml | grep prometheus`
- Check the /metrics endpoint: `kubectl port-forward pod/<pod> 8080:8080`, then `curl localhost:8080/metrics`
- Check Prometheus targets: https://prometheus.betelgeusebytes.io/targets

**Traces not appearing:**
- Verify the endpoint: `tempo.observability.svc.cluster.local:4317`
- Check Tempo logs: `kubectl logs -n observability tempo-0`
- Verify the OTLP exporter is configured correctly in your app
- Check that network policies allow traffic to the observability namespace