CLAUDE.md - BetelgeuseBytes Full Stack
Project Overview
Kubernetes cluster deployment for BetelgeuseBytes using Ansible for infrastructure automation and kubectl for application deployment. This is a complete data science/ML platform with integrated observability, databases, and ML tools.
Infrastructure:
- 2-node Kubernetes cluster on Hetzner Cloud
- Control plane + worker: hetzner-1 (95.217.89.53)
- Worker node: hetzner-2 (138.201.254.97)
- Kubernetes v1.30.3 with Cilium CNI
Directory Structure
.
├── ansible/                          # Infrastructure-as-Code for cluster setup
│   ├── inventories/prod/             # Hetzner nodes inventory & group vars
│   │   ├── hosts.ini                 # Node definitions
│   │   └── group_vars/all.yml        # Global K8s config (versions, CIDRs)
│   ├── playbooks/
│   │   ├── site.yml                  # Main cluster bootstrap playbook
│   │   └── add-control-planes.yml    # HA control plane expansion
│   └── roles/                        # 16 reusable Ansible roles
│       ├── common/                   # Swap disable, kernel modules, sysctl
│       ├── containerd/               # Container runtime
│       ├── kubernetes/               # kubeadm, kubelet, kubectl
│       ├── kubeadm_init/             # Primary control plane init
│       ├── kubeadm_join/             # Worker node join
│       ├── cilium/                   # CNI plugin
│       ├── ingress/                  # NGINX Ingress Controller
│       ├── cert_manager/             # Let's Encrypt integration
│       ├── labels/                   # Node labeling
│       └── storage_local_path/       # Local storage provisioning
└── k8s/                              # Kubernetes manifests
    ├── 00-namespaces.yaml            # 8 namespaces
    ├── 01-secrets/                   # Basic auth secrets
    ├── storage/                      # StorageClass, PersistentVolumes
    ├── postgres/                     # PostgreSQL 16 with extensions
    ├── redis/                        # Redis 7 cache
    ├── elastic/                      # Elasticsearch 8.14 + Kibana
    ├── gitea/                        # Git repository service
    ├── jupyter/                      # JupyterLab notebook
    ├── kafka/                        # Apache Kafka broker
    ├── neo4j/                        # Neo4j graph database
    ├── prometheus/                   # Prometheus monitoring
    ├── grafana/                      # Grafana dashboards
    ├── minio/                        # S3-compatible object storage
    ├── mlflow/                       # ML lifecycle tracking
    ├── vllm/                         # LLM inference (Ollama)
    ├── label_studio/                 # Data annotation platform
    ├── argoflow/                     # Argo Workflows
    ├── otlp/                         # OpenTelemetry collector
    └── observability/                # Fluent-Bit log aggregation
Build & Deployment Commands
Phase 1: Cluster Infrastructure
# Validate connectivity
ansible -i ansible/inventories/prod/hosts.ini all -m ping
# Bootstrap Kubernetes cluster
ansible-playbook -i ansible/inventories/prod/hosts.ini ansible/playbooks/site.yml
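Before running the full bootstrap, a quick pre-flight can catch syntax or inventory mistakes without touching the nodes. The following is a minimal sketch that relies on the standard `ansible-playbook` flags `--syntax-check` and `--list-hosts`; the `preflight` function and the `ANSIBLE_PLAYBOOK` override variable are hypothetical helpers, not part of this repository.

```sh
# Hypothetical helper; not part of the repository.
# ANSIBLE_PLAYBOOK can be overridden (e.g. with "echo") to preview the commands.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"
INVENTORY="ansible/inventories/prod/hosts.ini"
PLAYBOOK="ansible/playbooks/site.yml"

preflight() {
  # Parse the playbook without executing any task.
  "$ANSIBLE_PLAYBOOK" -i "$INVENTORY" "$PLAYBOOK" --syntax-check || return 1
  # Show which hosts each play would target.
  "$ANSIBLE_PLAYBOOK" -i "$INVENTORY" "$PLAYBOOK" --list-hosts
}
```

Running `preflight` before the bootstrap command above should fail fast if the inventory path or playbook is broken.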
Phase 2: Kubernetes Applications (order matters)
# 1. Namespaces & storage
kubectl apply -f k8s/00-namespaces.yaml
kubectl apply -f k8s/storage/storageclass.yaml
# 2. Secrets & auth
kubectl apply -f k8s/01-secrets/
# 3. Infrastructure (databases, cache, search)
kubectl apply -f k8s/postgres/
kubectl apply -f k8s/redis/
kubectl apply -f k8s/elastic/elasticsearch.yaml
kubectl apply -f k8s/elastic/kibana.yaml
# 4. Application layer
kubectl apply -f k8s/gitea/
kubectl apply -f k8s/jupyter/
kubectl apply -f k8s/kafka/kafka.yaml
kubectl apply -f k8s/kafka/kafka-ui.yaml
kubectl apply -f k8s/neo4j/
# 5. Observability & telemetry
kubectl apply -f k8s/otlp/
kubectl apply -f k8s/observability/fluent-bit.yaml
kubectl apply -f k8s/prometheus/
kubectl apply -f k8s/grafana/
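Since the order matters, it may help to wait for each stage to become ready before applying the next one. The sketch below wraps stage 3 (databases, cache, search) and uses `kubectl wait` on the `db` and `elastic` namespaces from the namespace table. The function names, the 300s timeout, and the `KUBECTL` override variable are assumptions made for illustration.

```sh
# Hypothetical wrapper; not part of the repository.
# KUBECTL can be overridden (e.g. with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# apply_stage <namespace> <manifest path>...
apply_stage() {
  ns="$1"; shift
  for manifest in "$@"; do
    "$KUBECTL" apply -f "$manifest" || return 1
  done
  # Block until every pod currently in the namespace reports Ready.
  # Caveat: kubectl wait fails if no pods exist yet, so a retry may be needed.
  "$KUBECTL" wait --for=condition=Ready pod --all -n "$ns" --timeout=300s
}

# Stage 3 from the list above: databases and cache first, then search.
deploy_data_layer() {
  apply_stage db k8s/postgres/ k8s/redis/ &&
  apply_stage elastic k8s/elastic/elasticsearch.yaml k8s/elastic/kibana.yaml
}
```

The same `apply_stage` call could be repeated for the later stages with their respective namespaces.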
Namespace Organization
| Namespace | Purpose | Services |
|---|---|---|
| db | Databases & cache | PostgreSQL, Redis |
| scm | Source control | Gitea |
| ml | Machine Learning | JupyterLab, MLflow, Argo, Label Studio, Ollama |
| elastic | Search & logging | Elasticsearch, Kibana |
| broker | Message brokers | Kafka |
| graph | Graph databases | Neo4j |
| monitoring | Observability | Prometheus, Grafana |
| observability | Telemetry | OpenTelemetry, Fluent-Bit |
| storage | Object storage | MinIO |
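For scripting, the table above can be mirrored in a small lookup function so the namespace does not have to be remembered for each service. This is only a sketch: the service keys (for example `otel-collector` or `label-studio`) are assumed labels and may not match the names used in the manifests.

```sh
# Hypothetical lookup helper mirroring the namespace table.
# ns_for <service>  ->  prints the namespace, or fails for unknown services.
ns_for() {
  case "$1" in
    postgres|redis)                          echo db ;;
    gitea)                                   echo scm ;;
    jupyter|mlflow|argo|label-studio|ollama) echo ml ;;
    elasticsearch|kibana)                    echo elastic ;;
    kafka)                                   echo broker ;;
    neo4j)                                   echo graph ;;
    prometheus|grafana)                      echo monitoring ;;
    otel-collector|fluent-bit)               echo observability ;;
    minio)                                   echo storage ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}
```

A typical use would be `kubectl logs -n "$(ns_for grafana)" -l app=grafana`, assuming the pods carry an `app` label.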
Key Configuration
Kubernetes:
- Pod CIDR: 10.244.0.0/16
- Service CIDR: 10.96.0.0/12
- CNI: Cilium v1.15.7
Storage:
- StorageClass: local-ssd-hetzner (local volumes)
- All stateful workloads pinned to hetzner-2
- Local path: /mnt/local-ssd/{service-name}
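To illustrate how these storage settings fit together, the sketch below renders a local PersistentVolume that uses the StorageClass, the path convention, and a node affinity for hetzner-2. The PV name pattern `pv-<service>`, the `Retain` reclaim policy, the size argument, and the assumption that the node's `kubernetes.io/hostname` label is `hetzner-2` are illustrative and should be checked against the manifests in k8s/storage/.

```sh
# Hypothetical generator; names and sizes are placeholders.
# render_local_pv <service-name> <size>
render_local_pv() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata: { name: pv-$1 }
spec:
  capacity: { storage: $2 }
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd-hetzner
  local: { path: /mnt/local-ssd/$1 }
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - { key: kubernetes.io/hostname, operator: In, values: [hetzner-2] }
EOF
}
```

The output can be inspected first and then piped to the cluster, for example `render_local_pv postgres 10Gi | kubectl apply -f -`.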
Networking:
- Internal DNS: service.namespace.svc.cluster.local
- External: {service}.betelgeusebytes.io via NGINX Ingress
- TLS: Let's Encrypt via cert-manager
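As a worked example of the two naming schemes, the helpers below build an in-cluster DNS name and an external URL. The Service name `postgres` is an assumption (the actual Service names live in the manifests), and `https` is used because TLS terminates at the ingress.

```sh
# Hypothetical helpers for building service addresses.
CLUSTER_DOMAIN="svc.cluster.local"
EXTERNAL_DOMAIN="betelgeusebytes.io"

# svc_fqdn <service> <namespace>  ->  in-cluster DNS name
svc_fqdn() { echo "$1.$2.$CLUSTER_DOMAIN"; }

# external_url <subdomain>  ->  public URL served via NGINX Ingress
external_url() { echo "https://$1.$EXTERNAL_DOMAIN"; }

svc_fqdn postgres db      # prints postgres.db.svc.cluster.local
external_url grafana      # prints https://grafana.betelgeusebytes.io
```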
DNS Records
A records point to both nodes:
apps.betelgeusebytes.io → 95.217.89.53, 138.201.254.97
CNAMEs to apps.betelgeusebytes.io:
- gitea, kibana, grafana, prometheus, notebook, broker, neo4j, otlp, label, llm, mlflow, minio
Secrets Location
- k8s/01-secrets/basic-auth.yaml: HTTP basic auth for protected services
- Service-specific secrets inline in respective manifests (e.g., postgres-auth, redis-auth)
Manifest Conventions
- Compact YAML style: metadata: { name: xyz, namespace: ns }
- StatefulSets for persistent services (databases, brokers)
- Deployments for stateless services (web UIs, workers)
- DaemonSets for node-level agents (Fluent-Bit)
- Services expose port 80 for ingress routing; the target port maps to the container port
- Ingress with TLS + basic auth annotations where needed
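The conventions above can be sketched as a generator for a Service and Ingress pair. It uses the compact metadata style, Service port 80 with a `targetPort`, the ingress-nginx basic-auth annotations, and the cert-manager `cluster-issuer` annotation. The issuer name `letsencrypt-prod`, the secret name `basic-auth`, the ingress class `nginx`, the `app` selector label, and the `<name>-tls` secret name are assumptions, not values confirmed by the manifests.

```sh
# Hypothetical generator following the manifest conventions.
# render_web <name> <namespace> <container-port> <subdomain>
render_web() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata: { name: $1, namespace: $2 }
spec:
  selector: { app: $1 }
  ports:
    - { name: http, port: 80, targetPort: $3 }
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: $1
  namespace: $2
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx
  tls:
    - { hosts: [$4.betelgeusebytes.io], secretName: $1-tls }
  rules:
    - host: $4.betelgeusebytes.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: $1, port: { number: 80 } }
EOF
}
```

For instance, `render_web grafana monitoring 3000 grafana` prints a manifest pair that could be reviewed and adapted before applying.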
Common Operations
# Check cluster status
kubectl get nodes
kubectl get pods -A
# View logs for a service
kubectl logs -n <namespace> -l app=<service-name>
# Scale a deployment
kubectl scale -n <namespace> deployment/<name> --replicas=N
# Apply changes to a specific service
kubectl apply -f k8s/<service>/
# Delete and recreate a service
kubectl delete -f k8s/<service>/ && kubectl apply -f k8s/<service>/
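Delete-and-recreate removes every object defined in the directory, including any PersistentVolumeClaims declared there. When only the pods need to be cycled, `kubectl rollout restart` followed by `kubectl rollout status` is a gentler alternative. The helper name, the 180s timeout, and the `KUBECTL` override variable below are illustrative assumptions.

```sh
# Hypothetical helper; not part of the repository.
# KUBECTL can be overridden (e.g. with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# restart_workload <kind> <name> <namespace>
# <kind> is one of: deployment, statefulset, daemonset
restart_workload() {
  # Trigger a rolling restart of the workload's pods.
  "$KUBECTL" rollout restart "$1/$2" -n "$3" || return 1
  # Wait until the restart has completed or the timeout expires.
  "$KUBECTL" rollout status "$1/$2" -n "$3" --timeout=180s
}
```

Example: `restart_workload deployment grafana monitoring`, assuming Grafana runs as a Deployment with that name.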
Notes
- This is a development/test setup; passwords are hardcoded in manifests
- Elasticsearch security is disabled for development
- GPU support for vLLM is commented out (requires nvidia.com/gpu resources)
- Neo4j Bolt protocol (7687) requires manual ingress-nginx TCP patch
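For the Bolt note above, ingress-nginx exposes raw TCP ports through a ConfigMap whose keys are external ports and whose values have the form `namespace/service:port`. The sketch below renders such a ConfigMap. The controller namespace `ingress-nginx`, the ConfigMap name `tcp-services`, and the Service name `neo4j` are assumptions; the controller must also be started with `--tcp-services-configmap` pointing at this ConfigMap, and its Service must expose port 7687.

```sh
# Hypothetical generator for the ingress-nginx TCP services ConfigMap.
# render_tcp_services <external-port> <namespace/service:port>
render_tcp_services() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata: { name: tcp-services, namespace: ingress-nginx }
data:
  "$1": "$2"
EOF
}

# Route external port 7687 to the (assumed) neo4j Service in the graph namespace.
render_tcp_services 7687 graph/neo4j:7687
```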