salah 5083e377a0 fix: Update NarratorSummary model to make name_arabic optional and name_transliterated nullable; enhance query filters for non-null Arabic names 2026-03-02 22:07:32 +01:00
app fix: Update NarratorSummary model to make name_arabic optional and name_transliterated nullable; enhance query filters for non-null Arabic names 2026-03-02 22:07:32 +01:00
k8s fix: Update Qdrant configuration in settings and deployment files for correct host, port, and collection 2026-02-27 00:02:31 +01:00
.env.example fix: Update Qdrant host and port configuration in environment files and deployment settings 2026-02-26 23:20:17 +01:00
Dockerfile feat: Implement Hadith and Narrator endpoints with search functionality 2026-02-26 22:17:58 +01:00
README.md fix: Update database credentials and Qdrant host in deployment configuration 2026-02-26 22:24:39 +01:00
deploy.sh feat: Update deployment configuration and add build & deploy script 2026-02-26 22:38:02 +01:00
requirements.txt feat: Implement Hadith and Narrator endpoints with search functionality 2026-02-26 22:17:58 +01:00

README.md

Hadith Scholar API — حَدِيثٌ

Production-grade REST API for analyzing Islamic hadith literature across 8+ major collections.

Built with FastAPI · PostgreSQL · Neo4j · Qdrant · Elasticsearch


Overview

The Hadith Scholar API provides structured access to ~41,000 hadiths from the major canonical collections, enriched with:

  • LLM-extracted narrator chains — structured isnad parsing with entity typing
  • Narrator knowledge graph — biographies, teacher/student networks, places, tribes (Neo4j)
  • Multilingual semantic search — find hadiths by meaning in Arabic, English, or Urdu (BGE-M3 + Qdrant)
  • Full-text Arabic search — morphological analysis with stemming and root extraction (Elasticsearch)
  • Interactive API docs — Swagger UI with Arabic examples on every endpoint

Collections

Collection          Arabic          Hadiths
Sahih Bukhari       صحيح البخاري    6,986
Sahih Muslim        صحيح مسلم       15,034
Sunan Abu Dawood    سنن أبي داود    5,274
Jami` at-Tirmidhi   جامع الترمذي
Sunan an-Nasa'i     سنن النسائي     5,758
Sunan Ibn Majah     سنن ابن ماجه    4,341
Musnad Ahmad        مسند أحمد
Muwatta Malik       موطأ مالك

API Endpoints

Hadiths (/hadiths)

Method  Endpoint                                Description
GET     /hadiths/{hadith_id}                    Full hadith details with narrator chain and topics
GET     /hadiths/collection/{name}              Paginated listing by collection
GET     /hadiths/number/{collection}/{number}   Lookup by collection name and hadith number
GET     /hadiths/search/keyword?q=صلاة          Arabic keyword search, with collection and grade filters
GET     /hadiths/search/topic/{topic}           Search by topic tag
GET     /hadiths/search/narrator/{name}         Find hadiths by narrator

Narrators (/narrators)

Method  Endpoint                                          Description
GET     /narrators/search?q=أبو هريرة                     Search by name (Arabic or transliterated)
GET     /narrators/profile/{name_arabic}                  Full biography, hadiths, teachers, students, places
GET     /narrators/by-generation/{gen}                    List narrators by generation (طبقة), e.g. صحابي (Companion), تابعي (Successor)
GET     /narrators/by-place/{place}                       Narrators associated with a place
GET     /narrators/interactions/{name}                    All relationships for a narrator
GET     /narrators/who-met-who?narrator_a=X&narrator_b=Y  Shortest path between two narrators

Isnad Chains (/chains)

Method  Endpoint                                         Description
GET     /chains/hadith/{hadith_id}                       Chain as a graph (nodes + links) for visualization
GET     /chains/narrator/{name}                          All chains containing a narrator
GET     /chains/common-chains?narrator_a=X&narrator_b=Y  Hadiths in which both narrators appear

Search (/search)

Method  Endpoint                                                   Description
GET     /search/semantic?q=what did the prophet say about fasting  Semantic search (any language)
GET     /search/fulltext?q=الصلاة                                  Arabic full-text search with morphological analysis
GET     /search/combined?q=صيام رمضان                              Semantic and full-text search run in parallel

System

Method  Endpoint        Description
GET     /               API info and endpoint listing
GET     /health         Health check (verifies all 4 backends)
GET     /stats          Database statistics
GET     /docs           Swagger UI
GET     /redoc          ReDoc documentation
GET     /openapi.json   OpenAPI 3.1 spec

Example Requests

Search for hadiths about prayer

curl -G "https://hadith-api.betelgeusebytes.io/hadiths/search/keyword" \
  --data-urlencode "q=صلاة" \
  --data-urlencode "collection=Sahih Bukhari" \
  --data-urlencode "grade=Sahih"

Get narrator profile

curl "https://hadith-api.betelgeusebytes.io/narrators/profile/أبو%20هريرة"

Semantic search (English → Arabic results)

curl "https://hadith-api.betelgeusebytes.io/search/semantic?q=what%20is%20the%20reward%20of%20prayer"

Check if two narrators are connected

curl -G "https://hadith-api.betelgeusebytes.io/narrators/who-met-who" \
  --data-urlencode "narrator_a=الزهري" \
  --data-urlencode "narrator_b=أنس بن مالك"

Get isnad chain for a hadith

curl "https://hadith-api.betelgeusebytes.io/chains/hadith/{hadith_uuid}"

Architecture

                         ┌───────────────────────────────────┐
                         │        FastAPI Application        │
                         │   hadith-api.betelgeusebytes.io   │
                         └─────────────────┬─────────────────┘
                                           │
            ┌────────────────────┬─────────┴──────────┬────────────────────┐
            │                    │                    │                    │
  ┌─────────▼────────┐ ┌─────────▼────────┐ ┌─────────▼────────┐ ┌─────────▼────────┐
  │ PostgreSQL       │ │ Neo4j            │ │ Qdrant + TEI     │ │ Elasticsearch    │
  │ 41k hadiths      │ │ Knowledge Graph  │ │ Semantic search  │ │ Arabic full-text │
  │ full text        │ │ - Narrators      │ │ 1024-dim BGE-M3  │ │ morphological    │
  └──────────────────┘ │ - Chains         │ └──────────────────┘ └──────────────────┘
                       │ - Places         │
                       │ - Tribes         │
                       │ - Topics         │
                       └──────────────────┘

Backend Responsibilities

Backend        What it stores                                           Used by
PostgreSQL     Raw hadith text (Arabic/English/Urdu), metadata, grades  /hadiths/* keyword search, collection listing
Neo4j          Narrator graph, isnad chains, topics, places, tribes     /narrators/*, /chains/*, topic search
Qdrant         1024-dim BGE-M3 embeddings for all 41k hadiths           /search/semantic
Elasticsearch  Arabic-analyzed hadith text index                        /search/fulltext
TEI            BGE-M3 embedding inference (query → vector)              /search/semantic (query encoding)
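
The two-step /search/semantic path in this table (TEI encodes the query, Qdrant finds the nearest hadith vectors) can be sketched with plain HTTP calls. This assumes TEI's POST /embed endpoint and Qdrant's REST POST /collections/{name}/points/search endpoint (newer Qdrant releases also offer points/query); the hosts and collection name are the defaults from Environment Variables, the payload contents depend on what was stored at indexing time, and the post_json transport is injectable so the sketch can run offline. It is an illustration, not the API's implementation.

```python
import json
import urllib.request
from typing import Any, Callable

PostJson = Callable[[str, dict], Any]

TEI_URL = "http://tei.ml.svc.cluster.local:80"
QDRANT_URL = "http://qdrant.vector.svc.cluster.local:6333"
COLLECTION = "hadiths"


def http_post_json(url: str, body: dict) -> Any:
    """Default transport: POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def semantic_search(query: str, limit: int = 10,
                    post_json: PostJson = http_post_json) -> list[dict]:
    # 1. Query -> dense vector; TEI's /embed returns one embedding per input.
    vector = post_json(f"{TEI_URL}/embed", {"inputs": query})[0]
    # 2. Vector -> nearest stored hadith embeddings in Qdrant.
    reply = post_json(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        {"vector": vector, "limit": limit, "with_payload": True},
    )
    return [
        {"id": hit["id"], "score": hit["score"], "payload": hit.get("payload")}
        for hit in reply["result"]
    ]
```

Because the query and the stored hadiths are embedded by the same BGE-M3 model, an English query can match Arabic text without translation.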

Knowledge Graph Model

(:Narrator)-[:APPEARS_IN {chain_order, transmission_verb}]->(:Hadith)
(:Narrator)-[:NARRATED_FROM {hadith_ids}]->(:Narrator)
(:Narrator)-[:TEACHER_OF]->(:Narrator)
(:Narrator)-[:BORN_IN|LIVED_IN|DIED_IN|TRAVELED_TO]->(:Place)
(:Narrator)-[:BELONGS_TO_TRIBE]->(:Tribe)
(:Hadith)-[:HAS_TOPIC]->(:Topic)
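
The who-met-who endpoint's "shortest path between two narrators" maps naturally onto this model. A sketch of one way to query it, assuming narrators are matched on name_arabic and that a NARRATED_FROM or TEACHER_OF edge in either direction counts as a connection; it uses the official neo4j Python driver's session.run, the hop limit of 6 is an arbitrary assumption, and this is not necessarily the query the API runs.

```python
WHO_MET_WHO_TEMPLATE = """
MATCH (a:Narrator {name_arabic: $narrator_a}), (b:Narrator {name_arabic: $narrator_b})
MATCH p = shortestPath((a)-[:NARRATED_FROM|TEACHER_OF*..%d]-(b))
RETURN [n IN nodes(p) | n.name_arabic] AS path, length(p) AS hops
"""


def who_met_who_query(max_hops: int = 6) -> str:
    """Cypher cannot parameterize a variable-length bound, so validate and inline it."""
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 15:
        raise ValueError("max_hops must be an int between 1 and 15")
    return WHO_MET_WHO_TEMPLATE % max_hops


def who_met_who(driver, narrator_a: str, narrator_b: str, max_hops: int = 6):
    """Return (path_names, hops), or None when no path exists within max_hops.

    `driver` is a neo4j.Driver, e.g. neo4j.GraphDatabase.driver(uri, auth=(user, pw)).
    """
    if narrator_a == narrator_b:
        # shortestPath errors when start and end are the same node.
        return [narrator_a], 0
    with driver.session() as session:
        record = session.run(
            who_met_who_query(max_hops),
            narrator_a=narrator_a,
            narrator_b=narrator_b,
        ).single()
    if record is None:
        return None
    return record["path"], record["hops"]
```

Names are passed as query parameters rather than string-formatted into the Cypher, which keeps Arabic input safe from injection; only the validated integer bound is inlined.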

Narrator Properties

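  • name_arabic / name_transliterated: primary identifiers
  • full_nasab: complete lineage, "X son of Y son of Z" (فلان بن فلان بن فلان)
  • kunya: teknonym formed with أبو/أم ("father of" / "mother of")
  • nisba: attributive name, usually formed with the -i suffix (e.g. البخاري, المدني)
  • generation: طبقة (generation class), e.g. صحابي (Companion), تابعي (Successor), تابع التابعين (Successor of the Successors)
  • reliability_grade: جرح وتعديل (narrator criticism) grade, e.g. ثقة (trustworthy), صدوق (truthful), ضعيف (weak)
  • biography_summary_arabic / biography_summary_english: bilingual biographies
  • birth_year_hijri / death_year_hijri: dates in the Hijri calendar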

Setup

Prerequisites

  • Python 3.12+
  • Docker
  • Access to PostgreSQL, Neo4j, Qdrant, Elasticsearch, TEI

Local Development

# Clone
git clone <repo_url>
cd hadith-api

# Configure
cp .env.example .env
# Edit .env with your credentials

# Install (inside a virtual environment)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run
uvicorn app.main:app --reload --port 8000

# Open docs (macOS; on Linux use xdg-open)
open http://localhost:8000/docs

Environment Variables

Variable                  Description             Default
HADITH_PG_HOST            PostgreSQL host         pg.betelgeusebytes.io
HADITH_PG_PORT            PostgreSQL port         5432
HADITH_PG_DBNAME          Database name
HADITH_PG_USER            Database user
HADITH_PG_PASSWORD        Database password
HADITH_PG_SSLMODE         SSL mode                require
HADITH_NEO4J_URI          Neo4j bolt URI          neo4j+ssc://neo4j.betelgeusebytes.io:7687
HADITH_NEO4J_USER         Neo4j user              neo4j
HADITH_NEO4J_PASSWORD     Neo4j password
HADITH_QDRANT_HOST        Qdrant host             qdrant.vector.svc.cluster.local
HADITH_QDRANT_PORT        Qdrant port             6333
HADITH_QDRANT_COLLECTION  Qdrant collection name  hadiths
HADITH_ES_HOST            Elasticsearch URL       http://elasticsearch.elastic.svc.cluster.local:9200
HADITH_ES_INDEX           Elasticsearch index     hadiths
HADITH_TEI_URL            TEI embedding service   http://tei.ml.svc.cluster.local:80

Deployment (Kubernetes)

Build & Push

docker build -t axxs/hadith-api:latest .
docker push axxs/hadith-api:latest

Deploy

# Edit secrets in k8s/deployment.yaml first
kubectl apply -f k8s/deployment.yaml

# Watch rollout
kubectl rollout status deployment/hadith-api -n api

# Verify
kubectl get pods -n api -l app=hadith-api
curl https://hadith-api.betelgeusebytes.io/health

What gets created

  • Namespace: api
  • Secret: hadith-api-secrets (PG + Neo4j credentials)
  • Deployment: 2 replicas with health checks
  • Service: ClusterIP on port 80 → container 8000
  • Ingress: TLS via cert-manager at hadith-api.betelgeusebytes.io

Resource Limits

  • Requests: 250m CPU, 256Mi RAM per pod
  • Limits: 1 CPU, 512Mi RAM per pod

Project Structure

hadith-api/
├── app/
│   ├── main.py                 # FastAPI app, lifespan, health, stats
│   ├── config.py               # Pydantic settings (env vars)
│   ├── models/
│   │   └── schemas.py          # Response models with examples
│   ├── routers/
│   │   ├── hadiths.py          # /hadiths/* — details, search, listing
│   │   ├── narrators.py        # /narrators/* — profiles, relationships
│   │   ├── chains.py           # /chains/* — isnad visualization
│   │   └── search.py           # /search/* — semantic + full-text
│   └── services/
│       └── database.py         # PG, Neo4j, Qdrant, ES connections
├── k8s/
│   └── deployment.yaml         # K8s namespace + secret + deploy + svc + ingress
├── Dockerfile
├── .dockerignore
├── .env.example
├── requirements.txt
├── deploy.sh
└── README.md

Data Pipeline

The API consumes data produced by the hadith extraction pipeline:

  HadithAPI.com ──► PostgreSQL (41k hadiths, raw text)
                         │
                         ├──► TEI (BGE-M3) ──► Qdrant (embeddings)
                         ├──► Elasticsearch (full-text index)
                         └──► LLM Extraction (OpenAI/Gemini)
                                   │
                                   ├──► Phase A: sanad/matn split, narrator chains, entities, topics
                                   └──► Phase B: narrator biographies from classical scholarship
                                            │
                                            └──► MinIO (JSON) ──► Neo4j (knowledge graph)

License

MIT