Hadith Scholar API — حَدِيثٌ
Production-grade REST API for analyzing Islamic hadith literature across 8+ major collections.
Built with FastAPI · PostgreSQL · Neo4j · Qdrant · Elasticsearch
Overview
The Hadith Scholar API provides structured access to ~41,000 hadiths from the major canonical collections, enriched with:
- LLM-extracted narrator chains — structured isnad parsing with entity typing
- Narrator knowledge graph — biographies, teacher/student networks, places, tribes (Neo4j)
- Multilingual semantic search — find hadiths by meaning in Arabic, English, or Urdu (BGE-M3 + Qdrant)
- Full-text Arabic search — morphological analysis with stemming and root extraction (Elasticsearch)
- Interactive API docs — Swagger UI with Arabic examples on every endpoint
Collections
| Collection | Arabic | Hadiths |
|---|---|---|
| Sahih Bukhari | صحيح البخاري | 6,986 |
| Sahih Muslim | صحيح مسلم | 15,034 |
| Sunan Abu Dawood | سنن أبي داود | 5,274 |
| Jami` at-Tirmidhi | جامع الترمذي | — |
| Sunan an-Nasa'i | سنن النسائي | 5,758 |
| Sunan Ibn Majah | سنن ابن ماجه | 4,341 |
| Musnad Ahmad | مسند أحمد | — |
| Muwatta Malik | موطأ مالك | — |
API Endpoints
Hadiths (/hadiths)
| Method | Endpoint | Description |
|---|---|---|
| GET | /hadiths/{hadith_id} | Full hadith details with narrator chain and topics |
| GET | /hadiths/collection/{name} | Paginated listing by collection |
| GET | /hadiths/number/{collection}/{number} | Lookup by collection name + hadith number |
| GET | /hadiths/search/keyword?q=صلاة | Arabic keyword search with filters |
| GET | /hadiths/search/topic/{topic} | Search by topic tag |
| GET | /hadiths/search/narrator/{name} | Find hadiths by narrator |
Narrators (/narrators)
| Method | Endpoint | Description |
|---|---|---|
| GET | /narrators/search?q=أبو هريرة | Search by name (Arabic or transliterated) |
| GET | /narrators/profile/{name_arabic} | Full biography, hadiths, teachers, students, places |
| GET | /narrators/by-generation/{gen} | List narrators by طبقة (صحابي, تابعي, etc.) |
| GET | /narrators/by-place/{place} | Narrators associated with a place |
| GET | /narrators/interactions/{name} | All relationships for a narrator |
| GET | /narrators/who-met-who?narrator_a=X&narrator_b=Y | Shortest path between two narrators |
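The who-met-who lookup is a shortest-path problem over narrator links. With Neo4j as the backend, the service presumably delegates this to a graph query; the underlying idea can be sketched as a breadth-first search over an in-memory graph. The function name, the edge list, and the transliterated sample names below are illustrative assumptions, not part of the API.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph given as (a, b) pairs."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the two narrators are not connected


# Toy sample edges (hypothetical data, for illustration only).
sample_edges = [
    ("Malik", "al-Zuhri"),
    ("al-Zuhri", "Anas ibn Malik"),
    ("Malik", "Nafi"),
]
path = shortest_path(sample_edges, "Malik", "Anas ibn Malik")
print(path)
```

Breadth-first search visits narrators in order of distance from the start, so the first time it reaches the goal the path found has the fewest hops.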
Isnad Chains (/chains)
| Method | Endpoint | Description |
|---|---|---|
| GET | /chains/hadith/{hadith_id} | Chain as graph (nodes + links) for visualization |
| GET | /chains/narrator/{name} | All chains containing a narrator |
| GET | /chains/common-chains?narrator_a=X&narrator_b=Y | Hadiths where both narrators appear |
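A "nodes + links" graph is the shape most force-directed visualization libraries consume. The exact response schema is not documented here, so the following is only a sketch of how an ordered isnad could be turned into that shape. The field names `id`, `source`, `target`, and `type` are assumptions; `chain_order` and `NARRATED_FROM` are borrowed from the knowledge graph model.

```python
def chain_to_graph(narrators):
    """Convert an ordered chain of narrator names into nodes and links.

    The first entry is taken to be the narrator closest to the collector;
    each narrator is linked to the one they narrated from.
    """
    nodes = [
        {"id": name, "chain_order": position}
        for position, name in enumerate(narrators, start=1)
    ]
    links = [
        {"source": src, "target": dst, "type": "NARRATED_FROM"}
        for src, dst in zip(narrators, narrators[1:])
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical three-narrator chain, for illustration only.
graph = chain_to_graph(["Malik", "al-Zuhri", "Anas ibn Malik"])
print(graph["links"])
```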
Search (/search)
| Method | Endpoint | Description |
|---|---|---|
| GET | /search/semantic?q=what did the prophet say about fasting | Semantic search (any language) |
| GET | /search/fulltext?q=الصلاة | Arabic full-text with morphological analysis |
| GET | /search/combined?q=صيام رمضان | Both semantic + full-text in parallel |
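Running the two searches in parallel means the combined request takes roughly as long as the slower backend rather than the sum of both. A minimal sketch with `asyncio.gather` follows. The two coroutines are stand-ins that return made-up hadith IDs, and how the real endpoint merges or ranks the two result lists is not specified here, so the sketch simply returns both.

```python
import asyncio


async def semantic_search(query):
    # Stand-in for the semantic backend (hypothetical IDs, best match first).
    await asyncio.sleep(0.01)
    return ["h3", "h1", "h7"]


async def fulltext_search(query):
    # Stand-in for the full-text backend (hypothetical IDs).
    await asyncio.sleep(0.01)
    return ["h1", "h9"]


async def combined_search(query):
    # gather() schedules both coroutines concurrently and preserves order.
    semantic, fulltext = await asyncio.gather(
        semantic_search(query), fulltext_search(query)
    )
    return {"semantic": semantic, "fulltext": fulltext}


result = asyncio.run(combined_search("صيام رمضان"))
print(result)
```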
System
| Method | Endpoint | Description |
|---|---|---|
| GET | / | API info and endpoint listing |
| GET | /health | Health check (verifies all 4 backends) |
| GET | /stats | Database statistics |
| GET | /docs | Swagger UI |
| GET | /redoc | ReDoc documentation |
| GET | /openapi.json | OpenAPI 3.1 spec |
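A health endpoint that verifies several backends typically runs one probe per backend and reports an aggregate status. The sketch below shows that pattern with plain callables. The response shape (`status`, `backends`) and the probe functions are assumptions for illustration, not the actual `/health` payload.

```python
def run_health_checks(checks):
    """Run each named probe; a probe passes if it returns without raising."""
    backends = {}
    for name, probe in checks.items():
        try:
            probe()
            backends[name] = "ok"
        except Exception as exc:  # report the failure instead of crashing
            backends[name] = f"error: {exc}"
    healthy = all(state == "ok" for state in backends.values())
    return {"status": "healthy" if healthy else "degraded", "backends": backends}


def failing_probe():
    # Hypothetical probe simulating an unreachable backend.
    raise ConnectionError("connection refused")


report = run_health_checks(
    {
        "postgresql": lambda: None,
        "neo4j": lambda: None,
        "qdrant": failing_probe,
        "elasticsearch": lambda: None,
    }
)
print(report)
```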
Example Requests
Search for hadiths about prayer
curl "https://hadith-api.betelgeusebytes.io/hadiths/search/keyword?q=صلاة&collection=Sahih%20Bukhari&grade=Sahih"
Get narrator profile
curl "https://hadith-api.betelgeusebytes.io/narrators/profile/أبو%20هريرة"
Semantic search (English → Arabic results)
curl "https://hadith-api.betelgeusebytes.io/search/semantic?q=what%20is%20the%20reward%20of%20prayer"
Check if two narrators are connected
curl "https://hadith-api.betelgeusebytes.io/narrators/who-met-who?narrator_a=الزهري&narrator_b=أنس%20بن%20مالك"
Get isnad chain for a hadith
curl "https://hadith-api.betelgeusebytes.io/chains/hadith/{hadith_uuid}"
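The curl examples mix percent-encoded spaces with raw Arabic text. Strict HTTP clients expect the whole URL to be percent-encoded, which the Python standard library handles directly. The sketch below builds the same request URLs. `build_url` is a hypothetical helper, and the `collection` and `grade` parameters are taken from the keyword search example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://hadith-api.betelgeusebytes.io"


def build_url(path, **params):
    """Return an absolute URL with path and query string percent-encoded."""
    # quote() keeps "/" intact and encodes non-ASCII text as UTF-8 bytes.
    url = BASE_URL + quote(path)
    if params:
        # quote_via=quote encodes spaces as %20 instead of "+".
        url += "?" + urlencode(params, quote_via=quote)
    return url


keyword_url = build_url(
    "/hadiths/search/keyword", q="صلاة", collection="Sahih Bukhari", grade="Sahih"
)
profile_url = build_url("/narrators/profile/أبو هريرة")
print(keyword_url)
print(profile_url)
```

The resulting strings are pure ASCII and can be passed to any HTTP client, or pasted into curl, without further quoting.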
Architecture
               ┌──────────────────────────────┐
               │     FastAPI Application      │
               │ hadith-api.betelgeusebytes.io│
               └─────────┬────────────────────┘
                         │
        ┌────────────────┼──────────────────────┐
        │                │                      │
┌───────▼──────┐  ┌──────▼───────┐  ┌───────────▼──────────┐
│  PostgreSQL  │  │    Neo4j     │  │     Qdrant + TEI     │
│  41k hadiths │  │  Knowledge   │  │   Semantic search    │
│  full text   │  │  Graph       │  │   1024-dim BGE-M3    │
└──────────────┘  │  - Narrators │  └──────────────────────┘
                  │  - Chains    │
                  │  - Places    │  ┌──────────────────────┐
                  │  - Tribes    │  │    Elasticsearch     │
                  │  - Topics    │  │   Arabic full-text   │
                  └──────────────┘  │    morphological     │
                                    └──────────────────────┘
Backend Responsibilities
| Backend | What it stores | Used by |
|---|---|---|
| PostgreSQL | Raw hadith text (Arabic/English/Urdu), metadata, grades | /hadiths/* keyword search, collection listing |
| Neo4j | Narrator graph, isnad chains, topics, places, tribes | /narrators/*, /chains/*, topic search |
| Qdrant | 1024-dim BGE-M3 embeddings for all 41k hadiths | /search/semantic |
| Elasticsearch | Arabic-analyzed hadith text index | /search/fulltext |
| TEI | BGE-M3 embedding inference (query → vector) | /search/semantic (query encoding) |
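Semantic search therefore takes two steps: the query is encoded into a vector, then the nearest stored vectors are retrieved. The sketch below imitates the second step in pure Python so the mechanics are visible. It uses made-up 3-dimensional vectors instead of 1024-dimensional BGE-M3 embeddings, assumes cosine similarity as the distance metric, and stands in for what TEI and Qdrant do in the real service.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vector, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vector, vec), hadith_id) for hadith_id, vec in index.items()]
    scored.sort(reverse=True)
    return [hadith_id for _, hadith_id in scored[:k]]


# Toy index: hypothetical IDs and vectors, for illustration only.
index = {
    "hadith-prayer": [0.9, 0.1, 0.0],
    "hadith-fasting": [0.1, 0.9, 0.1],
    "hadith-charity": [0.0, 0.2, 0.9],
}
query_vector = [0.2, 0.8, 0.0]  # pretend this came from the embedding service
nearest = top_k(query_vector, index)
print(nearest)
```

Because matching happens in vector space rather than on tokens, an English query can retrieve Arabic text as long as the embedding model maps both to nearby vectors.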
Knowledge Graph Model
(:Narrator)-[:APPEARS_IN {chain_order, transmission_verb}]->(:Hadith)
(:Narrator)-[:NARRATED_FROM {hadith_ids}]->(:Narrator)
(:Narrator)-[:TEACHER_OF]->(:Narrator)
(:Narrator)-[:BORN_IN|LIVED_IN|DIED_IN|TRAVELED_TO]->(:Place)
(:Narrator)-[:BELONGS_TO_TRIBE]->(:Tribe)
(:Hadith)-[:HAS_TOPIC]->(:Topic)
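Queries against this model might look like the following sketch, which only builds Cypher strings and parameter dictionaries and does not connect to a database. The labels, relationship types, and the `name_arabic` property come from the model above. The function name, the hop limit, and the choice of relationship types to traverse are assumptions, and the real routers may query differently.

```python
# Hadiths in which both narrators appear (the shape behind a common-chains lookup).
COMMON_CHAINS = (
    "MATCH (a:Narrator {name_arabic: $a})-[:APPEARS_IN]->(h:Hadith)"
    "<-[:APPEARS_IN]-(b:Narrator {name_arabic: $b}) "
    "RETURN h"
)


def who_met_who_query(narrator_a, narrator_b, max_hops=6):
    """Return (cypher, params) for a shortest-path lookup between two narrators."""
    # The hop limit is coerced to int and interpolated into the pattern;
    # narrator names are always passed as parameters, never interpolated.
    max_hops = int(max_hops)
    cypher = (
        "MATCH (a:Narrator {name_arabic: $a}), (b:Narrator {name_arabic: $b}) "
        f"MATCH p = shortestPath((a)-[:NARRATED_FROM|TEACHER_OF*..{max_hops}]-(b)) "
        "RETURN [n IN nodes(p) | n.name_arabic] AS path"
    )
    return cypher, {"a": narrator_a, "b": narrator_b}


cypher, params = who_met_who_query("الزهري", "أنس بن مالك")
print(cypher)
print(params)
```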
Narrator Properties
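- name_arabic / name_transliterated: primary identifiers
- full_nasab: complete lineage (فلان بن فلان بن فلان, "so-and-so son of so-and-so")
- kunya: teknonym beginning with أبو ("father of") or أم ("mother of")
- nisba: attributive name marking origin or affiliation, ending in ـي (البخاري، المدني)
- generation: generational class (طبقة): صحابي (Companion), تابعي (Successor), تابع التابعين (Successor of the Successors)
- reliability_grade: narrator-criticism grade (جرح وتعديل): ثقة (trustworthy), صدوق (truthful), ضعيف (weak)
- biography_summary_arabic / biography_summary_english: bilingual biographies
- birth_year_hijri / death_year_hijri: dates in the Hijri calendar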
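As a rough sketch, these properties could be modelled on the client side with a dataclass. The field names follow the list above. The types, the choice of which fields are optional, and the `lifespan_hijri` helper are assumptions, and the project's own response models in `schemas.py` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Narrator:
    name_arabic: str
    name_transliterated: str
    full_nasab: Optional[str] = None
    kunya: Optional[str] = None
    nisba: Optional[str] = None
    generation: Optional[str] = None
    reliability_grade: Optional[str] = None
    biography_summary_arabic: Optional[str] = None
    biography_summary_english: Optional[str] = None
    birth_year_hijri: Optional[int] = None
    death_year_hijri: Optional[int] = None

    def lifespan_hijri(self) -> Optional[int]:
        """Age at death in Hijri years, or None if either date is unknown."""
        if self.birth_year_hijri is None or self.death_year_hijri is None:
            return None
        return self.death_year_hijri - self.birth_year_hijri


# Hand-written example record, not an API response.
bukhari = Narrator(
    name_arabic="محمد بن إسماعيل البخاري",
    name_transliterated="Muhammad ibn Ismail al-Bukhari",
    kunya="أبو عبد الله",
    nisba="البخاري",
    birth_year_hijri=194,
    death_year_hijri=256,
)
print(bukhari.lifespan_hijri())
```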
Setup
Prerequisites
- Python 3.12+
- Docker
- Access to running PostgreSQL, Neo4j, Qdrant, Elasticsearch, and TEI (Hugging Face Text Embeddings Inference) instances
Local Development
# Clone
git clone <repo_url>
cd hadith-api
# Configure
cp .env.example .env
# Edit .env with your credentials
# Install
pip install -r requirements.txt
# Run
uvicorn app.main:app --reload --port 8000
# Open docs (open is macOS-only; on Linux use xdg-open)
open http://localhost:8000/docs
Environment Variables
| Variable | Description | Default |
|---|---|---|
| HADITH_PG_HOST | PostgreSQL host | pg.betelgeusebytes.io |
| HADITH_PG_PORT | PostgreSQL port | 5432 |
| HADITH_PG_DBNAME | Database name | — |
| HADITH_PG_USER | Database user | — |
| HADITH_PG_PASSWORD | Database password | — |
| HADITH_PG_SSLMODE | SSL mode | require |
| HADITH_NEO4J_URI | Neo4j bolt URI | neo4j+ssc://neo4j.betelgeusebytes.io:7687 |
| HADITH_NEO4J_USER | Neo4j user | neo4j |
| HADITH_NEO4J_PASSWORD | Neo4j password | — |
| HADITH_QDRANT_HOST | Qdrant host | qdrant.vector.svc.cluster.local |
| HADITH_QDRANT_PORT | Qdrant port | 6333 |
| HADITH_QDRANT_COLLECTION | Qdrant collection name | hadiths |
| HADITH_ES_HOST | Elasticsearch URL | http://elasticsearch.elastic.svc.cluster.local:9200 |
| HADITH_ES_INDEX | Elasticsearch index | hadiths |
| HADITH_TEI_URL | TEI embedding service | http://tei.ml.svc.cluster.local:80 |
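The project loads these variables through Pydantic settings in `app/config.py`. As a dependency-free illustration of the same idea, the sketch below reads the PostgreSQL subset with the standard library only. It assumes that variables without a documented default are required. `PostgresSettings` and `load_postgres_settings` are hypothetical names, not the project's own.

```python
import os
from dataclasses import dataclass

REQUIRED = ("HADITH_PG_DBNAME", "HADITH_PG_USER", "HADITH_PG_PASSWORD")


@dataclass(frozen=True)
class PostgresSettings:
    host: str
    port: int
    dbname: str
    user: str
    password: str
    sslmode: str


def load_postgres_settings(env=os.environ):
    """Read HADITH_PG_* variables, falling back to the documented defaults."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return PostgresSettings(
        host=env.get("HADITH_PG_HOST", "pg.betelgeusebytes.io"),
        port=int(env.get("HADITH_PG_PORT", "5432")),
        dbname=env["HADITH_PG_DBNAME"],
        user=env["HADITH_PG_USER"],
        password=env["HADITH_PG_PASSWORD"],
        sslmode=env.get("HADITH_PG_SSLMODE", "require"),
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_postgres_settings(
    {
        "HADITH_PG_DBNAME": "hadith",
        "HADITH_PG_USER": "api",
        "HADITH_PG_PASSWORD": "change-me",
        "HADITH_PG_PORT": "6543",
    }
)
print(settings.host, settings.port, settings.sslmode)
```

Failing at startup when a credential is missing surfaces configuration mistakes before the first request rather than during it.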
Deployment (Kubernetes)
Build & Push
docker build -t axxs/hadith-api:latest .
docker push axxs/hadith-api:latest
Deploy
# Edit secrets in k8s/deployment.yaml first
kubectl apply -f k8s/deployment.yaml
# Watch rollout
kubectl rollout status deployment/hadith-api -n api
# Verify
kubectl get pods -n api -l app=hadith-api
curl https://hadith-api.betelgeusebytes.io/health
What gets created
- Namespace: api
- Secret: hadith-api-secrets (PG + Neo4j credentials)
- Deployment: 2 replicas with health checks
- Service: ClusterIP on port 80 → container 8000
- Ingress: TLS via cert-manager at hadith-api.betelgeusebytes.io
Resource Limits
- Requests: 250m CPU, 256Mi RAM per pod
- Limits: 1 CPU, 512Mi RAM per pod
Project Structure
hadith-api/
├── app/
│ ├── main.py # FastAPI app, lifespan, health, stats
│ ├── config.py # Pydantic settings (env vars)
│ ├── models/
│ │ └── schemas.py # Response models with examples
│ ├── routers/
│ │ ├── hadiths.py # /hadiths/* — details, search, listing
│ │ ├── narrators.py # /narrators/* — profiles, relationships
│ │ ├── chains.py # /chains/* — isnad visualization
│ │ └── search.py # /search/* — semantic + full-text
│ └── services/
│ └── database.py # PG, Neo4j, Qdrant, ES connections
├── k8s/
│ └── deployment.yaml # K8s namespace + secret + deploy + svc + ingress
├── Dockerfile
├── .dockerignore
├── .env.example
├── requirements.txt
├── deploy.sh
└── README.md
Data Pipeline
The API consumes data produced by the hadith extraction pipeline:
HadithAPI.com ──► PostgreSQL (41k hadiths, raw text)
                      │
                      ├──► TEI (BGE-M3) ──► Qdrant (embeddings)
                      ├──► Elasticsearch (full-text index)
                      └──► LLM Extraction (OpenAI/Gemini)
                               │
                               ├──► Phase A: sanad/matn split, narrator chains, entities, topics
                               └──► Phase B: narrator biographies from classical scholarship
                                        │
                                        └──► MinIO (JSON) ──► Neo4j (knowledge graph)
License
MIT