Hadith Scholar API — حَدِيثٌ

Production-grade REST API for analyzing Islamic hadith literature across 8+ major collections.

Built with FastAPI · PostgreSQL · Neo4j · Qdrant · Elasticsearch

Overview

The Hadith Scholar API provides structured access to ~41,000 hadiths from the major canonical collections, enriched with:

LLM-extracted narrator chains — structured isnad parsing with entity typing
Narrator knowledge graph — biographies, teacher/student networks, places, tribes (Neo4j)
Multilingual semantic search — find hadiths by meaning in Arabic, English, or Urdu (BGE-M3 + Qdrant)
Full-text Arabic search — morphological analysis with stemming and root extraction (Elasticsearch)
Interactive API docs — Swagger UI with Arabic examples on every endpoint

Collections

Collection	Arabic	Hadiths
Sahih Bukhari	صحيح البخاري	6,986
Sahih Muslim	صحيح مسلم	15,034
Sunan Abu Dawood	سنن أبي داود	5,274
Jami` at-Tirmidhi	جامع الترمذي	—
Sunan an-Nasa'i	سنن النسائي	5,758
Sunan Ibn Majah	سنن ابن ماجه	4,341
Musnad Ahmad	مسند أحمد	—
Muwatta Malik	موطأ مالك	—

API Endpoints

Hadiths (`/hadiths`)

Method	Endpoint	Description
`GET`	`/hadiths/{hadith_id}`	Full hadith details with narrator chain and topics
`GET`	`/hadiths/collection/{name}`	Paginated listing by collection
`GET`	`/hadiths/number/{collection}/{number}`	Lookup by collection name + hadith number
`GET`	`/hadiths/search/keyword?q=صلاة`	Arabic keyword search with filters
`GET`	`/hadiths/search/topic/{topic}`	Search by topic tag
`GET`	`/hadiths/search/narrator/{name}`	Find hadiths by narrator

Narrators (`/narrators`)

Method	Endpoint	Description
`GET`	`/narrators/search?q=أبو هريرة`	Search by name (Arabic or transliterated)
`GET`	`/narrators/profile/{name_arabic}`	Full biography, hadiths, teachers, students, places
`GET`	`/narrators/by-generation/{gen}`	List narrators by طبقة (صحابي, تابعي, etc.)
`GET`	`/narrators/by-place/{place}`	Narrators associated with a place
`GET`	`/narrators/interactions/{name}`	All relationships for a narrator
`GET`	`/narrators/who-met-who?narrator_a=X&narrator_b=Y`	Shortest path between two narrators

Isnad Chains (`/chains`)

Method	Endpoint	Description
`GET`	`/chains/hadith/{hadith_id}`	Chain as graph (nodes + links) for visualization
`GET`	`/chains/narrator/{name}`	All chains containing a narrator
`GET`	`/chains/common-chains?narrator_a=X&narrator_b=Y`	Hadiths where both narrators appear
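A nodes-plus-links payload for visualization can be derived from an ordered chain. The following is a minimal sketch, not the API's actual implementation: the function name `chain_to_graph`, the input dictionaries, the sample narrators, and the output field names are all hypothetical. Only `chain_order` and `transmission_verb` come from the knowledge graph model in this README, and the sketch assumes a lower `chain_order` means an earlier position in the chain, which the README does not state.

```python
def chain_to_graph(chain: list[dict]) -> dict:
    """Turn an ordered narrator chain into a {"nodes": [...], "links": [...]} dict."""
    # Sort by chain_order so the input may arrive in any order.
    ordered = sorted(chain, key=lambda entry: entry["chain_order"])
    nodes = [{"id": e["name"], "order": e["chain_order"]} for e in ordered]
    # Link each narrator to the next one; the verb is taken from the source entry.
    links = [
        {"source": a["name"], "target": b["name"], "verb": a.get("transmission_verb")}
        for a, b in zip(ordered, ordered[1:])
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical sample chain (placeholder names, deliberately unsorted).
sample_chain = [
    {"name": "Narrator B", "chain_order": 2, "transmission_verb": "عن"},
    {"name": "Narrator A", "chain_order": 1, "transmission_verb": "حدثنا"},
    {"name": "Narrator C", "chain_order": 3, "transmission_verb": None},
]

graph = chain_to_graph(sample_chain)
print(graph["nodes"])
print(graph["links"])
```

A chain of n narrators yields n nodes and n - 1 links, a shape that most force-directed graph libraries can consume directly.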

Search (`/search`)

Method	Endpoint	Description
`GET`	`/search/semantic?q=what did the prophet say about fasting`	Semantic search (any language)
`GET`	`/search/fulltext?q=الصلاة`	Arabic full-text with morphological analysis
`GET`	`/search/combined?q=صيام رمضان`	Both semantic + full-text in parallel
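Running the semantic and full-text lookups in parallel, as `/search/combined` is described as doing, can be sketched with `asyncio.gather`. This is an illustration under stated assumptions: `semantic_search`, `fulltext_search`, `combined_search`, and the result dictionaries are hypothetical stand-ins, and the real backends (Qdrant + TEI, Elasticsearch) are replaced by short sleeps.

```python
import asyncio


async def semantic_search(query: str) -> list[dict]:
    # Hypothetical stand-in for the TEI encoding + Qdrant lookup.
    await asyncio.sleep(0.01)
    return [{"hadith_id": "h1", "score": 0.91, "source": "semantic"}]


async def fulltext_search(query: str) -> list[dict]:
    # Hypothetical stand-in for the Elasticsearch lookup.
    await asyncio.sleep(0.01)
    return [{"hadith_id": "h2", "score": 7.3, "source": "fulltext"}]


async def combined_search(query: str) -> dict[str, list[dict]]:
    # gather() runs both coroutines concurrently and returns results
    # in the same order as its arguments.
    semantic, fulltext = await asyncio.gather(
        semantic_search(query), fulltext_search(query)
    )
    return {"semantic": semantic, "fulltext": fulltext}


results = asyncio.run(combined_search("صيام رمضان"))
print(results)
```

The two result lists are kept separate in this sketch because vector-similarity scores and full-text relevance scores are on different scales; whether the real endpoint merges or separates them is not stated here.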

System

Method	Endpoint	Description
`GET`	`/`	API info and endpoint listing
`GET`	`/health`	Health check (verifies all 4 backends)
`GET`	`/stats`	Database statistics
`GET`	`/docs`	Swagger UI
`GET`	`/redoc`	ReDoc documentation
`GET`	`/openapi.json`	OpenAPI 3.1 spec

Example Requests

Search for hadiths about prayer

curl -G "https://hadith-api.betelgeusebytes.io/hadiths/search/keyword" --data-urlencode "q=صلاة" --data-urlencode "collection=Sahih Bukhari" --data-urlencode "grade=Sahih"

Get narrator profile

curl "https://hadith-api.betelgeusebytes.io/narrators/profile/أبو%20هريرة"

Semantic search (English → Arabic results)

curl "https://hadith-api.betelgeusebytes.io/search/semantic?q=what%20is%20the%20reward%20of%20prayer"

Check if two narrators are connected

curl -G "https://hadith-api.betelgeusebytes.io/narrators/who-met-who" --data-urlencode "narrator_a=الزهري" --data-urlencode "narrator_b=أنس بن مالك"

Get isnad chain for a hadith

curl "https://hadith-api.betelgeusebytes.io/chains/hadith/{hadith_uuid}"
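The same request URLs can be built from Python. Arabic text and spaces must be percent-encoded before they go on the wire, and the sketch below does that with the standard library only. `build_url` is a hypothetical helper, not part of the API; the base URL and parameter names are taken from the curl examples.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://hadith-api.betelgeusebytes.io"


def build_url(path: str, **params: str) -> str:
    """Return BASE_URL + path with the path and query string percent-encoded."""
    # Encode non-ASCII characters and spaces in the path; "/" stays a separator.
    encoded_path = quote(path, safe="/")
    if not params:
        return f"{BASE_URL}{encoded_path}"
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{BASE_URL}{encoded_path}?{urlencode(params, quote_via=quote)}"


keyword_url = build_url(
    "/hadiths/search/keyword", q="صلاة", collection="Sahih Bukhari", grade="Sahih"
)
profile_url = build_url("/narrators/profile/أبو هريرة")

print(keyword_url)
print(profile_url)
```

The resulting strings are pure ASCII and can be passed to any HTTP client, for example `urllib.request.urlopen`.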

Architecture

                    ┌──────────────────────────────┐
                    │      FastAPI Application      │
                    │    hadith-api.betelgeusebytes.io    │
                    └─────────┬────────────────────┘
                              │
            ┌─────────────────┼─────────────────────┐
            │                 │                       │
    ┌───────▼──────┐  ┌──────▼───────┐  ┌───────────▼──────────┐
    │  PostgreSQL   │  │    Neo4j     │  │   Qdrant + TEI       │
    │  41k hadiths  │  │  Knowledge   │  │   Semantic search    │
    │  full text    │  │  Graph       │  │   1024-dim BGE-M3    │
    └──────────────┘  │  - Narrators  │  └──────────────────────┘
                      │  - Chains     │
                      │  - Places     │  ┌──────────────────────┐
                      │  - Tribes     │  │   Elasticsearch      │
                      │  - Topics     │  │   Arabic full-text   │
                      └──────────────┘  │   morphological       │
                                        └──────────────────────┘

Backend Responsibilities

Backend	What it stores	Used by
PostgreSQL	Raw hadith text (Arabic/English/Urdu), metadata, grades	`/hadiths/*` keyword search, collection listing
Neo4j	Narrator graph, isnad chains, topics, places, tribes	`/narrators/`, `/chains/`, topic search
Qdrant	1024-dim BGE-M3 embeddings for all 41k hadiths	`/search/semantic`
Elasticsearch	Arabic-analyzed hadith text index	`/search/fulltext`
TEI	BGE-M3 embedding inference (query → vector)	`/search/semantic` (query encoding)
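Semantic search comes down to a nearest-neighbour lookup over embedding vectors: TEI encodes the query, and Qdrant returns the closest stored hadith vectors. The sketch below shows the idea in plain Python with 3-dimensional toy vectors in place of 1024-dimensional BGE-M3 embeddings. Cosine similarity is an assumption (this README does not say which distance metric the collection uses), and `cosine`, `top_k`, and the sample index are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2):
    """Return the k (hadith_id, score) pairs most similar to query_vec."""
    scored = [(hadith_id, cosine(query_vec, vec)) for hadith_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy index: hadith id -> embedding (real vectors would have 1024 dimensions).
toy_index = {
    "h1": [1.0, 0.0, 0.0],
    "h2": [0.9, 0.1, 0.0],
    "h3": [0.0, 1.0, 0.0],
}

hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(hits)
```

A vector database replaces the linear scan in `top_k` with an approximate index, but the ranking principle is the same.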

Knowledge Graph Model

(:Narrator)-[:APPEARS_IN {chain_order, transmission_verb}]->(:Hadith)
(:Narrator)-[:NARRATED_FROM {hadith_ids}]->(:Narrator)
(:Narrator)-[:TEACHER_OF]->(:Narrator)
(:Narrator)-[:BORN_IN|LIVED_IN|DIED_IN|TRAVELED_TO]->(:Place)
(:Narrator)-[:BELONGS_TO_TRIBE]->(:Tribe)
(:Hadith)-[:HAS_TOPIC]->(:Topic)
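The shortest-path lookup behind `/narrators/who-met-who` can be pictured as a breadth-first search over the narrator-to-narrator relationships above. In Neo4j this would more typically be expressed as a Cypher shortest-path query; the sketch below is an in-memory illustration only. The function `shortest_path` and the sample edges are hypothetical and are not data from the API.

```python
from collections import deque

# Hypothetical sample (student, teacher) pairs standing in for NARRATED_FROM edges.
SAMPLE_EDGES = [
    ("مالك", "الزهري"),
    ("الزهري", "أنس بن مالك"),
    ("سفيان", "الزهري"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search; returns the list of narrators on a shortest path, or None."""
    # Treat edges as undirected so a path may run in either direction.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path(SAMPLE_EDGES, "مالك", "أنس بن مالك")
print(path)
```

Because breadth-first search explores paths in order of length, the first path that reaches the goal has the fewest hops.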

Narrator Properties

name_arabic / name_transliterated — primary identifiers
full_nasab — complete lineage (فلان بن فلان بن فلان)
kunya — أبو/أم names
nisba — attributional (-i suffix: البخاري، المدني)
generation — طبقة: صحابي، تابعي، تابع التابعين
reliability_grade — جرح وتعديل: ثقة، صدوق، ضعيف
biography_summary_arabic / biography_summary_english — bilingual bios
birth_year_hijri / death_year_hijri — dates in Hijri calendar
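The properties above map naturally onto a typed record. The dataclass below is a hypothetical sketch, not the project's actual schema (which lives in `app/models/schemas.py` and is not reproduced here). Making every field optional and typing the Hijri years as integers are assumptions, as are the class name `NarratorRecord` and the `display_name` helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NarratorRecord:
    """Hypothetical container mirroring the narrator properties listed above."""

    name_arabic: Optional[str] = None
    name_transliterated: Optional[str] = None
    full_nasab: Optional[str] = None
    kunya: Optional[str] = None
    nisba: Optional[str] = None
    generation: Optional[str] = None
    reliability_grade: Optional[str] = None
    biography_summary_arabic: Optional[str] = None
    biography_summary_english: Optional[str] = None
    birth_year_hijri: Optional[int] = None  # assumed to be an integer year
    death_year_hijri: Optional[int] = None  # assumed to be an integer year

    @property
    def display_name(self) -> str:
        # Prefer the Arabic name, then the transliteration, then a placeholder.
        return self.name_arabic or self.name_transliterated or "(unnamed)"


narrator = NarratorRecord(name_transliterated="Narrator A", generation="تابعي")
print(narrator.display_name)
```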

Setup

Prerequisites

Python 3.12+
Docker
Access to PostgreSQL, Neo4j, Qdrant, Elasticsearch, TEI

Local Development

# Clone
git clone <repo_url>
cd hadith-api

# Configure
cp .env.example .env
# Edit .env with your credentials

# Install
pip install -r requirements.txt

# Run
uvicorn app.main:app --reload --port 8000

# Open docs (macOS; on Linux use xdg-open instead of open)
open http://localhost:8000/docs

Environment Variables

Variable	Description	Default
`HADITH_PG_HOST`	PostgreSQL host	`pg.betelgeusebytes.io`
`HADITH_PG_PORT`	PostgreSQL port	`5432`
`HADITH_PG_DBNAME`	Database name	—
`HADITH_PG_USER`	Database user	—
`HADITH_PG_PASSWORD`	Database password	—
`HADITH_PG_SSLMODE`	SSL mode	`require`
`HADITH_NEO4J_URI`	Neo4j connection URI	`neo4j+ssc://neo4j.betelgeusebytes.io:7687`
`HADITH_NEO4J_USER`	Neo4j user	`neo4j`
`HADITH_NEO4J_PASSWORD`	Neo4j password	—
`HADITH_QDRANT_HOST`	Qdrant host	`qdrant.vector.svc.cluster.local`
`HADITH_QDRANT_PORT`	Qdrant port	`6333`
`HADITH_QDRANT_COLLECTION`	Qdrant collection name	`hadiths`
`HADITH_ES_HOST`	Elasticsearch URL	`http://elasticsearch.elastic.svc.cluster.local:9200`
`HADITH_ES_INDEX`	Elasticsearch index	`hadiths`
`HADITH_TEI_URL`	TEI embedding service	`http://tei.ml.svc.cluster.local:80`
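As an illustration of how these variables resolve, here is a standard-library sketch that reads a subset of them with the defaults from the table. It is not the project's `app/config.py` (which uses Pydantic settings); the `Settings` class, its field names, and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    pg_host: str
    pg_port: int
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str
    es_host: str
    es_index: str
    tei_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read HADITH_* variables from env, falling back to the documented defaults."""

    def get(name: str, default: str) -> str:
        return env.get(f"HADITH_{name}", default)

    return Settings(
        pg_host=get("PG_HOST", "pg.betelgeusebytes.io"),
        pg_port=int(get("PG_PORT", "5432")),
        qdrant_host=get("QDRANT_HOST", "qdrant.vector.svc.cluster.local"),
        qdrant_port=int(get("QDRANT_PORT", "6333")),
        qdrant_collection=get("QDRANT_COLLECTION", "hadiths"),
        es_host=get("ES_HOST", "http://elasticsearch.elastic.svc.cluster.local:9200"),
        es_index=get("ES_INDEX", "hadiths"),
        tei_url=get("TEI_URL", "http://tei.ml.svc.cluster.local:80"),
    )


defaults = load_settings({})
overridden = load_settings({"HADITH_QDRANT_HOST": "localhost", "HADITH_QDRANT_PORT": "6334"})
print(defaults.qdrant_host, overridden.qdrant_host)
```

Credentials without a default (database name, user, passwords) are left out of the sketch; in practice they would be required values supplied through `.env` or the Kubernetes secret.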

Deployment (Kubernetes)

Build & Push

docker build -t axxs/hadith-api:latest .
docker push axxs/hadith-api:latest

Deploy

# Edit secrets in k8s/deployment.yaml first
kubectl apply -f k8s/deployment.yaml

# Watch rollout
kubectl rollout status deployment/hadith-api -n api

# Verify
kubectl get pods -n api -l app=hadith-api
curl https://hadith-api.betelgeusebytes.io/health

What gets created

Namespace: api
Secret: hadith-api-secrets (PG + Neo4j credentials)
Deployment: 2 replicas with health checks
Service: ClusterIP on port 80 → container 8000
Ingress: TLS via cert-manager at hadith-api.betelgeusebytes.io

Resource Limits

Requests: 250m CPU, 256Mi RAM per pod
Limits: 1 CPU, 512Mi RAM per pod

Project Structure

hadith-api/
├── app/
│   ├── main.py                 # FastAPI app, lifespan, health, stats
│   ├── config.py               # Pydantic settings (env vars)
│   ├── models/
│   │   └── schemas.py          # Response models with examples
│   ├── routers/
│   │   ├── hadiths.py          # /hadiths/* — details, search, listing
│   │   ├── narrators.py        # /narrators/* — profiles, relationships
│   │   ├── chains.py           # /chains/* — isnad visualization
│   │   └── search.py           # /search/* — semantic + full-text
│   └── services/
│       └── database.py         # PG, Neo4j, Qdrant, ES connections
├── k8s/
│   └── deployment.yaml         # K8s namespace + secret + deploy + svc + ingress
├── Dockerfile
├── .dockerignore
├── .env.example
├── requirements.txt
├── deploy.sh
└── README.md

Data Pipeline

The API consumes data produced by the hadith extraction pipeline:

  HadithAPI.com ──► PostgreSQL (41k hadiths, raw text)
                         │
                         ├──► TEI (BGE-M3) ──► Qdrant (embeddings)
                         ├──► Elasticsearch (full-text index)
                         └──► LLM Extraction (OpenAI/Gemini)
                                   │
                                   ├──► Phase A: sanad/matn split, narrator chains, entities, topics
                                   └──► Phase B: narrator biographies from classical scholarship
                                            │
                                            └──► MinIO (JSON) ──► Neo4j (knowledge graph)

License

MIT
