# Lector — Implementation Guide

Current as of April 2026. Describes the running production architecture and how to extend it.

---

## Running Architecture

Lector is deployed via Docker Compose at **lector.nerdbox.com** with three containers:

| Container | Image | Purpose |
|-----------|-------|---------|
| `lector-app` | `lector-app:latest` | Express backend + React frontend (port 5001→5000) |
| `lector-postgres` | `postgres:16` | User data, SRS queue, paradigm tables |
| `morpheus-api` | `morpheus-api:latest` | Python/Flask sidecar — runs Morpheus cruncher for real-time morphological analysis |

**Nginx** (host) reverse-proxies lector.nerdbox.com → port 5001. SSL via Let's Encrypt.

**Volumes:**
- `lector_data` — read-only morphology SQLite DB (`morphology.db`), DICTLINE, Morpheus stemlib
- `lector_pg_data` — PostgreSQL data volume for `lector-postgres`: user data and the `paradigms` table (persists across container restarts/rebuilds)

---

## Key Files

```
server/
  index.ts              — Boot order: ensureSchema → session → passport → auth → billing → instructor → routes → seedParadigms
  routes.ts             — Core API routes: passage delivery, morphology, SRS, admin
  db.ts                 — pg.Pool singleton + ensureSchema() (creates all tables on startup)
  pg-storage.ts         — PgStorage: full async PostgreSQL implementation
  auth.ts               — setupSession, setupPassport, attachUserId, registerAuthRoutes, linkPendingClassInvites
  billing.ts            — getEffectiveTier(), requireTier(), Stripe checkout/webhooks, promo codes, academic teams
  instructor.ts         — Instructor routes (classes, roster, assignments, dashboard) + student assignments
  gloss-service.ts      — getOrQueueGloss(), getGlossBatch() — AI-generated lemma glosses from lemma_glosses table
  morphology-db.ts      — lookupMorphDb(), lookupMorphApi(), lookupDefinition(), buildParadigm()
  paradigm-db.ts        — queryParadigmPg(), buildParadigmPg(), seedParadigms()
  paradigm-generator.ts — Rules-based paradigm generation (Latin regular/deponent, Greek thematic/contract)

server/data/
  lemmas.ts             — ~124 curated lemmas with macronized principal parts + frequency ranks
  passages.ts           — Static passage seed (9,755 live passages loaded from DB at runtime)
  whitakers.ts          — DICTLINE parser: loadWhitakersVerbs() + loadWhitakersDeponents()
  greek-verbs.ts        — morphology.db loader: loadGreekVerbs() (16K verbs + LSJ principal parts)
  lsj-principal-parts.ts — Extracts pp2–pp6 from LSJ HTML in morphology.db definitions table
  inflections.ts        — Legacy in-memory inflection table (tier 3 fallback only)

morpheus-api/
  server.py             — Flask app wrapping Morpheus cruncher; handles accentless Greek input
                          via Beta Code accent variant generation (acute + circumflex)

shared/schema.ts        — TypeScript types (Lemma, Passage, Morphology, ParadigmData, etc.)
```

---

## Morphological Parse Pipeline (Three Tiers)

When `/api/morphology?q=<word>` is called:

**Tier 1 — SQLite morphology DB** (`morphology.db`, read-only, ~222K entries):
- Lookup by exact form, then by diacritics-stripped bare form
- Returns full morphological parse + short definition
- Special case for Greek: if SQLite returns results but none are verbs, also calls Tier 2
  and merges (prevents adjective entries from blocking verb lookups — e.g. πλέος vs πλέω)

**Tier 2 — Morpheus sidecar** (`morpheus-api` container, real-time):
- Converts Unicode → Beta Code, expands accentless input into all valid accent variants
- Generates **both acute (/) and circumflex (=)** variants for diphthong/long-vowel nuclei
- Supports both smooth and rough breathings for vowel-initial words
- Calls the Morpheus cruncher binary, parses output, converts back to Unicode
- Handles forms not in the SQLite DB (inflected forms, uncommon vocabulary)
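
The accent expansion can be sketched as follows. This is an illustration only, written in TypeScript to match the rest of the codebase rather than the Python of `server.py`. The names `findNuclei` and `accentVariants` are hypothetical, the input is assumed to be lowercase accentless Beta Code, and the vowel-length rules are a simplification (α, ι, υ are treated as possibly long).

```typescript
// Hypothetical sketch: expand accentless lowercase Beta Code into accent/breathing variants.
const DIPHTHONGS = ["ai", "ei", "oi", "ui", "au", "eu", "ou", "hu"];
const VOWELS = "aehiouw";

interface Nucleus { end: number; canCircumflex: boolean } // end = index just past the nucleus

function findNuclei(word: string): Nucleus[] {
  const nuclei: Nucleus[] = [];
  let i = 0;
  while (i < word.length) {
    if (DIPHTHONGS.includes(word.slice(i, i + 2))) {
      nuclei.push({ end: i + 2, canCircumflex: true });
      i += 2;
    } else if (VOWELS.includes(word[i])) {
      // e and o are always short; h and w are long; a, i, u may be either (assumed long-capable)
      nuclei.push({ end: i + 1, canCircumflex: !"eo".includes(word[i]) });
      i += 1;
    } else {
      i += 1;
    }
  }
  return nuclei;
}

function insertMarks(word: string, nuclei: Nucleus[], accentIdx: number, accent: string, breathing: string): string {
  const marks = new Map<number, string>(); // position (after char) -> marks to append
  if (breathing) marks.set(nuclei[0].end, breathing);
  const pos = nuclei[accentIdx].end;
  marks.set(pos, (marks.get(pos) ?? "") + accent); // breathing precedes accent in Beta Code
  let out = "";
  for (let i = 0; i < word.length; i++) {
    out += word[i] + (marks.get(i + 1) ?? "");
  }
  return out;
}

function accentVariants(word: string): string[] {
  const nuclei = findNuclei(word);
  const startsWithVowel = word.length > 0 && VOWELS.includes(word[0]);
  const breathings = startsWithVowel ? [")", "("] : [""]; // smooth and rough
  const variants: string[] = [];
  const firstAccentable = Math.max(0, nuclei.length - 3); // accent sits on one of the last three syllables
  for (const breathing of breathings) {
    for (let n = firstAccentable; n < nuclei.length; n++) {
      const accents = ["/"];
      // circumflex only on a long nucleus in one of the last two syllables
      if (nuclei[n].canCircumflex && n >= nuclei.length - 2) accents.push("=");
      for (const accent of accents) {
        variants.push(insertMarks(word, nuclei, n, accent, breathing));
      }
    }
  }
  return variants;
}
```

For `anqrwpos` this yields eight candidates, including the correct `a)/nqrwpos`; each candidate would then be passed to the cruncher and the ones that parse are kept.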

**Tier 3 — In-memory inflection table** (`server/data/inflections.ts`):
- ~2,700 hardcoded forms, legacy fallback only
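
The fall-through between the three tiers, including the Greek verb special case, might look like the sketch below. The row shape, `parseWord`, and the injected `Tiers` object are assumptions for illustration; the lookups are shown as synchronous functions for brevity, whereas the real ones (`lookupMorphDb()`, `lookupMorphApi()`) are presumably asynchronous.

```typescript
// Hypothetical row shape; the real types live in shared/schema.ts.
interface MorphRow { form: string; lemma: string; pos: string; source: string }
type Lookup = (word: string) => MorphRow[];
interface Tiers { sqlite: Lookup; morpheus: Lookup; inMemory: Lookup }

// Concatenate two result sets, dropping exact lemma/pos/form duplicates.
function mergeRows(a: MorphRow[], b: MorphRow[]): MorphRow[] {
  const seen = new Set<string>();
  const out: MorphRow[] = [];
  for (const row of [...a, ...b]) {
    const key = `${row.lemma}|${row.pos}|${row.form}`;
    if (!seen.has(key)) {
      seen.add(key);
      out.push(row);
    }
  }
  return out;
}

function parseWord(word: string, language: "latin" | "greek", tiers: Tiers): MorphRow[] {
  const tier1 = tiers.sqlite(word);
  if (tier1.length > 0) {
    const hasVerb = tier1.some((r) => r.pos === "verb");
    // Greek special case: non-verb SQLite hits must not block a verb parse.
    if (language === "greek" && !hasVerb) {
      return mergeRows(tier1, tiers.morpheus(word));
    }
    return tier1;
  }
  const tier2 = tiers.morpheus(word);
  if (tier2.length > 0) return tier2;
  return tiers.inMemory(word); // legacy tier 3 fallback
}
```

Injecting the lookups keeps the ordering logic testable without a SQLite file or a running sidecar.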

### Known Morpheus Quirks (routes.ts)

**`HIDDEN_MORPHEUS_LEMMAS`** — Morpheus sometimes returns compound/artifact lemmas alongside
the correct canonical one. Add them here to suppress. Examples:
- `a-eo`: appears with `aio` for the form `ait`
- `se-queror`: appears with `sequor` for `sequere`

**j→i normalization** — Morpheus spells Latin consonantal `i` as `j` (e.g. `jubeo`),
while the curated lemma store and PG paradigm table use `i` (e.g. `iubeō`). The parse
endpoint and the paradigm endpoint both apply a `j→i` substitution before looking up in
`lemmaByHeadword`, so paradigm display works for these lemmas.
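
A minimal sketch of such a lookup key follows. `headwordKey` and `buildHeadwordIndex` are hypothetical names, and stripping macrons as part of the key is an assumption (it is what lets `jubeo` reach `iubeō` in this example), not something confirmed from `routes.ts`.

```typescript
// Hypothetical normalizer: lowercase, strip macrons, map Morpheus j to i.
function headwordKey(lemma: string): string {
  return lemma
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining marks such as the macron
    .toLowerCase()
    .replace(/j/g, "i");
}

// Build an index keyed by the normalized headword.
function buildHeadwordIndex<T extends { headword: string }>(lemmas: T[]): Map<string, T> {
  return new Map(lemmas.map((l) => [headwordKey(l.headword), l] as [string, T]));
}

const lemmaByHeadword = buildHeadwordIndex([{ headword: "iubeō", shortDef: "order" }]);
const hit = lemmaByHeadword.get(headwordKey("jubeo")); // resolves to the iubeō entry
```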

**`LATIN_FORM_SUPPLEMENTS`** — Morpheus returns some Latin forms under the wrong POS or
misses them entirely. Add curated flat morphology rows here (same shape as SQLite/Morpheus rows)
and they will be injected into the parse response alongside the Morpheus results. Example:
- `obsides`: Morpheus only returns verb forms (obsideo/obsido); the noun `obses` (hostage,
  nom/acc pl) is injected here.
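
Taken together, the hidden-lemma filter and the supplements amount to a post-processing step over the Morpheus rows. The sketch below assumes a simplified row shape (`FlatMorphRow`) and a hypothetical `postProcess` function; only the two constant names and the example entries come from this guide.

```typescript
// Simplified, assumed row shape; real rows carry more morphological fields.
interface FlatMorphRow { form: string; lemma: string; pos: string; features: string }

const HIDDEN_MORPHEUS_LEMMAS = new Set<string>(["a-eo", "se-queror"]);

const LATIN_FORM_SUPPLEMENTS: Record<string, FlatMorphRow[]> = {
  obsides: [
    { form: "obsides", lemma: "obses", pos: "noun", features: "nom pl" },
    { form: "obsides", lemma: "obses", pos: "noun", features: "acc pl" },
  ],
};

function postProcess(form: string, morpheusRows: FlatMorphRow[]): FlatMorphRow[] {
  // Drop artifact lemmas, then append any curated rows for this surface form.
  const visible = morpheusRows.filter((r) => !HIDDEN_MORPHEUS_LEMMAS.has(r.lemma));
  const extra = LATIN_FORM_SUPPLEMENTS[form.toLowerCase()] ?? [];
  return [...visible, ...extra];
}
```

With this shape, suppressing a new artifact or adding a missing noun reading is a one-line data change rather than a logic change.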

**Adding missing lemmas** — When a word lookup succeeds but shows no definition, the lemma
is missing from `server/data/lemmas.ts`. Add it there with macronized headword, principal
parts, and shortDef. The paradigm seeder runs at startup and will pick it up on next deploy.

---

## Paradigm Generation System

Rules-based conjugation/declension tables are generated at startup and stored in the `paradigms` PostgreSQL table. An in-process `seeded` flag prevents a second run within the same process; it resets on restart, so the seeder runs again on every boot, and `ON CONFLICT DO NOTHING` makes that re-seeding harmless.

### Seeding Order (`seedParadigms` in `paradigm-db.ts`)

1. **Latin curated lemmas** — macronized principal parts from `lemmas.ts`
2. **Latin regular verbs** — ~5,980 verbs from Whitaker's DICTLINE (non-deponent, conj 1–4 incl. 3rd-io)
3. **Latin deponent verbs** — ~720 verbs from Whitaker's DICTLINE (DEP type, all conjugations)
4. **Greek curated lemmas** — principal parts from `lemmas.ts`
5. **Greek bulk — with LSJ PPs** — ~1,028 verbs where LSJ HTML yielded extractable principal parts
6. **Greek bulk — present-system only** — remaining ~14,970 verbs from morphology.db (thematic -ω only)

Total paradigm rows: ~1.4M

### Latin Verb Generator (`generateLatinVerb`)

Input: pp1 (1sg pres act), pp2 (inf), pp3 (1sg perf act, optional), pp4 (PPP, optional)

Generates per verb:
- **Present system**: indicative pres/imperf/fut plus subjunctive pres/imperf (Latin has no future subjunctive) × act/pass × 6 person/number = 60 finite forms
- **Imperfect subjunctive**: pp2-derived stem × act/pass
- **Perfect active system** (if pp3): ind/subj × perf/plup/futperf × act = 30 forms + perf act inf
- **Perfect passive system** (if pp4): periphrastic PPP + esse × same tenses = 30 forms
- **Imperatives**: 2sg/2pl × act/pass
- **Non-finite**: pres/perf/fut infinitives × act/pass, pres/perf/fut participles, gerund, gerundive, supine
- **Total**: ~140 rows per verb (102 if no pp4)

Conjugation detected from pp2 ending: -āre (1st), -ēre (2nd), -ere (3rd), -ere + -iō pp1 (3rd-io), -īre (4th).
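
The detection rule can be sketched as below, assuming macronized principal parts (the macron is what separates 2nd-conjugation -ēre from 3rd-conjugation -ere). `detectConjugation` is a hypothetical name and the real logic in `paradigm-generator.ts` may differ.

```typescript
type Conjugation = "1st" | "2nd" | "3rd" | "3rd-io" | "4th";

// Compare in NFC so precomposed and combining macrons match.
const endsWith = (s: string, suffix: string): boolean =>
  s.normalize("NFC").endsWith(suffix.normalize("NFC"));

function detectConjugation(pp1: string, pp2: string): Conjugation | null {
  if (endsWith(pp2, "āre")) return "1st";
  if (endsWith(pp2, "ēre")) return "2nd";
  if (endsWith(pp2, "īre")) return "4th";
  if (endsWith(pp2, "ere")) {
    // Short -ere is 3rd conjugation; an -iō present marks the 3rd-io subtype.
    return endsWith(pp1, "iō") ? "3rd-io" : "3rd";
  }
  return null; // irregular infinitive (esse, ferre, velle, ...)
}
```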

### Latin Deponent Generator (`generateLatinDeponent`)

Input: pp1 (1sg pres ind passive form: loquor), pp2 (passive inf: loquī), ppp (PPP: locūtus)

Detected by pp2 ending: -ārī (1st), -ērī (2nd), -ī + -or/-ior pp1 (3rd/3rd-io), -īrī (4th).

Generates:
- **Present system**: passive endings stored as `voice='act'` (deponents are active in meaning)
- **Imperative 2sg**: stem + active-infinitive ending (loquere, arbitrāre, verēre)
- **Imperative 2pl**: stem + -minī ending (loquiminī, arbitrāminī)
- **Perfect system** (if ppp): periphrastic PPP + esse forms, `voice='act'`
- **Active-only forms**: pres act participle (loquēns), fut act participle/inf (locūtūrus/locūtūrus esse)
- **Gerund, gerundive, supine**
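
As an illustration of the imperative rules above, the sketch below derives both deponent imperatives from the passive infinitive alone. `deponentImperatives` is a hypothetical helper and assumes macronized input; it is not the body of `generateLatinDeponent`.

```typescript
const nfc = (s: string): string => s.normalize("NFC");

// [infinitive suffix, 2sg ending, 2pl ending]; the bare -ī row must come last.
const DEPONENT_IMPERATIVE_ENDINGS: ReadonlyArray<readonly [string, string, string]> = [
  ["ārī", "āre", "āminī"], // 1st: arbitrārī
  ["ērī", "ēre", "ēminī"], // 2nd: verērī
  ["īrī", "īre", "īminī"], // 4th: partīrī
  ["ī", "ere", "iminī"],   // 3rd and 3rd-io: loquī, patī
];

function deponentImperatives(pp2: string): { sg: string; pl: string } | null {
  const inf = nfc(pp2);
  for (const [suffix, sgEnding, plEnding] of DEPONENT_IMPERATIVE_ENDINGS) {
    const suf = nfc(suffix);
    if (inf.endsWith(suf)) {
      const stem = inf.slice(0, inf.length - suf.length);
      // The 2sg form looks like the active infinitive the verb would have had.
      return { sg: nfc(stem + sgEnding), pl: nfc(stem + plEnding) };
    }
  }
  return null;
}
```

For `loquī` this gives `loquere` / `loquiminī`, and for `arbitrārī` it gives `arbitrāre` / `arbitrāminī`, matching the examples in the list.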

### Greek Thematic -ω Generator (`generateGreekVerb`)

Input: pp1–pp6 (standard six principal parts). Generates present, imperfect, future, aorist, perfect, pluperfect across indicative, subjunctive, optative, imperative, infinitive, participle in active/middle/passive voices.

Accent: `recessiveAccent()` applies recessive accent to all finite forms per Classical Greek rules (antepenult default, circumflex on penult if long with short ultima, etc.).

### Greek Contract Verb Generator (`generateGreekContractFull`)

Detects -αω/-εω/-οω contract verbs. Generates:
- **Present/imperfect**: pre-baked contracted ending tables (α, ε, ο contraction rules)
- **Future/aorist/perfect**: synthetic principal parts from contract stem + long-vowel suffix,
  delegated to `generateGreekVerb()` with filtered tenses
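
A pre-baked ending table for one contract class might look like the sketch below, which covers only the present active indicative of -εω verbs. The function names and table layout are hypothetical. Detection by ending alone is also a simplification, since short-stem verbs such as πλέω contract only partially and would need an exception list.

```typescript
type ContractClass = "α" | "ε" | "ο";

// Remove accents and breathings so the ending can be matched on base letters.
const stripMarks = (s: string): string =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").normalize("NFC");

function contractClass(lemma: string): ContractClass | null {
  const bare = stripMarks(lemma);
  if (bare.endsWith("αω")) return "α";
  if (bare.endsWith("εω")) return "ε";
  if (bare.endsWith("οω")) return "ο";
  return null;
}

// Contracted endings: ε+ω → ῶ, ε+ει → εῖ, ε+ο → οῦ, ε+ε → εῖ, ε+ου → οῦ.
const EPSILON_PRES_ACT_IND = ["ῶ", "εῖς", "εῖ", "οῦμεν", "εῖτε", "οῦσι"];

// Returns 1sg, 2sg, 3sg, 1pl, 2pl, 3pl, or null for non-epsilon verbs.
function epsilonPresentActive(lemma: string): string[] | null {
  if (contractClass(lemma) !== "ε") return null;
  const stem = lemma.normalize("NFC").slice(0, -2); // drop the final -έω
  return EPSILON_PRES_ACT_IND.map((ending) => (stem + ending).normalize("NFC"));
}
```

Calling `epsilonPresentActive("ποιέω")` returns ποιῶ, ποιεῖς, ποιεῖ, ποιοῦμεν, ποιεῖτε, ποιοῦσι; the α and ο classes would use analogous tables.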

### LSJ Principal Parts Extraction (`lsj-principal-parts.ts`)

Parses LSJ HTML from `morphology.db` definitions table. Splits at `Pass.` boundary to separate active (pp2–pp4) from passive (pp5–pp6) sections. Extracts forms after tense labels (fut., aor., pf.) from `<span class="lex-greek">` and `<span class="lex-quote">` spans. Validates with expected endings per principal part position. Yields ~2,612 Greek lemmas with at least one extractable PP.

---

## PostgreSQL Schema

Created by `ensureSchema()` in `server/db.ts` on startup.

### User Data Tables

| Table | Key columns |
|-------|-------------|
| `users` | `id` (Google sub or UUID), `email`, `display_name`, `subscription_tier`, `tier_override`, `stripe_customer_id` |
| `user_settings` | `user_id`, `support_level`, `language_mix`, `pos_coloring` |
| `user_state` | `user_id`, `current_day`, `streak_count`, `last_completed_date` |
| `passage_completions` | `user_id`, `passage_id`, `completed_at` |
| `review_items` | `user_id`, SM-2 fields (`easiness`, `interval_days`, `next_review_at`) |
| `sessions` | Standard `connect-pg-simple` session store |
| `promo_codes` | `code`, `tier`, `max_uses`, `uses_count`, `expires_at` |
| `promo_redemptions` | `code`, `user_id` |
| `teams` | `owner_id`, `seat_limit` (academic team management) |
| `team_members` | `team_id`, `user_id`, `email`, `status` |
| `instructor_classes` | `id`, `instructor_id`, `class_name` |
| `class_students` | `class_id`, `student_id`, `email`, `status` (pending/active/removed) |
| `assigned_passages` | `class_id`, `passage_id` (nullable), `custom_text_id` (nullable), `due_date`, `instructions`; partial unique indexes per type |
| `custom_texts` | `id`, `instructor_id`, `title`, `author`, `language`, `raw_text` (≤50k chars); access-gated to owner or active class student |
| `lemma_glosses` | `lemma_id`, `language`, `gloss`, `source` (AI-generated contextual glosses) |
| `passage_context` | `passage_id`, `context_note`, `thematic_tags` |
| `parse_quota_log` | `user_id`, `date`, `count` (free-tier parse rate limiting) |
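
For reference, the `review_items` fields correspond to the state of a textbook SM-2 scheduler. The sketch below shows the standard update rule; the `repetitions` counter is an assumed extra field, and Lector's actual scheduler may differ in details such as rounding or which easiness value feeds the interval.

```typescript
interface ReviewState { easiness: number; intervalDays: number; repetitions: number }

// quality: 0 (complete blackout) to 5 (perfect recall)
function sm2(state: ReviewState, quality: number): ReviewState {
  const q = Math.max(0, Math.min(5, Math.round(quality)));
  if (q < 3) {
    // Failed recall: restart the sequence, leave easiness unchanged.
    return { easiness: state.easiness, intervalDays: 1, repetitions: 0 };
  }
  // Standard SM-2 easiness adjustment, floored at 1.3.
  const easiness = Math.max(1.3, state.easiness + 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02));
  const repetitions = state.repetitions + 1;
  const intervalDays =
    repetitions === 1 ? 1 : repetitions === 2 ? 6 : Math.round(state.intervalDays * easiness);
  return { easiness, intervalDays, repetitions };
}
```

`next_review_at` would then be the review timestamp plus `intervalDays`.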

### Paradigm Table

```sql
CREATE TABLE paradigms (
  lemma         TEXT NOT NULL,
  language      TEXT NOT NULL,       -- 'latin' | 'greek'
  part_of_speech TEXT NOT NULL,
  mood          TEXT,                -- ind, subj, opt, imp, inf, part, gerundive, supine
  tense         TEXT,                -- pres, imperf, fut, perf, plup, futperf, aor
  voice         TEXT,                -- act, pass, mid, mid/pass
  person        TEXT,                -- 1st, 2nd, 3rd
  number        TEXT,                -- sg, pl, dual
  case_         TEXT,                -- nom, gen, dat, acc, voc, abl
  gender        TEXT,                -- m, f, n
  form          TEXT NOT NULL,
  source        TEXT NOT NULL,       -- 'generated' | 'morpheus-db'
  PRIMARY KEY (lemma, language, part_of_speech, mood, tense, voice, person, number, case_, gender, form)
);
```

### Paradigm Persistence and Auto-Recovery

Paradigm data lives in the `lector_pg_data` PostgreSQL volume and is **backed up nightly** by the conductor `full-backup.sh` script as `postgres-lector.dump`. It persists across all normal operations: container restarts, image rebuilds, `docker compose down` (without `-v`), and VPS reboots.

**Auto-seeding on startup:**
`seedParadigms()` runs on every app startup: the in-process `seeded` flag (`let seeded = false` in `paradigm-db.ts`) only guards against a second run within the same process and resets when the container restarts. Because inserts use `ON CONFLICT DO NOTHING`, rows already present are silently skipped, so seeding is idempotent. If the `paradigms` table is empty (e.g. after volume loss or `DELETE FROM paradigms`), the seeder regenerates all ~1.4M rows from scratch with no manual intervention. This takes a few minutes; the app serves requests normally while seeding runs in the background.
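
The idempotent insert and the in-process guard can be sketched as follows. `buildParadigmInsert` and `seedOnce` are hypothetical names, and the sketch only builds the statement rather than executing it. Absent features are written as empty strings here because PostgreSQL primary-key columns cannot be NULL; whether the real seeder does the same is an assumption.

```typescript
interface ParadigmRow {
  lemma: string; language: "latin" | "greek"; partOfSpeech: string;
  mood?: string; tense?: string; voice?: string; person?: string;
  number?: string; case_?: string; gender?: string;
  form: string; source: string;
}

const COLUMNS = [
  "lemma", "language", "part_of_speech", "mood", "tense", "voice",
  "person", "number", "case_", "gender", "form", "source",
];

// Build one multi-row parameterized INSERT; conflicts with existing rows are skipped.
function buildParadigmInsert(rows: ParadigmRow[]): { text: string; values: string[] } {
  const values: string[] = [];
  const tuples = rows.map((r) => {
    const cells = [
      r.lemma, r.language, r.partOfSpeech, r.mood ?? "", r.tense ?? "", r.voice ?? "",
      r.person ?? "", r.number ?? "", r.case_ ?? "", r.gender ?? "", r.form, r.source,
    ];
    const placeholders = cells.map((cell) => {
      values.push(cell);
      return `$${values.length}`;
    });
    return `(${placeholders.join(", ")})`;
  });
  const text =
    `INSERT INTO paradigms (${COLUMNS.join(", ")}) VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT DO NOTHING`;
  return { text, values };
}

// In-process guard: true on the first call, false afterwards until the process restarts.
let seeded = false;
function seedOnce(run: () => void): boolean {
  if (seeded) return false;
  seeded = true;
  run();
  return true;
}
```

The returned `{ text, values }` object matches the query-config shape accepted by `pg`'s `pool.query()`. With 12 columns per row, batches need to stay under PostgreSQL's 65,535 bind-parameter limit, so roughly 5,000 rows per statement is a safe ceiling.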

**What the seeder needs to recover fully:**

| Source data | Location | Lost if | Recovery |
|-------------|----------|---------|----------|
| Curated lemmas + principal parts | `server/data/lemmas.ts` | Never — baked in image | Full Latin + ~43 Greek curated lemmas |
| Whitaker's DICTLINE | `/app/dict/` in image | Never — baked in image | ~5,980 regular + ~720 deponent Latin verbs |
| `morphology.db` (Greek verb lemmas + LSJ defs) | `lector_data` Docker volume | `docker compose down -v` or volume deletion | ~16K Greek verbs; also backed up nightly via `backup_volume lector_data` in conductor |

**Failure scenarios:**

- **Normal restart / image rebuild**: No data loss. Seeder runs, skips all existing rows via `ON CONFLICT DO NOTHING`. ✅
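- **`lector_pg_data` volume lost** (e.g. the volume is deleted or corrupted): Paradigm table is empty. Seeder auto-regenerates everything from image-baked DICTLINE + `morphology.db` in `lector_data`. Full paradigm recovery in ~5 minutes; user data in the same volume must be restored from `postgres-lector.dump`. Note that `docker compose down -v` removes every named volume declared in the Compose file, so it normally triggers the both-volumes case below rather than this one. ✅
- **`lector_data` volume lost** (morphology.db gone): The paradigm table already has data, so paradigms are unaffected, but Tier 1 lookups and any Greek bulk re-seed depend on `morphology.db` until it is restored from backup. If the paradigm table was lost at the same time, Greek coverage drops to the ~43 curated verbs until then. ⚠️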
- **Both volumes lost simultaneously**: Restore `postgres-lector.dump` from conductor backup (covers paradigm table), or restore `lector_data` from conductor backup and let seeder regenerate. Conductor backs up both nightly. ✅ with backup restore.

**To force a full re-seed:**
```bash
docker exec lector-postgres psql -U lector -d lector -c "TRUNCATE paradigms;"
docker compose restart lector
# Monitor: docker logs -f lector-app | grep paradigm-db
```

---

## Authentication

Google OAuth 2.0 via Passport.js. `AUTH_ENABLED=false` uses `default-user` for local dev.
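
The local-dev fallback can be pictured as a small resolver like the one below. `resolveUserId` is a hypothetical helper, not the actual `attachUserId` middleware, and treating only the literal string `"false"` as "disabled" is an assumption.

```typescript
interface SessionUser { id: string }

// Returns the user id for a request, or null when auth is on and nobody is signed in.
function resolveUserId(
  env: Record<string, string | undefined>,
  sessionUser?: SessionUser,
): string | null {
  if (env.AUTH_ENABLED === "false") return "default-user"; // local dev: single shared user
  return sessionUser?.id ?? null;
}
```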

### Auth Routes

| Route | Method | Description |
|-------|--------|-------------|
| `/api/auth/status` | GET | `{ enabled, authenticated, user }` |
| `/api/auth/google` | GET | Initiates OAuth flow |
| `/api/auth/google/callback` | GET | OAuth callback → redirects to `/#/` |
| `/api/auth/logout` | POST | Destroys session |

### Environment Variables

```env
# Core
DATABASE_URL=postgres://lector:<password>@lector-postgres:5432/lector
AUTH_ENABLED=true
SESSION_SECRET=<long-random-secret>
PORT=5000

# Google OAuth
GOOGLE_CLIENT_ID=<client-id>
GOOGLE_CLIENT_SECRET=<client-secret>
GOOGLE_REDIRECT_URI=https://yourdomain.com/api/auth/google/callback

# Morphology
MORPHOLOGY_DB_PATH=/app/data/morphology.db
MORPHEUS_API_URL=http://morpheus-api:5100

# Email (Resend)
RESEND_API_KEY=re_...

# Stripe billing
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRO_PRICE_ID=price_...
STRIPE_INSTRUCTOR_PRICE_ID=price_...
STRIPE_ACADEMIC_PRICE_ID=price_...

# AI (glosses + passage context)
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
```

---

## Rebuilding and Deploying

The repo lives at `/opt/lector` — edit in place, build, and push.

```bash
# From /opt/lector:

# Rebuild app after TypeScript/React changes
docker compose build lector && docker compose up -d lector

# Rebuild morpheus-api after server.py changes
docker compose build morpheus-api && docker compose up -d morpheus-api

# Force re-seed paradigms (delete all rows, restart app)
docker exec lector-postgres psql -U lector -d lector -c "TRUNCATE paradigms;"
docker compose restart lector

# Env var change only: no rebuild needed, but the container must be recreated (`restart` alone does not reload env)
docker compose up -d lector
```

After code changes: `git add`, `git commit`, `git push` to `rutgersguy/lector`.

---

## Key Files (Phase 5C–5D additions)

```
client/src/pages/instructor.tsx  — Instructor dashboard: Classes + My Texts tabs, AssignModal,
                                   TextsPanel, ClassDetail with Roster/Assignments/Dashboard
client/src/pages/read.tsx        — Standalone passage/text preview reader (/read/passage/:id,
                                   /read/text/:id); amber "Preview mode" banner; auth-gated
client/src/pages/today.tsx       — Student view: Assignment panel (corpus + custom texts),
                                   inline passage/text loading, Mark as Done hidden for custom texts
server/instructor.ts             — All instructor + student routes; custom texts CRUD;
                                   linkPendingClassInvites() called from auth.ts on signup
```

---

## What's Not Yet Built

See `ROADMAP.md` for full phase status. Key remaining items:

- **Pre-launch polish** — Dark mode persistence, Playwright UI test suite (issues #15–19), account deletion
- **Phase 5E** — Analytics dashboard (learning curves, grammar heatmap, cohort streaks)
- **Phase 6** — Multi-instructor academic tier (gated on 20+ paying instructor customers)
- **Phase 7** — iOS/Android app (React Native; gated on 100+ Pro + 10+ Instructor customers)

### Paradigm gaps (minor)
- Some rare forms still miss SQLite coverage — handled by Morpheus sidecar fallback
- Dialectal form mapping (Ionic ↔ Attic) not implemented
- User-submitted form corrections not implemented
