# Lector App — Comprehensive Review & Enhancement Roadmap

**Date:** April 5, 2026  
**Reviewer:** Conductor AI Agent  
**Scope:** Architecture, current features, UX, and strategic improvements  
**Based on:** GitHub repo at `rutgersguy/lector` (full codebase review)

---

## EXECUTIVE SUMMARY

**Lector is production-quality for Phase 1–4.** It's a well-architected web app with:
- ✅ Solid morphological parsing (Morpheus + SQLite + fallback paradigm tables)
- ✅ Proper multi-user support (Google OAuth, Stripe billing, tier management)
- ✅ Clean UX (React + Tailwind, 100% word clickability)
- ✅ SRS implementation (SM-2 algorithm, review persistence)
- ✅ Infrastructure (Docker Compose, PostgreSQL, self-hosted)

**Gaps & Opportunities:**
1. **No AI-assisted learning layer** (your wife's suggestion has merit)
2. **Limited content enrichment** (passages have translations but minimal context)
3. **No instructor/classroom tools** (important if you want to monetize to educators)
4. **Analytics/progress dashboards incomplete** (Phase 5 partially done)
5. **Mobile app not started** (Phase 6 critical for scale/retention)

**Immediate high-ROI improvements:**
- Add smart glosses (AI augmentation of definitions) — **quick win**
- Build instructor dashboard (assign passages, track student progress) — **monetization play**
- Implement grammar explanation hints (during SRS review) — **engagement booster**
- Add passage context cards (historical, theological, literary notes) — **depth**

---

## ARCHITECTURE REVIEW

### Strengths

#### **1. Three-Tier Parser Strategy (Smart Fallback Chain)**

```
SQLite morphology.db (222K entries, read-only)
    ↓ [not found]
Morpheus sidecar (Python HTTP wrapper)
    ↓ [rare forms]
Hardcoded paradigm tables (common -μι verbs, pronouns, high-freq)
```

**Why this works:**
- Fast (SQLite queries <5ms)
- Comprehensive (Morpheus handles obscure forms)
- Resilient (falls back gracefully, no blank lookups)
- Accentless Greek input handled via variant generation (clever)

**Improvement:** Cache Morpheus results so that repeated lookups of the same form skip the sidecar.
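A minimal sketch of the fallback chain with a Morpheus result cache is shown here. It is not taken from the codebase: `FallbackParser` and the three lookup callables are hypothetical stand-ins for the SQLite query, the sidecar HTTP call, and the paradigm tables, and the in-memory dict could be swapped for a database table or Redis.

```python
from typing import Callable, Dict, Optional

# Each lookup takes a surface form and returns a parse dict, or None if unknown.
Lookup = Callable[[str], Optional[dict]]


class FallbackParser:
    """SQLite first, then Morpheus (cached), then hardcoded paradigm tables."""

    def __init__(self, sqlite_lookup: Lookup, morpheus_lookup: Lookup,
                 paradigm_lookup: Lookup) -> None:
        self.sqlite_lookup = sqlite_lookup
        self.morpheus_lookup = morpheus_lookup
        self.paradigm_lookup = paradigm_lookup
        # form -> Morpheus result; a None result is cached too, so a form the
        # sidecar does not know is only sent to it once.
        self.morpheus_cache: Dict[str, Optional[dict]] = {}

    def parse(self, form: str) -> Optional[dict]:
        hit = self.sqlite_lookup(form)
        if hit is not None:
            return hit

        if form not in self.morpheus_cache:
            try:
                self.morpheus_cache[form] = self.morpheus_lookup(form)
            except Exception:
                # Sidecar unreachable: fall through without caching the failure,
                # so the form is retried once the sidecar is back.
                return self.paradigm_lookup(form)

        cached = self.morpheus_cache[form]
        return cached if cached is not None else self.paradigm_lookup(form)
```

Caching misses as well as hits is a design choice: it protects the sidecar from repeated work on forms it cannot analyze, at the cost of needing a cache flush if the Morpheus data is ever updated.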

---

#### **2. Multi-User Isolation Done Right**

Every route is keyed by `userId` (from `req.userId` set by auth middleware):
- Settings, SRS queue, progress, parse quota, billing tier — all isolated
- No data leakage risk; supports academic teams + free tier

**Missing:** User profile page (UI not built, schema exists).

---

#### **3. Billing Tier Logic Is Clean**

Free: 10 parses/day  
Pro: $4.99/mo, unlimited parses  
Academic: $49.99/mo, team seats + usage dashboard

- Stripe integration is there (checkout, portal, webhooks, promo codes)
- Admin panel for tier override + promo management
- Parse quota enforced at the API level (not UI — good security)
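As an illustration of server-side quota enforcement, the sketch below counts parses per user per day and refuses the request once the free-tier limit is reached. The tier names, the `QuotaTracker` class, and the in-memory counter are assumptions for the example; the real app presumably keeps these counts in PostgreSQL.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Tuple

FREE_DAILY_PARSES = 10  # free-tier limit from the tier list above


@dataclass
class QuotaTracker:
    # (user_id, day) -> number of parses already used that day
    used: Dict[Tuple[str, date], int] = field(default_factory=dict)

    def try_consume(self, user_id: str, tier: str,
                    today: Optional[date] = None) -> bool:
        """Return True and record one parse, or False if the quota is spent."""
        if tier != "free":
            return True  # paid tiers are unlimited, so nothing is counted
        day = today or date.today()
        key = (user_id, day)
        count = self.used.get(key, 0)
        if count >= FREE_DAILY_PARSES:
            return False
        self.used[key] = count + 1
        return True
```

The API handler would call `try_consume` before doing any parsing work and return an error response (for example HTTP 429) when it returns `False`, so hiding the button in the UI is never the only safeguard.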

**Opportunity:** Sell instructor tier ($19.99–$49.99/mo) with class management + assignment tracking.

---

### Gaps & Weaknesses

#### **1. No Content Context Layer**

Passages have:
- ✅ Greek/Latin text + English translation
- ✅ Word-by-word glosses (frequency-based)
- ✅ Morphological parsing

Missing:
- ❌ Historical/cultural context (who wrote this? when? why?)
- ❌ Theological context (for biblical texts)
- ❌ Literary references (allusions, Homeric echoes)
- ❌ Passage difficulty rating (user-generated or ML-computed)
- ❌ Thematic tags (politics, ethics, grammar concept, etc.)

**This is a major content gap.** Intermediate readers want to understand *why* they're reading this passage, not just parse it.

---

#### **2. SRS Review Is Bare-Bones**

SM-2 algorithm is correctly implemented (easiness, interval scheduling).  
But:
- ❌ No hint system (partial reveals during review)
- ❌ No contextual replay (show passage where word appeared)
- ❌ No "learning mode" (study → practice → review flow)
- ❌ No leaderboards or streaks (gamification optional but valuable)
- ❌ No deck organization (all words in one queue)

**Result:** SRS works but feels utilitarian, not engaging.
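For reference when extending the review flow, here is a compact sketch of one common formulation of the SM-2 update. It is not Lector's implementation; the `Card` fields and function name are assumptions. Following the original SM-2 description, a failed review (quality below 3) resets the repetition count and leaves easiness unchanged, although some implementations lower it anyway.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    easiness: float = 2.5   # SM-2 E-Factor, never below 1.3


def sm2_review(card: Card, quality: int) -> Card:
    """Return the card's next state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Lapse: restart the schedule, keep the easiness factor as it was.
        return Card(repetitions=0, interval_days=1, easiness=card.easiness)

    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval_days * card.easiness)

    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, card.easiness + delta)
    return Card(card.repetitions + 1, interval, easiness)
```

A hint system or "learning mode" can sit on top of this unchanged: for instance, a review answered only after a hint could be capped at quality 3, which is a product decision and not part of SM-2 itself.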

---

#### **3. Analytics / Progress Missing**

The Phase 5 roadmap is titled "analytics & SRS", but the app currently has only:
- Raw stats endpoint (`/api/review/stats`), counts only
- Anki export (`/api/review/export`)

Missing:
- ❌ Learning curve chart (words mastered over time)
- ❌ Weak areas (which grammar concepts are hard)
- ❌ Reading velocity (passages completed per week)
- ❌ Retention metrics (% of SRS cards mastered)
- ❌ Time-on-task (avg session length, daily engagement)

**Without analytics, you can't optimize learning or pitch to instructors.**

---

#### **4. Classroom/Instructor Tools Non-Existent**

Current app is solo reader + billing.  
Missing:
- ❌ Instructor registration (special tier)
- ❌ Class/section creation (organize students)
- ❌ Passage assignment (teacher picks specific passage for date)
- ❌ Completion tracking (did student read passage X?)
- ❌ SRS performance visibility (teacher sees weak vocabulary)
- ❌ Class roster & grading (partial credit for effort)
- ❌ Homework submission (passages due by date)

**This is a strategic blocker for selling to high schools / universities.**

---

#### **5. Passages Corpus Is Imbalanced**

From ROADMAP:

| Corpus | Count | Notes |
|--------|-------|-------|
| Livy (Ab Urbe Condita) | 4,312 | ✅ |
| Others (Xenophon, Caesar, Aristotle, Homer, etc.) | ~5,000 | ✅ |
| **Seneca Epistles** | 0 | **Removed** — no machine-matchable English |
| **Juvenal Satirae** | 0 | **Removed** — no line-numbered translation |

**Problem:** You have 9,755 passages but lost key texts due to translation sourcing.  
**Solution:** Ingest public-domain Gutenberg translations manually (tedious but doable), or find academic translations with open access.

**Quick win:** Add 500+ more passages from:
- Pliny's Letters (Gutenberg PG 2811)
- Marcus Aurelius *Meditations* (Gutenberg PG 2680)
- Epictetus *Discourses* (Gutenberg)

---

### Architecture Recommendations

| Issue | Severity | Fix |
|-------|----------|-----|
| No content context layer | High | Build passage metadata table: historical_context, difficulty, themes, author_notes |
| SRS review too bare | Medium | Add hint system, contextual replay, deck organization |
| Analytics missing | High | Implement learning dashboard (Phase 5 finish) |
| No instructor tools | High | Build instructor tier + class management (new feature, high ROI) |
| Passages corpus gaps | Medium | Ingest more translations (Seneca, Juvenal, etc.) |
| User profile page missing | Low | Build `/profile` page, account deletion, data export |

---

## FEATURE REVIEW

### What's Working

✅ **Daily passage delivery** — simple, effective, low friction  
✅ **Word parsing** — 3-tier strategy is solid  
✅ **Morphology popup** — shows lemma, morphology, definition, links to external tools  
✅ **POS color coding** — visual aid, toggleable  
✅ **Frequency-based gloss suppression** — smart (hides common words at higher difficulty)  
✅ **SRS queue management** — SM-2 correct, persistent across sessions  
✅ **Offline-first morphology DB** — no network required for local lookups  
✅ **Anki export** — good integration with existing Anki users  
✅ **Billing integration** — Stripe checkout, team management, promo codes  
✅ **Auth** — Google OAuth + email/password + multi-user isolation  

---

### What Needs Work

#### **1. Passage Context (Your Wife's "AI Lessons" Idea Fits Here)**

**Current state:** Click word → see morphology. Done.

**Missing:** Passage-level learning.

**Smart approach** (phases):

**Phase 1 (Immediate): Smart Glosses**
- When user clicks word: show AI-generated gloss (2–3 sentences on usage, cultural notes)
- Gloss is generated once per lemma (cached), not per passage
- Grounded in existing lexicon data (e.g. Whitaker's Words for Latin), not hallucinated
- Link to SRS card for that word
- Example: κοινή → "the koiné Greek dialect (common Greek of the Eastern Mediterranean, used by this author for accessibility)"

**Cost:** ~500 tokens per lemma × 5,000 unique lemmas = $2–3 one-time with GPT-4 mini. Cache in DB.

**Phase 2 (1–2 weeks): Passage Context Cards**
- After passage text, show 1 optional card:
  - Historical context (50 words max)
  - Author bio + date
  - Thematic tags (e.g. #stoicism, #politics, #ethics)
  - Difficulty rating
- Generate once per passage, store in DB
- Make optional (don't clutter UI)

**Cost:** ~200 tokens per passage × 9,755 = ~2M tokens = $1–2 with GPT-4 mini. One-time.
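The token arithmetic behind the gloss and passage-context estimates can be checked with a few lines. The per-item token budgets and item counts come from the estimates above; the flat rate of $1.00 per million tokens is a hypothetical blended figure chosen for illustration, not a quoted price, so substitute the current rate for whichever model is used.

```python
def total_tokens(tokens_per_item: int, items: int) -> int:
    """Total tokens for a batch where every item uses about the same budget."""
    return tokens_per_item * items


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Convert a token count to dollars at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


ASSUMED_RATE = 1.00  # hypothetical blended USD per 1M tokens, not a quoted price

gloss_tokens = total_tokens(500, 5_000)    # smart glosses: per lemma
context_tokens = total_tokens(200, 9_755)  # context cards: per passage

print(gloss_tokens, round(cost_usd(gloss_tokens, ASSUMED_RATE), 2))      # 2500000 2.5
print(context_tokens, round(cost_usd(context_tokens, ASSUMED_RATE), 2))  # 1951000 1.95
```

At that assumed rate both batches land inside the quoted ranges, and even a several-fold pricing error keeps the one-time cost small.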

**Phase 3 (Optional): Exercise Generation**
- After passage read, offer 3–5 practice exercises:
  - Fill-in-the-blank (conjugation/declension)
  - Parsing puzzle ("identify the dative plural")
  - Translation cloze
- Tied to SRS (appears on schedule like vocab review)
- Generate from passage text itself (no hallucination risk)
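To show why generating from the passage text avoids hallucination, here is a minimal cloze generator that needs no model call: it blanks out one word of the passage and keeps the removed word as the answer. The function name, the punctuation set, and the minimum word length are assumptions for the sketch; a real version would likely pick the blank from the morphology data (for example, only finite verbs).

```python
import random
from typing import Optional

PUNCTUATION = ".,;:·!?"  # includes the Greek question mark (;) and ano teleia (·)


def make_cloze(passage: str, rng: Optional[random.Random] = None,
               min_length: int = 4) -> dict:
    """Blank out one sufficiently long word and return the prompt and answer."""
    rng = rng or random.Random()
    tokens = passage.split()
    candidates = [i for i, tok in enumerate(tokens)
                  if len(tok.strip(PUNCTUATION)) >= min_length]
    if not candidates:
        raise ValueError("passage has no word long enough to blank out")

    index = rng.choice(candidates)
    answer = tokens[index].strip(PUNCTUATION)
    # Replace only the word itself so trailing punctuation stays in the prompt.
    tokens[index] = tokens[index].replace(answer, "_____", 1)
    return {"prompt": " ".join(tokens), "answer": answer}
```

Because the answer is always a word that actually occurs in the passage, grading is a string comparison and the exercise can be scheduled through the existing SRS queue.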

---

#### **2. Instructor Tools (Monetization Play)**

**Market:** High school Classics teachers, university instructors, summer programs.

**What they need:**
1. **Class creation** ("AP Latin 2026", "Aristotle Seminar")
2. **Student roster** (add students by email)
3. **Passage assignment** (teacher picks passage for specific date)
4. **Completion dashboard** (who read passage X by due date?)
5. **Progress tracking** (aggregate SRS scores, weak vocab)
6. **Gradebook** (optional: manual grading + participation points)

**Business model:** 
- Instructor tier: $49.99/mo (10 students) or $99.99/mo (50 students)
- Students use free tier but data is linked to instructor's class

**Time to MVP:** 2–3 weeks (class CRUD, roster management, simple completion tracking)

**High-ROI because:**
- Teachers drive adoption (word-of-mouth to students)
- Recurring revenue (teacher pays/semester)
- Defensible niche (competitors don't have this)

---

#### **3. Analytics Dashboard**

**Current:** Raw stats endpoint only.

**Needed:**
- Line chart: vocabulary mastered over time (words → % retention)
- Heatmap: which grammar concepts are weak (verb moods, declensions, etc.)
- Reading streak: days with passages completed
- Engagement: avg session length, review accuracy
- Cohort analysis (instructor view): findings of the form "students who review 3+ days/week pass exams 40% more often" (illustrative figure, not measured data)

**Stack:** React + recharts or D3.js on frontend, simple aggregation queries on backend.

**Time:** 1 week for MVP (core charts).
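Two of the metrics above reduce to very small aggregations, sketched here under assumed inputs: a collection of dates on which the user completed a passage, and a list of SM-2 review grades. The function names and the rule that a streak survives until the end of the following day are assumptions, not existing behavior.

```python
from datetime import date, timedelta
from typing import Iterable, List


def current_streak(completion_days: Iterable[date], today: date) -> int:
    """Count consecutive days with a completed passage, ending today or yesterday."""
    days = set(completion_days)
    # If today has no completion yet, the streak is still alive from yesterday.
    cursor = today if today in days else today - timedelta(days=1)
    streak = 0
    while cursor in days:
        streak += 1
        cursor -= timedelta(days=1)
    return streak


def review_accuracy(grades: List[int]) -> float:
    """Share of reviews graded 3 or higher (the SM-2 pass threshold)."""
    if not grades:
        return 0.0
    return sum(1 for g in grades if g >= 3) / len(grades)
```

In production these would more likely be SQL aggregations over the completion and review tables, with the frontend chart consuming the resulting numbers.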

---

#### **4. Deck Organization for SRS**

**Current:** All review items in one queue.

**Better:**
- Create decks by theme (e.g. "Stoic Ethics", "Military Verbs", "Contract Verbs")
- Users can organize their own decks
- Study mode: pick a deck, focus on that topic
- Report shows per-deck retention

**Time:** 3–4 days (schema changes + UI).

---

### Feature Priority Matrix

| Feature | ROI | Effort | Priority |
|---------|-----|--------|----------|
| **Smart glosses (Phase 1)** | High | Low | 🔴 Do First |
| **Instructor tools (MVP)** | Very High | Medium | 🔴 Do First |
| **Analytics dashboard** | High | Medium | 🟠 Do Next |
| **Passage context cards (Phase 2)** | Medium | Low | 🟠 Do Next |
| **Deck organization** | Medium | Low | 🟠 Do Next |
| **Exercise generation (Phase 3)** | Medium | Medium | 🟡 Later |
| **User profile page** | Low | Low | 🟡 Later |
| **Mobile app (Phase 6)** | Very High | Very High | 🟡 Later |

---

## YOUR WIFE'S SUGGESTION: AI-GENERATED LESSONS

**Verdict:** Good instinct, but needs careful scoping.

### Why AI Lessons Are Appealing

✅ Scales content (no hand-writing 9,755 lesson explanations)  
✅ Personalizes learning (AI can explain the concept you struggled on)  
✅ Fills gap (between "I clicked a word" and "I understand deeply")  

### Why They're Risky (If Done Naively)

❌ **Hallucination:** AI invents false etymologies or historical facts  
❌ **Token cost:** Expensive to generate unique explanations per passage  
❌ **Quality inconsistency:** Some AI explanations will be confusing  
❌ **Scope creep:** "Lessons" can balloon into full curriculum  

### Better Approach: Hybrid Model

**Don't generate "lessons"** — instead, generate **augmentations to existing content**:

1. **Smart glosses** (Phase 1): AI explains a word's *usage and cultural context*, grounded in etymology + morphology
   - Example: μητέρα → "The accusative singular of μήτηρ (mother). In the Greek household, the mother…"
   - Cost: Low (reuse per lemma)
   - Risk: Low (etymology from Whitaker's, cultural notes are factual)

2. **Passage hints** (Phase 2): AI generates optional hints for difficult passages
   - "This passage uses the aorist subjunctive for a hypothetical. Aorist = single action, subjunctive = not-yet-real."
   - Shown on review card if user requests
   - Cost: Medium (per passage, cached)
   - Risk: Low (explain grammar rules, not interpret)

3. **Contextual exercises** (Phase 3): AI generates grammar-focused practice problems
   - "Identify all aorist subjunctive forms in this passage"
   - Based on actual passage text (no hallucination)
   - Cost: Medium
   - Risk: Very Low

### Recommended Implementation Path

```
Week 1–2: Smart glosses (Phase 1 MVP)
  └─ Batch-generate 5,000 lemmas' glosses (GPT-4 mini, ~$2–3)
     Cache in glosses table
     Show in word popup

Week 3–4: Passage context (Phase 2 MVP)
  └─ Batch-generate 9,755 passages' context (GPT-4 mini, ~$1–2)
     Cache in passages table
     Optional card below passage text

Week 5+: Deck org + Analytics (parallel work)
  └─ Not AI, but high-value

Month 2: Exercise generation (Phase 3, if demand)
  └─ Auto-generate practice problems from passage text
     Feed into SRS schedule
```

**Estimated cost:** $5–10 total for all AI content generation (one-time), plus roughly 4 weeks of work.

---

## DEPLOYMENT & INFRASTRUCTURE NOTES

### Current (Good)
- ✅ Docker Compose setup is clean
- ✅ PostgreSQL for user data (proper indexes exist)
- ✅ SQLite for morphology.db (read-only, 100MB)
- ✅ Morpheus sidecar in Python (microservice pattern)
- ✅ Nginx SSL reverse proxy
- ✅ Self-hosted on single Hostinger VPS (2 cores, 8GB RAM)

### Scalability Concerns
- Single VPS can handle ~50–100 concurrent users comfortably
- Beyond that, need:
  - PostgreSQL replication (read replicas for heavy analytics)
  - Redis for caching (SRS queries, parse results)
  - Separate Morpheus sidecar (it's CPU-heavy on rare forms)

### Recommendation
**For now:** Current setup is fine (you're likely <50 users).  
**When traffic grows:** Plan migration to 2-server setup (app + DB on separate boxes).

---

## TESTING & QA STATUS

**From ROADMAP:**
- ✅ 31 API tests (all passing) — coverage of auth, billing, access control
- ❌ Playwright UI tests — not yet implemented (issue #15)

**Recommendations:**
1. **Add Playwright tests** for critical user flows:
   - Auth: Google OAuth, email/password, logout
   - Reading: daily passage, word parsing, gloss popup
   - SRS: add card, answer, grade, summary
   - Billing: tier upgrade, promo code, invoice

2. **Manual testing checklist** before any public launch:
   - Cross-browser (Chrome, Safari, Firefox)
   - Mobile browsers (iOS Safari, Android Chrome)
   - Dark mode toggle
   - Offline morphology.db lookup (disable network, parse word)
   - Morpheus sidecar failover (kill process, verify graceful fallback)

---

## SECURITY AUDIT (Quick Review)

✅ **Auth:** Google OAuth + email/password (no plaintext passwords)  
✅ **User isolation:** Every route keyed by userId (no leakage)  
✅ **Parse quota:** Enforced server-side (not UI-only)  
✅ **Admin endpoints:** Behind `requireAdmin` middleware  
✅ **HTTPS:** Nginx SSL enabled  

⚠️ **Minor concerns:**
- Session store in PostgreSQL (good, but ensure `SESSION_SECRET` is long + rotated)
- Admin panel exposed at `/api/admin/*` (no UI, but document that it exists)
- Promo codes in cleartext in DB (fine for marketing, not a secret)

**Recommendation:** Add rate limiting to `/api/parse` (DoS protection) and `/api/auth/login` (brute-force protection).
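One simple way to implement this is a sliding-window limiter keyed by client IP or user ID. The sketch below is illustrative only: the class name and the in-memory storage are assumptions, and on more than one app process the counters would need to live in a shared store such as Redis.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # rejected requests are not recorded
        recent.append(now)
        return True
```

A login route might use a tight limit such as 5 attempts per minute per IP, while `/api/parse` could use a looser one; both numbers are examples, not recommendations derived from Lector's traffic.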

---

## COMPETITIVE POSITIONING

### Direct Competitors

1. **Alpheios** (browser extension)
   - Browser integration (handy, but limited)
   - Morphology is basic
   - No SRS

2. **Logeion** (dictionary lookup)
   - LSJ + L&S in browser
   - No reading app, no SRS

3. **Tesserae** (intertextual allusion finder)
   - Finds parallels between texts
   - Not a reading tool

4. **Scaife Viewer** (text reader)
   - Open Greek & Latin corpus
   - No morphological parsing
   - No SRS

### Lector's Moat

✅ **Integrated reading + parsing + SRS** (no one else has all three)  
✅ **Smart parsing** (3-tier strategy, accentless input)  
✅ **Self-hosted** (privacy, offline, customizable)  
✅ **Passage curation + daily delivery** (habit-forming)  
✅ **Billing + multi-user** (monetizable)  

**Gap:** No AI differentiation yet. Smart glosses + passage context would be defensible features.

---

## ROADMAP RECOMMENDATIONS (Prioritized)

### Q2 2026 (April–June)

**High Priority (Do These First):**
1. ✅ Landing page (in progress)
2. 🔴 Smart glosses (Phase 1 AI, ~2 weeks)
3. 🔴 Instructor tools MVP (class CRUD, roster, assignment, ~3 weeks)
4. 🔴 Analytics dashboard (basic charts, ~1 week)
5. 🟠 Passage context cards (Phase 2 AI, ~1 week)

**Medium Priority:**
6. 🟡 Deck organization for SRS (~3 days)
7. 🟡 Playwright UI tests (critical flows, ~1 week)
8. 🟡 User profile page (account deletion, data export, ~3 days)

**Low Priority (Defer):**
9. ⬜ Exercise generation (Phase 3 AI, wait for user demand)
10. ⬜ Mobile app (Phase 6, summer 2026 maybe)

---

### Q3 2026 (July–September)

**If instructor tools gain traction:**
- Gradebook + manual grading
- Class performance analytics (cohort insights)
- Homework submission + due dates
- Peer learning (study groups)

**If analytics adoption is strong:**
- Predictive alerts ("student X is at risk based on vocabulary retention")
- Personalized lesson recommendations (AI suggests passages for weak areas)

**Mobile readiness:**
- Finalize API for app support
- Plan React Native port
- Offline syncing design

---

## SPECIFIC IMPLEMENTATION SUGGESTIONS

### Smart Glosses (Quick Win)

**Schema:**
```sql
CREATE TABLE lemma_glosses (
  lemma_id TEXT PRIMARY KEY,
  language TEXT,
  gloss TEXT,  -- AI-generated, 2–3 sentences
  generated_at TIMESTAMP,
  source TEXT  -- 'gpt4-mini' or 'manual'
);
```

**Generation (one-time batch):**
```python
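# Sketch of the one-time batch job. Assumes the openai>=1.0 client with
# OPENAI_API_KEY set in the environment. load_lemmas() and save_gloss() are
# hypothetical helpers wrapping the app's own database access.
from datetime import datetime, timezone

from openai import OpenAI

MODEL = "gpt-4o-mini"  # assumed model ID; substitute whichever "mini" model is chosen
client = OpenAI()

for lemma in load_lemmas():  # each lemma exposes id, language, headword, etymology
    prompt = (
        f'Provide a 2–3 sentence gloss for the {lemma.language} word "{lemma.headword}".\n'
        "Explain: usage context, cultural significance, morphological notes.\n"
        f"Base on etymology: {lemma.etymology}\n"
        "Keep it academic but accessible."
    )

    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=150,
    )

    # One row per lemma, matching the lemma_glosses schema above.
    save_gloss(
        lemma_id=lemma.id,
        language=lemma.language,
        gloss=response.choices[0].message.content,
        generated_at=datetime.now(timezone.utc),
        source="gpt4-mini",
    )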
```

**Integration (word popup):**
```tsx
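// In word-popup component (sketch; the prop names are assumptions)
import { useEffect, useState } from "react";

type WordPopupProps = { lemmaId: string; definition: string; morphology: string };

export function WordPopup({ lemmaId, definition, morphology }: WordPopupProps) {
  const [gloss, setGloss] = useState<string | null>(null);

  useEffect(() => {
    const controller = new AbortController();
    setGloss(null); // clear the previous word's gloss while the new one loads
    fetch(`/api/lemma-gloss/${encodeURIComponent(lemmaId)}`, { signal: controller.signal })
      .then((res) => (res.ok ? res.json() : null))
      .then((data) => setGloss(data?.gloss ?? null))
      .catch(() => {}); // the gloss is optional, so aborts and network errors are ignored
    return () => controller.abort(); // cancel if the popup closes or lemmaId changes
  }, [lemmaId]);

  return (
    <div>
      <h3>{definition}</h3>
      <p className="text-sm text-gray-600">{morphology}</p>
      {gloss && <p className="italic text-sm">{gloss}</p>}
    </div>
  );
}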
```

**Cost & Time:**
- Batch generation: 2–3 hours (async job)
- Cost: ~$2–3 (GPT-4 mini)
- Integration: 2–3 hours (API endpoint + UI)

---

### Instructor Tools MVP

**Schema:**
```sql
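CREATE TABLE instructor_classes (
  id UUID PRIMARY KEY,
  instructor_id TEXT NOT NULL,
  name TEXT NOT NULL,        -- "AP Latin 2026"
  created_at TIMESTAMP NOT NULL DEFAULT now()
);

CREATE TABLE class_students (
  id UUID PRIMARY KEY,
  class_id UUID NOT NULL REFERENCES instructor_classes(id) ON DELETE CASCADE,
  student_email TEXT NOT NULL,
  joined_at TIMESTAMP,
  UNIQUE (class_id, student_email)  -- one roster row per student per class
);

CREATE TABLE assigned_passages (
  id UUID PRIMARY KEY,
  class_id UUID NOT NULL REFERENCES instructor_classes(id) ON DELETE CASCADE,
  passage_id TEXT NOT NULL,
  assigned_date DATE NOT NULL,
  due_date DATE,
  created_at TIMESTAMP NOT NULL DEFAULT now()
);

CREATE TABLE passage_completions_per_student (
  id UUID PRIMARY KEY,
  student_id TEXT NOT NULL,
  passage_id TEXT NOT NULL,
  completed_at TIMESTAMP NOT NULL DEFAULT now(),
  UNIQUE (student_id, passage_id)  -- a passage is completed at most once per student
);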
```

**Routes (MVP):**
```
POST   /api/instructor/classes               -- create class
GET    /api/instructor/classes                -- list instructor's classes
POST   /api/instructor/classes/:classId/students   -- add student
GET    /api/instructor/classes/:classId/students   -- roster
POST   /api/instructor/classes/:classId/assign     -- assign passage
GET    /api/instructor/classes/:classId/dashboard  -- completion dashboard
```

**UI:**
```
Instructor Dashboard
├─ Classes sidebar
├─ Class: Aristotle Seminar
│  ├─ Roster (5 students, invite links)
│  ├─ Passages assigned (today, next week, all)
│  └─ Completion %: πολιτεία (Ch. 1) → 4/5 (80%)
└─ Add passage (date picker, preview)
```
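The completion figure in the dashboard mock-up (4/5, 80%) is a straightforward aggregation over the roster and the completions table. The sketch below assumes the data has already been loaded into plain Python collections; the function name and the shape of its inputs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def completion_summary(roster: List[str],
                       completions: Set[Tuple[str, str]],
                       passage_ids: List[str]) -> List[Dict]:
    """For each assigned passage, count how many rostered students completed it.

    completions holds (student_id, passage_id) pairs.
    """
    rows = []
    for passage_id in passage_ids:
        done = sum(1 for student in roster if (student, passage_id) in completions)
        percent = round(100 * done / len(roster)) if roster else 0
        rows.append({
            "passage_id": passage_id,
            "completed": done,
            "total": len(roster),
            "percent": percent,
        })
    return rows
```

With five students of whom four have finished a passage, the row for that passage reports `completed=4`, `total=5`, `percent=80`. Only students on the roster are counted, so completions by users outside the class never inflate the figure.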

**Time:** ~3 weeks (schema + routes + UI).

---

## CONCLUSION & NEXT STEPS

**Lector is a solid foundation.** You've built a proper reading app with parsing, SRS, and multi-user support. The architecture is clean and scalable.

**To unlock its potential, prioritize:**

1. **Landing page** (in progress) — critical for user acquisition
2. **Smart glosses** — quick win, AI differentiator
3. **Instructor tools** — monetization + word-of-mouth growth
4. **Analytics** — needed to optimize learning + pitch to educators

**Your wife's suggestion about AI lessons is good.** Frame it as "smart glosses + passage context" (Phase 1 & 2), not full lesson generation. Low risk, high engagement.

**Timeline:** 8–12 weeks to ship the Q2 improvements (landing page, glosses, instructor MVP, analytics). Then assess adoption and plan the mobile app (Phase 6) for summer.

**Budget:** ~$10 for one-time AI content generation (glosses + context for all passages). Time investment: 200+ hours over 2–3 months.

---

**Questions for you:**
1. What's your target audience: solo learners, high school teachers, university departments, or all?
2. What's the revenue target: hobby ($0), side income ($500/mo), or venture ($50K+/yr)?
3. When do you want to launch mobile (if ever)?
4. What's your competitive advantage: parsing quality, teacher tools, SRS, or something else?

---

**Document prepared:** 2026-04-05
