---
name: ingest-skill
description: "Convert a pCloud directory of course materials (PDFs, docs, text, and video files) into a draft SKILL.md. Transcribes videos with local Whisper (OpenAI API fallback), extracts text from docs/PDFs, then generates structured content with Claude via n8n. Trigger: any request to ingest course materials, create a skill from files, turn a folder into a skill, or learn from a pCloud directory."
user-invocable: true
metadata:
  { "openclaw": { "emoji": "📥" } }
---

# Ingest Skill from Course Materials

Convert a pCloud folder of files into a draft `SKILL.md` automatically. Handles PDFs, Word docs, plain text, and video or audio files (Whisper transcription).

## Trigger Patterns

- "ingest [folder] as a skill called [name]"
- "turn those files into a skill"
- "create a skill from [pCloud folder]"
- "make a skill from my [course / videos / materials]"
- "generate a skill from [pCloud path]"
- "learn from the files in [folder]"

---

## How to Run

If the user has not provided the pCloud directory and the skill name, ask for them, then run:

```bash
python3 /home/node/.openclaw/workspace/scripts/ingest-skill.py \
  --pcloud-dir "/Your pCloud Folder" \
  --skill-name "your-skill-name"
```

**Use the full absolute path.** The exec preflight blocks `cd ... && python3` patterns, so always pass the script as an absolute path.

After it finishes, tell the user where the draft landed and what files were processed.

---

## Parameters

| Flag | Required | Description |
|------|----------|-------------|
| `--pcloud-dir` | ✓ | pCloud path to ingest (e.g. `/Latin 101`) |
| `--skill-name` | ✓ | Output skill slug, lowercase-hyphenated (e.g. `latin-101`) |
| `--recursive` | | Also ingest subfolders |
| `--dry-run` | | List files without generating SKILL.md |
| `--no-commit` | | Write draft but don't git commit/push |
| `--no-n8n` | | Call Claude directly, bypass n8n webhook |
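
The flag table maps naturally onto an `argparse` parser. The sketch below is only an assumption about how `ingest-skill.py` might declare its options, not the script's actual code; `slug` and `build_parser` are hypothetical names, and the slug rule (lowercase letters, digits, single hyphens) is one reading of "lowercase-hyphenated".

```python
import argparse
import re

# Assumed slug rule: lowercase alphanumeric groups joined by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def slug(value: str) -> str:
    """Accept only lowercase-hyphenated names such as 'latin-101'."""
    if not SLUG_RE.fullmatch(value):
        raise argparse.ArgumentTypeError(
            f"{value!r} is not a lowercase-hyphenated slug"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag table."""
    parser = argparse.ArgumentParser(prog="ingest-skill.py")
    parser.add_argument("--pcloud-dir", required=True, help="pCloud path to ingest")
    parser.add_argument("--skill-name", required=True, type=slug, help="output skill slug")
    parser.add_argument("--recursive", action="store_true", help="also ingest subfolders")
    parser.add_argument("--dry-run", action="store_true", help="list files only")
    parser.add_argument("--no-commit", action="store_true", help="skip git commit/push")
    parser.add_argument("--no-n8n", action="store_true", help="call Claude directly")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--pcloud-dir", "/Latin 101", "--skill-name", "latin-101", "--dry-run"]
)
print(args.pcloud_dir, args.skill_name, args.dry_run)
```

Validating the slug at parse time means a malformed name fails before any download starts.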

---

## What Happens

1. Lists the pCloud directory via `PCloudClient`
2. Downloads each supported file
3. Extracts content per type:
   - **PDFs** → text extraction via `pdf_extract.py`
   - **Word docs (`.docx`)** → text extraction via python-docx
   - **Text/Markdown/reStructuredText/CSV** → read directly
   - **Video/Audio** → transcribed with local Whisper (`base` model; ffmpeg extracts the audio track); falls back to the OpenAI Whisper API automatically if local transcription fails
4. POSTs all extracted text to n8n webhook → Claude generates `SKILL.md`
5. Writes draft to `skills/drafts/<skill-name>/SKILL.md`
6. Commits and pushes to GitHub (skipped with `--no-commit`)

If the n8n webhook is inactive, the script falls back to calling Claude directly.
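
Both the transcription step and the generation step follow the same "try the primary path, then fall back" pattern. A minimal sketch of that pattern follows, assuming each path can be wrapped in a zero-argument callable; `first_success`, `via_n8n`, and `via_claude_direct` are hypothetical stand-ins, not functions from the actual script.

```python
from typing import Callable, Sequence


def first_success(
    attempts: Sequence[tuple[str, Callable[[], str]]]
) -> tuple[str, str]:
    """Run each (label, callable) in order; return (label, result) of the first success."""
    errors = []
    for label, attempt in attempts:
        try:
            return label, attempt()
        except Exception as exc:  # broad on purpose: any failure triggers the next path
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))


# Stand-ins for the real calls (hypothetical; they only simulate the two outcomes).
def via_n8n() -> str:
    raise ConnectionError("webhook inactive")


def via_claude_direct() -> str:
    return "# Draft SKILL.md"


used, draft = first_success([("n8n", via_n8n), ("claude-direct", via_claude_direct)])
print(used)
```

The same helper would cover local Whisper followed by the OpenAI API by passing a different pair of callables.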

---

## Supported File Types

| Type | Handling |
|------|----------|
| `.pdf` | Text extraction |
| `.txt`, `.md`, `.rst`, `.csv` | Read directly |
| `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm`, `.m4v` | Local Whisper → OpenAI API fallback |
| `.mp3`, `.m4a`, `.wav`, `.ogg`, `.flac`, `.aac` | Local Whisper → OpenAI API fallback |
| `.docx` | python-docx extraction |
| Images, zip, executables | Skipped |
| Non-media files > 20MB | Skipped |
| Video/audio > 500MB | Skipped |
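
The table above can be read as a small classification function. The sketch below is one way to express it, assuming 1MB means 1024 × 1024 bytes (the table does not say which convention the script uses); `classify` and its return labels are hypothetical names.

```python
from pathlib import PurePosixPath

MB = 1024 * 1024  # assumption: binary megabytes
TEXT_EXTS = {".txt", ".md", ".rst", ".csv"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v"}
AUDIO_EXTS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".aac"}
MAX_DOC_BYTES = 20 * MB
MAX_MEDIA_BYTES = 500 * MB


def classify(name: str, size_bytes: int) -> str:
    """Return 'transcribe', 'pdf', 'docx', 'text', or 'skip' for one file."""
    ext = PurePosixPath(name).suffix.lower()
    if ext in VIDEO_EXTS or ext in AUDIO_EXTS:
        return "skip" if size_bytes > MAX_MEDIA_BYTES else "transcribe"
    if ext == ".pdf":
        kind = "pdf"
    elif ext == ".docx":
        kind = "docx"
    elif ext in TEXT_EXTS:
        kind = "text"
    else:
        return "skip"  # images, zip, executables, and anything unrecognized
    return "skip" if size_bytes > MAX_DOC_BYTES else kind


for name, size in [("lecture1.MP4", 100 * MB), ("big.pdf", 21 * MB), ("photo.png", 10)]:
    print(name, classify(name, size))
```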

---

## Output

`skills/drafts/<skill-name>/SKILL.md`

Drafts are **not** auto-loaded by conductor. Once reviewed, move to `skills/<skill-name>/SKILL.md` to activate.
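
Promoting a reviewed draft is a single file move. The following sketch shows that step with `pathlib` and `shutil`, run inside a temporary directory so it is safe to try; `draft_path` and `activate` are hypothetical helpers, and `root` stands for the workspace directory that contains `skills/`.

```python
import shutil
import tempfile
from pathlib import Path


def draft_path(root: Path, skill_name: str) -> Path:
    """Location where the draft is written."""
    return root / "skills" / "drafts" / skill_name / "SKILL.md"


def activate(root: Path, skill_name: str) -> Path:
    """Move a reviewed draft to skills/<skill-name>/SKILL.md."""
    src = draft_path(root, skill_name)
    dst = root / "skills" / skill_name / "SKILL.md"
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")  # never overwrite an active skill
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))  # the now-empty draft folder is left in place
    return dst


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    draft = draft_path(root, "latin-101")
    draft.parent.mkdir(parents=True)
    draft.write_text("# Latin 101\n", encoding="utf-8")
    active = activate(root, "latin-101")
    moved_ok = active.exists() and not draft.exists()
    relative = active.relative_to(root).as_posix()

print(relative)
```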

---

## Constraints

- `PCLOUD_ACCESS_TOKEN` must be set (it is, in workspace `.env`)
- `ANTHROPIC_API_KEY` must be set (it is)
- `OPENAI_API_KEY` is used only if local Whisper fails (it is set)
- Max ~100K chars of combined text sent to Claude
- Video files over 25MB need ffmpeg (installed) for transcription; the OpenAI API fallback rejects files over 25MB
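
Two of these constraints are easy to check before any work starts. The sketch below assumes the script simply truncates the combined text at the limit, which the constraint list does not confirm; `missing_vars`, `cap_combined_text`, and the `## <filename>` section header are hypothetical.

```python
import os

REQUIRED_VARS = ("PCLOUD_ACCESS_TOKEN", "ANTHROPIC_API_KEY")
MAX_CHARS = 100_000  # approximate budget for text sent to Claude


def missing_vars(env) -> list[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def cap_combined_text(sections: list[tuple[str, str]], limit: int = MAX_CHARS) -> str:
    """Join (filename, text) sections, cutting off once the character limit is hit."""
    parts = []
    remaining = limit
    for name, text in sections:
        block = f"## {name}\n{text}\n"
        if len(block) > remaining:
            parts.append(block[:remaining])  # partial last section, then stop
            break
        parts.append(block)
        remaining -= len(block)
    return "".join(parts)


print("missing:", missing_vars(os.environ))
combined = cap_combined_text([("a.txt", "x" * 80_000), ("b.txt", "y" * 80_000)])
print(len(combined))
```

Truncating in file order favors whatever is listed first, so a real implementation might instead allocate the budget per file.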
