Add say voice replies skill

2026-02-14 10:21:35 -05:00
parent 7f3a2ee6b1
commit d7c92d20e8
2 changed files with 78 additions and 0 deletions
@@ -0,0 +1,78 @@
+---
+name: say-voice-replies
+description: Deliver short, hands-free responses by turning text into macOS `say` voice notes saved in the system temp directory and sending them as Telegram voice messages. Use whenever Adolfo asks for audio answers (e.g., while driving) or you decide speech will land better than text.
+---
+
+# Say Voice Replies
+
+## Overview
+Use this skill to convert a response into an audio note with the built-in macOS `say` command and deliver it through Telegram as a voice message. The key constraint uncovered on 2026-02-14 is that the `message` tool refuses files from the workspace and only accepts media written inside the macOS temporary directory, so every workflow below is designed around that.
+
+## Quick Start
+1. Draft the spoken script (aim for 10–25 seconds unless the user asked for longer).
+2. Use the macOS default `say` voice and rate unless Adolfo specifically requests a different vibe.
+3. Generate the audio in `$TMPDIR` with `say`, then send it with the `message` tool using `asVoice: true`.
+4. (Optional) Delete the temp file once the voice note is confirmed delivered.
+
+## Workflow
+
+### 1. Prepare the narration
+- Keep phrasing conversational and note any emoji you want to read out loud.
+- If the reply needs structure, say the headings (e.g., “Update one… Update two…”).
+- Sum up the audio in text when sending the message (“Audio reply: ETA + blockers”) so the user knows what the note contains before tapping it.
+
+### 2. Generate a temp file that the `message` tool accepts
+macOS exposes an agent-specific temp dir via `$TMPDIR`, which resolves to `/var/folders/...`. During testing, this was the only location from which the `message` tool accepted Telegram uploads.
+
+```bash
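+# mktemp creates an empty placeholder file; the audio goes to a sibling path with the .m4a extension.
+TMPFILE=$(mktemp "${TMPDIR:-/tmp}/brovoice_XXXXXX")
+OUTFILE="${TMPFILE}.m4a"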
+```
+
+Use `say` to create the audio (AAC keeps the file small and Telegram-friendly):
+
+```bash
+# Options go before the text to be spoken, matching the documented `say` synopsis.
+say --data-format=aac -o "$OUTFILE" \
+    "Yo bro, leaving the studio now. Should hit the church in fifteen."
+```
+
+Notes:
+- The stock system voice/rate keeps everything consistent; only add `-v <VoiceName>` or `-r <speed>` if Adolfo requests a different vibe.
+- If you need emphasis, split into multiple `say` calls and join the clips with a separate audio tool (`afconvert` only converts between formats; it does not concatenate files). Most replies can be cut in one take.
+- `say` writes synchronously; no extra waits are needed before sending.
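+
+The two snippets above can be folded into one helper. This is a minimal sketch rather than part of the tested workflow: `make_voice_note` is a hypothetical name, and it assumes the empty placeholder that `mktemp` creates should be removed as soon as the `.m4a` exists.
+
+```bash
+# Hypothetical helper: speak "$1" into a fresh .m4a under $TMPDIR and print its path.
+make_voice_note() {
+  local text="$1" tmpfile outfile
+  tmpfile=$(mktemp "${TMPDIR:-/tmp}/brovoice_XXXXXX") || return 1
+  outfile="${tmpfile}.m4a"
+  # Default voice and rate; AAC keeps the file small.
+  if say --data-format=aac -o "$outfile" "$text"; then
+    rm -f "$tmpfile"            # drop the empty placeholder mktemp created
+    printf '%s\n' "$outfile"
+  else
+    rm -f "$tmpfile" "$outfile" # leave nothing behind on failure
+    return 1
+  fi
+}
+```
+
+Usage would look like `OUTFILE=$(make_voice_note "Leaving now, see you in fifteen.")`, after which `$OUTFILE` is the path to hand to the `message` tool.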
+
+### 3. Send the voice note via Telegram
+The `message` tool takes a payload like this:
+
+```json
+{
+  "action": "send",
+  "channel": "telegram",
+  "media": "/var/folders/.../brovoice_a1b2c3.m4a",
+  "asVoice": true,
+  "message": "Audio reply: key takeaways + ETA"
+}
+```
+
+Guidelines:
+- Always set `asVoice: true` so Telegram renders the clip as a push-to-play voice message instead of a regular file.
+- Keep a short text caption that previews the content (helps when the user can’t tap immediately).
+- If the user explicitly wants only audio, you can keep the caption minimal (e.g., “Audio-only reply”).
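+
+If the payload is assembled in the shell (an assumption; the agent may call the `message` tool directly instead), the path and caption need JSON escaping. A minimal sketch, with `json_escape` and `voice_payload` as hypothetical helper names:
+
+```bash
+# Hypothetical helper: escape backslashes and double quotes for a JSON string.
+# Newlines and other control characters are not handled; keep captions single-line.
+json_escape() {
+  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
+}
+
+# Print the payload shown above, given the audio path ($1) and a caption ($2).
+voice_payload() {
+  printf '{"action":"send","channel":"telegram","media":"%s","asVoice":true,"message":"%s"}\n' \
+    "$(json_escape "$1")" "$(json_escape "$2")"
+}
+```
+
+For example, `voice_payload "$OUTFILE" "Audio reply: key takeaways + ETA"` prints a single-line JSON object with the same five fields as the payload above.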
+
+### 4. Clean up
+Once Telegram confirms `{ "ok": true }`, remove the temp file:
+
+```bash
+rm -f "$OUTFILE" "$TMPFILE"  # also remove the empty placeholder created by mktemp
+```
+
+This avoids leaving dozens of `.m4a` files inside `/var/folders/...`.
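+
+If a delivery confirmation is ever missed, stale files can be swept up later. A sketch under stated assumptions: `sweep_voice_notes` is a hypothetical name, and the 60-minute default is arbitrary rather than something established during testing.
+
+```bash
+# Hypothetical sweep: delete brovoice_* files in the temp dir older than N minutes (default 60).
+sweep_voice_notes() {
+  local max_age_min="${1:-60}"
+  find "${TMPDIR:-/tmp}" -maxdepth 1 -type f -name 'brovoice_*' \
+    -mmin "+${max_age_min}" -delete
+}
+```
+
+The age filter is there so a note that was just generated and is still waiting to be sent does not get removed.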
+
+## Tips & Variations
+- **Faster follow-ups:** Capture the temp path and reuse it if you plan to cut a second take immediately; otherwise regenerate with `mktemp`.
+- **Speaking rate tweaks (optional):** `-r` sets the rate in words per minute. If Adolfo asks for faster/slower delivery, `say -r 160` ≈ relaxed, `-r 190` ≈ energetic; stay under `-r 210` for intelligibility while driving.
+- **Edge cases:** If `say` errors (rare, but possible in headless sessions), fall back to the built-in `tts` tool. It already writes to the same temp tree and produces a `MEDIA:/path/to/file.mp3` string you can feed directly into `message`.
+- **Captioning:** When summarizing long instructions, include bullet cues in the text reply while the audio carries the nuance.
+
+Stick to this flow whenever Adolfo asks for a voice response (especially while driving) or when tone-of-voice will land better than plain text.