Merge branch 'main' of github.com:nousresearch/hermes-agent
This commit is contained in:
142
Project_notes.md
Normal file
142
Project_notes.md
Normal file
@@ -0,0 +1,142 @@
|
||||
# Project Notes
|
||||
|
||||
*Maintained by Hermes — last updated June 2025*
|
||||
|
||||
---
|
||||
|
||||
## 1. Kandinsky (Multimodal Transformer)
|
||||
- **Repo:** https://github.com/samherring99/kandinsky
|
||||
- **Local path:** `~/Desktop/Projects/kandinsky`
|
||||
- **Description:** An anything-to-anything transformer combining text, image, and audio modalities. Trains on Pokemon BLIP captions paired with Gen 1 Pokemon audio cries. Uses audio tokenization adapted from nanoGPT.
|
||||
- **Status:** Early POC. Training code exists (`model.py`) and dataset creation (`create_dataset.py`) works. Audio heads are producing the same sound — unclear if it's a training issue or data issue.
|
||||
- **TODO:**
|
||||
- Debug why audio heads produce identical output
|
||||
- Investigate if model needs more training time
|
||||
- Design a data pipeline for better/more training data
|
||||
- General repo cleanup (requirements.txt, proper CLI, etc.)
|
||||
|
||||
---
|
||||
|
||||
## 2. NightwingGameSim (LLM → GameBoy ROM Generator)
|
||||
- **Repo:** https://github.com/samherring99/NightwingGameSim
|
||||
- **Local path:** `~/Desktop/Projects/NightwingGameSim`
|
||||
- **Description:** AI-powered pipeline that turns natural language prompts into playable GameBoy ROM files. Generates C code, compiles with GBDK, outputs `.gb` files. Supports Claude API, local Llama, and RAG backends.
|
||||
- **Status:** Functional — generation pipeline works end-to-end with Claude 4 system prompt. Has tests, docs, examples, and retry logic.
|
||||
- **TODO:**
|
||||
- Harden the repo, clean up structure
|
||||
- Build a better testing pipeline
|
||||
- Come up with better prompt ideas / examples
|
||||
|
||||
---
|
||||
|
||||
## 3. ContentBasedMIR (Music Information Retrieval)
|
||||
- **Repo:** https://github.com/samherring99/ContentBasedMIR
|
||||
- **Local path:** `~/Desktop/Projects/ContentBasedMIR`
|
||||
- **Description:** Music similarity analysis using Spotify API track data. Extracts 54 audio features per song and visualizes similarity matrices for music recommendation.
|
||||
- **Status:** Early stage. Can download Spotify track analysis data and plot similarity matrices. Needs significant expansion.
|
||||
- **TODO:**
|
||||
- Expand analysis pipeline with more features
|
||||
- Integrate with text message data for personalized recommendations
|
||||
- Build out visualization and exploration tools
|
||||
- General modernization (dependencies, structure)
|
||||
|
||||
---
|
||||
|
||||
## 4. MessageRetrieval (iMessage RAG/SQL)
|
||||
- **Repo:** https://github.com/samherring99/MessageRetrieval
|
||||
- **Local path:** `~/Desktop/Projects/MessageRetrieval`
|
||||
- **Description:** Natural language querying over iMessage data using SQL generation (text2SQL) instead of vector embeddings. Uses LLM-as-Judge pattern for scoring and ranking retrieved messages.
|
||||
- **Status:** Has initial text2SQL pipeline and summarization tool. Recently worked on with Claude Code. Needs testing.
|
||||
- **TODO:**
|
||||
- Test out the recent Claude Code work
|
||||
- Build "iMessage Jarvis" — answer questions about texts
|
||||
- Improve SQL generation prompts and accuracy
|
||||
- Better error handling and UX
|
||||
|
||||
---
|
||||
|
||||
## 5. Grailed Embedding Search
|
||||
- **Repo:** https://github.com/samherring99/grailed-embedding-search
|
||||
- **Local path:** `~/Desktop/Projects/grailed-embedding-search`
|
||||
- **Description:** Semantic similarity search over Grailed fashion listings using CLIP embeddings and FAISS. Search by image URL or text description to find visually similar products.
|
||||
- **Status:** Functional core pipeline. CLIP ViT-B/32 embeds product cover photos into 512-dim vectors, indexed with FAISS cosine similarity. Has CLI, batch embedding, persistent index save/load, and logging.
|
||||
- **Recent work (June 2025):**
|
||||
- PR #1 — Initial cleanup: docstrings, type hints, `.gitignore`, `requirements.txt`, README rewrite
|
||||
- PR #2 — Feature improvements: persistent FAISS save/load, batch embedding, CLI (`cli.py`), proper logging throughout, lazy Grailed client, `fetch_details` toggle
|
||||
- **TODO:**
|
||||
- Embedding cache (avoid re-embedding known product URLs)
|
||||
- Async/threaded image downloads for faster batch indexing
|
||||
- Search result visualization (matplotlib grid of cover photos)
|
||||
- Filter by category, designer, price range before search
|
||||
- Web UI (Gradio or Streamlit)
|
||||
|
||||
---
|
||||
|
||||
## 6. NightwingNBA (Sports Analytics)
|
||||
- **Repo:** https://github.com/samherring99/NightwingNBA
|
||||
- **Local path:** `~/Desktop/Projects/NightwingNBA`
|
||||
- **Description:** NBA game prediction system. Builds a database of game data, trains a PyTorch model, and makes daily predictions. Has full pipeline: build DB → write data → train → predict.
|
||||
- **Status:** Functional pipeline exists. Has database building, training, prediction, and daily update scripts.
|
||||
- **TODO:**
|
||||
- Explore and potentially revive
|
||||
- Update data sources if stale
|
||||
- Improve model accuracy
|
||||
- Add visualization/reporting
|
||||
|
||||
---
|
||||
|
||||
## 7. Stable Audio Sample Explorer
|
||||
- **Repo:** https://github.com/samherring99/stable-audio-sample-explorer
|
||||
- **Local path:** `~/Desktop/Projects/stable-audio-sample-explorer`
|
||||
- **Description:** Tool for exploring audio samples generated by Stable Audio.
|
||||
- **Status:** 🪦 **Dead** — no active work needed per Sam.
|
||||
|
||||
---
|
||||
|
||||
## 8. NightwingArt (Art Tools)
|
||||
- **Repo:** https://github.com/samherring99/NightwingArt
|
||||
- **Local path:** `~/Desktop/Projects/NightwingArt`
|
||||
- **Description:** Collection of art tooling scripts — video editing, clip splicing with beat matching, damage effects, and general image manipulation.
|
||||
- **Status:** Maintenance mode. Tools exist for various effects. Work happens as-needed.
|
||||
- **TODO:**
|
||||
- Add tools as needed for new art projects
|
||||
|
||||
---
|
||||
|
||||
## 9. Claude-based VST Building ⚠️ *Needs new repo*
|
||||
- **Description:** Generate VST audio plugins for DAWs from English language prompts. LLM-powered audio plugin creation.
|
||||
- **Status:** Concept only — no repo exists yet.
|
||||
- **TODO:**
|
||||
- Create repo
|
||||
- Research VST SDK / JUCE framework
|
||||
- Design prompt → code → compile pipeline
|
||||
|
||||
---
|
||||
|
||||
## 10. Government Auction Site Scraper ⚠️ *Needs new repo*
|
||||
- **Description:** Tool that monitors and scrapes government auction sites in San Francisco for deals.
|
||||
- **Status:** Concept only — no repo exists yet.
|
||||
- **TODO:**
|
||||
- Create repo
|
||||
- Research SF government auction sites and their structure
|
||||
- Build scraper + notification system
|
||||
|
||||
---
|
||||
|
||||
## Priority Assessment
|
||||
|
||||
| Project | Activity Level | Suggested Priority |
|
||||
|---------|---------------|-------------------|
|
||||
| NightwingGameSim | Active | 🔴 High |
|
||||
| MessageRetrieval | Active | 🔴 High |
|
||||
| Kandinsky | Active | 🟡 Medium |
|
||||
| ContentBasedMIR | Exploratory | 🟡 Medium |
|
||||
| Grailed Embedding Search | Early | 🟡 Medium |
|
||||
| NightwingNBA | Dormant | 🟢 Low |
|
||||
| NightwingArt | As-needed | 🟢 Low |
|
||||
| VST Builder | Concept | 🔵 Future |
|
||||
| Gov Auction Scraper | Concept | 🔵 Future |
|
||||
| Stable Audio Explorer | Dead | ⚫ None |
|
||||
|
||||
|
||||
|
||||
@@ -6,10 +6,11 @@ and implement the required methods.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import re
|
||||
from abc import ABC, abstractmethod
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from typing import Dict, List, Optional, Any, Callable, Awaitable
|
||||
from typing import Dict, List, Optional, Any, Callable, Awaitable, Tuple
|
||||
from enum import Enum
|
||||
|
||||
import sys
|
||||
@@ -177,6 +178,68 @@ class BasePlatformAdapter(ABC):
|
||||
"""
|
||||
pass
|
||||
|
||||
async def send_image(
|
||||
self,
|
||||
chat_id: str,
|
||||
image_url: str,
|
||||
caption: Optional[str] = None,
|
||||
reply_to: Optional[str] = None,
|
||||
) -> SendResult:
|
||||
"""
|
||||
Send an image natively via the platform API.
|
||||
|
||||
Override in subclasses to send images as proper attachments
|
||||
instead of plain-text URLs. Default falls back to sending the
|
||||
URL as a text message.
|
||||
"""
|
||||
# Fallback: send URL as text (subclasses override for native images)
|
||||
text = f"{caption}\n{image_url}" if caption else image_url
|
||||
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
|
||||
|
||||
@staticmethod
|
||||
def extract_images(content: str) -> Tuple[List[Tuple[str, str]], str]:
|
||||
"""
|
||||
Extract image URLs from markdown and HTML image tags in a response.
|
||||
|
||||
Finds patterns like:
|
||||
- 
|
||||
- <img src="https://example.com/image.png">
|
||||
- <img src="https://example.com/image.png"></img>
|
||||
|
||||
Args:
|
||||
content: The response text to scan.
|
||||
|
||||
Returns:
|
||||
Tuple of (list of (url, alt_text) pairs, cleaned content with image tags removed).
|
||||
"""
|
||||
images = []
|
||||
cleaned = content
|
||||
|
||||
# Match markdown images: 
|
||||
md_pattern = r'!\[([^\]]*)\]\((https?://[^\s\)]+)\)'
|
||||
for match in re.finditer(md_pattern, content):
|
||||
alt_text = match.group(1)
|
||||
url = match.group(2)
|
||||
# Only extract URLs that look like actual images
|
||||
if any(url.lower().endswith(ext) or ext in url.lower() for ext in
|
||||
['.png', '.jpg', '.jpeg', '.gif', '.webp', 'fal.media', 'fal-cdn', 'replicate.delivery']):
|
||||
images.append((url, alt_text))
|
||||
|
||||
# Match HTML img tags: <img src="url"> or <img src="url"></img> or <img src="url"/>
|
||||
html_pattern = r'<img\s+src=["\']?(https?://[^\s"\'<>]+)["\']?\s*/?>\s*(?:</img>)?'
|
||||
for match in re.finditer(html_pattern, content):
|
||||
url = match.group(1)
|
||||
images.append((url, ""))
|
||||
|
||||
# Remove matched image tags from content if we found images
|
||||
if images:
|
||||
cleaned = re.sub(md_pattern, '', cleaned)
|
||||
cleaned = re.sub(html_pattern, '', cleaned)
|
||||
# Clean up leftover blank lines
|
||||
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
|
||||
|
||||
return images, cleaned
|
||||
|
||||
async def _keep_typing(self, chat_id: str, interval: float = 2.0) -> None:
|
||||
"""
|
||||
Continuously send typing indicator until cancelled.
|
||||
@@ -231,23 +294,41 @@ class BasePlatformAdapter(ABC):
|
||||
|
||||
# Send response if any
|
||||
if response:
|
||||
result = await self.send(
|
||||
chat_id=event.source.chat_id,
|
||||
content=response,
|
||||
reply_to=event.message_id
|
||||
)
|
||||
# Extract image URLs and send them as native platform attachments
|
||||
images, text_content = self.extract_images(response)
|
||||
|
||||
# Log send failures (don't raise - user already saw tool progress)
|
||||
if not result.success:
|
||||
print(f"[{self.name}] Failed to send response: {result.error}")
|
||||
# Try sending without markdown as fallback
|
||||
fallback_result = await self.send(
|
||||
# Send the text portion first (if any remains after extracting images)
|
||||
if text_content:
|
||||
result = await self.send(
|
||||
chat_id=event.source.chat_id,
|
||||
content=f"(Response formatting failed, plain text:)\n\n{response[:3500]}",
|
||||
content=text_content,
|
||||
reply_to=event.message_id
|
||||
)
|
||||
if not fallback_result.success:
|
||||
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
|
||||
|
||||
# Log send failures (don't raise - user already saw tool progress)
|
||||
if not result.success:
|
||||
print(f"[{self.name}] Failed to send response: {result.error}")
|
||||
# Try sending without markdown as fallback
|
||||
fallback_result = await self.send(
|
||||
chat_id=event.source.chat_id,
|
||||
content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
|
||||
reply_to=event.message_id
|
||||
)
|
||||
if not fallback_result.success:
|
||||
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
|
||||
|
||||
# Send extracted images as native attachments
|
||||
for image_url, alt_text in images:
|
||||
try:
|
||||
img_result = await self.send_image(
|
||||
chat_id=event.source.chat_id,
|
||||
image_url=image_url,
|
||||
caption=alt_text if alt_text else None,
|
||||
)
|
||||
if not img_result.success:
|
||||
print(f"[{self.name}] Failed to send image: {img_result.error}")
|
||||
except Exception as img_err:
|
||||
print(f"[{self.name}] Error sending image: {img_err}")
|
||||
|
||||
# Check if there's a pending message that was queued during our processing
|
||||
if session_key in self._pending_messages:
|
||||
|
||||
@@ -8,6 +8,7 @@ Uses discord.py library for:
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
from typing import Dict, List, Optional, Any
|
||||
|
||||
try:
|
||||
@@ -173,6 +174,61 @@ class DiscordAdapter(BasePlatformAdapter):
|
||||
except Exception as e:
|
||||
return SendResult(success=False, error=str(e))
|
||||
|
||||
async def send_image(
|
||||
self,
|
||||
chat_id: str,
|
||||
image_url: str,
|
||||
caption: Optional[str] = None,
|
||||
reply_to: Optional[str] = None,
|
||||
) -> SendResult:
|
||||
"""Send an image natively as a Discord file attachment."""
|
||||
if not self._client:
|
||||
return SendResult(success=False, error="Not connected")
|
||||
|
||||
try:
|
||||
import aiohttp
|
||||
|
||||
channel = self._client.get_channel(int(chat_id))
|
||||
if not channel:
|
||||
channel = await self._client.fetch_channel(int(chat_id))
|
||||
if not channel:
|
||||
return SendResult(success=False, error=f"Channel {chat_id} not found")
|
||||
|
||||
# Download the image and send as a Discord file attachment
|
||||
# (Discord renders attachments inline, unlike plain URLs)
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(image_url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
|
||||
if resp.status != 200:
|
||||
raise Exception(f"Failed to download image: HTTP {resp.status}")
|
||||
|
||||
image_data = await resp.read()
|
||||
|
||||
# Determine filename from URL or content type
|
||||
content_type = resp.headers.get("content-type", "image/png")
|
||||
ext = "png"
|
||||
if "jpeg" in content_type or "jpg" in content_type:
|
||||
ext = "jpg"
|
||||
elif "gif" in content_type:
|
||||
ext = "gif"
|
||||
elif "webp" in content_type:
|
||||
ext = "webp"
|
||||
|
||||
import io
|
||||
file = discord.File(io.BytesIO(image_data), filename=f"image.{ext}")
|
||||
|
||||
msg = await channel.send(
|
||||
content=caption if caption else None,
|
||||
file=file,
|
||||
)
|
||||
return SendResult(success=True, message_id=str(msg.id))
|
||||
|
||||
except ImportError:
|
||||
print(f"[{self.name}] aiohttp not installed, falling back to URL. Run: pip install aiohttp")
|
||||
return await super().send_image(chat_id, image_url, caption, reply_to)
|
||||
except Exception as e:
|
||||
print(f"[{self.name}] Failed to send image attachment, falling back to URL: {e}")
|
||||
return await super().send_image(chat_id, image_url, caption, reply_to)
|
||||
|
||||
async def send_typing(self, chat_id: str) -> None:
|
||||
"""Send typing indicator."""
|
||||
if self._client:
|
||||
@@ -232,6 +288,36 @@ class DiscordAdapter(BasePlatformAdapter):
|
||||
|
||||
async def _handle_message(self, message: DiscordMessage) -> None:
|
||||
"""Handle incoming Discord messages."""
|
||||
# In server channels (not DMs), require the bot to be @mentioned
|
||||
# UNLESS the channel is in the free-response list.
|
||||
#
|
||||
# Config:
|
||||
# DISCORD_FREE_RESPONSE_CHANNELS: Comma-separated channel IDs where the
|
||||
# bot responds to every message without needing a mention.
|
||||
# DISCORD_REQUIRE_MENTION: Set to "false" to disable mention requirement
|
||||
# globally (all channels become free-response). Default: "true".
|
||||
|
||||
if not isinstance(message.channel, discord.DMChannel):
|
||||
# Check if this channel is in the free-response list
|
||||
free_channels_raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
|
||||
free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
|
||||
channel_id = str(message.channel.id)
|
||||
|
||||
# Global override: if DISCORD_REQUIRE_MENTION=false, all channels are free
|
||||
require_mention = os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no")
|
||||
|
||||
is_free_channel = channel_id in free_channels
|
||||
|
||||
if require_mention and not is_free_channel:
|
||||
# Must be @mentioned to respond
|
||||
if self._client.user not in message.mentions:
|
||||
return # Silently ignore messages that don't mention the bot
|
||||
|
||||
# Strip the bot mention from the message text so the agent sees clean input
|
||||
if self._client.user and self._client.user in message.mentions:
|
||||
message.content = message.content.replace(f"<@{self._client.user.id}>", "").strip()
|
||||
message.content = message.content.replace(f"<@!{self._client.user.id}>", "").strip()
|
||||
|
||||
# Determine message type
|
||||
msg_type = MessageType.TEXT
|
||||
if message.content.startswith("/"):
|
||||
|
||||
@@ -174,6 +174,31 @@ class TelegramAdapter(BasePlatformAdapter):
|
||||
except Exception as e:
|
||||
return SendResult(success=False, error=str(e))
|
||||
|
||||
async def send_image(
|
||||
self,
|
||||
chat_id: str,
|
||||
image_url: str,
|
||||
caption: Optional[str] = None,
|
||||
reply_to: Optional[str] = None,
|
||||
) -> SendResult:
|
||||
"""Send an image natively as a Telegram photo."""
|
||||
if not self._bot:
|
||||
return SendResult(success=False, error="Not connected")
|
||||
|
||||
try:
|
||||
# Telegram can send photos directly from URLs
|
||||
msg = await self._bot.send_photo(
|
||||
chat_id=int(chat_id),
|
||||
photo=image_url,
|
||||
caption=caption[:1024] if caption else None, # Telegram caption limit
|
||||
reply_to_message_id=int(reply_to) if reply_to else None,
|
||||
)
|
||||
return SendResult(success=True, message_id=str(msg.message_id))
|
||||
except Exception as e:
|
||||
print(f"[{self.name}] Failed to send photo, falling back to URL: {e}")
|
||||
# Fallback: send as text link
|
||||
return await super().send_image(chat_id, image_url, caption, reply_to)
|
||||
|
||||
async def send_typing(self, chat_id: str) -> None:
|
||||
"""Send typing indicator."""
|
||||
if self._bot:
|
||||
|
||||
@@ -392,7 +392,7 @@ def get_image_tool_definitions() -> List[Dict[str, Any]]:
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "image_generate",
|
||||
"description": "Generate high-quality images from text prompts using FLUX 2 Pro model with automatic 2x upscaling. Creates detailed, artistic images that are automatically upscaled for hi-rez results. Returns a single upscaled image URL that can be displayed using <img src=\"{URL}\"></img> tags.",
|
||||
"description": "Generate high-quality images from text prompts using FLUX 2 Pro model with automatic 2x upscaling. Creates detailed, artistic images that are automatically upscaled for hi-rez results. Returns a single upscaled image URL. Display it using markdown: ",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
|
||||
@@ -169,6 +169,8 @@ TOOLSETS = {
|
||||
"web_search", "web_extract",
|
||||
# Vision - analyze images sent by users
|
||||
"vision_analyze",
|
||||
# Image generation
|
||||
"image_generate",
|
||||
# Skills - access knowledge base
|
||||
"skills_list", "skill_view",
|
||||
# Cronjob management - let users schedule tasks
|
||||
@@ -188,6 +190,8 @@ TOOLSETS = {
|
||||
"web_search", "web_extract",
|
||||
# Vision - analyze images sent by users
|
||||
"vision_analyze",
|
||||
# Image generation
|
||||
"image_generate",
|
||||
# Skills - access knowledge base
|
||||
"skills_list", "skill_view",
|
||||
# Cronjob management - let users schedule tasks
|
||||
@@ -207,6 +211,8 @@ TOOLSETS = {
|
||||
"read_file", "write_file", "patch", "search",
|
||||
# Vision
|
||||
"vision_analyze",
|
||||
# Image generation
|
||||
"image_generate",
|
||||
# Skills
|
||||
"skills_list", "skill_view",
|
||||
# Cronjob management
|
||||
|
||||
Reference in New Issue
Block a user