# Timmy Time App — MVP Specification

## Overview

Native iPad app that serves as the primary interface to a sovereign AI agent
(Timmy) running on a Mac server. All heavy computation stays on the Mac.
The iPad handles UI, sensor capture, and local preprocessing.

## Target Platform

- iPadOS 26.1+
- iPad Pro 13" (primary target)
- Swift 6 / SwiftUI
- Minimum deployment target: iPadOS 26.0

## Server Requirements

The Mac runs the existing Timmy dashboard (FastAPI) with new API endpoints
added specifically for the app. Communication is over Tailscale (private network).

### API Endpoints (to build on Mac side)

#### Chat

```
POST /api/v1/chat
Body: { "message": "string", "session_id": "string", "attachments": [...] }
Response: streaming text/event-stream (SSE)

GET /api/v1/chat/history?session_id=X&limit=50
Response: { "messages": [...] }
```

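The spec leaves the SSE wire format for the chat stream open. As a minimal sketch of what the Mac side might emit, here is a stdlib-only token framer; the per-token `data:` events and the `[DONE]` sentinel are assumptions borrowed from common SSE chat APIs, not part of this spec:

```python
def sse_events(tokens):
    """Frame an iterable of model tokens as Server-Sent Events.

    Each token becomes one `data:` line; the blank line terminates
    the event. A final `data: [DONE]` sentinel (an assumed convention)
    tells the client the stream has finished.
    """
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"


# Example: framing three tokens from a stubbed model stream.
frames = list(sse_events(["Hello", " from", " Timmy"]))
```

On the FastAPI side such a generator would typically be wrapped in a streaming response with media type `text/event-stream`.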
#### Upload / Media

```
POST /api/v1/upload
Body: multipart/form-data with file
Response: { "id": "string", "type": "image|audio|document|url",
            "summary": "string", "metadata": {...} }
```

The upload endpoint auto-detects media type and routes to the right processor:

- Images (jpg, png, heic) → vision model analysis
- Audio (m4a, mp3, wav, caf) → Whisper transcription
- Documents (pdf, txt, md) → text extraction
- URLs (detected from text) → web page extraction

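The routing table above can be sketched as a simple extension lookup. The URL check and the fallback to `"document"` for unknown extensions are assumptions made here for illustration; the spec does not define either:

```python
import os

# Extension → media type, mirroring the routing table above.
MEDIA_TYPES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".heic": "image",
    ".m4a": "audio", ".mp3": "audio", ".wav": "audio", ".caf": "audio",
    ".pdf": "document", ".txt": "document", ".md": "document",
}

def detect_media_type(name_or_text: str) -> str:
    """Classify an upload by filename extension; bare text that starts
    with a scheme is treated as a URL. Unknown extensions fall back to
    "document" (an assumed default, not specified)."""
    if name_or_text.startswith(("http://", "https://")):
        return "url"
    ext = os.path.splitext(name_or_text)[1].lower()
    return MEDIA_TYPES.get(ext, "document")
```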
#### Status

```
GET /api/v1/status
Response: { "timmy": "online|offline", "model": "qwen3:30b",
            "ollama": "running|stopped", "uptime": "..." }
```

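A sketch of assembling that response shape from individual health probes. How the probes are measured (process check, Ollama ping, etc.) is left to the server; the function and parameter names here are illustrative, not part of the spec:

```python
def status_payload(timmy_up: bool, ollama_up: bool, model: str, uptime: str) -> dict:
    """Assemble the /api/v1/status response body from probe results."""
    return {
        "timmy": "online" if timmy_up else "offline",
        "model": model,
        "ollama": "running" if ollama_up else "stopped",
        "uptime": uptime,
    }
```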
## App Structure

```
TimmyTime/
├── TimmyTimeApp.swift           # App entry point
├── Models/
│   ├── Message.swift            # Chat message model
│   ├── Attachment.swift         # Media attachment model
│   └── ServerConfig.swift       # Server URL, auth config
├── Views/
│   ├── ChatView.swift           # Main chat interface
│   ├── MessageBubble.swift      # Individual message rendering
│   ├── AttachmentPicker.swift   # Photo/file/camera picker
│   ├── VoiceButton.swift        # Hold-to-talk microphone
│   ├── SettingsView.swift       # Server URL config
│   └── StatusBar.swift          # Connection/model status
├── Services/
│   ├── ChatService.swift        # HTTP + SSE streaming client
│   ├── UploadService.swift      # Multipart file upload
│   ├── AudioRecorder.swift      # AVFoundation mic recording
│   └── PersistenceService.swift # Local chat history (SwiftData)
├── Assets.xcassets/             # App icons, colors
└── Info.plist
```


## Screen Layout (iPad Landscape)

```
┌──────────────────────────────────────────────────────────┐
│ ◉ Timmy Time            qwen3:30b ● Online            ⚙️ │
├──────────────────────────────────────────────────────────┤
│                                                          │
│ ┌─────────────────────────────┐                          │
│ │ Timmy:                      │                          │
│ │ Here's what I found...      │                          │
│ └─────────────────────────────┘                          │
│                                                          │
│ ┌─────────────────────────────┐                          │
│ │ You:                        │                          │
│ │ [📷 photo.jpg]              │                          │
│ │ What's in this image?       │                          │
│ └─────────────────────────────┘                          │
│                                                          │
│ ┌─────────────────────────────┐                          │
│ │ Timmy:                      │                          │
│ │ I see a circuit board...    │                          │
│ │ ▋ (streaming)               │                          │
│ └─────────────────────────────┘                          │
│                                                          │
├──────────────────────────────────────────────────────────┤
│ 📎 📷 🎤 │ Type a message...                          ➤ │
└──────────────────────────────────────────────────────────┘

📎 = file picker   📷 = camera   🎤 = hold to talk   ➤ = send
```

## Data Flow

1. User types/speaks/attaches media.
2. App sends to Mac: POST /api/v1/chat (text) or POST /api/v1/upload (media).
3. Upload endpoint processes the media and returns a summary + ID.
4. Chat endpoint receives the message + attachment IDs and streams the response.
5. App renders the streaming response in real time.
6. Chat is saved to the local SwiftData store for offline viewing.

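Steps 2–4 above (upload first, then chat with the returned IDs) can be sketched with the transports stubbed out. `send_with_attachments`, `upload`, and `chat` are hypothetical names introduced here; in the app the two callables would be real HTTP calls to POST /api/v1/upload and POST /api/v1/chat:

```python
def send_with_attachments(message, session_id, files, upload, chat):
    """Upload each media file, collect the returned IDs, then post the
    chat message referencing them. Transports are injected so the flow
    can be shown without a running server."""
    ids = [upload(f)["id"] for f in files]
    return chat({"message": message, "session_id": session_id, "attachments": ids})


# Stub transports standing in for the two endpoints.
fake_upload = lambda f: {"id": f"id-{f}"}
fake_chat = lambda body: body

result = send_with_attachments(
    "What's in this image?", "s1", ["photo.jpg"], fake_upload, fake_chat
)
```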
## Design Principles

- **Touch-first.** Everything reachable by thumb. No tiny tap targets.
- **Sovereign.** No cloud dependencies. All traffic stays on Tailscale.
- **Media-rich.** Images, audio, and links displayed inline, not as file names.
- **Fast.** Streaming responses start appearing immediately.
- **Simple.** One screen. Chat is the interface. Everything else is secondary.

## Color Palette

- Dark mode primary (easier on the eyes, looks good on the OLED iPad Pro)
- Accent color: match Timmy's personality — warm gold or sovereign blue
- Message bubbles: subtle differentiation between user and Timmy

## Phase 2 Features (post-MVP)

- Apple Pencil: draw on images, handwriting input
- Core ML: on-device Whisper for instant transcription
- Split View: chat + status side by side
- Drag and drop from other apps
- Share Sheet extension ("Send to Timmy")
- Push notifications for long-running task completion