Bring Your LM 2026.04.28
The biggest BYLM release to date: responses stream in live and can be stopped or retried, messages carry image attachments, two new provider types join the list (Anthropic and llama.cpp), and every reply reports what it cost — in tokens, money, and time.
Live, controllable responses
- Assistant replies stream token-by-token instead of appearing only when the model finishes.
- A Stop button cancels a runaway response immediately, keeping whatever was produced so far.
- If a response fails mid-stream (network drop, provider error), a Retry button on the failed message re-runs the turn — no retyping the prompt.
- You can keep typing while the assistant answers: sends queue up as the next turn instead of being blocked, and your draft survives the streaming transitions.
- The list auto-scrolls to follow a long reply while you're at the bottom, and stops fighting you the moment you scroll up to read something.
- Text you select in a finished message stays selected while new tokens stream in.
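As a rough illustration of the Stop and failure behavior described above, here is a minimal sketch of a stream consumer that always keeps the partial text. The function name `consume_stream`, the status strings, and the use of `ConnectionError` as the failure signal are assumptions made for this example, not BYLM's actual code.

```python
import threading
from typing import Iterable, List, Tuple


def consume_stream(tokens: Iterable[str], stop: threading.Event) -> Tuple[str, str]:
    """Collect streamed tokens and return (text_so_far, status).

    status is "done", "stopped", or "failed" (hypothetical labels).
    Whatever was received before a stop or failure is kept.
    """
    parts: List[str] = []
    try:
        for token in tokens:
            # The stop flag is checked before each token is appended.
            if stop.is_set():
                return "".join(parts), "stopped"
            parts.append(token)
    except ConnectionError:
        # Network drop or provider error mid-stream: keep the partial text
        # so the caller can offer a retry of the same turn.
        return "".join(parts), "failed"
    return "".join(parts), "done"
```

A caller that sees `"failed"` can start a fresh stream for the same stored prompt, which is roughly what a Retry button needs: the prompt lives with the turn, not in the composer.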
Reasoning, visible and controllable
- Reasoning models get a proper thinking tile: expandable mid-stream, full message width, flowing prose, with a live seconds/words counter that stops the moment the model stops thinking.
- Each bot has a reasoning-effort setting, translated to the matching wire-level control for Anthropic, OpenRouter, Ollama, OpenAI, and llama.cpp — and "Off" genuinely disables thinking on Ollama and llama.cpp instead of being silently ignored.
- The waiting indicator is stage-aware: it tells you whether the backend is loading the model, processing the prompt, thinking, or generating, each with its own live timer.
Images in conversations
- On vision-capable models, a paperclip menu attaches images to a message — from a file or straight from the clipboard (Ctrl+V / Cmd+V works too).
- Attached images render as inline thumbnails with a tap-to-zoom fullscreen viewer, both in the composer and in sent messages.
- Image data is stored encrypted on disk, like the rest of your local data, with small previews generated so image-heavy conversations scroll smoothly.
New providers, better model lists
- Anthropic: talk to Claude models with your own API key, with per-model capabilities (image input, thinking, context length) read from the native model catalog.
- llama.cpp: point BYLM at your own llama.cpp server as a first-class provider type, including reasoning control and per-model detail fetch.
- Providers accept custom auth and request headers, so OpenAI-compatible backends with non-standard schemes work.
- The model list was redesigned: modality and capability icons, context and pricing at a glance, expandable details, filters by capability, multi-word token search, and newest-first sorting.
- Model lists are cached and refreshable, load without blocking the page, and the bot form warns inline when the typed model id isn't in the selected provider's catalog.
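One plausible reading of multi-word token search with newest-first sorting is sketched below: every word in the query must appear somewhere in the model's id or name, in any order. The `ModelEntry` shape and the `search_models` helper are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelEntry:
    # Hypothetical shape for illustration, not BYLM's actual schema.
    model_id: str
    name: str
    created: int  # Unix timestamp; larger means newer


def search_models(models: List[ModelEntry], query: str) -> List[ModelEntry]:
    """Keep models where every query word appears in the id or name, newest first."""
    words = query.lower().split()

    def matches(m: ModelEntry) -> bool:
        haystack = f"{m.model_id} {m.name}".lower()
        return all(w in haystack for w in words)

    # An empty query has no words, so every model matches.
    return sorted((m for m in models if matches(m)),
                  key=lambda m: m.created, reverse=True)
```

With this approach, a query such as "instruct alpha" finds the same entries as "alpha instruct", since word order is ignored.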
Know what each reply cost
- Per-message token usage (prompt / completion / thinking), updated live during streaming and persisted.
- Per-message cost in dollars when the provider reports it (OpenRouter).
- Conversation totals and a context-window usage bar in the page header.
- A per-message performance breakdown: model load, prompt processing, and generation time with tokens-per-second.
- Each response records which provider and model produced it, so switching a bot's model later doesn't lose attribution on past turns.
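The arithmetic behind the tokens-per-second figure and the context-window bar is simple; a minimal sketch follows. The function names are hypothetical, and which token counts a given provider includes in "used" context is an assumption left to the caller.

```python
def tokens_per_second(completion_tokens: int, generation_seconds: float) -> float:
    """Generation throughput; returns 0.0 when no time has elapsed yet."""
    if generation_seconds <= 0:
        return 0.0
    return completion_tokens / generation_seconds


def context_fraction(used_tokens: int, context_length: int) -> float:
    """Share of the context window in use, capped at 1.0 for display."""
    if context_length <= 0:
        return 0.0
    return min(used_tokens / context_length, 1.0)
```

For example, 120 completion tokens generated in 4 seconds works out to 30 tokens per second, and 2,048 used tokens in an 8,192-token window fills a quarter of the bar.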
Richer message rendering
- LaTeX rendering for $...$ and $$...$$ blocks.
- Syntax-highlighted fenced code blocks with a working Copy button (even mid-stream), and unlabeled code fences render instead of erroring.
- A copy button on every message.
- Markdown tables render and scroll smoothly instead of freezing the page.
- On wide screens, messages stay in a comfortable reading column.
Reliability and polish
- A migration framework runs one-time data upgrades behind a progress screen on first launch after an update, instead of revealing a half-migrated database.
- Image decryption moved to a hardware-accelerated backend — image-heavy conversations open without multi-second freezes.
- Navigation cleanups throughout: Back after saving a provider, bot, or tool returns to the list; switching sections from the drawer resets the stack; deleting a conversation no longer leaves a stale page behind.
- Providers and tools can be deleted straight from their detail pages, bots gain a one-tap "Start conversation" shortcut, and bot parameters accept precise numeric input alongside the sliders.
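For readers curious how one-time data upgrades of this kind are commonly structured, here is a minimal sketch of a versioned migration runner. It assumes a dict-like store with a `schema_version` key and an optional progress callback; these names are illustrative and not taken from BYLM's implementation.

```python
from typing import Callable, List, Optional, Tuple

# A migration is (target_version, upgrade_function). Hypothetical convention.
Migration = Tuple[int, Callable[[dict], None]]


def run_migrations(
    store: dict,
    migrations: List[Migration],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Apply, in version order, each migration newer than the stored version.

    Returns the number of migrations that ran.
    """
    current = store.get("schema_version", 0)
    pending = sorted((m for m in migrations if m[0] > current), key=lambda m: m[0])
    for done, (version, upgrade) in enumerate(pending, start=1):
        upgrade(store)
        # Record the version so this step never runs again.
        store["schema_version"] = version
        if on_progress:
            on_progress(done, len(pending))  # e.g. drive a progress screen
    return len(pending)
```

Running it a second time with the same migration list applies nothing, which is the property that makes the upgrade one-time.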