Build 1
Build 1
- Support for `output_config.effort` (low, medium, high, max) in the `v1/messages` API

Build 1
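A minimal sketch of what a `v1/messages` request body using `output_config.effort` might look like. Only `output_config.effort` and its allowed values come from the note above; the model name and surrounding fields are illustrative placeholders.

```python
import json

# Sketch of a v1/messages request body with output_config.effort.
# "some-local-model" and the other fields are placeholders; only
# output_config.effort (low / medium / high / max) is from the notes.
body = {
    "model": "some-local-model",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
    "output_config": {"effort": "medium"},
}
print(json.dumps(body, indent=2))
```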
- Support for `reasoning_effort` and `reasoning_tokens` in the OpenAI-compatible `v1/chat/completions` endpoint
- Added a `reasoning` field to the `/api/v1/models` API response, indicating each model's supported reasoning capabilities and REST configuration options
- Fixed a bug where setting reasoning to `low` when using Nemotron 3 Super via the `/api/v1/chat` or OpenAI-compatible `/v1/responses` API would error out

Build 4
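A hedged sketch of an OpenAI-compatible `v1/chat/completions` request using the `reasoning_effort` parameter mentioned above. The model id and message content are made-up placeholders; only the parameter name and the `low` value appear in the notes.

```python
import json

# Sketch of a chat/completions request body with reasoning_effort.
# "nemotron-3-super" is a placeholder model id, not a confirmed key.
payload = {
    "model": "nemotron-3-super",
    "messages": [{"role": "user", "content": "Summarize this changelog."}],
    "reasoning_effort": "low",
}
print(json.dumps(payload, indent=2))
```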
Build 3
- Fixed a bug in `/v1/responses` which sometimes caused "Output items missing; timeline invariant violated"
- Improved `$...$` parsing so plain currency/text (for example $10, $1.23, $37 trillion) is no longer incorrectly rendered as math when `$` appears in normal prose
- Improved `\[ \]` and `\( \)` handling so bracket/paren math parses correctly, and empty forms stay visible as literal text
- Fixed a bug where the `/v1/messages` API would error when `properties` were not provided for a tool input schema
- Fixed a bug where tool arguments of `string` type were sometimes incorrectly parsed as object/number/boolean
- Improved tool calling for gpt-oss models using the llama.cpp engine, significantly increasing tool call success rate for these models (requires llama.cpp engines updated to v2.7.1 or later)

Build 2
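Related to the `/v1/messages` tool-schema fix noted above, a sketch of a tool definition that includes an explicit (possibly empty) `properties` object. The tool name and description are invented for illustration.

```python
import json

# Tool definition sketch for a v1/messages-style request. Including an
# explicit "properties" object, even an empty one, avoids the case the
# fix above addresses. The tool itself is a made-up example.
tool = {
    "name": "get_time",
    "description": "Return the current time",
    "input_schema": {
        "type": "object",
        "properties": {},  # explicit, even when the tool takes no arguments
    },
}
print(json.dumps(tool, indent=2))
```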
- The `/v1/messages` API now surfaces errors when the model generates an invalid tool call, enabling Claude Code to recover gracefully

Build 1
- The "…reasoning_content and content in API responses" setting is now ON by default in order to improve compatibility with `/v1/chat/completions` clients
- Added a `parallel` parameter to the `/api/v1/load` endpoint
- Added the `presence_penalty` sampling parameter
- Fixed the `/v1/responses` endpoint erroring on `none` and `xhigh` reasoning effort
- Fixed a bug where `/v1/responses` responses included logProbs for MLX models even if `message.output_text.logprobs` was omitted

Build 1
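A sketch of a request body for the `/api/v1/load` endpoint using the new `parallel` parameter. The model key is a placeholder, and the exact semantics of `parallel` (assumed here to be a count of parallel request slots) are not specified in the notes.

```python
import json

# Sketch of an /api/v1/load request body. "qwen3-coder-next" is a
# placeholder model key; the meaning of `parallel` is an assumption.
load_body = {
    "model": "qwen3-coder-next",
    "parallel": 2,
}
print(json.dumps(load_body, indent=2))
```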
Build 2
Build 1
Build 1
Build 1
Build 2
Build 1
- Fixed a bug involving certain model arches and limited context lengths. Impacted unsloth/Qwen3-Coder-Next
- Fixes for the `/v1/responses` and `/v1/chat` endpoints

Build 2
Build 1
- Fix related to the `safe` filter
- Fixed "Cannot read properties of null (reading 'visionAdapter')"
- Fixed a bug where `lms` commands were not colored correctly
- Fixed a bug where `lms chat` would not preserve newline formatting