LM Studio 0.4 adds daemon mode for server-native deployment, parallel inference requests, and a stateful REST API that supports using local MCP servers.
Introducing llmster
Parallel Requests
Serve multiple concurrent inference requests. Requests to the same model can now be processed in parallel, instead of queued. Try this feature by putting two chats side-by-side in the new Split View feature.
ollama
LM studio
llama.ccp
vllm-mlx on Apple Silicon
Local LLMs
Mac Studio
Apple M4 Max chip
16-core CPU, a 40-core GPU, and a 16-core Neural Engine
128GB of integrated memory
1TB SSD storage
130k baht
-----------------------
Mac Studio
Apple M3 Ultra chip
28-core CPU, a 60-core GPU, and a 32-core Neural Engine
256GB of integrated memory
1TB SSD storage