MoonshotAI: Kimi Linear 48B A3B Instruct

Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods across various contexts, including short, long, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA)—a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to optimize the use of finite-state RNN memory. Kimi Linear achieves superior performance and hardware efficiency, especially for long-context tasks. It reduces the need for large KV caches by up to 75% and boosts decoding throughput by up to 6x for contexts as long as 1M tokens.

Context Length:1.0M tokens
Pricing:$0.30M
Created:November 8, 2025

Model Information

1.0M
November 8, 2025
Other
text->text
text
text

Pricing Information

Prompt
$0.30M
per 1M tokens
Completion
$0.60M
per 1M tokens

Supported Parameters

frequency_penaltylogit_biaslogprobsmax_tokensmin_ppresence_penaltyrepetition_penaltyresponse_formatseedstopstructured_outputstemperaturetop_ktop_logprobstop_p

Common Use Cases

Text Generation

  • • Content writing and editing
  • • Code generation and debugging
  • • Creative writing and storytelling
  • • Translation and summarization

General Applications

  • • Chatbots and virtual assistants
  • • Educational content creation
  • • Research and analysis
  • • Automation and workflow

Frequently Asked Questions

What is the context length of this model?

This model has a context length of 1.0M tokens, which means it can process and remember up to 1.0M tokens of text in a single conversation or request.

How much does it cost to use this model?

Prompt tokens cost $0.30M/1M tokens and completion tokens cost $0.60M/1M tokens.

What modalities does this model support?

This model supports text->text modality, accepting textas input and producing text as output.

When was this model created?

This model was created on November 8, 2025.

MoonshotAI: Kimi Linear 48B A3B Instruct - AI Model Details & Pricing | Zenspeed | Zenspeed