Google Gemini Live

Google Gemini Live provides multimodal large language model capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components. This page covers integration using the Gemini Developer API, authenticated with a Gemini API key obtained from Google AI Studio.

Info: Enabling MLLM automatically disables ASR, LLM, and TTS, because the MLLM handles end-to-end voice processing directly. See turn_detection for the turn detection options available with MLLMs.

Enable MLLM

To enable MLLM functionality, set enable_mllm to true under advanced_features.

"advanced_features": {
  "enable_mllm": true
}

Sample configuration

The following example shows a starting configuration for the mllm parameter that you can use when you start a conversational AI agent.

"mllm": {
  "api_key": "<GOOGLE_GEMINI_API_KEY>",
  "messages": [
    {
      "role": "user",
      "content": "<HISTORY_CONTENT>"
    }
  ],
  "params": {
    "model": "gemini-3.1-flash-live-preview",
    "instructions": "You are a friendly assistant.",
    "voice": "Charon",
    "affective_dialog": false,
    "proactive_audio": false,
    "transcribe_agent": true,
    "transcribe_user": true,
    "http_options": {
      "api_version": "v1beta"
    }
  },
  "input_modalities": [
    "audio"
  ],
  "output_modalities": [
    "audio"
  ],
  "greeting_message": "Hi, how can I assist you today?",
  "failure_message": "Sorry, I encountered an issue. Please try again.",
  "vendor": "gemini"
}
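As a minimal sketch, the two fragments above can be assembled in Python before being serialized into the request body. The function name `build_mllm_properties` and the `GEMINI_API_KEY` environment variable are hypothetical names chosen for illustration; where these keys nest inside the full request is not shown on this page and is not assumed here.

```python
import json
import os
from typing import Optional


def build_mllm_properties(api_key: str, history: Optional[list] = None) -> dict:
    """Return the advanced_features and mllm fragments from the sample above.

    `history` is an optional list of {"role": ..., "content": ...} items.
    The messages key is omitted when no history is supplied, since the
    field is documented as nullable.
    """
    mllm = {
        "api_key": api_key,
        "params": {
            "model": "gemini-3.1-flash-live-preview",
            "instructions": "You are a friendly assistant.",
            "voice": "Charon",
            "affective_dialog": False,
            "proactive_audio": False,
            "transcribe_agent": True,
            "transcribe_user": True,
            "http_options": {"api_version": "v1beta"},
        },
        "input_modalities": ["audio"],
        "output_modalities": ["audio"],
        "greeting_message": "Hi, how can I assist you today?",
        "failure_message": "Sorry, I encountered an issue. Please try again.",
        "vendor": "gemini",
    }
    if history:
        mllm["messages"] = history
    return {"advanced_features": {"enable_mllm": True}, "mllm": mllm}


if __name__ == "__main__":
    # Hypothetical environment variable; falls back to the placeholder.
    key = os.environ.get("GEMINI_API_KEY", "<GOOGLE_GEMINI_API_KEY>")
    print(json.dumps(build_mllm_properties(key), indent=2))
```

Reading the key from the environment keeps it out of source control; the values in `params` simply mirror the sample and can be changed as needed.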

Key parameters

mllm required

api_key string required

The Google Gemini API key used to authenticate requests. You can generate an API key in Google AI Studio.

messages array[object] nullable

An array of conversation history items passed to the model as context. Each item represents a single message in the conversation history.

Properties:

role string required

The role of the message author. For example, user.

content string required

The content of the message.
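To illustrate the shape of the messages array, the following sketch converts simple (role, content) pairs into history items. The helper name `to_messages` is hypothetical, and the checks only reflect what this page states: both fields are required strings.

```python
def to_messages(turns):
    """Convert (role, content) pairs into the messages array format."""
    messages = []
    for role, content in turns:
        # Both fields are documented as required strings.
        if not isinstance(role, str) or not role:
            raise ValueError("role must be a non-empty string")
        if not isinstance(content, str):
            raise ValueError("content must be a string")
        messages.append({"role": role, "content": content})
    return messages


history = to_messages([("user", "What is the weather like today?")])
```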

params object required

Configuration object for the Gemini Live model.

Properties:

model string required

The Gemini Live model identifier.

instructions string nullable

System instructions that define the agent's behavior or tone.

voice string nullable

The voice identifier for audio output. For example, Aoede, Puck, Charon, Kore, Fenrir, Leda, Orus, or Zephyr.

affective_dialog boolean nullable

Whether to enable affective dialog, which allows the model to adapt its tone based on the user's emotional cues.

proactive_audio boolean nullable

When enabled, the model may choose not to respond if the user's input does not require a reply, such as background speech or incomplete requests.

transcribe_agent boolean nullable

Whether to transcribe the agent's speech in real time.

transcribe_user boolean nullable

Whether to transcribe the user's speech in real time.

http_options object nullable

HTTP request options for the Gemini Live API.

Properties:

api_version string nullable

The API version to use. For example, v1beta.

input_modalities array[string] nullable

Default: ["audio"]

Input modalities for the MLLM.

["audio"]: Audio-only input
["audio", "text"]: Accept both audio and text input

output_modalities array[string] nullable

Default: ["audio"]

Output modalities for the MLLM.

["audio"]: Audio-only output
["text", "audio"]: Combined text and audio output
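A small helper can apply the documented default and catch typos in input_modalities and output_modalities before a request is sent. This is a sketch under stated assumptions: `resolve_modalities` is a hypothetical name, and the allowed set contains only the two values that appear on this page. Whether other combinations (such as text only) are accepted is not stated here, so the helper checks membership only.

```python
ALLOWED_MODALITIES = {"audio", "text"}  # only the values shown on this page
DEFAULT_MODALITIES = ["audio"]          # documented default for both fields


def resolve_modalities(value=None):
    """Return the modalities to send, applying the documented default."""
    if value is None:
        return list(DEFAULT_MODALITIES)
    unknown = [m for m in value if m not in ALLOWED_MODALITIES]
    if unknown:
        raise ValueError(f"unsupported modalities: {unknown}")
    return list(value)


input_modalities = resolve_modalities(["audio", "text"])
output_modalities = resolve_modalities()  # falls back to ["audio"]
```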

greeting_message string nullable

The message the agent speaks when a user joins the channel.

failure_message string nullable

The message the agent speaks when an error occurs.

vendor string required

The MLLM provider identifier. Set to "gemini" to use Google Gemini Live with the Gemini Developer API.
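The required fields listed above (api_key, params, params.model, vendor, and role and content on each history item) can be checked locally before sending a request. The following is a minimal sketch; `missing_required` is a hypothetical helper, not part of any SDK, and it checks presence only.

```python
def missing_required(mllm: dict) -> list:
    """Return the names of required mllm fields that are absent or empty."""
    missing = []
    for key in ("api_key", "params", "vendor"):
        if mllm.get(key) in (None, ""):
            missing.append(key)

    # model is required inside params; skip if params itself is missing.
    params = mllm.get("params") or {}
    if "params" not in missing and not params.get("model"):
        missing.append("params.model")

    # messages is nullable, but each item it contains needs role and content.
    for i, msg in enumerate(mllm.get("messages") or []):
        for key in ("role", "content"):
            if not msg.get(key):
                missing.append(f"messages[{i}].{key}")
    return missing
```

An empty list means every field marked required on this page is present; it does not confirm that the values themselves are valid.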

For a comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API documentation.
