Google Gemini Live
Google Gemini Live provides multimodal large language model (MLLM) capabilities with real-time audio processing. This enables natural voice conversations without separate automatic speech recognition (ASR) and text-to-speech (TTS) components. This page describes how to integrate Gemini Live through the Gemini Developer API. Requests are authenticated with a Gemini API key from Google AI Studio.
Enabling the MLLM automatically disables the ASR, LLM, and TTS modules, because the MLLM handles voice processing end to end. See turn_detection for the turn detection options available with MLLMs.
Enable MLLM
To enable MLLM functionality, set enable_mllm to true under advanced_features.
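As a minimal sketch, the Python snippet below builds only the advanced_features fragment of the request body and prints it as JSON. This page does not describe where the fragment sits within the full start-agent request, so treat its placement as an assumption and merge it into your own request body.

```python
import json

# Only the advanced_features fragment is shown; the rest of the
# start-agent request body is deliberately omitted.
request_fragment = {
    "advanced_features": {
        # Hands voice processing to the MLLM, which disables ASR, LLM, and TTS.
        "enable_mllm": True,
    },
}

print(json.dumps(request_fragment, indent=2))
```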
Sample configuration
The following example shows a basic mllm configuration you can use as a starting point when you Start a conversational AI agent.
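A minimal sketch of such a configuration is shown below as a Python dictionary printed as JSON. It assumes that http_options is nested inside params and that input_modalities, output_modalities, greeting_message, failure_message, and vendor sit directly under mllm, following the order of the parameter reference. MODEL_ID and the GEMINI_API_KEY environment variable are hypothetical placeholders. Replace them with a real Gemini Live model identifier and your own key. The instruction and message strings are illustrative only, and the optional messages history is omitted.

```python
import json
import os

# Hypothetical placeholder: substitute a real Gemini Live model identifier.
MODEL_ID = "<gemini-live-model-id>"

mllm = {
    "vendor": "gemini",  # selects Gemini Live via the Gemini Developer API
    # Hypothetical environment variable; falls back to a placeholder string.
    "api_key": os.environ.get("GEMINI_API_KEY", "<your-gemini-api-key>"),
    "params": {
        "model": MODEL_ID,
        "instructions": "You are a friendly voice assistant. Keep answers short.",
        "voice": "Aoede",
        "affective_dialog": False,
        "proactive_audio": False,
        "transcribe_agent": True,
        "transcribe_user": True,
        # Assumed to be nested under params, per the reference order.
        "http_options": {"api_version": "v1beta"},
    },
    "input_modalities": ["audio"],   # documented default
    "output_modalities": ["audio"],  # documented default
    "greeting_message": "Hi there! How can I help you today?",
    "failure_message": "Sorry, something went wrong. Please try again.",
}

print(json.dumps({"mllm": mllm}, indent=2))
```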
Key parameters
mllm (required)
- api_key string (required)
The Google Gemini API key used to authenticate requests. You can generate an API key in Google AI Studio.
- messages array[object] (nullable)
An array of conversation history items passed to the model as context. Each item represents a single message in the conversation history.
- params object (required)
Configuration object for the Gemini Live model.
- model string (required)
The Gemini Live model identifier.
- instructions string (nullable)
System instructions that define the agent's behavior or tone.
- voice string (nullable)
The voice identifier for audio output. For example, Aoede, Puck, Charon, Kore, Fenrir, Leda, Orus, or Zephyr.
- affective_dialog boolean (nullable)
Whether to enable affective dialog, which allows the model to adapt its tone based on the user's emotional cues.
- proactive_audio boolean (nullable)
When enabled, the model may choose not to respond if the user's input does not require a reply, such as background speech or incomplete requests.
- transcribe_agent boolean (nullable)
Whether to transcribe the agent's speech in real time.
- transcribe_user boolean (nullable)
Whether to transcribe the user's speech in real time.
- http_options object (nullable)
HTTP request options for the Gemini Live API.
- api_version string (nullable)
The API version to use. For example, v1beta.
- input_modalities array[string] (nullable)
Input modalities for the MLLM. Default: ["audio"].
["audio"]: Audio-only input.
["audio", "text"]: Both audio and text input.
- output_modalities array[string] (nullable)
Output modalities for the MLLM. Default: ["audio"].
["audio"]: Audio-only output.
["text", "audio"]: Combined text and audio output.
- greeting_message string (nullable)
The message the agent speaks when a user joins the channel.
- failure_message string (nullable)
The message the agent speaks when an error occurs.
- vendor string (required)
The MLLM provider identifier. Set to "gemini" to use Google Gemini Live with the Gemini Developer API.
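The required fields and modality combinations listed above can be checked on the client before a request is sent. The following Python function, check_gemini_mllm, is a hypothetical helper and is not part of any Agora or Google SDK. It checks only the rules stated on this page and treats omitted or null modalities as the documented default ["audio"]. Modality lists are matched exactly as documented. Because this page does not say whether element order matters, a reordered list is reported as unsupported.

```python
from typing import Any

# Combinations copied verbatim from the input_modalities / output_modalities entries.
ALLOWED_INPUT_MODALITIES = (["audio"], ["audio", "text"])
ALLOWED_OUTPUT_MODALITIES = (["audio"], ["text", "audio"])


def check_gemini_mllm(mllm: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were detected."""
    problems: list[str] = []

    # Top-level fields marked as required.
    for key in ("api_key", "params", "vendor"):
        if key not in mllm:
            problems.append(f"missing required field: {key}")

    if "vendor" in mllm and mllm["vendor"] != "gemini":
        problems.append('vendor must be "gemini" for the Gemini Developer API')

    # params.model is the only field marked as required inside params.
    params = mllm.get("params") or {}
    if "model" not in params:
        problems.append("missing required field: params.model")

    # Omitted or null modalities fall back to the documented default ["audio"].
    input_mods = mllm.get("input_modalities") or ["audio"]
    if input_mods not in ALLOWED_INPUT_MODALITIES:
        problems.append(f"unsupported input_modalities: {input_mods}")

    output_mods = mllm.get("output_modalities") or ["audio"]
    if output_mods not in ALLOWED_OUTPUT_MODALITIES:
        problems.append(f"unsupported output_modalities: {output_mods}")

    return problems


# Example: params is present but lacks a model identifier.
print(check_gemini_mllm({"vendor": "gemini", "api_key": "k", "params": {}}))
# prints ['missing required field: params.model']
```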
For comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API.