Google Gemini Live

Google Gemini Live provides multimodal large language model capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components. This page covers integration using the Gemini Developer API, authenticated with a Gemini API key obtained from Google AI Studio.

Note: Enabling MLLM automatically disables ASR, LLM, and TTS, because the MLLM handles end-to-end voice processing directly. See turn_detection for the turn detection options available with MLLMs.

Enable MLLM

To enable MLLM functionality, set enable_mllm to true under advanced_features.


"advanced_features": {
  "enable_mllm": true
}
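If the request body is assembled in code, the flag can be set programmatically. The following is a minimal Python sketch, not an official client: the helper name `with_mllm_enabled` and the `agent_properties` dictionary are hypothetical, and only `advanced_features` and `enable_mllm` come from this page.

```python
import copy
import json


def with_mllm_enabled(agent_properties: dict) -> dict:
    """Return a copy of the agent properties with the MLLM flag switched on.

    `agent_properties` is a hypothetical container for the agent settings;
    only `advanced_features.enable_mllm` is taken from this page.
    """
    updated = copy.deepcopy(agent_properties)
    # Create the advanced_features object if it is absent, then set the flag.
    updated.setdefault("advanced_features", {})["enable_mllm"] = True
    return updated


if __name__ == "__main__":
    print(json.dumps(with_mllm_enabled({}), indent=2))
```

The helper copies its input instead of mutating it, so the same base settings can be reused for agents that keep the separate ASR, LLM, and TTS pipeline.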

Sample configuration

The following example shows a starting mllm configuration that you can use in the request to start a conversational AI agent.


"mllm": {
  "api_key": "<GOOGLE_GEMINI_API_KEY>",
  "messages": [
    {
      "role": "user",
      "content": "<HISTORY_CONTENT>"
    }
  ],
  "params": {
    "model": "gemini-3.1-flash-live-preview",
    "instructions": "You are a friendly assistant.",
    "voice": "Charon",
    "affective_dialog": false,
    "proactive_audio": false,
    "transcribe_agent": true,
    "transcribe_user": true,
    "http_options": {
      "api_version": "v1beta"
    }
  },
  "input_modalities": [
    "audio"
  ],
  "output_modalities": [
    "audio"
  ],
  "greeting_message": "Hi, how can I assist you today?",
  "failure_message": "Sorry, I encountered an issue. Please try again.",
  "vendor": "gemini"
}
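To keep the API key out of source files, the mllm block can be built at runtime. The sketch below mirrors a subset of the sample configuration; the function name `build_mllm_config` and the environment variable `GEMINI_API_KEY` are assumptions made for illustration and are not defined by this page.

```python
import json
import os


def build_mllm_config(api_key: str,
                      instructions: str = "You are a friendly assistant.") -> dict:
    """Assemble an `mllm` block using the field names and values from the sample."""
    return {
        "api_key": api_key,
        "params": {
            "model": "gemini-3.1-flash-live-preview",
            "instructions": instructions,
            "voice": "Charon",
            "transcribe_agent": True,
            "transcribe_user": True,
            "http_options": {"api_version": "v1beta"},
        },
        "input_modalities": ["audio"],
        "output_modalities": ["audio"],
        "greeting_message": "Hi, how can I assist you today?",
        "failure_message": "Sorry, I encountered an issue. Please try again.",
        "vendor": "gemini",
    }


if __name__ == "__main__":
    # GEMINI_API_KEY is an assumed variable name; fall back to the placeholder.
    key = os.environ.get("GEMINI_API_KEY", "<GOOGLE_GEMINI_API_KEY>")
    config = build_mllm_config(key)
    # Mask the key before printing so that it does not end up in logs.
    printable = {**config, "api_key": "***"}
    print(json.dumps({"mllm": printable}, indent=2))
```

Optional fields such as messages, affective_dialog, and proactive_audio are left out here and can be added to the returned dictionary as needed.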

Key parameters

mllm (required)
  • api_key (string, required)

    The Google Gemini API key used to authenticate requests. You can generate an API key in Google AI Studio.

  • messages (array[object], nullable)

    An array of conversation history items passed to the model as context. Each item represents a single message in the conversation history.

    • role (string, required)

      The role of the message author. For example, user.

    • content (string, required)

      The content of the message.

  • params (object, required)

    Configuration object for the Gemini Live model.

    • model (string, required)

      The Gemini Live model identifier.

    • instructions (string, nullable)

      System instructions that define the agent's behavior or tone.

    • voice (string, nullable)

      The voice identifier for audio output. For example, Aoede, Puck, Charon, Kore, Fenrir, Leda, Orus, or Zephyr.

    • affective_dialog (boolean, nullable)

      Whether to enable affective dialog, which allows the model to adapt its tone based on the user's emotional cues.

    • proactive_audio (boolean, nullable)

      When enabled, the model may choose not to respond if the user's input does not require a reply, such as background speech or incomplete requests.

    • transcribe_agent (boolean, nullable)

      Whether to transcribe the agent's speech in real time.

    • transcribe_user (boolean, nullable)

      Whether to transcribe the user's speech in real time.

    • http_options (object, nullable)

      HTTP request options for the Gemini Live API.

      • api_version (string, nullable)

        The API version to use. For example, v1beta.

  • input_modalities (array[string], nullable)

    Default: ["audio"]

    Input modalities for the MLLM.

    • ["audio"]: Audio-only input
    • ["audio", "text"]: Accept both audio and text input
  • output_modalities (array[string], nullable)

    Default: ["audio"]

    Output modalities for the MLLM.

    • ["audio"]: Audio-only output
    • ["text", "audio"]: Combined text and audio output
  • greeting_message (string, nullable)

    The message the agent speaks when a user joins the channel.

  • failure_message (string, nullable)

    The message the agent speaks when an error occurs.

  • vendor (string, required)

    The MLLM provider identifier. Set to "gemini" to use Google Gemini Live with the Gemini Developer API.
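The required and nullable markers in the parameter list can be checked on the client before a request is sent. The following Python sketch is one possible pre-flight check under stated assumptions: the function names `check_mllm_config` and `with_default_modalities` are hypothetical, and the validation actually performed by the service is not described on this page and may differ.

```python
REQUIRED_TOP_LEVEL = ("api_key", "params", "vendor")
MODALITY_FIELDS = ("input_modalities", "output_modalities")


def check_mllm_config(mllm: dict) -> list[str]:
    """Return a list of problems found in an `mllm` block (empty if none)."""
    problems = []
    # Fields marked as required directly under mllm.
    for field in REQUIRED_TOP_LEVEL:
        if not mllm.get(field):
            problems.append(f"missing required field: {field}")
    # model is the only required field inside params.
    params = mllm.get("params") or {}
    if not params.get("model"):
        problems.append("missing required field: params.model")
    # messages is nullable, but each item needs both role and content.
    for i, message in enumerate(mllm.get("messages") or []):
        for field in ("role", "content"):
            if field not in message:
                problems.append(f"messages[{i}] is missing {field}")
    return problems


def with_default_modalities(mllm: dict) -> dict:
    """Return a copy with the documented ["audio"] default filled in where unset."""
    result = dict(mllm)
    for field in MODALITY_FIELDS:
        if result.get(field) is None:
            result[field] = ["audio"]
    return result


if __name__ == "__main__":
    draft = {"api_key": "<GOOGLE_GEMINI_API_KEY>", "params": {}, "vendor": "gemini"}
    for problem in check_mllm_config(draft):
        print(problem)
```

Returning a list of problems, instead of raising on the first one, lets a caller report every missing field in a single pass.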

For a comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API documentation.