<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Evan Lin</title>
    <description>The latest articles on DEV Community by Evan Lin (@evanlin).</description>
    <link>https://dev.to/evanlin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F409957%2Fc150d4a7-cb20-469d-a230-bac27232c577.jpeg</url>
      <title>DEV Community: Evan Lin</title>
      <link>https://dev.to/evanlin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/evanlin"/>
    <language>en</language>
    <item>
      <title>[Gemini] Building a LINE E-commerce Chatbot That Can "Tell Stories from Images"</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:08:30 +0000</pubDate>
      <link>https://dev.to/evanlin/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-41i0</link>
      <guid>https://dev.to/evanlin/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-41i0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" alt="image-20260225234804185" width="800" height="860"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" alt="image-20260225234701217" width="800" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling?hl=zh-tw#multimodal" rel="noopener noreferrer"&gt;Gemini API - Function Calling with Multimodal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub: linebot-gemini-multimodel-funcal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#mm-fr" rel="noopener noreferrer"&gt;Vertex AI - Multimodal Function Response&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Complete code &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;I believe many people have used the combination of LINE Bot + Function Calling. When a user asks "What clothes did I buy last month?", the Bot calls the database query function, retrieves the order data, and then Gemini answers based on that JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Traditional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;designed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;developers:&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Help me see the jacket I bought before"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Bot:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;Call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_order_history()&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;returns:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"product_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Brown pilot jacket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Gemini:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You bought a brown pilot jacket on January 15th for NT$1,890."&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer is completely correct, but something is missing: the user is asking about "that jacket," while Gemini is merely restating the text in the JSON, with no way to "confirm" what the jacket actually looks like. If the database happens to contain three jackets, the AI cannot even tell which one the user remembers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI can read text, but it cannot see pictures&lt;/strong&gt;: this limitation has always been a blind spot in the traditional Function Calling architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkluxi9r5zkhj1vys100.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkluxi9r5zkhj1vys100.png" alt="Google Chrome 2026-02-26 10.34.51" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3gpn1tkbj80ifh65vsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3gpn1tkbj80ifh65vsr.png" alt="Google Chrome 2026-02-26 10.34.58" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This problem was truly solved only after Gemini introduced &lt;strong&gt;Multimodal Function Response&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Multimodal Function Response?
&lt;/h2&gt;

&lt;p&gt;The traditional Function Calling process is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON] → Gemini → [Text answer]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multimodal Function Response&lt;/strong&gt; changes that middle step. The function can not only return JSON, but also include images (JPEG/PNG/WebP) or documents (PDF) in the same response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" alt="Google Chrome 2026-02-25 23.04.28" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON + image bytes] → Gemini → [Text answer that has seen the image]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Gemini generates the next round of answers, it can "see" both the structured data and the image returned by the function, thereby generating richer and more accurate responses.&lt;/p&gt;

&lt;p&gt;The officially supported media formats are currently:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Supported formats&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;image/jpeg&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;, &lt;code&gt;image/webp&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;application/pdf&lt;/code&gt;, &lt;code&gt;text/plain&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
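&lt;p&gt;Since the function response must declare a &lt;code&gt;mime_type&lt;/code&gt; drawn from this table, a small guard helps catch unsupported files early. A minimal sketch (the helper name and mapping are my own, not from the repo):&lt;/p&gt;

```python
from pathlib import Path

# Formats accepted by Multimodal Function Response, per the table above.
SUPPORTED_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}

def mime_type_for(path: str) -> str:
    """Return the MIME type to declare for a file, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported media type for function response: {ext!r}")
    return SUPPORTED_MIME_TYPES[ext]
```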

&lt;p&gt;The application scenarios for this feature are very broad: e-commerce customer service (identifying product images), medical consultation (analyzing PDF test reports), design review (giving feedback on screenshots)... almost any scenario where a function needs to return visual data for the AI to analyze is a fit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goal
&lt;/h2&gt;

&lt;p&gt;This time, I used Multimodal Function Response to build a &lt;strong&gt;LINE e-commerce customer service chatbot&lt;/strong&gt; that demonstrates the following scenario:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Help me see the jacket I bought before"&lt;br&gt;
Bot (traditional): "You bought a brown pilot jacket."&lt;br&gt;
Bot (Multimodal): "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon, with metal zipper pockets on the sides. This is your January 15th order ORD-2026-0115, for a total of NT$1,890, and it has been delivered." + &lt;strong&gt;Product photo&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference is obvious: Gemini really "saw" the jacket, rather than just restating the text in the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not use Google ADK?
&lt;/h3&gt;

&lt;p&gt;Originally, this repo used Google ADK (Agent Development Kit) to manage the Agent. ADK's &lt;code&gt;Runner&lt;/code&gt; and &lt;code&gt;Agent&lt;/code&gt; encapsulate the entire Function Calling flow, which is very convenient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But Multimodal Function Response requires manually placing the image bytes into the &lt;code&gt;parts&lt;/code&gt; of the function response, and ADK fully encapsulates that layer, leaving no way to intervene.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So this time, I used &lt;code&gt;google.genai.Client&lt;/code&gt; directly and implemented the function-calling loop myself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old architecture (ADK)
&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="c1"&gt;# ADK handles all function calls for you, but you can't control the response content
&lt;/span&gt;
&lt;span class="c1"&gt;# New architecture (directly use google.genai)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Handle function calls yourself, include images yourself
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overall architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User
    │
    ▼ POST /
FastAPI Webhook Handler
    │
    ▼
EcommerceAgent.process_message(text, line_user_id)
    │
    ├─ ① Call Gemini (with conversation history)
    │
    ├─ ② Gemini decides to call a tool → function_call
    │
    ├─ ③ _execute_tool()
    │ ├─ Execute query function (search_products / get_order_history / get_product_details)
    │ └─ Read real product photos in the img/ directory (Unsplash JPEG)
    │
    ├─ ④ Construct Multimodal Function Response
    │ └─ FunctionResponsePart(inline_data=FunctionResponseBlob(data=image_bytes))
    │
    ├─ ⑤ Call Gemini again (Gemini sees the image + data)
    │
    └─ ⑥ Return (ai_text, image_bytes)
    │
    ▼
LINE Reply:
  TextSendMessage(text=ai_text)
  ImageSendMessage(url=BOT_HOST_URL/images/{uuid}) ← FastAPI /images endpoint provides

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to get product images?
&lt;/h3&gt;

&lt;p&gt;This demo uses real &lt;strong&gt;clothing photos from Unsplash&lt;/strong&gt;. Each of the five products corresponds to an actual photo of the item, stored in the &lt;code&gt;img/&lt;/code&gt; directory. The reading logic is very simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_product_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read the product image and return JPEG bytes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
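&lt;p&gt;One optional hardening step (my addition, not in the repo): before declaring the bytes as &lt;code&gt;image/jpeg&lt;/code&gt; in the function response, verify that the file really is a JPEG by checking its magic bytes:&lt;/p&gt;

```python
def is_jpeg(data: bytes) -> bool:
    """A JPEG stream starts with the SOI marker FF D8 and ends with EOI FF D9."""
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"
```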



&lt;p&gt;Each product in &lt;code&gt;PRODUCTS_DB&lt;/code&gt; has an &lt;code&gt;image_path&lt;/code&gt; field pointing to the corresponding image file:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product ID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P001&lt;/td&gt;
&lt;td&gt;Brown pilot jacket&lt;/td&gt;
&lt;td&gt;tobias-tullius-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P002&lt;/td&gt;
&lt;td&gt;White cotton university T&lt;/td&gt;
&lt;td&gt;mediamodifier-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P003&lt;/td&gt;
&lt;td&gt;Dark blue denim jacket&lt;/td&gt;
&lt;td&gt;caio-coelho-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P004&lt;/td&gt;
&lt;td&gt;Beige knitted shawl&lt;/td&gt;
&lt;td&gt;milada-vigerova-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P005&lt;/td&gt;
&lt;td&gt;Light blue simple T-shirt&lt;/td&gt;
&lt;td&gt;cristofer-maximilian-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The image bytes that are read serve two purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Included as a &lt;code&gt;FunctionResponseBlob&lt;/code&gt; for Gemini to analyze: real photos let Gemini describe the actual fabric texture and tailoring details&lt;/li&gt;
&lt;li&gt; Cached in the &lt;code&gt;image_cache&lt;/code&gt; dict and served to the LINE Bot for display via the FastAPI &lt;code&gt;/images/{uuid}&lt;/code&gt; endpoint&lt;/li&gt;
&lt;/ol&gt;
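&lt;p&gt;The second use can be sketched in a few lines. This is an assumption about how the cache works, not the repo's exact code: the bytes go into an in-memory dict keyed by a fresh UUID, and the FastAPI route resolves that UUID back to the bytes:&lt;/p&gt;

```python
import uuid

# Hypothetical in-memory store mirroring the image_cache dict described above.
image_cache: dict[str, bytes] = {}
BOT_HOST_URL = "https://bot.example.com"  # placeholder for the real host

def cache_image(image_bytes: bytes) -> str:
    """Store the bytes under a fresh UUID and return the public image URL."""
    image_id = uuid.uuid4().hex
    image_cache[image_id] = image_bytes
    return f"{BOT_HOST_URL}/images/{image_id}"
```

&lt;p&gt;A FastAPI &lt;code&gt;GET /images/{image_id}&lt;/code&gt; route serving this cache would simply look the ID up and return &lt;code&gt;Response(content=image_cache[image_id], media_type="image/jpeg")&lt;/code&gt;.&lt;/p&gt;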




&lt;h2&gt;
  
  
  Core Code Walkthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define tools (FunctionDeclaration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query the current user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s order history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time range: all / last_month / last_3_months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;enum&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_3_months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ... search_products, get_product_details
&lt;/span&gt;    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Function call cycle (up to 5 iterations)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_iteration&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Up to 5 times, to prevent infinite loops
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_SYSTEM_INSTRUCTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;model_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Find all function_call parts
&lt;/span&gt;        &lt;span class="n"&gt;fc_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;fc_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# No function call → final text response
&lt;/span&gt;            &lt;span class="n"&gt;final_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# Has function call → execute tool, include image
&lt;/span&gt;        &lt;span class="n"&gt;tool_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fc_part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fc_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tool_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_multimodal_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_parts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Construct Multimodal Function Response (the most critical step)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_multimodal_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;multimodal_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ⚠️ Note: Use FunctionResponseBlob here, not types.Blob!
&lt;/span&gt;        &lt;span class="n"&gt;multimodal_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# raw bytes, SDK handles base64 internally
&lt;/span&gt;                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;func_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Structured JSON data
&lt;/span&gt;        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;multimodal_parts&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Image is here! Gemini can "see" it after receiving it
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini will receive both &lt;code&gt;result_dict&lt;/code&gt; (order JSON) and &lt;code&gt;image_bytes&lt;/code&gt; (product image) in the next &lt;code&gt;generate_content&lt;/code&gt; call, and the generated answer can therefore describe the visual content of the image.&lt;/p&gt;
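&lt;p&gt;For reference, here is a schematic of the conversation history at this point, sketched with plain dicts instead of the SDK's &lt;code&gt;types.Content&lt;/code&gt;/&lt;code&gt;types.Part&lt;/code&gt; objects (the function and argument names are just the ones used in this article); the roles and their ordering are what matter:&lt;/p&gt;

```python
# Schematic only: plain dicts stand in for types.Content / types.Part.
# The real payload uses the SDK objects, but the ordering mirrors the
# user -> model(function_call) -> tool(function_response) turns.

def build_history(user_text, func_name, args, result_dict, has_image):
    """Assemble the three-turn history sent to the second generate_content call."""
    function_response = {"name": func_name, "response": result_dict}
    if has_image:
        # In the real code this is a FunctionResponsePart carrying a
        # FunctionResponseBlob with the raw image bytes.
        function_response["parts"] = ["<image bytes>"]
    return [
        {"role": "user", "parts": [user_text]},
        {"role": "model", "parts": [{"function_call": {"name": func_name, "args": args}}]},
        {"role": "tool", "parts": [function_response]},
    ]

history = build_history(
    "Help me see the jacket I bought before",
    "get_order_history", {"time_range": "all"},
    {"orders": ["..."], "order_count": 2},
    has_image=True,
)
```

&lt;p&gt;The third turn is the one Step 3 constructs; everything before it is already in &lt;code&gt;contents&lt;/code&gt;.&lt;/p&gt;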

&lt;h3&gt;
  
  
  Step 4: LINE Bot returns text + image in one reply
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py
&lt;/span&gt;
&lt;span class="n"&gt;ai_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ecommerce_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reply_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ai_text&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;image_cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="c1"&gt;# Temporary storage
&lt;/span&gt;    &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BOT_HOST_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/images/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# FastAPI provides service
&lt;/span&gt;    &lt;span class="n"&gt;reply_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ImageSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;original_content_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;preview_image_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_line_bot_api&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LINE Bot's &lt;code&gt;reply_message&lt;/code&gt; supports returning multiple messages at once (up to 5), so text and images can be sent simultaneously.&lt;/p&gt;
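&lt;p&gt;Once you start appending images (and possibly splitting long answers), the limits are easy to hit. Here is a small helper as a sketch; the 5,000-character text limit and the dict-shaped messages are my assumptions for illustration, not code from the repo:&lt;/p&gt;

```python
TEXT_LIMIT = 5000   # assumed cap on characters in a single LINE text message
REPLY_LIMIT = 5     # reply_message accepts at most 5 message objects

def build_reply(ai_text, image_url=None):
    """Split long text into chunks and append an optional image message,
    staying within LINE's per-reply limits. Plain dicts stand in for
    TextSendMessage / ImageSendMessage objects."""
    messages = [
        {"type": "text", "text": ai_text[i:i + TEXT_LIMIT]}
        for i in range(0, len(ai_text), TEXT_LIMIT)
    ]
    if image_url:
        messages.append({
            "type": "image",
            "original_content_url": image_url,
            "preview_image_url": image_url,
        })
    return messages[:REPLY_LIMIT]  # overflow beyond 5 messages is dropped
```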




&lt;h2&gt;
  
  
  Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;FunctionResponseBlob&lt;/code&gt; is not &lt;code&gt;Blob&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The most common pitfall: when constructing the multimodal image part, &lt;strong&gt;you cannot use &lt;code&gt;types.Blob&lt;/code&gt;&lt;/strong&gt;; you must use &lt;code&gt;types.FunctionResponseBlob&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error (will TypeError)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although both expose &lt;code&gt;mime_type&lt;/code&gt; and &lt;code&gt;data&lt;/code&gt; fields, &lt;code&gt;FunctionResponsePart&lt;/code&gt; declares its &lt;code&gt;inline_data&lt;/code&gt; field as &lt;code&gt;FunctionResponseBlob&lt;/code&gt;, so Pydantic validation rejects a plain &lt;code&gt;Blob&lt;/code&gt; outright. You can confirm this with &lt;code&gt;python -c "from google.genai import types; print(types.FunctionResponsePart.model_fields)"&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;aiohttp.ClientSession&lt;/code&gt; cannot be created at the module level
&lt;/h3&gt;

&lt;p&gt;The original code directly created &lt;code&gt;aiohttp.ClientSession()&lt;/code&gt; at the module level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Old method: module level
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Will warn or error if there is no running event loop
&lt;/span&gt;&lt;span class="n"&gt;async_http_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiohttpAsyncHttpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Importing &lt;code&gt;main.py&lt;/code&gt; under pytest raises &lt;code&gt;RuntimeError: no running event loop&lt;/code&gt;, because no event loop is running at import time. The fix is lazy initialization: create the session only when it is first needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ New method: lazy init
&lt;/span&gt;&lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_line_bot_api&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Called within the async route handler, guaranteeing an event loop
&lt;/span&gt;        &lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncLineBotApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;channel_access_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;AiohttpAsyncHttpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
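&lt;p&gt;One caveat with lazy init: the session created inside the handler is never closed. The same pattern with explicit cleanup can be sketched using only the standard library (a dict stands in for &lt;code&gt;aiohttp.ClientSession&lt;/code&gt;; in FastAPI you would call the cleanup from a shutdown/lifespan hook):&lt;/p&gt;

```python
import asyncio

_resource = None  # module-level handle, but created lazily

def get_resource():
    """Lazily create the shared resource; must be called inside a running loop."""
    global _resource
    if _resource is None:
        # Raises RuntimeError outside a running loop -- the exact failure
        # mode that module-level ClientSession() creation hits under pytest.
        asyncio.get_running_loop()
        _resource = {"closed": False}  # stands in for aiohttp.ClientSession()
    return _resource

async def close_resources():
    """Call from the app's shutdown/lifespan hook to release the session."""
    global _resource
    if _resource is not None:
        _resource["closed"] = True     # stands in for: await session.close()
        _resource = None

async def main():
    first = get_resource()
    assert first is get_resource()     # same instance within the loop
    await close_resources()

asyncio.run(main())
```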



&lt;h3&gt;
  
  
  ❌ Pitfall 3: LINE Bot needs HTTPS URL to send images
&lt;/h3&gt;

&lt;p&gt;Gemini receives raw bytes, but LINE Bot's &lt;code&gt;ImageSendMessage&lt;/code&gt; requires a &lt;strong&gt;publicly accessible HTTPS URL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The solution is to add an &lt;code&gt;/images/{image_id}&lt;/code&gt; endpoint in FastAPI and stash the image bytes in an &lt;code&gt;image_cache&lt;/code&gt; dict; LINE then fetches the image through this endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/images/{image_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;serve_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Image not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For local development, expose port 8000 with &lt;code&gt;ngrok&lt;/code&gt;; after deploying to Cloud Run, the service URL works directly.&lt;/p&gt;
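&lt;p&gt;One thing to watch: a plain &lt;code&gt;image_cache&lt;/code&gt; dict grows without bound as replies accumulate. A sketch of a minimal TTL cache (my own addition, not in the original repo) that lazily evicts stale entries:&lt;/p&gt;

```python
import time

class TTLImageCache:
    """Dict-like store that forgets entries after ttl_seconds.
    Illustrative replacement for the unbounded image_cache dict."""

    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self._store = {}  # image_id -> (expires_at, image_bytes)

    def __setitem__(self, image_id, image_bytes):
        self._store[image_id] = (time.monotonic() + self.ttl, image_bytes)

    def get(self, image_id):
        entry = self._store.get(image_id)
        if entry is None:
            return None
        expires_at, image_bytes = entry
        if time.monotonic() > expires_at:
            del self._store[image_id]  # lazy eviction on read
            return None
        return image_bytes
```

&lt;p&gt;&lt;code&gt;serve_image&lt;/code&gt; only needs &lt;code&gt;__setitem__&lt;/code&gt; and &lt;code&gt;get&lt;/code&gt;, so this can swap in without touching the endpoint.&lt;/p&gt;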




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mock database (default data for demo)
&lt;/h3&gt;

&lt;p&gt;The system has 5 built-in products (all with real Unsplash photos), and each LINE user is automatically bound to two demo orders the first time they query their orders:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Order number&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0115&lt;/td&gt;
&lt;td&gt;2026-01-15&lt;/td&gt;
&lt;td&gt;P001 Brown pilot jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0108&lt;/td&gt;
&lt;td&gt;2026-01-08&lt;/td&gt;
&lt;td&gt;P003 Dark blue denim jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
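&lt;p&gt;The first-query auto-binding can be sketched like this (the field names and function shape are my guesses for illustration, not copied from the repo):&lt;/p&gt;

```python
# Illustrative mock "database": field names are guesses, not the repo's schema.
DEMO_ORDERS = [
    {"order_id": "ORD-2026-0115", "date": "2026-01-15", "product_id": "P001"},
    {"order_id": "ORD-2026-0108", "date": "2026-01-08", "product_id": "P003"},
]

_user_orders = {}  # line_user_id -> list of that user's orders

def get_order_history(line_user_id):
    """Bind the two demo orders to a user on their first query, then return them."""
    if line_user_id not in _user_orders:
        # copy so later per-user mutations (e.g. status updates) stay isolated
        _user_orders[line_user_id] = [dict(o) for o in DEMO_ORDERS]
    orders = _user_orders[line_user_id]
    return {"orders": orders, "order_count": len(orders)}
```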

&lt;h3&gt;
  
  
  Scenario 1: "Help me see the jacket I bought before"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: "Help me see the jacket I bought before"

[Gemini → function_call]
  get_order_history(time_range="all")

[_execute_tool execution]
  - get_order_history() returns two orders (P001, P003)
  - Read img/tobias-tullius-...-unsplash.jpg → Brown pilot jacket real photo bytes

[Multimodal Function Response]
  Part.from_function_response(
    name="get_order_history",
    response={"orders": [...], "order_count": 2},
    parts=[FunctionResponsePart(inline_data=FunctionResponseBlob(data=&amp;lt;photo&amp;gt;))]
  )

[Gemini responds after seeing the real photo]
  "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon with
   a glossy feel, and a metal zipper pocket on the left sleeve. This is your January 15, 2026
   order ORD-2026-0115, for a total of NT$1,890, status: delivered."

LINE displays: [Text] + [Brown pilot jacket real photo]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: "Are there any dark blue jackets?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  search_products(description="dark blue jacket", color="dark blue")

[Gemini sees the real photo of the P003 dark blue denim jacket]
  "Yes! This dark blue denim jacket (P003) in the photo features a retro stitching design,
   a lapel with metal buttons, and a very complete garment feel, priced at NT$1,490, with 8 in stock."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: "What are the features of the P004 knitted shawl?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  get_product_details(product_id="P004")

[Gemini sees the real photo of the beige knitted shawl]
  "The photo shows a beige handmade crochet shawl, with a V-neck design and tassels at the bottom,
   you can see the light lace-like mesh weave, elegant texture, priced at NT$1,290."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Traditional Function Response vs Multimodal Function Response
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Function return&lt;/td&gt;
&lt;td&gt;Pure JSON&lt;/td&gt;
&lt;td&gt;JSON + image/PDF bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini perception&lt;/td&gt;
&lt;td&gt;Text data&lt;/td&gt;
&lt;td&gt;Text + visual content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer quality&lt;/td&gt;
&lt;td&gt;"You bought a brown pilot jacket"&lt;/td&gt;
&lt;td&gt;"You can see the nylon texture in the photo, with a zipper pocket on the left sleeve..."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API difference&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response, parts=[...])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Applicable scenarios&lt;/td&gt;
&lt;td&gt;Pure text data queries&lt;/td&gt;
&lt;td&gt;Scenarios that require visual recognition/confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation gave me a new understanding of Gemini's Function Calling capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Multimodal Function Response truly solves&lt;/strong&gt; is letting an AI agent carry visual information inside the very act of "calling an external system", instead of looking up text first and uploading images in a separate step. This will be an important foundational capability in visually driven domains such as e-commerce, medicine, and design.&lt;/p&gt;

&lt;p&gt;However, there are still a few limitations worth noting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image URLs cannot be used directly&lt;/strong&gt;: Gemini's &lt;code&gt;FunctionResponseBlob&lt;/code&gt; requires raw bytes; you cannot pass a URL (unlike images supplied directly in the prompt). If the image lives at a URL, download it to bytes first, e.g. with &lt;code&gt;requests.get()&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;display_name&lt;/code&gt; is optional&lt;/strong&gt;: The official documentation examples include &lt;code&gt;display_name&lt;/code&gt; and a &lt;code&gt;$ref&lt;/code&gt; JSON reference, but in my testing with google-genai 1.49.0 everything works without &lt;code&gt;display_name&lt;/code&gt;, and Gemini can still see and analyze the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model support&lt;/strong&gt;: Officially only the Gemini 3 series is marked as supported, but in my testing &lt;code&gt;gemini-2.0-flash&lt;/code&gt; also handles it fine, and the API structure is identical.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
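&lt;p&gt;For the first limitation, downloading to bytes can be done with the standard library alone. A sketch (using &lt;code&gt;urllib&lt;/code&gt; instead of &lt;code&gt;requests&lt;/code&gt;) that also sniffs the MIME type from magic bytes, so the &lt;code&gt;mime_type&lt;/code&gt; handed to &lt;code&gt;FunctionResponseBlob&lt;/code&gt; matches the actual data:&lt;/p&gt;

```python
from urllib.request import urlopen

def sniff_mime(data):
    """Guess an image MIME type from magic bytes (JPEG / PNG / WebP only)."""
    if data[:3] == b"\xff\xd8\xff":
        return "image/jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "image/png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"

def image_bytes_from_url(url, timeout=10.0):
    """Download an image URL to the raw bytes FunctionResponseBlob requires."""
    with urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    return data, sniff_mime(data)
```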

&lt;p&gt;There are many directions to extend this: let users send their own product photos for the Bot to compare, include PDF catalogs in the function response for Gemini to read directly, or, in medical scenarios, let the Bot analyze report images converted from DICOM. As long as an external system can supply visual data, Multimodal Function Response can make the AI's answers more grounded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The takeaway from this LINE Bot build fits in one sentence: &lt;strong&gt;let the function response carry the image, and Gemini's answer is upgraded from "restating data" to "telling a story based on the picture"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core API is just these few lines, but it takes a lot of details to get the whole process working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The complete way for Gemini to see the image returned by the function
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...]},&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;# ← Not types.Blob!
&lt;/span&gt;                &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
    </item>
    <item>
      <title>Gemini Tool Combo: Building a LINE Meetup Helper with Maps Grounding and Places API in a Single API Call</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:07:59 +0000</pubDate>
      <link>https://dev.to/gde/gemini-tool-combo-building-a-line-meetup-helper-with-maps-grounding-and-places-api-in-a-single-api-3ppd</link>
      <guid>https://dev.to/gde/gemini-tool-combo-building-a-line-meetup-helper-with-maps-grounding-and-places-api-in-a-single-api-3ppd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ljj7q6yd4dju6v6uxg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ljj7q6yd4dju6v6uxg2.png" alt="image-20260327164715459" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;Gemini API tooling updates: context circulation, tool combos and Maps grounding for Gemini 3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://developers.google.com/maps/documentation/places/web-service/nearby-search" rel="noopener noreferrer"&gt;Google Places API (New) - searchNearby&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub: linebot-spot-finder&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Complete code &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (Meeting Helper LINE Bot Spot Finder)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;The combination of LINE Bot + Gemini is already very common. Whether it's using Google Search Grounding to let the model look up real-time information or using Function Calling to let the model call custom logic, they are both mature when used alone.&lt;/p&gt;

&lt;p&gt;But what if you want to achieve both "map location context" and "query real ratings" &lt;strong&gt;in the same question&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Taking restaurant search as an example, the traditional approach usually looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Help me find a hot pot restaurant nearby with a rating of 4 stars or above"

Solution A (using only Maps Grounding):
Gemini has map context, but the rating information is described by AI itself, and accuracy is not guaranteed.

Solution B (using only Places API):
You can get real ratings, but there is no map context, and Gemini doesn't know where the user is.

To have both, you usually need to make two API calls, or manually connect them yourself.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Letting AI both search maps and call external APIs within a single call&lt;/strong&gt; has always been an awkward gap in the old Gemini API architecture.&lt;/p&gt;

&lt;p&gt;That changed on March 17, 2026, when Google released the &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;Gemini API Tooling Updates&lt;/a&gt; (by Mariano Cocirio), which give this problem an official solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are Tool Combinations?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl69w6em7cc4jzvdxmdiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl69w6em7cc4jzvdxmdiu.png" alt="image-20260327163136077" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google announced three core features in this &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;update&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tool Combinations.&lt;/strong&gt; Developers can now attach built-in tools (such as Google Search and Google Maps) and custom Function Declarations in a &lt;strong&gt;single Gemini API call&lt;/strong&gt;. The model decides which tool to call and when, then integrates the results into the final answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Maps Grounding.&lt;/strong&gt; Gemini can now perceive map data directly: not just a text description of a "location", but genuine spatial context, knowing where the user is and what is nearby.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Context Circulation.&lt;/strong&gt; Context flows naturally across multi-turn tool calls, so the model fully remembers the results of the first tool call when making the second.&lt;/p&gt;

&lt;p&gt;The key to this change is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old approach (two tools cannot coexist)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleSearch&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MY_FN&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# New approach (the same Tool object, both coexist)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MY_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A one-line change opens up an entirely new way of combining tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goal
&lt;/h2&gt;

&lt;p&gt;This time, I used Tool Combinations to upgrade the existing &lt;strong&gt;linebot-spot-finder&lt;/strong&gt; from "Maps Grounding only, with rough answers" to "Google Maps context + real Places API data":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After the user sends their GPS location, they enter: "Please find a hot pot restaurant with a rating of 4 stars or above, suitable for group dining, and list the name, address, and review summary."&lt;/p&gt;

&lt;p&gt;Bot (old Maps Grounding): "There are several hot pot restaurants nearby, and the ratings are good." (AI describes it itself, which may not be accurate)&lt;/p&gt;

&lt;p&gt;Bot (new Tool Combo): "Lao Wang Hot Pot | 100 Shimin Avenue, Xinyi District, Taipei City | Rating 4.6 (312) | Reviews: Large portions, great value for money, suitable for group dining; efficient service, fast serving."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference is: Gemini now receives both the map context (where you are) and the &lt;strong&gt;real structured data&lt;/strong&gt; (rating numbers, review text) from the Places API, so the answer changes from a "vague description" to "concrete, verifiable information".&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overall Message Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User sends GPS location
    │
    ▼
handle_location() → session.metadata stores lat/lng
    │
    └──► Returns Quick Reply (restaurant / gas station / parking lot)

LINE User sends text question (e.g. "Find a hot pot restaurant with a rating of 4 stars or above")
    │
    ▼
handle_text()
    │
    ├── session has lat/lng?
    │ Yes → tool_combo_search(query, lat, lng) ← Focus of this article
    │ No → fallback: Gemini Chat + Google Search
    │
    └──► Returns natural language answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
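&lt;p&gt;The dispatch in the flow above can be sketched in plain Python. This is a minimal sketch only: &lt;code&gt;session&lt;/code&gt; is a bare dict, and &lt;code&gt;tool_combo_search&lt;/code&gt; / &lt;code&gt;fallback_search&lt;/code&gt; are stubs standing in for the real handlers:&lt;/p&gt;

```python
# Minimal sketch of handle_text(): session metadata gates the tool-combo path.
# "session", "tool_combo_search", and "fallback_search" are simplified stand-ins
# for the real LINE bot handlers, not the actual implementation.

def tool_combo_search(query: str, lat: float, lng: float) -> str:
    return f"tool_combo:{query}@{lat},{lng}"

def fallback_search(query: str) -> str:
    return f"fallback:{query}"

def handle_text(query: str, session: dict) -> str:
    meta = session.get("metadata", {})
    lat, lng = meta.get("lat"), meta.get("lng")
    if lat is not None and lng is not None:
        # GPS known: Maps grounding + Places function calling
        return tool_combo_search(query, lat, lng)
    # No location yet: plain Gemini chat with Google Search grounding
    return fallback_search(query)
```

&lt;p&gt;The point of the branch is that the tool-combo path only makes sense once &lt;code&gt;handle_location()&lt;/code&gt; has stored real GPS coordinates; everything else falls back to search grounding.&lt;/p&gt;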



&lt;h3&gt;
  
  
  Tool Combo Agentic Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool_combo_search(query, lat, lng)
         │
         ▼
  Step 1: generate_content()
  tools = [google_maps + search_nearby_restaurants]
         │
         ▼
  response.candidates[0].content.parts has function_call?
       ╱ ╲
      Yes   No
      │     │
      ▼     ▼
  _execute_function()  Directly returns response.text
  → _call_places_api()
    (Places API searchNearby)
    Returns rating, address, reviews
      │
      ▼
  Collect into a single Content(role="user")
  Add to history
      │
      ▼
  Step 3: generate_content(contents=history)
  Gemini integrates map context + Places data
      │
      ▼
  Returns final.text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
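&lt;p&gt;The loop in the diagram boils down to a two-call pattern. Here is a stripped-down sketch with a stub model in place of the Gemini client; the dict shapes and &lt;code&gt;FakeModel&lt;/code&gt; are illustrative only (the real code uses &lt;code&gt;google.genai&lt;/code&gt; types such as &lt;code&gt;Content&lt;/code&gt; and &lt;code&gt;Part&lt;/code&gt;):&lt;/p&gt;

```python
# Skeleton of the agentic loop from the diagram, using a stub model in place of
# the real Gemini client. FakeModel and the dict shapes are illustrative only.

def execute_function(name: str, args: dict) -> dict:
    # Stand-in for the real dispatcher that calls the Places API.
    if name == "search_nearby_restaurants":
        return {"restaurants": [{"name": "Demo Hot Pot", "rating": 4.6}]}
    return {}

def agentic_loop(model, query: str) -> str:
    history = [{"role": "user", "text": query}]
    first = model.generate(history)                 # Step 1: call with tools attached
    call = first.get("function_call")
    if call is None:                                # no tool requested: answer directly
        return first["text"]
    result = execute_function(call["name"], call["args"])  # Step 2: run the tool
    history.append({"role": "model", "function_call": call})
    history.append({"role": "user", "function_response": result})
    return model.generate(history)["text"]          # Step 3: model integrates tool output

class FakeModel:
    """Requests a tool on the first call, answers from its result on the second."""
    def __init__(self):
        self.calls = 0

    def generate(self, history):
        self.calls += 1
        if self.calls == 1:
            return {"function_call": {"name": "search_nearby_restaurants", "args": {}}}
        data = history[-1]["function_response"]
        return {"text": "Found " + data["restaurants"][0]["name"]}
```

&lt;p&gt;The essential invariant: the second &lt;code&gt;generate&lt;/code&gt; call sees the full history, including the tool result, so the final answer is grounded in real data rather than the model's guess.&lt;/p&gt;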



&lt;h3&gt;
  
  
  Why not put lat/lng in Function Declaration?
&lt;/h3&gt;

&lt;p&gt;This is an important design decision.&lt;/p&gt;

&lt;p&gt;If you add &lt;code&gt;lat&lt;/code&gt;/&lt;code&gt;lng&lt;/code&gt; to the parameters of &lt;code&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/code&gt;, Gemini will fill in the coordinates itself—but it fills in the "approximate location" inferred from the conversation, not the user's actual GPS coordinates, and the error can be as high as several kilometers.&lt;/p&gt;

&lt;p&gt;The correct approach is to let the Python dispatcher extract the precise coordinates from &lt;code&gt;session.metadata&lt;/code&gt; and &lt;strong&gt;inject&lt;/strong&gt; them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_execute_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_nearby_restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_call_places_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Inject from session, don't let Gemini guess
&lt;/span&gt;            &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Core Code Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define Function Declaration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_nearby_restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search for nearby restaurants using Google Places API, and return the rating, address, and user reviews.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lat/lng is automatically included by the system and does not need to be provided.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Restaurant type or keyword, such as: hot pot, hot pot, Italian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Minimum rating threshold (1–5), default 4.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;radius_m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search radius (meters), default 1000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The description clearly tells the model "lat/lng is included by the system", avoiding the model filling in the coordinates itself in the args.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Places API Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="n"&gt;PLACES_API_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://places.googleapis.com/v1/places:searchNearby&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;PLACES_FIELD_MASK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.displayName,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.rating,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.userRatingCount,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.formattedAddress,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_call_places_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;radius_m&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;includedTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResultCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;locationRestriction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;circle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;center&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;longitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;radiusMeters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;radius_m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;PLACES_API_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-Api-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_MAPS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-FieldMask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PLACES_FIELD_MASK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;restaurants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;reviews&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;displayName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formattedAddress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userRatingCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
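&lt;p&gt;The extraction chain above is easy to exercise on a hand-written sample of the &lt;code&gt;searchNearby&lt;/code&gt; response shape (the sample data is illustrative; names are taken from the earlier example):&lt;/p&gt;

```python
# Exercise the same field-extraction chain _call_places_api uses, against a
# hand-written sample of the Places API (New) searchNearby response shape.
sample = {
    "places": [
        {
            "displayName": {"text": "Lao Wang Hot Pot"},
            "formattedAddress": "100 Shimin Avenue, Taipei",
            "rating": 4.6,
            "userRatingCount": 312,
            "reviews": [{"text": {"text": "Great value"}}, {"text": {}}],
        },
        {"displayName": {"text": "Low Rated"}, "rating": 3.2},
    ]
}

def extract(data: dict, min_rating: float = 4.0) -> list:
    out = []
    for place in data.get("places", []):
        rating = place.get("rating", 0)
        if rating < min_rating:
            continue  # drop places below the threshold, same as the bot
        reviews = [
            r["text"]["text"]
            for r in place.get("reviews", [])[:3]
            if r.get("text", {}).get("text")  # skip reviews with no text body
        ]
        out.append({
            "name": place["displayName"]["text"],
            "address": place.get("formattedAddress", ""),
            "rating": rating,
            "rating_count": place.get("userRatingCount", 0),
            "reviews": reviews,
        })
    return out

result = extract(sample)
```

&lt;p&gt;Defensive &lt;code&gt;.get()&lt;/code&gt; chains matter here: the Places API omits fields like &lt;code&gt;rating&lt;/code&gt; or &lt;code&gt;reviews&lt;/code&gt; when a place has none, so direct indexing would raise &lt;code&gt;KeyError&lt;/code&gt; on real responses.&lt;/p&gt;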



&lt;h3&gt;
  
  
  Step 3: Tool Combo Main Function (Agentic Loop)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_combo_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;http_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;enriched_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s current location: latitude &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, longitude &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please answer in traditional Chinese using Taiwanese terminology, and do not use markdown format.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tool_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# ← Maps grounding
&lt;/span&gt;                &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# ← Places API
&lt;/span&gt;            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ── Step 1 ──────────────────────────────────────────────────────
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_COMBO_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# ── Step 2：Processing function_call ──────────────────────────────────
&lt;/span&gt;    &lt;span class="n"&gt;function_response_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_execute_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;function_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# ── Step 3 ────────────────────────────────────────────────────
&lt;/span&gt;        &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_COMBO_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Pitfalls Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;Part.from_function_response()&lt;/code&gt; does not accept the &lt;code&gt;id&lt;/code&gt; parameter
&lt;/h3&gt;

&lt;p&gt;This was the easiest pitfall to fall into this time: the error only surfaces when a &lt;strong&gt;real model call&lt;/strong&gt; actually triggers a function_call, so unit tests almost never catch it.&lt;/p&gt;

&lt;p&gt;Originally, I wrote it like this, referring to the official example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error——TypeError occurs at runtime
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← This parameter does not exist!
&lt;/span&gt;    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual signature of &lt;code&gt;from_function_response&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(*, name: str, response: dict, parts: Optional[list] = None) -&amp;gt; Part
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no &lt;code&gt;id&lt;/code&gt; parameter at all. Every time the model actually triggered a function_call, this line raised a &lt;code&gt;TypeError&lt;/code&gt;, which was silently swallowed by the except block around Step 3 and turned into an error message, so the Places API results were never actually returned to Gemini.&lt;/p&gt;

&lt;p&gt;The correct way is to directly construct &lt;code&gt;types.FunctionResponse&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;function_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can immediately confirm the parameter list with &lt;code&gt;python -c "from google.genai import types; help(types.Part.from_function_response)"&lt;/code&gt;.&lt;/p&gt;
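
&lt;p&gt;The same kind of check can also be automated in a pre-flight test. A minimal stdlib sketch (the helper names here are mine, and &lt;code&gt;from_function_response&lt;/code&gt; below is a stand-in with the signature reported by &lt;code&gt;help()&lt;/code&gt;; in real code you would inspect &lt;code&gt;types.Part.from_function_response&lt;/code&gt; itself):&lt;/p&gt;

```python
import inspect

def accepts_kwarg(func, name: str) -> bool:
    # True if `func` can take `name` as a keyword argument,
    # either explicitly or via a **kwargs catch-all.
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    p = params.get(name)
    return p is not None and p.kind is not inspect.Parameter.VAR_POSITIONAL

def from_function_response(*, name, response, parts=None):
    # Stand-in mirroring the signature shown by help(); illustration only.
    return {"name": name, "response": response, "parts": parts}

print(accepts_kwarg(from_function_response, "id"))    # False: passing id raises TypeError
print(accepts_kwarg(from_function_response, "name"))  # True
```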

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;include_server_side_tool_invocations=True&lt;/code&gt; causes Pydantic to explode
&lt;/h3&gt;

&lt;p&gt;After seeing it in an official documentation example, I assumed I should add this parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="n"&gt;include_server_side_tool_invocations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← The installed SDK version does not support it
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;google-genai 1.49.0&lt;/code&gt;, this field is not among the model fields of &lt;code&gt;GenerateContentConfig&lt;/code&gt;, so Pydantic immediately raises an &lt;code&gt;extra_forbidden&lt;/code&gt; validation error. Simply removing the field restores normal behavior.&lt;/p&gt;
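
&lt;p&gt;If your code has to run across multiple SDK versions, one defensive pattern is to drop config keys the installed version does not know about before constructing the config. A sketch of the filtering step (in real code &lt;code&gt;supported&lt;/code&gt; would come from &lt;code&gt;set(types.GenerateContentConfig.model_fields)&lt;/code&gt;; here it is hard-coded for illustration):&lt;/p&gt;

```python
def filter_config_kwargs(kwargs: dict, supported: set) -> dict:
    # Keep only keys the installed SDK's GenerateContentConfig accepts,
    # so newer fields degrade gracefully instead of triggering
    # Pydantic's extra_forbidden error.
    dropped = set(kwargs) - supported
    if dropped:
        print(f"dropping unsupported config fields: {sorted(dropped)}")
    return {k: v for k, v in kwargs.items() if k in supported}

supported_fields = {"tools", "temperature", "system_instruction"}  # illustration only
cfg = filter_config_kwargs(
    {"tools": [], "include_server_side_tool_invocations": True},
    supported_fields,
)
print(cfg)  # {'tools': []}
```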

&lt;h3&gt;
  
  
  ❌ Pitfall 3: &lt;code&gt;textQuery&lt;/code&gt; is a parameter of &lt;code&gt;searchText&lt;/code&gt;, not &lt;code&gt;searchNearby&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;I thought "if there is a keyword, then bring it into the Places API", and intuitively added it to the request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error——Invalid field for searchNearby endpoint
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;searchNearby&lt;/code&gt; only accepts fields such as &lt;code&gt;includedTypes&lt;/code&gt; and &lt;code&gt;locationRestriction&lt;/code&gt;; &lt;code&gt;textQuery&lt;/code&gt; belongs to the &lt;code&gt;searchText&lt;/code&gt; endpoint. In some versions adding the field does not even raise an error, but the keyword simply never takes effect.&lt;/p&gt;

&lt;p&gt;The correct approach is to keep the keyword in the Function Declaration's description for Gemini to refer to: the model translates the intent into &lt;code&gt;enriched_query&lt;/code&gt;, Maps Grounding handles the keyword semantics, and the Places API is only responsible for returning real rating data.&lt;/p&gt;
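
&lt;p&gt;Since &lt;code&gt;searchNearby&lt;/code&gt; has no keyword or rating field, any &lt;code&gt;min_rating&lt;/code&gt; filtering has to happen on the Python side after the response comes back. A sketch of the request body plus the client-side filter (field names follow the Places API searchNearby schema; the helper names are mine):&lt;/p&gt;

```python
def build_search_nearby_body(lat: float, lng: float, radius_m: int = 1000) -> dict:
    # Only fields that searchNearby actually accepts; no textQuery here.
    return {
        "includedTypes": ["restaurant"],
        "maxResultCount": 10,
        "locationRestriction": {
            "circle": {
                "center": {"latitude": lat, "longitude": lng},
                "radius": radius_m,
            }
        },
    }

def filter_by_rating(places: list, min_rating: float) -> list:
    # searchNearby cannot filter on rating, so do it after the fact.
    return [p for p in places if p.get("rating", 0) >= min_rating]

body = build_search_nearby_body(25.0441, 121.5598)
places = [{"displayName": "A", "rating": 4.6}, {"displayName": "B", "rating": 3.8}]
print([p["displayName"] for p in filter_by_rating(places, 4.0)])  # ['A']
```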

&lt;h3&gt;
  
  
  ❌ Pitfall 4: No guard for &lt;code&gt;response.candidates[0]&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;When the model hits safety filtering, RECITATION, or some other abnormal termination, &lt;code&gt;candidates&lt;/code&gt; may be an empty list, and indexing &lt;code&gt;response.candidates[0]&lt;/code&gt; directly raises an &lt;code&gt;IndexError&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ No guard
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← If candidates is empty, it will explode
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Add guard
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Demo Display
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvionmd6lsyr2srm5gg87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvionmd6lsyr2srm5gg87.png" alt="image-20260327163200329" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: "Find a hot pot restaurant with a rating of 4 stars or above for group dining"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: GPS location (Xinyi District, Taipei City, 25.0441, 121.5598)

User enters: "Please find a hot pot restaurant with a rating of 4 stars or above, suitable for group dining, and list the name, address, and review summary."

[Step 1: Gemini receives query + map context]
  → Detects the need for restaurant data, emit function_call:
    search_nearby_restaurants(keyword="hot pot", min_rating=4.0)

[Step 2: Python calls Places API]
  → lat=25.0441, lng=121.5598 injected from session
  → Returns 3 restaurants with a rating ≥ 4.0, including review text

[Step 3: Gemini integrates Maps context + Places data]
  → "Lao Wang Hot Pot｜100 Shimin Avenue, Xinyi District｜⭐ 4.6 (312)
      Review summary: Large portions, great value for money, a top choice for friends to dine; fast service, fresh dishes.
     ... (3 restaurants in total)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: "Are there any high-value Japanese restaurants?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User enters: "Are there any high-value Japanese restaurants nearby?"

[Step 1: Gemini]
  → function_call: search_nearby_restaurants(keyword="Japanese cuisine", min_rating=4.0)

[Step 2: Places API]
  → Returns 2 Japanese restaurants that meet the rating criteria

[Step 3: Gemini]
  → "There are two recommendations:
      Washoku ○○｜...｜⭐ 4.4｜Reviews: Weekday lunch set is only 280 yuan, very fresh.
      ..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Demo Script Quick Test
&lt;/h3&gt;

&lt;p&gt;No LINE Bot needed; you can run it directly on your local machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Only test Tool Combo (main function)&lt;/span&gt;
python demo.py combo

&lt;span class="c"&gt;# Run all three functions&lt;/span&gt;
python demo.py all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Old Architecture vs. New Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Old Architecture (Maps Grounding only)&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;New Architecture (Tool Combo)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google_maps&lt;/code&gt; (built-in)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google_maps&lt;/code&gt; + &lt;code&gt;search_nearby_restaurants&lt;/code&gt; (custom)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rating Data&lt;/td&gt;
&lt;td&gt;Gemini describes it itself (may not be accurate)&lt;/td&gt;
&lt;td&gt;Places API real numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reviews&lt;/td&gt;
&lt;td&gt;AI generated&lt;/td&gt;
&lt;td&gt;Real user reviews (up to 3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Call Count&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2 (one in Step 1, one in Step 3), but transparent to the user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Filtering&lt;/td&gt;
&lt;td&gt;Rely on prompt&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;min_rating&lt;/code&gt;, &lt;code&gt;radius_m&lt;/code&gt; precise control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation has given me a clearer understanding of the potential of Gemini Tool Combinations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem that Tool Combinations truly solves&lt;/strong&gt; is that Grounding and Function Calling are no longer mutually exclusive. Previously, to achieve "map context + real external data", you could only manually connect two APIs yourself at the application layer, or use Gemini's text generation to "simulate" external data (unreliable). Now the model itself knows when to use map context and when to call the Places API, and developers only need to attach the tools.&lt;/p&gt;

&lt;p&gt;However, there are also a few things to note about this implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;lat/lng&lt;/code&gt; injection pattern is essential&lt;/strong&gt;: you cannot let the model guess the coordinates; they must be injected from the session, or positioning accuracy will be terrible. The same pattern applies to any function-calling scenario that carries session state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The cost of two &lt;code&gt;generate_content&lt;/code&gt; calls&lt;/strong&gt;: The agentic loop of Tool Combo requires two model calls, and the token consumption is about 1.5–2 times that of a single call. This needs to be especially considered for scenarios with high latency requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDK version differences&lt;/strong&gt;: Different versions of &lt;code&gt;google-genai&lt;/code&gt; have different support for the fields of &lt;code&gt;GenerateContentConfig&lt;/code&gt;, and new fields like &lt;code&gt;include_server_side_tool_invocations&lt;/code&gt; should be used after confirming the version number, otherwise Pydantic validation errors are hard to track.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
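
&lt;p&gt;Point 1 above can be made concrete as a dispatch helper that overwrites any model-supplied coordinates with the trusted session values. A self-contained sketch (&lt;code&gt;search_nearby_restaurants&lt;/code&gt; is stubbed here for illustration):&lt;/p&gt;

```python
def search_nearby_restaurants(lat: float, lng: float, keyword: str = "", min_rating: float = 0.0) -> dict:
    # Stub standing in for the real Places API wrapper.
    return {"lat": lat, "lng": lng, "keyword": keyword, "min_rating": min_rating}

def execute_function(name: str, args: dict, lat: float, lng: float) -> dict:
    # Overwrite any coordinates the model may have guessed with the
    # trusted values stored in the session.
    if name == "search_nearby_restaurants":
        args = {**args, "lat": lat, "lng": lng}
        return search_nearby_restaurants(**args)
    return {"error": f"unknown function: {name}"}

result = execute_function(
    "search_nearby_restaurants",
    {"keyword": "hot pot", "lat": 0.0, "lng": 0.0},  # model-guessed junk coordinates
    25.0441, 121.5598,                               # session coordinates win
)
print(result["lat"], result["lng"])  # 25.0441 121.5598
```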

&lt;p&gt;Future directions that can be extended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Connect the Postback quick replies (click the "Find Restaurant" button) to Tool Combo, so that each entry can get real ratings&lt;/li&gt;
&lt;li&gt;  Add the &lt;code&gt;searchText&lt;/code&gt; endpoint to support more complex keyword searches (e.g. Michelin recommendations)&lt;/li&gt;
&lt;li&gt;  Tool Combo combined with other built-in tools (such as &lt;code&gt;google_search&lt;/code&gt;) to achieve more complex multi-tool chaining&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The core concept of this modification is only one sentence: &lt;strong&gt;Put Google Maps grounding and the Places API function tool in the same &lt;code&gt;types.Tool&lt;/code&gt;, and Gemini will coordinate the two in a single conversation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key code is only these few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is all the magic of Tool Combo
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# ← Maps context
&lt;/span&gt;    &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# ← Places API
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But to make it actually work, you also need to get four details right: how &lt;code&gt;FunctionResponse&lt;/code&gt; is constructed, the guard on &lt;code&gt;candidates&lt;/code&gt;, the correct fields for each Places API endpoint, and injecting &lt;code&gt;lat/lng&lt;/code&gt; from the session instead of letting the model guess.&lt;/p&gt;

&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Gemini 3.1: Real-World Voice Recognition with Flash Live: Making Your LINE Bot Understand You</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:07:26 +0000</pubDate>
      <link>https://dev.to/gde/gemini-31-real-world-voice-recognition-with-flash-live-making-your-line-bot-understand-you-560o</link>
      <guid>https://dev.to/gde/gemini-31-real-world-voice-recognition-with-flash-live-making-your-line-bot-understand-you-560o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl6ycarc8j4uczflmtoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl6ycarc8j4uczflmtoj.png" alt="image-20260328203306501" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Google released &lt;strong&gt;Gemini 3.1 Flash Live&lt;/strong&gt; at the end of &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-live/" rel="noopener noreferrer"&gt;March 2026&lt;/a&gt;, focusing on "making audio AI more natural and reliable." The model is designed specifically for real-time two-way voice conversations, with low latency, interruptibility, and multi-language support.&lt;/p&gt;

&lt;p&gt;I happened to have a LINE Bot project (&lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;linebot-helper-python&lt;/a&gt;) on hand, which already handles text, images, URLs, PDFs, and YouTube, but completely ignores voice messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends a voice message
Bot: (Silence)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time, I'll add voice support and share a few pitfalls I encountered.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design Decision: Flash Live or Standard Gemini API?
&lt;/h2&gt;

&lt;p&gt;The first question: Gemini 3.1 Flash Live is designed for &lt;strong&gt;real-time streaming&lt;/strong&gt;, but LINE's voice messages are &lt;strong&gt;pre-recorded m4a files&lt;/strong&gt;, not real-time audio streams.&lt;/p&gt;

&lt;p&gt;Using Flash Live to process pre-recorded files is like using a live streaming camera to take photos – technically feasible, but the wrong tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I decided to use the standard Gemini API&lt;/strong&gt; – passing the audio bytes directly as inline data and getting the transcribed text back in a single call. It is simpler and better suited to this scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdx9agqz7jujs89xky8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdx9agqz7jujs89xky8x.png" alt="image-20260328203340798" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Integration Approach
&lt;/h3&gt;

&lt;p&gt;This repo already has a complete Orchestrator architecture, which automatically routes to different Agents (Chat, Content, Location, Vision, GitHub) based on the message content. The goal for voice messages is clear:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Convert voice to text, and then treat it as a regular text message and pass it into the Orchestrator – allowing all existing features to automatically support voice input.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;User says "Help me search for nearby gas stations" → transcribed into text → Orchestrator determines it's a location query → LocationAgent processes it. No need to implement separate logic for voice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends AudioMessage (m4a)
    │
    ▼ handle_audio_message()
    │
    ├─ ① LINE SDK downloads audio bytes
    │ get_message_content(message_id) → iter_content()
    │
    ├─ ② Gemini transcription
    │ tools/audio_tool.py → transcribe_audio()
    │ model: gemini-3.1-flash-lite-preview
    │
    ├─ ③ Reply #1: "You said: {transcription}"
    │ reply_message() (consumes reply token)
    │
    └─ ④ Reply #2: Orchestrator routing
            handle_text_message_via_orchestrator(push_user_id=user_id)
            ↓
            push_message() (reply token already used, use push instead)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why two replies?
&lt;/h3&gt;

&lt;p&gt;The reply is split into two parts so the user &lt;strong&gt;sees the transcription result immediately&lt;/strong&gt;, instead of waiting for the Orchestrator to finish before knowing whether the Bot understood them.&lt;/p&gt;
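
&lt;p&gt;The constraint behind this split is that a LINE reply token is single-use. The pattern can be sketched with a fake client (the client interface here is hypothetical, not the line-bot-sdk API):&lt;/p&gt;

```python
class FakeLineClient:
    # Minimal stand-in: a LINE reply token is single-use, pushes are not.
    def __init__(self):
        self.sent = []
        self.used_tokens = set()

    def reply(self, token: str, text: str):
        if token in self.used_tokens:
            raise RuntimeError("reply token already consumed")
        self.used_tokens.add(token)
        self.sent.append(("reply", text))

    def push(self, user_id: str, text: str):
        self.sent.append(("push", text))

def answer_audio(client, reply_token, user_id, transcription, orchestrate):
    # 1) Spend the single-use token on the fast transcription echo.
    client.reply(reply_token, f"You said: {transcription}")
    # 2) The slower Orchestrator answer must go out as a push.
    client.push(user_id, orchestrate(transcription))

client = FakeLineClient()
answer_audio(client, "tok-1", "user-1", "find gas stations", lambda t: f"routed: {t}")
print(client.sent)
# [('reply', 'You said: find gas stations'), ('push', 'routed: find gas stations')]
```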




&lt;h2&gt;
  
  
  Core Code Explanation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Audio Transcription Tool (tools/audio_tool.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;TRANSCRIPTION_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-lite-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Transcribe audio bytes to text using Gemini.
    LINE voice messages are always m4a, MIME type is always audio/mp4.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;audio_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TRANSCRIPTION_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="n"&gt;audio_part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Design principle: the function itself does not catch exceptions, so the upper-level handler can handle error responses uniformly.&lt;/p&gt;
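
&lt;p&gt;A minimal sketch of this principle (the &lt;code&gt;format_error_message&lt;/code&gt; stand-in and the toy handler below are illustrative, not the project's actual code): the tool raises freely, and the handler is the only place that turns exceptions into user-facing text.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

def format_error_message(exc: Exception, context: str) -&amp;gt; str:
    # Stand-in for LineService.format_error_message
    return f"Sorry, something went wrong while {context}. ({type(exc).__name__})"

async def transcribe_audio(audio_bytes: bytes) -&amp;gt; str:
    # The tool never catches exceptions; it simply raises.
    if not audio_bytes:
        raise ValueError("no audio data")
    return "hello"

async def handle(audio_bytes: bytes) -&amp;gt; str:
    # One uniform place to format errors for the user.
    try:
        return await transcribe_audio(audio_bytes)
    except Exception as e:
        return format_error_message(e, "processing voice message")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
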

&lt;h3&gt;
  
  
  Step 2: Handler Main Flow (main.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_audio_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle audio (voice) messages — transcribe and route through Orchestrator.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
    &lt;span class="n"&gt;replied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="c1"&gt;# Track if the reply token has been used
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Download audio
&lt;/span&gt;        &lt;span class="n"&gt;message_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_message_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_content&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;audio_bytes&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;

        &lt;span class="c1"&gt;# Transcription
&lt;/span&gt;        &lt;span class="n"&gt;transcription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribe_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Empty transcription (silent or too short)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unable to recognize voice content, please re-record.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="c1"&gt;# Reply #1: Let the user confirm the transcription result (consumes reply token)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You said: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;replied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# Reply #2: Send to Orchestrator, using push_message (token already used)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handle_text_message_via_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error handling audio for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;error_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LineService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_error_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing voice message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;replied&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# reply token has been consumed, use push instead
&lt;/span&gt;            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Enabling Orchestrator to Support External Text Input
&lt;/h3&gt;

&lt;p&gt;The original &lt;code&gt;handle_text_message_via_orchestrator&lt;/code&gt; directly reads &lt;code&gt;event.message.text&lt;/code&gt;. AudioMessage doesn't have &lt;code&gt;.text&lt;/code&gt;, so add two optional parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_text_message_via_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← External text input (voice transcription)
&lt;/span&gt;    &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Use push_message when set
&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_orchestrator_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;reply_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reply_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reply_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LineService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_error_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing your question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;text is not None&lt;/code&gt; (rather than &lt;code&gt;text or ...&lt;/code&gt;) is intentional: if the voice transcription comes back as an empty string, it should pass through as-is (to be intercepted upstream by &lt;code&gt;if not transcription.strip()&lt;/code&gt;) instead of silently falling back to &lt;code&gt;event.message.text&lt;/code&gt;.&lt;/p&gt;
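
&lt;p&gt;The difference is easy to see in isolation. A hypothetical sketch (&lt;code&gt;pick_message&lt;/code&gt; mirrors the handler's fallback logic; it is not part of the project):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def pick_message(text, event_text):
    # Sentinel check: fall back only when no external text was supplied at all.
    return text if text is not None else event_text.strip()

def pick_message_falsy(text, event_text):
    # `or` treats "" as missing and silently falls back - wrong for empty transcriptions.
    return text or event_text.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the falsy version, an empty transcription would leak through to &lt;code&gt;event.message.text&lt;/code&gt; — an attribute an &lt;code&gt;AudioMessage&lt;/code&gt; does not even have.&lt;/p&gt;
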




&lt;h2&gt;
  
  
  Pitfalls Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;Part.from_text()&lt;/code&gt; does not accept positional arguments
&lt;/h3&gt;

&lt;p&gt;The first TypeError encountered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error (TypeError: Part.from_text() takes 1 positional argument but 2 were given)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this version of the SDK, &lt;code&gt;Part.from_text()&lt;/code&gt; only accepts &lt;code&gt;text&lt;/code&gt; as a keyword argument; alternatively, the &lt;code&gt;Part(text=...)&lt;/code&gt; constructor works directly and is safer across SDK versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: LINE reply token can only be used once
&lt;/h3&gt;

&lt;p&gt;LINE's reply token is &lt;strong&gt;one-time use&lt;/strong&gt;. Once &lt;code&gt;reply_message()&lt;/code&gt; is called, the token is invalidated.&lt;/p&gt;

&lt;p&gt;The voice flow in this project replies twice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reply #1 (display transcription text) → &lt;strong&gt;consumes token&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Reply #2 (Orchestrator result) → &lt;strong&gt;token is invalid, will receive LINE 400 error&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The solution is to have the Orchestrator handler support &lt;code&gt;push_message&lt;/code&gt; mode (via the &lt;code&gt;push_user_id&lt;/code&gt; parameter), and Reply #2 changes to &lt;code&gt;push_message&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Error handling needs the same care: if the Orchestrator throws an exception after Reply #1 succeeds, &lt;code&gt;reply_message&lt;/code&gt; can no longer be used in the except block; it too must switch to &lt;code&gt;push_message&lt;/code&gt;. That is exactly what the &lt;code&gt;replied&lt;/code&gt; flag in the code is for.&lt;/p&gt;
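
&lt;p&gt;The pattern distills into a small helper. This is a sketch with the send functions injected so it can be exercised without the LINE SDK; the real handler calls &lt;code&gt;line_bot_api&lt;/code&gt; directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def reply_or_push(replied: bool, reply_token: str, user_id: str,
                        messages, reply_fn, push_fn):
    """Use reply_message while the one-time token is fresh, push_message after."""
    if replied:
        await push_fn(user_id, messages)
    else:
        await reply_fn(reply_token, messages)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
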

&lt;h3&gt;
  
  
  ❌ Pitfall 3: Gemini Flash Live is not suitable for pre-recorded files
&lt;/h3&gt;

&lt;p&gt;Not a real "pitfall", but worth clarifying:&lt;/p&gt;

&lt;p&gt;Gemini 3.1 Flash Live is designed for &lt;strong&gt;real-time two-way streaming&lt;/strong&gt;, which carries the overhead of connection establishment and a streaming protocol. LINE voice messages are complete, pre-recorded m4a files that can be processed in a single request.&lt;/p&gt;

&lt;p&gt;Passing the audio bytes inline to &lt;code&gt;client.aio.models.generate_content()&lt;/code&gt; is simpler, and the latency is perfectly acceptable. Save Flash Live for scenarios that genuinely need real-time conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Effect Demonstration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Voice Command Query
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] Help me search for cafes near Taipei Main Station

Bot Reply #1: You said: Help me search for cafes near Taipei Main Station
Bot Reply #2: [LocationAgent replies with a list of nearby cafes]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: Voice Question
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] What's the difference between Gemini and GPT-4

Bot Reply #1: You said: What's the difference between Gemini and GPT-4
Bot Reply #2: [ChatAgent with Google Search Grounding replies with comparison results]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: Voice Send URL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] Help me summarize this article https://example.com/article

Bot Reply #1: You said: Help me summarize this article https://example.com/article
Bot Reply #2: [ContentAgent fetches and summarizes the article]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The text transcribed from voice goes directly into the Orchestrator, and all existing URL detection and intent determination work as usual, with zero extra logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Traditional Text Input vs. Voice Input
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Text Input&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Voice Input&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input Format&lt;/td&gt;
&lt;td&gt;TextMessage&lt;/td&gt;
&lt;td&gt;AudioMessage (m4a)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-processing&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Gemini transcription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reply token&lt;/td&gt;
&lt;td&gt;Direct use&lt;/td&gt;
&lt;td&gt;Reply #1 consumes, Reply #2 changes to push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator&lt;/td&gt;
&lt;td&gt;Direct routing&lt;/td&gt;
&lt;td&gt;Route after transcription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported Functions&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;All (no additional settings required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Handling&lt;/td&gt;
&lt;td&gt;reply_message&lt;/td&gt;
&lt;td&gt;replied flag determines reply/push&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;What I am most satisfied with in this integration is that &lt;strong&gt;the Orchestrator itself barely had to change&lt;/strong&gt;. Once voice is converted to text at the input end, all the routing logic, Agent calls, and error handling are inherited automatically.&lt;/p&gt;

&lt;p&gt;Gemini's multimodal audio understanding has been very stable in this scenario – Traditional Chinese, Taiwanese accents, and sentences that mix in English are transcribed accurately almost every time.&lt;/p&gt;

&lt;p&gt;Future directions for extension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multi-language automatic detection&lt;/strong&gt;: Tell Gemini to preserve the original language during transcription, Japanese voice → Japanese transcription, and then the Orchestrator decides whether to translate&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Group voice support&lt;/strong&gt;: Currently limited to 1:1, voice messages in groups are temporarily ignored&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Long recording summary&lt;/strong&gt;: Recordings exceeding a certain length go directly to ContentAgent for summarization, instead of being treated as commands&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Extension: 🔊 Read Summary Aloud – Make the Bot Speak
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghuiv4o4wq6yt1jyupmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghuiv4o4wq6yt1jyupmu.png" alt="Preview Program 2026-03-28 20.33.53" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Voice recognition allows the Bot to "understand" what the user is saying. After this is done, the next question naturally arises:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can the Bot respond by speaking?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Gemini Live API has a setting &lt;code&gt;response_modalities: ["AUDIO"]&lt;/code&gt;, which can directly output an audio PCM stream. I connected it to another scenario – &lt;strong&gt;reading summaries aloud&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function Design
&lt;/h3&gt;

&lt;p&gt;Each time the Bot summarizes a URL, YouTube, or PDF, a "🔊 Read Aloud" QuickReply button will appear below the message. When the user presses it, the Bot sends the summary text into Gemini Live TTS, converts the PCM audio to m4a, and then sends it back using &lt;code&gt;AudioSendMessage&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
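
&lt;p&gt;One implementation detail: LINE caps postback &lt;code&gt;data&lt;/code&gt; at 300 characters, so the button should carry only a lookup key into &lt;code&gt;summary_store&lt;/code&gt;, never the summary text itself. A sketch of the encoding (the helper names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_read_aloud_data(summary_id: str) -&amp;gt; str:
    # Carry only a key; the summary text itself stays in summary_store.
    return f"read_aloud:{summary_id}"

def parse_postback_data(data: str) -&amp;gt; tuple[str, str]:
    action, _, summary_id = data.partition(":")
    return action, summary_id

# Attached to the summary reply roughly like:
#   QuickReply(items=[QuickReplyButton(
#       action=PostbackAction(label="🔊 Read Aloud",
#                             data=build_read_aloud_data(summary_id)))])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
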

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;URL summary complete
    │
    ▼ [🔊 Read Aloud] QuickReply button
    │
User presses the button → PostbackEvent
    │
    ▼ handle_read_aloud_postback()
    │
    ├─ ① Retrieve the summary text from summary_store (10-minute TTL)
    │
    ├─ ② Gemini Live API → PCM audio
    │ model: gemini-live-2.5-flash-native-audio
    │ response_modalities: ["AUDIO"]
    │
    ├─ ③ ffmpeg transcoding: PCM → m4a
    │ s16le, 16kHz, mono → AAC
    │
    └─ ④ AudioSendMessage sent to the user
            original_content_url: /audio/{uuid}
            duration: {ms}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
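
&lt;p&gt;Steps ③ and ④ can be sketched as follows, using the 16 kHz / s16le / mono figures from the flow above (the helper names are illustrative; the real transcoding lives in the project's TTS path):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

SAMPLE_RATE = 16_000      # s16le, 16 kHz, mono per the pipeline above
BYTES_PER_SAMPLE = 2      # 16-bit samples

def pcm_duration_ms(pcm: bytes) -&amp;gt; int:
    # AudioSendMessage needs the duration in milliseconds.
    return len(pcm) * 1000 // (SAMPLE_RATE * BYTES_PER_SAMPLE)

def ffmpeg_args(out_path: str) -&amp;gt; list[str]:
    # Raw PCM from stdin, AAC-encoded into an .m4a container.
    return ["ffmpeg", "-y", "-f", "s16le", "-ar", str(SAMPLE_RATE),
            "-ac", "1", "-i", "pipe:0", "-c:a", "aac", out_path]

def pcm_to_m4a(pcm: bytes, out_path: str) -&amp;gt; int:
    subprocess.run(ffmpeg_args(out_path), input=pcm, check=True)
    return pcm_duration_ms(pcm)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
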



&lt;h3&gt;
  
  
  Core Code (tools/tts_tool.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_to_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VERTEX_PROJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_modalities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_client_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
            &lt;span class="n"&gt;turn_complete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="n"&gt;pcm_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;32000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 16kHz × 16-bit mono
&lt;/span&gt;
    &lt;span class="c1"&gt;# PCM → m4a (temp file mode, avoid moov atom problem)
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pcm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pcm_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
    &lt;span class="n"&gt;m4a_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pcm_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pcm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.m4a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffmpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s16le&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-i&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pcm_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m4a_path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m4a_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
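&lt;p&gt;The constant &lt;code&gt;32000&lt;/code&gt; in the &lt;code&gt;duration_ms&lt;/code&gt; line comes from the PCM format: 16,000 samples per second × 2 bytes per sample (s16le) × 1 channel. A quick sanity check of that arithmetic (&lt;code&gt;pcm_duration_ms&lt;/code&gt; is just an illustrative helper):&lt;/p&gt;

```python
# 16,000 samples/sec × 2 bytes/sample (s16le) × 1 channel = 32,000 bytes/sec
BYTES_PER_SECOND = 16000 * 2 * 1

def pcm_duration_ms(num_bytes):
    # Same formula as duration_ms in text_to_speech above.
    return int(num_bytes / BYTES_PER_SECOND * 1000)

print(pcm_duration_ms(32000))  # exactly one second of audio: 1000 ms
print(pcm_duration_ms(48000))  # 1.5 seconds: 1500 ms
```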






&lt;h2&gt;
  
  
  Pitfalls of the Read-Aloud Function
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 4: Completely Different Model Name
&lt;/h3&gt;

&lt;p&gt;The first attempt at Gemini Live TTS was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-live-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The name was inferred by analogy with &lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt;, the model used for voice recognition. The result was an immediate 1008 policy-violation disconnect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Publisher Model `projects/line-vertex/locations/global/publishers/google/
models/gemini-3.1-flash-live-preview` was not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Listing the available models on Vertex AI revealed that the model naming rules for Live/native audio are completely different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is &lt;strong&gt;no Live version&lt;/strong&gt; of Gemini 3.1 on Vertex AI. The Live/native audio feature is currently the 2.5 generation, and the naming format is &lt;code&gt;gemini-live-{version}-{variant}-native-audio&lt;/code&gt;, which is completely separate from the general model &lt;code&gt;gemini-{version}-flash-{variant}&lt;/code&gt;.&lt;/p&gt;
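&lt;p&gt;The "list the available models" step itself needs Vertex AI credentials, so the sketch below only shows the filtering half. &lt;code&gt;find_live_models&lt;/code&gt; is an illustrative helper, not an SDK function; the two sample names are the ones discussed in this article:&lt;/p&gt;

```python
# Given model names from a Vertex AI model listing, keep only the
# Live / native-audio variants (they all carry "live" in the name).
def find_live_models(model_names):
    return [name for name in model_names if "live" in name]

names = [
    "gemini-3.1-flash-lite-preview",        # standard model, no Live variant
    "gemini-live-2.5-flash-native-audio",   # follows gemini-live-{version}-...
]
print(find_live_models(names))  # only the native-audio model matches
```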

&lt;h3&gt;
  
  
  ❌ Pitfall 5: &lt;code&gt;GOOGLE_CLOUD_LOCATION=global&lt;/code&gt; Causes Live API to Disconnect
&lt;/h3&gt;

&lt;p&gt;After changing to the correct model name, the error message was still the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Publisher Model `projects/line-vertex/locations/global/...` was not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time the model name was correct, but &lt;code&gt;locations/global&lt;/code&gt; was suspicious: we had explicitly set &lt;code&gt;us-central1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Investigating the source code of the Google GenAI SDK revealed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# _api_client.py
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;env_location&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# ← here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;location or env_location&lt;/code&gt;: an empty &lt;code&gt;location&lt;/code&gt; argument silently falls back to the environment variable, and if both are empty (and no API key is set) the SDK defaults to &lt;code&gt;global&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The root cause of the problem is the environment variable of Cloud Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GOOGLE_CLOUD_LOCATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"global"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOOGLE_CLOUD_LOCATION&lt;/code&gt; was set to the string &lt;code&gt;"global"&lt;/code&gt;, so &lt;code&gt;os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")&lt;/code&gt; returned &lt;code&gt;"global"&lt;/code&gt; rather than &lt;code&gt;"us-central1"&lt;/code&gt;. The SDK then dutifully connected to the global endpoint, where &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt; has no &lt;code&gt;BidiGenerateContent&lt;/code&gt; support.&lt;/p&gt;
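&lt;p&gt;The effect of the falsy fallback can be reproduced locally. &lt;code&gt;resolve_location&lt;/code&gt; is an illustrative stand-in mirroring the SDK snippet quoted above, not a real SDK function:&lt;/p&gt;

```python
import os

# Mirrors the resolution order in _api_client.py: explicit argument,
# then the environment variable, then a final fallback to "global".
def resolve_location(explicit, env):
    loc = explicit or env
    if not loc:
        loc = "global"
    return loc

# With GOOGLE_CLOUD_LOCATION=global in the environment, a default such
# as os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1") never applies:
os.environ["GOOGLE_CLOUD_LOCATION"] = "global"
env_value = os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
print(env_value)                                   # "global", not "us-central1"
print(resolve_location("", env_value))             # empty arg falls back to env
print(resolve_location("us-central1", env_value))  # explicit value wins
```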

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Standard API&lt;/th&gt;
&lt;th&gt;Live API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;global&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Available&lt;/td&gt;
&lt;td&gt;❌ Model not here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;us-central1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Available&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Solution: hardcode the location for the Live API instead of reading it from the environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Affected by GOOGLE_CLOUD_LOCATION=global
&lt;/span&gt;&lt;span class="n"&gt;VERTEX_LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Hardcoded, not affected by env var
&lt;/span&gt;&lt;span class="n"&gt;VERTEX_LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Live API needs a regional endpoint
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Voice Recognition vs. Read Summary Aloud
&lt;/h2&gt;

&lt;p&gt;The two functions use completely different Gemini APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Voice Recognition&lt;/th&gt;
&lt;th&gt;Read Summary Aloud&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;Audio → Text&lt;/td&gt;
&lt;td&gt;Text → Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Standard &lt;code&gt;generate_content&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Live API &lt;code&gt;BidiGenerateContent&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;td&gt;Follows env var&lt;/td&gt;
&lt;td&gt;Hardcoded &lt;code&gt;us-central1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Format&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;PCM → ffmpeg → m4a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LINE Message Type&lt;/td&gt;
&lt;td&gt;Input: &lt;code&gt;AudioMessage&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Output: &lt;code&gt;AudioSendMessage&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With Gemini 3.1 Flash handling transcription and the Gemini Live 2.5 native-audio model handling TTS, audio AI is now worth taking seriously. This time, both voice recognition and summary read-aloud were integrated into the LINE Bot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Voice Recognition&lt;/strong&gt;: standard Gemini API, one-shot transcription of the recorded m4a, wired into the existing Orchestrator&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Read Summary Aloud&lt;/strong&gt;: Gemini Live TTS turns the summary text into PCM, ffmpeg transcodes it to m4a, and the result is returned via &lt;code&gt;AudioSendMessage&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most troublesome part was not the feature itself but &lt;strong&gt;finding the correct model name&lt;/strong&gt; and &lt;strong&gt;tracking down the SDK's location-resolution logic&lt;/strong&gt;. Neither is documented in any prominent place; the answers only came from listing the available models and reading the SDK source code.&lt;/p&gt;

&lt;p&gt;The full code is on &lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to refer to it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building an Agent Skill Hub: From Skill Development to Automated Multilingual Documentation Deployment on GitHub Pages</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:45:20 +0000</pubDate>
      <link>https://dev.to/evanlin/building-an-agent-skill-hub-from-skill-development-to-automated-multilingual-documentation-5ae7</link>
      <guid>https://dev.to/evanlin/building-an-agent-skill-hub-from-skill-development-to-automated-multilingual-documentation-5ae7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cu37bxccsz7i7k6wl0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cu37bxccsz7i7k6wl0n.png" alt="image-20260322225856161" width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/kkdai/agent-skill-hub" rel="noopener noreferrer"&gt;Agent Skill Hub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://plateaukao.github.io/whisperASR/" rel="noopener noreferrer"&gt;whisperASR Reference Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/pages" rel="noopener noreferrer"&gt;GitHub Pages Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents how, while developing the &lt;strong&gt;Agent Skill Hub (2026 Skill Library)&lt;/strong&gt;, I built a skill description specification from scratch and created a GitHub Pages documentation site that supports both Chinese and English, drawing inspiration from minimalist aesthetics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;With the rise of AI Agents (such as OpenClaw or Gemini CLI), the key question has become how to let an Agent quickly understand and execute specific tasks. Instead of writing a long prompt every time, it is better to package common operations into standardized &lt;strong&gt;Skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To facilitate community sharing and Agent consumption, I created &lt;code&gt;agent-skill-hub&lt;/code&gt;. But code alone is not enough; the project also needs a decent "facade": a documentation site that is both aesthetically pleasing and technically detailed.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Standardize Skill Descriptions (SKILL.md)
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;agent-skill-hub&lt;/code&gt;, each skill (such as &lt;code&gt;gcp-helper&lt;/code&gt; or &lt;code&gt;n8n-executor&lt;/code&gt;) has a &lt;code&gt;SKILL.md&lt;/code&gt;. The structure of this file is crucial because it's not just for humans to read, but also for LLMs to read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name &amp;amp; Description&lt;/strong&gt;: Let the Agent know what this is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to Use&lt;/strong&gt;: Define trigger scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Pattern&lt;/strong&gt;: Provide standard instruction examples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common Mistakes&lt;/strong&gt;: Reduce errors caused by Agent hallucinations.&lt;/li&gt;
&lt;/ul&gt;
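&lt;p&gt;Put together, a minimal &lt;code&gt;SKILL.md&lt;/code&gt; following this structure might look like the sketch below. The contents are illustrative, not an actual skill from the repository:&lt;/p&gt;

```markdown
# Skill: gcp-helper

## Description
Run common Google Cloud operations (deploy, logs, IAM checks) on request.

## When to Use
The user mentions Cloud Run, deployment, or GCP logs.

## Core Pattern
"Deploy the current directory to Cloud Run service X in region Y."

## Common Mistakes
Do not invent project IDs; always ask if the project is ambiguous.
```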




&lt;h2&gt;
  
  
  🎨 Step 2: Design Style — Tribute to Minimalist Aesthetics
&lt;/h2&gt;

&lt;p&gt;When designing the pages under the &lt;code&gt;docs&lt;/code&gt; directory, I referenced the style of &lt;strong&gt;whisperASR&lt;/strong&gt;. Its dark background with bright teal accents fits modern developer aesthetics well:&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual Element Highlights:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Gradient Title&lt;/strong&gt;: Use &lt;code&gt;linear-gradient&lt;/code&gt; to give the title a premium feel.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Teal Accent Color&lt;/strong&gt;: Use &lt;code&gt;#14b8a6&lt;/code&gt; as the highlight color for key buttons and titles.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Card-style Layout&lt;/strong&gt;: Clearly present the icons and introductions of each skill, with good responsive design.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🌐 Step 3: Multilingual Support and Automatic Switching
&lt;/h2&gt;

&lt;p&gt;To make it available to developers worldwide, I adopted a directory-structured language management method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docs/
├── index.html (Language detection and redirection)
├── en/ (English version)
│ └── skills/
└── zh/ (Traditional Chinese version)
    └── skills/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added a simple JavaScript snippet to the root directory's &lt;code&gt;index.html&lt;/code&gt;, which automatically redirects to the correct language based on the user's browser settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lang&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userLanguage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zh&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./zh/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./en/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Step 4: GitHub Pages Deployment Process
&lt;/h2&gt;

&lt;p&gt;In 2026, the most recommended deployment method is to serve content from the &lt;code&gt;docs/&lt;/code&gt; directory of the &lt;code&gt;main&lt;/code&gt; branch, which keeps the repository tidy while keeping development and documentation in sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prepare the Directory Structure
&lt;/h3&gt;

&lt;p&gt;Create all the necessary directories at once using the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; docs/en/skills docs/zh/skills docs/assets/css

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Git Commit and Push
&lt;/h3&gt;

&lt;p&gt;After completing HTML/CSS development, execute the standard Git process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add docs/
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"docs: add GitHub Pages documentation in English and Chinese"&lt;/span&gt;
git push origin main

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Enable GitHub Pages Settings
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Go to &lt;strong&gt;Settings &amp;gt; Pages&lt;/strong&gt; in the GitHub repository.&lt;/li&gt;
&lt;li&gt; Under &lt;strong&gt;Build and deployment&lt;/strong&gt;, in &lt;strong&gt;Branch&lt;/strong&gt;, select the &lt;code&gt;main&lt;/code&gt; branch and the &lt;code&gt;/docs&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Save&lt;/strong&gt;, and the website will be online in a few minutes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj277v30sramun4c3ae5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj277v30sramun4c3ae5.png" alt="image-20260322225932252" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Common Pitfalls and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ Why can't the webpage style (CSS) be loaded?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt; HTML files in subdirectories (such as &lt;code&gt;en/skills/&lt;/code&gt;) must reference assets with the correct relative paths. &lt;strong&gt;Correction:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- In the home page index.html --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"../assets/css/style.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!-- In the skill detail page --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"../../assets/css/style.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
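The rule generalizes: the number of ../ segments a page needs equals its folder depth below docs/. A minimal Python sketch of that rule (the file paths are illustrative, not taken from the actual repo):

```python
from pathlib import PurePosixPath

def css_href(page: str) -> str:
    """Build the relative stylesheet href for a page inside docs/."""
    # Folders between docs/ and the page determine how many "../" we need.
    depth = len(PurePosixPath(page).parent.parts)
    return "../" * depth + "assets/css/style.css"

print(css_href("index.html"))           # assets/css/style.css
print(css_href("en/index.html"))        # ../assets/css/style.css
print(css_href("en/skills/page.html"))  # ../../assets/css/style.css
```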



&lt;h3&gt;
  
  
  ❓ How do we ensure the Agent can read the documentation correctly?
&lt;/h3&gt;

&lt;p&gt;We have retained a large number of semantic tags (&lt;code&gt;article&lt;/code&gt;, &lt;code&gt;h2&lt;/code&gt;, &lt;code&gt;pre&lt;/code&gt;, &lt;code&gt;code&lt;/code&gt;) in the HTML, so that the Agent can more accurately capture the core logic when performing RAG (Retrieval-Augmented Generation) or directly reading the webpage.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Through this development, I came to appreciate the importance of "documentation as product". A good AI skill library needs not only solid program logic but also a clear, intuitive, multilingual-friendly navigation system.&lt;/p&gt;

&lt;p&gt;If you also want to create a professional facade for your AI project, you might as well refer to the &lt;code&gt;docs/&lt;/code&gt; structure layout. Happy Coding! 🦞&lt;/p&gt;




</description>
      <category>agents</category>
      <category>automation</category>
      <category>documentation</category>
      <category>github</category>
    </item>
    <item>
      <title>Security Declaration for AI Agents: Deep Dive into A2AS (Agent-to-Agent Security) Certification Mechanism</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:45:10 +0000</pubDate>
      <link>https://dev.to/evanlin/security-declaration-for-ai-agents-deep-dive-into-a2as-agent-to-agent-security-certification-2okf</link>
      <guid>https://dev.to/evanlin/security-declaration-for-ai-agents-deep-dive-into-a2as-agent-to-agent-security-certification-2okf</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FA2AS-CERTIFIED-f3af80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FA2AS-CERTIFIED-f3af80" alt="A2AS-CERTIFIED" width="110" height="20"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://a2as.org" rel="noopener noreferrer"&gt;A2AS.org Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.a2as.org/certified/agents/kkdai/linebot-adk" rel="noopener noreferrer"&gt;linebot-adk Project Certification Page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents an interesting Pull Request I received while maintaining &lt;strong&gt;linebot-adk (LINE Bot Agent Development Kit)&lt;/strong&gt;: adding the &lt;strong&gt;A2AS security certificate&lt;/strong&gt; to the project. This is not just a YAML file, but a significant milestone for AI Agents to move towards "industrial-grade security" in 2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0eydihb3vmxsrfh8k20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0eydihb3vmxsrfh8k20.png" alt="Google Chrome 2026-03-26 22.45.44" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;When we develop Agents like &lt;code&gt;linebot-adk&lt;/code&gt; that have Tool Use (Function Calling) capabilities, the biggest concern for users is often: "Will this Agent issue commands without my permission?" or "What data can it access?".&lt;/p&gt;

&lt;p&gt;Traditionally, we could only write explanations in &lt;code&gt;README.md&lt;/code&gt;, but that's for humans to read, not for systems to verify. This is why &lt;strong&gt;A2AS (Agent-to-Agent Security)&lt;/strong&gt; emerged: it has been hailed as the "HTTPS of the AI world".&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Understanding the BASIC Model of A2AS
&lt;/h2&gt;

&lt;p&gt;A2AS is not just a name; it has a complete &lt;strong&gt;BASIC security model&lt;/strong&gt; behind it, designed to solve the trust issue between AI Agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;(B)ehavior Certificates&lt;/strong&gt;: Declarative certificates that clearly define the behavior boundaries of the Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(A)uthenticated Prompts&lt;/strong&gt;: Ensures that the source of prompts is trustworthy and traceable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(S)ecurity Boundaries&lt;/strong&gt;: Uses structured tags (such as &lt;code&gt;&amp;lt;a2as:user&amp;gt;&lt;/code&gt;) to isolate untrusted input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(I)n-Context Defenses&lt;/strong&gt;: Embeds defense logic in prompts to reject malicious injections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(C)odified Policies&lt;/strong&gt;: Writes business rules into code and enforces them during inference.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎨 Step 2: Deconstructing a2as.yaml – The Agent's ID Card
&lt;/h2&gt;

&lt;p&gt;In PR #1 received by &lt;code&gt;linebot-adk&lt;/code&gt;, the most important change was the addition of &lt;code&gt;a2as.yaml&lt;/code&gt;. This file acts as the Agent's "digital signature", making the code's logic explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;manifest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kkdai/linebot-adk&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main.py&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;multi_tool_agent/agent.py&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;issued&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A2AS.org&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://a2as.org/certified/agents/kkdai/linebot-adk&lt;/span&gt;

&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;root_agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;instance&lt;/span&gt;
    &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;get_weather&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_current_time&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why is this important?
&lt;/h3&gt;

&lt;p&gt;This certificate is directly linked to the content of our &lt;code&gt;main.py&lt;/code&gt;. When the certificate declares &lt;code&gt;tools: [get_weather, get_current_time]&lt;/code&gt;, it means this is a &lt;strong&gt;limited-authorization&lt;/strong&gt; Agent. If it tries to execute &lt;code&gt;delete_database&lt;/code&gt;, the security monitoring system can immediately detect that it is outside the certificate scope.&lt;/p&gt;
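The enforcement idea can be sketched in a few lines. This is a minimal illustration of my own, not official A2AS tooling; the manifest dict mirrors the a2as.yaml shown above:

```python
# Parsed form of the certificate's agents section (mirrors a2as.yaml above).
manifest = {
    "agents": {
        "root_agent": {
            "models": ["gemini-2.5-flash"],
            "tools": ["get_weather", "get_current_time"],
        }
    }
}

def is_tool_allowed(agent: str, tool: str) -> bool:
    """Return True only if the certificate declares this tool for this agent."""
    declared = manifest["agents"].get(agent, {}).get("tools", [])
    return tool in declared

print(is_tool_allowed("root_agent", "get_weather"))      # True
print(is_tool_allowed("root_agent", "delete_database"))  # False
```

A monitoring layer that gates every tool call through a check like this can reject out-of-scope actions before they execute.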




&lt;h2&gt;
  
  
  🌐 Step 3: Combining Code Logic
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;linebot-adk&lt;/code&gt;, we used Google's &lt;strong&gt;ADK (Agent Development Kit)&lt;/strong&gt; to build the Agent. The A2AS certificate maps accurately onto our program architecture:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool Declaration and Implementation
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;multi_tool_agent/agent.py&lt;/code&gt;, we defined two tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement the logic to get the weather
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement the logic to get the time
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The A2AS certificate will register these &lt;code&gt;function&lt;/code&gt;s in the &lt;code&gt;tools&lt;/code&gt; block, ensuring that the Agent's capability boundaries are transparent and auditable.&lt;/p&gt;
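One way to make that guarantee concrete is a small consistency audit between the certificate and the code. The sketch below is illustrative only: the stub bodies and the audit logic are mine, not part of ADK or A2AS:

```python
import inspect

# Stubs standing in for the real tools in multi_tool_agent/agent.py.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}   # stub for illustration

def get_current_time(city: str) -> dict:
    return {"city": city, "time": "12:00"}       # stub for illustration

# Tool names as declared in the certificate's agents.root_agent.tools block.
declared_tools = ["get_weather", "get_current_time"]
implemented = {"get_weather": get_weather, "get_current_time": get_current_time}

# Flag any declared tool that is missing or has an unexpected parameter list.
missing = [t for t in declared_tools if t not in implemented]
bad_signature = [t for t in declared_tools
                 if t in implemented
                 and list(inspect.signature(implemented[t]).parameters) != ["city"]]
print("missing:", missing, "bad signature:", bad_signature)
```

Running such a check in CI would catch the "code changed but certificate didn't" drift described later in this article.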

&lt;h3&gt;
  
  
  2. Runner and Execution Loop
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;main.py&lt;/code&gt;, we start the Agent through &lt;code&gt;Runner&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;APP_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;manifest.subject.scope&lt;/code&gt; in the certificate marks &lt;code&gt;main.py&lt;/code&gt;, which means the entire startup process (including FastAPI's Webhook processing) is within the A2AS compliant scope.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Step 4: Why is this the "HTTPS of the AI world"?
&lt;/h2&gt;

&lt;p&gt;Imagine if you want a "travel agent Agent" to talk to a "hotel reservation Agent".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without A2AS&lt;/strong&gt;: The travel Agent can only "blindly trust" the hotel Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With A2AS&lt;/strong&gt;: The travel Agent can first check the other party's &lt;code&gt;a2as.yaml&lt;/code&gt; certificate. If the other party claims to have the right to "modify orders" but the certificate doesn't say so, the travel Agent can refuse the transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This &lt;strong&gt;"verify first, then execute"&lt;/strong&gt; model is the trust network that A2AS wants to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Common Pitfalls and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ What if the certificate expires or the Commit Hash doesn't match?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt; A2AS certificates are bound to a specific Git Commit. When you modify the logic of &lt;code&gt;agent.py&lt;/code&gt; but don't update the certificate, the verification will fail. &lt;strong&gt;Correction:&lt;/strong&gt; Every time you modify the core functions of the Agent (such as adding a Tool or changing the Model), you must regenerate and sign &lt;code&gt;a2as.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❓ Does using A2AS increase latency?
&lt;/h3&gt;

&lt;p&gt;No. A2AS is primarily a declarative, structured specification. During inference, its structured tags (the "S" in the BASIC model) help the LLM distinguish instructions from data, which reduces hallucinations caused by that confusion and can even improve execution reliability.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Through the introduction of this A2AS certificate, &lt;code&gt;linebot-adk&lt;/code&gt; is no longer just a simple LINE Bot example; it has become a transparent Agent that meets the 2026 security standards. In an era where AI agents are gradually penetrating our lives, "transparency" is the best defense.&lt;/p&gt;

&lt;p&gt;If you are also developing AI Agents, you might as well go to &lt;a href="https://a2as.org" rel="noopener noreferrer"&gt;A2AS.org&lt;/a&gt; and add that badge of trust to your project. Happy Coding! 🦞&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Deploying OpenClaw on Google Cloud VM: Avoiding Sudo and NVM Pitfalls</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 01 Mar 2026 14:04:54 +0000</pubDate>
      <link>https://dev.to/gde/deploying-openclaw-on-google-cloud-vm-avoiding-sudo-and-nvm-pitfalls-92k</link>
      <guid>https://dev.to/gde/deploying-openclaw-on-google-cloud-vm-avoiding-sudo-and-nvm-pitfalls-92k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatem7u193qqdcdo7bfox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatem7u193qqdcdo7bfox.png" alt="OpenClaw on GCP" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;(Image generated by &lt;a href="https://github.com/kkdai/nanobanana" rel="noopener noreferrer"&gt;Nano Banana&lt;/a&gt; - Gemini Image Generation)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yu-wenhao.com/zh-TW/blog/openclaw-tools-skills-tutorial/" rel="noopener noreferrer"&gt;OpenClaw Practical Tutorial: Chinese FAQ and Recommended Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yu-wenhao.com/zh-TW/blog/2026-02-04-is-openclaw-safe-security-guide/" rel="noopener noreferrer"&gt;OpenClaw Security Guide: Security Enhancement Recommendations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtu.be/FC3Wo3ew130" rel="noopener noreferrer"&gt;YouTube Tutorial: Deploying OpenClaw on GCP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents the complete solution process for the permission, environment variable, and process persistence issues encountered when installing &lt;strong&gt;OpenClaw (2026 Latest Version)&lt;/strong&gt; in a Debian/Ubuntu environment on Google Cloud Platform (GCP).&lt;/p&gt;

&lt;h1&gt;
  
  
  Preface
&lt;/h1&gt;

&lt;p&gt;The AI Agent field has been booming recently. &lt;strong&gt;OpenClaw&lt;/strong&gt;, an open-source AI agent that can operate 24 hours a day, has impressed people with its powerful system access and browsing capabilities. For security reasons, deploying it on a cloud VM (such as a GCP GCE instance) is the ideal approach: it ensures 24/7 availability and isolates the agent from sensitive local data.&lt;/p&gt;

&lt;p&gt;However, in GCP's default Debian/Ubuntu environment, the permission mechanism differs slightly from a typical desktop Linux, so following the official installation script often leads to pitfalls.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Basic Installation Process of OpenClaw on GCP
&lt;/h2&gt;

&lt;p&gt;Before we get into troubleshooting, let's quickly go through the standard installation logic:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create a VM Instance
&lt;/h3&gt;

&lt;p&gt;Create a new VM in the GCP Console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine type&lt;/strong&gt;: Recommended &lt;code&gt;e2-small&lt;/code&gt; or &lt;code&gt;e2-medium&lt;/code&gt; (depending on your Agent load).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operating system&lt;/strong&gt;: Recommended to choose &lt;strong&gt;Ubuntu 24.04 LTS&lt;/strong&gt; or &lt;strong&gt;Debian 12&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard disk&lt;/strong&gt;: Recommended 20GB or more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Connect and Basic Updates
&lt;/h3&gt;

&lt;p&gt;After entering the VM via SSH, first perform a system update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt upgrade -y
sudo apt install -y git curl build-essential

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Officially Install OpenClaw
&lt;/h3&gt;

&lt;p&gt;The official website provides a one-click installation script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://openclaw.ai/install.sh | bash

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;But!&lt;/strong&gt; If you directly execute the above script, you will usually encounter the following two serious permission and path problems on GCP.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Problem 1: "HAL 9000" Style Denial of sudo-rs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; When executing the official installation script, the following error is encountered with &lt;code&gt;sudo-rs&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;sudo-rs: I'm sorry evanslin. I'm afraid I can't do that&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Interaction Restriction&lt;/strong&gt;: The script executed via &lt;code&gt;curl ... | bash&lt;/code&gt; cannot obtain password input from the terminal when &lt;code&gt;sudo&lt;/code&gt; is required.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;No Password Account&lt;/strong&gt;: GCP defaults to using SSH Key login, and the user account usually does not have a physical password set, leading to &lt;code&gt;sudo&lt;/code&gt; authentication failure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use &lt;strong&gt;NVM (Node Version Manager)&lt;/strong&gt; to install Node.js, and build the environment under the user directory, completely avoiding the &lt;code&gt;sudo&lt;/code&gt; requirement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Install NVM
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload shell configuration
source ~/.bashrc

# 2. Install Node.js
nvm install node # Recommended version v25.7.0+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Problem 2: NVM Path and Environment Variables
&lt;/h2&gt;

&lt;p&gt;After using NVM, although &lt;code&gt;sudo&lt;/code&gt; is avoided, a new problem arises: when you log in again or execute commands using a non-interactive shell, the system may not be able to find the &lt;code&gt;node&lt;/code&gt; or &lt;code&gt;openclaw&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;This is because the NVM path is dynamically loaded. It is recommended to ensure that the following content exists in &lt;code&gt;~/.bashrc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] &amp;amp;&amp;amp; \. "$NVM_DIR/bash_completion"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
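Note that the spaces inside the brackets are mandatory in shell: [-s is parsed as a (nonexistent) command name, while [ -s FILE ] invokes the test builtin. A quick self-contained demonstration (the temp file name is arbitrary):

```shell
# "[ -s FILE ]" is true only when FILE exists and is non-empty; the spaces
# around the brackets are required because "[" is itself a command name.
touch /tmp/nvm_demo_empty            # create an empty file
if [ -s /tmp/nvm_demo_empty ]; then
  echo "non-empty"
else
  echo "empty or missing"
fi
```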






&lt;h2&gt;
  
  
  🛠️ Problem 3: How to Make OpenClaw Run 24/7 Stably?
&lt;/h2&gt;

&lt;p&gt;After installation, to keep the Agent running after closing the SSH window, I switched from GCP's Web SSH to the local &lt;code&gt;gcloud&lt;/code&gt; CLI, and promptly hit another small pitfall.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Why Can't gcloud ssh Find openclaw?
&lt;/h3&gt;

&lt;p&gt;This is usually because GCP's &lt;code&gt;gcloud compute ssh&lt;/code&gt; may create a new username based on your &lt;strong&gt;local account name&lt;/strong&gt;, instead of using the account you used when installing on the VM (e.g., &lt;code&gt;evanslin&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification method:&lt;/strong&gt; Please enter the following in the "Web SSH" and "Local gcloud SSH" windows respectively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whoami

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; If the web version shows &lt;code&gt;evanslin&lt;/code&gt;, but the gcloud version shows a name like &lt;code&gt;evan_lin_yourdomain_com&lt;/code&gt;, then the home directory paths of the two are completely different, and your NVM and OpenClaw settings will of course "disappear".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; When executing the &lt;code&gt;gcloud&lt;/code&gt; command, &lt;strong&gt;explicitly specify&lt;/strong&gt; the account to log in to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh evanslin@openclaw-evanlin

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will ensure that you return to the correct environment!&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use tmux and Startup Script to Achieve Perfect Execution
&lt;/h3&gt;

&lt;p&gt;In order to ensure that environment variables can be loaded correctly in any SSH session (web version or gcloud version), and to keep OpenClaw running stably in the background, it is recommended to use the following "scripted" startup method.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Create a Startup Script
&lt;/h4&gt;

&lt;p&gt;In a window where you can normally execute &lt;code&gt;openclaw&lt;/code&gt; (usually Web SSH), create a startup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt; 'EOF' &amp;gt; ~/start_openclaw.sh
#!/bin/bash
# 1. Force loading NVM path
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"

# 2. Automatically correct PATH (please adjust the path according to your Node version)
export PATH="$HOME/.nvm/versions/node/v25.7.0/bin:$PATH"

# 3. Execute command
openclaw "$@"
EOF

# Grant execution permission
chmod +x ~/start_openclaw.sh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Verify the Script
&lt;/h4&gt;

&lt;p&gt;From now on, no matter where you log in from, please use this script uniformly. Test in the &lt;code&gt;gcloud ssh&lt;/code&gt; window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/start_openclaw.sh gateway

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it runs successfully, the PATH has been wired up correctly!&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Combine tmux to Solve the Disconnection Problem
&lt;/h4&gt;

&lt;p&gt;Now we combine the script with &lt;code&gt;tmux&lt;/code&gt; to achieve true 24/7 background operation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Open a new session&lt;/strong&gt;: &lt;code&gt;tmux new -s openclaw&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Execute the script inside&lt;/strong&gt;: &lt;code&gt;~/start_openclaw.sh gateway&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Perfectly detach&lt;/strong&gt;: Press &lt;code&gt;Ctrl + B&lt;/code&gt; and release, then press &lt;code&gt;D&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Reconnect at any time&lt;/strong&gt;: Next time you log in, execute &lt;code&gt;tmux a -t openclaw&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The key to deploying OpenClaw on GCP is &lt;strong&gt;"user directory first"&lt;/strong&gt;. Using NVM to sidestep the system-level &lt;code&gt;sudo-rs&lt;/code&gt; restriction not only makes installation smoother, but also makes it easier to switch Node.js versions to meet OpenClaw's latest requirements.&lt;/p&gt;

&lt;p&gt;After successful deployment, don't forget to use &lt;code&gt;openclaw onboard&lt;/code&gt; to start configuring your API Keys and communication channels (such as Telegram or Discord).&lt;/p&gt;

&lt;p&gt;I hope this note can help developers who are also working hard on GCP. See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>google</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Sharing Good Books: Secrets to Successful WFH</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 28 Feb 2026 16:38:19 +0000</pubDate>
      <link>https://dev.to/evanlin/sharing-good-books-secrets-to-successful-wfh-5g8j</link>
      <guid>https://dev.to/evanlin/sharing-good-books-secrets-to-successful-wfh-5g8j</guid>
      <description>&lt;p&gt;&lt;a href="http://moo.im/a/7opqFR" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pw9cfbmn02qn16w62a1.jpg" width="210" height="298"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WFH在家工作的成功祕訣
美國中小企業最佳CEO教你高效、彈性、具團隊精神的企業競爭新優勢
How to Thrive in the Virtual Workplace : Simple and Effective Tips for Successful， Productive and Empowered Remote Work

作者： 羅伯特・格雷瑟 米克・史隆 原文作者： Robert Glazer Mick Sloan 譯者： 孟令函 出版社：遠流出版

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Purchase Recommendation Website:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://moo.im/a/7opqFR" rel="noopener noreferrer"&gt;Readmoo Online Book Purchase&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Preface:
&lt;/h1&gt;

&lt;p&gt;This is the eleventh book I've read this year. Since the global outbreak of the pandemic in 2020, my company has been accelerating its transformation into a Hybrid Office: flexible office seating plus flexible remote work. At the height of the pandemic, we even moved to full-time WFH.&lt;/p&gt;

&lt;p&gt;Whether you are an employee or a supervisor, do you dread WFH or enjoy it? Perhaps you like skipping the commute, yet worry that your home lacks proper equipment and that you miss real interaction with colleagues. This book caught my eye, so I bought it and read it.&lt;/p&gt;

&lt;h1&gt;
  
  
  Content Introduction:
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a virtual office and enhance future competitiveness!
"If you are still struggling with remote work, Robert Glazer can provide you with some immediately actionable advice." - Adam Grant (Professor at Wharton Business School, author of "Give and Take")

When millions of office workers around the world were suddenly forced to work from home (WFH, Work From Home) to prevent the pandemic, business owners found that employees were more willing to accept it than they had previously understood, and most of the work content could still operate normally. However, not every company and every office worker can smoothly transition overnight, and it's not enough to simply apply the work procedures and strategies commonly used in physical offices. In the future, as remote or hybrid work models become more and more common, companies that do well will have a clear competitive advantage and attract the best talent.

As the founder and CEO of "Acceleration Partners," a 100% remote-work organization with 170 employees working from home, Robert Glazer has drawn on more than ten years of valuable experience to distill the right principles, strategies, and tools for managing remote employees, allowing companies to excel in both the virtual and physical worlds.

Office workers will from now on:
✔ Don't have to commute, stay away from the pressure of high housing prices and high rents in the city
✔ Not be disturbed, create their own work schedule and environment
✔ Enjoy the ideal life of balancing family, interests, and work

Companies can even:
✔ Save costs, or can invest more resources in employees and customers
✔ Improve efficiency, and can achieve excellent performance and work results globally
✔ Create an equal and cohesive work environment, retaining outstanding talent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Chapter Outline
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Part 1: The Winning Mindset for Remote Workers
&lt;/h2&gt;

&lt;p&gt;What exactly is remote work? It's not a product of the pandemic. Before the pandemic, many companies needed businesses or customer service marketing personnel around the world. But they couldn't afford to set up physical offices in every region. The result was that employees came from all over the world and could work from their own homes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Winning Mindset for Remote Workers&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Recruit diligent, responsible, and self-disciplined employees&lt;/li&gt;
&lt;li&gt;Give them enough trust&lt;/li&gt;
&lt;li&gt;Perfect work procedures&lt;/li&gt;
&lt;li&gt;Excellent company culture&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Basics for Remote Workers&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Develop a work plan and execute it effectively&lt;/li&gt;
&lt;li&gt;Create a suitable work environment&lt;/li&gt;
&lt;li&gt;Establish a clear boundary between work and personal life&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Properly Manage Your Email
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;With remote work, the volume of email exchanges increases.&lt;/li&gt;
&lt;li&gt;Letting others know your expected reply cadence is very important.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Methods to Improve Work Efficiency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Allocate energy well&lt;/li&gt;
&lt;li&gt;Create a buffer before and after work&lt;/li&gt;
&lt;li&gt;Prioritize and allocate time.&lt;/li&gt;
&lt;li&gt;Establish expectations&lt;/li&gt;
&lt;li&gt;Stay focused

&lt;ul&gt;
&lt;li&gt;Try to focus on one thing for at least 15 to 20 minutes a day.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Take care of yourself

&lt;ul&gt;
&lt;li&gt;Physical and mental health is very important, don't ruin your health because of WFH.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Establish communication between people

&lt;ul&gt;
&lt;li&gt;Create some chat channels&lt;/li&gt;
&lt;li&gt;Allow more participants to speak in meetings.&lt;/li&gt;
&lt;li&gt;Make good use of asynchronous video (use videos instead of emails or announcements)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Changing Work Location
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Make sure the company has an office in that region (country); otherwise there may be problems with salary remittances.&lt;/li&gt;
&lt;li&gt;Because labor laws and tax rates differ between countries, employee benefits and labor regulations differ as well.&lt;/li&gt;
&lt;li&gt;Changing countries may result in salary differences, since pay is adjusted to the cost of living in each location.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Part 2: The Success Rules for Remote Work Companies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Starting from Organizational Culture
&lt;/h3&gt;

&lt;p&gt;Since remote work companies care a lot about employees' autonomous motivation, every colleague needs an in-depth understanding of the organizational culture (and must be able to deeply identify with it).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Company Culture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vision&lt;/li&gt;
&lt;li&gt;Values&lt;/li&gt;
&lt;li&gt;Goals&lt;/li&gt;
&lt;li&gt;Consistency&lt;/li&gt;
&lt;li&gt;Clear and explicit&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to Describe the Core Concept of the Company:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pick a specific point in the future and, in the tone of describing present facts, detail as much as possible what the company and its employees will be like at that time, and how they will feel.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use the Core Concept:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Recruiting employees&lt;/li&gt;
&lt;li&gt;Major policy decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Recruit Suitable Remote Employees
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ask the other party if they have remote work experience&lt;/li&gt;
&lt;li&gt;Whether they agree with the core concept&lt;/li&gt;
&lt;li&gt;Look at the other party's concept and handling methods for remote work&lt;/li&gt;
&lt;li&gt;You can ask detailed questions

&lt;ul&gt;
&lt;li&gt;Do you like remote work -&amp;gt; Why do you like it -&amp;gt; How do you arrange it -&amp;gt; Self-adjustment&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Conduct Remote Interviews
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fact-based interview questions&lt;/li&gt;
&lt;li&gt;What changes does remote work bring&lt;/li&gt;
&lt;li&gt;Are you troubled because you can't work face-to-face?&lt;/li&gt;
&lt;li&gt;How to communicate effectively without meeting&lt;/li&gt;
&lt;li&gt;How to avoid feeling isolated while working from home
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Don't waste training resources on someone who only meets the average standard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notes for Remote Work Colleagues
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Complete onboarding process

&lt;ul&gt;
&lt;li&gt;1 on 1 with each supervisor&lt;/li&gt;
&lt;li&gt;Ice-breaking with colleagues&lt;/li&gt;
&lt;li&gt;Setting up equipment&lt;/li&gt;
&lt;li&gt;Related pre-onboarding education&lt;/li&gt;
&lt;li&gt;More distinctive items:&lt;/li&gt;
&lt;li&gt;Introduction to company regulations (especially those related to remote work)&lt;/li&gt;
&lt;li&gt;Introduction to company culture (to keep everyone constantly aligned on the same core concept)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Reduce meetings, especially regular ones; switch to irregular, fast, and concise discussions with a small number of participants

&lt;ul&gt;
&lt;li&gt;Have participants rate how much they need to attend; if a meeting scores less than six points, cancel it.&lt;/li&gt;
&lt;li&gt;Everyone participating in the meeting must speak&lt;/li&gt;
&lt;li&gt;Meeting summaries are very important (to avoid someone not being able to participate)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Etiquette in different time zones

&lt;ul&gt;
&lt;li&gt;Emails and messages should clearly indicate the relevant time zone (if possible, convert times to the recipient's time zone).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Travel strategy

&lt;ul&gt;
&lt;li&gt;Business trips are expected to decrease after the pandemic&lt;/li&gt;
&lt;li&gt;Travel becomes more individual: more face-to-face meetings with fewer people&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Team camaraderie

&lt;ul&gt;
&lt;li&gt;Deepen camaraderie through regular meetings and casual chats after meetings.&lt;/li&gt;
&lt;li&gt;Play some online games&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Performance management

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Compared to on-site work, remote work requires more feedback and suggestions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Don't save your opinions for performance evaluations.&lt;/li&gt;
&lt;li&gt;Frequent feedback increases the sense of trust between colleagues.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Praise, praise immediately!&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Responsible culture

&lt;ul&gt;
&lt;li&gt;Avoid the strategy of close monitoring&lt;/li&gt;
&lt;li&gt;Track progress through weekly reports or regular daily reports instead.&lt;/li&gt;
&lt;li&gt;Give more trust and care appropriately.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Trust crisis:

&lt;ul&gt;
&lt;li&gt;If any violations occur, they need to be handled immediately.&lt;/li&gt;
&lt;li&gt;Announce the incident (withholding the name, describing only the violation) as a reminder to colleagues&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Physical employee conference

&lt;ul&gt;
&lt;li&gt;Working remotely does not mean never meeting; you can arrange for everyone to gather in the same place once a year.&lt;/li&gt;
&lt;li&gt;Connect emotionally and synchronize company culture&lt;/li&gt;
&lt;li&gt;This helps people work together more smoothly&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  Thoughts
&lt;/h1&gt;

&lt;p&gt;This book was written by the founder of a startup accelerator whose company has long been fully remote. It clearly explains remote work, before and during the pandemic, from two major perspectives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As a remote worker, how should you adjust?&lt;/li&gt;
&lt;li&gt;As a manager, how should you manage your all-remote team? (or even a full-remote company)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This book gives remote workers the psychological preparation they need. After all, remote work is not just about saving commuting time; it is a wholesale transformation of the work model. Remote work demands a higher degree of self-discipline and proactivity, which is what lets supervisors and colleagues trust you and feel at ease. It is even more necessary to balance life and work, to avoid blurring the lines between them because you work from home, which can lead to early burnout.&lt;/p&gt;

&lt;p&gt;As a manager, you need to pay even more attention to company culture and core concepts. Because employees are scattered everywhere, they cannot absorb the banners and slogans that decorate an office, so you need to communicate that information frequently, and be especially careful when recruiting: not every employee can understand and properly use the benefits remote work brings. The book spends a lot of time teaching how to build corporate culture and core concepts remotely, which gave me a much deeper understanding.&lt;/p&gt;

&lt;p&gt;Finally, whether you are a prospective remote worker or a manager who may lead one, this book can help you.&lt;/p&gt;

</description>
      <category>career</category>
      <category>management</category>
      <category>productivity</category>
      <category>resources</category>
    </item>
    <item>
      <title>LINE Bot with Long Memory: Firebase Database, Gemini Pro, and Cloud Functions</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 28 Feb 2026 16:38:04 +0000</pubDate>
      <link>https://dev.to/evanlin/line-bot-with-long-memory-firebase-database-gemini-pro-and-cloud-functions-455j</link>
      <guid>https://dev.to/evanlin/line-bot-with-long-memory-firebase-database-gemini-pro-and-cloud-functions-455j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0ewdqrz4j8ctmd78pjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0ewdqrz4j8ctmd78pjh.png" alt="image-20240413210750427" width="800" height="1731"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Preface:
&lt;/h1&gt;

&lt;p&gt;This is the second in a series of articles for the BUILD WITH AI (BWAI) WORKSHOP, held in collaboration with the Google Developer Group on 04/18 (I don't yet know how many more articles will be needed).&lt;/p&gt;

&lt;p&gt;This article will focus on the following aspects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Firebase Database setup&lt;/li&gt;
&lt;li&gt;How to access Firebase through the official Golang on Cloud Function&lt;/li&gt;
&lt;li&gt;Using the Firebase Database to make Gemini remember everything that has been said, optimizing the LINE Bot built &lt;a href="https://dev.to/evanlin/bwai-workshopgolang-line-oa-cloudfunction-geminipro-firebase-lu-xing-xiao-bang-shou-line-liao-tian-ji-qi-ren-23j9-temp-slug-2266421"&gt;last time&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Article List:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/evanlin/bwai-workshopgolang-line-oa-cloudfunction-geminipro-firebase-lu-xing-xiao-bang-shou-line-liao-tian-ji-qi-ren-23j9-temp-slug-2266421"&gt;[BwAI workshop][Golang] LINE OA + CloudFunction + GeminiPro + Firebase = Travel Assistant LINE Chatbot (1): Scenery Recognition Assistant&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;[BwAI workshop][Golang] LINE OA + CloudFunction + GeminiPro + Firebase = Travel Assistant LINE Chatbot (2): Firebase Database gives LINEBot a super long memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Preparation
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://developers.line.biz/en/" rel="noopener noreferrer"&gt;LINE Developer Account&lt;/a&gt;&lt;/strong&gt;: You only need a LINE account to apply for a developer account.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/functions?hl=zh_cn" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Functions&lt;/strong&gt;&lt;/a&gt;: The &lt;strong&gt;deployment platform&lt;/strong&gt; for Go code, generating the webhook address for LINEBot.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://firebase.google.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Firebase&lt;/strong&gt;&lt;/a&gt;: Create a &lt;strong&gt;Realtime database&lt;/strong&gt;, LINE Bot can remember your previous conversations and even answer many interesting questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;&lt;/strong&gt;: You can get the Gemini Key here.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Applying for Firebase Database Service
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Remember to go to &lt;a href="https://console.firebase.google.com/" rel="noopener noreferrer"&gt;Firebase Console&lt;/a&gt; and create a project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a Firebase Realtime Database, which will be used later&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the US region&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start in “locked mode”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For ease of development, set the “Rules” to allow read and write. Pay close attention:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
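A minimal sketch of what that permissive development rule set looks like in the Realtime Database “Rules” tab (again: remember to lock this down before going live):

```json
{
  "rules": {
    ".read": true,
    ".write": true
  }
}
```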

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2umoeo1n59unkzyx7cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2umoeo1n59unkzyx7cf.png" alt="image-20240413213202354" width="633" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remember the URL (Note! &lt;strong&gt;You need to change the permissions back before going live&lt;/strong&gt;), and add an item: “&lt;strong&gt;BwAI&lt;/strong&gt;”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouj8ilfmmmoah9bt50u9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouj8ilfmmmoah9bt50u9.png" alt="image-20240413213802313" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying for a Service Account Credential to Connect Cloud Functions to Google Services
&lt;/h2&gt;

&lt;p&gt;You can actually refer to another article of mine for this part of the tutorial. &lt;a href="https://www.evanlin.com/til-heroku-gcp-key/" rel="noopener noreferrer"&gt;[Learning Document] How to use Golang to access Google Cloud services on Heroku&lt;/a&gt;, but I'll quickly go through it here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enter the Google Cloud Console, go to IAM &amp;amp; Admin, and select Create Service Account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq73jo7sv5juu64eome4o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq73jo7sv5juu64eome4o.png" alt="image-20240413221505536" width="444" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the Service Account name yourself, but pay attention: the project and the Firebase &lt;strong&gt;project names must be consistent&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ys6mumbx82iiwf69zq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ys6mumbx82iiwf69zq.png" alt="image-20240413222847247" width="651" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grant this service account access to the project. For the role, it is easiest to start with Editor (a broad role, so use it with caution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F541gygfqb74m3zava6ed.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F541gygfqb74m3zava6ed.png" alt="image-20240413223055288" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Grant users access to this service account” does not need any specific settings&lt;/li&gt;
&lt;li&gt;Press “Manage Keys” to prepare to download the credential&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vgtbig2kdgql9sxhwt8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vgtbig2kdgql9sxhwt8.png" alt="image-20240413223225404" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Add Key -&amp;gt; Create new Key -&amp;gt; Download JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftiutm3is5eitmz93w6a9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftiutm3is5eitmz93w6a9.png" alt="image-20240413223613244" width="555" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to note when using Golang Google Options package:
&lt;/h2&gt;

&lt;p&gt;Although the Firebase Realtime Database has been set to allow everyone to read and write, accessing it through Golang may still produce an Unauthorized request error. This happens because the project in your credential JSON file differs from your Firebase project. Just recreate a Service Account under the correct project and update the JSON content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao9k6eymdu1cienk09hb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao9k6eymdu1cienk09hb.png" alt="image-20240413220630196" width="800" height="62"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  How to import a Service Account Credential in Google Cloud Functions?
&lt;/h1&gt;

&lt;p&gt;Next, I will share how to use the credential correctly within Cloud Functions. If you try to have the Cloud Function open the credential JSON file directly, you will always get an error saying the credential cannot be obtained.&lt;/p&gt;

&lt;p&gt;At this time, you need to add it through environment variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy all the content in the JSON file&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; environment variable, then paste all the content into it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbhayrqqfametdn4v8w2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbhayrqqfametdn4v8w2.png" alt="image-20240413225710980" width="247" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next, let's look at how to modify the relevant code:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // Requires imports: context, log, os,
    // firebase "firebase.google.com/go/v4", "google.golang.org/api/option"
    // Init firebase related variables
    ctx := context.Background()
    opt := option.WithCredentialsJSON([]byte(os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")))
    config := &amp;amp;firebase.Config{DatabaseURL: os.Getenv("FIREBASE_URL")}
    app, err := firebase.NewApp(ctx, config, opt)
    if err != nil {
        log.Fatalf("error initializing app: %v", err)
    }
    client, err := app.Database(ctx)
    if err != nil {
        log.Fatalf("error initializing database: %v", err)
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;First, &lt;code&gt;option.WithCredentialsJSON([]byte(os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")))&lt;/code&gt; allows you to read the credential from the environment variable.&lt;/li&gt;
&lt;li&gt;Next, &lt;code&gt;&amp;amp;firebase.Config{DatabaseURL: os.Getenv("FIREBASE_URL")}&lt;/code&gt; sets the FIREBASE_URL content.&lt;/li&gt;
&lt;li&gt;With this, the code executes correctly; next we will look at handling the Gemini chat history.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  How to correctly process Gemini Pro Chat History?
&lt;/h1&gt;

&lt;h1&gt;
  
  
  Full Source Code
&lt;/h1&gt;

&lt;p&gt;You can find the relevant open source code here: &lt;a href="https://github.com/kkdai/linebot-cf-firebase" rel="noopener noreferrer"&gt;https://github.com/kkdai/linebot-cf-firebase&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>go</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini: Building a LINE E-commerce Chatbot That Can "Tell Stories" from Images</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Thu, 26 Feb 2026 02:44:27 +0000</pubDate>
      <link>https://dev.to/gde/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-5dd9</link>
      <guid>https://dev.to/gde/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-5dd9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" alt="image-20260225234804185" width="800" height="860"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" alt="image-20260225234701217" width="800" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://ai.google.dev/gemini-api/docs/function-calling?hl=zh-tw#multimodal" rel="noopener noreferrer"&gt;Gemini API - Function Calling with Multimodal&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub: linebot-gemini-multimodel-funcal&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#mm-fr" rel="noopener noreferrer"&gt;Vertex AI - Multimodal Function Response&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Complete code &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;I believe many people have used the combination of LINE Bot + Function Calling. When a user asks "What clothes did I buy last month?", the Bot calls the database query function, retrieves the order data, and then Gemini answers based on that JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional process designed by developers:

User: "Help me take a look at the jacket I bought before"
Bot: [Call get_order_history()]
Function returns: {"product_name": "Brown pilot jacket", "order_date": "2026-01-15", ...}
Gemini: "You bought a brown pilot jacket on January 15th for NT$1,890."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer is completely correct, but it always feels like something is missing: the user is talking about "that jacket", while Gemini is just restating the text in the JSON and has no way to "confirm" what that piece of clothing looks like. If there happen to be three jackets in the database, the AI simply cannot determine which one the user remembers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI can read text, but cannot see images&lt;/strong&gt;: this limitation has always been a blind spot of the traditional Function Calling architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uneu8jv8mnj4t3576l3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uneu8jv8mnj4t3576l3.png" alt="image-20260225230645814" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This problem was not truly solved until Gemini launched &lt;strong&gt;Multimodal Function Response&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Multimodal Function Response?
&lt;/h2&gt;

&lt;p&gt;The traditional Function Calling process is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON] → Gemini → [Text answer]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multimodal Function Response&lt;/strong&gt; changed that middle step. The function can not only return JSON, but also include images (JPEG/PNG/WebP) or documents (PDF) in the same response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" alt="Google Chrome 2026-02-25 23.04.28" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON + image bytes] → Gemini → [Text answer after seeing the image]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini can "see" the structured data and images returned by the function at the same time when generating the next round of answers, thereby generating richer and more accurate responses.&lt;/p&gt;
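As a rough sketch of the payload shape (plain dictionaries standing in for the SDK's types, so the field names here are illustrative, not the official API), the function response now carries two kinds of parts: the structured JSON and the raw image bytes:

```python
import base64

def build_multimodal_function_response(name, data, image_bytes, mime_type="image/jpeg"):
    """Illustrative shape of a function response that carries both
    structured JSON data and raw image bytes back to the model."""
    return {
        "function_response": {
            "name": name,
            "response": data,
            "parts": [
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        # binary payloads are base64-encoded on the wire
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                }
            ],
        }
    }

resp = build_multimodal_function_response(
    "get_order_history",
    {"product_name": "Brown pilot jacket", "order_date": "2026-01-15"},
    b"\xff\xd8\xff",  # stand-in for real JPEG bytes
)
```

In the real SDK the same idea is expressed with the typed `FunctionResponsePart` / inline-data objects shown later in this article.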

&lt;p&gt;The media formats currently supported officially:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Supported format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;image/jpeg&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;, &lt;code&gt;image/webp&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;application/pdf&lt;/code&gt;, &lt;code&gt;text/plain&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The application scenarios of this function are very broad: e-commerce customer service (identifying product images), medical consultation (analyzing PDF of inspection reports), design review (giving suggestions based on screenshots)... almost all scenarios that require "functions to return visual data for AI analysis" are applicable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goals
&lt;/h2&gt;

&lt;p&gt;This time, I used Multimodal Function Response to create a &lt;strong&gt;LINE e-commerce customer service robot&lt;/strong&gt;, demonstrating the following scenario:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Help me take a look at the jacket I bought before"&lt;br&gt;
Bot (traditional): "You bought a brown pilot jacket."&lt;br&gt;
Bot (Multimodal): "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon, with metal zipper decorative pockets on the sides. This is your January 15th order ORD-2026-0115, a total of NT$1,890, and it has been delivered." + &lt;strong&gt;Product photo&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference is obvious: Gemini really "saw" that piece of clothing, rather than just restating the text in the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not use Google ADK?
&lt;/h3&gt;

&lt;p&gt;Originally, this repo used Google ADK (Agent Development Kit) to manage the Agent. The &lt;code&gt;Runner&lt;/code&gt; and &lt;code&gt;Agent&lt;/code&gt; of ADK encapsulated the entire process of Function Calling, which was very convenient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But Multimodal Function Response needs the image bytes manually included in the &lt;code&gt;parts&lt;/code&gt; of the function response, and ADK completely encapsulates this layer, so you cannot intervene.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So this time, I directly used &lt;code&gt;google.genai.Client&lt;/code&gt; to implement the iterative loop of function calls myself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Old architecture (ADK)
runner = Runner(agent=root_agent, ...)
async for event in runner.run_async(...):
    ... # ADK handles all function calls for you, but you cannot control the response content

# New architecture (directly use google.genai)
response = await client.aio.models.generate_content(
    model=model,
    contents=contents,
    config=types.GenerateContentConfig(tools=ECOMMERCE_TOOLS),
)
# Handle function calls yourself, include images yourself

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overall Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User
    │
    ▼ POST /
FastAPI Webhook Handler
    │
    ▼
EcommerceAgent.process_message(text, line_user_id)
    │
    ├─ ① Call Gemini (with conversation history)
    │
    ├─ ② Gemini decides to call the tool → function_call
    │
    ├─ ③ _execute_tool()
    │ ├─ Execute query function (search_products / get_order_history / get_product_details)
    │ └─ Read real product photos in the img/ directory (Unsplash JPEG)
    │
    ├─ ④ Construct Multimodal Function Response
    │ └─ FunctionResponsePart(inline_data=FunctionResponseBlob(data=image_bytes))
    │
    ├─ ⑤ Call Gemini again (Gemini sees the image + data)
    │
    └─ ⑥ Return (ai_text, image_bytes)
    │
    ▼
LINE Reply:
  TextSendMessage(text=ai_text)
  ImageSendMessage(url=BOT_HOST_URL/images/{uuid}) ← FastAPI /images endpoint provided

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Where do the product images come from?
&lt;/h3&gt;

&lt;p&gt;This demo uses real &lt;strong&gt;Unsplash clothing photos&lt;/strong&gt;. Each of the five products corresponds to an actual photo stored in the &lt;code&gt;img/&lt;/code&gt; directory, and the reading logic is trivial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_product_image(product: dict) -&amp;gt; bytes:
    """Read the product image and return JPEG bytes."""
    with open(product["image_path"], "rb") as f:
        return f.read()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each product in &lt;code&gt;PRODUCTS_DB&lt;/code&gt; has an &lt;code&gt;image_path&lt;/code&gt; field pointing to the corresponding image file:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product ID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P001&lt;/td&gt;
&lt;td&gt;Brown pilot jacket&lt;/td&gt;
&lt;td&gt;tobias-tullius-...-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P002&lt;/td&gt;
&lt;td&gt;White cotton T-shirt&lt;/td&gt;
&lt;td&gt;mediamodifier-...-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P003&lt;/td&gt;
&lt;td&gt;Dark blue denim jacket&lt;/td&gt;
&lt;td&gt;caio-coelho-...-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P004&lt;/td&gt;
&lt;td&gt;Beige knit shawl&lt;/td&gt;
&lt;td&gt;milada-vigerova-...-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P005&lt;/td&gt;
&lt;td&gt;Light blue simple T-shirt&lt;/td&gt;
&lt;td&gt;cristofer-maximilian-...-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The image bytes are used in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Attached as a &lt;code&gt;FunctionResponseBlob&lt;/code&gt; for Gemini to analyze - real photos let Gemini describe the actual fabric texture and tailoring details&lt;/li&gt;
&lt;li&gt; Temporarily stored in the &lt;code&gt;image_cache&lt;/code&gt; dict and served to the LINE Bot for display through the FastAPI &lt;code&gt;/images/{uuid}&lt;/code&gt; endpoint&lt;/li&gt;
&lt;/ol&gt;
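&lt;p&gt;The second use can be sketched as a tiny cache helper. This is a hypothetical sketch, not the repo's actual code; &lt;code&gt;BOT_HOST_URL&lt;/code&gt; and the helper name are placeholders:&lt;/p&gt;

```python
import uuid

# In-memory store mapping image ids to JPEG bytes (demo only, not persistent)
image_cache: dict[str, bytes] = {}
BOT_HOST_URL = "https://example.ngrok.io"  # placeholder host for illustration

def cache_image(image_bytes: bytes) -> str:
    """Store the bytes under a fresh UUID and return the public URL LINE will fetch."""
    image_id = str(uuid.uuid4())
    image_cache[image_id] = image_bytes
    return f"{BOT_HOST_URL}/images/{image_id}"
```

&lt;p&gt;The FastAPI &lt;code&gt;/images/{image_id}&lt;/code&gt; endpoint then looks the bytes up in &lt;code&gt;image_cache&lt;/code&gt; and returns them with &lt;code&gt;media_type="image/jpeg"&lt;/code&gt;.&lt;/p&gt;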




&lt;h2&gt;
  
  
  Core Code Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define Tools (FunctionDeclaration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.genai import types

ECOMMERCE_TOOLS = [
    types.Tool(function_declarations=[
        types.FunctionDeclaration(
            name="get_order_history",
            description="Query the current user's order history",
            parameters=types.Schema(
                type=types.Type.OBJECT,
                properties={
                    "time_range": types.Schema(
                        type=types.Type.STRING,
                        description="Time range: all / last_month / last_3_months",
                        enum=["all", "last_month", "last_3_months"],
                    ),
                },
                required=[],
            ),
        ),
        # ... search_products, get_product_details
    ])
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Function Call Loop (up to 5 iterations)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def process_message(self, text: str, line_user_id: str):
    contents = self._get_history(line_user_id) + [
        types.Content(role="user", parts=[types.Part(text=text)])
    ]

    for _iteration in range(5): # Up to 5 times, to prevent infinite loops
        response = await self._client.aio.models.generate_content(
            model=self._model,
            contents=contents,
            config=types.GenerateContentConfig(
                system_instruction=_SYSTEM_INSTRUCTION,
                tools=ECOMMERCE_TOOLS,
            ),
        )

        model_content = response.candidates[0].content
        contents.append(model_content)

        # Find all function_call parts
        fc_parts = [p for p in model_content.parts if p.function_call and p.function_call.name]

        if not fc_parts:
            # No function call → final text response
            final_text = "".join(p.text for p in model_content.parts if p.text)
            break

        # Has function call → execute tool, include image
        tool_parts = []
        for fc_part in fc_parts:
            result_dict, image_bytes = _execute_tool(
                fc_part.function_call.name,
                dict(fc_part.function_call.args),
                line_user_id,
            )
            tool_parts.append(
                self._build_multimodal_response(fc_part.function_call.name, result_dict, image_bytes)
            )

        contents.append(types.Content(role="tool", parts=tool_parts))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Construct Multimodal Function Response (the most critical step)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def _build_multimodal_response(self, func_name, result_dict, image_bytes):
    multimodal_parts = []

    if image_bytes:
        # ⚠️ Note: Here you need to use FunctionResponseBlob, not types.Blob!
        multimodal_parts.append(
            types.FunctionResponsePart(
                inline_data=types.FunctionResponseBlob(
                    mime_type="image/jpeg",
                    data=image_bytes, # raw bytes, SDK handles base64 internally
                )
            )
        )

    return types.Part.from_function_response(
        name=func_name,
        response=result_dict, # Structured JSON data
        parts=multimodal_parts or None, # ← Image is here! Gemini can "see" it after receiving it
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next &lt;code&gt;generate_content&lt;/code&gt; call, Gemini receives &lt;code&gt;result_dict&lt;/code&gt; (the order JSON) and &lt;code&gt;image_bytes&lt;/code&gt; (the product photo) together, so its answer can describe what the image actually shows.&lt;/p&gt;
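&lt;p&gt;Note that you hand the SDK raw bytes, not a base64 string; the base64 encoding for transport happens inside the SDK. A minimal stdlib illustration of that round trip:&lt;/p&gt;

```python
import base64

image_bytes = b"\xff\xd8\xff\xe0"  # sample bytes (a JPEG header prefix)

# What the SDK does internally: encode the inline_data to base64 for transport...
wire_payload = base64.b64encode(image_bytes).decode("ascii")
print(wire_payload)  # → /9j/4A==

# ...and the receiving side decodes back to the identical bytes.
assert base64.b64decode(wire_payload) == image_bytes
```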

&lt;h3&gt;
  
  
  Step 4: LINE Bot simultaneously returns text + image
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py

ai_text, image_bytes = await ecommerce_agent.process_message(msg_text, line_user_id)

reply_messages = [TextSendMessage(text=ai_text)]

if image_bytes:
    image_id = str(uuid.uuid4())
    image_cache[image_id] = image_bytes # Temporary storage
    image_url = f"{BOT_HOST_URL}/images/{image_id}" # FastAPI provides service
    reply_messages.append(
        ImageSendMessage(
            original_content_url=image_url,
            preview_image_url=image_url,
        )
    )

await get_line_bot_api().reply_message(event.reply_token, reply_messages)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LINE Bot's &lt;code&gt;reply_message&lt;/code&gt; supports replying with multiple messages at once (up to 5), so the text and the image can be sent together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;FunctionResponseBlob&lt;/code&gt; is not &lt;code&gt;Blob&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The easiest pitfall to hit: when constructing the multimodal image part, &lt;strong&gt;you cannot use &lt;code&gt;types.Blob&lt;/code&gt;&lt;/strong&gt;; you must use &lt;code&gt;types.FunctionResponseBlob&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ❌ Error (will TypeError)
types.FunctionResponsePart(
    inline_data=types.Blob(mime_type="image/jpeg", data=image_bytes)
)

# ✅ Correct
types.FunctionResponsePart(
    inline_data=types.FunctionResponseBlob(mime_type="image/jpeg", data=image_bytes)
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although both types have &lt;code&gt;mime_type&lt;/code&gt; and &lt;code&gt;data&lt;/code&gt; fields, the &lt;code&gt;inline_data&lt;/code&gt; field of &lt;code&gt;FunctionResponsePart&lt;/code&gt; is typed as &lt;code&gt;FunctionResponseBlob&lt;/code&gt;, so Pydantic validation rejects a &lt;code&gt;Blob&lt;/code&gt; outright. You can confirm this with &lt;code&gt;python -c "from google.genai import types; print(types.FunctionResponsePart.model_fields)"&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;aiohttp.ClientSession&lt;/code&gt; cannot be created at the module level
&lt;/h3&gt;

&lt;p&gt;The original code directly created &lt;code&gt;aiohttp.ClientSession()&lt;/code&gt; at the module level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ❌ Old method: module level
session = aiohttp.ClientSession() # If there is no running event loop, there will be a warning or error
async_http_client = AiohttpAsyncHttpClient(session)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When pytest imports &lt;code&gt;main.py&lt;/code&gt;, there is no running event loop, so &lt;code&gt;RuntimeError: no running event loop&lt;/code&gt; appears. The fix is lazy initialization: create the session only when it is first actually needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ✅ New method: lazy init
_line_bot_api = None

def get_line_bot_api():
    global _line_bot_api
    if _line_bot_api is None:
        session = aiohttp.ClientSession() # Called within the async route handler, ensuring there is an event loop
        _line_bot_api = AsyncLineBotApi(channel_access_token, AiohttpAsyncHttpClient(session))
    return _line_bot_api

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
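&lt;p&gt;The root cause is easy to reproduce with the standard library alone (this is an illustration, not code from the repo): outside a running event loop, &lt;code&gt;asyncio.get_running_loop()&lt;/code&gt; raises the same &lt;code&gt;RuntimeError&lt;/code&gt; that trips up a module-level &lt;code&gt;ClientSession()&lt;/code&gt;:&lt;/p&gt;

```python
import asyncio

# At import time (module level) no event loop is running yet,
# so anything that needs one, like aiohttp.ClientSession, fails here.
try:
    asyncio.get_running_loop()
except RuntimeError as exc:
    print(exc)  # "no running event loop"

async def main() -> None:
    # Inside a coroutine the loop exists, so lazy creation succeeds.
    loop = asyncio.get_running_loop()
    print(loop.is_running())  # True

asyncio.run(main())
```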



&lt;h3&gt;
  
  
  ❌ Pitfall 3: LINE Bot needs HTTPS URL to send images
&lt;/h3&gt;

&lt;p&gt;Gemini receives raw bytes, but LINE Bot's &lt;code&gt;ImageSendMessage&lt;/code&gt; needs a &lt;strong&gt;publicly accessible HTTPS URL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The solution is to add a &lt;code&gt;/images/{image_id}&lt;/code&gt; endpoint in FastAPI, temporarily store the read image bytes in the &lt;code&gt;image_cache&lt;/code&gt; dict, and LINE retrieves the image through this endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.get("/images/{image_id}")
async def serve_image(image_id: str):
    image_bytes = image_cache.get(image_id)
    if image_bytes is None:
        raise HTTPException(status_code=404, detail="Image not found")
    return Response(content=image_bytes, media_type="image/jpeg")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For local development, expose port 8000 with &lt;code&gt;ngrok&lt;/code&gt;; after deploying to Cloud Run, use the service URL directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo Display
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mock Database (default data for Demo)
&lt;/h3&gt;

&lt;p&gt;The system has 5 built-in products (all with real Unsplash photos), and two demo orders are automatically bound to each LINE user the first time they query their orders:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Order Number&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0115&lt;/td&gt;
&lt;td&gt;2026-01-15&lt;/td&gt;
&lt;td&gt;P001 Brown pilot jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0108&lt;/td&gt;
&lt;td&gt;2026-01-08&lt;/td&gt;
&lt;td&gt;P003 Dark blue denim jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Scenario 1: "Help me take a look at the jacket I bought before"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: "Help me take a look at the jacket I bought before"

[Gemini → function_call]
  get_order_history(time_range="all")

[_execute_tool execution]
  - get_order_history() returns two orders (P001, P003)
  - Read img/tobias-tullius-...-unsplash.jpg → Brown pilot jacket real photo bytes

[Multimodal Function Response]
  Part.from_function_response(
    name="get_order_history",
    response={"orders": [...], "order_count": 2},
    parts=[FunctionResponsePart(inline_data=FunctionResponseBlob(data=&amp;lt;photo&amp;gt;))]
  )

[Gemini responds after seeing the real photo]
  "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon
   material with a glossy feel, and a metal zipper decorative pocket on the left sleeve. This is your January 15, 2026
   order ORD-2026-0115, a total of NT$1,890, status: delivered."

LINE displays: [Text] + [Brown pilot jacket real photo]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: "Are there any dark blue jackets?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  search_products(description="dark blue jacket", color="dark blue")

[Gemini sees the real photo of P003 dark blue denim jacket]
  "Yes! This dark blue denim jacket (P003) in the photo adopts a retro stitching design,
   lapel with metal buttons, the overall garment feel is full, priced at NT$1,490, 8 in stock."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: "What are the features of the P004 knit shawl?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  get_product_details(product_id="P004")

[Gemini sees the real photo of the beige knit shawl]
  "The photo shows a beige handmade crochet shawl, V-neck design with tassels at the bottom,
   you can see the light lace-like mesh weave, elegant texture, priced at NT$1,290."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Traditional Function Response vs Multimodal Function Response
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Function return&lt;/td&gt;
&lt;td&gt;Pure JSON&lt;/td&gt;
&lt;td&gt;JSON + image/PDF bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini perception&lt;/td&gt;
&lt;td&gt;Text data&lt;/td&gt;
&lt;td&gt;Text + visual content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer quality&lt;/td&gt;
&lt;td&gt;"You bought a brown pilot jacket"&lt;/td&gt;
&lt;td&gt;"You can see the nylon material gloss, zipper pocket on the left sleeve..."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API difference&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response, parts=[...])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Applicable scenarios&lt;/td&gt;
&lt;td&gt;Pure text data query&lt;/td&gt;
&lt;td&gt;Scenarios that require visual recognition/confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation gave me a new understanding of Gemini's Function Calling capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Multimodal Function Response truly solves&lt;/strong&gt; is letting the AI agent carry visual information within the very act of "calling an external system", instead of fetching the text first and uploading the image separately. This will become a foundational capability in visually driven domains such as e-commerce, healthcare, and design.&lt;/p&gt;

&lt;p&gt;However, there are still a few limitations worth noting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image URLs cannot be used directly&lt;/strong&gt;: Gemini's &lt;code&gt;FunctionResponseBlob&lt;/code&gt; takes raw bytes; you cannot pass in a URL (unlike including an image directly in the prompt). If the image lives at a URL, download it (e.g. with &lt;code&gt;requests.get()&lt;/code&gt;) into bytes first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;display_name&lt;/code&gt; is optional&lt;/strong&gt;: The official documentation examples include &lt;code&gt;display_name&lt;/code&gt; and a &lt;code&gt;$ref&lt;/code&gt; JSON reference, but in my tests with google-genai 1.49.0 everything works without &lt;code&gt;display_name&lt;/code&gt;, and Gemini can still see and analyze the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model limitations&lt;/strong&gt;: The feature is officially marked as supported on the Gemini 3 series, but in my tests &lt;code&gt;gemini-2.0-flash&lt;/code&gt; also handles it fine, with the same API structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
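&lt;p&gt;For the first limitation, the standard library works just as well as &lt;code&gt;requests&lt;/code&gt;. A hedged sketch (the function name is mine, not from the repo) that turns a URL into the raw bytes &lt;code&gt;FunctionResponseBlob&lt;/code&gt; expects:&lt;/p&gt;

```python
import urllib.request

def image_url_to_bytes(url: str) -> bytes:
    """Fetch an image URL and return its raw bytes,
    ready to pass as FunctionResponseBlob(data=...)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# data: URLs resolve without a network round trip, handy for a quick check.
sample = image_url_to_bytes("data:image/jpeg;base64,/9j/4A==")
print(sample)  # b'\xff\xd8\xff\xe0'
```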

&lt;p&gt;There are many directions that can be extended in the future: let users send their own product photos for the Bot to compare, include PDF catalogs in the function response for Gemini to read directly, or let the Bot analyze the report images converted from DICOM in medical scenarios... As long as visual data can be obtained from external systems, Multimodal Function Response can make the AI's answers more in-depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The takeaway from this LINE Bot implementation fits in one sentence: &lt;strong&gt;Let the function response carry images, and Gemini's answers upgrade from "restating data" to "telling stories based on images"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core API is just these few lines, but getting the whole pipeline working takes attention to many details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Gemini sees the complete writing of the image returned by the function
types.Part.from_function_response(
    name="get_order_history",
    response={"orders": [...]},
    parts=[
        types.FunctionResponsePart(
            inline_data=types.FunctionResponseBlob( # ← Not types.Blob!
                mime_type="image/jpeg",
                data=image_bytes,
            )
        )
    ],
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Google Developer Year-end 2025 Recap: Gemini 2025 New Features and Perfect Integration with LINE Bot</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 08 Feb 2026 05:03:50 +0000</pubDate>
      <link>https://dev.to/gde/google-developer-year-end-2025-recap-gemini-2025-new-features-and-perfect-integration-with-line-bot-3n2m</link>
      <guid>https://dev.to/gde/google-developer-year-end-2025-recap-gemini-2025-new-features-and-perfect-integration-with-line-bot-3n2m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jslymx707xu3ny3gwkq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jslymx707xu3ny3gwkq.png" alt="image-20260207202439911" width="800" height="600"&gt;&lt;/a&gt;s&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Yesterday, I attended the Google Developer Year-end 2025 event hosted by Google and also visited the Google Banqiao office. In my capacity as LINE Taiwan Developer Relations, I was delighted to share my observations on how Gemini technology evolved throughout 2025.&lt;/p&gt;

&lt;p&gt;In the popular anime "Frieren: Beyond Journey's End," I really like the character "Übel" from the First-Class Mage Exam arc. She has a unique ability concept: "If you can imagine cutting it, you can definitely cut it."&lt;/p&gt;

&lt;p&gt;This sentence perfectly echoes the current AI era - &lt;strong&gt;imagination and comprehension have become more important than ever before&lt;/strong&gt;. How to "precisely imagine how to solve a problem" has become the key to enabling AI to assist you accurately. This article will summarize the key Gemini 2025 features shared that day, as well as my views on the core capabilities of "software engineers" in the AI wave.&lt;/p&gt;

&lt;h3&gt;
  
  
  2025 Gemini Feature Evolution Review
&lt;/h3&gt;

&lt;p&gt;Looking back at 2025, the integration of Gemini and LINE Bot saw groundbreaking updates at several points in time. Here is a review of this year's technical milestones:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time Point&lt;/th&gt;
&lt;th&gt;Feature Update&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025.04&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google ADK&lt;/td&gt;
&lt;td&gt;Initial integration of Agent and Messaging API, demonstrating basic Agent applications such as weather inquiries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025.06&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Major upgrade to the developer experience, directly collaborating with AI in the terminal to perform file operations and code writing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025.08&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Video Understanding&lt;/td&gt;
&lt;td&gt;Support for YouTube video understanding. Gemini 2.5 directly grabs subtitles and video content for summarization and interaction.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025.11&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;File Search&lt;/td&gt;
&lt;td&gt;Enhanced file search capabilities, supporting RAG applications for various formats such as JSON, JS, PDF, and Python.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025.12&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Map Grounding&lt;/td&gt;
&lt;td&gt;Combined with Google Maps Platform, allowing the Bot to answer geographical information questions such as "recent earthquake information" or "nearby restaurants."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Detailed Explanation of Technical Highlights
&lt;/h4&gt;

&lt;h5&gt;
  
  
  1. Gemini CLI and Vibe Coding
&lt;/h5&gt;

&lt;p&gt;The Gemini CLI, launched in June, changed the habits of many developers. It's not just for printing "Hello World"; it integrates tools like Git and gcloud. This brings forth a new development concept: &lt;strong&gt;Vibe Coding&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Definition&lt;/strong&gt;: This is not just writing code, but allowing the development process to enter a "flow" state through tools like Gemini CLI, Vertex AI Studio, and Antigravity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: The focus is on how developers orchestrate the connection of these tools, rather than manually writing every line of code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  2. Integration of Visual and Geographic Information (Video &amp;amp; Map)
&lt;/h5&gt;

&lt;p&gt;Video Understanding in August allowed us to directly input YouTube links, and Gemini could generate summaries and even answer video details. At the end of the year, Map Grounding filled the biggest gap in LLMs: "real-time geographic information."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Application Scenario&lt;/strong&gt;: When users ask "find restaurants," the Bot uses Map Grounding to find nearby restaurants like "CHILLAX" or "博感情" and provides addresses and types.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Data Source&lt;/strong&gt;: Combines World Knowledge (Google Search) and Private Knowledge (Your Data/RAG) to make the answers more grounded.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Re-examining: The Three Pillars of Outstanding Talent
&lt;/h3&gt;

&lt;p&gt;While technology tools are constantly evolving, I'm also thinking about what kind of abilities AI cannot replace. Just like "Übel's" imagination mentioned earlier, I believe that outstanding talent needs to possess three pillars:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. AI Collaboration
&lt;/h4&gt;

&lt;p&gt;This is not just knowing how to use tools, but also having the ability of &lt;strong&gt;Prompt Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Difference&lt;/strong&gt;: Those who know how to converse with AI and guide AI to produce results can increase their production speed by 10 times.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: AI is your Copilot, but you are the captain.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Domain Depth
&lt;/h4&gt;

&lt;p&gt;In an era where AI is rampant, &lt;strong&gt;Domain Knowledge&lt;/strong&gt; is your strongest moat.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Value&lt;/strong&gt;: AI can write syntactically correct code, but "experience in solving complex problems" and "a deep understanding of business logic" are difficult for AI to imitate.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Empathy &amp;amp; Creativity
&lt;/h4&gt;

&lt;p&gt;Transforming "passion" into human-specific &lt;strong&gt;critical thinking&lt;/strong&gt; and &lt;strong&gt;empathy&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Core&lt;/strong&gt;: This is the core of building relationships and managing decisions. AI can process data, but it cannot understand the true motivations behind people's emotions and needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This Google Developer Year-end event reaffirmed for me that understanding systems and recognizing problems are the most important abilities for software engineers.&lt;/p&gt;

&lt;p&gt;Development in the AI era is no longer just simple coding; like designing the AP2 protocol, it requires thinking about the overall architecture and security. If we only Vibe Code quickly and ignore the underlying principles (such as token expiry or data correctness), it is easy to produce a flawed system.&lt;/p&gt;

&lt;p&gt;Therefore, staying curious about technology while deepening your domain knowledge is how we prove that AI cannot replace us.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Slides Download&lt;/strong&gt;: Friends who are interested can refer to the slides from that day: &lt;a href="https://speakerdeck.com/line_developers_tw/2025-features-recap-perfect-integration-linebot" rel="noopener noreferrer"&gt;https://speakerdeck.com/line_developers_tw/2025-features-recap-perfect-integration-linebot&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;(If you find this helpful, feel free to share this article!)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>google</category>
      <category>news</category>
    </item>
    <item>
      <title>[Gemini CLI] Google Developer Knowledge API and MCP Server: Equipping Your AI Assistant with an Official Knowledge Base</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 08 Feb 2026 05:03:37 +0000</pubDate>
      <link>https://dev.to/gde/gemini-cli-google-developer-knowledge-api-and-mcp-server-equipping-your-ai-assistant-with-an-3gee</link>
      <guid>https://dev.to/gde/gemini-cli-google-developer-knowledge-api-and-mcp-server-equipping-your-ai-assistant-with-an-3gee</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flimezcggfj3qlw3khdw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flimezcggfj3qlw3khdw5.png" alt="iTerm2 2026-02-08 01.24.09" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/" rel="noopener noreferrer"&gt;Introducing the Developer Knowledge API and MCP server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/knowledge/mcp#claude-code" rel="noopener noreferrer"&gt;Google Knowledge MCP Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/knowledge/reference/corpus-reference" rel="noopener noreferrer"&gt;Developer Knowledge API Corpus Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Just last week, I was integrating the Gemini API using the Gemini CLI, and it confidently told me, "This is how you use this API parameter." But when I ran it, I got a pile of errors. It turned out Google had changed the API format three months earlier. This isn't the AI's fault; its training data cutoff is what it is. Faced with ever-changing technical documentation, even the strongest models go stale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical scenarios we've encountered in the past:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer: "Gemini, help me write an example of Gemini Function Calling"
AI: "Okay, you can write it like this..." [Generates code based on June 2024 documentation]
Developer: [Copy and paste, execute]
Terminal: ❌ Error: Parameter 'tools' format has changed in v2
Developer: 😤 "I have to go look at the official documentation again..."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sound familiar? Even Gemini 1.5 Pro sometimes gives outdated suggestions because its own API changes so quickly. &lt;strong&gt;AI's knowledge is static, but technical documentation is dynamic&lt;/strong&gt;, and this contradiction has troubled us for a long time.&lt;/p&gt;

&lt;p&gt;To solve this problem once and for all, Google released two major tools in early 2025:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer Knowledge API&lt;/strong&gt; - A machine-readable official documentation API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge MCP Server&lt;/strong&gt; - A real-time document query service based on the Model Context Protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means your AI assistant no longer just "remembers" how to write code; it can actively "consult the latest official documentation" when needed, becoming a development expert backed by official sources and never out of date.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the Developer Knowledge API?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How AI used to learn documents: The dilemma of web crawlers
&lt;/h3&gt;

&lt;p&gt;Traditionally, AI models learn documentation by crawling web pages. But this approach has several serious problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ Noise Interference&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- Actual content seen by AI --&amp;gt;
&amp;lt;nav&amp;gt;...&amp;lt;/nav&amp;gt; &amp;lt;!-- Navigation bar --&amp;gt;
&amp;lt;ad&amp;gt;...&amp;lt;/ad&amp;gt; &amp;lt;!-- Advertisement --&amp;gt;
&amp;lt;cookie-banner&amp;gt;...&amp;lt;/cookie-banner&amp;gt; &amp;lt;!-- Cookie prompt --&amp;gt;
&amp;lt;div class="content"&amp;gt;
  &amp;lt;!-- The real document content only accounts for 30% --&amp;gt;
  This is how to use the Gemini API...
&amp;lt;/div&amp;gt;
&amp;lt;footer&amp;gt;...&amp;lt;/footer&amp;gt; &amp;lt;!-- Footer --&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI has to "guess" which parts of this pile of HTML are the actual document content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ Inconsistent Formatting&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some use &lt;code&gt;&amp;lt;code&amp;gt;&lt;/code&gt; tags, some use &lt;code&gt;&amp;lt;pre&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Some use Markdown rendering, some use custom syntax&lt;/li&gt;
&lt;li&gt;Image descriptions may be in &lt;code&gt;alt&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, or &lt;code&gt;figcaption&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;❌ Update Delay&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crawlers may only crawl every few months&lt;/li&gt;
&lt;li&gt;New API parameters have to wait for the next training to know&lt;/li&gt;
&lt;li&gt;The training data cutoff date becomes a perpetual pain&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Developer Knowledge API: A machine-first document system
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Developer Knowledge API&lt;/strong&gt; completely changes the game. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;✅ Machine-readable source of truth&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Directly provides pure Markdown format&lt;/li&gt;
&lt;li&gt;No noise, no ads, no navigation bar&lt;/li&gt;
&lt;li&gt;Structured metadata (author, update time, version)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;✅ Real-time&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronized updates&lt;/strong&gt; with Google's official documentation (delay &amp;lt; 1 hour)&lt;/li&gt;
&lt;li&gt;When the API changes, the AI can immediately read the new documents&lt;/li&gt;
&lt;li&gt;There will never be the problem of "outdated training data"&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;✅ Comprehensive&lt;/strong&gt;: It retrieves documents directly from the following official Google domains. If your work touches any of these, enabling this MCP is strongly recommended:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ai.google.dev&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;developer.android.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;developer.chrome.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;developers.home.google.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;developers.google.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docs.cloud.google.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docs.apigee.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;firebase.google.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fuchsia.dev&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;web.dev&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;www.tensorflow.org&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  MCP Server: Making AI more "knowledgeable"
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is an open standard that works like an "add-on slot" for AI tools. Google's newly launched &lt;strong&gt;Knowledge MCP Server&lt;/strong&gt; lets any tool that supports MCP (such as Claude Code, Cursor, or our favorite Gemini CLI) integrate it easily.&lt;/p&gt;

&lt;p&gt;Through this MCP Server, the AI no longer writes code purely from memory; it can "consult the books" for specific questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Implementation guidance&lt;/strong&gt;: Ask for the best implementation method for a new feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Troubleshooting&lt;/strong&gt;: Diagnose directly based on the latest Error Code documentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version comparison&lt;/strong&gt;: Understand the differences between different versions of the API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are interested in MCP applications for a specific domain, I covered another powerful one in a previous article, &lt;a href="https://dev.to/gde/geminigoogle-maps-building-location-aware-ai-apps-with-the-google-maps-grounding-api-4l36"&gt;Google Maps Platform Assist MCP: Let AI help you write more accurate map applications&lt;/a&gt;, which gives AI assistants an edge when developing map features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-on: Letting AI assistants import the official knowledge base
&lt;/h2&gt;

&lt;p&gt;To enable AI assistants to read official documentation, we need to complete some simple preparations in Google Cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Enable the Developer Knowledge API
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;a href="https://console.cloud.google.com/apis/library/knowledge.googleapis.com" rel="noopener noreferrer"&gt;Developer Knowledge API page&lt;/a&gt; in the Google API Library.&lt;/li&gt;
&lt;li&gt;Make sure you have selected the correct project.&lt;/li&gt;
&lt;li&gt;Click "&lt;strong&gt;Enable&lt;/strong&gt;". This API does not require special IAM permissions to use.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: Create and protect your API key
&lt;/h3&gt;

&lt;p&gt;For security, it is recommended to restrict the key:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Google Cloud console, navigate to the "&lt;strong&gt;Credentials&lt;/strong&gt;" page.&lt;/li&gt;
&lt;li&gt;Click "&lt;strong&gt;Create Credentials&lt;/strong&gt;", then select "&lt;strong&gt;API key&lt;/strong&gt;".&lt;/li&gt;
&lt;li&gt;Click "&lt;strong&gt;Edit API key&lt;/strong&gt;".&lt;/li&gt;
&lt;li&gt;Enter a recognizable name in the name field (e.g., &lt;code&gt;Dev-Knowledge-Key&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Under "API restrictions", select "&lt;strong&gt;Restrict key&lt;/strong&gt;".&lt;/li&gt;
&lt;li&gt;Select "&lt;strong&gt;Developer Knowledge API&lt;/strong&gt;" from the API list, and then click OK.&lt;/li&gt;
&lt;li&gt;Click "&lt;strong&gt;Save&lt;/strong&gt;".&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After creation, click "Show Key" and note it down; this is the credential we will use next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm909ocurb0ukzxup163i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm909ocurb0ukzxup163i.png" alt="Google Chrome 2026-02-07 20.52.15" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are using &lt;strong&gt;Claude Code&lt;/strong&gt; or the &lt;strong&gt;Gemini CLI&lt;/strong&gt;, a simple configuration is all it takes to make it more capable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration Example (using Gemini CLI as an example)
&lt;/h3&gt;

&lt;p&gt;You only need to add Google's MCP Server address to the settings and include your API Key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Add Google Developer Knowledge MCP Server
gemini mcp add -t http -H "X-Goog-Api-Key: YOUR_API_KEY" google-developer-knowledge https://developerknowledge.googleapis.com/mcp --scope user

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
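&lt;p&gt;If you use Claude Code instead, the equivalent configuration can go into a &lt;code&gt;.mcp.json&lt;/code&gt; file in your project root. This is a sketch based on Claude Code's MCP configuration format; check the current Claude Code documentation for the exact schema before relying on it:&lt;/p&gt;

```json
{
  "mcpServers": {
    "google-developer-knowledge": {
      "type": "http",
      "url": "https://developerknowledge.googleapis.com/mcp",
      "headers": {
        "X-Goog-Api-Key": "YOUR_API_KEY"
      }
    }
  }
}
```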



&lt;p&gt;Once the configuration is complete, when you ask "how to use the latest Gemini API for Function Calling", the AI will actively call the MCP Server to retrieve the most accurate and up-to-date document content from the official website to answer you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analysis and Outlook: Why is this important?
&lt;/h2&gt;

&lt;p&gt;The launch of this technology marks two major shifts in the development process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;From "relying on memory" to "real-time query"&lt;/strong&gt;: In the past, we made models bigger so they could remember more. Now, we let the model learn to "look things up" through MCP. This not only greatly reduces hallucinations, but also eases the pressure to retrain models frequently.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More powerful development agents (AI Agents)&lt;/strong&gt;: When AI assistants can read documents, execute instructions, and perform version control, they truly evolve into "digital colleagues" that can handle tasks independently. The structured information provided by the Developer Knowledge API is the fuel AI Agents need for complex reasoning.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This time, Google delivers not only powerful models but also an excellent "data interface." For developers who care about efficiency, configuring the Developer Knowledge MCP Server is well worth the five-minute investment.&lt;/p&gt;

&lt;p&gt;In the future, your AI assistant will no longer be just a machine that writes code, but a technical consultant that always checks the latest official documentation and gives you accurate advice. Why not apply for an API key and try it out?&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Building Interoperable AI Business Agents with UCP: DevBooks Agent Implementation Analysis</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 31 Jan 2026 17:23:33 +0000</pubDate>
      <link>https://dev.to/gde/building-interoperable-ai-business-agents-devbooks-agent-implementation-analysis-1gfp</link>
      <guid>https://dev.to/gde/building-interoperable-ai-business-agents-devbooks-agent-implementation-analysis-1gfp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7h70a7nmuj3qpc5fzlq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7h70a7nmuj3qpc5fzlq.png" alt="Google Chrome 2026-01-31 21.09.06" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Previous Article Recap
&lt;/h1&gt;

&lt;p&gt;In the previous article, we explored how to implement Agentic Vision using LINE Bot. Today, we'll shift our focus to another important area of AI Agents: &lt;strong&gt;E-commerce and Interoperability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most current AI Agents are "islands." If you want to buy a book, you might need a dedicated bookstore Bot; to buy groceries, you'll need another grocery Bot. These Agents don't communicate with each other, and the user experience is fragmented.&lt;/p&gt;

&lt;p&gt;To solve this problem, the &lt;strong&gt;Universal Commerce Protocol (UCP)&lt;/strong&gt; was born. It's like HTML for the commerce world, defining a set of standard languages that allow different AI Agents (buyer agents and seller agents) to "communicate" with each other and complete complex commercial transactions.&lt;/p&gt;

&lt;p&gt;In this article, I will take you deep into the code of &lt;code&gt;devbooks_agent&lt;/code&gt;, a UCP-based technical bookstore Agent, to demonstrate how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is UCP and A2A?
&lt;/h2&gt;

&lt;p&gt;Before diving into the code, let's briefly understand two core concepts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;UCP (Universal Commerce Protocol)&lt;/strong&gt;: A standardized commerce protocol. It defines data structures (Schemas) such as "Product", "Order", and "Checkout", ensuring that the "Product" your Agent says is the same as the "Product" my Agent understands.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;A2A (Agent-to-Agent)&lt;/strong&gt;: The communication model between Agents. Here, we will have two roles:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;User Agent (Client)&lt;/strong&gt;: Represents the user, responsible for sending requests (e.g., "I want to buy a React book").&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Business Agent (Merchant)&lt;/strong&gt;: Represents the merchant (like DevBooks), responsible for providing product information and processing orders.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our focus today is this &lt;strong&gt;Business Agent&lt;/strong&gt; — &lt;code&gt;devbooks_agent&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Structure Overview
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;devbooks_agent&lt;/code&gt; is a standard Python project that uses Google's Agent Development Kit (ADK).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;devbooks_agent/
├── src/devbooks_agent/
│ ├── agent.py # Agent's brain: defines tools and behaviors
│ ├── ucp_profile_resolver.py # UCP handshake protocol: confirms each other's capabilities
│ ├── store.py # Simulated database and business logic
│ ├── data/
│ │ ├── ucp.json # UCP capability declaration
│ │ ├── products.json # Book catalog
│ │ └── agent_card.json # Agent's business card
│ └── main.py # Program entry point
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Defining Agent Capabilities (&lt;code&gt;ucp.json&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;First, the Agent needs to tell the world what it "can do." This is defined through &lt;code&gt;ucp.json&lt;/code&gt;. This is like the Agent's resume.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "ucp": {
    "version": "2026-01-11",
    "capabilities": [
      {
        "name": "dev.ucp.shopping.checkout",
        "version": "2026-01-11",
        "spec": "https://ucp.dev/specs/shopping/checkout"
      },
      {
        "name": "dev.ucp.shopping.fulfillment",
        "version": "2026-01-11",
        "extends": "dev.ucp.shopping.checkout"
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration declares that the DevBooks Agent supports the 2026 version of the UCP protocol and has the capabilities of "shopping checkout" and "logistics delivery."&lt;/p&gt;
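&lt;p&gt;To make this concrete, here is a small self-contained check (the JSON is inlined from the &lt;code&gt;ucp.json&lt;/code&gt; above; the &lt;code&gt;supports&lt;/code&gt; helper is an illustration, not part of the UCP spec) showing how a client might verify that a merchant declares a given capability:&lt;/p&gt;

```python
import json

# The merchant profile from ucp.json above, inlined for illustration.
PROFILE = json.loads("""
{
  "ucp": {
    "version": "2026-01-11",
    "capabilities": [
      {"name": "dev.ucp.shopping.checkout", "version": "2026-01-11",
       "spec": "https://ucp.dev/specs/shopping/checkout"},
      {"name": "dev.ucp.shopping.fulfillment", "version": "2026-01-11",
       "extends": "dev.ucp.shopping.checkout"}
    ]
  }
}
""")

def supports(profile: dict, capability: str) -> bool:
    """Return True if the merchant profile declares the given UCP capability."""
    return any(c["name"] == capability for c in profile["ucp"]["capabilities"])

print(supports(PROFILE, "dev.ucp.shopping.checkout"))  # → True
print(supports(PROFILE, "dev.ucp.payments"))           # → False
```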

&lt;h3&gt;
  
  
  2. Agent's Brain and Tools (&lt;code&gt;agent.py&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;This is the heart of the project. We use &lt;code&gt;google.adk.agents.Agent&lt;/code&gt; to define the Agent and equip it with various tools (Tools).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# src/devbooks_agent/agent.py

root_agent = Agent(
    name="devbooks_agent",
    model="gemini-2.5-flash", # Uses the latest Gemini model
    description="Agent to help with shopping for technical books",
    instruction=(
        "You are a helpful agent who assists developers in finding and purchasing"
        " technical books..."
        # ... Detailed Prompt instructions ...
    ),
    tools=[
        search_shopping_catalog, # Search for books
        preview_book, # Preview (DevBooks exclusive feature)
        add_to_checkout, # Add to cart
        start_payment, # Start checkout
        complete_checkout, # Complete order
        # ... Other tools
    ],
    # ... callback settings
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Featured Tool: &lt;code&gt;preview_book&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike a general grocery store, selling books usually calls for a "preview." This is where the benefit of exposing Agent capabilities as tools shows: we can easily add custom features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def preview_book(tool_context: ToolContext, book_id: str) -&amp;gt; dict:
  """Gets a preview/sample chapter of a book."""
  try:
    preview = store.get_book_preview(book_id)
    if preview is None:
        # Handle the case where there is no preview
        return _create_error_response(...)

    return {
        "preview": preview.model_dump(mode="json"),
        "status": "success"
    }
  except Exception:
    # Error handling
    return _create_error_response(...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. UCP Handshake Protocol (&lt;code&gt;ucp_profile_resolver.py&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;When the User Agent connects to the Business Agent, both parties need to "tune in" first to confirm the UCP versions they support. This is handled by &lt;code&gt;ProfileResolver&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# src/devbooks_agent/ucp_profile_resolver.py

def resolve_profile(self, client_profile_url: str, user_id: str | None = None) -&amp;gt; dict:
    # 1. Get the Client's Profile
    profile = self._fetch_profile(client_profile_url, headers=headers)

    # 2. Check version compatibility
    client_version = profile.get("ucp").get("version")
    merchant_version = self.merchant_profile.get("ucp").get("version")

    # If the Client version is too new, and the Merchant doesn't support it, then report an error
    if client_version &amp;gt; merchant_version:
      raise ServerError(...)

    return profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that both parties in the transaction are on the same channel and that there is no miscommunication.&lt;/p&gt;
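&lt;p&gt;One subtle detail: the resolver above compares versions with a plain string operator. That works here only because UCP versions are ISO 8601 dates (&lt;code&gt;YYYY-MM-DD&lt;/code&gt;), which sort correctly as strings; it would not be safe for dotted semantic versions:&lt;/p&gt;

```python
# ISO 8601 dates compare correctly as plain strings, because every field
# is fixed-width and ordered most-significant first.
client_version = "2026-02-01"
merchant_version = "2026-01-11"

print(client_version > merchant_version)  # → True: the client is newer

# The same trick breaks for dotted semantic versions, where lexicographic
# comparison disagrees with numeric ordering.
print("10.0" < "9.0")  # → True lexicographically, wrong numerically
```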

&lt;h2&gt;
  
  
  Practical Demo
&lt;/h2&gt;

&lt;p&gt;After understanding the architecture, let's run a complete UCP test flow. This Demo will simulate a developer purchasing a technical book.&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment Preparation
&lt;/h3&gt;

&lt;p&gt;Make sure you have started the following services:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Business Agent (DevBooks)&lt;/strong&gt;: &lt;code&gt;http://localhost:11000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Chat Client&lt;/strong&gt;: &lt;code&gt;http://localhost:3000&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Test Script
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1vrrucmm9027bigpf6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1vrrucmm9027bigpf6t.png" alt="Google Chrome 2026-01-31 21.19.00" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please open the Chat Client (&lt;code&gt;http://localhost:3000&lt;/code&gt;) in your browser and follow these steps:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Book Search
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;User Input:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I looking for some books about React to learn."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Behind the Scenes:&lt;/strong&gt; The Agent will call the &lt;code&gt;search_shopping_catalog&lt;/code&gt; tool and return a list of matching books (e.g., "Learning React", "React Design Patterns").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Result:&lt;/strong&gt; You will see book cards with cover images and prices.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Preview Content
&lt;/h4&gt;

&lt;p&gt;This is a DevBooks exclusive feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Input:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Can I see a preview of the first one?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Behind the Scenes:&lt;/strong&gt; The Agent identifies the user's intent and calls the &lt;code&gt;preview_book&lt;/code&gt; tool to get the preview chapter content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Result:&lt;/strong&gt; The Agent returns the first chapter excerpt or a preview link of the book.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Add to Cart
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;User Action:&lt;/strong&gt; Click the &lt;strong&gt;"Add to Checkout"&lt;/strong&gt; button on the card, or enter "Add Learning React to my cart".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behind the Scenes:&lt;/strong&gt; Calls &lt;code&gt;add_to_checkout&lt;/code&gt;. At this point, the Agent creates a UCP Checkout Session (&lt;code&gt;ADK_USER_CHECKOUT_ID&lt;/code&gt;) in the background.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Checkout Info
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;User Input:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"My email is &lt;a href="mailto:dev@example.com"&gt;dev@example.com&lt;/a&gt;, ship to 456 Tech Blvd, San Francisco, CA 94107"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Behind the Scenes:&lt;/strong&gt; The Agent parses the address information and calls &lt;code&gt;update_customer_details&lt;/code&gt; to fill in the information in the UCP Checkout object.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Payment
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;User Action:&lt;/strong&gt; Click &lt;strong&gt;"Complete Payment"&lt;/strong&gt; -&amp;gt; Select a payment method (Mock Pay) -&amp;gt; &lt;strong&gt;"Confirm Purchase"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behind the Scenes:&lt;/strong&gt; Calls &lt;code&gt;complete_checkout&lt;/code&gt;. The Agent interacts with &lt;code&gt;MockPaymentProcessor&lt;/code&gt;, verifies the payment, and finally calls &lt;code&gt;store.place_order&lt;/code&gt; to complete the order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Result:&lt;/strong&gt; Receive order confirmation message: "Order Confirmed! Order ID: ORDER-12345".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j9kpok9w26h5t3cg5tr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j9kpok9w26h5t3cg5tr.png" alt="Google Chrome 2026-01-31 21.19.48" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Development Experience
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Importance of State Management
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;agent.py&lt;/code&gt;, you can see a lot of &lt;code&gt;tool_context.state&lt;/code&gt; usage. Because the Agent's interaction is a multi-turn conversation, we must preserve &lt;code&gt;checkout_id&lt;/code&gt; between conversations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def add_to_checkout(tool_context: ToolContext, ...):
    # Read or create Checkout ID from Context
    checkout_id = tool_context.state.get(ADK_USER_CHECKOUT_ID)
    if not checkout_id:
        # ... Create new Checkout
        tool_context.state[ADK_USER_CHECKOUT_ID] = checkout.id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is similar to the concept of Session in traditional Web development, but in Agent development, this is maintained jointly by the LLM's Context Window and an external State Store.&lt;/p&gt;
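&lt;p&gt;To illustrate the get-or-create pattern outside the ADK, here is a toy stand-in where a plain per-session dict plays the role of &lt;code&gt;tool_context.state&lt;/code&gt; (this is not the real ADK API, just the pattern):&lt;/p&gt;

```python
import uuid

ADK_USER_CHECKOUT_ID = "user:checkout_id"

def get_or_create_checkout(state: dict) -> str:
    """Reuse the checkout stored in session state, or create one on first use."""
    checkout_id = state.get(ADK_USER_CHECKOUT_ID)
    if not checkout_id:
        checkout_id = f"CHECKOUT-{uuid.uuid4().hex[:8]}"
        state[ADK_USER_CHECKOUT_ID] = checkout_id
    return checkout_id

session_state: dict = {}
first = get_or_create_checkout(session_state)   # turn 1: creates the checkout
second = get_or_create_checkout(session_state)  # turn 2: reuses it
print(first == second)  # → True: the same checkout persists across turns
```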

&lt;h3&gt;
  
  
  2. The Power of UCP: Structured Data
&lt;/h3&gt;

&lt;p&gt;Pay attention to the &lt;code&gt;after_tool_modifier&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def after_tool_modifier(..., tool_response: Dict) -&amp;gt; Optional[Dict]:
    # ...
    # Inject structured UCP data into the response
    if UcpExtension.URI in extensions:
        tool_context.state[ADK_LATEST_TOOL_RESULT] = tool_response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the Agent to return not just a piece of text ("Okay, added to cart"), but a complete, machine-readable JSON object. The Client side (front-end UI) receives this JSON and can render beautiful product cards or checkout buttons, instead of just displaying plain text. This is the essence of A2A: &lt;strong&gt;Not just chatting, but data exchange.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Through the implementation of &lt;code&gt;devbooks_agent&lt;/code&gt;, we see how UCP elevates AI Agents from simple chatbots to "digital clerks" capable of handling complex business logic.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Standardization&lt;/strong&gt;: UCP allows Agents written by different developers to communicate with each other.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Modularity&lt;/strong&gt;: Through ADK Tools, we can easily extend functionality (such as previews).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Interoperability&lt;/strong&gt;: The front-end UI can automatically generate interfaces based on standard protocols, without customizing the screen for each Agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future of AI Agents is definitely not a solo fight, but an interconnected world.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://ucp.dev" rel="noopener noreferrer"&gt;Universal Commerce Protocol (UCP) Specification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/google/adk" rel="noopener noreferrer"&gt;Google Agent Development Kit (ADK)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="//../business_agent/"&gt;DevBooks Agent Source Code&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>api</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
