Documentation - Deep Research Agent
This document provides a detailed overview of the GitLab Deep Research Agent 🔬 - POC (&17948), its architecture, and the execution flow of its research stages. It's designed for developers and contributors who want to understand the inner workings of the agent.
Table of Contents
- Overview
- Core Architecture
- Overall Execution Flow
- Detailed Execution Stages
- Components & Technologies
- Differences from Duo Workflow
1. Overview
The GitLab Research Agent is an AI-powered system designed to perform in-depth analysis of GitLab artifacts (Epics, Issues, Merge Requests, etc.) and their relationships. It leverages a Knowledge Graph (powered by KùzuDB) and an agentic framework (LangGraph) to understand user queries, plan research steps, gather information from GitLab, and synthesize comprehensive reports.
The agent architecture is composed of two main types of agents:
- Orchestrator Agent: The primary agent that interprets the user's query, plans the overall research strategy, dispatches tasks to Server Item Agents, aggregates their findings, and synthesizes the final global report.
- Server Item Agent(s): Specialized sub-agents responsible for conducting detailed research on individual GitLab items (e.g., a specific Issue or Merge Request). They gather data, analyze it, and produce a focused report for that item.
This document will walk through the lifecycle of a research query, detailing each stage of the process.
2. Core Architecture
Knowledge Graph Integration
At its core, the agent relies on a Knowledge Graph built with KùzuDB. This graph stores GitLab entities (Issues, MRs, Epics, Projects, Groups, Files) as nodes and their interconnections (e.g., `RELATED_TO_ISSUE`, `ISSUE_TO_EPIC`, `MODIFIES_FILE`) as typed relationships.
- Data Ingestion: An indexer component (`knowledge_graph/indexer.ts`) is responsible for populating and updating the knowledge graph. It fetches data from GitLab using the GitLab API client (`gitlab_api/client.ts`) and a caching layer (`gitlab_api/fetch_with_cache.ts`), then transforms and loads it into KùzuDB. The schema definitions for nodes and relationships are in `knowledge_graph/schemas.ts`.
- Data Retrieval: Agents query the graph using Cypher (via `knowledge_graph/kuzu.ts`) to discover relevant items and their relationships, providing contextual data for LLM reasoning. Parameterized Cypher templates are used for common query patterns like nearest-neighbor or hierarchy searches.
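To make the "parameterized Cypher template" idea concrete, here is a minimal, hypothetical sketch of a template builder for a bounded-depth neighborhood search. The function name and the exact query shape are assumptions for illustration; the real templates live in `knowledge_graph/kuzu.ts` and may differ.

```typescript
// Hypothetical sketch: only the depth bound is baked into the pattern,
// while $itemId stays a bound parameter so the template can be reused.
function buildNeighborhoodQuery(maxDepth: number): string {
  if (!Number.isInteger(maxDepth) || maxDepth < 1) {
    throw new Error("maxDepth must be a positive integer");
  }
  return `MATCH (start {itemId: $itemId})-[r*1..${maxDepth}]-(related) RETURN DISTINCT related`;
}
```

At execution time the caller would bind `$itemId` (e.g., `gitlab-org/gitlab#123`) as a query parameter rather than interpolating it, which keeps the query cacheable and injection-safe.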
Agentic Framework (LangGraph)
The agent's control flow and state management are built using LangGraph, a library for building stateful, multi-actor applications with LLMs. LangGraph allows defining the research process as a graph of nodes, where each node represents a specific step or agent.
- Orchestrator Agent Graph (`orchestrator_agent/graph.ts`): Defines the main workflow for handling user queries.
- Server Item Agent Graph (`server_item_agent/graph.ts`): Defines the workflow for researching individual GitLab items.

State is managed within these graphs, allowing information to be passed between nodes and updated as the research progresses. Refer to `orchestrator_agent/state.ts` and `server_item_agent/state.ts` for state definitions.
Orchestrator and Server Item Agents
As mentioned, the system uses a two-tiered agent design:
- Orchestrator Agent: Manages the end-to-end research process. It breaks down the user's query, decides what information is needed, coordinates with Server Item Agents, and compiles the final answer.
- Server Item Agent: Acts as a specialist, focusing on a single GitLab item. It uses available tools to gather details, comments, related items (via graph queries), and file content, then synthesizes this information into a report specific to its assigned item.
This separation of concerns allows for parallel processing of item research and a modular approach to adding new capabilities.
3. Overall Execution Flow
The research process, from user query to final report, involves several distinct stages orchestrated by the main agent.
Mermaid Diagram: Top-Down Architecture
```mermaid
graph TD
    D[Orchestrator Agent]
    D --> E[Analyze Input & Extract Item IDs]
    E --> F{Index Items?}
    F -->|Yes| G[Update Knowledge Graph]
    F -->|No| H[Create Research Plan]
    G --> H
    H --> I[Select Items for Research]
    I --> J[Execute Parallel Sub-Agents]
    J --> K[Aggregate Results]
    K --> L{More Research Needed?}
    L -->|Yes| I
    L -->|No| M[Synthesize Final Report]
    M --> N[Stream Response to User]
    subgraph "Sub-Agent Research"
        J1[Initialize Item Research]
        J2[Iterative Data Gathering]
        J3[Generate Item Report]
        J1 --> J2 --> J3
    end
    J -.-> J1
```
4. Detailed Execution Stages
The execution of a research query is managed by the Orchestrator Agent Graph defined in `orchestrator_agent/graph.ts`.
Stage 1: Input Analysis & Item ID Extraction
- Node: `inputAnalysisNode` (`orchestrator_agent/nodes/input_analysis.ts`)
- Purpose: To parse the user's input query, identify any explicitly mentioned GitLab item URLs or references, extract them into a structured format, and determine initial research topics.
- Process:
  - The node receives the raw input query from the user (e.g., "What is this POC about? Does it support JetBrains? gitlab-org/editor-extensions&158").
  - It utilizes an LLM, prompted with a system message to act as a "GitLab Server Item ID Extractor."
  - The LLM is equipped with a tool (`itemIdTool`) whose parameters are defined by `initialInputSchema` (server_items, reasoning, researchTopics).
  - The LLM processes the query and calls the `itemIdTool` with the extracted item specifications (itemType, iid, fullPath), its reasoning for the extraction, and a list of research topics derived from the query.
  - The extracted item specifications are then converted into `CompleteServerItemId` objects using `buildServerItemIds` from `item_id.ts`.
  - An `orchestrator_server_item_ids_extracted` event is dispatched with the extracted items, reasoning, and research topics.
- Output: `InputAnalysisNodeOutput` containing `initialServerItems`, `researchTopics`, and any error. This output is passed to the next node in the orchestrator graph.
Reliable GitLab Item ID Extraction
The system uses a combination of LLM understanding and structured tooling to reliably extract GitLab item IDs:
- LLM Prompting: The `inputAnalysisNode` provides a clear system prompt to the LLM, instructing it to identify GitLab items and related research topics from the user's query.
- Tool Definition with Schema: A specific tool, `itemIdTool`, is defined with a Zod schema (`ServerItemToolArgsSchema` from `item_id.ts` and `initialInputSchema` in `orchestrator_agent/nodes/input_analysis.ts`). This schema guides the LLM on the expected output format for each item:
  - `itemType`: Enum (Issue, MergeRequest, Epic)
  - `iid`: String (Internal ID of the item)
  - `fullPath`: String (Full path of the project or group, e.g., "gitlab-org/gitlab")

  The schema description includes examples of URLs and common textual references for Issues, MRs, and Epics to aid the LLM.
- LLM Tool Calling: The LLM is expected to call this tool with an array of identified items matching the schema.
- Structured Conversion: The arguments provided by the LLM via the tool call are then processed by `buildServerItemIds` (`item_id.ts`). This function takes the array of `ServerItemToolArgs` and constructs `CompleteServerItemId` objects, which include the globally unique `itemId` (e.g., `gitlab-org/gitlab#123`) generated by `ServerItemIdResolver.buildItemId`.
This multi-step process, combining flexible LLM parsing with strict schema enforcement and standardized ID generation, ensures that various forms of item references from the user query are reliably converted into a consistent, usable format for the rest of the agent's operations.
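A minimal sketch of the final ID-construction step, mirroring the `gitlab-org/gitlab#123` example above. The separator convention follows GitLab reference syntax (`#` for Issues, `!` for MRs, `&` for Epics); the actual logic lives in `ServerItemIdResolver.buildItemId` and may handle more cases.

```typescript
type ItemType = "Issue" | "MergeRequest" | "Epic";

interface ServerItemToolArgs {
  itemType: ItemType;
  iid: string;
  fullPath: string;
}

// GitLab reference separators per item type.
const SEPARATORS: Record<ItemType, string> = {
  Issue: "#",
  MergeRequest: "!",
  Epic: "&",
};

// Builds a globally unique, human-readable item ID from the tool-call args.
function buildItemId(args: ServerItemToolArgs): string {
  return `${args.fullPath}${SEPARATORS[args.itemType]}${args.iid}`;
}
```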
Stage 2: Knowledge Graph Indexing (Optional)
- Node: `graphIndexNode` (`orchestrator_agent/nodes/graph_index.ts`)
- Purpose: If the `shouldIndex` flag in the input is true, this node is responsible for indexing the initial items (and their related entities) into the Knowledge Graph. This ensures the graph is up to date before proceeding with research.
- Process:
  - The node receives the `initialServerItems` from the `inputAnalysisNodeOutput`.
  - It uses the `Indexer` service (`knowledge_graph/indexer.ts`) configured in `AppConfig`.
  - For each initial item, it calls `indexer.run()`, which fetches the item and its related entities from GitLab and upserts them into KùzuDB.
  - During indexing, `graph_index_progress` events are dispatched to provide feedback on the indexing status.
  - Successfully indexed items are collected.
- Output: `GraphIndexNodeOutput` containing `indexedItems` and any error.
- Routing:
  - If an error occurs, the graph ends.
  - Otherwise, it proceeds to the `planNode`.
The `Indexer` itself uses a priority queue (`PQueue`) to manage fetching and processing items. It explores the GitLab data graph starting from the initial items, respecting configured depth limits per item type (`depthByItemConfig`). It fetches data using the `GitLabDataFetcher`, which in turn uses the `CachedFetcher`. Nodes are added to KùzuDB via `kuzu.upsertNode()` and relationships via `kuzu.addRelationshipBulk()`.
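The depth-limited exploration can be sketched with a plain FIFO queue, standing in for the `Indexer`'s `PQueue`-driven traversal. `depthByType` mimics `depthByItemConfig`, and `fetchNode` stands in for the `GitLabDataFetcher`; all names here are illustrative.

```typescript
interface GraphNode {
  id: string;
  type: string;          // e.g., "Issue", "MergeRequest"
  neighbors: string[];   // IDs of related items
}

// Dependency-free sketch of depth-limited graph exploration.
function explore(
  startId: string,
  fetchNode: (id: string) => GraphNode,
  depthByType: Record<string, number>,
): string[] {
  const visited = new Set<string>([startId]);
  const queue: Array<{ id: string; depth: number }> = [{ id: startId, depth: 0 }];
  const indexed: string[] = [];

  while (queue.length > 0) {
    const { id, depth } = queue.shift()!;
    const node = fetchNode(id);       // the real indexer would upsert into KùzuDB here
    indexed.push(node.id);
    const limit = depthByType[node.type] ?? 0;
    if (depth >= limit) continue;     // respect the per-type depth budget
    for (const n of node.neighbors) {
      if (!visited.has(n)) {
        visited.add(n);
        queue.push({ id: n, depth: depth + 1 });
      }
    }
  }
  return indexed;
}
```

With a depth budget of 1 for Issues and 0 for Merge Requests, exploration from an Issue reaches its direct neighbors but does not expand past an MR.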
Mermaid Diagram: Graph Indexing
```mermaid
graph TD
    INPUT_ANALYSIS_OUTPUT[From Input Analysis: initialServerItems] --> GRAPH_INDEX_NODE{"Graph Index"};
    GRAPH_INDEX_NODE -- For each initial item --> INDEXER_RUN["indexer.run(item)"];
    INDEXER_RUN -- Uses --> GITLAB_DATA_FETCHER["GitLabDataFetcher (`fetcher.ts`)"];
    GITLAB_DATA_FETCHER -- Uses --> CACHED_FETCHER["CachedFetcher (`fetch_with_cache.ts`)"];
    CACHED_FETCHER -- Fetches from --> GITLAB_API[GitLab API];
    INDEXER_RUN -- Upserts data to --> KUZU_DB["KùzuDB (`kuzu.ts`)"];
    INDEXER_RUN -- Dispatches Events --> EVENT_GRAPH_INDEX_PROGRESS[Event: graph_index_progress];
    GRAPH_INDEX_NODE --> OUTPUT_INDEXING[Output: indexedItems];
    OUTPUT_INDEXING --> PLAN_NODE[To Plan Node];
```
Stage 3: Research Plan Construction
- Node: `planNode` (`orchestrator_agent/nodes/plan.ts`)
- Purpose: To create a high-level research plan. This typically involves exploring the Knowledge Graph starting from the initial items to identify a broader set of relevant items that need to be investigated by the Server Item Agents.
- Process:
  - The node initializes a conversation history with a system prompt that instructs the LLM to act as a "GitLab Research Agent." The prompt emphasizes iterative graph exploration using a `graphDB` tool and concluding with a `finalAnswer` tool call that provides the research plan.
  - The initial user message includes details of the `initialServerItems` (fetched using `getFormattedItems` from `fetch.ts`) and the `researchTopics`.
  - The node enters a loop (up to `MAX_ITERATIONS`):
    - The LLM is called with the current conversation history and available tools:
      - `graphDB`: To perform Breadth-First Search (BFS) in the Knowledge Graph. This tool's parameters are defined by `GraphDBToolParamsSchema` (itemIdsToSearch, searchType, searchDepth, reason).
      - `finalAnswer`: To signal completion and provide the final research plan text. Its parameters are defined by `FinalAnswerToolParamsSchema` (answer).
    - If the `graphDB` tool is called:
      - `fetchGraphDbTool` (from `tools/graph_db_fetch.ts`) is executed with the provided arguments. This function queries KùzuDB (specifically `kuzuService.getItemBFS`).
      - The results (or an error) are added to the conversation history as a tool result.
      - An `orchestrator_plan_step_generated` event is dispatched.
    - If the `finalAnswer` tool is called:
      - The provided `answer` (the research plan) is recorded.
      - The `isComplete` flag is set to true, and the loop terminates.
    - If no tool or an unexpected tool is called, or if `graphDB` is called with an empty `itemIdsToSearch`, the system may re-prompt the LLM or conclude the planning phase.
  - The conversation history, loop count, final reasoning (often the research plan itself or an error message), and completion status are saved.
- Output: `PlanNodeOutput` containing `reasoning` (the plan), `loopCount`, `conversationHistory`, and `completionStatus` (isComplete, researchPlan).
- Routing:
  - If planning is not complete or the plan is empty, the graph ends.
  - Otherwise, it proceeds to the `selectNextItemsNode`.
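The control flow of this plan loop can be condensed into a small sketch. The actual node wires a real LLM and `fetchGraphDbTool`; here `callLlm` and `runBfs` are stand-ins supplied by the caller, and the tool shapes are simplified to the fields named above.

```typescript
type ToolCall =
  | { tool: "graphDB"; itemIdsToSearch: string[]; searchDepth: number }
  | { tool: "finalAnswer"; answer: string };

// Illustrative sketch of the plan loop: call the LLM, execute graphDB
// tool calls, feed results back, and stop on finalAnswer or budget.
function runPlanLoop(
  callLlm: (history: string[]) => ToolCall,
  runBfs: (ids: string[], depth: number) => string,
  maxIterations: number,
): { plan: string | null; loopCount: number } {
  const history: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const call = callLlm(history);
    if (call.tool === "finalAnswer") {
      return { plan: call.answer, loopCount: i };
    }
    // Tool result is appended so the next LLM call sees it.
    history.push(runBfs(call.itemIdsToSearch, call.searchDepth));
  }
  return { plan: null, loopCount: maxIterations }; // budget exhausted
}
```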
Mermaid Diagram: Plan Construction
```mermaid
graph TD
    PLAN_NODE[Plan Construction]
    PLAN_NODE --> ANALYZE_ITEMS[Analyze Initial Items & Research Topics]
    ANALYZE_ITEMS --> EXPLORE_GRAPH[Explore Knowledge Graph]
    EXPLORE_GRAPH --> DISCOVER_ITEMS[Discover Related Items]
    DISCOVER_ITEMS --> DECISION{Found Enough Context?}
    DECISION -->|No| EXPLORE_GRAPH
    DECISION -->|Yes| CREATE_PLAN[Create Research Plan]
    CREATE_PLAN --> PLAN_OUTPUT[Output: Research Plan & Item List]
    PLAN_OUTPUT --> SELECT_NEXT_ITEMS_NODE[To Select Next Items Node]
```
Stage 4: Iterative Research - Selecting and Executing Sub-Research
This is a loop managed by the orchestrator, involving selecting a batch of items for detailed research, dispatching them to Server Item Agents, and then aggregating the results. This loop continues until a conclusion is reached or a maximum number of iterations is hit.
4.1: Select Next Items Node
- Node: `selectNextItemsNode` (`orchestrator_agent/nodes/select_next_items.ts`)
- Purpose: To decide which GitLab items (from the research plan or previously discovered items) should be researched in the current batch, or whether the overall research should conclude.
- Process:
  - The node maintains its own conversation history for selecting items, which accumulates over iterations.
  - Initial Call / Loop Start:
    - If it's the first iteration (empty conversation history), a system prompt is constructed. This prompt instructs the LLM to act as a "research orchestrator," using the `research_plan` from the `planNode` and the original `user_query` to select an initial batch of items. It has two tools: `select_items_for_research` and `end_research`.
    - If it's a subsequent iteration, summaries from the `processedItems` of the previous batch (from `aggregateBatchResultsNodeOutput`) are added to the user message, providing new context to the LLM. A continuation prompt asks the LLM to select the next batch or end research.
  - LLM Interaction:
    - The LLM is called with the current selection conversation history and the tools:
      - `select_items_for_research`: Parameters defined by `SelectItemsToolParamsSchema` (items_to_research, reasoning_for_selection, research_goal_for_selected_items).
      - `end_research`: Parameters defined by `EndResearchToolParamsSchema` (reason_for_ending_research, overall_conclusion).
    - The node handles LLM responses, including retries if `InvalidToolArgumentsError` occurs.
  - Tool Call Processing:
    - If `select_items_for_research` is called:
      - The `items_to_research` are converted to `CompleteServerItemId` objects.
      - The `research_goal_for_selected_items` is recorded.
      - An `orchestrator_items_selected` event is dispatched.
    - If `end_research` is called:
      - The `overall_conclusion` is recorded, signaling the orchestrator to proceed to final report synthesis.
    - If no tool is called, or retries are exhausted, the system may force a conclusion.
  - The loop count for selection (`state.selectNextItemsNodeOutput.loopCount`) is incremented. If it exceeds `MAX_SELECT_ITEMS_LOOPS`, research is forced to conclude.
- Output: `SelectNextItemsNodeOutput` containing `itemsToResearchThisIteration`, `currentBatchResearchGoal`, `reasoning` (which is the `overall_conclusion` if research is ending), and the updated `researchSelectionConversationHistory` and `loopCount`.
- Routing:
  - If `reasoning` (overall_conclusion) is present, it routes to `synthesizeGlobalReportNode`.
  - If `itemsToResearchThisIteration` is not empty, it routes to `executeSubResearchNode`.
  - If `MAX_SELECT_ITEMS_LOOPS` is reached, or no items are selected and no conclusion is made, the graph ends.
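The three routing rules for this node reduce to a small conditional. This is an assumed condensation of the behavior described above, not the actual source; `MAX_SELECT_ITEMS_LOOPS` is a placeholder value.

```typescript
interface SelectNextItemsNodeOutput {
  itemsToResearchThisIteration: string[];
  reasoning?: string; // set to the overall_conclusion when research is ending
  loopCount: number;
}

const MAX_SELECT_ITEMS_LOOPS = 5; // placeholder; the real constant may differ

// Decide where the orchestrator graph goes after item selection.
function routeAfterSelection(
  out: SelectNextItemsNodeOutput,
): "synthesizeGlobalReportNode" | "executeSubResearchNode" | "END" {
  if (out.reasoning) return "synthesizeGlobalReportNode";  // conclusion reached
  if (out.loopCount >= MAX_SELECT_ITEMS_LOOPS) return "END";
  if (out.itemsToResearchThisIteration.length > 0) return "executeSubResearchNode";
  return "END"; // nothing selected and no conclusion
}
```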
4.2: Execute Sub-Research Node (Invoking Server Item Agents)
- Node: `executeSubResearchNode` (`orchestrator_agent/nodes/execute_sub_research.ts`)
- Purpose: To execute the detailed research for each item selected by `selectNextItemsNode`. This is done by invoking instances of the Server Item Agent graph in parallel.
- Process:
  - Receives `itemsToResearchThisIteration` and `currentBatchResearchGoal` from `selectNextItemsNodeOutput`.
  - If no items are selected, it returns an empty `subAgentRunOutputs`.
  - A `PQueue` is used to manage concurrent execution of Server Item Agents (concurrency set to 5).
  - For each `itemToResearch`:
    - A `parentAgentConversationHistory` is constructed. This includes:
      - System messages with the main user query and the specific research goal for this item.
      - Short summaries from previously processed items in the current orchestration cycle (from `state.aggregateBatchResultsNodeOutput.processedItems`) to provide broader context.
    - A `ServerItemAgentInput` is prepared, including `itemIdToResearch`, the specific `researchGoal` (or the main query if no specific goal), `parentAgentConversationHistory`, `maxLoops` (default 7), and `currentDate`.
    - The `serverItemAgentGraph` (from `server_item_agent/graph.ts`) is invoked with this input and the application configuration.
    - The final state of the Server Item Agent (`finalReport`, `finalReportShortSummary`, `error`) is collected.
  - All results from the sub-agent runs are collected.
- Output: `subAgentRunOutputs`, an array where each element contains the `itemIdToResearch`, `finalReport`, `finalReportShortSummary`, and any `error` from a Server Item Agent run. This output is implicitly passed to the `aggregateBatchResultsNode` via the orchestrator state.
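How a sub-agent's seed conversation might be assembled can be sketched as follows. Field names and message wording are assumptions based on the description above, not the actual prompt text.

```typescript
interface Message {
  role: "system" | "user";
  content: string;
}

// Illustrative construction of parentAgentConversationHistory: the main
// query, the per-item goal, and short summaries from prior items.
function buildParentHistory(
  userQuery: string,
  researchGoal: string,
  priorSummaries: string[],
): Message[] {
  const messages: Message[] = [
    { role: "system", content: `Main user query: ${userQuery}` },
    { role: "system", content: `Research goal for this item: ${researchGoal}` },
  ];
  // Short summaries give the sub-agent broader context without
  // forwarding the full reports.
  for (const s of priorSummaries) {
    messages.push({ role: "system", content: `Prior finding: ${s}` });
  }
  return messages;
}
```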
4.3: Server Item Agent Deep Dive
Each Server Item Agent, defined in `server_item_agent/graph.ts`, performs a detailed investigation of a single GitLab item.
- Graph Definition: `initializeResearchNode` -> `researchIterationNode` (loops) -> `finalizeReportNode` (parallel with `generateShortSummaryNode`) -> `joinServerReportsNode` -> `END`.
- 1. `initializeResearchNode` (`server_item_agent/nodes/initialize_research.ts`):
  - Fetches initial details of the assigned `itemIdToResearch` using `appConfig.cachedFetcher` or `fetchGenericItemDetails`.
  - Formats these details into an `initialItemContextString` using `getFormattedItems`.
  - Constructs the initial conversation history:
    - A system prompt defining its role as a "GitLab Research Sub-Agent," its tools (`graphDB_BFS`, `fetch_comments`, `fetch_mr_file_content`, `fetch_item_details`, `generate_final_report`), and the current date.
    - If `parentAgentConversationHistory` is provided by the orchestrator, it's incorporated.
    - A user message combining the `researchGoal` and the `initialItemContextString`.
  - Initializes state variables like `loopCount`, `isComplete`, `currentResearchData`.
- 2. `researchIterationNode` (`server_item_agent/nodes/research_iteration.ts`):
  - This is the main research loop for the sub-agent.
  - Increments `loopCount`. If `maxLoops` is reached, it adds a message urging the LLM to call `generate_final_report`.
  - Calls an LLM (configured via `getLlmClient`) with the current `conversationHistory` and a set of tools:
    - `TOOL_GRAPH_DB_BFS` (`GraphDBToolArgsSchema`): Executes `fetchGraphDbTool` to query KùzuDB. Results are added to `currentResearchData.bfsResults` and `visitedBfsItemIdsSet` is updated.
    - `TOOL_FETCH_COMMENTS` (`FetchCommentsToolArgsSchema`): Executes `fetchCommentsTool`. Results are stored in `currentResearchData.comments`.
    - `TOOL_FETCH_MR_FILE_CONTENT` (`FetchMrFileToolArgsSchema`): Executes `fetchMRFileTool`. Results are stored in `currentResearchData.fileContents`.
    - `TOOL_FETCH_ITEM_DETAILS` (`FetchItemDetailsToolArgsSchema`): Executes `fetchItemDetailsTool`. Results are stored in `currentResearchData.formattedItems`.
    - `TOOL_GENERATE_FINAL_REPORT`: Signals readiness to compile the final report. Sets `pendingReportArguments` with the `reasoning_for_completion`.
  - The LLM response (text and tool calls) and subsequent tool results are added to `conversationHistory`.
  - If `appConfig.saveOutput` is true, prompts and responses are saved.
  - A `server_item_research_iteration` event is dispatched.
  - Routing:
    - If `pendingReportArguments` is set (by `TOOL_GENERATE_FINAL_REPORT`), it routes to the parallel `finalizeReportNode` and `generateShortSummaryNode`.
    - If an error occurs, it ends.
    - If `loopCount` exceeds `maxLoops`, it ends.
    - Otherwise, it loops back to `researchIterationNode`.
- 3. `finalizeReportNode` (`server_item_agent/nodes/finalize_report.ts`) (runs in parallel with `generateShortSummaryNode`):
  - Checks for `pendingReportArguments`. If missing, an error is set.
  - Constructs a system prompt for the LLM to synthesize a comprehensive, detailed Markdown report based on the entire conversation history and the provided `reasoning_for_completion`. The prompt asks it to include exact copies of relevant code, diagrams, quotes, and comments.
  - Calls the LLM to generate `finalReportContent`.
  - The report content (or an error message) is added to the `conversationHistory`.
  - If `appConfig.saveOutput` is true, the final report is saved to disk.
  - Sets `finalReport` in the state and clears `pendingReportArguments` if successful.
- 4. `generateShortSummaryNode` (`server_item_agent/nodes/generate_short_summary.ts`) (runs in parallel with `finalizeReportNode`):
  - Constructs a system prompt for the LLM to synthesize a concise research summary (max 500 words) based on the conversation history and `reasoning_for_completion`. Markdown is not required for this summary.
  - Calls the LLM to generate `shortSummaryContent`.
  - Sets `finalReportShortSummary` in the state.
  - Any LLM error is appended to the main state error.
- 5. `joinServerReportsNode` (`server_item_agent/nodes/join_server_reports.ts`):
  - This node signifies the completion of the sub-agent's work.
  - It dispatches a `server_item_research_completed` event containing the `finalReport`, `finalReportSummary`, `conversationHistory`, status, and any error.
  - Sets `isComplete` to true in the state.
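The sub-agent's post-iteration routing rules also reduce to a short conditional. This is a condensed, assumed sketch of the four rules listed above; the combined `"finalizeAndSummarize"` target stands in for the parallel fan-out to `finalizeReportNode` and `generateShortSummaryNode`.

```typescript
interface IterationState {
  pendingReportArguments?: { reasoning_for_completion: string };
  error?: string;
  loopCount: number;
  maxLoops: number;
}

// Decide where the sub-agent graph goes after a research iteration.
function routeAfterIteration(
  s: IterationState,
): "finalizeAndSummarize" | "END" | "researchIterationNode" {
  if (s.pendingReportArguments) return "finalizeAndSummarize"; // parallel fan-out
  if (s.error) return "END";
  if (s.loopCount > s.maxLoops) return "END"; // loop budget exhausted
  return "researchIterationNode";             // keep gathering data
}
```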
4.4: Aggregate Batch Results Node
- Node: `aggregateBatchResultsNode` (`orchestrator_agent/nodes/aggregate_batch_results.ts`)
- Purpose: To collect the `finalReport`, `finalReportShortSummary`, and `error` status from each Server Item Agent run in the current batch and update the orchestrator's list of `processedItems`.
- Process:
  - Iterates through `state.subAgentRunOutputs` (produced by `executeSubResearchNode`).
  - For each sub-agent output:
    - Determines the status ("success" or "error").
    - Creates an `OrchestratorItemResearchSummary` event and dispatches it. This event includes the item ID, type, status, final report, short summary, and error details.
    - Appends the result to the `updatedProcessedItems` list in `state.aggregateBatchResultsNodeOutput`.
  - Increments the `loopCount` in `aggregateBatchResultsNodeOutput`.
- Output: Updates `aggregateBatchResultsNodeOutput` with the new `processedItems` and `loopCount`. The `currentBatchResearchGoal` and `subAgentRunOutputs` are cleared.
- Routing:
  - If `selectNextItemsNodeOutput.reasoning` (the overall conclusion) is present, it routes to `synthesizeGlobalReportNode`.
  - Otherwise, it loops back to `selectNextItemsNode` for the next iteration.
Mermaid Diagram: Orchestrator's Research Loop (Select -> Execute -> Aggregate)
```mermaid
graph TD
    PLAN_OUTPUT[From Plan Node] --> SELECT_ITEMS_NODE_LOOP_ENTRY{"Select Next Items"};
    subgraph Orchestrator Iteration
        direction TB
        SELECT_ITEMS_NODE_LOOP_ENTRY -- Items & Goal --> EXECUTE_SUB_RESEARCH_NODE{"Execute Sub-Research"};
        EXECUTE_SUB_RESEARCH_NODE -- Invokes for each item --> SERVER_ITEM_AGENT[Server Item Agent Graph Instances];
        SERVER_ITEM_AGENT -- Returns Reports/Summaries --> AGGREGATE_RESULTS_NODE{"Aggregate Batch Results"};
        AGGREGATE_RESULTS_NODE -- Updated processedItems --> SELECT_ITEMS_NODE_LOOP_ENTRY;
    end
    SELECT_ITEMS_NODE_LOOP_ENTRY -- No More Items / Conclusion Reached --> SYNTHESIZE_GLOBAL_REPORT_NODE[To Synthesize Global Report];
```
Stage 5: Final Report Synthesis
- Node: `synthesizeGlobalReportNode` (`orchestrator_agent/nodes/synthesize_global_report.ts`)
- Purpose: To generate a single, comprehensive global report based on the original user query and all the detailed research reports and summaries gathered from the Server Item Agents throughout the process.
- Process:
  - An `orchestrator_running_global_report` event is dispatched.
  - It compiles `allDetailedReports` by concatenating the `finalReport` from each successfully processed item in `state.aggregateBatchResultsNodeOutput.processedItems`. If an item failed or had no report, a note is included. If no items were processed, the `planNodeOutput.conversationHistory` is used as context.
  - A detailed system prompt is constructed for the LLM, instructing it to act as a "research report synthesizer." The prompt includes:
    - The original `user_query`.
    - The `research_plan` and `plan_reasoning` from `planNodeOutput`.
    - The `reasoning` from `selectNextItemsNodeOutput`, if available (explaining why research might be concluding).
    - Formatted initial item details (using `getFormattedItems`).
    - The compiled `allDetailedReports`.
    - Instructions on formatting (GitLab Flavored Markdown), integrating all data, addressing all aspects of the query, and handling missing information. Specific instructions for formatting links and item IDs are also provided.
  - The LLM is called using `streamText` to generate the `globalReportContent` chunk by chunk.
  - For each chunk received, an `orchestrator_running_global_report_chunk` event is dispatched, allowing the final report to be streamed to the user.
  - If `appConfig.saveOutput` is true, the full prompt and the final report are saved to disk.
  - An `orchestrator_all_research_cycles_completed` event is dispatched with the `finalConclusion` (the global report), `totalItemsProcessed`, and status.
- Output: Sets `orchestratorFinalReport` in the state with the generated `globalReportContent` and passes through any errors.
- Routing: This is the final node in the main research path, so it proceeds to `END`.
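The chunk-streaming pattern can be sketched with a synchronous stand-in: each chunk is dispatched as an event while the full report accumulates, followed by a completion event. `dispatch` stands in for the project's `dispatchCustomEvent` wrapper, and the simplified event shapes are assumptions.

```typescript
type ReportEvent =
  | { type: "orchestrator_running_global_report_chunk"; chunk: string }
  | { type: "orchestrator_all_research_cycles_completed"; finalConclusion: string };

// Accumulate LLM chunks into the final report while streaming each one
// out as an event (the real node consumes streamText's async stream).
function streamReport(
  chunks: Iterable<string>,
  dispatch: (e: ReportEvent) => void,
): string {
  let report = "";
  for (const chunk of chunks) {
    report += chunk;
    dispatch({ type: "orchestrator_running_global_report_chunk", chunk });
  }
  dispatch({
    type: "orchestrator_all_research_cycles_completed",
    finalConclusion: report,
  });
  return report;
}
```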
5. Components & Technologies
Configuration (`app_config.ts`)
- The `AppConfig` interface defines the central configuration for the application.
- It includes services like `KuzuDBService`, `GitLabApiClient`, `CachedFetcher`, and `Indexer`.
- It also manages settings like `saveOutput`, `outputPath`, API keys (`googleApiKey`), the preferred LLM provider, and model configurations per node (`modelConfigs`).
- The `ensureAppConfig` function validates and provides default values for the configuration when a LangGraph runnable is invoked. It checks for required services and API keys based on the chosen models.
- `getNodeModelConfig` retrieves the appropriate LLM model configuration for a given graph node, falling back to defaults if necessary.
- `getLlmClient` instantiates the correct LLM client (e.g., Gemini or Anthropic) based on the node's model configuration and the `AppConfig`.
State Management
- Orchestrator Agent State (`orchestrator_agent/state.ts`): Defines the overall state of the research process, including inputs, the outputs of the various nodes (input analysis, plan, selection, aggregation), the final report, and error status.
- Server Item Agent State (`server_item_agent/state.ts`): Defines the state for an individual item's research, including its input (`itemIdToResearch`, `researchGoal`), conversation history, fetched data (comments, BFS results, file contents, formatted items), loop count, completion status, and reports.
- LangGraph's `StateGraphArgs` and reducers are used to manage how state is updated and merged across different nodes in the graphs.
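The reducer idea behind LangGraph's state channels is that each field declares how a node's partial update merges into prior state. The following is a hand-rolled illustration of that concept (NOT the LangGraph API, and the state fields are simplified):

```typescript
type Reducer<T> = (prev: T, update: T) => T;

interface ResearchState {
  processedItems: string[]; // accumulates across iterations
  loopCount: number;        // last write wins
}

// One reducer per state channel: append for lists, overwrite for scalars.
const reducers: { [K in keyof ResearchState]: Reducer<ResearchState[K]> } = {
  processedItems: (prev, update) => [...prev, ...update],
  loopCount: (_prev, update) => update,
};

// Merge a node's partial output into the current state via the reducers.
function applyUpdate(state: ResearchState, update: Partial<ResearchState>): ResearchState {
  const next = { ...state };
  if (update.processedItems !== undefined) {
    next.processedItems = reducers.processedItems(state.processedItems, update.processedItems);
  }
  if (update.loopCount !== undefined) {
    next.loopCount = reducers.loopCount(state.loopCount, update.loopCount);
  }
  return next;
}
```

This is why, for example, `processedItems` can grow across orchestrator iterations while `loopCount` simply reflects the latest value.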
Data Fetching and Caching
- `GitLabApiClient`: Handles direct communication with the GitLab API.
- `CachedFetcher`: Provides a caching layer on top of the `GitLabApiClient` to reduce redundant API calls and improve performance. It's used by various parts of the system, including the `Indexer` and the item detail fetching tools.
- `GitLabDataFetcher`: Used by the `Indexer`, this class is responsible for fetching specific details for different `ItemType`s (Issue, MR, Epic, Project, Group) and transforming them into the schema expected by the Knowledge Graph. It leverages the `CachedFetcher`.
- `fetch.ts`: Contains utility functions like `fetchItemDetails` (generic item fetching) and `getFormattedItems` (formats item data into a string representation for LLM prompts).
Tooling
The agents use various tools to interact with GitLab data and the Knowledge Graph:
- Item ID Extraction Tool: Used by `inputAnalysisNode` (defined internally, not in an explicit tool file).
- Knowledge Graph Tools:
  - `fetchGraphDbTool` (`tools/graph_db_fetch.ts`): Allows agents (primarily `planNode` and `researchIterationNode`) to perform BFS searches in KùzuDB.
- GitLab Data Fetching Tools (for the Server Item Agent):
  - `fetchCommentsTool` (`tools/fetch_comments.ts`): Fetches comments for an item.
  - `fetchItemDetailsTool` (`tools/fetch_item_details.ts`): Fetches and formats detailed information for specified items.
  - `fetchMRFileTool` (`tools/fetch_mr_file_content.ts`): Fetches the content of files within a Merge Request.
- Control Flow Tools (for the Server Item Agent):
  - `generate_final_report` tool: Used by `researchIterationNode` to signal completion.
- Control Flow Tools (for the Orchestrator Agent's `selectNextItemsNode`):
  - `select_items_for_research`: To choose the next batch of items.
  - `end_research`: To conclude the entire research.

Each tool typically has a Zod schema defining its arguments (e.g., `FetchCommentsToolArgsSchema`, `GraphDBToolArgsSchema`).
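What such a schema enforces at the tool boundary can be illustrated with a dependency-free validator. The field names below are hypothetical (the real `FetchCommentsToolArgsSchema` is a Zod schema and its fields may differ); the point is that LLM-supplied arguments are checked before a tool runs.

```typescript
interface FetchCommentsToolArgs {
  itemId: string;
  maxComments?: number;
}

// Stand-in for Zod's .parse(): validate unknown LLM tool-call arguments
// and either return a typed object or throw a descriptive error.
function parseFetchCommentsArgs(raw: unknown): FetchCommentsToolArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("tool arguments must be an object");
  }
  const obj = raw as Record<string, unknown>;
  if (typeof obj.itemId !== "string" || obj.itemId.length === 0) {
    throw new Error("itemId must be a non-empty string");
  }
  if (obj.maxComments !== undefined && typeof obj.maxComments !== "number") {
    throw new Error("maxComments must be a number when provided");
  }
  return { itemId: obj.itemId, maxComments: obj.maxComments as number | undefined };
}
```

Validation errors like these are what surface as `InvalidToolArgumentsError` and trigger the retry handling described in Stage 4.1.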
Event System (`events.ts`)
- The system uses a custom event dispatching mechanism (`eventGenerator`, using `RunnableLambda` and `dispatchCustomEvent`) to emit various events throughout the research process.
- Event Types (`AllEvents`): A union of all possible event data structures, categorized into Orchestrator events and Server Item Agent events.
  - Orchestrator event examples: `orchestrator_server_item_ids_extracted`, `orchestrator_plan_step_generated`, `orchestrator_items_selected`, `orchestrator_item_research_summary`, `orchestrator_running_global_report_chunk`, `orchestrator_all_research_cycles_completed`, `graph_index_progress`.
  - Server Item Agent event examples: `server_item_research_iteration`, `server_item_research_completed`.
  - `FatalGraphError` for critical errors.
- These events are intended for monitoring, logging, and potentially driving UI updates or other external integrations (e.g., streaming updates via the `chat/route.ts` endpoint).
Differences from Duo Workflow
The GitLab Research Agent and Duo Workflow both leverage LangGraph, but they follow distinct patterns and serve different primary purposes.
LangGraph Patterns and Architecture Breakdown
- GitLab Research Agent: This agent employs an orchestrator pattern with multiple specialized mini-agents. Its process is geared towards in-depth investigation and knowledge synthesis:
  - Input Analysis & Planning (`planNode`): It first analyzes the user's query. A crucial part of its planning involves traversing the Knowledge Graph to identify relevant GitLab items (Epics, Issues, MRs, etc.). This stage determines which entities need to be researched.
  - Creating a Research Plan: Based on the graph traversal and query analysis, a research plan is formulated: essentially a list of specific GitLab items requiring detailed investigation.
  - Executing Sub-Agent Research: The orchestrator then dispatches Server Item Agents in parallel. Each is a focused mini-agent responsible for conducting iterative, in-depth research on a single GitLab item, using a range of tools to retrieve details, comments, and code content, and to perform additional localized graph queries.
  - Aggregation and Synthesis: Finally, the findings from all mini-agents are aggregated, and a comprehensive global report is synthesized to answer the initial query.
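The dispatch-and-aggregate step of the orchestrator can be sketched as follows. This is a minimal illustration, not the actual implementation: `researchItem` is a stand-in for a full Server Item Agent run, and the synthesis step is reduced to concatenating per-item reports.

```typescript
interface ItemReport {
  itemId: string;
  findings: string;
}

// Hypothetical per-item research; the real agent runs an iterative
// LangGraph loop with GitLab API and graph-query tools.
async function researchItem(itemId: string): Promise<ItemReport> {
  return { itemId, findings: `findings for ${itemId}` };
}

// The orchestrator fans out one mini-agent per planned item, waits for
// all of them, then synthesizes a single global report.
async function runResearchPlan(plannedItemIds: string[]): Promise<string> {
  const reports = await Promise.all(plannedItemIds.map(researchItem));
  return reports.map((r) => `## ${r.itemId}\n${r.findings}`).join("\n\n");
}
```

Fanning out with `Promise.all` mirrors the parallel dispatch of mini-agents: each item is researched independently, and synthesis only begins once every report is in.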
- GitLab Duo Workflow:
  - Software Development Workflow Pattern: This involves distinct phases such as context building, goal disambiguation, planning by a `PlanSupervisorAgent`, optional human approval steps (for plans and tools), and execution by a `ToolsExecutor`. The `PlannerAgent` in Duo Workflow focuses on creating a detailed, step-by-step plan of actions for an "engineer agent" to follow to achieve a user's goal (e.g., "create an issue," "edit a file"). The `ToolsExecutor` then executes these pre-defined tasks.
  - Chat Workflow Pattern: The chat workflow follows a more direct iterative loop: the agent processes input, potentially calls tools via a `ToolsExecutor`, and then processes the results to continue the conversation or call more tools.
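The chat workflow's agent-tool loop can be sketched generically. All names here (`AgentStep`, `runChatLoop`, the callbacks) are hypothetical; this only illustrates the control flow of "agent decides, tools execute, results feed back".

```typescript
// The agent either requests a tool call or produces a final answer.
type AgentStep =
  | { kind: "tool_call"; tool: string; args: string }
  | { kind: "final"; answer: string };

function runChatLoop(
  agent: (history: string[]) => AgentStep,
  toolsExecutor: (tool: string, args: string) => string,
  input: string,
  maxTurns = 10
): string {
  const history = [input];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = agent(history);
    if (step.kind === "final") return step.answer;
    // Execute the requested tool and append its result to the transcript,
    // so the agent sees it on the next iteration.
    history.push(toolsExecutor(step.tool, step.args));
  }
  throw new Error("max turns exceeded");
}
```

The `maxTurns` guard is a common safety measure in such loops, preventing an agent that keeps requesting tools from running forever.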
Architecture Pattern Differences:
- Nature of the "Plan": The Research Agent's plan is a list of entities to investigate. Duo Workflow's plan is a sequence of actions to execute.
- Use of Knowledge Graph in Planning: The Research Agent explicitly uses graph database traversal in its planning phase to discover items. Duo Workflow's planner, based on the provided context, derives its plan from the user's goal and the available tools, rather than relying on graph traversal for planning.
- Sub-Agent Specialization: The Research Agent's use of "mini-agents" for in-depth exploration of individual items is a distinct characteristic of its orchestrator pattern, tailored for information gathering and synthesis. Duo Workflow's agents (Planner and Executor) are primarily concerned with defining and then executing a sequence of operations.
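The "nature of the plan" contrast above can be expressed in two hypothetical types: the Research Agent's plan is data to investigate, while Duo Workflow's plan is actions to perform. These interfaces are illustrative only.

```typescript
// Research Agent: a plan is a list of entities to research.
interface ResearchPlan {
  items: { kind: "Epic" | "Issue" | "MergeRequest"; id: string }[];
}

// Duo Workflow: a plan is an ordered sequence of executable steps.
interface ActionPlan {
  steps: { description: string; tool: string }[];
}

function summarizePlans(r: ResearchPlan, a: ActionPlan): string {
  return `research ${r.items.length} items, execute ${a.steps.length} steps`;
}
```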
Research Agent as a Workflow Inside Duo Workflow
The GitLab Research Agent can be conceptualized as a specialized workflow that could be invoked by a broader system like Duo Workflow.
- Scenario: Imagine a Duo Workflow tasked with a complex goal, such as: "Analyze the performance implications of the new caching strategy (Epic-123) and refactor related services if bottlenecks are found."
- Integration:
  - Duo Workflow could initiate its process.
  - As part of its "build context" or "planning" phase, it could call the GitLab Research Agent as a sub-workflow or a powerful tool. The request would be something like: "Provide a comprehensive report on Epic-123, including all related MRs, issues, discussions on performance, and code modules affected by these MRs."
  - The Research Agent would execute its full process (graph traversal, sub-agent research, synthesis) and return a detailed report.
  - This report then becomes rich, contextual input for Duo Workflow's `PlannerAgent`, enabling it to create a much more informed and effective plan for refactoring. Duo Workflow's `ToolsExecutor` would then carry out the refactoring tasks.
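The scenario above could be wired up by exposing the Research Agent behind a single tool interface that a planner calls during context building. Everything in this sketch (`ResearchTool`, `deepResearch`, `buildContext`) is hypothetical; `deepResearch` stands in for the full pipeline of graph traversal, sub-agent research, and synthesis.

```typescript
interface ResearchTool {
  name: string;
  run: (request: string) => Promise<string>;
}

// Stand-in for the full research pipeline; here it just labels the request.
async function deepResearch(request: string): Promise<string> {
  return `Report: ${request}`;
}

const researchTool: ResearchTool = {
  name: "deep_research",
  run: deepResearch,
};

// A planner could inject the returned report into its own context
// before generating an action plan.
async function buildContext(goal: string): Promise<string[]> {
  const report = await researchTool.run(`Comprehensive report for: ${goal}`);
  return [goal, report];
}
```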
This way, the Research Agent acts as a deep information-gathering service that feeds into the action-oriented capabilities of Duo Workflow.
TL;DR
This highlights the core difference in their intended purpose:
- GitLab Research Agent: It is fundamentally designed for "deep" insights and context provisioning. Its primary output is comprehensive, synthesized knowledge derived from exploring and connecting disparate pieces of information across GitLab. It answers the "what," "why," "how are these connected," and "what is the history." It builds understanding.
- GitLab Duo Workflow: It is primarily geared towards taking action and effecting change within the GitLab environment based on a user's goal. While it provides some context, its primary focus is on executing a plan to achieve a tangible outcome (e.g., creating an issue, modifying a file, or running a CI pipeline).
The Research Agent provides the foundational understanding and rich context that other agents, such as those within Duo Workflow, can build on. Duo Workflow agents can consume the Research Agent's output to make more informed decisions, create more relevant plans, and interact with the user or system with a deeper understanding of the underlying context.