Gemini Live API์—์„œ Vertex AI RAG Engine ์‚ฌ์šฉ

๊ฒ€์ƒ‰ ์ฆ๊ฐ• ์ƒ์„ฑ (RAG)์€ LLM์ด ๊ฒ€์ฆ ๊ฐ€๋Šฅํ•œ ๋Œ€๋‹ต์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ด€๋ จ ์ •๋ณด๋ฅผ ๊ฒ€์ƒ‰ํ•˜๊ณ  ์ œ๊ณตํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค. ์ •๋ณด์—๋Š” ์ตœ์‹  ์ •๋ณด, ์ฃผ์ œ ๋ฐ ์ปจํ…์ŠคํŠธ ๋˜๋Š” ์ •๋‹ต์ด ํฌํ•จ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ํŽ˜์ด์ง€์—์„œ๋Š” RAG ์ฝ”ํผ์Šค์—์„œ ์ •๋ณด๋ฅผ ์ง€์ •ํ•˜๊ณ  ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์žˆ๋Š” Gemini Live API์™€ ํ•จ๊ป˜ Vertex AI RAG Engine์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๊ธฐ๋ณธ ์š”๊ฑด

๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ Live API์™€ ํ•จ๊ป˜ Vertex AI RAG Engine์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ๋‹ค์Œ ์„ ํ–‰ ์กฐ๊ฑด์„ ์™„๋ฃŒํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

  1. Vertex AI์—์„œ RAG API๋ฅผ ์‚ฌ์šฉ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

  2. RAG ์ฝ”ํผ์Šค ๋งŒ๋“ค๊ธฐ ์˜ˆ์‹œ

  3. RAG ์ฝ”ํผ์Šค์— ํŒŒ์ผ์„ ์—…๋กœ๋“œํ•˜๋ ค๋ฉด RAG ํŒŒ์ผ ๊ฐ€์ ธ์˜ค๊ธฐ ์˜ˆ์‹œ API๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.

์„ค์ •

Vertex AI RAG Engine์„ ๋„๊ตฌ๋กœ ์ง€์ •ํ•˜์—ฌ Live API์™€ ํ•จ๊ป˜ Vertex AI RAG Engine์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ ์ฝ”๋“œ ์ƒ˜ํ”Œ์€ Vertex AI RAG ์—”์ง„์„ ๋„๊ตฌ๋กœ ์ง€์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๋‹ค์Œ ๋ณ€์ˆ˜๋ฅผ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

  • YOUR_PROJECT_ID: Google Cloud ํ”„๋กœ์ ํŠธ์˜ ID.
  • YOUR_CORPUS_ID: ์ฝ”ํผ์Šค ID์ž…๋‹ˆ๋‹ค.
  • YOUR_LOCATION: ์š”์ฒญ์„ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฆฌ์ „.
PROJECT_ID = "YOUR_PROJECT_ID"
RAG_CORPUS_ID = "YOUR_CORPUS_ID"
LOCATION = "YOUR_LOCATION"

TOOLS = {
    "retrieval": {
        "vertex_rag_store": {
            "rag_resources": {
                "rag_corpus": f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{RAG_CORPUS_ID}"
            }
        }
    }
}
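The rag_corpus field must be a fully qualified resource name. As an illustrative sanity check (the helper below is ours, not part of any Google API), the expected pattern is:

```python
# Illustrative helper: builds the fully qualified RAG corpus resource name
# that the vertex_rag_store tool expects.
def rag_corpus_resource_name(project_id: str, location: str, corpus_id: str) -> str:
    return f"projects/{project_id}/locations/{location}/ragCorpora/{corpus_id}"

print(rag_corpus_resource_name("my-project", "us-central1", "1234"))
# → projects/my-project/locations/us-central1/ragCorpora/1234
```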

์‹ค์‹œ๊ฐ„ ์ปค๋ฎค๋‹ˆ์ผ€์ด์…˜์— Websocket ์‚ฌ์šฉํ•˜๊ธฐ

ํด๋ผ์ด์–ธํŠธ์™€ ์„œ๋ฒ„ ๊ฐ„์˜ ์‹ค์‹œ๊ฐ„ ํ†ต์‹ ์„ ์‚ฌ์šฉ ์„ค์ •ํ•˜๋ ค๋ฉด Websocket๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ฝ”๋“œ ์ƒ˜ํ”Œ์€ Python API์™€ Python SDK๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ Websocket๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

Python API

import json

from IPython.display import Markdown, display
from websockets.asyncio.client import connect

CONFIG = {"response_modalities": ["TEXT"], "speech_config": {"language_code": "en-US"}}
headers = {
  "Content-Type": "application/json",
  # bearer_token holds an OAuth access token, for example from
  # `gcloud auth print-access-token`.
  "Authorization": f"Bearer {bearer_token[0]}",
}
HOST = f"{LOCATION}-aiplatform.googleapis.com"
SERVICE_URL = f"wss://{HOST}/ws/google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent"
MODEL = "gemini-2.0-flash-exp"

# Connect to the server
async with connect(SERVICE_URL, additional_headers=headers) as ws:
  # Setup the session
  await ws.send(
      json.dumps(
          {
              "setup": {
                  "model": MODEL,
                  "generation_config": CONFIG,
                  # Set up RAG as a retrieval tool
                  "tools": TOOLS,
              }
          }
      )
  )

  # Receive setup response
  raw_response = await ws.recv(decode=False)
  setup_response = json.loads(raw_response.decode("ascii"))

  # Send text message
  text_input = "What are popular LLMs?"
  display(Markdown(f"**Input:** {text_input}"))

  msg = {
      "client_content": {
          "turns": [{"role": "user", "parts": [{"text": text_input}]}],
          "turn_complete": True,
      }
  }

  await ws.send(json.dumps(msg))

  responses = []

  # Receive chunks of server response
  async for raw_response in ws:
      response = json.loads(raw_response.decode())
      server_content = response.pop("serverContent", None)
      if server_content is None:
          break

      model_turn = server_content.pop("modelTurn", None)
      if model_turn is not None:
          parts = model_turn.pop("parts", None)
          if parts is not None:
              display(Markdown(f"**parts >** {parts}"))
              responses.append(parts[0]["text"])

      # End of turn
      turn_complete = server_content.pop("turnComplete", None)
      if turn_complete:
          grounding_metadata = server_content.pop("groundingMetadata", None)
          if grounding_metadata is not None:
            grounding_chunks = grounding_metadata.pop("groundingChunks", None)
            if grounding_chunks is not None:
              for chunk in grounding_chunks:
                display(Markdown(f"**grounding_chunk >** {chunk}"))
          break

  # Print the server response
  display(Markdown(f"**Response >** {''.join(responses)}"))
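The per-message handling in the receive loop above can be factored into a small pure function, which is easy to unit-test without a live connection. This is an illustrative sketch — parse_server_message is our helper name, not part of the API:

```python
import json

def parse_server_message(raw: bytes) -> dict:
    """Extracts text parts, grounding chunks, and turn state from one
    BidiGenerateContent server message (illustrative helper)."""
    response = json.loads(raw.decode())
    server_content = response.get("serverContent") or {}
    parts = (server_content.get("modelTurn") or {}).get("parts") or []
    grounding = (server_content.get("groundingMetadata") or {}).get("groundingChunks") or []
    return {
        "texts": [p["text"] for p in parts if "text" in p],
        "grounding_chunks": grounding,
        "turn_complete": bool(server_content.get("turnComplete")),
    }

msg = b'{"serverContent": {"modelTurn": {"parts": [{"text": "LLMs are..."}]}}}'
print(parse_server_message(msg))
# → {'texts': ['LLMs are...'], 'grounding_chunks': [], 'turn_complete': False}
```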

Python SDK

์ƒ์„ฑํ˜• AI SDK๋ฅผ ์„ค์น˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด๋ ค๋ฉด ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.

from google import genai
from google.genai.types import Content, LiveConnectConfig, Modality, Part
from IPython import display

MODEL = "gemini-2.0-flash-exp"

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

async with client.aio.live.connect(
    model=MODEL,
    config=LiveConnectConfig(
        response_modalities=[Modality.TEXT],
        tools=[TOOLS],
    ),
) as session:
    text_input = "What are core LLM techniques?"
    print("> ", text_input, "\n")
    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    async for message in session.receive():
        if message.text:
            display.display(display.Markdown(message.text))
            continue

Vertex AI RAG Engine์„ ์ปจํ…์ŠคํŠธ ์Šคํ† ์–ด๋กœ ์‚ฌ์šฉ

Vertex AI RAG Engine์„ Gemini Live API์˜ ์ปจํ…์ŠคํŠธ ์Šคํ† ์–ด๋กœ ์‚ฌ์šฉํ•˜์—ฌ ์„ธ์…˜ ์ปจํ…์ŠคํŠธ๋ฅผ ์ €์žฅํ•˜์—ฌ ๋Œ€ํ™”์™€ ๊ด€๋ จ๋œ ์ด์ „ ์ปจํ…์ŠคํŠธ๋ฅผ ํ˜•์„ฑํ•˜๊ณ  ๊ฒ€์ƒ‰ํ•˜๋ฉฐ ๋ชจ๋ธ ์ƒ์„ฑ์„ ์œ„ํ•œ ํ˜„์žฌ ์ปจํ…์ŠคํŠธ๋ฅผ ๋ณด๊ฐ•ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ธฐ๋Šฅ์„ ํ™œ์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ Live API ์„ธ์…˜ ๊ฐ„์— ์ปจํ…์ŠคํŠธ๋ฅผ ๊ณต์œ ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

Vertex AI RAG Engine์€ ์„ธ์…˜ ์ปจํ…์ŠคํŠธ์—์„œ ๋‹ค์Œ ํ˜•์‹์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ณ  ์ƒ‰์ธ์„ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

  • ํ…์ŠคํŠธ
  • ์˜ค๋””์˜ค ์Œ์„ฑ

MemoryCorpus ์œ ํ˜• ์ฝ”ํผ์Šค ๋งŒ๋“ค๊ธฐ

์„ธ์…˜ ์ปจํ…์ŠคํŠธ์˜ ๋Œ€ํ™” ํ…์ŠคํŠธ๋ฅผ ์ €์žฅํ•˜๊ณ  ์ƒ‰์ธ์„ ์ƒ์„ฑํ•˜๋ ค๋ฉด MemoryCorpus ์œ ํ˜•์˜ RAG ์ฝ”ํผ์Šค๋ฅผ ๋งŒ๋“ค์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ ์ƒ‰์ธ ์ƒ์„ฑ์„ ์œ„ํ•ด Live API์—์„œ ์ €์žฅ๋œ ์„ธ์…˜ ์ปจํ…์ŠคํŠธ๋ฅผ ํŒŒ์‹ฑํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋นŒ๋“œํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋ฉ”๋ชจ๋ฆฌ ์ฝ”ํผ์Šค ๊ตฌ์„ฑ์—์„œ LLM ํŒŒ์„œ๋ฅผ ์ง€์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ด ์ฝ”๋“œ ์ƒ˜ํ”Œ์€ ์ฝ”ํผ์Šค๋ฅผ ๋งŒ๋“œ๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๋จผ์ € ๋ณ€์ˆ˜๋ฅผ ๊ฐ’์œผ๋กœ ๋ฐ”๊ฟ”์•ผ ํ•ฉ๋‹ˆ๋‹ค.

import vertexai
from vertexai import rag

vertexai.init(project=PROJECT_ID, location=LOCATION)

# Currently supports Google first-party embedding models
EMBEDDING_MODEL = "YOUR_EMBEDDING_MODEL"  # Such as "publishers/google/models/text-embedding-005"
MEMORY_CORPUS_DISPLAY_NAME = "YOUR_MEMORY_CORPUS_DISPLAY_NAME"
LLM_PARSER_MODEL_NAME = "YOUR_LLM_PARSER_MODEL_NAME"  # Such as "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-pro-preview-05-06"

memory_corpus = rag.create_corpus(
    display_name=MEMORY_CORPUS_DISPLAY_NAME,
    corpus_type_config=rag.RagCorpusTypeConfig(
        corpus_type_config=rag.MemoryCorpus(
            llm_parser=rag.LlmParserConfig(
                model_name=LLM_PARSER_MODEL_NAME,
            )
        )
    ),
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=rag.RagEmbeddingModelConfig(
            vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
                publisher_model=EMBEDDING_MODEL
            )
        )
    ),
)

์ปจํ…์ŠคํŠธ๋ฅผ ์ €์žฅํ•  ๋ฉ”๋ชจ๋ฆฌ ์ฝ”ํผ์Šค ์ง€์ •

Live API์—์„œ ๋ฉ”๋ชจ๋ฆฌ ์ฝ”ํผ์Šค๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ๋ฉ”๋ชจ๋ฆฌ ์ฝ”ํผ์Šค๋ฅผ ๊ฒ€์ƒ‰ ๋„๊ตฌ๋กœ ์ง€์ •ํ•œ ๋‹ค์Œ store_context๋ฅผ true๋กœ ์„ค์ •ํ•˜์—ฌ Live API๊ฐ€ ์„ธ์…˜ ์ปจํ…์ŠคํŠธ๋ฅผ ์ €์žฅํ•˜๋„๋ก ํ—ˆ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ด ์ฝ”๋“œ ์ƒ˜ํ”Œ์€ ์ปจํ…์ŠคํŠธ๋ฅผ ์ €์žฅํ•  ๋ฉ”๋ชจ๋ฆฌ ์ฝ”ํผ์Šค๋ฅผ ์ง€์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๋‹จ, ๋จผ์ € ๋ณ€์ˆ˜๋ฅผ ๊ฐ’์œผ๋กœ ๋ฐ”๊ฟ”์•ผ ํ•ฉ๋‹ˆ๋‹ค.

from google import genai
from google.genai import types
from google.genai.types import Content, LiveConnectConfig, Modality, Part
from IPython import display

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_LOCATION"
TEXT_INPUT = "YOUR_TEXT_INPUT"
MODEL_NAME = "YOUR_MODEL_NAME"  # Such as "gemini-2.0-flash-exp"

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

memory_store = types.VertexRagStore(
    rag_resources=[
        types.VertexRagStoreRagResource(
            rag_corpus=memory_corpus.name
        )
    ],
    store_context=True,
)

async with client.aio.live.connect(
    model=MODEL_NAME,
    config=LiveConnectConfig(
        response_modalities=[Modality.TEXT],
        tools=[types.Tool(retrieval=types.Retrieval(vertex_rag_store=memory_store))],
    ),
) as session:
    text_input = TEXT_INPUT
    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    async for message in session.receive():
        if message.text:
            display.display(display.Markdown(message.text))
            continue
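If you are driving the session over raw WebSockets instead of the SDK, the same memory-store tool can be sketched in dict form, mirroring the TOOLS payload from the setup section. This is a hedged sketch: it assumes the JSON payload accepts the same rag_resources and store_context fields as the SDK's types.VertexRagStore, and the resource name below is a placeholder.

```python
# Hypothetical dict-form equivalent of the SDK tool above; field names mirror
# types.VertexRagStore. Assumes store_context is accepted in the JSON payload.
MEMORY_TOOLS = {
    "retrieval": {
        "vertex_rag_store": {
            "rag_resources": [
                {"rag_corpus": "projects/PROJECT/locations/LOCATION/ragCorpora/CORPUS_ID"}
            ],
            "store_context": True,
        }
    }
}
```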

๋‹ค์Œ ๋‹จ๊ณ„

  • Vertex AI RAG Engine์„ ์ž์„ธํžˆ ์•Œ์•„๋ณด๋ ค๋ฉด Vertex AI RAG Engine ๊ฐœ์š”๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
  • RAG API์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด๋ ค๋ฉด Vertex AI RAG Engine API๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.
  • RAG ์ฝ”ํผ์Šค๋ฅผ ๊ด€๋ฆฌํ•˜๋ ค๋ฉด ์ฝ”ํผ์Šค ๊ด€๋ฆฌ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.
  • RAG ํŒŒ์ผ์„ ๊ด€๋ฆฌํ•˜๋ ค๋ฉด ํŒŒ์ผ ๊ด€๋ฆฌ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.
  • Vertex AI SDK๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ Vertex AI RAG Engine ํƒœ์Šคํฌ๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ Python์šฉ RAG ๋น ๋ฅธ ์‹œ์ž‘์„ ์ฐธ์กฐํ•˜์„ธ์š”.