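Overview
LlamaIndex is a data framework for LLM applications, and it is particularly strong for building RAG (Retrieval-Augmented Generation) systems. LemonData works seamlessly with LlamaIndex's OpenAI integration.
Installation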
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
Basic Configuration
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

# Configure LLM
llm = OpenAI(
    model="gpt-4o",
    api_key="sk-your-lemondata-key",
    api_base="https://api.lemondata.cc/v1"
)

# Set as default
Settings.llm = llm

# Simple query
response = llm.complete("What is LemonData?")
print(response.text)
Using Different Models
# OpenAI GPT-4o
gpt4 = OpenAI(
    model="gpt-4o",
    api_key="sk-your-key",
    api_base="https://api.lemondata.cc/v1"
)

# Anthropic Claude (via OpenAI-compatible endpoint)
claude = OpenAI(
    model="claude-sonnet-4-5",
    api_key="sk-your-key",
    api_base="https://api.lemondata.cc/v1"
)

# Google Gemini
gemini = OpenAI(
    model="gemini-2.5-flash",
    api_key="sk-your-key",
    api_base="https://api.lemondata.cc/v1"
)
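The three clients above differ only in the model name, so the shared settings can live in one place. The following is a minimal sketch under that assumption: `lemondata_kwargs` and `LEMONDATA_API_BASE` are hypothetical names introduced for illustration, not part of LlamaIndex or LemonData.

```python
# Hypothetical helper (not a LlamaIndex or LemonData API): builds the keyword
# arguments shared by every client shown in this section.
LEMONDATA_API_BASE = "https://api.lemondata.cc/v1"


def lemondata_kwargs(model: str, api_key: str) -> dict:
    """Return constructor arguments for an OpenAI-compatible client."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"model": model, "api_key": api_key, "api_base": LEMONDATA_API_BASE}


# Build one argument set per model named in this section.
configs = {
    name: lemondata_kwargs(name, "sk-your-key")
    for name in ("gpt-4o", "claude-sonnet-4-5", "gemini-2.5-flash")
}
```

Each entry could then be unpacked into the constructor, for example `OpenAI(**configs["gpt-4o"])`, assuming the same constructor arguments used in the examples above.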
Chat Interface
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is the capital of France?")
]

response = llm.chat(messages)
print(response.message.content)
Streaming
# Streaming completion
for chunk in llm.stream_complete("Write a poem about AI"):
    print(chunk.delta, end="", flush=True)

# Streaming chat
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
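When the full text is needed after streaming finishes (for logging or post-processing), the deltas can be accumulated while they are printed. This is a minimal sketch: `collect_stream` is a hypothetical helper, and the stand-in stream only assumes that each chunk exposes a `.delta` attribute that may be empty or `None`.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each chunk's delta as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # treat a missing delta as empty text
        print(delta, end="", flush=True)
        parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in for a real stream: simple objects with a `.delta` attribute.
fake_stream = [SimpleNamespace(delta=d) for d in ("Hello", ", ", None, "world")]
full_text = collect_stream(fake_stream)
```

With a real client, the iterable returned by `llm.stream_complete(...)` or `llm.stream_chat(...)` would take the place of `fake_stream`.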
Embeddings
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key="sk-your-lemondata-key",
    api_base="https://api.lemondata.cc/v1"
)

# Set as default
Settings.embed_model = embed_model

# Get embeddings
embeddings = embed_model.get_text_embedding("Hello, world!")
print(f"Embedding dimension: {len(embeddings)}")
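Embedding vectors are usually compared by cosine similarity, which is the idea behind the vector search used in RAG. The sketch below uses only the standard library and toy two-dimensional vectors. `cosine_similarity` is an illustrative helper written for this page, not a LlamaIndex function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # by convention here, a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

In practice the inputs would be two lists returned by `embed_model.get_text_embedding(...)`.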
RAG with Documents
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Configure settings
Settings.llm = llm
Settings.embed_model = embed_model
# Load documents
documents = SimpleDirectoryReader("./data").load_data()
# Create index
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is in my documents?")
print(response)
Chat Engine
# Create chat engine with memory
chat_engine = index.as_chat_engine(chat_mode="condense_question")
# Multi-turn conversation
response = chat_engine.chat("What is LemonData?")
print(response)
response = chat_engine.chat("How many models does it support?")
print(response)
Async Usage
import asyncio

async def main():
    response = await llm.acomplete("Hello!")
    print(response.text)

asyncio.run(main())
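The main benefit of the async API is running several requests concurrently. The sketch below shows one common pattern, `asyncio.gather` with a semaphore that caps concurrency. `complete_many` and `fake_acomplete` are hypothetical names, and the stand-in coroutine makes no network call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def complete_many(
    acomplete: Callable[[str], Awaitable[str]],
    prompts: List[str],
    max_concurrency: int = 4,
) -> List[str]:
    """Run one completion per prompt, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(prompt: str) -> str:
        async with semaphore:
            return await acomplete(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(run_one(p) for p in prompts))


async def fake_acomplete(prompt: str) -> str:
    """Stand-in for a real async completion call; returns a canned reply."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


results = asyncio.run(
    complete_many(fake_acomplete, ["a", "b", "c"], max_concurrency=2)
)
```

With a real client, a small wrapper that awaits `llm.acomplete(prompt)` and returns its `.text` would replace `fake_acomplete`. A suitable concurrency limit depends on your rate limits, so the default of 4 is only a placeholder.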
Environment Variables
For cleaner code, use environment variables:
export OPENAI_API_KEY="sk-your-lemondata-key"
export OPENAI_API_BASE="https://api.lemondata.cc/v1"
from llama_index.llms.openai import OpenAI
# Will automatically use environment variables
llm = OpenAI(model="gpt-4o")
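Since the client now depends on the environment, it can help to fail early with a clear message when a variable is missing. This is a minimal standard-library sketch. `missing_env_vars` and `require_env` are hypothetical helpers, and the variable names are the two exported above.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_API_BASE")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise a RuntimeError naming every missing variable."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` once at startup, before constructing the client, surfaces a configuration problem before any request is sent.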
Best Practices
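Choose the right model
Use a faster model (GPT-4o-mini) for intermediate work such as summarization, and reserve the more capable models (GPT-4o, Claude) for the final response. Embeddings require a dedicated embedding model such as text-embedding-3-small.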
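One way to apply this is a small lookup table from task to model name. The sketch below is purely illustrative: the task names, `MODEL_FOR_TASK`, and `pick_model` are hypothetical, and the model identifiers are assumed to match what your account exposes.

```python
# Hypothetical task-to-model table; task names are illustrative only.
MODEL_FOR_TASK = {
    "summarize": "gpt-4o-mini",  # cheaper, faster model for intermediate steps
    "final_answer": "gpt-4o",    # stronger model for the user-facing reply
}
DEFAULT_MODEL = "gpt-4o"


def pick_model(task: str) -> str:
    """Return the model name for a task, falling back to the default."""
    return MODEL_FOR_TASK.get(task, DEFAULT_MODEL)


summary_model = pick_model("summarize")
answer_model = pick_model("final_answer")
```

The selected name would then be passed as the `model` argument when constructing the client for that step.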
Optimize chunk size
Adjust the chunk size to your document type. Use smaller chunks for dense technical documents and larger chunks for narrative content.
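To make the trade-off concrete, the sketch below splits text into fixed-size, overlapping chunks. It is a simplified illustration that counts whitespace-separated words, whereas LlamaIndex's own splitters work differently (its global `Settings.chunk_size` and `Settings.chunk_overlap` are, to my understanding, measured in tokens). `chunk_words` is a hypothetical helper.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of `chunk_size` words sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = "a b c d e f g h i j"
small_chunks = chunk_words(sample, chunk_size=4, overlap=1)
large_chunks = chunk_words(sample, chunk_size=5)
```

Smaller chunks give more, narrower retrieval units, while larger chunks keep more surrounding context in each unit.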
Use caching
Enable LlamaIndex caching to avoid redundant API calls during development.
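The idea behind caching can be sketched as a generic in-memory memoization wrapper around an embedding function. This is not LlamaIndex's built-in caching: `CachedEmbedder` and `fake_embed` are hypothetical names used only to show how repeated inputs avoid repeated calls.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    """Memoize an embedding function by the SHA-256 hash of the input text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying function was called

    def get(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call; makes no network request."""
    return [float(len(text)), 0.0]


embedder = CachedEmbedder(fake_embed)
first = embedder.get("Hello, world!")
second = embedder.get("Hello, world!")  # served from the cache
```

With a real model, `embed_model.get_text_embedding` could be passed in place of `fake_embed`. The cache lives only for the lifetime of the process, so a persistent store would be needed to reuse results across runs.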