
title: "Create Embedding"
openapi: "POST /v1/embeddings"
description: "Creates an embedding vector representing the input text"

Request Body

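model
string
required
ID of the embedding model to use (e.g., text-embedding-3-small).
input
string | array
required
Input text to embed. Can be a string or an array of strings.
encoding_format
string
default: "float"
Format of the returned embeddings: float or base64.
dimensions
integer
Number of output dimensions (model-dependent).
user
string
A unique identifier representing the end user, used for abuse monitoring.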
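
As a rough sketch of how the fields above combine into a request body, the helper below assembles the JSON payload and omits optional fields that were not set. The function name `build_embedding_request` and its `texts` argument are hypothetical and not part of any SDK, the validation only mirrors the types listed above, and the `dimensions` value is illustrative.

```python
import json
from typing import Optional, Union


def build_embedding_request(
    model: str,
    texts: Union[str, list[str]],
    encoding_format: str = "float",
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
) -> dict:
    """Assemble a request body for POST /v1/embeddings (hypothetical helper)."""
    if encoding_format not in ("float", "base64"):
        raise ValueError("encoding_format must be 'float' or 'base64'")
    if not isinstance(texts, str) and not all(isinstance(t, str) for t in texts):
        raise TypeError("texts must be a string or a list of strings")

    # `texts` is sent under the API's `input` key.
    body = {"model": model, "input": texts, "encoding_format": encoding_format}
    # Optional fields are only included when explicitly set.
    if dimensions is not None:
        body["dimensions"] = dimensions
    if user is not None:
        body["user"] = user
    return body


payload = build_embedding_request(
    "text-embedding-3-small",
    ["First document text", "Second document text"],
    dimensions=256,  # illustrative value; supported sizes depend on the model
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized and sent as the `-d` body of a request to the endpoint.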

Available Models

| Model | Dimensions | Description |
| --- | --- | --- |
| text-embedding-3-large | 3072 | Best quality |
| text-embedding-3-small | 1536 | Balanced |
| text-embedding-ada-002 | 1536 | Legacy |
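
The dimension counts largely determine how much space stored vectors take up. The sketch below estimates raw storage per model, assuming each value is kept as a 32-bit float; that storage format and the helper name `estimate_storage_bytes` are assumptions for illustration, not something the API prescribes.

```python
# Dimensions as listed for the available models.
MODEL_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def estimate_storage_bytes(model: str, num_vectors: int) -> int:
    """Estimate raw storage for num_vectors full-size embeddings."""
    return MODEL_DIMENSIONS[model] * BYTES_PER_FLOAT32 * num_vectors


for name in MODEL_DIMENSIONS:
    mib = estimate_storage_bytes(name, 1_000_000) / (1024 ** 2)
    print(f"{name}: {mib:,.0f} MiB per million vectors")
```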

Response

object
string
Always list.
data
array
Array of embedding objects. Each object contains:
  • object (string): embedding
  • index (integer): Index in the input array
  • embedding (array): The embedding vector
model
string
The model used.
usage
object
Token usage, containing prompt_tokens and total_tokens.
curl -X POST "https://api.lemondata.cc/v1/embeddings" \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0194, 0.0081, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
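
A response shaped like the one above can be unpacked as sketched below. The helper names are hypothetical. Decoding of `base64` output assumes the payload is a packed array of little-endian 32-bit floats, which this page does not confirm, so treat that part as an assumption to verify against real responses.

```python
import base64
import struct


def decode_base64_embedding(encoded: str) -> list[float]:
    """Decode a base64 embedding, ASSUMING packed little-endian float32 values."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = []
    for item in items:
        emb = item["embedding"]
        # A string is treated as base64 output; a list is used as-is.
        vectors.append(decode_base64_embedding(emb) if isinstance(emb, str) else list(emb))
    return vectors


# Shortened sample with made-up values. It mixes both formats and shuffles
# the order only to exercise both branches and the sort.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.5, -0.25]},
        {"object": "embedding", "index": 0,
         "embedding": base64.b64encode(struct.pack("<2f", 1.0, 2.0)).decode()},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 9, "total_tokens": 9},
}

vectors = extract_vectors(sample)
print(vectors)
```

Sorting by `index` keeps each vector aligned with the position of its text in the original `input` array.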

Batch Embedding

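from openai import OpenAI

# Assumption: `client` is an OpenAI-compatible SDK client pointed at the
# base URL used in the curl example above.
client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.lemondata.cc/v1",
)

# Embed multiple texts in a single request
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document text",
        "Second document text",
        "Third document text"
    ]
)

# Each item in response.data corresponds to one input text
for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")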
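
For larger document sets, the input list may need to be split across several requests. This page does not state a maximum batch size, so the sketch below uses an arbitrary value. The `chunked` helper is hypothetical and only shows the slicing; no request is sent.

```python
from typing import Iterator


def chunked(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of texts with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


documents = [f"Document {n}" for n in range(5)]

# batch_size=2 is an arbitrary illustrative value, not a documented limit.
batches = list(chunked(documents, batch_size=2))
for batch in batches:
    # Each batch would be passed as `input` to an embeddings request.
    print(batch)
```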