Chapter 4: Getting Started with Chroma

Installation

pip install chromadb

Basic Usage

Creating a Client

import chromadb

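# In-memory mode: data is lost when the process exits
client = chromadb.Client()

# Persistent mode: data is saved to disk under ./chroma_db
# (this reassigns client, so keep only the mode you need)
client = chromadb.PersistentClient(path='./chroma_db')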

Creating a Collection

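# create_collection raises an error if 'documents' already exists;
# client.get_or_create_collection(name='documents') avoids that on reruns.
collection = client.create_collection(name='documents')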

Adding Documents

collection.add(
    documents=['This is the first article', 'This is the second article'],
    metadatas=[{'source': 'web'}, {'source': 'book'}],
    ids=['doc1', 'doc2']
)
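
Each entry in `ids` identifies one record, and the `documents`, `metadatas`, and `ids` lists are matched up position by position. One common way to get stable ids is to derive them from the document text, so the same text always maps to the same id. The sketch below uses only the standard library; the helper name `make_doc_id` and the 16-character digest prefix are illustrative assumptions, not part of Chroma.

```python
import hashlib


def make_doc_id(text: str, prefix: str = "doc") -> str:
    """Return a deterministic id derived from the document text (hypothetical helper)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:16]}"


documents = ["This is the first article", "This is the second article"]
metadatas = [{"source": "web"}, {"source": "book"}]
ids = [make_doc_id(text) for text in documents]

# The three lists must have the same length, and the ids must not repeat.
assert len(documents) == len(metadatas) == len(ids)
assert len(set(ids)) == len(ids)

# These lists could then be passed to
# collection.add(documents=documents, metadatas=metadatas, ids=ids)
```

Content-derived ids make it easy to tell whether a given text has already been stored, at the cost of treating two identical texts from different sources as the same record.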

Querying

results = collection.query(
    query_texts=['search text'],
    n_results=2
)

print(results['documents'])
print(results['distances'])
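
The value returned by `query` is a dictionary whose entries are nested lists: one inner list per query text, each holding up to `n_results` items. The sketch below shows one way to pair the fields up for a single query. The `sample` dictionary is a hand-written stand-in that mirrors this shape, its distance values are made up, and `flatten_results` is a hypothetical helper rather than a Chroma function.

```python
from typing import Any


def flatten_results(results: dict[str, Any], query_index: int = 0) -> list[tuple[str, str, float]]:
    """Pair up ids, documents, and distances for one query (hypothetical helper)."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))


# Hand-written stand-in for a query result; the distance values are made up.
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [["This is the first article", "This is the second article"]],
    "distances": [[0.42, 0.57]],
}

for doc_id, text, distance in flatten_results(sample):
    print(f"{doc_id}  {distance:.2f}  {text}")
```

With several entries in `query_texts`, the same helper can be called once per query by passing a different `query_index`.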

Using Embedding Functions

from chromadb.utils import embedding_functions

# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key='your-api-key',
    model_name='text-embedding-3-small'
)

# Sentence Transformers
sentence_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name='all-MiniLM-L6-v2'
)

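# Specify the embedding function when creating the collection.
# A separate name and variable are used here because 'documents' was
# already created above, and create_collection raises an error for an
# existing name.
st_collection = client.create_collection(
    name='documents_st',
    embedding_function=sentence_ef
)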
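
Whichever embedding function is chosen, its job is to turn each text into a vector of floats, and a query is answered by ranking stored vectors by their distance to the query vector, where smaller means more similar. As far as I know Chroma's default metric is squared L2, with cosine distance available as an alternative. The following is a minimal standard-library sketch of both metrics on hand-written toy vectors; real embeddings have hundreds of dimensions, and the function names here are illustrative.

```python
import math


def squared_l2(a: list[float], b: list[float]) -> float:
    """Sum of squared coordinate differences; 0.0 means identical vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (made up for illustration).
query = [1.0, 0.0, 0.0]
close = [0.9, 0.1, 0.0]
far = [0.0, 1.0, 0.0]

# The nearer vector gets the smaller distance under both metrics.
print(squared_l2(query, close) < squared_l2(query, far))
print(cosine_distance(query, close) < cosine_distance(query, far))
```

Because distances depend on the embedding model, values from collections built with different embedding functions are not directly comparable.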

Updating and Deleting

# Update: replace the content of an existing document by id
collection.update(
    ids=['doc1'],
    documents=['Updated content']
)

# Delete: remove a document by id
collection.delete(ids=['doc2'])

Integrating with LangChain

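from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Reads the API key from the OPENAI_API_KEY environment variable
embeddings = OpenAIEmbeddings()

# Example input; in practice these usually come from a document loader
documents = [
    Document(page_content='This is the first article', metadata={'source': 'web'}),
    Document(page_content='This is the second article', metadata={'source': 'book'}),
]

vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory='./chroma_db'
)

# Search: returns up to k of the most similar documents
results = vectorstore.similarity_search('query text', k=3)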

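Summary

In this chapter you learned:

✅ Installing Chroma
✅ Creating collections
✅ Adding and querying documents
✅ Embedding functions
✅ LangChain integration

Congratulations on completing the vector database tutorial! 🎉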