Chapter 7: Performance Optimization
Batch Operations
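# Batch insert
def batch_insert(collection, data, batch_size=1000):
    # Assumes `data` is a list of row dicts, so slicing yields whole rows
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        collection.insert(batch)
    # Flush once after all batches rather than after every insert
    collection.flush()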
# Batch search
def batch_search(collection, vectors, batch_size=100):
    results = []
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size]
        # One request returns one hit list per query vector in the batch
        batch_results = collection.search(
            data=batch,
            anns_field="embedding",
            param={"metric_type": "COSINE"},
            limit=10
        )
        results.extend(batch_results)
    return results
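To make the effect of batching concrete, the following minimal sketch counts how many insert requests a given batch size produces. `chunked`, `CountingCollection`, and `count_requests` are hypothetical names introduced here for illustration; `CountingCollection` is a stand-in that only counts calls and is not part of pymilvus.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class CountingCollection:
    """Hypothetical stand-in for a collection; it only counts insert calls."""

    def __init__(self) -> None:
        self.insert_calls = 0
        self.rows_seen = 0

    def insert(self, rows: Sequence) -> None:
        self.insert_calls += 1
        self.rows_seen += len(rows)


def count_requests(num_rows: int, batch_size: int) -> int:
    """Insert num_rows dummy rows in batches and return the number of calls."""
    collection = CountingCollection()
    rows = [{"id": i} for i in range(num_rows)]
    for batch in chunked(rows, batch_size):
        collection.insert(batch)
    return collection.insert_calls
```

With 2,500 rows, `count_requests(2500, 1000)` makes 3 calls, whereas inserting one row at a time would make 2,500.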
Partitioning Strategy
# Create partitions
collection.create_partition("partition_2024")
collection.create_partition("partition_2023")
# Insert into a specific partition
collection.insert(data, partition_name="partition_2024")
# Search only the specified partition
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "COSINE"},
    limit=10,
    partition_names=["partition_2024"]
)
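When rows span several years, they need to be routed to the matching partition before insertion. The sketch below groups rows by partition name following the `partition_<year>` pattern used above. The helper names and the `year` field are assumptions made for illustration; the original schema may store time differently.

```python
from collections import defaultdict
from typing import Dict, List


def partition_name_for(year: int) -> str:
    """Map a year to a partition name using the partition_<year> pattern."""
    return f"partition_{year}"


def group_rows_by_partition(
    rows: List[dict], year_field: str = "year"
) -> Dict[str, List[dict]]:
    """Group row dicts by target partition; `year_field` is an assumed field."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[partition_name_for(row[year_field])].append(row)
    return dict(grouped)
```

Each group can then be passed to `collection.insert(rows, partition_name=name)`, so that later searches can restrict `partition_names` to the years they need.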
Memory Management
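One way to keep memory use down is to load data only while it is being queried and to release it afterwards. The following is a minimal sketch of that idea, not a definitive implementation: `loaded` is a hypothetical helper, and it assumes the object passed in exposes `load(partition_names=...)` and `release()` as a pymilvus `Collection` does.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


@contextmanager
def loaded(collection, partition_names: Optional[List[str]] = None) -> Iterator:
    """Load a collection (or only some partitions), then release it on exit."""
    if partition_names:
        # Load only the partitions that the upcoming queries need
        collection.load(partition_names=partition_names)
    else:
        collection.load()
    try:
        yield collection
    finally:
        # Release even if the query raised, so memory is not held indefinitely
        collection.release()
```

A search would then run inside `with loaded(collection, ["partition_2024"]) as c:`. For frequently queried data, repeated load and release has its own cost, so this pattern suits collections that are queried only occasionally.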
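Summary
Key points for performance optimization:
- Batch operations: reduce the number of requests
- Partitioning strategy: partition by time or category
- Memory management: load data on demand
In the next chapter, we will cover best practices.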