
1. What is Weaviate?

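Weaviate (https://weaviate.io) is an open-source vector search engine and AI-native database designed to store, retrieve, and analyze vector data (embeddings). It combines the flexibility of a traditional database with AI-driven semantic search, and is suited to:

Semantic search
Recommendation systems
Question answering (Q&A)
Anomaly detection
Knowledge graphs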

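Key features

Vector indexes: HNSW and flat indexes, with optional product quantization (PQ) to compress stored vectors.
Model integrations: vectorizer modules call providers such as OpenAI, Cohere, and Hugging Face, so vectors can be generated at import and query time.
GraphQL API for flexible queries.
Cloud native and scalable, with support for Kubernetes deployment.
Hybrid search that combines keyword and vector search.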

2. Core concepts

2.1 Data model

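Weaviate organizes data into classes and properties, comparable to tables and columns in a relational database, with a vector stored alongside each object. Properties are declared as a list of name/dataType entries, and the vectorizer field names the module that generates the vectors:

{
  "class": "Article",
  "vectorizer": "text2vec-openai",
  "properties": [
    { "name": "title", "dataType": ["text"] },
    { "name": "content", "dataType": ["text"] },
    { "name": "author", "dataType": ["text"] }
  ]
}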
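As a rough illustration of that structure, the sketch below assembles a class definition in Python, with properties expressed as a list of name/dataType entries. The helper name build_class_definition and the small ALLOWED_TYPES set are assumptions made for this example, not part of any Weaviate client.

```python
import json

# A small subset of data types, chosen for this sketch; Weaviate defines more.
ALLOWED_TYPES = {"text", "int", "number", "boolean"}


def build_class_definition(name, properties, vectorizer="none"):
    """Return a class definition dict with properties as name/dataType entries."""
    # Assumption for this sketch: class names start with an uppercase letter.
    if not name or not name[0].isupper():
        raise ValueError("class name should start with an uppercase letter")
    props = []
    for prop_name, data_type in properties.items():
        if data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported data type: {data_type}")
        props.append({"name": prop_name, "dataType": [data_type]})
    return {"class": name, "vectorizer": vectorizer, "properties": props}


article = build_class_definition(
    "Article",
    {"title": "text", "content": "text", "author": "text"},
    vectorizer="text2vec-openai",
)
print(json.dumps(article, indent=2))
```

The resulting dict can be serialized to JSON and sent to the schema API, or handed to a client library call that accepts a class definition.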

2.2 Vectorization

Built-in vectorization: text2vec-* modules (such as text2vec-openai) convert text to vectors automatically.
Custom vectors: precomputed vectors can be uploaded directly.

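2.3 Query types

Vector search: finds nearby objects by similarity (for example, cosine similarity).
Hybrid search: combines BM25 keyword scoring with vector search.
GraphQL API: queries properties and relationships between objects.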
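To make the idea of similarity-based lookup concrete, here is a minimal brute-force sketch in plain Python. It is not how Weaviate searches internally (an index such as HNSW avoids scanning every vector); the three-dimensional vectors and document names are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, items, k=2):
    """Return the keys of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in items.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Toy embeddings, made up for this example.
docs = {
    "ai-article": [0.9, 0.1, 0.0],
    "ml-article": [0.8, 0.3, 0.1],
    "cooking": [0.0, 0.2, 0.9],
}
top = nearest([1.0, 0.2, 0.0], docs, k=2)
print(top)  # ['ai-article', 'ml-article']
```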

3. Quick start

3.1 Installing Weaviate

3.1.1 Option 1: Run with Docker

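# ENABLE_MODULES loads the text2vec-openai vectorizer used in the examples below
docker run -d \
  -p 8080:8080 \
  -e "QUERY_DEFAULTS_LIMIT=25" \
  -e "AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true" \
  -e "ENABLE_MODULES=text2vec-openai" \
  semitechnologies/weaviate:latest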
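The container can take a few seconds to start accepting requests. One way to wait for it, sketched below with only the standard library, is to poll the readiness endpoint /v1/.well-known/ready until it returns HTTP 200. The helper names, retry counts, and the injectable probe argument are choices made for this example.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(base_url, attempts=30, delay=1.0, probe=http_status):
    """Poll the readiness endpoint until it answers 200 or attempts run out."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        time.sleep(delay)
    return False


# Example call against the container started above:
# wait_until_ready("http://localhost:8080")
```

Passing the probe as a parameter keeps the polling loop testable without a running server.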

3.1.2 Option 2: Deploy on Kubernetes

helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm install my-weaviate weaviate/weaviate

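3.2 Python client example

import os

import weaviate

# Connect to Weaviate (weaviate-client v3 API). The text2vec-openai module needs
# an OpenAI key, read here from an OPENAI_API_KEY environment variable (assumed name).
client = weaviate.Client(
    "http://localhost:8080",
    additional_headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

# Create a class (comparable to a table)
class_obj = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
}
client.schema.create_class(class_obj)

# Insert an object
client.data_object.create(
    {"title": "AI in 2023", "content": "Generative AI is changing the world."},
    "Article"
)

# Vector search: find articles semantically close to the given concepts
result = client.query.get(
    "Article", ["title", "content"]
).with_near_text({
    "concepts": ["technology trends"]
}).with_limit(5).do()
print(result)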
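The query returns a nested dictionary that mirrors the GraphQL response, so a small helper can make the result easier to consume. The sketch below assumes the usual data → Get → class name layout with an optional top-level errors list; the sample dictionary is written by hand for illustration and is not real server output.

```python
def extract_objects(result, class_name):
    """Pull the list of objects for class_name out of a Get response dict."""
    if "errors" in result:
        messages = "; ".join(
            err.get("message", "unknown error") for err in result["errors"]
        )
        raise RuntimeError(f"query failed: {messages}")
    return result.get("data", {}).get("Get", {}).get(class_name) or []


# Hand-written sample in the assumed response shape.
sample = {
    "data": {
        "Get": {
            "Article": [
                {
                    "title": "AI in 2023",
                    "content": "Generative AI is changing the world.",
                }
            ]
        }
    }
}

for obj in extract_objects(sample, "Article"):
    print(obj["title"])
```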

4. Core features

4.1 Semantic search

{
  Get {
    Article(
      nearText: {
        concepts: ["artificial intelligence"],
        certainty: 0.7  # similarity threshold
      }
    ) {
      title
      content
    }
  }
}
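The certainty threshold in the query above is easier to reason about once it is mapped back to cosine terms. The sketch below assumes the cosine distance metric and the relationship certainty = 1 - distance / 2, with distance = 1 - cosine similarity; treat it as a reading aid and check the Weaviate documentation for the metric your class actually uses.

```python
def certainty_to_distance(certainty):
    """Convert a certainty in [0, 1] to cosine distance in [0, 2]."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    return 2.0 * (1.0 - certainty)


def certainty_to_cosine_similarity(certainty):
    """Convert a certainty in [0, 1] to cosine similarity in [-1, 1]."""
    return 1.0 - certainty_to_distance(certainty)


print(round(certainty_to_distance(0.7), 3))           # 0.6
print(round(certainty_to_cosine_similarity(0.7), 3))  # 0.4
```

Under these assumptions, certainty: 0.7 keeps only results whose cosine similarity to the query is at least 0.4.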

4.2 Hybrid search (keyword + vector)

{
  Get {
    Article(
      hybrid: {
        query: "AI",
        alpha: 0.5  # weighting (0 = keyword only, 1 = vector only)
      }
    ) {
      title
    }
  }
}
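The effect of alpha can be sketched as a weighted blend of two normalized score lists. The code below is a simplified illustration of score fusion (min-max normalization followed by a weighted sum), not Weaviate's actual implementation; the document IDs and scores are invented.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]; if all scores are equal, each becomes 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores: alpha=0 keyword only, alpha=1 vector only."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    combined = {}
    for doc_id in set(kw) | set(vec):
        combined[doc_id] = (
            alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        )
    # Highest score first; ties broken by document id for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


bm25 = {"a": 3.0, "b": 1.0, "c": 2.0}      # made-up keyword scores
cosine = {"a": 0.2, "b": 0.9, "d": 0.6}    # made-up vector similarities
ranking = hybrid_scores(bm25, cosine, alpha=0.5)
print([doc_id for doc_id, _ in ranking])  # ['a', 'b', 'd', 'c']
```

A document missing from one list simply contributes zero from that side, which is why "d" (vector match only) can still outrank "c" (keyword match only) at alpha = 0.5.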

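4.3 Custom vectors

# Upload an object together with a precomputed vector
client.data_object.create(
    data_object={"title": "Custom Vector"},
    class_name="Article",
    vector=[0.1, 0.2, ..., 0.9]  # placeholder: supply the full vector, one float per dimension
)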
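Before uploading precomputed vectors it can help to check them on the client side, since a vector of the wrong length or one containing NaN will be rejected or will corrupt search results. The sketch below builds an object payload with class, properties, and vector fields and validates the vector first; the helper name, the expected_dim parameter, and the four-dimensional toy vector are assumptions for illustration.

```python
import math


def build_object_payload(class_name, properties, vector, expected_dim=None):
    """Validate a precomputed vector and return an object payload dict."""
    if not vector:
        raise ValueError("vector must not be empty")
    for value in vector:
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            raise ValueError("vector must contain only finite numbers")
    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return {
        "class": class_name,
        "properties": properties,
        "vector": [float(value) for value in vector],
    }


payload = build_object_payload(
    "Article",
    {"title": "Custom Vector"},
    [0.1, 0.2, 0.3, 0.4],  # toy vector; real embeddings are much longer
    expected_dim=4,
)
print(payload["vector"])
```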

4.4 Modules

Vectorizer modules: text2vec-openai, text2vec-cohere, text2vec-huggingface.
Other modules: qna-openai (question answering), ner-transformers (named entity recognition).

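5. Use cases

Use case	Description
Semantic search	Search in natural language (e.g. "find articles about machine learning").
Recommendation systems	Recommend items by content similarity (e.g. products, articles).
Deduplication and clustering	Identify similar records (e.g. deduplicating news stories).
Knowledge graphs	Store and query entity relationships (e.g. person, company, event).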

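6. Comparison with alternatives

Feature	Weaviate	Milvus	Elasticsearch
Core capability	Vector search + GraphQL	Pure vector search	Full-text search + vector
Built-in vectorization	✅ (OpenAI/Cohere modules)	❌	Partial (ML inference features)
Query interface	GraphQL	SDK / REST API	REST API
Deployment complexity	Medium	High	Low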

7. Learning resources

Official documentation: https://weaviate.io/docs
GitHub: https://github.com/weaviate/weaviate
Community Slack: Weaviate Slack

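8. Summary

Weaviate is a strong option among AI-native databases, particularly where semantic search, recommendations, and knowledge management need to work together. Its main advantages are:

Out-of-the-box integrations with model providers (e.g. OpenAI, Cohere).
A flexible GraphQL API.
High-performance vector indexing (HNSW).

If you are building an intelligent search or recommendation system, Weaviate is worth a try.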