Published November 25, 2024
By Darren Edge (Senior Director), Ha Trinh (Senior Data Scientist), and Jonathan Larson (Senior Principal Data Architect)

Affordable GraphRAG for every use case

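The GraphRAG project aims to expand the class of questions that AI systems can answer over private datasets by leveraging the implicit relationships within unstructured text.
A key advantage of GraphRAG over conventional vector RAG (or "semantic search") is its ability to answer "global queries" that address the entire dataset, such as "What are the main themes in the data?" or "What are the most important implications for X?". Conversely, vector RAG excels at "local queries", where the answer resembles the query and can be found within specific text regions, as is typically the case for "who", "what", "when", and "where" questions.
In recent blog posts, we shared two new query mechanisms that exploit the rich, summary-based data index created by GraphRAG to improve local search performance and reduce global search costs, respectively.
In this post, we introduce a radically different approach to graph-enabled RAG that requires no prior summarization of the source data, avoiding up-front indexing costs that may be prohibitive for some users and use cases. We call this approach "LazyGraphRAG".
A key advantage of LazyGraphRAG is its inherent scalability in terms of both cost and quality. Across a range of competing methods (standard vector RAG, RAPTOR, and the GraphRAG local, global, and DRIFT search mechanisms), LazyGraphRAG performs strongly across the cost-quality spectrum:
  • LazyGraphRAG data indexing costs are identical to those of vector RAG and only 0.1% of the costs of full GraphRAG.
  • At query costs comparable to vector RAG, LazyGraphRAG outperforms all competing methods on local queries, including long-context vector RAG, GraphRAG DRIFT search (our recently introduced approach shown to outperform vector RAG), and GraphRAG local search.
  • The same LazyGraphRAG configuration also shows answer quality comparable to GraphRAG global search on global queries, at more than 700 times lower query cost.
  • At 4% of the query cost of GraphRAG global search, LazyGraphRAG significantly outperforms all competing methods on both local and global query types, including GraphRAG global search at the C2 level (the third level of the community hierarchy, recommended for most applications).
LazyGraphRAG is coming soon to our open-source GraphRAG library, providing a unified query interface for both local and global queries, with data indexing costs comparable to standard vector RAG.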
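To make the quoted ratios concrete, the short sketch below applies them to baseline costs supplied by the caller. The function name and the baseline figures are hypothetical and are not measurements from the post. Using one baseline for both query ratios is a simplification, since the post ties the 4% figure to the C2 condition specifically.

```python
def lazy_relative_costs(graphrag_index_cost: float,
                        graphrag_global_query_cost: float) -> dict[str, float]:
    """Apply the cost ratios quoted in the post to caller-supplied baselines."""
    return {
        # Indexing: 0.1% of the full GraphRAG indexing cost.
        "index": graphrag_index_cost * 0.001,
        # Low-budget configuration: "more than 700 times lower",
        # so this value is an upper bound, not an exact cost.
        "query_low_budget_upper_bound": graphrag_global_query_cost / 700,
        # Higher-budget configuration: 4% of the GraphRAG global search query cost.
        "query_higher_budget": graphrag_global_query_cost * 0.04,
    }


# Hypothetical baselines in arbitrary cost units (assumed, not measured).
costs = lazy_relative_costs(graphrag_index_cost=1000.0,
                            graphrag_global_query_cost=70.0)
for name, value in costs.items():
    print(f"{name}: {value:.3f}")
```

The output only restates the published ratios in whatever unit the baselines use. It says nothing about absolute prices, which depend on the models and data involved.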

Blending vector RAG and Graph RAG with deferred LLM use

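LazyGraphRAG aims to blend the advantages of vector RAG and Graph RAG while overcoming their respective limitations:
  • Vector RAG is a form of "best-first search" that uses similarity to the query to select the best-matching source text chunks. However, for global queries it has no sense of the breadth of the dataset to consider.
  • GraphRAG global search is a form of "breadth-first search" that uses the community structure of source text entities to ensure that queries are answered with the full breadth of the dataset in view. However, for local queries it has no sense of the best communities to consider.
LazyGraphRAG combines best-first and breadth-first search dynamics in an iterative deepening manner (Table 1). Compared with the global search mechanism of full GraphRAG, this approach is "lazy" in that it defers LLM use, which dramatically increases the efficiency of answer generation. Overall performance can be scaled through a single main parameter, the relevance test budget, which controls the cost-quality trade-off in a consistent manner.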
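The search dynamics can be sketched roughly as follows. This is a minimal illustration, not the GraphRAG library's implementation. The `Community` class, the `lazy_match` function, the precomputed `similarity` scores (standing in for embedding similarity), and the `is_relevant` callback (standing in for the LLM relevance assessor) are all hypothetical. The parameters `k` and `z` follow the description in Table 1. Scoring a community by the mean rank of its top-k chunks, and descending only into communities that yielded relevant chunks, are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Community:
    name: str
    chunks: list[str]  # ids of all chunks in this community's subtree
    children: list["Community"] = field(default_factory=list)


def rank_communities(communities, similarity, k):
    """Best-first: order communities by the mean global rank of their top-k chunks."""
    all_chunks = {c for com in communities for c in com.chunks}
    ordered = sorted(all_chunks, key=lambda c: (-similarity[c], c))
    rank = {c: i for i, c in enumerate(ordered)}

    def score(com):
        top = sorted(rank[c] for c in com.chunks)[:k]
        return sum(top) / len(top)

    return sorted((com for com in communities if com.chunks), key=score)


def lazy_match(roots, similarity, is_relevant: Callable[[str], bool],
               budget: int, k: int = 2, z: int = 2):
    """Return (relevant chunk ids, number of relevance tests spent)."""
    tested: set[str] = set()
    relevant: list[str] = []
    level = list(roots)
    while level and len(tested) < budget:
        next_level = []
        zero_streak = 0
        # Breadth-first: visit communities of the current level in rank order.
        for com in rank_communities(level, similarity, k):
            untested = sorted((c for c in com.chunks if c not in tested),
                              key=lambda c: (-similarity[c], c))[:k]
            hits = 0
            for chunk in untested:
                if len(tested) >= budget:
                    break
                tested.add(chunk)  # one relevance test spent
                if is_relevant(chunk):
                    relevant.append(chunk)
                    hits += 1
            if hits:
                zero_streak = 0
                next_level.extend(com.children)  # candidates for deepening
            else:
                zero_streak += 1
            if zero_streak >= z or len(tested) >= budget:
                break
        level = next_level  # iterative deepening into sub-communities
    return relevant, len(tested)


# Toy data: similarity scores and relevance labels are made up for illustration.
similarity = {"a1": 0.9, "a2": 0.8, "a3": 0.7, "b1": 0.3, "b2": 0.2, "c1": 0.1}
truly_relevant = {"a1", "a3"}
roots = [
    Community("A", ["a1", "a2", "a3"],
              [Community("A1", ["a1", "a2"]), Community("A2", ["a3"])]),
    Community("B", ["b1", "b2"]),
    Community("C", ["c1"]),
]
full = lazy_match(roots, similarity, truly_relevant.__contains__, budget=10)
tight = lazy_match(roots, similarity, truly_relevant.__contains__, budget=3)
print(full, tight)
```

With several subqueries, `budget` here would be each subquery's share of the overall relevance test budget. Raising it lets the search reach deeper sub-communities, which is how a single parameter can trade cost for quality.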

Table 1: Comparison of GraphRAG and LazyGraphRAG

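Build index
  GraphRAG: a) Uses an LLM to extract and describe entities and their relationships, b) uses an LLM to summarize all observations of each entity and relationship, c) uses graph statistics to optimize the entity graph and extract hierarchical community structure
  LazyGraphRAG: a) Uses NLP noun phrase extraction to extract concepts and their co-occurrences, b) uses graph statistics to optimize the concept graph and extract hierarchical community structure

Summarize index
  GraphRAG: Uses an LLM to summarize the entities and relationships in each community
  LazyGraphRAG: None. The "lazy" approach defers all LLM use until query time

Refine query
  GraphRAG: None. The original query is used throughout
  LazyGraphRAG: Uses an LLM to a) identify relevant subqueries and recombine them into a single expanded query, b) refine the subqueries with matching concepts from the concept graph

Match query
  GraphRAG: None. All queries are answered using all community summaries (breadth-first)
  LazyGraphRAG: For each of q subqueries [3-5]:
    - Uses text chunk embeddings and chunk-community relationships to first rank text chunks by similarity to the query, then rank communities by the rank of their top-k text chunks (best-first)
    - Uses an LLM-based sentence-level relevance assessor to rate the relevance of the top-k untested text chunks from communities in rank order (breadth-first)
    - Recurses into sub-communities after z successive communities yield zero relevant text chunks (iterative deepening), stopping once the subquery's share of the relevance test budget (budget / q) is reached

Map answers
  GraphRAG: Uses an LLM to answer the original query over random batches of community summaries in parallel
  LazyGraphRAG: For each of q subqueries [3-5]:
    - Builds a subgraph of concepts from the relevant text chunks
    - Uses the community assignments of concepts to group related chunks together
    - Uses an LLM to extract subquery-relevant claims from groups of related chunks, focusing only on relevant content
    - Ranks and filters the extracted claims to fit a predefined context window size

Reduce answers
  GraphRAG: Uses an LLM to answer the original query using the mapped answers
  LazyGraphRAG: Uses an LLM to answer the expanded query using the extracted map claims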
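The LLM-free index build in the first row of Table 1 can be illustrated with a small co-occurrence sketch. It is written under loud assumptions: `extract_concepts` is a crude keyword filter standing in for real NLP noun phrase extraction, pruning edges below `min_weight` stands in for the unspecified graph-statistics optimization, and the hierarchical community detection step is omitted entirely. All names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Tiny illustrative stopword list; a real system would use an NLP toolkit.
STOPWORDS = {"a", "and", "form", "in", "of", "the", "to", "with"}


def extract_concepts(text: str) -> set[str]:
    """Crude stand-in for noun phrase extraction: lowercase non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}


def build_concept_graph(chunks: dict[str, str], min_weight: int = 1):
    """Return (concept -> chunk ids, (concept, concept) -> co-occurrence count)."""
    concept_chunks: dict[str, set[str]] = defaultdict(set)
    edge_weights: Counter = Counter()
    for chunk_id, text in chunks.items():
        concepts = sorted(extract_concepts(text))
        for concept in concepts:
            concept_chunks[concept].add(chunk_id)
        # Two concepts co-occur when they appear in the same text chunk.
        for a, b in combinations(concepts, 2):
            edge_weights[(a, b)] += 1
    # Assumed optimization step: drop weak edges.
    kept = {pair: w for pair, w in edge_weights.items() if w >= min_weight}
    return dict(concept_chunks), kept


chunks = {
    "c1": "Insurers form coverage networks of hospitals and doctors.",
    "c2": "Hospitals and doctors negotiate with insurers.",
}
concept_chunks, edges = build_concept_graph(chunks, min_weight=2)
print(sorted(edges))
```

No LLM is called at any point in this build. That is the property that keeps indexing cost at the level of vector RAG, and the concept-to-chunk mapping is what later lets query-time steps group related chunks by concept community.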

LazyGraphRAG answer quality is state-of-the-art

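We compared LazyGraphRAG at varying levels of relevance test budget to a range of competing methods:
  • Dataset: 5,590 AP news articles (used with license)
  • Queries: 100 synthetic queries (50 local and 50 global), generated using a new method to be described in a future blog post
  • Metrics: comprehensiveness, diversity, and empowerment (as described here, using an LLM to compare pairs of answers head-to-head on each metric)
  • Conditions: LazyGraphRAG at three relevance test budget settings, plus eight competing conditions from GraphRAG and the literature (Table 2)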

Table 2: Description of test conditions

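Condition: Description
Z100_Lite: LazyGraphRAG with a relevance test budget of 100, using a low-cost LLM model at all steps
Z500: LazyGraphRAG with a relevance test budget of 500, using a low-cost LLM for relevance tests and a more advanced (higher-cost) LLM for query refinement and map/reduce answer generation
Z1500: LazyGraphRAG with a relevance test budget of 1,500, using a low-cost LLM for relevance tests and a more advanced (higher-cost) LLM for query refinement and map/reduce answer generation
C1: GraphRAG global search at community level 1
C2: GraphRAG global search at community level 2
C3_Dynamic: GraphRAG global search at community level 3 using dynamic community selection (a cost-saving solution introduced in our previous blog post)
LS: GraphRAG local search
DRIFT: GraphRAG DRIFT search
SS_8K: Vector RAG semantic search retrieving 8k tokens of input
SS_64K: Vector RAG semantic search retrieving 64k tokens of input (included to examine the effect of a long context window on vector RAG)
RAPTOR: A hierarchical clustering-based RAG approach that shares some features with LazyGraphRAG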
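Figure 1 shows LazyGraphRAG win rates against each of the eight competing conditions. At the lowest budget level of 100 relevance tests using a low-cost LLM model (the same cost as SS_8K), LazyGraphRAG significantly outperforms all conditions on local and global queries, except for the GraphRAG global search conditions on global queries. At an increased budget of 500 relevance tests using a more advanced LLM model (4% of the query cost of C2), LazyGraphRAG significantly outperforms all conditions on both local and global queries. LazyGraphRAG's win rates continue to rise as the relevance test budget increases to 1,500, demonstrating its scalability in terms of cost versus quality.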
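Figure 1. Clustered bar charts showing LazyGraphRAG win rates against the competing conditions at different relevance test budgets (100, 500, 1,500), for each combination of query type (local, global) and metric (comprehensiveness, diversity, empowerment).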
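For readers who want to reproduce this style of evaluation, a win rate can be computed from head-to-head judgments roughly as below. This is a sketch under assumptions: the record layout and function name are hypothetical, the sample judgments are invented purely to exercise the code, and counting a tie as half a win is a common convention that the post does not confirm.

```python
from collections import defaultdict


def win_rates(judgments):
    """Compute win rates per (query type, metric, competitor).

    Each judgment is (query_type, metric, competitor, outcome), where outcome
    is "win", "loss", or "tie" from LazyGraphRAG's point of view.
    """
    score = defaultdict(float)
    total = defaultdict(int)
    for query_type, metric, competitor, outcome in judgments:
        key = (query_type, metric, competitor)
        total[key] += 1
        if outcome == "win":
            score[key] += 1.0
        elif outcome == "tie":
            score[key] += 0.5  # assumed convention: a tie counts as half a win
    return {key: score[key] / total[key] for key in total}


# Invented judgments for illustration only; these are not results from the post.
sample = [
    ("local", "comprehensiveness", "SS_8K", "win"),
    ("local", "comprehensiveness", "SS_8K", "win"),
    ("local", "comprehensiveness", "SS_8K", "tie"),
    ("local", "comprehensiveness", "SS_8K", "loss"),
    ("global", "diversity", "C2", "loss"),
    ("global", "diversity", "C2", "win"),
]
rates = win_rates(sample)
print(rates)
```

A win rate above 0.5 for a given key means LazyGraphRAG was preferred more often than the competitor on that query type and metric. Whether the margin is significant needs a separate statistical test.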
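An example comparison is shown in Table 3.
Query: What factors should individuals in the United States consider when selecting a health insurance plan during the open enrollment period for 2024 health insurance plans?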
Vector RAG (64K context)

# Factors to Consider When Selecting a Health Insurance Plan During Open Enrollment for 2024

Selecting a health insurance plan during the open enrollment period can be a complex process, but understanding the key factors to consider can help individuals make informed decisions. Here are some important aspects to keep in mind:

## Coverage and Networks

One of the primary considerations is the coverage network of the plan. Insurers form coverage networks of hospitals and doctors, and it is crucial to ensure that your preferred healthcare providers are included in the network of the plan you are considering. Many marketplace plans, such as Health Maintenance Organizations (HMOs) and Exclusive Provider Organizations (EPOs), do not cover claims for most non-emergency care outside their networks [Data: Sources (47419, 47421, 47420, 47410, 47408)].

## Costs: Premiums, Deductibles, and Out-of-Pocket Maximums

Understanding the costs associated with a health insurance plan is essential. This includes the monthly premiums, deductibles, copays, and coinsurance. Plans are typically categorized into different levels such as platinum, gold, silver, and bronze, with bronze plans generally having lower premiums but higher out-of-pocket costs [Data: Sources (47411, 47412, 47416, 47414, 47418)]. Additionally, it is important to consider the out-of-pocket maximum, which is the most you will have to pay in a year for covered services before the insurance starts covering all costs [Data: Sources (47416, 47415)].

## Tax Credits and Cost-Sharing Reductions

Individuals with lower incomes may qualify for income-based tax credits to help reduce the cost of premiums. It is important to accurately estimate your income for the coming year when applying for these credits to avoid having to repay any excess credits received [Data: Sources (47412, 47413)]. Additionally, those with incomes below 250% of the federal poverty level may qualify for cost-sharing reductions if they select a silver plan, which can lower deductibles and out-of-pocket maximums [Data: Sources (47417, 47418)].

## Special Considerations for Medicaid and Medicare

For those transitioning from Medicaid, it is important to understand that marketplace plans may come with higher premiums and copays compared to Medicaid. Additionally, many people who have been removed from Medicaid may be eligible to enroll in marketplace plans through an extended enrollment period [Data: Sources (6284, 6285, 6286, 6287)]. For seniors, the Medicare open enrollment period allows for switching between traditional Medicare and Medicare Advantage plans, with considerations for prescription drug coverage and additional benefits like dental and vision [Data: Sources (72060, 72061, 72062, 72063, 72064)].

## Assistance and Resources

Navigating the complexities of health insurance can be challenging, and seeking assistance from insurance agents or government-funded navigators can be beneficial. These professionals can help individuals understand their options and make informed decisions without recommending specific plans [Data: Sources (47424, 47425, 47426)].

## Conclusion

Selecting the right health insurance plan requires careful consideration of various factors including coverage networks, costs, eligibility for tax credits and cost-sharing reductions, and available assistance. By thoroughly evaluating these aspects, individuals can choose a plan that best meets their healthcare needs and financial situation for the upcoming year.

LazyGraphRAG

### Types of Health Insurance Plans

When selecting a health insurance plan during the open enrollment period for 2024, individuals in the United States have several options to consider:

1. **Marketplace Plans**: These are available under the Affordable Care Act (ACA) and are categorized into platinum, gold, silver, and bronze levels. Bronze plans generally have lower premiums but higher out-of-pocket costs, while platinum plans have higher premiums but lower out-of-pocket costs [Data: Sources (47411, 47420, 47421)].
2. **Low-Cost Marketplace Plans**: These often come with high deductibles, requiring patients to pay thousands of dollars before most coverage kicks in. They also have annual out-of-pocket maximums that can exceed $9,000 for individuals and $18,000 for families [Data: Sources (47415, 47416, 47414)].
3. **Exclusive Provider Organizations (EPOs) and Health Maintenance Organizations (HMOs)**: EPOs and HMOs generally restrict patients to a network of doctors and require a primary care doctor to direct care. They tend to be cheaper but lack out-of-network flexibility [Data: Sources (47420, 47421, 43218, 43217)].
4. **Preferred Provider Organizations (PPOs)**: These plans allow for out-of-network services but at a higher cost. They offer more flexibility compared to HMOs and EPOs [Data: Sources (43217)].
5. **High-Deductible Health Plans (HDHPs)**: Defined as plans with a deductible of at least $1,600 for individual coverage or $3,200 for family coverage, with out-of-pocket maximums of no more than $8,050 or $16,100, respectively. HDHPs usually have lower premiums, and sometimes companies contribute to a health savings account (HSA) to help cover the deductible [Data: Sources (43227, 43226)].
6. **Medicare Advantage**: These are privately run versions of the federal government’s Medicare program, mostly for people aged 65 and over. They often include prescription drug coverage and may offer additional benefits like dental or vision coverage not provided by traditional Medicare [Data: Sources (72063, 72061, 72060, 72062)].
7. **Short-Term Health Insurance Plans**: These plans are limited to three months and can only be renewed for a maximum of four months under new rules. They are intended for temporary coverage but often lack comprehensive benefits [Data: Sources (97999, 97995, 97996, 97997)].

### Cost Factors: Premiums, Deductibles, Co-pays, and Out-of-Pocket Maximums

The overall cost of health insurance plans in 2024 is influenced by several factors:

- **Premiums**: This is the set monthly cost you pay for your health insurance plan. Premiums have been rising, with a notable increase of 7% for both family and single plans in 2023, partly due to inflation [Data: Sources (83383, 83382, 83384, 83385, 83381, +more)].
- **Deductibles**: The amount you pay out-of-pocket for health care services before your insurance starts to pay. For HDHPs, the deductible is at least $1,600 for individual coverage or $3,200 for family coverage [Data: Sources (43226, 43225)].
- **Co-pays and Co-insurance**: These are the costs you pay each time you receive a medical service. Co-pays are fixed amounts, while co-insurance is a percentage of the service cost.
- **Out-of-Pocket Maximums**: This is the maximum amount you will pay for covered services in a year. For example, HDHPs have out-of-pocket maximums of no more than $8,050 for individual coverage or $16,100 for family coverage [Data: Sources (43227, 43226)].

### Provider Networks: In-Network vs. Out-of-Network

The network of healthcare providers is a crucial factor in selecting a health insurance plan:

- **In-Network Providers**: These are doctors and hospitals that have agreements with your insurance plan to provide services at lower rates. Ensuring your preferred doctors and specialists are in-network can save you significant costs [Data: Sources (43216, 47419)].
- **Out-of-Network Providers**: Services from these providers are usually more expensive and may not be covered at all, except in emergencies. PPO plans offer some out-of-network coverage but at a higher cost, while HMOs and EPOs generally do not cover non-emergency out-of-network care [Data: Sources (43217, 47421)].

### Specific Medical Needs and Services

When selecting a health insurance plan, individuals should consider their specific medical needs:

- **Prescription Drugs**: Ensure that your medications are covered by the plan’s formulary, as drug coverage can change annually [Data: Sources (43220, 43218, 43219)].
- **Mental Health Services**: Coverage for mental health treatments is essential, especially with new rules pushing insurers to increase their coverage of these services [Data: Sources (97031, 97028, 97027, 97030, 97033, +more)].
- **Chronic Conditions**: Plans should cover ongoing treatments and medications for chronic conditions. Medicare Supplement Insurance (Medigap) can help cover gaps in Medicare for chronic disease management [Data: Sources (93367, 93368)].
- **Preventive Care**: Coverage for preventive services like cancer screenings and HIV prevention is mandated under the ACA, though its future is uncertain due to ongoing legal battles [Data: Sources (71106, 71109, 71098, 71099, 71100, +more)].

### Key Dates and Steps for Open Enrollment

The open enrollment period for 2024 health insurance plans involves several key dates and steps:

- **Marketplace Plans**: Open enrollment starts on November 1, 2023, and runs through mid-December in most states, ending on January 16, 2024 [Data: Sources (47419, 47411, 47416, 47421, 47409, +more)].
- **Medicare**: Open enrollment for Medicare runs from October 15, 2023, to December 7, 2023. During this period, individuals can choose between traditional Medicare, Medicare Advantage plans, and prescription drug plans [Data: Sources (72061, 72063, 72060, 72062)].
- **Special Enrollment Periods**: Individuals who lose coverage due to life events like job loss or moving may qualify for special enrollment periods. For example, those removed from Medicaid may enroll in marketplace plans through July 2024 [Data: Sources (6288, 6289)].

By considering these factors, individuals can make informed decisions about their health insurance coverage for 2024, ensuring they select plans that best meet their medical needs and financial situations.

Looking forward


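LazyGraphRAG shows that a single, flexible query mechanism can substantially outperform a diverse range of specialized query mechanisms across the local-global query spectrum, and can do so without the up-front costs of LLM data summarization. Its very fast and almost-free indexing makes LazyGraphRAG ideal for one-off queries, exploratory analysis, and streaming data use cases, while its ability to smoothly increase answer quality as the relevance test budget grows makes it a valuable tool for benchmarking RAG approaches (e.g., "RAG approach X beats LazyGraphRAG with budget Y on task Z").
Does this mean that all graph-enabled RAG should be lazy? We believe the answer is no, for three reasons:
  1. A GraphRAG data index of entity, relationship, and community summaries has use value beyond question answering (e.g., reading and sharing as reports).
  2. A GraphRAG data index of entity, relationship, and community summaries, combined with a LazyGraphRAG-like search mechanism, is likely to achieve better results than LazyGraphRAG alone.
  3. A new kind of GraphRAG data index designed to support a LazyGraphRAG-like search mechanism (e.g., through pre-emptive claim and topic extraction) is likely to achieve the best possible results.
We will be exploring these directions in the upcoming period, with all advances (including LazyGraphRAG itself) released via the GraphRAG GitHub repository. Stay tuned!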

About the authors

Darren Edge

Senior Director

Ha Trinh

Senior Data Scientist

Jonathan Larson

Senior Principal Data Architect
 

Supplementary material

Readers who want a comparison of GraphRAG and LazyGraphRAG can refer to the following:
The differences between GraphRAG and LazyGraphRAG