How to Use LlamaIndex: A Practical Guide from Zero to Production

If you have ever tried to build a Retrieval-Augmented Generation (RAG) application and wondered, “Why is wiring together embeddings, vector stores, and prompts so painful?”, you are not alone. LlamaIndex exists to make that process fast, sensible, and production-ready. In this practical, problem-solving guide, we walk through how to use LlamaIndex end to end: ingestion, indexing, querying, evaluation, and deployment, so you can ship something reliable without getting lost in glue code.

We use a question-led structure with sequential steps, runnable code examples, and real-world tips. Whether you are prototyping a chatbot for internal documents or deploying a knowledge assistant for customers, learning to use LlamaIndex effectively will save you days of work.

LlamaIndex is a framework that connects your data to large language models with tools for indexing, retrieval, and orchestration, which makes it a good fit for RAG, agents, and structured outputs.

What Is LlamaIndex, and Why Use It?

LlamaIndex is a data framework for LLM apps. It provides components for:

Ingestion: load files, web pages, databases, and APIs.

Chunking & Indexing: turn raw content into queryable structures (vector, keyword, and graph indexes).

Retrieval: fetch context with flexible strategies (BM25, hybrid, reranking).

Query Engines & Agents: combine retrieval, tools, and prompts into a coherent QA experience.

Evaluation & Monitoring: judge retrieval quality and answer relevance.

When to use LlamaIndex:

You want a robust RAG stack without reinventing chunking, embeddings, and retrieval.

You need to combine multiple data sources (PDFs + Notion + SQL).

You want to experiment with hybrid retrieval, reranking, or structured outputs.

The pipeline to keep in mind when learning how to use LlamaIndex:

Data → Nodes → Index → Retriever → Query Engine → App
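Before bringing in the library, the stages can be pictured in plain Python. The sketch below is a toy, not how LlamaIndex implements anything: the `Node` class, the word-overlap scoring, and all function names are assumptions made up for illustration, and the "query engine" only assembles the prompt an LLM would receive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    metadata: dict = field(default_factory=dict)


def to_nodes(docs: list[str]) -> list[Node]:
    """Data -> Nodes: split each document into paragraph-level nodes."""
    nodes = []
    for doc_id, doc in enumerate(docs):
        for para in doc.split("\n\n"):
            if para.strip():
                nodes.append(Node(para.strip(), {"doc_id": doc_id}))
    return nodes


def build_index(nodes: list[Node]) -> list[tuple[set[str], Node]]:
    """Nodes -> Index: pair each node with its set of lowercase words."""
    return [(set(n.text.lower().split()), n) for n in nodes]


def retrieve(index, query: str, top_k: int = 2) -> list[Node]:
    """Index -> Retriever: rank nodes by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(index, key=lambda item: len(q & item[0]), reverse=True)
    # Drop candidates that share no word with the query.
    return [node for words, node in ranked[:top_k] if q & words]


def query_engine(index, query: str) -> str:
    """Retriever -> Query Engine: assemble the context an LLM would receive."""
    context = "\n".join(n.text for n in retrieve(index, query))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "Rotate API keys every 90 days.\n\nUse SSO for all staff.",
    "Lunch is served at noon.",
]
index = build_index(to_nodes(docs))
prompt = query_engine(index, "how often to rotate api keys")
print(prompt)
```

Each function corresponds to one arrow in the diagram, which is the main point: every stage can be swapped or tuned independently.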

Quickstart: The Minimal RAG Loop

This is the fastest path to a working prototype. We load documents, build a vector index, and ask a question.

# 1) Install
# pip install llama-index llama-index-embeddings-openai llama-index-llms-openai
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
# 2) Configure your model and embeddings
os.environ["OPENAI_API_KEY"] = "YOUR_KEY"  # or use another supported LLM/embedding provider
llm = OpenAI(model="gpt-4o-mini")
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# 3) Load documents (e.g., ./data/*.pdf, .md, .txt)
docs = SimpleDirectoryReader("./data").load_data()
# 4) Build the index
index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)
# 5) Create a query engine and ask a question
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What are the key security practices mentioned in the docs?")
print(response)

That is the core loop. From here, real apps add better chunking, reranking, structured prompts, and observability.

Ingestion: Bring Your Own Data (BYOD) the Right Way

Once you have decided how to use LlamaIndex on real data, choose loaders that match your sources and preserve their structure.

Common loaders:

Files: SimpleDirectoryReader, PDF/HTML/Markdown readers

Web: BeautifulSoupWebReader, sitemap readers

SaaS: Notion, Confluence, Slack, Google Drive (via connectors)

Databases: SQL and vector DBs (Pinecone, Weaviate, Chroma, Elasticsearch)

Tip: Normalize metadata (title, author, url, created_at). Good metadata improves reranking and filtering later.

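# pip install llama-index-readers-web
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader
file_docs = SimpleDirectoryReader("./policies").load_data()
# The URL below is a placeholder; list the pages you actually want to ingest
web_docs = SimpleWebPageReader(html_to_text=True).load_data(urls=["https://example.com/policies"])
all_docs = file_docs + web_docs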
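The metadata tip above can be sketched in plain Python. This is one possible approach, not a LlamaIndex feature: the alias table and the `normalize_metadata` helper are hypothetical, and the alias lists would need to reflect the keys your own loaders emit.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source-specific keys to one shared schema.
KEY_ALIASES = {
    "title": ["title", "name", "page_title"],
    "author": ["author", "created_by", "owner"],
    "url": ["url", "link", "source_url"],
    "created_at": ["created_at", "created", "date"],
}


def normalize_metadata(raw: dict) -> dict:
    """Return metadata with the four standard keys; missing values become None."""
    lowered = {k.lower(): v for k, v in raw.items()}
    out = {}
    for target, aliases in KEY_ALIASES.items():
        # Take the first alias that is present in the raw metadata.
        out[target] = next((lowered[a] for a in aliases if a in lowered), None)
    # Store timestamps as ISO 8601 strings in UTC so filters compare consistently.
    if isinstance(out["created_at"], datetime):
        out["created_at"] = out["created_at"].astimezone(timezone.utc).isoformat()
    return out


notion_style = {"Name": "Security Policy", "Owner": "ops", "link": "https://example.com/p"}
print(normalize_metadata(notion_style))
```

Running a helper like this over each document's metadata before indexing means later filters can rely on one key per concept regardless of the source.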

Chunking and Node Parsers: Garbage In, Garbage Out

Getting chunking right is one of the most important steps in learning to use LlamaIndex effectively.

Why chunking matters: chunks that are too large cause token bloat and irrelevant retrieval; chunks that are too small cause context fragmentation.

Defaults: reasonable for many cases, but tune them for your content type.

Heuristics:

Technical docs: 512–1024 token chunks with 10–20% overlap

FAQs: smaller chunks (256–512) so Q/A pairs stay together

Legal/policy: larger chunks (1024–1536) to keep definitions and clauses together

from llama_index.core.node_parser import SentenceSplitter
# chunk_size and chunk_overlap are measured in tokens
parser = SentenceSplitter(chunk_size=800, chunk_overlap=100)
# Each node inherits the metadata of the document it came from
nodes = parser.get_nodes_from_documents(all_docs)
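To make the size/overlap trade-off concrete, here is a minimal sliding-window chunker in plain Python. It is a simplified illustration, not what SentenceSplitter does internally: it treats whitespace-separated words as stand-in tokens and ignores sentence boundaries, and `chunk_tokens` is a made-up helper name.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = chunk_tokens(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

With ten tokens, a window of four, and an overlap of one, each chunk repeats the final token of the previous chunk, which is what keeps a sentence that straddles a boundary retrievable from either side.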

Index Strategies: Vector, Keyword, or Hybrid?

Choosing the right index matters. The good news: LlamaIndex lets you combine them.

Vector index: suited to semantic search; best for “explain X” questions or fuzzy queries.

Keyword (BM25): strong for exact terms, IDs, error codes, and logs.

Hybrid: combines both, then reranks the top candidates with an LLM or a cross-encoder.

# pip install llama-index-retrievers-bm25
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever
# Vector index from pre-parsed nodes
v_index = VectorStoreIndex(nodes)
v_retriever = v_index.as_retriever(similarity_top_k=6)
# BM25 keyword retriever
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=6)
# Hybrid: merge candidates from both retrievers with QueryFusionRetriever
# (reciprocal rank fusion); num_queries=1 disables LLM query generation
hybrid = QueryFusionRetriever(
    [v_retriever, bm25_retriever],
    llm=llm,
    similarity_top_k=8,
    num_queries=1,
    mode="reciprocal_rerank",
    use_async=False,
)
query_engine = RetrieverQueryEngine.from_args(retriever=hybrid, llm=llm)
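One common way to merge the two candidate lists is reciprocal rank fusion (RRF), which only needs each retriever's ranking, not comparable scores. The sketch below is a standalone illustration under assumed inputs: the node IDs are invented, and `k=60` is the constant commonly used for RRF rather than anything taken from this guide.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_k: int = 8) -> list[str]:
    """Fuse ranked ID lists: each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    ordered = sorted(scores, key=lambda n: (-scores[n], n))
    return ordered[:top_k]


vector_hits = ["n3", "n1", "n7"]  # hypothetical IDs from the vector retriever
bm25_hits = ["n1", "n9", "n3"]    # hypothetical IDs from the BM25 retriever
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Nodes that both retrievers return rise to the top, while nodes found by only one retriever still survive, which is the behavior a hybrid setup is after.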

Reranking and Filters: More Precision Without Overpaying

Reranking improves answer quality by reordering retrieved chunks by relevance.

When to rerank: when users report irrelevant citations or long, rambling context.

Approaches:

Cross-encoders (bi-encoder embedding search → cross-encoder rerank)

LLM-based reranking (more expensive; sometimes smarter on complex passages)

Metadata filters (e.g., source == 'handbook', created_at > 2024-01-01)

# pip install llama-index-postprocessor-flag-embedding-reranker FlagEmbedding
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
# Retrieve 12 candidates, then keep the 5 the reranker scores highest
reranker = FlagEmbeddingReranker(top_n=5, model="BAAI/bge-reranker-base")
query_engine = v_index.as_query_engine(
    similarity_top_k=12,
    node_postprocessors=[reranker],
)

Prompting and Query Engines: From Search to Answers

The query engine is where retrieval meets generation. To use LlamaIndex well in production, design prompts and response synthesis carefully.

Answer synthesis strategies:

Simple “stuff” (concatenate) for small contexts

Tree or map-reduce for longer contexts

Citation mode to show sources

from llama_index.core.response_synthesizers import get_response_synthesizer
# tree_summarize builds the answer bottom-up from summaries of the retrieved chunks
synth = get_response_synthesizer(response_mode="tree_summarize")
query_engine = v_index.as_query_engine(response_synthesizer=synth)
ans = query_engine.query("Summarize the onboarding steps and cite sources.")
print(ans)
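The idea behind tree-style synthesis can be shown without an LLM. The following is a simplified picture, not LlamaIndex's implementation: `tree_summarize` and `fake_summarize` are hypothetical names, and the stub summarizer just joins its inputs so the tree shape is visible in the output.

```python
from typing import Callable, Sequence


def tree_summarize(
    chunks: Sequence[str],
    summarize: Callable[[list[str]], str],
    fan_in: int = 2,
) -> str:
    """Repeatedly summarize groups of `fan_in` texts until one summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    if not level:
        return ""
    while len(level) > 1:
        # Each pass replaces groups of texts with one summary per group.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_summarize(group: list[str]) -> str:
    """Stand-in for an LLM call: wraps the group so the nesting is visible."""
    return "(" + "+".join(group) + ")"


summary = tree_summarize(["a", "b", "c"], fake_summarize)
print(summary)
```

Because each call only sees `fan_in` texts, no single prompt has to hold the whole context, which is why this style suits longer contexts better than simple concatenation.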

Custom prompts: adjust tone, structured outputs, or guardrails.

from llama_index.core.prompts import PromptTemplate
qa_tmpl = PromptTemplate(
"""
You are a terse, evidence-first assistant. Use only the provided context.
If unsure, say you don't know. Return JSON with keys: answer, sources.
Question: {query_str}
Context: {context_str}
"""
)
query_engine = v_index.as_query_engine(text_qa_template=qa_tmpl)

Agents and Tools: When Retrieval Is Not Enough

Sometimes an answer requires action: running SQL, calling APIs, or browsing. Agents coordinate tools and reasoning with your retrieval pipeline.

Use cases: KPI dashboards (SQL tool), support bots (ticket lookup API), research agents (web + RAG)

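from llama_index.core import SQLDatabase
from llama_index.core.agent import ReActAgent
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.core.tools import QueryEngineTool
from sqlalchemy import create_engine
engine = create_engine("sqlite:///analytics.db")
sql_engine = NLSQLTableQueryEngine(sql_database=SQLDatabase(engine), llm=llm)
sql_tool = QueryEngineTool.from_defaults(  # tool name and description are illustrative
    query_engine=sql_engine, name="analytics_sql",
    description="Runs SQL against the analytics database.")
# ReActAgent.from_tools is the classic agent API; newer releases expose agents under llama_index.core.agent.workflow
agent = ReActAgent.from_tools([sql_tool], llm=llm, verbose=True)
agent.chat("What was monthly churn in Q2 2025? If needed, query the DB.")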

Evaluation: Don't Ship Blind

Using LlamaIndex responsibly means validating both retrieval and answers before launch.

Offline eval: measure retrieval recall/precision on a labeled set.

Online eval: log user prompts; measure satisfaction, deflection rates, and hallucinations.

In LlamaIndex: evaluation helpers are available for faithfulness and answer relevance.

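from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
faith = FaithfulnessEvaluator(llm=llm)
rel = RelevancyEvaluator(llm=llm)
question = "List SOC 2 control families in our policy."
pred = query_engine.query(question)
# Each call returns an EvaluationResult; .passing is the boolean verdict
print("faithful?", faith.evaluate_response(response=pred).passing)
# Relevancy is judged against the question, so the query is passed as well
print("relevant?", rel.evaluate_response(query=question, response=pred).passing)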

Practical bar: for internal assistants, aim for a >80% “useful” rating on top queries before a broad rollout.
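For the offline side, retrieval precision and recall at k can be computed with a few lines of plain Python once you have a labeled set. The sketch below assumes each labeled query lists the node IDs the retriever returned (in rank order, without duplicates) and the IDs a human marked relevant; the data and the helper name are invented for illustration.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision and recall over the top-k retrieved IDs for one query."""
    top = retrieved[:k]
    hits = sum(1 for node_id in top if node_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labeled set: retriever output plus human relevance labels.
labeled = [
    {"retrieved": ["a", "b", "c", "d"], "relevant": {"a", "d", "e"}},
    {"retrieved": ["x", "y"], "relevant": {"y"}},
]
results = [precision_recall_at_k(q["retrieved"], q["relevant"], k=3) for q in labeled]
mean_precision = sum(p for p, _ in results) / len(results)
mean_recall = sum(r for _, r in results) / len(results)
print(f"precision@3={mean_precision:.3f} recall@3={mean_recall:.3f}")
```

Tracking these two numbers across changes to chunk size, top-k, or the retriever mix shows whether a tweak helped retrieval before any LLM-based answer evaluation runs.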

Persistence and Vector Stores: Making It Scale

In-memory indexes do not hold up for real workloads. Persist to a vector DB and enable incremental updates.

Popular backends: Pinecone, Weaviate, Chroma, Elasticsearch/OpenSearch, Qdrant

Tip: use namespaces per tenant or department, and keep metadata complete.

# Example: Chroma
# pip install chromadb llama-index-vector-stores-chroma
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, VectorStoreIndex
import chromadb
chroma_client = chromadb.PersistentClient(path="./chroma_store")
collection = chroma_client.get_or_create_collection("company_knowledge")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(all_docs, storage_context=storage_context)

Security and Governance: The Part Everyone Forgets

PII handling: redact or hash sensitive fields during ingestion.

Access controls: filter by user role with metadata constraints.

Content freshness: schedule re-ingestion; mark versions.

Safety: add refusal policies and source-only constraints to prompts.

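# Example: metadata-based filtering at query time (operator support varies by vector store)
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters
filters = MetadataFilters(filters=[
    MetadataFilter(key="department", value=["legal", "security"], operator=FilterOperator.IN),
    MetadataFilter(key="published", value="true")])  # assumes the flag is stored as a string
retriever = index.as_retriever(similarity_top_k=8, filters=filters)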
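The PII point can be handled before documents ever reach the index. Below is a minimal sketch for one field type, email addresses, using only the standard library. It rests on several assumptions: the regex is a simple pattern rather than a full address parser, `PSEUDONYM_KEY` is a placeholder secret, and the helper names are made up. A keyed hash (HMAC) is used so the same address always maps to the same token without being guessable from the token alone.

```python
import hashlib
import hmac
import re

# Hypothetical secret; in practice it would come from a secrets manager, not source code.
PSEUDONYM_KEY = b"replace-me"

# Simple email pattern; good enough for a sketch, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def _pseudonymize(match: re.Match) -> str:
    """Replace one address with a stable, keyed token."""
    address = match.group(0).lower().encode("utf-8")
    digest = hmac.new(PSEUDONYM_KEY, address, hashlib.sha256).hexdigest()
    return f"<email:{digest[:12]}>"


def redact_emails(text: str) -> str:
    """Return text with every email address replaced by its token."""
    return EMAIL_RE.sub(_pseudonymize, text)


clean = redact_emails("Contact Jane.Doe@example.com or jane.doe@example.com.")
print(clean)
```

Applying a function like this to document text during ingestion keeps raw addresses out of both the vector store and any prompt assembled from retrieved chunks.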

From Prototype to Production: Deployment Patterns

Server pattern: expose a /query endpoint; keep the index warm in memory.

Serverless gotcha: cold starts plus large models can hurt latency. Consider managed inference.

Caching: cache embeddings and frequent query results; enable partial updates.

Observability: log retrieved nodes, token usage, answer length, and user feedback.

# Minimal FastAPI wrapper
# pip install fastapi uvicorn
from fastapi import FastAPI
app = FastAPI()
qe = index.as_query_engine(llm=llm)
@app.post("/query")
async def query(payload: dict):
    q = payload.get("q", "")
    # aquery is the async counterpart of query, so the handler does not block the event loop
    resp = await qe.aquery(q)
    return {"answer": str(resp), "sources": [s.node.metadata for s in resp.source_nodes]}

Real-World Blueprints: Pick Your Path

Internal Policy Assistant

Index: Hybrid (BM25 + Vector) with Reranking

Guardrails: Source‑Only Mode; “I Don’t Know” Fallback

KPI: Resolution Rate for Policy Questions

Customer Support Copilot

Index: Product Docs + Release Notes + Tickets

Agents: API Tool to Check Order/Ticket Status

KPI: First‑Contact Resolution, Deflection, CSAT

Research Analyst

Index: Web + PDFs + Notes; Strong Deduplication

Rerank: Cross‑Encoder; Synthesis: Map‑Reduce

KPI: Time to Insight; Citation Accuracy

Data QA for BI

Tools: SQL Engine + RAG on Metric Definitions

Governance: Row‑Level Policies; Query Audit

KPI: Correctness vs. Ground Truth

Cost and Latency: Keep It Fast (and Cheap)

Embeddings: batch where possible; use smaller models for recall and rerank selectively.

Context size: aim for 1–2k tokens of the most relevant chunks.

Caching: cache top-K retrieval for hot queries; memoize LLM calls keyed by hashed prompts.

Parallelism: fan-out retrieval → fan-in rerank to reduce tail latency.
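The "memoize LLM calls keyed by hashed prompts" idea from the list above can be sketched as a small wrapper. This is an in-memory illustration under stated assumptions: `PromptCache` and `fake_llm` are hypothetical names, the cache never expires entries, and a real deployment would likely use a shared store and include any generation parameters in the key.

```python
import hashlib
import json
from typing import Callable


class PromptCache:
    """Memoize completions by a hash of (model, prompt)."""

    def __init__(self, llm_call: Callable[[str, str], str]):
        self._llm_call = llm_call
        self._store: dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Serialize deterministically so equal inputs always hash the same.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = self._llm_call(model, prompt)
        self._store[key] = answer
        return answer


calls: list[str] = []


def fake_llm(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call; records each invocation."""
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = PromptCache(fake_llm)
first = cache.complete("demo-model", "What is our PTO policy?")
second = cache.complete("demo-model", "What is our PTO policy?")
print(len(calls), cache.hits)
```

Because the key covers the full prompt, including retrieved context, a cached answer is only reused when both the question and the retrieved chunks are unchanged, which avoids serving stale answers after re-ingestion.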

Common Pitfalls When Learning to Use LlamaIndex

Over-chunking, which leads to shallow, noisy retrieval.

No metadata filters, which lets irrelevant sources slip in.

Relying on a single index type for all content.

Skipping evaluation; shipping without a quality bar.

Letting indexes get stale; no scheduled refresh.

By the Way: Speeding Up Your Workflow in the Editor

As you iterate on prompts, chunkers, and retrieval settings, note that an AI coding and research sidebar such as Sider.ai can speed up the loop. You can keep snippets, prompts, and evaluation notes at hand, diff prompt changes, and test variations quickly without leaving your browser. This is especially useful when you are tuning how you use LlamaIndex across different retrieval strategies.

Step-by-Step Checklist: From Zero to Production

Ingest sources and normalize metadata.

Tune chunk sizes by content type.

Build vector + BM25 indexes; enable hybrid retrieval.

Add reranking and metadata filters.

Customize prompts; enable citations and a refusal policy.

Evaluate faithfulness and relevance on a test set.

Persist to a vector store; enable incremental updates.

Add observability, caching, and RBAC filters.

Wrap in an API and set SLAs; document failure modes.

Key Takeaways

If you need a robust RAG app, learning to use LlamaIndex will save you weeks of glue engineering.

Start simple, then layer on hybrid retrieval, reranking, and structured prompts.

Evaluate before you scale; persist indexes and monitor quality in production.

Design for governance from day one. Security is not a bolt-on.

Next Steps

Prototype the quickstart on a small document set.

Experiment with hybrid retrieval and a reranker.

Add evaluation and citations; track quality metrics.

Move to a persistent vector store and deploy an API.

FAQ

Q1: What is LlamaIndex used for in RAG applications? LlamaIndex helps you connect your data to LLMs with ingestion, indexing, and retrieval components. It streamlines building RAG systems by handling chunking, vector/keyword indexes, and query orchestration.

Q2: How do I choose the right index type in LlamaIndex? Use a vector index for semantic queries, BM25 for exact matches like IDs or codes, and a hybrid approach for the best overall recall and precision. Many teams combine both and add reranking for top-K results.

Q3: How can I improve accuracy when using LlamaIndex? Tune chunk sizes, include rich metadata, enable hybrid retrieval, and add a reranker. Also evaluate faithfulness and relevance, and use citation mode to show sources.

Q4: Can LlamaIndex work with my existing vector database? Yes. LlamaIndex integrates with popular vector stores like Pinecone, Weaviate, Chroma, Qdrant, and Elasticsearch. Persist indexes for scalability and incremental updates.

Q5: How do I deploy a LlamaIndex app to production? Wrap your query engine in an API (e.g., FastAPI), persist data in a vector store, add caching and observability, and evaluate quality continuously. Enforce metadata filters and access control for security.