Actualitzat el 25 Set. 2025
6 min
# Pythonpip install litellm# Node.jsnpm install litellm# Exemple: utilitzant OpenAI + Anthropic + Mistralexport OPENAI_API_KEY=sk-...export ANTHROPIC_API_KEY=sk-ant-...export MISTRAL_API_KEY=sk-mis-...from litellm import completionresp = completion(model="gpt-4o-mini", # o "anthropic/claude-3-5-sonnet", "mistral/mistral-large"messages=.---## Streaming, Tools, and JSON Mode### Streaming Responses```pythonfrom litellm import completionfor chunk in completion(model="gpt-4o-mini",messages=.### Cost and Token UsageLiteLLM pot fer un seguiment de l'ús de tokens i estimar el cost per sol·licitud, model o projecte. Amb el proxy, pots exportar l'ús a registres, panells o un receptor de facturació. Això és de gran valor quan combines venedors amb diferents preus.---## The LiteLLM Proxy (LLM Gateway)Si ets un equip o una plataforma, el proxy és la veritable superpotència: un servei central amb encaminament, autenticació, límits de velocitat, registre i observabilitat. Interaccions amb ell utilitzant la superfície de l'API d'OpenAI perquè el codi de la teva aplicació gairebé no canviï.### Start the Proxy```bash# simplest local runlitellm --port 4000/v1/chat/completions. Apunta el teu client OpenAI existent a ` i ja estàs llest.config.yaml:model_list:- model_name: gpt-4o-minilitellm_params:model: openai/gpt-4o-miniapi_key: ${OPENAI_API_KEY}- model_name: claude-3-5-sonnetlitellm_params:model: anthropic/claude-3-5-sonnetapi_key: ${ANTHROPIC_API_KEY}router:strategy: simple_weightedroutes:- model: gpt-4o-miniweight: 0.6- model: claude-3-5-sonnetweight: 0.4rate_limits:requests_per_minute: 120logging:level: infosink: stdoutauth:api_keys:- key: svc-app-123litellm --config config.yaml --port 4000from openai import OpenAIclient = OpenAI(base_url=" api_key="svc-app-123")resp = client.chat.completions.create(model="gpt-4o-mini",messages=.---## Advanced Routing: Latency, Cost, or ReliabilityYou can implement routing strategies like:- Weighted round-robin to A/B models- Lowest-latency-first by region- Cost-aware routing for non-critical endpoints- Fallback-on-error/retry across providersWith a router policy, you can say “prefer cheap, fall back to premium for tough prompts.” This offers high availability and predictable budgets.---## Guardrails, Moderation, and SafetyAdd pre- and post-processing middleware to strip PII, enforce safety filters, or moderate outputs before returning to clients. Combine provider-native moderation (e.g., OpenAI, Google) with your own policy checks in the proxy. Example: require JSON schema validation and re-ask when invalid.---## Observability and Logging- Enable request/response logging with redaction.- Export metrics to Prometheus/Grafana or your APM.- Trace latency, tokens, and cost by endpoint and user.This turns “model roulette” into a managed service with SLOs and budgets.---## Real-World Usage Patterns1) Multi-vendor resilience- Primary: fast/cheap model; Fallback: high-accuracy model on 429/5xx.- Benefits: better uptime, cost control, and stable quality.2) Feature flag model upgrades- Use router weights to canary a new model to 5% of traffic; monitor metrics; ramp up when stable.3) Product tiers- Free tier routed to small models; Pro tier to premium models.4) Prompt registries and templates- Centralize prompts in the proxy so services inherit improvements without redeploys.5) Team billing and budgets- Track spend by API key; enforce soft and hard limits per team or product.---## Security and Compliance Checklist- Store provider keys in your secret manager; reference via env vars in config.- Turn on request redaction and PII scrubbing in logs.- Use per-service API keys for the proxy; rotate regularly.- Set org-wide rate limits and quotas.- Add allowlists/denylists for models and endpoints.---## Troubleshooting: Fast Fixes- “Unauthorized” via proxy: Check `auth.api_keys` and that your client uses `base_url` + correct key.- Model not found: Ensure `model_list` contains the friendly name you’re calling.- Timeouts: Increase `timeout` or route to a lower-latency provider region.- Weird outputs: Enable JSON schema + validation; add retries and fallbacks.- Cost spikes: Turn on caching; route bulk traffic to cheaper models; set per-key quotas.For deeper dives and latest features, the official docs are updated frequently and worth bookmarking. Tutorials like DataCamp’s guide are great for hands-on patterns, and the beginner crash course video can help you see the concepts in action.---## Put It All Together: Reference App Skeleton (Python FastAPI)```python# app.pyfrom fastapi import FastAPIfrom pydantic import BaseModelfrom litellm import completionimport osclass ChatReq(BaseModel):question: strapp = FastAPI@app.post("/ask")async def ask(req: ChatReq):resp = completion(model=os.getenv("DEFAULT_MODEL", "gpt-4o-mini"),messages=.### FAQQ1:What is LiteLLM and why use it over direct provider SDKs?LiteLLM is an OpenAI-compatible gateway for 100+ LLMs, giving you one API and one mental model. It reduces vendor lock-in, simplifies routing, and adds ops features like caching, retries, and cost tracking.Q2:How do I use LiteLLM with the OpenAI SDK?Point the SDK’s base URL to the LiteLLM proxy and use your proxy API key. Your code can stay the same while the proxy swaps providers or models behind the scenes.Q3:Can LiteLLM stream responses and return JSON?Yes. Use `stream=True` to get token streams, and `response_format` with JSON schema to enforce structured outputs across providers.Q4:How do I control costs across different LLM providers?Enable usage logging and cost estimation, add caching, set rate limits, and route bulk traffic to cheaper models via the proxy. Monitor with dashboards for budgets and SLOs.Q5:Is LiteLLM suitable for production teams?Yes. The proxy provides auth, rate limits, routing, observability, and safety middleware. It’s designed as an LLM gateway that centralizes governance while keeping your app OpenAI-compatible.
Com dominar ChatPDF: obtenir informació més ràpidament de documents densos

La millor alternativa a X Auto-Translation per a documents ràpids i precisos

La traducció AI de Samsung no està disponible a l'Iran? Solucions pràctiques

Eines de traducció persa: una guia pràctica per a un treball més ràpid i precís

La millor alternativa a Grok per a una recerca profunda i citada

Les 15 millors funcions del generador d'imatges d'IA que realment utilitzaràs