Error Handling Best Practices

Learn how to properly handle errors when integrating Vectorcache into your application to ensure resilient, production-ready implementations.

Overview

Vectorcache returns standard HTTP status codes to indicate the success or failure of API requests. Your application should handle these errors gracefully and provide fallback behavior when the cache service is unavailable.

HTTP Status Codes

Vectorcache uses the following HTTP status codes:

Status Code | Meaning               | When It Happens                                        | What To Do
200         | Success               | Request processed successfully                         | Use the response data
400         | Bad Request           | Malformed request body or invalid parameters           | Fix the request format
401         | Unauthorized          | Invalid or missing API key                             | Check your API key
422         | Unprocessable Entity  | Content not cacheable (images, PDFs, streaming, etc.)  | Call LLM directly
429         | Too Many Requests     | Rate limit exceeded (monthly quota)                    | Wait or upgrade plan
500         | Internal Server Error | Unexpected server error                                | Retry with exponential backoff
503         | Service Unavailable   | Database or service outage                             | Call LLM directly, check Retry-After header

Error Response Format

All error responses follow this structure:

{
  "error": "Short error type",
  "message": "Human-readable error description",
  "retry_after": 60,  // (Optional) Seconds to wait before retry
  "fallback": "Suggested fallback action"  // (Optional)
}
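
If you are working in TypeScript, it can help to model this payload as a type so your error-handling code stays explicit about which fields are optional. Below is a minimal sketch based on the fields shown above; the interface name and the type guard are illustrative, not part of the official SDK.

// Illustrative type for the documented error payload (not an official SDK type)
interface VectorcacheErrorBody {
  error: string;          // Short error type
  message: string;        // Human-readable error description
  retry_after?: number;   // Optional: seconds to wait before retry
  fallback?: string;      // Optional: suggested fallback action
}

// Narrow an unknown response body to the error shape before reading its fields
function isVectorcacheErrorBody(body: unknown): body is VectorcacheErrorBody {
  const b = body as Partial<VectorcacheErrorBody> | null;
  return typeof b === 'object' && b !== null &&
    typeof b.error === 'string' && typeof b.message === 'string';
}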

JavaScript/TypeScript Error Handling

Complete Example with Fallback

import { VectorcacheClient, VectorcacheError } from 'vectorcache';
import OpenAI from 'openai';

const vectorcache = new VectorcacheClient({
  apiKey: process.env.VECTORCACHE_API_KEY!,
  baseURL: 'https://api.vectorcache.com'
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!
});

async function getChatResponse(prompt: string, context: string = ''): Promise<string> {
  try {
    // Try Vectorcache first
    const result = await vectorcache.query({
      prompt: prompt,
      context: context,
      model: 'gpt-4',
      similarity_threshold: 0.85
    });

    if (result.cache_hit) {
      console.log('✅ Cache hit! Saved time and money');
    } else {
      console.log('📝 Cache miss - new response generated');
    }

    return result.response;

  } catch (error: any) {
    return handleVectorcacheError(error, prompt, context);
  }
}

async function handleVectorcacheError(
  error: any,
  prompt: string,
  context: string
): Promise<string> {
  const status = error.response?.status;

  switch (status) {
    case 503: {
      // Service unavailable - use fallback
      console.warn('⚠️ Vectorcache temporarily unavailable, calling LLM directly');
      const retryAfter = error.response?.headers?.['retry-after'] || 60;
      console.log(`   Retry after ${retryAfter} seconds`);
      return callLLMDirectly(prompt, context);
    }

    case 422:
      // Content not cacheable
      console.warn('⚠️ Content not cacheable (may contain images/PDFs), calling LLM directly');
      return callLLMDirectly(prompt, context);

    case 429: {
      // Rate limit exceeded
      console.error('❌ Monthly Vectorcache limit exceeded');
      const detail = error.response?.data;
      console.error(`   Usage: ${detail?.current_usage}/${detail?.monthly_limit}`);
      throw new Error('Vectorcache monthly limit reached. Please upgrade your plan.');
    }

    case 401:
      // Invalid API key
      console.error('❌ Invalid Vectorcache API key');
      throw new Error('Vectorcache authentication failed. Check your API key.');

    case 500: {
      // Internal server error - retry the cache once, then fall back
      console.error('❌ Vectorcache internal error, retrying once...');
      await sleep(1000);
      try {
        const retried = await vectorcache.query({
          prompt,
          context,
          model: 'gpt-4',
          similarity_threshold: 0.85
        });
        return retried.response;
      } catch {
        // Retry failed, fall back to the LLM directly
        return callLLMDirectly(prompt, context);
      }
    }

    default:
      // Unexpected error - fallback
      console.error('❌ Unexpected Vectorcache error:', error.message);
      return callLLMDirectly(prompt, context);
  }
}

async function callLLMDirectly(prompt: string, context: string = ''): Promise<string> {
  /**
   * Fallback: Call OpenAI directly when Vectorcache is unavailable
   */
  console.log('🔄 Falling back to direct OpenAI call');

  const messages: any[] = [];
  if (context) {
    messages.push({ role: 'system', content: context });
  }
  messages.push({ role: 'user', content: prompt });

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: messages
  });

  return response.choices[0].message.content || '';
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Usage
async function main() {
  const response = await getChatResponse(
    'What are the benefits of semantic caching?',
    'You are a helpful AI assistant'
  );
  console.log('Response:', response);
}

main().catch(console.error);

Python Error Handling

Complete Example with Fallback

import os
import time
from vectorcache import VectorcacheClient, VectorcacheError
from openai import OpenAI

vectorcache = VectorcacheClient(
    api_key=os.getenv("VECTORCACHE_API_KEY"),
    base_url="https://api.vectorcache.com"
)

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def get_chat_response(prompt: str, context: str = "") -> str:
    """
    Get chat response with Vectorcache, falling back to direct LLM on error.
    """
    try:
        # Try Vectorcache first
        result = vectorcache.query(
            prompt=prompt,
            context=context,
            model="gpt-4",
            similarity_threshold=0.85
        )

        if result['cache_hit']:
            print('✅ Cache hit! Saved time and money')
        else:
            print('📝 Cache miss - new response generated')

        return result['response']

    except VectorcacheError as e:
        return handle_vectorcache_error(e, prompt, context)


def handle_vectorcache_error(error: VectorcacheError, prompt: str, context: str = "") -> str:
    """
    Handle Vectorcache errors with appropriate fallback strategies.
    """
    status = error.status_code

    if status == 503:
        # Service unavailable - use fallback
        print('⚠️ Vectorcache temporarily unavailable, calling LLM directly')
        retry_after = error.retry_after or 60
        print(f'   Retry after {retry_after} seconds')
        return call_llm_directly(prompt, context)

    elif status == 422:
        # Content not cacheable
        print('⚠️ Content not cacheable (may contain images/PDFs), calling LLM directly')
        return call_llm_directly(prompt, context)

    elif status == 429:
        # Rate limit exceeded
        print(f'❌ Monthly Vectorcache limit exceeded: {error.message}')
        raise Exception('Vectorcache monthly limit reached. Please upgrade your plan.')

    elif status == 401:
        # Invalid API key
        print('❌ Invalid Vectorcache API key')
        raise Exception('Vectorcache authentication failed. Check your API key.')

    elif status == 500:
        # Internal server error - retry the cache once, then fall back
        print('❌ Vectorcache internal error, retrying once...')
        time.sleep(1)
        try:
            result = vectorcache.query(
                prompt=prompt,
                context=context,
                model="gpt-4",
                similarity_threshold=0.85
            )
            return result['response']
        except VectorcacheError:
            # Retry failed, fall back to the LLM directly
            return call_llm_directly(prompt, context)

    else:
        # Unexpected error - fallback
        print(f'❌ Unexpected Vectorcache error: {error.message}')
        return call_llm_directly(prompt, context)


def call_llm_directly(prompt: str, context: str = "") -> str:
    """
    Fallback: Call OpenAI directly when Vectorcache is unavailable.
    """
    print('🔄 Falling back to direct OpenAI call')

    messages = []
    if context:
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": prompt})

    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

    return response.choices[0].message.content or ""


# Usage
if __name__ == "__main__":
    response = get_chat_response(
        prompt="What are the benefits of semantic caching?",
        context="You are a helpful AI assistant"
    )
    print(f"Response: {response}")

Retry Strategies

Exponential Backoff

For 500/503 errors, implement exponential backoff:

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  initialDelay: number = 1000
): Promise<T> {
  let lastError: any;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;

      // Don't retry on client errors (4xx)
      if (error.response?.status >= 400 && error.response?.status < 500) {
        throw error;
      }

      // Stop if this was the final attempt - no point waiting after the last failure
      if (attempt === maxRetries - 1) {
        break;
      }

      // Calculate delay: 1s, 2s, 4s, 8s...
      const delay = initialDelay * Math.pow(2, attempt);
      console.log(`Retry attempt ${attempt + 1}/${maxRetries} in ${delay}ms`);

      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }

  throw lastError;
}

// Usage
const result = await retryWithBackoff(() =>
  vectorcache.query({ prompt, context, model: 'gpt-4' })
);
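
To avoid many clients retrying in lockstep after an outage, you can also add random jitter to each delay. A minimal sketch that builds on the retryWithBackoff helper above (the ±25% jitter factor is an illustrative choice, not a Vectorcache requirement):

// Add up to ±25% random jitter so concurrent clients don't all retry at the same instant
function withJitter(delay: number, factor: number = 0.25): number {
  const jitter = delay * factor * (Math.random() * 2 - 1);
  return Math.max(0, Math.round(delay + jitter));
}

// Inside retryWithBackoff, replace the fixed delay with a jittered one:
// const delay = withJitter(initialDelay * Math.pow(2, attempt));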

Respecting Retry-After Headers

When receiving 503 errors, always check the Retry-After header:

if (error.response?.status === 503) {
  const retryAfter = parseInt(error.response.headers['retry-after'] || '60', 10);
  console.log(`Service unavailable. Retry after ${retryAfter} seconds`);

  // Option 1: Wait and retry
  await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
  return await getChatResponse(prompt, context);

  // Option 2: Fallback immediately (use this instead of Option 1)
  // return callLLMDirectly(prompt, context);
}

Timeout Configuration

Set appropriate timeouts to prevent hanging requests:

// JavaScript/TypeScript
const vectorcache = new VectorcacheClient({
  apiKey: process.env.VECTORCACHE_API_KEY!,
  timeout: 10000  // 10 second timeout
});

# Python

vectorcache = VectorcacheClient(
    api_key=os.getenv("VECTORCACHE_API_KEY"),
    timeout=10.0  # 10 second timeout
)
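
If your client library does not expose a timeout option, you can enforce one yourself by racing the request against a timer. A minimal, library-agnostic sketch (the 10-second default mirrors the recommendation above); treat a timeout like any other failure and fall back:

// Reject if the wrapped promise does not settle within `ms` milliseconds
async function withTimeout<T>(promise: Promise<T>, ms: number = 10000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Request timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  }
}

// Usage: treat a timeout like any other failure and fall back
// const result = await withTimeout(vectorcache.query({ prompt, context, model: 'gpt-4' }));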

Health Check Monitoring

Monitor Vectorcache health before making requests:

async function checkVectorcacheHealth(): Promise<boolean> {
  try {
    const response = await fetch('https://api.vectorcache.com/health/ready');
    const data = await response.json();

    if (response.status === 200 && data.status === 'ready') {
      return true;
    }

    console.warn('Vectorcache not ready:', data);
    return false;
  } catch (error) {
    console.error('Vectorcache health check failed:', error);
    return false;
  }
}

// Usage
let result;
const isHealthy = await checkVectorcacheHealth();
if (isHealthy) {
  // Use Vectorcache
  result = await vectorcache.query({...});
} else {
  // Skip cache, use LLM directly
  result = await callLLMDirectly(prompt);
}
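
Calling /health/ready before every request adds latency, so in practice you may want to cache the result for a short window. A minimal sketch; the 30-second window is an arbitrary choice, tune it for your traffic:

// Cache the health check result so we don't hit /health/ready on every request
let lastHealthCheck = 0;
let lastHealthResult = true;
const HEALTH_CHECK_TTL_MS = 30_000; // assumed 30-second window

async function isVectorcacheHealthyCached(): Promise<boolean> {
  const now = Date.now();
  if (now - lastHealthCheck < HEALTH_CHECK_TTL_MS) {
    return lastHealthResult;
  }
  lastHealthCheck = now;
  lastHealthResult = await checkVectorcacheHealth();
  return lastHealthResult;
}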

Production Checklist

Before deploying to production, ensure you:

  • Implement try-catch blocks around all Vectorcache calls
  • Add fallback logic to call your LLM directly on errors
  • Set appropriate timeouts (recommended: 10 seconds)
  • Log errors for monitoring and debugging (see the logging sketch after this list)
  • Respect Retry-After headers for 503 errors
  • Implement exponential backoff for retries
  • Handle 422 errors (non-cacheable content) gracefully
  • Monitor rate limits (429 errors) and alert users
  • Test error scenarios in staging environment
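
For the logging item above, a small helper that records the status code and the chosen fallback action in one place makes failures much easier to search later. A minimal sketch using console logging; swap in your own logger, and note that the field names here are illustrative:

// Log Vectorcache failures with enough context to debug later
function logVectorcacheError(error: any, action: 'fallback' | 'retry' | 'fail'): void {
  console.error(JSON.stringify({
    source: 'vectorcache',
    status: error.response?.status ?? null,
    message: error.message,
    action,                       // what the application decided to do
    timestamp: new Date().toISOString()
  }));
}

// Example: logVectorcacheError(error, 'fallback') before calling the LLM directly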

Common Patterns

Pattern 1: Always Fallback

async function getChatResponse(prompt: string): Promise<string> {
  try {
    const result = await vectorcache.query({...});
    return result.response;
  } catch {
    // Any error: fallback to LLM
    return await callLLMDirectly(prompt);
  }
}

Pattern 2: Fail on Rate Limits

async function getChatResponse(prompt: string): Promise<string> {
  try {
    const result = await vectorcache.query({...});
    return result.response;
  } catch (error: any) {
    if (error.response?.status === 429) {
      // Don't fallback on rate limits - force user to upgrade
      throw new Error('Monthly cache limit exceeded');
    }
    // Other errors: fallback
    return await callLLMDirectly(prompt);
  }
}

Pattern 3: Client-Side Circuit Breaker

class VectorcacheCircuitBreaker {
  private failureCount = 0;
  private lastFailureTime = 0;
  private readonly threshold = 5;
  private readonly timeout = 60000; // 60 seconds

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // If too many recent failures, skip cache
    if (this.failureCount >= this.threshold) {
      if (Date.now() - this.lastFailureTime < this.timeout) {
        throw new Error('Circuit breaker open');
      }
      // Reset after timeout
      this.failureCount = 0;
    }

    try {
      const result = await fn();
      this.failureCount = 0; // Success resets counter
      return result;
    } catch (error) {
      this.failureCount++;
      this.lastFailureTime = Date.now();
      throw error;
    }
  }
}
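
A usage sketch for the circuit breaker above: route every cache call through the breaker, and treat an open circuit like any other failure by falling back to the LLM.

// Usage: wrap cache calls in the breaker and fall back when the circuit is open
const breaker = new VectorcacheCircuitBreaker();

async function getChatResponseWithBreaker(prompt: string, context: string = ''): Promise<string> {
  try {
    const result = await breaker.call(() =>
      vectorcache.query({ prompt, context, model: 'gpt-4' })
    );
    return result.response;
  } catch {
    // Circuit open or cache error: skip the cache for now
    return callLLMDirectly(prompt, context);
  }
}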

Next Steps