
Similarity Threshold Tuning

Optimize your cache hit rate by tuning the similarity threshold parameter.

Understanding Similarity Threshold

The similarity_threshold parameter determines how similar two prompts must be for a cache hit. It ranges from 0 to 1:

  • 1.0 = Exact match only
  • 0.95-0.99 = Nearly identical
  • 0.85-0.94 = Very similar (recommended)
  • 0.75-0.84 = Somewhat similar
  • 0.0-0.74 = Loosely similar (not recommended)

How Similarity Works

Vectorcache uses cosine similarity between vector embeddings:

similarity_score = cosine_similarity(embedding_A, embedding_B)

if similarity_score >= similarity_threshold:
    return cached_response  # Cache hit
else:
    call_llm()              # Cache miss
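To make the comparison above concrete, here is a minimal sketch of cosine similarity and the threshold check in TypeScript. The function names `cosineSimilarity` and `isCacheHit` are illustrative, not part of the Vectorcache client, and the three-dimensional vectors are toy values standing in for real embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Embeddings must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar"
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: applies the hit/miss rule from the pseudocode above
function isCacheHit(a: number[], b: number[], similarityThreshold: number): boolean {
  return cosineSimilarity(a, b) >= similarityThreshold;
}

// Toy vectors; real embeddings typically have hundreds of dimensions or more
const embeddingA = [1, 2, 3];
const embeddingB = [2, 4, 6.5];
console.log(isCacheHit(embeddingA, embeddingB, 0.85)); // true (nearly parallel vectors)
console.log(isCacheHit([1, 0], [0, 1], 0.85));         // false (orthogonal vectors)
```

Because the score depends only on the angle between the vectors, two prompts with similar meaning but different length can still score highly.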

Example Similarity Scores

Real examples from production:

| Prompt 1 | Prompt 2 | Score | Match at 0.85? |
|---|---|---|---|
| "What is ML?" | "What is machine learning?" | 0.94 | ✅ Yes |
| "Explain AI" | "What is artificial intelligence?" | 0.88 | ✅ Yes |
| "Python tutorial" | "How to learn Python" | 0.82 | ❌ No |
| "Reset password" | "Change password" | 0.79 | ❌ No |
| "Order status" | "Track my order" | 0.91 | ✅ Yes |

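As a quick illustration of how the same pairs behave at other thresholds, the sketch below replays the five example scores against a few candidate values. The `ScoredPair` type and `matchesAt` helper are hypothetical names introduced here; only the prompts and scores come from the examples above.

```typescript
interface ScoredPair {
  promptA: string;
  promptB: string;
  score: number;
}

// Scores copied from the example pairs above
const examplePairs: ScoredPair[] = [
  { promptA: 'What is ML?', promptB: 'What is machine learning?', score: 0.94 },
  { promptA: 'Explain AI', promptB: 'What is artificial intelligence?', score: 0.88 },
  { promptA: 'Python tutorial', promptB: 'How to learn Python', score: 0.82 },
  { promptA: 'Reset password', promptB: 'Change password', score: 0.79 },
  { promptA: 'Order status', promptB: 'Track my order', score: 0.91 }
];

// Returns the pairs that would count as a cache hit at the given threshold
function matchesAt(pairs: ScoredPair[], threshold: number): ScoredPair[] {
  return pairs.filter(pair => pair.score >= threshold);
}

for (const threshold of [0.80, 0.85, 0.90]) {
  const matched = matchesAt(examplePairs, threshold);
  console.log(`${threshold.toFixed(2)}: ${matched.length} of ${examplePairs.length} pairs match`);
}
// 0.80: 4 of 5 pairs match
// 0.85: 3 of 5 pairs match
// 0.90: 2 of 5 pairs match
```
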
Finding Your Optimal Threshold

Step 1: Start with Default (0.85)

Begin with the recommended default:

const result = await client.query({
  prompt: userQuery,
  model: 'gpt-4o',
  similarityThreshold: 0.85  // Default
});

Step 2: Monitor Similarity Scores

Track actual similarity scores from cache hits:

const scores: number[] = [];

async function trackQuery(prompt: string) {
  const result = await client.query({
    prompt,
    model: 'gpt-4o',
    similarityThreshold: 0.85
  });

  if (result.cache_hit && result.similarity_score) {
    scores.push(result.similarity_score);
  }

  return result;
}

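// Analyze scores after 100+ queries
function analyzeScores() {
  // Without any recorded hits the average would be NaN and min/max would be ±Infinity
  if (scores.length === 0) {
    console.log('No cache hits recorded yet');
    return;
  }

  const avg = scores.reduce((a, b) => a + b, 0) / scores.length;
  const min = Math.min(...scores);
  const max = Math.max(...scores);

  console.log(`Average: ${avg.toFixed(3)}`);
  console.log(`Min: ${min.toFixed(3)}`);
  console.log(`Max: ${max.toFixed(3)}`);
}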
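Average, min, and max can hide how the scores are distributed. As a possible complement, the sketch below adds a nearest-rank percentile and the share of hits that sit just above the threshold. The helper names, the 0.02 margin, and the sample scores are all assumptions made for illustration.

```typescript
// Nearest-rank percentile: the smallest value with at least p% of the data at or below it
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('No scores recorded');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p / 100) * sorted.length)));
  return sorted[rank - 1];
}

// Fraction of hits whose score falls in [threshold, threshold + margin)
function shareNearThreshold(values: number[], threshold: number, margin = 0.02): number {
  if (values.length === 0) return 0;
  const near = values.filter(s => s >= threshold && s < threshold + margin).length;
  return near / values.length;
}

// Made-up sample scores, standing in for values collected by trackQuery()
const sampleScores = [0.86, 0.86, 0.88, 0.90, 0.91, 0.93, 0.94, 0.95, 0.97, 0.99];

console.log(`Median: ${percentile(sampleScores, 50)}`);                        // Median: 0.91
console.log(`Share near 0.85: ${shareNearThreshold(sampleScores, 0.85)}`);     // Share near 0.85: 0.2
```

A large share of hits just above the threshold suggests that small threshold changes will move the hit rate noticeably, so those borderline matches are worth reviewing by hand.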

Step 3: Test Different Thresholds

Run A/B tests with various thresholds:

async function testThresholds(prompts: string[]) {
  const thresholds = [0.75, 0.80, 0.85, 0.90, 0.95];

  for (const threshold of thresholds) {
    let hits = 0;
    const scores: number[] = [];

    for (const prompt of prompts) {
      const result = await client.query({
        prompt,
        model: 'gpt-4o',
        similarityThreshold: threshold
      });

      if (result.cache_hit) {
        hits++;
        if (result.similarity_score) {
          scores.push(result.similarity_score);
        }
      }
    }

    const hitRate = (hits / prompts.length * 100).toFixed(1);
    const avgScore = scores.length > 0
      ? (scores.reduce((a, b) => a + b, 0) / scores.length).toFixed(3)
      : 'N/A';

    console.log(`Threshold ${threshold}: ${hitRate}% hits, avg score: ${avgScore}`);
  }
}

Step 4: Choose Based on Use Case

Select threshold based on your requirements:

| Use Case | Priority | Recommended Threshold |
|---|---|---|
| Legal/Medical | Accuracy | 0.92-0.95 |
| Financial | Accuracy | 0.90-0.93 |
| Customer Support | Balance | 0.85-0.90 |
| Educational | Hit Rate | 0.80-0.85 |
| FAQs | Hit Rate | 0.80-0.85 |
| General Content | Balance | 0.85 |

Use Case Examples

High Accuracy: Legal

For legal content, false positives are costly:

const legalQuery = await client.query({
  prompt: 'Interpret clause 5.2 of the agreement',
  context: 'legal-contract-review',
  model: 'gpt-4o',
  similarityThreshold: 0.93  // High threshold for accuracy
});

Why 0.93?

  • Legal queries must be very specific
  • Different clauses require different interpretations
  • Cost of wrong answer > cost of LLM call

Balanced: Customer Support

For support chatbots, balance hit rate and accuracy:

const supportQuery = await client.query({
  prompt: 'How do I reset my password?',
  context: 'customer-support',
  model: 'gpt-4o',
  similarityThreshold: 0.87  // Balanced threshold
});

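Why 0.87?

  • Similar questions should get the same answer
  • Common rewordings of the same request should match
  • Most support queries have common variations

Note that the "Reset password" / "Change password" pair in the example scores scored 0.79, so it would not match at 0.87. If such pairs should share an answer in your application, the threshold has to be at or below their measured score.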

High Hit Rate: Educational FAQs

For educational content, maximize cache hits:

const eduQuery = await client.query({
  prompt: 'What is photosynthesis?',
  context: 'biology-education',
  model: 'gpt-4o',
  similarityThreshold: 0.82  // Lower threshold for more hits
});

Why 0.82?

  • Educational questions have many phrasings
  • "What is X?" vs "Explain X" vs "Define X" should match
  • General explanations are reusable

Dynamic Threshold Adjustment

Adjust threshold based on context:

function getThreshold(context: string): number {
  const thresholdMap: Record<string, number> = {
    'legal': 0.93,
    'medical': 0.92,
    'financial': 0.90,
    'support': 0.87,
    'education': 0.82,
    'default': 0.85
  };

  return thresholdMap[context] || thresholdMap['default'];
}

// Usage
const result = await client.query({
  prompt: userQuery,
  context: userContext,
  model: 'gpt-4o',
  similarityThreshold: getThreshold(userContext)
});
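One thing to watch with an exact-key lookup: context strings such as 'legal-contract-review' or 'customer-support' do not equal short keys like 'legal' or 'support', so they would all fall through to the default. The sketch below shows one possible way around this, assuming your context names contain the category word. The `resolveThreshold` name and the substring-matching rule are illustrative choices, not Vectorcache behavior.

```typescript
// Category thresholds, using the same values as the map above
const categoryThresholds: Record<string, number> = {
  legal: 0.93,
  medical: 0.92,
  financial: 0.90,
  support: 0.87,
  education: 0.82
};

const DEFAULT_THRESHOLD = 0.85;

// Hypothetical resolver: matches any category word contained in the context string.
// If several categories match, the strictest (highest) threshold wins.
function resolveThreshold(context: string): number {
  const normalized = context.toLowerCase();
  const matches = Object.entries(categoryThresholds)
    .filter(([category]) => normalized.includes(category))
    .map(([, threshold]) => threshold);
  return matches.length > 0 ? Math.max(...matches) : DEFAULT_THRESHOLD;
}

console.log(resolveThreshold('legal-contract-review')); // 0.93
console.log(resolveThreshold('customer-support'));      // 0.87
console.log(resolveThreshold('biology-education'));     // 0.82
console.log(resolveThreshold('marketing'));             // 0.85
```

Picking the highest threshold on ambiguous contexts errs on the side of accuracy; an explicit mapping per context string is the safer option when the set of contexts is small and known.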

Measuring Impact

Hit Rate vs Threshold

Track how threshold affects hit rate:

interface ThresholdMetrics {
  threshold: number;
  queries: number;
  hits: number;
  hitRate: number;
  avgSimilarity: number;
  costSaved: number;
}

async function measureImpact(
  prompts: string[],
  threshold: number
): Promise<ThresholdMetrics> {
  let hits = 0;
  let totalSimilarity = 0;
  let costSaved = 0;

  for (const prompt of prompts) {
    const result = await client.query({
      prompt,
      model: 'gpt-4o',
      similarityThreshold: threshold
    });

    if (result.cache_hit) {
      hits++;
      totalSimilarity += result.similarity_score || 0;
      costSaved += result.cost_saved;
    }
  }

  return {
    threshold,
    queries: prompts.length,
    hits,
    hitRate: hits / prompts.length,
    avgSimilarity: hits > 0 ? totalSimilarity / hits : 0,
    costSaved
  };
}

Quality vs Quantity Trade-off

Higher threshold = Higher quality, Lower hit rate:

Threshold 0.95: 15% hit rate, $45/month saved   (high quality)
Threshold 0.90: 35% hit rate, $105/month saved  (balanced)
Threshold 0.85: 50% hit rate, $150/month saved  (balanced)
Threshold 0.80: 65% hit rate, $195/month saved  (quantity)
Threshold 0.75: 75% hit rate, $225/month saved  (risky)

Recommendation: Choose the highest threshold that still gives you an acceptable hit rate.
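In the figures above, savings scale linearly with hit rate, at $3 per percentage point ($45 at 15%, $225 at 75%). The recommendation can be sketched as a small selection function. The `ThresholdResult` type, the `highestAcceptableThreshold` name, and the 40% minimum hit rate are assumptions for illustration; the data rows are the figures listed above.

```typescript
interface ThresholdResult {
  threshold: number;
  hitRate: number;        // fraction of queries served from cache
  monthlySavings: number; // USD
}

// Figures from the trade-off list above
const results: ThresholdResult[] = [
  { threshold: 0.95, hitRate: 0.15, monthlySavings: 45 },
  { threshold: 0.90, hitRate: 0.35, monthlySavings: 105 },
  { threshold: 0.85, hitRate: 0.50, monthlySavings: 150 },
  { threshold: 0.80, hitRate: 0.65, monthlySavings: 195 },
  { threshold: 0.75, hitRate: 0.75, monthlySavings: 225 }
];

// Returns the highest threshold whose hit rate meets the minimum, or undefined if none does
function highestAcceptableThreshold(
  candidates: ThresholdResult[],
  minHitRate: number
): ThresholdResult | undefined {
  return candidates
    .filter(r => r.hitRate >= minHitRate)
    .sort((a, b) => b.threshold - a.threshold)[0];
}

const choice = highestAcceptableThreshold(results, 0.40);
console.log(choice?.threshold);      // 0.85
console.log(choice?.monthlySavings); // 150
```
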

Advanced Techniques

Context-Specific Thresholds

Use different thresholds for different contexts:

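// Typed as Record<string, number> so that indexing with an arbitrary string compiles in strict mode
const contextThresholds: Record<string, number> = {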
  'legal-review': 0.93,
  'medical-advice': 0.92,
  'customer-support': 0.87,
  'product-info': 0.85,
  'general-faq': 0.82
};

async function queryWithContextThreshold(
  prompt: string,
  context: string
) {
  const threshold = contextThresholds[context] || 0.85;

  return await client.query({
    prompt,
    context,
    model: 'gpt-4o',
    similarityThreshold: threshold
  });
}

Adaptive Thresholds

Adjust threshold based on user feedback:

class AdaptiveCache {
  private threshold = 0.85;
  private feedbackScores: number[] = [];

  async query(prompt: string) {
    return await client.query({
      prompt,
      model: 'gpt-4o',
      similarityThreshold: this.threshold
    });
  }

  recordFeedback(helpful: boolean, similarityScore?: number) {
    if (!similarityScore) return;

    // If user found it helpful, this score is good
    if (helpful) {
      this.feedbackScores.push(similarityScore);
    }

    // Adjust threshold based on feedback
    if (this.feedbackScores.length >= 20) {
      const avgGoodScore = this.feedbackScores.reduce((a, b) => a + b) /
                          this.feedbackScores.length;

      // Set threshold slightly below average good score
      this.threshold = Math.max(0.75, avgGoodScore - 0.05);

      console.log(`Adjusted threshold to ${this.threshold.toFixed(2)}`);
      this.feedbackScores = []; // Reset for next period
    }
  }
}

Multi-Tier Thresholds

Try high threshold first, fall back to lower:

async function multiTierQuery(prompt: string) {
  // Try high precision first
  let result = await client.query({
    prompt,
    model: 'gpt-4o',
    similarityThreshold: 0.92
  });

  if (result.cache_hit) {
    return { ...result, tier: 'high-precision' };
  }

  // Fall back to balanced
  result = await client.query({
    prompt,
    model: 'gpt-4o',
    similarityThreshold: 0.85
  });

  return { ...result, tier: result.cache_hit ? 'balanced' : 'miss' };
}

Common Pitfalls

❌ Setting Threshold Too Low

// DON'T: Too many false positives
similarityThreshold: 0.65  // Will match unrelated queries

Problem: You'll get irrelevant cached responses

❌ Setting Threshold Too High

// DON'T: Rarely any cache hits
similarityThreshold: 0.98  // Only nearly identical prompts match

Problem: Very low hit rate, wasting cache potential

❌ Not Testing with Real Data

// DON'T: Guess based on assumptions
similarityThreshold: 0.85  // "I think this will work"

Solution: Always test with actual user queries

✅ The Right Approach

// DO: Test and measure
const optimalThreshold = await findOptimalThreshold(realUserQueries);
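`findOptimalThreshold` is not defined on this page. As one hypothetical way to approach it, the sketch below picks the lowest candidate threshold whose served matches meet a target precision, using cache hits that a reviewer has labeled as acceptable or not. The `LabeledHit` type, the `pickThreshold` name, the 90% target, and the sample data are all made up for illustration.

```typescript
interface LabeledHit {
  score: number;       // similarity score of the best cached match
  acceptable: boolean; // did a reviewer judge the cached answer acceptable?
}

// Returns the lowest candidate threshold whose precision meets the target,
// or undefined if no candidate does. Lowest is preferred to maximize hit rate.
function pickThreshold(
  samples: LabeledHit[],
  candidates: number[],
  minPrecision: number
): number | undefined {
  const ascending = [...candidates].sort((a, b) => a - b);
  for (const threshold of ascending) {
    const served = samples.filter(s => s.score >= threshold);
    if (served.length === 0) continue; // nothing would be served at this threshold
    const precision = served.filter(s => s.acceptable).length / served.length;
    if (precision >= minPrecision) return threshold;
  }
  return undefined;
}

// Made-up labeled samples for demonstration only
const labeled: LabeledHit[] = [
  { score: 0.78, acceptable: false }, { score: 0.81, acceptable: false },
  { score: 0.83, acceptable: true },  { score: 0.86, acceptable: true },
  { score: 0.88, acceptable: false }, { score: 0.90, acceptable: true },
  { score: 0.92, acceptable: true },  { score: 0.94, acceptable: true },
  { score: 0.96, acceptable: true },  { score: 0.97, acceptable: true }
];

console.log(pickThreshold(labeled, [0.75, 0.80, 0.85, 0.90, 0.95], 0.9)); // 0.9
```

With a few hundred labeled samples drawn from real traffic, the same loop gives a defensible starting threshold per use case.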

Monitoring and Alerts

Set up alerts for threshold issues:

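// Shape of the aggregated metrics this function expects
interface CacheMetrics {
  cacheHits: number;
  totalQueries: number;
  avgSimilarityScore: number;
}

function monitorThreshold(metrics: CacheMetrics, currentThreshold = 0.85) {
  const hitRate = metrics.cacheHits / metrics.totalQueries;
  const avgSimilarity = metrics.avgSimilarityScore;

  // Hit rate too low (only judged once there is enough traffic)
  if (hitRate < 0.2 && metrics.totalQueries > 100) {
    // console.warn stands in for your own alerting system
    console.warn('Low cache hit rate - consider lowering threshold', {
      hitRate,
      currentThreshold
    });
  }

  // Similarity scores too close to threshold (within 0.03, e.g. below 0.88 at 0.85)
  if (avgSimilarity < currentThreshold + 0.03) {
    console.warn('Avg similarity close to threshold - may need adjustment', {
      avgSimilarity,
      currentThreshold
    });
  }
}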

Summary

Quick Reference:

| Scenario | Recommended Threshold |
|---|---|
| Just starting | 0.85 |
| Need high accuracy | 0.90-0.95 |
| Want high hit rate | 0.80-0.85 |
| Legal/Medical | 0.92-0.95 |
| Customer support | 0.85-0.90 |
| Education/FAQs | 0.80-0.85 |

Process:

  1. Start with 0.85
  2. Monitor actual similarity scores
  3. Test different thresholds with real data
  4. Choose based on your quality vs quantity needs
  5. Adjust based on user feedback

Next Steps