Rate Limiting & Throttling

Overview

The Genesys Cloud API enforces rate limits to ensure fair usage and platform stability. Understanding and respecting these limits is critical for production integrations.

Standard Rate Limit: 600 requests per minute per organization

How Rate Limiting Works

Time Windows

Rate limits are enforced in 1-minute rolling windows:

    Window 1: 00:00-00:59 → 600 requests allowed
    Window 2: 00:01-01:00 → 600 requests allowed (overlaps with Window 1)
    Window 3: 00:02-01:01 → 600 requests allowed

The window slides forward continuously (shown above in one-second steps). Requests older than 60 seconds drop out of the count as new requests are added.

Rate Limit Headers

Every API response includes rate limit information:

    X-Rate-Limit-Limit: 600 (max requests per window)
    X-Rate-Limit-Remaining: 450 (requests left)
    X-Rate-Limit-Reset: 1679491234 (Unix timestamp, in seconds)

Example

You make request 150 at timestamp 1679491200.
Header: X-Rate-Limit-Remaining: 450
This means 150 requests have been made and 450 more are allowed before the limit is reached.

Detecting Rate Limit Conditions

Proactive Detection (Before hitting the limit)

Monitor remaining requests:

    async function makeRequest(endpoint) {
      const response = await fetch(endpoint);

      const remaining = parseInt(response.headers.get('X-Rate-Limit-Remaining'), 10);
      const limit = parseInt(response.headers.get('X-Rate-Limit-Limit'), 10);

      // Skip the check if either header is missing or unparseable
      if (Number.isNaN(remaining) || Number.isNaN(limit) || limit === 0) {
        return response;
      }

      const percentRemaining = (remaining / limit) * 100;

      // Check the critical threshold first so only one message is logged
      if (percentRemaining < 5) {
        console.error('🛑 Critical: <5% remaining. STOP requests immediately.');
        // Pause all requests
      } else if (percentRemaining < 20) {
        console.warn('⚠️ Less than 20% of rate limit remaining. Slowing down...');
        // Reduce request frequency
      }

      return response;
    }

Reactive Detection (After hitting the limit)

Watch for 429 responses:

    async function makeRequest(endpoint) {
      const response = await fetch(endpoint);

      if (response.status === 429) {
        console.error('❌ Rate limit exceeded!');

        const retryAfter = response.headers.get('Retry-After');
        const resetTime = response.headers.get('X-Rate-Limit-Reset');

        if (retryAfter) {
          // The API tells you how long to wait (assumes a value in seconds)
          const seconds = parseInt(retryAfter, 10);
          console.warn(`Wait ${seconds} seconds`);
        } else if (resetTime) {
          // Calculate the wait time from the reset timestamp (Unix seconds)
          const resetSeconds = parseInt(resetTime, 10);
          const now = Math.floor(Date.now() / 1000);
          const waitSeconds = Math.max(0, resetSeconds - now);
          console.warn(
            `Wait ${waitSeconds}s, until ${new Date(resetSeconds * 1000).toISOString()}`
          );
        }
      }

      return response;
    }

Strategy 1: Exponential Backoff (Reactive)

Use this only after you have hit the limit:

    // Promise-based delay helper, used by the examples in this chapter
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function requestWithBackoff(endpoint, maxAttempts = 4) {
      // Waits between attempts: 3s, 9s, 27s, then 5min.
      // With the default of 4 attempts only the first three waits are used;
      // the 5-minute wait applies from the fifth attempt on.
      const delays = [3000, 9000, 27000, 300000];

      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const response = await fetch(endpoint);

        if (response.status !== 429) {
          return response; // Success (or a non-rate-limit error)
        }

        // Rate limited: give up once the final attempt has failed
        if (attempt >= maxAttempts - 1) {
          throw new Error('Rate limit exceeded after max retries');
        }

        // Reuse the last delay if maxAttempts exceeds the table length
        const delayMs = delays[Math.min(attempt, delays.length - 1)];
        console.warn(`Rate limited. Waiting ${delayMs / 1000}s...`);
        await sleep(delayMs);
      }
    }

Strategy 2: Request Throttling (Proactive)

Limit the request rate to stay well below the limit:

    class ThrottledClient {
      constructor(requestsPerSecond = 8) {
        // 8 requests/sec = 480/min (80% of the 600 limit)
        this.requestsPerSecond = requestsPerSecond;
        this.minIntervalMs = 1000 / requestsPerSecond;
        this.nextSlotTime = 0; // Earliest time the next request may start
      }

      async makeRequest(endpoint) {
        const now = Date.now();

        // Reserve a slot before awaiting so concurrent callers are spaced out too
        const slotTime = Math.max(now, this.nextSlotTime);
        this.nextSlotTime = slotTime + this.minIntervalMs;

        if (slotTime > now) {
          await sleep(slotTime - now);
        }

        return fetch(endpoint);
      }
    }

    // Usage
    const client = new ThrottledClient(8); // 8 req/sec (safe limit)
    await client.makeRequest('/contacts');
    await client.makeRequest('/contacts');
    // Request starts are spaced at least 125ms apart

Strategy 3: Request Queue with Batch Processing

Buffer requests and process them in batches:

    class RequestQueue {
      constructor(batchSize = 50, batchIntervalMs = 5000) {
        this.queue = [];
        this.batchSize = batchSize;
        this.batchIntervalMs = batchIntervalMs;
        this.processing = false;
      }

      async add(endpoint, body) {
        return new Promise((resolve, reject) => {
          this.queue.push({ endpoint, body, resolve, reject });
          if (!this.processing) {
            this.processBatch();
          }
        });
      }

      async processBatch() {
        this.processing = true;

        while (this.queue.length > 0) {
          const batch = this.queue.splice(0, this.batchSize);

          // Process the batch in parallel (but within the rate limit)
          const promises = batch.map(item =>
            fetch(item.endpoint, {
              method: 'POST', // A request with a body cannot be a GET
              headers: { 'Content-Type': 'application/json' },
              body: JSON.stringify(item.body)
            })
              .then(res => item.resolve(res))
              .catch(err => item.reject(err))
          );

          await Promise.all(promises);

          // Wait between batches
          if (this.queue.length > 0) {
            console.log(`Processed ${batch.length} requests. Waiting ${this.batchIntervalMs}ms...`);
            await sleep(this.batchIntervalMs);
          }
        }

        this.processing = false;
      }
    }

    // Usage
    // Up to 50 requests per 5 seconds, which is at most 600/min and uses the
    // whole limit. Lower batchSize to 40 (480/min) to keep the 80% headroom.
    const queue = new RequestQueue(50, 5000);
    for (const contact of millionContacts) {
      queue.add('/contacts', contact).catch(console.error);
    }

Strategy 4: Bulk Operations

Most efficient: use bulk endpoints instead of individual requests.

Without Bulk (SLOW - 100 requests)

    // Creating 100 contacts individually
    for (const contact of contacts) {
      await fetch('/contacts', {
        method: 'POST',
        body: JSON.stringify(contact)
      });
    }
    // Uses 100 API calls!

With Bulk (FAST - 1 request)

    // Creating 100 contacts in one batch
    await fetch('/contacts/bulk', {
      method: 'POST',
      body: JSON.stringify({
        contacts: contacts // Array of 100
      })
    });
    // Uses 1 API call!

Benefit: 100x reduction in API calls.

Strategy 5: Caching

Avoid repeated requests for the same data:

    class CachedClient {
      constructor(cacheTtlSeconds = 3600) {
        this.cache = new Map();
        this.cacheTtlSeconds = cacheTtlSeconds;
      }

      async getContact(contactId) {
        const cacheKey = `contact:${contactId}`;

        // Check cache
        const cached = this.cache.get(cacheKey);
        if (cached && Date.now() - cached.timestamp < this.cacheTtlSeconds * 1000) {
          console.log(`Cache hit: ${cacheKey}`);
          return cached.data;
        }

        // Not cached (or expired), fetch from API
        console.log(`Cache miss: ${cacheKey}`);
        const response = await fetch(`/contacts/${contactId}`);
        if (!response.ok) {
          // Do not cache error responses such as 429 or 500
          throw new Error(`Request failed with status ${response.status}`);
        }
        const data = await response.json();

        // Store in cache
        this.cache.set(cacheKey, { data, timestamp: Date.now() });
        return data;
      }

      clearCache() {
        this.cache.clear();
      }
    }

    // Usage
    const client = new CachedClient(3600); // Cache for 1 hour
    const contact1 = await client.getContact('c1'); // API call
    const contact1Again = await client.getContact('c1'); // Cache hit!

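One gap in a simple TTL cache like CachedClient is that two callers asking for the same contact at the same moment both miss the cache and both hit the API. The following is a minimal sketch of one way to close that gap by sharing the in-flight promise. The class name CoalescingCache, the loader callback, and the ttlMs and now parameters are all hypothetical names chosen for illustration; the loader is assumed to be any async function that performs the real API call.

```javascript
// Hypothetical helper (not part of any Genesys SDK): caches results for a TTL
// and lets concurrent requests for the same key share one in-flight load.
class CoalescingCache {
  constructor(loader, ttlMs = 60000, now = () => Date.now()) {
    this.loader = loader;      // async (key) => value; assumed to call the API
    this.ttlMs = ttlMs;        // how long a stored value stays fresh
    this.now = now;            // injectable clock, useful in tests
    this.entries = new Map();  // key -> { value, storedAt }
    this.inFlight = new Map(); // key -> Promise for a load still in progress
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry && this.now() - entry.storedAt < this.ttlMs) {
      return Promise.resolve(entry.value); // Fresh cache hit, no API call
    }

    const pending = this.inFlight.get(key);
    if (pending) {
      return pending; // Another caller is already loading this key
    }

    // Start the load on the next microtask so the in-flight entry is
    // registered before the loader can settle or throw.
    const promise = Promise.resolve()
      .then(() => this.loader(key))
      .then((value) => {
        this.entries.set(key, { value, storedAt: this.now() });
        return value;
      })
      .finally(() => {
        // Clear on success and on failure; failures are never cached
        this.inFlight.delete(key);
      });

    this.inFlight.set(key, promise);
    return promise;
  }
}
```

In use, the loader might wrap the same fetch call that CachedClient makes, for example new CoalescingCache((id) => fetch(`/contacts/${id}`).then((r) => r.json()), 3600 * 1000). Because failed loads are not stored, a 429 on one attempt does not poison later lookups.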
Strategy 6: WebSocket Subscriptions (Real-time, Low Overhead)

Instead of polling, use a WebSocket for real-time updates.

Polling (Uses many requests)

    // Poll every 10 seconds = 6 requests/min per user
    setInterval(async () => {
      const status = await fetch('/users/user-123?expand=presence');
      // Expensive for many users!
    }, 10000);

WebSocket (1 connection)

    // Single WebSocket connection for real-time updates
    const ws = new WebSocket('wss://...');

    ws.onmessage = (event) => {
      const { userId, presence } = JSON.parse(event.data);
      console.log(`User ${userId} is now ${presence}`);
    };
    // Can handle thousands of users on one connection!

Benefit: Eliminates polling entirely.

Rate Limit Calculation

Scenario 1: Simple API calls

    10,000 contacts to sync
    1 API call per contact = 10,000 calls
    Rate limit: 600/minute
    Time needed: 10,000 / 600 = 16.67 minutes at the full limit
                 (10,000 / 480 = 20.8 minutes at the 80% target)
    Strategy: Use the bulk endpoint (as little as 1 call), or send the calls
              in chunks of at most 600 per minute

Scenario 2: Contact lookup every second

    100 concurrent agents
    Each looks up a contact every second = 100 requests/second = 6,000/minute
    Rate limit: 600/minute
    That is 10x the limit.
    Strategy: Spacing requests 100ms apart gives 10 requests/sec = 600/minute,
              which sits exactly at the limit with no headroom and serves only
              a tenth of the demand. Space requests 125ms apart instead
              (8 requests/sec = 480/minute) and cut the demand itself with
              caching or bulk lookups, since throttling alone leaves a
              growing backlog.

Scenario 3: Mixed workload

    - Sync contacts: 1,000 records (one-time; use bulk, so only a few calls)
    - Agent presence updates: 100 requests/minute (natural rate)
    - Search queries: variable (depends on usage)
    - Reporting: 50 requests/day (batch at night; negligible per minute)

    Steady-state total: about 100/minute + search queries
    Stays under the limit if search queries < 500/minute
    (< 380/minute to stay within the 480/minute target)

Monitoring & Alerting

Track rate limit usage over time:

    class RateLimitMonitor {
      constructor() {
        this.history = [];
      }

      record(remaining, limit) {
        const now = new Date();
        const percentUsed = ((limit - remaining) / limit) * 100;

        this.history.push({ now, remaining, percentUsed });

        // Alert if the average of the last 10 samples is concerning
        if (this.history.length > 10) {
          const recentAverage = this.history
            .slice(-10)
            .reduce((sum, h) => sum + h.percentUsed, 0) / 10;

          if (recentAverage > 80) {
            console.warn('⚠️ Average usage >80%. Trending toward limit!');
          }
        }
      }

      report() {
        // Avoid dividing by zero when nothing has been recorded yet
        if (this.history.length === 0) {
          console.log('Rate Limit Usage Report: no samples recorded');
          return;
        }

        const avg = this.history.reduce((sum, h) => sum + h.percentUsed, 0) / this.history.length;
        const max = Math.max(...this.history.map(h => h.percentUsed));

        console.log(`
          Rate Limit Usage Report:
          Average: ${avg.toFixed(1)}%
          Peak: ${max.toFixed(1)}%
          Samples: ${this.history.length}
        `);
      }
    }

Recommended Approach: Tiered Strategy

Tier 1 - Proactive (Always do this)

- Monitor X-Rate-Limit-Remaining on every request
- Implement throttling (8 req/sec = 480/min, safe)
- Batch operations when possible

Tier 2 - Reactive (If approaching the limit)

- Reduce request frequency further
- Implement caching
- Queue requests instead of fire-and-forget

Tier 3 - Emergency (If hitting the limit)

- Implement exponential backoff
- Stop new requests
- Alert the operations team

Best Practices

- Monitor proactively - Don't wait for 429 errors
- Use bulk endpoints - Single call for multiple records
- Implement throttling - Spread requests evenly
- Cache aggressively - Don't re-fetch the same data
- Use WebSockets - For real-time subscriptions
- Batch requests - Process in groups, not individually
- Set retry limits and timeouts - Don't retry forever
- Alert operations - Know when the limit is being approached

Common Mistakes

❌ Fire-and-forget requests - No rate limit awareness
✅ Monitor headers, throttle proactively

❌ Polling instead of WebSocket - Wastes requests
✅ Use WebSocket for real-time data

❌ Individual API calls in a loop - 1000 calls instead of 1
✅ Use bulk endpoints, batch in groups

❌ Ignoring rate limit warnings - Hit the limit unexpectedly
✅ Monitor, reduce frequency before the limit

❌ Unlimited retries - Keep hammering the API
✅ Exponential backoff, respect Retry-After

Related Topics

- Chapter 11: Error Handling & Retry Strategy
- Chapter 11: API Endpoints Reference
- Chapter 5: Data Actions (rate limiting in flows)
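
As a footnote to the Rate Limit Calculation scenarios, the same back-of-envelope arithmetic can be captured in a few small helpers. This is only a sketch: the function and constant names are hypothetical, the 600/minute figure is the limit stated in this chapter, and 480/minute is the 80% target used by the throttling strategy.

```javascript
// Figures taken from this chapter; adjust them if your limits differ.
const LIMIT_PER_MINUTE = 600;  // stated organization-wide limit
const TARGET_PER_MINUTE = 480; // 80% of the limit, the throttling target

// Minutes needed to send a fixed number of requests at a steady rate.
function minutesToSend(totalRequests, perMinute = LIMIT_PER_MINUTE) {
  return totalRequests / perMinute;
}

// Requests per minute generated by a number of clients, each sending at a
// given per-second rate.
function demandPerMinute(clients, requestsPerClientPerSecond) {
  return clients * requestsPerClientPerSecond * 60;
}

// How many times over the budget a given demand is (1 means exactly at it).
function overLimitFactor(demand, perMinute = LIMIT_PER_MINUTE) {
  return demand / perMinute;
}

// Requests per minute left for variable traffic after fixed traffic is
// subtracted from the budget. Never negative.
function headroomPerMinute(fixedPerMinute, perMinute = LIMIT_PER_MINUTE) {
  return Math.max(0, perMinute - fixedPerMinute);
}

// Scenario 1: 10,000 individual calls at the full limit
console.log(minutesToSend(10000).toFixed(2)); // "16.67"

// Scenario 2: 100 agents, one lookup per second each
const demand = demandPerMinute(100, 1);
console.log(demand, overLimitFactor(demand)); // 6000 10

// Scenario 3: 100 presence requests/minute are fixed, search is variable
console.log(headroomPerMinute(100));                    // 500
console.log(headroomPerMinute(100, TARGET_PER_MINUTE)); // 380
```

Keeping the limit and the target as separate constants makes it easy to see how much slack each scenario has at the hard limit versus at the safer 80% budget.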