Rate Limiting & Throttling

Overview 
 Genesys Cloud API has rate limits to ensure fair usage and platform stability. Understanding and respecting these limits is critical for production integrations. 
 Standard Rate Limit: 600 requests per minute per organization 
 
 How Rate Limiting Works 
 Time Windows 
 Rate limits are enforced in 1-minute rolling windows : 
 Window 1: 00:00-00:59 → 600 requests allowed
Window 2: 00:01-01:00 → 600 requests allowed
 (overlaps with Window 1)
Window 3: 00:02-01:01 → 600 requests allowed
 
 Each minute, the window slides forward. Old requests drop off, new requests are added. 
 
 Rate Limit Headers 
 Every API response includes rate limit information: 
 X-Rate-Limit-Limit: 600 (max requests per window)
X-Rate-Limit-Remaining: 450 (requests left)
X-Rate-Limit-Reset: 1679491234 (Unix timestamp)
 
 Example 
 You make request 150 at timestamp 1679491200
Header: X-Rate-Limit-Remaining: 450

This means: 150 requests made, 450 more allowed before limit.
 
 
 Detecting Rate Limit Conditions 
 Proactive Detection (Before hitting limit) 
 Monitor remaining requests: 
 async function makeRequest(endpoint) {
 const response = await fetch(endpoint);
 const remaining = parseInt(
 response.headers.get('X-Rate-Limit-Remaining')
 );
 const limit = parseInt(response.headers.get('X-Rate-Limit-Limit'));
 
 const percentRemaining = (remaining / limit) * 100;
 
 if (percentRemaining < 20) {
 console.warn('⚠️ Only 20% of rate limit remaining. Slowing down...');
 // Reduce request frequency
 }
 
 if (percentRemaining < 5) {
 console.error('🛑 Critical: <5% remaining. STOP requests immediately.');
 // Pause all requests
 }

 return response;
}
 
 Reactive Detection (After hitting limit) 
 Watch for 429 responses: 
 async function makeRequest(endpoint) {
 const response = await fetch(endpoint);
 
 if (response.status === 429) {
 console.error('❌ Rate limit exceeded!');
 
 const retryAfter = response.headers.get('Retry-After');
 const resetTime = response.headers.get('X-Rate-Limit-Reset');
 
 if (retryAfter) {
 // API tells you when to retry
 const seconds = parseInt(retryAfter);
 console.warn(`Wait ${seconds} seconds`);
 } else if (resetTime) {
 // Calculate wait time from reset timestamp
 const now = Math.floor(Date.now() / 1000);
 const waitSeconds = parseInt(resetTime) - now;
 console.warn(`Wait until ${new Date(resetTime * 1000).toISOString()}`);
 }
 }
 
 return response;
}
 
 
 Strategy 1: Exponential Backoff (Reactive) 
 Only for when you hit the limit: 
 async function requestWithBackoff(endpoint, maxAttempts = 4) {
 const delays = [3000, 9000, 27000, 300000]; // 3s, 9s, 27s, 5min

 for (let attempt = 0; attempt < maxAttempts; attempt++) {
 const response = await fetch(endpoint);

 if (response.status !== 429) {
 return response; // Success (or other error)
 }

 // Rate limited
 if (attempt >= maxAttempts - 1) {
 throw new Error('Rate limit exceeded after max retries');
 }

 const delayMs = delays[attempt];
 console.warn(`Rate limited. Waiting ${delayMs / 1000}s...`);
 await sleep(delayMs);
 }
}
 
 
 Strategy 2: Request Throttling (Proactive) 
 Limit request rate to stay well below limit: 
 class ThrottledClient {
 constructor(requestsPerSecond = 8) {
 // 8 requests/sec = 480/min (80% of 600 limit)
 this.requestsPerSecond = requestsPerSecond;
 this.minIntervalMs = 1000 / requestsPerSecond;
 this.lastRequestTime = 0;
 }

 async makeRequest(endpoint) {
 const now = Date.now();
 const timeSinceLastRequest = now - this.lastRequestTime;

 if (timeSinceLastRequest < this.minIntervalMs) {
 const waitMs = this.minIntervalMs - timeSinceLastRequest;
 await sleep(waitMs);
 }

 this.lastRequestTime = Date.now();
 return fetch(endpoint);
 }
}

// Usage
const client = new ThrottledClient(8); // 8 req/sec (safe limit)
await client.makeRequest('/contacts');
await client.makeRequest('/contacts');
// These will be spaced 125ms apart
 
 
 Strategy 3: Request Queue with Batch Processing 
 Buffer requests and process in batches: 
 class RequestQueue {
 constructor(batchSize = 50, batchIntervalMs = 5000) {
 this.queue = [];
 this.batchSize = batchSize;
 this.batchIntervalMs = batchIntervalMs;
 this.processing = false;
 }

 async add(endpoint, body) {
 return new Promise((resolve, reject) => {
 this.queue.push({ endpoint, body, resolve, reject });

 if (!this.processing) {
 this.processBatch();
 }
 });
 }

 async processBatch() {
 this.processing = true;

 while (this.queue.length > 0) {
 const batch = this.queue.splice(0, this.batchSize);

 // Process batch in parallel (but within rate limit)
 const promises = batch.map(item =>
 fetch(item.endpoint, { body: JSON.stringify(item.body) })
 .then(res => item.resolve(res))
 .catch(err => item.reject(err))
 );

 await Promise.all(promises);

 // Wait between batches
 if (this.queue.length > 0) {
 console.log(`Processed ${batch.length} requests. Waiting ${this.batchIntervalMs}ms...`);
 await sleep(this.batchIntervalMs);
 }
 }

 this.processing = false;
 }
}

// Usage
const queue = new RequestQueue(50, 5000); // 50 requests per 5 seconds

for (const contact of millionContacts) {
 queue.add('/contacts', contact);
}
 
 
 Strategy 4: Bulk Operations 
 Most efficient: use bulk endpoints instead of individual requests. 
 Without Bulk (SLOW - 100 requests) 
 // Creating 100 contacts individually
for (const contact of contacts) {
 await fetch('/contacts', {
 method: 'POST',
 body: JSON.stringify(contact)
 });
}
// Uses 100 API calls!
 
 With Bulk (FAST - 1 request) 
 // Creating 100 contacts in one batch
await fetch('/contacts/bulk', {
 method: 'POST',
 body: JSON.stringify({
 contacts: contacts // Array of 100
 })
});
// Uses 1 API call!
 
 Benefit: 100x reduction in API calls. 
 
 Strategy 5: Caching 
 Avoid repeated requests for same data: 
 class CachedClient {
 constructor(cacheTtlSeconds = 3600) {
 this.cache = new Map();
 this.cacheTtlSeconds = cacheTtlSeconds;
 }

 async getContact(contactId) {
 const cacheKey = `contact:${contactId}`;

 // Check cache
 const cached = this.cache.get(cacheKey);
 if (cached && Date.now() - cached.timestamp < this.cacheTtlSeconds * 1000) {
 console.log(`Cache hit: ${cacheKey}`);
 return cached.data;
 }

 // Not cached, fetch from API
 console.log(`Cache miss: ${cacheKey}`);
 const response = await fetch(`/contacts/${contactId}`);
 const data = await response.json();

 // Store in cache
 this.cache.set(cacheKey, { data, timestamp: Date.now() });

 return data;
 }

 clearCache() {
 this.cache.clear();
 }
}

// Usage
const client = new CachedClient(3600); // Cache for 1 hour
const contact1 = await client.getContact('c1'); // API call
const contact1Again = await client.getContact('c1'); // Cache hit!
 
 
 Strategy 6: WebSocket Subscriptions (Real-time, Low Overhead) 
 Instead of polling, use WebSocket for real-time updates: 
 Polling (Uses many requests) 
 // Poll every 10 seconds = 6 requests/min per user
setInterval(async () => {
 const status = await fetch('/users/user-123?expand=presence');
 // Expensive for many users!
}, 10000);
 
 WebSocket (1 connection) 
 // Single WebSocket connection for real-time updates
const ws = new WebSocket('wss://...');

ws.onmessage = (event) => {
 const { userId, presence } = JSON.parse(event.data);
 console.log(`User ${userId} is now ${presence}`);
};

// Can handle thousands of users on one connection!
 
 Benefit: Eliminates polling entirely. 
 
 Rate Limit Calculation 
 Scenario 1: Simple API calls 
 10,000 contacts to sync
1 API call per contact = 10,000 calls
Rate limit: 600/minute
Time needed: 10,000 / 600 = 16.67 minutes

Strategy: Use bulk endpoint (1 call) or batch in 600-request chunks
 
 Scenario 2: Contact lookup every second 
 100 concurrent agents
Each looks up contact every second
= 100 requests/second = 6,000/minute
Rate limit: 600/minute
You'd exceed limit 10x over!

Strategy: Add 100ms delay between requests
= 10 requests/sec = 600/minute (perfect!)
 
 Scenario 3: Mixed workload 
 - Sync contacts: 1000 requests (use bulk)
- Agent presence updates: 100 requests/minute (natural rate)
- Search queries: variable (depends on usage)
- Reporting: 50 requests/day (batch at night)

Total: 1000 + 100 + variable + ~2 = Safe if variable < 500/min
 
 
 Monitoring & Alerting 
 Track rate limit over time 
 class RateLimitMonitor {
 constructor() {
 this.history = [];
 }

 record(remaining, limit) {
 const now = new Date();
 const percentUsed = ((limit - remaining) / limit) * 100;
 
 this.history.push({ now, remaining, percentUsed });

 // Alert if trend is concerning
 if (this.history.length > 10) {
 const recentAverage = this.history
 .slice(-10)
 .reduce((sum, h) => sum + h.percentUsed, 0) / 10;

 if (recentAverage > 80) {
 console.warn('⚠️ Average usage >80%. Trending toward limit!');
 }
 }
 }

 report() {
 const avg = this.history.reduce((sum, h) => sum + h.percentUsed, 0) / this.history.length;
 const max = Math.max(...this.history.map(h => h.percentUsed));
 
 console.log(`
 Rate Limit Usage Report:
 Average: ${avg.toFixed(1)}%
 Peak: ${max.toFixed(1)}%
 Samples: ${this.history.length}
 `);
 }
}
 
 
 Recommended Approach: Tiered Strategy 
 Tier 1 - Proactive (Always do this) 
 
 Monitor X-Rate-Limit-Remaining on every request 
 Implement throttling (8 req/sec = 480/min, safe) 
 Batch operations when possible 
 
 Tier 2 - Reactive (If approaching limit) 
 
 Reduce request frequency further 
 Implement caching 
 Queue requests instead of fire-and-forget 
 
 Tier 3 - Emergency (If hitting limit) 
 
 Implement exponential backoff 
 Stop new requests 
 Alert operations team 
 
 
 Best Practices 
 
 Monitor proactively - Don't wait for 429 errors 
 Use bulk endpoints - Single call for multiple records 
 Implement throttling - Spread requests evenly 
 Cache aggressively - Don't re-fetch same data 
 Use WebSockets - For real-time subscriptions 
 Batch requests - Process in groups, not individually 
 Set timeouts - Don't retry forever 
 Alert operations - Know when limit is approached 
 
 
 Common Mistakes 
 ❌ Fire-and-forget requests - No rate limit awareness 
✅ Monitor headers, throttle proactively 
 ❌ Polling instead of WebSocket - Wastes requests 
✅ Use WebSocket for real-time data 
 ❌ Individual API calls in loop - 1000 calls instead of 1 
✅ Use bulk endpoints, batch in groups 
 ❌ Ignore rate limit warnings - Hit limit unexpectedly 
✅ Monitor, reduce frequency before limit 
 ❌ Unlimited retry - Keep hammering API 
✅ Exponential backoff, respect Retry-After 
 
 Related Topics 
 
 Chapter 11: Error Handling & Retry Strategy 
 Chapter 11: API Endpoints Reference 
 Chapter 5: Data Actions (rate limiting in flows)