Rate Limiting, Token Management & Performance
API Rate Limiting
Genesys Cloud Scale:
Volume:
├─ 8+ billion API requests per week
├─ Automatically scaling infrastructure
├─ Multiple microservices (hundreds)
├─ Global deployment (multiple regions)
└─ Protection against abuse
Rate Limiting Purpose:
Protect Platform Stability:
├─ Prevent denial-of-service attacks
├─ Ensure fair access for all
├─ Protect against runaway applications
├─ Maintain performance for everyone
└─ Distribute resources fairly
Standard Rate Limits:
Authorization Code: 60 requests/minute per user session
Client Credentials: 60 requests/minute per application
SCIM Integration: 120 requests/minute per application
Enterprise: Custom limits available
Per-Microservice Limits:
├─ Each service has own limits
├─ Different services, different limits
├─ Aggregate across all services
└─ Total impact depends on mix
Detecting Rate Limits
Response Headers:
X-Rate-Limit-Limit: 60
├─ Maximum requests allowed per minute
└─ Standard: 60, SCIM: 120
X-Rate-Limit-Remaining: 42
├─ Requests remaining in current window
├─ Monitor this value
├─ When low: Start proactive management
└─ At 0: Next request will be rate limited
X-Rate-Limit-Reset: 1234567890
├─ Unix timestamp when limits reset
├─ Countdown until fresh window
└─ Use for retry calculations
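The headers above can be read into a small structure before deciding whether to slow down. The following is a minimal sketch, assuming the header names exactly as listed in this section and a plain mapping of header values; `RateLimitState`, `parse_rate_limit_headers`, and the `low_water` threshold are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int        # X-Rate-Limit-Limit: max requests per window
    remaining: int    # X-Rate-Limit-Remaining: requests left in this window
    reset_epoch: int  # X-Rate-Limit-Reset: Unix timestamp of the next window

    def seconds_until_reset(self, now: float) -> float:
        """Time left in the current window, never negative."""
        return max(0.0, self.reset_epoch - now)

    def should_throttle(self, low_water: int = 10) -> bool:
        """True when the remaining budget is at or below the threshold."""
        return self.remaining <= low_water


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Return the parsed state, or None if a header is missing or malformed."""
    try:
        return RateLimitState(
            limit=int(headers["X-Rate-Limit-Limit"]),
            remaining=int(headers["X-Rate-Limit-Remaining"]),
            reset_epoch=int(headers["X-Rate-Limit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

A plain `dict` lookup is case-sensitive, so this sketch expects the header names in the casing shown above; HTTP client libraries that expose case-insensitive header mappings avoid that concern.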
Rate Limited Response:
HTTP 429 Too Many Requests
{
"error": {
"message": "Rate limit exceeded",
"code": "RATE_LIMIT",
"status": 429
}
}
Retry-After: 60
├─ Seconds to wait before retry
├─ Recommended wait time
├─ Should respect this value
└─ Minimum wait suggested
Handling 429 Response:
1. Detect Status Code:
├─ Check for 429
└─ Act immediately
2. Check Retry-After Header:
├─ Wait specified seconds
└─ Example: 60 seconds
3. Implement Backoff:
├─ First retry: 3 seconds
├─ Second retry: 9 seconds
├─ Third retry: 27 seconds
└─ Later retries: wait in 5-minute increments (5 min, 10 min, ...)
4. Retry Request:
├─ After waiting
├─ Same request parameters
└─ Should succeed
5. If Still Failed:
├─ Contact Genesys
├─ Provide correlation ID
├─ May need custom limit
└─ Provide usage data
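Steps 2 and 3 above combine into a single wait calculation. The sketch below is one possible reading, assuming that retries after the third step up in 5-minute increments and that `Retry-After` arrives as a whole number of seconds; `backoff_seconds` and `wait_before_retry` are hypothetical helper names.

```python
from typing import Optional

BACKOFF_SECONDS = (3, 9, 27)   # first three retries, as listed above
LONG_INTERVAL_SECONDS = 300    # later retries step up in 5-minute increments


def backoff_seconds(retry_number: int) -> int:
    """Wait before the given retry (1-based): 3, 9, 27, then 300, 600, ..."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if retry_number <= len(BACKOFF_SECONDS):
        return BACKOFF_SECONDS[retry_number - 1]
    return LONG_INTERVAL_SECONDS * (retry_number - len(BACKOFF_SECONDS))


def wait_before_retry(retry_number: int, retry_after: Optional[str]) -> int:
    """Treat Retry-After as a minimum; never wait less than the backoff step."""
    try:
        server_hint = int(retry_after) if retry_after is not None else 0
    except ValueError:
        # Retry-After can also be an HTTP date; this sketch ignores that form
        server_hint = 0
    return max(backoff_seconds(retry_number), server_hint)
```

Taking the larger of the two values keeps the client from retrying earlier than the server asked while still spacing out repeated failures.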
Token Management
Token Lifecycle:
Creation:
├─ User authenticates
├─ Access token issued
├─ Refresh token provided (Auth Code only)
├─ Timestamp noted
└─ Token valid from this point
Active Use:
├─ Included in every API request
├─ Format: "Authorization: Bearer {token}"
├─ Server validates on each request
├─ Token proves authenticated access
└─ Scopes enforced
Expiration Tracking:
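Store Expiration Time:
expiresAt = Date.now() + (expires_in * 1000); // expires_in is in seconds, expiresAt in milliseconds
Proactive Check (Recommended):
if (Date.now() + 5 * 60 * 1000 >= expiresAt) {
  refreshToken(); // Get a new token before the old one expires
}
Reactive Check (Less Ideal):
if (response.status === 401) {
  refreshToken(); // Token was already rejected; get a new one
  retryRequest(); // Then repeat the failed request
}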
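The proactive check can be wrapped in a small object so that every call site asks the same question. This is a minimal sketch, assuming a 5-minute refresh margin as recommended above and an injectable clock for testing; `TokenTracker` and its method names are illustrative only.

```python
import time
from typing import Callable, Optional

REFRESH_MARGIN_SECONDS = 5 * 60  # refresh 5 minutes before expiry, as above


class TokenTracker:
    """Tracks one access token and when it should be refreshed."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self.access_token: Optional[str] = None
        self.expires_at = 0.0  # Unix seconds

    def store(self, access_token: str, expires_in: int) -> None:
        """Record a token response; expires_in is in seconds."""
        self.access_token = access_token
        self.expires_at = self._clock() + expires_in

    def needs_refresh(self) -> bool:
        """True when no token is held or expiry falls within the margin."""
        return self.access_token is None or (
            self._clock() + REFRESH_MARGIN_SECONDS >= self.expires_at
        )
```

A caller would check `needs_refresh()` before each API request and run its refresh routine when the result is true.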
Refresh Token Lifecycle:
Obtained:
├─ With access token (Authorization Code Grant)
├─ Not with Client Credentials
├─ Long-lived (30 days default, 450 days max)
└─ Stored securely
Refresh:
├─ Use refresh_token to get new access_token
├─ Happens on demand
├─ Response may include a new refresh_token
└─ If it does, store it in place of the old one
Expiration:
├─ After configured duration
├─ Automatic cleanup
├─ Cannot extend manually
└─ Must re-authenticate if needed
Revocation:
On Logout:
DELETE /oauth/sessions/me
Authorization: Bearer {access_token}
Effect:
├─ Access token immediately invalid
├─ Refresh token immediately invalid
├─ User logged out
└─ New login required
Manual Revocation:
├─ Admin can delete OAuth client
├─ All tokens from that client invalid
├─ Immediate effect
└─ Audit log records action
Token Storage Best Practices:
Browser Applications:
DO:
├─ Store in memory (volatile)
├─ SessionStorage (cleared on close)
├─ Secure, HttpOnly cookies
└─ Temporary locations only
DON'T:
├─ LocalStorage (persistent, exposed)
├─ Plain text
├─ Unencrypted
└─ Cookies readable by JavaScript (missing HttpOnly)
Backend Applications:
DO:
├─ Encrypted storage (database)
├─ Cache with expiration
├─ Environment variables
├─ Secure vault
└─ Limited lifetime
DON'T:
├─ Plain text
├─ Version control
├─ Logs
├─ Comments
└─ Hardcoded
Token Refresh Implementation:
JavaScript Example:
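```javascript
// `response`, `tokenUrl`, `credentials`, `savedRefreshToken`, and
// `saveAccessToken` come from the surrounding application.
// Token responses report the lifetime in seconds as `expires_in`.
let tokenExpiry = Date.now() + (response.expires_in * 1000);

// Proactive refresh (recommended)
setInterval(() => {
  if (Date.now() >= tokenExpiry - 5 * 60 * 1000) {
    // Token expires within 5 minutes
    refreshToken().catch(console.error);
  }
}, 60000); // Check every minute

async function refreshToken() {
  const response = await fetch(tokenUrl, {
    method: 'POST',
    body: new URLSearchParams({
      grant_type: 'refresh_token',
      refresh_token: savedRefreshToken,
      ...credentials
    })
  });
  if (!response.ok) {
    throw new Error(`Token refresh failed: HTTP ${response.status}`);
  }
  const data = await response.json();
  saveAccessToken(data.access_token);
  if (data.refresh_token) {
    savedRefreshToken = data.refresh_token; // Keep the newest refresh token
  }
  tokenExpiry = Date.now() + (data.expires_in * 1000);
}
```
Python Example:
```python
from datetime import datetime, timedelta

import requests

# `response`, `token_url`, `credentials`, and `saved_refresh_token`
# come from the surrounding application.
token_expiry = datetime.now() + timedelta(seconds=response['expires_in'])
saved_access_token = response['access_token']


def refresh_token():
    """Exchange the refresh token for a new access token."""
    global saved_access_token, saved_refresh_token, token_expiry
    resp = requests.post(token_url, data={
        'grant_type': 'refresh_token',
        'refresh_token': saved_refresh_token,
        **credentials
    })
    resp.raise_for_status()
    data = resp.json()
    saved_access_token = data['access_token']
    saved_refresh_token = data.get('refresh_token', saved_refresh_token)
    token_expiry = datetime.now() + timedelta(seconds=data['expires_in'])


# Proactive refresh: run this check before each API call
if datetime.now() >= token_expiry - timedelta(minutes=5):
    # Token expires within 5 minutes
    refresh_token()
```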
---
## Performance Optimization
Optimization Strategy Hierarchy:
Priority 1: Use Bulk/Batch APIs
├─ Reduce 10,000 requests to 1-2
├─ 99.99% reduction
├─ Most important optimization
└─ Example: POST /conversations/batch
Priority 2: Use WebSocket Notifications
├─ Replace polling with events
├─ 99% reduction in requests
├─ Real-time data delivery
└─ Subscribe to /v2/users/{id}/presence
Priority 3: Implement Caching
├─ Avoid repeat requests
├─ Cache with expiration
├─ 50-90% reduction
└─ Example: Cache user list for 1 hour
Priority 4: Use Pagination
├─ Don't retrieve all records at once
├─ Request only needed fields
├─ Reduce payload size
└─ Server-side filtering
Priority 5: Asynchronous Processing
├─ Don't block on API calls
├─ Queue requests
├─ Process in background
└─ Better overall throughput
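Priority 3 (cache with expiration) can be illustrated with a small time-to-live cache. This is a sketch only, assuming a single process and an in-memory dictionary; `TTLCache`, `get_or_fetch`, and the injectable `clock` are hypothetical names chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value, calling fetch() only when missing or stale."""
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self._ttl, value)
        return value
```

With `ttl_seconds=3600`, a user list fetched through `get_or_fetch` would cost one API request per hour instead of one per lookup.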
Specific Use Cases:
Use Case: Query 10,000 Conversations
Naive Approach:
for i in 1..10000:
  GET /api/v2/conversations/{i} // 10,000 requests!
Problem:
├─ Rate limited at 60 requests/minute
├─ Takes 166+ minutes (10,000 / 60)
├─ Nearly every request is avoidable
└─ Cannot meet deadline
Optimized Approach:
GET /api/v2/analytics/conversations/details?... // 1-2 requests!
Benefits:
├─ 1-2 requests vs 10,000
├─ Completes in seconds
├─ No rate limiting
└─ 99.99% fewer requests
Use Case: Monitor Agent Presence
Naive Approach:
every 1 second:
  GET /api/v2/users/{id}/presence // 60 req/min per agent
Problem:
├─ 60 requests/minute per agent
├─ 100 agents = 6,000 req/min
├─ Hits rate limit immediately
└─ Wasted bandwidth
Optimized Approach:
WebSocket subscribe: /v2/users/{id}/presence
Benefits:
├─ Real-time event delivery
├─ 0 polling requests
├─ Lower latency
├─ No rate limiting impact
└─ Event-driven architecture
Use Case: Create 10,000 Contacts
Naive Approach:
for contact in contacts:
  POST /api/v2/externalcontacts/contacts // 10,000 requests
Problem:
├─ 10,000 individual requests
├─ Lock contention in database
├─ High API overhead
├─ Slow execution (hours)
└─ Rate limiting likely
Optimized Approach:
POST /api/v2/externalcontacts/contacts/bulk [array of 500 contacts] // 20 requests!
Benefits:
├─ 20 requests vs 10,000
├─ Completes in minutes
├─ Reduced overhead
├─ No lock contention
└─ Within rate limits
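The drop from 10,000 requests to 20 comes from splitting the contact list into batches of 500. A minimal sketch of that chunking step follows; the contact payloads are placeholders, and `chunked` is a hypothetical helper rather than part of any SDK.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder payloads; a real contact record carries more fields.
contacts = [{"name": f"Contact {i}"} for i in range(10_000)]
batches = list(chunked(contacts, 500))
print(len(batches))  # 20
```

Each batch would then be sent as the body of one bulk request, so the number of API calls equals the number of batches.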
Field Selection Optimization:
Naive: GET /api/v2/users
Returns ALL fields (large payload)
Optimized: GET /api/v2/users?fields=id,email,name
Returns ONLY needed fields (smaller payload)
Benefits:
├─ Reduced bandwidth
├─ Faster response
├─ Lower processing
└─ Better performance
Filter Server-Side:
Naive:
GET /api/v2/users // Get all
filter in code // Filter locally
Optimized:
GET /api/v2/users?q=active:true // Server filters
Benefits:
├─ Smaller response
├─ Faster network
├─ Server-side indexes
└─ Better performance
---
## Backoff Strategies
Exponential Backoff Standard:
Recommended Timing:
├─ First retry: 3 seconds
├─ Second retry: 9 seconds
├─ Third retry: 27 seconds
├─ Fourth retry: wait 5 minutes
├─ Fifth retry: wait 10 minutes
└─ Continue as needed
Real-Time Applications:
├─ Tolerance: Few retries (3-5 max)
├─ Max wait: 10-30 seconds
├─ Then: Alert user or fail gracefully
└─ Example: UI interactions
Batch Applications:
├─ Tolerance: Many retries (10-20+)
├─ Max wait: Hours if needed
├─ Continue retrying: Until success
└─ Example: Nightly sync jobs
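The real-time and batch tolerances above can be expressed as two policy objects that a retry loop consults before sleeping. This is a sketch under stated assumptions: the concrete numbers are picked from the ranges listed above, and `RetryPolicy`, `REAL_TIME`, and `BATCH` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int         # how many retries the caller tolerates
    max_wait_seconds: float  # longest single wait the caller accepts

    def allows(self, retry_number: int, wait_seconds: float) -> bool:
        """True if this retry (1-based) and its wait fit within the policy."""
        return (retry_number <= self.max_retries
                and wait_seconds <= self.max_wait_seconds)


# Illustrative values chosen from the ranges above.
REAL_TIME = RetryPolicy(max_retries=3, max_wait_seconds=30)
BATCH = RetryPolicy(max_retries=20, max_wait_seconds=3600)
```

Under these values a UI call gives up once the schedule reaches the 5-minute step, while a nightly job keeps going.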
Implementation:
JavaScript:
```javascript
async function apiCallWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // 3^(attempt + 1) seconds: 3s, 9s, 27s, ...
    const backoffMs = Math.pow(3, attempt + 1) * 1000;
    let response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network failure: retry unless attempts are exhausted
      if (attempt === maxRetries) {
        throw error; // Give up
      }
      console.log(`Attempt ${attempt + 1} failed, retrying in ${backoffMs}ms`);
      await sleep(backoffMs);
      continue;
    }

    if ([429, 502, 503, 504].includes(response.status) && attempt < maxRetries) {
      // Retryable status: wait at least as long as Retry-After asks
      const headerSeconds = Number(response.headers.get('Retry-After'));
      const retryAfter = headerSeconds || (response.status === 429 ? 60 : 0);
      const finalWait = Math.max(backoffMs, retryAfter * 1000);
      console.log(`HTTP ${response.status}, waiting ${finalWait}ms`);
      await sleep(finalWait);
      continue; // Retry
    }

    if (!response.ok) {
      // Non-retryable status, or retries exhausted
      throw new Error(`HTTP ${response.status}`);
    }

    return response.json(); // Success
  }
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```
Python:
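```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def requests_with_backoff(session, method, url, **kwargs):
    # Configure retry strategy
    retry = Retry(
        total=5,
        # urllib3 doubles the delay on each retry (roughly 3, 6, 12, ... seconds;
        # the exact first delay depends on the urllib3 version), so this
        # approximates, but does not match, the 3-9-27 schedule
        backoff_factor=3,
        # Retry-After is honored by default for 429 and 503;
        # by default only idempotent methods (not POST) are retried
        status_forcelist=[429, 502, 503, 504]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # Make request with automatic backoff
    return session.request(method, url, **kwargs)


# Usage (`api_url` comes from the surrounding application)
session = requests.Session()
response = requests_with_backoff(session, 'GET', api_url)
```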
---
## Monitoring & Alerting
What to Monitor:
Rate Limit Health:
├─ X-Rate-Limit-Remaining per request
├─ Alert if below 10
├─ Alert if 429 responses
├─ Track pattern over time
└─ Identify bottlenecks
Token Expiration:
├─ Track token age
├─ Alert on tokens near expiry
├─ Monitor refresh failures
├─ Detect refresh token issues
└─ Plan proactive refreshes
API Performance:
├─ Request latency (p50, p95, p99)
├─ Response times trending
├─ Error rates by type
├─ Endpoint-specific metrics
└─ Regional differences
Application Health:
├─ Authentication success rate
├─ Failed requests trend
├─ Retry frequency
├─ Unusual access patterns
└─ Error type distribution
Metrics to Track:
Requests/Minute:
├─ Actual vs limit
├─ Trend over time
├─ Peak times
├─ Per endpoint
└─ Per user/application
Success Rate:
├─ 2XX responses
├─ 4XX errors (auth, scope)
├─ 5XX errors (server issues)
├─ 429 rate limit
└─ 401 token expired
Average Response Time:
├─ Overall
├─ By endpoint
├─ By region
├─ Trend detection
└─ Outlier identification
Token Metrics:
├─ Tokens created
├─ Tokens refreshed
├─ Refresh failures
├─ Token age
└─ Expiration near events
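The Requests/Minute metric ("actual vs limit") can be tracked client-side with a sliding window of request timestamps. The sketch below assumes timestamps are recorded in increasing order and a 60-second window; `RequestRateMeter` is a hypothetical name.

```python
from collections import deque
from typing import Deque


class RequestRateMeter:
    """Counts requests seen in the trailing window (60 seconds by default)."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self._window = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, now: float) -> None:
        """Note one request sent at time `now` (seconds, non-decreasing)."""
        self._timestamps.append(now)

    def count(self, now: float) -> int:
        """Requests sent in the window ending at `now`."""
        cutoff = now - self._window
        # Drop timestamps that have aged out of the window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

    def utilization(self, now: float, limit: int) -> float:
        """Fraction of the per-window limit used, e.g. 0.75 for 45 of 60."""
        return self.count(now) / limit
```

The utilization value feeds directly into a dashboard figure such as "45 of 60 (75%)".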
Alerting Thresholds:
Critical (Immediate):
├─ 429 rate limited for 5+ min
├─ 401 auth failures spike
├─ 503 service unavailable
├─ High error rate (>10%)
└─ P99 latency spike
Warning (Within 30 min):
├─ 429 rate limited
├─ Approaching rate limit
├─ Token refresh failures
├─ Error rate increasing
└─ Latency degrading
Informational (Daily):
├─ API usage report
├─ Performance summary
├─ Token refresh count
├─ Endpoint breakdown
└─ Trend analysis
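A subset of the thresholds above can be turned into a simple classifier. This sketch covers only the conditions with explicit numbers; treating "approaching rate limit" as fewer than 10 remaining requests is an assumption borrowed from the Rate Limit Health list, and `HealthSnapshot` and `alert_level` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    rate_limited_minutes: float  # how long 429 responses have persisted
    error_rate: float            # failed / total requests, 0.0 to 1.0
    remaining_requests: int      # latest X-Rate-Limit-Remaining value
    refresh_failures: int        # token refresh failures in the period


def alert_level(s: HealthSnapshot) -> str:
    """Map a snapshot to 'critical', 'warning', or 'info'."""
    # Critical: rate limited for 5+ minutes, or error rate above 10%
    if s.rate_limited_minutes >= 5 or s.error_rate > 0.10:
        return "critical"
    # Warning: any rate limiting, low remaining budget, or refresh failures
    if (s.rate_limited_minutes > 0
            or s.remaining_requests < 10
            or s.refresh_failures > 0):
        return "warning"
    return "info"
```

Spike and trend conditions (401 spikes, latency degradation) need history rather than a single snapshot and are left out of this sketch.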
Dashboard Example:
Current Status:
├─ Requests/min: 45 of 60 (75%)
├─ Success rate: 99.8%
├─ Avg latency: 245ms
├─ Active tokens: 3
└─ Last refresh: 2 hours ago
Recent Alerts:
├─ None currently active
├─ Last alert: 3 days ago (resolved)
└─ Next review: Today 5:00 PM
Trending:
├─ Requests/min: ↑ +5% this week
├─ Latency: ↓ -10% this month
├─ Errors: ↓ -3% this week
└─ Status: Healthy
---
## Error Handling
HTTP Status Codes:
Retryable (Use Backoff):
├─ 429 Too Many Requests (rate limit)
├─ 502 Bad Gateway (temp infrastructure)
├─ 503 Service Unavailable (maintenance)
├─ 504 Gateway Timeout (temp slowness)
└─ Action: Wait and retry
Client Errors (Don't Retry As-Is):
├─ 400 Bad Request (fix format)
├─ 401 Unauthorized (exception: refresh the token, then retry)
├─ 403 Forbidden (add scope/permission)
├─ 404 Not Found (resource missing)
├─ 405 Method Not Allowed (wrong HTTP verb)
└─ Action: Fix the request before sending it again, or fail
Server Errors (Usually Retryable):
├─ 500 Internal Server Error (try again)
├─ 502 Bad Gateway (usually temp)
├─ 503 Service Unavailable (usually temp)
├─ 504 Gateway Timeout (usually temp)
└─ Action: Backoff and retry
Decision Tree:
Is Response Successful (2XX)?
├─ Yes → Return data, success
└─ No → Check status code
Is Status 401 (Unauthorized)?
├─ Yes → Refresh token, retry
└─ No → Continue
Is Status 403 (Forbidden)?
├─ Yes → Check scope/permission, error
└─ No → Continue
Is Status 429 (Rate Limited)?
├─ Yes → Backoff, retry
└─ No → Continue
Is Status 4XX (Other Client)?
├─ Yes → Log error, fail (don't retry)
└─ No → Continue
Is Status 5XX?
├─ Yes → Backoff, retry
└─ No → Log, fail
Implementation:
```python
import logging


class APIException(Exception): ...
class RetryException(APIException): ...
class PermissionException(APIException): ...
class NotFoundException(APIException): ...
class RateLimitException(APIException): ...
class ServerException(APIException): ...


def handle_api_response(response):
    if 200 <= response.status_code < 300:
        return response.json() if response.text else None
    elif response.status_code == 401:
        # Unauthorized - refresh token, then let the caller retry
        refresh_token()  # defined by the application's token-refresh code
        raise RetryException("Token refreshed, retry")
    elif response.status_code == 403:
        # Forbidden - permission/scope issue
        logging.error("Access denied: %s", response.text)
        raise PermissionException("Insufficient permissions")
    elif response.status_code == 404:
        # Not found
        raise NotFoundException("Resource not found")
    elif response.status_code == 429:
        # Rate limited - caller should back off before retrying
        retry_after = response.headers.get('Retry-After', 60)
        raise RateLimitException(f"Retry after {retry_after}s")
    elif response.status_code >= 500:
        # Server error - retry with backoff
        raise ServerException(f"HTTP {response.status_code}")
    else:
        # Other client error - do not retry
        raise APIException(f"HTTP {response.status_code}")
```
---
## Key Takeaways: Chapter 7
- **Rate Limits Exist** - 60 req/min standard, per application
- **Monitor Headers** - X-Rate-Limit-Remaining shows how many requests are left in the window
- **Exponential Backoff** - 3, 9, 27 second strategy
- **Token Lifespan** - 1 hour (configurable), proactive refresh recommended
- **Bulk APIs Critical** - 99.99% reduction in requests
- **WebSocket Events** - 99% reduction in polling
- **Caching Helps** - 50-90% reduction in repeat queries
- **Error Handling** - Retry 429 and 5XX with backoff; refresh the token and retry on 401; fail on other 4XX
---
## Interview Prep
| Question | Answer |
|---|---|
| Rate limit? | 60 requests/minute per application (standard) |
| 429 handling? | Exponential backoff: 3s → 9s → 27s |
| Token lifetime? | 1 hour default (configurable 300-172,800 sec) |
| Token refresh? | Shortly before expiry, use refresh_token to get a new access_token |
| Bulk API benefit? | Reduce 10,000 requests to 1-2 (99.99% saving) |
| WebSocket benefit? | Replace polling, event-driven, 99% reduction |
| Caching benefit? | Avoid repeat queries, 50-90% reduction |
| 401 handling? | Refresh token, get new access_token |
| 403 handling? | Don't retry; fail and add the missing scope or permission |
| Backoff factor? | Exponential: 3^(attempt) seconds |
---
## Document Version
**Chapter**: 7 of 8
**Last Updated**: March 2026
**Status**: Current with OAuth 2.0 standards
**Scope**: Rate limiting, token management, performance, error handling