Skip to main content

Rate Limiting, Token Management & Performance

API Rate Limiting

Genesys Cloud Scale:

Volume:
├─ 8+ billion API requests per week
├─ Automatically scaling infrastructure
├─ Multiple microservices (hundreds)
├─ Global deployment (multiple regions)
└─ Protection against abuse

Rate Limiting Purpose:

Protect Platform Stability:
├─ Prevent denial-of-service attacks
├─ Ensure fair access for all
├─ Protect against runaway applications
├─ Maintain performance for everyone
└─ Distribute resources fairly

Standard Rate Limits:

Authorization Code: 60 requests/minute per user session
Client Credentials: 60 requests/minute per application
SCIM Integration: 120 requests/minute per application
Enterprise: Custom limits available

Per-Microservice Limits:
├─ Each service has own limits
├─ Different services, different limits
├─ Aggregate across all services
└─ Total impact depends on mix

Detecting Rate Limits

Response Headers:

X-Rate-Limit-Limit: 60
├─ Maximum requests allowed per minute
└─ Standard: 60, SCIM: 120

X-Rate-Limit-Remaining: 42
├─ Requests remaining in current window
├─ Monitor this value
├─ When low: Start proactive management
└─ At 0: Next request will be rate limited

X-Rate-Limit-Reset: 1234567890
├─ Unix timestamp when limits reset
├─ Countdown until fresh window
└─ Use for retry calculations

Rate Limited Response:

HTTP 429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded",
    "code": "RATE_LIMIT",
    "status": 429
  }
}

Retry-After: 60
├─ Seconds to wait before retry
├─ Recommended wait time
├─ Should respect this value
└─ Minimum wait suggested

Handling 429 Response:

1. Detect Status Code:
   ├─ Check for 429
   └─ Act immediately

2. Check Retry-After Header:
   ├─ Wait specified seconds
   └─ Example: 60 seconds

3. Implement Backoff:
   ├─ First retry: 3 seconds
   ├─ Second retry: 9 seconds
   ├─ Third retry: 27 seconds
   └─ Increment: 5-minute intervals

4. Retry Request:
   ├─ After waiting
   ├─ Same request parameters
   └─ Should succeed

5. If Still Failed:
   ├─ Contact Genesys
   ├─ Provide correlation ID
   ├─ May need custom limit
   └─ Provide usage data

Token Management

Token Lifecycle:

Creation:
├─ User authenticates
├─ Access token issued
├─ Refresh token provided (Auth Code only)
├─ Timestamp noted
└─ Token valid from this point

Active Use:
├─ Included in every API request
├─ Format: "Authorization: Bearer {token}"
├─ Server validates on each request
├─ Token proves authenticated access
└─ Scopes enforced

Expiration Tracking:

Store Expiration Time:
  expiresAt = now + (expires_in * 1000)  // milliseconds

Proactive Check (Recommended):
  if (now + 5min >= expiresAt) {
    refresh_token()  // Get new token
  }

Reactive Check (Less Ideal):
  if (401_response) {
    refresh_token()  // Try again
    retry_request()
  }

Refresh Token Lifecycle:

Obtained:
├─ With access token (Authorization Code Grant)
├─ Not with Client Credentials
├─ Long-lived (30 days default, 450 days max)
└─ Stored securely

Refresh:
├─ Use refresh_token to get new access_token
├─ Happens on demand
├─ New refresh_token provided (optional)
└─ Keep updating refresh_token

Expiration:
├─ After configured duration
├─ Automatic cleanup
├─ Cannot extend manually
└─ Must re-authenticate if needed

Revocation:

On Logout:
  DELETE /oauth/sessions/me
  Authorization: Bearer {access_token}

Effect:
├─ Access token immediately invalid
├─ Refresh token immediately invalid
├─ User logged out
└─ New login required

Manual Revocation:
├─ Admin can delete OAuth client
├─ All tokens from that client invalid
├─ Immediate effect
└─ Audit log records action

Token Storage Best Practices:

Browser Applications:

DO:
├─ Store in memory (volatile)
├─ SessionStorage (cleared on close)
├─ Secure HTTP-only cookies
└─ Temporary locations only

DON'T:
├─ LocalStorage (persistent, exposed)
├─ Plain text
├─ Unencrypted
└─ Accessible to JavaScript

Backend Applications:

DO:
├─ Encrypted storage (database)
├─ Cache with expiration
├─ Environment variables
├─ Secure vault
└─ Limited lifetime

DON'T:
├─ Plain text
├─ Version control
├─ Logs
├─ Comments
└─ Hardcoded

Token Refresh Implementation:

JavaScript Example:
```javascript
const tokenExpiry = Date.now() + (response.expiresIn * 1000);

// Proactive refresh (recommended)
setInterval(() => {
  if (Date.now() >= tokenExpiry - 5*60*1000) {
    // Token expiring in 5 minutes
    refreshToken();
  }
}, 60000); // Check every minute

async function refreshToken() {
  const response = await fetch(tokenUrl, {
    method: 'POST',
    body: new URLSearchParams({
      grant_type: 'refresh_token',
      refresh_token: savedRefreshToken,
      ...credentials
    })
  });
  
  const data = await response.json();
  saveAccessToken(data.access_token);
  tokenExpiry = Date.now() + (data.expiresIn * 1000);
}

Python Example:

from datetime import datetime, timedelta

token_expiry = datetime.now() + timedelta(seconds=response['expires_in'])

# Proactive refresh
if datetime.now() >= token_expiry - timedelta(minutes=5):
    # Token expiring in 5 minutes
    refresh_token()

def refresh_token():
    response = requests.post(token_url, data={
        'grant_type': 'refresh_token',
        'refresh_token': saved_refresh_token,
        **credentials
    })
    
    data = response.json()
    global token_expiry
    saved_access_token = data['access_token']
    token_expiry = datetime.now() + timedelta(seconds=data['expires_in'])

---

## Performance Optimization

Optimization Strategy Hierarchy:

Priority 1: Use Bulk/Batch APIs ├─ Reduce 10,000 requests to 1-2 ├─ 99.99% reduction ├─ Most important optimization └─ Example: POST /conversations/batch

Priority 2: Use WebSocket Notifications ├─ Replace polling with events ├─ 99% reduction in requests ├─ Real-time data delivery └─ Subscribe to /v2/users/{id}/presence

Priority 3: Implement Caching ├─ Avoid repeat requests ├─ Cache with expiration ├─ 50-90% reduction └─ Example: Cache user list for 1 hour

Priority 4: Use Pagination ├─ Don't retrieve all records at once ├─ Request only needed fields ├─ Reduce payload size └─ Server-side filtering

Priority 5: Asynchronous Processing ├─ Don't block on API calls ├─ Queue requests ├─ Process in background └─ Better overall throughput

Specific Use Cases:

Use Case: Query 10,000 Conversations

Naive Approach: for i in 1..10000: GET /api/v2/conversations/{i} // 10,000 requests!

Problem: ├─ Rate limited at 60 requests/minute ├─ Takes 166+ minutes ├─ 99.99% inefficient └─ Cannot meet deadline

Optimized Approach: GET /api/v2/analytics/conversations/details?... // 1-2 requests!

Benefits: ├─ 1-2 requests vs 10,000 ├─ Completes in seconds ├─ No rate limiting └─ 99.99% better

Use Case: Monitor Agent Presence

Naive Approach: every 1 second: GET /api/v2/users/{id}/presence // 60 req/min per agent

Problem: ├─ 60 requests/minute per agent ├─ 100 agents = 6,000 req/min ├─ Hits rate limit immediately └─ Wasted bandwidth

Optimized Approach: WebSocket subscribe: /v2/users/{id}/presence

Benefits: ├─ Real-time event delivery ├─ 0 polling requests ├─ Lower latency ├─ No rate limiting impact └─ Event-driven architecture

Use Case: Create 10,000 Contacts

Naive Approach: for contact in contacts: POST /api/v2/externalcontacts/contacts // 10,000 requests

Problem: ├─ 10,000 individual requests ├─ Lock contention in database ├─ High API overhead ├─ Slow execution (hours) └─ Rate limiting likely

Optimized Approach: POST /api/v2/externalcontacts/contacts/bulk [array of 500 contacts] // 20 requests!

Benefits: ├─ 20 requests vs 10,000 ├─ Completes in minutes ├─ Reduced overhead ├─ No lock contention └─ Within rate limits

Field Selection Optimization:

Naive: GET /api/v2/users

Returns ALL fields (large payload)

Optimized: GET /api/v2/users?fields=id,email,name

Returns ONLY needed fields (smaller payload)

Benefits: ├─ Reduced bandwidth ├─ Faster response ├─ Lower processing └─ Better performance

Filter Server-Side:

Naive: GET /api/v2/users // Get all filter in code // Filter locally

Optimized: GET /api/v2/users?q=active:true // Server filters

Benefits: ├─ Smaller response ├─ Faster network ├─ Server-side indexes └─ Better performance


---

## Backoff Strategies

Exponential Backoff Standard:

Real-Time Applications: ├─ Tolerance: Few retries (3-5 max) ├─ Max wait: 10-30 seconds ├─ Then: Alert user or fail gracefully └─ Example: UI interactions

Batch Applications: ├─ Tolerance: Many retries (10-20+) ├─ Max wait: Hours if needed ├─ Continue retrying: Until success └─ Example: Nightly sync jobs

Implementation:

JavaScript:

async function apiCallWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      
      if (response.status === 429) {
        // Rate limited
        const retryAfter = response.headers.get('Retry-After') || 60;
        const waitTime = Math.pow(3, attempt) * 1000;
        const finalWait = Math.max(waitTime, retryAfter * 1000);
        
        console.log(`Rate limited, waiting ${finalWait}ms`);
        await sleep(finalWait);
        continue; // Retry
      }
      
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      
      return response.json(); // Success
      
    } catch (error) {
      if (attempt === maxRetries) {
        throw error; // Give up
      }
      
      const waitTime = Math.pow(3, attempt) * 1000;
      console.log(`Attempt ${attempt + 1} failed, retrying in ${waitTime}ms`);
      await sleep(waitTime);
    }
  }
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Python:

import time
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def requests_with_backoff(session, method, url, **kwargs):
    # Configure retry strategy
    retry = Retry(
        total=5,
        backoff_factor=3,  # 3, 9, 27, ... seconds
        status_forcelist=[429, 502, 503, 504]
    )
    
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    
    # Make request with automatic backoff
    return session.request(method, url, **kwargs)

# Usage
import requests
session = requests.Session()
response = requests_with_backoff(session, 'GET', api_url)

---

## Monitoring & Alerting

What to Monitor:

Rate Limit Health: ├─ X-Rate-Limit-Remaining per request ├─ Alert if below 10 ├─ Alert if 429 responses ├─ Track pattern over time └─ Identify bottlenecks

Token Expiration: ├─ Track token age ├─ Alert on tokens near expiry ├─ Monitor refresh failures ├─ Detect refresh token issues └─ Plan proactive refreshes

API Performance: ├─ Request latency (p50, p95, p99) ├─ Response times trending ├─ Error rates by type ├─ Endpoint-specific metrics └─ Regional differences

Application Health: ├─ Authentication success rate ├─ Failed requests trend ├─ Retry frequency ├─ Unusual access patterns └─ Error type distribution

Metrics to Track:

Requests/Minute: ├─ Actual vs limit ├─ Trend over time ├─ Peak times ├─ Per endpoint └─ Per user/application

Success Rate: ├─ 2XX responses ├─ 4XX errors (auth, scope) ├─ 5XX errors (server issues) ├─ 429 rate limit └─ 401 token expired

Average Response Time: ├─ Overall ├─ By endpoint ├─ By region ├─ Trend detection └─ Outlier identification

Token Metrics: ├─ Tokens created ├─ Tokens refreshed ├─ Refresh failures ├─ Token age └─ Expiration near events

Alerting Thresholds:

Critical (Immediate): ├─ 429 rate limited for 5+ min ├─ 401 auth failures spike ├─ 503 service unavailable ├─ High error rate (>10%) └─ P99 latency spike

Warning (Within 30 min): ├─ 429 rate limited ├─ Approaching rate limit ├─ Token refresh failures ├─ Error rate increasing └─ Latency degrading

Informational (Daily): ├─ API usage report ├─ Performance summary ├─ Token refresh count ├─ Endpoint breakdown └─ Trend analysis

Dashboard Example:

Current Status: ├─ Requests/min: 45 of 60 (75%) ├─ Success rate: 99.8% ├─ Avg latency: 245ms ├─ Active tokens: 3 └─ Last refresh: 2 hours ago

Recent Alerts: ├─ None currently active ├─ Last alert: 3 days ago (resolved) └─ Next review: Today 5:00 PM

Trending: ├─ Requests/min: ↑ +5% this week ├─ Latency: ↓ -10% this month ├─ Errors: ↓ -3% this week └─ Status: Healthy


---

## Error Handling

HTTP Status Codes:

Retryable (Use Backoff): ├─ 429 Too Many Requests (rate limit) ├─ 502 Bad Gateway (temp infrastructure) ├─ 503 Service Unavailable (maintenance) ├─ 504 Gateway Timeout (temp slowness) └─ Action: Wait and retry

Client Errors (Don't Retry): ├─ 400 Bad Request (fix format) ├─ 401 Unauthorized (refresh token) ├─ 403 Forbidden (add scope/permission) ├─ 404 Not Found (resource missing) ├─ 405 Method Not Allowed (wrong HTTP verb) └─ Action: Fix and retry (or fail)

Server Errors (Usually Retryable): ├─ 500 Internal Server Error (try again) ├─ 502 Bad Gateway (usually temp) ├─ 503 Service Unavailable (usually temp) ├─ 504 Gateway Timeout (usually temp) └─ Action: Backoff and retry

Decision Tree:

Is Response Successful (2XX)? ├─ Yes → Return data, success └─ No → Check status code

Is Status 401 (Unauthorized)? ├─ Yes → Refresh token, retry └─ No → Continue

Is Status 403 (Forbidden)? ├─ Yes → Check scope/permission, error └─ No → Continue

Is Status 4XX (Other Client)? ├─ Yes → Log error, fail (don't retry) └─ No → Continue

Is Status 429 (Rate Limited)? ├─ Yes → Backoff, retry └─ No → Continue

Is Status 5XX or Other? ├─ Yes → Backoff, retry └─ No → Log, fail

Implementation:

def handle_api_response(response):
    if response.status_code in [200, 201, 204]:
        return response.json() if response.text else None
    
    elif response.status_code == 401:
        # Unauthorized - refresh token
        refresh_token()
        raise RetryException("Token refreshed, retry")
    
    elif response.status_code == 403:
        # Forbidden - permission/scope issue
        log_error(f"Access denied: {response.json()}")
        raise PermissionException("Insufficient permissions")
    
    elif response.status_code == 404:
        # Not found
        raise NotFoundException("Resource not found")
    
    elif response.status_code == 429:
        # Rate limited
        retry_after = response.headers.get('Retry-After', 60)
        raise RateLimitException(f"Retry after {retry_after}s")
    
    elif response.status_code >= 500:
        # Server error - retry
        raise ServerException(f"HTTP {response.status_code}")
    
    else:
        # Other error
        raise APIException(f"HTTP {response.status_code}")

---

## Key Takeaways: Chapter 7

- **Rate Limits Exist** - 60 req/min standard, per application
- **Monitor Headers** - X-Rate-Limit-Remaining tells story
- **Exponential Backoff** - 3, 9, 27 second strategy
- **Token Lifespan** - 1 hour (configurable), proactive refresh recommended
- **Bulk APIs Critical** - 99.99% reduction in requests
- **WebSocket Events** - 99% reduction in polling
- **Caching Helps** - 50-90% reduction in repeat queries
- **Error Handling** - 429/5XX retry, 4XX (except 401) fail

---

## Interview Prep

| Question | Answer |
|---|---|
| Rate limit? | 60 requests/minute per application (standard) |
| 429 handling? | Exponential backoff: 3s → 9s → 27s |
| Token lifetime? | 1 hour default (configurable 300-172,800 sec) |
| Token refresh? | When expiring, use refresh_token to get new |
| Bulk API benefit? | Reduce 10,000 requests to 1-2 (99.99% saving) |
| WebSocket benefit? | Replace polling, event-driven, 99% reduction |
| Caching benefit? | Avoid repeat queries, 50-90% reduction |
| 401 handling? | Refresh token, get new access_token |
| 403 handling? | Add missing scope or permission, fail |
| Backoff factor? | Exponential: 3^(attempt) seconds |

---

## Document Version
**Chapter**: 7 of 8  
**Last Updated**: March 2026  
**Status**: Current with OAuth 2.0 standards  
**Scope**: Rate limiting, token management, performance, error handling