# Rate Limiting, Token Management & Performance

## API Rate Limiting

```
Genesys Cloud Scale:

Volume:
├─ 8+ billion API requests per week
├─ Automatically scaling infrastructure
├─ Hundreds of microservices
├─ Global deployment (multiple regions)
└─ Protection against abuse

Rate Limiting Purpose:

Protect Platform Stability:
├─ Prevent denial-of-service attacks
├─ Contain runaway applications
├─ Keep performance consistent for all users
└─ Share capacity fairly across customers

Standard Rate Limits:

Authorization Code: 60 requests/minute per user session
Client Credentials: 60 requests/minute per application
SCIM Integration: 120 requests/minute per application
Enterprise: Custom limits available

Per-Microservice Limits:
├─ Each microservice enforces its own limits
├─ Limits differ from service to service
├─ Usage is also aggregated across all services
└─ Effective headroom depends on the mix of endpoints called
```
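
A client can stay under a per-minute limit like those listed above by throttling itself before it sends anything. The following is a minimal sliding-window sketch, not part of any Genesys SDK: the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are illustrative assumptions, and the default of 60 requests per 60 seconds is simply the standard figure quoted in this section.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (client-side only)."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of recently sent requests

    def wait_time(self):
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

Before each request, the caller would sleep for `wait_time()` seconds if it is non-zero, send the request, and then call `record()`. This only smooths the client's own traffic; the server's headers and 429 responses remain the source of truth.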

---

## Detecting Rate Limits

```
Response Headers:

X-Rate-Limit-Limit: 60
├─ Maximum requests allowed per minute
└─ Standard: 60, SCIM: 120

X-Rate-Limit-Remaining: 42
├─ Requests remaining in current window
├─ Monitor this value
├─ When low: Start proactive management
└─ At 0: Next request will be rate limited

X-Rate-Limit-Reset: 1234567890
├─ Unix timestamp (seconds) at which the window resets
├─ Subtract the current time to get the remaining wait
└─ Use for retry calculations

Rate Limited Response:

HTTP 429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded",
    "code": "RATE_LIMIT",
    "status": 429
  }
}

Retry-After: 60
├─ Seconds to wait before retrying
├─ Treat as the minimum wait
└─ Always honor this value

Handling 429 Response:

1. Detect Status Code:
   ├─ Check for 429
   └─ Act immediately

2. Check Retry-After Header:
   ├─ Wait specified seconds
   └─ Example: 60 seconds

3. Implement Backoff:
   ├─ First retry: 3 seconds
   ├─ Second retry: 9 seconds
   ├─ Third retry: 27 seconds
   └─ Later retries: 5-minute increments (5 min, 10 min, ...)

4. Retry Request:
   ├─ After waiting
   ├─ Same request parameters
   └─ Usually succeeds once the window has reset

5. If Requests Still Fail:
   ├─ Contact Genesys
   ├─ Provide correlation ID
   ├─ Provide usage data
   └─ A custom limit may be needed
```
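
The header checks above can be folded into one helper that decides how long to pause. This is a sketch under assumptions: the function name `seconds_to_wait` is hypothetical, the header names are copied from the listing above rather than verified against a live response, and `headers` is treated as a plain, case-sensitive dict.

```python
import time


def seconds_to_wait(status, headers, now=None):
    """Return how long to pause before the next request, in seconds.

    `status` is the HTTP status code and `headers` a dict of response headers.
    """
    now = time.time() if now is None else now

    if status == 429:
        # Prefer the server's explicit instruction; default to 60 s.
        try:
            return max(0.0, float(headers.get("Retry-After", 60)))
        except ValueError:
            return 60.0  # Retry-After was an HTTP date or malformed

    remaining = int(headers.get("X-Rate-Limit-Remaining", 1))
    if remaining > 0:
        return 0.0

    # Window exhausted: wait until the reset timestamp.
    reset_at = float(headers.get("X-Rate-Limit-Reset", now))
    return max(0.0, reset_at - now)
```

With the example values above, a response showing `X-Rate-Limit-Remaining: 0` and `X-Rate-Limit-Reset: 1234567890`, received at time 1234567880, yields a 10-second wait. HTTP clients that expose case-insensitive header mappings can be passed in directly.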

---

## Token Management

```
Token Lifecycle:

Creation:
├─ User authenticates
├─ Access token issued
├─ Refresh token provided (Auth Code only)
├─ Timestamp noted
└─ Token valid from this point

Active Use:
├─ Included in every API request
├─ Format: "Authorization: Bearer {token}"
├─ Server validates on each request
├─ Token proves authenticated access
└─ Scopes enforced

Expiration Tracking:

Store Expiration Time:
  expiresAt = now + (expires_in * 1000)  // milliseconds

Proactive Check (Recommended):
  if (now + 5min >= expiresAt) {
    refresh_token()  // Get new token
  }

Reactive Check (Less Ideal):
  if (401_response) {
    refresh_token()  // Get new token
    retry_request()  // Retry once with the new token
  }

Refresh Token Lifecycle:

Obtained:
├─ With access token (Authorization Code Grant)
├─ Not with Client Credentials
├─ Long-lived (30 days default, 450 days max)
└─ Stored securely

Refresh:
├─ Exchange refresh_token for a new access_token
├─ Happens on demand
├─ Response may include a new refresh_token
└─ If it does, replace the stored refresh_token

Expiration:
├─ After configured duration
├─ Automatic cleanup
├─ Cannot extend manually
└─ User must re-authenticate afterwards

Revocation:

On Logout:
  DELETE /oauth/sessions/me
  Authorization: Bearer {access_token}

Effect:
├─ Access token immediately invalid
├─ Refresh token immediately invalid
├─ User logged out
└─ New login required

Manual Revocation:
├─ Admin can delete OAuth client
├─ All tokens from that client invalid
├─ Immediate effect
└─ Audit log records action

Token Storage Best Practices:

Browser Applications:

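DO:
├─ Store in memory (volatile)
├─ SessionStorage (cleared when the tab closes; still readable by page scripts)
├─ Secure, HttpOnly cookies (not readable by JavaScript)
└─ Temporary locations only

DON'T:
├─ LocalStorage (persistent, readable by any script on the page)
├─ Plain-text or unencrypted persistent storage
└─ Cookies readable by JavaScript (missing HttpOnly)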

Backend Applications:

DO:
├─ Encrypted storage (database)
├─ Cache with expiration
├─ Environment variables
├─ Secure vault
└─ Limited lifetime

DON'T:
├─ Plain text
├─ Version control
├─ Logs
├─ Comments
└─ Hardcoded

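```

Token Refresh Implementation:

JavaScript Example:
```javascript
// tokenUrl, credentials, savedRefreshToken, and saveAccessToken are assumed
// to be defined elsewhere; tokenResponse is the parsed initial token response.
let tokenExpiry = Date.now() + tokenResponse.expires_in * 1000;

// Proactive refresh (recommended)
setInterval(() => {
  if (Date.now() >= tokenExpiry - 5 * 60 * 1000) {
    // Token expires within 5 minutes
    refreshToken().catch(console.error);
  }
}, 60000); // Check every minute

async function refreshToken() {
  const response = await fetch(tokenUrl, {
    method: 'POST',
    body: new URLSearchParams({
      grant_type: 'refresh_token',
      refresh_token: savedRefreshToken,
      ...credentials
    })
  });
  if (!response.ok) {
    throw new Error(`Token refresh failed: HTTP ${response.status}`);
  }

  const data = await response.json();
  saveAccessToken(data.access_token);
  if (data.refresh_token) {
    savedRefreshToken = data.refresh_token; // Keep the newest refresh token
  }
  tokenExpiry = Date.now() + data.expires_in * 1000;
}
```

Python Example:
```python
from datetime import datetime, timedelta

import requests

# token_url, credentials, saved_refresh_token, and the initial token_response
# are assumed to be defined elsewhere in the application.
saved_access_token = token_response['access_token']
token_expiry = datetime.now() + timedelta(seconds=token_response['expires_in'])

def refresh_token():
    """Exchange the refresh token for a new access token."""
    global saved_access_token, saved_refresh_token, token_expiry
    response = requests.post(token_url, data={
        'grant_type': 'refresh_token',
        'refresh_token': saved_refresh_token,
        **credentials
    })
    response.raise_for_status()

    data = response.json()
    saved_access_token = data['access_token']
    # Keep the newest refresh token if the server rotated it
    saved_refresh_token = data.get('refresh_token', saved_refresh_token)
    token_expiry = datetime.now() + timedelta(seconds=data['expires_in'])

def ensure_fresh_token():
    """Proactive refresh: call before each API request."""
    if datetime.now() >= token_expiry - timedelta(minutes=5):
        # Token expires within 5 minutes
        refresh_token()
```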
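
The reactive check described above (refresh on a 401, then try again) can be sketched as a small wrapper. The function `call_with_reauth` and its two callables are hypothetical names introduced for illustration: `send` is assumed to build the request with whatever access token is current at call time, and `refresh` is assumed to obtain and store a new one.

```python
def call_with_reauth(send, refresh):
    """Send a request; on 401, refresh the token once and resend.

    `send` takes no arguments and returns an object with a `status_code`
    attribute. `refresh` obtains a new access token. Both are supplied
    by the application.
    """
    response = send()
    if response.status_code == 401:
        refresh()
        response = send()  # retry exactly once to avoid a refresh loop
    return response
```

Limiting the retry to a single attempt means a second 401 (for example, a revoked refresh token) is returned to the caller instead of triggering endless refreshes.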

---

## Performance Optimization

```
Optimization Strategy Hierarchy:

Priority 1: Use Bulk/Batch APIs
├─ Reduce 10,000 requests to 1-2
├─ 99.99% reduction
├─ Most important optimization
└─ Example: POST /conversations/batch

Priority 2: Use WebSocket Notifications
├─ Replace polling with events
├─ 99% reduction in requests
├─ Real-time data delivery
└─ Subscribe to /v2/users/{id}/presence

Priority 3: Implement Caching
├─ Avoid repeat requests
├─ Cache with expiration
├─ 50-90% reduction
└─ Example: Cache user list for 1 hour

Priority 4: Paginate and Trim Responses
├─ Don't retrieve all records at once
├─ Request only needed fields
├─ Filter server-side
└─ Smaller payloads, faster responses

Priority 5: Asynchronous Processing
├─ Don't block on API calls
├─ Queue requests
├─ Process in background
└─ Better overall throughput

Specific Use Cases:

Use Case: Query 10,000 Conversations

Naive Approach:
  for i in 1..10000:
    GET /api/v2/conversations/{i}    // 10,000 requests!
  
Problem:
  ├─ Rate limited at 60 requests/minute
  ├─ Takes 166+ minutes (10,000 ÷ 60)
  ├─ Almost every request is avoidable
  └─ Too slow for most reporting deadlines

Optimized Approach:
  GET /api/v2/analytics/conversations/details?...   // 1-2 requests!
  
Benefits:
  ├─ 1-2 requests vs 10,000
  ├─ Completes in seconds
  ├─ Stays well within the rate limit
  └─ 99.98-99.99% fewer requests

Use Case: Monitor Agent Presence

Naive Approach:
  every 1 second:
    GET /api/v2/users/{id}/presence  // 60 req/min per agent
  
Problem:
  ├─ 60 requests/minute per agent
  ├─ 100 agents = 6,000 req/min
  ├─ Hits rate limit immediately
  └─ Wasted bandwidth

Optimized Approach:
  WebSocket subscribe:
    /v2/users/{id}/presence
    
Benefits:
  ├─ Real-time event delivery
  ├─ 0 polling requests
  ├─ Lower latency
  ├─ No rate limiting impact
  └─ Event-driven architecture

Use Case: Create 10,000 Contacts

Naive Approach:
  for contact in contacts:
    POST /api/v2/externalcontacts/contacts  // 10,000 requests
  
Problem:
  ├─ 10,000 individual requests
  ├─ Lock contention in database
  ├─ High API overhead
  ├─ Slow execution (hours)
  └─ Rate limiting likely

Optimized Approach:
  POST /api/v2/externalcontacts/contacts/bulk
    [array of 500 contacts]  // 20 requests!
  
Benefits:
  ├─ 20 requests vs 10,000
  ├─ Completes in minutes
  ├─ Reduced overhead
  ├─ No lock contention
  └─ Within rate limits

Field Selection Optimization:

Naive:
  GET /api/v2/users
  
Returns ALL fields (large payload)

Optimized:
  GET /api/v2/users?fields=id,email,name
  
Returns ONLY needed fields (smaller payload)

Benefits:
  ├─ Reduced bandwidth
  ├─ Faster response
  ├─ Lower processing
  └─ Better performance

Filter Server-Side:

Naive:
  GET /api/v2/users  // Get all
  filter in code    // Filter locally
  
Optimized:
  GET /api/v2/users?q=active:true  // Server filters
  
Benefits:
  ├─ Smaller response
  ├─ Faster network
  ├─ Server-side indexes
  └─ Better performance
```
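
Two of the strategies above, bulk requests and caching with expiration, can be sketched in a few lines. The names `chunked` and `TTLCache` are hypothetical helpers, not SDK functions; the batch size of 500 and the one-hour TTL are the example figures used above, and the actual bulk endpoint's size limit should be checked before relying on them.

```python
import time


def chunked(items, size=500):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class TTLCache:
    """Tiny time-based cache: entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl=3600.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable for testing
        self._entries = {}      # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling `loader()` only on a miss or expiry."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now < entry[0]:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self.ttl, value)
        return value
```

Splitting 10,000 contacts with `chunked(contacts, 500)` produces the 20 batches from the example, one bulk request each. `TTLCache.get_or_load("users", fetch_users)` would call a (hypothetical) `fetch_users` at most once per hour.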

---

## Backoff Strategies

```
Exponential Backoff Standard:

Recommended Timing:
├─ First retry: 3 seconds
├─ Second retry: 9 seconds
├─ Third retry: 27 seconds
├─ Fourth retry: 5 minutes
├─ Fifth retry: 10 minutes
└─ Continue in 5-minute increments as needed

Real-Time Applications:
├─ Tolerance: Few retries (3-5 max)
├─ Max wait: 10-30 seconds
├─ Then: Alert user or fail gracefully
└─ Example: UI interactions

Batch Applications:
├─ Tolerance: Many retries (10-20+)
├─ Max wait: Hours if needed
├─ Continue retrying: Until success
└─ Example: Nightly sync jobs

```

Implementation:

JavaScript:
```javascript
async function apiCallWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // Exponential schedule: 3s, 9s, 27s, ...
    let waitMs = Math.pow(3, attempt + 1) * 1000;
    let failure;

    try {
      const response = await fetch(url, options);

      if (response.ok) {
        return response.json(); // Success
      }

      if (response.status === 429) {
        // Rate limited: wait at least as long as Retry-After requests.
        // A missing or non-numeric header falls back to 60 seconds.
        const retryAfter = Number(response.headers.get('Retry-After')) || 60;
        waitMs = Math.max(waitMs, retryAfter * 1000);
      } else if (response.status < 500) {
        // Other client errors: retrying the same request will not help
        const error = new Error(`HTTP ${response.status}`);
        error.retryable = false;
        throw error;
      }
      failure = new Error(`HTTP ${response.status}`); // 429 or 5XX
    } catch (error) {
      if (error.retryable === false) {
        throw error;
      }
      failure = error; // Network error: retry with backoff
    }

    if (attempt === maxRetries) {
      throw failure; // Give up
    }

    console.log(`Attempt ${attempt + 1} failed (${failure.message}), retrying in ${waitMs}ms`);
    await sleep(waitMs);
  }
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```

Python:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retrying_session():
    """Build a session that retries 429/5XX responses with backoff."""
    retry = Retry(
        total=5,
        # urllib3 doubles the delay on each retry (backoff_factor * 2**n),
        # so this approximates rather than reproduces the 3s/9s/27s schedule.
        backoff_factor=3,
        status_forcelist=[429, 502, 503, 504],
        # Defaults: Retry-After is honored, and only idempotent methods
        # are retried (POST is not).
    )

    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

# Usage: create the session once and reuse it (api_url is defined elsewhere)
session = make_retrying_session()
response = session.get(api_url)
```
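
The recommended timing above (3, 9, and 27 seconds, then 5-minute increments) can be written as a small pure function. This is a sketch: `retry_delay` and its `cap` parameter are hypothetical, with `cap` added so that a real-time caller can bound any single wait in line with the 10-30 second tolerance mentioned above.

```python
def retry_delay(retry_number, cap=None):
    """Seconds to wait before retry `retry_number` (1-based).

    Retries 1-3 wait 3, 9, and 27 seconds; later retries step up in
    5-minute increments (300 s, 600 s, ...).
    """
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if retry_number <= 3:
        delay = 3 ** retry_number
    else:
        delay = 300 * (retry_number - 3)
    return delay if cap is None else min(delay, cap)
```

A batch job might call `retry_delay(n)` uncapped and keep going for many retries, while a UI interaction might use `retry_delay(n, cap=30)` and stop after three to five attempts.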

---

## Monitoring & Alerting

```
What to Monitor:

Rate Limit Health:
├─ X-Rate-Limit-Remaining per request
├─ Alert if below 10
├─ Alert if 429 responses
├─ Track pattern over time
└─ Identify bottlenecks

Token Expiration:
├─ Track token age
├─ Alert on tokens near expiry
├─ Monitor refresh failures
├─ Detect refresh token issues
└─ Plan proactive refreshes

API Performance:
├─ Request latency (p50, p95, p99)
├─ Response times trending
├─ Error rates by type
├─ Endpoint-specific metrics
└─ Regional differences

Application Health:
├─ Authentication success rate
├─ Failed requests trend
├─ Retry frequency
├─ Unusual access patterns
└─ Error type distribution

Metrics to Track:

Requests/Minute:
├─ Actual vs limit
├─ Trend over time
├─ Peak times
├─ Per endpoint
└─ Per user/application

Success Rate:
├─ 2XX responses
├─ 4XX errors (auth, scope)
├─ 5XX errors (server issues)
├─ 429 rate limit
└─ 401 token expired

Average Response Time:
├─ Overall
├─ By endpoint
├─ By region
├─ Trend detection
└─ Outlier identification

Token Metrics:
├─ Tokens created
├─ Tokens refreshed
├─ Refresh failures
├─ Token age
└─ Near-expiry events

Alerting Thresholds:

Critical (Immediate):
├─ 429 rate limited for 5+ min
├─ 401 auth failures spike
├─ 503 service unavailable
├─ High error rate (>10%)
└─ P99 latency spike

Warning (Within 30 min):
├─ Any 429 response
├─ Approaching rate limit
├─ Token refresh failures
├─ Error rate increasing
└─ Latency degrading

Informational (Daily):
├─ API usage report
├─ Performance summary
├─ Token refresh count
├─ Endpoint breakdown
└─ Trend analysis

Dashboard Example:

Current Status:
├─ Requests/min: 45 of 60 (75%)
├─ Success rate: 99.8%
├─ Avg latency: 245ms
├─ Active tokens: 3
└─ Last refresh: 2 hours ago

Recent Alerts:
├─ None currently active
├─ Last alert: 3 days ago (resolved)
└─ Next review: Today 5:00 PM

Trending:
├─ Requests/min: ↑ +5% this week
├─ Latency: ↓ -10% this month
├─ Errors: ↓ -3% this week
└─ Status: Healthy
```
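
A few of the thresholds above can be expressed as a simple classifier, which makes the alert rules easy to unit-test. The function `alert_level` and its parameter names are hypothetical, and it covers only a subset of the listed conditions: remaining quota below 10, any rate limiting or refresh failure as a warning, and sustained rate limiting (5+ minutes) or an error rate above 10% as critical.

```python
def alert_level(remaining, error_rate, minutes_rate_limited=0, refresh_failures=0):
    """Map a few metrics to 'critical', 'warning', or 'ok'.

    remaining            -- latest X-Rate-Limit-Remaining value
    error_rate           -- fraction of failed requests (0.10 means 10%)
    minutes_rate_limited -- how long 429 responses have persisted
    refresh_failures     -- token refresh failures in the current period
    """
    if minutes_rate_limited >= 5 or error_rate > 0.10:
        return "critical"
    if minutes_rate_limited > 0 or remaining < 10 or refresh_failures > 0:
        return "warning"
    return "ok"
```

With the dashboard figures above (a 99.8% success rate and 15 requests of headroom), `alert_level(15, 0.002)` returns `"ok"`.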

---

## Error Handling

```
HTTP Status Codes:

Retryable (Use Backoff):
├─ 429 Too Many Requests (rate limit)
├─ 500 Internal Server Error (usually transient)
├─ 502 Bad Gateway (temporary infrastructure issue)
├─ 503 Service Unavailable (maintenance or overload)
├─ 504 Gateway Timeout (temporary slowness)
└─ Action: Back off, then retry

Client Errors (Don't Retry Unchanged):
├─ 400 Bad Request (fix format)
├─ 401 Unauthorized (refresh token, then retry once)
├─ 403 Forbidden (add scope/permission)
├─ 404 Not Found (resource missing)
├─ 405 Method Not Allowed (wrong HTTP verb)
└─ Action: Fix the cause, then retry (or fail)

Decision Tree:

Is Response Successful (2XX)?
├─ Yes → Return data, success
└─ No → Check status code

Is Status 401 (Unauthorized)?
├─ Yes → Refresh token, retry
└─ No → Continue

Is Status 403 (Forbidden)?
├─ Yes → Check scope/permission, error
└─ No → Continue

Is Status 429 (Rate Limited)?
├─ Yes → Backoff, retry
└─ No → Continue

Is Status 4XX (Other Client)?
├─ Yes → Log error, fail (don't retry)
└─ No → Continue

Is Status 5XX?
├─ Yes → Backoff, retry
└─ No → Log, fail

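```

Implementation:

```python
import logging

logger = logging.getLogger(__name__)

class APIException(Exception):
    """Base class; raised directly for unclassified errors."""

class RetryException(APIException):
    """Retry immediately (the token was refreshed)."""

class PermissionException(APIException):
    """Missing scope or permission; do not retry."""

class NotFoundException(APIException):
    """Resource does not exist; do not retry."""

class RateLimitException(APIException):
    """Rate limited; retry after backing off."""

class ServerException(APIException):
    """5XX response; retry after backing off."""

def handle_api_response(response):
    # refresh_token() is assumed to be defined elsewhere in the application.
    if 200 <= response.status_code < 300:
        return response.json() if response.text else None

    elif response.status_code == 401:
        # Unauthorized - refresh token
        refresh_token()
        raise RetryException("Token refreshed, retry")

    elif response.status_code == 403:
        # Forbidden - permission/scope issue
        logger.error("Access denied: %s", response.text)
        raise PermissionException("Insufficient permissions")

    elif response.status_code == 404:
        # Not found
        raise NotFoundException("Resource not found")

    elif response.status_code == 429:
        # Rate limited
        retry_after = response.headers.get('Retry-After', 60)
        raise RateLimitException(f"Retry after {retry_after}s")

    elif response.status_code >= 500:
        # Server error - retry
        raise ServerException(f"HTTP {response.status_code}")

    else:
        # Other error
        raise APIException(f"HTTP {response.status_code}")
```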
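
The decision tree above reduces to a pure function from status code to action, which is convenient to test separately from any HTTP client. The function `classify_status` and its four action labels are hypothetical names chosen for this sketch; the 429 check is placed before the general 4XX case so that rate limiting is not treated as a permanent failure.

```python
def classify_status(status):
    """Return the action for an HTTP status: 'ok', 'refresh', 'backoff', or 'fail'."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "refresh"    # refresh the token, then retry
    if status == 429 or status >= 500:
        return "backoff"    # wait, then retry
    return "fail"           # 403, 404, and other 4XX: do not retry
```

A retry loop can switch on the returned label: retry immediately after a refresh, sleep before retrying on `"backoff"`, and surface the error on `"fail"`.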

---

## Key Takeaways: Chapter 7

- **Rate Limits Exist** - 60 req/min standard, per application or user session
- **Monitor Headers** - X-Rate-Limit-Remaining shows how close you are to the limit
- **Exponential Backoff** - 3, 9, 27 second strategy
- **Token Lifespan** - 1 hour (configurable), proactive refresh recommended
- **Bulk APIs Critical** - 99.99% reduction in requests
- **WebSocket Events** - 99% reduction in polling
- **Caching Helps** - 50-90% reduction in repeat queries
- **Error Handling** - Retry 429/5XX with backoff; refresh and retry on 401; fail on other 4XX

---

## Interview Prep

| Question | Answer |
|---|---|
| Rate limit? | 60 requests/minute per application (standard) |
| 429 handling? | Exponential backoff: 3s → 9s → 27s |
| Token lifetime? | 1 hour default (configurable 300-172,800 sec) |
| Token refresh? | Before expiry, exchange refresh_token for a new access_token |
| Bulk API benefit? | Reduce 10,000 requests to 1-2 (99.99% saving) |
| WebSocket benefit? | Replace polling, event-driven, 99% reduction |
| Caching benefit? | Avoid repeat queries, 50-90% reduction |
| 401 handling? | Refresh the token, then retry the request once |
| 403 handling? | Fail without retrying; add the missing scope or permission |
| Backoff factor? | Exponential: 3^n seconds for retry n (3s, 9s, 27s) |

---

## Document Version
**Chapter**: 7 of 8  
**Last Updated**: March 2026  
**Status**: Current with OAuth 2.0 standards  
**Scope**: Rate limiting, token management, performance, error handling