# Deep API Troubleshooting Guide

## Goal

API troubleshooting is about finding **where the failure is happening**:

```text
Client/App → Network → API Gateway → Authentication → Backend Service → Database/External System
```

A good implementation engineer isolates the layer before escalating.

---

# 1. Start With 4 Basic Questions

| Question                    | Why It Matters                               |
| --------------------------- | -------------------------------------------- |
| What is expected to happen? | Defines success                              |
| What actually happens?      | Defines the failure                          |
| Is it reproducible?         | Separates one-time issue from systemic issue |
| Who is affected?            | Helps identify scope                         |

Example:

```text
Expected: CRM ticket is created after chat ends.
Actual: Chat ends, but no CRM ticket appears.
Reproducible: Yes, on every chat.
Scope: One customer environment, all agents.
```

---

# 2. Identify the API Direction

| Direction    | Meaning                               | Example                 |
| ------------ | ------------------------------------- | ----------------------- |
| Inbound API  | Customer system calls platform API    | CRM calls Glia API      |
| Outbound API | Platform calls customer system        | Glia calls CRM API      |
| Webhook      | Event notification sent automatically | Chat ended → CRM ticket |
| Callback     | Response sent after async process     | Payment status update   |

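This matters because it tells you **which side to check first**: the caller's logs show whether the request was sent, and the receiver's logs show whether it arrived and how it was answered.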

---

# 3. Check the HTTP Method

| Method | Troubleshooting Focus                       |
| ------ | ------------------------------------------- |
| GET    | Is the resource ID correct?                 |
| POST   | Is the payload valid?                       |
| PUT    | Are all required fields included?           |
| PATCH  | Are updated fields allowed?                 |
| DELETE | Does the user/token have delete permission? |

Example issue:

```http
GET /api/create-ticket
```

But the API expects:

```http
POST /api/create-ticket
```

Likely result:

```http
405 Method Not Allowed
```
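
A 405 response normally carries an `Allow` header listing the methods the endpoint accepts, which confirms a method mismatch quickly. The following is a minimal sketch; the function name `allowed_methods` is illustrative and the headers are passed in as a plain dictionary rather than fetched from a real API.

```python
def allowed_methods(status_code: int, headers: dict) -> list:
    """Return the methods listed in the Allow header of a 405 response."""
    if status_code != 405:
        return []
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    allow = normalized.get("allow", "")
    return [m.strip().upper() for m in allow.split(",") if m.strip()]


# Example: the server rejected GET and says only POST and OPTIONS are accepted.
print(allowed_methods(405, {"Allow": "POST, OPTIONS"}))
```

If the list is empty on a 405, the gateway or framework may not be setting the header, and the API documentation is the next place to check.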

---

# 4. Check the Endpoint

Validate:

```text
Base URL
API version
Path
Resource ID
Query parameters
Environment
```

Example:

```http
https://api.vendor.com/v1/customers/12345
```

Common mistakes:

| Problem             | Example                                     |
| ------------------- | ------------------------------------------- |
| Wrong environment   | Using sandbox instead of production         |
| Wrong API version   | `/v1/` instead of `/v2/`                    |
| Typo in endpoint    | `/costumers/` instead of `/customers/`      |
| Missing resource ID | `/customers/` instead of `/customers/12345` |
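
Splitting a URL into its parts makes each item in the checklist easy to compare against the documentation. This is a sketch using only the standard library; `describe_endpoint` is a hypothetical helper, and it assumes the API version is the first path segment in the form `v1`, `v2`, and so on.

```python
import re
from urllib.parse import parse_qs, urlsplit


def describe_endpoint(url: str) -> dict:
    """Break a URL into the pieces worth checking one by one."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the version, if present, is the first segment and looks like "v1".
    version = segments[0] if segments and re.fullmatch(r"v\d+", segments[0]) else None
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "version": version,
        "path_segments": segments,
        "query": parse_qs(parts.query),
    }


print(describe_endpoint("https://api.vendor.com/v1/customers/12345"))
```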

---

# 5. Check Headers

Important headers:

| Header           | Purpose                     |
| ---------------- | --------------------------- |
| Authorization    | Sends bearer token/API key  |
| Content-Type     | Format of request body      |
| Accept           | Format expected in response |
| x-api-key        | API key authentication      |
| X-Correlation-ID | Request tracking            |

Example:

```http
Authorization: Bearer eyJhbGc...
Content-Type: application/json
Accept: application/json
```

Common issue:

```text
Content-Type missing
```

Without a `Content-Type` header, the API may not parse the JSON body and will often respond with 400 or 415.
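
A quick pre-flight check can catch a missing header before the request is sent. The sketch below assumes that every call needs `Authorization` and that requests with a body also need `Content-Type`; the real requirements depend on the API, and `missing_headers` is an illustrative name.

```python
def missing_headers(method: str, headers: dict) -> list:
    """List headers this sketch assumes are required but that are absent."""
    present = {name.lower() for name in headers}
    required = ["Authorization"]
    # Requests that carry a body should declare its format.
    if method.upper() in {"POST", "PUT", "PATCH"}:
        required.append("Content-Type")
    return [name for name in required if name.lower() not in present]


print(missing_headers("POST", {"Authorization": "Bearer eyJhbGc..."}))
```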

---

# 6. Check Authentication

Authentication and authorization problems are among the most common causes of API failures.

| Error | Meaning                   | Common Cause                  |
| ----- | ------------------------- | ----------------------------- |
| 401   | Not authenticated         | Missing/expired/invalid token |
| 403   | Authenticated but blocked | Missing permission/scope/role |

## 401 Checklist

Check:

```text
Is Authorization header present?
Is token expired?
Is token copied correctly?
Is token for the right environment?
Was token revoked?
```

## 403 Checklist

Check:

```text
Does token have correct scope?
Does user/app have permission?
Is IP allowlisting required?
Is this endpoint restricted?
```

---

# 7. Check OAuth Scope

Scopes define what the token can do.

Example:

```text
read:customers
write:tickets
admin
```

If token only has:

```text
read:customers
```

but you call:

```http
POST /api/tickets
```

You may get:

```http
403 Forbidden
```
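
Comparing the scopes a token carries with the scopes an endpoint needs is a simple set difference. In OAuth 2.0 the `scope` value is a space-delimited string; the mapping from endpoint to required scopes is an assumption here and would come from the API documentation.

```python
def missing_scopes(granted: str, required: set) -> set:
    """Return the required scopes that the token does not carry."""
    # OAuth 2.0 expresses scope as a space-delimited string.
    return required - set(granted.split())


# The token can read customers, but creating a ticket is assumed to need write:tickets.
print(missing_scopes("read:customers", {"write:tickets"}))
```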

---

# 8. Check the JSON Payload

For POST, PUT, and PATCH requests, inspect the payload carefully.

Example:

```json
{
  "customerId": "12345",
  "channel": "chat",
  "authenticated": true
}
```

Validate:

| Item            | Example Problem                      |
| --------------- | ------------------------------------ |
| Syntax          | Missing comma                        |
| Required fields | Missing customerId                   |
| Data type       | `"true"` instead of `true`           |
| Field name      | `customerID` instead of `customerId` |
| Nesting         | Object expected, string sent         |
| Array format    | Single value sent instead of list    |
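
Several of these checks can be automated before the request is sent. The sketch below validates the example payload against a hypothetical schema (`EXPECTED_FIELDS` is an assumption built from the sample above, not a real API contract) and reports syntax, missing-field, type, and field-name problems.

```python
import json

# Hypothetical schema for the example payload: field name -> expected Python type.
EXPECTED_FIELDS = {"customerId": str, "channel": str, "authenticated": bool}


def validate_payload(raw: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"syntax: {exc.msg} at line {exc.lineno}, column {exc.colno}"]
    if not isinstance(payload, dict):
        return ["nesting: top-level JSON object expected"]
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type: {field} should be {expected_type.__name__}")
    # Unknown names often reveal a casing typo such as customerID.
    for field in payload:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {field}")
    return problems


print(validate_payload('{"customerID": "12345", "channel": "chat", "authenticated": "true"}'))
```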

---

# 9. Understand Status Codes by Category

| Code Range | Meaning              |
| ---------- | -------------------- |
| 2xx        | Success              |
| 3xx        | Redirect             |
| 4xx        | Client/request issue |
| 5xx        | Server/backend issue |

## Important Codes

| Code | Meaning                | First Place To Look          |
| ---- | ---------------------- | ---------------------------- |
| 200  | Success                | Validate response body       |
| 201  | Created                | Confirm resource ID returned |
| 204  | No Content             | Success, but no body         |
| 400  | Bad Request            | Payload/parameters           |
| 401  | Unauthorized           | Authentication               |
| 403  | Forbidden              | Permissions/scopes           |
| 404  | Not Found              | Endpoint/resource            |
| 405  | Method Not Allowed     | HTTP method                  |
| 408  | Request Timeout        | Network/backend delay        |
| 409  | Conflict               | Duplicate/existing resource  |
| 415  | Unsupported Media Type | Content-Type                 |
| 422  | Unprocessable Content  | Field validation             |
| 429  | Too Many Requests      | Request rate/quota           |
| 500  | Internal Server Error  | Backend logs                 |
| 502  | Bad Gateway            | Proxy/upstream               |
| 503  | Service Unavailable    | Outage/maintenance           |
| 504  | Gateway Timeout        | Backend too slow             |
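
The two tables can be combined into a small lookup that turns a status code into a category and a starting point. This is a sketch adapted from the tables above; the `triage` name and the wording of the hints are illustrative.

```python
# Starting points for specific codes, adapted from the table above.
FIRST_LOOK = {
    400: "payload/parameters",
    401: "authentication",
    403: "permissions/scopes",
    404: "endpoint/resource",
    405: "HTTP method",
    415: "Content-Type",
    429: "request rate/quota",
    500: "backend logs",
    502: "proxy/upstream",
    503: "outage/maintenance",
    504: "backend response time",
}

CATEGORIES = {2: "success", 3: "redirect", 4: "client/request issue", 5: "server/backend issue"}


def triage(status: int) -> str:
    """Describe a status code by its range and, if known, where to look first."""
    category = CATEGORIES.get(status // 100, "unknown")
    hint = FIRST_LOOK.get(status)
    return f"{status}: {category}" + (f"; check {hint}" if hint else "")


print(triage(415))
```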

---

# 10. Troubleshooting by Error Code

## 400 Bad Request

Likely:

```text
Invalid payload, missing fields, wrong query parameter
```

Check:

```text
JSON syntax
Required fields
Field names
Data types
API documentation
```

---

## 401 Unauthorized

Likely:

```text
Authentication failed
```

Check:

```text
Token present
Token expired
Correct auth method
Correct environment
```

---

## 403 Forbidden

Likely:

```text
Permission issue
```

Check:

```text
User role
OAuth scopes
API permissions
IP allowlist
```

---

## 404 Not Found

Likely:

```text
Wrong endpoint or resource does not exist
```

Check:

```text
URL path
API version
Resource ID
Environment
```

---

## 409 Conflict

Likely:

```text
Duplicate or conflicting resource
```

Example:

```text
Trying to create a user that already exists
```

---

## 415 Unsupported Media Type

Likely:

```text
Wrong or missing Content-Type
```

Check:

```http
Content-Type: application/json
```

---

## 422 Validation Error

Likely:

```text
Payload is well-formed JSON but fails validation or business rules
```

Example:

```text
Phone number format invalid
Required field has invalid value
```

---

## 429 Too Many Requests

Likely:

```text
Rate limit exceeded
```

Check:

```text
Retry-After header
API rate limits
Looping automation
Retry logic
```

---

## 500 Internal Server Error

Likely:

```text
Backend application error
```

Action:

```text
Collect request ID, timestamp, payload sample, and escalate
```

---

## 502 / 503 / 504

Likely:

```text
Gateway, service availability, or timeout issue
```

Check:

```text
Load balancer
API gateway
Backend health
Maintenance window
Network latency
```

---

# 11. Network Layer Checks

APIs depend on network reachability.

Check:

```text
DNS resolution
Firewall
Proxy
Port 443
TLS certificate
IP allowlisting
VPN/private connectivity
```

Common symptoms:

| Symptom            | Possible Cause                 |
| ------------------ | ------------------------------ |
| Timeout            | Firewall, proxy, backend delay |
| TLS error          | Certificate or TLS mismatch    |
| DNS failure        | Wrong hostname                 |
| Connection refused | Service down or port blocked   |
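
The symptoms in the table map fairly directly onto the exceptions Python raises when a connection attempt fails. The sketch below classifies an exception into one of those symptoms; `classify_network_error` and `probe` are hypothetical helpers, and `probe` needs real network access to be meaningful.

```python
import socket
import ssl


def classify_network_error(exc: BaseException) -> str:
    """Map a connection exception to one of the symptoms in the table above."""
    if isinstance(exc, socket.gaierror):
        return "DNS failure"
    if isinstance(exc, ssl.SSLError):
        return "TLS error"
    if isinstance(exc, ConnectionRefusedError):
        return "Connection refused"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Timeout"
    return "Unclassified"


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a TCP connection plus TLS handshake and report the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            context = ssl.create_default_context()
            with context.wrap_socket(raw, server_hostname=host):
                return "OK"
    except OSError as exc:
        return classify_network_error(exc)
```

Running `probe("api.vendor.com")` from the machine that makes the failing call, rather than from a laptop, tests the same network path the integration uses.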

---

# 12. TLS / Certificate Checks

For HTTPS APIs, validate:

```text
Certificate not expired
Hostname matches certificate
Certificate chain trusted
TLS 1.2 or 1.3 supported
```

Common failures:

```text
certificate expired
self-signed certificate
hostname mismatch
TLS handshake failed
```
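
Certificate expiry is the easiest of these to check programmatically. The sketch below works on the dictionary shape returned by Python's `SSLSocket.getpeercert()`, whose `notAfter` field holds the expiry date; the certificate value in the example is made up for illustration.

```python
import ssl
import time


def days_until_expiry(cert: dict, now: float) -> float:
    """Return days remaining before the certificate expires (negative if expired)."""
    # notAfter looks like "Jan  1 00:00:00 2030 GMT" in getpeercert() output.
    expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires_at - now) / 86400


# Illustrative certificate data; a real one comes from SSLSocket.getpeercert().
sample_cert = {"notAfter": "Jan  1 00:00:00 2030 GMT"}
print(days_until_expiry(sample_cert, time.time()))
```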

---

# 13. Rate Limiting

Rate limiting protects APIs from overload.

Common response:

```http
429 Too Many Requests
```

Check headers:

```http
Retry-After: 60
```

Troubleshooting:

```text
Reduce request frequency
Add backoff/retry logic
Check if automation is looping
Review API quota
```
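
Backoff logic can honor the server's `Retry-After` hint and fall back to exponential delays when no hint is given. This is a minimal sketch: it handles only the seconds form of `Retry-After` (the header may also be an HTTP date, which is not parsed here), and the `base` and `cap` defaults are arbitrary assumptions.

```python
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return how many seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's hint when Retry-After is a whole number of seconds.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Otherwise use capped exponential backoff: base, 2*base, 4*base, ...
    return min(cap, base * (2 ** attempt))


print(retry_delay(0, "60"))
print([retry_delay(n) for n in range(4)])
```

A cap on the number of attempts matters as much as the delay; unbounded retries are one way an automation ends up looping.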

---

# 14. Timeouts

Timeouts can occur at multiple layers:

```text
Client timeout
API gateway timeout
Load balancer timeout
Backend service timeout
Database timeout
```

Ask:

```text
How long before failure?
Does it fail always or intermittently?
Is payload large?
Is backend dependency slow?
```
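
The answer to "how long before failure?" is useful because a failure that consistently occurs after the same number of seconds often matches a configured timeout at one layer. The sketch below compares an observed duration with per-layer settings; the values in `LAYER_TIMEOUTS` are hypothetical placeholders, not defaults of any product.

```python
# Hypothetical timeout settings in seconds; replace with the real configuration.
LAYER_TIMEOUTS = {
    "client": 10.0,
    "api_gateway": 30.0,
    "load_balancer": 60.0,
    "backend": 120.0,
}


def matching_timeout_layers(elapsed: float, timeouts: dict, tolerance: float = 1.0) -> list:
    """Return layers whose timeout is within `tolerance` seconds of the observed failure."""
    return [layer for layer, limit in timeouts.items() if abs(elapsed - limit) <= tolerance]


# A request that fails after about 30 seconds points at the gateway setting.
print(matching_timeout_layers(30.2, LAYER_TIMEOUTS))
```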

---

# 15. Correlation IDs

A correlation ID tracks one request across systems.

Example:

```http
X-Correlation-ID: abc-123
```

Use it to trace:

```text
API Gateway → Auth Service → Backend Service → Database
```

For escalation, always include:

```text
correlation ID
timestamp
endpoint
HTTP method
response code
```
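
When reproducing an issue, sending a known correlation ID makes the request easy to find in logs afterwards. The sketch below generates one and formats the five escalation items; it assumes the API accepts a client-supplied `X-Correlation-ID` header, which varies by platform, and the function names are illustrative.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def new_correlation_headers() -> dict:
    """Generate a fresh ID to send with a request so it can be found in logs later."""
    return {"X-Correlation-ID": str(uuid.uuid4())}


def escalation_summary(correlation_id: str, endpoint: str, method: str,
                       status: int, timestamp: Optional[datetime] = None) -> str:
    """Bundle the escalation items into one block of text."""
    when = timestamp or datetime.now(timezone.utc)
    return "\n".join([
        f"correlation ID: {correlation_id}",
        f"timestamp: {when.isoformat()}",
        f"endpoint: {endpoint}",
        f"HTTP method: {method}",
        f"response code: {status}",
    ])


print(escalation_summary("abc-123", "/api/tickets", "POST", 400))
```

Recording the timestamp in UTC avoids confusion when the support team and the customer are in different time zones.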

---

# 16. Logs To Review

| Log Type            | What It Shows                         |
| ------------------- | ------------------------------------- |
| API Gateway logs    | Request entry, routing, response code |
| Auth logs           | OAuth/token failures                  |
| Application logs    | Backend errors                        |
| Webhook logs        | Delivery attempts                     |
| Load balancer logs  | Upstream health/timeouts              |
| Firewall/proxy logs | Blocked connections                   |
| CRM logs            | Customer/ticket integration failures  |

---

# 17. Good Troubleshooting Workflow

```text
1. Confirm expected behavior
2. Reproduce issue
3. Identify endpoint + method
4. Check response code
5. Validate auth
6. Validate headers
7. Validate payload
8. Validate network/TLS
9. Check logs/correlation ID
10. Isolate failed layer
11. Escalate with evidence
```

---

# 18. Example Scenario

## Issue

```text
Chat ends but CRM ticket is not created.
```

## Expected Flow

```text
Chat completed → Webhook sent → CRM API creates ticket
```

## Troubleshooting

Check:

```text
Was chat_completed event generated?
Was webhook sent?
Did CRM receive webhook?
What HTTP response did CRM return?
Was token valid?
Was JSON payload accepted?
Was a required ticket field missing?
```

Possible findings:

| Finding                   | Meaning                     |
| ------------------------- | --------------------------- |
| No event generated        | Platform workflow issue     |
| Webhook sent, no response | CRM endpoint/network issue  |
| 401 from CRM              | Auth/token issue            |
| 400 from CRM              | Payload/field mapping issue |
| 500 from CRM              | CRM backend issue           |

---

# 19. Another Scenario

## Issue

```text
Customer lookup fails during chat start.
```

## Expected Flow

```text
Customer starts chat → Platform calls CRM API → CRM returns profile
```

Check:

```text
Customer ID passed correctly?
CRM endpoint correct?
Token valid?
Customer exists?
Field mapping correct?
CRM API response code?
```

Likely outcomes:

| Response | Likely Issue           |
| -------- | ---------------------- |
| 401      | Token expired          |
| 403      | Missing CRM permission |
| 404      | Customer not found     |
| 400      | Bad customerId format  |
| 500      | CRM backend failure    |

---

# 20. What To Say In an Interview

## API Troubleshooting Answer

> “I start by confirming the expected workflow and reproducing the issue. Then I identify the API endpoint, HTTP method, response code, headers, authentication method, and payload. Based on the response code, I isolate whether the issue is authentication, permissions, request formatting, endpoint, networking, or backend related. I also check logs, timestamps, and correlation IDs before escalating with clear evidence.”

---

# 21. Implementation Engineer Mindset

Do not just say:

```text
The API failed.
```

Say:

```text
The POST request to the CRM ticket endpoint is returning 400 because the payload is missing the required caseType field.
```

That shows strong technical ownership.