Health Check System Documentation
Overview
The CS2 Inspect Web application includes a comprehensive health check system that monitors the status of all critical dependencies and services. This system provides real-time and historical health data through multiple endpoints and a visual status dashboard.
Health Check Endpoints
1. Liveness Probe - /api/health/live
Purpose: Indicates if the application process is running.
Response:
```json
{
  "status": "ok",
  "timestamp": "2024-10-21T00:00:00.000Z"
}
```

Status Codes:
- `200 OK` - Process is alive
Usage: Container orchestrators use this to determine if the container should be restarted.
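For example, a minimal TypeScript probe could poll the liveness endpoint. This is a sketch only; it assumes Node 18+ (built-in fetch) and an illustrative `BASE_URL` environment variable that is not part of the application itself.

```typescript
// liveness-check.ts - sketch; BASE_URL is an assumption, adjust to your deployment
const BASE_URL = process.env.BASE_URL ?? 'http://localhost:3000';

async function checkLiveness(): Promise<void> {
  const res = await fetch(`${BASE_URL}/api/health/live`);
  const body = await res.json();
  // A 200 response with status "ok" means the process is alive
  console.log(`live=${res.status === 200}, status=${body.status}, timestamp=${body.timestamp}`);
}

checkLiveness().catch((err) => {
  // A network error also means the process is not reachable
  console.error('Liveness check failed:', err);
  process.exit(1);
});
```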
2. Readiness Probe - /api/health/ready
Purpose: Indicates if the application is ready to serve traffic.
Response (Healthy):
```json
{
  "status": "ok",
  "timestamp": "2024-10-21T00:00:00.000Z",
  "ready": true,
  "checks": [
    {
      "name": "database",
      "status": "ok",
      "latency_ms": 15
    },
    {
      "name": "environment",
      "status": "ok",
      "latency_ms": 2
    }
  ]
}
```

Response (Unhealthy):
```json
{
  "status": "fail",
  "timestamp": "2024-10-21T00:00:00.000Z",
  "ready": false,
  "checks": [
    {
      "name": "database",
      "status": "fail",
      "latency_ms": 5000
    }
  ]
}
```

Status Codes:
- `200 OK` - All critical dependencies are healthy
- `503 Service Unavailable` - One or more critical dependencies are unhealthy
Critical Checks:
- Database connectivity
- Environment configuration
Usage: Load balancers and orchestrators use this to determine if traffic should be routed to this instance.
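Conceptually, the readiness handler aggregates the critical checks and returns 503 if any of them fail. The sketch below illustrates that aggregation; the check functions passed in are hypothetical stand-ins, not the application's actual implementation.

```typescript
// readiness aggregation sketch - the individual check functions are assumptions
interface CheckResult {
  name: string;
  status: 'ok' | 'degraded' | 'fail';
  latency_ms: number;
}

async function runReadinessChecks(
  checks: Array<() => Promise<CheckResult>>
): Promise<{ statusCode: number; body: object }> {
  const results = await Promise.all(checks.map((check) => check()));
  const ready = results.every((r) => r.status !== 'fail');
  return {
    // 200 when all critical dependencies pass, 503 otherwise
    statusCode: ready ? 200 : 503,
    body: {
      status: ready ? 'ok' : 'fail',
      timestamp: new Date().toISOString(),
      ready,
      checks: results,
    },
  };
}
```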
3. Detailed Health - /api/health/details
Purpose: Provides detailed health information for all system components.
Response:
```json
{
  "status": "ok",
  "timestamp": "2024-10-21T00:00:00.000Z",
  "checks": [
    {
      "name": "database",
      "status": "ok",
      "latency_ms": 15,
      "message": "Database connection healthy",
      "metadata": {
        "pool_active_connections": 2,
        "pool_total_connections": 5,
        "pool_idle_connections": 3
      },
      "checked_at": "2024-10-21T00:00:00.000Z"
    },
    {
      "name": "steam_api",
      "status": "ok",
      "latency_ms": 1,
      "message": "Steam API key configured",
      "metadata": {
        "api_key_length": 32,
        "has_steam_username": true,
        "has_steam_password": true
      },
      "checked_at": "2024-10-21T00:00:00.000Z"
    },
    {
      "name": "steam_client",
      "status": "ok",
      "latency_ms": 5,
      "message": "Steam client ready - connected",
      "metadata": {
        "is_ready": true,
        "status": "connected",
        "queue_length": 0,
        "unmasked_support": true
      },
      "checked_at": "2024-10-21T00:00:00.000Z"
    },
    {
      "name": "environment",
      "status": "ok",
      "latency_ms": 1,
      "message": "All required environment variables present",
      "metadata": {
        "required_vars_count": 5,
        "present_vars_count": 5,
        "missing_vars": [],
        "node_env": "production",
        "port": "3000"
      },
      "checked_at": "2024-10-21T00:00:00.000Z"
    }
  ]
}
```

Status Codes:
- `200 OK` - Always returns 200; check individual component statuses
Health Checks:
- Database: Connection health, latency, connection pool status
- Steam API: API key configuration, credentials presence
- Steam Client: CS2 inspect client connection status, queue health
- Environment: Required environment variables validation
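A consumer of this endpoint can flag any component that is not fully operational. The sketch below mirrors the response shape shown above; Node 18+ fetch and the `BASE_URL` value are assumptions.

```typescript
// detailed-health.ts - minimal consumer sketch
interface DetailedCheck {
  name: string;
  status: 'ok' | 'degraded' | 'fail';
  latency_ms: number;
  message: string;
  metadata: Record<string, unknown>;
  checked_at: string;
}

async function reportUnhealthyComponents(baseUrl: string): Promise<void> {
  const res = await fetch(`${baseUrl}/api/health/details`);
  const body = (await res.json()) as { status: string; checks: DetailedCheck[] };
  // The endpoint always returns 200, so inspect each component individually
  for (const check of body.checks) {
    if (check.status !== 'ok') {
      console.warn(`${check.name}: ${check.status} (${check.latency_ms} ms) - ${check.message}`);
    }
  }
}

reportUnhealthyComponents(process.env.BASE_URL ?? 'http://localhost:3000');
```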
4. Historical Health Data - /api/health/history
Purpose: Provides historical health check data for visualization.
Query Parameters:
- `check_name` (optional) - Filter by specific check name
- `start_time` (optional) - Start of time range (ISO 8601 format), defaults to 24 hours ago
- `end_time` (optional) - End of time range (ISO 8601 format)
- `limit` (optional) - Maximum number of data points, defaults to 100
Example Request:
```
GET /api/health/history?check_name=database&start_time=2024-10-20T00:00:00Z&limit=200
```

Response:
```json
[
  {
    "check_name": "database",
    "data_points": [
      {
        "timestamp": "2024-10-21T00:00:00.000Z",
        "status": "ok",
        "latency_ms": 15
      },
      {
        "timestamp": "2024-10-21T00:01:00.000Z",
        "status": "ok",
        "latency_ms": 18
      }
    ]
  }
]
```

Status Codes:
- `200 OK` - Returns historical data (may be an empty array)
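The example request above can be built programmatically. This is a sketch assuming Node 18+ fetch and an illustrative `BASE_URL`; the response type follows the example response shown above.

```typescript
// history-query.ts - builds the query string from the example request above
async function fetchDatabaseHistory(baseUrl: string): Promise<void> {
  const params = new URLSearchParams({
    check_name: 'database',
    start_time: '2024-10-20T00:00:00Z',
    limit: '200',
  });
  const res = await fetch(`${baseUrl}/api/health/history?${params}`);
  // Returns an array of { check_name, data_points[] }; it may be empty
  const series: Array<{
    check_name: string;
    data_points: Array<{ timestamp: string; status: string; latency_ms: number }>;
  }> = await res.json();

  for (const s of series) {
    console.log(`${s.check_name}: ${s.data_points.length} data points`);
  }
}

fetchDatabaseHistory(process.env.BASE_URL ?? 'http://localhost:3000');
```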
Status Dashboard
A visual status dashboard is available at /status that provides:
- Overall System Status Banner - Quick view of overall health
- Individual Service Cards - Status cards for each dependency with:
  - Current status (Operational/Degraded/Failed)
  - Response latency
  - Status message
- Historical Performance Charts - Time-series graphs showing:
  - Status changes over time
  - Latency trends
  - Configurable time ranges (1h, 6h, 24h, 7d)
- Auto-refresh - Automatically updates every 30 seconds
Accessing the Status Page
Navigate to: http://your-domain.com/status
Health Status Values
Health checks return one of three status values:
- `ok` - Service is fully operational
- `degraded` - Service is operational but performance is below normal thresholds
- `fail` - Service is unavailable or not functioning
Latency Thresholds
Default thresholds for degraded/failed status:
| Service | Degraded | Failed |
|---|---|---|
| Database | > 50ms | > 200ms |
| Steam API | > 300ms | > 1000ms |
| Steam Client | > 500ms | > 2000ms |
| Environment | > 10ms | > 50ms |
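As a sketch, a check's status can be derived from its latency using the thresholds in the table above. The function and constant names below are illustrative, not the actual implementation.

```typescript
type HealthStatus = 'ok' | 'degraded' | 'fail';

// Default thresholds from the table above (milliseconds)
const THRESHOLDS: Record<string, { degraded: number; failed: number }> = {
  database: { degraded: 50, failed: 200 },
  steam_api: { degraded: 300, failed: 1000 },
  steam_client: { degraded: 500, failed: 2000 },
  environment: { degraded: 10, failed: 50 },
};

function classifyLatency(check: string, latencyMs: number): HealthStatus {
  const t = THRESHOLDS[check];
  if (!t) return 'ok'; // unknown checks are not penalized in this sketch
  if (latencyMs > t.failed) return 'fail';
  if (latencyMs > t.degraded) return 'degraded';
  return 'ok';
}

// Example: classifyLatency('database', 75) === 'degraded'
```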
Background Sampling
The health check system includes a background sampler that:
- Runs health checks every 60 seconds
- Persists results to the database for historical tracking
- Automatically cleans up data older than 7 days
- Starts automatically when the server starts
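Conceptually the sampler is a 60-second loop that persists results and purges expired rows. The sketch below is illustrative only; the helper functions are hypothetical stand-ins for the actual logic in sampler.ts.

```typescript
// sampler sketch - illustrative; the real logic lives in sampler.ts
interface SampleResult { name: string; status: 'ok' | 'degraded' | 'fail'; latency_ms: number }

interface SamplerDeps {
  runAllChecks: () => Promise<SampleResult[]>;                // hypothetical: executes all health checks
  insertHistory: (results: SampleResult[]) => Promise<void>;  // hypothetical: writes to health_check_history
  purgeOldHistory: (olderThan: Date) => Promise<void>;        // hypothetical: deletes expired rows
}

const SAMPLE_INTERVAL_MS = 60_000;          // run every 60 seconds
const RETENTION_MS = 7 * 24 * 60 * 60_000;  // keep 7 days of history

export function startSampler(deps: SamplerDeps): NodeJS.Timeout {
  return setInterval(async () => {
    const results = await deps.runAllChecks();
    await deps.insertHistory(results);                               // persist for /api/health/history
    await deps.purgeOldHistory(new Date(Date.now() - RETENTION_MS)); // drop rows older than 7 days
  }, SAMPLE_INTERVAL_MS);
}
```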
Docker Health Check
The Dockerfile includes a HEALTHCHECK instruction that:
- Calls the `/api/health/ready` endpoint
- Runs every 30 seconds
- Has a 5-second timeout
- Allows 30 seconds for startup
- Retries 3 times before marking the container as unhealthy
Container orchestrators (Docker, Kubernetes, etc.) use this to automatically restart unhealthy containers.
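If the image has no curl/wget available, a small script can serve as the health check command instead. The sketch below is an assumption (file name, port, and wiring into the HEALTHCHECK instruction are not part of the existing Dockerfile); it exits 0 when the readiness endpoint returns 200 and 1 otherwise.

```typescript
// healthcheck.ts - sketch: exits 0 when ready, 1 otherwise (Node 18+ assumed)
const port = process.env.PORT ?? '3000';

async function main(): Promise<void> {
  try {
    const res = await fetch(`http://127.0.0.1:${port}/api/health/ready`, {
      signal: AbortSignal.timeout(5000), // mirror the 5-second timeout
    });
    process.exit(res.status === 200 ? 0 : 1);
  } catch {
    process.exit(1); // network error or timeout counts as unhealthy
  }
}

main();
```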
Database Schema
The health check system uses two database tables:
health_check_history
Stores historical health check results:
- `id` - Primary key
- `check_name` - Name of the health check
- `status` - Health status (ok/degraded/fail)
- `latency_ms` - Response latency in milliseconds
- `message` - Status message or error details
- `metadata` - Additional check-specific data (JSON)
- `checked_at` - Timestamp of the check
health_check_config
Stores configuration for health checks:
- `id` - Primary key
- `check_name` - Name of the health check
- `enabled` - Whether the check is enabled
- `warning_threshold_ms` - Latency threshold for degraded status
- `error_threshold_ms` - Latency threshold for error status
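The two tables map naturally onto row types like the sketch below. Column types are inferred from the descriptions above, not taken from the actual migration files.

```typescript
// Row shapes inferred from the schema description above (illustrative only)
interface HealthCheckHistoryRow {
  id: number;                               // primary key
  check_name: string;                       // name of the health check
  status: 'ok' | 'degraded' | 'fail';       // health status
  latency_ms: number;                       // response latency in milliseconds
  message: string | null;                   // status message or error details
  metadata: Record<string, unknown> | null; // additional check-specific data (JSON)
  checked_at: Date;                         // timestamp of the check
}

interface HealthCheckConfigRow {
  id: number;                    // primary key
  check_name: string;            // name of the health check
  enabled: boolean;              // whether the check is enabled
  warning_threshold_ms: number;  // latency threshold for degraded status
  error_threshold_ms: number;    // latency threshold for error status
}
```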
Database Setup
Migrations run automatically on server startup!
When you start the server, it will automatically:
- Create the `_migrations` tracking table (if needed)
- Check for pending migrations
- Execute them in order (000_initial.sql, 001_add_health_checks.sql, etc.)
- Skip already-executed migrations
For new installations:
- Just start the server - all migrations will run automatically
- The system will create all required tables
For existing installations:
- Start the server - only new migrations will be executed
- Already-existing tables are safely skipped (thanks to `IF NOT EXISTS`)
Manual migration (if needed for troubleshooting):
```bash
# Apply a specific migration manually
mysql -h <host> -u <user> -p <database> < server/database/migrations/001_add_health_checks.sql
```

See server/database/migrations/README.md for more details on the migration system.
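Conceptually, the startup migration runner follows the steps listed above. The TypeScript sketch below is a hedged illustration using mysql2/promise; the `_migrations` column names and the migrations directory handling are assumptions, not the project's actual implementation.

```typescript
// migration runner sketch - illustrative, not the project's actual code
import { readdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import mysql from 'mysql2/promise';

export async function runMigrations(dir: string): Promise<void> {
  const conn = await mysql.createConnection({
    host: process.env.DATABASE_HOST,
    user: process.env.DATABASE_USER,
    password: process.env.DATABASE_PASSWORD,
    database: process.env.DATABASE_NAME,
    multipleStatements: true, // migration files may contain several statements
  });

  // 1. Create the tracking table if needed (column names are assumptions)
  await conn.query(
    'CREATE TABLE IF NOT EXISTS _migrations (name VARCHAR(255) PRIMARY KEY, executed_at DATETIME NOT NULL)'
  );

  // 2. Check which migrations have already run
  const [rows] = await conn.query('SELECT name FROM _migrations');
  const done = new Set((rows as Array<{ name: string }>).map((r) => r.name));

  // 3. Execute pending migrations in order, skipping already-executed ones
  for (const file of (await readdir(dir)).filter((f) => f.endsWith('.sql')).sort()) {
    if (done.has(file)) continue;
    const sql = await readFile(join(dir, file), 'utf8');
    await conn.query(sql);
    await conn.query('INSERT INTO _migrations (name, executed_at) VALUES (?, NOW())', [file]);
  }

  await conn.end();
}
```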
Monitoring and Alerting
Recommended Monitoring Setup
External Monitoring
- Set up external monitoring (e.g., UptimeRobot, Pingdom) to check `/api/health/ready`
- Alert on 503 status codes or timeouts
Prometheus Integration (Optional)
- Export health metrics to Prometheus
- Create Grafana dashboards for visualization
- Set up alerts based on status and latency thresholds
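One possible approach, shown below as a sketch, is to translate the `/api/health/details` response into Prometheus text exposition format. The metric names and any `/metrics` wiring are illustrative assumptions; no such exporter exists in the application today.

```typescript
// prometheus export sketch - converts detailed health output to text exposition format
interface DetailCheck { name: string; status: 'ok' | 'degraded' | 'fail'; latency_ms: number }

function toPrometheusText(checks: DetailCheck[]): string {
  const lines: string[] = [
    '# HELP health_check_up 1 if the check is ok, 0 otherwise',
    '# TYPE health_check_up gauge',
    '# HELP health_check_latency_ms Health check latency in milliseconds',
    '# TYPE health_check_latency_ms gauge',
  ];
  for (const c of checks) {
    lines.push(`health_check_up{check="${c.name}"} ${c.status === 'ok' ? 1 : 0}`);
    lines.push(`health_check_latency_ms{check="${c.name}"} ${c.latency_ms}`);
  }
  return lines.join('\n') + '\n';
}

// Example: toPrometheusText([{ name: 'database', status: 'ok', latency_ms: 15 }])
```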
Log Monitoring
- Monitor application logs for health check failures
- Set up alerts for repeated failures
Alert Thresholds
Recommended alert conditions:
- `/api/health/ready` returns 503 for more than 2 consecutive checks
- Any service shows `fail` status for more than 5 minutes
- Database latency exceeds 200ms for more than 3 consecutive minutes
- Steam client disconnects when credentials are configured
Troubleshooting
Health Check Fails on Startup
Symptom: Container marked unhealthy immediately after start
Solution:
- Increase `start_period` in the Dockerfile HEALTHCHECK
- Verify database connectivity
- Check environment variables are properly set
Database Health Check Fails
Symptom: Database shows fail status
Common Causes:
- Database server is down or unreachable
- Invalid database credentials
- Connection pool exhausted
- Network issues
Solutions:
- Verify database server is running
- Check DATABASE_* environment variables
- Review connection pool settings
- Check network connectivity
Steam Client Health Check Fails
Symptom: Steam client shows fail status when credentials are configured
Common Causes:
- Invalid Steam credentials
- Steam servers are down
- Rate limiting
- Network issues
Solutions:
- Verify STEAM_USERNAME and STEAM_PASSWORD
- Check Steam server status
- Ensure Steam account is not logged in elsewhere
- Check network connectivity to Steam servers
Historical Data Not Appearing
Symptom: Status page shows no historical data
Common Causes:
- Database tables not initialized
- Sampler not running
- Insufficient permissions
Solutions:
- Ensure the health check migrations have run so the tables exist (restart the server, or apply 001_add_health_checks.sql manually)
- Check server logs for sampler errors
- Verify database user has INSERT permissions on health_check_history table
Environment Variables
The health check system uses these environment variables:
Required:
- `DATABASE_HOST` - Database server hostname
- `DATABASE_USER` - Database username
- `DATABASE_PASSWORD` - Database password
- `DATABASE_NAME` - Database name
- `JWT_TOKEN` - JWT secret token
Optional:
- `DATABASE_PORT` - Database port (default: 3306)
- `DATABASE_CONNECTION_LIMIT` - Max connections (default: 5)
- `STEAM_API_KEY` - Steam API key
- `STEAM_USERNAME` - Steam account username
- `STEAM_PASSWORD` - Steam account password
- `LOG_API_REQUESTS` - Enable request logging (default: false)
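The environment check described earlier boils down to verifying that the required variables above are present. A sketch of that validation (the function name is illustrative):

```typescript
// environment validation sketch - variable list matches the Required section above
const REQUIRED_VARS = [
  'DATABASE_HOST',
  'DATABASE_USER',
  'DATABASE_PASSWORD',
  'DATABASE_NAME',
  'JWT_TOKEN',
] as const;

function checkEnvironment(): { status: 'ok' | 'fail'; missing_vars: string[] } {
  const missing = REQUIRED_VARS.filter((name) => !process.env[name]);
  return {
    status: missing.length === 0 ? 'ok' : 'fail',
    missing_vars: [...missing],
  };
}

// Example: with all five variables set, checkEnvironment() returns { status: 'ok', missing_vars: [] }
```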
Best Practices
Monitor Regularly
- Check the status dashboard daily
- Set up automated alerts for failures
Review Historical Trends
- Use historical data to identify patterns
- Look for gradual degradation over time
Test Health Checks
- Regularly verify health endpoints respond correctly
- Test failover scenarios
Keep Data Fresh
- The system automatically cleans up data older than 7 days
- Adjust retention period if needed in sampler.ts
Secure Endpoints
- `/api/health/live` and `/api/health/ready` are public (needed for orchestrators)
- Consider protecting `/api/health/details` and `/api/health/history` with authentication
- Use firewall rules to restrict access to health endpoints from untrusted sources
API Rate Limits
Health check endpoints are lightweight and can be called frequently:
- `/api/health/live` - No rate limit (very fast)
- `/api/health/ready` - Recommended: max 1 call per 5 seconds
- `/api/health/details` - Recommended: max 1 call per 10 seconds
- `/api/health/history` - Recommended: max 1 call per 30 seconds
The status page auto-refresh uses these recommended intervals.