Troubleshooting Guide
Quick Diagnostic Checklist
- Dashboard loading properly?
- API configuration active?
- Streams being monitored?
- Instances deployed correctly?
- Rules triggering as expected?
- Events being created?
- Database connectivity OK?
- Background monitoring active?
- Logs showing errors?
- System resources adequate?
- CONNECT API responding?
- Network connectivity stable?
Common Issues & Solutions
Connection Issues
CONNECT API Connection Failed
Symptoms:
- Dashboard shows "API Not Configured"
- No streams being discovered
- Authentication errors in logs
Solutions:
- Verify tenant ID and namespace ID
- Check client credentials (ID and secret)
- Test network connectivity to CONNECT endpoints
- Verify OAuth2 permissions
- Check firewall and proxy settings
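Before testing the network path, it can help to confirm that every credential is present at all. The sketch below assumes the CONNECT settings are supplied as environment variables. The four variable names are hypothetical, so substitute whatever names your deployment uses. It reports only whether each value is set and never prints the value.

```shell
# Report which CONNECT settings are missing from the environment.
# The variable names below are hypothetical placeholders.
check_connect_env() {
    missing=0
    for var in CONNECT_TENANT_ID CONNECT_NAMESPACE_ID \
               CONNECT_CLIENT_ID CONNECT_CLIENT_SECRET; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "MISSING: $var"
            missing=1
        else
            echo "set:     $var"
        fi
    done
    # Non-zero exit status when at least one setting is absent.
    return "$missing"
}

# Example: check_connect_env || echo "fix the settings above first"
```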
Database Connection Error
Symptoms:
- Application fails to start
- 500 errors on all pages
- Database connection errors in logs
Solutions:
- Verify DATABASE_URL environment variable
- Check PostgreSQL service status
- Verify database user permissions
- Test connection manually with psql
- Check database server disk space
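A malformed or empty DATABASE_URL is quick to rule out before looking at the server. This is a minimal sketch that checks only the shape of the value. It assumes a libpq-style URL (postgresql:// or postgres://), which is an assumption about this deployment, and it does not contact the database.

```shell
# Check that a connection URL is non-empty and uses a PostgreSQL scheme.
# Validates the shape only; no connection is attempted.
validate_database_url() {
    url=${1:-}
    case "$url" in
        "")
            echo "DATABASE_URL is empty or unset"
            return 1 ;;
        postgresql://*|postgres://*)
            echo "DATABASE_URL scheme looks valid"
            return 0 ;;
        *)
            echo "DATABASE_URL does not start with postgresql:// or postgres://"
            return 1 ;;
    esac
}

# Example: if the shape is fine, try a real connection with the same URL.
# validate_database_url "$DATABASE_URL" && psql "$DATABASE_URL" -c "SELECT 1;"
```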
Rule Evaluation Issues
Rules Not Triggering
Symptoms:
- No events being created
- Instance appears deployed but inactive
- Stream values changing but no rule evaluation
Solutions:
- Verify instance is properly deployed
- Check placeholder mappings are correct
- Ensure stream names match exactly
- Verify rule conditions and operators
- Check for missing required placeholders
- Review monitoring engine logs
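Stream name mismatches are often invisible: a trailing space or a difference in letter case. The helper below is an illustrative sketch, not part of the application. It compares a mapped name with a discovered one and says whether the difference is only case or whitespace. The stream names in the example are made up.

```shell
# Compare a mapped stream name with a discovered one and explain a mismatch.
compare_stream_names() {
    mapped=$1
    actual=$2
    if [ "$mapped" = "$actual" ]; then
        echo "exact match"
        return 0
    fi
    # Normalize both names: remove all whitespace, then lowercase.
    norm_mapped=$(printf '%s' "$mapped" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    norm_actual=$(printf '%s' "$actual" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    if [ "$norm_mapped" = "$norm_actual" ]; then
        echo "mismatch: differs only in case or whitespace"
    else
        echo "mismatch: names are different"
    fi
    return 1
}

# Example (hypothetical names): compare_stream_names "Pump1.Flow " "pump1.flow"
```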
Placeholder Resolution Errors
Symptoms:
- Warning messages about unmapped placeholders
- Event names contain literal placeholder text
- Rules fail to evaluate properly
Solutions:
- Check all placeholders are mapped in instance
- Verify stream names exist and are accessible
- Use system tags for automatic resolution
- Check placeholder data type compatibility
- Review template placeholder definitions
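To see which placeholders are affected, the warnings can be tallied from the application log. The sketch below assumes the log contains the message text listed in the Error Message Reference ("Placeholder {name} not mapped in instance {id}") and that names and IDs contain no spaces. The rest of the log line layout is unknown, so only that fragment is matched.

```shell
# Count "Placeholder <name> not mapped in instance <id>" messages per
# placeholder/instance pair, most frequent first.
summarize_unmapped_placeholders() {
    logfile=$1
    grep -o 'Placeholder [^ ]* not mapped in instance [^ ]*' "$logfile" \
        | sort | uniq -c | sort -rn
}

# Example: summarize_unmapped_placeholders /var/log/race-console/app.log
```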
Event Management Issues
Events Not Closing
Symptoms:
- Multiple overlapping events for same condition
- Events remain active indefinitely
- Event count continuously growing
Solutions:
- Check rule stop_processing configuration
- Verify event closure conditions
- Review category-based event closure logic
- Manually close events if needed
- Check for rule evaluation errors
Event Wall Not Loading
Symptoms:
- Event Wall shows empty or loading state
- JavaScript errors in browser console
- API endpoints returning errors
Solutions:
- Check browser JavaScript console for errors
- Verify API endpoints are accessible
- Clear browser cache and cookies
- Check network connectivity
- Verify template and event data integrity
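To verify the endpoints without a browser, the HTTP status codes can be checked from the server itself. This sketch uses the base URL and paths that appear in the Internal API Test section. Whether the Event Wall depends on exactly these endpoints is an assumption.

```shell
# Print the HTTP status code returned by each internal endpoint.
check_endpoints() {
    base=${1:-http://localhost:5000}
    failed=0
    for endpoint in /api/instances /api/streams/active "/api/rule-events?hours=1"; do
        # -s hides progress, -o discards the body, -w prints the status code.
        # curl reports 000 when no HTTP response was received.
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base$endpoint") || code=000
        echo "$code $endpoint"
        case "$code" in
            2??) ;;            # any 2xx status counts as healthy
            *) failed=1 ;;
        esac
    done
    return "$failed"
}

# Example: check_endpoints || grep -i error /var/log/race-console/app.log
```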
Diagnostic Procedures
System Health Check
1. Service Status
# Check application service
systemctl status race-console
# Check database service
systemctl status postgresql
# Check web server
systemctl status nginx
# View recent logs
journalctl -u race-console -n 50
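The three status checks above can be condensed into a single pass that prints one line per service. This is a small sketch using the unit names from this guide. The function name is illustrative.

```shell
# Report whether each service named in this guide is active.
check_services() {
    failed=0
    for svc in race-console postgresql nginx; do
        # is-active prints the unit state (active, inactive, failed, ...)
        # and exits non-zero unless the unit is active.
        state=$(systemctl is-active "$svc" 2>/dev/null) || failed=1
        echo "$svc: ${state:-unknown}"
    done
    return "$failed"
}

# Example: check_services || journalctl -u race-console -n 50
```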
2. Database Connectivity
# Test database connection
psql -h localhost -U race_app -d race_console_prod -c "SELECT version();"
# Check active connections
psql -h localhost -U race_app -d race_console_prod -c "SELECT count(*) FROM pg_stat_activity;"
# Check table status
psql -h localhost -U race_app -d race_console_prod -c "\dt"
API Connectivity Testing
3. CONNECT API Test
# Test API endpoint (replace {tenant} and {access_token} with real values)
curl -X GET \
"https://euno.datahub.connect.aveva.com/api/v1/Tenants/{tenant}/Namespaces" \
-H "Authorization: Bearer {access_token}"
# Check DNS resolution
nslookup euno.datahub.connect.aveva.com
# Test network connectivity
ping -c 4 euno.datahub.connect.aveva.com
4. Internal API Test
# Test internal endpoints
curl -X GET http://localhost:5000/api/instances
curl -X GET http://localhost:5000/api/streams/active
curl -X GET "http://localhost:5000/api/rule-events?hours=1"
# Check application health
curl -X GET http://localhost:5000/
Log Analysis
5. Application Logs
# View application logs
tail -f /var/log/race-console/app.log
# Search for errors
grep -i error /var/log/race-console/app.log
# Search for specific issues
grep -i "connection" /var/log/race-console/app.log
grep -i "rule_engine" /var/log/race-console/app.log
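When several keywords are of interest, a count per keyword gives a quick picture before reading individual lines. The helper below is a sketch with an illustrative name. It takes the log file followed by any number of keywords.

```shell
# Count case-insensitive matching lines for each keyword in a log file.
count_log_keywords() {
    logfile=$1
    shift
    for keyword in "$@"; do
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so its exit status is ignored here.
        count=$(grep -ci -- "$keyword" "$logfile") || true
        echo "$keyword: $count"
    done
}

# Example: count_log_keywords /var/log/race-console/app.log error connection rule_engine
```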
6. System Logs
# Check system logs
dmesg | tail -50
# Check disk space
df -h
# Check memory usage
free -h
# Check process status
ps aux | grep '[r]ace'
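The disk space check can be turned into a pass/fail test for use in scripts. This is a minimal sketch. The 90 percent default threshold is an illustrative value, not a documented limit for this application.

```shell
# Warn when the filesystem holding a path is at or above a usage threshold.
check_disk_usage() {
    target=${1:-/}
    threshold=${2:-90}   # percent; illustrative default
    # df -P prints one POSIX-format line per filesystem; field 5 is "NN%".
    used=$(df -P "$target" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $target is ${used}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $target is ${used}% full"
}

# Example: check_disk_usage /var/log 85
```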
Performance Issues
Slow Dashboard Loading
- Database Query Optimization: Check for long-running queries
- Index Management: Ensure proper database indexes
- Connection Pooling: Optimize database connection settings
- Caching: Use Redis to cache frequently accessed data
- Asset Optimization: Minify CSS/JS files
High Memory Usage
- Memory Leaks: Monitor for gradual memory increase
- Large Datasets: Implement pagination for large result sets
- Background Tasks: Optimize monitoring engine batch sizes
- Session Management: Configure session timeout and cleanup
- Process Restart: Schedule regular application restarts
Monitoring Engine Issues
- Polling Frequency: Adjust monitoring intervals
- Batch Processing: Optimize stream processing batch sizes
- API Rate Limits: Respect CONNECT API rate limits
- Error Handling: Implement proper retry mechanisms
- Resource Management: Monitor background process resources
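The retry advice above can be sketched as a generic wrapper with exponential backoff. The monitoring engine's own retry logic is not shown in this guide, so treat this as an illustration for scripts that call the API. The function name and the attempt and delay values are hypothetical.

```shell
# Run a command up to max_attempts times, doubling the delay after each failure.
retry_with_backoff() {
    max_attempts=$1
    delay=$2            # initial delay in seconds
    shift 2
    attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example: up to 5 tries with waits of 2, 4, 8 and 16 seconds.
# retry_with_backoff 5 2 curl -sf http://localhost:5000/api/streams/active
```

Doubling the delay spaces out repeated calls, which also helps stay within API rate limits.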
Database Performance
- Query Optimization: Analyze slow queries
- Index Strategy: Create indexes for frequently queried columns
- Data Retention: Implement automatic cleanup of old events
- Connection Pooling: Configure optimal pool sizes
- Regular Maintenance: Run VACUUM and ANALYZE on a regular schedule
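Slow queries can be spotted through PostgreSQL's pg_stat_activity view. The sketch below only builds the SQL text, which is then passed to psql using the connection parameters from the Database Connectivity section. The five minute default is an illustrative value.

```shell
# Print a query listing non-idle sessions whose current statement has been
# running longer than the given number of minutes.
long_running_query_sql() {
    minutes=${1:-5}
    cat <<EOF
SELECT pid, now() - query_start AS duration, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '$minutes minutes'
ORDER BY duration DESC;
EOF
}

# Example:
# psql -h localhost -U race_app -d race_console_prod -c "$(long_running_query_sql 5)"
```

A non-superuser role normally sees the query text only for its own sessions, so run this as a role with sufficient privileges if other users' queries matter.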
Error Message Reference
Common Error Messages
| Error Message | Cause | Solution |
|---|---|---|
| Template cannot be modified - has deployed instances | Attempting to edit template with active instances | Undeploy all instances before modifying template |
| Placeholder {name} not mapped in instance {id} | Instance missing required placeholder mapping | Add missing placeholder mapping to instance |
| Stream {name} not found or accessible | Referenced stream doesn't exist or is unavailable | Verify stream name and API connectivity |
| OAuth2 authentication failed | Invalid CONNECT API credentials | Verify client ID, secret, and permissions |
| Database connection timeout | Database server overloaded or network issues | Check database performance and network |
| Rule evaluation error: {details} | Error in rule condition evaluation | Check rule syntax and data types |
Recovery Procedures
Application Recovery
# Restart application service
sudo systemctl restart race-console
# Clear application cache
sudo rm -rf /opt/race-console/cache/*
# Reset to safe configuration
sudo cp /opt/race-console/config/app_config.json.backup \
/opt/race-console/config/app_config.json
# Check service status
sudo systemctl status race-console
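Resetting to the safe configuration overwrites the current app_config.json, which may be needed later to work out what went wrong. A small sketch for keeping a timestamped copy first. The function name is illustrative, and the copy is written next to the original file.

```shell
# Copy a file to <file>.<UTC timestamp> and print the name of the copy.
save_copy() {
    src=$1
    dest="$src.$(date -u +%Y%m%dT%H%M%SZ)"
    # -p preserves mode and timestamps of the original.
    cp -p -- "$src" "$dest" && echo "$dest"
}

# Example (run with sufficient privileges for the config directory):
# save_copy /opt/race-console/config/app_config.json
```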
Database Recovery
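# Restore from backup (a plain-text .sql dump is replayed with psql;
# pg_restore reads only custom, directory, or tar archives made by pg_dump)
psql -h localhost -U race_app -d race_console_prod \
-f /backup/database_20250807.sql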
# Rebuild indexes
psql -h localhost -U race_app -d race_console_prod \
-c "REINDEX DATABASE race_console_prod;"
# Update statistics
psql -h localhost -U race_app -d race_console_prod \
-c "ANALYZE;"
Emergency Procedures
- Stop Monitoring: Temporarily disable background monitoring
- Safe Mode: Start application in read-only mode
- Data Export: Export critical configuration data
- Contact Support: Escalate to system administrators
- Document Issue: Record problem details for analysis
Prevention Checklist
- Regular backup verification
- Monitoring system health metrics
- Scheduled maintenance windows
- Configuration change documentation
- Staff training on recovery procedures
- Emergency contact information
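Part of backup verification can be automated by checking that a recent, non-empty backup file exists. The sketch below assumes backups are named database_* as in the Database Recovery example and uses a one day default age, both of which are assumptions. It does not prove the backup is restorable, so a periodic test restore is still needed.

```shell
# Succeed if the directory holds at least one non-empty backup file
# modified within the last max_age_days days.
check_recent_backup() {
    dir=$1
    max_age_days=${2:-1}
    # -size +0 skips empty files; -mtime -N keeps files newer than N days.
    recent=$(find "$dir" -maxdepth 1 -type f -name 'database_*' \
                  -size +0 -mtime "-$max_age_days" | head -n 1)
    if [ -n "$recent" ]; then
        echo "OK: recent backup found: $recent"
        return 0
    fi
    echo "WARNING: no non-empty backup newer than $max_age_days day(s) in $dir"
    return 1
}

# Example: check_recent_backup /backup 1
```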