Debugging & Logging - Probes

Monitor server health with probes, thresholds, and automated alerts

Last updated: October 27, 2025

Overview

Probes are monitoring endpoints and health check mechanisms in ColdFusion that continuously track server performance and resource usage. They enable proactive monitoring by detecting issues before they impact users, triggering alerts when configurable thresholds are exceeded, and providing real-time visibility into server health metrics.

Properly configured probes are essential for production environments, enabling operations teams to respond quickly to performance degradation, resource exhaustion, or other issues. Probes can monitor CPU usage, memory consumption, request queue depth, active sessions, and various other metrics, sending notifications or taking automated actions when problems occur.

Probe Types and Metrics

Configure monitoring for critical server resources and performance indicators.

CPU Usage Monitoring

Metric: Percentage of CPU utilization by the ColdFusion process
Threshold: Configurable percentage (e.g., alert at 80%)
Duration: Sustained high CPU vs. temporary spikes
Use Case: Detect runaway requests, inefficient code, high load
Recommended: Warning at 70%, critical at 85%

Best Practice: Configure two-tier alerting, with a warning level set below the critical level, so the team can respond before service is affected.
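
The two-tier idea can be sketched in a few lines of Python. This is a minimal illustration, not part of ColdFusion: the `Severity` enum and `classify` function are hypothetical names, and the thresholds are the recommended CPU values from this section.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float) -> Severity:
    """Map a metric reading to a severity using two ascending thresholds."""
    if warning >= critical:
        raise ValueError("warning threshold must be below critical threshold")
    if value >= critical:
        return Severity.CRITICAL
    if value >= warning:
        return Severity.WARNING
    return Severity.OK


# Recommended CPU thresholds from this section: warning 70%, critical 85%.
for cpu_percent in (42.0, 72.5, 91.0):
    print(cpu_percent, classify(cpu_percent, warning=70.0, critical=85.0).value)
```

The same function works for any metric where higher values are worse, such as heap usage or queue depth.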

Memory Usage Monitoring

Metric: JVM heap usage percentage
Threshold Types: Used memory, free memory, garbage collection frequency
Critical Levels: Alert before an OutOfMemoryError occurs
GC Monitoring: Track garbage collection time and frequency
Recommended: Warning at 75% heap, critical at 90%

Critical: Monitor both heap usage and GC frequency. Frequent garbage collection (even with available heap) indicates memory pressure.

Request Queue Monitoring

Metric: Number of queued requests waiting for processing
Indicator: Server overload or slow requests blocking threads
Impact: Queue buildup causes response time degradation
Recommended: Warning at 10 queued, critical at 25
Root Causes: Insufficient threads, slow database, external service delays

Tip: A persistent queue, even a small one, indicates that you need more capacity or faster request processing. Ideally, the queue should be empty most of the time.

Other Metrics

Active Sessions

Metric: Number of active user sessions
Use Cases: Detect session explosion, potential memory issues
Security: Unusual spikes may indicate attack

Track session growth over time for capacity planning and security monitoring.

Active Threads

Metric: Number of threads actively processing requests
Max Threads: Compare to configured maximum
Threshold: Warning at 80% utilization

Alert on thread exhaustion: when all threads are busy, the server has reached its request-processing capacity.

Database Connection Pool

Metric: Active vs. available database connections
Monitor: Each datasource independently
Detect: Connection leaks and pool exhaustion

Alert when no connections are available, or trigger a connection pool reset.

Request Execution Time

Metric: Average and maximum request duration
Threshold: Alert on requests over 5-10 seconds
Correlation: Link to other metrics (DB time, queue depth)

Identify slow pages and detect performance degradation over time.

Configuring Probe Endpoints

Set up HTTP endpoints and APIs for external monitoring and health checks.

Health Check URLs

Purpose: HTTP endpoints returning server health status
Response Format: HTTP status code (200 OK, 503 Service Unavailable)
Example URL: /CFIDE/probe.cfm or custom endpoint

Restrict access by IP or require authentication. Used by load balancers and external monitoring.
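
From the monitoring side, a health check reduces to "did the endpoint answer 200?". The sketch below shows one way an external checker might interpret the response using only the Python standard library. The `is_healthy` name, the 2-second timeout, and the URL in the usage comment are assumptions for illustration.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout_s: float = 2.0) -> bool:
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises for error statuses such as 503 Service Unavailable.
        return False
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as unhealthy.
        return False


# Usage (hypothetical host):
# is_healthy("https://cf-prod-01.example.com/CFIDE/probe.cfm")
```

Treating timeouts and connection errors as unhealthy keeps the checker conservative: a server that cannot answer its probe in time is unlikely to serve user requests well.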

Built-in Monitoring API

Access: Programmatic access to all metrics
Permissions: Requires administrator credentials
Format: Structured data (arrays, structs)

Use the ColdFusion Administrator API to feed custom monitoring scripts and dashboards.

Custom Probe CFCs

Implementation: Create custom CFC for application-specific checks
Checks: Database connectivity, external service availability, cache health
Return Status: Boolean healthy/unhealthy or detailed metrics

Keep probe logic lightweight and fast (under 100ms execution time).

Threshold Configuration

Setting Appropriate Thresholds

Baseline Measurement: Monitor normal operations first
Peak vs. Average: Account for traffic spikes
Warning vs. Critical: Two-tier alerting (warning at 75%, critical at 90%)
Sustained vs. Spike: Require threshold exceeded for duration (e.g., 2 minutes)
Seasonal Patterns: Adjust for known high-traffic periods
Growth Accommodation: Review and adjust thresholds quarterly
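
The "sustained vs. spike" rule can be sketched as a small state holder that fires only after a metric has stayed at or above its threshold for a hold period. The `SustainedBreach` class and the sample data are hypothetical; timestamps are plain seconds supplied by the caller.

```python
from typing import Optional


class SustainedBreach:
    """Fires only after a metric stays at or above a threshold for a hold period."""

    def __init__(self, threshold: float, hold_seconds: float) -> None:
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._breach_started: Optional[float] = None

    def update(self, value: float, now: float) -> bool:
        """Record one sample taken at time `now` (seconds); True if the breach is sustained."""
        if value < self.threshold:
            self._breach_started = None  # any dip below the threshold resets the clock
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.hold_seconds


# Warning at 75%, required to hold for 2 minutes.
detector = SustainedBreach(threshold=75.0, hold_seconds=120.0)
samples = [(0, 80.0), (60, 82.0), (90, 60.0), (120, 78.0), (180, 79.0), (240, 81.0)]
for t, cpu in samples:
    print(t, cpu, detector.update(cpu, now=t))
```

In this run the dip at 90 seconds resets the timer, so only the final sample (120 seconds after the breach restarted) reports a sustained breach.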

CPU Thresholds

Warning: 70-75% sustained for 2+ minutes
Critical: 85-90% sustained for 5+ minutes
Spike Tolerance: Brief 95%+ acceptable during batch jobs
Multi-Core: Monitor per-core if possible

Memory Thresholds

Warning: 75% heap usage
Critical: 90% heap usage or frequent full GC
GC Threshold: Alert if GC taking >10% of CPU time
Prevention: Set thresholds to prevent OOM errors
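
As a rough sketch of how these memory rules combine, the functions below compute heap usage and GC overhead from raw counters and then apply the thresholds listed here. All function names are hypothetical. GC overhead is approximated as the share of a sampling interval spent in garbage collection, and a GC breach is treated as critical; both are assumptions of this example.

```python
def heap_percent(used_bytes: int, max_bytes: int) -> float:
    """Heap usage as a percentage of the maximum heap size."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return 100.0 * used_bytes / max_bytes


def gc_overhead_percent(gc_ms_start: int, gc_ms_end: int, interval_ms: int) -> float:
    """Share of a sampling interval spent in GC, from two cumulative GC-time readings."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * (gc_ms_end - gc_ms_start) / interval_ms


def memory_status(heap_pct: float, gc_pct: float) -> str:
    """Apply the thresholds above: 75% warning, 90% critical, GC over 10%."""
    if heap_pct >= 90.0 or gc_pct > 10.0:
        return "critical"
    if heap_pct >= 75.0:
        return "warning"
    return "ok"


# Heap looks fine at 60%, but 1.5 s of GC in a 10 s window signals memory pressure.
heap = heap_percent(used_bytes=6 * 1024**3, max_bytes=10 * 1024**3)
gc = gc_overhead_percent(gc_ms_start=1000, gc_ms_end=2500, interval_ms=10000)
print(heap, gc, memory_status(heap, gc))
```

The example shows why heap usage alone is not enough: the heap reading is well below the warning level, yet the GC overhead pushes the status to critical.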

Request Queue Thresholds

Warning: 5-10 queued requests
Critical: 25+ queued requests
Zero Queue Goal: Ideally, queue should be empty most of the time
Capacity Planning: Persistent queue indicates need for scaling

Alerting Configuration

Email Notifications

Configuration: Recipient addresses for alerts
Content: Metric value, threshold, timestamp, server identifier
Frequency: Rate limiting to prevent email flooding
Escalation: Different recipients for warning vs. critical
Recovery: Send notification when issue resolved

SNMP Traps

Purpose: Integration with enterprise monitoring systems
Configuration: SNMP manager address and community string
Trap OIDs: Unique identifiers for each metric type
Severity Mapping: Map thresholds to SNMP severity levels
Standards: Follow SNMP v2c or v3 protocols

Custom Actions

Script Execution: Run custom scripts on threshold breach
HTTP Webhooks: POST to external API or service
Auto-Remediation: Restart services, clear caches, kill long-running requests
Logging: Write detailed entries to probe log file
Integration: Trigger PagerDuty, Slack, or other notification services
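
A webhook action is typically a JSON `POST`. The sketch below builds such a request with the Python standard library, carrying the alert fields listed under Email Notifications (metric value, threshold, timestamp, server identifier). The function name, field names, and URL are hypothetical; real services such as PagerDuty or Slack define their own payload formats.

```python
import json
import urllib.request


def build_webhook_request(url: str, metric: str, value: float, threshold: float,
                          server: str, timestamp: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST describing a threshold breach."""
    payload = {
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "server": server,
        "timestamp": timestamp,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    "https://alerts.example.com/hooks/coldfusion",  # hypothetical receiver URL
    metric="cpu_percent",
    value=91.0,
    threshold=85.0,
    server="cf-prod-01",
    timestamp="2025-10-27T14:05:00Z",
)
print(request.get_method(), request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request, timeout=5)
```

Building the request separately from sending it makes the payload easy to inspect and test without a live receiver.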

Alert Tuning

Reduce Noise: Eliminate false positive alerts
Rate Limiting: One alert per issue per time period
Quiet Hours: Suppress non-critical alerts during off-hours
Dependency Awareness: Don't alert on secondary issues caused by a primary problem
Escalation Path: Progressive notifications if issue persists
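
Rate limiting "one alert per issue per time period" can be sketched as a lookup of when each issue last produced a notification. The `AlertRateLimiter` class and the issue keys are hypothetical; timestamps are plain seconds supplied by the caller.

```python
from typing import Dict


class AlertRateLimiter:
    """Allows at most one notification per issue key within a minimum interval."""

    def __init__(self, min_interval_seconds: float) -> None:
        self.min_interval_seconds = min_interval_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, issue_key: str, now: float) -> bool:
        """Return True and record the send time if this issue is not being throttled."""
        last = self._last_sent.get(issue_key)
        if last is not None and now - last < self.min_interval_seconds:
            return False
        self._last_sent[issue_key] = now
        return True


limiter = AlertRateLimiter(min_interval_seconds=300.0)
print(limiter.should_send("cf-prod-01:cpu", now=0.0))     # first alert for this issue
print(limiter.should_send("cf-prod-01:cpu", now=120.0))   # same issue, inside the window
print(limiter.should_send("cf-prod-01:heap", now=120.0))  # different issue, own window
print(limiter.should_send("cf-prod-01:cpu", now=301.0))   # window has elapsed
```

Keying on server plus metric means a CPU alert never suppresses a memory alert, while repeats of the same alert are held back until the window expires.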

Integration with Monitoring Tools

Load Balancer Health Checks

Purpose: Automatically remove unhealthy servers from rotation
Health Check URL: Probe endpoint returning 200 when healthy
Check Frequency: Every 5-30 seconds
Failure Threshold: Remove after 2-3 consecutive failures
Recovery: Re-add to pool after successful checks
Drain Mode: Stop sending new requests but allow existing to complete
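
The failure-threshold and recovery rules can be sketched as a small state machine that counts consecutive results. This is an illustration of the general pattern rather than any particular load balancer's behavior; the `BackendHealth` class and its default thresholds are assumptions.

```python
class BackendHealth:
    """Tracks whether a server stays in rotation, based on consecutive check results."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 2) -> None:
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.in_rotation = True
        self._failures = 0
        self._successes = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the current rotation state."""
        if check_passed:
            self._failures = 0
            self._successes += 1
            if not self.in_rotation and self._successes >= self.recover_threshold:
                self.in_rotation = True
        else:
            self._successes = 0
            self._failures += 1
            if self.in_rotation and self._failures >= self.fail_threshold:
                self.in_rotation = False
        return self.in_rotation


backend = BackendHealth(fail_threshold=3, recover_threshold=2)
checks = [True, False, False, True, False, False, False, True, True]
print([backend.record(result) for result in checks])
```

In this run two failures followed by a success do not remove the server; only the later run of three consecutive failures does, and two consecutive successes bring it back.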

APM Integration

FusionReactor: Native CF APM with extensive probes
New Relic: Cloud APM with custom metric support
AppDynamics: Enterprise APM platform
Datadog: Infrastructure and application monitoring
Custom Metrics: Push CF probe data to APM systems

Log Aggregation

Splunk: Index and search CF probe logs
ELK Stack: Elasticsearch, Logstash, Kibana
CloudWatch: AWS native log monitoring
Structured Logging: JSON format for easy parsing
Correlation: Link probe events with application logs
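
A structured probe log entry might look like the sketch below: one JSON object per line, with an optional identifier for correlating against application logs. The function name and field names (including `request_id`) are hypothetical and should be adapted to whatever the log pipeline expects.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def probe_log_line(server: str, metric: str, value: float, threshold: float,
                   severity: str, request_id: Optional[str] = None,
                   when: Optional[datetime] = None) -> str:
    """Format one probe event as a single-line JSON record."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "server": server,
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "severity": severity,
    }
    if request_id is not None:
        record["request_id"] = request_id  # correlation key shared with application logs
    return json.dumps(record, sort_keys=True)


print(probe_log_line("cf-prod-01", "queue_depth", 27, 25, "critical", request_id="req-1234"))
```

One object per line keeps the file easy to ingest for tools that parse logs line by line, and sorted keys make entries stable for diffing.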

Dashboards and Visualization

Grafana: Open-source dashboard platform
Custom Dashboards: Build application-specific views
Real-Time: Live updating metrics
Historical: Trend analysis and capacity planning
Alerts: Visual indication of threshold breaches

Best Practices

Production Monitoring

Enable probes on all production servers
Route alerts to on-call team members
Set warning thresholds before critical levels
Test alert delivery regularly
Document alert response procedures
Review probe logs during post-mortems
Correlate probe alerts with application errors
Use probes for capacity planning decisions

Threshold Configuration

Start with conservative thresholds
Monitor for false positives and adjust
Account for normal traffic patterns
Set different thresholds for different times (peak vs. off-peak)
Review and update thresholds quarterly
Document threshold decisions and rationale

Alert Response

Create runbooks for common alert scenarios
Define escalation procedures
Track mean time to acknowledge (MTTA)
Track mean time to resolution (MTTR)
Conduct post-incident reviews
Continuously improve response procedures

Security Considerations

Restrict access to probe endpoints by IP
Require authentication for detailed metrics
Don't expose sensitive data in health checks
Use HTTPS for probe endpoints
Monitor for probe endpoint abuse or DDoS
Rate-limit probe endpoint access

Performance Impact

Keep probe logic lightweight (under 100ms execution time)
Cache probe results if appropriate (30-60 seconds)
Avoid heavy operations in health checks
Don't query large datasets for probes
Test probe performance under load
Monitor probe execution time itself
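
Caching a probe result for a short period can be sketched as a wrapper that re-runs the underlying check only after a time-to-live has expired. The `CachedProbe` class and the stand-in check are hypothetical; the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Optional


class CachedProbe:
    """Wraps an expensive check and reuses its result for `ttl_seconds`."""

    def __init__(self, check: Callable[[], bool], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._checked_at: Optional[float] = None
        self._value = False

    def result(self) -> bool:
        """Return the cached result, refreshing it if it is older than the TTL."""
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._value = self._check()
            self._checked_at = now
        return self._value


def database_reachable() -> bool:
    """Stand-in for a real connectivity check."""
    return True


probe = CachedProbe(database_reachable, ttl_seconds=30.0)
print(probe.result(), probe.result())  # the second call is served from the cache
```

The trade-off is staleness: with a 30-second TTL, a failure can go unreported for up to 30 seconds, so the TTL should stay below the alerting delay you can tolerate.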

Common Issues and Solutions

False Positive Alerts

Symptom: Alerts triggered without actual problems
Causes: Thresholds too aggressive, normal temporary spikes treated as problems
Solution: Increase thresholds or require sustained breach
Duration: Require metric above threshold for 2-5 minutes
Review: Analyze historical data to set better thresholds

Missed Issues

Symptom: Problems occur without probe alerts
Causes: Thresholds too permissive, wrong metrics monitored
Solution: Lower thresholds or add additional probes
Gap Analysis: Review incidents and ensure relevant probes exist
Comprehensive: Monitor all critical resource types

Alert Fatigue

Symptom: Team ignores alerts due to high volume
Causes: Too many low-priority alerts, duplicate notifications
Solution: Tune thresholds, implement rate limiting
Prioritization: Only alert on actionable issues
Escalation: Route non-critical alerts differently

Probe Endpoint Unavailable

Symptom: Health checks fail incorrectly
Causes: Web server issue, IP restriction, CF restart
Solution: Ensure probe endpoint is lightweight and reliable
Redundancy: Multiple health check mechanisms
Testing: Verify probe accessibility during deployments

Delayed Notifications

Symptom: Alerts arrive minutes or hours after issue
Causes: Probe check interval too long, email delays
Solution: Increase probe frequency, use faster notification methods
Real-Time: Use push notifications or webhooks instead of email
Monitoring: Monitor notification delivery time

Advanced Patterns

Composite Health Scores

Combine multiple metrics into overall health score
Weight metrics by criticality
Single number indicating overall system health (0-100)
Easy consumption by non-technical stakeholders
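
One possible way to compute such a score is sketched below: each metric is scaled to 0-100 between its warning and critical thresholds, and the results are combined as a weighted average. The scaling rule, the weights, and the function names are all assumptions of this example, not a standard formula.

```python
from typing import Dict, Tuple


def metric_health(value: float, warning: float, critical: float) -> float:
    """Scale a reading to 0-100: 100 at or below warning, 0 at or above critical."""
    if value <= warning:
        return 100.0
    if value >= critical:
        return 0.0
    return 100.0 * (critical - value) / (critical - warning)


def composite_score(readings: Dict[str, Tuple[float, float, float, float]]) -> float:
    """Weighted average of per-metric health; values are (value, warning, critical, weight)."""
    total_weight = sum(weight for _, _, _, weight in readings.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(metric_health(v, w, c) * weight for v, w, c, weight in readings.values())
    return weighted / total_weight


readings = {
    "cpu_percent": (77.5, 70.0, 85.0, 2.0),   # halfway between warning and critical
    "heap_percent": (60.0, 75.0, 90.0, 3.0),  # healthy
    "queue_depth": (30.0, 10.0, 25.0, 1.0),   # past critical
}
print(round(composite_score(readings), 1))
```

A weighted average can hide a single critical metric behind several healthy ones, so a composite score works best alongside per-metric alerts rather than as a replacement for them.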

Predictive Alerting

Analyze metric trends to predict issues
Alert on trajectory toward threshold (will hit in X minutes)
Machine learning for anomaly detection
Proactive intervention before user impact
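
The simplest form of trajectory-based alerting is a straight-line extrapolation. The sketch below fits a least-squares line to recent samples and estimates how many minutes remain before the threshold is reached. The function name and sample data are hypothetical, and a linear fit is only one of many possible models.

```python
from typing import Optional, Sequence, Tuple


def minutes_until_threshold(samples: Sequence[Tuple[float, float]],
                            threshold: float) -> Optional[float]:
    """Extrapolate (minute, value) samples linearly to the threshold.

    Returns minutes after the last sample, 0.0 if the last sample is already at
    or above the threshold, or None if the trend is flat, falling, or unknown.
    """
    if len(samples) < 2:
        return None
    last_t, last_v = samples[-1]
    if last_v >= threshold:
        return 0.0
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    return max(0.0, crossing_t - last_t)


# Heap usage rising one percentage point per minute, critical threshold at 90%.
heap_samples = [(0.0, 60.0), (5.0, 65.0), (10.0, 70.0), (15.0, 75.0)]
print(minutes_until_threshold(heap_samples, threshold=90.0))
```

With heap usage that follows a sawtooth pattern between garbage collections, fitting the line to post-GC readings only would likely give a more stable estimate than fitting to every sample.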

Auto-Remediation

Automatically respond to certain alert conditions
Clear caches, restart services, kill long-running requests
Implement circuit breakers for external dependencies
Log all automatic actions for audit trail
Notify team of automatic remediation actions
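
A circuit breaker for an external dependency can be sketched as follows: after a number of consecutive failures the breaker opens and calls are skipped, and after a cool-down period a trial call is allowed through. The `CircuitBreaker` class, its defaults, and the caller-supplied timestamps are assumptions of this minimal example.

```python
from typing import Optional


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold: int = 3, reset_seconds: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        """True if a call may be attempted at time `now` (seconds)."""
        if self._opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open state).
        return now - self._opened_at >= self.reset_seconds

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self, now: float) -> None:
        """Count a failure; open (or re-open) the breaker at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = now


breaker = CircuitBreaker(failure_threshold=3, reset_seconds=30.0)
for t in (0.0, 1.0, 2.0):
    breaker.record_failure(now=t)
print(breaker.allow(now=10.0), breaker.allow(now=32.0))
```

Each state change (opened, trial call, closed) is a natural point to write an audit log entry and notify the team, in line with the practices listed above.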

Dependency Tracking

Map dependencies between services
Suppress secondary alerts when primary service fails
Visualize cascade effects
Focus response on root cause
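
Suppressing secondary alerts can be sketched as a walk over a dependency map: a failing service is reported only if none of the services it depends on, directly or transitively, is also failing. The function names, service names, and map below are hypothetical.

```python
from typing import Dict, Set


def has_failing_dependency(service: str, failing: Set[str],
                           depends_on: Dict[str, Set[str]]) -> bool:
    """True if any direct or transitive dependency of `service` is failing."""
    seen = {service}  # guards against cycles in the dependency map
    stack = list(depends_on.get(service, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        if dep in failing:
            return True
        stack.extend(depends_on.get(dep, ()))
    return False


def root_cause_alerts(failing: Set[str], depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Keep only failing services whose own dependencies are all healthy."""
    return {s for s in failing if not has_failing_dependency(s, failing, depends_on)}


depends_on = {"web": {"app"}, "app": {"database", "cache"}}
failing = {"web", "app", "database"}
print(root_cause_alerts(failing, depends_on))
```

Here the alerts for "web" and "app" are suppressed because both sit downstream of the failing "database", which is the only service reported.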

Metrics to Monitor

Server-Level Metrics

CPU usage (process and system)
Memory usage (heap and non-heap)
Disk I/O and space
Network bandwidth
Thread count and state

Application-Level Metrics

Request queue depth
Active request count
Average/max request duration
Request throughput (requests/second)
Error rate and types

Database Metrics

Connection pool usage
Query execution time
Database server health
Connection errors
Query queue depth

Session Metrics

Active session count
Session creation rate
Session timeout rate
Average session size
Total session memory

Cache Metrics

Template cache hit ratio
Query cache hit ratio
Object cache size and hits
Cache eviction rate
Cache memory usage

Related Resources

Debugging & Logging - Debug Output Settings: Debug configuration and output control
Debugging & Logging - Logging Settings: Log file configuration and management
Server Settings - Java and JVM: Memory configuration and JVM tuning
Request Tuning: Thread and queue configuration for optimal performance
ColdFusion Administrator Reference: Complete administrator guide and settings reference