Process Scheduler Down
Process Scheduler Down Alert
Alert ID: process_scheduler_down
Category: Process Scheduler
Default threshold: 10 minutes
What This Alert Detects
This alert triggers when any active Process Scheduler server registered in PSSERVERSTAT has not reported a status update (heartbeat) within the configured amount of time.
The alert automatically ignores servers whose status is explicitly set to "1" (Down) or "7" (Suspended - Offline), as these represent intentionally stopped or offline schedulers. It will only flag active server configurations (e.g., Running, Suspended, Error, Overloaded) that have stalled or stopped updating.
Severity Logic
| Condition | Severity |
|---|---|
| Heartbeat stale by more than thresholdMinutes | Warning |
| Heartbeat stale by more than thresholdMinutes × 2 | Critical |
For example, with the default threshold of 10 minutes:
- A scheduler whose last heartbeat was 12 minutes ago → Warning
- A scheduler whose last heartbeat was 22 minutes ago → Critical
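
The severity rule above can be sketched in a few lines of Python. This is only an illustration of the documented thresholds, not the alert's actual implementation; the function name `classify_severity` and its parameters are hypothetical.

```python
from typing import Optional


def classify_severity(stale_minutes: float, threshold_minutes: float = 10) -> Optional[str]:
    """Map a heartbeat age to a severity, following the table above.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # Check the stricter condition first so that Critical wins over Warning.
    if stale_minutes > threshold_minutes * 2:
        return "Critical"
    if stale_minutes > threshold_minutes:
        return "Warning"
    # Heartbeat is within the threshold: no alert.
    return None
```

Both comparisons are strict, matching the "more than" wording in the table, so a heartbeat that is exactly `thresholdMinutes` old does not yet raise a Warning in this sketch.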
What Gets Checked
The alert queries the PSSERVERSTAT table to retrieve all server status definitions. For each active scheduler (status not Down/Offline), it calculates the elapsed time since its LASTUPDDTTM timestamp. If that time exceeds the configured threshold, the alert fires.
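
As a rough sketch of this check, assuming the rows have already been fetched from PSSERVERSTAT as dictionaries: the column name `SERVERSTATUS` for the status code and the function name `find_stale_schedulers` are assumptions here, since the page names only SERVERNAME, LASTUPDDTTM and SRVRHOSTNAME.

```python
from datetime import datetime, timedelta

# Status codes the alert ignores, per the description above:
# "1" (Down) and "7" (Suspended - Offline).
IGNORED_STATUSES = {"1", "7"}


def find_stale_schedulers(rows, now, threshold_minutes=10):
    """Return (server name, elapsed time) pairs for stale active schedulers.

    `rows` is an iterable of dicts keyed by column name. The key
    "SERVERSTATUS" is an assumed column name used for illustration.
    """
    threshold = timedelta(minutes=threshold_minutes)
    stale = []
    for row in rows:
        # Skip schedulers that are intentionally stopped or offline.
        if str(row["SERVERSTATUS"]) in IGNORED_STATUSES:
            continue
        # Elapsed time since the last heartbeat.
        elapsed = now - row["LASTUPDDTTM"]
        if elapsed > threshold:
            stale.append((row["SERVERNAME"], elapsed))
    return stale
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the sketch easy to test and avoids mixing the application clock with the database clock that writes LASTUPDDTTM.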
Alert Details
Each alert item includes:
- Server name (SERVERNAME)
- Current status code and friendly string status (e.g., Running, Error, Suspended)
- Last heartbeat timestamp (LASTUPDDTTM)
- Host name (SRVRHOSTNAME)
- A detailed explanation of how long the heartbeat has been stale
- A link to the Server Definition detail page for that server
Configuration
alerts:
  checks:
    process_scheduler_down:
      enabled: true
      thresholdMinutes: 10     # Minutes stale before flagging as Warning
      excludeProcesses:        # Server names (e.g., PSUNX, PSNT) to skip
        - PSUNX_OLD
| Setting | Default | Description |
|---|---|---|
| thresholdMinutes | 10 | Minutes of stale heartbeat status updates before a scheduler triggers a Warning alert. Critical fires at 2× this value. |
| excludeProcesses | [] | List of server names to exclude from this check. Use for retired scheduler definitions that linger in PSSERVERSTAT but aren’t cleaned up. |
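
One way to picture how these settings could combine with their defaults is sketched below. The helper names `resolve_settings` and `is_excluded` are hypothetical, and exact, case-sensitive name matching is an assumption; the page does not say how names are compared.

```python
# Documented defaults from the table above. The table lists no default
# for "enabled", so it is left out here.
DEFAULTS = {"thresholdMinutes": 10, "excludeProcesses": []}


def resolve_settings(user_config=None):
    """Overlay user-supplied values on the documented defaults (hypothetical helper)."""
    settings = dict(DEFAULTS)
    settings.update(user_config or {})
    return settings


def is_excluded(server_name, settings):
    """True if the server is listed in excludeProcesses.

    Assumes exact, case-sensitive matching on the server name.
    """
    return server_name in settings["excludeProcesses"]


# Example mirroring the YAML above.
settings = resolve_settings({"enabled": True, "excludeProcesses": ["PSUNX_OLD"]})
skip_old = is_excluded("PSUNX_OLD", settings)   # retired definition is skipped
skip_live = is_excluded("PSUNX", settings)      # active scheduler is still checked
```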
How to Respond
- Click the alert link to go directly to the Server Definition detail page for the affected scheduler.
- Check the Host Name where the Process Scheduler daemon runs.
- Access the server host and verify whether the Process Scheduler processes (e.g., psadmin, PSAESRV, etc.) are running.
- Review the Process Scheduler logs (e.g., TUXLOG, SCHED_*.LOG) on the host machine to diagnose why the process has hung or crashed.
- If the scheduler has hung, stop the process scheduler daemon and restart it using psadmin.
- If the server definition is obsolete or decommissioned, consider deleting it in PeopleSoft Server Definitions configuration to clean up the PSSERVERSTAT row.