Process Scheduler Down

Process Scheduler Down Alert

Alert ID: process_scheduler_down
Category: Process Scheduler
Default threshold: 10 minutes

What This Alert Detects

This alert triggers when any active Process Scheduler server registered in PSSERVERSTAT has not reported a status update (heartbeat) within the configured threshold (thresholdMinutes).

Severity Logic

Condition                                          Severity
Heartbeat stale by more than thresholdMinutes      Warning
Heartbeat stale by more than thresholdMinutes × 2  Critical

For example, with the default threshold of 10 minutes:

  • A scheduler that has not sent a heartbeat for 12 minutes → Warning
  • A scheduler that has not sent a heartbeat for 22 minutes → Critical
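
The severity rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's implementation: the names Severity and classify_staleness are hypothetical, and the strict "greater than" comparison at the exact boundary is an assumption taken from the "more than" wording in the table.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    WARNING = "Warning"
    CRITICAL = "Critical"


def classify_staleness(stale_minutes: float,
                       threshold_minutes: float = 10) -> Optional[Severity]:
    """Map heartbeat staleness to a severity, or None when within the threshold.

    Hypothetical helper: strict '>' comparisons are assumed from the
    "more than" wording of the severity table.
    """
    if stale_minutes > threshold_minutes * 2:
        return Severity.CRITICAL
    if stale_minutes > threshold_minutes:
        return Severity.WARNING
    return None


print(classify_staleness(12))  # Severity.WARNING
print(classify_staleness(22))  # Severity.CRITICAL
print(classify_staleness(5))   # None
```

Checking the Critical condition first keeps the two branches from overlapping, since any value above 2× the threshold is also above the threshold itself.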

What Gets Checked

The alert queries the PSSERVERSTAT table for all scheduler status rows. For each active scheduler (status other than Down or Offline), it calculates the time elapsed since the LASTUPDDTTM timestamp. If the elapsed time exceeds the configured threshold, the alert fires for that scheduler.
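
As a rough illustration of that check, the sketch below walks a list of rows that stand in for a PSSERVERSTAT query result. The row layout is an assumption: SERVERNAME and LASTUPDDTTM are the column names mentioned in this document, while the "status" key holding a friendly label, the INACTIVE_STATUSES set, and the function name find_stale_schedulers are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed friendly labels for schedulers that should not be checked.
INACTIVE_STATUSES = {"Down", "Offline"}


def find_stale_schedulers(rows, now, threshold_minutes=10):
    """Return (server name, minutes stale) for each active scheduler over the threshold."""
    stale = []
    for row in rows:
        if row["status"] in INACTIVE_STATUSES:
            continue  # inactive schedulers are skipped entirely
        elapsed_minutes = (now - row["LASTUPDDTTM"]).total_seconds() / 60
        if elapsed_minutes > threshold_minutes:
            stale.append((row["SERVERNAME"], elapsed_minutes))
    return stale


# Made-up sample rows, using a fixed "now" so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"SERVERNAME": "PSUNX", "status": "Running", "LASTUPDDTTM": now - timedelta(minutes=12)},
    {"SERVERNAME": "PSNT", "status": "Running", "LASTUPDDTTM": now - timedelta(minutes=3)},
    {"SERVERNAME": "PSUNX_OLD", "status": "Down", "LASTUPDDTTM": now - timedelta(days=30)},
]
print(find_stale_schedulers(rows, now))  # [('PSUNX', 12.0)]
```

Passing the current time in as a parameter, rather than reading the clock inside the function, makes the staleness calculation easy to test.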

Alert Details

Each alert item includes:

  • Server name (SERVERNAME)
  • Current status code and its human-readable label (e.g., Running, Error, Suspended)
  • Last heartbeat timestamp (LASTUPDDTTM)
  • Host name (SRVRHOSTNAME)
  • A detailed explanation of how long the heartbeat has been stale
  • A link to the Server Definition detail page for that server
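
One way to picture an alert item is as a small record holding the fields listed above. The sketch below is an assumed shape, not the actual payload: the class name, field names, explanation wording, status code value, and example URL are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SchedulerAlertItem:
    server_name: str          # SERVERNAME
    status_code: str          # raw status code (placeholder value below)
    status_label: str         # human-readable label, e.g. "Running"
    last_heartbeat: datetime  # LASTUPDDTTM
    host_name: str            # SRVRHOSTNAME
    explanation: str          # how long the heartbeat has been stale
    detail_link: str          # link to the Server Definition detail page


def describe_staleness(last_heartbeat, now, threshold_minutes):
    """Build a hypothetical explanation string for a stale heartbeat."""
    stale_minutes = int((now - last_heartbeat).total_seconds() // 60)
    return (
        f"Last heartbeat at {last_heartbeat:%Y-%m-%d %H:%M}; "
        f"stale for {stale_minutes} minutes (threshold: {threshold_minutes} minutes)."
    )


now = datetime(2024, 1, 1, 12, 0)
last = datetime(2024, 1, 1, 11, 38)
item = SchedulerAlertItem(
    server_name="PSUNX",
    status_code="<code>",                       # placeholder, not a real code value
    status_label="Running",
    last_heartbeat=last,
    host_name="batch01.example.com",            # made-up host
    explanation=describe_staleness(last, now, 10),
    detail_link="https://monitor.example.com/servers/PSUNX",  # made-up URL
)
print(item.explanation)
```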

Configuration

alerts:
  checks:
    process_scheduler_down:
      enabled: true
      thresholdMinutes: 10          # Minutes stale before flagging as Warning
      excludeProcesses:             # Server names (e.g., PSUNX, PSNT) to skip
        - PSUNX_OLD

Setting           Default  Description
thresholdMinutes  10       Minutes of stale heartbeat status updates before a scheduler triggers a Warning alert. Critical fires at 2× this value.
excludeProcesses  []       List of server names to exclude from this check. Use for retired scheduler definitions that linger in PSSERVERSTAT but aren’t cleaned up.
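
The sketch below shows one way these settings might be applied once the configuration has been loaded into a dictionary. It is an illustration under stated assumptions: the function names resolve_settings and is_checked are hypothetical, the default of true for enabled is assumed (only the other two defaults are documented), and exact, case-sensitive matching of server names is also an assumption.

```python
# Documented defaults, plus an assumed default for "enabled".
DEFAULTS = {"enabled": True, "thresholdMinutes": 10, "excludeProcesses": []}


def resolve_settings(config: dict) -> dict:
    """Merge the process_scheduler_down settings over the defaults."""
    check = (
        config.get("alerts", {}).get("checks", {}).get("process_scheduler_down")
        or {}
    )
    return {
        "enabled": check.get("enabled", DEFAULTS["enabled"]),
        "thresholdMinutes": check.get("thresholdMinutes", DEFAULTS["thresholdMinutes"]),
        "excludeProcesses": list(check.get("excludeProcesses", DEFAULTS["excludeProcesses"])),
    }


def is_checked(server_name: str, settings: dict) -> bool:
    """True when the check is enabled and the server is not excluded."""
    return settings["enabled"] and server_name not in settings["excludeProcesses"]


# The dictionary mirrors the YAML example in this section.
config = {
    "alerts": {
        "checks": {
            "process_scheduler_down": {
                "enabled": True,
                "thresholdMinutes": 10,
                "excludeProcesses": ["PSUNX_OLD"],
            }
        }
    }
}
settings = resolve_settings(config)
print(is_checked("PSUNX", settings))      # True
print(is_checked("PSUNX_OLD", settings))  # False
```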

How to Respond

  1. Click the alert link to go directly to the Server Definition detail page for the affected scheduler.
  2. Check the Host Name where the Process Scheduler daemon runs.
  3. Access the server host and verify whether the Process Scheduler processes (e.g., PSPRCSRV, PSAESRV) are running.
  4. Review the Process Scheduler logs (e.g., TUXLOG, SCHDLR_*.LOG) on the host machine to diagnose why the process has hung or crashed.
  5. If the scheduler has hung, stop the process scheduler daemon and restart it using psadmin.
  6. If the server definition is obsolete or decommissioned, consider deleting it in the PeopleSoft Server Definitions configuration to clean up the PSSERVERSTAT row, or add its server name to excludeProcesses until it is removed.