Process Scheduler
Process Scheduler alerts: long-running processes, process errors, backlogged processes, locked operators, and critical process monitoring.
Process Scheduler alerts monitor your PeopleSoft batch processing environment for errors, stalls, and missing critical runs.
| Alert | Description |
|---|---|
| Long-Running Processes | Processes that have been running (Initiated or Processing) longer than the configured threshold |
| Process Errors | Processes that ended in Error, Not Successful, or Unable to Post status within the lookback window |
| Backlogged Processes | Queued or blocked processes whose scheduled run time has passed by more than the configured threshold |
| Locked OPRID Scheduled Processes | Queued or scheduled processes where the submitting operator account is locked |
| Process Run Check | Configured critical processes that have not run successfully within their expected time window |
| No Process Completed | No process has successfully completed within the lookback window. May indicate the scheduler is down. |
| Process Scheduler Down | Schedulers that have not updated their heartbeat status in PSSERVERSTAT recently |
1 - Long-Running Processes
Long-Running Processes Alert
Alert ID: long_running_processes
Category: Process Scheduler
Default threshold: 20 minutes
What This Alert Detects
This alert finds Process Scheduler requests that are currently in Initiated or Processing status and have been running longer than the configured time threshold. A process that has been running for a long time may be stuck, consuming excessive server resources, or waiting on a lock or resource that will never become available.
Severity Logic
| Condition | Severity |
|---|---|
| Running longer than thresholdMinutes | Warning |
For example, with the default threshold of 20 minutes:
- A process that has been running for 25 minutes → Warning
What Gets Checked
The alert queries the Process Scheduler request table for processes in run status 6 (Initiated) or 7 (Processing). For each result, it calculates how long the process has been running based on its BeginDttm (begin datetime) and the current server time.
Processes with no BeginDttm value are skipped (the process hasn’t truly started yet).
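The check described above can be sketched in a few lines of Python. This is an illustration only, not the product's code: the function name `long_running_minutes` and its parameters are hypothetical, and it assumes the run status and begin datetime have already been read from the request table.

```python
from datetime import datetime
from typing import Optional

RUNNING_STATUSES = {6, 7}  # 6 = Initiated, 7 = Processing


def long_running_minutes(
    run_status: int,
    begin_dttm: Optional[datetime],
    now: datetime,
    threshold_minutes: int = 20,
) -> Optional[float]:
    """Return the elapsed minutes if the request should be flagged, else None."""
    if run_status not in RUNNING_STATUSES:
        return None  # only Initiated or Processing requests are considered
    if begin_dttm is None:
        return None  # no begin datetime: the process has not truly started
    elapsed = (now - begin_dttm).total_seconds() / 60
    return elapsed if elapsed > threshold_minutes else None


now = datetime(2024, 1, 1, 12, 0)
flagged = long_running_minutes(7, datetime(2024, 1, 1, 11, 35), now)
not_flagged = long_running_minutes(7, datetime(2024, 1, 1, 11, 45), now)
```

With the default threshold, the first call is flagged because it has been running for 25 minutes, and the second is not because it has been running for only 15.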
Alert Details
Each alert item includes:
- Process name (PRCSNAME)
- Process instance number
- How long the process has been running (in minutes)
- The operator who submitted the request
- A link to the Process Monitor detail page for that instance
Configuration
alerts:
  checks:
    long_running_processes:
      enabled: true
      thresholdMinutes: 20       # Minutes before flagging as Warning
      excludeProcesses:          # Process names to skip
        - SOME_LONG_BATCH_JOB
| Setting | Default | Description |
|---|---|---|
| thresholdMinutes | 20 | Minutes a process must be running to trigger a Warning alert. |
| excludeProcesses | [] | List of process names to exclude from this check. Use for known long-running processes that are expected to take a long time. |
How to Respond
- Click the alert link to go directly to the Process Monitor entry for the flagged process
- Review the process details: what it is, who submitted it, when it started
- Check whether the process appears to be making progress or is stuck
- If the process is genuinely stuck, you may need to cancel it from PeopleSoft’s Process Monitor
- Investigate why it got stuck: look for locks, resource contention, or data issues
Tuning the Threshold
The right threshold depends on your environment. If you have batch jobs that are expected to run for 30-60 minutes, set thresholdMinutes to something higher than your longest expected normal runtime. You can also use excludeProcesses to exclude specific jobs from the check rather than raising the threshold for everything.
2 - Process Errors
Process Errors Alert
Alert ID: process_errors
Category: Process Scheduler
Default lookback: 24 hours
What This Alert Detects
This alert finds Process Scheduler requests that have failed within a configurable lookback window. It catches processes that ended in one of three error statuses:
| Run Status | PeopleSoft Code | Meaning |
|---|---|---|
| Error | 3 | The process ended with an error condition |
| Not Successful | 10 | The process ran but reported a non-success result |
| Unable to Post | 12 | The process output could not be delivered |
Severity Logic
| Process Type | Status | Severity |
|---|---|---|
| Recurring (on a recurrence schedule) | Error (3), Not Successful (10), Unable to Post (12) | Critical |
| Non-Recurring (ad-hoc execution) | Error (3), Not Successful (10), Unable to Post (12) | Warning |
- Recurring Processes: Any failure fires Critical immediately.
- Non-Recurring Processes: Fire Warning after the thresholdMinutes grace period.
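A rough Python sketch of this severity split follows. The function name `process_error_severity` is hypothetical, and the sketch assumes the grace period is measured from the time the process ended; the source does not say which timestamp the grace period is measured from, so treat that as an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

FAILED_STATUSES = {3, 10, 12}  # Error, Not Successful, Unable to Post


def process_error_severity(
    run_status: int,
    is_recurring: bool,
    ended_at: datetime,
    now: datetime,
    threshold_minutes: int = 0,
) -> Optional[str]:
    """Return "Critical", "Warning", or None for a single request."""
    if run_status not in FAILED_STATUSES:
        return None
    if is_recurring:
        return "Critical"  # recurring failures fire immediately
    # Assumption: the grace period starts when the process ended.
    if now - ended_at >= timedelta(minutes=threshold_minutes):
        return "Warning"
    return None  # still inside the grace period


now = datetime(2024, 1, 1, 12, 0)
recurring = process_error_severity(3, True, now, now, threshold_minutes=15)
adhoc_recent = process_error_severity(
    10, False, now - timedelta(minutes=5), now, threshold_minutes=15
)
adhoc_older = process_error_severity(
    10, False, now - timedelta(minutes=20), now, threshold_minutes=15
)
```

Under these assumptions, the recurring failure is Critical at once, the ad-hoc failure from 5 minutes ago is not yet reported, and the one from 20 minutes ago is a Warning.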
Alert Details
Each alert item includes:
- Process name and instance number
- Run status label (Error, Not Successful, Unable to Post)
- The operator who submitted the request
- When the process ran
- A link to the Process Monitor detail page for that instance
Configuration
alerts:
  checks:
    process_errors:
      enabled: true
      lookbackHours: 24          # How far back to look for failures
      thresholdMinutes: 15       # Grace period buffer in minutes for non-recurring errors
      excludeProcesses:          # Process names to skip
        - KNOWN_FLAKY_PROCESS
| Setting | Default | Description |
|---|---|---|
| lookbackHours | 24 | Number of hours back to search for failed processes |
| thresholdMinutes | 0 | Grace period buffer (in minutes) for non-recurring process errors before they raise a Warning alert. |
| excludeProcesses | [] | List of process names to exclude from this check |
How to Respond
- Click the alert link to go directly to the Process Monitor entry for the failed process
- Review the process details: run status, begin and end times, server
- Look for output files or log information that might explain the failure
- Check whether this is a one-time failure or a repeating issue
- If the process needs to be rerun, submit a new request from PeopleSoft
Common Causes of Process Failures
- Data errors: The process encountered unexpected data (null values, bad formats, constraint violations)
- Resource issues: The server ran out of memory or disk space
- Timeout: The process exceeded its allowed run time
- Configuration problems: A required configuration parameter is missing or incorrect
- Dependency failures: A process that depends on an earlier process failed because the earlier one didn’t complete correctly
Reducing Alert Noise
If certain processes fail regularly and you’re already tracking them separately, add them to excludeProcesses to keep the alert list focused on unexpected failures.
3 - Backlogged Processes
Backlogged Processes Alert
Alert ID: backlogged_processes
Category: Process Scheduler
Default threshold: 30 minutes
What This Alert Detects
This alert finds Process Scheduler requests that are in Queued or Blocked status and whose scheduled run time (RUNDTTM) has already passed by more than the configured threshold.
Severity Logic
| Condition | Severity |
|---|---|
| Overdue by more than thresholdMinutes | Warning |
| Overdue by more than thresholdMinutes × 2 | Critical |
For example, with the default threshold of 30 minutes:
- A process scheduled 40 minutes ago that is still queued → Warning
- A process scheduled 65 minutes ago that is still queued → Critical
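The two-tier rule in the table can be written as a small Python function. The name `backlog_severity` is hypothetical and the sketch only covers the severity decision, not the query.

```python
from typing import Optional


def backlog_severity(overdue_minutes: float, threshold_minutes: int = 30) -> Optional[str]:
    """Map minutes overdue to a severity using the thresholdMinutes rule."""
    if overdue_minutes > threshold_minutes * 2:
        return "Critical"
    if overdue_minutes > threshold_minutes:
        return "Warning"
    return None  # not overdue enough to report


forty = backlog_severity(40)       # between 30 and 60 minutes overdue
sixty_five = backlog_severity(65)  # more than 60 minutes overdue
```

Because both comparisons are strict, a process that is exactly 30 minutes overdue is not yet flagged in this sketch.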
What Gets Checked
The alert queries the Process Scheduler request table for processes in run status 5 (Queued) or 18 (Blocked) whose RUNDTTM (scheduled run datetime) is in the past. For each result, it calculates how far past the scheduled time the process is based on RUNDTTM and the current server time.
Processes with no RUNDTTM value are skipped.
Alert Details
Each alert item includes:
- Process name (PRCSNAME)
- Process instance number
- How long the process is overdue (in minutes)
- Current run status (Queued or Blocked)
- The operator who submitted the request
- A link to the Process Monitor detail page for that instance
Configuration
alerts:
  checks:
    backlogged_processes:
      enabled: true
      thresholdMinutes: 30       # Minutes overdue before flagging as Warning
      excludeProcesses:          # Process names to skip
        - SOME_LOW_PRIORITY_JOB
| Setting | Default | Description |
|---|---|---|
| thresholdMinutes | 30 | Minutes past the scheduled run time before a queued/blocked process triggers a Warning alert. Critical fires at 2× this value. |
| excludeProcesses | [] | List of process names to exclude from this check. Use for processes that are known to queue for a long time and are not a concern. |
How to Respond
- Click the alert link to go directly to the Process Monitor entry for the flagged process
- Check whether the Process Scheduler server is running and accepting work
- Look at how many processes are currently running on the server. It may have hit its concurrency limit
- Check if the process type or class has reached its maximum allowed concurrent instances
- For blocked processes, investigate what is blocking them (dependencies, server restrictions, etc.)
- If the Process Scheduler server is down, restart it from PeopleSoft’s Process Scheduler administration
Tuning the Threshold
The right threshold depends on how busy your Process Scheduler is. In environments where many jobs are submitted at once, some queuing is normal. Set thresholdMinutes high enough to avoid false positives during peak batch windows but low enough to catch genuine problems. You can also use excludeProcesses to exclude specific low-priority processes that are known to queue for long periods.
4 - Locked OPRID Scheduled Processes
Locked OPRID Scheduled Processes Alert
Alert ID: locked_oprid_processes
Category: Process Scheduler
What This Alert Detects
This alert finds queued or scheduled Process Scheduler requests where the submitting operator’s account (OPRID) is currently locked in PSOPRDEFN (ACCTLOCK = 1).
When an operator account is locked after a process has been queued, PeopleSoft will either refuse to run the process or run it under the locked account and fail immediately. PeopleSoft does not surface this condition anywhere obvious: Process Monitor shows the job queued, the operator’s user page shows them locked, but nothing connects the two. This alert does.
Common scenarios:
- A service or batch account had its password expire and was locked
- An employee left and their account was locked, but scheduled jobs were not transferred
- A security lockout from failed login attempts affected a batch account
Severity Logic
All findings are reported at Warning severity. Every queued or scheduled process with a locked submitting account is flagged.
What Gets Checked
The alert queries PSPRCSRQST joined to PSOPRDEFN for process requests in Queued or Scheduled run status where the submitting OPRID has ACCTLOCK = 1.
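The shape of that join can be tried out against an in-memory SQLite database. This is a sketch, not the query the product runs: the helper `find_locked_oprid_requests` is hypothetical, the column types are simplified, and the demo passes only status 5 (Queued) because the source does not give a numeric code for the Scheduled status.

```python
import sqlite3


def find_locked_oprid_requests(conn, statuses):
    """Return requests in the given run statuses whose submitting OPRID is locked."""
    placeholders = ",".join("?" for _ in statuses)
    sql = f"""
        SELECT r.PRCSINSTANCE, r.PRCSNAME, r.OPRID, r.RUNSTATUS
        FROM PSPRCSRQST r
        JOIN PSOPRDEFN o ON o.OPRID = r.OPRID
        WHERE o.ACCTLOCK = 1
          AND r.RUNSTATUS IN ({placeholders})
        ORDER BY r.PRCSINSTANCE
    """
    return conn.execute(sql, list(statuses)).fetchall()


# Toy data: table and column names follow the text; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PSOPRDEFN (OPRID TEXT PRIMARY KEY, ACCTLOCK INTEGER);
    CREATE TABLE PSPRCSRQST (
        PRCSINSTANCE INTEGER, PRCSNAME TEXT, OPRID TEXT, RUNSTATUS INTEGER
    );
    INSERT INTO PSOPRDEFN VALUES ('BATCH_SVC', 1), ('JSMITH', 0);
    INSERT INTO PSPRCSRQST VALUES
        (101, 'NIGHTLY_ETL', 'BATCH_SVC', 5),
        (102, 'SOMEJOBNAME', 'JSMITH', 5),
        (103, 'ANOTHERJOB', 'BATCH_SVC', 9);
""")
rows = find_locked_oprid_requests(conn, [5])
```

Only instance 101 is returned here: instance 102 belongs to an unlocked operator, and instance 103 has already completed.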
Alert Details
Each alert item includes:
- Process name and instance number
- Submitting OPRID (with link to User detail page)
- Current run status (Queued, Scheduled, etc.)
- Scheduled run date/time
- Recurrence name (if applicable)
Configuration
alerts:
  checks:
    locked_oprid_processes:
      enabled: true
      excludeProcesses: []       # Process names to ignore
| Setting | Default | Description |
|---|---|---|
| excludeProcesses | [] | List of process names to exclude from this check |
How to Respond
- Click the alert link to open the Process Monitor detail page for the affected instance
- Identify the locked OPRID shown in the alert
- Navigate to the User detail page to review the account lock status
- Either unlock the account (if appropriate) or re-queue the process under an active operator account
- For recurring processes, update the recurrence definition to use a non-locked operator
- Investigate why the account was locked. If it was a failed login lockout, check the Failed Logins alert for additional context
Tables Queried
| Table | Description |
|---|---|
| PSPRCSRQST | Process Scheduler request queue |
| PSOPRDEFN | Operator definitions (user accounts) |
5 - Process Run Check
Process Run Check Alert
Alert ID: process_run_check
Category: Process Scheduler
What This Alert Detects
This alert monitors configured critical processes and fires when one has not completed successfully within its expected time window. It is the alert equivalent of the Process Run Check report. The difference is that this runs automatically on every check cycle and surfaces failures on the dashboard without any manual action.
Use this alert for processes that must run on a regular cadence, such as:
- Nightly batch jobs that must complete before business hours
- Data synchronization processes that run every few hours
- Critical integrations that should run multiple times per day
- Post-maintenance verification of essential processes
Severity Logic
| Condition | Severity |
|---|---|
| Process has run recently but not successfully in the configured window | Warning |
| Process has no run history at all | Critical |
Configuration
Process checks are configured per process name in config.yaml. Each entry specifies the process name and the number of hours within which a successful run is expected.
alerts:
  checks:
    process_run_check:
      enabled: true
      processChecks:
        SOMEJOBNAME: 24          # Must run successfully within 24 hours
        ANOTHERJOB: 8            # Must run successfully within 8 hours
        NIGHTLY_ETL: 12          # Must run successfully within 12 hours
| Setting | Default | Description |
|---|---|---|
| processChecks | {} | Map of process name to expected run window in hours |
If a process name is listed with 0 or a negative value, the check defaults to a 24-hour window.
What Gets Checked
For each configured process, psLens queries PSPRCSRQST for successful runs (RunStatus = 9 / Success) within the configured time window. If none are found, it then checks for any run history to determine severity:
- No successful run in window + recent run history found: Warning
- No run history at all: Critical
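The decision can be summarized in a short Python sketch. Both function names are hypothetical, and the counts are assumed to come from queries against PSPRCSRQST that are not shown here.

```python
from typing import Optional


def effective_window_hours(configured_hours: int) -> int:
    """A value of 0 or a negative value falls back to a 24-hour window."""
    return configured_hours if configured_hours > 0 else 24


def run_check_severity(successful_runs_in_window: int, total_runs: int) -> Optional[str]:
    """Return the severity for one configured process, or None if it is healthy."""
    if successful_runs_in_window > 0:
        return None  # a successful run exists inside the window
    if total_runs == 0:
        return "Critical"  # no run history at all
    return "Warning"  # it has run, but not successfully within the window


healthy = run_check_severity(successful_runs_in_window=1, total_runs=5)
failed_recently = run_check_severity(successful_runs_in_window=0, total_runs=3)
never_ran = run_check_severity(successful_runs_in_window=0, total_runs=0)
```

Checking for a successful run first keeps the second lookup (any run history at all) off the common path where the process is healthy.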
Alert Details
Each alert item includes:
- Process name
- Configured threshold (hours)
- Last known run status (if any history exists)
- Last known run time (if any history exists)
- Link to the Process Definition detail page
How to Respond
- Click the alert link to open the Process Definition detail page for the affected process
- Review recent run history to understand what happened. Did the process run but fail, or did it not run at all?
- Check the Process Scheduler server configuration if the process never ran
- Investigate error logs if the process ran but ended in a failed state
- If the process ran and succeeded but outside the expected window, consider adjusting the threshold in config.yaml
Tables Queried
| Table | Description |
|---|---|
| PSPRCSRQST | Process Scheduler request queue and run history |
6 - Process Scheduler Down
Process Scheduler Down Alert
Alert ID: process_scheduler_down
Category: Process Scheduler
Default threshold: 10 minutes
What This Alert Detects
This alert triggers when any active Process Scheduler server registered in PSSERVERSTAT has not reported a status update (heartbeat) within the configured amount of time.
Note
The alert automatically ignores servers whose status is explicitly set to "1" (Down) or "7" (Suspended - Offline), as these represent intentionally stopped or offline schedulers. It will only flag active server configurations (e.g., Running, Suspended, Error, Overloaded) that have stalled or stopped updating.
Severity Logic
| Condition | Severity |
|---|---|
| Heartbeat stale by more than thresholdMinutes | Warning |
| Heartbeat stale by more than thresholdMinutes × 2 | Critical |
For example, with the default threshold of 10 minutes:
- A scheduler that has not sent a heartbeat for 12 minutes → Warning
- A scheduler that has not sent a heartbeat for 22 minutes → Critical
What Gets Checked
The alert queries the PSSERVERSTAT table to retrieve all server status definitions. For each active scheduler (status not Down/Offline), it calculates the elapsed time since its LASTUPDDTTM timestamp. If that time exceeds the configured threshold, the alert fires.
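As a sketch of that calculation in Python: the function name `scheduler_severity` is hypothetical, the status codes "1" and "7" are the ignored values named in the note above, and the status "3" used in the demo is only an illustrative stand-in for any status that is not ignored.

```python
from datetime import datetime, timedelta
from typing import Optional

IGNORED_STATUSES = {"1", "7"}  # Down, Suspended - Offline


def scheduler_severity(
    status: str,
    last_update: datetime,
    now: datetime,
    threshold_minutes: int = 10,
) -> Optional[str]:
    """Return a severity for one PSSERVERSTAT row, or None if it is not flagged."""
    if status in IGNORED_STATUSES:
        return None  # intentionally stopped or offline schedulers are skipped
    stale_minutes = (now - last_update).total_seconds() / 60
    if stale_minutes > threshold_minutes * 2:
        return "Critical"
    if stale_minutes > threshold_minutes:
        return "Warning"
    return None


now = datetime(2024, 1, 1, 12, 0)
stale_12 = scheduler_severity("3", now - timedelta(minutes=12), now)
stale_22 = scheduler_severity("3", now - timedelta(minutes=22), now)
down = scheduler_severity("1", now - timedelta(hours=5), now)
```

A scheduler marked Down is never flagged in this sketch, however old its last update is.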
Alert Details
Each alert item includes:
- Server name (SERVERNAME)
- Current status code and friendly string status (e.g., Running, Error, Suspended)
- Last heartbeat timestamp (LASTUPDDTTM)
- Host name (SRVRHOSTNAME)
- A detailed explanation of how long the heartbeat has been stale
- A link to the Server Definition detail page for that server
Configuration
alerts:
  checks:
    process_scheduler_down:
      enabled: true
      thresholdMinutes: 10       # Minutes stale before flagging as Warning
      excludeProcesses:          # Server names (e.g., PSUNX, PSNT) to skip
        - PSUNX_OLD
| Setting | Default | Description |
|---|---|---|
| thresholdMinutes | 10 | Minutes of stale heartbeat status updates before a scheduler triggers a Warning alert. Critical fires at 2× this value. |
| excludeProcesses | [] | List of server names to exclude from this check. Use for retired scheduler definitions that linger in PSSERVERSTAT and have not been cleaned up. |
How to Respond
- Click the alert link to go directly to the Server Definition detail page for the affected scheduler.
- Check the Host Name where the Process Scheduler daemon runs.
- Access the server host and verify whether the Process Scheduler processes (e.g., psadmin, PSAESRV, etc.) are running.
- Review the Process Scheduler logs (e.g., TUXLOG, SCHED_*.LOG) on the host machine to diagnose why the process has hung or crashed.
- If the scheduler has hung, stop the process scheduler daemon and restart it using psadmin.
- If the server definition is obsolete or decommissioned, consider deleting it in PeopleSoft Server Definitions configuration to clean up the PSSERVERSTAT row.
7 - No Process Completed
No Process Completed Alert
Alert ID: no_process_completed
Category: Process Scheduler
Default lookback: 1 hour
What This Alert Detects
This alert fires when no process has successfully completed within the configured lookback window. It is a broad scheduler health check. If nothing has finished successfully in the past hour, the Process Scheduler may be down, stalled, or not dispatching jobs.
This is distinct from the Process Run Check, which monitors specific named processes. This alert monitors overall scheduler activity.
Severity Logic
| Condition | Severity |
|---|---|
| Zero successful completions in the lookback window | Warning |
What Gets Checked
The alert queries PSPRCSRQST for any process with RunStatus = 9 (Success) and an end datetime within the lookback window. If no rows are returned, the alert fires.
Only one result is needed to resolve the alert. The check uses a limit of 1 for efficiency.
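The same idea can be expressed in Python over rows that have already been fetched. The function name `no_process_completed` is hypothetical; `any()` stops at the first matching row, which mirrors the limit-of-1 behavior described above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def no_process_completed(
    runs: Iterable[Tuple[int, Optional[datetime]]],
    now: datetime,
    lookback_hours: int = 1,
) -> bool:
    """Return True (alert fires) when no run succeeded inside the lookback window."""
    cutoff = now - timedelta(hours=lookback_hours)
    # Each run is (run status, end datetime); status 9 means Success.
    return not any(
        status == 9 and ended is not None and ended >= cutoff
        for status, ended in runs
    )


now = datetime(2024, 1, 1, 12, 0)
quiet = [(9, now - timedelta(hours=3)), (3, now - timedelta(minutes=10))]
active = quiet + [(9, now - timedelta(minutes=20))]
fires = no_process_completed(quiet, now)
resolved = no_process_completed(active, now)
```

In the first list the only success is three hours old, so the alert fires; adding one success from 20 minutes ago is enough to resolve it.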
Alert Details
When firing, the alert produces a single item:
- Summary: No process completed successfully in the last N hour(s)
- Lookback hours used for the check
Configuration
alerts:
  checks:
    no_process_completed:
      enabled: true
      lookbackHours: 1           # How far back to look for completed processes
| Setting | Default | Description |
|---|---|---|
| lookbackHours | 1 | How many hours back to look for a successfully completed process. |
How to Respond
- Check PeopleSoft’s Process Monitor to see if any processes are running, queued, or have recently completed
- Verify the Process Scheduler server is running (PeopleSoft > PeopleTools > Process Scheduler > Servers)
- If processes are queued but not running, the scheduler daemon may need to be restarted
- If this fires regularly during off-hours when no jobs run, increase lookbackHours or disable the alert for those periods
Tuning
If your environment has periods where no batch jobs are expected to run (e.g., overnight maintenance windows), consider increasing lookbackHours to cover those gaps, or disable the alert entirely during those windows.