Process Errors
Process Errors Alert
Alert ID: process_errors
Category: Process Scheduler
Default lookback: 24 hours
What This Alert Detects
This alert finds Process Scheduler requests that have failed within a configurable lookback window. It catches processes that ended in one of three error statuses:
| Run Status | PeopleSoft Code | Meaning |
|---|---|---|
| Error | 3 | The process ended with an error condition |
| Not Successful | 10 | The process ran but reported a non-success result |
| Unable to Post | 12 | The process output could not be delivered |
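The table above can be expressed as a small lookup. This is a minimal sketch, not the alert's actual implementation; the names `FAILURE_STATUSES`, `is_failure`, and `status_label` are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Run status codes treated as failures, taken from the table above.
FAILURE_STATUSES = {
    3: "Error",
    10: "Not Successful",
    12: "Unable to Post",
}


def is_failure(run_status: int) -> bool:
    """Return True when the run status code is one of the three failure statuses."""
    return run_status in FAILURE_STATUSES


def status_label(run_status: int) -> Optional[str]:
    """Return the display label for a failure status, or None for any other code."""
    return FAILURE_STATUSES.get(run_status)
```

Any code outside the three listed values is ignored by this sketch, which mirrors the alert's stated scope.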
Severity Logic
| Process Type | Status | Severity |
|---|---|---|
| Recurring (on a recurrence schedule) | Error (3), Not Successful (10), Unable to Post (12) | Critical |
| Non-Recurring (ad-hoc execution) | Error (3), Not Successful (10), Unable to Post (12) | Warning |
- Recurring Processes: Any failure fires Critical immediately.
- Non-Recurring Processes: Fire Warning after the thresholdMinutes grace period.
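The severity rules above could be sketched as follows. This is an illustration under stated assumptions, not the product's code: the function name `severity` and its parameters are hypothetical, and the sketch assumes the grace period is measured from the time the process ended, which the page does not specify.

```python
from datetime import datetime, timedelta
from typing import Optional

# Failure status codes: Error (3), Not Successful (10), Unable to Post (12).
FAILURE_CODES = {3, 10, 12}


def severity(
    run_status: int,
    is_recurring: bool,
    ended_at: datetime,
    now: datetime,
    threshold_minutes: int = 0,
) -> Optional[str]:
    """Return "Critical", "Warning", or None if no alert should fire yet."""
    if run_status not in FAILURE_CODES:
        return None
    if is_recurring:
        # Recurring processes fire Critical immediately on any failure.
        return "Critical"
    # Assumption: the grace period starts when the process ended.
    if now - ended_at >= timedelta(minutes=threshold_minutes):
        return "Warning"
    return None
```

With the default `threshold_minutes` of 0, a non-recurring failure raises a Warning as soon as it is seen.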
Alert Details
Each alert item includes:
- Process name and instance number
- Run status label (Error, Not Successful, Unable to Post)
- The operator who submitted the request
- When the process ran
- A link to the Process Monitor detail page for that instance
Configuration
alerts:
  checks:
    process_errors:
      enabled: true
      lookbackHours: 24        # How far back to look for failures
      thresholdMinutes: 15     # Grace period buffer in minutes for non-recurring errors
      excludeProcesses:        # Process names to skip
        - KNOWN_FLAKY_PROCESS
| Setting | Default | Description |
|---|---|---|
| lookbackHours | 24 | Number of hours back to search for failed processes |
| thresholdMinutes | 0 | Grace period buffer (in minutes) for non-recurring process errors before they raise a Warning alert |
| excludeProcesses | [] | List of process names to exclude from this check |
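To show how these settings might interact, here is a hedged sketch that applies the documented defaults to a parsed `process_errors` block and then decides whether a failed process is in scope. The class `ProcessErrorsConfig` and the functions `load_config` and `in_scope` are hypothetical names; defaulting `enabled` to true is also an assumption, since the table does not list a default for it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class ProcessErrorsConfig:
    enabled: bool = True  # assumption: not listed in the defaults table
    lookback_hours: int = 24
    threshold_minutes: int = 0
    exclude_processes: List[str] = field(default_factory=list)


def load_config(raw: dict) -> ProcessErrorsConfig:
    """Build a config from the parsed process_errors mapping, filling in defaults."""
    return ProcessErrorsConfig(
        enabled=raw.get("enabled", True),
        lookback_hours=raw.get("lookbackHours", 24),
        threshold_minutes=raw.get("thresholdMinutes", 0),
        exclude_processes=list(raw.get("excludeProcesses", [])),
    )


def in_scope(process_name: str, ran_at: datetime, now: datetime,
             cfg: ProcessErrorsConfig) -> bool:
    """Return True if a failed process should be considered by the check."""
    if not cfg.enabled:
        return False
    if process_name in cfg.exclude_processes:
        return False
    # Only failures inside the lookback window are considered.
    return ran_at >= now - timedelta(hours=cfg.lookback_hours)
```

Settings omitted from the configuration fall back to the defaults in the table, so an empty mapping yields a 24-hour lookback, no grace period, and no exclusions.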
How to Respond
- Click the alert link to go directly to the Process Monitor entry for the failed process
- Review the process details: run status, begin and end times, server
- Look for output files or log information that might explain the failure
- Check whether this is a one-time failure or a repeating issue
- If the process needs to be rerun, submit a new request from PeopleSoft
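For the step of checking whether a failure is one-time or repeating, a quick tally of failed process names over the lookback window can help. This is a minimal sketch that assumes you have already collected the names from the alert items; `repeat_offenders` and `min_failures` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable


def repeat_offenders(failed_process_names: Iterable[str],
                     min_failures: int = 2) -> Dict[str, int]:
    """Return processes that failed at least min_failures times, with their counts."""
    counts = Counter(failed_process_names)
    return {name: n for name, n in counts.items() if n >= min_failures}
```

Processes that show up here repeatedly are candidates for root-cause investigation rather than a simple rerun.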
Common Causes of Process Failures
- Data errors: The process encountered unexpected data (null values, bad formats, constraint violations)
- Resource issues: The server ran out of memory or disk space
- Timeout: The process exceeded its allowed run time
- Configuration problems: A required configuration parameter is missing or incorrect
- Dependency failures: A process that depends on an earlier process failed because the earlier one didn’t complete correctly
Reducing Alert Noise
If certain processes fail regularly and you’re already tracking them separately, add them to excludeProcesses to keep the alert list focused on unexpected failures.