Process Errors

Process Errors Alert

Alert ID: process_errors
Category: Process Scheduler
Default lookback: 24 hours

What This Alert Detects

This alert finds Process Scheduler requests that have failed within a configurable lookback window. It catches processes that ended in one of three error statuses:

Run Status | PeopleSoft Code | Meaning
Error | 3 | The process ended with an error condition
Not Successful | 10 | The process ran but reported a non-success result
Unable to Post | 12 | The process output could not be delivered
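The following sketch shows what the check looks for by selecting failed requests from the Process Scheduler request table. It is not the check's actual query, and it rests on several assumptions. It assumes the standard PeopleSoft request table PSPRCSRQST with PRCSINSTANCE, PRCSNAME, RUNSTATUS, OPRID and ENDDTTM columns, and it measures the lookback window against the end time. So that it runs on its own, it uses a simplified in-memory SQLite table with integer status codes. The function name find_failed_processes is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

# Run status codes treated as failures, taken from the table above.
FAILURE_STATUSES = {3: "Error", 10: "Not Successful", 12: "Unable to Post"}


def find_failed_processes(conn: sqlite3.Connection, now: datetime,
                          lookback_hours: int = 24):
    """Hypothetical helper: return (instance, name, status label, operator,
    end time) for failed requests that ended within the lookback window.

    Assumes a simplified PSPRCSRQST-like table whose ENDDTTM values are
    ISO-8601 text, so string comparison orders them chronologically.
    """
    cutoff = (now - timedelta(hours=lookback_hours)).isoformat()
    placeholders = ",".join("?" * len(FAILURE_STATUSES))  # "?,?,?"
    sql = (
        "SELECT PRCSINSTANCE, PRCSNAME, RUNSTATUS, OPRID, ENDDTTM "
        "FROM PSPRCSRQST "
        f"WHERE RUNSTATUS IN ({placeholders}) AND ENDDTTM >= ? "
        "ORDER BY ENDDTTM DESC"
    )
    # Bind the status codes (the dict keys) followed by the cutoff timestamp.
    rows = conn.execute(sql, (*FAILURE_STATUSES, cutoff)).fetchall()
    return [
        (inst, name, FAILURE_STATUSES[status], oprid, ended)
        for inst, name, status, oprid, ended in rows
    ]
```

A production database driver would use a different placeholder style and native timestamp types. In a real PeopleSoft schema, the status column is also not necessarily an integer.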

Severity Logic

Process Type | Status | Severity
Recurring (on a recurrence schedule) | Error (3), Not Successful (10), Unable to Post (12) | Critical
Non-Recurring (ad-hoc execution) | Error (3), Not Successful (10), Unable to Post (12) | Warning
  • Recurring Processes: Any failure raises a Critical alert immediately.
  • Non-Recurring Processes: A failure raises a Warning only after the thresholdMinutes grace period has elapsed (immediately with the default of 0).
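The severity rules above can be expressed as a small decision function. This is a minimal sketch, not the product's code. It assumes that the grace period is measured from the failed process's end time and that a non-recurring failure still inside the grace period produces no alert yet. The function name classify_severity is hypothetical, and how a process is identified as recurring is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_severity(is_recurring: bool, ended_at: datetime, now: datetime,
                      threshold_minutes: int = 0) -> Optional[str]:
    """Hypothetical helper: return "Critical", "Warning", or None (no alert yet).

    Assumption: the grace period starts at the failed process's end time.
    """
    if is_recurring:
        # Recurring processes raise Critical as soon as the failure is seen.
        return "Critical"
    # Non-recurring failures wait out the grace period before raising Warning.
    if now - ended_at >= timedelta(minutes=threshold_minutes):
        return "Warning"
    return None
```

With the default thresholdMinutes of 0, a non-recurring failure returns "Warning" right away.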

Alert Details

Each alert item includes:

  • Process name and instance number
  • Run status label (Error, Not Successful, Unable to Post)
  • The operator who submitted the request
  • When the process ran
  • A link to the Process Monitor detail page for that instance
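One way to model an alert item is as a simple record with one field per item listed above. The class and field names here are hypothetical and do not reflect the product's schema. The Process Monitor URL is passed in as-is because its format is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProcessErrorItem:
    """Hypothetical alert item; field names are illustrative only."""
    process_name: str
    instance: int
    status_label: str   # "Error", "Not Successful", or "Unable to Post"
    operator_id: str    # operator who submitted the request
    ran_at: datetime    # when the process ran
    monitor_url: str    # Process Monitor detail link, supplied by the caller

    def summary(self) -> str:
        """One-line, human-readable description of the failure."""
        return (f"{self.process_name} (instance {self.instance}): "
                f"{self.status_label}, submitted by {self.operator_id}, "
                f"ran {self.ran_at:%Y-%m-%d %H:%M}")
```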

Configuration

alerts:
  checks:
    process_errors:
      enabled: true
      lookbackHours: 24        # How far back to look for failures
      thresholdMinutes: 15     # Grace period in minutes before non-recurring failures raise a Warning (default 0)
      excludeProcesses:        # Process names to skip
        - KNOWN_FLAKY_PROCESS

Setting | Default | Description
lookbackHours | 24 | Number of hours back to search for failed processes
thresholdMinutes | 0 | Grace period buffer (in minutes) for non-recurring process errors before they raise a Warning alert
excludeProcesses | [] | List of process names to exclude from this check
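The following sketch shows how the settings above might be read with their documented defaults applied. It assumes the YAML has already been parsed into a dictionary, for example with PyYAML's yaml.safe_load. The names ProcessErrorsConfig and load_process_errors_config are hypothetical. The default for enabled is not documented, so True is an assumption. Exclusion is assumed to be an exact, case-sensitive name match.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class ProcessErrorsConfig:
    enabled: bool = True            # assumed default; not stated in the docs
    lookback_hours: int = 24
    threshold_minutes: int = 0
    exclude_processes: frozenset = field(default_factory=frozenset)

    def is_excluded(self, process_name: str) -> bool:
        # Exact, case-sensitive match is an assumption.
        return process_name in self.exclude_processes


def load_process_errors_config(raw: Mapping[str, Any]) -> ProcessErrorsConfig:
    """Hypothetical loader: read alerts.checks.process_errors from an
    already-parsed config, falling back to defaults for missing keys."""
    section = raw.get("alerts", {}).get("checks", {}).get("process_errors", {})
    return ProcessErrorsConfig(
        enabled=section.get("enabled", True),
        lookback_hours=int(section.get("lookbackHours", 24)),
        threshold_minutes=int(section.get("thresholdMinutes", 0)),
        # "excludeProcesses:" with no items parses as None, so treat it as empty.
        exclude_processes=frozenset(section.get("excludeProcesses") or []),
    )
```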

How to Respond

  1. Click the alert link to go directly to the Process Monitor entry for the failed process
  2. Review the process details: run status, begin and end times, server
  3. Look for output files or log information that might explain the failure
  4. Check whether this is a one-time failure or a repeating issue
  5. If the process needs to be rerun, submit a new request from PeopleSoft
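Step 4 can be roughly approximated by counting failures per process name across the lookback window. This is a minimal sketch. The helper split_repeating and its min_repeats cut-off are hypothetical and are not part of the alert.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_repeating(failed_names: Iterable[str],
                    min_repeats: int = 2) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split failed process names into sorted
    (repeating, one_time) lists.

    A name that fails at least min_repeats times is treated as repeating.
    """
    counts = Counter(failed_names)
    repeating = sorted(n for n, c in counts.items() if c >= min_repeats)
    one_time = sorted(n for n, c in counts.items() if c < min_repeats)
    return repeating, one_time
```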

Common Causes of Process Failures

  • Data errors: The process encountered unexpected data (null values, bad formats, constraint violations)
  • Resource issues: The server ran out of memory or disk space
  • Timeout: The process exceeded its allowed run time
  • Configuration problems: A required configuration parameter is missing or incorrect
  • Dependency failures: A downstream process failed because a process it depends on did not complete correctly

Reducing Alert Noise

If certain processes fail regularly and you’re already tracking them separately, add them to excludeProcesses to keep the alert list focused on unexpected failures.