# Process Scheduler Down

LLMS index: [llms.txt](/llms.txt)

---

<div id="pslens-context-panel" class="card border-info mb-4 d-none">
  <div class="card-header bg-light text-info py-2 fw-bold d-flex align-items-center border-bottom border-info-subtle">
    <i class="bi bi-info-circle-fill me-2"></i>
    <span>Tailored Operational Context</span>
  </div>
  <div class="card-body p-0">
    <ul class="list-group list-group-flush">
      <li id="row-db" class="list-group-item d-flex align-items-center justify-content-between py-2 d-none">
        <strong>Target Database:</strong>
        <span id="ctx-db" class="badge bg-secondary font-monospace">&mdash;</span>
      </li>
      <li id="row-type" class="list-group-item d-flex align-items-center justify-content-between py-2 d-none">
        <strong>Context Type:</strong>
        <span id="ctx-type" class="badge bg-light text-dark border font-monospace text-uppercase">&mdash;</span>
      </li>
      <li id="row-severity" class="list-group-item d-flex align-items-center justify-content-between py-2 d-none">
        <strong>Alert Severity:</strong>
        <span id="ctx-severity" class="badge">&mdash;</span>
      </li>
      <li id="row-time" class="list-group-item d-flex align-items-center justify-content-between py-2 d-none">
        <strong>Triggered Time:</strong>
        <span id="ctx-time" class="text-muted small">&mdash;</span>
      </li>
      <li id="row-details" class="list-group-item py-2 d-none">
        <strong id="label-details" class="d-block mb-1">Firing Context:</strong>
        <code id="ctx-details" class="d-block p-2 bg-light border rounded small" style="white-space: pre-wrap; word-break: break-all;">&mdash;</code>
      </li>
    </ul>
  </div>
</div>

<script>
  (function() {
    const params = new URLSearchParams(window.location.search);
    const metadata = params.get('metadata');
    if (!metadata) return;

    try {
      
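      // The metadata parameter is base64url-encoded JSON; map it back to standard base64.
      const base64 = metadata.replace(/-/g, '+').replace(/_/g, '/');
      // Decode to raw bytes, then read them as UTF-8 (replaces the deprecated escape() idiom).
      const bytes = Uint8Array.from(window.atob(base64), (c) => c.charCodeAt(0));
      const jsonStr = new TextDecoder('utf-8', { fatal: true }).decode(bytes);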
      const data = JSON.parse(jsonStr);

      if (data) {
        let hasData = false;

        if (data.db) {
          document.getElementById('ctx-db').textContent = data.db;
          document.getElementById('row-db').classList.remove('d-none');
          hasData = true;
        }

        if (data.type) {
          document.getElementById('ctx-type').textContent = data.type;
          document.getElementById('row-type').classList.remove('d-none');
          hasData = true;
        }

        if (data.severity) {
          const severityBadge = document.getElementById('ctx-severity');
          const severity = data.severity.toLowerCase();
          severityBadge.textContent = severity.toUpperCase();
          if (severity === 'critical') {
            severityBadge.className = 'badge bg-danger';
          } else if (severity === 'warning') {
            severityBadge.className = 'badge bg-warning text-dark';
          } else {
            severityBadge.className = 'badge bg-info';
          }
          document.getElementById('row-severity').classList.remove('d-none');
          hasData = true;
        }

        if (data.t) {
          const date = new Date(data.t * 1000);
          document.getElementById('ctx-time').textContent = date.toLocaleString();
          document.getElementById('row-time').classList.remove('d-none');
          hasData = true;
        }

        if (data.details) {
          document.getElementById('ctx-details').textContent = data.details;

          
          const labelDetails = document.getElementById('label-details');
          if (data.type === 'object') {
            labelDetails.textContent = 'Object Metadata Details:';
          } else if (data.type === 'report') {
            labelDetails.textContent = 'Report Description:';
          } else {
            labelDetails.textContent = 'Firing Context:';
          }

          document.getElementById('row-details').classList.remove('d-none');
          hasData = true;
        }

        if (hasData) {
          document.getElementById('pslens-context-panel').classList.remove('d-none');
        }
      }
    } catch (e) {
      console.error('Failed to parse operational context metadata:', e);
    }
  })();
</script>


## Process Scheduler Down Alert

**Alert ID:** `process_scheduler_down`
**Category:** Process Scheduler
**Default threshold:** 10 minutes

### What This Alert Detects

This alert triggers when any active Process Scheduler server registered in `PSSERVERSTAT` has not reported a status update (heartbeat) within the configured threshold (`thresholdMinutes`).

> [!NOTE]
> The alert automatically ignores servers whose status is explicitly set to `"1"` (Down) or `"7"` (Suspended - Offline), as these represent intentionally stopped or offline schedulers. It will only flag active server configurations (e.g., Running, Suspended, Error, Overloaded) that have stalled or stopped updating.

### Severity Logic

|                      Condition                      | Severity |
| --------------------------------------------------- | -------- |
| Heartbeat stale by more than `thresholdMinutes`     | Warning  |
| Heartbeat stale by more than `thresholdMinutes × 2` | Critical |

For example, with the default threshold of 10 minutes:

- A scheduler whose last heartbeat was 12 minutes ago → **Warning**
- A scheduler whose last heartbeat was 22 minutes ago → **Critical**
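
The severity rule above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the alert's actual implementation: the function name `classifySeverity` and its return values are assumptions made for the example, and only the "more than" comparisons come from the table.

```javascript
// Hypothetical helper (not part of the product): maps heartbeat staleness
// to the severities listed in the table above.
function classifySeverity(staleMinutes, thresholdMinutes = 10) {
  // Critical is checked first because it is the stricter condition.
  if (staleMinutes > thresholdMinutes * 2) return 'critical';
  if (staleMinutes > thresholdMinutes) return 'warning';
  return null; // heartbeat is recent enough; nothing to report
}

console.log(classifySeverity(12)); // warning
console.log(classifySeverity(22)); // critical
console.log(classifySeverity(8));  // null
```

Because both comparisons are strict, a heartbeat that is exactly `thresholdMinutes` old is not flagged in this sketch.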

### What Gets Checked

The alert queries the `PSSERVERSTAT` table to retrieve every server status row. For each active scheduler (any status other than Down or Suspended - Offline), it calculates the time elapsed since the row's `LASTUPDDTTM` timestamp. If that time exceeds the configured threshold, the alert fires.
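
As a rough sketch of that check, the following JavaScript filters out the ignored statuses and computes staleness from `LASTUPDDTTM`. The row shape, the `SERVERSTATUS` property name, and the function `findStaleSchedulers` are assumptions for illustration; the real query and data access layer are not shown in this document.

```javascript
// Status codes the alert skips, as described in the note above.
const IGNORED_STATUSES = new Set(['1', '7']);

// Hypothetical helper: `rows` stands in for the result of the PSSERVERSTAT query.
function findStaleSchedulers(rows, thresholdMinutes, now = new Date()) {
  return rows
    // Skip servers that were stopped or taken offline on purpose.
    .filter((row) => !IGNORED_STATUSES.has(String(row.SERVERSTATUS)))
    // Convert the age of the last heartbeat from milliseconds to minutes.
    .map((row) => ({
      serverName: row.SERVERNAME,
      staleMinutes: (now.getTime() - new Date(row.LASTUPDDTTM).getTime()) / 60000,
    }))
    // Keep only heartbeats older than the configured threshold.
    .filter((item) => item.staleMinutes > thresholdMinutes);
}

// Sample rows with made-up values; '3' stands for any status that is not ignored.
const sampleRows = [
  { SERVERNAME: 'PSUNX', SERVERSTATUS: '3', LASTUPDDTTM: '2024-01-01T00:00:00Z' },
  { SERVERNAME: 'PSNT', SERVERSTATUS: '1', LASTUPDDTTM: '2023-12-31T00:00:00Z' },
  { SERVERNAME: 'PSUNX2', SERVERSTATUS: '3', LASTUPDDTTM: '2024-01-01T00:05:00Z' },
];
// Only PSUNX is returned: PSNT is ignored by status, PSUNX2 is only 7 minutes stale.
console.log(findStaleSchedulers(sampleRows, 10, new Date('2024-01-01T00:12:00Z')));
```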

### Alert Details

Each alert item includes:

- Server name (`SERVERNAME`)
- Current status code and its human-readable label (e.g., Running, Error, Suspended)
- Last heartbeat timestamp (`LASTUPDDTTM`)
- Host name (`SRVRHOSTNAME`)
- A detailed explanation of how long the heartbeat has been stale
- A link to the Server Definition detail page for that server

### Configuration

```yaml
alerts:
  checks:
    process_scheduler_down:
      enabled: true
      thresholdMinutes: 10          # Minutes stale before flagging as Warning
      excludeProcesses:             # Server names (e.g., PSUNX, PSNT) to skip
        - PSUNX_OLD
```

|      Setting       | Default |                                                         Description                                                          |
| ------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `thresholdMinutes` | `10`    | Minutes without a heartbeat update before a scheduler triggers a Warning alert. Critical fires at 2× this value.               |
| `excludeProcesses` | `[]`    | List of server names to exclude from this check. Use it for retired scheduler definitions that still linger in `PSSERVERSTAT`. |
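
To show how these two settings interact, here is a hedged JavaScript sketch that fills in the documented defaults and drops excluded servers before any staleness check runs. The names `resolveConfig` and `applyExclusions` are hypothetical, and exact, case-sensitive name matching is an assumption; the product may compare names differently.

```javascript
// Defaults taken from the settings table above.
const DEFAULTS = { thresholdMinutes: 10, excludeProcesses: [] };

// Hypothetical helper: user-supplied values override the documented defaults.
function resolveConfig(userConfig = {}) {
  return { ...DEFAULTS, ...userConfig };
}

// Hypothetical helper: removes servers listed in excludeProcesses.
// Assumes exact, case-sensitive matching on the server name.
function applyExclusions(serverNames, config) {
  const excluded = new Set(config.excludeProcesses);
  return serverNames.filter((name) => !excluded.has(name));
}

const config = resolveConfig({ excludeProcesses: ['PSUNX_OLD'] });
console.log(config.thresholdMinutes); // 10
console.log(applyExclusions(['PSUNX', 'PSUNX_OLD', 'PSNT'], config)); // [ 'PSUNX', 'PSNT' ]
```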

### How to Respond

1. Click the alert link to go directly to the Server Definition detail page for the affected scheduler.
2. Check the **Host Name** where the Process Scheduler daemon runs.
3. Log in to that host and verify that the Process Scheduler server processes (e.g., `PSPRCSRV`, `PSAESRV`) are running.
4. Review the Process Scheduler logs (e.g., `TUXLOG`, `SCHDLR_*.LOG`) on the host to diagnose why the process has hung or crashed.
5. If the scheduler has hung, stop the Process Scheduler and restart it using `psadmin`.
6. If the server definition is obsolete or decommissioned, consider deleting it from the PeopleSoft Server Definitions page so that its `PSSERVERSTAT` row is cleaned up, or add the server name to `excludeProcesses` until it is removed.
