TL;DR

I build a market anomaly “safety layer” that an investing agent can consult before acting: “Is the market in a normal regime or an incident regime?”
In trust mode (normal conditions), we keep alerts rare to avoid false alarms.
In incident mode (stress/volatility spikes), we switch to an adaptive alert rate (a review budget) to stay resilient under regime shifts.

Outline (headings)

Why agentic investing needs a safety layer (security, resilience, trust)
Dataset (S&P 500 5-year OHLCV) + quick sanity checks
Building a market-stress signal (returns + cross-sectional volatility)
Trust mode policy (conservative thresholding)
Incident mode policy (adaptive alert-rate / review budget)
Results (plots + “what changed during stress”)
How an investing agent would use this (guardrail logic)
Limitations + next steps

Line chart of market stress proxy from 2013–2018 with spikes and a dashed incident threshold. — Figure 1. Cross-sectional volatility of daily returns across S&P 500 constituents. Days above the 95th percentile represent incident-like regimes where the agent should escalate safety policy.

2.2 Data card

rows	tickers	start_date	end_date	missing_open	missing_high	missing_low	missing_close
619040	505	2013-02-08	2018-02-07	11	8	8	0

This is good news: the dataset is clean enough to build reliable daily features with minimal preprocessing. The only open choice is how to handle the few missing OHLC values: drop those rows or forward-fill per ticker.
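A data card like the one above can be produced with a few lines of pandas. The sketch below is one possible way to compute it, not necessarily the notebook's actual cell. It runs on a tiny synthetic frame with the same column names used later in the post (date, open, high, low, close, volume, Name). The helper name `data_card` and all toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def data_card(df: pd.DataFrame) -> dict:
    """Summarize size, date range, and OHLC missingness of a long-format price table."""
    dates = pd.to_datetime(df["date"])
    card = {
        "rows": len(df),
        "tickers": int(df["Name"].nunique()),
        "start_date": str(dates.min().date()),
        "end_date": str(dates.max().date()),
    }
    for col in ["open", "high", "low", "close"]:
        card[f"missing_{col}"] = int(df[col].isna().sum())
    return card

# Tiny synthetic stand-in for the real dataset (values are made up).
toy = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11", "2013-02-08", "2013-02-11"],
    "open": [10.0, np.nan, 20.0, 20.5],
    "high": [10.5, 10.8, 20.4, 21.0],
    "low": [9.8, 10.1, 19.7, 20.2],
    "close": [10.2, 10.6, 20.1, 20.9],
    "volume": [1000, 1200, 500, 650],
    "Name": ["AAA", "AAA", "BBB", "BBB"],
})
card = data_card(toy)
print(card)
```

Running the same helper on the real table should reproduce the row, ticker, and missingness counts shown in the card.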

2.3 Safety policy preview

We’ll implement a simple two-mode policy:

Trust mode (normal): agent can trade normally, but with conservative checks.
Incident mode (stress): agent tightens controls (smaller position sizes, tighter risk limits, or human confirmation).

if market_stress > incident_threshold:
    incident_mode()  # reduce risk + increase scrutiny
else:
    trust_mode()     # normal operation

Data preparation + feature definitions (results-first)

3.1 Cleaning (minimal and explicit)

This dataset is already clean, with only a handful of missing OHLC values. For reproducibility and simplicity, we drop the 11 rows where open, high, or low is missing. This avoids hidden assumptions and keeps our features well-defined.

We also:

parse date
sort by (Name, date)
compute per-ticker log returns (stable for volatility features)

Clean + compute features + save summary

import numpy as np
import pandas as pd

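# df is the raw long-format table loaded earlier
# (columns used below: date, open, high, low, close, volume, Name)

# 1) Drop rows with missing OHLC (minimal cleaning)
before = len(df)
df_clean = df.dropna(subset=["open", "high", "low"]).copy()
after = len(df_clean)

# 2) Parse dates, sort, and compute per-ticker log returns
df_clean["date"] = pd.to_datetime(df_clean["date"])
df_clean = df_clean.sort_values(["Name", "date"]).reset_index(drop=True)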
df_clean["log_close"] = np.log(df_clean["close"].astype(float))
df_clean["logret"] = df_clean.groupby("Name")["log_close"].diff()

# 3) Market-level features
mkt = (
    df_clean.groupby("date")
    .agg(
        xs_vol=("logret", "std"),        # cross-sectional volatility (stress proxy)
        mean_ret=("logret", "mean"),     # equal-weight mean return proxy
        mean_vol=("volume", "mean"),     # average volume proxy
        n=("logret", "count"),           # number of tickers contributing
    )
    .dropna()
    .reset_index()
)

# 4) Rolling stress + incident threshold
mkt["xs_vol_roll20"] = mkt["xs_vol"].rolling(20, min_periods=5).mean()
incident_pct = 95
thr = float(np.nanpercentile(mkt["xs_vol"], incident_pct))
mkt["incident"] = (mkt["xs_vol"] > thr).astype(int)

# 5) Tiny summary card for the post
summary = {
    "rows_before": before,
    "rows_after_dropna_ohlc": after,
    "dropped_rows": before - after,
    "dates": int(mkt["date"].nunique()),
    "avg_tickers_per_day": float(mkt["n"].mean()),
    "incident_threshold_pct": incident_pct,
    "incident_threshold_value": thr,
    "incident_days_rate": float(mkt["incident"].mean()),
}

pd.DataFrame([summary])

	rows_before	rows_after_dropna_ohlc	dropped_rows	dates	avg_tickers_per_day	incident_threshold_pct	incident_threshold_value	incident_days_rate
0	619040	619029	11	1258	491.672496	95	0.020871	0.050079

Safety policy: trust mode vs incident mode (agent guardrail logic)

4.1 Two-mode safety layer (the core idea)

A trading agent should not behave the same way in calm markets and stressed markets. We implement a simple, auditable two-mode policy:

Trust mode (normal regime)

When market stress is below the incident threshold, the agent can operate with standard automation:

normal position sizing
standard execution logic
routine monitoring

Incident mode (stress regime)

When market stress exceeds the threshold (top ~5% of days in this dataset), the safety layer escalates:

reduce risk automatically (smaller position sizes / tighter leverage caps)
increase scrutiny (more conservative trade filters)
optionally require human confirmation for high-impact actions

This is the same “resilience knob” idea from cybersecurity: when base conditions shift, policy must adapt.

4.2 Concrete guardrail policy

Here’s a practical policy you can implement in an agent today:

If incident mode:

Cut max position size (e.g., 50–80% reduction)
Block new symbols (only trade tickers already held/watchlisted)
Require confirmation for large notional trades
Tighten execution constraints (limit orders only; stricter slippage checks)
Increase monitoring frequency (more frequent evaluation of risk signals)

stress = market_stress[today]

if stress > incident_threshold:
    mode = "INCIDENT"
    max_position *= 0.3
    require_confirmation_if(notional > N)
    restrict_to_watchlist()
else:
    mode = "TRUST"
    normal_limits()

4.3 Data-backed calibration note

In this dataset, a 95th-percentile stress threshold puts the agent in incident mode on ~5.0% of trading days. That rate follows directly from the percentile choice, so the percentile is the calibration knob: 5% is rare enough to maintain trust and frequent enough to catch meaningful stress events.

(From the summary table in Section 3: threshold ≈ 0.020871, incident rate ≈ 0.050.)

Figure 2 (Notebook cell): highlight incident days on the stress timeline

This produces and saves: figures/fig2_incident_days.png.

import os
import matplotlib.pyplot as plt

os.makedirs("figures", exist_ok=True)

# Scatter incident days on top of the stress curve
inc = mkt[mkt["incident"] == 1]

plt.figure()
plt.plot(mkt["date"], mkt["xs_vol_roll20"], label="20-day rolling stress")
plt.scatter(inc["date"], inc["xs_vol_roll20"], s=10, label="incident days (top 5%)")
plt.axhline(mkt["xs_vol_roll20"].quantile(0.95), linestyle="--", label="rolling stress 95th pct (visual guide)")
plt.xlabel("date")
plt.ylabel("rolling stress (20d mean xs_vol)")
plt.title("Incident days highlighted (stress-based regime switch)")
plt.legend()
plt.tight_layout()
plt.savefig("figures/fig2_incident_days.png", dpi=200)
plt.show()

print("Saved: figures/fig2_incident_days.png")

Time series of rolling stress with highlighted points indicating incident days. — Incident days (top 5% stress) highlighted on the rolling market-stress signal. This defines when the agent switches from trust mode to incident mode.

From “stress signal” to anomaly detector (scoring + alert policies)

5.1 Turning market stress into an anomaly score
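
The later sections rely on a normalized score called stress_z, which the Figure 3 caption describes as the z-score of rolling cross-sectional volatility. A minimal sketch of that normalization follows, assuming the mean and standard deviation are taken over the full sample (the notebook's exact choice is not shown in this post). The helper name `add_stress_z` and the synthetic data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def add_stress_z(mkt: pd.DataFrame, col: str = "xs_vol_roll20") -> pd.DataFrame:
    """Return a copy of mkt with a z-scored version of the rolling stress column."""
    out = mkt.copy()
    mu = out[col].mean()      # NaNs from the rolling warm-up are skipped by default
    sigma = out[col].std()    # sample standard deviation (ddof=1)
    out["stress_z"] = (out[col] - mu) / sigma
    return out

# Synthetic daily cross-sectional volatility, for illustration only.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "date": pd.bdate_range("2016-01-04", periods=60),
    "xs_vol": rng.uniform(0.01, 0.03, size=60),
})
toy["xs_vol_roll20"] = toy["xs_vol"].rolling(20, min_periods=5).mean()
toy = add_stress_z(toy)
print(toy[["date", "xs_vol_roll20", "stress_z"]].tail())
```

Because the mean and standard deviation come from the whole sample, this is a retrospective score. A live agent would need statistics computed only from data available up to each day.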

5.2 Two deployment policies (trust vs incident)

Just like in the cybersecurity post, performance depends on policy:

Trust-mode threshold (quiet by default)

Choose a high percentile threshold (e.g., 95th)
You get ~5% incident days in this dataset
This keeps alerts rare and builds trust

Incident-mode alert-rate (resilience knob)

When the agent or user declares “elevated risk,” switch from a fixed threshold to:

alert top X% of days (or tighten the percentile to 97–99%)
optionally combine with additional checks (volume spikes, drawdowns, news)

The key idea is the same:

Resilience = adaptive alerting under regime shift.
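
The "alert top X% of days" idea can be sketched as a budget-based flagging rule: instead of fixing a score threshold, fix the fraction of days you are willing to review and derive the cutoff from it. This is a minimal illustration on synthetic scores. The function name `flag_top_fraction` and the budgets (5%, 3%, 1%, mirroring the 95th to 99th percentiles mentioned above) are assumptions, not the post's production settings.

```python
import numpy as np
import pandas as pd

def flag_top_fraction(scores: pd.Series, budget: float) -> pd.Series:
    """Flag the highest-scoring `budget` fraction of days (NaN scores are never flagged)."""
    if not 0.0 < budget < 1.0:
        raise ValueError("budget must be strictly between 0 and 1")
    cutoff = scores.quantile(1.0 - budget)  # quantile skips NaNs
    return scores > cutoff

toy_scores = pd.Series(np.arange(100, dtype=float))  # synthetic scores 0..99

for budget in (0.05, 0.03, 0.01):
    flagged = flag_top_fraction(toy_scores, budget)
    print(f"budget={budget:.2f} -> {int(flagged.sum())} days flagged")

n_flagged_5pct = int(flag_top_fraction(toy_scores, 0.05).sum())
n_flagged_1pct = int(flag_top_fraction(toy_scores, 0.01).sum())
```

The design choice here is that the review workload stays predictable: changing the budget changes how many days get escalated, regardless of how the score distribution shifts.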

Histogram of stress z-scores with three vertical threshold lines at the 95th, 97th, and 99th percentiles. — Distribution of normalized market stress scores (z-score of rolling cross-sectional volatility). Vertical lines show thresholds that can define trust-mode vs stricter incident triggers.
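
The three vertical lines in the histogram are percentile cutoffs of the score distribution. A short sketch of how such cutoffs could be computed with NumPy is below. The helper name `percentile_thresholds` is hypothetical, and the input is a synthetic list, not the real stress scores.

```python
import numpy as np

def percentile_thresholds(scores, pcts=(95, 97, 99)):
    """Map each percentile to its cutoff value, ignoring NaNs."""
    values = np.nanpercentile(np.asarray(scores, dtype=float), pcts)
    return {f"p{p}": float(v) for p, v in zip(pcts, values)}

# Synthetic scores 0..100 plus one NaN (e.g., a rolling warm-up day).
scores = list(range(101)) + [float("nan")]
thr = percentile_thresholds(scores)
print(thr)
```

Applied to the real stress_z column, the same call would yield the kind of p95, p97, and p99 values used as mode triggers in the next section.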

Agent guardrail implementation (mode switching + auditability)

6.1 The guardrail interface: one function the agent must call

A safety layer is only useful if the agent actually consults it. The cleanest pattern is to require a single “mode lookup” call before any high-impact action:

Input: today’s date
Output: TRUST, INCIDENT, or EXTREME
Plus: a short decision record (so you can audit later)

We’ll base the mode on the normalized stress score stress_z:

stress_z >= 1.8085 → INCIDENT (top 5% stress)
stress_z >= 2.1842 → INCIDENT (tight) (top 3% stress)
stress_z >= 3.6239 → EXTREME (top 1% stress)

This gives you a single knob: tighten or relax the trigger depending on how cautious you want the agent to be.

6.2 What changes by mode (trust → resilience)

Here’s a practical default policy set:

TRUST (normal operations)

normal position sizing
normal automation
allow new symbols if they pass standard checks

INCIDENT (stress regime)

reduce max position size (e.g., ×0.3)
restrict to watchlist / existing holdings
require confirmation for large notional trades
use limit orders only (avoid slippage surprises)

EXTREME (rare, high stress)

pause autonomous trading OR require confirmation for all trades
focus on risk reduction (hedge, rebalance to cash, tighten stops)
increase monitoring frequency

The key is that the agent remains useful, but becomes safer when the environment is less predictable.

6.3 Code: mode function + “decision record”

from dataclasses import dataclass

import pandas as pd

# thresholds from Figure 3
P95 = 1.8085408824587415
P97 = 2.184172679824946
P99 = 3.6239406886467354

@dataclass
class SafetyDecision:
    date: str
    stress_z: float
    mode: str
    risk_multiplier: float
    notes: str

def safety_mode_for_date(date, mkt_df, use="p95"):
    """
    Returns TRUST / INCIDENT / EXTREME and a risk multiplier.
    use: "p95" (default), "p97" (tighter), or "p99" (extreme-only)
    """
    row = mkt_df.loc[mkt_df["date"] == pd.to_datetime(date)]
    if row.empty:
        return SafetyDecision(str(date), float("nan"), "UNKNOWN", 0.0, "No market data for date")

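    z = float(row["stress_z"].iloc[0])
    if pd.isna(z):
        # rolling warm-up days have no stress score; do not default to TRUST
        return SafetyDecision(str(date), z, "UNKNOWN", 0.0, "Stress score undefined for date")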

    # choose the incident trigger level
    incident_thr = {"p95": P95, "p97": P97, "p99": P99}[use]

    if z >= P99:
        return SafetyDecision(str(date), z, "EXTREME", 0.0, "Pause or require confirmation for all trades")
    elif z >= incident_thr:
        return SafetyDecision(str(date), z, "INCIDENT", 0.3, "Reduce size, tighten constraints, consider confirmation")
    else:
        return SafetyDecision(str(date), z, "TRUST", 1.0, "Normal operation")

# Example: pick the most recent date in the dataset
example_date = str(mkt["date"].max().date())
decision = safety_mode_for_date(example_date, mkt, use="p95")
decision

How to use it in an agent loop

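decision = safety_mode_for_date(today, mkt)

# the trade_* / do_not_* helpers and log() are placeholders for the agent's own actions
if decision.mode in ("EXTREME", "UNKNOWN"):
    do_not_trade_without_human()
elif decision.mode == "INCIDENT":
    trade_with_reduced_risk(decision.risk_multiplier)
else:
    trade_normally()
log(decision)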

6.4 Why this supports “trust”

This pattern creates auditable behavior:

Every trade can be linked to a mode decision and a stress score.
If something goes wrong, you can explain why the agent was allowed to act and under what regime.

That’s the “trust” part: not just accuracy, but governance.

Results & demonstration (what the safety layer actually does)

7.1 How often does the agent enter incident mode?

Using the stress score thresholds (Figure 3), the market regime split over the dataset is:

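TRUST: 1,191 / 1,258 days (≈94.7%)
INCIDENT (p95 ≤ z < p99): 50 / 1,258 days (≈4.0%)
EXTREME (≥ p99): 13 / 1,258 days (≈1.0%)

These counts sum to 1,254. The remaining 4 days appear to be the rolling-window warm-up (min_periods=5), where the stress score is undefined.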

Interpretation:

The agent remains in normal autonomous mode most of the time.
Escalation is rare (good for trust), but not vanishingly rare (good for resilience).

This is the practical “trust vs resilience” calibration: you can choose thresholds so incident mode is a small fraction of time, then tighten or relax depending on your risk tolerance.
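
A regime split like the one above can be tabulated by labeling each day and counting. The sketch below uses the rounded p95 and p99 values quoted in Section 6.1 and a handful of made-up scores. The `label_mode` helper and the explicit UNKNOWN bucket for undefined scores are assumptions added for illustration.

```python
import math
import pandas as pd

P95 = 1.8085  # rounded thresholds quoted in Section 6.1
P99 = 3.6239

def label_mode(z: float) -> str:
    """Label one day's stress score; undefined scores get their own bucket."""
    if math.isnan(z):
        return "UNKNOWN"
    if z >= P99:
        return "EXTREME"
    if z >= P95:
        return "INCIDENT"
    return "TRUST"

# Made-up scores, including one undefined (warm-up) day.
toy_z = pd.Series([float("nan"), -1.0, 0.1, 0.5, 1.9, 2.5, 3.7, 4.2])
modes = toy_z.map(label_mode)
counts = modes.value_counts()
shares = (counts / len(modes)).round(3)
print(pd.DataFrame({"days": counts, "share": shares}))
```

Keeping undefined days in a separate bucket makes the shares add up to 100% and avoids silently counting them as TRUST.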

7.2 Demonstration: “worst days first” (triage view)

A safety layer should support rapid decision-making. A simple demonstration is to list the top stress days, which becomes a triage queue:

On EXTREME days, the system recommends pausing autonomous trading or requiring confirmation.
On INCIDENT days, the system recommends reduced risk and tighter constraints.

This is analogous to top-K alerting in cybersecurity: the highest-scoring days are the ones you review first.

Code: show the top 10 stress days + mode

This produces a small table that’s perfect to embed in the post.

import pandas as pd
import numpy as np

top = (
    mkt.sort_values("stress_z", ascending=False)
       .loc[:, ["date","xs_vol","xs_vol_roll20","stress_z"]]
       .head(10)
       .copy()
)

def mode_from_z(z):
    if z >= P99:
        return "EXTREME"
    elif z >= P95:
        return "INCIDENT"
    else:
        return "TRUST"

top["mode"] = top["stress_z"].map(mode_from_z)
top

	date	xs_vol	xs_vol_roll20	stress_z	mode
756	2016-02-11	0.025267	0.023314	4.267417	EXTREME
757	2016-02-12	0.020915	0.023250	4.239872	EXTREME
760	2016-02-18	0.019035	0.023156	4.199892	EXTREME
758	2016-02-16	0.018177	0.023144	4.194787	EXTREME
759	2016-02-17	0.019501	0.023133	4.189833	EXTREME
755	2016-02-10	0.021353	0.023132	4.189587	EXTREME
754	2016-02-09	0.026353	0.022863	4.074646	EXTREME
761	2016-02-19	0.015958	0.022711	4.009542	EXTREME
753	2016-02-08	0.034655	0.022681	3.996723	EXTREME
762	2016-02-22	0.018931	0.022638	3.978242	EXTREME

7.3 What this means for an investing agent

This safety layer is not a trading strategy. It’s a control system that changes how the agent behaves:

Trust mode: automate normally, because the market is stable enough.
Incident mode: maintain functionality but reduce risk and increase scrutiny.
Extreme mode: pause or require confirmation, because mistakes are more costly.

This is exactly how you’d design safety for agentic systems: use measurable signals to adapt policy in a way that is auditable and explainable.

Integrating the safety layer into an agent workflow (pre-trade guardrails)

8.1 The pattern: “agent proposes → safety layer gates”

A practical way to make agentic systems safer is to separate:

proposal (agent suggests a trade)
authorization (safety layer decides whether/how it may execute)

This prevents a single bad reasoning step (or tool injection) from becoming an unbounded action.

We implement a pre-trade gate that uses the market regime:

TRUST: allow trades under normal limits
INCIDENT: allow only with reduced size + tighter constraints
EXTREME: require confirmation (or block autonomous execution)

8.2 Guardrail rules (simple but effective)

Here’s a minimal rule set that’s easy to explain:

Inputs

ticker, side, notional_usd
date (today)
watchlist / allowed_symbols

Rules

TRUST: allow if notional ≤ max_notional
INCIDENT: scale notional by 0.3 and restrict to watchlist
EXTREME: require confirmation for any trade (or block)

This is “security thinking” applied to finance:

least privilege (restrict symbols)
risk reduction (limit exposure)
human-in-the-loop only when needed (rare extreme days)

Code: pre-trade check + decision record

from dataclasses import dataclass
import pandas as pd

@dataclass
class TradeRequest:
    date: str
    ticker: str
    side: str          # "BUY" or "SELL"
    notional_usd: float

@dataclass
class TradeDecision:
    decision: str      # "ALLOW", "ALLOW_WITH_RESTRICTIONS", "REQUIRE_CONFIRMATION", "BLOCK"
    approved_notional: float
    mode: str
    reason: str

def pre_trade_check(req: TradeRequest, mkt_df, watchlist=None, use="p95",
                    max_notional_trust=10_000, max_notional_incident=3_000):
    watchlist = set(watchlist or [])

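    safety = safety_mode_for_date(req.date, mkt_df, use=use)

    # UNKNOWN: no stress score for this date, so fail closed instead of falling through to TRUST
    if safety.mode == "UNKNOWN":
        return TradeDecision(
            decision="BLOCK",
            approved_notional=0.0,
            mode=safety.mode,
            reason=f"No stress score for {req.date}: blocking autonomous trading."
        )

    # EXTREME: always require confirmation (or block autonomous trading)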
    if safety.mode == "EXTREME":
        return TradeDecision(
            decision="REQUIRE_CONFIRMATION",
            approved_notional=0.0,
            mode=safety.mode,
            reason=f"EXTREME stress (z={safety.stress_z:.2f}). Pause autonomous trading."
        )

    # INCIDENT: restrict + reduce size
    if safety.mode == "INCIDENT":
        if watchlist and req.ticker not in watchlist:
            return TradeDecision(
                decision="BLOCK",
                approved_notional=0.0,
                mode=safety.mode,
                reason=f"INCIDENT mode: ticker {req.ticker} not in watchlist."
            )
        approved = min(req.notional_usd * safety.risk_multiplier, max_notional_incident)
        return TradeDecision(
            decision="ALLOW_WITH_RESTRICTIONS",
            approved_notional=approved,
            mode=safety.mode,
            reason=f"INCIDENT mode (z={safety.stress_z:.2f}): reduced size + tighter execution constraints."
        )

    # TRUST: normal constraints
    approved = min(req.notional_usd, max_notional_trust)
    return TradeDecision(
        decision="ALLOW",
        approved_notional=approved,
        mode=safety.mode,
        reason=f"TRUST mode (z={safety.stress_z:.2f}): normal operation."
    )

# Demo: pick one normal day and one extreme day from your top-10 table
req_normal  = TradeRequest(date="2018-02-07", ticker="AAPL", side="BUY", notional_usd=12_000)
req_extreme = TradeRequest(date="2016-02-11", ticker="AAPL", side="BUY", notional_usd=12_000)

watchlist = {"AAPL", "MSFT", "AMZN"}

print("Normal day:", pre_trade_check(req_normal, mkt, watchlist=watchlist))
print("Extreme day:", pre_trade_check(req_extreme, mkt, watchlist=watchlist))

Normal day: TradeDecision(decision='ALLOW', approved_notional=10000, mode='TRUST', reason='TRUST mode (z=0.83): normal operation.')
Extreme day: TradeDecision(decision='REQUIRE_CONFIRMATION', approved_notional=0.0, mode='EXTREME', reason='EXTREME stress (z=4.27). Pause autonomous trading.')

8.3 Why this supports trust and resilience

Trust: Most days the agent proceeds normally, and every decision has a clear reason string (auditable).
Resilience: On stress days, the agent automatically reduces risk, restricts actions, or requires confirmation—without needing a new ML model.

Limitations and extensions (keep it honest, keep it useful)

9.1 What this safety layer does (and doesn’t) do

This post builds a market-regime safety layer, not a trading alpha model. It answers: “Is today a normal day or a stress day?” and adapts agent policy accordingly.

Limitations to be clear about:

Market-wide, not single-name: Cross-sectional volatility captures broad regime stress, but it won’t detect a single-stock blowup if the overall market is calm.
Daily resolution: This is end-of-day (or daily bar) data. Intraday shocks can happen faster than this signal updates.
No exogenous context: It doesn’t incorporate news, macro events, earnings, or order book liquidity.
Constituent coverage: The dataset has 505 tickers in total, but only ~492 contribute on an average day, so membership and missingness shift over time.

These are fine for a first safety layer; the key is that the control is auditable and policy-driven.

9.2 Extensions that make this production-ready

If you want to push this toward a PhD-ready line of work, here are natural next steps:

A) Add “severity” tiers (already compatible)

You already have:

TRUST / INCIDENT / EXTREME

You can refine this further by adding:

drawdown-based severity (e.g., rolling 20d drawdown)
liquidity stress (median volume collapse / spread proxy if available)
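
A drawdown-based severity signal could be sketched as follows: build a cumulative index from daily log returns (for example the equal-weight mean_ret series) and measure how far it sits below its trailing 20-day peak. The function name `rolling_drawdown` and the toy returns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def rolling_drawdown(log_returns: pd.Series, window: int = 20) -> pd.Series:
    """Drawdown of a cumulative-return index relative to its trailing `window`-day peak."""
    index_level = np.exp(log_returns.fillna(0.0).cumsum())
    trailing_peak = index_level.rolling(window, min_periods=1).max()
    return index_level / trailing_peak - 1.0  # 0 at a new peak, negative below it

# Toy gross returns: flat, +10%, -10%, flat (index levels 1.00, 1.10, 0.99, 0.99).
toy_ret = pd.Series(np.log([1.0, 1.1, 0.9, 1.0]))
dd = rolling_drawdown(toy_ret)
print(dd.round(4).tolist())
```

A severity tier could then combine this with stress_z, for instance escalating only when both the stress score and the drawdown cross their own thresholds.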

B) Add “local” stress signals

To handle single-name risk:

compute per-sector volatility (group tickers by sector if you join metadata)
compute single-ticker anomaly score (e.g., z-score of returns vs its own history)
combine: global_stress + local_stress
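
The single-ticker idea can be sketched as a per-ticker z-score of today's log return against that ticker's own trailing history. The window length (60 days), the minimum history (20 days), and the helper name `local_stress_z` are assumptions. The data is synthetic: two calm tickers, one of which gets a large shock on the last day.

```python
import numpy as np
import pandas as pd

def local_stress_z(df: pd.DataFrame, window: int = 60, min_periods: int = 20) -> pd.Series:
    """Per-ticker z-score of today's log return against that ticker's trailing history."""
    grouped = df.groupby("Name")["logret"]
    # shift(1) keeps today's return out of its own baseline
    mu = grouped.transform(lambda s: s.shift(1).rolling(window, min_periods=min_periods).mean())
    sd = grouped.transform(lambda s: s.shift(1).rolling(window, min_periods=min_periods).std())
    return (df["logret"] - mu) / sd

n = 40
calm = np.where(np.arange(n) % 2 == 0, 0.01, -0.01)  # alternating small returns
shocked = calm.copy()
shocked[-1] = -0.20                                   # single-name blowup on the last day

toy = pd.DataFrame({
    "Name": ["AAA"] * n + ["BBB"] * n,
    "date": list(pd.bdate_range("2017-01-02", periods=n)) * 2,
    "logret": np.concatenate([shocked, calm]),
})
toy["local_z"] = local_stress_z(toy)
last = toy.groupby("Name").tail(1).set_index("Name")["local_z"]
print(last.round(2))
```

The shocked ticker stands out with a large negative score while the calm one stays near its usual range, which is the case the market-wide signal would miss.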

C) Make it more “security-like”

For agentic systems research, the key is policy enforcement under uncertainty:

treat the safety layer as an authorization gate before tool execution
log every decision (mode, score, threshold version)
add “tamper resistance” (agent can’t bypass the guardrail)
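
One way to approach the logging items is a hash-chained decision log, where each record stores the hash of the previous one along with the threshold version. This sketch provides tamper evidence (edits to earlier records become detectable), which is weaker than the tamper resistance described above. All names here (`AuditRecord`, `append_record`, `chain_is_intact`, the `THRESHOLD_VERSION` label) are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

THRESHOLD_VERSION = "p95-p99-v1"  # hypothetical label for the threshold set in use

@dataclass(frozen=True)
class AuditRecord:
    date: str
    mode: str
    stress_z: float
    threshold_version: str
    prev_hash: str

def record_hash(rec: AuditRecord) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(asdict(rec), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(log: list, date: str, mode: str, stress_z: float) -> AuditRecord:
    """Append a decision that points at the hash of the previous record."""
    prev = record_hash(log[-1]) if log else "GENESIS"
    rec = AuditRecord(date, mode, stress_z, THRESHOLD_VERSION, prev)
    log.append(rec)
    return rec

def chain_is_intact(log: list) -> bool:
    """Check that every record still points at the hash of its predecessor."""
    expected = "GENESIS"
    for rec in log:
        if rec.prev_hash != expected:
            return False
        expected = record_hash(rec)
    return True

log = []
append_record(log, "2016-02-11", "EXTREME", 4.27)
append_record(log, "2018-02-07", "TRUST", 0.83)
intact_before = chain_is_intact(log)

log[0] = replace(log[0], mode="TRUST")  # simulate someone rewriting history
intact_after = chain_is_intact(log)
print(intact_before, intact_after)
```

Note that a change to the most recent record is only detectable once a later record is appended or the latest hash is stored somewhere the agent cannot write to.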

D) Evaluate against known stress events (optional)

You can map spikes to historical episodes (e.g., Feb 2016 market turbulence) and show that the safety layer escalates at sensible times. The top-10 table in Section 7.2 already points to February 2016 as the most stressed stretch in this dataset.

Wrap-up + reproducibility

10.1 What we built

A simple, interpretable market stress score from S&P 500 daily data
A two-mode safety policy (trust vs incident) with an extreme tier
A pre-trade gate that outputs an auditable decision record

10.2 Why it matters for agentic investing

The core idea generalizes:

In calm regimes, automation builds value.
In stressed regimes, automation needs resilience controls: reduced risk, tighter constraints, and human-in-the-loop for extreme conditions.

This is how you build trustworthy agentic systems: not by assuming perfect prediction, but by designing policies that adapt safely when conditions change.

Market Anomaly Safety Layer for Investing Agents (S&P 500, 5 Years): Trust Mode vs Incident Mode

By Timothy Adegbola