TL;DR
- I build a market anomaly “safety layer” that an investing agent can consult before acting: “Is the market in a normal regime or an incident regime?”
- In trust mode (normal conditions), we keep alerts rare to avoid false alarms.
- In incident mode (stress/volatility spikes), we switch to an adaptive alert rate (a review budget) so the layer stays resilient under regime shifts.
Outline (headings)
- Why agentic investing needs a safety layer (security, resilience, trust)
- Dataset (S&P 500 5-year OHLCV) + quick sanity checks
- Building a market-stress signal (returns + cross-sectional volatility)
- Trust mode policy (conservative thresholding)
- Incident mode policy (adaptive alert-rate / review budget)
- Results (plots + “what changed during stress”)
- How an investing agent would use this (guardrail logic)
- Limitations + next steps

2.2 Data card
| rows | tickers | start_date | end_date | missing_open | missing_high | missing_low | missing_close |
|---|---|---|---|---|---|---|---|
| 619040 | 505 | 2013-02-08 | 2018-02-07 | 11 | 8 | 8 | 0 |
This is good news: the dataset is clean enough to build reliable daily features with minimal preprocessing. We’ll just decide how to handle the small OHLC missingness (drop those rows or forward-fill per ticker).
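For readers who want to reproduce the data card, a minimal sketch follows. It assumes the raw file has been loaded into a DataFrame with the columns used later in this post (date, open, high, low, close, volume, Name). The `data_card` helper and the toy frame are illustrative names, not part of the original notebook.

```python
import pandas as pd


def data_card(df: pd.DataFrame) -> pd.DataFrame:
    """One-row summary: size, ticker count, date range, missing OHLC counts."""
    dates = pd.to_datetime(df["date"])
    card = {
        "rows": len(df),
        "tickers": df["Name"].nunique(),
        "start_date": dates.min().date().isoformat(),
        "end_date": dates.max().date().isoformat(),
    }
    for col in ["open", "high", "low", "close"]:
        card[f"missing_{col}"] = int(df[col].isna().sum())
    return pd.DataFrame([card])


# Tiny synthetic frame (NOT the real dataset), just to exercise the function
toy = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11", "2013-02-08"],
    "open": [10.0, None, 20.0],
    "high": [11.0, 12.0, 21.0],
    "low": [9.0, 10.0, 19.0],
    "close": [10.5, 11.5, 20.5],
    "volume": [100, 200, 300],
    "Name": ["AAA", "AAA", "BBB"],
})
card = data_card(toy)
print(card)
```

Running the same helper on the full dataset should give a row comparable to the table above, provided the column names match.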
2.3 Safety policy preview (one paragraph + pseudocode)
We’ll implement a simple two-mode policy:
- Trust mode (normal): agent can trade normally, but with conservative checks.
- Incident mode (stress): agent tightens controls (smaller position sizes, tighter risk limits, or human confirmation).

if market_stress > incident_threshold:
    incident_mode()  # reduce risk + increase scrutiny
else:
    trust_mode()  # normal operation
Data preparation + feature definitions (results-first)
3.1 Cleaning (minimal and explicit)
This dataset is already clean, with only a handful of missing OHLC values. For reproducibility and simplicity, we drop rows where open/high/low are missing. This avoids hidden assumptions and keeps our features well-defined.
We also:
- parse date
- sort by (Name, date)
- compute per-ticker log returns (stable for volatility features)

Clean + compute features + save summary
import numpy as np
import pandas as pd
# 1) Drop rows with missing OHLC (minimal cleaning)
before = len(df)
df_clean = df.dropna(subset=["open", "high", "low"]).copy()
after = len(df_clean)
# 2) Parse dates, sort, and compute per-ticker log returns
df_clean["date"] = pd.to_datetime(df_clean["date"])
df_clean = df_clean.sort_values(["Name", "date"]).reset_index(drop=True)
df_clean["log_close"] = np.log(df_clean["close"].astype(float))
df_clean["logret"] = df_clean.groupby("Name")["log_close"].diff()
# 3) Market-level features
mkt = (
    df_clean.groupby("date")
    .agg(
        xs_vol=("logret", "std"),     # cross-sectional volatility (stress proxy)
        mean_ret=("logret", "mean"),  # equal-weight mean return proxy
        mean_vol=("volume", "mean"),  # average volume proxy
        n=("logret", "count"),        # number of tickers contributing
    )
    .dropna()  # drops the first date, where no ticker has a return yet
    .reset_index()
)
# 4) Rolling stress + incident threshold
mkt["xs_vol_roll20"] = mkt["xs_vol"].rolling(20, min_periods=5).mean()
incident_pct = 95
thr = float(np.nanpercentile(mkt["xs_vol"], incident_pct))
mkt["incident"] = (mkt["xs_vol"] > thr).astype(int)
# 5) Tiny summary card for the post
summary = {
    "rows_before": before,
    "rows_after_dropna_ohlc": after,
    "dropped_rows": before - after,
    "dates": int(mkt["date"].nunique()),
    "avg_tickers_per_day": float(mkt["n"].mean()),
    "incident_threshold_pct": incident_pct,
    "incident_threshold_value": thr,
    "incident_days_rate": float(mkt["incident"].mean()),
}
pd.DataFrame([summary])
| | rows_before | rows_after_dropna_ohlc | dropped_rows | dates | avg_tickers_per_day | incident_threshold_pct | incident_threshold_value | incident_days_rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 619040 | 619029 | 11 | 1258 | 491.672496 | 95 | 0.020871 | 0.050079 |
Safety policy: trust mode vs incident mode (agent guardrail logic)
4.1 Two-mode safety layer (the core idea)
A trading agent should not behave the same way in calm markets and stressed markets. We implement a simple, auditable two-mode policy:
Trust mode (normal regime)
When market stress is below the incident threshold, the agent can operate with standard automation:
- normal position sizing
- standard execution logic
- routine monitoring
Incident mode (stress regime)
When market stress exceeds the threshold (top ~5% of days in this dataset), the safety layer escalates:
- reduce risk automatically (smaller position sizes / tighter leverage caps)
- increase scrutiny (more conservative trade filters)
- optionally require human confirmation for high-impact actions
This is the same “resilience knob” idea from cybersecurity: when base conditions shift, policy must adapt.
4.2 Concrete guardrail policy
Here’s a practical policy you can implement in an agent today:
If incident mode:
- Cut max position size (e.g., 50–80% reduction)
- Block new symbols (only trade tickers already held/watchlisted)
- Require confirmation for large notional trades
- Tighten execution constraints (limit orders only; stricter slippage checks)
- Increase monitoring frequency (more frequent evaluation of risk signals)

stress = market_stress[today]
if stress > incident_threshold:
    mode = "INCIDENT"
    max_position *= 0.3  # a 70% cut, inside the 50–80% range above
    require_confirmation_if(notional > N)
    restrict_to_watchlist()
else:
    mode = "TRUST"
    normal_limits()
4.3 Data-backed calibration note
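In this dataset, a 95th-percentile stress threshold puts the agent in incident mode on ~5.0% of trading days. That rate holds by construction, because the threshold is computed on the same sample; on new data the realized rate can differ. As an operating point it is reasonable: rare enough to maintain trust, frequent enough to catch meaningful stress events.
(From the summary table in Section 3: threshold ≈ 0.020871, incident rate ≈ 0.050.)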
Figure 2 (Notebook cell): highlight incident days on the stress timeline
This produces and saves: figures/fig2_incident_days.png.
import os
import matplotlib.pyplot as plt
os.makedirs("figures", exist_ok=True)
# Scatter incident days on top of the stress curve
inc = mkt[mkt["incident"] == 1]
plt.figure()
plt.plot(mkt["date"], mkt["xs_vol_roll20"], label="20-day rolling stress")
plt.scatter(inc["date"], inc["xs_vol_roll20"], s=10, label="incident days (top 5%)")
plt.axhline(mkt["xs_vol_roll20"].quantile(0.95), linestyle="--", label="rolling stress 95th pct (visual guide)")
plt.xlabel("date")
plt.ylabel("rolling stress (20d mean xs_vol)")
plt.title("Incident days highlighted (stress-based regime switch)")
plt.legend()
plt.tight_layout()
plt.savefig("figures/fig2_incident_days.png", dpi=200)
plt.show()
print("Saved: figures/fig2_incident_days.png")

From “stress signal” to anomaly detector (scoring + alert policies)
5.1 Turning market stress into an anomaly score
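The later sections use a stress_z column whose definition does not appear here. One reading that is consistent with the top-10 table in Section 7.2 (where stress_z moves linearly with xs_vol_roll20) is a full-sample z-score of the 20-day rolling stress. The sketch below implements that reading; treat it as an assumption rather than the exact original computation. The helper name `add_stress_z` is hypothetical.

```python
import numpy as np
import pandas as pd


def add_stress_z(mkt: pd.DataFrame, col: str = "xs_vol_roll20") -> pd.DataFrame:
    """Append stress_z = (rolling stress - sample mean) / sample std.

    Assumption: full-sample standardization with the sample std (ddof=1).
    """
    out = mkt.copy()
    mu = out[col].mean()     # NaNs are skipped by default
    sigma = out[col].std()   # pandas default is ddof=1
    out["stress_z"] = (out[col] - mu) / sigma
    return out


# Toy values (not real data): the leading NaN mimics the rolling warm-up period
toy = pd.DataFrame({"xs_vol_roll20": [np.nan, 0.010, 0.012, 0.011, 0.025]})
toy = add_stress_z(toy)
print(toy)
```

Full-sample standardization uses future data to score past days, which is fine for a retrospective post but would need an expanding or trailing window in live use.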

5.2 Two deployment policies (trust vs incident)
Just like in the cybersecurity post, performance depends on policy:
Trust-mode threshold (quiet by default)
- Choose a high percentile threshold (e.g., 95th)
- You get ~5% incident days in this dataset
- This keeps alerts rare and builds trust
Incident-mode alert-rate (resilience knob)
When the agent or user declares “elevated risk,” switch from a fixed threshold to:
- alert top X% of days (or tighten the percentile to 97–99%)
- optionally combine with additional checks (volume spikes, drawdowns, news)
The key idea is the same:
Resilience = adaptive alerting under regime shift.
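To make the difference between the two policies concrete, here is a small sketch on synthetic scores. The function names and the 10% review budget are illustrative assumptions. The point is only that a fixed threshold gives a variable number of alerts, while an alert-rate policy fixes the number of days flagged for review.

```python
import numpy as np
import pandas as pd


def alerts_fixed_threshold(scores: pd.Series, threshold: float) -> pd.Series:
    """Trust mode: alert only when the score exceeds a fixed threshold."""
    return scores > threshold


def alerts_top_rate(scores: pd.Series, rate: float) -> pd.Series:
    """Incident mode: alert on the top `rate` fraction of days (review budget)."""
    k = int(np.ceil(rate * scores.notna().sum()))
    top_idx = scores.nlargest(k).index
    return pd.Series(scores.index.isin(top_idx), index=scores.index)


# Synthetic scores stand in for the stress score
rng = np.random.default_rng(0)
scores = pd.Series(rng.normal(size=200))

trust_alerts = alerts_fixed_threshold(scores, threshold=scores.quantile(0.95))
incident_alerts = alerts_top_rate(scores, rate=0.10)
print(int(trust_alerts.sum()), int(incident_alerts.sum()))
```

With a budget-based policy, the number of alerts stays predictable even if the score distribution shifts, which is what makes it usable as a resilience knob.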

Agent guardrail implementation (mode switching + auditability)
6.1 The guardrail interface: one function the agent must call
A safety layer is only useful if the agent actually consults it. The cleanest pattern is to require a single “mode lookup” call before any high-impact action:
- Input: today’s date
- Output: TRUST, INCIDENT, or EXTREME
- Plus: a short decision record (so you can audit later)
We’ll base the mode on the normalized stress score stress_z:
- stress_z >= 1.8085 → INCIDENT (top 5% stress)
- stress_z >= 2.1842 → INCIDENT (tight) (top 3% stress)
- stress_z >= 3.6239 → EXTREME (top 1% stress)
This gives you a single knob: tighten or relax the trigger depending on how cautious you want the agent to be.
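The three cut-offs above are percentiles of stress_z. A sketch of how such thresholds could be computed is shown below, using synthetic scores; the helper name `stress_thresholds` is hypothetical, and the real values in this post come from the actual stress_z series, not from this toy data.

```python
import numpy as np
import pandas as pd


def stress_thresholds(stress_z: pd.Series, pcts=(95, 97, 99)) -> dict:
    """Return {'p95': value, ...} for the requested percentiles, ignoring NaNs."""
    values = np.nanpercentile(stress_z.to_numpy(dtype=float), pcts)
    return {f"p{p}": float(v) for p, v in zip(pcts, values)}


# Synthetic z-scores, only to show the shape of the output
rng = np.random.default_rng(42)
toy_z = pd.Series(rng.normal(size=1000))
thr = stress_thresholds(toy_z)
print(thr)
```

Recomputing the thresholds on a schedule, and versioning them, keeps the "single knob" auditable when the underlying distribution drifts.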
6.2 What changes by mode (trust → resilience)
Here’s a practical default policy set:
TRUST (normal operations)
- normal position sizing
- normal automation
- allow new symbols if they pass standard checks
INCIDENT (stress regime)
- reduce max position size (e.g., ×0.3)
- restrict to watchlist / existing holdings
- require confirmation for large notional trades
- use limit orders only (avoid slippage surprises)
EXTREME (rare, high stress)
- pause autonomous trading OR require confirmation for all trades
- focus on risk reduction (hedge, rebalance to cash, tighten stops)
- increase monitoring frequency
The key is that the agent remains useful, but becomes safer when the environment is less predictable.
6.3 Code: mode function + “decision record”
from dataclasses import dataclass

import pandas as pd

# thresholds from Figure 3
P95 = 1.8085408824587415
P97 = 2.184172679824946
P99 = 3.6239406886467354


@dataclass
class SafetyDecision:
    date: str
    stress_z: float
    mode: str
    risk_multiplier: float
    notes: str


def safety_mode_for_date(date, mkt_df, use="p95"):
    """
    Returns TRUST / INCIDENT / EXTREME and a risk multiplier.
    use: "p95" (default), "p97" (tighter), or "p99" (extreme-only)
    """
    row = mkt_df.loc[mkt_df["date"] == pd.to_datetime(date)]
    # Fail closed: a missing row or a missing score yields UNKNOWN with zero risk budget
    if row.empty or pd.isna(row["stress_z"].iloc[0]):
        return SafetyDecision(str(date), float("nan"), "UNKNOWN", 0.0, "No market data for date")
    z = float(row["stress_z"].iloc[0])
    # choose the incident trigger level
    incident_thr = {"p95": P95, "p97": P97, "p99": P99}[use]
    if z >= P99:
        return SafetyDecision(str(date), z, "EXTREME", 0.0, "Pause or require confirmation for all trades")
    elif z >= incident_thr:
        return SafetyDecision(str(date), z, "INCIDENT", 0.3, "Reduce size, tighten constraints, consider confirmation")
    else:
        return SafetyDecision(str(date), z, "TRUST", 1.0, "Normal operation")


# Example: pick the most recent date in the dataset
example_date = str(mkt["date"].max().date())
decision = safety_mode_for_date(example_date, mkt, use="p95")
decision
How to use it in an agent loop
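# Pseudocode: the action functions and log() are placeholders for your agent's own hooks
decision = safety_mode_for_date(today, mkt)
if decision.mode in ("EXTREME", "UNKNOWN"):
    do_not_trade_without_human()
elif decision.mode == "INCIDENT":
    trade_with_reduced_risk(decision.risk_multiplier)
else:
    trade_normally()
log(decision)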
6.4 Why this supports “trust”
This pattern creates auditable behavior:
- Every trade can be linked to a mode decision and a stress score.
- If something goes wrong, you can explain why the agent was allowed to act and under what regime.
That’s the “trust” part: not just accuracy, but governance.
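One way to make the decision record auditable in practice is to write each decision as a JSON line. The sketch below redefines the dataclass from 6.3 so it runs on its own; `audit_line` and the `threshold_version` tag are hypothetical names added for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SafetyDecision:  # same fields as the dataclass in 6.3
    date: str
    stress_z: float
    mode: str
    risk_multiplier: float
    notes: str


def audit_line(decision: SafetyDecision, threshold_version: str) -> str:
    """Serialize one decision as a JSON line, tagged with the threshold version in force."""
    record = asdict(decision)
    record["threshold_version"] = threshold_version
    return json.dumps(record, sort_keys=True)


d = SafetyDecision("2016-02-11", 4.27, "EXTREME", 0.0,
                   "Pause or require confirmation for all trades")
line = audit_line(d, threshold_version="p95-v1")
print(line)
```

Appending these lines to a write-once log gives a simple trail that links every trade back to the mode and score that authorized it.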
Results & demonstration (what the safety layer actually does)
7.1 How often does the agent enter incident mode?
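Using the stress score thresholds (Figure 3), the market regime split over the dataset is:
- TRUST (< p95): 1,191 / 1,258 days (≈94.7%)
- INCIDENT (≥ p95 and < p99): 50 / 1,258 days (≈4.0%)
- EXTREME (≥ p99): 13 / 1,258 days (≈1.0%)
These counts sum to 1,254. The remaining 4 days are most likely the first dates in the sample, where the 20-day rolling mean (min_periods=5) is not yet defined and so no stress score exists.
Interpretation:
- The agent remains in normal autonomous mode most of the time.
- Escalation is rare (good for trust), but not vanishingly rare (good for resilience).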
This is the practical “trust vs resilience” calibration: you can choose thresholds so incident mode is a small fraction of time, then tighten or relax depending on your risk tolerance.
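A regime split like the one above can be tabulated with a few lines of pandas. The sketch below runs on toy scores, and the explicit UNKNOWN label for missing scores is an assumption added here so that unscored days are not silently counted as TRUST.

```python
import numpy as np
import pandas as pd


def classify_modes(stress_z: pd.Series, p95: float, p99: float) -> pd.Series:
    """Label each day TRUST / INCIDENT / EXTREME; missing scores become UNKNOWN."""
    modes = pd.Series("TRUST", index=stress_z.index)
    modes[stress_z >= p95] = "INCIDENT"
    modes[stress_z >= p99] = "EXTREME"   # overrides INCIDENT for the top tier
    modes[stress_z.isna()] = "UNKNOWN"
    return modes


# Toy scores (not the real series); thresholds rounded from Section 6.1
toy_z = pd.Series([np.nan, 0.1, 0.5, 1.9, 2.5, 3.7, -0.3])
counts = classify_modes(toy_z, p95=1.8085, p99=3.6239).value_counts()
print(counts)
```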
7.2 Demonstration: “worst days first” (triage view)
A safety layer should support rapid decision-making. A simple demonstration is to list the top stress days, which becomes a triage queue:
- On EXTREME days, the system recommends pausing autonomous trading or requiring confirmation.
- On INCIDENT days, the system recommends reduced risk and tighter constraints.
This is analogous to top-K alerting in cybersecurity: the highest-scoring days are the ones you review first.
Code: show the top 10 stress days + mode
This produces a small table that’s perfect to embed in the post.
import pandas as pd
import numpy as np
top = (
    mkt.sort_values("stress_z", ascending=False)
    .loc[:, ["date", "xs_vol", "xs_vol_roll20", "stress_z"]]
    .head(10)
    .copy()
)


def mode_from_z(z):
    if z >= P99:
        return "EXTREME"
    elif z >= P95:
        return "INCIDENT"
    else:
        return "TRUST"


top["mode"] = top["stress_z"].map(mode_from_z)
top
| | date | xs_vol | xs_vol_roll20 | stress_z | mode |
|---|---|---|---|---|---|
| 756 | 2016-02-11 | 0.025267 | 0.023314 | 4.267417 | EXTREME |
| 757 | 2016-02-12 | 0.020915 | 0.023250 | 4.239872 | EXTREME |
| 760 | 2016-02-18 | 0.019035 | 0.023156 | 4.199892 | EXTREME |
| 758 | 2016-02-16 | 0.018177 | 0.023144 | 4.194787 | EXTREME |
| 759 | 2016-02-17 | 0.019501 | 0.023133 | 4.189833 | EXTREME |
| 755 | 2016-02-10 | 0.021353 | 0.023132 | 4.189587 | EXTREME |
| 754 | 2016-02-09 | 0.026353 | 0.022863 | 4.074646 | EXTREME |
| 761 | 2016-02-19 | 0.015958 | 0.022711 | 4.009542 | EXTREME |
| 753 | 2016-02-08 | 0.034655 | 0.022681 | 3.996723 | EXTREME |
| 762 | 2016-02-22 | 0.018931 | 0.022638 | 3.978242 | EXTREME |
7.3 What this means for an investing agent
This safety layer is not a trading strategy. It’s a control system that changes how the agent behaves:
- Trust mode: automate normally, because the market is stable enough.
- Incident mode: maintain functionality but reduce risk and increase scrutiny.
- Extreme mode: pause or require confirmation, because mistakes are more costly.
This is exactly how you’d design safety for agentic systems: use measurable signals to adapt policy in a way that is auditable and explainable.
Integrating the safety layer into an agent workflow (pre-trade guardrails)
8.1 The pattern: “agent proposes → safety layer gates”
A practical way to make agentic systems safer is to separate:
- proposal (agent suggests a trade)
- authorization (safety layer decides whether/how it may execute)
This prevents a single bad reasoning step (or tool injection) from becoming an unbounded action.
We implement a pre-trade gate that uses the market regime:
- TRUST: allow trades under normal limits
- INCIDENT: allow only with reduced size + tighter constraints
- EXTREME: require confirmation (or block autonomous execution)
8.2 Guardrail rules (simple but effective)
Here’s a minimal rule set that’s easy to explain:
Inputs
- ticker, side, notional_usd
- date (today)
- watchlist / allowed_symbols
Rules
- TRUST: allow, with notional capped at max_notional
- INCIDENT: scale notional by 0.3 (capped at a lower incident limit) and restrict to watchlist
- EXTREME: require confirmation for any trade (or block)
This is “security thinking” applied to finance:
- least privilege (restrict symbols)
- risk reduction (limit exposure)
- human-in-the-loop only when needed (rare extreme days)
Code: pre-trade check + decision record
from dataclasses import dataclass

import pandas as pd


@dataclass
class TradeRequest:
    date: str
    ticker: str
    side: str  # "BUY" or "SELL"
    notional_usd: float


@dataclass
class TradeDecision:
    decision: str  # "ALLOW", "ALLOW_WITH_RESTRICTIONS", "REQUIRE_CONFIRMATION", "BLOCK"
    approved_notional: float
    mode: str
    reason: str


def pre_trade_check(req: TradeRequest, mkt_df, watchlist=None, use="p95",
                    max_notional_trust=10_000, max_notional_incident=3_000):
    watchlist = set(watchlist or [])
    safety = safety_mode_for_date(req.date, mkt_df, use=use)

    # UNKNOWN: no market data for this date, so fail closed instead of falling through to TRUST
    if safety.mode == "UNKNOWN":
        return TradeDecision(
            decision="BLOCK",
            approved_notional=0.0,
            mode=safety.mode,
            reason=f"No stress score for {req.date}: {safety.notes}."
        )

    # EXTREME: always require confirmation (or block autonomous trading)
    if safety.mode == "EXTREME":
        return TradeDecision(
            decision="REQUIRE_CONFIRMATION",
            approved_notional=0.0,
            mode=safety.mode,
            reason=f"EXTREME stress (z={safety.stress_z:.2f}). Pause autonomous trading."
        )

    # INCIDENT: restrict + reduce size (an empty watchlist means no symbol restriction)
    if safety.mode == "INCIDENT":
        if watchlist and req.ticker not in watchlist:
            return TradeDecision(
                decision="BLOCK",
                approved_notional=0.0,
                mode=safety.mode,
                reason=f"INCIDENT mode: ticker {req.ticker} not in watchlist."
            )
        approved = min(req.notional_usd * safety.risk_multiplier, max_notional_incident)
        return TradeDecision(
            decision="ALLOW_WITH_RESTRICTIONS",
            approved_notional=approved,
            mode=safety.mode,
            reason=f"INCIDENT mode (z={safety.stress_z:.2f}): reduced size + tighter execution constraints."
        )

    # TRUST: normal constraints (notional is capped, not rejected)
    approved = min(req.notional_usd, max_notional_trust)
    return TradeDecision(
        decision="ALLOW",
        approved_notional=approved,
        mode=safety.mode,
        reason=f"TRUST mode (z={safety.stress_z:.2f}): normal operation."
    )


# Demo: pick one normal day and one extreme day from your top-10 table
req_normal = TradeRequest(date="2018-02-07", ticker="AAPL", side="BUY", notional_usd=12_000)
req_extreme = TradeRequest(date="2016-02-11", ticker="AAPL", side="BUY", notional_usd=12_000)
watchlist = {"AAPL", "MSFT", "AMZN"}
print("Normal day:", pre_trade_check(req_normal, mkt, watchlist=watchlist))
print("Extreme day:", pre_trade_check(req_extreme, mkt, watchlist=watchlist))
Normal day: TradeDecision(decision='ALLOW', approved_notional=10000, mode='TRUST', reason='TRUST mode (z=0.83): normal operation.')
Extreme day: TradeDecision(decision='REQUIRE_CONFIRMATION', approved_notional=0.0, mode='EXTREME', reason='EXTREME stress (z=4.27). Pause autonomous trading.')
8.3 Why this supports trust and resilience
- Trust: Most days the agent proceeds normally, and every decision has a clear reason string (auditable).
- Resilience: On stress days, the agent automatically reduces risk, restricts actions, or requires confirmation, all without needing a new ML model.
Limitations and extensions (keep it honest, keep it useful)
9.1 What this safety layer does (and doesn’t) do
This post builds a market-regime safety layer, not a trading alpha model. It answers: “Is today a normal day or a stress day?” and adapts agent policy accordingly.
Limitations to be clear about:
- Market-wide, not single-name: Cross-sectional volatility captures broad regime stress, but it won’t detect a single-stock blowup if the overall market is calm.
- Daily resolution: This is end-of-day (or daily bar) data. Intraday shocks can happen faster than this signal updates.
- No exogenous context: It doesn’t incorporate news, macro events, earnings, or order book liquidity.
- Constituent coverage: The dataset has 505 tickers overall, with about 492 contributing on an average day, and membership and missingness can shift over time.
These are fine for a first safety layer; the key is that the control is auditable and policy-driven.
9.2 Extensions that make this production-ready
If you want to push this toward a PhD-ready line of work, here are natural next steps:
A) Add “severity” tiers (already compatible)
You already have:
- TRUST / INCIDENT / EXTREME
You can refine this further by adding:
- drawdown-based severity (e.g., rolling 20d drawdown)
- liquidity stress (median volume collapse / spread proxy if available)
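As a rough illustration of the drawdown idea, the sketch below builds an equal-weight index proxy from the daily mean log return (the mean_ret column from Section 3) and measures how far it sits below its trailing 20-day high. The function name and the toy returns are assumptions, and cumulating mean log returns only approximates a true equal-weight index.

```python
import numpy as np
import pandas as pd


def rolling_drawdown(mean_ret: pd.Series, window: int = 20) -> pd.Series:
    """Drawdown of an equal-weight index proxy versus its trailing `window`-day high.

    Returns values <= 0; for example, -0.05 means 5% below the recent high.
    """
    index_level = np.exp(mean_ret.fillna(0.0).cumsum())
    trailing_high = index_level.rolling(window, min_periods=1).max()
    return index_level / trailing_high - 1.0


# Toy daily mean log returns (not real data)
toy_ret = pd.Series([0.01, 0.01, -0.05, -0.02, 0.03])
dd = rolling_drawdown(toy_ret)
print(dd)
```

A severity tier could then require both a high stress score and a drawdown beyond some chosen level before escalating to EXTREME.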
B) Add “local” stress signals
To handle single-name risk:
- compute per-sector volatility (group tickers by sector if you join metadata)
- compute single-ticker anomaly score (e.g., z-score of returns vs its own history)
- combine: global_stress + local_stress
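A single-ticker anomaly score of the kind listed above could look like the following sketch. It assumes a frame with Name, date, and logret columns sorted by (Name, date), as in Section 3; the 60-day window, the function name, and the synthetic data are illustrative choices.

```python
import numpy as np
import pandas as pd


def local_stress(df: pd.DataFrame, window: int = 60, min_periods: int = 20) -> pd.Series:
    """Per-ticker z-score of today's log return against that ticker's own trailing history.

    Trailing stats are shifted by one day so today's return is not part of its own baseline.
    """
    g = df.groupby("Name")["logret"]
    mu = g.transform(lambda s: s.rolling(window, min_periods=min_periods).mean().shift(1))
    sd = g.transform(lambda s: s.rolling(window, min_periods=min_periods).std().shift(1))
    return (df["logret"] - mu) / sd


# Synthetic two-ticker panel (not real data)
rng = np.random.default_rng(1)
n = 100
toy = pd.DataFrame({
    "Name": ["AAA"] * n + ["BBB"] * n,
    "date": list(pd.bdate_range("2016-01-04", periods=n)) * 2,
    "logret": rng.normal(0, 0.01, size=2 * n),
})
toy.loc[n - 1, "logret"] = -0.15  # inject a single-name shock on AAA's last day
toy["local_z"] = local_stress(toy)
print(toy.loc[[n - 1, 2 * n - 1], ["Name", "date", "local_z"]])
```

One simple way to combine the two signals is to escalate when either the global score or the local score for the traded ticker crosses its threshold.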
C) Make it more “security-like”
For agentic systems research, the key is policy enforcement under uncertainty:
- treat the safety layer as an authorization gate before tool execution
- log every decision (mode, score, threshold version)
- add “tamper resistance” (agent can’t bypass the guardrail)
D) Evaluate against known stress events (optional)
You can map spikes to historical episodes (e.g., Feb 2016 market turbulence) and show that the safety layer escalates at sensible times.
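A first step toward that kind of evaluation is to collapse consecutive flagged days into episodes, which are easier to compare against a calendar of known events. The sketch below does this on a toy frame; the helper name is hypothetical, and it assumes one row per trading day, sorted by date, with the incident flag from Section 3.

```python
import pandas as pd


def incident_episodes(mkt: pd.DataFrame, flag_col: str = "incident") -> pd.DataFrame:
    """Collapse runs of consecutive flagged rows into (start, end, n_days) episodes."""
    flagged = mkt[flag_col].astype(bool)
    run_id = (flagged != flagged.shift()).cumsum()  # new id whenever the flag flips
    runs = mkt[flagged].groupby(run_id[flagged])["date"]
    return runs.agg(start="min", end="max", n_days="count").reset_index(drop=True)


# Toy frame (not real data): two separate runs of incident days
toy = pd.DataFrame({
    "date": pd.bdate_range("2016-02-01", periods=8),
    "incident": [0, 1, 1, 0, 0, 1, 1, 1],
})
episodes = incident_episodes(toy)
print(episodes)
```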
Wrap-up + reproducibility
10.1 What we built
- A simple, interpretable market stress score from S&P 500 daily data
- A two-mode safety policy (trust vs incident) with an extreme tier
- A pre-trade gate that outputs an auditable decision record
10.2 Why it matters for agentic investing
The core idea generalizes:
- In calm regimes, automation builds value.
- In stressed regimes, automation needs resilience controls: reduced risk, tighter constraints, and human-in-the-loop for extreme conditions.
This is how you build trustworthy agentic systems: not by assuming perfect prediction, but by designing policies that adapt safely when conditions change.