research scoreboard
What works, and what doesn't
Across ~20 lanes on survivorship-free, multi-decade data. The deliverable was never a market-beating strategy; it is a machine that rejects bad ones honestly. Every result is published, pass or fail.
- 22 lanes tested
- 0 beat the index on return
- 4 usable (risk only)
- loss / ruin / fail
- no usable edge
- a usable result
Selection / factors — all FAIL
10 lanes · 0 usable
breakout (standalone + satellite)
FAIL · Now fails OOS across multiple regimes (paid J-Quants 2018-2026, incl. the 2018Q4/2020/2022 bears): the breakout portfolio loses to NISA after costs (−3.4%/yr vs +13.5%). Definitive, not a single-regime power artifact.
PEAD / earnings surprise
FAIL · IS +1.40pp → OOS −1.39pp sign-flip (overfit); the daily t+5d window on 21k paid events also shows no edge after cost.
event-window / announcement timing
FAIL · 21k liquid events, drift −0.29% after 0.3% cost ≈ random; positive only in 2018Q4.
supply-demand / margin
UNUSABLE · Real information, but corr 0.94 to the index; the famous 信用倍率 (margin buy/sell ratio) has forward |r|≈0.07 and a 47% hit-rate, i.e. a coin flip.
Factor screen: real information (beats random 5/5) but corr 0.94 to the index; the neutral version is negative even gross. The famous 信用倍率 (market margin long/short, 465 weeks) is pure coincident sentiment too: forward |r|≈0.07, contrarian hit-rate 47% (a coin flip).
quality / value
NO EDGE · Beats the index on return but corr 0.96, loses the selloffs, and is reform-contingent; not a diversifier.
engineered multi-factor
NO EDGE · Sector-neutral value+quality+mom+lowvol fixes the naive losses and matches the index risk-adjusted, but doesn't beat it.
momentum / short / long-short
FAIL · All lose to tax-free NISA OOS; the EDINET-reading LLM long/short book scores Spearman ≈ 0, even at qwen-32b.
All lost to tax-free NISA out-of-sample. The most direct LLM test, ranking names by the model's reading of EDINET filings into a market-neutral long/short book (the NISA-impossible axis), has no edge either: Spearman(score, fwd) ≈ −0.08 (t+21d) / 0.00 (t+63d), long-short spread −0.2pp, gross of borrow cost (186 PIT decisions, 2024-25). Re-running the same panel on a larger model (qwen2.5:32b) confirms it: Spearman still ≈ 0, the spread sign-flips across horizons, and the bigger model is if anything more uniformly bullish (62% vs 25% high-conviction). Not a small-model artifact.
forecast / guidance revision
NO EDGE · A small, real gross edge (+0.46%/1m) that is cost-killed (net −0.14%) and never clears a random-selection control.
Upward FEPS-guidance revisers post a small, real, non-momentum gross edge over NISA EW (+0.46%/1m, +1.48%/3m pooled, n=75,542 signals over 9 regimes) that is fully erased by cost (net −0.14%/−0.02%) and never clears a same-universe random-selection control even at gross (1m +0.31% vs ctrl mean +0.45%, p95 +3.16%; cohort beats ctrl mean only 59/97 months). Momentum + monotonicity ablations PASS but the core spread + random gates FAIL. NO EDGE, not shippable.
volume × fundamental-acceleration (conjunction)
NO EDGE · 3m −0.24% net, fails the size-matched random control (p=0.58); it just re-discovers the closed breakout lane.
The Volume-Surge × Sales-Growth-Acceleration conjunction does NOT beat EW-NISA (3m excess −0.24% net, loses even gross; n=13,398 over 7 regimes), fails the size-matched random control (p=0.58), and fails the ablation gate (conjunction ≤ max(volume-only, growth-only)) at both horizons. The fundamental leg adds nothing: it re-discovers the already-closed breakout lane. Net-positive in only 2/7 regime years.
passive core choice (quality / size / value index)
NO DURABLE EDGE · No NISA-eligible index durably beats broad TOPIX on Sharpe; the best tilt (+0.032) reverses post-2023.
The honest passive-choice question (not an active rule): is any NISA-eligible buy-and-hold index a durably higher-Sharpe core than broad cap-weighted TOPIX? Cached J-Quants /indices 2016-2026, fair same-basis (price-vs-price / TR-vs-TR), pre/post-2023 durability split. NO. Best edge is a large-cap tilt (TOPIX 100, +0.032 Sharpe) — within a fund fee AND it reverses post-2023 (1.614 < broad 1.643). Small/Mid caps are worse (Sharpe 0.62–0.72); Value only matches broad via the post-2023 reform burst (trails pre-2023); the ROE/quality index (likely JPX-Nikkei 400) trails broad on a fair TR basis (0.947 < 0.977). The eye-catching high-Sharpe names are artifacts — Total-Return versions (6000 = TOPIX + dividends; every index has one) and 2022-23-launched recency indices. Broad cap-weighted buy-and-hold stays the evidence-backed core; 'more return' options are more risk, not alpha.
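Several lanes above are rejected because they fail a size-matched random-selection control. The sketch below shows one plausible way such a gate could be computed; it is not the project's actual code. The function name `random_control_pvalue`, the number of draws, and the synthetic universe are all assumptions made for illustration.

```python
import random
from statistics import mean


def random_control_pvalue(forward_returns, picked, n_draws=2000, seed=0):
    """Share of same-size random cohorts whose mean return is >= the picked cohort's."""
    rng = random.Random(seed)
    k = len(picked)
    cohort_mean = mean(forward_returns[i] for i in picked)
    hits = 0
    for _ in range(n_draws):
        # Draw a cohort of the same size from the same universe.
        draw = rng.sample(range(len(forward_returns)), k)
        if mean(forward_returns[i] for i in draw) >= cohort_mean:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


# Synthetic universe, for illustration only (not the scoreboard's data).
rng = random.Random(42)
universe = [rng.gauss(0.01, 0.08) for _ in range(500)]

# A look-ahead "signal" that picks the 50 best names: it should clear the control easily.
best = sorted(range(500), key=lambda i: universe[i], reverse=True)[:50]
# An arbitrary cohort with no information: it should look like just another random draw.
arbitrary = list(range(50))

p_best = random_control_pvalue(universe, best)
p_arbitrary = random_control_pvalue(universe, arbitrary)
print(f"p(best)={p_best:.4f}  p(arbitrary)={p_arbitrary:.2f}")
```

A small p-value means few random cohorts did as well as the signal. A value like the p=0.58 quoted for the volume × growth conjunction means the signal is indistinguishable from picking names at random.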
Timing / leverage / structure
7 lanes · 0 usable
leverage 2x / 3x
RUIN · Full-cycle 2x −99%, 3x −100% (below 1x); 30%/yr only in hindsight-selected bull windows.
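Why leverage can end below 1x over a full cycle is easier to see with a toy path. The sketch below assumes constant leverage that is reset every period and ignores financing costs; the scoreboard does not say how its leverage lanes were modelled, so treat `leveraged_wealth` and the return series as hypothetical.

```python
def leveraged_wealth(returns, leverage):
    """Terminal wealth of 1.0 invested with leverage reset every period (no financing cost)."""
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + leverage * r
        if wealth <= 0.0:
            # A single move of -1/leverage or worse wipes the account out.
            return 0.0
    return wealth


# A choppy market: +10% then -10%, repeated ten times.
choppy = [0.10, -0.10] * 10

w1 = leveraged_wealth(choppy, 1)
w2 = leveraged_wealth(choppy, 2)
w3 = leveraged_wealth(choppy, 3)
print(f"1x={w1:.3f}  2x={w2:.3f}  3x={w3:.3f}")

# One -35% period is enough to ruin a 3x position.
ruined = leveraged_wealth([0.05, -0.35, 0.20], 3)
print(f"3x after a -35% period: {ruined}")
```

In this toy path every extra turn of leverage ends lower than the one before it, even though the unlevered path is nearly flat. The drag compounds with volatility, which is consistent with the full-cycle ordering reported above.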
concentration (1–few stocks)
LOTTERY · Survivorship-free: single-stock median 3.1%/yr < index 7.4%; only 0.8% hit ≥30%, 34% lose money.
contrarian / fade
FAIL · −0.1%/yr, −82% DD. The winning opposite of a losing trade is HOLD, not reverse.
TA confluence + leverage
FAIL · 4 indicators (corr 0.6–0.8 = redundant) underperform 1 simple rule; leverage just adds full-cycle ruin.
seasonality (sell-in-May, turn-of-month)
NOT TRADEABLE · Every harvest rule loses to buy-and-hold; on TOPIX 'sell in May' is even backwards (May–Oct was positive).
Every harvest rule loses to buy-and-hold (it gives up exposure and pays turnover). Confirmed on the S&P (1927-2026) AND TOPIX (2016-2026). On TOPIX 'sell in May' is even backwards: May-Oct was positive and Nov-Apr-only badly trailed holding; turn-of-month returned −5.8%. Holding captures any real seasonal shape for free.
crypto added to trend (BTC)
ARTIFACT · 22%/yr is the one-time BTC adoption ramp (can't recur); ~zero forward information.
FX hedge: hedged vs unhedged S&P (JP holder)
NO EDGE (a yen bet) · A directional yen bet, not an edge: unhedged won only via the post-2013 yen collapse, and hedging costs the carry.
Is currency hedging (為替ヘッジ) better than unhedged for a JP NISA holder's S&P / all-country (オルカン) core? FRED 2002-2026 (SPY×USDJPY vs SPY−carry, carry = US−JP 3M rate gap by covered interest parity). NO EDGE: a directional yen view, not a free lunch. Unhedged won the full window (CAGR 11.1% vs 8.7%) but ONLY via the post-2013 yen collapse: pre-2013 (a yen-STRENGTHENING decade) HEDGED was better (Sharpe 0.388 vs 0.285). FX is a regime wildcard, not a hedge: it amplified the 2008 crash (unhedged −58% vs hedged −51%, yen strengthened) but cushioned 2022 (−6% vs −20%, yen weakened). And hedging COSTS the carry (avg ~1.4%/yr here, ~5%/yr in 2023-24, which is why JP hedged-US funds lagged badly). Unhedged = carry-free bet on continued yen weakness + deeper FX drawdowns; hedged = pay the differential to isolate the equity. Neither beats broad-index buy-and-hold; orthogonal to the gold-sleeve result.
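The two return definitions used in the FX comparison (SPY×USDJPY for unhedged, SPY minus carry for hedged) can be written out as a minimal sketch. The function names and the one-year toy inputs below are illustrative assumptions, not the project's data or code.

```python
def unhedged_return(r_asset_usd, r_usdjpy):
    """JPY return of a USD asset held unhedged: the asset return compounds with the USDJPY move."""
    return (1.0 + r_asset_usd) * (1.0 + r_usdjpy) - 1.0


def hedged_return(r_asset_usd, us_rate, jp_rate, years=1.0):
    """JPY return of the same asset when hedged: the USD return minus the rate-gap carry."""
    carry = (us_rate - jp_rate) * years
    return r_asset_usd - carry


# Toy year: the asset gains 10% in USD, the US 3M rate is 5% and the JP rate is 0%.
yen_weakens = unhedged_return(0.10, 0.05)       # USDJPY rises 5%
yen_strengthens = unhedged_return(0.10, -0.10)  # USDJPY falls 10%
hedged = hedged_return(0.10, us_rate=0.05, jp_rate=0.00)

print(f"unhedged, yen weakens:     {yen_weakens:+.1%}")
print(f"unhedged, yen strengthens: {yen_strengthens:+.1%}")
print(f"hedged (either way):       {hedged:+.1%}")
```

The hedged figure is the same in both yen scenarios but always pays the rate gap, while the unhedged figure swings with the currency. That is the sense in which the choice is a yen view rather than an edge.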
The usable results — risk-REDUCTION (one NISA-usable), plus one tail signal
5 lanes · 4 usable
12m-trend crash hedge (equity ↔ cash)
USABLE · Cuts S&P max-drawdown −86%→−52% across 4/4 sustained bears, but TAXABLE ONLY (selling burns the NISA quota).
S&P500 1927-2026: 4/4 sustained bears beat buy-hold + random + static; maxDD −86%→−52%, ~1 switch/yr. Helps sustained bears, not flash crashes. TAXABLE ONLY: inside NISA the ¥3.6M/yr (年360万) acquisition limit throttles re-entry to ~20%/yr, so after each exit you are stranded in cash through the recovery (terminal 310×→2.4×, recovery-capture 0.06–0.18). Use it in a separate taxable sleeve, never inside NISA.
multi-asset (equity + gold) trend diversifier
USABLE · 2002-26: Sharpe ~1.1 vs equity 0.81, maxDD ~−13% vs −51%, corr ~0.4; protected 2008 AND 2022. Lower drawdown, not higher end-wealth.
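The trend overlays above are judged on max drawdown. As a rough sketch of the mechanics, the code below applies a 12-month trend switch to a synthetic monthly price path and compares drawdowns. The exact rule is an assumption (hold equity next month only if the price is above its level 12 months ago, cash earning zero); the scoreboard does not publish its rule, and the price path is made up.

```python
def max_drawdown(wealth):
    """Worst peak-to-trough decline of a wealth path, as a negative fraction."""
    peak, worst = wealth[0], 0.0
    for w in wealth:
        peak = max(peak, w)
        worst = min(worst, w / peak - 1.0)
    return worst


def trend_overlay(prices, lookback=12):
    """Wealth path of a switch that holds equity for month t+1 only if prices[t] > prices[t - lookback]."""
    wealth = [1.0]
    for t in range(lookback, len(prices) - 1):
        in_market = prices[t] > prices[t - lookback]
        growth = prices[t + 1] / prices[t] if in_market else 1.0  # cash earns 0 here
        wealth.append(wealth[-1] * growth)
    return wealth


# Synthetic monthly prices: 24 months up 2%, 18 months down 5%, 24 months up 3%.
prices = [100.0]
for monthly in [0.02] * 24 + [-0.05] * 18 + [0.03] * 24:
    prices.append(prices[-1] * (1.0 + monthly))

overlay = trend_overlay(prices)
buy_hold = [p / prices[12] for p in prices[12:]]  # same start month as the overlay

dd_overlay = max_drawdown(overlay)
dd_buy_hold = max_drawdown(buy_hold)
print(f"buy-and-hold maxDD {dd_buy_hold:.1%}, overlay maxDD {dd_overlay:.1%}")
```

On this path the switch exits after four losing months and sits out the rest of the decline, so its drawdown is much shallower. It also re-enters late, which is the re-entry problem the NISA acquisition limit makes worse.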
static gold-diversified core (NISA-USABLE)
USABLE (NISA, risk only) · The one NISA-usable win: a static equity+gold blend halves drawdown (−33% vs −51%). Risk reduction, not higher return.
The FIRST risk result that survives the NISA wrapper. The trend overlays work but SELL to de-risk, so they burn the NISA acquisition quota (taxable-only). A STATIC blend never sells, so it IS NISA-compatible. On a fair total-return basis (SPY/TLT dividend+coupon-adjusted; correcting a price-index dividend trap), a static equity+gold blend durably raises Sharpe AND halves drawdown vs 100% equity in BOTH eras (70/30 eq/gold: Sharpe 1.03 vs 0.80, maxDD −33% vs −51%, post-2013 1.17 vs 1.00). GOLD carries it via low correlation; bonds do NOT (60/40 eq/bond trails equity post-2013 and bonds fell WITH stocks in 2022). BIG caveat: 2002-2026 is a gold bull (~12%/yr), so the return parity here is a regime artifact (cached gold can't reach gold's 1980-2000 flat decades). The DURABLE claim is lower drawdown from diversification, NOT free higher return. It is a RISK PREFERENCE (give up some expected equity return for shallower drawdowns), not a wealth edge. Confirmed in JPY (the JP holder's own currency, vs USDJPY): the gold sleeve still durably raises Sharpe and cuts drawdown, and matters MORE for a JP holder, since unhedged foreign equity can take a deeper JPY drawdown (the yen's crisis role is regime-dependent: it strengthened in 2008, weakened in 2022, so FX is not the hedge, gold is). A return-maximiser still holds 100% equity.
breakout multibagger screen (TAIL only)
TAIL EDGE · ~2× the base rate of ≥5× multibaggers (fixed horizon, all regimes), but a 1-in-70 tail lottery, not an index core.
J-Quants 2016-2026, all listed names PIT, FIXED horizon: 3m≥30% + ≥90% of 6m-high + volume↑ ~doubles the base rate of ≥5× winners (P(≥5×) lift 2.1× over 24m, 1.8× over 36m; n=13,651 triggers). Survives the full multi-regime window and a fixed horizon, fixing the two flaws of the first 2024-26-only read. ROBUST to params too: an 18-cell threshold grid is ≥1.5× in 15/18 cells (OOS median 1.99×), and params picked on 2017-21 still lift 2.18× on held-out 2022-26, a smooth gradient (stronger momentum → bigger lift), not a lucky point. BUT it is a TAIL signal, not a mean one: median trigger ~1.23×, P(≥5×)≈1.4% = 1-in-70. Survives a realistic exit too: even buy-and-hold to 24m with NO peak-timing keeps the ≥5× lift at 2.4×, while the median trigger fades to ~0.95×. A true fat-tail lottery (most fade, a few run far): a sized, spray-many-small-bets lottery, never an index replacement.
household barbell: NISA core + taxable tail sleeve
NO EDGE · NISA core + a taxable tail sleeve: the sleeve runs 4.7% < 9.4%, corr 0.86, and it beats plain NISA only 4.6% of the time.
The one construction a household actually has (never tested before, because prior breakout tests were all NISA-INTERNAL): tax-free NISA broad-index core (frozen) + a SEPARATE taxable breakout-tail sleeve (the tail signal above), so no quota burn. Tested 2018-2026 with correct 20.315% JP tax. Sleeve net CAGR 4.7% < TOPIX 9.4%, vol 18.2% > 14.5%, corr 0.86 — the worst possible diversifier. Every slice (0.5–5%) loses to plain NISA on terminal AND Sharpe; a 4,000-draw joint block-bootstrap shows even the p99.9 tail is lower and P(barbell > plain NISA) = 4.6%. Why: with no ex-ante tell for the 1-in-70 winners you must spray ~1,400 names equal-weight, so a 75× jackpot is 1/1,400 of the book (best single month +13.9%) — convexity you can't aim dilutes to nothing while the fading median trigger (0.954) drags the sleeve below the index. The tail edge is a discretionary lottery, NOT a NISA-household improver.
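The dilution argument for the tail sleeve is plain arithmetic, worked below as a minimal sketch. The figures (a 75× winner, ~1,400 equal-weight names) come from the text above; the helper name `jackpot_contribution` and the 20-name comparison are illustrative assumptions.

```python
def jackpot_contribution(multiple, n_names):
    """Return added to an equal-weight book of n_names by one name that ends at `multiple` times its cost."""
    return (multiple - 1.0) / n_names


# One 75x winner inside a ~1,400-name equal-weight spray.
sprayed = jackpot_contribution(75.0, 1400)
# The same winner inside a hypothetical 20-name book, which would require an ex-ante tell.
concentrated = jackpot_contribution(75.0, 20)

print(f"sprayed across 1,400 names: {sprayed:+.2%}")
print(f"concentrated in 20 names:   {concentrated:+.0%}")
```

Without a way to pick the winners in advance, the book has to stay wide, and the width is what shrinks each jackpot to a few percentage points of the whole sleeve.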
Bottom line
No strategy produced an ex-ante, repeatable, cost- and cycle-surviving way to beat a broad-index buy-and-hold on MEAN return out-of-sample. The one return-side exception is a TAIL signal, not a mean one: a breakout screen (3m≥30% + near 6m-high + volume↑) roughly doubles the base rate of ≥5× multibaggers across 2016-2026 on a fixed horizon, but the median trigger is only ~1.2×, so it's a sized spray-many-small-bets lottery, not an index-beating core. The only robust active wins on the core were risk-REDUCTION, never higher return, and even those don't survive the NISA wrapper: the 12m-trend overlay needs fast re-entry to catch recoveries, which the ¥3.6M/yr (年360万) acquisition limit makes impossible (it turns a 310× buy-and-hold into 2.4×). For a NISA holder the honest answer is the simplest one: a broad index held forever (tax-free, ~0 turnover), with contributions invested immediately. Timing even the new money by trend loses to plain DCA in ~85% of historical windows. One drawdown tool DOES survive inside NISA, though: a STATIC equity+gold blend (it never sells, so it doesn't burn the quota), which gives shallower drawdowns, not higher return, and is a risk preference. The active TREND overlay, by contrast, only helps in a SEPARATE taxable account because de-risking means selling. Not investment advice.
Full capstone: docs/handover_2026-06-03_program_capstone.md
ai-stock-agent