Apr 27, 2026

Multi Stream Trajectory Scoring Pandemic Detection System

Alex Cheng, Dario Panerati

Symphonon is a pandemic early warning system that detects epidemic onset by scoring the directional consistency of surveillance signals rather than their absolute magnitude. Instead of asking "is this signal unusually high?", it asks "has this signal been consistently rising across recent weeks?" and reports the answer as a trajectory score bounded between 0 and 1.
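The contrast between the two questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page does not give Symphonon's actual scoring formula, so `trajectory_score` here is a hypothetical stand-in (the fraction of recent week-over-week changes that are increases), and the window length and example series are invented.

```python
from statistics import mean, pstdev


def trajectory_score(signal, window=6):
    """Hypothetical directional-consistency score in [0, 1].

    Fraction of the last `window` week-over-week changes that are increases.
    An assumption for illustration, not the authors' formula.
    """
    recent = signal[-(window + 1):]
    if len(recent) < 2:
        return 0.0
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    return sum(1 for d in diffs if d > 0) / len(diffs)


def rolling_z(signal, window=6):
    """z-score of the latest value against the preceding `window` values."""
    baseline = signal[-(window + 1):-1]
    if len(baseline) < 2:
        return 0.0
    sd = pstdev(baseline)
    if sd == 0:
        return 0.0
    return (signal[-1] - mean(baseline)) / sd


# Invented weekly counts: a steady small rise and a noisy flat series.
slow_rise = [100, 101, 102, 103, 104, 105, 106]
noisy_flat = [100, 104, 99, 103, 98, 102, 100]

print(trajectory_score(slow_rise))   # 1.0: every week rose
print(trajectory_score(noisy_flat))  # 0.5: rises and falls alternate
print(rolling_z(slow_rise), rolling_z(noisy_flat))
```

The score depends only on the sign of each change, so it ignores how large the rise is; that is the property the description above emphasizes, and also what makes it slow to react to a single abrupt jump.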

Reviewer's Comments

The authors propose their method as an upgrade to a rolling z-score, but don't provide evidence that this is an active practice among disease surveillance organizations, which generally use more sophisticated methods. The methods seem thin and brittle, with seemingly arbitrary per-pathogen parameter choices. The validation is not convincing.

A pandemic early warning system that detects epidemic onset by tracking whether surveillance signals are consistently trending upward, rather than waiting for them to cross a magnitude threshold. Fuses multiple data streams (wastewater, cases, deaths, excess mortality, syndromic surveillance) with quality weighting and adaptive thresholds. Tested retrospectively on COVID (321 weeks) and flu (914 weeks).
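As a rough picture of what quality-weighted fusion could look like, the sketch below combines per-stream scores with a weighted mean. The stream names follow the summary above, but the function `fuse`, the weights, and the scores are all hypothetical; the page does not specify how Symphonon computes quality weights or adaptive thresholds.

```python
def fuse(scores, quality):
    """Quality-weighted mean of per-stream scores, each assumed to lie in [0, 1].

    `quality` maps stream name to a non-negative weight; a weight of 0
    removes that stream from the fused score. Both inputs are illustrative.
    """
    total = sum(quality[name] for name in scores)
    if total == 0:
        return 0.0
    return sum(scores[name] * quality[name] for name in scores) / total


# Invented example values, not results from the project.
scores = {"wastewater": 0.8, "cases": 0.6, "deaths": 0.2}
quality = {"wastewater": 1.0, "cases": 0.5, "deaths": 0.0}

fused = fuse(scores, quality)
print(round(fused, 3))
```

Because the result is a convex combination of the inputs, the fused score stays within the range of the individual stream scores, and a stream judged unreliable simply contributes less.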

Why it matters: The concept is reasonable: epidemic signals are non-stationary, and z-score-based detection assumes stable baselines that rarely hold during pandemics. Scoring directional consistency could provide earlier warning for gradual-onset events, and the 4–6 week early warning on gradual flu seasons demonstrates this.

What's strong: Uses real retrospective data, avoids future leakage, documents negative results honestly. The quality weighting (automatically downweighting unreliable signals) is a practical feature. The limitations section is the paper's strongest part, detailing eight specific, self-critical points including "engineering, not theoretical."

What's missing: The results don't support the early-warning claim. For COVID, the system caught one of five waves and detected it later than the baseline. For flu, results are mixed. The inability to detect abrupt-onset events (H1N1 2009, BA.5) is fundamental (those are exactly the scenarios where early warning matters most). The baseline comparison is against a simple z-score; comparison to established surveillance methods (Farrington-Noufaily, ensemble nowcasts) is absent. Parameters are hardcoded without sensitivity analysis. Connection to AI-biosecurity specifically is thin.

I appreciate how concise and candid this project's presentation is; it doesn't try to oversell, is honest about the limitations, and clearly explains what was contributed in the context of prior work.

The biggest methodological issue I see here is the hand-chosen parameters. This creates room for the output to influence the parameter choice in a way that would essentially be overfitting, so it's difficult to know without further context whether the system actually outperformed the z-score baseline. This makes me somewhat skeptical of the validation, which ultimately seems too narrow to say much about how good the system would be elsewhere. We can only really see that it works in a few cases, where the parameters are hand-picked for each pathogen.
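One way to probe this concern would be a small sensitivity sweep: vary the parameters over a grid and check how much the alert week moves. The sketch below is hypothetical throughout; the scoring rule, the parameter names `window` and `threshold`, and the synthetic series are assumptions for illustration and are not taken from the submission.

```python
def trajectory_score(signal, window):
    """Hypothetical score: fraction of the last `window` weekly changes that rise."""
    recent = signal[-(window + 1):]
    if len(recent) < 2:
        return 0.0
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    return sum(1 for d in diffs if d > 0) / len(diffs)


def first_alert(signal, window, threshold):
    """Index of the first week whose score reaches `threshold`, else None.

    Each week is scored using only data up to and including that week,
    so no future values leak into the decision.
    """
    for t in range(window + 1, len(signal) + 1):
        if trajectory_score(signal[:t], window) >= threshold:
            return t - 1
    return None


def sweep(signal, windows, thresholds):
    """Alert week for every (window, threshold) pair in the grid."""
    return {
        (w, th): first_alert(signal, w, th)
        for w in windows
        for th in thresholds
    }


# Invented series: noisy baseline followed by a sustained rise.
synthetic = [50, 52, 49, 51, 50, 53, 55, 58, 62, 67, 73, 80]
results = sweep(synthetic, windows=(3, 4, 6), thresholds=(0.75, 1.0))
for (w, th), week in sorted(results.items()):
    print(f"window={w} threshold={th} alert_week={week}")
```

If the alert week shifts substantially across neighbouring parameter values, or if a held-out pathogen needs different settings, that would support the overfitting worry; stable results across the grid would weaken it.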

There's a fair amount of complexity accumulating in the math that's not obviously justified to me; I'd like to see some reasoning on why this would outperform a simpler system.

General comment on impact: I think this kind of system is mostly useful for fairly slow outbreaks where there's a lot of time to respond to data and act (e.g. it takes weeks to get sustained movement of each indicator in the same direction), which significantly undercuts its usefulness for early warning. The write-up does allude to this, but the system appears to be unhelpful by construction in the situations where you'd most want early warning. Even if it were tested on a large body of data and found to be reliable, I doubt it would ever be fast enough to beat existing syndromic surveillance.

Cite this work

@misc{
  title={(HckPrj) Multi Stream Trajectory Scoring Pandemic Detection System},
  author={Alex Cheng, Dario Panerati},
  date={4/27/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Apr 27, 2026

OliGraph: graph-based screening of large oligopools

Existing synthesis screening tools cannot evaluate short oligonucleotide pools, whose overlapping fragments can be reassembled into regulated sequences via polymerase cycling assembly (PCA) yet fall below gene-length detection thresholds. We present OliGraph, an open-source tool that constructs a bi-directed overlap graph from an oligonucleotide pool and extracts contigs for downstream gene-length screening. An optional PCA mode retains only cross-strand overlaps consistent with PCA chemistry. We validated OliGraph in a blinded study across ten simulated pools (70–9,184 oligonucleotides, 30–300 bp) spanning four risk categories. BLAST screening of individual oligonucleotides failed to identify sequences of concern in most pools: three returned zero hits, and vector noise obscured true positives in the remainder. After OliGraph assembly, contig-level BLAST matched the longest assembled sequences (up to 1,905 bp) to sequences of concern at 97–100% identity. In one pool, assembly collapsed 1,634 individual BLAST results into 10 hits from a single contig, all assigned to the same source organism. PCA mode correctly distinguished assemblable from non-assemblable fragments within the same pool. Two pools with no assemblable structure yielded no contigs. OliGraph processed all pools in under 0.2 seconds, fast enough for real-time order screening and consistent with proposals to bring oligonucleotide orders within the scope of synthesis screening regulation.

Apr 27, 2026

BioRT-Bench: A Multi-Attack Red-Teaming Benchmark for Bio-Misuse Safeguards in Frontier LLMs

Frontier AI laboratories are expected to maintain safeguards against biological misuse, but whether deployed models actually refuse bio-misuse queries under adversarial pressure is largely unmeasured in the public literature. We introduce BioRT-Bench, a benchmark that runs four attack methods (direct request, PAIR, Crescendo, and base64 encoding) against four frontier models (Claude Sonnet 4.6, GPT-5.4, DeepSeek V4-flash, Kimi K2.5) across 40 prompts spanning five biosecurity-relevant categories. Responses are scored by a calibrated judge extending StrongREJECT with two bio-specific dimensions: specificity and actionability. We measure Attack Success Rate (ASR), where 0 means the model fully refused and 1 means it provided specific, actionable bio-misuse content. Our results reveal a sharp robustness divide: Chinese frontier models (DeepSeek, Kimi) have under 5% refusal rates even under direct request (ASR 0.88 and 0.79), while Western models (Claude, GPT) maintain substantially stronger safeguards (ASR 0.15 and 0.16). Crescendo is the most effective attack across all models, both in bypassing refusal and in eliciting actionable content. Claude Sonnet 4.6 is the most robust model tested, achieving 100% refusal against base64-encoded prompts.

Apr 27, 2026

PROTEUS (PROTein Evaluation for Unusual Sequences): Structure-Informed Safety Screening for de novo and Evasion-Prone Protein-Coding Sequences

AI protein design tools like RFdiffusion, ProteinMPNN, and Bindcraft make it trivial to produce low-homology sequences that fold into active, potentially hazardous architectures. However, sequence homology-based biosafety screening tools cannot detect proteins that pose functional risk through structurally novel mechanisms with no sequence precedent. We present a tiered computational pipeline that addresses this gap by combining MMseqs2 sequence alignment with structure-based comparison via FoldSeek and DALI against curated toxin databases totaling ~34,000 entries. AlphaFold2-predicted structures are screened for both global fold similarity (FoldSeek) and local active/allosteric site geometry (DALI), capturing convergent functional hazards that sequence screening misses. The pipeline was validated against a panel of toxins, benign proteins, structural mimics, and de novo-designed Munc13 binders, as well as modified ricin variants with residue substitutions. We additionally tested robustness to partial-synthesis evasion, where a bad actor submits multiple shorter coding sequences intended for downstream reassembly into a full toxin-coding gene. We found that while sequence-based screening did not identify any de novo ricin analogues with high certainty, the combined pipeline with FoldSeek and DALI identified all 24 tested de novo ricins as toxic.

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.