Methodology

Magistra Side Effects Predictor™ — Version 4.0 (dual-track)

Last updated: April 2026 | Open for peer review

Invitation to researchers

This methodology is deliberately transparent. We invite biostatisticians, epidemiologists, clinical researchers, and ML researchers to critique our approach, propose improvements, or collaborate. All feedback is publicly acknowledged in future versions.


1. Dual-track architecture

Rather than blending clinical and community data into a single number (which hides the evidence hierarchy), we compute two parallel estimates per side effect:

Clinical

Only published clinical trials + regulatory reports. Conservative. All modifiers from peer-reviewed sources.

Real world

All sources including Reddit, forums, news. Captures selection bias but also real experiences missing from trials.

The gap between them is itself informative. For example, clinical trials systematically underreport emotional blunting, hair loss, and fatigue. When the real-world estimate is significantly higher, this indicates a gap in clinical reporting, not an error in the prediction. As more clinical data arrives, the two tracks should converge.

2. Data collection

Daily, 18 scrapers collect data from four source categories. Claude Haiku extracts structured data points with a confidence level (high/medium/low):

  • Clinical: PubMed, ClinicalTrials.gov, Cochrane, WHO, EMA, MHRA, preprints
  • Regulatory: FDA FAERS, EMA, MHRA
  • User reports: Reddit (13 subreddits), Trustpilot, Drugs.com, Quora
  • News & guidelines: Google News RSS, professional guidelines

Each data point retains its sourceType, which is used to deduplicate records (keyed on sourceUrl + sideEffect) and to filter them into the two tracks. Extraction is conservative: only explicitly stated rates are captured, with no inference.
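The deduplication and track-splitting step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the record fields (source_type, source_url, side_effect) and the set of types counted as "clinical" are our assumptions based on the descriptions in this section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPoint:
    source_type: str   # "clinical", "regulatory", "user_report", "news" (assumed labels)
    source_url: str
    side_effect: str
    rate: float        # explicitly stated rate only; no inference
    n: int             # reported sample size
    confidence: str    # "high" | "medium" | "low" extraction confidence

# Assumed track membership: clinical track = trials + regulatory reports only.
CLINICAL_TYPES = {"clinical", "regulatory"}

def dedupe(points):
    """Keep the first record per (source_url, side_effect) key."""
    seen, out = set(), []
    for p in points:
        key = (p.source_url, p.side_effect)
        if key not in seen:
            seen.add(key)
            out.append(p)
    return out

def split_tracks(points):
    """Return (clinical_track, real_world_track); the real-world track keeps everything."""
    points = dedupe(points)
    clinical = [p for p in points if p.source_type in CLINICAL_TYPES]
    return clinical, points
```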

3. Statistical approach

For each track:

  1. Filter data points by patient profile (sex, dose, ethnicity, exercise)
  2. Weighted average (weight = sample size × extraction confidence)
  3. Winsorization at 5th/95th percentile if n > 10 (bounds logged explicitly)
  4. Dose scaling if data lacks dose specificity
  5. Log-odds modifier application (capped at total shift of 2.5 log-odds)
  6. Random-effects confidence interval (DerSimonian-Laird τ²)
logit(p) = logit(base_rate) + Σ ln(OR_i),   where |Σ ln(OR_i)| ≤ 2.5

4. Self-evolving model

A daily statistical analysis computes empirical odds ratios for every dimension × effect. Key safeguards:

  • FDR correction (Benjamini-Hochberg) across all hypothesis tests
  • Auto-update only when n ≥ 30, corrected p < 0.01, and OR change < 0.3
  • Larger changes are flagged for human review
  • New parameters (ethnicity, BMI, diet) get 'promoted' when significant for 2+ effects
  • Max 5 auto-updates per day; all changes logged with provenance
  • Every previous config version is retained for rollback
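The first two safeguards above can be sketched as follows: a standard Benjamini-Hochberg step-up procedure, plus a gate that applies the listed thresholds. The function names and the boolean gating signature are illustrative, not the project's actual API.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Step-up FDR control: return which hypotheses survive at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m ...
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    # ... then reject every hypothesis ranked at or below k.
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            passed[i] = True
    return passed

def may_auto_update(n, p_corrected_ok, old_or, new_or,
                    updates_today, max_daily=5):
    """Gate an automatic odds-ratio update; larger changes go to human review."""
    return (n >= 30 and p_corrected_ok
            and abs(new_or - old_or) < 0.3
            and updates_today < max_daily)
```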

5. Honest limitations

We publish our limitations because hidden weaknesses are more dangerous than visible ones:

  • Data volume: Currently ~217 data points across 15 effects. Model health is 'degraded' until we reach n ≥ 100 per effect.
  • Demographic gaps: Female bias (87% of specified sexes), minimal ethnic diversity in sources.
  • Modifier sources: Initial values are hand-coded from literature. Being replaced with empirical values as data accumulates.
  • No interaction terms: Modifiers are applied independently. Interactions (e.g., sex × age) are not modeled — we cap cumulative log-odds shifts to limit stacking bias.
  • Calibration testing: Not yet formally validated against independent outcome data. Planned once n ≥ 500 per effect.
  • LLM extraction: Imperfect. Gold-standard manual audit is planned on a sample of 50 sources per effect.
  • Journey predictor: Weight trajectory and muscle preservation models use STEP trial constants with expert-coded modifiers. Not empirically validated.
  • Not causal: These are population-average conditional risks, not individual causal predictions.

6. Publication roadmap

  • Phase 1 (current): dual-track framework, open methodology, community feedback
  • Phase 2: n ≥ 100 per effect, formal calibration testing
  • Phase 3: n ≥ 500, external validation on independent dataset
  • Phase 4: pre-register at OSF.io, submit to peer-reviewed journal (target: Nature Medicine)

We are a small team and our methodology inevitably has weaknesses. If you see something wrong or improvable, let us know.

Researcher feedback

If you're a biostatistician, epidemiologist, clinical researcher, or ML scientist: critique our approach. We acknowledge all contributors in future versions and maintain a public changelog.

All feedback is manually reviewed. Contributors are acknowledged in the public methodology changelog (unless anonymity is requested).

Open source on GitHub

The full methodology, source code, and preprint are published under Apache 2.0. Inspect, fork, or submit a pull request.

github.com/saurabhgoyal75/magistra-predictor

Cite our preprint

The full methodology preprint is published on Zenodo with a permanent DOI under CC BY 4.0.

Goyal, S. (2026). A Dual-Track Framework for GLP-1 Side Effect Estimation: Separating Clinical Evidence from Real-World Patient Reports (v4.0). Zenodo. https://doi.org/10.5281/zenodo.19559749

Magistra Side Effects Predictor™ — Statistical indicator, not medical advice.
