Methodology
This page documents exactly how the numbers on TitrateLab are computed. It exists so reviewers, researchers, and readers can check our work before deciding to trust it. If a figure in one of our articles doesn't match what you'd derive from the pipeline described here, that's a bug, and we want to hear about it.
What this page is
TitrateLab publishes research about the peptide grey market: Certificate-of-Analysis coverage, purity and dose-accuracy distributions, vendor-closure timelines, community sentiment, pricelist drift. Every published figure comes out of one of two corpora (a COA database and a Discord/forum message database) processed through the pipelines described below. We are documenting those pipelines in public, with their known limitations, so that no one has to take a TitrateLab number on faith.
We will not publish a vendor score, a vendor leaderboard, or any per-vendor claim until the open methodology issues flagged at the bottom of this page are resolved. That is an editorial rule, not a future goal. The numbers we do publish today are population-level findings across the full corpus.
Data is current as of 2026-04-23. Counts move daily; the shapes don't.
The COA corpus
The COA database is our ground-truth layer. Every record is a third-party assay of a specific peptide batch from a specific manufacturer, tied, where possible, to a public verification URL that the originating lab will confirm.
Sources, in descending order of volume:

- Finnrick panel reports. Finnrick runs rotating-panel testing against a pool of aggregator labs and publishes summary reports at finnrick.com/vendors. We ingest the published vendor grades and the underlying per-batch records where Finnrick exposes them. This is the largest single source in our corpus (~6,448 records with measured purity at last count).
- Janoshik public portal. public.janoshik.com is the verification front-end for Janoshik Analytical, a Czech lab most experienced community buyers treat as the gold standard despite its non-ISO/IEC 17025 status. We OCR every accessible public test page and store the structured fields. After a crawl of the MESO-Rx "Analytical Lab Testing" subforum plus incremental backfill, the Janoshik-derived slice is ~1,600 records.
- MESO-Rx Analytical Lab Testing subforum. thinksteroids.com/community carries the densest community-lab-test discourse in the English-language grey market. Users post verification URLs in reply chains; we crawl the subforum, extract the URLs, and pipe them back into whichever lab scraper they belong to. MESO-Rx itself is not a COA source; it's an index into the public-portal corpus.
- Discord COA image uploads. Buyers who pay out-of-pocket for HPLC occasionally drop the resulting COA image directly into community servers rather than a verification portal. Our listener fleet flags image attachments in peptide-tagged channels and routes them through the same OCR pipeline as the Janoshik PNGs. This is a small absolute source (low hundreds of records) but a high-credibility one: a buyer who pays for their own HPLC has less incentive to manipulate the result than a vendor submitting to a paid panel.
OCR pipeline. Janoshik's public portal exposes verification pages whose structured text is rendered inside an image rather than as HTML. We originally passed these PNGs through a rate-limited third-party OCR service. After hitting cost and throughput ceilings, we migrated the inner loop to Claude Haiku 4.5 vision, validated against a held-out Gemini 2.5 Flash ground-truth set on a random ~5% sample. The two models agree on peptide name, purity percentage, quantity, and manufacturer well above the threshold we need for aggregate analysis. Where they disagree on a specific field, the record is flagged and excluded from aggregates pending manual review.
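The cross-model validation step can be sketched as a field-by-field agreement check. This is a minimal illustration, not the production pipeline: the field names, tolerance, and record shape here are hypothetical.

```python
# Sketch of the dual-model OCR agreement check described above.
# Field names and the numeric tolerance are illustrative assumptions.
FIELDS = ("peptide_name", "purity_pct", "quantity_mg", "manufacturer")

def fields_agree(primary: dict, reference: dict, tol: float = 0.1) -> list[str]:
    """Return the fields on which the two OCR models disagree.

    Strings must match case-insensitively; numbers must agree within tol.
    A non-empty return means the record is flagged and excluded from
    aggregates pending manual review.
    """
    disagreements = []
    for field in FIELDS:
        a, b = primary.get(field), reference.get(field)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            if abs(a - b) > tol:
                disagreements.append(field)
        elif str(a or "").strip().lower() != str(b or "").strip().lower():
            disagreements.append(field)
    return disagreements

# Hypothetical extractions from the two models for the same COA image.
haiku_read  = {"peptide_name": "Tirzepatide", "purity_pct": 99.3,
               "quantity_mg": 10.1, "manufacturer": "Example Pharma"}
gemini_read = {"peptide_name": "tirzepatide", "purity_pct": 99.3,
               "quantity_mg": 10.4, "manufacturer": "Example Pharma"}
flagged = fields_agree(haiku_read, gemini_read)  # quantity differs -> flagged
```

A record passes only when `fields_agree` returns an empty list; anything else goes to the manual-review queue.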
Local image caching. The Janoshik public portal purges PNG images from older test records unpredictably. When we crawled historical data, roughly 71% of PNGs older than a few weeks had already been purged: the structured test metadata persists in Janoshik's system, but the image file backing it 404s. We now cache every PNG locally at ingest time, so records OCR'd into structured fields remain auditable even after the upstream image disappears. Anything we didn't capture at first touch is functionally lost.
Numbers, as of 2026-04-23.
- Batches with at least one third-party assay: 9,246
- Batches with measured purity (HPLC or UHPLC): 7,599
- Batches with measured quantity deviation: 7,428
- Distinct manufacturer strings: 296 (see caveat below on manufacturer-string resolution; this counts `manufacturer_raw` strings in our COA database, not canonical entities)
- Time span: Sep 26 2024 through Apr 21 2026 (~19 months)
Outlier filtering. 214 records with absolute quantity deviation greater than 50% are excluded from aggregate deviation statistics as almost certainly OCR errors or vial-label misreads (e.g. a cagrilintide vial whose 10 mg label OCR'd as 1 mg, making an 11.75 mg tested mass look like a +1,075% deviation instead of the real +17.5%). The 214 excluded records represent less than 3% of the purity-populated corpus. Earlier versions of our analysis that included them produced implausibly high Janoshik aggregates; the filtered version is what we publish today.
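The filter itself is a one-pass cutoff. A minimal sketch, with an illustrative record shape (the field name `quantity_deviation_pct` is an assumption, not the production schema):

```python
# Sketch of the aggregate-level outlier filter: records whose absolute
# quantity deviation exceeds 50% are excluded as probable OCR or
# vial-label misreads. Record shape is illustrative.
DEVIATION_CUTOFF = 50.0  # percent

def filter_deviation_outliers(records):
    kept, excluded = [], []
    for r in records:
        dev = r.get("quantity_deviation_pct")
        if dev is not None and abs(dev) > DEVIATION_CUTOFF:
            excluded.append(r)   # almost certainly a misread label
        else:
            kept.append(r)
    return kept, excluded

batches = [
    {"batch": "A1", "quantity_deviation_pct": 4.2},
    {"batch": "B7", "quantity_deviation_pct": 1075.0},  # 10 mg label misread as 1 mg
    {"batch": "C3", "quantity_deviation_pct": None},    # no quantity assay
]
kept, excluded = filter_deviation_outliers(batches)
```

Records with no quantity assay are kept for purity aggregates; only implausible deviations are dropped.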
The Discord corpus
The Discord database is our behavioral layer. It is how we measure what buyers are actually discussing, asking about, complaining about, and recommending in real time across the peptide and biohacking underground. It is also how we find vendor-closure signals (exit-scam language patterns cluster 30 to 60 days after first FDA enforcement news reaches a community).
Overwatch fleet. Our listener infrastructure, codename Overwatch, reaches 1,000+ Discord servers spanning the major peptide, biohacker, bodybuilding, and GLP-1 communities. Bots are invited members of the servers they watch: no scraping, no API rate-limit games, no terms-of-service violations on the Discord side. The scanner captures every message in every channel it reaches.
Numbers, as of 2026-04-23.
- Raw message corpus: approximately 3.5 million messages across all monitored servers.
- Peptide-campaign hits: approximately 100,000 messages that pass the regex-based keyword filter for peptide-relevant content.
- MiniLM embeddings: ~1.2 million 384-dimensional sentence embeddings (all-MiniLM-L6-v2), used for semantic search and clustering across guild-level content.
- Tier-2 enrichment: 5,005 peptide-tagged messages classified through Claude Haiku 4.5 for sentiment, intent, and relevance-confirmation as of this revision.
Two-stage filter. Stage one is a regex match against the peptide-campaign keyword set (tirzepatide, retatrutide, BPC-157, TB-500, and so on, plus short codes like "t5" or "reta"). Stage two is a Haiku 4.5 LLM classification that decides whether the stage-one match is actually peptide-relevant. The classifier rejects approximately 21% of stage-one hits as false positives: "roids" used in sports-banter contexts, "sust" as a video-game character name, "HGH" in a rap lyric. Every classification is stored with the model's reasoning and a confidence score, so the methodology is auditable row by row.
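Stage one can be sketched as a single compiled regex over a keyword set. The list below is a small illustrative subset of the real campaign set, and stage two (the LLM classification) is not reproduced here:

```python
import re

# Sketch of the stage-one regex gate. Keywords shown are a small
# illustrative subset; short codes get word boundaries so "reta"
# doesn't fire inside unrelated words.
KEYWORDS = [
    r"tirzepatide", r"retatrutide", r"bpc[- ]?157", r"tb[- ]?500",
    r"\breta\b", r"\bt5\b",  # community short codes
]
STAGE_ONE = re.compile("|".join(KEYWORDS), re.IGNORECASE)

def stage_one_hit(message: str) -> bool:
    """True if the message should be forwarded to the LLM classifier."""
    return STAGE_ONE.search(message) is not None

messages = [
    "anyone tested reta from that new source?",
    "BPC-157 stack question",
    "great game last night",
]
hits = [m for m in messages if stage_one_hit(m)]  # third message is filtered out
```

Stage one is deliberately permissive; the ~21% false-positive rate quoted above is what stage two exists to catch.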
Privacy. We do not publish per-user Discord data in any article. All community-sentiment numbers are aggregated across the campaign-tagged corpus with no individual identification. Usernames, user IDs, and guild names are not exposed in any published figure.
Scoring
Vendor rankings — as of April 24, 2026 — use a two-axis Bayesian trust score that replaces the naive purity average we started with. The methodology below is the rubric that the chat bot and any future vendor leaderboard reference. Source-of-truth code is in scripts/chat/kb.py::score_vendor_trust; the 22-case regression suite in scripts/chat/test_vendor_trust.py is what gates changes.
The two numbers
| Axis | Range | What it means |
|---|---|---|
| `quality_score` | 0 – 1 | Shrunken posterior-mean grade across the vendor's tested batches, with continuous recency decay and hard penalties for dose / endotoxin / catastrophic failures. |
| `data_sufficiency` | 0 – 1 | How much fresh-equivalent evidence that quality rests on. A vendor with ten 2-year-old tests has low sufficiency; a vendor with ten tests in the last 60 days has high sufficiency. Untested = 0. |
A composite confidence_score = quality_lb × √data_sufficiency is used when a single number is required for ranking. It falls to zero for untested vendors — explicitly; we do not default unknown vendors to "average" because unknown is its own signal.
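As a sketch, the composite is a one-liner (the input values below are illustrative, not real vendor data):

```python
import math

# Sketch of the ranking composite: the quality lower bound scaled by
# the square root of data sufficiency. Inputs are illustrative values.
def confidence_score(quality_lb: float, data_sufficiency: float) -> float:
    return quality_lb * math.sqrt(data_sufficiency)

# An untested vendor (sufficiency 0) scores exactly 0 -- never "average".
untested    = confidence_score(0.0, 0.0)
well_tested = confidence_score(0.85, 0.92)  # ~0.815
sparse      = confidence_score(0.85, 0.12)  # ~0.294, same quality, thin data
```

The square root softens the sufficiency penalty: thin data discounts a vendor, but doesn't erase a strong quality signal outright.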
Per-batch grade
Each individual COA is first mapped to a grade in [0, 1]:

- If a Finnrick `test_score` (0–10 composite: identity + dose + purity + endotoxin) is present, use `test_score / 10`.
- Else, if a purity percent is present, use `max(0, min(1, (purity − 90) / 10))` — so 90% → 0.0, 100% → 1.0.
- Else the batch has no usable quality signal and is excluded from quantity and quality math.

The grade is then capped:

- `|quantity_deviation| ≥ 25%` → grade capped at 0.4 (severe dose miss overrides a high purity read).
- Endotoxin status in {`detected`, `high`, `positive`} → grade capped at 0.2 (safety floor).
These caps exist because a vial can be 99% pure and still be wrong: wrong compound identified, half the label dose, or contaminated. Purity alone is a beauty contest when the other signals are failing.
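The mapping and caps above can be sketched as one function. Field names are illustrative; the source of truth remains scripts/chat/kb.py::score_vendor_trust.

```python
# Sketch of the per-batch grade mapping and caps described above.
# Argument names are illustrative, not the production schema.
def batch_grade(test_score=None, purity_pct=None,
                quantity_deviation_pct=None, endotoxin_status=None):
    """Map one COA record to a grade in [0, 1], or None if no signal."""
    if test_score is not None:
        grade = test_score / 10.0                       # Finnrick 0-10 composite
    elif purity_pct is not None:
        grade = max(0.0, min(1.0, (purity_pct - 90.0) / 10.0))
    else:
        return None                                     # no usable quality signal

    if quantity_deviation_pct is not None and abs(quantity_deviation_pct) >= 25:
        grade = min(grade, 0.4)                         # severe dose miss
    if endotoxin_status in {"detected", "high", "positive"}:
        grade = min(grade, 0.2)                         # safety floor
    return grade

assert batch_grade(purity_pct=99.0) == 0.9              # purity-only record
assert batch_grade(purity_pct=99.0, quantity_deviation_pct=30) == 0.4
assert batch_grade(test_score=9.5, endotoxin_status="detected") == 0.2
```

Note the caps are `min()`, not multipliers: a 99%-pure batch with a 30% dose miss grades exactly 0.4, no matter how clean the chromatogram.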
Recency weighting
Each batch contributes with weight w = 0.5^(age_days / 180). A 6-month half-life, continuous. A test from yesterday contributes w ≈ 1.0; a test from 2 years ago contributes w ≈ 0.06. No cliffs, no "day 90 good / day 91 bad" step functions.
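The decay is a single exponential:

```python
# Sketch of the continuous recency decay with a 180-day half-life.
def recency_weight(age_days: float, half_life_days: float = 180.0) -> float:
    return 0.5 ** (age_days / half_life_days)

w_fresh = recency_weight(0)     # 1.0
w_half  = recency_weight(180)   # exactly 0.5 at the half-life
w_stale = recency_weight(730)   # ~0.06 for a 2-year-old test
```

Because the curve is smooth, re-running the pipeline a day later shifts every weight slightly rather than dropping batches off a cliff.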
Bayesian shrinkage
The raw weighted mean is a noisy estimator for small samples — two lucky tests at 99% look identical to 200 consistent tests at 99%. We shrink toward a corpus prior using a Beta-Binomial posterior:
μ_post = (Σw·g_batch + α) / (Σw + α + β) with α = 5.6, β = 2.4 (equivalent to 8 imaginary batches at the corpus mean grade of 0.70).
A vendor with 2 perfect fresh tests lands near 0.76 (pulled down from 1.0 by the prior). A vendor with 100 perfect tests lands near 0.98 (the prior's influence washes out).
Wilson-style lower bound
The posterior mean is a point estimate; we also compute an 80% one-sided lower bound: lb = max(0, μ_post − 0.84 · √(μ_post·(1−μ_post)/(Σw+α+β))). This is what rankings actually sort on. A wide confidence interval (thin data) pulls the lower bound down, so 2-for-2 vendors rank below 50-for-50 vendors with the same point estimate. Same principle as ranking Reddit comments or UCB bandits.
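The shrinkage and lower-bound formulas above fit in a few lines. Inputs here are the weighted sums from the recency step; the worked values assume fresh tests with weight 1.0 each.

```python
import math

# Sketch of the shrunken posterior mean and the 80% one-sided lower
# bound, using the constants from the formulas above.
ALPHA, BETA, Z = 5.6, 2.4, 0.84   # prior mean 0.70, 8 imaginary batches

def shrunken_mean(weighted_grade_sum: float, weight_sum: float) -> float:
    return (weighted_grade_sum + ALPHA) / (weight_sum + ALPHA + BETA)

def quality_lower_bound(weighted_grade_sum: float, weight_sum: float) -> float:
    mu = shrunken_mean(weighted_grade_sum, weight_sum)
    se = math.sqrt(mu * (1 - mu) / (weight_sum + ALPHA + BETA))
    return max(0.0, mu - Z * se)

# Two fresh perfect tests vs one hundred: the prior dominates the former.
mu_2   = shrunken_mean(2.0, 2.0)      # (2 + 5.6) / 10   = 0.76
mu_100 = shrunken_mean(100.0, 100.0)  # 105.6 / 108     ~= 0.978
lb_2   = quality_lower_bound(2.0, 2.0)  # ~0.65, well below the point estimate
```

The 2-for-2 vendor's lower bound sits roughly 11 points under its posterior mean, which is exactly the gap that keeps thin-data vendors out of the top of a ranking.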
Critical-failure multipliers
On top of the shrunken grade, we apply two rate-based multipliers:

- Critical-fail rate = smoothed rate of batches with Finnrick `test_score < 4`. Multiplier: `exp(−12·rate)`, floor 0.2. A 1-in-50 catastrophic rate drops the score by ~21%; a 1-in-5 rate floors it.
- Endotoxin rate = smoothed rate of batches flagged detected/high/positive. Multiplier: `exp(−15·rate)`, floor 0.3. Endotoxin is a safety matter; the penalty is stiff even at low rates.
Dose failures are already folded into the per-batch grade cap (0.4) — no separate multiplier.
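The two multipliers are floored exponentials:

```python
import math

# Sketch of the rate-based multipliers applied on top of the shrunken
# grade. The rates passed in are illustrative smoothed values.
def critical_fail_multiplier(rate: float) -> float:
    return max(0.2, math.exp(-12.0 * rate))

def endotoxin_multiplier(rate: float) -> float:
    return max(0.3, math.exp(-15.0 * rate))

m_rare  = critical_fail_multiplier(0.02)  # 1-in-50 -> ~0.79, a ~21% haircut
m_floor = critical_fail_multiplier(0.20)  # 1-in-5  -> exp term ~0.09, floored at 0.2
m_endo  = endotoxin_multiplier(0.02)      # ~0.74 even at a low endotoxin rate
```

The floors exist so a single catastrophic batch in a long history can't zero a vendor outright; the exponential makes sure a pattern of them still craters the score.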
Data sufficiency
data_sufficiency = 1 − exp(−W / 8) where W is the weighted sum of batches (fresh-equivalent count). A vendor with 1 fresh test scores ~0.12; with 5 fresh tests ~0.46; with 20+ fresh tests ~0.92. This is the "how much do we know" axis — published alongside the quality score so readers can distinguish "well-characterized as bad" from "well-characterized as good" from "not enough data to say."
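The sufficiency curve saturates in the fresh-equivalent count W:

```python
import math

# Sketch of the evidence-volume axis: 1 - exp(-W / 8), where W is the
# recency-weighted batch count from the decay step above.
def data_sufficiency(weighted_batch_count: float) -> float:
    return 1.0 - math.exp(-weighted_batch_count / 8.0)

s0  = data_sufficiency(0)    # untested vendor -> exactly 0.0
s1  = data_sufficiency(1)    # ~0.12
s5  = data_sufficiency(5)    # ~0.46
s20 = data_sufficiency(20)   # ~0.92
```

Because W is recency-weighted, ten 2-year-old tests contribute roughly the same W as a fraction of one fresh test, which is how stale evidence shows up as low sufficiency.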
Why it's different from the Finnrick letter grade
Finnrick publishes a single letter (A–E) per vendor per peptide, based on their internal composite across their own tests. Ours is different because:
- We aggregate across multiple labs (Finnrick + Janoshik public + community-published + TitrateLab blend-expansion derivations).
- We propagate uncertainty via the lower bound, so sparse vendors don't get overrated off a few lucky tests.
- We separate "how good" from "how much we know" as distinct signals.
When our ranking contradicts Finnrick's, it's almost always because we're including evidence they don't see (Janoshik, community, blend components) or because their letter grade is a point estimate and ours is a confidence-interval lower bound. When Finnrick and we agree, that's a strong signal. When we disagree, the raw per-batch evidence is queryable via the chat bot — we don't ask you to trust either grade, just to check the batches.
Regression protection
Changes to priors, penalty constants, or grade-cap thresholds are gated by a 22-case edge-case suite covering sparsity (untested, 1 test, 100 tests), failure-mode isolation (dose-only, endotoxin-only, all-three), time decay (stale, fresh, mixed), and missing-data handling (null test_score fallback to purity, malformed dates). The suite runs nightly via systemd and alerts Discord on any failure. See scripts/chat/test_vendor_trust.py.
What this supersedes
The earlier disclosure on this page — "we don't publish vendor scores because our internal test_score is bimodal" — described a methodology stuck in development. That description no longer applies. The Bayesian composite above is bounded to [0, 1], non-bimodal, and regression-gated. However, the original caveat about "Janoshik versus Finnrick quantity-deviation disagreement" is still real and is noted under Known issues below; the composite weights both but flags their divergence on individual batches.
Temporal coverage
Most of our data is from the last few months. We want to be explicit about that before anyone draws longitudinal conclusions from it.
COA corpus temporal bias. 61% of our lab data is from Q4 2025 and Q1 2026. Manufacturer behavior in early 2024 was different; the regulatory environment was different; the vendor population was different. Drawing conclusions about "2024 peptide quality" from this corpus is inappropriate. Where our articles compare year-over-year quality (Feb 2026 versus Feb 2025, for example), we footnote the sample sizes explicitly.
Discord/forum corpus temporal bias. The Overwatch fleet reached its current scale in late 2025 and has been ingesting in real time at that scale since. Earlier periods are covered by select targeted backfills: specific high-value forum threads, vendor-review archives, the MESO-Rx analytical-lab subforum crawl, Janoshik public-portal expansion. Those backfills are not comprehensive. Longitudinal claims about community sentiment or discussion volume that span the Q3/Q4 2025 boundary should be read as directional, not statistically complete.
What this means in practice. When an article says "HGH sentiment is question-dominant," that finding rests on 267 HGH-tagged messages out of 3,959 peptide-relevant messages out of 5,005 classified so far. The shape has held stable across earlier checkpoints (n=1,500, n=3,000, n=5,005), but it is not a claim about what peptide-buyer sentiment looked like in 2023. We don't have 2023.
What we don't have
The gaps matter more than the coverage, because the gaps tell you where our conclusions can't reach.
- An archive of unverified disappearances. Vendors that simply disappeared without any primary-source trail (no FDA action, no DOJ filing, no community post-mortem) are not in our Peptide Vendor Graveyard archive. They belong in a different artifact that we have not built yet.
- Chinese-language sources. Roughly half of the manufacturing supply chain we track originates from Chinese OEMs. We do not currently ingest Chinese-language forums, WeChat groups, Baidu Tieba threads, or Alibaba seller pages. The information asymmetry between our English-speaking buyer corpus and the Chinese-speaking manufacturer corpus is real and unresolved.
- Telegram-only vendors. A growing cohort of vendors operates exclusively in Telegram channels, with no public storefront, no MESO-Rx thread, and no Discord presence. We do not have them. Our corpus is systematically biased toward vendors who maintain a Western web surface.
- Per-manufacturer samples for HGH. Even after corpus-expansion, HGH testing is thin. Our 65-batch HGH corpus spans 20 manufacturers, most appearing only once or twice. Any per-manufacturer HGH claim at this corpus size is n=1 or n=2 and should be read as anecdotal rather than statistical.
- Glp1forum attachment backfill (intentionally empty). We inspected 229 randomly-sampled attachments from the "review" and "warning" post categories on glp1forum and zero were actual third-party lab reports. Real COAs on that forum are referenced as URLs to
public.janoshik.com, which our existing Janoshik scraper already ingests. The 9,246-batch number is not understated by a hidden cache of COAs on that forum, as we initially hypothesized.
Known issues
These are open methodology bugs. Fixing them is work we are doing; publishing them is an editorial choice.
1. Finnrick versus Janoshik quantity-deviation disagreement (unresolved). On identical peptides, Janoshik reports quantity deviation roughly 5× higher than Finnrick's aggregator-lab panels. In Finding 3 of our flagship article, across 12 distinct peptides, Janoshik reads 6 to 9 percentage points heavier than Finnrick every time. That is not noise. Three explanations compete:
- Janoshik's methodology (HPLC plus mass spectrometry) detects mass that Finnrick's aggregator methods don't: counter-ions, residual solvents, water of hydration. Under this reading, Janoshik is more rigorous.
- Self-selection in the submission pipeline: vendors who believe their batch is heavy submit to Janoshik preferentially because they trust the result.
- Finnrick's aggregator labs systematically undercount over-labeling, because vendors contracting into the panels want to see purity rather than mass. This is the uncharitable reading and we do not assert it, but we cannot rule it out.
We have not resolved the disagreement. Until we do, any quantity-deviation figure we publish is labeled with its source (Finnrick, Janoshik, or combined). The combined aggregate is published with an explicit caveat; we do not treat the two lab pipelines as interchangeable.
2. Manufacturer-string resolution. Our 296 distinct "manufacturer" strings almost certainly reduce to roughly 60 to 100 actual underlying OEMs once the Western-storefront-to-OEM graph is properly resolved. "JKL Peptides," "JKL," and "JKL Biotech" may all refer to the same supply chain. We are rebuilding the mapping and will publish it when the graph is clean. Until then, per-manufacturer aggregates should be read as per-string, with the understanding that some strings collapse into each other.
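A hypothetical sketch of the first normalization pass on those strings, shown only to illustrate why the 296 count overstates distinct OEMs; the real storefront-to-OEM graph resolution is substantially richer than suffix stripping:

```python
import re

# Hypothetical first-pass normalization of manufacturer_raw strings:
# lowercase, strip common corporate suffixes and punctuation so obvious
# variants collapse onto one key. Suffix list is illustrative.
SUFFIXES = re.compile(r"\b(peptides?|biotech|labs?|pharma)\b", re.IGNORECASE)

def normalize_manufacturer(raw: str) -> str:
    s = SUFFIXES.sub("", raw.lower())
    return re.sub(r"[^a-z0-9]+", " ", s).strip()

variants = ["JKL Peptides", "JKL", "JKL Biotech"]
canonical = {normalize_manufacturer(v) for v in variants}  # one key, not three
```

Even this trivial pass collapses the three example strings from the paragraph above into one, which is why per-string aggregates are the conservative reading for now.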
3. test_score bimodality. The raw Finnrick test_score distribution remains bimodal; see "What this supersedes" above for how the composite score works around it. Open.
4. HGH dimer-content re-extraction. Our current HGH dimer-content figure (5 of 23 batches with measurable dimerized somatropin) comes from an earlier manually-read 23-batch set. The expanded 65-batch HGH corpus has not yet been re-run for dimer analysis. That's a methodology pass scheduled for the next HGH update, not a published claim about the larger corpus.
How to reach us with corrections
For factual corrections, methodology disputes, missing sources, or anything in our published research that you think misrepresents the record: legal@titratelab.com. We read every message and respond to the substantive ones. Corrections that land in a published article are footnoted with the date of correction and, where the correspondent prefers attribution, a short acknowledgment.
Vendor and manufacturer names are used descriptively in our articles to identify parties in the documentary record. Inclusion is not endorsement. Exclusion is not condemnation. If you represent a vendor or lab and a passage misrepresents your operation, the same address applies.
Every aggregate in a TitrateLab article is reproducible from the corpora described above. If you want to reproduce a specific figure and cannot, email us: we will either point you at the query, correct the article, or correct the pipeline. All three outcomes have happened in the past. This page will be revised as methodology evolves; material changes are logged in the site's git history and surfaced in article footnotes when they affect already-published numbers.