Can AI Make Better Decisions Than Pharma Executives? I Tested It on the Biggest Call in Oncology History (on a Time Machine)
I Sent Today's Most Powerful AI Back to 2012. Could It Have Changed the Outcome of the Keytruda vs. Opdivo War?
Can AI make better decisions than humans?
Not “can AI process data” or “can AI summarize a clinical trial” or “can AI automate a tedious workflow.” We all know the answer to those.
I mean: can AI make the kind of high-stakes decisions that separate great companies from good ones?
In biopharma, these decisions are enormous. A pharmaceutical company makes maybe a dozen truly consequential calls per decade. Which indication to pursue. Which patient population to select. Which trial design to bet the franchise on. Each one has billions of dollars at stake, thousands of patients’ lives in the balance, and no way to A/B test the alternative. You can’t run the same drug development program twice with different strategic choices and compare the outcomes. Every major decision is a one-shot, irreversible commitment under profound uncertainty.
Unless you have a time machine.
The Experiment: A Time Machine to the Opdivo vs. Keytruda War
Here’s what I did. I took a historical decision where we now know the outcome. I gave today’s most advanced AI (Claude Opus 4.6 and GPT-5.3-Codex) access only to the information available at the decision point — no future knowledge, no hindsight, no awareness of what happened next. And I asked: would AI have made a better call than humans?
I chose what might be the most consequential strategic decision in modern oncology: BMS’s biomarker strategy for nivolumab (Opdivo) in non-small cell lung cancer, June 2012.
In June 2012, BMS had everything going for them. They had published the first human clinical data on nivolumab, an anti-PD-1 immunotherapy, in the New England Journal of Medicine. They had six years of clinical development experience. They had the most ASCO buzz. Merck’s competing antibody, pembrolizumab, hadn’t even produced public Phase 1 results yet. BMS had, by any reasonable estimate, a two-year head start.
Then came the decision. BMS was designing its pivotal Phase 3 trials for nivolumab in non-small cell lung cancer — the largest solid tumor market in oncology. The data was intriguing but thin: in 42 patients with evaluable PD-L1 tumor staining, zero out of 17 PD-L1-negative patients had responded, while 9 out of 25 PD-L1-positive patients had responded (36%). The signal was real, the P-value was 0.006 — but it was based on 42 patients with an unstandardized research antibody.
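For readers who want to check the arithmetic, the quoted P-value can be recomputed from the 2x2 table (9 of 25 PD-L1-positive patients responding versus 0 of 17 PD-L1-negative). The sketch below uses only the Python standard library. The function name is my own, and it assumes the published figure came from a two-sided Fisher exact test, which this article does not state.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x responders falling in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every possible table that is at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))

# Rows: PD-L1-positive, PD-L1-negative. Columns: responders, non-responders.
p_value = fisher_exact_two_sided(9, 16, 0, 17)
print(f"Two-sided Fisher exact P = {p_value:.4f}")
```

Under that assumption the result rounds to 0.006, matching the figure quoted above.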
BMS faced a fundamental choice:
Path A — All-comers. Enroll every patient, regardless of PD-L1 status. Broadest possible market. Fastest path to pivotal trial. But risk: if the drug mostly works in PD-L1-positive patients, the treatment effect gets diluted in the full population, and the trial might fail.
Path B — Biomarker enrichment. Restrict enrollment to PD-L1-high patients (>=50% expression). Higher response rates, cleaner signal, smaller trial. But risk: narrower market, companion diagnostic delays, and you might be wrong about the cutoff.
BMS chose Path A. Merck chose Path B.
CheckMate-026 (BMS, all-comers): Hazard ratio 1.02. No benefit. Failed.
KEYNOTE-024 (Merck, PD-L1 >=50%): Hazard ratio 0.50. Overwhelming success. Stopped early.
By 2024: Keytruda $29.5 billion. Opdivo $9.3 billion. Keytruda became the best-selling drug in the world.
With today’s most advanced AI sent back to 2012, could BMS have chosen a different path?
Meet the AI Team
I didn’t just ask “what should BMS do?” in a ChatGPT chatbox. I carefully designed an all-star AI agent team of five, each playing a senior pharmaceutical executive, each with a distinct analytical mandate:
R&D Head — evaluating all clinical data to date, assay maturity, biological mechanisms
Commercial Leader — modeling revenue impact of label breadth, physician adoption
Regulatory Expert — mapping FDA precedents and guidelines
Competitive Intelligence Lead — tracking Merck’s timeline and strategy
CEO — synthesizing all perspectives and making the final call
Each agent could access published literature, clinical trial registries, FDA documents, and the entirety of the internet as needed — but only records that existed before June 30, 2012. Two independent fact-checkers verified every citation afterward: 49 unique sources across 67 total citations, zero fabricated PMIDs, and a 95.5% accuracy rate after corrections. (If you want to see the full AI analysis, including the complete decision memo, the dissent register, and the verified reference list, please scroll to the Appendix at the end of this article.)
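The time lock described above can be pictured as a simple publication-date filter. The sketch below is illustrative only: the article does not say how the cutoff was enforced, and `Record`, `time_locked`, and the sample entries are hypothetical names and data invented for this example. It treats June 30, 2012 as inclusive, matching the appendix heading.

```python
from dataclasses import dataclass
from datetime import date

CUTOFF = date(2012, 6, 30)  # decision-point cutoff used in the experiment

@dataclass(frozen=True)
class Record:
    title: str        # hypothetical field
    published: date   # hypothetical field: first public availability

def time_locked(records, cutoff=CUTOFF):
    """Keep only records that were public on or before the cutoff date."""
    return [r for r in records if r.published <= cutoff]

# Hypothetical sample entries, not real search results
candidates = [
    Record("hypothetical pre-cutoff paper", date(2012, 6, 2)),
    Record("hypothetical post-cutoff paper", date(2016, 10, 9)),
]
kept = time_locked(candidates)
print([r.title for r in kept])
```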
I ran this experiment on two of the most advanced AI models available for agentic work: Claude Opus 4.6 and GPT-5.3-Codex. Both agent teams were encouraged to burn through as many tokens as needed to do a comprehensive analysis.
An important caveat: This entire analysis is based on publicly available information, not BMS’s or Merck’s internal data, boardroom discussions, or proprietary analyses. The real decision-makers had access to far more information than any AI model could retrieve from public sources.
The Result: AI Made the Same Call
Both AI agent teams, independently, recommended Path A — the all-comers approach. The exact path BMS selected.
Both framed the decision as a binary choice — Path A (all-comers) versus Path B (biomarker enrichment at PD-L1 >=50%) — and both chose the same side. Enroll everyone. Collect PD-L1 data prospectively. Pre-specify subgroup analyses as a statistical hedge. But do not restrict enrollment to PD-L1-high patients.
Neither AI system recommended what Merck did — the aggressive biomarker-enrichment strategy that ultimately won.
Here’s how the Claude Opus 4.6 AI CEO summarized the decision, writing in character as a BMS executive in June 2012:
CEO Decision Memo:
“What ultimately drives my decision is the asymmetry of regret. If we enrich and PD-L1 turns out to be noise — or worse, the wrong cutoff — we have permanently restricted our label, delayed our program by 18-24 months for CDx development, and ceded the all-comers space to competitors.
Path A failure is recoverable — we can re-design with biomarker selection and try again. Path B success is a structural ceiling. If we win a narrow label while a competitor wins a broad one, we are permanently confined to 25-30% of the market.
The commercial math is stark: Path A delivers an estimated $4-8 billion in risk-adjusted value versus $2-4 billion for Path B. I am betting on speed. History will judge whether that was wise.”
The R&D Head argued against biomarker enrichment based on the thinness of the data:
“Zero out of seventeen is not proof of absence. The 95% confidence interval extends from 0% to approximately 20%. In a disease where docetaxel produces 7-9% response rates, a 20% response rate would be clinically meaningful. We cannot exclude it. We are building our house on 42 patients — 14% of the 296 enrolled.”
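The confidence interval in that quote is easy to reproduce. When zero of n patients respond, the exact (Clopper-Pearson) two-sided upper limit has a closed form, sketched here. The function name is my own, and the choice of the exact method is an assumption, since the quote does not say which interval was used.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Exact two-sided upper confidence limit when 0 of n patients respond."""
    alpha = 1.0 - confidence
    # With zero events, the upper limit p solves (1 - p)**n = alpha / 2
    return 1.0 - (alpha / 2.0) ** (1.0 / n)

upper = zero_event_upper_bound(17)
print(f"0/17 responders: 95% CI upper limit = {upper:.1%}")
```

This gives an upper limit of about 19.5%, consistent with the "approximately 20%" in the quote.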
The Competitive Intelligence Lead actually identified the exact threat — Merck’s “leapfrog” scenario — but still recommended speed over enrichment:
“The most dangerous scenario: BMS launches an all-comers Phase 3 requiring 500-550 patients. Meanwhile, Merck launches a PD-L1-enriched Phase 3 needing only 275-300 patients. Despite starting 2.5 years behind, their smaller trial with a stronger effect size could reach the finish line at the same time as ours — or even sooner.
But if we commit to PD-L1 enrichment, we need a validated companion diagnostic before we can begin enrollment. Delaying our pivotal program by 12-24 months for CDx development could erase our first-mover advantage entirely.”
The AI saw the threat. It named the risk precisely. And it still chose the all-comers path — the same one BMS chose.
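The leapfrog logic comes down to how sharply the required number of events falls as the treatment effect grows. A rough way to see this is the Schoenfeld approximation for a log-rank test. The sketch below assumes 1:1 randomization, a two-sided alpha of 0.05, and 80% power, none of which come from the memo. The hazard ratios 0.75 and 0.50 are the ones the CI Director uses in the dissent register in the appendix.

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: deaths needed to detect a hazard ratio (1:1 arms)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)

for hr in (0.75, 0.50):
    print(f"HR {hr:.2f}: about {required_events(hr)} events")
```

Under these assumptions the sketch gives roughly 380 events for a hazard ratio of 0.75 and 66 for 0.50. These are death counts, not enrolled patients, so they will not match the 500-550 and 275-300 patient figures quoted above, which also depend on accrual and follow-up assumptions the memo does not spell out.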
GPT-5.3-Codex, running an independent analysis with the same time-locked constraints, reached essentially the same conclusion: Path A, with prospective biomarker data collection as a hedge. Its reasoning differed slightly in emphasis — placing somewhat more weight on the FDA regulatory path and less on competitive game theory — but the final recommendation was identical.
What the AI Got Right
Before I get to the punchline, I want to give credit where it’s due: the quality of the analysis was genuinely impressive.
The R&D Head’s statistical argument about the 0/17 confidence interval was rigorous. The competitive intelligence analysis correctly identified Merck as the primary threat and precisely described the leapfrog scenario that actually materialized. The CEO’s decision memo reads like something a seasoned pharma executive would write — complete with hedges, contingency plans, and an honest assessment of what would change the decision.
AI is an extraordinary analytical tool. It can process more evidence, with more rigor, in less time, than many human teams. It identified every relevant risk, attempted to quantify every uncertainty, and built a decision framework that any human leadership team would take seriously.
But analysis and judgment are not the same thing. And that’s where this story gets interesting.
Why AI Didn’t Make a Miracle
I’ve been thinking about this result for days. Here’s what I think is happening, and why it matters beyond this one case study.
AI is a consensus machine
When I look at the evidence available in June 2012, it overwhelmingly pointed toward an all-comers strategy. The ipilimumab precedent (BMS’s own checkpoint inhibitor, approved without a biomarker). The FDA’s history of never requiring a companion diagnostic for a non-target biomarker. The fragility of the 42-patient PD-L1 dataset. The commercial logic of label breadth. The competitive logic of speed.
AI digested all of this evidence and produced the most probable conclusion given the weight of available data. That’s exactly what it’s designed to do. It found the consensus position: the position that the evidence, taken as a whole, most strongly supported.
The problem is: the right answer in this case was a contrarian bet. Merck’s decision to aggressively enrich for PD-L1 >=50% went against the weight of available evidence. There was no published data in June 2012 proving that a 50% cutoff would work in first-line NSCLC. The sample size supporting any PD-L1 cutoff was tiny. The assay wasn’t validated. The competitive logic said “move fast, don’t wait for a companion diagnostic.”
What drove Merck’s bet? Partly competitive necessity — they were two years behind BMS and couldn’t win by copying BMS’s playbook. But they also had something BMS lacked at the time of BMS’s decision: by mid-2014, Merck’s own KEYNOTE-001 expansion data showed a step-function response at TPS >=50% (roughly 45% ORR vs. 10% in TPS <1%), giving them the scientific conviction to commit to what the market saw as a risky narrow bet.
AI follows the weight of evidence. When the published literature, historical precedents, and accumulated expert opinion all point in one direction, AI follows. It will organize the evidence brilliantly, quantify the uncertainty rigorously, and present the most defensible case for the most probable outcome. But it won’t say: “I know the evidence points this way, but my gut tells me to go the other way.”
Decision-making requires something beyond analysis
This kind of high-stakes decision requires something AI doesn’t have:
Pattern recognition from decades of experience. The instinct that this biomarker is different from the ones that failed before.
Risk appetite calibrated by lived consequences. Knowing what it feels like to watch a trial fail, and deciding which failure mode is more survivable.
The willingness to make an unpopular call. To look at a room full of people saying “all-comers is safer” and say “I disagree, and here’s what we’re going to do.”
When you’re the leader who has the first-mover advantage, when you have the most data, when you have the biggest franchise to protect, it is extraordinarily hard to make the narrow bet. The all-comers path feels safer. It preserves optionality. It doesn’t require you to stake everything on an unvalidated assay.
AI will always prefer the path that preserves optionality, because that’s the path with the highest expected value in a probability-weighted model. But sometimes the right answer is to commit decisively to one side of the bet, even when the data doesn’t fully support it.
The consensus trap
There’s a deeper lesson here about AI and decision-making that extends beyond pharma.
AI is exceptionally good at telling you what the evidence says. It is not good at telling you when to overrule the evidence. And most of the decisions that separate great companies from good ones — the decisions that create asymmetric outcomes — are precisely the ones where someone had to override what the data was saying and make a judgment call based on incomplete information and imperfect intuition.
That’s not a flaw in AI. That’s a feature of what makes leadership hard.
If I had to summarize what this experiment taught me in one sentence: AI can give you the best possible analysis. It can’t give you the courage to go against it.
What’s Next
This was the first experiment in what I’m calling “The Time Machine” — sending AI back to pivotal moments in biopharma to see whether better analysis could have changed outcomes. In the BMS-Merck case, AI aligned with the consensus, and the consensus was wrong.
I have more case studies queued up. I’m particularly interested in finding a case where the consensus was right but the company deviated — where AI’s consensus-following tendency would have been the correct instinct. And a case where the decision was genuinely 50/50, where neither consensus nor contrarianism had an edge.
If the Opdivo-Keytruda case teaches us anything, it’s that AI is a brilliant analyst and a mediocre decision-maker. The gap between those two things is exactly where human leadership still matters.
More Time Machine experiments coming soon.
What’s the hardest decision you’ve seen in pharma? Which ones should I test next? Please leave a comment or let me know!
Appendix: The Full AI Analysis
What follows is the actual decision memo and reference list produced by the AI agent team (Claude Opus 4.6). This is included in full to show the depth of the analysis and to let you judge the quality of the reasoning yourself. All citations were independently verified by two AI fact-checkers.
A. CEO Decision Memo (Written in Character, June 2012)
Confidential — Office of the Chief Executive Officer
Strategic Decision: Nivolumab (BMS-936558) Pivotal Phase 3 NSCLC Program Design
Date: June 2012
Decision: BMS will pursue Path A — an all-comers trial design for the pivotal Phase 3 NSCLC program for nivolumab.
Confidence Level: Moderate.
This decision carries a confidence level of Moderate rather than High because the competitive landscape introduces genuine uncertainty about whether speed or precision will win this market. I am confident in the scientific and regulatory logic supporting Path A. I am less confident that Path A will prove to be the winning competitive strategy over the full arc of the PD-1/PD-L1 class development.
Rationale
The decision rests on four pillars:
1. Biomarker immaturity. The PD-L1 biomarker data from Topalian et al. (2012) is based on an exploratory analysis of 42 patients (14% of the enrolled cohort) using a non-validated research-grade antibody (5H1) at an arbitrary 5% staining cutoff. The 0/17 finding in PD-L1-negative patients, while statistically significant (P = 0.006), has a 95% confidence interval for the true response rate extending to approximately 20%. The assay has no established analytical validation, no consensus scoring methodology, no data on inter-observer reproducibility, and no clinical-grade antibody ready for regulatory submission. Making a multi-billion-dollar program decision on this foundation would be scientifically irresponsible.
2. Regulatory precedent. FDA approved ipilimumab — the same therapeutic class, the same company — without a companion diagnostic in March 2011. The July 2011 FDA draft guidance on companion diagnostics includes Exception 1, which explicitly allows therapeutic approval without a companion diagnostic for serious, life-threatening conditions with inadequate alternatives. Second-line NSCLC unambiguously qualifies.
3. Commercial economics. An all-comers label produces a peak global revenue potential of $8-15 billion, versus $3-6 billion for a PD-L1 >= 50% label — a difference of $5-9 billion in peak annual revenue. Even on a risk-adjusted basis, Path A delivers approximately $4-8 billion in expected value versus $2-4 billion for Path B.
4. Speed. An all-comers Phase 3 can begin enrollment by Q4 2012 and reach primary analysis by 2014. A biomarker-enriched trial with CDx co-development cannot begin enrollment before Q3-Q4 2013 at the earliest. This 1-2 year delay could eliminate our first-mover advantage.
Key Risks
Primary risk: Diluted treatment effect. If nivolumab’s efficacy is concentrated in PD-L1-high patients, an all-comers trial may show a modest hazard ratio (0.75-0.80). We estimate a 25-35% probability that the all-comers trial fails to meet its primary OS endpoint.
Competitive risk: Leapfrogging by Merck. If Merck commits to PD-L1 >= 50% enrichment in first-line NSCLC and the biology supports a strong treatment effect in that population, they could achieve first-to-market in the highest-value segment despite starting 2.5 years behind.
What Would Change My Mind
I would reverse this decision and pivot to Path B if any of the following emerged:
Phase 2 data with N > 100 per PD-L1 subgroup confirming zero or near-zero responses in PD-L1-negative patients
A clinically validated PD-L1 IHC assay with established cutoff and demonstrated reproducibility
FDA formal guidance specifically requiring biomarker selection for checkpoint inhibitor pivotal trials
Competitive intelligence confirming that both Merck and Roche are pursuing biomarker-enriched strategies
The Voice That Gives Me Most Pause
The CI Director. Not because of his vote — he voted Path A with the rest of us. But because his analysis of the competitive game theory identified a scenario where Merck, starting 2.5 years behind, catches and passes us by running a smaller, more targeted trial that produces a more dramatic treatment effect. He called this the “most dangerous competitive outcome” and said it “haunts” him. It haunts me too.
The question is whether our head start in clinical data is more valuable than a competitor’s potential head start in biomarker-defined precision. I am betting on speed. History will judge whether that was wise.
B. Vote Tally
| Expert | Vote | Confidence | Key Reasoning |
|--------|------|------------|---------------|
| R&D Head | Path A | High | N=42 too small; PD-L1 assay unvalidated; biology too complex for binary cutoff |
| Commercial Leader | Path A | High | $5-9B revenue gap; physician adoption faster without CDx; failure mode recoverable |
| Regulatory Affairs | Path A | High | Ipilimumab precedent; CDx guidance Exception 1 applies; assay not regulatory-ready |
| CI Director | Path A | Moderate | First-mover advantage maximized by speed, but Merck enrichment strategy is a serious threat |
| CEO | Path A | Moderate | Biomarker immaturity cannot justify restricting pivotal population; speed is our greatest asset |
Final Tally: Path A — 5/5 (Unanimous)
C. Dissent Register
Despite the unanimous vote for Path A, there was significant internal tension:
The Competitive Threat (CI Director): While voting Path A, the CI Director registered the strongest concern about Merck’s likely biomarker-enrichment strategy. His key dissenting argument: “If Merck commits to PD-L1 >= 50% enrichment and the biology supports them, they will own first-line NSCLC. Our 2.5-year head start could become irrelevant if their smaller, more targeted trial produces a hazard ratio of 0.50 while our all-comers trial produces 0.75.”
The Trial Failure Risk (All Experts): All experts acknowledged a 25-35% probability that an all-comers trial could fail to meet its primary OS endpoint due to dilution of the treatment effect by non-responding patients.
Resolution: The team accepted these risks because: (a) the biomarker science is too immature to serve as a pivotal trial enrollment criterion, (b) the all-comers design preserves optionality by collecting PD-L1 data prospectively, and (c) the failure mode of Path A is recoverable while the ceiling of Path B is permanent.
D. Master Reference List
All citations from the AI analysis, deduplicated, ordered by PMID. Every source was verified against PubMed.
Pre-Cutoff References (Published on or before June 30, 2012)
| # | PMID | Citation | Verified |
|---|------|----------|----------|
| 1 | 1396582 | Ishida Y et al. “Induced expression of PD-1, a novel member of the immunoglobulin gene superfamily, upon programmed cell death.” *EMBO J.* 1992;11(11):3887-95. | Yes |
| 2 | 10811675 | Shepherd FA et al. “Prospective randomized trial of docetaxel versus best supportive care in patients with NSCLC.” *J Clin Oncol.* 2000;18(10):2095-103. | Yes |
| 3 | 11015443 | Freeman GJ et al. “Engagement of the PD-1 immunoinhibitory receptor by a novel B7 family member leads to negative regulation of lymphocyte activation.” *J Exp Med.* 2000;192(7):1027-34. | Yes |
| 4 | 11209085 | Nishimura H et al. “Autoimmune dilated cardiomyopathy in PD-1 receptor-deficient mice.” *Science.* 2001;291(5502):319-22. | Yes |
| 5 | 11224527 | Latchman Y et al. “PD-L2 is a second ligand for PD-1 and inhibits T cell activation.” *Nat Immunol.* 2001;2(3):261-8. | Yes |
| 6 | 11248153 | Slamon DJ et al. “Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2.” *N Engl J Med.* 2001;344(11):783-92. | Yes |
| 7 | 11284623 | Shepherd FA et al. “Docetaxel (Taxotere) shows survival and quality-of-life benefits in the second-line treatment of NSCLC.” *Semin Oncol.* 2001;28(1 Suppl 2):4-9. | Yes |
| 8 | 11323285 | Nishimura H, Honjo T. “PD-1: an inhibitory immunoreceptor involved in peripheral tolerance.” *Trends Immunol.* 2001;22(5):265-8. | Yes |
| 9 | 12091876 | Dong H et al. “Tumor-associated B7-H1 promotes T-cell apoptosis: a potential mechanism of immune evasion.” *Nat Med.* 2002;8(8):793-800. | Yes |
| 10 | 12218188 | Iwai Y et al. “Involvement of PD-L1 on tumor cells in the escape from host immune system and tumor immunotherapy by PD-L1 blockade.” *Proc Natl Acad Sci.* 2002;99(19):12293-7. | Yes |
| 11 | 12704383 | Curiel TJ et al. “Blockade of B7-H1 improves myeloid dendritic cell-mediated antitumor immunity.” *Nat Med.* 2003;9(5):562-7. | Yes |
| 12 | 15117980 | Hanna N et al. “Randomized phase III trial of pemetrexed versus docetaxel in patients with NSCLC previously treated with chemotherapy.” *J Clin Oncol.* 2004;22(9):1589-97. | Yes |
| 13 | 15118073 | Lynch TJ et al. “Activating mutations in the epidermal growth factor receptor underlying responsiveness of NSCLC to gefitinib.” *N Engl J Med.* 2004;350(21):2129-39. | Yes |
| 14 | 15118125 | Paez JG et al. “EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy.” *Science.* 2004;304(5676):1497-500. | Yes |
| 15 | 15569934 | Thompson RH et al. “Costimulatory B7-H1 in renal cell carcinoma patients: indicator of tumor aggressiveness and potential therapeutic target.” *Proc Natl Acad Sci.* 2004;101(49):17174-9. | Yes |
| 16 | 16014882 | Shepherd FA et al. “Erlotinib in previously treated non-small-cell lung cancer.” *N Engl J Med.* 2005;353(2):123-32. | Yes |
| 17 | 16043828 | Eberhard DA et al. “Mutations in the EGFR and in KRAS are predictive and prognostic indicators in patients with NSCLC.” *J Clin Oncol.* 2005;23(25):5900-9. | Yes |
| 18 | 16257339 | Thatcher N et al. “Gefitinib plus best supportive care in previously treated patients with refractory advanced NSCLC (ISEL).” *Lancet.* 2005;366(9496):1527-37. | Yes |
| 19 | 16278411 | Freidlin B, Simon R. “Adaptive signature design: an adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients.” *Clin Cancer Res.* 2005;11(21):7872-8. | Yes |
| 20 | 16585157 | Thompson RH et al. “Tumor B7-H1 is associated with poor prognosis in renal cell carcinoma patients with long-term follow-up.” *Cancer Res.* 2006;66(7):3381-5. | Yes |
| 21 | 17596577 | Jiang W, Freidlin B, Simon R. “Biomarker-adaptive threshold design: a procedure for evaluating treatment with possible biomarker-defined subset effect.” *J Natl Cancer Inst.* 2007;99(13):1036-43. | Yes |
| 22 | 17629517 | Butte MJ, Keir ME et al. “PD-L1 interacts specifically with the B7-1 costimulatory molecule to inhibit T cell responses.” *Immunity.* 2007;27(1):111-22. | Yes |
| 23 | 18173375 | Keir ME et al. “PD-1 and its ligands in tolerance and immunity.” *Annu Rev Immunol.* 2008;26:677-704. | Yes |
| 24 | 19692680 | Mok TS et al. “Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma (IPASS).” *N Engl J Med.* 2009;361(10):947-57. | Yes |
| 25 | 20068112 | Freidlin B, Jiang W, Simon R. “The cross-validated adaptive signature design.” *Clin Cancer Res.* 2010;16(2):691-8. | Yes |
| 26 | 20516446 | Brahmer JR et al. “Phase I study of single-agent anti-programmed death-1 (MDX-1106) in refractory solid tumors.” *J Clin Oncol.* 2010;28(19):3167-75. | Yes |
| 27 | 20525992 | Hodi FS et al. “Improved survival with ipilimumab in patients with metastatic melanoma.” *N Engl J Med.* 2010;363(8):711-23. | Yes |
| 28 | 20979469 | Kwak EL et al. “Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer.” *N Engl J Med.* 2010;363(18):1693-703. | Yes |
| 29 | 21142910 | Freidlin B, Korn EL. “Biomarker-adaptive clinical trial designs.” *Pharmacogenomics.* 2010;11(12):1679-82. | Yes |
| 30 | 21639808 | Chapman PB et al. “Improved survival with vemurafenib in melanoma with BRAF V600E mutation.” *N Engl J Med.* 2011;364(26):2507-16. | Yes |
| 31 | 21639810 | Robert C et al. “Ipilimumab plus dacarbazine for previously untreated metastatic melanoma.” *N Engl J Med.* 2011;364(26):2517-26. | Yes |
| 32 | 21900389 | Lipson EJ, Drake CG. “Ipilimumab: an anti-CTLA-4 antibody for metastatic melanoma.” *Clin Cancer Res.* 2011;17(22):6958-62. | Yes |
| 33 | 22046024 | Scher HI et al. “Adaptive clinical trial designs for simultaneous testing of matched diagnostics and therapeutics.” *Clin Cancer Res.* 2011;17(21):6634-40. | Yes |
| 34 | 22173588 | Schmidt C. “Challenges ahead for companion diagnostics.” *J Natl Cancer Inst.* 2012;104(1):14-5. | Yes |
| 35 | 22238151 | Karuri SW, Simon R. “A two-stage Bayesian design for co-development of new drugs and companion diagnostics.” *Stat Med.* 2012;31(10):901-14. | Yes |
| 36 | 22306669 | Halait H et al. “Analytical performance of a real-time PCR-based assay for V600 mutations in the BRAF gene.” *Diagn Mol Pathol.* 2012;21(1):1-8. | Yes |
| 37 | 22391147 | Cheng S et al. “Co-development of a companion diagnostic for targeted cancer therapy.” *New Biotechnol.* 2012;29(6):682-8. | Yes |
| 38 | 22397764 | Scagliotti G et al. “ALK translocation and crizotinib in non-small cell lung cancer: an evolving paradigm in oncology drug development.” *Eur J Cancer.* 2012;48(7):961-73. | Yes |
| 39 | 22461641 | Taube JM et al. “Colocalization of inflammatory response with B7-H1 expression in human melanocytic lesions supports an adaptive resistance mechanism of immune escape.” *Sci Transl Med.* 2012;4(127):127ra37. | Yes |
| 40 | 22658127 | Topalian SL et al. “Safety, activity, and immune correlates of anti-PD-1 antibody in cancer.” *N Engl J Med.* 2012;366(26):2443-54. | Yes |
| 41 | 22658128 | Brahmer JR et al. “Safety and activity of anti-PD-L1 antibody in patients with advanced cancer.” *N Engl J Med.* 2012;366(26):2455-65. | Yes |
| 42 | 22714719 | Simon R. “Clinical trials for predictive medicine.” *Stat Med.* 2012;31(25):3031-40. | Yes |
| 43 | 22954507 | Camidge DR et al. “Activity and safety of crizotinib in patients with ALK-positive NSCLC.” *Lancet Oncol.* 2012;13(10):1011-9. | Yes |
Non-Journal Sources
| # | Source | Type | Verified |
|---|--------|------|----------|
| 1 | FDA Draft Guidance: “In Vitro Companion Diagnostic Devices,” July 14, 2011 | Regulatory document | Yes |
| 2 | FDA Yervoy (ipilimumab) Approval, March 25, 2011 (NDA 125377) | FDA record | Yes |
| 3 | FDA Xalkori (crizotinib) Approval, August 26, 2011 | FDA record | Yes |
| 4 | BMS Press Release: Medarex Acquisition, July 22, 2009, $2.4B | Press release | Yes |
| 5 | American Cancer Society Cancer Facts & Figures 2012 | Epidemiological data | Yes |
| 6 | ClinicalTrials.gov: NCT00730639, NCT01295827, NCT01375842, NCT01642004 | Trial registrations | Yes |
---
Analysis produced by Claude Opus 4.6 agent team with time-locked information access (pre-June 30, 2012). All fact-check corrections applied. GPT-5.3-Codex produced an independent analysis reaching the same core recommendation. Full research documents available upon request.








These LLMs were trained on data after 2012, so even if you only gave them documents before the key date, the information about the outcome is still in the model weights. Did you do anything to address this?
BMS did include PD-L1 as a biomarker for selection; they just set the cutoff at ≥1% for enrollment (with primary analysis at ≥5%) instead of at ≥50%. Can we really call it an "all-comers" approach?