Trial Translator Platform Assesses Real-World Applicability of Cancer Clinical Trials - Ravi Parikh
June 4, 2025
Andrea Miyahira interviews Ravi Parikh about a Nature Medicine paper evaluating how well oncology trial results generalize to real-world patients. Dr. Parikh discusses a "Trial Translator" platform, which uses machine learning to predict survival outcomes based on electronic health records and genomic data from community oncology clinics. Their research examines 11 phase 3 oncology trials across four cancer types and finds that while low-risk populations show similar outcomes to clinical trial results, high-risk patients often see significantly reduced or no benefit from experimental treatments. In prostate-specific findings from the CHAARTED and LATITUDE trials, high-risk patients showed much lower absolute and relative survival benefits from the interventions. Dr. Parikh emphasizes the need to broaden eligibility criteria or design separate trials for these underrepresented high-risk populations, suggesting their models could also serve as "digital twins" for more rigorous single-arm studies in the future.
Biographies:
Ravi B. Parikh, MD, MPP, FACP, Attending Physician, Associate Professor, Department of Hematology and Medical Oncology, Winship Cancer Institute, Emory University, Atlanta, GA
Andrea K. Miyahira, PhD, Director of Global Research & Scientific Communications, The Prostate Cancer Foundation
Andrea Miyahira: Hi, everyone. I'm Andrea Miyahira at the Prostate Cancer Foundation. Please welcome Dr. Ravi Parikh, who recently relocated to Emory University. He will discuss his paper, "Evaluating generalizability of oncology trial results to real-world patients using machine learning-based trial simulations," published recently in Nature Medicine. Dr. Parikh, thanks for joining.
Ravi Parikh: Thank you, and thank you for allowing me to present some of our work evaluating the generalizability of seminal oncology trials to real-world patients. This was published in Nature Medicine a little earlier this year. I think we are well aware that oncology trial results often generalize poorly to real-world patients. On the left, you see a study comparing survival results for patients who are well represented in clinical trials (in dark blue) with results for patients who are underrepresented or unrepresented in clinical trials. We can clearly see that survival is often lower in the real world, particularly among underrepresented and unrepresented patients compared with those who are well represented.
Now, that is sometimes because clinical trial eligibility criteria are structured to identify healthier patients. But a seminal study published in 2021 showed that even in simulations where certain eligibility criteria were relaxed, we were still generally able to preserve the clinical trial estimates. That has helped drive the movement to broaden eligibility criteria so that more patients who are representative of real-world practice can enter clinical trials.
And yet, even if we were able to do that, it is often difficult to generalize clinical trial results to the real world, because the type of patient who chooses to enroll in a trial is often different from the patients we see in routine practice. That creates a lot of difficulty when I have a patient in front of me and I'm trying to apply the results of a trial to them.
Enter our platform, shown on the next slide, which we called Trial Translator. It was built on a large real-world electronic health record database of patients with advanced malignancies, drawn primarily from community oncology clinics throughout the country and linked to variant calls from commercially available next-generation sequencing assays. So we have a large multimodal database that we're using to build a digital twin platform to help translate randomized controlled trials.
Now I'll get into what we did and what our results were. The first phase of our platform was phenotyping. We took a large dataset across four different malignancies and trained a series of machine learning algorithms to predict each patient's likelihood of one- to two-year survival. We used standard machine learning methods to preprocess the data, select features, and cross-validate the algorithms, in order to reduce the likelihood of overfitting or of poor performance when the models are applied to other datasets.
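The interview does not say which cross-validation scheme was used. As a minimal sketch, assuming plain k-fold cross-validation, the idea of scoring each candidate algorithm only on rows it was not trained on looks like this. The function names (`k_fold_indices`, `cross_val_score`, `fit_majority`, `accuracy`) are hypothetical, and the outcome labels are toy values, not study data.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Split]:
    """Shuffle row indices 0..n-1 and return k (train, validation) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_val_score(fit: Callable, score: Callable, X: Sequence, y: Sequence,
                    k: int = 5, seed: int = 0) -> float:
    """Mean validation score when the model is refit on each training split."""
    scores = []
    for train, val in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return sum(scores) / len(scores)


# Toy usage: a baseline "model" that always predicts the most common training label.
def fit_majority(X: Sequence, y: Sequence) -> int:
    return max(set(y), key=list(y).count)


def accuracy(model: int, X: Sequence, y: Sequence) -> float:
    return sum(1 for label in y if label == model) / len(y)


X = [[i] for i in range(20)]  # placeholder features
y = [1] * 15 + [0] * 5        # toy binary outcome (1 = died within the horizon)
print(cross_val_score(fit_majority, accuracy, X, y, k=5))  # prints 0.75
```

In a real pipeline, preprocessing and feature selection would also be refit inside each training fold, so that no information from the validation rows leaks into model building.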
We then tested these algorithms in a holdout test dataset and selected one specific type, the gradient boosting model, which generally predicted mortality better than the other standard machine learning models. The next slide shows a sample of our model performance in a representative cancer, breast cancer, as well as some of the features that were most relevant in the breast cancer-specific model. Again, we did this across all four malignancies.
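The transcript does not name the metric used to compare algorithms on the holdout set. Assuming a binary mortality label at a fixed horizon and the area under the ROC curve (AUC) as the metric, a common but here hypothetical choice, the comparison could be sketched as follows. The candidate model names and predicted risks are toy values, not results from the paper.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random death outranks a random survivor (ties = 0.5)."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need both outcome classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pick_best(y_holdout: Sequence[int], predictions: Dict[str, Sequence[float]]) -> str:
    """Return the candidate model with the highest holdout AUC."""
    return max(predictions, key=lambda name: auc(y_holdout, predictions[name]))


y_holdout = [1, 0, 1, 0, 0, 1]  # toy labels: 1 = died within the horizon
preds = {
    "logistic_regression": [0.6, 0.4, 0.5, 0.55, 0.2, 0.7],
    "gradient_boosting": [0.8, 0.3, 0.7, 0.4, 0.1, 0.9],
}
print(pick_best(y_holdout, preds))  # prints gradient_boosting
```

The pairwise loop is quadratic in cohort size; it is written this way for clarity rather than speed.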
The second phase of our platform was trial emulation. Within specific cancers, we sought to emulate the eligibility criteria of 11 phase 3 clinical trials that were published in top-tier journals and still influence our practice today. After selecting eligible patients using very granular mappings of those eligibility criteria, we used our machine learning model (indicated by the GBM diamond on the slide) to risk-stratify patients into low-, intermediate-, and high-risk prognostic cohorts. We then replicated each trial's comparison of the experimental arm against the control arm within each of those risk-stratified cohorts.
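As a minimal sketch of this step, assume each eligibility criterion is mapped to a yes/no rule over EHR-derived fields and that the risk groups are equal-size thirds of model-predicted risk (consistent with the "third of the populations" at highest risk mentioned below). The `Patient` fields and the two example rules are hypothetical, not the criteria of CHAARTED, LATITUDE, or any other trial in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Patient:
    pid: str
    ecog: int              # hypothetical field: ECOG performance status
    creatinine: float      # hypothetical field: serum creatinine, mg/dL
    predicted_risk: float  # model-predicted probability of death at the horizon


Criterion = Callable[[Patient], bool]


def emulate_eligibility(patients: List[Patient], criteria: List[Criterion]) -> List[Patient]:
    """Keep only patients who satisfy every mapped eligibility rule."""
    return [p for p in patients if all(rule(p) for rule in criteria)]


def risk_tertiles(patients: List[Patient]) -> Dict[str, List[Patient]]:
    """Split eligible patients into equal-size thirds by predicted risk, lowest first."""
    ranked = sorted(patients, key=lambda p: p.predicted_risk)
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return {"low": ranked[:cut1], "intermediate": ranked[cut1:cut2], "high": ranked[cut2:]}


# Illustrative rules only.
criteria: List[Criterion] = [
    lambda p: p.ecog <= 1,
    lambda p: p.creatinine <= 1.5,
]

cohort = [
    Patient("a", 0, 0.9, 0.10), Patient("b", 1, 1.0, 0.45),
    Patient("c", 2, 1.1, 0.60), Patient("d", 0, 2.3, 0.30),
    Patient("e", 1, 1.2, 0.80), Patient("f", 0, 1.0, 0.25),
    Patient("g", 3, 0.8, 0.90), Patient("h", 1, 1.3, 0.55),
]
eligible = emulate_eligibility(cohort, criteria)
strata = risk_tertiles(eligible)
print({name: [p.pid for p in group] for name, group in strata.items()})
# prints {'low': ['a'], 'intermediate': ['f', 'b'], 'high': ['h', 'e']}
```

Each stratum would then be split by treatment arm and analyzed as the original trial was.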
The next slide illustrates what we found. Compared with the clinical trial results for the control and treatment arms, illustrated by the red shaded lines, low-risk populations generally performed similarly to what was reported in the trials. They tended to have similar absolute survival estimates and similar relative survival estimates.
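The interview does not specify which survival estimator underlies the absolute survival comparisons. Assuming a Kaplan-Meier estimate per arm and a fixed time horizon, an absolute survival difference within one risk stratum could be sketched as below; a relative measure such as a hazard ratio would typically come from a Cox model, which is omitted here. The follow-up times are toy values, not trial or real-world data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier steps: (death time, survival just after it) for each distinct death time.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return steps


def survival_at(steps: List[Tuple[float, float]], horizon: float) -> float:
    """Read the step function at a horizon (1.0 before the first death)."""
    s = 1.0
    for t, value in steps:
        if t > horizon:
            break
        s = value
    return s


# Toy stratum: months of follow-up and death indicators for each arm.
control = kaplan_meier([2, 4, 6, 8, 10, 12], [1, 1, 1, 0, 1, 0])
treated = kaplan_meier([3, 6, 9, 12, 15, 18], [1, 0, 1, 0, 1, 0])
difference = survival_at(treated, 12) - survival_at(control, 12)
print(f"12-month absolute survival difference: {difference:.3f}")  # prints 0.375
```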
However, for high-risk cohorts, meaning the third of patients at highest prognostic risk, we were generally unable to find a significant difference between the experimental and control arms. In some cases, there was no benefit whatsoever, indicating that the trial results generalize poorly to those high-risk cohorts. The next slide shows the results when we pooled our estimates across all 11 clinical trials.
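The interview does not describe how the estimates from the 11 trial emulations were pooled. One standard approach, shown here only as an illustration and not as the paper's method, is fixed-effect inverse-variance pooling of log hazard ratios, recovering each standard error from a reported 95% confidence interval. The inputs are toy numbers.

```python
import math
from typing import Sequence, Tuple

Z95 = 1.959964  # two-sided 95% standard normal quantile


def pool_hazard_ratios(hrs: Sequence[float], lowers: Sequence[float],
                       uppers: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooled HR with its 95% CI."""
    total_w = 0.0
    weighted_sum = 0.0
    for hr, lo, hi in zip(hrs, lowers, uppers):
        se = (math.log(hi) - math.log(lo)) / (2 * Z95)  # SE of log HR from CI width
        w = 1.0 / se ** 2
        total_w += w
        weighted_sum += w * math.log(hr)
    pooled_log = weighted_sum / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * pooled_se),
            math.exp(pooled_log + Z95 * pooled_se))


# Toy inputs: three emulated trials within one risk stratum.
hr, lo, hi = pool_hazard_ratios([0.70, 0.85, 0.95], [0.50, 0.60, 0.70], [0.98, 1.20, 1.29])
print(f"pooled HR {hr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Pooling on the log scale keeps the HR's multiplicative structure; when the emulated trials differ substantially, a random-effects model is often preferred over this fixed-effect version.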
So I think we derived a few conclusions. One is that clinical trial results overall generalize somewhat poorly to real-world populations. However, by using machine learning, we may be able to identify the populations for whom trials generalize poorly. Those are cohorts with whom we perhaps ought to engage differently, pursuing alternative trial designs and treatment strategies rather than simply presuming that trial results will apply to them. That, in essence, is what we found.
Andrea Miyahira: Well, thank you so much, Dr. Parikh, for sharing this with us. Can you discuss your prostate cancer-specific findings?
Ravi Parikh: Sure. We emulated two seminal prostate cancer studies in this paper. One was the CHAARTED randomized trial, which tested chemohormonal therapy (docetaxel plus ADT) versus ADT alone in metastatic hormone-sensitive prostate cancer. The other was the LATITUDE trial, which tested abiraterone plus prednisone added to ADT versus ADT alone in metastatic castration-sensitive prostate cancer.
What we found in both studies was largely consistent with our overall results. We generally saw a robust treatment effect in low-risk populations, meaning individuals who were healthier and generally would have been eligible for the trial. In high-risk populations, however, both the absolute and the relative survival benefit of the experimental arm (chemohormonal therapy in CHAARTED, and abiraterone plus prednisone in LATITUDE) were much lower. That indicates that, for high-risk populations, the survival benefits seen in those studies generally did not generalize well.
Andrea Miyahira: OK. Thank you. And what features of the high-risk patients most reduce their predicted survival benefit relative to the RCTs? Were these mostly clinicopathologic features of the tumor, or were they host factors or demographic features?
Ravi Parikh: That's a great question. The caveat here is that real-world next-generation sequencing for prostate cancer was still in its infancy when most of the patients in this cohort were treated, so most of the factors ended up being host-related features derived from granular diagnosis codes, laboratory values, and cancer-specific factors.
However, certain biomarker-derived features did appear among the features associated with the highest predicted risk. For example, in lung cancer, the absence of an actionable mutation was highly predictive of poor prognosis, or high risk. In prostate cancer, castration resistance was, as expected, highly relevant in the general population, although not for these two trials, which enrolled castration-sensitive patients. Age, weight loss, and PSA at diagnosis were all relevant factors as well.
So, in conclusion, none of these factors should be surprising. It is largely the combination of these factors, often in nonlinear combinations that only a machine learning algorithm can really capture, that determines whether someone is classified as high or low risk. And even with those combinations of relatively well-known risk factors, we still see clear differences in the survival benefit of the drugs we're testing.
Andrea Miyahira: OK. Thanks. And what are your biggest messages for clinical trialists and oncologists to take away from this?
Ravi Parikh: Sure. I think there are a couple of things. First, these relatively higher-risk populations tend to be underrepresented, because clinicians choose not to enroll them, or unrepresented, because systematic eligibility criteria exclude them. And yet these high-risk patients represent a meaningful proportion of real-world patients, and we have very little evidence that clinical trial results generalize well to them.
So we can't just take the results of a clinical trial and assume they will translate well. We need to expand and broaden eligibility criteria, or design separate trials specifically for these individuals, so that we can generate higher-quality evidence for them. The other, more forward-looking lesson is that what we've derived here are essentially computational representations of patients who would be enrolled in clinical trials, in which a series of weights is assigned to various clinical factors.
So as we think about novel trial designs, particularly clinical trials with so-called external or historical control arms, in which we run single-arm trials and compare them against real-world patients, models like ours could serve as digital controls, or digital twins. They could bring more rigor to single-arm studies, particularly for rare variants or novel drugs, and allow us to be more confident in those drugs than we can be with the uncontrolled studies we're used to running. I think that represents a really interesting future direction for these types of studies.
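As a simplified illustration of the external-control idea, and not the authors' method, a prognostic model's predicted control-arm risks for each single-arm patient can be summed into an expected number of deaths and compared with the observed count (an observed-to-expected ratio, as in indirect standardization). The predicted risks below are hypothetical, and the confidence interval uses a common log-scale Poisson approximation.

```python
import math
from typing import Sequence, Tuple


def observed_expected(observed_deaths: int, predicted_control_risk: Sequence[float],
                      z: float = 1.959964) -> Tuple[float, float, float]:
    """O/E ratio with an approximate 95% CI (log scale, Poisson variance 1/O)."""
    if observed_deaths <= 0:
        raise ValueError("the approximation needs at least one observed death")
    expected = sum(predicted_control_risk)  # expected deaths had patients received control
    ratio = observed_deaths / expected
    half_width = z / math.sqrt(observed_deaths)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Hypothetical: 10 single-arm patients with model-predicted 1-year mortality under control.
risks = [0.30, 0.40, 0.50, 0.20, 0.35, 0.45, 0.30, 0.25, 0.40, 0.35]
ratio, lo, hi = observed_expected(2, risks)
print(f"O/E = {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An O/E below 1 suggests fewer deaths than the model expects under control, but with only a handful of patients the interval is wide, and the comparison is only as trustworthy as the model's calibration in the single-arm population.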
Andrea Miyahira: OK. Well, thank you so much, Dr. Parikh, for sharing this with us.
Ravi Parikh: All right. Thank you.