Comparing Traditional Risk Groups vs AI Analysis in Prostate Cancer Management, Journal Club - Jonathan Tward & Ashley Ross
April 29, 2025
Biographies:
Jonathan Tward, MD, PhD, Professor, Department of Radiation Oncology, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
Ashley Ross, MD, PhD, Urologist, Associate Professor of Urology, Robert H Lurie Cancer Center, Northwestern Feinberg School of Medicine, Chicago, IL
Matthew R. Cooperberg, MD, MPH, Professor of Urology; Epidemiology & Biostatistics, Helen Diller Family Chair in Urology, UCSF Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA
Matthew Cooperberg: Good morning, everyone, and thank you for joining us for what is our fourth installment now in the Prostate Cancer Journal Club for Patients. Brought to us with a lot of gratitude from the PCF and UroToday. This is an idea launched by the UCSF patient advocates, many of whom are on the call and will be helping drive this discussion. The idea being to bring groundbreaking, game-changing literature papers that have been published in the prostate cancer arena straight to you, the patient community.
As I said, this is our fourth installment, and for today's journal club, we are going to be looking at a really interesting paper on a brand-new technology, which is getting a lot of attention, which is pathology AI. This is the idea of using computer learning, which everyone is hearing about constantly in the news every day, to try to do more with the tissue that we get from prostate biopsies and prostatectomy specimens every day. This is computer vision, machine learning, artificial intelligence. There's a lot of terms that are applied to these technologies.
And this is a field that is moving extremely quickly. So there are several tests now on the market or about to be on the market in the prostate cancer space, which aim to do just this — to augment what the pathologist does when they get the tissue from a biopsy or prostatectomy by doing more. And that doing more is different things with different products and different efforts.
The one that is the furthest along and is now in the NCCN guideline is an artificial intelligence tool termed a multimodal artificial intelligence tool because it's actually incorporating some clinical data together with the pathology pictures, marketed by a company called Artera, which is a pretty new company in this space. And it's been moving very quickly. This is now in the NCCN guidelines. This is approved in all 50 states, finally, and some of you may have already heard about it or seen it in use.
And so it is my pleasure to be joined by the senior author, first author, lead author on the latest major paper in this space, Prostate Cancer Risk Stratification in NRG Oncology, Phase III Randomized Trials, using Multimodal Deep Learning with Digital Histopathology. So it's my pleasure to introduce Jonathan Tward from University of Utah and the Huntsman Cancer Institute, who led this effort. He's going to be taking us through the findings of the paper and the study. And then I'm also thrilled to be joined by Dr. Ashley Ross from Northwestern and the Robert Lurie Cancer Center, who has been a long-time leader in the prostate cancer biomarker space.
And I would say at the outset that this is very much a biomarker in the way we think about these pathology AI tools. This is in the same space as the tests that we have had for a decade now, like Prolaris and Decipher and Oncotype. There are some potential advantages here. There are potentially some limitations here relative to those tests, and we'll talk about them as we go forward. So without further ado, I will pass it on to Jonathan. Welcome. Thank you.
Jonathan Tward: Thank you very much. I appreciate being here. And my favorite thing to do is actually talk to patients. That is why we all do what we do, and what is extremely exciting about this work is it is really the kind of work that lets a patient and their physician sit down together and really get some very useful and actionable information at the outset of their prostate cancer diagnosis. These are my disclosures.
I've worked with several people in various industries, although this particular research was not funded on my side. So I have not received compensation from Artera. OK. So as most of the men who have been diagnosed with prostate cancer know, we doctors like to lump people into groups, and we're basically taught to do this across the spectrum of oncology. When you think about cancer, most people are familiar with the concepts of stages. Everyone knows we stick people in boxes: stage 1, 2, 3, 4. And in prostate cancer, we actually do something very similar, except we like to call it risk groups.
And there's a guideline that many physicians around the world use called the NCCN risk grouping, and this risk grouping is basically how almost everybody approaches localized prostate cancer. And so you can see here some of the risk groups: there's low risk, and there's another group called favorable intermediate risk and one called unfavorable intermediate risk, and, not shown on the slide, you might see high risk. And when you really look at what information we need to put men in one of these risk groups, it's very limited really.
You just need three pieces of information, really, to find out if someone's in the low-risk group — what their prostate feels like on exam, what the tissue looked like to a human pathologist under a microscope — that's the Gleason grading that we talked so much about — and what the PSA value is. And then maybe you just need one or two more pieces of information, frankly, to further stratify, but it's a rather rudimentary risk classification system. And why do we do this?
Well, we really do this, frankly, to make determinations about treatment intensity. So you can see someone with a low risk of prostate cancer, we want to do something not very intense. In fact, we want to observe them. We want to put them on active surveillance. But as we start moving into these intermediate-risk categories, you start having questions about, well, maybe it is or isn't safe to watch, let's say, a favorable intermediate-risk patient. But if we do treat them, do we need to do multiple things or just one simple thing?
And so if you look at an unfavorable intermediate-risk patient, if you're thinking about radiation therapy, for example, the guidelines would suggest not only that we should consider treating them, but also that if we are going to use radiation, we want to combine it with hormone therapy. So you can see the intensification going up. But here's a very provocative question. OK. We put you guys in risk groups. And most of you men know what risk group you may be in. But let's actually talk about the risks that they actually predict, which is interesting because I'm not sure that a lot of men are really aware of it at this level.
And frankly, I'm not certain that all the physicians are aware of the risks that these risk groups engender. So if you look at these NCCN risk groups — and we'll start with the top bar in the table. The probability that a man will die of prostate cancer over the next decade when being put into one of these boxes — for a low-risk man is about 1.5% probable. For a favorable intermediate-risk man, about 3%. For an unfavorable, 8%.
So overall, that looks pretty good. I mean, we'd love to see this as 0% across the board, but on one hand, it's reassuring that most men won't die of prostate cancer. But on the other hand, in the unfavorable intermediate risk, it's already getting up there. And, again not shown on this slide but just as important, there are men called high risk and even very high risk. If you look at a different metric, which I think is extremely important, which is the risk of metastasis happening, you can see the proportions down there.
Now, I value this particular outcome, frankly, as a physician, more than anything. I think different doctors have different opinions. And at the end of the day, all that matters is a patient's goals. But why metastasis matters so much is that this is really fundamentally the thing that would lead to additional therapies beyond initial treatment, including drug therapies that may last a lifetime and significant quality of life decrements. So you can see the spectrum of risk for that.
And then the third thing I'll put in the bottom, which is interesting only because Dr. Cooperberg mentioned biomarkers, is there are some biomarkers out there that look at detecting worse pathology than what you thought from the biopsy. So if you're asking a different sort of question, which is, what is the probability that a man with low-risk prostate cancer, let's say, has something more than Gleason 6, and in fact, has a Gleason 4 plus 3, that might be 10% probable. But these are the actual probabilities for these groups.
So going back to these groupings and going back to treatment guidelines — and again, we're showing you the NCCN guidelines, but there are lots of guidelines that doctors use across the world. The European Urology Society has one that people like. The AUA has one. There's lots of guidelines, but they all do this. And they're pretty similar across the guidelines. They say if you're a man with low-risk prostate cancer, you really should do active surveillance.
And in fact, in this particular guideline, it says it's preferred, but they give you a couple other choices. You might want to consider radiation. You might want to consider surgery. So let's call that low-intensity choices, especially if you're doing active surveillance. And then when you go up to the favorable group, it's like dealer's choice. You could do surveillance. You could do RP, which stands for radical prostatectomy, which is surgical removal, or RT, which is some form of radiation therapy. But the guideline doesn't exactly help you pick between the three. It just says they exist.
And then if you go up into what the guideline says for unfavorable — for example, they've erased the idea of active surveillance, which is interesting. But they say, you know what? If you're deciding that you're going to get radiation therapy, you should probably do hormone therapy. So then again, more intensity. So let's really talk about these biomarkers, again. And there's a difference between something called prognostic biomarkers and predictive biomarkers.
And so if you're a patient, you might have some kind of test that can determine your prognosis. And this is independent of what treatment you receive. We're just trying to cluster people into, is it really aggressive or not?
You get people who might be stratified into an excellent prognosis, or those who might be stratified into a poor prognosis. And what those lines are trying to show you is, over time, what's the risk of something, of you not staying healthy, maybe, or developing metastasis? So the lines indicate at the top that most people stay good, whereas the people with the poor prognosis do badly.
So if you have an excellent prognosis, maybe treatment intensification is unnecessary. Maybe you need to do something simple. But if you have a poor prognosis and you have something that tells you that, then you and your physician are going to want to think about whether or not you might want to do more than one thing.
And so let's say treatment intensification is desirable. You might want to know if the thing you're going to do is futile or not. So a predictive test is a test that will look at whether or not a drug you might choose will help. And what this is trying to show is that this very test that we're going to talk about not only can stratify men into good and bad prognosis, but as a secondary point, although not related to this paper, it can tell you if a drug choice you may want to make, which is hormone therapy, will simply work or not. So that's what a predictive test does.
So the motivation for this work was that we thought that conventional risk grouping and staging is suboptimal, and we felt that the spectrum of risk leads to overtreatment and undertreatment. And these risk groups are widely used, but they're limited. And so there's an opportunity for artificial intelligence to potentially augment or help this.
And I won't really go through this slide because you're going to have to trust me on this. What we basically did was take thousands and thousands of biopsy slides that had been archived from men who bravely volunteered for prospective randomized trials by the Radiation Therapy Oncology Group, which has since been renamed NRG Oncology.
And these tissue samples were banked thinking that we'd do all kinds of fancy molecular genetic analysis on them. At the outset, no one was thinking that an AI would look at them and make an impression. But they were available for digitization, so they literally had photos taken of them. And we had an AI look at them, much like a pathologist might try to interpret slides, and combine that with a little bit of clinical data to see if it can predict the future of the men.
So the objective of the study was to develop clinically usable risk groupings, and I would say, that are more clinically relevant, I guess, than our historic risk groupings, which we have learned were pretty good at determining things like maybe whether or not someone's PSA would rise after treatment, but not more important things like will they need additional treatment? Will they develop metastasis?
So there were thousands of men in the study. And we can probably skip over this. But at the end of the day, we had thousands of patients to choose from. An AI model was trained on many, many thousands of men with many, many digital histopathology slides. And this work looked at a few thousand of those men to see how AI performed compared to NCCN risk groups.
And we can probably skip over this one. But what I'm really trying to show here, and I'll say it in one minute, is that there are ways of determining accuracy, how close to truth any kind of predictive model is. And a number that equals 1 is like a God-like prediction, and a number that equals 0.5 is as good as a coin toss. And the gray bars show how good conventional risk groups are at accurately predicting a future, like risk of metastasis or risk of death.
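To make the "1 versus 0.5" scale concrete, here is a minimal Python sketch of one common accuracy measure of this kind, a concordance score. It is an illustration only and is not taken from the paper: the function name, the toy numbers, and the simplification of ignoring follow-up time and censoring are all assumptions.

```python
from itertools import product


def concordance(risk_scores, had_event):
    """Fraction of (event, non-event) patient pairs in which the patient who
    had the event was given the higher predicted risk. Ties count as half.

    Simplified illustration: ignores follow-up time and censoring.
    """
    events = [r for r, e in zip(risk_scores, had_event) if e]
    non_events = [r for r, e in zip(risk_scores, had_event) if not e]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for ev, non in product(events, non_events):
        if ev > non:
            score += 1.0
        elif ev == non:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical toy data: the first two patients developed metastasis.
outcomes = [1, 1, 0, 0]
perfect = concordance([0.9, 0.8, 0.2, 0.1], outcomes)        # every pair ranked correctly
uninformative = concordance([0.5, 0.5, 0.5, 0.5], outcomes)  # same score for everyone
```

In this toy example the first model ranks every pair correctly and scores 1, while a model that gives everyone the same number scores 0.5, the coin toss.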
And the AI outperforms normal risk groupings at predicting the future for almost any endpoint that you track after a man receives treatment. So here, we really get into what the paper shows. These are called Kaplan-Meier curves, and let me explain what you're looking at.
The x-axis is time elapsed in years from treatment. And the y-axis going up is the probability over time that you detect a spread of cancer, metastasis, in the population. So on the left-hand side, you see how NCCN risk grouping does. NCCN low risk, intermediate, and high.
But what you can see on the right is how our AI models do. And the thing that's important to see is what's written in the text, which is if you use an NCCN standard for low risk of metastasis, like less than 3%, with that conventional approach you'll only capture 30.4% of men as low risk.
And it would stratify 44% of men into high risk. But if you look at how the AI does it, it shifts the men into lower-risk disease. So it knows with basically more truth and more accuracy that instead of only 30.4% of men being low risk, it's more likely 43.5%. And there are fewer high-risk men.
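For readers curious how a curve like this is built, the following is a minimal Python sketch of a Kaplan-Meier style calculation of the cumulative probability of metastasis over time. The function name and the five-patient example are hypothetical, and the paper's own statistical method may differ (for instance, in how deaths from other causes are handled).

```python
def kaplan_meier_incidence(times, events):
    """Return [(time, cumulative incidence)] at each time an event occurs.

    times:  follow-up in years for each man
    events: True if metastasis was detected at that time,
            False if follow-up simply ended (censored)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    metastasis_free = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_total = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_total += 1
            i += 1
        if n_events:
            # Multiply by the fraction of at-risk men who stayed metastasis-free.
            metastasis_free *= 1 - n_events / at_risk
            curve.append((t, 1 - metastasis_free))
        at_risk -= n_total
    return curve


# Hypothetical five-man example: metastasis at years 2 and 4, others censored.
curve = kaplan_meier_incidence([2, 4, 4, 6, 10], [True, False, True, False, False])
```

In this made-up example the curve steps up to roughly 20% at year 2 and roughly 40% at year 4; men whose follow-up ends without metastasis leave the at-risk group without moving the curve.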
We rationally designed these AI risk groups to be very specifically calibrated to these risks. We wanted these risk groups to be less than 3% risk of metastasis at 10 years from the AI model, a 3% to 10% risk and a greater than 10% risk.
And the reason for that is that those particular clusterings are critically useful for guiding treatment decisions. So this is a bit complicated here. But what you can see is the columns represent NCCN risk groups. And the colors represent how the AI would reclassify those people. And so one way to really look at this is if you look at the intermediate-risk column, you can see that 57% of the men by conventional risk grouping who are called intermediate risk, who we'd say all the time should get treated, actually have such a low risk of cancer that they have a risk that is very similar to a historically low-risk man, and maybe they could be surveilled.
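The cut points and the reclassification idea described here can be sketched in a few lines of Python. This is an illustration, not the actual test: the function names and the five example patients are hypothetical, and how a risk of exactly 3% or exactly 10% is assigned is an assumption.

```python
from collections import Counter


def mmai_style_group(ten_year_met_risk):
    """Map a predicted 10-year metastasis risk (as a fraction) to a group,
    using the cut points quoted in the talk: <3%, 3% to 10%, >10%.
    Handling of the exact boundary values is an assumption.
    """
    if ten_year_met_risk < 0.03:
        return "low"
    if ten_year_met_risk <= 0.10:
        return "intermediate"
    return "high"


def reclassification_table(patients):
    """patients: iterable of (nccn_group, predicted_risk) pairs.
    Returns a Counter keyed by (nccn_group, ai_group).
    """
    return Counter((nccn, mmai_style_group(risk)) for nccn, risk in patients)


# Hypothetical patients: (conventional NCCN group, predicted 10-year risk).
patients = [
    ("intermediate", 0.017),
    ("intermediate", 0.07),
    ("low", 0.05),
    ("high", 0.02),
    ("high", 0.25),
]
table = reclassification_table(patients)
```

Each entry in `table` counts how many men in a conventional column land in each AI-derived color, which is the structure of the stacked bar chart being described.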
And so if you overlay the actual risks of metastasis, you can see, for example, that in the column called low, which is historic risk grouping, there's a small percentage of men — it's really not that low. And maybe we should not watch them. And then in the intermediate-risk group, we've got a spectrum of risk that goes up, and so we shouldn't treat them all the same.
And even in the highest-risk group, you can see that there's a spectrum of risk that might even include surveillance or maybe somewhat less intense therapies. You can see how this changes our thinking. So this is the same graph. This is the AI. The low-risk men, the guidelines say, should always be observed.
For the intermediate-risk men, they say everyone should get treated. And sometimes, you might want to give hormone therapy. And for the high-risk group, the guideline says everyone gets treated. And if you're going to use radiation, you should put all men on long-term hormone therapy.
And here's how the treatment implication changes. With the AI model, although it might be considered heresy even in the high-risk men, it highly suggests that we can probably just observe these men. In the low-risk men who we would always observe, there's a small percentage of men that the AI suggests we should probably treat.
And I said, need radiation treatment. They can also receive surgical treatment. But this was a radiation study, which is why I said that. And that's so that Dr. Ross doesn't get mad at me. The AI here says that men might need radiation treatment, but we might want to do further evaluation to see if they should or should not get hormone therapy.
Here, the implication is, well, we probably should throw the book at these guys. With these guys, we're going to use our most intense treatment. If we use radiation, we probably want to use long-term hormone therapy. And I think there's one more click in there or not. Not sure.
And these guys, maybe they can do OK with just short-term hormone treatment, which is not a guideline recommendation. The guideline says all those men should get long-term hormone. So this is how NCCN risk groups look at, let's call it, an unfavorable or a favorable intermediate-risk man.
This is a human pathologist, a PSA, whatever. Everyone's a 7% risk. The AI will take that 7% risk and deconstruct it into personalized individual risks. So you can see the individual risk below. They actually average to 7%.
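The point that one group-level number hides a spread of individual risks can be shown with simple arithmetic. The five individual risks below are invented for illustration and are chosen only so that they average to the 7% figure mentioned above.

```python
from statistics import fmean

# Hypothetical individual 10-year metastasis risks for five men who all
# share the same conventional risk group label.
individual_risks = [0.017, 0.03, 0.05, 0.08, 0.173]

group_average = fmean(individual_risks)  # about 0.07, the single group-level number
lowest, highest = min(individual_risks), max(individual_risks)
```

The group label reports only the average, while the individual values in this made-up example range from under 2% to over 17%.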
But the score report you see on the right is what a patient might see if they get this particular test. So in this example, this test shows, hey, you're a man with unfavorable intermediate-risk prostate cancer. That's what the guideline would actually say.
However, your actual risk of metastasis is only 1.7%. It's low. Just like a guy that you might actually want to consider surveilling. And so if you're sitting in front of me, and I'm seeing this 2% risk of mets, I'm thinking, I don't need to probably give you hormone therapy in addition to something else. It's already so low.
But if I did, it mentions below that hormone therapy could drive that small risk down a little bit more. So anyway, the implications are that we avoid overtreatment in low-risk patients. We enable intensification for high-risk patients. And we also have a way to discretely discern, on the predictive side, in an intermediate-risk man, whether hormone therapy would simply work or not, period.
So that's my story, and I'm sticking to it. And I'm looking forward to commentary by Dr. Ross and questions from this august group of patients.
Matthew Cooperberg: Wonderful. Thank you. And as a reminder to everyone participating, please put any questions in the Q&A, and we will answer them as we go. Now, I have a single slide from Dr. Ross here, if you give me one moment.
Ashley Ross: And while we're loading that up, that was a great presentation, Dr. Tward. And thank you, Dr. Cooperberg, again, for all the patients that are on. Before I dive into AI in general, from a patient's perspective, particularly for you all that are on this call that are sort of informed patients and patient advocates, you want to know, what's going on in the research realm?
Then as tests come to market, like ArteraAI, what's the state of the science? And we had a good review of that from Dr. Tward. And then is this applicable to me? Is this — even though it's available, is it ready for widespread dissemination? Do I know what I want to do with those results?
And so to back up a little bit, I think that as Dr. Cooperberg mentioned going in and as we saw from Dr. Tward, one of the things that we're trying to deal with now with a lot of this extra computing power and other things that we have, is managing big data, whether it be population-level data, whether it be pixels on a digitized pathology slide, how do we have the AI look at that?
There are some initial attempts, and there's been very robust partnerships between things like the National Radiation Oncology Group, and clinician scientists, and companies like ArteraAI to develop these digital path tools. So when we look at artificial intelligence for digital path, before we even get to ArteraAI, the digital path part, it's important to note that we're kind of just coming into new territory.
So we've had things like how things look under the microscope for a while, Gleason grading. We've had genomics come around, where we understand that gene expression drives protein production, drives what cancer cells do. The AI is, in some ways, a little bit more of a black box.
And so I thought it'd be important to, one, put up this slide on promises and ponderings of AI as it relates to digital pathology. The promise is two things. One, it's a multimodal AI. You can use machine learning to take variables that we already use to put people into these discrete buckets and maybe individualize them just on their inputted factors: your pathology Gleason grade, read by the pathologist, the PSA, et cetera.
But the digital path part can also give maybe extra information with no tissue consumption, instant turnaround time, ArteraAI is proprietary, but you could imagine open-source free testing. And this might augment or even replace molecular biomarkers. The ponderings and why I think, for me, I was very excited about Artera and other digital path things as they were developed.
But I've tempered my enthusiasm a little bit, and I am not actively using a lot of this testing in my patients currently because there are a lot of ponderings. Where is the data right now? What is it actually doing?
So for the digital pathology part, you can sometimes retrofit things, and ArteraAI tried to do that to see what it is actually looking at. Is it picking up cribriform patterns, which are ways that cells touch each other, or other things that might tell us among Gleason grade group 4 what's a stratification?
Is it picking up an inflammatory signal? But it's hard to retrofit. It's better to let the computer just do what it's doing. The second—so then the second question is, does the computer really understand everything? There was an AI that was made, where it was trying to identify airplanes.
And the AI came out almost, as Dr. Tward said, God-like prediction of this picture is an airplane. And then someone threw in a piglet with its ears out, and the AI said, airplane. And so we just don't know.
And you look at what ArteraAI has done, and it's been a huge effort among scientists like Dr. Tward and others in their company and NRG. They have looked at multiple randomized controlled trials among radiation patients, so nobody was not treated, in different contexts with historical slides, and then they've made their markers.
What's changed? Or in terms of real-world data, where are we? And the thing to note for patients, or for me, myself, Gleason grading has changed. So those computed values of Gleason grade that the machine learning uses, those may or may not be accurate.
There's been a consolidation of Gleason grade group 1; that's more refined. The slides, as they age, have fading. They have artifacts. Are modern slides going to look the same? Probably.
So before we even get to what the data is telling us, since the technology itself is not fully understood, I think, in addition to the high-level evidence, which I think they've done an awesome job of that, the state that we are right now using digital path in our practices for prognostic testing, risk stratification, where we were just talking about, is in development in my mind, and it's going to take a few years of prospective data in the real world for me — every provider is different — to feel comfortable making decisions for my patients.
But even beyond that, if you say—I think that stuff's going to be moot. It probably works as well in my patients today as it did in these studies. Who would use this to help them make decisions with their patients? Patients getting radiation. You cannot really extrapolate into, or it might be dangerous to extrapolate into, surveillance, yes or no.
Plus, in that surveillance category, nobody had MMAI high risk, as Dr. Tward showed you. So it wouldn't even give you useful information, per se. How about patients with prostatectomy?
We recently published with ArteraAI a small series on the PLCO cohort, and it does seem to stratify them too. How much bang for the buck are you getting as a patient? Unclear. But that's not always important. And you don't have to practice, at a patient level, population medicine, like is this going to cost the health-care system or not?
I will tell you this. From the data I've seen so far, if you do get an ArteraAI, and you're in the high-risk category, it's nuanced in the studies. But I saw some of the questions that some of the panelists put up. You've really read the paper.
Well, the high-risk people do—they have aggressive disease. They're performing worse than even NCCN high risk. But if you're a low-risk Artera, I don't know what that means for you, the patient. And you can extrapolate.
But I'd like to see, and I think we're going to see, a lot of good real-world data. It sounds silly. Usually, we have all this real-world data, and we're trying to get a randomized trial. This is one of those cases where I think we need some real-world data.
And I do see a high utilization of things like this in my practice in the next couple of years. Right now, I'm mostly using digital AI in pathology, like in research, to be honest, and then we're implementing things like for diagnosis, not for risk assessment. But with that preamble, I'll yield, and we can go through some questions.
But just so that the audience knows, I was really on the bandwagon. But now, when it came, like when rubber met the road and I was talking to my own patients, and do I trust this to make a decision for you that's going to impact your whole life, I don't think we're there yet. And so I think we're getting there. And I think it's exciting, and I think Tward's work should be applauded. But I don't think we're quite there yet, but it's coming fast. And so that's why it's great to have this meeting as we're having today.