Whole Body PSMA Quantitative Parameters: Why Are We Not Using Whole Body PSMA PET Parameters in the Real World? "Presentation" Irène Buvat

April 9, 2025

At the 2025 UCSF-UCLA PSMA Conference, Irène Buvat explains why quantitative whole-body PSMA PET indices aren't routinely used clinically despite their value. She demonstrates that various deep learning solutions for automated segmentation show excellent correlation with expert assessments while maintaining consistent risk stratification power. Dr. Buvat advocates establishing a benchmark method to enable systematic reporting of valuable biomarkers including tumor heterogeneity and tissue-specific volumes.

Biography:

Irène Buvat, PhD, Director of Research Unit, Head of Laboratory of Translational Imaging in Oncology, Institut Curie, Inserm, Orsay, France


Read the Full Video Transcript

Irène Buvat: So why are we not using whole-body PSMA PET indices routinely? Because as you've seen, they can be very useful. Wolfgang demonstrated very well the usefulness of these whole-body PSMA biomarkers. But I guess not many of you are using them clinically in everyday routine. And the reason is that they require whole-body tumor segmentation. And at the moment, there is no one-push-button whole-body tumor segmentation available in clinical routine.

And for that reason, many visual scores have been proposed as a replacement for these quantitative whole-body tumor analyses. So for instance, the visual tumor-to-salivary-gland ratio or the HIT score, which combines heterogeneity and SUVs in the different lesions, have been proposed. And they have been shown to have some prognostic value for PSA response and also for overall survival. And you also have a visual version of RECIP 1.0 to assess the tumor response after lutetium treatment.

But we are in 2025, and my claim is that the time has now come for automated segmentation, for quantitative instead of visual image analysis. And actually, plenty of solutions for automated segmentation using deep learning in PSMA PET images have been published over the last two to three years.

And if you look at this paper, you see that they report very encouraging results, with very good segmentation of the targets even when there are many. Here, you can see the red delineations. And in this paper, when they compare the total tumor volume resulting from the automated segmentation with what they got when an expert did the segmentation, you can see that there is an excellent correlation.
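As a rough illustration of the kind of comparison described here, the sketch below computes a Pearson correlation between expert-derived and automatically derived total tumor volumes. The volumes are hypothetical placeholders, not data from the paper shown in the talk, and the function name `pearson_r` is only an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical total tumor volumes (mL) for five patients; assumed values.
expert_ttv_ml = [10.0, 50.0, 120.0, 300.0, 640.0]
auto_ttv_ml = [12.0, 46.0, 130.0, 290.0, 655.0]

r = pearson_r(expert_ttv_ml, auto_ttv_ml)
print(f"Pearson r = {r:.3f}")
```

A high correlation only says the two measurements move together; it does not by itself mean they return the same absolute volumes.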

So one question you might have is whether all these AI tools will actually provide the same total lesion volume. And the answer is actually no. You might get different results depending on the AI tool you use. And this is not specific to AI. If you use different thresholds—SUV greater than 3, SUV greater than 4—you get different results. And this has been shown in the past for non-AI methods.
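The threshold effect can be sketched in a few lines. The example below assumes a flat list of SUV values for voxels inside candidate lesion regions and a hypothetical voxel volume; a real pipeline would also need to exclude physiological uptake, which this sketch does not attempt.

```python
def total_lesion_volume_ml(suv_voxels, threshold, voxel_volume_ml):
    """Sum the volume of all voxels whose SUV is strictly above the threshold."""
    n_voxels = sum(1 for suv in suv_voxels if suv > threshold)
    return n_voxels * voxel_volume_ml

# Hypothetical SUVs for eight voxels and a hypothetical 4 mm isotropic voxel.
suv_voxels = [0.8, 2.5, 3.4, 3.9, 4.2, 6.1, 9.7, 1.1]
voxel_volume_ml = 0.064

vol_gt3 = total_lesion_volume_ml(suv_voxels, 3.0, voxel_volume_ml)
vol_gt4 = total_lesion_volume_ml(suv_voxels, 4.0, voxel_volume_ml)

# Five voxels exceed SUV 3 but only three exceed SUV 4, so the volumes differ.
print(f"SUV > 3: {vol_gt3:.3f} mL, SUV > 4: {vol_gt4:.3f} mL")
```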

But this paper shows that there is a very good correlation between the tumor volumes measured with different thresholds. And the good news is, whatever the threshold, the stratification between the low- and high-risk patients remains identical. So you have the same stratification power, whatever the tool or the threshold here.

So does that mean that we do not care about which method is used, which AI tool is used? Not exactly, because as different methods will yield different results, the optimal cutoff to distinguish between high-risk and low-risk patients (or between the ones you want to treat with lutetium and the ones you don’t want to treat) will differ. So if you want to agree on a single cutoff, you have to agree on the way total lesion volume has to be measured.
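A small numerical sketch may make this concrete. It assumes two hypothetical measurement methods whose volumes are well correlated but not equal, and it uses a median split purely as a stand-in for a cutoff; real cutoffs would be derived from outcome data.

```python
from statistics import median

def stratify(volumes, cutoff):
    """Return True for patients whose volume is above the cutoff (high risk)."""
    return [v > cutoff for v in volumes]

# Hypothetical total lesion volumes (mL) for the same six patients,
# measured with two different methods; assumed values for illustration.
tlv_method_a = [12.0, 45.0, 80.0, 150.0, 300.0, 620.0]
tlv_method_b = [8.0, 30.0, 61.0, 110.0, 240.0, 500.0]

# Each method gets its own cutoff (median split used as a placeholder).
cutoff_a = median(tlv_method_a)
cutoff_b = median(tlv_method_b)

groups_a = stratify(tlv_method_a, cutoff_a)
groups_b = stratify(tlv_method_b, cutoff_b)

# Reusing method A's cutoff on method B's volumes moves a patient to low risk.
groups_b_with_cutoff_a = stratify(tlv_method_b, cutoff_a)

print("same groups with method-specific cutoffs:", groups_a == groups_b)
print("same groups when sharing A's cutoff:", groups_a == groups_b_with_cutoff_a)
```

With method-specific cutoffs the two methods put every patient in the same group, but a single shared cutoff only works if everyone measures the volume the same way.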

And this is where the notion of benchmark arrives. So a benchmark is by no means the ground truth. It is only a method we all agree on to delineate the tumor lesions. So in our context, this could be publicly shared PSMA PET/CT scans with a broad variety of uptake patterns, with a consensus segmentation, and the associated total lesion volume for each.

And if we had that, anyone could check that their favorite AI tool gives a result that is compliant with the benchmark results. And this is doable because we have done it for FDG-PET in lymphoma patients. We shared 60 cases. And you can see on the right-hand side that they had very different uptake patterns. And for each of these patients, we agreed on the way we should measure the total metabolically active tumor volume.

So for each of these cases, we provide a reference segmentation, as well as the associated total metabolically active tumor volume. And so you can use this data to check that you are able to apply the benchmark method properly and get the expected TMTV. And you can also test an AI tool to determine whether it gives the expected values.

And so if it gives the expected value, you can consider that your AI tool is compliant with the benchmark method. And if it doesn’t, then you have to run an analysis to tell whether your tool is actually better than the benchmark method, which is here an SUV greater than 4, basically.
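A compliance check of this kind could look like the sketch below. The reference and tool volumes are hypothetical, and the 10% tolerance is an assumption made for illustration; the talk does not specify an acceptance criterion.

```python
def relative_difference(measured, reference):
    """Signed relative difference of a measurement with respect to the reference."""
    return (measured - reference) / reference

def compliance_flags(tool_tmtv, reference_tmtv, tolerance=0.10):
    """Flag, per case, whether the tool is within the tolerance of the benchmark."""
    if len(tool_tmtv) != len(reference_tmtv):
        raise ValueError("one tool value is needed per benchmark case")
    return [
        abs(relative_difference(m, r)) <= tolerance
        for m, r in zip(tool_tmtv, reference_tmtv)
    ]

# Hypothetical benchmark TMTV values (mL) for four shared cases.
reference_tmtv = [35.0, 120.0, 410.0, 980.0]
# Hypothetical outputs of two AI tools on the same cases.
tool_a_tmtv = [36.0, 115.0, 420.0, 1000.0]
tool_b_tmtv = [50.0, 118.0, 300.0, 990.0]

flags_a = compliance_flags(tool_a_tmtv, reference_tmtv)
flags_b = compliance_flags(tool_b_tmtv, reference_tmtv)

print("tool A compliant on all cases:", all(flags_a))
print("tool B compliant on all cases:", all(flags_b))
```

A tool that fails the check is not necessarily worse; as noted above, it would then need a separate analysis to tell whether it outperforms the benchmark method.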

And of course, if you have this automated whole-body tumor segmentation available, then you can systematically report plenty of candidate biomarkers, such as the heterogeneity of tumor uptake or the percent volume of tumor that responds to treatment. And if you also perform FDG-PET for these patients, you would be able to calculate the fraction of the total tumor volume as per FDG-PET, for instance, that is PSMA positive. And that might be an interesting biomarker to decide whether you want to treat your patient with lutetium or not.
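The PSMA-positive fraction mentioned here can be sketched as a set overlap. The example assumes the FDG and PSMA scans have already been registered to the same voxel grid, and it represents each tumor segmentation as a set of hypothetical voxel coordinates.

```python
def psma_positive_fraction(fdg_tumor_voxels, psma_tumor_voxels):
    """Fraction of the FDG-defined tumor volume that is also PSMA positive."""
    if not fdg_tumor_voxels:
        raise ValueError("FDG tumor segmentation is empty")
    overlap = fdg_tumor_voxels & psma_tumor_voxels
    return len(overlap) / len(fdg_tumor_voxels)

# Hypothetical voxel coordinates (x, y, z) on a shared grid; assumed values.
fdg_tumor_voxels = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
psma_tumor_voxels = {(0, 0, 0), (0, 0, 1), (2, 2, 2)}

fraction = psma_positive_fraction(fdg_tumor_voxels, psma_tumor_voxels)
print(f"PSMA-positive fraction of FDG tumor volume: {fraction:.2f}")
```

Because all voxels are assumed to have the same size, the voxel-count ratio equals the volume ratio.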

And at the moment, we also have an AI-based tool for performing a whole-body organ segmentation based on the whole-body CT. So if you combine these two, you would even be able to report tissue-specific tumor volume by just overlaying your tumor segmentation from the PSMA scan with the organ delineation obtained using a publicly available tool, here TotalSegmentator.
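The overlay step can be sketched as follows. This does not call TotalSegmentator itself; it assumes the organ segmentation has already been exported and resampled to the PET voxel grid, and the organ names, coordinates, and voxel volume are hypothetical placeholders.

```python
from collections import Counter

def tissue_specific_volume_ml(tumor_voxels, organ_labels, voxel_volume_ml):
    """Split the tumor volume by the organ label found at each tumor voxel."""
    counts = Counter(
        organ_labels.get(voxel, "unlabeled") for voxel in tumor_voxels
    )
    return {organ: n * voxel_volume_ml for organ, n in counts.items()}

# Hypothetical tumor voxels from the PSMA PET segmentation.
tumor_voxels = {(0, 0, 0), (0, 0, 1), (5, 5, 5), (9, 9, 9)}
# Hypothetical organ label map from the CT-based organ segmentation.
organ_labels = {
    (0, 0, 0): "bone",
    (0, 0, 1): "bone",
    (5, 5, 5): "liver",
    (1, 1, 1): "liver",  # organ voxel without tumor, ignored in the result
}
voxel_volume_ml = 0.5  # assumed voxel volume

volumes = tissue_specific_volume_ml(tumor_voxels, organ_labels, voxel_volume_ml)
for organ in sorted(volumes):
    print(f"{organ}: {volumes[organ]:.1f} mL")
```

Tumor voxels that fall outside every organ label are reported separately as "unlabeled" rather than dropped, so the per-tissue volumes still add up to the total tumor volume.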

So in conclusion, I think the time has come to shift to AI-driven whole-body tumor segmentation, even if it’s not perfect. It’s not perfect, but still, it can be useful to leverage quantitative whole-body PSMA imaging. And for that, I think it would be worth creating and agreeing on a benchmark method so that one-push-button AI tools could be assessed, and then one could systematically generate whole-body biomarkers, such as the total lesion volume, and the most clinically useful whole-body biomarkers will naturally emerge.