Multimodal Deep Learning Predicts Bladder Cancer Response to Neoadjuvant Chemotherapy - Bishoy Faltas
June 9, 2025
Biographies:
Bishoy M. Faltas, MD, Chief Research Officer, Englander Institute for Precision Medicine, Gellert Family-John P. Leonard, MD, Research Scholar, Associate Professor of Medicine, Cell and Developmental Biology, Weill Cornell Medicine, NewYork-Presbyterian Hospital, New York, NY
Ashish Kamat, MD, MBBS, Professor of Urology and Wayne B. Duddlesten Professor of Cancer Research, University of Texas, MD Anderson Cancer Center, Houston, TX
Predicting response to neoadjuvant chemotherapy in muscle-invasive bladder cancer via interpretable multimodal deep learning.
ASCO GU 2024: Predicting Clinical Outcomes in the S1314-COXEN Trial Using a Multimodal Deep Learning Model Integrating Histopathology, Cell Types, and Gene Expression
Ashish Kamat: Hello, everybody, and welcome to UroToday's Bladder Cancer Center of Excellence. It's a pleasure to welcome back to the forum someone who has been with us many times, Professor Bishoy Faltas, who really needs no introduction. Bishoy, you've done a lot to advance the field of bladder cancer, not just on the clinical side but also by taking a deep, deep dive into learning and looking at it from multiple perspectives. That's very appropriate given the topic of your presentation and discussion today: multimodal deep learning to predict response in bladder cancer. So I'm really excited to see what you have to share with us.
Bishoy Faltas: Thank you so much, Dr. Kamat. It's a pleasure to be here today. And yes, absolutely: the title of my talk today is multimodal deep learning to predict response in bladder cancer. What really drove us to explore this is exactly what you mentioned: we're generating all these different types of data, whether it's imaging data, genomic data, or functional data from the laboratory.
And now, with artificial intelligence, we are starting to think about a framework in which we can integrate all these data, whether in a biomarker setting, which is what I will talk about today, or in other settings that we can discuss in the Q&A. Most of the work that I will discuss today has been published in a manuscript titled "Predicting response to neoadjuvant chemotherapy in muscle-invasive bladder cancer via interpretable multimodal deep learning."
As you can see, this work is really a team science effort. It was led by Fei Wang, my colleague here at Weill Cornell, and co-led by me, and trainees from my lab and from his group co-led the work as well. Many others in the SWOG cooperative group were also involved; they provided many of the samples from the clinical trial used in this paper and helped us at various stages of the collaboration.
So we set out to address an unmet clinical need: the lack of accurate biomarkers for predicting pathologic complete response to neoadjuvant chemotherapy. As we know, neoadjuvant chemotherapy, now with or without immunotherapy, followed by radical cystectomy is a standard of care for patients with muscle-invasive bladder cancer, or MIBC. However, only about a third of patients achieve a pathologic complete response, which I will refer to from now on as path CR.
We know that path CR positively correlates with overall survival and is a reliable predictor of it. But we don't really have a reliable predictive biomarker of path CR to neoadjuvant chemotherapy itself. If we had one, we could potentially identify patients who are candidates for bladder preservation.
So the main design feature of our model is how it integrates histopathology image analysis, RNA expression, and spatial cell type data to accurately predict pathologic complete response. Our hypothesis was that we could develop such an integrated, or multimodal, biomarker using different inputs from the SWOG 1314 COXEN study. This was a randomized phase two trial enrolling patients who received four cycles of neoadjuvant chemotherapy with either gemcitabine-cisplatin or dose-dense MVAC, followed by radical cystectomy.
We used the H&E histopathology images and the Affymetrix transcriptomic data obtained from the transurethral resection of bladder tumor, or TURBT, from 180 patients, and we used all of these data as inputs to train our deep learning model to predict pathologic complete response at radical cystectomy. So how does this model actually learn to predict clinical outcomes from the training data?
These deep learning models essentially go through the same process as a toddler learning to identify an object or an animal, in this case a dog. The toddler learns what a dog is, and more importantly what it's not, by pointing to objects and saying the word "dog." The parent provides constant feedback until the toddler's brain becomes aware of the features that all dogs possess.
The same process applies to a deep learning model, which applies what's called a non-linear transformation to its input and uses what it learns to create a statistical model as an output. Iterations of this process continue until the output reaches an acceptable level of accuracy. The number of processing layers through which the data pass is what inspires the label "deep."
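To make the "iterate until the output reaches an acceptable accuracy" loop concrete, here is a minimal Python sketch of a single artificial neuron with a sigmoid non-linearity, trained by gradient descent on a toy one-dimensional data set. It illustrates the general idea only and is not the model from the paper; the function name `train_neuron`, the learning rate, and the accuracy-based stopping rule are assumptions chosen for clarity.

```python
import math

def sigmoid(z):
    # Non-linear transformation that squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(xs, ys, lr=0.5, target_acc=1.0, max_iters=1000):
    """Hypothetical helper: fit one sigmoid neuron by gradient descent and stop
    once training accuracy reaches target_acc (the 'acceptable level')."""
    w, b, acc = 0.0, 0.0, 0.0
    for it in range(1, max_iters + 1):
        preds = [sigmoid(w * x + b) for x in xs]
        # Gradients of the mean log-loss with respect to w and b
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
        # Feedback step: how often does the updated neuron classify correctly?
        acc = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in zip(xs, ys)) / len(xs)
        if acc >= target_acc:
            return w, b, it, acc
    return w, b, max_iters, acc

# Toy data: negative inputs are class 0, positive inputs are class 1
w, b, iters, acc = train_neuron([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

The explicit accuracy check plays the role of the parent's feedback in the toddler analogy; in practice, training is usually stopped on a held-out validation metric rather than on training accuracy.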
The architecture that enables this learning is biologically inspired by neurons. It is built from artificial neurons, which are simple computational units, and the computations run on graphics processing units, which excel at performing many computations in parallel. A deep neural network aggregates several layers of these cascaded artificial neurons to learn a hierarchy of progressively more complex features.
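The idea of cascaded layers can be sketched in a few lines of plain Python. The hypothetical network below uses hand-set weights rather than trained ones, purely to show why stacking layers with a non-linearity matters: one hidden layer of ReLU neurons feeding an output neuron computes XOR, a function that no single linear layer can represent. The `(weights, bias, activate)` layer format and the name `XOR_NET` are assumptions of this sketch.

```python
def relu(v):
    # Element-wise non-linearity applied after a hidden layer
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    # One layer of artificial neurons: each row of `weights` is one neuron
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def forward(x, layers):
    """Pass an input through a cascade of (weights, bias, activate) layers."""
    h = x
    for weights, bias, activate in layers:
        h = dense(h, weights, bias)
        if activate:
            h = relu(h)
    return h

# Hand-set, illustrative weights (not learned)
XOR_NET = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], True),  # hidden features: x1+x2 and x1+x2-1
    ([[1.0, -2.0]], [0.0], False),                  # output neuron combines them
]
```

For example, `forward([1.0, 1.0], XOR_NET)` passes through hidden activations `[2.0, 1.0]` and returns `[0.0]`, while `forward([0.0, 1.0], XOR_NET)` returns `[1.0]`.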
So we designed a deep learning model that integrates three neural network branches, each with a different internal architecture and each learning from a different data input. The first branch uses what's called a multilayer perceptron to create a response classifier from gene expression. The second branch uses the ResNet-50 architecture, a different type of architecture: a convolutional neural network that is 50 layers deep, which is where the 50 in the name comes from. It learns the most important features from the whole-slide pathology images and produces neural embeddings as output.
The third branch uses a pre-trained convolutional neural network model called HoverNet to identify cell types based on nuclear morphology, which lets us map out the spatial distributions of these cell types within the slide. Finally, the outputs of all three branches are integrated to predict pathologic complete response. This design allows several things. First, it lets us determine the relative importance of each branch to the path CR prediction.
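A minimal sketch of the three-branch, late-fusion idea follows. The talk does not specify layer sizes or exactly how the branch outputs are combined, so everything here is an assumption: each branch is a tiny hypothetical stand-in (a one-layer MLP for expression, mean-pooling of precomputed patch embeddings in place of ResNet-50, and cell-type fractions in place of HoverNet output), and the fusion head is a simple logistic unit over the concatenated branch outputs.

```python
import math

def relu(v):
    # Element-wise non-linearity
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    # Fully connected layer: one row of weights per output unit
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def expression_branch(expr, weights, bias):
    # Hypothetical stand-in for the gene-expression MLP: one hidden ReLU layer
    return relu(dense(expr, weights, bias))

def slide_branch(patch_embeddings):
    # Hypothetical stand-in for ResNet-50: assumes per-patch embeddings were
    # already extracted and mean-pools them into one slide-level vector
    n, dim = len(patch_embeddings), len(patch_embeddings[0])
    return [sum(p[i] for p in patch_embeddings) / n for i in range(dim)]

def cell_type_branch(nucleus_labels, types=("tumor", "immune", "stroma")):
    # Hypothetical stand-in for HoverNet output: fraction of nuclei per type
    total = len(nucleus_labels)
    return [nucleus_labels.count(t) / total for t in types]

def fuse(branch_outputs, weights, bias):
    """Late fusion: concatenate branch outputs, then apply a logistic output unit."""
    features = [x for out in branch_outputs for x in out]
    if len(features) != len(weights):
        raise ValueError("fusion weights must match the concatenated feature length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # predicted probability of path CR

# Toy inputs and weights, made up for illustration only
expr_out = expression_branch([0.2, 1.5, -0.3], [[0.5, 0.1, 0.0], [-0.2, 0.4, 0.3]], [0.0, 0.1])
slide_out = slide_branch([[0.1, 0.4], [0.3, 0.0]])
cell_out = cell_type_branch(["tumor", "tumor", "immune", "stroma"])
p = fuse([expr_out, slide_out, cell_out], [0.3] * 7, -0.5)
```

Keeping the branches separate until the final layer is what makes per-branch attribution, such as the SHAP analysis discussed later in the talk, straightforward to set up.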
It also makes the model interpretable, meaning that we can interrogate it to understand which features are important for its decision-making and what biological insights it learned from the different types of data. To train the model, we considered pT0 to be a pathologic complete response, and 30% of patients with available slides achieved path CR.
We used two methods to rigorously test the model's predictive performance. First, we split the data set into 80% for training and 20% for testing. Second, we used what's called five-fold cross-validation, dividing the data set into five equal folds. Each fold serves in turn as the testing set while the remaining four folds are used for training, which ensures that the model is robust and performs well across different subsets of the data.
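Both evaluation schemes can be sketched with the standard library. This is an illustrative version that assumes a simple random shuffle with a fixed seed; the function names are hypothetical, and the cohort size of 180 is used only as an example. With roughly 30% responders, a stratified split that keeps the path CR proportion similar in every fold would be a natural refinement; this sketch does not implement it.

```python
import random

def train_test_split(indices, test_frac=0.2, seed=0):
    """Shuffle once and hold out test_frac of the samples for testing."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_frac)
    return idx[n_test:], idx[:n_test]  # (train, test)

def k_fold_splits(indices, k=5, seed=0):
    """Yield (train, test) pairs: each fold is the test set exactly once,
    while the other k-1 folds are used for training."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

train, test = train_test_split(range(180))      # 144 training, 36 testing
cv = list(k_fold_splits(range(180), k=5))       # five folds of 36 patients each
```

The performance reported for cross-validation is then typically the average of the metric over the five held-out folds.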
So we looked at the value of integrating these multiple data inputs and using deep learning to accurately predict response to neoadjuvant chemotherapy. To do this, we compared the area under the curve (AUC) of the integrated three-branch model with that of individual branches, for example gene expression alone, and of two-branch combinations, such as gene expression plus the whole-slide image neural embeddings.
What we found is that integrating the gene expression, the whole-slide neural embeddings, and the cell type data, essentially all three branches, improved performance, achieving an AUC of 0.72 in the 80/20 training-testing split and outperforming single-branch and two-branch combinations. This was also confirmed in the five-fold cross-validation, with an AUC of 0.74, highlighting the reliability of the integrated model and the added value of these different data types, each of which captures information that is not encoded in the other branches.
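The AUC values quoted here have a simple probabilistic reading: the chance that a randomly chosen patient who achieved path CR receives a higher predicted score than a randomly chosen patient who did not, so 0.5 corresponds to chance and 1.0 to perfect ranking. A small sketch of that pairwise computation follows (the function name `auc` is hypothetical); libraries such as scikit-learn provide an equivalent `roc_auc_score`.

```python
def auc(scores, labels):
    """ROC AUC via its rank interpretation: the probability that a random
    responder (label 1) outscores a random non-responder (label 0).
    Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 responder/non-responder pairs are ranked correctly
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```

Comparing branches in the way the talk describes means computing this quantity for each model variant on the same held-out patients.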
So now that we knew that the whole is greater than the sum of its parts, we went back and asked which of these three branches contributed most to the combined predictions. To do that, we measured the SHapley Additive exPlanations, or SHAP values, for each branch. These values are derived from a game theory approach that measures each player's contribution, in this case each data type branch, to the final payout, which here is the prediction of path CR.
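The game-theory idea behind SHAP can be shown with an exact Shapley value computation that treats the three branches as players. In this sketch the value function `value(subset)` is left abstract, and the toy game is made up for illustration. In SHAP proper, the value of a subset is the model's expected prediction when only those inputs are known, and attributions are computed per patient and then summarized, for example as a mean absolute magnitude. Exact enumeration is cheap here because three players give only eight coalitions.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution value(S + {p}) - value(S),
    averaged over all coalitions S of the other players with weight |S|!(n-|S|-1)!/n!."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                s = frozenset(subset)
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy game (hypothetical): any two branches together earn the full payout of 1
pair_game = lambda s: 1.0 if len(s) >= 2 else 0.0
phi = shapley_values(["expression", "slide", "cells"], pair_game)  # each branch gets 1/3 by symmetry
```

A useful sanity check is the efficiency property: the Shapley values always sum to `value(all players) - value(empty set)`.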
We found that the gene expression branch contributed the most to path CR predictions, with a mean SHAP magnitude of 0.13, compared with lower SHAP values for the neural embeddings and for the cell type and morphology features. This highlights the value of the gene expression data, which captures information not necessarily encoded in the histopathology features, for predicting pathologic complete response to neoadjuvant chemotherapy.
Then we tried to understand whether the model could learn important biological features on its own. The challenge we face here with these types of models is the so-called black box problem, which arises because we cannot know exactly which factors these models use to make decisions. We were very cognizant of that when we designed this study, so we tried to make our model as interpretable as possible to gain insights into its decision-making.
Because we know the biological meaning of the expression of specific genes, we can use that as a test, or a sanity check if you will, of whether the model is learning biologically relevant features. The SHAP values are again useful here to help us understand which features are most important to the model. We found, for example, that the model autonomously learned that the expression of important genes, such as TP63, is critical for predicting pathologic complete response.
TP63 is an important transcription factor and a master driver of the basal differentiation program, with a known prominent role in the basal subtype of bladder cancer. So in some ways, it makes sense that it would be a gene associated with response to neoadjuvant chemotherapy. Indeed, when we performed gene set enrichment analysis of the genes identified by our deep learning model as biologically relevant, we found that the basal differentiation program was significantly associated with response predictions to neoadjuvant chemotherapy.
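Gene set enrichment analysis proper (GSEA) uses a rank-based running-sum statistic with permutation testing. A simpler relative, over-representation analysis, asks the same kind of question in a few lines: among the genes a model ranks as most important, are members of a pathway, such as a basal differentiation gene set, over-represented beyond chance? This sketch uses the hypergeometric tail probability. It is a swapped-in technique for illustration, not the analysis from the paper, and the gene names are placeholders.

```python
from math import comb

def overrepresentation_p(universe, gene_set, selected):
    """One-sided hypergeometric p-value: the probability that a random draw of
    len(selected) genes from `universe` contains at least as many `gene_set`
    members as actually observed in `selected`."""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    selected = set(selected) & universe
    N, K, n = len(universe), len(gene_set), len(selected)
    k = len(gene_set & selected)  # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)

# Placeholder genes: a 10-gene universe and a hypothetical 5-gene pathway
universe = [f"GENE{i}" for i in range(10)]
pathway = universe[:5]
top_ranked = universe[:5]  # the model's top genes happen to be the whole pathway
overlap, p = overrepresentation_p(universe, pathway, top_ranked)  # p = 1/252
```

With real data, the universe would be all measured genes and the p-values would be corrected for testing many gene sets at once.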
The relationship between molecular subtype membership and response to chemotherapy is a bit complex. But a recent paper by Dr. Lerner in Clinical Cancer Research, using RNA expression and outcome data from the same study, SWOG 1314, showed that the basal subtype is an important subtype in terms of response to neoadjuvant chemotherapy.
Interestingly, as I mentioned earlier, the data we used were bulk expression data. It's important to understand that molecular subtype classifications such as basal or luminal are derived from bulk RNA-seq or expression data, which aggregate transcripts from multiple cell types in the tumor and the microenvironment. But this information is actually captured spatially in the H&E images.
So we asked whether we could recover this information using our HoverNet model, and we definitely can. As you can see here, we can identify cancer cells, highlighted in red; immune cells, in green; and stromal cells, in blue, within the segmented patches of H&E images from patients who did or did not have a pathologic complete response. We then interrogated our deep learning model to identify the patches it assigned high predictive value for pathologic response, and we can generate a spatial map of these areas for each whole slide, as you can see here.
We can then overlay the cell type contributions on these spatial maps, measure, for example, the ratio of different cell types in high-attention versus low-attention patches, and compare these values between patients who did or did not have a complete response. We found that tumors from patients who had a path CR showed a significantly higher tumor-stroma ratio, enriched in high-attention patches, compared with tumors from patients who did not achieve pathologic complete response. This shows that our model can discover new relationships and generate new hypotheses about the abundance of a specific cell type and response to therapy, just from the H&E images.
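The high- versus low-attention comparison can be sketched as follows, assuming each patch already carries an attention score from the model and per-type nucleus counts from the cell-type branch. The top-fraction cutoff, the dictionary keys, and the function names are hypothetical choices for illustration. Comparing the resulting per-patient values between responders and non-responders would then call for a statistical test, which is not shown.

```python
def tumor_stroma_ratio(patches):
    # Pooled tumor-to-stroma nucleus ratio over a group of patches
    tumor = sum(p["tumor"] for p in patches)
    stroma = sum(p["stroma"] for p in patches)
    return tumor / stroma if stroma else float("inf")

def split_by_attention(patches, top_frac=0.2):
    # Rank patches by model attention; the top fraction counts as "high attention"
    ranked = sorted(patches, key=lambda p: p["attention"], reverse=True)
    n_high = max(1, round(len(ranked) * top_frac))
    return ranked[:n_high], ranked[n_high:]

def high_vs_low_tsr(patches, top_frac=0.2):
    """Ratio of the tumor-stroma ratio in high- vs low-attention patches."""
    high, low = split_by_attention(patches, top_frac)
    return tumor_stroma_ratio(high) / tumor_stroma_ratio(low)

# Toy slide with five patches (made-up numbers)
slide = [
    {"attention": 0.9, "tumor": 40, "stroma": 10},
    {"attention": 0.8, "tumor": 40, "stroma": 10},
    {"attention": 0.3, "tumor": 10, "stroma": 20},
    {"attention": 0.2, "tumor": 10, "stroma": 20},
    {"attention": 0.1, "tumor": 10, "stroma": 20},
]
enrichment = high_vs_low_tsr(slide, top_frac=0.4)  # 4.0 / 0.5 = 8.0
```

A value above 1 means the tumor-stroma ratio is higher in the patches the model attends to most, which is the kind of per-slide summary that can then be compared across patients.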
So I'll end with a look toward the future, which I would love to talk about more. We are planning to test our model in additional clinical data sets, especially given the changes in the drugs used in the neoadjuvant setting. We are also working on integrating additional data type branches, for example circulating tumor DNA or genomic information.
I'd like to point out that in this model, we did not include clinical information. So in some ways, our model cleared a high bar, because clinical information carries a lot of predictive value that we could have included in this model, and those AUC values would definitely go up. But we did all of this just from the H&E images and the RNA expression data, without factoring in any clinical parameters.
We are also planning to train models with the same architecture to predict response to emerging neoadjuvant therapies, including immunotherapy and antibody-drug conjugates. Our hope is that accurate, AI-driven integrated biomarkers will enable bladder preservation approaches. With that, I'd like to thank my collaborators: the SWOG group, Seth Lerner, David McConkey, and everyone in this group who participated in this study, as well as Fei Wang and Olivier Elemento at Weill Cornell. I would be happy to take any questions.
Ashish Kamat: Professor Bishoy, once again, congratulations on such a tremendous body of work. You've touched on some of the issues we were going to discuss, and we will still go through them. But tell us a little about your sense of how this particular multi-layer model you've developed is essentially the next generation of machine learning and AI. People have built models from pure clinical data, which you didn't include here; from genomic data, which you did; and from H&E stains.
Those are special-purpose machine learning and AI models. And just as you've assimilated all of this into one large model, it's almost as if AI is moving toward a literal one-size-fits-all approach that could then be useful across the spectrum of all cancers. So share with us some of your thoughts about that concept.
Bishoy Faltas: Yeah, that's a great concept and a great question. For folks who follow AI, the next thing people talk about is AGI, or artificial general intelligence. I think you may be asking what artificial general intelligence for oncology would look like, where you could learn from all types of data. I don't know that we're there yet, but I think it is possible with a lot of data and novel architectures.
In biology and in clinical medicine in general, there are two bottlenecks in my opinion. The key one is the lack of large enough data sets that are well annotated with the right information, whether in the laboratory or in the clinical setting, because of HIPAA restrictions, data silos that make it very hard to combine data, and batch effects. And the real value is in clinical data annotation; I think we all know its limitations and how time- and resource-intensive it is.
So as that improves, there are ways these things can be addressed. For example, people in computer science work on synthetic data sets, which, although synthetic, can be used to train smarter models. The other aspect is the development of novel architectures that learn from biological and clinical insights, not just large language models that learn to predict the next word.
And human language lends itself very well to that concept and so does a lot of our knowledge. But that doesn't necessarily mean that it's learning the core or the essence of our knowledge. And if it's not learning that, its ability to lead us to the next big breakthrough is somewhat limited.
Ashish Kamat: Yeah, I think you're right, because we want these models to tell us the information that we and our patients need, and not evolve into the next Lord of the Rings, where you're just going down different stories and rabbit holes. I'm sure you're aware of PANDA, a machine learning CT scan tool that the FDA has fast-tracked. It blew the radiologists out of the water when it comes to diagnosing pancreatic cancer just from CT scans.
So you alluded a little to this broader iteration, where you will use imaging and clinical data. Is that something you're envisioning just within one disease state? I know you touched on this a little, but do you think the power of what you are developing is truly cancer agnostic, or disease-site agnostic?
Bishoy Faltas: I think it's disease-site agnostic. The key thing we're developing at Weill Cornell, at the Englander Institute for Precision Medicine, and in my lab, as a collaborative effort that also includes collaborators at Cornell University in Ithaca and elsewhere, is an ecosystem where you can bring all these pieces together and have them talk to each other, as we were just saying.
We've successfully shown in this paper that we can integrate imaging data, meaning H&E images, and transcriptomic data; that's already done, and we've published it. We're now working on a framework to integrate, in addition, circulating tumor DNA data and imaging data, and we have already done that to a certain extent. There is additional work funded by a Bladder Cancer Advocacy Network (BCAN) Innovation Award to me and my collaborator at Cornell, Jaehee Kim.
We are building bio-digital twins, and one of the aims is to incorporate all of this data together. For example, we can measure the total tumor volume of patients with metastatic urothelial cancer, or really any cancer, from their PET CTs. So we measure the total global tumor volume, not just a 2D RECIST measurement. That step takes the images and uses AI to extract this information, which we then feed into the AI that we've developed.
We can correlate that with the circulating tumor DNA data, and we're combining it with a virtual patient model that Jaehee and my lab have developed. That's a very potent approach, because we can essentially predict what's happening in the real patient and simulate many of these treatments in silico, while staying anchored in real patient observations. We're very excited about that framework and working very hard on it, and hopefully we'll be able to bring it to light soon, because we see it as a way to start truly integrating all of these elements together.
Ashish Kamat: Yeah, that's really cool work. Obviously, as you alluded to a little, and it's [INAUDIBLE] the audience to the paper for the details. But clearly, being able to predict path CR is the first step, and then correlating it with overall survival, and whether clinical complete response (cCR) and path CR actually correlate, using all the other parameters you talked about.
The next step is whether we can actually help identify patients for whom it's safe to preserve the bladder after neoadjuvant therapy, whether it's platinum-based, ADC-based, or otherwise. So this is really exciting stuff, Bishoy. We could chat forever, but in the interest of time I just want to say thank you for sharing this with us.
Bishoy Faltas: Thank you so much for having me.