Despina Siolas, MD, PhD, on Use of an NLP Model to Identify High-Risk Bladder Cancer
– Criteria recognized in unstructured EMR data from real-world patients
In patients with non-muscle invasive bladder cancer (NMIBC), treatment guidelines from the American Urological Association (AUA) and the Society of Urologic Oncology (SUO) provide clinical and pathologic criteria for risk stratification. Identifying these criteria in real-world patients, however, can be challenging, since the data from electronic medical records (EMRs) in hospital-based U.S. health systems contain no specific codes for NMIBC versus muscle-invasive bladder cancer.
Recent study results in JCO Clinical Cancer Informatics show that a natural language processing (NLP) model developed using unstructured EMR data can retrospectively identify and describe characteristics of real-world patients with high-risk NMIBC with a sensitivity and specificity of 83.7% and 91.1%, respectively. The NLP model's false-positive rate was 8.9%, reported Despina Siolas, MD, PhD, of Weill Cornell Medical College in New York City, and colleagues.
These rates were confirmed using manual records as "the gold standard," the team explained.
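For readers less familiar with these metrics, the relationship between them can be sketched in a few lines of Python. This is a minimal illustration only: the `Confusion` class, the function names, and the counts are hypothetical and are not taken from the study. It shows how sensitivity, specificity, and the false-positive rate are derived when model output is compared against a manually reviewed gold standard, and why a specificity of 91.1% implies a false-positive rate of 8.9%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing model labels with a gold-standard review."""
    tp: int  # model says high-risk, reviewer agrees
    fp: int  # model says high-risk, reviewer disagrees
    tn: int  # model says not high-risk, reviewer agrees
    fn: int  # model says not high-risk, reviewer disagrees


def sensitivity(c: Confusion) -> float:
    # Share of truly high-risk patients that the model catches.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Share of truly non-high-risk patients that the model clears.
    return c.tn / (c.tn + c.fp)


def false_positive_rate(c: Confusion) -> float:
    # Share of truly non-high-risk patients that the model flags anyway.
    return c.fp / (c.fp + c.tn)


# Hypothetical counts chosen for round numbers, NOT the study's data.
example = Confusion(tp=80, fp=10, tn=90, fn=20)

print(f"sensitivity: {sensitivity(example):.3f}")
print(f"specificity: {specificity(example):.3f}")
print(f"false-positive rate: {false_positive_rate(example):.3f}")
```

Because specificity and the false-positive rate share the same denominator, they always sum to 1, which is consistent with the 91.1% and 8.9% figures reported above.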
"The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC," the researchers wrote. Previously, have been used to extract data on bladder cancer stage, grade, and from pathology reports and other types of clinical records.
NMIBC -- defined as tumor limited to the urothelium (stage Ta or carcinoma in situ [CIS]) or to the lamina propria (stage T1) -- affects 70-80% of patients with newly diagnosed bladder cancer.
The current NLP model was developed, trained, and validated using three independent EMR-derived datasets from U.S. patients diagnosed with NMIBC between 2011 and 2020, Siolas and co-authors explained. Each dataset contained at least 10 examples of NMIBC features classified by AUA/SUO risk criteria, including tumor stage, grade, histology, recurrence, tumor size and number, and degree of invasion.
The retrospective validation analysis of data from 4,402 adults with NMIBC showed that the NLP model recognized seven high-risk AUA/SUO risk-stratification criteria, with a positive predictive value of 79.4% and a negative predictive value of 93.2%. The F1 scores, weighted across NMIBC features, were >0.7 for all but prostatic urethral involvement.
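As a rough guide to how figures like these are computed, the sketch below works through positive predictive value, negative predictive value, and an F1 score averaged across features. It assumes the "weighted" F1 is a support-weighted average (each feature's F1 weighted by its number of true examples), which is one common reading and may not match the authors' exact method. The function names, feature names, and counts are all hypothetical.

```python
def ppv(tp: int, fp: int) -> float:
    # Positive predictive value: of the patients flagged, how many are truly positive.
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    # Negative predictive value: of the patients cleared, how many are truly negative.
    return tn / (tn + fn)


def f1(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision (PPV) and recall (sensitivity).
    return 2 * tp / (2 * tp + fp + fn)


def weighted_f1(per_feature: dict[str, tuple[int, int, int]]) -> float:
    """Support-weighted mean F1; values are (tp, fp, fn) per feature."""
    total_support = sum(tp + fn for tp, _fp, fn in per_feature.values())
    weighted_sum = sum(
        (tp + fn) * f1(tp, fp, fn) for tp, fp, fn in per_feature.values()
    )
    return weighted_sum / total_support


# Hypothetical per-feature counts, NOT the study's data.
features = {
    "stage": (40, 10, 10),
    "grade": (30, 0, 20),
}

print(f"PPV: {ppv(tp=80, fp=20):.3f}")
print(f"NPV: {npv(tn=90, fn=10):.3f}")
for name, (tp, fp, fn) in features.items():
    print(f"F1 for {name}: {f1(tp, fp, fn):.3f}")
print(f"weighted F1: {weighted_f1(features):.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common high-risk disease is in the cohort being tested, so they are not expected to carry over unchanged to a population with a different case mix.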
Of the 748 patients manually confirmed as having high-risk NMIBC, the NLP model showed that 196 (26%) had CIS, and that of these, 19% also had T1 disease and 23% also had Ta disease. In 552 tumors (74%), there was no associated CIS.
In the following interview, Siolas explained why she finds this work so exciting.
How would you characterize your findings?
Siolas: They're a great step in the right direction. We used unstructured medical records primarily used for routine healthcare delivery and deciphered cancer-related data specific for a subpopulation of patients with bladder cancer. This is very cutting edge.
Were there any "ah-ha" moments?
Siolas: It was really exciting when we realized that the NLP model was performing at a very high performance metric. That meant our goal was becoming a reality.
Could this model be used to develop other phenotyping models to identify high risk in other cancer types?
Siolas: Absolutely! I think this is an exciting opportunity to apply this model for other types of cancer, such as breast cancer.
What's next for your research?
Siolas: The next step would be to examine a larger cohort for treatment patterns and clinical outcomes for patients with high-risk NMIBC. We would like to further explore predictive factors of disease progression, clinical treatment trends, or other variables associated with survival outcomes.
The study was funded by Merck Sharp & Dohme.
Siolas reported financial relationships with Zephyr AI, Ciox Health, and Natera; other co-authors also reported relationships with industry.
Primary Source
JCO Clinical Cancer Informatics