Study design and population
This single-center, retrospective, observational study used data from the medical records of patients with out-of-hospital cardiopulmonary arrest who were transported to the Nara Medical University Advanced Critical Care and Emergency Centre. The study was approved by the institutional ethics committee of Nara Medical University (No. 3131). Because this was an observational study, the ethics committee waived the need for written informed consent. This report follows the TRIPOD guidelines [17]. All methods were performed in accordance with the tenets of the Declaration of Helsinki.
Consecutive patients with out-of-hospital cardiopulmonary arrest who were admitted between April 2015 and March 2021 and treated after return of spontaneous circulation (ROSC) or on extracorporeal circulation were included in this study. The following patients were excluded: patients under 18 years old; patients with head trauma, stroke, or previous intracranial disease other than lacunar infarction; patients transported from other hospitals; patients who did not undergo head CT imaging within 3 h after resuscitation (based on the CT imaging time window of previous studies [2]); patients with inadequate CT imaging coverage; patients who died during surgery for aortic rupture immediately after cardiopulmonary resuscitation; and patients who were withdrawn from life-sustaining therapy after ROSC at their or their family’s request. Withdrawal of life-sustaining treatment is not performed at our institution.
Post-resuscitation care
Patients were managed with sedation, analgesia, and ventilation under our standardized protocol, which follows the resuscitation guidelines [18]. Patients who remained in shock (systolic blood pressure < 90 mmHg) despite the use of vasopressors were excluded from targeted temperature management (TTM). Central body temperature was maintained at 33 °C for 24 h using the Arctic Sun® Temperature Management System (Bard, BD, Covington, GA, USA), followed by rewarming at a rate of 0.25 °C/h and maintenance at 37 °C for an additional 24 h. If TTM could not be performed, the rest of the post-resuscitation care was provided in the same manner.
Data collection of participant characteristics
The following data were collected retrospectively from the electronic medical records: age, sex, witness, bystander, initial rhythm, cause of cardiac arrest, time from cardiac arrest to circulation resumption, time from circulation resumption to CT scan, and Cerebral Performance Category (CPC) at 1 month after resuscitation for inpatients and at outpatient follow-up for discharged patients.
Original neurological outcome assessment
Current recommendations define a poor neurological outcome as a CPC [19] of 3–5 [20,21]. Under this definition, with the outcome assessed at the minimum acceptable time point of 1 month after resuscitation, a patient whose neurological status appeared good at the time of head CT immediately after resuscitation but who died within 1 month is classified as CPC 5, which indicates a poor neurological outcome. Accurate labels are indispensable for training machine learning models. As this study aimed to predict neurological outcomes from head CT imaging data, the labels attached to the training data (i.e., the head CT images) must reflect the neurological outcome. A patient was presumed to be CPC 1 or 2 if awake, able to communicate and perform the indicated actions without disability, and free of paralysis in the extremities. To minimize the risk of misclassification in the present study, patients classified as CPC 1 or 2 after CT imaging who died within 1 month of admission from causes not attributable to intracranial disease were still classified as CPC 1 or 2. Since these patients were presumed to have good neurological outcomes after CT was performed, we did not consider it appropriate to exclude them a priori from a study that aimed to predict neurological outcomes based on head CT imaging.
CT protocol and conditions for image collection
All CT images were acquired with a 64-row helical CT system (Optima CT660; GE Healthcare, Chicago, IL). The scan settings were as follows: 120 kVp; auto mA; rotation time, 0.5 s; helical pitch, 0.531; noise index, 3.0; and image noise, SD10. Because shortening the examination time is a priority in post-resuscitation CT imaging, images reconstructed along the orbitomeatal baseline were used in this study to reduce variations in the imaging conditions. The CT images used for machine learning were slices at the levels of the foramen of Monro and the pineal gland, which are the levels used in studies based on the gray-to-white matter ratio (GWR). Images were saved in Portable Network Graphics (PNG) format with a size of 1 × 256 × 256, using a window level of 40 Hounsfield units (HU) and a window width of 80 HU.
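The windowing step can be illustrated with a minimal sketch. It assumes that the window is applied by clipping HU values to level ± width/2 and rescaling linearly to 8-bit grey levels; the function name `window_to_uint8` and the sample values are hypothetical and are not taken from the study's code.

```python
def window_to_uint8(hu_rows, level=40.0, width=80.0):
    """Map Hounsfield units to 0-255 grey levels for a given CT window."""
    lower = level - width / 2.0  # 0 HU for level 40, width 80
    upper = level + width / 2.0  # 80 HU for level 40, width 80
    out = []
    for row in hu_rows:
        out_row = []
        for hu in row:
            # Clip to the window, then rescale linearly to 0-255.
            clipped = min(max(hu, lower), upper)
            scaled = (clipped - lower) / (upper - lower) * 255.0
            out_row.append(int(scaled + 0.5))  # round half up
        out.append(out_row)
    return out


# Hypothetical 2 x 3 patch of HU values (air, water, tissue, bone-like).
example_patch = [[-1000, 0, 20], [40, 80, 300]]
print(window_to_uint8(example_patch))
```

With a level of 40 HU and a width of 80 HU, everything at or below 0 HU maps to black and everything at or above 80 HU maps to white, so the full grey scale is spent on the narrow range that separates gray and white matter.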
Machine learning model
The prepared image dataset was split, with stratification, into training and validation datasets at the commonly used ratio of 8:2. The training dataset was then split again, with stratification, into a training dataset and a test dataset at a ratio of 8:2, and these were used to construct the model. The validation data, which were not used for training, were used to validate the model. The model was VGG19 [22], a 19-layer convolutional neural network, used with transfer learning from parameters obtained by training on 1 million images (Fig. 4). Transfer learning reuses what a model has learned from a large amount of high-quality data to build an accurate model for a small dataset. Although models with better predictive accuracy are available, VGG19 was chosen in the present study because this relatively simple model achieved high accuracy in a previous study on post-resuscitation head CT [8].
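The two-stage stratified split can be sketched as follows. This is an illustration only: the helper `stratified_split`, the seeds, and the 80/20 label counts are assumptions, and the study does not state which library or random state was used.

```python
import random
from collections import defaultdict


def stratified_split(indices, labels, held_out_fraction=0.2, seed=0):
    """Split indices into (kept, held_out) while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    kept, held_out = [], []
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        n_held = round(len(members) * held_out_fraction)
        held_out.extend(members[:n_held])
        kept.extend(members[n_held:])
    return sorted(kept), sorted(held_out)


# Hypothetical labels: 1 = poor outcome (CPC 3-5), 0 = good outcome (CPC 1-2).
labels = [1] * 80 + [0] * 20
all_indices = list(range(len(labels)))

# Stage 1: hold out 20% as the validation dataset (never used for training).
train_pool, validation = stratified_split(all_indices, labels, 0.2, seed=0)
# Stage 2: split the remaining 80% again at 8:2 into training and test data.
train, test = stratified_split(train_pool, labels, 0.2, seed=1)
print(len(train), len(test), len(validation))
```

Splitting within each class separately keeps the minority class represented in every subset, which matters when one class makes up only about a fifth of the data.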
Images acquired at a size of 256 × 256 were center-cropped and resized to 224 × 224, followed by normalization. Because the dataset was small, image augmentation was applied. Image augmentation creates new training samples by slightly altering existing images. In this study, random changes were introduced into the training data through sharpness adjustment, rotation, and erasing. As the dataset was imbalanced between classes, class weights were used to compensate. A grid search was used to tune the hyperparameters. To avoid overfitting the model to the current dataset, the number of epochs was determined by early stopping, which halts training when the accuracy on the validation data decreases.
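The class weighting and early stopping described above can be illustrated with a short sketch. The exact weighting formula and stopping rule used in the study are not stated, so the inverse-frequency heuristic, the `patience` parameter, and both function names below are assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def early_stopping_epoch(val_accuracies, patience=1):
    """Return the 1-based epoch with the best validation accuracy seen
    before training is stopped. Training stops once the accuracy has
    failed to improve for `patience` consecutive epochs."""
    best_acc, best_epoch, waited = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical labels at the approximate 8:2 ratio of the dataset.
labels = [1] * 80 + [0] * 20
print(balanced_class_weights(labels))

# Hypothetical validation accuracies per epoch; accuracy drops at epoch 4.
print(early_stopping_epoch([0.70, 0.74, 0.78, 0.77, 0.80]))
```

Under this heuristic the minority class receives a proportionally larger weight in the loss, so misclassifying a rare good-outcome case costs more than misclassifying a common poor-outcome case.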
We further explored the areas on which the model focused using the Grad-CAM technique, which generates a “visual explanation” for the class decision of the model [13]. Grad-CAM uses the gradients that flow into the final convolution layer to produce a coarse localization map that highlights the regions of the image that are important for predicting the class.
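The core Grad-CAM computation can be sketched in a few lines, assuming the feature maps of the final convolution layer and the gradients of the class score with respect to them have already been extracted from the network. The function name `grad_cam_map` and the toy arrays are hypothetical; in practice these tensors come from a deep learning framework, and the map is then upsampled to the input image size.

```python
def grad_cam_map(activations, gradients):
    """Combine feature maps with their gradients into a Grad-CAM map.

    Both arguments are nested lists shaped [K][H][W]: K feature maps from
    the final convolution layer and the gradients of the class score with
    respect to each of them.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average pooling of the gradients.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w_k, a_k in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * a_k[i][j]
    # ReLU keeps only regions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in cam]


# Toy example with K = 2 feature maps of size 2 x 2.
activations = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
gradients = [[[1, 1], [1, 1]], [[-2, -2], [-2, -2]]]
print(grad_cam_map(activations, gradients))
```

In the toy example the second feature map has negative gradients, so the locations where it is active are suppressed to zero and only the regions driven by the first feature map remain highlighted.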
Measurement of GWRs based on CT scans
The GWRs were measured for all patients in the study using a previously described method [5,6,7,23]. Briefly, head CT scans were retrospectively reviewed twice by an emergency physician blinded to patient outcomes. The mean HU was measured within circular regions of interest (ROIs; 10–15 mm²) placed on each side at the basal ganglia, centrum semiovale, and high cortical levels. The caudate nucleus (CN), putamen (PU), posterior limb of the internal capsule (PLIC), and corpus callosum (CC) were measured at the basal ganglia level, and the medial cortex (MC) and medial white matter (MW) were measured at the centrum semiovale level (MC1 and MW1) and the high cortical level (MC2 and MW2). The agreement between the two measurements was evaluated using Spearman’s correlation coefficient, and the average of both measurements was used for subsequent evaluation. The GWRs were calculated according to previously reported equations as follows: GWR-BG = (CN + PU)/(PLIC + CC), GWR-CE = (MC1 + MC2)/(MW1 + MW2), and GWR-AV = (GWR-BG + GWR-CE)/2. In this study, we used a GWR cut-off value of 1.2, as reported previously [5].
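The three equations above can be worked through with a small sketch. The HU values are hypothetical, each is taken to be the mean over both sides and both readings, and treating a GWR below the cut-off as predicting a poor outcome is an assumption about how the threshold is applied.

```python
GWR_CUTOFF = 1.2  # cut-off value stated in the text


def gwr_values(hu):
    """Compute GWR-BG, GWR-CE and GWR-AV from mean HU values per region."""
    gwr_bg = (hu["CN"] + hu["PU"]) / (hu["PLIC"] + hu["CC"])
    gwr_ce = (hu["MC1"] + hu["MC2"]) / (hu["MW1"] + hu["MW2"])
    gwr_av = (gwr_bg + gwr_ce) / 2
    return {"GWR-BG": gwr_bg, "GWR-CE": gwr_ce, "GWR-AV": gwr_av}


# Hypothetical mean HU values for one patient.
hu_example = {
    "CN": 36.0, "PU": 38.0, "PLIC": 30.0, "CC": 30.0,
    "MC1": 33.0, "MC2": 33.0, "MW1": 30.0, "MW2": 30.0,
}
result = gwr_values(hu_example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")

# Assumed decision rule: a GWR below the cut-off suggests a poor outcome.
print("below cut-off:", result["GWR-AV"] < GWR_CUTOFF)
```

Here GWR-BG is 74/60 and GWR-CE is 66/60, so GWR-AV falls below 1.2 even though GWR-BG alone does not.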
Statistical analysis
Continuous variables are expressed as median (interquartile range) and categorical variables as number of patients (percentage). The Mann–Whitney U test and Fisher’s exact test were used to compare continuous and categorical variables, respectively. Statistical significance was set at P < 0.05. The neurological outcome prediction performance of the methods was assessed by plotting receiver operating characteristic (ROC) curves and comparing the areas under the curves (AUCs). ROC curves are frequently used to compare the performances of models since they are unaffected by the class proportions of the data. However, our dataset was imbalanced, with labels at a ratio of approximately 8:2 (a small number of CPC 1/2), so the class bias had to be considered. Precision-recall (PR) curves were therefore also drawn, and their AUCs were compared. PR curves provide a good picture of the performance of a method when the ratio of classes in the test data is close to the ratio expected when the model is applied in practice [24]. All analyses were performed using Python 3.8.5 (Python Software Foundation, Beaverton, OR).
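The two summary metrics can be illustrated with a minimal sketch. It assumes that the area under the PR curve is summarised as average precision and that no two scores are tied; the study does not state how its PR-AUC was computed, and the function names and sample data are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count concordant pairs; a tied pair counts as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Summarise the PR curve as the mean precision at each positive.

    Assumes at least one positive sample and no tied scores.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Hypothetical labels (1 = positive class) and predicted scores.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(roc_auc(y_true, scores), 3))
print(round(average_precision(y_true, scores), 3))
```

The ROC AUC depends only on how positives rank against negatives, whereas precision depends on the number of negatives in the data, which is why the PR summary responds to class imbalance.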