An artificial intelligence model system for bone age assessment of preschool children

Bone age serves as a quantitative measure of skeletal developmental maturity.1 Wrist X-ray images are becoming more widely used for BAA in children.10,11,12,13,14 In the past, manual BAA methods required observers to carefully compare or score individual bones.16 DL offers a faster and more stable solution. We divided multiple observers into groups, compared their diagnostic accuracy with and without AI assistance, and assessed inter-observer consistency and intra-observer reproducibility. The results show that experienced radiologists can increase the accuracy of BAA with the help of AI. At the same time, AI can reduce inter-observer variability and increase intra-observer reproducibility.

AI technology is a prominent application in medical imaging, including the diagnosis of lung nodules and the detection of bladder cancer.17,18 Since 2017, DL methods have been developed that can accurately estimate the shape and position of each target bone in the wrist for BAA.19 Researchers currently build algorithmic models to improve the speed and accuracy of BA prediction,20 drawing on large collections of images.9,21 Spampinato et al.9 were pioneers in applying DL to medical images and demonstrated a mean deviation of approximately 0.8 years compared with manual assessment. In 2020, Reddy et al.22 used a publicly available anonymized dataset from the Radiological Society of North America Pediatric Bone Age Challenge.2 MAEs were comparable between the whole-hand and index-finger models (0.392 years vs. 0.425 years, p=0.14), and both were significantly smaller than the MAE of three pediatric radiologists reading single-finger radiographs (0.667 years, p<0.0001). Larson et al.21 developed a DL model for BAA based on the Greulich and Pyle (GP) atlas and the associated clinical radiology reports, using 12,611 clinical hand radiographs. The mean difference between the model's BA estimates and those of the reviewers was 0 years, with a mean RMSE and MAE of 0.63 and 0.50 years, respectively, and all assessments fell within the 95% limits of agreement with each other. A residual network model that efficiently extracts features from X-ray bone images and independently determines bone age achieved a BA prediction accuracy of 97.6% and an MAE of 0.455 years.12 AI models have consistently demonstrated high accuracy in BAA,21,22,23 and the results of the present study confirm this. Radiologists can increase their diagnostic accuracy in BAA with the help of AI models.
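The MAE, RMSE, mean difference and 95% limits of agreement quoted above are standard statistics for paired readings. The Python sketch below shows how they could be computed for one set of bone-age readings against a reference standard. It assumes paired values in years; the function name `agreement_metrics` and the sample values are hypothetical, the limits use the usual Bland-Altman form (mean difference ± 1.96 SD of the differences), and the "within 1 year" share is included only as an example of a tolerance-based accuracy.

```python
import math
from statistics import mean, stdev


def agreement_metrics(predicted, reference):
    """Compare paired bone ages (years): hypothetical helper for illustration."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need at least two paired readings of equal length")
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)   # mean difference (Bland-Altman bias)
    sd = stdev(diffs)    # sample standard deviation of the differences
    return {
        "mae": mean(abs(d) for d in diffs),             # mean absolute error
        "rmse": math.sqrt(mean(d * d for d in diffs)),  # root mean square error
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),    # 95% limits of agreement
        "within_1y": sum(abs(d) <= 1.0 for d in diffs) / len(diffs),
    }


# Hypothetical example values, not data from the study.
readings = [4.0, 5.5, 3.2, 6.1]
reference = [4.2, 5.0, 3.5, 6.0]
print(agreement_metrics(readings, reference))
```

RMSE weights large errors more heavily than MAE, which is why the two are usually reported together.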

Environmental and genetic factors influence bone development to varying degrees, which leads to differing outcomes in BAA. We used two BAA methods applicable to Chinese children, the TW3 method and the RUS-CHN method, both of which are widely used to assess preschool children. The TW3 method assesses and scores the maturity of each bone region of interest and draws on reference data from children living in Europe and the United States; it was published in 2001.4 It is a quantitative approach that scores and aggregates 20 hand and wrist bones, and its strong objectivity yields highly accurate assessments, with an accuracy of less than one month.24 However, it is time-consuming and involves a complex evaluation process. Several studies have confirmed the high accuracy of the TW3 method for BAA.3,25 In a sample of British children, TW3 bone age underestimated CA in females over 3 years of age (mean BA−CA difference, −0.43 years, p<0.001), whereas no such difference was observed in males (0.01 years, p=0.760).3 Based on the analysis of 9059 clinical left-hand radiographs, an improved TW3-AI system for BAA showed strong concordance with the reviewers' overall assessment, with an RMSE of 0.50 years.25 In our study, with the help of the AI model system, the RMSE of the mid-level physicians' readings decreased from 0.358 to 0.151. This further indicates that AI can reduce the discrepancy between observers' TW3-based BAA results and the reference standard, thereby helping clinicians increase their diagnostic accuracy. In 2006, researchers5 revised the standards of the TW3 method and established the RUS-CHN method. Using samples from urban areas in China and building on the original bone maturity framework of the TW3 method, the RUS-CHN method introduces new maturity indicators that characterize children's skeletons during rapid growth and development and are better adapted to the actual conditions of Chinese children.
It also subdivides the prolonged fusion process of the radius and ulna into five distinct grades, thereby increasing accuracy throughout the growth and development period.26 However, the RUS-CHN method requires more steps, takes extra time to evaluate, and is difficult to master. In a preliminary study of 390 preschool children, our team observed that the TW3 method performed better than the RUS-CHN method but was not completely reliable on its own, because both methods overestimated age in both sexes. Nevertheless, the mean difference of the TW3 method approached zero.27 In the present study, when observers used the RUS-CHN method without and with AI assistance, the RMSE was 0.359 and 0.148 and the MAE was 0.309 and 0.113, respectively, indicating a high level of diagnostic performance. Moreover, AI assistance can further enhance the observers' diagnostic accuracy.
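Both scoring methods described above share a basic pattern: each bone is rated at a maturity stage, the stage scores are summed into a total maturity score, and the total is converted to a bone age through a reference table. The Python sketch below illustrates only that pattern. The stage scores and the score-to-age curve are made-up placeholders, not values from the TW3 or RUS-CHN standards (which use sex-specific, bone-specific tables), only three bones are shown, and linear interpolation between table rows is an assumption for illustration.

```python
from bisect import bisect_left

# HYPOTHETICAL stage scores per bone (stage letter -> points); not real TW3/RUS-CHN values.
HYPOTHETICAL_STAGE_SCORES = {
    "radius": {"B": 15, "C": 25, "D": 40, "E": 60},
    "ulna": {"B": 10, "C": 20, "D": 30, "E": 45},
    "metacarpal_1": {"B": 5, "C": 10, "D": 20, "E": 30},
}

# HYPOTHETICAL reference curve of (total maturity score, bone age in years), sorted by score.
HYPOTHETICAL_SCORE_TO_AGE = [(30, 2.0), (60, 3.0), (90, 4.0), (120, 5.0), (150, 6.0)]


def maturity_score(stages):
    """Sum the points for the maturity stage rated on each bone."""
    return sum(HYPOTHETICAL_STAGE_SCORES[bone][stage] for bone, stage in stages.items())


def score_to_bone_age(score):
    """Linearly interpolate bone age from the reference curve, clamping at both ends."""
    scores = [s for s, _ in HYPOTHETICAL_SCORE_TO_AGE]
    if score <= scores[0]:
        return HYPOTHETICAL_SCORE_TO_AGE[0][1]
    if score >= scores[-1]:
        return HYPOTHETICAL_SCORE_TO_AGE[-1][1]
    i = bisect_left(scores, score)  # first table row with score >= the input
    (s0, a0), (s1, a1) = HYPOTHETICAL_SCORE_TO_AGE[i - 1], HYPOTHETICAL_SCORE_TO_AGE[i]
    return a0 + (a1 - a0) * (score - s0) / (s1 - s0)


stages = {"radius": "D", "ulna": "C", "metacarpal_1": "C"}  # hypothetical ratings
total = maturity_score(stages)
print(total, round(score_to_bone_age(total), 2))
```

In practice the stage ratings are the labor-intensive step that an AI system would assist with; the summation and table lookup are mechanical.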

The application of AI systems to BAA presents two primary challenges: ensuring inter-observer consistency and ensuring intra-observer reproducibility. In an investigation involving US children, researchers compared the BAA performance of a group of pediatric radiologists with and without AI support. With AI, BAA accuracy improved: overall accuracy was 68.2% versus 63.6%, and accuracy within 1 year was 98.6% versus 97.4%. The ICC was also 0.9951 with AI versus 0.9914 without.10 Lee KC et al.28 found that a deep learning-based model performed BAA accurately on a total of 102 hand radiographs. It also appeared to increase clinical utility by improving inter-observer reliability, raising the two-observer ICC from 0.945 to 0.990 with AI. More recently, Wang X et al.15 concluded that an AI model increases both the accuracy and the consistency of BAA for physicians of all experience levels. The accuracy of senior, mid-level, and junior clinicians was significantly better with AI support than without it (MAEs of 0.325, 0.344, and 0.370 vs. 0.403, 0.469, and 0.755, respectively), and their consistency was significantly higher with AI assistance than without it (ICCs of 0.996, 0.996, and 0.992 vs. 0.987, 0.989, and 0.941, respectively). In the present study, with the help of AI, the inter-observer ICC for both BAA methods reached 0.991 in the first interpretation. For intra-observer reproducibility between the first and second interpretations, the ICC increased to 0.998 for the TW3 method and 0.997 for the RUS-CHN method (Reviewer 4), and Bland-Altman plots showed excellent agreement between reviewers for both methods. AI-assisted software can therefore help assessors reduce both inter-observer and intra-observer variability in BAA.
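The inter-observer and intra-observer results above are reported as ICCs. The section does not state which ICC form was used, so the Python sketch below assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), computed from a two-way ANOVA decomposition. Rows are children and columns are observers or, for reproducibility, one observer's first and second readings; the function name `icc_2_1` and the example readings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the bone age (years) assigned to child i by rater
    (or reading session) j.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need an n x k table with n >= 2 and k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical example: 4 children, each read twice.
readings = [[4.1, 4.0], [5.2, 5.3], [3.0, 3.1], [6.4, 6.2]]
print(round(icc_2_1(readings), 3))
```

Identical readings across columns (with variation between children) give an ICC of 1.0. Because ICC(2,1) measures absolute agreement, a systematic offset between observers lowers it, whereas a consistency-type form such as ICC(3,1) would ignore that offset.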

Advances in AI software have simplified and accelerated the BAA process. Several studies have compared BAA results between AI tools and radiologists,13,16,21,28,29,30 and their results confirm that AI can increase diagnostic accuracy. However, relying solely on AI results without radiologist confirmation is not considered reliable.31 AI software is therefore intended to help radiologists make faster and more accurate diagnoses rather than to replace them. In this study, two scenarios were set up for the observers, one with and one without the AI model system, and BAA accuracy was calculated separately for each. Our results are consistent with previous findings and further demonstrate that AI can help radiologists increase the accuracy of BAA in preschool children with both the TW3 and RUS-CHN methods.

The present study has several limitations. 1) It is a single-center, cross-sectional study with a small sample size, focusing only on a specific population aged 3–6 years in China. 2) It compared only the TW3 and RUS-CHN methods; other methods commonly used in different regions and hospitals, such as the GP method, were not considered. 3) All observers were mid-level physicians, and no comparison was made with physicians at other levels, such as junior and senior physicians. 4) Assessment time was not recorded, although previous studies have shown that AI can reduce it; comparative time consumption should be considered. Therefore, more in-depth multicenter studies involving different BAA methods and observers with different levels of experience are needed to validate these findings.

During BAA in preschool children, AI model systems can significantly improve not only clinicians' diagnostic accuracy but also inter-observer consistency and intra-observer reproducibility. AI model systems therefore hold great promise for bone age assessment from hand and wrist radiographs and are a valuable tool in radiologists' clinical work.
