Cleft Lip and Palate Classification Through Vision Transformers and Siamese Neural Networks

Cited by: 0
Authors
Nantha, Oraphan [1 ]
Sathanarugsawait, Benjaporn [1 ]
Praneetpolgrang, Prasong [1 ]
Affiliations
[1] Sripatum Univ, Sch Informat Technol, Bangkok 10900, Thailand
Keywords
cleft lip and palate; vision transformers; siamese neural networks; few-shot learning; medical assessment;
DOI
10.3390/jimaging10110271
Chinese Library Classification
TB8 [Photographic Technology];
Discipline Code
0804;
Abstract
This study introduces a novel approach for the diagnosis of Cleft Lip and/or Palate (CL/P) by integrating Vision Transformers (ViTs) and Siamese Neural Networks. Our study is the first to employ this integration specifically for CL/P classification, leveraging the strengths of both models to handle complex, multimodal data and few-shot learning scenarios. Unlike previous studies that rely on single-modality data or traditional machine learning models, we uniquely fuse anatomical data from ultrasound images with functional data from speech spectrograms. This multimodal approach captures both structural and acoustic features critical for accurate CL/P classification. Employing Siamese Neural Networks enables effective learning from a small number of labeled examples, enhancing the model's generalization capabilities in medical imaging contexts where data scarcity is a significant challenge. The models were tested on the UltraSuite CLEFT dataset, which includes ultrasound video sequences and synchronized speech data, across three cleft types: Bilateral, Unilateral, and Palate-only clefts. The two-stage model demonstrated superior performance in classification accuracy (82.76%), F1-score (80.00-86.00%), precision, and recall, particularly distinguishing Bilateral and Unilateral Cleft Lip and Palate with high efficacy. This research underscores the significant potential of advanced AI techniques in medical diagnostics, offering valuable insights into their application for improving clinical outcomes in patients with CL/P.
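The abstract describes a two-branch (Siamese) comparison over fused anatomical and acoustic embeddings, used for few-shot classification across three cleft types. A minimal sketch of that idea follows, with heavy assumptions: the ViT encoders are replaced by toy shared-weight projections, the inputs are random feature vectors standing in for ultrasound frames and speech spectrograms, and all names (`embed`, `fuse`, `classify`, `W_img`, `W_spec`) are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Shared-weight embedding branch (a stand-in for a ViT encoder)."""
    h = np.tanh(x @ W)
    return h / np.linalg.norm(h)  # L2-normalise the embedding

def fuse(img_vec, spec_vec, W_img, W_spec):
    """Concatenate anatomical (image) and acoustic (spectrogram) embeddings."""
    return np.concatenate([embed(img_vec, W_img), embed(spec_vec, W_spec)])

def siamese_distance(a, b):
    """Distance between two fused embeddings from the shared branches."""
    return np.linalg.norm(a - b)

# Toy few-shot setup: one labelled support pair per cleft type.
W_img = rng.normal(size=(64, 16))   # image-branch weights (toy)
W_spec = rng.normal(size=(32, 16))  # spectrogram-branch weights (toy)
classes = ["Bilateral", "Unilateral", "Palate-only"]
support_inputs = {c: (rng.normal(size=64), rng.normal(size=32)) for c in classes}
support = {c: fuse(i, s, W_img, W_spec) for c, (i, s) in support_inputs.items()}

def classify(img_vec, spec_vec):
    """Assign the query to the class of its nearest support embedding."""
    q = fuse(img_vec, spec_vec, W_img, W_spec)
    return min(classes, key=lambda c: siamese_distance(q, support[c]))

# A query that slightly perturbs the "Unilateral" support inputs stays
# nearest to that class in embedding space.
img_q, spec_q = support_inputs["Unilateral"]
img_q = img_q + 0.01 * rng.normal(size=64)
pred = classify(img_q, spec_q)
```

The key design point mirrored here is that both branches share weights, so the learned distance generalises to classes seen only a few times; the paper's actual model swaps the toy projections for ViT encoders trained on the UltraSuite CLEFT data.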
Pages: 28