1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China;
2. Sichuan Artificial Intelligence Research Institute, Yibin, Sichuan 644005, P. R. China;
3. School of Computer Science, China West Normal University, Nanchong, Sichuan 637009, P. R. China;
XU Liming, Email: xulm@cwnu.edu.cn

Medical cross-modal retrieval aims to enable semantic similarity search across different modalities of medical cases, such as quickly locating relevant ultrasound images from an ultrasound report, or retrieving the matching report from an ultrasound image. However, existing medical cross-modal hash retrieval methods face significant challenges, including semantic and visual differences between modalities and the poor scalability of hashing algorithms on large-scale data. To address these challenges, this paper proposes a medical image semantic alignment cross-modal hashing method based on the Transformer (MSACH). The algorithm employs a two-stage training strategy that combines modality feature extraction with hash function learning, effectively extracting low-dimensional features that retain important semantic information. A Transformer encoder is used for cross-modal semantic learning. By introducing manifold similarity constraints, balance constraints, and a linear classification network constraint, the algorithm enhances the discriminability of the hash codes. Experimental results demonstrate that MSACH improves mean average precision (MAP) by 11.8% and 12.8% on two datasets compared with traditional methods. The algorithm performs well in improving retrieval accuracy and handling large-scale medical data, showing promising potential for practical application.
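The core retrieval mechanism the abstract describes — mapping both modalities into a shared low-dimensional space, binarizing into hash codes, and ranking by Hamming distance — can be illustrated with a minimal sketch. This is not the authors' MSACH implementation; the random projection, toy feature dimensions, and the `hash_codes`/`hamming_rank` helpers are illustrative assumptions standing in for the learned Transformer encoder and hash functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, projection):
    """Project features into a low-dimensional space and binarize
    into {-1, +1} hash codes via the sign function."""
    return np.sign(features @ projection)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code.
    For +/-1 codes, Hamming distance = (bits - inner product) / 2."""
    bits = db_codes.shape[1]
    dists = (bits - db_codes @ query_code) / 2
    return np.argsort(dists)

# Hypothetical toy setup: 4 report/image pairs in a 16-d feature space,
# hashed to 8-bit codes with a shared projection (a stand-in for the
# learned cross-modal hash functions).
proj = rng.standard_normal((16, 8))
img_feats = rng.standard_normal((4, 16))
# Well-aligned modalities: text features lie close to their paired images.
txt_feats = img_feats + 0.05 * rng.standard_normal((4, 16))

img_codes = hash_codes(img_feats, proj)
txt_codes = hash_codes(txt_feats, proj)

# A text query ranks images by Hamming distance in the shared hash space.
ranking = hamming_rank(txt_codes[0], img_codes)
print(ranking)
```

The binarization step is what gives hashing its scalability on large collections: Hamming distances over compact binary codes reduce to bit operations, which is far cheaper than comparing dense floating-point features.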

Copyright © the editorial department of Journal of Biomedical Engineering of West China Medical Publisher. All rights reserved