Quantum word embedding for machine learning

Cited by: 1
Author
Nguyen, Phuong-Nam [1]
Affiliation
[1] PHENIKAA Univ, Fac Comp Sci, Hanoi 12116, Vietnam
Keywords
quantum computing; bioinformatics; data analysis; SEQUENCE; DESIGN
DOI
10.1088/1402-4896/ad6299
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Accelerated progress in quantum computing has enabled a new form of machine intelligence that runs on quantum hardware and holds great promise for more powerful computational models across a range of learning tasks. An emergent application of Quantum Machine Intelligence (QMI) is Quantum Natural Language Processing (QNLP). This paper proposes a multi-dimensional finite automaton model for quantum word embedding (QWE) via the Galois field. We demonstrate the model on three applications: (1) English vocabulary, (2) amino acid-based genetic codes, and (3) DNA-based genetic codes. First, numerical results for the English vocabulary indicate that the proposed algorithm produces more representative word features than Word2Vec, as measured by the word distance metric. Second, the proposed algorithm is used to model RNA-protein interaction based on the latent distance of a given molecule, demonstrated on three large datasets: RPI369, RPI1807, and RPI2241. Finally, two embedding techniques for DNA-based genetic codes are proposed: Two-state Lackadaisical Encoding (TLE) and Topological-Cyclic Encoding (TCE). These techniques extract features relevant to the efficacy score of gRNAs used in the CRISPR-Cas9 system; they are demonstrated on 15 datasets and compared against 12 mathematical features. We make our implementation available at https://github.com/namnguyen0510/Quantum-Embedding-of-Word/tree/main.
Pages: 19