Background: Detecting remote homology between protein sequences is a central problem in computational biology. Discriminative methods such as the support vector machine (SVM) are among the most effective approaches. Objective: Many SVM-based methods focus on finding useful representations of protein sequences, using either explicit feature vectors or kernel functions. Such representations may suffer from the peaking phenomenon that affects many machine-learning methods, because the feature vectors are usually very high-dimensional and may contain noise. In addition, the dataset for remote homology detection is imbalanced: negative samples far outnumber positive samples. Method: Based on these observations, we propose a new method for reconstructing the feature space based on latent semantic analysis (LSA) and hierarchical clustering. In addition, to evaluate remote homology detection, we adopt the precision-recall (PR) curve and its score as an alternative to the receiver operating characteristic (ROC) curve. Results: Compared to existing methods, performance improved by 14% on the 3-gram features and 7% on the LA features. Conclusion: Analysis of the comparative experimental results confirms that our method is effective and performs better than existing methods.
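The motivation for preferring PR over ROC on imbalanced data can be illustrated with a small sketch. The toy ranking below (2 positives among 100 samples, placed at ranks 5 and 10) is a made-up example, not data from the paper, and the helper names `roc_auc` and `average_precision` are hypothetical; average precision is used here as one common summary of the PR curve, which may differ from the score the authors computed.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at each positive in the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / hits


# Hypothetical imbalanced ranking: 100 samples, only 2 positives.
n = 100
scores = [float(n - i) for i in range(n)]  # index 0 holds the top-ranked sample
labels = [1 if rank in (5, 10) else 0 for rank in range(1, n + 1)]

roc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC AUC = {roc:.3f}, average precision = {ap:.3f}")
```

In this toy case the ROC AUC is about 0.94 while the average precision is only 0.20: most of the top-ranked samples are false positives, which the ROC score hides because the negatives are so numerous.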