Objective: To develop a new and effective drug target prediction method that combines support vector machines (SVM) with the primary-structure similarity between known drug targets and potential drug target proteins. Methods: A training sample set was constructed, primary-structure features were extracted from the protein sequences, and the data were preprocessed. The optimal kernel function was selected, the parameters were optimized, and feature selection was performed; the optimal prediction model was then trained and its predictive performance was tested. Proteins of the G protein-coupled receptor family were used as the prediction set, and the optimal classification model was applied to mine this set for potential drug targets. Results: The optimal SVM-based classification model achieved an average prediction accuracy of 81.03%. When the optimal classifier was applied to the constructed G protein prediction set, several of the 20 top-ranked proteins were found to be disease-related. Notably, two of these G proteins are listed in the Therapeutic Target Database (TTD) as drug targets in clinical trials. Conclusion: The drug target prediction method based on SVM and protein sequence features is effective, and the potential drug targets it predicts can serve as a reference for the discovery of new drug targets.
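
The feature extraction and preprocessing steps named in the Methods can be illustrated with a minimal sketch. The abstract does not say which primary-structure features or which scaling scheme were used, so amino acid composition and min-max scaling are assumptions chosen for illustration. The function names and toy sequences are hypothetical and are not data from the study.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    seq = sequence.upper()
    # Non-standard symbols (e.g. X, B, Z) are ignored in this sketch.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical toy sequences, used only to show the shape of the output.
toy_sequences = ["MKTAYIAKQR", "GAVLIGAVLI", "MSDNEEKKRR"]
features = min_max_scale([aa_composition(s) for s in toy_sequences])
```

Each sequence becomes a fixed-length 20-dimensional vector regardless of protein length, which is the form an SVM implementation expects as input. Kernel selection, parameter optimization, and feature selection would then operate on vectors of this kind.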