Support vector classification was applied to near-infrared diffuse reflectance spectra of 1148 heroin samples seized in nine regions of Yunnan Province, using absorption coefficients over the range 4 000 cm⁻¹ to 10 000 cm⁻¹, to build a classifier that identifies the origin of the drug. Two sets of spectral data were selected: the fingerprint region from 5 990 cm⁻¹ to 7 500 cm⁻¹, and 41 wavenumbers with the largest and next-largest absorption coefficients. For the five-class problem, handled with the one-against-one algorithm, two formulations (C-SVC and ν-SVC) and four kernel functions were tried, each with default and with optimized parameters, giving the model accuracy on the training set and the overall prediction accuracy on the test set. After comparing the models, L-152 C-SVC, a C-SVC with a linear kernel on 152 fingerprint-region wavenumbers, was chosen as the classifier. On a training set drawn at random from the five regions of known class, its accuracy under 10-fold cross-validation was 90.74%; on all remaining known samples, none of which were in the training set, its overall prediction accuracy was 88.71%. The sensitivity, specificity, and correlation coefficient computed for each of the five regions were all satisfactory. Finally, the classifier was tried on identifying the origin of drugs of unknown provenance. Compared with reported pattern-recognition studies, this work does not stop at building a model on a training set and judging its predictions on a test set of known samples; it takes the important further step of identifying unknown samples outside both the training and test sets.
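The per-region sensitivity, specificity, and correlation coefficient mentioned above can be derived from a multiclass confusion matrix by treating each region in turn as the positive class. The following is a minimal Python sketch of that calculation, not the authors' code. It assumes the correlation coefficient is the Matthews correlation coefficient, which the abstract does not state. The function names and the 3-class matrix `EXAMPLE_CONFUSION` are hypothetical, and the counts are made up for illustration rather than taken from the paper.

```python
import math


def per_class_metrics(confusion):
    """Return {class_index: (sensitivity, specificity, mcc)}.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Each class is scored one-vs-rest.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k in range(n):
        tp = confusion[k][k]                                # class k predicted as k
        fn = sum(confusion[k]) - tp                         # class k predicted as another class
        fp = sum(confusion[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                           # everything else
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        # Assumption: "correlation coefficient" means the Matthews coefficient.
        mcc = (tp * tn - fp * fn) / denom if denom else float("nan")
        metrics[k] = (sensitivity, specificity, mcc)
    return metrics


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts for three classes, for illustration only.
EXAMPLE_CONFUSION = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

if __name__ == "__main__":
    print("overall accuracy:", overall_accuracy(EXAMPLE_CONFUSION))
    for k, (sens, spec, mcc) in per_class_metrics(EXAMPLE_CONFUSION).items():
        print(f"class {k}: sensitivity={sens:.3f} specificity={spec:.3f} mcc={mcc:.3f}")
```

For the five-region problem the same functions would take a 5 × 5 matrix built from the test-set predictions; the overall prediction accuracy then corresponds to `overall_accuracy`.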