A method for identifying the origin of jade by combining spectral normalization with multivariate statistical models
Technical field
The present invention relates to a method for identifying the origin of jade. Specifically, spectra are acquired by laser-induced breakdown spectroscopy (LIBS), preprocessed by spectral normalization, and analyzed with two multivariate statistical models, principal component analysis (PCA) and a support vector machine (SVM), to identify the origin of the jade.
Background art
Jade is a precious mineral that is commonly made into jewelry, ornaments and works of art. High-quality jade has high economic and artistic value. In China, jade culture has a history of more than a thousand years and carries rich cultural meaning. The quality and price of jade are determined mainly by the origin of the raw material. Compared with distinguishing genuine jade from imitations, identifying the origin of jade is more difficult. Origin identification based on traditional color inspection, experience-based judgment and ordinary optical examination is only about 60% accurate, can only be carried out by experts, and is strongly affected by human factors. Although Raman spectroscopy and inductively coupled plasma spectrometry have been used to compare jade samples from different origins, no systematic identification method has yet been established.
Laser-induced breakdown spectroscopy (LIBS) is an emerging elemental analysis technique. It requires no sample pretreatment, causes little damage to the sample, is fast, and can measure multiple elements simultaneously, so it has great potential for identifying the origin of jade. However, LIBS spectral data are typically large in volume, high-dimensional, and sensitive to fluctuations in experimental conditions, which makes them difficult to use directly for origin identification. The present invention therefore preprocesses the spectra by spectral normalization to eliminate the influence of fluctuating experimental conditions, and combines this with multivariate statistical models: principal component analysis (PCA) first extracts the principal components, removing redundant data and noise and reducing the data dimension; a support vector machine (SVM) then classifies the extracted principal components to determine the origin of the jade.
Summary of the invention
The object of the present invention is to address the low accuracy and strong dependence on human factors of traditional jade origin identification methods. It proposes an identification method that combines spectral normalization with the multivariate statistical models PCA and SVM, and uses LIBS to replace qualitative, experience-based manual analysis with more scientific quantitative identification, thereby improving the accuracy of the identification result.
The technical solution of the present invention is as follows:
A method for identifying the origin of jade by combining spectral normalization with multivariate statistical models, characterized in that the method comprises the following steps:
1) A group of jade samples of known origin is used as calibration samples for modeling; samples from the same origin are assigned to the same class, and samples from different origins are assigned to different classes;
2) The calibration samples are measured with a laser-induced breakdown spectroscopy experimental system to obtain the spectrum of each calibration sample, which contains the characteristic spectral lines of the various elements in that sample and the intensities of those lines;
3) The laser-induced breakdown spectra of all calibration samples are normalized: a characteristic spectral line of relatively high intensity is selected as the standard characteristic line, and, for the spectrum of each calibration sample, the intensity of every characteristic line is divided by the intensity of the standard characteristic line; the quotients are retained as the normalized intensities, forming the spectrally normalized characteristic line intensity matrix X,
$$X=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1p}\\ x_{21}&x_{22}&\cdots&x_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ x_{n1}&x_{n2}&\cdots&x_{np}\end{bmatrix}\qquad(1)$$
where n is the number of jade samples used for calibration, p is the number of characteristic lines, and $x_{i1},x_{i2},\dots,x_{ip}$ are the intensities of the characteristic lines of the i-th jade sample after spectral normalization;
4) Principal component analysis is performed on the matrix X to extract the principal components: the matrix is diagonalized through the product $X^{T}X$, that is, an orthogonal matrix A is found such that
$$A^{T}X^{T}XA=\begin{bmatrix}\lambda_1&&&\\ &\lambda_2&&\\ &&\ddots&\\ &&&\lambda_p\end{bmatrix}\qquad(2)$$
where $A^{T}$ and $X^{T}$ denote the transposes of the matrices A and X, and $\lambda_1,\lambda_2,\dots,\lambda_p$ are the eigenvalues on the diagonal, sorted in descending order, i.e. $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_p$;
The first m eigenvalues $\lambda_1,\lambda_2,\dots,\lambda_m$ are selected such that their sum is greater than or equal to 95% of the sum of all eigenvalues, i.e.
$$\frac{\sum_{i=1}^{m}\lambda_i}{\sum_{i=1}^{p}\lambda_i}\ge 95\%\qquad(3)$$
The dimensions corresponding to these first m eigenvalues are the first m principal components of the matrix X, denoted $S_1,S_2,\dots,S_m$, which satisfy:
$$S_1=XA_1,\quad S_2=XA_2,\quad\dots,\quad S_m=XA_m\qquad(4)$$
where $A_1,A_2,\dots,A_m$ are the 1st, 2nd, ..., m-th columns of the orthogonal matrix A;
The principal component matrix obtained by principal component analysis is denoted S, so that
$$S=\begin{bmatrix}S_1&S_2&\cdots&S_m\end{bmatrix}=\begin{bmatrix}s_{11}&s_{12}&\cdots&s_{1m}\\ s_{21}&s_{22}&\cdots&s_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ s_{n1}&s_{n2}&\cdots&s_{nm}\end{bmatrix}\qquad(5)$$
where n is the number of jade samples used for calibration, m is the number of eigenvalues selected, i.e. the number of principal components, and $s_{i1},s_{i2},\dots,s_{im}$ are the first m principal components of the i-th calibration jade sample;
5) Suppose the n calibration jade samples come from k origins, and the i-th origin contributes $C_i$ samples, i.e.
$$C_1+C_2+\dots+C_k=n\qquad(6)$$
A calibration model is built with the support vector machine method, which classifies the n calibration jade samples by origin, two classes at a time;
The sample from first source area is first regarded as one kind, the sample in remaining k-1 source area is regarded as another kind of;It is right
In these two types of samples, using support vector machine method, that is, the vector ω and a constant b of m dimension are found, so that first production
Principal component [the s of each jade sample i on groundi1 si2 … sim] be all satisfied
ωT[si1 si2 … sim]+b≥+1 (7)
And there are samples, and equal sign is set up, i.e. some jade sample i in the first place of production*Principal component [si*1 si*2
… si*m] meet
ωT[si*1 si*2 … si*m]+b=+1 (8)
Principal component [the s of each jade sample j in another kind ofj1 sj2 … sjm] be all satisfied
ωT[sj1 sj2 … sjm]+b≤-1 (9)
And there are samples, and equal sign is set up, i.e., it is another kind of in some jade sample j*Principal component [sj*1 sj*2 …
sj*m] meet
ωT[sj*1 sj*2 … sj*m]+b=-1 (10)
Such two classes sample is just by a linear plane ωTS+b=0 is separated, and spacing distance isWherein ‖ ω ‖
Indicate the modulus value of vector ω;
More than one pair of vector ω and constant b satisfies the above conditions; the pair with the smallest $\|\omega\|$, i.e. the pair that maximizes the separation distance $\frac{2}{\|\omega\|}$ between the two classes, is selected and denoted $(\omega_1^*,b_1^*)$, as the best way of dividing the jade samples of the first origin from those of the remaining k-1 origins;
After the jade samples of the first origin have been separated, the samples of the second origin are treated as one class and the samples of the remaining k-2 origins as the second class; they are divided in the same way, and the corresponding $(\omega_2^*,b_2^*)$ is recorded; and so on, until the samples of all origins have been separated;
6) A group of jade samples of unknown origin is used as test samples for prediction, as follows:
The test samples are first measured with the laser-induced breakdown spectroscopy experimental system to obtain their spectra; the spectra are normalized to obtain the normalized characteristic line intensity matrix X′; the first m principal components of X′ are extracted and denoted S′, so that
$$S'=\begin{bmatrix}s'_{11}&s'_{12}&\cdots&s'_{1m}\\ s'_{21}&s'_{22}&\cdots&s'_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ s'_{q1}&s'_{q2}&\cdots&s'_{qm}\end{bmatrix}\qquad(11)$$
where q is the number of jade samples used for testing, m is the number of eigenvalues selected, i.e. the number of principal components, and $s'_{i1},s'_{i2},\dots,s'_{im}$ are the principal components of the i-th test jade sample;
The support vector machine method is used to classify the q test jade samples, two classes at a time, and predict their origins; the samples of the first origin are separated first; for the i-th jade sample, if its principal components $[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]$ satisfy $(\omega_1^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_1^*\ge0$, it is regarded as a sample of the first origin; if they satisfy $(\omega_1^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_1^*<0$, it is regarded as a sample of one of the other origins; the samples of the second origin are then separated from the samples of the other origins, i.e. for the i-th jade sample,
if its principal components satisfy $(\omega_2^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_2^*\ge0$, it is regarded as a sample of the second origin;
if they satisfy $(\omega_2^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_2^*<0$, it is regarded as a sample of one of the other origins; and so on, until the origins of all test samples have been predicted;
7) The predicted origins of the test samples are compared with their true origins to verify the correctness of the origin identification.
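The normalization in step 3 can be illustrated with a minimal Python sketch. The function names, the `ref_index` parameter and the intensity values are hypothetical and chosen only for illustration; the sketch assumes each spectrum has already been reduced to a list of characteristic line intensities.

```python
from typing import List, Sequence


def normalize_spectrum(intensities: Sequence[float], ref_index: int) -> List[float]:
    """Divide every characteristic-line intensity by the standard line's intensity."""
    ref = intensities[ref_index]
    if ref <= 0:
        raise ValueError("standard characteristic line must have positive intensity")
    return [value / ref for value in intensities]


def build_normalized_matrix(spectra: Sequence[Sequence[float]], ref_index: int) -> List[List[float]]:
    """Stack the normalized spectra into the n-by-p matrix X, one row per sample."""
    return [normalize_spectrum(row, ref_index) for row in spectra]


# Hypothetical raw intensities: the second spectrum is the first one scaled
# by 1.5, mimicking a fluctuation in experimental conditions.
raw = [
    [200.0, 400.0, 100.0],
    [300.0, 600.0, 150.0],
]
X = build_normalized_matrix(raw, ref_index=1)
print(X)  # both rows become [0.5, 1.0, 0.25]
```

Because both rows differ only by a common scale factor, they map to the same normalized row, which is the effect the normalization step is intended to have on fluctuating experimental conditions.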
The present invention has the following advantages and prominent technical effects:
LIBS ablates only nanograms of the jade sample, so the damage is very small, the identification process has almost no effect on the condition of the jade, and essentially non-destructive identification can be achieved. In practice only one spectrum needs to be acquired per sample, and a batch of samples takes only a few minutes from spectrum acquisition to completed analysis and identification, so rapid identification can be achieved. Using LIBS spectral line data as the classification index replaces qualitative, experience-based manual analysis with more scientific quantitative identification and markedly improves the correctness of origin identification. Preprocessing by spectral normalization avoids the influence of fluctuating experimental conditions on jade origin identification. Preprocessing the raw data with PCA retains only the important principal components and removes unnecessary dimensions, greatly reducing the time and space complexity of the model computation. The principal components are modeled with SVM and classified to realize origin identification; combining the strong classification ability of SVM with the dimensionality reduction of PCA yields high identification accuracy.
Brief description of the drawings
Fig. 1 is a flow diagram of the present invention.
Fig. 2a shows the distribution of the sample data for each origin, plotted with the first and second principal components as the horizontal and vertical coordinates; Fig. 2b shows the eigenvalues of the first seven principal components as percentages of the sum of all eigenvalues.
Fig. 3 shows the origin identification results.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and an embodiment.
The method for identifying the origin of jade by combining spectral normalization with multivariate statistical models provided by the present invention specifically comprises the following steps:
1) A group of jade samples of known origin is used as calibration samples for modeling; samples from the same origin are assigned to the same class, and samples from different origins are assigned to different classes;
2) The calibration samples are measured with a laser-induced breakdown spectroscopy experimental system to obtain the spectrum of each calibration sample, which contains the characteristic spectral lines of the various elements in that sample and the intensities of those lines, mainly the atomic and ionic lines of elements such as Ca, Mg and Si and their intensities;
3) The laser-induced breakdown spectra of all calibration samples are normalized: a characteristic spectral line of relatively high intensity is selected as the standard characteristic line, for example the atomic line of Ca at a wavelength of 616.129 nm, which has high intensity and a good line shape and is therefore suitable as the standard characteristic line; for the spectrum of each calibration sample, the intensity of every characteristic line is divided by the intensity of the standard characteristic line, and the quotients are retained as the normalized intensities, forming the spectrally normalized characteristic line intensity matrix X,
$$X=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1p}\\ x_{21}&x_{22}&\cdots&x_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ x_{n1}&x_{n2}&\cdots&x_{np}\end{bmatrix}\qquad(1)$$
where n is the number of jade samples used for calibration, p is the number of characteristic lines, and $x_{i1},x_{i2},\dots,x_{ip}$ are the intensities of the characteristic lines of the i-th jade sample after spectral normalization;
4) Principal component analysis is performed on the matrix X to extract the principal components: the matrix is diagonalized through the product $X^{T}X$, that is, an orthogonal matrix A is found such that:
$$A^{T}X^{T}XA=\begin{bmatrix}\lambda_1&&&\\ &\lambda_2&&\\ &&\ddots&\\ &&&\lambda_p\end{bmatrix}\qquad(2)$$
where $A^{T}$ and $X^{T}$ denote the transposes of the matrices A and X, and $\lambda_1,\lambda_2,\dots,\lambda_p$ are the eigenvalues on the diagonal, sorted in descending order, i.e. $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_p$;
The first m eigenvalues $\lambda_1,\lambda_2,\dots,\lambda_m$ are selected such that their sum is greater than or equal to 95% of the sum of all eigenvalues (the small eigenvalues making up the last 5% are discarded; the dimensions corresponding to these eigenvalues contribute very little to the classification and can be removed directly; the discarded percentage is adjustable, usually 5% to 10%, which both ensures that the extracted principal components carry sufficient information and avoids mixing in useless components), i.e.
$$\frac{\sum_{i=1}^{m}\lambda_i}{\sum_{i=1}^{p}\lambda_i}\ge 95\%\qquad(3)$$
The dimensions corresponding to these first m eigenvalues are the first m principal components of the matrix X, denoted $S_1,S_2,\dots,S_m$, which satisfy:
$$S_1=XA_1,\quad S_2=XA_2,\quad\dots,\quad S_m=XA_m\qquad(4)$$
where $A_1,A_2,\dots,A_m$ are the 1st, 2nd, ..., m-th columns of the orthogonal matrix A;
The principal component matrix obtained by principal component analysis is denoted S, so that
$$S=\begin{bmatrix}S_1&S_2&\cdots&S_m\end{bmatrix}=\begin{bmatrix}s_{11}&s_{12}&\cdots&s_{1m}\\ s_{21}&s_{22}&\cdots&s_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ s_{n1}&s_{n2}&\cdots&s_{nm}\end{bmatrix}\qquad(5)$$
where n is the number of jade samples used for calibration, m is the number of eigenvalues selected, i.e. the number of principal components, and $s_{i1},s_{i2},\dots,s_{im}$ are the first m principal components of the i-th calibration jade sample; these m principal components are in fact linear combinations of the original spectral intensity data, but compared with the original spectrum they concentrate the information useful for identifying the origin of the jade samples and remove useless information such as noise;
5) Suppose the n calibration jade samples come from k origins, and the i-th origin contributes $C_i$ samples, i.e.
$$C_1+C_2+\dots+C_k=n\qquad(6)$$
A calibration model is built with the support vector machine method, which classifies the n calibration jade samples by origin, two classes at a time;
The samples from the first origin (for example, jade whose origin is Luodian) are first treated as one class, and the samples from the remaining k-1 origins (jade from other origins such as Xinjiang, Qinghai, Russia and South Korea) as the other class. For these two classes, the support vector machine method finds an m-dimensional vector ω and a constant b such that the principal components $[s_{i1}\ s_{i2}\ \cdots\ s_{im}]$ of every jade sample i from the first origin satisfy
$$\omega^{T}[s_{i1}\ s_{i2}\ \cdots\ s_{im}]+b\ge+1\qquad(7)$$
with equality holding for at least one sample, i.e. the principal components $[s_{i^*1}\ s_{i^*2}\ \cdots\ s_{i^*m}]$ of some jade sample $i^*$ from the first origin satisfy
$$\omega^{T}[s_{i^*1}\ s_{i^*2}\ \cdots\ s_{i^*m}]+b=+1\qquad(8)$$
The principal components $[s_{j1}\ s_{j2}\ \cdots\ s_{jm}]$ of every jade sample j in the other class satisfy
$$\omega^{T}[s_{j1}\ s_{j2}\ \cdots\ s_{jm}]+b\le-1\qquad(9)$$
with equality holding for at least one sample, i.e. the principal components $[s_{j^*1}\ s_{j^*2}\ \cdots\ s_{j^*m}]$ of some jade sample $j^*$ in the other class satisfy
$$\omega^{T}[s_{j^*1}\ s_{j^*2}\ \cdots\ s_{j^*m}]+b=-1\qquad(10)$$
The two classes of samples are thus separated by the hyperplane $\omega^{T}s+b=0$; they lie on opposite sides of this hyperplane, with a separation distance of $\frac{2}{\|\omega\|}$, where $\|\omega\|$ denotes the norm of the vector ω.
More than one pair of vector ω and constant b satisfies the above conditions; the pair with the smallest $\|\omega\|$, i.e. the pair that maximizes the separation distance $\frac{2}{\|\omega\|}$ between the two classes and therefore distinguishes them most clearly, is selected and denoted $(\omega_1^*,b_1^*)$, as the best way of dividing the jade samples of the first origin from those of the remaining k-1 origins;
After the jade samples of the first origin have been separated, the samples of the second origin are treated as one class (for example, after the Luodian jade has been separated, the Xinjiang jade samples are separated next) and the samples of the remaining k-2 origins as the second class; they are divided in the same way, and the corresponding $(\omega_2^*,b_2^*)$ is recorded; and so on, until the samples of all origins have been separated;
6) A group of jade samples of unknown origin is used as test samples for prediction, as follows:
The test samples are first measured with the laser-induced breakdown spectroscopy experimental system to obtain their spectra; the spectra are normalized to obtain the normalized characteristic line intensity matrix X′; the first m principal components of X′ are extracted and denoted S′, so that
$$S'=\begin{bmatrix}s'_{11}&s'_{12}&\cdots&s'_{1m}\\ s'_{21}&s'_{22}&\cdots&s'_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ s'_{q1}&s'_{q2}&\cdots&s'_{qm}\end{bmatrix}\qquad(11)$$
where q is the number of jade samples used for testing, m is the number of eigenvalues selected, i.e. the number of principal components, and $s'_{i1},s'_{i2},\dots,s'_{im}$ are the principal components of the i-th test jade sample;
The support vector machine method is used to classify the q test jade samples, two classes at a time, and predict their origins. The samples of the first origin are separated first. For the i-th jade sample, if its principal components $[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]$ satisfy $(\omega_1^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_1^*\ge0$, it is regarded as a sample of the first origin; if they satisfy $(\omega_1^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_1^*<0$, it is regarded as a sample of one of the other origins; the samples of the second origin are then separated from the samples of the other origins, i.e. for the i-th jade sample,
if its principal components satisfy $(\omega_2^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_2^*\ge0$, it is regarded as a sample of the second origin;
if they satisfy $(\omega_2^*)^{T}[s'_{i1}\ s'_{i2}\ \cdots\ s'_{im}]+b_2^*<0$, it is regarded as a sample of one of the other origins; and so on, until the origins of all test samples have been predicted;
7) The predicted origins of the test samples are compared with their true origins to verify the correctness of the origin identification.
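The successive decision rule of step 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the hyperplane parameters, the origin labels and the names `decision_value`, `predict_origin` and `rules` are hypothetical, and the sketch assumes that k-1 recorded pairs (ω*, b*) suffice for k origins, with a sample rejected by every rule assigned to the last remaining origin.

```python
from typing import Sequence, Tuple

# One rule per split: (origin label, omega, b), in the order the origins were separated.
Rule = Tuple[str, Sequence[float], float]


def decision_value(omega: Sequence[float], s: Sequence[float], b: float) -> float:
    """Compute omega^T s + b for one sample's principal components s."""
    return sum(w * x for w, x in zip(omega, s)) + b


def predict_origin(s: Sequence[float], rules: Sequence[Rule], last_origin: str) -> str:
    """Apply the splits in order; the first rule with a value >= 0 decides the origin."""
    for origin, omega, b in rules:
        if decision_value(omega, s, b) >= 0:
            return origin
    # Rejected by every split: assumed to belong to the last remaining origin.
    return last_origin


# Hypothetical hyperplanes in a 2-dimensional principal component space.
rules = [
    ("origin A", [1.0, 0.0], -2.0),  # plays the role of (omega_1*, b_1*)
    ("origin B", [0.0, 1.0], -2.0),  # plays the role of (omega_2*, b_2*)
]

print(predict_origin([3.0, 0.0], rules, "origin C"))  # origin A
print(predict_origin([0.0, 3.0], rules, "origin C"))  # origin B
print(predict_origin([0.0, 0.0], rules, "origin C"))  # origin C
```

Because the rules are applied in a fixed order, a sample that satisfies several rules is assigned to the earliest one, which mirrors the order in which the origins are separated during calibration.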
Embodiment: origin identification of Khotan jade samples from 5 origins.
A total of 638 Khotan jade samples are identified by origin: 114 from Luodian, 114 from Xinjiang, 110 from Qinghai, 150 from Russia and 150 from South Korea.
The main steps are as follows:
1) First, 500 Khotan jade samples of known origin (from Luodian, Xinjiang, Qinghai, Russia and South Korea, 100 samples per origin) are used as calibration samples to build the model: each sample is measured with the laser-induced breakdown spectroscopy experimental system to obtain its spectrum, and the characteristic lines are identified, mainly the atomic lines of Mg at 285.2 nm, 382.9 nm, 383.2 nm, etc., the atomic lines of Ca at 487.812 nm, 616.129 nm, 714.814 nm, etc., and the atomic and ionic lines of Si at 288.157 nm, 265.977 nm, etc. After background subtraction, the integrated area of each line region is taken as the line intensity.
2) The atomic line of Ca at 616.129 nm is selected as the standard line, and the spectra are normalized.
3) The line intensity data of all calibration samples are preprocessed with PCA. The eigenvalues of the first five principal components account for 57.5%, 25.9%, 7.1%, 4.0% and 1.8% of the sum of all eigenvalues, respectively, for a cumulative share of 96.3%, which exceeds 95%; the data of the first 5 principal components of each sample are therefore extracted.
4) The model is built with the SVM method, dividing the 500 calibration samples by origin, two classes at a time. The model accuracy reaches 100%, i.e. the origin of every calibration jade sample is correctly identified during modeling.
5) To verify the correctness of the identification method, 138 Khotan jade samples whose origins are treated as unknown in advance (14 from Luodian, 14 from Xinjiang, 10 from Qinghai, 50 from Russia and 50 from South Korea) are used as test samples, and their origins are predicted with the established model. The accuracy of the prediction also reaches 100%, giving a good origin identification result.
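The choice of 5 principal components and the sample counts in this embodiment can be checked with a short Python sketch. The function name `choose_num_components` and the `threshold` parameter are hypothetical; the percentages and counts are the ones stated in the embodiment.

```python
from typing import Sequence, Tuple


def choose_num_components(shares_percent: Sequence[float], threshold: float = 95.0) -> Tuple[int, float]:
    """Return the smallest m whose cumulative eigenvalue share reaches the threshold."""
    cumulative = 0.0
    for m, share in enumerate(shares_percent, start=1):
        cumulative += share
        if cumulative >= threshold:
            return m, cumulative
    raise ValueError("threshold not reached with the eigenvalue shares provided")


# Eigenvalue shares of the first five principal components, in percent.
shares = [57.5, 25.9, 7.1, 4.0, 1.8]
m, total = choose_num_components(shares)
print(m, round(total, 1))  # 5 96.3

# Sample counts per origin for the calibration and test sets.
calibration = {"Luodian": 100, "Xinjiang": 100, "Qinghai": 100, "Russia": 100, "South Korea": 100}
test = {"Luodian": 14, "Xinjiang": 14, "Qinghai": 10, "Russia": 50, "South Korea": 50}
total_samples = sum(calibration.values()) + sum(test.values())
print(sum(calibration.values()), sum(test.values()), total_samples)  # 500 138 638
```

The first four components together account for 94.5%, just under the 95% threshold, which is why the fifth component is also retained.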
Fig. 2a shows the distribution of the sample data for each origin, plotted with the first and second principal components as the horizontal and vertical coordinates; Fig. 2b shows the eigenvalues of the first seven principal components as percentages of the sum of all eigenvalues.
Fig. 3 shows the origin identification results: for the jade samples from the 5 origins, the identification accuracy reaches 100% in both calibration and verification.