CN106248621A - An evaluation method and system - Google Patents

An evaluation method and system

Info

Publication number
CN106248621A
Authority
CN
China
Prior art keywords
near infrared
infrared spectrum
basic data
spectrum
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610790067.9A
Other languages
Chinese (zh)
Other versions
CN106248621B (en
Inventor
张军
詹映
薛庆逾
石超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Upper Seabird And Hundred Million Electronics Technology Development Co Ltds
Original Assignee
Upper Seabird And Hundred Million Electronics Technology Development Co Ltds
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Upper Seabird And Hundred Million Electronics Technology Development Co Ltds filed Critical Upper Seabird And Hundred Million Electronics Technology Development Co Ltds
Priority to CN201610790067.9A priority Critical patent/CN106248621B/en
Publication of CN106248621A publication Critical patent/CN106248621A/en
Application granted granted Critical
Publication of CN106248621B publication Critical patent/CN106248621B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The present invention provides an evaluation method and system for evaluating the quality of basic data. The method comprises: obtaining a plurality of near-infrared spectra and their corresponding chemical values from the basic data; preprocessing the near-infrared spectra; computing the similarity distance and the local correlation coefficient between the near-infrared spectra; obtaining, for each near-infrared spectrum, the near-infrared spectrum of maximum similarity and its chemical value; calculating the absolute difference between the chemical value of each near-infrared spectrum and that of its maximum-similarity spectrum, and computing the mean of all the absolute differences; and judging whether the mean absolute difference exceeds a preset error value. If it does not, the basic data is evaluated as qualified; if it does, the basic data is evaluated as unqualified. The invention makes it possible to evaluate the quality of the basic data accurately and efficiently before near-infrared modeling, so that low-quality near-infrared spectral data is excluded from modeling, the effectiveness of modeling analysis is improved, and waste of manpower and material resources is reduced.

Description

An evaluation method and system
Technical field
The present invention relates to the field of data processing, and in particular to an evaluation method and system.
Background art
Near-infrared quantitative modeling requires a large amount of sample spectral information and basic data. Traditionally, the quality of a batch of modeling spectra and basic data is judged in two ways: before modeling, by checking the repeatability of the flow-analysis measurements, and after modeling, by checking the prediction error. However, repeated flow-analysis measurement of parallel samples is difficult, and a model needs a large number of samples before it can discriminate, so that judgment is made only after the fact. There is thus no early, overall evaluation of the quality of a batch of modeling data. If the spectra and basic data used for quantitative modeling are inaccurate or do not correspond to each other, the resulting model often has low accuracy or poor applicability. The traditional response is either to resample and remodel or to update and maintain the model, but the underlying problem remains, the rebuilt model is bound to fail again, and a great deal of manpower, material and financial resources is wasted. Judging before modeling whether the data quality is sufficient to build a qualified model is therefore both important and necessary.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide an evaluation method and system that solve the prior-art problem that the validity of the basic data of near-infrared spectra to be modeled cannot be judged in advance, which leads to low efficiency and wasted manpower and financial resources.
To achieve the above and other related objects, the present invention provides an evaluation method for evaluating the quality of basic data that contains near-infrared spectra and is used for modeling. The method comprises: obtaining a plurality of near-infrared spectra from the basic data, and obtaining the chemical value corresponding to each near-infrared spectrum; preprocessing the near-infrared spectra; computing the similarity distance and the local correlation coefficient between the near-infrared spectra; obtaining, from the similarity distances and local correlation coefficients, the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its chemical value; obtaining the difference between the chemical value of each near-infrared spectrum and that of its maximum-similarity spectrum, taking the absolute value of every difference, and computing the mean of all the absolute differences; and comparing the mean absolute difference with a preset error value. When the mean absolute difference is greater than the preset error value, the quality of the basic data is evaluated as unqualified; when it is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified.
In a specific embodiment of the present invention, the near-infrared spectra are preprocessed by a method that includes the Savitzky-Golay (S-G) derivative method.
In a specific embodiment of the present invention, the similarity distance between the near-infrared spectra is computed from the information content of the near-infrared spectra.
In a specific embodiment of the present invention, the similarity between two near-infrared spectra is the ratio of the local correlation coefficient between them to the similarity distance between them.
In a specific embodiment of the present invention, when the quality of the basic data is evaluated as unqualified, the sampling mode of the near-infrared spectra in the basic data is adjusted and/or the basic flow-analysis data is maintained.
To achieve the above and other related objects, the present invention also provides a data evaluation system for evaluating the quality of basic data that contains near-infrared spectra and is used for modeling. The system comprises: a basic data acquisition module, which obtains a plurality of near-infrared spectra from the basic data and obtains the chemical value corresponding to each near-infrared spectrum; a preprocessing module, which preprocesses the near-infrared spectra; a maximum-similarity spectrum acquisition module, which computes the similarity distance and the local correlation coefficient between the near-infrared spectra and, from them, obtains the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its chemical value; a difference mean solving module, which obtains the difference between the chemical value of each near-infrared spectrum and that of its maximum-similarity spectrum, takes the absolute value of every difference, and computes the mean of all the absolute differences; and a comparison module, which compares the mean absolute difference with a preset error value. When the mean absolute difference is greater than the preset error value, the quality of the basic data is evaluated as unqualified; when it is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified.
In a specific embodiment of the present invention, the near-infrared spectra are preprocessed by a method that includes the S-G derivative method.
In a specific embodiment of the present invention, the maximum-similarity spectrum acquisition module computes the similarity distance between the near-infrared spectra from the information content of the near-infrared spectra.
In a specific embodiment of the present invention, the similarity between two near-infrared spectra is the ratio of the local correlation coefficient between them to the similarity distance between them.
In a specific embodiment of the present invention, the system further includes an adjusting module which, when the quality of the basic data is evaluated as unqualified, adjusts the sampling mode of the near-infrared spectra in the basic data and/or maintains the basic flow-analysis data.
As described above, the evaluation method and system of the present invention evaluate the quality of basic data that contains near-infrared spectra and is used for modeling. The method comprises: obtaining a plurality of near-infrared spectra from the basic data, and obtaining the chemical value corresponding to each near-infrared spectrum; preprocessing the near-infrared spectra; computing the similarity distance and the local correlation coefficient between the near-infrared spectra; obtaining, from the similarity distances and local correlation coefficients, the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its chemical value; obtaining the difference between the chemical value of each near-infrared spectrum and that of its maximum-similarity spectrum, taking the absolute value of every difference, and computing the mean of all the absolute differences; and comparing the mean absolute difference with a preset error value. When the mean absolute difference is greater than the preset error value, the quality of the basic data is evaluated as unqualified; when it is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified. Before near-infrared modeling, the invention uses a small number of samples to judge the quality of the near-infrared spectra and chemical values accurately and efficiently, thereby evaluating the quality of the basic data and determining whether the basic data can support a stable and accurate model. This provides an effective criterion for evaluating the quality of near-infrared spectral data and avoids large-scale sampling and modeling on low-quality basic data. When the basic data is of high quality, it also provides guidance for extending and improving the chemometric methods applied to the basic data. Unqualified near-infrared spectra are thereby excluded from modeling analysis, the effectiveness and accuracy of modeling analysis are improved, and waste of manpower and material resources is reduced.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the evaluation method of the present invention in one embodiment.
Fig. 2 is a schematic diagram of the relationship between sample number and correlation coefficient in a specific application example of the evaluation method of the present invention.
Fig. 3 is a schematic diagram of the relationship between sample number and similarity in a specific application example of the evaluation method of the present invention.
Fig. 4 is a schematic diagram of the relationship between chemical value and sample number in a specific application example of the evaluation method of the present invention.
Fig. 5 is a schematic diagram of the relationship between sample number and chemical value in a specific application example of the evaluation method of the present invention.
Fig. 6 is a schematic diagram of the relationship between sample number and relative error in a specific application example of the evaluation method of the present invention.
Fig. 7 is a schematic diagram of the relationship between chemical value and sample number in a specific application example of the evaluation method of the present invention.
Fig. 8 is a schematic diagram of the relationship between sample number and correlation coefficient in a specific application example of the evaluation method of the present invention.
Fig. 9 is a schematic diagram of the relationship between sample number and similarity in a specific application example of the evaluation method of the present invention.
Fig. 10 is a schematic diagram of the relationship between sample number and relative error in a specific application example of the evaluation method of the present invention.
Fig. 11 is a module diagram of the evaluation system of the present invention in one embodiment.
Description of reference numerals
1 Evaluation system
11 Basic data acquisition module
12 Preprocessing module
13 Maximum-similarity spectrum acquisition module
14 Difference mean solving module
15 Comparison module
S11~S16 Steps
Detailed description of the embodiments
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present invention. It should be noted that, where there is no conflict, the following embodiments and the features in them can be combined with one another.
It should be noted that the drawings provided with the following embodiments illustrate the basic concept of the present invention only schematically. They show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation. In an actual implementation the form, number and proportion of each component may vary freely, and the component layout may be more complex.
Referring to Fig. 1, a schematic flowchart of the evaluation method of the present invention in one embodiment is shown. The method evaluates the quality of basic data that contains near-infrared spectra and is used for modeling, that is, the quality of the quantitative modeling data of the near-infrared spectra. Before modeling, the accuracy and correspondence of a small number of quantitative-modeling spectra and their basic data are analyzed, and the quality of the whole batch of basic data is evaluated from that analysis: the higher the accuracy and correspondence of the spectra and the basic data, the higher the quality of the spectra. Before a large number of near-infrared samples is collected, the invention uses a small number of samples to judge the quality of the near-infrared spectra and chemical values, and so determines whether the basic data can support a stable and accurate model. This provides an effective criterion for evaluating the quality of near-infrared spectral data, avoids large-scale sampling and modeling on low-quality basic data, and, when the basic data is of high quality, provides guidance for extending and improving the chemometric methods applied to the basic data.
The evaluation method shown in Fig. 1 includes:
S11: Obtain a plurality of near-infrared spectra from the basic data, and obtain the chemical value corresponding to each near-infrared spectrum. In a specific embodiment, a batch of modeling spectra and chemical values is obtained, comprising M spectra and the basic chemical values T_m corresponding to the label attributes of the spectra (T_m denotes the chemical value corresponding to the m-th spectrum); each spectrum consists of m wavelength points.
S12: Preprocess the near-infrared spectra. Although a near-infrared spectrum contains chemical, appearance and physical information about the raw material, it is easily affected by the external environment and by the instability of the instrument's own moving parts. In a specific embodiment of the present invention, therefore, after the plurality of near-infrared spectra is obtained, the spectra are preprocessed with the S-G derivative method, which eliminates or reduces these effects to a certain extent. In this embodiment the S-G derivative method is as follows: each spectrum is first S-G smoothed with a window width of 2k+1, and the first derivative of the smoothed spectrum is then taken with a differential width of w.
S13: Compute the similarity distance and the local correlation coefficient between the near-infrared spectra.
S14: From the similarity distances and local correlation coefficients between the near-infrared spectra, obtain for each near-infrared spectrum the near-infrared spectrum of maximum similarity and its chemical value. The similarity between two spectra X_i and Y_j (i, j = 1 … n, i ≠ j) is D_ij. In a specific embodiment, the steps for computing the similarity D_ij are:
1) Compute the correlation coefficient R_ij between spectra X_i and Y_j. Construct a moving window of k points. Spectra X_i and Y_j each have m wavelength points. Starting at the c-th wavelength point and extending to the (c+k-1)-th wavelength point, calculate the correlation coefficient R_cij of X_i and Y_j over that segment of the spectrum, with c running from 1 to m-k+1. The average of the correlation coefficients over all moving windows is the correlation coefficient R_ij between X_i and Y_j:
$$R_{cij} = \frac{\sum_{c=1}^{m-k+1} \left(X_{i,c:c+k-1} - \bar{X}\right) \times \left(Y_{j,c:c+k-1} - \bar{Y}\right)}{\sum_{c=1}^{m-k+1} \left(X_{i,c:c+k-1} - \bar{X}\right)^{2} \times \left(Y_{j,c:c+k-1} - \bar{Y}\right)^{2}}$$
X_{i,c} denotes the c-th moving window in the i-th spectrum, and Y_{j,c} denotes the c-th moving window in the j-th spectrum.
$$R_{ij} = \frac{\sum_{i,j=1,\, i \neq j}^{m-k+1} R_{cij}}{m-k+1}$$
2) Calculate the information content of X_i and Y_j in the original spectra, where x_i is the information content of the i-th spectrum and y_j is the information content of the j-th spectrum. The information content that the two spectra contain about each other is:
$$S_{xy} = \sum_{i=1}^{n} x_i \log \frac{x_i}{y_j}$$
where i, j = 1 … n and i ≠ j.
The similarity distance between two near-infrared spectra is S_xy + S_yx, and the similarity between two near-infrared spectra is the ratio of the local correlation coefficient between them to the similarity distance between them, that is, $D_{ij} = R_{ij} / (S_{xy} + S_{yx})$.
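The similarity computation in steps 1) and 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the window coefficient is assumed to be the ordinary Pearson coefficient with per-window means, the "information content" of a spectrum is assumed to be the spectrum normalised so that its points sum to 1, and the logarithmic ratio is assumed to pair the two spectra point by point. All function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length windows; 0.0 if either is flat."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat window counts as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def moving_window_correlation(x, y, k):
    """R_ij: mean of the window coefficients over all m - k + 1 windows."""
    m = len(x)
    if len(y) != m or not 1 < k <= m:
        raise ValueError("spectra must share a length m with 1 < k <= m")
    window_count = m - k + 1
    total = sum(pearson(x[c:c + k], y[c:c + k]) for c in range(window_count))
    return total / window_count


def information_content(spectrum):
    """Normalise a strictly positive spectrum to sum to 1 (assumed definition)."""
    if any(v <= 0 for v in spectrum):
        raise ValueError("expects strictly positive intensities")
    total = sum(spectrum)
    return [v / total for v in spectrum]


def directed_information(p, q):
    """S_xy = sum of p_i * log(p_i / q_i) over the wavelength points."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def similarity_distance(x, y):
    """Symmetric distance S_xy + S_yx between two raw spectra."""
    p, q = information_content(x), information_content(y)
    return directed_information(p, q) + directed_information(q, p)


def similarity(r_ij, distance):
    """D_ij = R_ij / (S_xy + S_yx); a zero distance gives an unbounded value."""
    if distance == 0:
        return math.inf
    return r_ij / distance
```

Under these assumptions the distance is zero only for spectra with identical normalised shape, so a small distance combined with a high window correlation yields a large D_ij.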
S15: Obtain the difference between the chemical value of each near-infrared spectrum and that of its maximum-similarity near-infrared spectrum, take the absolute value of every difference, and compute the mean of all the absolute differences.
S16: Compare the mean absolute difference with the preset error value. When the mean absolute difference is greater than the preset error value, the quality of the basic data is evaluated as unqualified, that is, the quality is too low for subsequent samples to support a good near-infrared model. When the mean absolute difference is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified.
Once the similarity D_ij between the samples is known, for each spectrum sample i the spectrum v with the greatest similarity to it is chosen from the remaining M-1 samples, and the chemical values T_i and T_v of the two samples are obtained. With the maximum-similarity spectrum and its chemical value found for every spectrum, the mean error of the chemical values over these matched pairs is computed, that is, the mean of |T_i - T_v| over all spectra. When this mean error is greater than the threshold H (H is set according to actual production requirements), the batch of modeling data is judged unsuitable; otherwise, the batch can be used to build a stable and accurate model.
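The matching and decision rule of steps S15 and S16 can be sketched as follows. The function name and the matrix layout (a square list of lists holding D_ij) are assumptions made for illustration; the rule itself, mean of |T_i - T_v| compared with the threshold H, follows the description above.

```python
def evaluate_basic_data(similarity_matrix, chemical_values, threshold):
    """Return (mean_abs_error, qualified) for one batch of spectra.

    similarity_matrix[i][j] holds D_ij; the diagonal is ignored.
    """
    n = len(chemical_values)
    if n < 2 or len(similarity_matrix) != n:
        raise ValueError("need at least two spectra and an n-by-n matrix")
    abs_diffs = []
    for i in range(n):
        # Most similar other spectrum v; spectrum i itself is excluded.
        v = max((j for j in range(n) if j != i),
                key=lambda j: similarity_matrix[i][j])
        abs_diffs.append(abs(chemical_values[i] - chemical_values[v]))
    mean_abs_error = sum(abs_diffs) / n
    # Qualified when the mean error does not exceed the preset error value H.
    return mean_abs_error, mean_abs_error <= threshold
```

With the threshold H = 0.35 used in the application example later in this description, a batch whose mean matched error is 0.26 would be returned as qualified.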
In a specific embodiment of the present invention, when the quality of the basic data is evaluated as unqualified, the method further includes the step of adjusting the sampling mode of the near-infrared spectra in the basic data and/or maintaining the basic flow-analysis data.
The present invention can also verify the rationality and applicability of the method by comparing the external validation error of the model with the quality evaluation.
The near-infrared quantitative modeling data evaluation method of the present invention can judge the quality of modeling data in advance and avoids large-scale sampling of low-quality near-infrared data. It spares the modeling party unnecessary sample waste and saves substantial material, manpower and financial resources, and it also provides a reliable basis for adjusting the sampling method and for checking the accuracy of the basic data.
For data that the evaluation method of the present invention judges to be of high quality but whose modeling results are poor, the method provides a reliable guarantee of the basic data, which promotes the improvement of near-infrared quantitative modeling methods and lays a solid foundation for concrete application in actual production.
The present invention is further described below with a concrete application example from actual production. In this example the experimental objects are the near-infrared spectra of raw tobacco leaf after threshing and redrying and the nicotine basic data measured with a flow analyzer, and a new method for checking the quality of near-infrared quantitative modeling data is described in detail.
Step one: obtain the near-infrared spectra and their corresponding chemical values. After the tobacco leaf has been threshed and redried, the spectra of 268 samples are acquired with an online near-infrared instrument; each spectrum has 256 wavelength points. The corresponding samples are tested with a flow analyzer to obtain the corresponding chemical values. The samples are tobacco leaf provided by Red River Redrying Factory.
Step two: preprocess the near-infrared spectra. Each spectrum is converted into a row matrix, and S-G convolution smoothing and differentiation are applied to each spectrum with a window of 7 points and a differential width of 3.
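A minimal Python sketch of step two with the stated parameters (7-point window, differential width 3). The text does not give the polynomial order, the edge handling, or the exact form of the derivative, so the following are assumptions: the classic 7-point quadratic/cubic Savitzky-Golay weights, dropping the three points at each end, and a simple finite difference over `width` points. The function names are hypothetical.

```python
# Classic 7-point Savitzky-Golay smoothing weights (quadratic/cubic fit).
SG7_WEIGHTS = (-2, 3, 6, 7, 6, 3, -2)
SG7_NORM = 21


def sg_smooth7(spectrum):
    """Smooth with a 7-point S-G filter; the 3 points at each end are dropped."""
    half = len(SG7_WEIGHTS) // 2
    smoothed = []
    for center in range(half, len(spectrum) - half):
        window = spectrum[center - half:center + half + 1]
        acc = sum(w * v for w, v in zip(SG7_WEIGHTS, window))
        smoothed.append(acc / SG7_NORM)
    return smoothed


def gap_derivative(values, width=3):
    """First derivative as a finite difference over `width` points (assumed form)."""
    return [(values[i + width] - values[i]) / width
            for i in range(len(values) - width)]


def preprocess(spectrum, width=3):
    """S-G smoothing followed by a first derivative."""
    return gap_derivative(sg_smooth7(spectrum), width)
```

With these edge conventions a 256-point spectrum yields 250 smoothed points and 247 derivative points; a production implementation would more likely pad the edges so that the spectrum length is preserved.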
Step three: compute the similarity of the spectra, as follows:
1. Construct a moving window with k = 7 and move it along the spectrum from the first wavelength point to the 250th wavelength point; the resulting correlation coefficients between the spectra are shown in Fig. 2.
2. Compute the information content of each spectrum in the original spectra, where x_i is the information content of the i-th spectrum and y_j is the information content of the j-th spectrum, and substitute into the following formulas to calculate the information content between two spectra:
$$S_{xy} = \sum_{i=1}^{n} x_i \log \frac{x_i}{y_j}$$
$$S_{yx} = \sum_{i=1}^{n} y_i \log \frac{y_j}{x_i}$$
3. Compute the similarity between each pair of spectra as the ratio of their correlation coefficient to their similarity distance S_xy + S_yx; the similarity map is shown in Fig. 3.
Step four: according to the similarity D_ij, compute the mean error between the basic data to judge the quality of the near-infrared quantitative modeling data. Using the similarity D between each sample and the other samples from step three, start with sample No. 1, select the sample with the highest similarity to it as its match, and look up the corresponding chemical values; in this way the error in chemical value between each of the 268 samples and its highest-similarity match is calculated. The distribution of the chemical values of the 268 samples is shown in Fig. 4, the distribution of the chemical values of the matched samples in Fig. 5, and the distribution of the relative error in Fig. 6. The average relative error between the 268 samples and their similarity-matched samples is 11.24%, and the mean absolute error is 0.26, which is below the mean absolute error threshold H = 0.35 used in practical application. It can therefore be determined that this batch of basic data is of sufficient quality to build a stable and applicable near-infrared quantitative model. A near-infrared quantitative model was built from the 268 spectra and basic data; its external validation parameters are shown in Table 1, which gives the external validation parameters of the first near-infrared quantitative model.
Table 1
As Table 1 shows, the model has a correlation coefficient of 0.82, a validation standard deviation of 0.33 and an average relative error of 10.9%, which are below the average relative error and mean absolute error allowed in practical application, so the model can be applied in actual production.
In another specific embodiment, another batch of modeling spectra and corresponding chemical values is obtained, the similarity between its spectra is computed, and the error between matched samples is used to evaluate the quality of this batch of modeling data. Specifically, a further batch of 210 spectra, acquired with the online near-infrared analyzer after threshing and redrying, and their corresponding chemical values are obtained; the distribution of the basic data is shown in Fig. 7. Following steps two, three and four above, the correlation coefficients are obtained as shown in Fig. 8, the similarities between matched samples as shown in Fig. 9, and the distribution of the relative error as shown in Fig. 10. The mean absolute error between the 210 samples and their similarity-matched samples is 0.65, and the average relative error is 27.42%. The mean absolute error of 0.64 is greater than the mean absolute error threshold H = 0.35 used in practical application, so this batch of spectra and basic data is judged to be of very poor quality and unable to support a stable model. A near-infrared quantitative model was built from the 210 spectra and basic data; its external validation parameters are shown in Table 2, which gives the external validation parameters of the near-infrared quantitative model for the second batch of data.
Table 2
As Table 2 shows, the values predicted by the near-infrared quantitative model built from this batch of basic data correlate only weakly with the basic data, and the validation standard deviation is 0.41. Because the error is too large, the model either cannot be applied in practice or predicts inaccurately.
Tables 1 and 2 show that the method of the invention can evaluate the quality of near-infrared modeling basic data, so that the quality of a batch of basic data can be judged quickly before modeling.
Referring to Fig. 11, a module diagram of a specific embodiment of the invention is shown. The evaluation system 1 is used to evaluate the quality of basic data for modeling that comprises near-infrared spectra; that is, the higher the accuracy of the spectra and the basic data and the better their correspondence, the higher the quality. The system 1 comprises:
A basic data acquisition module 11, used to obtain a plurality of near-infrared spectra from the basic data and to obtain the chemical value corresponding to each near-infrared spectrum;
A pretreatment module 12, used to pretreat the near-infrared spectra;
A maximum similarity spectrum acquisition module 13, used to solve the similarity distance and the local correlation coefficient between the near-infrared spectra, and, according to the similarity distance and the local correlation coefficient between the near-infrared spectra, to obtain the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its corresponding chemical value;
A difference mean solving module 14, used to obtain the difference between the chemical value of each near-infrared spectrum and that of its corresponding maximum-similarity near-infrared spectrum, to take the absolute value of every difference to obtain the corresponding absolute differences, and to solve the mean of all the absolute differences;
A comparison module 15, used to compare the mean of the absolute differences with a preset error value: when the mean of the absolute differences is greater than the preset error value, the quality of the basic data is evaluated as unqualified; when the mean of the absolute differences is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified.
In a specific embodiment of the invention, the pretreatment of the near-infrared spectra includes the S-G (Savitzky-Golay) derivative method.
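As an illustration of S-G derivative pretreatment, the following is a minimal sketch in Python. The window width (5 points), the polynomial order (2), the point spacing `step`, and the function name `sg_first_derivative` are assumptions made for this example only; the embodiment does not state which settings are used.

```python
def sg_first_derivative(spectrum, step=1.0):
    """First derivative by a 5-point, second-order Savitzky-Golay filter.

    Window size and polynomial order are illustrative assumptions.
    Only interior points are returned, so the output is 4 points shorter
    than the input.
    """
    # Standard convolution weights for this window and order; the result
    # is divided by 10 * step to obtain the derivative.
    weights = (-2.0, -1.0, 0.0, 1.0, 2.0)
    if len(spectrum) < len(weights):
        raise ValueError("spectrum must contain at least 5 points")
    derivative = []
    for center in range(2, len(spectrum) - 2):
        window = spectrum[center - 2:center + 3]
        value = sum(w * y for w, y in zip(weights, window)) / (10.0 * step)
        derivative.append(value)
    return derivative
```

With these weights the filter reproduces the derivative of any quadratic exactly, which makes it easy to check against a synthetic curve before applying it to measured spectra.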
In a specific embodiment of the invention, the maximum similarity spectrum acquisition module 13 is used to solve the similarity distance between the near-infrared spectra according to the information content of the near-infrared spectra.
In a specific embodiment of the invention, the similarity between two near-infrared spectra is the ratio of the local correlation coefficient between the spectra to the similarity distance between them.
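The ratio described above can be sketched as follows. This passage does not give the formulas for the local correlation coefficient or the similarity distance, so the sketch substitutes stand-ins that are assumptions: a Pearson correlation over a chosen band slice for the local correlation coefficient, and a Euclidean distance over the whole spectrum for the similarity distance. The names `pearson`, `similarity` and `band` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat segment carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def similarity(spec_a, spec_b, band=(0, None)):
    """Similarity = local correlation coefficient / similarity distance.

    Assumed stand-ins: Pearson correlation over the slice `band` and
    Euclidean distance over the full spectra.
    """
    start, stop = band
    local_corr = pearson(spec_a[start:stop], spec_b[start:stop])
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))
    if distance == 0.0:
        return math.inf  # identical spectra are treated as maximally similar
    return local_corr / distance
```

Under these assumptions, a pair of spectra scores highly only when the spectra are both strongly correlated in the chosen band and close together overall, which is the behaviour the ratio is meant to capture.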
In a specific embodiment of the invention, the system further comprises an adjusting module, used to adjust the sampling mode of the near-infrared spectra in the basic data and/or to maintain the basic process data when the quality of the basic data is evaluated as unqualified.
The evaluation system 1 is the system counterpart of the evaluation method, and the technical solutions of the two correspond one to one; all descriptions of the evaluation method apply equally to this embodiment and are not repeated here.
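The chain formed by modules 13, 14 and 15 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the spectra are assumed to be already pretreated, the similarity measure is passed in as a function, and the names `evaluate_basic_data` and `neg_sq_distance` as well as the demo data are hypothetical. Only the threshold 0.35 is taken from the embodiment above.

```python
def evaluate_basic_data(spectra, chemical_values, similarity_fn, preset_error):
    """Return (mean_absolute_difference, qualified) for one batch of data."""
    if len(spectra) != len(chemical_values) or len(spectra) < 2:
        raise ValueError("need at least two spectra, each with a chemical value")
    abs_diffs = []
    for i, spectrum in enumerate(spectra):
        # Module 13: find the most similar other spectrum.
        best_j = max(
            (j for j in range(len(spectra)) if j != i),
            key=lambda j: similarity_fn(spectrum, spectra[j]),
        )
        # Module 14: absolute difference of the matched chemical values.
        abs_diffs.append(abs(chemical_values[i] - chemical_values[best_j]))
    mean_abs_diff = sum(abs_diffs) / len(abs_diffs)
    # Module 15: qualified only if the mean does not exceed the preset error.
    return mean_abs_diff, mean_abs_diff <= preset_error


def neg_sq_distance(a, b):
    # Stand-in similarity for this demo only (larger means more similar).
    return -sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up demo data: four two-point "spectra" and their chemical values.
spectra = [[0.0, 1.0], [0.1, 1.0], [5.0, 5.0], [5.1, 5.0]]
values = [2.0, 2.2, 3.0, 3.9]
mean_diff, ok = evaluate_basic_data(spectra, values, neg_sq_distance, 0.35)
```

In the demo, two of the four matched pairs disagree strongly in chemical value, so the mean absolute difference exceeds the preset error and the batch is evaluated as unqualified.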
In summary, the evaluation method and system of the invention are used to evaluate the quality of basic data for modeling that comprises near-infrared spectra. The method comprises: obtaining a plurality of near-infrared spectra from the basic data, and obtaining the chemical value corresponding to each near-infrared spectrum; pretreating the near-infrared spectra; solving the similarity distance and the local correlation coefficient between the near-infrared spectra; obtaining, according to the similarity distance and the local correlation coefficient between the near-infrared spectra, the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its corresponding chemical value; obtaining the difference between the chemical value of each near-infrared spectrum and that of its corresponding maximum-similarity near-infrared spectrum, taking the absolute value of every difference to obtain the corresponding absolute differences, and solving the mean of all the absolute differences; and comparing the mean of the absolute differences with a preset error value, such that when the mean of the absolute differences is greater than the preset error value, the quality of the basic data is evaluated as unqualified, and when the mean of the absolute differences is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified. Before near-infrared spectra are modeled, the invention can accurately and efficiently use a small number of samples to judge the quality of the spectra and chemical values, thereby evaluating the quality of the basic data and determining whether a stable, accurate model can be established from it. It provides an effective means of judging the quality of near-infrared spectral data, avoids the large-scale sampling and modeling that is wasted when the basic data is of poor quality, and, when the basic data is of high quality, offers guidance for extending and improving the chemometric methods applied to it. Near-infrared spectra unfit for modeling analysis are thereby excluded, the validity and accuracy of modeling analysis are improved, and the waste of manpower and material resources is reduced. The invention therefore effectively overcomes various shortcomings of the prior art and has high industrial utility value.
The above embodiments merely illustrate the principles and effects of the invention and are not intended to limit it. Anyone skilled in the art may modify or alter the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or alterations made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the invention.

Claims (10)

1. An evaluation method, characterized in that it is used to evaluate the quality of basic data for modeling that comprises near-infrared spectra, the method comprising:
obtaining a plurality of near-infrared spectra from the basic data, and obtaining the chemical value corresponding to each near-infrared spectrum;
pretreating the near-infrared spectra;
solving the similarity distance and the local correlation coefficient between the near-infrared spectra;
obtaining, according to the similarity distance and the local correlation coefficient between the near-infrared spectra, the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its corresponding chemical value;
obtaining the difference between the chemical value of each near-infrared spectrum and that of its corresponding maximum-similarity near-infrared spectrum, taking the absolute value of every difference to obtain the corresponding absolute differences, and solving the mean of all the absolute differences;
comparing the mean of the absolute differences with a preset error value: when the mean of the absolute differences is greater than the preset error value, evaluating the quality of the basic data as unqualified; when the mean of the absolute differences is less than or equal to the preset error value, evaluating the quality of the basic data as qualified.
2. The evaluation method according to claim 1, characterized in that the pretreatment of the near-infrared spectra includes the S-G derivative method.
3. The evaluation method according to claim 1, characterized in that the similarity distance between the near-infrared spectra is solved according to the information content of the near-infrared spectra.
4. The evaluation method according to claim 1, characterized in that the similarity between near-infrared spectra is the ratio of the local correlation coefficient between the near-infrared spectra to the similarity distance between the near-infrared spectra.
5. The evaluation method according to claim 1, characterized in that, when the quality of the basic data is evaluated as unqualified, the method further comprises a step of adjusting the sampling mode of the near-infrared spectra in the basic data and/or maintaining the basic process data.
6. An evaluation system, characterized in that it is used to evaluate the quality of basic data for modeling that comprises near-infrared spectra, the system comprising:
a basic data acquisition module, used to obtain a plurality of near-infrared spectra from the basic data and to obtain the chemical value corresponding to each near-infrared spectrum;
a pretreatment module, used to pretreat the near-infrared spectra;
a maximum similarity spectrum acquisition module, used to solve the similarity distance and the local correlation coefficient between the near-infrared spectra, and, according to the similarity distance and the local correlation coefficient between the near-infrared spectra, to obtain the near-infrared spectrum of maximum similarity corresponding to each near-infrared spectrum and its corresponding chemical value;
a difference mean solving module, used to obtain the difference between the chemical value of each near-infrared spectrum and that of its corresponding maximum-similarity near-infrared spectrum, to take the absolute value of every difference to obtain the corresponding absolute differences, and to solve the mean of all the absolute differences;
a comparison module, used to compare the mean of the absolute differences with a preset error value: when the mean of the absolute differences is greater than the preset error value, the quality of the basic data is evaluated as unqualified; when the mean of the absolute differences is less than or equal to the preset error value, the quality of the basic data is evaluated as qualified.
7. The evaluation system according to claim 6, characterized in that the pretreatment of the near-infrared spectra includes the S-G derivative method.
8. The evaluation system according to claim 6, characterized in that the maximum similarity spectrum acquisition module is used to solve the similarity distance between the near-infrared spectra according to the information content of the near-infrared spectra.
9. The evaluation system according to claim 6, characterized in that the similarity between near-infrared spectra is the ratio of the local correlation coefficient between the near-infrared spectra to the similarity distance between the near-infrared spectra.
10. The evaluation system according to claim 6, characterized in that it further comprises an adjusting module, used to adjust the sampling mode of the near-infrared spectra in the basic data and/or to maintain the basic process data when the quality of the basic data is evaluated as unqualified.
CN201610790067.9A 2016-08-31 2016-08-31 A kind of evaluation method and system Active CN106248621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610790067.9A CN106248621B (en) 2016-08-31 2016-08-31 A kind of evaluation method and system

Publications (2)

Publication Number Publication Date
CN106248621A true CN106248621A (en) 2016-12-21
CN106248621B CN106248621B (en) 2019-04-02

Family

ID=58080988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610790067.9A Active CN106248621B (en) 2016-08-31 2016-08-31 A kind of evaluation method and system

Country Status (1)

Country Link
CN (1) CN106248621B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251471A (en) * 2008-03-12 2008-08-27 湖南中烟工业有限责任公司 Method for searching analog tobacco leaf based on tobacco leaf near infrared spectra
JP2009261831A (en) * 2008-04-30 2009-11-12 Pola Chem Ind Inc Estimation method of amount of sebum of skin
JP4385433B2 (en) * 1998-09-04 2009-12-16 三井化学株式会社 Manufacturing operation control method by near infrared analysis
CN103729650A (en) * 2014-01-17 2014-04-16 华东理工大学 Selection method for near infrared spectrum modeling samples
CN104330381A (en) * 2014-10-25 2015-02-04 陕西玉航电子有限公司 Near-infrared spectrum analysis method
CN104990894A (en) * 2015-07-09 2015-10-21 南京富岛信息工程有限公司 Detection method of gasoline properties based on weighted absorbance and similar samples
CN105136736A (en) * 2015-09-14 2015-12-09 上海创和亿电子科技发展有限公司 Online near infrared sample size determination method
CN105334185A (en) * 2015-09-14 2016-02-17 上海创和亿电子科技发展有限公司 Spectrum projection discrimination-based near infrared model maintenance method
CN105891147A (en) * 2016-03-30 2016-08-24 浙江中烟工业有限责任公司 Near infrared spectrum information extraction method based on canonical correlation coefficients

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
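Zhang Junzhe et al.: "A spectral similarity measure based on variable-weight combination", Acta Geodaetica et Cartographica Sinica *
Chen Bin et al.: "Eliminating abnormal near-infrared samples by PCA combined with the Mahalanobis distance method", Journal of Jiangsu University *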

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765161A (en) * 2018-07-10 2020-02-07 普天信息技术有限公司 Implementation method for applying energy consumption data quality control to big data real-time processing architecture
CN109409700A (en) * 2018-10-10 2019-03-01 网宿科技股份有限公司 A kind of configuration data confirmation method, business monitoring method and device
CN109409700B (en) * 2018-10-10 2022-03-08 网宿科技股份有限公司 Configuration data confirmation method, service monitoring method and device
CN109324015A (en) * 2018-10-17 2019-02-12 浙江中烟工业有限责任公司 Based on the similar tobacco leaf alternative of spectrum
CN109324015B (en) * 2018-10-17 2021-07-13 浙江中烟工业有限责任公司 Tobacco leaf replacing method based on spectrum similarity
CN111257277A (en) * 2018-11-30 2020-06-09 湖南中烟工业有限责任公司 Tobacco leaf similarity judgment method based on near infrared spectrum technology
CN111257277B (en) * 2018-11-30 2023-02-17 湖南中烟工业有限责任公司 Tobacco leaf similarity judgment method based on near infrared spectrum technology
CN111426648A (en) * 2020-03-19 2020-07-17 甘肃省交通规划勘察设计院股份有限公司 Method and system for determining similarity of infrared spectrogram
CN113670847A (en) * 2021-09-26 2021-11-19 山东大学 Near-infrared quality monitoring method for swertia mussotii extraction process
CN113984708A (en) * 2021-10-22 2022-01-28 浙江中烟工业有限责任公司 Maintenance method and device of chemical index detection model
CN113984708B (en) * 2021-10-22 2024-03-19 浙江中烟工业有限责任公司 Maintenance method and device for chemical index detection model

Also Published As

Publication number Publication date
CN106248621B (en) 2019-04-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant