WO2006088350A1 - Method and system for selection of calibration model dimensionality, and use of such a calibration model - Google Patents

Method and system for selection of calibration model dimensionality, and use of such a calibration model

Info

Publication number
WO2006088350A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
calibration
dimensionality
risk
data
Prior art date
Application number
PCT/NL2005/000124
Other languages
English (en)
Inventor
Nicolaas Maria Faber
Original Assignee
Chemometry Consultancy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chemometry Consultancy
Priority to PCT/NL2005/000124
Publication of WO2006088350A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00 Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/28 Investigating the spectrum
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/27 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands using photo-electric detection; circuits for computing concentration
    • G01N21/274 Calibration, base line adjustment, drift correction
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3504 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing gases, e.g. multi-gas analysis
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3577 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing liquids, e.g. polluted water
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

Definitions

  • the present invention relates to a method for providing a calibration model comprising a quantitative relation between object (or sample) measurement data and object (or sample) property value(s).
  • the calibration model can also be used for classifying object property values, e.g. in case of gasoline classification, or for analysis of effects in controlled experiments.
  • the method comprises acquiring measurement data of a set of N training objects, the measurement data comprising K predictor values associated with the object measurement data and M predictand values associated with the sample property values for each sample, N, K and M being integer values, resulting in an N × K predictor matrix X and an N × M predictand matrix Y, obtaining calibration models interrelating the predictor matrix X and the predictand matrix Y, each calibration model having model terms, the number of which determines the model dimensionality, and validating the obtained calibration models (e.g. by obtaining a test statistic, such as a cross-validated error).
  • the present invention relates to a system for providing a calibration, classification or analysis of effects model comprising a quantitative relation between object measurement data and object property values
  • the system comprising a sample unit for acquiring measurement data of a set of N training objects, a predictor measurement unit connected to the sample unit for providing measurement data comprising K predictor values associated with the object measurement data, a predictand unit connected to the sample unit for providing M predictand values associated with the sample property values for each sample, N, K and M being integer values, resulting in an N × K predictor matrix X and an N × M predictand matrix Y, and a data processor unit connected to the predictor measurement unit and the predictand unit, the data processor unit being arranged for obtaining calibration models interrelating the predictor matrix X and the predictand matrix Y, each calibration model having model terms, the number of which determines the model dimensionality, and validating the obtained calibration models.
  • NIR: near-infrared
  • Multivariate calibration is then used to develop a quantitative relation, i.e., a model, between the digitized spectra, stored in a data matrix X, and the concentrations, stored in a data matrix Y, as reviewed by H. Martens and T. Naes, Multivariate Calibration, Wiley, NY, 1989.
  • NIR spectroscopy is also increasingly used to infer other properties (stored in Y) of samples than concentrations, e.g., the strength and viscosity of polymers, the thickness of a tablet coating, and the octane rating of gasoline.
  • the first step towards constructing a multivariate calibration model is to remove undesirable features from the X data by pre-treatment techniques such as filtering or differentiation.
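As an illustration of the differentiation pre-treatment mentioned above, the following Python sketch takes finite-difference derivatives of each spectrum along the wavelength axis. The function name `derivative_pretreatment` is hypothetical, and the plain finite difference is an assumption made for brevity; the source does not say which differentiation scheme is used.

```python
import numpy as np

def derivative_pretreatment(X, order=1):
    """Differentiate each row (spectrum) of X along the wavelength axis.

    Assumption: a plain finite difference stands in for whatever
    differentiation scheme is used in practice.
    """
    X = np.asarray(X, dtype=float)
    # Each differentiation shortens the spectrum by one point.
    return np.diff(X, n=order, axis=1)
```

A second-derivative pre-treatment is obtained with `order=2`; each differentiation removes one column from the predictor matrix.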
  • the next critical step serves to select the optimum model dimensionality, which is the number of terms that constitute the multivariate model. This step is equivalent to determining the optimum degree of a polynomial for fitting univariate (x,y)-data pairs.
  • it is a much harder problem to solve for multivariate calibration owing to the higher complexity of the data at hand and the often-tiny substructures to be discovered. Many methods have been developed to solve this problem, of which model validation is the most frequently applied one in practice.
  • validation amounts to assessing the ability of the model to predict the properties of interest for unknown future objects, e.g., chemical or biological samples, of the same type.
  • This assessment can be performed in two essentially different modes, namely externally and internally.
  • the adjective 'external' refers to the requirement that the validation objects be independent of the objects used for constructing the model, i.e., the training set, otherwise one does not properly assess the ability to predict for truly unknown future objects. For example, replicates are not allowed.
  • the predictive ability is estimated by applying the model to these independent validation objects and averaging the squared prediction errors, i.e., the differences between model prediction and the associated known value. The square root of this average squared error is known as the root mean squared error of prediction (RMSEP).
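The RMSEP definition above can be written out in a few lines of Python. This is a minimal sketch for a single property of interest; the function name `rmsep` and its argument names are illustrative and do not come from the source.

```python
import numpy as np

def rmsep(y_known, y_predicted):
    """Root mean squared error of prediction for a set of validation objects."""
    y_known = np.asarray(y_known, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    errors = y_predicted - y_known      # model prediction minus known value
    return float(np.sqrt(np.mean(errors ** 2)))
```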
  • Internal validation differs from external validation in the sense that the validation objects are taken from the training set itself, i.e., the validation objects are not independent.
  • To execute an internal validation one has the choice between cross-validation and leverage correction.
  • In cross-validation, one constructs models after judiciously leaving out segments of objects. Then an estimate of RMSEP follows by averaging squared prediction errors for the left-out objects, as in external validation.
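The cross-validation procedure just described can be sketched as follows. Several choices here are assumptions made only so the example is self-contained: contiguous segments, a single property of interest (M = 1), and principal component regression built from a singular value decomposition as the calibration model. The names `pcr_fit_predict` and `cv_rmsep` are hypothetical.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_terms):
    """Fit a PCR model with n_terms components and predict for X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Rows of Vt are the principal component loadings of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_terms].T                   # K x n_terms loading matrix
    T = Xc @ P                           # N x n_terms score matrix
    q, *_ = np.linalg.lstsq(T, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ P @ q + y_mean

def cv_rmsep(X, y, n_terms, n_segments=5):
    """Cross-validated RMSEP: refit once per left-out segment."""
    n_objects = X.shape[0]
    all_idx = np.arange(n_objects)
    squared_errors = []
    for segment in np.array_split(all_idx, n_segments):
        train = np.setdiff1d(all_idx, segment)
        predicted = pcr_fit_predict(X[train], y[train], X[segment], n_terms)
        squared_errors.extend((predicted - y[segment]) ** 2)
    return float(np.sqrt(np.mean(squared_errors)))
```

The model is refitted once for every segment and every trial dimensionality, which is why the cost grows quickly with the size of the data set and the number of segments.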
  • Cross-validation can be quite computer-intensive, depending on the size of the data sets and the number of segments. Leverage correction is a 'quick and dirty' alternative. Calibration model validation is problematic for various reasons. External validation is best in the sense that a closer assessment of RMSEP is possible.
  • Because the shape of the RMSEP curve depends on the mode (external or internal), the type of data and also on the particular data pre-treatment (first derivative, second derivative, etc.), ad hoc rules are used for visual interpretation.
  • the optimum dimensionality is found, for example, not only as the one that leads to a global minimum, which would be the 'logical' selection criterion, but also, depending on the shape incidentally encountered, as the first local minimum, a plateau, or where the curve 'levels off'.
  • a practitioner may use different selection criteria for different pre-treatments of the same data.
  • the present invention seeks to provide an improved method and system for providing a calibration model having a good predictive ability when challenged with future unknown objects of the same type as the objects used for constructing the model.
  • the method and system should be objective in selecting the optimum model dimensionality. As a result, the actual use of the model for predicting a sample property from measured, practical data sets will be much more trustworthy.
  • obtaining the calibration model comprises initialising the calibration model using a predetermined minimum number of model terms, adding an additional model term to increase the dimensionality of the calibration model, calculating a risk of over-fitting by the model associated with the current dimensionality of the model, and repeating the adding of an additional model term and calculating of the risk of over-fitting up to a predetermined dimensionality of the model.
  • the initial calibration model includes a predetermined minimum dimensionality. This may be a single model term, but in cases for which prior knowledge exists, e.g. from experimental design considerations, it may be advantageous to start with a higher dimensionality.
  • calculating a risk comprises calculating a cumulative risk for the calibration model up to and including the current dimensionality of the calibration model, and the repeating of adding and calculating is executed until the cumulative risk of over-fitting of the current calibration model exceeds a predetermined risk threshold value.
  • This embodiment makes it possible to objectively obtain an optimum calibration model dimensionality, based on a risk parameter indicating an acceptable risk of over-fitting.
  • This acceptable risk depends on circumstances, and may e.g. be chosen as 5%, 1%, or even very low (e.g. 0.01%) in case of forensic testing.
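The stopping rule described above (start from a minimum number of terms, add one term at a time, stop once the cumulative risk exceeds the threshold) can be sketched as a small loop. The function `select_dimensionality` and the callable `cumulative_risk` are hypothetical; how the risk itself is computed is left to the caller, and returning the last dimensionality whose risk stayed within the threshold is one reading of the text.

```python
def select_dimensionality(cumulative_risk, risk_threshold=0.05,
                          min_terms=1, max_terms=20):
    """Return the largest dimensionality whose cumulative risk is acceptable.

    cumulative_risk(a) must return the cumulative risk of over-fitting for a
    model with a terms (a hypothetical callable supplied by the caller).
    """
    n_terms = min_terms                   # predetermined minimum dimensionality
    while n_terms < max_terms:
        candidate = n_terms + 1           # add one more model term
        if cumulative_risk(candidate) > risk_threshold:
            break                         # risk no longer acceptable: stop
        n_terms = candidate
    return n_terms
```

With made-up risks of 0.001, 0.004, 0.03 and 0.40 for two to five terms, a 5% threshold selects four terms and a 1% threshold selects three.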
  • the method further comprises ordering the additional model terms of the calibration model.
  • an explicit ordering of the model terms may be beneficial, e.g. in Principal Component Regression (PCR).
  • an implied ordering may already be present, e.g. in Partial Least Squares Regression (PLSR), but still an explicit ordering may provide advantages in this case.
  • the explicit ordering may be based on using correlation, using prediction coefficient, using correlation relative standard deviation, or even using top down ordering.
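Ordering by correlation, the first of the criteria listed above, might look like the following sketch. It assumes a single property of interest and a score matrix `T` whose columns are the candidate model terms; the function name `order_terms_by_correlation` is hypothetical.

```python
import numpy as np

def order_terms_by_correlation(T, y):
    """Order the columns of T by decreasing absolute correlation with y."""
    Tc = T - T.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of every score vector with the property of interest.
    corr = (Tc.T @ yc) / (np.linalg.norm(Tc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(corr))     # most relevant term first
    return order, corr
```

For PCR this would replace the default ordering by explained variance in X, which need not reflect relevance for Y.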
  • the model dimensionality is limited to a maximum value. When all possible calibration models up to the maximum value are observed, this is particularly advantageous to demonstrate that the higher-numbered non-significant terms of the calibration models may be left out with effectiveness.
  • a statistical procedure is selected for quantifying the risk of over-fitting in a further embodiment.
  • the statistical procedure comprises a randomization test.
  • a randomization test parameter e.g. number of randomizations
  • validating the obtained calibration models comprises obtaining a test statistic for each of the obtained calibration models. This may e.g. be based on correlation, model fit, internal validation, external validation, etc.
  • the present invention relates to a system as defined in the preamble, in which the data processor unit is further arranged for obtaining the calibration model by initialising the calibration model using a predetermined minimum number of model terms, adding an additional model term to increase the dimensionality of the calibration model, calculating a risk of over-fitting by the model associated with the current dimensionality of the model, and repeating the adding of an additional model term and calculating of the risk of over-fitting up to a predetermined dimensionality of the model.
  • the data processing unit may further be arranged for executing the various embodiments of the present method.
  • the present invention relates to the use of a calibration model obtained by the present method, comprising inputting measurement data of a sample to the model for obtaining at least one property value related to the sample.
  • FIG. 1 represents a schematic description of the various steps leading to a multivariate calibration model according to current practice
  • FIGS. 2a and b illustrate near-infrared (NIR) absorbance spectra, in FIG. 2a for the prediction of octane rating of gasoline, and in FIG. 2b for the prediction of hydrogen content of gas oil;
  • FIGS. 3a and b illustrate the dependence of cross-validated RMSEP on trial model dimensionality; in FIG. 3a are results for the octane data, and in FIG. 3b for the gas oil data;
  • FIGS. 4a and b illustrate frequency histograms of the test statistic after randomization in comparison with the value actually observed for the input data for increasing trial model dimensionality
  • FIG. 4a are results for the octane data
  • FIG. 4b for the gas oil data
  • FIGS. 5a and b illustrate the cumulative risk of over-fitting the data for increasing trial model dimensionality
  • FIG. 5a are results for the octane data
  • FIG. 5b for the gas oil data
  • FIG. 6 represents a schematic description of one embodiment of the method according to the present invention.
  • FIG. 7 shows an embodiment of a calibration model providing system according to an embodiment of the present invention.
  • PREFERRED EMBODIMENTS OF THE INVENTION
  • the present invention is henceforth described as using input data from near-infrared (NIR) spectroscopy, but it should be appreciated that it is possible to enter input data from almost any technical field as long as the objectives set forth through the invention are fulfilled.
  • the method according to the present invention is not limited to spectroscopic input data.
  • the term property used in the present invention shall be given a broad interpretation: it may comprise properties of solid, semi-solid, fluid, vapor samples etc. such as concentration, density, elasticity, viscosity, strength, thickness, class belonging (e.g. octane rating for gasoline classification) etc., but also predictions from probability input data (e.g. stock market information) or other input figures for prediction from any technical field etc.
  • upper-case bold characters are used for matrices, e.g., X and Y, lower-case bold characters for column vectors, e.g., t, italic characters for scalars, e.g., a and A.
  • Transposition of matrices and vectors will be denoted by a superscripted "T", e.g., P^T.
  • the multivariate calibration model under consideration approximates the data for the training set objects as a sum of outer products of vectors:
  • N is the number of training objects
  • K is the number of predictor variables
  • A is the model dimensionality.
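Written out with the score vectors and loading vectors named later in the description, a sum of outer products of this kind would take the form below. This is a reconstruction under assumptions, not the equation as printed in the source: the residual matrices E and F and the summation index a are notation introduced here for illustration.

```latex
\mathbf{X} = \sum_{a=1}^{A} \mathbf{t}_a \mathbf{p}_a^{\mathrm{T}} + \mathbf{E},
\qquad
\mathbf{Y} = \sum_{a=1}^{A} \mathbf{t}_a \mathbf{q}_a^{\mathrm{T}} + \mathbf{F}
```

Here each score vector t_a has N elements, each loading vector p_a has K elements and each loading vector q_a has M elements, so that every outer product has the same size as the matrix it approximates.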
  • the predictor data of a single object constitute a row vector in X; hence a single index suffices to characterize them.
  • Multiway calibration deals with predictor data of higher complexity than multivariate calibration (A. Smilde, R. Bro and P. Geladi, Multi-way Analysis. Applications in the Chemical Sciences, Wiley, Chichester, 2004).
  • the predictor data of a single object constitute an array; hence a single index no longer suffices to characterize them.
  • the resulting predictor data are characterized by four indices, namely three spatial indices and one spectral index, all of which are independent, thus leading to a four- way array.
  • a multiway calibration model can, however, also be equally represented in terms of score vectors (t_a's), and loading vectors (p_a's and q_a's), but now there will exist constraints among the elements of the loading vectors; the single pseudo-index corresponds one-to-one to multiple physical indices in an ordered way. Although these constraints even depend on the estimation procedure deployed, the multiway calibration model can be represented as a sum of outer products of vectors, without loss of information. The problem of selecting the optimum multiway calibration model dimensionality is therefore, in principle, the same as selecting the optimum multivariate calibration model dimensionality. Thus, multivariate solutions are equally valid in the multiway domain.
  • the properties of interest, arranged in the rows of Y, are usually continuous variables. However, the same set-up of X and Y data can be used to predict integer-valued properties of interest. In the case where the integer codes for class membership, the purpose of the model would be classification, rather than calibration. The problem of selecting the optimum classification model dimensionality is therefore, in principle, the same as selecting the optimum calibration model dimensionality. Thus, calibration solutions are equally valid for classification, whether the predictor data are multivariate or multiway.
  • the predictors, arranged in the rows of X, are usually continuous variables. However, the same set-up of X and Y data can be used to predict from integer-valued predictors.
  • the purpose of the model would be analysis of effects, rather than calibration, see H. Martens and M. Martens, Multivariate Analysis of Quality. An Introduction, Wiley, Chichester, 2001.
  • the problem of selecting the optimum analysis of effects model dimensionality is therefore, in principle, the same as selecting the optimum calibration model dimensionality. Thus, calibration solutions are equally valid for analysis of effects.
  • FIG. 1 represents a schematic description of the various steps leading to a multivariate calibration model of the form under consideration, according to the currently most common practice.
  • In step 100, samples are taken from a substance or matter and subjected to a multivariate data source (step 110, measurement of K predictor values), for instance a spectrometer, a chromatograph, or an electrochemical instrument, i.e., an instrument that provides multi-dimensional data (vectors) as a result.
  • the predictor data are arranged in, for example, the rows of a matrix X with N × K elements (step 120).
  • In step 130, concentration or property measurements from the sample substance or matter yield the predictand matrix or vector Y of size N × M (step 140).
  • step 150 concentration or property measurements
  • the trial models are ranked in step 170 according to the validation results.
  • the results of the best ranking trial model(s) are reported in step 180, e.g. on displaying means. It is seen that constructing a multivariate calibration model is essentially a trial and error process, because the best setting is usually not known in advance.
  • the main problem is to avoid over-fitting. In other words, it is desirable to stop adding terms to the model when they represent noise. Since terms are added sequentially, the actual risk of over-fitting increases monotonically with the number of terms. To decide about adding a term, the associated cumulative risk must be made precise in terms of a probability, which is a well-defined topic in statistics. Thus, to adequately control the actual risk of over-fitting, one should estimate it using a statistical procedure. For each term it must be determined whether the risk of over-fitting is acceptable, or not. If the risk is acceptable, the term under scrutiny passes the statistical test; otherwise one stops adding terms. The risk thus estimated should incorporate the risk of over-fitting for previous terms.
  • the practitioner must select a criterion for ordering the terms, when ordering is opted for. Clearly, the criterion must reflect the relevance of a term for the description of the property of interest (Y).
  • the present version of the COMODITE method uses the correlation between the score vectors and the property of interest, but any criterion based on model fit, internal or external validation etc. may be equally suitable. Consequently, the ordering is considered to be suitable for PLSR by construction, but an explicit ordering step is required for, for example, PCR.
  • the practitioner must set the overall acceptable risk of over-fitting, α.
  • the present version of the COMODITE method includes a randomization test for determining the risk, but any statistical procedure that does not make overly stringent assumptions about the data is suitable.
  • the particular choice of a randomization test implies that the model term under test must be constructed from the data from which the lower-numbered terms are eliminated, i.e., residual data sets. The reason for this is, that the previously tested terms would give a spurious contribution to the test statistic under the null-hypothesis. Alternatives for the randomization test may require similar actions.
  • the practitioner must select the test statistic, T.
  • the present version of the COMODITE method uses the correlation between the score vectors and the property of interest, but any criterion based on model fit, internal or external validation etc. may be equally suitable.
  • the practitioner must determine how to update the risk of over-fitting.
  • the cumulative risk is calculated from a product of estimated probabilities, which is exact for independent events and conservative otherwise.
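One way to read the product rule above is sketched below. The source says only that the cumulative risk follows from a product of estimated probabilities; the specific form used here, one minus the product of the per-term probabilities of not over-fitting, is an assumption, as is the function name `cumulative_risk`.

```python
import numpy as np

def cumulative_risk(term_risks):
    """Cumulative risk after each term, given per-term estimated risks.

    Assumption: the risk up to term A is 1 - prod_{a<=A} (1 - p_a), which
    treats the per-term tests as independent events.
    """
    term_risks = np.asarray(term_risks, dtype=float)
    return 1.0 - np.cumprod(1.0 - term_risks)
```

Under this reading the cumulative risk can never decrease when a term is added, in line with the remark that the risk of over-fitting grows with the number of terms.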
  • the multivariate PLSR calibration models must relate tiny substructures in the spectra to variations in the properties of interest.
  • the optimum model dimensionality depends on the property of interest when calibrating NIR spectra. In other words, limited prior knowledge is available, which makes the selection of model dimensionality a critical step.
  • the results of the method according to the present invention are shown in FIG. 4. Compared are the test statistic obtained for the actual data set (vertical dashed line) and the frequency histogram for the test statistic after randomizing the rows of Y. The number of randomizations is set to the rather high value of 1000 to obtain representative results. If a term contains real structure, then the test statistic for the actual data set should stand out. By contrast, if a term does not contain real structure, then it does not matter whether one scrambles the rows of Y_a (relative to the rows in X_a) since there is no relation between the rows of the residual matrices X_a and Y_a anyway.
  • The plots in FIG. 5 further illustrate that the method according to the present invention is able to highlight subtle aspects of the model dimensionality selection process. By contrast, cross-validation gives no indication that these data sets could require such vastly different dimensionalities.
  • A schematic description of the method according to the present invention is now provided with reference to FIG. 6. Selection of model dimensionality in step 600 precedes validation step 610 of the trial model. In principle, one only needs to validate the trial data pre-treatment for the optimum selected model dimensionality, which is computationally efficient. The rather conspicuous difference with FIG. 1 is that the essentially different tasks of selection of model dimensionality and model validation are disentangled.
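The randomization test for a single model term can be sketched as follows. The sketch assumes one property of interest, residual data from which lower-numbered terms have already been removed, and a PLS1-style term whose weight vector is proportional to X^T y; the test statistic is the absolute correlation between the score vector and y. The names `term_statistic` and `randomization_risk` are hypothetical.

```python
import numpy as np

def term_statistic(X_res, y_res):
    """|corr(t, y)| for a PLS1-style term built from residual data."""
    Xc = X_res - X_res.mean(axis=0)
    yc = y_res - y_res.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)                # unit-length weight vector
    t = Xc @ w                            # score vector of the term under test
    return abs(np.corrcoef(t, yc)[0, 1])

def randomization_risk(X_res, y_res, n_randomizations=1000, seed=0):
    """Fraction of randomizations whose statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = term_statistic(X_res, y_res)
    exceed = 0
    for _ in range(n_randomizations):
        y_scrambled = rng.permutation(y_res)   # break the X-Y row pairing
        # The term is rebuilt from the scrambled data before it is scored.
        if term_statistic(X_res, y_scrambled) >= observed:
            exceed += 1
    return exceed / n_randomizations
```

A term with real structure gives an observed statistic far in the tail of the randomization histogram and hence a small estimated risk; for a pure-noise term the observed value is just another draw from that histogram.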
  • the method according to the present invention may be implemented in a hardware environment, e.g. in the exemplary embodiment of a calibration model providing system 10 as shown in FIG. 7.
  • the system 10 comprises a sample unit 11 which is arranged to obtain measurement data from a group of N samples.
  • the sample unit 11 is connected to a predictor measurement unit 13, e.g.
  • the sample unit 11 is also connected to a predictand unit 12, which is arranged to provide the property values related to each sample, in order to obtain the N x M predictand matrix Y.
  • the predictand unit 12 may comprise an actual measurement apparatus, or it may comprise an input unit for entering a known property value (or values) for each of the N samples.
  • the system 10 comprises a data processing unit 14 connected to a memory unit 15.
  • the memory unit 15 may comprise suitable program code to control the data processing unit 14 to function according to the present method, and may comprise a semiconductor memory device, a magnetic memory device (such as a hard disk), an optical memory device, etc.
  • the data processing unit 14 may comprise a processor or multiple processors.
  • the memory unit 15 is also suitable for storing intermediate calculation results and for storing the eventually obtained calibration model.
  • the data processing unit 14 and memory unit 15 may be formed by a general purpose personal computer, arranged to interface with the predictand unit 12 and predictor measurement unit 13. As will be apparent to the skilled person, input/output devices will be part of the data processing unit 14 for controlling the system 10 by an operator.
  • the data processing unit 14 is furthermore connected with a reporting unit 16 for outputting data related to the present invention, which may be formed by a display, a printer, or further storage means.
  • the system 10 may be used to obtain the desired property values of further actual samples with unknown properties.
  • the predictand unit 12 is not used, and the data processing unit 14 is only used to obtain the desired property value (values) of samples from the related measurement data using the calibration model.

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

Procédé et système proposant un modèle de calibrage comprenant une relation quantitative entre des données de mesure d'objet et des valeurs de propriété d'objet. Le système (10) comprend une unité d'échantillon (11), une unité de mesure de prévision (13), une unité de prévision (12) et une unité de traitement de données (14), laquelle est disposée en vue d'obtenir des modèles de calibrage, chaque modèle comportant un certain nombre de termes modèles qui détermine la dimension du modèle tout en validant les modèles de calibrage obtenus. L'unité de traitement de données (14) comporte en outre le modèle de calibrage par le biais de l'initialisation du modèle de calibrage, l'ajout d'un terme modèle supplémentaire afin d'augmenter la dimensionnalité du modèle de calibrage, le calcul d'un risque de dépassement par le modèle associé à la dimension courante du modèle, et la répétition de l'ajout d'un terme modèle supplémentaire et le calcul du risque de dépassement jusqu'à atteindre une dimension prédéterminée du modèle.
PCT/NL2005/000124 2005-02-21 2005-02-21 Method and system for selection of calibration model dimensionality, and use of such a calibration model WO2006088350A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/NL2005/000124 WO2006088350A1 (fr) 2005-02-21 2005-02-21 Method and system for selection of calibration model dimensionality, and use of such a calibration model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/NL2005/000124 WO2006088350A1 (fr) 2005-02-21 2005-02-21 Method and system for selection of calibration model dimensionality, and use of such a calibration model

Publications (1)

Publication Number Publication Date
WO2006088350A1 2006-08-24

Family

ID=34960583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2005/000124 WO2006088350A1 (fr) 2005-02-21 2005-02-21 Method and system for selection of calibration model dimensionality, and use of such a calibration model

Country Status (1)

Country Link
WO (1) WO2006088350A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711606A (zh) * 2018-12-13 2019-05-03 平安医疗健康管理股份有限公司 Model-based data prediction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0415401A2 (fr) * 1989-09-01 1991-03-06 Edward W. Stark Method and apparatus for multiplicative signal correction
US6075594A (en) * 1997-07-16 2000-06-13 Ncr Corporation System and method for spectroscopic product recognition and identification
US6480795B1 (en) * 1998-03-13 2002-11-12 Buchi Labortechnik Ag Automatic calibration method
US20030143520A1 (en) * 2002-01-31 2003-07-31 Hood Leroy E. Gene discovery for the system assignment of gene function
WO2004003969A2 (fr) * 2002-06-28 2004-01-08 Tokyo Electron Limited Method and system for predicting process performance using a material processing tool and sensor data

Similar Documents

Publication Publication Date Title
Andersen et al. Variable selection in regression—a tutorial
Faber et al. How to avoid over-fitting in multivariate calibration—The conventional validation approach and an alternative
Cuadros-Rodríguez et al. Quality performance metrics in multivariate classification methods for qualitative analysis
Riedl et al. Review of validation and reporting of non-targeted fingerprinting approaches for food authentication
Mehdizadeh et al. An intelligent system for egg quality classification based on visible-infrared transmittance spectroscopy
EP2428802B1 (fr) Automatic analysis device and analysis method
JP4856993B2 (ja) Self-diagnosing automatic analyzer
Filgueiras et al. Evaluation of trends in residuals of multivariate calibration models by permutation test
JP2008530536A (ja) Method and apparatus for noninvasive measurement of analytes
CA2228844C (fr) Analysis of biological fluids with outlier detection by generalized distances
Liu et al. A comparative study for least angle regression on NIR spectra analysis to determine internal qualities of navel oranges
EP2286203A1 (fr) Analysis of spectral data for the selection of a calibration model
Zhu et al. Study on the quantitative measurement of firmness distribution maps at the pixel level inside peach pulp
Shao et al. Multivariate calibration of near-infrared spectra by using influential variables
Miller Chemometrics in process analytical chemistry
González et al. A robust partial least squares regression method with applications
Jiang et al. Molecular spectroscopic wavelength selection using combined interval partial least squares and correlation coefficient optimization
Omidikia et al. Uninformative variable elimination assisted by gram–Schmidt orthogonalization/successive projection algorithm for descriptor selection in QSAR
Sharma et al. Point‐of‐care detection of fibrosis in liver transplant surgery using near‐infrared spectroscopy and machine learning
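JP4366261B2 (ja) Method for determining the presence or absence of an abnormality in a measurement reaction process, automatic analyzer capable of executing the method, and storage medium storing a program for the method
JP2018504709A (ja) Automated quantitative regression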
Rodionova et al. Application of SIC (simple interval calculation) for object status classification and outlier detection—comparison with regression approach
Wang et al. SVM classification method of waxy corn seeds with different vitality levels based on hyperspectral imaging
WO2006088350A1 (fr) Method and system for selection of calibration model dimensionality, and use of such a calibration model
Zhang et al. Robust principal components regression based on principal sensitivity vectors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05710903

Country of ref document: EP

Kind code of ref document: A1