CN114925321B - Novel robust estimation method and device for overcoming pollution data and uncertain events - Google Patents


Info

Publication number
CN114925321B
Authority
CN
China
Prior art keywords
estimator
lehmann
hodges
whl
weighted
Legal status
Active
Application number
CN202210524867.1A
Other languages
Chinese (zh)
Other versions
CN114925321A (en)
Inventor
高学鸿
周剑兰
黄国忠
向治锦
刘雪敏
李浩轩
蒋慧灵
周亮
张磊
邓青
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Application filed by University of Science and Technology Beijing USTB
Priority to CN202210524867.1A
Publication of CN114925321A
Application granted
Publication of CN114925321B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Control Of Conveyors (AREA)

Abstract

The invention discloses a novel robust estimation method and device for overcoming contaminated data and uncertain events, and relates to the technical field of robust data optimization. The method can be applied to a mathematical optimization model based on the weighted Hodges-Lehmann method and can handle uncertain events involving contaminated data. It comprises the following steps: defining the contaminated data and the uncertainty of the data information; introducing a Hodges-Lehmann robust estimator; proposing a single-term weighted Hodges-Lehmann robust estimator; and proposing a binomial (two-term) weighted Hodges-Lehmann robust estimator. The invention thus provides a new robust estimation method, the weighted Hodges-Lehmann (wHL) estimator, which is applied to a mathematical optimization model and can handle uncertain future events with data contamination. wHL further improves the estimation accuracy and robustness of the robust estimator in the presence of abnormal data and can serve as an important means of improving robust estimation. The method offers high computational efficiency, high calculation precision, and high robustness.

Description

Novel robust estimation method and device for overcoming pollution data and uncertain events
Technical Field
The invention relates to the technical field of robust data optimization, and in particular to a novel robust estimation method and device for overcoming contaminated data and uncertain events.
Background
In statistical data processing, the data may contain a large amount of highly uncertain information due to various factors. In addition, collected data are easily contaminated through personal preference and operator error, which causes a large deviation between the estimated value and the actual demand. There is therefore a need for a robust solution that is less sensitive to outliers (i.e. data contamination).
Disclosure of Invention
To address the problem that prior-art methods may fail when uncertain events are associated with outliers, the invention provides a novel robust estimation method and device for overcoming contaminated data and uncertain events.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, a novel robust estimation method for overcoming contaminated data and uncertain events is provided. The method is applied to an electronic device and comprises the following steps:
S1: determining contaminated data and uncertain data information in a preset mathematical optimization model, and marking them as outliers;
S2: introducing a Hodges-Lehmann robust estimator into the mathematical optimization model;
S3: proposing a weighted Hodges-Lehmann robust estimator and associating the outliers with the weights of the Hodges-Lehmann robust estimator, thereby completing the novel robust estimation that overcomes contaminated data and uncertain events.
Optionally, in step S1, determining the contaminated data and uncertain data information in the preset mathematical optimization model and marking them as outliers includes:
collecting data in the preset mathematical optimization model to obtain a set of observations $S = \{x_1, x_2, x_3, \ldots, x_n\}$. If an observation $x_i$ in S differs greatly from the other observations, $x_i$ is regarded as contaminated or uncertain data and is marked as an outlier; the relationship between $x_i$ and the rest of the set is $x_i \ll S \setminus \{x_i\}$ or $x_i \gg S \setminus \{x_i\}$.
Optionally, in step S2, introducing the Hodges-Lehmann robust estimator into the mathematical optimization model, as the basis for the weighted Hodges-Lehmann estimator, includes:
given a set of observations S, defining the Hodges-Lehmann robust estimator by equations (1) and (2):
$\mathrm{HL} = \operatorname{median}\{H_{ij}\}$ (1)
where
$H_{ij} = \frac{x_i + x_j}{2}$ (2)
Here HL denotes the Hodges-Lehmann robust estimator, and $H_{ij}$ denotes the new sample values formed by averaging pairs of observations.
Optionally, in step S3, proposing the weighted Hodges-Lehmann robust estimator and associating the outliers with the weights of the Hodges-Lehmann robust estimator includes:
S31: proposing a single-term weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the first-class wHL estimator, which is defined as the median of all pairwise weighted averages of the observations and denoted wHL1;
S32: proposing a two-term (binomial) weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the second-class wHL estimator, which is defined as a weighted median taken over the weighted averages and denoted wHL2.
Optionally, in step S31, obtaining the first-class wHL estimator wHL1 from the single-term weighted Hodges-Lehmann robust estimator includes:
given a set of observations $x_1, x_2, x_3, \ldots, x_n$ and weights $w_1, w_2, w_3, \ldots, w_n$ (the constraint on the weights is given in a formula image that is not reproduced here), proposing a single-term weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the first-class wHL estimator. The first-class wHL estimator is defined as the median of all pairwise weighted averages of the observations, denoted wHL1, as in equation (3):
$\mathrm{wHL}_1 = \operatorname{median}\{L_{ij}\}$ (3)
where
$L_{ij} = \frac{w_i x_i + w_j x_j}{w_i + w_j}$ (4)
The wHL1 estimate is calculated for three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$, i.e. all ordered index pairs.
Here $L_{ij}$ denotes the sample values formed by the pairwise weighted averages of the observations, and $w_1, \ldots, w_n$ denote the weights.
Optionally, in step S32, obtaining the second-class wHL estimator wHL2 from the two-term weighted Hodges-Lehmann robust estimator includes:
given a set of observations $x_1, x_2, x_3, \ldots, x_n$ and weights $w_1, w_2, w_3, \ldots, w_n$ (the constraint on the weights is given in a formula image that is not reproduced here), proposing a two-term (binomial) weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the second-class wHL estimator. The second-class wHL estimator takes a weighted median of the weighted averages, denoted wHL2, as in equation (5):
$\mathrm{wHL}_2 = \operatorname{wmedian}\{L_{ij}\}$ (5)
where wmedian denotes a weighted median taken over the pairwise weighted averages $L_{ij}$ of the observations; the exact weights used in equation (5) are given in a formula image that is not reproduced here.
The wHL2 estimate is calculated for the same three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$.
Optionally, step S3 is followed by:
obtaining the breakdown points corresponding to the three cases from the calculated wHL1 or wHL2 estimators, and evaluating the merits of wHL1 and wHL2 by means of these breakdown points.
In one aspect, a novel robust estimation apparatus for overcoming contaminated data and uncertain events is provided. The apparatus is adapted to carry out any of the above methods, is applied to an electronic device, and comprises:
a data acquisition module, configured to determine contaminated data and uncertain data information in a preset mathematical optimization model and mark them as outliers;
a robust estimator introduction module, configured to introduce a Hodges-Lehmann robust estimator into the mathematical optimization model; and
a weighted estimation module, configured to propose a weighted Hodges-Lehmann robust estimator and associate the outliers with the weights of the Hodges-Lehmann robust estimator, thereby completing the novel robust estimation that overcomes contaminated data and uncertain events.
Optionally, the data acquisition module is further configured to collect data in the preset mathematical optimization model to obtain a set of observations $S = \{x_1, x_2, x_3, \ldots, x_n\}$. If an observation $x_i$ in S differs greatly from the other observations, $x_i$ is regarded as contaminated or uncertain data and is marked as an outlier; the relationship between $x_i$ and the rest of the set is $x_i \ll S \setminus \{x_i\}$ or $x_i \gg S \setminus \{x_i\}$.
Optionally, the robust estimator introduction module is further configured, given a set of observations S, to define the Hodges-Lehmann robust estimator by equations (1) and (2):
$\mathrm{HL} = \operatorname{median}\{H_{ij}\}$ (1)
where
$H_{ij} = \frac{x_i + x_j}{2}$ (2)
Here HL denotes the Hodges-Lehmann robust estimator, and $H_{ij}$ denotes the new sample values formed by averaging pairs of observations.
In one aspect, an electronic device is provided, which includes a processor and a memory. At least one instruction is stored in the memory and is loaded and executed by the processor to implement the above novel robust estimation method for overcoming contaminated data and uncertain events.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction is stored; the instruction is loaded and executed by a processor to implement the above novel robust estimation method for overcoming contaminated data and uncertain events.
The technical solution of the embodiments of the invention has at least the following beneficial effects:
In this solution, the invention provides a novel robust estimation method for overcoming contaminated data and uncertain events. By applying the weighted Hodges-Lehmann (wHL) method to mathematical optimization, it addresses the data contamination and uncertain information that arise in statistical data applications, and it can handle uncertain future events involving data contamination. Two types of weighted Hodges-Lehmann estimators are presented: the first-class wHL estimator (wHL1) is the median of all pairwise weighted averages of the observations, and the second-class wHL estimator (wHL2) is a weighted median based on the weighted averages. The wHL estimates are considered in three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a novel robust estimation method for overcoming contaminated data and uncertain events according to an embodiment of the present invention;
FIG. 2 is a flow chart of a novel robust estimation method for overcoming contaminated data and uncertain events according to an embodiment of the present invention;
FIG. 3a is a line graph comparing breakdown points in the case $i < j$ for the novel robust estimation method for overcoming contaminated data and uncertain events according to an embodiment of the present invention;
FIG. 3b is a line graph comparing breakdown points in the case $i \le j$ for the novel robust estimation method for overcoming contaminated data and uncertain events according to an embodiment of the present invention;
FIG. 3c is a line graph comparing breakdown points in the case $\forall (i, j)$ for the novel robust estimation method for overcoming contaminated data and uncertain events according to an embodiment of the present invention;
FIG. 4 is a block diagram of a novel robust estimation apparatus for overcoming contaminated data and uncertain events according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a novel robust estimation method for overcoming contaminated data and uncertain events, which can be implemented by an electronic device; the electronic device can be a terminal or a server. As shown in FIG. 1, the processing flow of the method may include the following steps:
S101: determining contaminated data and uncertain data information in a preset mathematical optimization model, and marking them as outliers;
S102: introducing a Hodges-Lehmann robust estimator into the mathematical optimization model;
S103: proposing a weighted Hodges-Lehmann robust estimator and associating the outliers with the weights of the Hodges-Lehmann robust estimator, thereby completing the novel robust estimation that overcomes contaminated data and uncertain events.
Optionally, in step S101, determining the contaminated data and uncertain data information in the preset mathematical optimization model and marking them as outliers includes:
collecting data in the preset mathematical optimization model to obtain a set of observations $S = \{x_1, x_2, x_3, \ldots, x_n\}$. If an observation $x_i$ in S differs greatly from the other observations, $x_i$ is regarded as contaminated or uncertain data and is marked as an outlier; the relationship between $x_i$ and the rest of the set is $x_i \ll S \setminus \{x_i\}$ or $x_i \gg S \setminus \{x_i\}$.
Optionally, in step S102, introducing the Hodges-Lehmann robust estimator into the mathematical optimization model, as the basis for the weighted Hodges-Lehmann estimator, includes:
given a set of observations S, defining the Hodges-Lehmann robust estimator by equations (1) and (2):
$\mathrm{HL} = \operatorname{median}\{H_{ij}\}$ (1)
where
$H_{ij} = \frac{x_i + x_j}{2}$ (2)
Here HL denotes the Hodges-Lehmann robust estimator, and $H_{ij}$ denotes the new sample values formed by averaging pairs of observations.
Optionally, in step S103, proposing the weighted Hodges-Lehmann robust estimator and associating the outliers with the weights of the Hodges-Lehmann robust estimator includes:
S131: proposing a single-term weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the first-class wHL estimator, which is defined as the median of all pairwise weighted averages of the observations and denoted wHL1;
S132: proposing a two-term (binomial) weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the second-class wHL estimator, which is defined as a weighted median taken over the weighted averages and denoted wHL2.
Optionally, in step S131, obtaining the first-class wHL estimator wHL1 from the single-term weighted Hodges-Lehmann robust estimator includes:
given a set of observations $x_1, x_2, x_3, \ldots, x_n$ and weights $w_1, w_2, w_3, \ldots, w_n$ (the constraint on the weights is given in a formula image that is not reproduced here), proposing a single-term weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the first-class wHL estimator. The first-class wHL estimator is defined as the median of all pairwise weighted averages of the observations, denoted wHL1, as in equation (3):
$\mathrm{wHL}_1 = \operatorname{median}\{L_{ij}\}$ (3)
where
$L_{ij} = \frac{w_i x_i + w_j x_j}{w_i + w_j}$ (4)
The wHL1 estimate is calculated for three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$, i.e. all ordered index pairs.
Here $L_{ij}$ denotes the sample values formed by the pairwise weighted averages of the observations, and $w_1, \ldots, w_n$ denote the weights.
Optionally, in step S132, obtaining the second-class wHL estimator wHL2 from the two-term weighted Hodges-Lehmann robust estimator includes:
given a set of observations $x_1, x_2, x_3, \ldots, x_n$ and weights $w_1, w_2, w_3, \ldots, w_n$ (the constraint on the weights is given in a formula image that is not reproduced here), proposing a two-term (binomial) weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the second-class wHL estimator. The second-class wHL estimator takes a weighted median of the weighted averages, denoted wHL2, as in equation (5):
$\mathrm{wHL}_2 = \operatorname{wmedian}\{L_{ij}\}$ (5)
where wmedian denotes a weighted median taken over the pairwise weighted averages $L_{ij}$ of the observations; the exact weights used in equation (5) are given in a formula image that is not reproduced here.
The wHL2 estimate is calculated for the same three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$.
Optionally, after step S103, the method further includes:
obtaining the breakdown points corresponding to the three cases from the calculated wHL1 or wHL2 estimators, and evaluating the merits of wHL1 and wHL2 by means of these breakdown points.
In the embodiment of the invention, the Hodges-Lehmann (HL) estimator is widely used to improve robustness. However, uncertain future events are usually related to weights. Methods based on the traditional expected objective function easily ignore the influence of these weights on uncertain events, so they may fail when uncertain events are associated with outliers. The invention provides a new robust estimation method, wHL (weighted Hodges-Lehmann), and applies it to the mathematical optimization model. The method can handle uncertain future events with data contamination, further improves the estimation accuracy and robustness of the robust estimator in the presence of abnormal data, and can serve as an important means of improving robust estimation. It offers high computational efficiency, high calculation precision, and high robustness.
The embodiment of the invention provides a novel robust estimation method for overcoming contaminated data and uncertain events, which can be implemented by an electronic device; the electronic device can be a terminal or a server. FIG. 2 is a flow chart of this method. As shown in FIG. 2, the processing flow of the method may include the following steps:
S201: determining contaminated data and uncertain data information in a preset mathematical optimization model, and marking them as outliers.
In one possible embodiment, during data and information collection, manual operation errors or other causes introduce data contamination and information uncertainty, i.e. outliers. The presence of outliers affects subsequent data calculation, prediction, and similar tasks.
In a possible embodiment, data are collected in the preset mathematical optimization model to obtain a set of observations $S = \{x_1, x_2, x_3, \ldots, x_n\}$. If an observation $x_i$ in S differs greatly from the other observations, $x_i$ is regarded as contaminated or uncertain data and is marked as an outlier; the relationship between $x_i$ and the rest of the set is $x_i \ll S \setminus \{x_i\}$ or $x_i \gg S \setminus \{x_i\}$.
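The outlier criterion above ($x_i \ll S \setminus \{x_i\}$ or $x_i \gg S \setminus \{x_i\}$) is qualitative. The minimal sketch below assumes one concrete rule, which the original does not specify. The hypothetical helper flag_outliers compares each $x_i$ with the median of the remaining observations. It flags $x_i$ when that distance exceeds k scaled median absolute deviations (MAD). The MAD rule and the threshold k = 3 are illustrative assumptions, not the patented criterion.

```python
import statistics

# 1.4826 rescales the MAD so that it estimates the standard deviation for Gaussian data.
MAD_SCALE = 1.4826


def flag_outliers(observations, k=3.0):
    """Return the indices i whose x_i lies far from the rest of S (S minus {x_i}).

    Hypothetical rule (an assumption, not the source's criterion): x_i is flagged
    when it deviates from the median of the remaining observations by more than
    k scaled median absolute deviations of those remaining observations.
    """
    flagged = []
    for i, x in enumerate(observations):
        rest = observations[:i] + observations[i + 1:]  # S \ {x_i}
        center = statistics.median(rest)
        mad = statistics.median([abs(v - center) for v in rest])
        if mad == 0:
            # All remaining values coincide, so any different value counts as far away.
            if x != center:
                flagged.append(i)
        elif abs(x - center) > k * MAD_SCALE * mad:
            flagged.append(i)
    return flagged


data = [10.1, 9.8, 10.0, 10.2, 9.9, 55.0]
print(flag_outliers(data))  # → [5]
```

The leave-one-out comparison mirrors the "$x_i$ versus $S \setminus \{x_i\}$" wording. The contaminated value therefore cannot inflate the spread estimate that is used to judge it.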
S202: introducing a Hodges-Lehmann robust estimator into the mathematical optimization model.
In one possible implementation, the Hodges-Lehmann (HL) estimator is used: many studies have applied it to improve robustness, and it is a compromise between the median and the mean.
In one possible embodiment, given a set of observations S, the Hodges-Lehmann robust estimator is defined by equations (1) and (2):
$\mathrm{HL} = \operatorname{median}\{H_{ij}\}$ (1)
where
$H_{ij} = \frac{x_i + x_j}{2}$ (2)
Here HL denotes the Hodges-Lehmann robust estimator, and $H_{ij}$ denotes the new sample values formed by averaging pairs of observations.
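The following Python sketch computes the HL estimate of equations (1) and (2) with the standard library. It assumes $H_{ij} = (x_i + x_j)/2$ and the three index cases used throughout this description: i < j, i ≤ j, and all ordered pairs. The function names pair_indices and hodges_lehmann are hypothetical. The example shows one large contaminated value pulling the mean far away while the HL estimates stay near the bulk of the data.

```python
import statistics
from itertools import combinations, combinations_with_replacement, product


def pair_indices(n, case):
    """Index pairs for the three cases: "lt" (i < j), "le" (i <= j), "all" (every (i, j))."""
    if case == "lt":
        return combinations(range(n), 2)
    if case == "le":
        return combinations_with_replacement(range(n), 2)
    if case == "all":
        return product(range(n), repeat=2)
    raise ValueError(f"unknown case: {case!r}")


def hodges_lehmann(xs, case="lt"):
    """Median of the pairwise averages H_ij = (x_i + x_j) / 2 over the chosen pairs."""
    walsh = [(xs[i] + xs[j]) / 2 for i, j in pair_indices(len(xs), case)]
    return statistics.median(walsh)


xs = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 plays the role of a contaminated value
print(statistics.mean(xs))        # → 22.0
print(hodges_lehmann(xs, "lt"))   # → 3.25
print(hodges_lehmann(xs, "le"))   # → 3.0
print(hodges_lehmann(xs, "all"))  # → 3.0
```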
In a feasible implementation, the HL estimator is insensitive to outliers. Methods based on the traditional expected objective function cannot solve the problem of uncertain events associated with outliers, whereas the HL estimator can be applied to a mathematical optimization model to handle uncertain future events with data contamination. However, uncertain events are usually related to weights, and the traditional HL estimator does not take this factor into account. The present invention therefore proposes a new robust estimator, the weighted Hodges-Lehmann (wHL) estimator, to solve this problem.
S203: proposing a single-term weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the first-class wHL estimator, which is defined as the median of all pairwise weighted averages of the observations and denoted wHL1.
In one possible embodiment, a set of observations $x_1, x_2, x_3, \ldots, x_n$ and weights $w_1, w_2, w_3, \ldots, w_n$ are given (the constraint on the weights is given in a formula image that is not reproduced here). A single-term weighted Hodges-Lehmann robust estimator is proposed, and the outliers are associated with its weights to obtain the first-class wHL estimator. The first-class wHL estimator is defined as the median of all pairwise weighted averages of the observations, denoted wHL1, as in equation (3):
$\mathrm{wHL}_1 = \operatorname{median}\{L_{ij}\}$ (3)
where
$L_{ij} = \frac{w_i x_i + w_j x_j}{w_i + w_j}$ (4)
The wHL1 estimate is calculated for three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$, i.e. all ordered index pairs.
Here $L_{ij}$ denotes the sample values formed by the pairwise weighted averages of the observations, and $w_1, \ldots, w_n$ denote the weights.
The three cases are, respectively:
$\mathrm{wHL}_1^{(1)} = \operatorname{median}\{L_{ij} : i < j\}$
$\mathrm{wHL}_1^{(2)} = \operatorname{median}\{L_{ij} : i \le j\}$
$\mathrm{wHL}_1^{(3)} = \operatorname{median}\{L_{ij} : \forall (i, j)\}$
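The following Python sketch computes wHL1 as the median of pairwise weighted averages. It makes two assumptions for illustration: $L_{ij} = (w_i x_i + w_j x_j)/(w_i + w_j)$ and strictly positive weights. The helper names are hypothetical. With equal weights the sketch reduces to the ordinary HL estimate, which is a convenient sanity check.

```python
import statistics
from itertools import combinations, combinations_with_replacement, product


def pair_indices(n, case):
    """Index pairs for the three cases: "lt" (i < j), "le" (i <= j), "all" (every (i, j))."""
    if case == "lt":
        return combinations(range(n), 2)
    if case == "le":
        return combinations_with_replacement(range(n), 2)
    if case == "all":
        return product(range(n), repeat=2)
    raise ValueError(f"unknown case: {case!r}")


def whl1(xs, ws, case="lt"):
    """First-class wHL sketch: median of L_ij = (w_i x_i + w_j x_j) / (w_i + w_j)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    if any(w <= 0 for w in ws):
        raise ValueError("this sketch assumes strictly positive weights")
    pairwise = [
        (ws[i] * xs[i] + ws[j] * xs[j]) / (ws[i] + ws[j])
        for i, j in pair_indices(len(xs), case)
    ]
    return statistics.median(pairwise)


xs = [1.0, 2.0, 3.0, 4.0, 100.0]
print(whl1(xs, [1, 1, 1, 1, 1], "lt"))  # equal weights reduce to HL: 3.25
print(whl1([0.0, 10.0], [3, 1], "lt"))  # single pair (3*0 + 1*10) / 4 = 2.5
```

When i = j, $L_{ii} = x_i$. The "le" and "all" cases therefore include the observations themselves, exactly as the unweighted HL does.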
S204: proposing a two-term (binomial) weighted Hodges-Lehmann robust estimator and associating the outliers with its weights to obtain the second-class wHL estimator, which is defined as a weighted median taken over the weighted averages and denoted wHL2.
In a possible embodiment, a set of observations $x_1, x_2, x_3, \ldots, x_n$ and weights $w_1, w_2, w_3, \ldots, w_n$ are given (the constraint on the weights is given in a formula image that is not reproduced here). A two-term (binomial) weighted Hodges-Lehmann robust estimator is proposed, and the outliers are associated with its weights to obtain the second-class wHL estimator. The second-class wHL estimator takes a weighted median of the weighted averages, denoted wHL2, as in equation (5):
$\mathrm{wHL}_2 = \operatorname{wmedian}\{L_{ij}\}$ (5)
where wmedian denotes a weighted median taken over the pairwise weighted averages $L_{ij}$ of the observations; the exact weights used in equation (5) are given in a formula image that is not reproduced here.
The wHL2 estimate is calculated for three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$.
The wHL2 estimators in the three cases are:
$\mathrm{wHL}_2^{(1)} = \operatorname{wmedian}\{L_{ij} : i < j\}$
$\mathrm{wHL}_2^{(2)} = \operatorname{wmedian}\{L_{ij} : i \le j\}$
$\mathrm{wHL}_2^{(3)} = \operatorname{wmedian}\{L_{ij} : \forall (i, j)\}$
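A possible Python sketch of wHL2 follows. The exact form of equation (5) is not reproduced in this text, so two choices are assumptions for illustration only: the pair weight $w_i w_j$ used in the weighted median, and the lower weighted median (the smallest value whose cumulative weight reaches half the total). The pairwise weighted averages are assumed to be $L_{ij} = (w_i x_i + w_j x_j)/(w_i + w_j)$. All function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement, product


def pair_indices(n, case):
    """Index pairs for the three cases: "lt" (i < j), "le" (i <= j), "all" (every (i, j))."""
    if case == "lt":
        return combinations(range(n), 2)
    if case == "le":
        return combinations_with_replacement(range(n), 2)
    if case == "all":
        return product(range(n), repeat=2)
    raise ValueError(f"unknown case: {case!r}")


def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight reaches half the total."""
    total = sum(weights)
    cumulative = 0
    for k in sorted(range(len(values)), key=lambda idx: values[idx]):
        cumulative += weights[k]
        if 2 * cumulative >= total:  # compare without dividing, to avoid rounding at 1/2
            return values[k]
    raise ValueError("weights must have a positive sum")


def whl2(xs, ws, case="lt"):
    """Second-class wHL sketch: weighted median of the pairwise weighted averages L_ij.

    The pair weight w_i * w_j is a hypothetical choice, not taken from the source.
    """
    values, pair_weights = [], []
    for i, j in pair_indices(len(xs), case):
        values.append((ws[i] * xs[i] + ws[j] * xs[j]) / (ws[i] + ws[j]))
        pair_weights.append(ws[i] * ws[j])
    return weighted_median(values, pair_weights)


xs = [1.0, 2.0, 3.0, 4.0, 100.0]
print(whl2(xs, [1, 1, 1, 1, 1], "lt"))
```

With equal weights and the case i < j, the sketch returns 3.0 for these data. HL averages the two middle values and gives 3.25, so the difference here comes only from the lower-weighted-median convention.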
In a possible implementation, S204 further includes:
obtaining the corresponding breakdown points from the calculated wHL1 and wHL2 estimators, and evaluating wHL1 and wHL2 by means of these breakdown points.
In one possible implementation, breakdown points are used to evaluate the merits of wHL1 and wHL2. Formula derivation gives the breakdown points of the wHL estimators in the three cases, and wHL1 has the same breakdown point as HL in each of the three cases. After a normalization operation is applied to the second-class wHL estimator, its breakdown points in the three cases are calculated. Their upper and lower bounds are then obtained by considering the best case and the worst case.
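The statement that HL and wHL1 share the same breakdown point in each case can be checked numerically. The Python sketch below assumes the finite-sample replacement breakdown point, with three further assumptions:
- m observations are replaced by arbitrarily large values;
- every pairwise average that involves a replaced observation becomes unbounded, which holds for $H_{ij}$ and, with positive weights, for $L_{ij}$;
- a median of N values breaks down once at least $\lceil N/2 \rceil$ of them are unbounded.

Function names are hypothetical. wHL2 is not covered, because its breakdown point depends on the weights.

```python
def pair_counts(n, m, case):
    """(total pairs, pairs untouched by m replaced observations) for each index case."""
    c = n - m  # observations that were not replaced
    if case == "lt":   # (1) i < j
        return n * (n - 1) // 2, c * (c - 1) // 2
    if case == "le":   # (2) i <= j
        return n * (n + 1) // 2, c * (c + 1) // 2
    if case == "all":  # (3) all ordered pairs (i, j)
        return n * n, c * c
    raise ValueError(f"unknown case: {case!r}")


def hl_breakdown_count(n, case):
    """Smallest number m of replaced observations that can break down HL (and wHL1)."""
    total, _ = pair_counts(n, 0, case)
    if total == 0:
        raise ValueError("need at least one pair")
    for m in range(n + 1):
        _, clean = pair_counts(n, m, case)
        # The median of `total` values is unbounded once ceil(total / 2) of them are.
        if total - clean >= (total + 1) // 2:
            return m
    raise AssertionError("unreachable: m = n corrupts every pair")


def median_breakdown_count(n):
    """Same quantity for the ordinary sample median of n observations."""
    return (n + 1) // 2


for n in (5, 10, 20):
    ratios = [hl_breakdown_count(n, c) / n for c in ("lt", "le", "all")]
    print(n, median_breakdown_count(n) / n, ratios)
```

For n = 1000 and the case i < j, the sketch gives m = 293, a breakdown point of about 0.29. This is consistent with the known asymptotic value $1 - 1/\sqrt{2}$ for the HL estimator, while the sample median stays at about 0.5.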
FIGS. 3a-3c compare the breakdown points of different robust estimators: in each of the three cases, the breakdown points of the second-class wHL estimator are compared with those of traditional methods. For a given sample size, the median, HL, and first-class wHL estimators have fixed breakdown points. The weighted median and the second-class wHL estimator have breakdown points that differ from those of the preceding estimators. As shown in FIGS. 3a-3c, when the sample size increases from 1 to 20, the second-class wHL estimator has a more stable breakdown-point range than the weighted median. Moreover, the lower bound of the breakdown point of the second-class wHL estimator is higher than that of the other methods. Since contaminated data usually make up only a small part of the overall data, this supports the robustness of the newly proposed wHL estimator.
In the embodiment of the invention, the method applies the weighted Hodges-Lehmann (wHL) estimator to mathematical optimization. It addresses the data contamination and uncertain information that arise in statistical data applications, and it can handle uncertain future events involving data contamination.
Two types of weighted Hodges-Lehmann estimators are presented. The first-class wHL estimator (wHL1) is the median of all pairwise weighted averages of the observations, and the second-class wHL estimator (wHL2) is the weighted median based on the weighted averages. The wHL estimates are considered in three cases: (1) $i < j$, (2) $i \le j$, (3) $\forall (i, j)$.
FIG. 4 is a block diagram illustrating a novel robust estimation apparatus for overcoming contaminated data and uncertain events, according to an exemplary embodiment. The apparatus is suitable for carrying out any of the above methods. Referring to FIG. 4, the apparatus 300 includes:
a data acquisition module 310, configured to determine contaminated data and uncertain data information in a preset mathematical optimization model and mark them as outliers;
a robust estimator introduction module 320, configured to introduce a Hodges-Lehmann robust estimator into the mathematical optimization model; and
a weighted estimation module 330, configured to propose a weighted Hodges-Lehmann robust estimator and associate the outliers with the weights of the Hodges-Lehmann robust estimator, thereby completing the novel robust estimation that overcomes contaminated data and uncertain events.
Optionally, the data obtaining module 310 is further configured to obtain, by collecting data in a preset mathematical optimization model, given a set S of a plurality of observations, S = { x = { n } { 1 ,x 2 ,x 3 ,…x n H, if there is an observed value x in the set S i If the difference from other observed values is large, the observed value x i As pollution data orUncertain data, marked as outliers; wherein x is i The phase difference relationship with the set S is: x is the number of i <<S/{x i H or x i >>S/{x i }。
Optionally, the robust estimator introducing module 320 is further configured to define the hodgkin-leihmann robust estimator as the following equation (1) given a set of observation values S:
$$HL = \operatorname{median}_{(i,j)} H_{ij} \tag{1}$$
where
$$H_{ij} = \frac{x_i + x_j}{2} \tag{2}$$
where HL denotes the Hodges-Lehmann robust estimate, and H_ij denotes the new sample values formed by adding pairs of observations and halving the sum.
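The Hodges-Lehmann estimate can be computed directly from its definition as the median of the pairwise averages H_ij. This is a minimal sketch: the function name `hodges_lehmann` is hypothetical, and the `case` argument assumes that case (3) means all ordered pairs.

```python
import statistics
from itertools import product


def hodges_lehmann(x, case=1):
    """Hodges-Lehmann estimate: median of H_ij = (x_i + x_j) / 2.

    case 1 uses i < j, case 2 uses i <= j, case 3 uses all ordered pairs.
    """
    if case not in (1, 2, 3):
        raise ValueError("case must be 1, 2 or 3")
    keep = {1: lambda i, j: i < j, 2: lambda i, j: i <= j, 3: lambda i, j: True}[case]
    n = len(x)
    walsh = [(x[i] + x[j]) / 2 for i, j in product(range(n), repeat=2) if keep(i, j)]
    if not walsh:
        raise ValueError("not enough observations for this case")
    return statistics.median(walsh)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(statistics.mean(data))                          # 22.0, pulled by the outlier
    print([hodges_lehmann(data, c) for c in (1, 2, 3)])   # [3.25, 3.0, 3.0]
```

The single extreme value drags the mean to 22.0, while all three Hodges-Lehmann variants stay near the bulk of the data.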
Optionally, the weighted estimation module 330 is further configured to propose a single-term weighted Hodges-Lehmann robust estimator and associate the abnormal values with its weights to obtain a first-class wHL estimator, defined as the median of all pairwise weighted averages of the observations and denoted wHL1;
and to propose a binomial weighted Hodges-Lehmann robust estimator and associate the abnormal values with its weights to obtain a second-class wHL estimator, defined as a weighted median taken over the weighted averages and denoted wHL2.
Optionally, the weighted estimation module 330 is further configured, given observations x_1, x_2, x_3, …, x_n and weights w_1, w_2, w_3, …, w_n [condition on the weights shown as a figure in the original], to propose a single-term weighted Hodges-Lehmann robust estimator and associate the abnormal values with its weights to obtain the first-class wHL estimator. The first-class wHL estimator is defined as the median of all pairwise weighted averages of the observations, denoted wHL1, as in equation (3):
$$wHL_1 = \operatorname{median}_{(i,j)} L_{ij} \tag{3}$$
where
$$L_{ij} = \frac{w_i x_i + w_j x_j}{w_i + w_j} \tag{4}$$
The wHL1 estimate is calculated for three cases: (1) i<j, (2) i≤j, and (3) all index pairs (i, j). Here L_ij denotes the sample values formed by combining pairs of weighted observations, and w_1, …, w_n denote the weights.
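The first-class estimator can be sketched as follows. The sketch assumes strictly positive weights and takes L_ij to be the ordinary weighted average of x_i and x_j, (w_i x_i + w_j x_j) / (w_i + w_j). The name `whl1` and the reading of case (3) as all ordered pairs are also assumptions. With equal weights, the estimate reduces to the plain Hodges-Lehmann estimate.

```python
import statistics
from itertools import product


def whl1(x, w, case=1):
    """First-class weighted HL estimate: median of L_ij over the chosen pairs.

    L_ij = (w_i x_i + w_j x_j) / (w_i + w_j); weights are assumed > 0.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(wi <= 0 for wi in w):
        raise ValueError("this sketch assumes strictly positive weights")
    keep = {1: lambda i, j: i < j, 2: lambda i, j: i <= j, 3: lambda i, j: True}
    if case not in keep:
        raise ValueError("case must be 1, 2 or 3")
    n = len(x)
    l_ij = [
        (w[i] * x[i] + w[j] * x[j]) / (w[i] + w[j])   # pairwise weighted average
        for i, j in product(range(n), repeat=2)
        if keep[case](i, j)
    ]
    if not l_ij:
        raise ValueError("not enough observations for this case")
    return statistics.median(l_ij)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(whl1(data, [1, 1, 1, 1, 1]))     # 3.25, same as the unweighted HL (case 1)
    print(whl1(data, [1, 1, 1, 1, 0.01]))  # about 2.735 with the last point down-weighted
```

Down-weighting the flagged point pulls each pair average that involves it toward the partner observation, which is one way an abnormal value could be "associated with" a weight.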
Optionally, the weighted estimation module 330 is further configured, given observations x_1, x_2, x_3, …, x_n and weights w_1, w_2, w_3, …, w_n [condition on the weights shown as a figure in the original], to propose a binomial weighted Hodges-Lehmann robust estimator and associate the abnormal values with its weights to obtain the second-class wHL estimator. The second-class wHL estimator is defined as the weighted median taken over the weighted averages, denoted wHL2, as in equation (5):
[Equation (5) is shown as a figure in the original.]
The wHL2 estimate is calculated for three cases: (1) i<j, (2) i≤j, and (3) all index pairs (i, j).
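The exact form of equation (5) cannot be recovered from this text, so the following is only one plausible reading. The pairwise weighted averages L_ij are combined by a weighted median in which pair (i, j) carries the hypothetical weight w_i·w_j. That pair weight, the "lower weighted median" convention (the first sorted value at which the cumulative weight reaches half the total), and the names `weighted_median` and `whl2` are all assumptions of this sketch.

```python
from itertools import product


def weighted_median(values, weights):
    """Lower weighted median: the first value, in sorted order, at which the
    cumulative weight reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(wt < 0 for wt in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    total = sum(weights)
    cumulative = 0.0
    for k in sorted(range(len(values)), key=values.__getitem__):
        cumulative += weights[k]
        if cumulative >= total / 2:
            return values[k]


def whl2(x, w, case=1):
    """Hypothetical second-class wHL: weighted median of L_ij with pair weight w_i * w_j."""
    if len(x) != len(w) or any(wi <= 0 for wi in w):
        raise ValueError("need equal-length inputs and strictly positive weights")
    keep = {1: lambda i, j: i < j, 2: lambda i, j: i <= j, 3: lambda i, j: True}
    if case not in keep:
        raise ValueError("case must be 1, 2 or 3")
    values, pair_weights = [], []
    for i, j in product(range(len(x)), repeat=2):
        if keep[case](i, j):
            values.append((w[i] * x[i] + w[j] * x[j]) / (w[i] + w[j]))  # L_ij
            pair_weights.append(w[i] * w[j])                            # assumed pair weight
    return weighted_median(values, pair_weights)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(whl2(data, [1, 1, 1, 1, 1]))     # 3.0 (lower median of the ten pair averages)
    print(whl2(data, [1, 1, 1, 1, 0.01]))  # 2.5
```

Unlike the first-class sketch, the weights here act twice: once inside each L_ij and once in deciding how much each L_ij counts toward the median.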
Optionally, the apparatus further comprises an effect comparison module 340, which obtains the corresponding breakdown points of the wHL1 or wHL2 estimator calculated in the three cases and uses these breakdown points to evaluate the relative merits of wHL1 and wHL2.
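If the points used by the effect comparison module are read as breakdown points (an interpretation of the translated term), they can be probed empirically. The finite-sample replacement breakdown point is the smallest fraction of observations that, when replaced, can carry the estimate arbitrarily far. The sketch below replaces the first m observations with a very large value and reports the first m/n at which the estimate moves by more than a tolerance. The contamination value `1e12`, the tolerance `1e6`, and the helper names are assumptions. A probe of this kind only approximates the formal definition, which takes the worst case over all replacements. It uses the equal-weight estimator, which coincides with wHL1 under equal weights.

```python
import statistics
from itertools import product


def hl(x, case=1):
    """Plain Hodges-Lehmann estimate (wHL1 with equal weights) for the three cases."""
    keep = {1: lambda i, j: i < j, 2: lambda i, j: i <= j, 3: lambda i, j: True}[case]
    n = len(x)
    return statistics.median(
        (x[i] + x[j]) / 2 for i, j in product(range(n), repeat=2) if keep(i, j)
    )


def empirical_breakdown(estimator, x, big=1e12, tol=1e6):
    """Smallest fraction m/n of replaced points that moves the estimate by more than tol."""
    n = len(x)
    base = estimator(x)
    for m in range(1, n + 1):
        contaminated = [big] * m + list(x[m:])   # replace the first m observations
        if abs(estimator(contaminated) - base) > tol:
            return m / n
    return 1.0


if __name__ == "__main__":
    clean = [float(v) for v in range(1, 21)]     # n = 20
    for case in (1, 2, 3):
        print(case, empirical_breakdown(lambda s: hl(s, case), clean))  # 0.3 for each case
```

With the same probe and n = 20, the sample median gives 0.5 and the mean gives 0.05. The value 0.3 for the Hodges-Lehmann variants is consistent with the known asymptotic Hodges-Lehmann breakdown point of 1 - 1/√2 ≈ 0.293.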
Fig. 5 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present invention. The electronic device 400 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 401 and one or more memories 402. The memory 402 stores at least one instruction, which is loaded and executed by the processor 401 to implement the following steps of the novel robust estimation method for overcoming pollution data and uncertain events:
S1: determining pollution data and uncertain data information in a preset mathematical optimization model, and marking the pollution data and uncertain data information as abnormal values;
S2: introducing a Hodges-Lehmann robust estimator into the mathematical optimization model;
S3: providing a weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights, thereby completing the novel robust estimation that overcomes pollution data and uncertain events.
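Steps S1 to S3 can be chained as in the sketch below. This passage does not specify how an abnormal value is "associated with" a weight. For illustration only, the sketch assumes that flagged observations receive a reduced weight (the hypothetical `outlier_weight=0.1`) and all others receive weight 1. It also uses an assumed MAD-based flag for S1 and the case-(1) first-class estimator with L_ij = (w_i x_i + w_j x_j) / (w_i + w_j) for S3. Every function name here is hypothetical.

```python
import statistics
from itertools import combinations


def flag_outliers(x, k=3.0):
    """S1 (assumed rule): flag x_i if it is far from the median of S \\ {x_i},
    measured in units of the scaled MAD of S \\ {x_i}."""
    flags = []
    for i, xi in enumerate(x):
        rest = x[:i] + x[i + 1:]
        med = statistics.median(rest)
        scale = 1.4826 * statistics.median(abs(r - med) for r in rest)
        flags.append(xi != med if scale == 0 else abs(xi - med) > k * scale)
    return flags


def whl1_case1(x, w):
    """S3: median of L_ij = (w_i x_i + w_j x_j) / (w_i + w_j) over pairs i < j."""
    return statistics.median(
        (w[i] * x[i] + w[j] * x[j]) / (w[i] + w[j])
        for i, j in combinations(range(len(x)), 2)
    )


def robust_estimate(x, outlier_weight=0.1):
    """S1 to S3 chained: flag abnormal values, map flags to weights, then apply wHL1."""
    x = list(x)
    if len(x) < 2:
        raise ValueError("need at least two observations")
    flags = flag_outliers(x)
    weights = [outlier_weight if f else 1.0 for f in flags]  # assumed flag-to-weight map
    return whl1_case1(x, weights), flags


if __name__ == "__main__":
    data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]
    estimate, flags = robust_estimate(data)
    print(flags)     # [False, False, False, False, False, True]
    print(estimate)  # approximately 10.05
```

Returning the flags together with the estimate keeps the S1 decision visible, so the weight assignment can be audited or replaced independently of the estimator.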
In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided. It contains instructions executable by a processor in a terminal to perform the above-described novel robust estimation method for overcoming pollution data and uncertain events. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The above description covers only preferred embodiments of the present invention and is not to be construed as limiting the invention. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention are intended to fall within its scope of protection.

Claims (7)

1. A novel robust estimation method for overcoming pollution data and uncertain events, characterized by comprising the following steps:
S1: determining pollution data and uncertain data information in a preset mathematical optimization model, and marking the pollution data and uncertain data information as abnormal values;
S2: introducing a Hodges-Lehmann robust estimator into the mathematical optimization model;
S3: providing a weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights, thereby completing the novel robust estimation that overcomes pollution data and uncertain events;
wherein, in step S3, providing a weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights comprises:
S31: proposing a single-term weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain a first-class wHL estimator, the first-class wHL estimator being defined as the median of all pairwise weighted averages of the observations and denoted wHL1;
S32: proposing a binomial weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain a second-class wHL estimator, the second-class wHL estimator being defined as a weighted median taken over the weighted averages and denoted wHL2;
wherein, in step S31, proposing a single-term weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain the first-class wHL estimator, defined as the median of all pairwise weighted averages of the observations and denoted wHL1, comprises:
given observations x_1, x_2, x_3, …, x_n and weights w_1, w_2, w_3, …, w_n [condition on the weights shown as a figure in the original], proposing a single-term weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain the first-class wHL estimator, the first-class wHL estimator being defined as the median of all pairwise weighted averages of the observations, denoted wHL1, as in the following equation (3):
$$wHL_1 = \operatorname{median}_{(i,j)} L_{ij} \tag{3}$$
where
$$L_{ij} = \frac{w_i x_i + w_j x_j}{w_i + w_j} \tag{4}$$
the wHL1 estimate being calculated for three cases, namely (1) i<j, (2) i≤j, and (3) all index pairs (i, j), wherein L_ij denotes the sample values formed by combining pairs of weighted observations, and w_1, …, w_n denote the weights;
wherein, in step S32, proposing a binomial weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain the second-class wHL estimator, defined as a weighted median taken over the weighted averages and denoted wHL2, comprises:
given observations x_1, x_2, x_3, …, x_n and weights w_1, w_2, w_3, …, w_n [condition on the weights shown as a figure in the original], proposing a binomial weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain the second-class wHL estimator, the second-class wHL estimator being defined as the weighted median taken over the weighted averages, denoted wHL2, as in the following equation (5):
[Equation (5) is shown as a figure in the original.]
the wHL2 estimate being calculated for three cases, namely (1) i<j, (2) i≤j, and (3) all index pairs (i, j).
2. The method according to claim 1, wherein in step S1, determining pollution data and uncertain data information in a preset mathematical optimization model, and marking the pollution data and uncertain data information as abnormal values comprises:
collecting data in the preset mathematical optimization model to obtain a set S of observations, S = {x_1, x_2, x_3, …, x_n}; if an observation x_i in S differs greatly from the other observations, marking x_i as pollution data or uncertain data, that is, as an abnormal value, where the relation between x_i and the set S is x_i ≪ S∖{x_i} or x_i ≫ S∖{x_i}.
3. The method according to claim 1, wherein, in step S2, introducing the Hodges-Lehmann robust estimator into the mathematical optimization model and proposing a weighted Hodges-Lehmann estimator comprises:
given a set of observations S, defining the Hodges-Lehmann robust estimator as the following equation (1):
$$HL = \operatorname{median}_{(i,j)} H_{ij} \tag{1}$$
where
$$H_{ij} = \frac{x_i + x_j}{2} \tag{2}$$
where HL denotes the Hodges-Lehmann robust estimate, and H_ij denotes the new sample values formed by adding pairs of sample values and halving the sum.
4. The method according to claim 1, further comprising, after step S3:
obtaining the corresponding breakdown points from the wHL1 or wHL2 estimator calculated in the three cases, and evaluating the relative merits of wHL1 and wHL2 by means of these breakdown points.
5. A novel robust estimation device for overcoming pollution data and uncertain events, characterized in that it is adapted to perform the method of any one of claims 1 to 4, the device comprising:
the data acquisition module is used for determining pollution data and uncertain data information in a preset mathematical optimization model and marking the pollution data and the uncertain data information as abnormal values;
the robust estimator introducing module is used for introducing the Hodges-Lehmann robust estimator into the mathematical optimization model;
the weighted estimation module is used for providing a weighted Hodges-Lehmann robust estimator, associating the abnormal values with its weights, and thereby completing the novel robust estimation that overcomes information uncertainty and data pollution;
the weighted estimation module is further used for proposing a single-term weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain a first-class wHL estimator, the first-class wHL estimator being defined as the median of all pairwise weighted averages of the observations and denoted wHL1;
and for proposing a binomial weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain a second-class wHL estimator, the second-class wHL estimator being defined as a weighted median taken over the weighted averages and denoted wHL2;
the weighted estimation module is further used, given observations x_1, x_2, x_3, …, x_n and weights w_1, w_2, w_3, …, w_n [condition on the weights shown as a figure in the original], for proposing a single-term weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain the first-class wHL estimator, the first-class wHL estimator being defined as the median of all pairwise weighted averages of the observations, denoted wHL1, as in the following equation (3):
$$wHL_1 = \operatorname{median}_{(i,j)} L_{ij} \tag{3}$$
where
$$L_{ij} = \frac{w_i x_i + w_j x_j}{w_i + w_j} \tag{4}$$
the wHL1 estimate being calculated for three cases, namely (1) i<j, (2) i≤j, and (3) all index pairs (i, j), wherein L_ij denotes the sample values formed by combining pairs of weighted observations, and w_1, …, w_n denote the weights;
the weighted estimation module is further used, given observations x_1, x_2, x_3, …, x_n and weights w_1, w_2, w_3, …, w_n [condition on the weights shown as a figure in the original], for proposing a binomial weighted Hodges-Lehmann robust estimator and associating the abnormal values with its weights to obtain the second-class wHL estimator, the second-class wHL estimator being defined as the weighted median taken over the weighted averages, denoted wHL2, as in the following equation (5):
[Equation (5) is shown as a figure in the original.]
the wHL2 estimate being calculated for three cases, namely (1) i<j, (2) i≤j, and (3) all index pairs (i, j).
6. The device according to claim 5, wherein the data acquisition module is further configured to collect data in the preset mathematical optimization model to obtain a set S of observations, S = {x_1, x_2, x_3, …, x_n}; if an observation x_i in S differs greatly from the other observations, x_i is marked as pollution data or uncertain data, that is, as an abnormal value, where the relation between x_i and the set S is x_i ≪ S∖{x_i} or x_i ≫ S∖{x_i}.
7. The device according to claim 6, wherein the robust estimator introducing module is further configured, given a set of observations S, to define the Hodges-Lehmann robust estimator as the following equation (1):
$$HL = \operatorname{median}_{(i,j)} H_{ij} \tag{1}$$
where
$$H_{ij} = \frac{x_i + x_j}{2} \tag{2}$$
where HL denotes the Hodges-Lehmann robust estimate, and H_ij denotes the new sample values formed by adding pairs of sample values and halving the sum.
CN202210524867.1A 2022-05-13 2022-05-13 Novel robust estimation method and device for overcoming pollution data and uncertain events Active CN114925321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524867.1A CN114925321B (en) 2022-05-13 2022-05-13 Novel robust estimation method and device for overcoming pollution data and uncertain events

Publications (2)

Publication Number Publication Date
CN114925321A CN114925321A (en) 2022-08-19
CN114925321B true CN114925321B (en) 2022-12-06

Family

ID=82808233

Country Status (1)

Country Link
CN (1) CN114925321B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336904A (en) * 2013-07-08 2013-10-02 国家电网公司 Robust state estimation method based on piecewise linearity weight factor function
CN104615875A (en) * 2015-01-27 2015-05-13 中国林业科学研究院资源信息研究所 Stable regression method for remote sensing individual tree canopy and forest diameter
CN112182483A (en) * 2020-08-14 2021-01-05 中国电力科学研究院有限公司 Solar radiation prediction method and device based on air quality index

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN105718632B (en) * 2016-01-14 2018-10-19 河海大学 A kind of uncertainty underground water repairs multiple-objection optimization management method
CN107315918B (en) * 2017-07-06 2020-05-01 青岛大学 Method for improving steady estimation by using noise
CN107766607B (en) * 2017-09-04 2021-02-05 电子科技大学 Robust design method for transmitting and receiving extended target detection
JP6889096B2 (en) * 2017-12-12 2021-06-18 株式会社東芝 Learning model manufacturing method, pollution density calculation method and pollution density calculation device
CN111651708B (en) * 2020-05-29 2022-05-20 四川大学 Early warning threshold setting method for abnormal identification of dam safety monitoring data
CN113030007B (en) * 2021-02-10 2023-04-18 河南中烟工业有限责任公司 Method for rapidly testing quality stability of tobacco essence based on similarity learning algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant