CN111259318A - Intelligent data optimization method and device and computer readable storage medium - Google Patents


Info

Publication number
CN111259318A
Authority
CN
China
Prior art keywords
data set
data
cost
value
optimization
Legal status
Pending
Application number
CN202010068234.5A
Other languages
Chinese (zh)
Inventor
王海平
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010068234.5A
Publication of CN111259318A
Priority to PCT/CN2020/098964 (WO2021143055A1)
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13 Differential equations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent data optimization method comprising the following steps: receiving a data optimization instruction input by a user; extracting an original data set from a big data storage platform; performing exception removal processing on the original data set to obtain a standard data set; performing grey prediction on the standard data set to obtain a statistical information set; calculating cost values of the statistical information set to obtain a cost data set; eliminating data greater than or equal to a preset cost threshold from the cost data set to obtain an optimized cost data set; performing a data range modification operation on the optimized cost data set to obtain an optimal data set; and storing the optimal data set in the big data storage platform to complete the data optimization operation. The invention also provides an intelligent data optimization device and a computer readable storage medium. The invention can realize efficient intelligent data optimization.

Description

Intelligent data optimization method and device and computer readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an intelligent data optimization method, apparatus, and computer-readable storage medium.
Background
At present, data optimization relies mostly on manual experience and on big data optimization models such as Hadoop. Manual experience makes it difficult to build up scientifically optimized models suited to a company's data conditions; in other words, there is no automatic optimization mechanism or optimization model that would let developers complete data optimization more quickly. Big data optimization models, meanwhile, place high demands on hardware because they need very strong capabilities for data expansion and unstructured data support. A cost-effective data optimization method is therefore urgently needed.
Disclosure of Invention
The invention provides an intelligent data optimization method, an intelligent data optimization device and a computer readable storage medium, and mainly aims to perform intelligent data optimization according to user optimization requirements.
In order to achieve the above object, the present invention provides an intelligent data optimization method, which includes:
receiving a data optimization instruction input by a user, extracting an original data set from a big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set;
performing grey prediction on the standard data set to obtain a statistical information set;
calculating the cost value of the statistical information set to obtain a cost data set;
eliminating data which are greater than or equal to a preset cost threshold value in the cost data set to obtain an optimized cost data set;
and carrying out data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set into the big data storage platform to complete data optimization operation.
Optionally, the exception removing processing includes bilateral test rejection processing and unilateral test rejection processing, and the unilateral test rejection processing includes minimum test rejection processing and maximum test rejection processing;
the bilateral test rejection processing is calculated as follows:
[Formula image: bilateral test rejection statistic $G_1$]
where $i$ is a positive integer, $\bar{Y}$ represents the mean of the original data set, $S$ represents the standard deviation of the original data set, $Y_i$ represents the data in the original data set, and $G_1$ is the value of the bilateral test rejection processing.
The minimum test rejection processing is calculated as follows:
[Formula image: minimum test rejection statistic $G_2$]
where $G_2$ is the value of the minimum test rejection processing;
the maximum test rejection processing is calculated as follows:
[Formula image: maximum test rejection statistic $G_3$]
where $G_3$ is the value of the maximum test rejection processing.
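The formula images for $G_1$, $G_2$ and $G_3$ are not reproduced in this text. The roles described for $\bar{Y}$, $S$ and $Y_i$ match the classical Grubbs outlier test, so, as an interpretation rather than the patent's own figures, the three statistics would likely take the standard Grubbs form below, with a point rejected when its statistic exceeds the tabulated critical value $G(\alpha, n)$:

```latex
G_1 = \frac{\max_i \lvert Y_i - \bar{Y} \rvert}{S}, \qquad
G_2 = \frac{\bar{Y} - Y_{\min}}{S}, \qquad
G_3 = \frac{Y_{\max} - \bar{Y}}{S}
```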
Optionally, performing grey prediction on the standard data set to obtain a statistical information set includes:
counting historical data of the standard data set according to a sampling statistical method to obtain a historical data set;
adding the historical data set and the standard data set to obtain a total data set;
and establishing a differential equation according to the total data set, and solving the differential equation to obtain a statistical information set.
Optionally, the differential equation is:
[Formula image: differential equation of the total data set]
where $X^{(2)}$ represents the total data set, $s$ is the data number of the total data set, $a$ is a constraint factor of the differential equation, and $u$ is a target value of the differential equation.
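The variables listed ($X^{(2)}$, $s$, $a$, $u$) correspond to the whitening equation of the standard GM(1,1) grey model, where $a$ is usually called the development coefficient and $u$ the grey input. Assuming that standard form (the patent's own equation image is not reproduced here), the equation and its usual discrete time-response solution would read as follows; in the standard model the equation is applied to the first-order accumulated sequence, whose first term equals the first raw observation:

```latex
\frac{dX^{(2)}}{ds} + a\,X^{(2)} = u,
\qquad
\hat{X}^{(2)}(k+1) = \left( X^{(2)}(1) - \frac{u}{a} \right) e^{-ak} + \frac{u}{a}
```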
Optionally, calculating the cost value of the statistical information set to obtain a cost data set includes:
generating the full permutation of the statistical information set to obtain a plurality of full permutation values;
calculating the cost value of each full permutation value according to a pre-constructed cost function;
and selecting the permutation data set corresponding to the full permutation value with the minimum cost value to obtain the cost data set.
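Because the full permutation of a set of $n$ items has $n!$ members, the number of candidates to score grows factorially, so in practice the statistical information set passed to this step has to stay small. For reference:

```latex
5! = 120, \qquad 10! = 3\,628\,800, \qquad 12! = 479\,001\,600
```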
In addition, in order to achieve the above object, the present invention further provides an intelligent data optimization device, which includes a memory and a processor, wherein the memory stores an intelligent data optimization program that can run on the processor, and the intelligent data optimization program, when executed by the processor, implements the following steps:
receiving a data optimization instruction input by a user, extracting an original data set from a big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set;
performing grey prediction on the standard data set to obtain a statistical information set;
calculating the cost value of the statistical information set to obtain a cost data set;
eliminating data which are greater than or equal to a preset cost threshold value in the cost data set to obtain an optimized cost data set;
and carrying out data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set into the big data storage platform to complete data optimization operation.
Optionally, the exception removing processing includes bilateral test rejection processing and unilateral test rejection processing, and the unilateral test rejection processing includes minimum test rejection processing and maximum test rejection processing;
the bilateral test rejection processing is calculated as follows:
[Formula image: bilateral test rejection statistic $G_1$]
where $i$ is a positive integer, $\bar{Y}$ represents the mean of the original data set, $S$ represents the standard deviation of the original data set, $Y_i$ represents the data in the original data set, and $G_1$ is the value of the bilateral test rejection processing.
The minimum test rejection processing is calculated as follows:
[Formula image: minimum test rejection statistic $G_2$]
where $G_2$ is the value of the minimum test rejection processing;
the maximum test rejection processing is calculated as follows:
[Formula image: maximum test rejection statistic $G_3$]
where $G_3$ is the value of the maximum test rejection processing.
Optionally, performing grey prediction on the standard data set to obtain a statistical information set includes:
counting historical data of the standard data set according to a sampling statistical method to obtain a historical data set;
adding the historical data set and the standard data set to obtain a total data set;
and establishing a differential equation according to the total data set, and solving the differential equation to obtain a statistical information set.
Optionally, calculating the cost value of the statistical information set to obtain a cost data set includes:
generating the full permutation of the statistical information set to obtain a plurality of full permutation values;
calculating the cost value of each full permutation value according to a pre-constructed cost function;
and selecting the permutation data set corresponding to the full permutation value with the minimum cost value to obtain the cost data set.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, on which an intelligent data optimization program is stored, the intelligent data optimization program being executable by one or more processors to implement the steps of the intelligent data optimization method as described above.
According to the invention, the statistical information set is obtained through grey prediction, the cost value of the data set is calculated to obtain the cost data set, and the optimal data set is obtained through data range modification operation. Therefore, the intelligent data optimization method, the intelligent data optimization device and the computer readable storage medium can realize an efficient data optimization function.
Drawings
Fig. 1 is a schematic flow chart of an intelligent data optimization method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an internal structure of an intelligent data optimization apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram illustrating an intelligent data optimization program in the intelligent data optimization apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent data optimization method. Fig. 1 is a schematic flow chart of an intelligent data optimization method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the intelligent data optimization method includes:
S1: receiving a data optimization instruction input by a user, extracting an original data set from the big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set.
The big data storage platform is a framework or platform that stores and processes large amounts of data, such as MapReduce, Hive, or Spark.
The original data set is the data set to be optimized, for example life insurance data input by a user. Because such data differ in specification, data volume, final application, and so on, they call for different storage modes and data calculation modes, which is why data optimization is needed.
The exception removal processing removes missing, repeated, and other abnormal data from the original data set to obtain standard data. It comprises bilateral test rejection and unilateral test rejection, and the unilateral test rejection comprises minimum test rejection and maximum test rejection. The bilateral test rejection formula is:
[Formula image: bilateral test rejection statistic]
where $i$ is a positive integer, $\bar{Y}$ represents the mean of the original data set, $S$ represents the standard deviation of the original data set, and $Y_i$ represents the data in the original data set.
The formula of the minimum test rejection is:
[Formula image: minimum test rejection statistic]
The formula of the maximum test rejection is:
[Formula image: maximum test rejection statistic]
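A minimal Python sketch of the exception removal step, assuming the bilateral test is the classical two-sided Grubbs test and that "missing" and "repeated" data mean records with no value or a duplicated record key. The record layout, the function names and the single fixed `critical_value` are hypothetical simplifications; a faithful Grubbs procedure would look up the critical value for each sample size $n$ and significance level.

```python
import math
from statistics import mean, stdev


def grubbs_statistics(values):
    """Return (G1, G2, G3): two-sided, minimum-side and maximum-side statistics."""
    y_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return 0.0, 0.0, 0.0
    g1 = max(abs(y - y_bar) for y in values) / s
    g2 = (y_bar - min(values)) / s
    g3 = (max(values) - y_bar) / s
    return g1, g2, g3


def remove_exceptions(records, critical_value):
    """Drop missing and duplicate records, then apply the two-sided test repeatedly.

    `records` is a list of (record_key, value) pairs (hypothetical layout).
    """
    seen, values = set(), []
    for key, value in records:
        if value is None or math.isnan(value) or key in seen:
            continue  # missing value or repeated record
        seen.add(key)
        values.append(value)
    # Reject the most extreme point while G1 exceeds the critical value.
    while len(values) > 2:
        g1, _, _ = grubbs_statistics(values)
        if g1 <= critical_value:
            break
        y_bar = mean(values)
        values.remove(max(values, key=lambda y: abs(y - y_bar)))
    return values


records = [("a", 10.0), ("b", 10.2), ("c", 9.9), ("d", 10.1),
           ("e", None), ("b", 10.2), ("f", 25.0)]
standard = remove_exceptions(records, critical_value=1.715)  # illustrative value
```

Here the missing record `e` and the repeated record `b` are dropped first, and the value 25.0 is then rejected by the two-sided test; the one-sided statistics $G_2$ and $G_3$ would be used the same way when only low or only high outliers are of concern.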
S2: performing grey prediction on the standard data set to obtain a statistical information set.
Preferably, the grey prediction evaluates the concurrency and the optimal resource configuration for the standard data set based on the current running condition and resource usage of the standard data set and of historical data tasks, such as utilization and running time for CPU, memory, disk, and network I/O, to obtain the statistical information set.
Further, S2 includes: counting historical data of the standard data set by a sampling statistical method to obtain a historical data set, adding the historical data set to the standard data set to obtain a total data set, establishing a differential equation from the total data set, and solving the differential equation to obtain the statistical information set.
In detail, the differential equation is established as follows. The standard data set is
$X^{(0)} = \{X^{(0)}(i),\ i = 1, 2, 3, \dots, n\}$
and the historical data set, which records the data volume of the standard data set, is
$X^{(1)} = \{X^{(1)}(k),\ k = 1, 2, 3, \dots, t\}$
The total data set $X^{(2)}(k)$ is:
[Formula image: definition of the total data set $X^{(2)}(k)$]
For the total data set $X^{(2)}(k)$, the following differential equation is established:
[Formula image: differential equation in $X^{(2)}$, $s$, $a$ and $u$]
where $s$ is the data number of the total data set, $a$ is a constraint factor of the differential equation, and $u$ is a target value of the differential equation. The solution of the differential equation is:
[Formula image: solution of the differential equation]
or
[Formula image: alternative form of the solution]
where $k$ represents the data number of the standard data set.
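The grey prediction step can be sketched in Python under two assumptions not confirmed by the text: that "adding" the historical data set to the standard data set means concatenating the two series, and that the differential equation is the standard GM(1,1) model fitted by least squares on the background values of the accumulated sequence. The function names and the sample data are hypothetical.

```python
import math


def fit_gm11(series):
    """Estimate (a, u) for the GM(1,1) equation dX/ds + a*X = u by least squares.

    Assumes the standard GM(1,1): the equation applies to the first-order
    accumulated sequence x1, with background values z(k) = (x1(k) + x1(k-1)) / 2.
    """
    if len(series) < 3:
        raise ValueError("GM(1,1) needs at least 3 observations")
    # 1-AGO: running totals of the raw series.
    x1, running = [], 0.0
    for value in series:
        running += value
        x1.append(running)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = series[1:]
    # Closed-form least squares for the grey equation y(k) = -a * z(k) + u.
    m = len(z)
    s_z, s_zz = sum(z), sum(zk * zk for zk in z)
    s_y = sum(y)
    s_zy = sum(zk * yk for zk, yk in zip(z, y))
    det = m * s_zz - s_z * s_z
    if det == 0:
        raise ValueError("degenerate series")
    a = (s_z * s_y - m * s_zy) / det
    u = (s_zz * s_y - s_z * s_zy) / det
    if a == 0:
        raise ValueError("development coefficient a is zero")
    return a, u


def forecast_gm11(series, a, u, steps):
    """Forecast the next `steps` raw values from the GM(1,1) time response."""
    def x1_hat(k):
        # Accumulated estimate at position k + 1 (1-based); x1_hat(0) == series[0].
        return (series[0] - u / a) * math.exp(-a * k) + u / a

    n = len(series)
    # 1-IAGO: difference consecutive accumulated estimates to get raw values.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical data: "adding" the historical set is read as concatenation.
history = [2.0, 2.2]
standard = [2.42, 2.662, 2.9282]
total = history + standard
a, u = fit_gm11(total)
next_values = forecast_gm11(total, a, u, steps=2)
```

For this roughly 10% growth series the fitted development coefficient is negative (growth), and the first forecast lands close to the next geometric term; what the patent then derives from such forecasts (concurrency, resource configuration) is not specified in enough detail to sketch.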
S3: calculating the cost value of the statistical information set to obtain a cost data set.
S3 calculates the cost of each execution mode from the statistical information set and then selects the optimal execution mode, where the execution modes include a storage mode and a data calculation mode.
Further, S3 includes: receiving the statistical information set, generating its full permutation to obtain a plurality of full permutation values, calculating the cost value of each full permutation value according to a pre-constructed cost function, and selecting the permutation data set corresponding to the full permutation value with the minimum cost value to obtain the cost data set.
In detail, the full permutation value $y$ is:
[Formula image: full permutation value $y$]
where $n!$ denotes the permutations of the statistical information set, and $r_k!$ denotes the permutations of the data traversed in the statistical information set.
Preferably, the cost function is:
[Formula image: cost function]
where $N$ is the number of full permutation values, $y_{goal}$ is the preset target full permutation value, $y_i$ denotes the $i$-th full permutation value, $L$ is the objective function (preferably optimized with a gradient descent algorithm), $J(y_i)$ is a penalty function, and $p$ is an adjustment factor.
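The full permutation value and the cost function are given only as formula images, so the sketch below is hypothetical throughout: `permutation_value` (a position-weighted sum) stands in for $y$, and the cost is the squared distance to $y_{goal}$ plus $p$ times a caller-supplied penalty $J$. It illustrates only the control flow of S3: enumerate every permutation, score each one, and keep the one with the minimum cost.

```python
from itertools import permutations


def permutation_value(perm):
    """Hypothetical scalar value of one permutation: position-weighted sum."""
    return sum((i + 1) * x for i, x in enumerate(perm))


def select_min_cost(stats, y_goal, penalty, p=0.1):
    """Score every permutation of `stats` and return the cheapest one.

    cost = (y - y_goal)**2 + p * penalty(perm)   # hypothetical cost form
    """
    best_perm, best_cost = None, float("inf")
    for perm in permutations(stats):  # n! candidates
        y = permutation_value(perm)
        cost = (y - y_goal) ** 2 + p * penalty(perm)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


best, best_cost = select_min_cost([3, 1, 2], y_goal=14, penalty=lambda perm: 0)
```

With a zero penalty, only the ordering `(1, 2, 3)` reaches the target value 14, so it is selected with cost 0. Because `permutations` yields $n!$ candidates, this exhaustive search is only practical for small sets.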
S4: eliminating data greater than or equal to a preset cost threshold from the cost data set to obtain an optimized cost data set.
Preferably, with a preset cost threshold of 0.8, data in the cost data set that are greater than or equal to 0.8 are eliminated, and data less than 0.8 are retained.
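S4 is a plain threshold filter. A sketch, assuming the cost data set is a mapping from a hypothetical candidate name to its cost value; note that a cost exactly equal to 0.8 is rejected, matching "greater than or equal to":

```python
def eliminate_by_threshold(cost_data, threshold=0.8):
    """S4: keep entries whose cost is strictly below `threshold`; reject the rest."""
    return {name: cost for name, cost in cost_data.items() if cost < threshold}


optimized = eliminate_by_threshold(
    {"plan_a": 0.2, "plan_b": 0.8, "plan_c": 0.95, "plan_d": 0.79}
)
```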
S5: performing a data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set in the big data storage platform to complete the data optimization operation.
Preferably, the data range modification operation includes partition pruning, distribution pull-up, distribution push-down, distribution alignment, and similar methods.
Further, if the user finds the data distribution of the optimized cost data set too complex, partition pruning can be applied to it with the CART algorithm or another pruning algorithm to simplify the distribution; if the data distribution is dispersed and the user needs the data concentrated, distribution pull-up maps the optimized cost data set into a data interval; if the data distribution spans an excessively large range, distribution push-down maps the optimized cost data set into a reasonable data interval; and if the data arrangement of the optimized cost data set is structurally incomplete, distribution alignment makes the structure of the data distribution more complete.
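The text says only that distribution pull-up and push-down map the optimized cost data set into a data interval. One simple way to do that, offered here as an assumption rather than the patent's method, is a linear min-max mapping onto a target interval `[low, high]`; partition pruning with CART and distribution alignment are not covered by this sketch, and the function name is hypothetical.

```python
def map_to_interval(values, low, high):
    """Linearly map `values` onto [low, high] (min-max rescaling)."""
    if low > high:
        raise ValueError("low must not exceed high")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: place them at the interval midpoint.
        return [(low + high) / 2 for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


# Hypothetical push-down: a wide range squeezed into [0, 100].
pushed_down = map_to_interval([1000, 5000, 9000], 0, 100)
```

A linear mapping preserves the relative order and spacing of the data, which keeps the operation reversible; a non-linear transform (for example logarithmic) would compress large values more aggressively but changes relative spacing.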
The invention also provides an intelligent data optimization device. Fig. 2 is a schematic diagram of an internal structure of an intelligent data optimization apparatus according to an embodiment of the present invention.
In this embodiment, the intelligent data optimization device 1 may be a PC (Personal Computer), a terminal device such as a smartphone, a tablet computer, or a mobile computer, or a server. The intelligent data optimization device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the intelligent data optimization device 1, such as a hard disk of the intelligent data optimization device 1. The memory 11 may also be an external storage device of the intelligent data optimization device 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the intelligent data optimization device 1. Further, the memory 11 may also comprise both an internal storage unit of the intelligent data optimization apparatus 1 and an external storage device. The memory 11 may be used not only to store application software installed in the intelligent data optimization apparatus 1 and various types of data, such as the code of the intelligent data optimization program 01, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, and is used to execute program code stored in the memory 11 or to process data, for example to execute the intelligent data optimization program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may include a display, an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the intelligent data optimization device 1 and to display a visual user interface.
While FIG. 2 shows only the intelligent data optimization device 1 with components 11-14 and the intelligent data optimization program 01, those skilled in the art will appreciate that the structure shown in FIG. 2 does not limit the intelligent data optimization device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, an intelligent data optimization program 01 is stored in the memory 11; the processor 12, when executing the intelligent data optimization program 01 stored in the memory 11, implements the following steps:
Step one: receiving a data optimization instruction input by a user, extracting an original data set from the big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set.
The big data storage platform is a framework or platform that stores and processes large amounts of data, such as MapReduce, Hive, or Spark.
The original data set is the data set to be optimized, for example life insurance data input by a user. Because such data differ in specification, data volume, final application, and so on, they call for different storage modes and data calculation modes, which is why data optimization is needed.
The exception removal processing removes missing, repeated, and other abnormal data from the original data set to obtain standard data. It comprises bilateral test rejection and unilateral test rejection, and the unilateral test rejection comprises minimum test rejection and maximum test rejection. The bilateral test rejection formula is:
[Formula image: bilateral test rejection statistic]
where $i$ is a positive integer, $\bar{Y}$ represents the mean of the original data set, $S$ represents the standard deviation of the original data set, and $Y_i$ represents the data in the original data set.
The formula of the minimum test rejection is:
[Formula image: minimum test rejection statistic]
The formula of the maximum test rejection is:
[Formula image: maximum test rejection statistic]
Step two: performing grey prediction on the standard data set to obtain a statistical information set.
Preferably, the grey prediction evaluates the concurrency and the optimal resource configuration for the standard data set based on the current running condition and resource usage of the standard data set and of historical data tasks, such as utilization and running time for CPU, memory, disk, and network I/O, to obtain the statistical information set.
Further, step two includes: counting historical data of the standard data set by a sampling statistical method to obtain a historical data set, adding the historical data set to the standard data set to obtain a total data set, establishing a differential equation from the total data set, and solving the differential equation to obtain the statistical information set.
In detail, the differential equation is established as follows. The standard data set is
$X^{(0)} = \{X^{(0)}(i),\ i = 1, 2, 3, \dots, n\}$
and the historical data set, which records the data volume of the standard data set, is
$X^{(1)} = \{X^{(1)}(k),\ k = 1, 2, 3, \dots, t\}$
The total data set $X^{(2)}(k)$ is:
[Formula image: definition of the total data set $X^{(2)}(k)$]
For the total data set $X^{(2)}(k)$, the following differential equation is established:
[Formula image: differential equation in $X^{(2)}$, $s$, $a$ and $u$]
where $s$ is the data number of the total data set, $a$ is a constraint factor of the differential equation, and $u$ is a target value of the differential equation. The solution of the differential equation is:
[Formula image: solution of the differential equation]
or
[Formula image: alternative form of the solution]
where $k$ represents the data number of the standard data set.
Step three: calculating the cost value of the statistical information set to obtain a cost data set.
Step three calculates the cost of each execution mode from the statistical information set and then selects the optimal execution mode, where the execution modes include a storage mode, a data calculation mode, and the like.
Further, step three includes: receiving the statistical information set, generating its full permutation to obtain a plurality of full permutation values, calculating the cost value of each full permutation value according to a pre-constructed cost function, and selecting the permutation data set corresponding to the full permutation value with the minimum cost value to obtain the cost data set.
In detail, the full permutation value $y$ is:
[Formula image: full permutation value $y$]
where $n!$ denotes the permutations of the statistical information set, and $r_k!$ denotes the permutations of the data traversed in the statistical information set.
Preferably, the cost function is:
[Formula image: cost function]
where $N$ is the number of full permutation values, $y_{goal}$ is the preset target full permutation value, $y_i$ denotes the $i$-th full permutation value, $L$ is the objective function (preferably optimized with a gradient descent algorithm), $J(y_i)$ is a penalty function, and $p$ is an adjustment factor.
Step four: eliminating data greater than or equal to a preset cost threshold from the cost data set to obtain an optimized cost data set.
Preferably, with a preset cost threshold of 0.8, data in the cost data set that are greater than or equal to 0.8 are eliminated, and data less than 0.8 are retained.
Step five: performing a data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set in the big data storage platform to complete the data optimization operation.
Preferably, the data range modification operation includes partition pruning, distribution pull-up, distribution push-down, distribution alignment, and similar methods.
Further, if the user finds the data distribution of the optimized cost data set too complex, partition pruning can be applied to it with the CART algorithm or another pruning algorithm to simplify the distribution; if the data distribution is dispersed and the user needs the data concentrated, distribution pull-up maps the optimized cost data set into a data interval; if the data distribution spans an excessively large range, distribution push-down maps the optimized cost data set into a reasonable data interval; and if the data arrangement of the optimized cost data set is structurally incomplete, distribution alignment makes the structure of the data distribution more complete.
Alternatively, in other embodiments, the intelligent data optimization program can be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.
For example, Fig. 3 is a schematic diagram of the program modules of the intelligent data optimization program in an embodiment of the intelligent data optimization apparatus of the present invention. In this embodiment, the intelligent data optimization program can be divided into a data receiving and processing module 10, a grey prediction module 20, a cost optimization module 30, and a data optimization module 40, as follows:
The data receiving and processing module 10 is configured to: receive a data optimization instruction input by a user, extract an original data set from a big data storage platform, and perform exception removal processing on the original data set to obtain a standard data set.
The grey prediction module 20 is configured to: perform grey prediction on the standard data set to obtain a statistical information set.
The cost optimization module 30 is configured to: calculate the cost value of the statistical information set to obtain a cost data set, and eliminate data greater than or equal to a preset cost threshold from the cost data set to obtain an optimized cost data set.
The data optimization module 40 is configured to: perform a data range modification operation on the optimized cost data set to obtain an optimal data set, and store the optimal data set in the big data storage platform to complete the data optimization operation.
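The module split of Fig. 3 can be sketched as a small Python class that wires four pluggable steps in order. The class name, the field names, and the extra `store` callback are hypothetical; each field would be backed by whatever concrete implementation modules 10 to 40 actually use, and the trivial stand-ins in the usage lines are placeholders only.

```python
from dataclasses import dataclass
from typing import Callable, List

Data = List[float]


@dataclass
class IntelligentDataOptimizer:
    """Hypothetical wiring of the four program modules of Fig. 3."""
    receive_and_clean: Callable[[Data], Data]  # module 10: exception removal
    grey_predict: Callable[[Data], Data]       # module 20: grey prediction
    cost_optimize: Callable[[Data], Data]      # module 30: cost + threshold
    range_modify: Callable[[Data], Data]       # module 40: data range modification
    store: Callable[[Data], None]              # write back to the storage platform

    def run(self, original: Data) -> Data:
        standard = self.receive_and_clean(original)
        statistics = self.grey_predict(standard)
        optimized_cost = self.cost_optimize(statistics)
        optimal = self.range_modify(optimized_cost)
        self.store(optimal)
        return optimal


stored = []
optimizer = IntelligentDataOptimizer(
    receive_and_clean=lambda d: sorted(set(d)),         # stand-in: dedupe
    grey_predict=list,                                  # stand-in: pass-through
    cost_optimize=lambda d: [x for x in d if x < 0.8],  # threshold 0.8
    range_modify=lambda d: [round(x * 10, 6) for x in d],
    store=stored.extend,
)
result = optimizer.run([0.5, 0.1, 0.9, 0.5])
```

Injecting each module as a callable keeps the pipeline order fixed while letting any single module be swapped or tested in isolation.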
The functions or operation steps of the data receiving and processing module 10, the gray prediction module 20, the cost optimization module 30, the data optimization module 40, and other program modules implemented by the program modules are substantially the same as those of the above embodiments, and are not described herein again.
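The division into modules 10 to 40 can be pictured as a chain of stages, each consuming the previous stage's output. The Python sketch below is a hypothetical illustration only: the class `OptimizationPipeline`, the helper `drop_at_or_above`, and the simplification of every intermediate set to a list of floats are assumptions, and reading from or writing to the big data storage platform is left out. Only the threshold rule (eliminate values greater than or equal to the preset cost threshold) is taken directly from the text.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage type: every intermediate data set is simplified to a list of floats.
Stage = Callable[[List[float]], List[float]]


def drop_at_or_above(threshold: float) -> Stage:
    """Second half of module 30: keep only cost values strictly below the threshold."""
    def stage(costs: List[float]) -> List[float]:
        return [c for c in costs if c < threshold]
    return stage


@dataclass
class OptimizationPipeline:
    """Hypothetical chaining of modules 10 to 40 as pluggable stages."""
    remove_exceptions: Stage  # module 10: original data set -> standard data set
    grey_predict: Stage       # module 20: standard data set -> statistical information set
    compute_costs: Stage      # module 30: statistical information set -> cost data set
    prune_costs: Stage        # module 30: cost data set -> optimized cost data set
    modify_range: Stage       # module 40: optimized cost data set -> optimal data set

    def run(self, original: List[float]) -> List[float]:
        data = original
        # Apply the stages strictly in the order the modules are listed.
        for stage in (self.remove_exceptions, self.grey_predict,
                      self.compute_costs, self.prune_costs, self.modify_range):
            data = stage(data)
        return data
```

With identity functions for every stage except the threshold filter, for instance `OptimizationPipeline(identity, identity, identity, drop_at_or_above(5.0), identity).run([1.0, 5.0, 7.0, 3.0])` yields `[1.0, 3.0]`: the value equal to the threshold is eliminated along with the larger one. Keeping each module as a separate callable mirrors the modular split described above and lets any one stage be replaced independently.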
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium storing an intelligent data optimization program, which is executable by one or more processors to implement the following operations:
receiving a data optimization instruction input by a user, extracting an original data set from a big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set;
performing grey prediction on the standard data set to obtain a statistical information set;
calculating the cost value of the statistical information set to obtain a cost data set, and eliminating data greater than or equal to a preset cost threshold value in the cost data set to obtain an optimized cost data set; and
performing a data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set into the big data storage platform to complete the data optimization operation.
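Claim 2 splits the exception removal step into bilateral (two-sided) test rejection and unilateral (minimum-value and maximum-value) test rejection, defined over the mean $\bar{Y}$ and standard deviation $S$ of the original data set, but the formulas themselves survive only as equation images. Assuming they are the standard Grubbs statistics, $G_1 = \max_i |Y_i - \bar{Y}| / S$, $G_2 = (\bar{Y} - Y_{\min}) / S$ and $G_3 = (Y_{\max} - \bar{Y}) / S$, a sketch of the two-sided variant could look like this. The function names are hypothetical, and the critical value is supplied by the caller as a function of the sample size (normally read from a Grubbs table at a chosen significance level) rather than computed here.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def grubbs_statistics(data: Sequence[float]) -> Tuple[float, float, float]:
    """Return (G1, G2, G3), assuming the standard Grubbs definitions."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation S
    if s == 0:
        raise ValueError("standard deviation is zero; statistics are undefined")
    g1 = max(abs(y - mean) for y in data) / s  # bilateral (two-sided)
    g2 = (mean - min(data)) / s                # minimum-value side
    g3 = (max(data) - mean) / s                # maximum-value side
    return g1, g2, g3


def remove_outliers_two_sided(data: Sequence[float],
                              critical: Callable[[int], float]) -> List[float]:
    """Repeatedly drop the point farthest from the mean while G1 > critical(n)."""
    values = list(data)
    while len(values) > 2:
        if statistics.stdev(values) == 0:
            break  # identical values: nothing can be flagged
        g1, _, _ = grubbs_statistics(values)
        if g1 <= critical(len(values)):
            break
        mean = statistics.mean(values)
        values.remove(max(values, key=lambda y: abs(y - mean)))
    return values
```

With a critical value of 2.02 for every sample size (a simplification, since the tabulated value depends on n), `remove_outliers_two_sided([10, 11, 10, 12, 11, 10, 50], lambda n: 2.02)` drops 50 ($G_1 \approx 2.26$) and keeps the remaining six values, whose $G_1 \approx 1.63$ no longer exceeds the threshold.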
It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; all equivalent structural or process modifications made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of protection of the present invention.
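The grey prediction step is detailed in claims 3 and 4: a differential equation is built over the total data set with a constraint factor $a$ and a target value $u$, and solving it yields the statistical information set. The equation itself is available only as an image, so the sketch below assumes the standard GM(1,1) grey model, whose whitening equation $\frac{dX^{(1)}}{dt} + aX^{(1)} = u$ has the same two parameters; $a$ and $u$ are estimated by least squares from the accumulated sequence. The name `gm11_forecast` and its `steps` parameter are illustrative assumptions, and the sampling-based construction of the historical data set is not modelled.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def gm11_forecast(x0: Sequence[float], steps: int = 1) -> List[float]:
    """Fit a standard GM(1,1) model to x0 (assumed stand-in for the patent's equation).

    Returns the fitted values for the observed points followed by `steps` forecasts.
    """
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least 3 observations")
    x1 = list(accumulate(x0))                              # accumulated (1-AGO) sequence
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]   # background values
    y = list(x0[1:])
    # Least squares for y = -a * z + u, solved in closed form (two parameters).
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    denom = m * szz - sz * sz
    if denom == 0:
        raise ValueError("degenerate series: background values are constant")
    slope = (m * szy - sz * sy) / denom
    a = -slope                    # development coefficient (the patent's constraint factor)
    u = (sy - slope * sz) / m     # grey action quantity (the patent's target value)
    if abs(a) < 1e-12:
        raise ValueError("a is zero; the exponential solution is undefined")

    def x1_hat(k: int) -> float:
        # Time response of the whitening equation, with x1_hat(0) = x0[0].
        return (x0[0] - u / a) * math.exp(-a * k) + u / a

    # Restore the original scale by differencing the accumulated predictions.
    result = [float(x0[0])]
    for k in range(1, n + steps):
        result.append(x1_hat(k) - x1_hat(k - 1))
    return result
```

For the series `[2.0, 2.2, 2.42, 2.662, 2.9282]` (growth of 10% per step), `gm11_forecast(series, steps=1)` returns six values whose last entry, the one-step forecast, is about 3.22. GM(1,1) fits roughly exponential trends from very few points, which is why it is a common choice for grey prediction on short series.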

Claims (10)

1. An intelligent data optimization method, the method comprising:
receiving a data optimization instruction input by a user, extracting an original data set from a big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set;
performing grey prediction on the standard data set to obtain a statistical information set;
calculating the cost value of the statistical information set to obtain a cost data set;
eliminating data which are greater than or equal to a preset cost threshold value in the cost data set to obtain an optimized cost data set;
and carrying out data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set into the big data storage platform to complete data optimization operation.
2. The intelligent data optimization method according to claim 1, wherein the exception removal processing includes a bilateral test rejection processing and a unilateral test rejection processing, and the unilateral test rejection processing includes a minimum value test rejection processing and a maximum value test rejection processing;
the bilateral test rejection processing is calculated as follows:
[Equation image FDA0002373976250000011]
wherein $i$ is a positive integer, $\bar{Y}$ represents the mean of the original data set, $S$ represents the standard deviation of the original data set, $Y_i$ represents the data in the original data set, and $G_1$ is the value of the bilateral test rejection processing;
the minimum value test rejection processing is calculated as follows:
[Equation image FDA0002373976250000013]
wherein $G_2$ is the value after the minimum value test rejection processing;
the maximum value test rejection processing is calculated as follows:
[Equation image FDA0002373976250000014]
wherein $G_3$ is the value after the maximum value test rejection processing.
3. The intelligent data optimization method of claim 1, wherein performing grey prediction on the standard data set to obtain a statistical information set comprises:
counting historical data of the standard data set according to a sampling statistical method to obtain a historical data set;
adding the historical data set and the standard data set to obtain a total data set;
and establishing a differential equation according to the total data set, and solving the differential equation to obtain the statistical information set.
4. The intelligent data optimization method of claim 3, wherein the differential equation is:
[Equation image FDA0002373976250000021]
wherein $X^{(2)}$ represents the total data set, $s$ is the data number of the total data set, $a$ is a constraint factor of the differential equation, and $u$ is a target value of the differential equation.
5. The intelligent data optimization method of any one of claims 1 to 4, wherein calculating the cost value of the statistical information set to obtain a cost data set comprises:
fully arranging the statistical information set to obtain a plurality of full arrangement values;
calculating the cost values of the full arrangement values according to a pre-constructed cost function;
and selecting the arrangement data set corresponding to the full arrangement value with the minimum cost value to obtain a cost data set.
6. An intelligent data optimization device, comprising a memory and a processor, the memory having stored thereon an intelligent data optimization program operable on the processor, the intelligent data optimization program when executed by the processor implementing the steps of:
receiving a data optimization instruction input by a user, extracting an original data set from a big data storage platform, and performing exception removal processing on the original data set to obtain a standard data set;
performing grey prediction on the standard data set to obtain a statistical information set;
calculating the cost value of the statistical information set to obtain a cost data set;
eliminating data which are greater than or equal to a preset cost threshold value in the cost data set to obtain an optimized cost data set;
and carrying out data range modification operation on the optimized cost data set to obtain an optimal data set, and storing the optimal data set into the big data storage platform to complete data optimization operation.
7. The intelligent data optimization device according to claim 6, wherein the exception removal processing includes a bilateral test rejection processing and a unilateral test rejection processing, and the unilateral test rejection processing includes a minimum value test rejection processing and a maximum value test rejection processing;
the bilateral test rejection processing is calculated as follows:
[Equation image FDA0002373976250000022]
wherein $i$ is a positive integer, $\bar{Y}$ represents the mean of the original data set, $S$ represents the standard deviation of the original data set, $Y_i$ represents the data in the original data set, and $G_1$ is the value of the bilateral test rejection processing;
the minimum value test rejection processing is calculated as follows:
[Equation image FDA0002373976250000031]
wherein $G_2$ is the value after the minimum value test rejection processing;
the maximum value test rejection processing is calculated as follows:
[Equation image FDA0002373976250000032]
wherein $G_3$ is the value after the maximum value test rejection processing.
8. The intelligent data optimization device of claim 6, wherein performing grey prediction on the standard data set to obtain a statistical information set comprises:
counting historical data of the standard data set according to a sampling statistical method to obtain a historical data set;
adding the historical data set and the standard data set to obtain a total data set;
and establishing a differential equation according to the total data set, and solving the differential equation to obtain a statistical information set.
9. The intelligent data optimization device according to any one of claims 6 to 8, wherein calculating the cost value of the statistical information set to obtain a cost data set comprises:
fully arranging the statistical information set to obtain a plurality of full arrangement values;
calculating the cost values of the full arrangement values according to a pre-constructed cost function;
and selecting the arrangement data set corresponding to the full arrangement value with the minimum cost value to obtain a cost data set.
10. A computer-readable storage medium having stored thereon an intelligent data optimization program executable by one or more processors to perform the steps of the intelligent data optimization method of any one of claims 1 to 5.
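Claims 5 and 9 compute the cost data set by fully arranging (enumerating every permutation of) the statistical information set, scoring each arrangement with a pre-constructed cost function, and keeping the arrangement with the minimum cost value. The cost function is not specified in this section, so the sketch below plugs in a hypothetical `adjacent_gap_cost` (the total absolute jump between neighbouring values) purely for illustration; any callable with the same shape could replace it, and `min_cost_arrangement` is likewise an illustrative name.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple


def adjacent_gap_cost(seq: Sequence[float]) -> float:
    """Hypothetical cost function: total absolute jump between neighbouring values."""
    return sum(abs(b - a) for a, b in zip(seq, seq[1:]))


def min_cost_arrangement(
    values: Sequence[float],
    cost: Callable[[Sequence[float]], float] = adjacent_gap_cost,
) -> Tuple[Tuple[float, ...], float]:
    """Enumerate every full arrangement and return the cheapest one with its cost."""
    if not values:
        raise ValueError("cannot arrange an empty set")
    # min() keeps the first arrangement reached among equally cheap ones.
    best = min(permutations(values), key=cost)
    return best, cost(best)
```

For `[3, 1, 2]` this returns an ordering sorted in one direction, such as `(3, 2, 1)`, with cost 2. Because a set of n values has n! arrangements (3,628,800 for n = 10), exhaustive enumeration is only practical for small sets; the sketch makes no attempt to prune the search.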
CN202010068234.5A 2020-01-19 2020-01-19 Intelligent data optimization method and device and computer readable storage medium Pending CN111259318A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010068234.5A CN111259318A (en) 2020-01-19 2020-01-19 Intelligent data optimization method and device and computer readable storage medium
PCT/CN2020/098964 WO2021143055A1 (en) 2020-01-19 2020-06-29 Intelligent data optimization method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010068234.5A CN111259318A (en) 2020-01-19 2020-01-19 Intelligent data optimization method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111259318A (en) 2020-06-09

Family

ID=70950881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010068234.5A Pending CN111259318A (en) 2020-01-19 2020-01-19 Intelligent data optimization method and device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111259318A (en)
WO (1) WO2021143055A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021143055A1 (en) * 2020-01-19 2021-07-22 平安科技(深圳)有限公司 Intelligent data optimization method and apparatus, electronic device and storage medium
CN114358882A (en) * 2022-01-06 2022-04-15 安徽易商数码科技有限公司 Rural electric business operation data processing method
CN116540790A (en) * 2023-07-05 2023-08-04 深圳市保凌影像科技有限公司 Tripod head stability control method and device, electronic equipment and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US6615211B2 (en) * 2001-03-19 2003-09-02 International Business Machines Corporation System and methods for using continuous optimization for ordering categorical data sets
CN103853844A (en) * 2014-03-24 2014-06-11 南开大学 Hadoop-based relation table nonredundant key set identification method
CN105205144B (en) * 2015-09-18 2019-03-26 北京百度网讯科技有限公司 Method and system for data diagnosis optimization
CN108767883B (en) * 2018-06-27 2021-05-04 深圳库博能源科技有限公司 Response processing method of demand side
CN110705816B (en) * 2019-08-14 2023-08-25 中国平安人寿保险股份有限公司 Task allocation method and device based on big data
CN111259318A (en) * 2020-01-19 2020-06-09 平安科技(深圳)有限公司 Intelligent data optimization method and device and computer readable storage medium

Cited By (4)

Publication number Priority date Publication date Assignee Title
WO2021143055A1 (en) * 2020-01-19 2021-07-22 平安科技(深圳)有限公司 Intelligent data optimization method and apparatus, electronic device and storage medium
CN114358882A (en) * 2022-01-06 2022-04-15 安徽易商数码科技有限公司 Rural electric business operation data processing method
CN116540790A (en) * 2023-07-05 2023-08-04 深圳市保凌影像科技有限公司 Tripod head stability control method and device, electronic equipment and storage medium
CN116540790B (en) * 2023-07-05 2023-09-08 深圳市保凌影像科技有限公司 Tripod head stability control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021143055A1 (en) 2021-07-22

Similar Documents

Publication Publication Date Title
CN111259318A (en) Intelligent data optimization method and device and computer readable storage medium
US20150324326A1 (en) Techniques to perform curve fitting for statistical tests
CN107273104B (en) Processing method and device for configuration data structure
CN111414353A (en) Intelligent missing data filling method and device and computer readable storage medium
JP6570156B2 (en) Database system optimization method, system, electronic apparatus, and storage medium
CN112541745A (en) User behavior data analysis method and device, electronic equipment and readable storage medium
EP4394595A1 (en) Job solving method and apparatus
CN110363303B (en) Memory training method and device for intelligent distribution model and computer readable storage medium
CN112508118A (en) Target object behavior prediction method aiming at data migration and related equipment thereof
CN112579621B (en) Data display method and device, electronic equipment and computer storage medium
CN111931848A (en) Data feature extraction method and device, computer equipment and storage medium
CN111241066B (en) Platform database automation operation and maintenance method, device and computer readable storage medium
CN114638501A (en) Business data processing method and device, computer equipment and storage medium
CN111143568A (en) Method, device and equipment for buffering during paper classification and storage medium
CN111339064A (en) Data tilt correction method, device and computer readable storage medium
CN112699934B (en) Alarm classification method and device and electronic equipment
CN110134390B (en) Method for realizing intelligent push function of programmable controller graph programming control based on user similarity
CN111782208A (en) Index early warning method and device, computer equipment and storage medium
CN110674020A (en) APP intelligent recommendation method and device and computer readable storage medium
CN110717056A (en) Noe4j graph database updating maintenance method, device and computer readable storage medium
CN109918353A (en) The method and terminal device of automated information processing
US20240152811A1 (en) Artificial-intelligence-assisted construction of integration processes
CN114819590B (en) Policy intelligent recommendation method, device, equipment and storage medium
CN111522812B (en) User intelligent layering method and device, electronic equipment and readable storage medium
CN113722292B (en) Disaster response processing method, device, equipment and storage medium of distributed data system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination