CN116795825A - Data cleaning method, device, computer equipment and storage medium - Google Patents

Data cleaning method, device, computer equipment and storage medium

Info

Publication number
CN116795825A
Authority
CN
China
Prior art keywords
data
storage
index
cleaning
storage object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310743373.7A
Other languages
Chinese (zh)
Inventor
苏媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202310743373.7A
Publication of CN116795825A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application belong to the fields of big data and financial technology, and relate to a data cleaning method, a device, computer equipment and a storage medium. The method comprises the following steps: acquiring the storage indexes of each storage object, wherein the storage objects hold stored data; calculating the storage indexes of each storage object through a weight algorithm to obtain the index weight corresponding to each storage index of each storage object; for each storage object, calculating the data retention value of the stored data in the storage object based on the index weight and the index value corresponding to each storage index of the storage object; determining the data value grade of the stored data in the storage object according to the data retention value; and acquiring a data cleaning strategy matched with the data value grade, and performing data cleaning on the stored data in the storage object according to the data cleaning strategy. The application also relates to blockchain technology: the data cleaning strategy can be stored in a blockchain. The application can effectively clean the stored data.

Description

Data cleaning method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of big data technologies and financial technologies, and in particular, to a data cleaning method, a data cleaning device, a computer device, and a storage medium.
Background
The database is the aggregation warehouse for service data; upper-layer services store and retrieve data through it. Over time, the volume of data stored in the database grows substantially, which raises storage costs. For example, the business of a financial institution involves a large number of customers, and the amount of stored data grows daily, creating tremendous storage pressure.
Because any stored data may be accessed and managers do not know which data will be accessed, the stored data can only be kept indefinitely; its value cannot be accurately evaluated, and it cannot be effectively cleaned.
Disclosure of Invention
The embodiment of the application aims to provide a data cleaning method, a data cleaning device, computer equipment and a storage medium, so as to effectively clean stored data.
In order to solve the above technical problems, the embodiment of the present application provides a data cleaning method, which adopts the following technical scheme:
acquiring various storage indexes of various storage objects, wherein the storage objects have stored data;
calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object;
for each storage object, calculating the data retention value of the stored data in the storage object based on index weights and index values respectively corresponding to each storage index of the storage object;
determining the data value grade of the stored data in the storage object according to the data retention value;
and acquiring a data cleaning strategy matched with the data value grade, and cleaning the data of the stored data in the storage object according to the data cleaning strategy.
In order to solve the above technical problems, the embodiment of the present application further provides a data cleaning device, which adopts the following technical scheme:
the index acquisition module is used for acquiring various storage indexes of various storage objects, wherein the storage objects have stored data;
the weight calculation module is used for calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object;
the value calculation module is used for calculating the data retention value of the stored data in each storage object based on the index weight and the index value respectively corresponding to each storage index of the storage object;
the grade determining module is used for determining the data value grade of the stored data in the storage object according to the data retention value;
and the data cleaning module is used for acquiring a data cleaning strategy matched with the data value grade and cleaning the data stored in the storage object according to the data cleaning strategy.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
acquiring various storage indexes of various storage objects, wherein the storage objects have stored data;
calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object;
for each storage object, calculating the data retention value of the stored data in the storage object based on index weights and index values respectively corresponding to each storage index of the storage object;
determining the data value grade of the stored data in the storage object according to the data retention value;
and acquiring a data cleaning strategy matched with the data value grade, and cleaning the data of the stored data in the storage object according to the data cleaning strategy.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
acquiring various storage indexes of various storage objects, wherein the storage objects have stored data;
calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object;
for each storage object, calculating the data retention value of the stored data in the storage object based on index weights and index values respectively corresponding to each storage index of the storage object;
determining the data value grade of the stored data in the storage object according to the data retention value;
and acquiring a data cleaning strategy matched with the data value grade, and cleaning the data of the stored data in the storage object according to the data cleaning strategy.
Compared with the prior art, the embodiment of the application has the following main beneficial effects: acquiring various storage indexes of various storage objects, wherein the storage objects have stored data, and the various storage indexes describe the characteristics of the stored data from different dimensions or the characteristics of the storage objects on storage; calculating each storage index of each storage object through a weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object, and realizing objective calculation of the weights; for each storage object, calculating the data retention value of the stored data according to the index weight and the index value corresponding to each storage index of the storage object, determining the data value level according to the data retention value, and acquiring a data cleaning strategy matched with the data value level, wherein the data cleaning strategy defines a cleaning mode of the stored data, so that reasonable and effective data cleaning can be realized according to the data cleaning strategy.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a data cleansing method according to the present application;
FIG. 3 is a schematic diagram of an embodiment of a data cleaning device according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, to receive or send messages and the like. Various communication client applications, such as web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (MPEG-1 Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the data cleaning method provided by the embodiment of the present application is generally executed by a server, and accordingly, the data cleaning device is generally disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a data cleansing method according to the present application is shown. The data cleaning method comprises the following steps:
Step S201, acquiring each storage index of each storage object, wherein the storage object has stored data therein.
In this embodiment, the electronic device on which the data cleaning method runs (for example, the server shown in fig. 1) may communicate with the terminal device through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G, Wi-Fi, Bluetooth, WiMAX, Zigbee and UWB (ultra-wideband) connections, as well as other wireless connections now known or developed in the future.
Specifically, the server acquires each storage index of each storage object. A storage object is an object used for storing data and has storage indexes in multiple dimensions. Different storage indexes describe, from different dimensions, either the characteristics of the stored data in the storage object or the characteristics of the storage object with respect to storing data. For example, a storage cost index describes a characteristic of the storage object with respect to storing data, and a maintainability index describes a characteristic of the stored data in the storage object.
Further, before the step S201, the method may further include: receiving an object cleaning instruction; and determining a plurality of storage objects to be cleaned based on the object cleaning instruction, wherein the storage objects are databases or data storage organizations in the databases.
Specifically, the server receives an object cleaning instruction, in which a plurality of storage objects to be cleaned are recorded, and the server needs to evaluate the data value and clean the data of the storage objects.
The storage object in the application can be a database or a data storage organization in the database; the database may comprise different storage portions, i.e. different data storage organizations, e.g. the database stores class a data and class B data simultaneously, and the storage of class a data may be regarded as one data storage organization and the storage of class B data may be regarded as another data storage organization.
Typically, the storage objects in an object cleaning instruction are at the same level, i.e., they are all databases or all data storage organizations within a database; however, the storage objects in an object cleaning instruction may also be at different levels, i.e., the instruction may contain both databases and data storage organizations.
In this embodiment, a plurality of storage objects to be cleaned are determined based on the object cleaning instruction, and the storage objects are databases or data storage organizations in the databases, so that flexibility in setting the storage objects is improved.
Further, the step S201 may include: acquiring each storage evaluation parameter and each user influence parameter of each storage object; carrying out standardization processing on each storage evaluation parameter and each user influence parameter of each storage object to obtain each storage evaluation index and each user influence index of each storage object; and determining each storage evaluation index and each user influence index of each storage object as each storage index of each storage object.
Specifically, for each storage object, the storage object has a plurality of storage evaluation parameters and a plurality of user influence parameters; the storage evaluation parameter and the user influence parameter are characteristics for describing stored data in the storage object or characteristics of the storage object on the stored data.
The storage evaluation parameters and the user influence parameters may have actual physical meaning, that is, they may have dimensions; for example, the remaining storage capacity of the storage object is measured in terabytes (TB, a unit of computer storage capacity). The storage evaluation parameters and the user influence parameters may also be dimensionless.
The storage evaluation parameters lean toward characteristics of the storage object or the stored data itself, while the user influence parameters lean toward characteristics that people attach to the storage object or the stored data. For example, the storage evaluation parameters may include (encrypted) storage cost, remaining storage capacity, number of operations (the number of times the stored data is operated on, including reads, modifications, etc.), maintainability, and the like; the user influence parameters may include a user authorization parameter (user data must be authorized by the user, and the parameter indicates the permission the user has granted for that data; authorized data can be retained for a long time, while unauthorized data must be cleaned as soon as possible), an activation parameter (indicating whether a manager needs to retain the data, which slows the evaluation toward cleaning), and the like.
Carrying out standardization processing on each storage evaluation parameter and each user influence parameter of the storage object, so as to map each storage evaluation parameter and each user influence parameter to a proper value interval; regardless of the dimension of each storage evaluation parameter and each user influence parameter, standardized processing can be performed to obtain each storage evaluation index and each user influence index of the storage object, and each storage evaluation index and each user influence index of the storage object are determined to be each storage index of the storage object.
In this embodiment, each storage evaluation parameter and each user influence parameter of each storage object are obtained and standardized so as to facilitate subsequent weight calculation, and each storage evaluation index and each user influence index after the standardized processing are determined as each storage index of each storage object, so that the richness of the storage indexes is improved, and the storage objects can be comprehensively and accurately evaluated.
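The standardization step can be illustrated with a minimal sketch. The application does not name a specific standardization formula; min-max scaling to the interval [0, 1] is assumed here, and the parameter names, the sample values and the choice of which parameters count as "lower is better" are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Map the raw values of one parameter, taken across storage objects, to [0, 1].

    Min-max scaling is an assumption; the source only requires mapping each
    parameter to a suitable value interval regardless of its dimension.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every storage object has the same value, so there is no contrast.
        return [0.0 for _ in values]
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    # Invert the scale so that a smaller raw value yields a larger index.
    return [(hi - v) / (hi - lo) for v in values]


# Hypothetical raw parameters for three storage objects.
raw: Dict[str, List[float]] = {
    "storage_cost": [120.0, 80.0, 200.0],    # has a dimension (cost units)
    "operation_count": [10.0, 250.0, 40.0],  # dimensionless count
}

indexes: Dict[str, List[float]] = {
    "storage_cost": min_max_normalize(raw["storage_cost"], higher_is_better=False),
    "operation_count": min_max_normalize(raw["operation_count"]),
}
```

After this step every storage index is dimensionless and lies in the same interval, so the indexes can be fed to a weight algorithm together.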
Step S202, calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object.
Specifically, the server may calculate, through a preset weight algorithm, an index weight corresponding to each storage index of each storage object, where the greater the index weight is, the more important the storage index is in the evaluation of the data retention value.
Further, the step S202 may include: and calculating each storage index of each storage object through an entropy algorithm to obtain index weights respectively corresponding to each storage index of each storage object.
Specifically, the preset weight algorithm may be an entropy value algorithm. The concept of entropy originates in thermodynamics, where it measures the uncertainty of a system's state. In information theory, information is a measure of a system's degree of order and entropy is a measure of its degree of disorder; the two are equal in absolute value but opposite in sign. Using the information inherent in each evaluated scheme, the entropy value algorithm obtains the information entropy of each storage index: the smaller the information entropy, the lower the disorder of the information, the greater its utility value, and the greater the weight of the storage index.
All storage objects share the same set of storage indexes, but different storage objects have different index values under the same storage index. The entropy value algorithm calculates an index weight for each storage index of a storage object. Suppose storage objects A, B, C and T all have a storage index A; when calculating the index weight of storage index A for storage object A, the index value of storage object A under storage index A is combined with the index values of the other storage objects (B, C and T) under the same index.
It is to be appreciated that the present application can also use other weight algorithms, such as CRITIC algorithm, analytic hierarchy process, relief algorithm, etc., to calculate the index weight.
In the embodiment, the index weights corresponding to the storage indexes of the storage objects are calculated through an entropy method, so that the influence of subjective factors is avoided, and the rationality of the obtained index weights is ensured.
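The entropy value algorithm can be sketched as follows. The application does not give formulas, so this is the commonly used entropy weight method, assumed here for illustration: each index column is converted to proportions across storage objects, its information entropy is computed, and the weight is proportional to one minus the entropy. In this textbook form a weight belongs to an index and is shared by all storage objects; the function name and sample matrix are hypothetical.

```python
import math
from typing import List


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Return one weight per storage index using the entropy weight method.

    matrix[i][j] is the standardized, non-negative value of storage index j
    for storage object i. At least two storage objects are required.
    """
    m = len(matrix)
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # scales the entropy of each column into [0, 1]
    divergences: List[float] = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total > 0:
            entropy = 0.0
            for v in column:
                p = v / total
                if p > 0:  # the term p * ln(p) is taken as 0 when p == 0
                    entropy -= k * p * math.log(p)
        else:
            entropy = 1.0  # an all-zero column carries no information
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s == 0:
        return [1.0 / n] * n  # no index discriminates; fall back to equal weights
    return [d / s for d in divergences]


# Three storage objects, two storage indexes (hypothetical values).
sample = [
    [0.2, 0.5],
    [0.9, 0.5],
    [0.4, 0.5],
]
weights = entropy_weights(sample)
```

In the sample, the second index has the same value for every storage object, so its entropy is maximal and it receives almost no weight; nearly all of the weight goes to the first index, which actually distinguishes the objects.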
Step S203, for each storage object, calculating the data retention value of the stored data in the storage object based on the index weight and the index value corresponding to each storage index of the storage object.
Specifically, the application calculates the data retention value to evaluate whether the stored data is worth retaining. At this point, for each storage object, every storage index has both a calculated index weight and an index value; the index weight of each storage index is multiplied by the corresponding index value, and the products are summed to obtain the data retention value of the stored data in the storage object. It will be appreciated that the larger the data retention value, the more valuable and important the stored data is, and the more it deserves to be retained.
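The weighted sum described above can be written as a short sketch. The function name and the sample weights and index values are hypothetical.

```python
from typing import Sequence


def data_retention_value(weights: Sequence[float], index_values: Sequence[float]) -> float:
    """Sum of index weight times index value over all storage indexes."""
    if len(weights) != len(index_values):
        raise ValueError("each storage index needs exactly one weight and one value")
    return sum(w * v for w, v in zip(weights, index_values))


# Hypothetical weights and standardized index values for one storage object.
example_weights = [0.5, 0.3, 0.2]
example_values = [0.8, 0.1, 0.5]
retention = data_retention_value(example_weights, example_values)
# 0.5 * 0.8 + 0.3 * 0.1 + 0.2 * 0.5 = 0.53 (up to floating-point rounding)
```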
Step S204, determining the data value grade of the stored data in the storage object according to the data retention value.
Specifically, the application sets a plurality of data value grades, and the data value grades are related to the data retention value; multiple data value intervals can be preset, different data value intervals represent different data value grades, and the data value intervals corresponding to the data retention values are determined, so that the data value grade of the stored data in the storage object can be obtained.
In one embodiment, the storage objects are sorted by data retention value to obtain a sorted queue, and a data value grade is assigned to each storage object in the sorted queue.
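Both grading approaches can be sketched briefly. The grade names, the interval boundaries and the rule of splitting the sorted queue into equal parts are assumptions for illustration; the application does not fix any of them.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical grade names (lowest first) and interval boundaries.
LEVELS: List[str] = ["low", "medium", "high"]
BOUNDARIES: List[float] = [0.3, 0.7]  # [0, 0.3) low, [0.3, 0.7) medium, [0.7, ...) high


def level_by_interval(retention_value: float) -> str:
    """Look up the preset data value interval that contains the retention value."""
    return LEVELS[bisect_right(BOUNDARIES, retention_value)]


def levels_by_rank(retention_values: Dict[str, float]) -> Dict[str, str]:
    """Sort storage objects by retention value and grade them by queue position.

    The queue is split into len(LEVELS) roughly equal parts; the most
    valuable part gets the highest grade.
    """
    ordered = sorted(retention_values, key=retention_values.get, reverse=True)
    n = len(ordered)
    result: Dict[str, str] = {}
    for pos, name in enumerate(ordered):
        bucket = pos * len(LEVELS) // n  # 0 for the most valuable part
        result[name] = LEVELS[len(LEVELS) - 1 - bucket]
    return result


ranked = levels_by_rank({"db_a": 0.9, "db_b": 0.2, "db_c": 0.5})
```

Interval grading gives each storage object an absolute grade, while rank grading always spreads the objects across the grades, which may be preferable when the retention values cluster in a narrow range.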
Step S205, a data cleaning strategy matched with the data value grade is obtained, and data cleaning is carried out on the stored data in the storage object according to the data cleaning strategy.
Specifically, different data value classes have different data cleansing policies, and the same data value class has the same data cleansing policy. And acquiring a data cleaning strategy matched with the data value grade, wherein the data cleaning strategy prescribes a cleaning mode of the stored data in the storage object, so that the stored data in the storage object is cleaned according to the data cleaning strategy.
In this embodiment, each storage index of each storage object is obtained, where the storage object has stored data, and each storage index describes characteristics of the stored data from different dimensions, or characteristics of the storage object on storage; calculating each storage index of each storage object through a weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object, and realizing objective calculation of the weights; for each storage object, calculating the data retention value of the stored data according to the index weight and the index value corresponding to each storage index of the storage object, determining the data value level according to the data retention value, and acquiring a data cleaning strategy matched with the data value level, wherein the data cleaning strategy defines a cleaning mode of the stored data, so that reasonable and effective data cleaning can be realized according to the data cleaning strategy.
Further, the step S205 may include: acquiring a data cleaning strategy matched with the data value grade, wherein the data cleaning strategy comprises cleaning time nodes and corresponding cleaning proportions thereof; and carrying out data cleaning on the stored data in the storage object according to the data cleaning strategy.
Specifically, a data cleaning strategy matched with the data value grade is obtained, wherein the data cleaning strategy comprises cleaning time nodes and their corresponding cleaning proportions. There may be multiple cleaning time nodes, each indicating a point in time at which cleaning takes place. For example, three cleaning time nodes of one year, three years and five years indicate that, counting from the present, the stored data is cleaned in the first year, the third year and the fifth year, so that cleaning is completed in three passes (completing the cleaning does not mean that the stored data is entirely deleted). At each time node, only a part of the stored data may be cleaned, and the cleaning proportion indicates how much. For example, a cleaning proportion of 0% in the first year means that no stored data is cleaned in the first year; a cleaning proportion of 50% in the third year means that 50% of the stored data is cleaned in the third year; and a cleaning proportion of 80% in the fifth year means that 80% of the stored data is cleaned in the fifth year. It can be appreciated that the data cleaning strategies of different data value grades have different cleaning time nodes and corresponding cleaning proportions. The cleaning strategy may also be configured on an ad hoc basis.
And the server performs data cleaning on the stored data in the storage object according to the data cleaning policy.
In this embodiment, a data cleaning policy matching with the value level of data is obtained, where the data cleaning policy includes cleaning time nodes and their corresponding cleaning proportion, which represents how much stored data needs to be cleaned at which time nodes, so that effective data cleaning is implemented on stored data in a storage object according to the data cleaning policy.
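A data cleaning strategy of this kind can be represented as a small data structure. The sketch below is illustrative only: the class name, the grade names and the assignment of particular time nodes to particular grades are hypothetical, and the cleaning proportions are assumed to be cumulative shares of the original stored data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CleaningPolicy:
    # Each node is (years from now, cumulative proportion of stored data to clean).
    nodes: Tuple[Tuple[int, float], ...]

    def proportion_at(self, years_elapsed: int) -> float:
        """Cleaning proportion of the latest time node already reached, else 0.0."""
        reached = [p for year, p in sorted(self.nodes) if year <= years_elapsed]
        return reached[-1] if reached else 0.0


# Hypothetical mapping from data value grade to cleaning strategy.
POLICIES: Dict[str, CleaningPolicy] = {
    # Mirrors the one-, three- and five-year example in the text.
    "high": CleaningPolicy(nodes=((1, 0.0), (3, 0.5), (5, 0.8))),
    # A more aggressive schedule for low-value data (assumed values).
    "low": CleaningPolicy(nodes=((1, 0.5), (2, 1.0))),
}

policy = POLICIES["high"]
due_now = policy.proportion_at(3)  # 0.5: half of the stored data is due by year three
```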
Further, if the storage object stores the stored data in the form of a data table, the step of performing data cleaning on the stored data in the storage object according to the data cleaning policy may include: acquiring each table evaluation index of each data table in stored data of a storage object; determining the data table value of each data table according to each table evaluation index of each data table; and cleaning each data table according to the data cleaning strategy and the data table value.
Specifically, the storage object may store the stored data in the form of data tables, i.e., the data is kept in data tables. The server can also acquire the table evaluation indexes of each data table. The table evaluation indexes may have the same meaning as the storage indexes of the storage object; for example, they may include (encrypted) storage cost, table data volume, number of operations (the number of times the data table is operated on, including reads, modifications, etc.), maintainability, a user authorization parameter (user data must be authorized by the user, and the parameter indicates the permission the user has granted for that data), an activation parameter (indicating whether a manager needs to retain the data, which slows the evaluation toward cleaning), and the like.
The server may determine the data table value of each data table according to each table evaluation index of each data table, for example, calculate the index weight of each table evaluation index of each data table according to the weight algorithm, and then calculate the data table value according to the index weight and the index value thereof, where the data table value reflects the storage value and importance of the data table. Alternatively, each table evaluation index is input into a value evaluation model, which may be an artificial intelligence based model, from which the data table value is output.
The data cleaning strategy specifies cleaning proportions. At each cleaning time node, the data tables to be cleaned under the corresponding cleaning proportion are determined according to the data table value of each data table, and those data tables are then cleaned. For example, if 20% of the stored data needs to be cleaned in the first year and 50% in the second year, the server determines which data tables have data table values in the lowest 20% (denoted D1-type data tables) and which have data table values between the lowest 20% and the lowest 50% (denoted D2-type data tables); the D1-type data tables are then deleted at the cleaning trigger time point designated in the first year, and the D2-type data tables at the cleaning trigger time point designated in the second year.
In this embodiment, the data table value of each data table in the stored data is calculated, and each data table is effectively cleaned according to the data cleaning policy and the data table value.
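The selection of D1-type and D2-type data tables described above can be sketched as a ranking by data table value. This is a minimal Python illustration under stated assumptions: the function name `tables_to_clean` is hypothetical, data table values are assumed to be already computed, and the proportions are applied to the number of tables rather than to data volume.

```python
def tables_to_clean(table_values, prev_ratio, ratio):
    """Pick the tables ranked between the lowest `prev_ratio` and the lowest `ratio`.

    table_values maps a table name to its data table value (higher = keep longer).
    """
    ranked = sorted(table_values, key=table_values.get)  # lowest value first
    n = len(ranked)
    start = round(prev_ratio * n)  # tables below this rank were cleaned earlier
    end = round(ratio * n)         # tables below this rank are cleaned by this node
    return ranked[start:end]


# Ten hypothetical tables t0..t9 with values 0.0..0.9.
values = {f"t{i}": i / 10 for i in range(10)}
d1_tables = tables_to_clean(values, 0.0, 0.2)  # first-year batch
d2_tables = tables_to_clean(values, 0.2, 0.5)  # second-year batch
```

Because each batch starts where the previous one ended, no table is selected twice across cleaning time nodes.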
Further, after the step S205, the method may further include: obtaining reevaluation information of the storage object, wherein the reevaluation information comprises a data value grade, an access index, an activation index or an authorization index of the storage object; determining an updated cleaning strategy according to the reevaluation information; and performing data cleaning on the stored data in the storage object according to the updated cleaning strategy.
Specifically, re-evaluation information of the storage object is obtained, wherein the re-evaluation information comprises an initial data value grade, an access index, an activation index or an authorization index of the storage object; wherein the access index is an index of the latest accessed condition of the stored data; the manager can judge the importance of the stored data, and add an activation index to the stored data, wherein the activation index can slow down the cleaning process of the stored data; the authorization indicator indicates the latest authorization of the user for the stored data.
The updated cleaning policy is determined based on the re-evaluation information; for example, the re-evaluation information is input into an artificial-intelligence-based policy model, such as a tree-model-based policy model, to obtain the updated cleaning policy. The updated cleaning policy may be an adjustment of the previous data cleaning policy. For example, the previous data cleaning policy specifies cleaning 20% of the stored data in the first year, 80% in the second year, and 100% in the third year; after the first year, an updated cleaning policy is obtained which specifies that 80% of the original stored data is to be retained in the second year and 60% in the third year. Data cleaning is then performed on the stored data in the storage object according to the updated cleaning policy. It will be appreciated that the updated cleaning policy builds on the part of the original data cleaning policy that has already been executed (or replaces the original data cleaning policy if it has not yet been executed).
In this embodiment, re-evaluation information of the storage object is obtained, an updated cleaning policy is determined according to the re-evaluation information, and data cleaning is performed on stored data in the storage object by replacing the data cleaning policy with the updated cleaning policy, so that timely adjustment of the cleaning policy can be realized.
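The policy adjustment in the example above (20%, 80%, 100% replaced after the first year by retaining 80% and 60%) can be sketched as follows. This Python sketch is an assumption-laden illustration: `apply_update` is a hypothetical helper, policies are modeled as year-to-cleaned-share mappings, and the model that produces the updated retention figures is outside its scope.

```python
def apply_update(original, executed_years, retain_by_year):
    """Keep already-executed nodes and replace the remaining ones.

    original:        {year: cleaned share} of the previous policy
    executed_years:  set of years whose cleaning has already been carried out
    retain_by_year:  {year: retained share of the original data} from the update
    """
    updated = {y: r for y, r in original.items() if y in executed_years}
    for year, retained in retain_by_year.items():
        if year not in executed_years:
            # Convert "share to retain" into "share cleaned" for consistency.
            updated[year] = round(1.0 - retained, 6)
    return dict(sorted(updated.items()))


previous = {1: 0.2, 2: 0.8, 3: 1.0}
new_policy = apply_update(previous, {1}, {2: 0.8, 3: 0.6})
```

In this reading, the second-year node cleans nothing beyond the 20% already removed in the first year, and the third-year node brings the cleaned share to 40%.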
The stored data in the storage object of the application can be business data stored by a financial institution; it will be appreciated that the stored data may also be data that needs to be stored in other fields or scenarios.
It should be emphasized that, to further ensure the privacy and security of the data cleansing policy, the data cleansing policy may also be stored in a node of a blockchain.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
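To illustrate only the "data blocks associated by cryptographic means" idea, the following Python sketch links blocks by SHA-256 hashes and stores a cleaning policy as the payload. It is not a blockchain platform: there is no network, consensus mechanism or node, and the names `make_block` and `chain_is_valid` as well as the payload layout are hypothetical.

```python
import hashlib
import json


def make_block(payload, prev_hash):
    """Create a block whose hash covers its payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "prev_hash": prev_hash, "hash": digest}


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical payloads: one cleaning policy per data value grade.
genesis = make_block({"grade": "B", "nodes": [[1, 0.0], [3, 0.5], [5, 0.8]]}, "")
second = make_block({"grade": "C", "nodes": [[1, 0.2], [2, 0.5]]}, genesis["hash"])
chain = [genesis, second]
```

Changing a stored policy after the fact alters the recomputed hash, so `chain_is_valid` reports the tampering.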
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by computer readable instructions stored in a computer readable storage medium; when executed, the instructions may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data cleaning device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device is specifically applicable to various electronic devices.
As shown in fig. 3, the data cleaning device 300 according to the present embodiment includes: an index acquisition module 301, a weight calculation module 302, a value calculation module 303, a rank determination module 304, and a data cleaning module 305, wherein:
The index obtaining module 301 is configured to obtain each storage index of each storage object, where the storage object has stored data therein.
The weight calculation module 302 is configured to calculate each storage index of each storage object through a preset weight algorithm, so as to obtain an index weight corresponding to each storage index of each storage object.
The value calculation module 303 is configured to calculate, for each storage object, a data retention value of the stored data in the storage object based on the index weight and the index value corresponding to each storage index of the storage object.
The level determining module 304 is configured to determine a data value level of the stored data in the storage object according to the data retention value.
The data cleaning module 305 is configured to obtain a data cleaning policy that matches the data value level, and perform data cleaning on the stored data in the storage object according to the data cleaning policy.
In this embodiment, each storage index of each storage object is obtained, where the storage object has stored data, and each storage index describes characteristics of the stored data from different dimensions, or characteristics of the storage object on storage; calculating each storage index of each storage object through a weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object, and realizing objective calculation of the weights; for each storage object, calculating the data retention value of the stored data according to the index weight and the index value corresponding to each storage index of the storage object, determining the data value level according to the data retention value, and acquiring a data cleaning strategy matched with the data value level, wherein the data cleaning strategy defines a cleaning mode of the stored data, so that reasonable and effective data cleaning can be realized according to the data cleaning strategy.
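The value calculation and grade determination steps can be sketched in Python as follows. The application only states that the data retention value is calculated from index weights and index values; the weighted sum used here, the grade names and the threshold figures are all assumptions made for illustration, as are the identifiers `retention_value`, `GRADE_THRESHOLDS` and `value_grade`.

```python
def retention_value(index_values, index_weights):
    """Assumed aggregation: weighted sum of standardized index values."""
    return sum(index_weights[name] * value for name, value in index_values.items())


# Hypothetical cut-offs, checked from highest to lowest.
GRADE_THRESHOLDS = [(0.7, "high"), (0.4, "medium")]


def value_grade(value):
    """Map a data retention value to a data value grade."""
    for threshold, grade in GRADE_THRESHOLDS:
        if value >= threshold:
            return grade
    return "low"


# Hypothetical storage object with two standardized storage indexes.
indexes = {"access_frequency": 0.9, "storage_cost": 0.2}
weights = {"access_frequency": 0.75, "storage_cost": 0.25}
score = retention_value(indexes, weights)
grade = value_grade(score)
```

Each grade would then be associated with its own data cleaning policy, for example through a lookup table keyed by grade.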
In some optional implementations of the present embodiment, the data cleaning device 300 may further include: the device comprises an instruction receiving module and an object determining module, wherein:
the instruction receiving module is used for receiving the object cleaning instruction.
And the object determining module is used for determining a plurality of storage objects to be cleaned based on the object cleaning instruction, wherein the storage objects are databases or data storage organizations in the databases.
In this embodiment, a plurality of storage objects to be cleaned are determined based on the object cleaning instruction, and the storage objects are databases or data storage organizations in the databases, so that flexibility in setting the storage objects is improved.
In some optional implementations of the present embodiment, the index obtaining module 301 may include: parameter acquisition submodule, standard processing submodule and index determination submodule, wherein:
and the parameter acquisition sub-module is used for acquiring each storage evaluation parameter and each user influence parameter of each storage object.
And the standard processing sub-module is used for carrying out standardized processing on each storage evaluation parameter and each user influence parameter of each storage object to obtain each storage evaluation index and each user influence index of each storage object.
The index determination submodule is used for determining each storage evaluation index and each user influence index of each storage object as each storage index of each storage object.
In this embodiment, each storage evaluation parameter and each user influence parameter of each storage object are obtained and standardized so as to facilitate subsequent weight calculation, and each storage evaluation index and each user influence index after the standardized processing are determined as each storage index of each storage object, so that the richness of the storage indexes is improved, and the storage objects can be comprehensively and accurately evaluated.
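The application does not specify the standardization used. One common choice, shown here purely as an assumption, is min-max scaling to the range [0, 1], with cost-type parameters (where a larger raw value argues against retention) reversed. The function name `min_max` and the `benefit` flag are hypothetical.

```python
def min_max(values, benefit=True):
    """Scale one parameter across all storage objects to [0, 1].

    benefit=True  : larger raw value -> larger index (e.g. access count)
    benefit=False : larger raw value -> smaller index (e.g. storage cost)
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All objects share the same value, so the parameter carries no contrast.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if benefit else [1.0 - s for s in scaled]


access_counts = min_max([10, 20, 30])                  # benefit-type parameter
storage_costs = min_max([10, 20, 30], benefit=False)   # cost-type parameter
```

Bringing every parameter onto the same scale is what allows the later weight calculation to compare indexes with different units.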
In some optional implementations of this embodiment, the weight calculation module 302 is further configured to calculate, by using an entropy algorithm, each storage index of each storage object, to obtain an index weight corresponding to each storage index of each storage object.
In the embodiment, the index weights corresponding to the storage indexes of the storage objects are calculated through an entropy method, so that the influence of subjective factors is avoided, and the rationality of the obtained index weights is ensured.
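A minimal Python sketch of the entropy weight method in its commonly used form is given below; the application does not spell out the formulas, so the details are assumptions. Rows of `matrix` are storage objects, columns are non-negative standardized storage indexes, and at least two objects are required. The function name `entropy_weights` is hypothetical.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column; columns that vary more get larger weights."""
    n, m = len(matrix), len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            entropy = 1.0  # all-zero column: treated as carrying no information
        else:
            entropy = 0.0
            for x in column:
                p = x / total
                if p > 0:  # the term p * ln(p) is taken as 0 when p == 0
                    entropy -= p * math.log(p)
            entropy /= math.log(n)  # normalize so that 0 <= entropy <= 1
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / m] * m  # no column discriminates: fall back to equal weights
    return [d / total_div for d in divergences]


# Two hypothetical objects: index 0 is identical for both, index 1 separates them.
w = entropy_weights([[0.5, 1.0], [0.5, 0.0]])
```

An index that takes the same value for every storage object receives a weight near zero, which is how the method avoids subjectively assigned weights.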
In some alternative implementations of the present embodiment, the data cleaning module 305 may include: a policy acquisition sub-module and a data cleaning sub-module, wherein:
And the strategy acquisition sub-module is used for acquiring a data cleaning strategy matched with the data value grade, wherein the data cleaning strategy comprises cleaning time nodes and corresponding cleaning proportions thereof.
And the data cleaning sub-module is used for cleaning the data of the stored data in the storage object according to the data cleaning strategy.
In this embodiment, a data cleaning policy matching with the value level of data is obtained, where the data cleaning policy includes cleaning time nodes and their corresponding cleaning proportion, which represents how much stored data needs to be cleaned at which time nodes, so that effective data cleaning is implemented on stored data in a storage object according to the data cleaning policy.
In some optional implementations of the present embodiment, the storage object stores the stored data in the form of a data table, and the data cleaning submodule may include: an index acquisition unit, a value determination unit, and a table cleaning unit, wherein:
and the index acquisition unit is used for acquiring each table evaluation index of each data table in the stored data of the storage object.
And the value determining unit is used for determining the data table value of each data table according to each table evaluation index of each data table.
And the table cleaning unit is used for cleaning each data table according to the data cleaning strategy and the data table value.
In this embodiment, the data table value of each data table in the stored data is calculated, and each data table is effectively cleaned according to the data cleaning policy and the data table value.
In some optional implementations of this embodiment, the data cleaning device may further include: the device comprises an information acquisition module, an update determination module and an update cleaning module, wherein:
the information acquisition module is used for acquiring reevaluation information of the storage object, wherein the reevaluation information comprises a data value grade, an access index, an activation index or an authorization index of the storage object.
And the updating determining module is used for determining an updated cleaning strategy according to the reevaluation information.
And the updating cleaning module is used for cleaning the data of the stored data in the storage object according to the updated cleaning strategy.
In this embodiment, re-evaluation information of the storage object is obtained, an updated cleaning policy is determined according to the re-evaluation information, and data cleaning is performed on stored data in the storage object by replacing the data cleaning policy with the updated cleaning policy, so that timely adjustment of the cleaning policy can be realized.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only the computer device 4 with components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of a data cleaning method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the data cleansing method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment may perform the above-described data cleaning method. The data cleaning method may be the data cleaning method of each of the above embodiments.
In this embodiment, each storage index of each storage object is obtained, where the storage object has stored data, and each storage index describes characteristics of the stored data from different dimensions, or characteristics of the storage object on storage; calculating each storage index of each storage object through a weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object, and realizing objective calculation of the weights; for each storage object, calculating the data retention value of the stored data according to the index weight and the index value corresponding to each storage index of the storage object, determining the data value level according to the data retention value, and acquiring a data cleaning strategy matched with the data value level, wherein the data cleaning strategy defines a cleaning mode of the stored data, so that reasonable and effective data cleaning can be realized according to the data cleaning strategy.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the data cleansing method as described above.
In this embodiment, each storage index of each storage object is obtained, where the storage object has stored data, and each storage index describes characteristics of the stored data from different dimensions, or characteristics of the storage object on storage; calculating each storage index of each storage object through a weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object, and realizing objective calculation of the weights; for each storage object, calculating the data retention value of the stored data according to the index weight and the index value corresponding to each storage index of the storage object, determining the data value level according to the data retention value, and acquiring a data cleaning strategy matched with the data value level, wherein the data cleaning strategy defines a cleaning mode of the stored data, so that reasonable and effective data cleaning can be realized according to the data cleaning strategy.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application; the drawings show preferred embodiments of the present application and do not limit the scope of the patent claims. This application may be embodied in many different forms; the embodiments are provided so that the present disclosure will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims (10)

1. A data cleaning method, characterized by comprising the following steps:
acquiring various storage indexes of various storage objects, wherein the storage objects have stored data;
calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object;
for each storage object, calculating the data retention value of the stored data in the storage object based on index weights and index values respectively corresponding to each storage index of the storage object;
determining the data value grade of the stored data in the storage object according to the data retention value;
and acquiring a data cleaning strategy matched with the data value grade, and cleaning the data of the stored data in the storage object according to the data cleaning strategy.
2. The data cleansing method according to claim 1, further comprising, before the step of acquiring each storage index of each storage object:
receiving an object cleaning instruction;
and determining a plurality of storage objects to be cleaned based on the object cleaning instruction, wherein the storage objects are databases or data storage organizations in the databases.
3. The data cleansing method according to claim 1, wherein the step of acquiring each storage index of each storage object comprises:
acquiring each storage evaluation parameter and each user influence parameter of each storage object;
carrying out standardization processing on each storage evaluation parameter and each user influence parameter of each storage object to obtain each storage evaluation index and each user influence index of each storage object;
and determining all storage evaluation indexes and all user influence indexes of all storage objects as all storage indexes of all storage objects.
4. The data cleaning method according to claim 1, wherein the step of calculating each storage index of each storage object by a preset weight algorithm to obtain an index weight corresponding to each storage index of each storage object respectively includes:
and calculating each storage index of each storage object through an entropy algorithm to obtain index weights respectively corresponding to each storage index of each storage object.
5. The method of claim 1, wherein the step of obtaining a data cleansing policy that matches the data value level and cleansing stored data in the storage object according to the data cleansing policy comprises:
acquiring a data cleaning strategy matched with the data value grade, wherein the data cleaning strategy comprises cleaning time nodes and corresponding cleaning proportions thereof;
and carrying out data cleaning on the stored data in the storage object according to the data cleaning strategy.
6. The data cleansing method according to claim 5, wherein the storage object stores the stored data in the form of a data table, and the step of cleansing the stored data in the storage object according to the data cleansing policy comprises:
acquiring each table evaluation index of each data table in the stored data of the storage object;
determining the data table value of each data table according to each table evaluation index of each data table;
and cleaning each data table according to the data cleaning strategy and the data table value.
7. The data cleansing method according to claim 1, further comprising, after the step of cleansing the stored data in the storage object according to the data cleansing policy:
obtaining reevaluation information of the storage object, wherein the reevaluation information comprises a data value grade, an access index, an activation index or an authorization index of the storage object;
determining an updated cleaning strategy according to the reevaluation information;
and cleaning the data of the stored data in the storage object according to the updated cleaning strategy.
8. A data cleaning device, comprising:
the index acquisition module is used for acquiring various storage indexes of various storage objects, wherein the storage objects have stored data;
the weight calculation module is used for calculating each storage index of each storage object through a preset weight algorithm to obtain index weights respectively corresponding to each storage index of each storage object;
the value calculation module is used for calculating the data retention value of the stored data in each storage object based on the index weight and the index value respectively corresponding to each storage index of the storage object;
the grade determining module is used for determining the data value grade of the stored data in the storage object according to the data retention value;
and the data cleaning module is used for acquiring a data cleaning strategy matched with the data value grade and cleaning the data stored in the storage object according to the data cleaning strategy.
9. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the data cleansing method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the data cleaning method according to any of claims 1 to 7.
CN202310743373.7A 2023-06-21 2023-06-21 Data cleaning method, device, computer equipment and storage medium Pending CN116795825A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310743373.7A CN116795825A (en) 2023-06-21 2023-06-21 Data cleaning method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310743373.7A CN116795825A (en) 2023-06-21 2023-06-21 Data cleaning method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116795825A true CN116795825A (en) 2023-09-22

Family

ID=88041468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310743373.7A Pending CN116795825A (en) 2023-06-21 2023-06-21 Data cleaning method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116795825A (en)

Similar Documents

Publication Publication Date Title
CN113610239B (en) Feature processing method and feature processing system for machine learning
CN111506723B (en) Question-answer response method, device, equipment and storage medium
CN112990583B (en) Method and equipment for determining model entering characteristics of data prediction model
CN112036483B (en) AutoML-based object prediction classification method, device, computer equipment and storage medium
CN115757075A (en) Task abnormity detection method and device, computer equipment and storage medium
CN115630221A (en) Terminal application interface display data processing method and device and computer equipment
CN112199374A (en) Data feature mining method aiming at data missing and related equipment thereof
CN112507141A (en) Investigation task generation method and device, computer equipment and storage medium
CN117235633A (en) Mechanism classification method, mechanism classification device, computer equipment and storage medium
CN116843395A (en) Alarm classification method, device, equipment and storage medium of service system
CN116028446A (en) Time sequence data file management method, device, equipment and storage medium thereof
CN116402625A (en) Customer evaluation method, apparatus, computer device and storage medium
CN112085566B (en) Product recommendation method and device based on intelligent decision and computer equipment
CN116795825A (en) Data cleaning method, device, computer equipment and storage medium
CN115238813A (en) Risk assessment method, device, equipment and storage medium for shared account
CN116993218A (en) Index analysis method, device, equipment and storage medium based on artificial intelligence
CN116842011A (en) Blood relationship analysis method, device, computer equipment and storage medium
CN117291693A (en) Policy generation method, device, equipment and storage medium based on artificial intelligence
CN117786390A (en) Feature data arrangement method to be maintained and related equipment thereof
CN116401061A (en) Method and device for processing resource data, computer equipment and storage medium
CN116402644A (en) Legal supervision method and system based on big data multi-source data fusion analysis
CN116757771A (en) Scheme recommendation method, device, equipment and storage medium based on artificial intelligence
CN116777639A (en) Case risk rating method, device, computer equipment and storage medium
CN116910095A (en) Buried point processing method, buried point processing device, computer equipment and storage medium
CN118212075A (en) Product recommendation method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination