CN117743310A - Full-period data management method, system and storage medium - Google Patents


Info

Publication number
CN117743310A
Authority
CN
China
Legal status
Pending
Application number
CN202311755753.9A
Other languages
Chinese (zh)
Inventor
黄瀛
周勃
刘红霖
李嘉
黄武庆
罗迪
石琼玉
黄旷
叶植添
李文特
Current Assignee
Yunbao Big Data Industry Development Co ltd
Original Assignee
Yunbao Big Data Industry Development Co ltd
Application filed by Yunbao Big Data Industry Development Co ltd


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data management, and relates to a full-period data management method, a system and a storage medium.

Description

Full-period data management method, system and storage medium
Technical Field
The invention belongs to the technical field of data management, and relates to a full-period data management method, a full-period data management system and a storage medium.
Background
Modern society faces a continual growth of massive data that takes many forms, including structured, semi-structured and unstructured data. Faced with such large and diverse data, conventional data management methods can no longer meet the needs of enterprises, and full-period data management has emerged in response.
Although existing enterprise data management approaches have overcome the low efficiency and poor sustainability of traditional data management, limitations remain. Specifically, existing enterprise data management methods lack detailed work on periodically reviewing the value of the data in the enterprise database. As a result, a large amount of unaccessed, outdated or weakly associated data remains in the enterprise database without being cleaned in time; this data occupies much of the database's space and reduces the storage capacity available for high-value data.
Disclosure of Invention
In view of this, to solve the above problems in the background art, a full-period data management method, system and storage medium are proposed.
The object of the invention is achieved by the following technical scheme. A first aspect of the present invention provides a full-period data management method, comprising:
S1, issuing a data value review instruction: a data value review instruction is issued to the target enterprise database at each periodic data value review time point.
S2, evaluating data input value: upon receiving the data value review instruction, the input information of each data record is acquired through the entry end of the target enterprise database, and the input value coefficient of each data record with respect to the target enterprise database is analyzed.
S3, evaluating data access value: the access information of each data record is acquired through the access end of the target enterprise database, and the access value coefficient of each data record with respect to the target enterprise database is analyzed.
S4, evaluating data association value: the association value coefficient of each data record with respect to the target enterprise database is analyzed according to the input information and access information of each data record in the target enterprise database.
S5, calculating the comprehensive value score of the data: based on the input value coefficient, access value coefficient and association value coefficient of each data record with respect to the target enterprise database, the comprehensive value score of each data record is calculated; data records scoring below a preset comprehensive value score are screened out to generate the low-value data sequence of the target enterprise database.
S6, feeding back the low-value data sequence: the generated low-value data sequence is fed back to the administrator terminal of the target enterprise database.
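The S1 to S6 flow can be outlined as a minimal Python skeleton. This is a sketch under stated assumptions: the patent's coefficient and scoring formulas are not reproduced in this text, so the three coefficient functions and the `combine` rule are caller-supplied placeholders, and the names `run_value_review` and `notify_admin` are hypothetical.

```python
from typing import Callable, Dict, List

# A coefficient function maps a record id to a number. All three are placeholders:
# the patent's own formulas are not reproduced in this text.
Coefficient = Callable[[str], float]


def run_value_review(
    record_ids: List[str],
    input_value: Coefficient,        # S2: input value coefficient (placeholder)
    access_value: Coefficient,       # S3: access value coefficient (placeholder)
    association_value: Coefficient,  # S4: association value coefficient (placeholder)
    combine: Callable[[float, float, float], float],  # S5: hypothetical scoring rule
    threshold: float,                # S5: preset comprehensive value score
) -> List[str]:
    """Run one review cycle (triggered by S1) and return the low-value data sequence."""
    scores: Dict[str, float] = {
        rid: combine(input_value(rid), access_value(rid), association_value(rid))
        for rid in record_ids
    }
    # S5: keep records scoring below the threshold, ordered from lowest score upward.
    low_value = [rid for rid in record_ids if scores[rid] < threshold]
    return sorted(low_value, key=lambda rid: scores[rid])


def notify_admin(low_value_sequence: List[str]) -> str:
    # S6: stand-in for feeding the sequence back to the administrator terminal.
    return "low-value records: " + ", ".join(low_value_sequence)
```

For instance, with `combine=lambda a, b, c: (a + b + c) / 3` and `threshold=0.5`, only records whose averaged coefficients fall below 0.5 are returned, lowest first. A scheduler (not shown) would call `run_value_review` at each review time point to play the role of S1.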
Preferably, the input information comprises, for each entry, the user identity permission level, the storage data table and its number, the entry date, and the data type, format and value of each data item.
The access information comprises, for each access, the terminal IP address, user identity permission level, access path, access date, access duration, type of operation performed, and number of data items operated on.
Preferably, analyzing the input value coefficient of each data record with respect to the target enterprise database comprises: setting an entry permission influence weight corresponding to the user identity permission level of each entry in the input information of each data record, and taking it as the identity value influence weight $\delta_{ij}$ of each entry of each data record in the target enterprise database, where $i$ is the number of each data record in the target enterprise database, $i=1,2,\dots,a$, and $j$ is the number of each entry, $j=1,2,\dots,b$.
According to the storage data table number of each entry in the input information of each data record, combined with the storage data table numbers corresponding to the various data services of the target enterprise stored in the WEB cloud, the data service type of the target enterprise corresponding to the storage data table number of each entry is obtained, and the business value influence weight $\mu_{ij}$ of each entry of each data record in the target enterprise database is set accordingly.
The basic input value index of each data record in the target enterprise database is obtained from a formula [formula not reproduced].
The input quality index $\beta_i$ of each data record in the target enterprise database is analyzed according to the data type, format and value of each data item in each entry of the input information.
According to the entry date $t_{ij}$ of each entry in the input information of each data record, the date of each entry is differenced with the date of the previous entry to obtain the interval in days $\Delta t_{ij}$ between successive entries. The last entry date $t'_i$ of each data record is screened out and, combined with the current date $t_0$, the input timeliness index $\varphi_i$ of each data record in the target enterprise database is calculated by a formula [formula not reproduced], where $e$ is the natural constant and $b$ is the number of entries of the data record.
The input value coefficient of each data record with respect to the target enterprise database is obtained from a formula [formula not reproduced].
Preferably, analyzing the input quality index of each data record in the target enterprise database comprises: according to the data type, format and value of each data item in each entry of the input information, combined with the standard entry format, restricted valid domain and missing-value display values of each data item of the target enterprise database stored in the WEB cloud, the format-error data items, value-error data items and missing data items of each entry of each data record are screened out. The number of data items $m_{ij}$, the number of format-error data items $m'_{ij}$, the number of value-error data items $m''_{ij}$ and the number of missing data items $n_{ij}$ of each entry are counted, and the input quality evaluation index of each data record in the target enterprise database is obtained from a formula [formula not reproduced].
Preferably, analyzing the access value coefficient of each data record with respect to the target enterprise database comprises: comparing the terminal IP address of each access in the access information of each data record with the trusted-terminal IP address list of the target enterprise stored in the WEB cloud. If the terminal IP address of an access is in the list, the address reliability of that access is set to $\sigma$, otherwise to $\varepsilon$, giving the address reliability $d_{iw}$ of each access of each data record in the target enterprise database, where $d_{iw}=\sigma$ or $\varepsilon$, $\sigma>\varepsilon$, and $w$ is the number of each access, $w=1,2,\dots,c$.
The identity value influence weight $\delta'_{iw}$ of each access of each data record in the target enterprise database is obtained according to the user identity permission level of each access in the access information of each data record.
The basic access value index of each data record in the target enterprise database is obtained from a formula [formula not reproduced].
According to the access date of each access in the access information of each data record, the last access date of each data record is screened out, and the access timeliness index of each data record in the target enterprise database is calculated by a formula [formula not reproduced]. The formula involves the date of the $(w+1)$-th access to the $i$-th data record in the target enterprise database, a preset threshold on the reasonable number of days between data accesses, and the number of accesses $c$.
According to the access duration $\Delta T_{iw}$, operation type and number of data items operated on $v_{iw}$ of each access in the access information of each data record, the execution depth $\theta_{iw}$ of each access is obtained from the execution depths corresponding to the various data operations stored in the WEB cloud, and the access degree index $k_i$ of each data record in the target enterprise database is calculated [formula not reproduced].
The access value coefficient $FW_i$ of each data record with respect to the target enterprise database is then calculated [formula not reproduced].
Preferably, analyzing the association value coefficient of each data record with respect to the target enterprise database comprises: according to the storage data table of each entry in the input information of each data record, the storage data table of the most recent entry is extracted and recorded as the reference storage data table of that data record.
For the reference storage data table of a given data record, the number of data records in that table is counted, and this number minus 1 is taken as the first-order storage association data density of the data record. Combined with the association depth between data in a shared storage data table stored in the WEB cloud, the first-order storage association index $y$ of the data record is calculated.
Next, each foreign-key data table corresponding to the reference storage data table of the data record is acquired, the numbers of records in these foreign-key tables are counted and summed, and the sum is taken as the second-order storage association data density of the data record. Combined with the association depth between data in primary/foreign-key storage data tables stored in the WEB cloud, the second-order storage association index $r$ of the data record is calculated.
Further, each storage data table that uses the same index as the reference storage data table of the data record is acquired and recorded as an index-associated data table of the reference table. The record counts of these index-associated tables are summed, and the sum is taken as the third-order storage association data density of the data record. Combined with the association depth between data in same-index storage data tables stored in the WEB cloud, the third-order storage association index $s$ of the data record is calculated.
The storage association value coefficient of the data record is obtained as $x=y+r+s$, which in turn gives the storage association value coefficient $x_i$ of each data record.
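The three storage association data densities (the record counts that feed $y$, $r$ and $s$) can be sketched against a SQLite database using its `PRAGMA foreign_key_list`, `PRAGMA index_list` and `PRAGMA index_info` statements. This is an illustration under assumptions, not the patent's implementation: "foreign-key data tables" are taken to be the parent tables named in the reference table's foreign-key declarations, "same index" is interpreted as another table having an index on a column of the same name, and the association depths from the WEB cloud that turn densities into $y$, $r$ and $s$ are not modelled. The function names are hypothetical.

```python
import sqlite3
from typing import List, Set, Tuple


def _row_count(conn: sqlite3.Connection, table: str) -> int:
    # Table names come from sqlite_master / PRAGMA output and are quoted as identifiers.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def _indexed_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    """Names of all columns covered by any index on `table`."""
    cols: Set[str] = set()
    for row in conn.execute(f'PRAGMA index_list("{table}")'):
        index_name = row[1]
        for _seqno, _cid, col in conn.execute(f'PRAGMA index_info("{index_name}")'):
            if col is not None:  # expression-index columns have no name
                cols.add(col)
    return cols


def storage_association_densities(conn: sqlite3.Connection, ref_table: str) -> Tuple[int, int, int]:
    """Return (first-, second-, third-order storage association data densities)."""
    # First order: records sharing the reference storage table, minus the record itself.
    first = _row_count(conn, ref_table) - 1
    # Second order (assumption): parent tables named in the reference table's foreign keys.
    fk_tables = {row[2] for row in conn.execute(f'PRAGMA foreign_key_list("{ref_table}")')}
    second = sum(_row_count(conn, t) for t in fk_tables)
    # Third order (assumption): other tables with an index on a same-named column.
    ref_cols = _indexed_columns(conn, ref_table)
    tables: List[str] = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    third = sum(
        _row_count(conn, t)
        for t in tables
        if t != ref_table and _indexed_columns(conn, t) & ref_cols
    )
    return first, second, third
```

In a production database the notion of "same index" would more likely come from the schema metadata held in the WEB cloud than from column names, so the third-order rule above is the piece most worth replacing.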
Preferably, analyzing the association value coefficient of each data record with respect to the target enterprise database further comprises: extracting the access path of each access in the access information of each data record and aggregating the access paths of all accesses of each data record; counting the number of occurrences of each access component in the aggregated access paths of each data record, and marking each access component that occurs more than once in the aggregated access paths of a data record as a conventional access component of that data record; and integrating these components to generate the conventional access component sequence of the data record. The conventional access component sequences of the data records are then compared pairwise to obtain their similarity, from which the access-associated data of each data record and the corresponding access association depth $\eta_{il}$ are obtained, where $l$ is the number of each access-associated data record, $l=1,2,\dots,z$. The access association value coefficient $f_i$ of each data record is then calculated by a formula [formula not reproduced], where $z$ is the number of access-associated data records.
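The access-path step can be sketched as follows. Several details are assumptions not stated in the patent: access paths are taken to be `/`-separated strings, the conventional access component sequence keeps components in order of first appearance, Jaccard overlap is swapped in as the similarity measure, a hypothetical `min_similarity` cut-off decides which records count as access-associated, and the similarity value itself is used as the access association depth $\eta_{il}$.

```python
from collections import Counter
from typing import Dict, List, Sequence


def conventional_components(paths: Sequence[str]) -> List[str]:
    """Components that occur more than once across a record's aggregated access paths,
    kept in order of first appearance. Paths are assumed to be '/'-separated."""
    parts = [p for path in paths for p in path.split("/") if p]
    counts = Counter(parts)
    sequence: List[str] = []
    for p in parts:
        if counts[p] > 1 and p not in sequence:
            sequence.append(p)
    return sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Set overlap of two component sequences (swapped in as the similarity measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def access_associations(
    paths_by_record: Dict[str, List[str]], min_similarity: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """For each record, map access-associated records to a depth eta (= similarity here)."""
    seqs = {rid: conventional_components(p) for rid, p in paths_by_record.items()}
    result: Dict[str, Dict[str, float]] = {rid: {} for rid in seqs}
    for rid, seq in seqs.items():
        for other, other_seq in seqs.items():
            if other != rid:
                sim = jaccard(seq, other_seq)
                if sim >= min_similarity:
                    result[rid][other] = sim
    return result
```

Because Jaccard ignores order, two records that reach the same components in a different sequence still count as associated; an order-aware measure such as longest common subsequence would be a reasonable alternative if the patent's similarity is meant to respect path order.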
The association value coefficient of each data record with respect to the target enterprise database is then obtained from a formula [formula not reproduced].
Preferably, the comprehensive value score of each data record with respect to the target enterprise database is calculated by a formula [formula not reproduced], in which $\pi$ is taken as 180°.
A second aspect of the present invention provides a full-period data management system, comprising: a data value review instruction issuing module, configured to issue the data value review instruction to the target enterprise database at each periodic data value review time point.
A data input value evaluation module, configured to receive the data value review instruction, acquire the input information of each data record through the entry end of the target enterprise database, and analyze the input value coefficient of each data record with respect to the target enterprise database.
A data access value evaluation module, configured to acquire the access information of each data record through the access end of the target enterprise database and analyze the access value coefficient of each data record with respect to the target enterprise database.
A data association value evaluation module, configured to analyze the association value coefficient of each data record with respect to the target enterprise database according to the input information and access information of each data record in the target enterprise database.
A data comprehensive value score calculation module, configured to calculate the comprehensive value score of each data record with respect to the target enterprise database based on its input value coefficient, access value coefficient and association value coefficient, screen out data records scoring below the preset comprehensive value score, and generate the low-value data sequence of the target enterprise database.
A low-value data sequence feedback module, configured to feed the generated low-value data sequence back to the administrator terminal of the target enterprise database.
A cloud database, configured to store the trusted-terminal IP address list of the target enterprise and the storage data table numbers corresponding to its various data services, the execution depths corresponding to the various data operations, and the association depths between data in shared storage data tables, in primary/foreign-key storage data tables and in same-index storage data tables.
A third aspect of the present invention provides a storage medium storing one or more programs executable by one or more processors to implement the steps of the full-period data management method of the present invention.
Compared with the prior art, the invention has the following beneficial effects: (1) By combining the basic input value index, input quality index and input timeliness index of each data record in the target enterprise database, the invention comprehensively analyzes the input value coefficient of each data record with respect to the target enterprise database. Evaluating data value from several angles avoids the one-sided evaluation that relies on a single index, gives a more complete view of each data record's value contribution to the target enterprise database, and judges the input importance of each data record more accurately.
(2) The invention comprehensively analyzes the access value coefficient of each data record in the target enterprise database from three aspects: the basic access value index, the access timeliness index and the access degree index. Examining the access value of each data record through a quantitative evaluation approach with strong generality and extensibility improves the objectivity and accuracy of the evaluation result and provides data support for the subsequent calculation of the comprehensive value score.
(3) Based on the shared-data relationship, primary/foreign-key relationship and same-index relationship of each data record's reference storage data table, the invention analyzes the first-order, second-order and third-order storage association indexes respectively. This characterizes the associations between data finely at the storage level, gives a comprehensive view of the storage relationship network between data, and provides a more accurate basis for subsequent data value evaluation and data cleaning.
(4) The invention aggregates and analyzes the access paths of each data record's accesses, generates the conventional access component sequence of each data record, and obtains the similarity of these sequences between data records, from which the access-associated data of each data record and the corresponding access association depth are derived. This quantifies the access association value of each data record more accurately, identifies hidden data access relationships, and gives the enterprise more comprehensive and deeper insight into its data.
(5) Based on the input value coefficient, access value coefficient and association value coefficient of each data record with respect to the target enterprise database, the invention calculates the comprehensive value score of each data record, generates the low-value data sequence of the target enterprise database and feeds it back to the administrator terminal. This remedies the lack of detailed work in conventional methods on periodically reviewing data value in the enterprise database: unaccessed, outdated or weakly associated data can be cleaned in time, which reduces the storage space it occupies, improves data retrieval efficiency and prevents it from degrading overall data quality, while high-value data in the enterprise database is effectively retained to meet enterprise needs in business decision-making and data analysis.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the implementation steps of the method of the present invention.
FIG. 2 is a schematic diagram of the connections between the modules of the system of the present invention.
Detailed Description
The foregoing merely illustrates the principles of the invention. Various modifications, additions and substitutions will be apparent to those of ordinary skill in the art in light of the foregoing description and this disclosure, without departing from the principles and spirit of the invention as defined in the appended claims.
Example 1
Referring to FIG. 1, the present invention provides a full-period data management method, comprising:
S1, issuing a data value review instruction: a data value review instruction is issued to the target enterprise database at each periodic data value review time point.
S2, evaluating data input value: upon receiving the data value review instruction, the input information of each data record is acquired through the entry end of the target enterprise database, and the input value coefficient of each data record with respect to the target enterprise database is analyzed.
Specifically, the input information comprises, for each entry, the user identity permission level, the storage data table and its number, the entry date, and the data type, format and value of each data item.
The access information comprises, for each access, the terminal IP address, user identity permission level, access path, access date, access duration, type of operation performed, and number of data items operated on.
It should be noted that the input information of each data record is obtained through the entry interface of the entry end of the target enterprise database; the entry interface may be an application, a website or another form of user interface. After a user enters or uploads data through the entry interface, the data is transmitted to the server hosting the entry end and is finally written into the target enterprise database through a series of data processing and storage operations.
The access information of each data record is obtained through the query interface of the access end of the target enterprise database. The query interface is generally provided by the database management system and can be invoked in different ways, such as SQL query statements or API calls. When users need to access certain data, they send a query request to the target enterprise database through the query interface; the request is transmitted to the server hosting the access end, and after a series of data processing and query operations the data meeting the query conditions is returned to the users.
It should further be noted that, in general, the initial entry of data is regarded as a data entry behavior and subsequent update entries are regarded as data access behaviors. The invention, however, also regards subsequent update entries as data entry behaviors. This takes the life cycle, integrity and business value of the data into account more comprehensively and helps make the evaluation of data entry value concrete, scientific and refined; the input information of the data therefore consists of the relevant parameters of every entry of the data.
Specifically, analyzing the input value coefficient of each data record with respect to the target enterprise database comprises: setting an entry permission influence weight corresponding to the user identity permission level of each entry in the input information of each data record, and taking it as the identity value influence weight $\delta_{ij}$ of each entry of each data record in the target enterprise database, where $i$ is the number of each data record in the target enterprise database, $i=1,2,\dots,a$, and $j$ is the number of each entry, $j=1,2,\dots,b$.
According to the storage data table number of each entry in the input information of each data record, combined with the storage data table numbers corresponding to the various data services of the target enterprise stored in the WEB cloud, the data service type of the target enterprise corresponding to the storage data table number of each entry is obtained, and the business value influence weight $\mu_{ij}$ of each entry of each data record in the target enterprise database is set accordingly.
It should be noted that the entry permission influence weight is set according to the user identity permission level because that level indicates the degree of the user's operation permissions on the data in the database, which in turn depends on the user's responsibilities and the operations the user must perform. For example, high-permission users such as database administrators and data administrators may have full access to and control over the database, including creating, deleting and modifying table structures and backing up and restoring the database, whereas ordinary users can only query, add, modify and delete data and cannot modify the database structure or system settings. Entries made by high-permission users such as administrators therefore carry a higher influence weight than entries made by ordinary users, and the size of the weight difference depends on the gap between their identity permissions.
The business value influence weight of a data entry is set according to the data business type of the target enterprise because, in enterprise data management, different types of data business have different business value and impact on the enterprise: some data may relate directly to the enterprise's core business, while other data plays a more supporting role. The business type corresponding to the data therefore serves as the basis for setting the weight, ensuring that entries of key business data receive a higher weight.
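The two weight lookups described above amount to table lookups per entry. The sketch below assumes a simple in-memory form of the tables the patent keeps in the WEB cloud; the permission level names, table numbers, business types and all weight values are hypothetical and illustrative only, chosen so that administrators and core business tables receive higher weights as the text requires.

```python
from typing import Dict, List, Tuple

# Hypothetical weight tables; the patent sets such weights but does not publish values here.
PERMISSION_WEIGHTS: Dict[str, float] = {"admin": 1.0, "data_admin": 0.9, "ordinary": 0.5}
# Hypothetical mapping from storage data table number to business type (held in the WEB cloud).
TABLE_BUSINESS_TYPE: Dict[str, str] = {"T01": "core_sales", "T02": "core_sales", "T09": "support_logs"}
BUSINESS_WEIGHTS: Dict[str, float] = {"core_sales": 1.0, "support_logs": 0.4}


def entry_weights(entries: List[Tuple[str, str]]) -> List[Tuple[float, float]]:
    """For each entry given as (permission level, table number), return (delta_ij, mu_ij).

    Unknown levels or table numbers raise KeyError so that gaps in the
    configuration surface immediately instead of silently getting a default weight.
    """
    weights = []
    for level, table_no in entries:
        delta = PERMISSION_WEIGHTS[level]                      # identity value influence weight
        mu = BUSINESS_WEIGHTS[TABLE_BUSINESS_TYPE[table_no]]   # business value influence weight
        weights.append((delta, mu))
    return weights
```

The same permission table could serve the per-access weight $\delta'_{iw}$, since the text states that it is obtained in the same way as the per-entry weight.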
The basic input value index of each data record in the target enterprise database is obtained from a formula [formula not reproduced].
The input quality index $\beta_i$ of each data record in the target enterprise database is analyzed according to the data type, format and value of each data item in each entry of the input information.
According to the entry date $t_{ij}$ of each entry in the input information of each data record, the date of each entry is differenced with the date of the previous entry to obtain the interval in days $\Delta t_{ij}$ between successive entries. The last entry date $t'_i$ of each data record is screened out and, combined with the current date $t_0$, the input timeliness index $\varphi_i$ of each data record in the target enterprise database is calculated by a formula [formula not reproduced], where $e$ is the natural constant and $b$ is the number of entries of the data record.
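The inputs to the timeliness index can be computed directly from the entry dates. Because the formula for $\varphi_i$ is not reproduced in this text, the sketch stops at its inputs: the intervals $\Delta t_{ij}$, the age of the last entry ($t_0 - t'_i$ in days) and the entry count $b$. Defining $\Delta t_{ij}$ only from the second entry onward is an assumption, since the first entry has no predecessor; `entry_intervals` is a hypothetical name.

```python
from datetime import date
from typing import List, Tuple


def entry_intervals(entry_dates: List[date], today: date) -> Tuple[List[int], int, int]:
    """Return (Delta t_ij for each entry after the first, days from t'_i to t_0, b)."""
    if not entry_dates:
        raise ValueError("a data record needs at least one entry")
    dates = sorted(entry_dates)
    # Delta t_ij: days between entry j and entry j-1 (defined here only for j >= 2).
    intervals = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    last_entry = dates[-1]  # t'_i, the last entry date
    return intervals, (today - last_entry).days, len(dates)
```

For entries on 2023-01-01, 2023-01-11 and 2023-03-01 reviewed on 2023-03-31, this yields intervals of 10 and 49 days, 30 days since the last entry, and $b = 3$.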
The input value coefficient of each data record with respect to the target enterprise database is obtained from a formula [formula not reproduced].
Specifically, analyzing the input quality index of each data record in the target enterprise database comprises: according to the data type, format and value of each data item in each entry of the input information, combined with the standard entry format, restricted valid domain and missing-value display values of each data item of the target enterprise database stored in the WEB cloud, the format-error data items, value-error data items and missing data items of each entry of each data record are screened out. The number of data items $m_{ij}$, the number of format-error data items $m'_{ij}$, the number of value-error data items $m''_{ij}$ and the number of missing data items $n_{ij}$ of each entry are counted, and the input quality evaluation index of each data record in the target enterprise database is obtained from a formula [formula not reproduced].
It should be noted that a data item is the smallest unit of data and may be a word, a number or a symbol; data item types include, but are not limited to, text, numeric, date and Boolean types.
The restricted valid domain of a data item is its reasonable value range. For example, the valid domain of a gender data item contains only male or female, and the valid domain of a product value data item contains only real numbers greater than or equal to 0.
For example, the missing-value display values of a data item may be spaces, unknown values, null symbols and the like.
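Counting $m_{ij}$, $m'_{ij}$, $m''_{ij}$ and $n_{ij}$ for one entry can be sketched with per-item rules that mirror the examples above (gender limited to male or female, product value a non-negative number, blanks and null-like markers treated as missing). The rule tables are hypothetical stand-ins for the standard formats, valid domains and missing-value display values held in the WEB cloud, and counting each item in at most one category (missing first, then format, then value) is an assumption; the quality formula itself is not reproduced.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the rules the patent keeps in the WEB cloud:
# missing-value display values, standard entry formats and restricted valid domains.
MISSING_MARKERS = {"", "unknown", "null", "n/a"}
FORMAT_RULES = {
    "gender": re.compile(r"[A-Za-z]+"),
    "product_value": re.compile(r"-?\d+(?:\.\d+)?"),
}
DOMAIN_RULES: Dict[str, Callable[[str], bool]] = {
    "gender": lambda v: v.lower() in {"male", "female"},
    "product_value": lambda v: float(v) >= 0,
}


def count_item_errors(entry: Dict[str, str]) -> Tuple[int, int, int, int]:
    """Return (m_ij, m'_ij, m''_ij, n_ij) for one entry:
    number of items, format errors, value errors and missing items.
    Each item is counted in at most one error category (an assumption)."""
    fmt_err = val_err = missing = 0
    for name, raw in entry.items():
        value = raw.strip()
        if value.lower() in MISSING_MARKERS:
            missing += 1                      # blank or null-like display value
        elif not FORMAT_RULES[name].fullmatch(value):
            fmt_err += 1                      # violates the standard entry format
        elif not DOMAIN_RULES[name](value):
            val_err += 1                      # well-formed but outside the valid domain
    return len(entry), fmt_err, val_err, missing
```

Checking format before domain matters here: a value such as `abc` for `product_value` would make `float(v)` raise, so the domain rule only runs on values that already match the numeric pattern.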
In this embodiment of the invention, the basic input value index, input quality index and input timeliness index of each data record in the target enterprise database are combined to comprehensively analyze the input value coefficient of each data record with respect to the target enterprise database. Evaluating data value from several angles avoids the one-sided evaluation that relies on a single index, gives a more complete view of each data record's value contribution to the target enterprise database, and judges the input importance of each data record more accurately.
S3, evaluating data access value: the access information of each data record is acquired through the access end of the target enterprise database, and the access value coefficient of each data record with respect to the target enterprise database is analyzed.
Specifically, analyzing the access value coefficient of each data record with respect to the target enterprise database comprises: comparing the terminal IP address of each access in the access information of each data record with the trusted-terminal IP address list of the target enterprise stored in the WEB cloud. If the terminal IP address of an access is in the list, the address reliability of that access is set to $\sigma$, otherwise to $\varepsilon$, giving the address reliability $d_{iw}$ of each access of each data record in the target enterprise database, where $d_{iw}=\sigma$ or $\varepsilon$, $\sigma>\varepsilon$, and $w$ is the number of each access, $w=1,2,\dots,c$.
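The address reliability rule maps directly onto Python's standard `ipaddress` module. In this sketch the values of $\sigma$ and $\varepsilon$ are hypothetical (only $\sigma > \varepsilon$ is required by the text), and allowing CIDR ranges in the trusted list is an assumed extension; a plain address is treated as a single-host network, so exact-match lists behave as described.

```python
import ipaddress
from typing import Iterable, List

SIGMA = 1.0    # hypothetical reliability for trusted terminals (sigma > epsilon)
EPSILON = 0.3  # hypothetical reliability for untrusted terminals


def address_reliability(access_ips: Iterable[str], trusted: Iterable[str]) -> List[float]:
    """Return d_iw for each access: SIGMA if the terminal IP is in the trusted list, else EPSILON.

    Trusted entries may be single addresses or CIDR networks (an assumed extension).
    """
    networks = [ipaddress.ip_network(t, strict=False) for t in trusted]
    result = []
    for ip in access_ips:
        addr = ipaddress.ip_address(ip)
        # Membership is False across IP versions, so IPv6 callers never match IPv4 entries.
        result.append(SIGMA if any(addr in net for net in networks) else EPSILON)
    return result
```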
The identity value influence weight $\delta'_{iw}$ of each access of each data record in the target enterprise database is obtained according to the user identity permission level of each access in the access information of each data record.
It should be noted that the identity value influence weight of each access of each data record is obtained in the same way as the identity value influence weight of each entry of each data record in the target enterprise database, and is not described again here.
The basic access value index of each data record in the target enterprise database is obtained from a formula [formula not reproduced].
According to the access date of each access in the access information of each data record, the last access date of each data record is screened out, and the access timeliness index of each data record in the target enterprise database is calculated by a formula [formula not reproduced]. The formula involves the date of the $(w+1)$-th access to the $i$-th data record in the target enterprise database, a preset threshold on the reasonable number of days between data accesses, and the number of accesses $c$.
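Since the access timeliness formula is not reproduced, the sketch below only prepares the quantities the text says it uses: the gap between access $w$ and access $w+1$, whether each gap exceeds the preset reasonable-interval threshold, the last access date, and the access count $c$. Comparing each gap against the threshold with a strict `>` is an assumption, and `AccessGaps`/`access_gaps` are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple


class AccessGaps(NamedTuple):
    intervals: List[int]        # days between access w and access w+1
    over_threshold: List[bool]  # True where a gap exceeds the reasonable-interval threshold
    last_access: date           # last access date of the data record
    count: int                  # c, the number of accesses


def access_gaps(access_dates: List[date], threshold_days: int) -> AccessGaps:
    """Prepare the inputs of the access timeliness index for one data record."""
    if not access_dates:
        raise ValueError("at least one access is required")
    dates = sorted(access_dates)
    intervals = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return AccessGaps(
        intervals,
        [gap > threshold_days for gap in intervals],
        dates[-1],
        len(dates),
    )
```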
According to the access duration $\Delta T_{iw}$, operation type and number of data items operated on $v_{iw}$ of each access in the access information of each data record, the execution depth $\theta_{iw}$ of each access is obtained from the execution depths corresponding to the various data operations stored in the WEB cloud, and the access degree index $k_i$ of each data record in the target enterprise database is calculated [formula not reproduced].
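Obtaining $\theta_{iw}$ is a lookup from operation type to execution depth. The sketch assumes each access is logged with its SQL text, takes the leading SQL keyword as the operation type (a simplification that would misclassify statements such as `WITH ... SELECT`), and uses a hypothetical depth table with illustrative values. It returns the per-access triples $(\Delta T_{iw}, \theta_{iw}, v_{iw})$; the formula combining them into $k_i$ is not reproduced.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical execution depths per operation type; the patent stores such a table
# in the WEB cloud, but these values are illustrative only.
EXECUTION_DEPTH: Dict[str, float] = {"SELECT": 0.2, "INSERT": 0.5, "UPDATE": 0.7, "DELETE": 0.9}


class AccessRecord(NamedTuple):
    duration_s: float  # Delta T_iw: access duration in seconds (unit assumed)
    statement: str     # SQL text of the access, used to infer the operation type
    items: int         # v_iw: number of data items operated on


def operation_type(statement: str) -> str:
    """Take the leading SQL keyword as the operation type (a simplification)."""
    return statement.strip().split(None, 1)[0].upper()


def access_degree_inputs(accesses: List[AccessRecord]) -> List[Tuple[float, float, int]]:
    """Return (Delta T_iw, theta_iw, v_iw) for each access of one data record."""
    return [
        (a.duration_s, EXECUTION_DEPTH[operation_type(a.statement)], a.items)
        for a in accesses
    ]
```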
Calculating access value coefficient FW of each data for target enterprise database i
According to the embodiment of the invention, the access value coefficient of each data in the target enterprise database is comprehensively analyzed from three layers of the basic access value index, the input timeliness index and the access degree index of each data in the target enterprise database, the access value of each data in the target enterprise database is deeply inspected by adopting a quantitative evaluation mode with strong universality and wide expansibility, the objectivity and the accuracy of an evaluation result are improved, and further, the data support is provided for the subsequent calculation of the comprehensive value score of the data.
S4, evaluating the data association value: the association value coefficient of each data for the target enterprise database is analyzed according to the input information and the access information of each data in the target enterprise database.
Specifically, analyzing the association value coefficient of each data for the target enterprise database includes the following. From the stored data table of each input in the input information of each data, the stored data table of the most recent input is extracted and recorded as the reference storage data table of that data.
For the reference storage data table of a given data, the number of records in that table is counted, and this number minus 1 is taken as the first-order storage association data density of the data. Combining it with the association depth between data in a shared storage data table, stored in the WEB cloud, gives the first-order storage association index $y$ of the data.
Next, each foreign key data table corresponding to the reference storage data table of the data is obtained, and the record counts of these foreign key data tables are summed. The sum is taken as the second-order storage association data density of the data; combining it with the association depth between data in primary-foreign key storage data tables, stored in the WEB cloud, gives the second-order storage association index $r$ of the data.
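If the target enterprise database were a SQLite database (an assumption made only for this sketch), the second-order storage association data density could be computed by reading the reference table's foreign key declarations with `PRAGMA foreign_key_list` and summing the row counts of the referenced tables. The text does not say whether the "foreign key data tables" are the tables the reference table points to or the tables pointing at it; this sketch assumes the former, and `second_order_density` is a hypothetical helper name.

```python
import sqlite3


def second_order_density(conn: sqlite3.Connection, ref_table: str) -> int:
    """Sum the record counts of the tables referenced by ref_table's foreign keys."""
    # Each row of PRAGMA foreign_key_list is
    # (id, seq, table, from, to, on_update, on_delete, match); index 2 is the parent table.
    fk_rows = conn.execute(f'PRAGMA foreign_key_list("{ref_table}")').fetchall()
    # A composite key yields several rows for the same parent, so deduplicate.
    parent_tables = sorted({row[2] for row in fk_rows})
    total = 0
    for parent in parent_tables:
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{parent}"').fetchone()
        total += count
    return total
```

For an `orders` table whose foreign keys reference a `customers` table with 2 rows and a `products` table with 3 rows, the density is 5; the second-order index $r$ is that density multiplied by the stored association depth.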
Further, each storage data table that uses the same index as the reference storage data table of the data is obtained and recorded as an index association data table of that reference table. The record counts of these index association data tables are summed, and the sum is taken as the third-order storage association data density of the data; combining it with the association depth between data in same-index storage data tables, stored in the WEB cloud, gives the third-order storage association index $s$ of the data.
The first-order storage association index $y$ is the product of the first-order storage association data density and the association depth between data in a shared storage data table.
The second-order storage association index $r$ is the product of the second-order storage association data density and the association depth between data in primary-foreign key storage data tables.
The third-order storage association index $s$ is the product of the third-order storage association data density and the association depth between data in same-index storage data tables.
The storage association value coefficient of the data is obtained as $x = y + r + s$; repeating this for every data gives the storage association value coefficient $x_i$ of each data.
In this embodiment, the first-, second- and third-order storage association indices are derived from the shared-table relationship, the primary-foreign key relationship and the same-index relationship of each data's reference storage data table, respectively. This characterizes the associations between data at the storage level in fine detail, gives a complete picture of the storage relationship network between data, and provides a more accurate basis for subsequent data value evaluation and data cleaning.
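The arithmetic of the three storage association indices and their sum $x = y + r + s$ can be written out as below. In the method the association depths are read from the WEB cloud; here they are plain parameters, and the `StorageCounts` record and the example numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageCounts:
    """Record counts gathered for one data item's reference storage data table."""
    ref_table_records: int  # records in the reference storage data table
    fk_table_records: List[int] = field(default_factory=list)     # each foreign key data table
    index_table_records: List[int] = field(default_factory=list)  # each same-index data table


def storage_association_coefficient(counts: StorageCounts, depth_shared: float,
                                    depth_pk_fk: float, depth_same_index: float) -> float:
    """x = y + r + s, each index being a data density times an association depth."""
    # First order: the other records sharing the reference table (count minus 1).
    y = (counts.ref_table_records - 1) * depth_shared
    # Second order: all records in the foreign key data tables.
    r = sum(counts.fk_table_records) * depth_pk_fk
    # Third order: all records in the tables using the same index.
    s = sum(counts.index_table_records) * depth_same_index
    return y + r + s
```

For example, `StorageCounts(11, [20, 5], [8])` with depths 0.5, 0.25 and 0.125 gives $y = 5$, $r = 6.25$ and $s = 1$, so $x = 12.25$.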
Specifically, analyzing the association value coefficient of each data for the target enterprise database further includes the following. The access path of each access is extracted from the access information of each data, and the access paths of all accesses of the data are summarized. The number of occurrences of each access component in the summarized access paths is counted, and every access component that occurs more than once in the summarized access paths of a data is recorded as a conventional access component of that data; these are integrated into the data's conventional access component sequence. The conventional access component sequences of all data are then compared one by one to obtain the conventional access component sequence similarity between data, from which each access association data of each data and its access association depth $\eta_{il}$ are obtained, where $l$ is the number of each access association data, $l = 1, 2, \ldots, z$, and $z$ is the number of access association data. The access association value coefficient $f_i$ of each data is then calculated [formula not reproduced].
It should be noted that a data access path is the series of steps, routes or access components through which data is accessed and operated on in a computer system or network, and an access component is any component that appears in a data access path.
It should also be noted that the conventional access component sequence similarity between data is obtained as follows: the conventional access component sequence of data A is compared one by one with those of all other data in the target enterprise database. If the sequences of data A and data B share conventional access components, the number of shared components is counted, and its ratio to the number of conventional access components of data A is taken as the conventional access component sequence similarity between data A and data B. Repeating this for every pair gives the similarity between all data.
It should further be noted that the access association data of each data and the corresponding access association depths are obtained as follows: if the conventional access component sequence similarity between data A and data B is greater than or equal to a preset reasonable threshold for access component sequence similarity, data B is taken as access association data of data A, and that similarity is taken as the access association depth between data B and data A. Repeating this for every data gives each access association data of each data and its access association depth.
In this embodiment, the access paths of all data accesses are summarized and analyzed, each data's conventional access component sequence is generated, and the sequence similarity between data is used to identify each data's access association data and the corresponding depths. This quantifies the access association value of each data more accurately, reveals hidden data access relationships, and gives the enterprise more comprehensive and deeper insight into its data.
The association value coefficient of each data for the target enterprise database is then obtained from the formula [formula not reproduced].
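The access association procedure in the notes above can be sketched as follows. Access paths are modeled as lists of component names, and occurrences are counted across all of a data item's summarized paths; the function names, component names, data IDs and threshold are hypothetical. Because the similarity is normalized by data A's component count, it is not necessarily symmetric.

```python
from collections import Counter
from typing import Dict, List


def conventional_components(access_paths: List[List[str]]) -> List[str]:
    """Components occurring more than once across a data item's summarized access paths,
    in order of first appearance."""
    counts = Counter(component for path in access_paths for component in path)
    return [component for component, n in counts.items() if n > 1]


def sequence_similarity(seq_a: List[str], seq_b: List[str]) -> float:
    """Shared conventional components divided by the number of A's conventional components."""
    if not seq_a:
        return 0.0
    shared = set(seq_a) & set(seq_b)
    return len(shared) / len(set(seq_a))


def access_associations(sequences: Dict[str, List[str]],
                        threshold: float) -> Dict[str, Dict[str, float]]:
    """For each data A, map every other data B with similarity >= threshold to its depth eta."""
    result: Dict[str, Dict[str, float]] = {}
    for a, seq_a in sequences.items():
        result[a] = {}
        for b, seq_b in sequences.items():
            if a != b:
                similarity = sequence_similarity(seq_a, seq_b)
                if similarity >= threshold:
                    result[a][b] = similarity
    return result
```

For instance, if data A's conventional components are `["gateway", "api"]` and data B's are `["gateway", "etl"]`, they share one of A's two components, so with a threshold of 0.5 each becomes the other's access association data with depth 0.5.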
S5, calculating the comprehensive data value score: based on the input value coefficient, the access value coefficient and the association value coefficient of each data for the target enterprise database, the comprehensive value score of each data is calculated; the data whose scores are below a preset comprehensive value score are screened out to generate the low-value data sequence of the target enterprise database.
Specifically, the comprehensive value score of each data for the target enterprise database is calculated as follows: [formula not reproduced].
S6, low-value data sequence feedback: the generated low-value data sequence is fed back to the administrator terminal of the target enterprise database.
In this embodiment, the comprehensive value score of each data for the target enterprise database is calculated from its input value coefficient, access value coefficient and association value coefficient, and the resulting low-value data sequence is generated and fed back to the administrator terminal. This addresses the lack of detailed, periodic data value review for enterprise databases in conventional methods: data that is never accessed, outdated or weakly associated can be cleaned in time, which reduces the storage space the database occupies, improves data retrieval efficiency, prevents such data from degrading overall data quality, and retains the high-value data in the enterprise database to meet the enterprise's needs in business decision-making and data analysis.
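The screening step of S5 can be sketched as below, taking the comprehensive value scores as already computed. The function name is hypothetical, and sorting the result from the lowest score upward is an assumption; the text does not specify an order for the low-value data sequence.

```python
from typing import Dict, List


def low_value_sequence(scores: Dict[str, float], preset_score: float) -> List[str]:
    """IDs of data whose comprehensive value score is below preset_score, lowest score first."""
    below = [(score, data_id) for data_id, score in scores.items() if score < preset_score]
    # Sort by score, then by ID, so ties come out in a reproducible order.
    return [data_id for _, data_id in sorted(below)]
```

A score exactly equal to the preset score is kept, since the text screens only data *below* it.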
Example 2
Referring to fig. 2, the present invention provides a full-period data management system comprising a data value examination instruction issuing module, a data input value evaluation module, a data access value evaluation module, a data association value evaluation module, a data comprehensive value score calculation module, a low-value data sequence feedback module and a cloud database.
The cloud database is connected to the data input value evaluation module, the data access value evaluation module and the data association value evaluation module.
The data value examination instruction issuing module issues the data value examination instruction to the target enterprise database at each periodic data value examination time point.
The data input value evaluation module receives the data value examination instruction, acquires the input information of each data through the input terminal of the target enterprise database, and analyzes the input value coefficient of each data for the target enterprise database.
The data access value evaluation module acquires the access information of each data through the access terminal of the target enterprise database and analyzes the access value coefficient of each data for the target enterprise database.
The data association value evaluation module analyzes the association value coefficient of each data for the target enterprise database according to the input information and the access information of each data in the target enterprise database.
The data comprehensive value score calculation module calculates the comprehensive value score of each data for the target enterprise database based on its input value coefficient, access value coefficient and association value coefficient, screens out the data scoring below the preset comprehensive value score, and generates the low-value data sequence of the target enterprise database.
The low-value data sequence feedback module feeds the generated low-value data sequence back to the administrator terminal of the target enterprise database.
The cloud database stores the target enterprise's trusted terminal IP address list and the storage data table numbers corresponding to its various data services, the execution depths corresponding to the various data execution operations, and the association depths between data in shared storage data tables, primary-foreign key storage data tables and same-index storage data tables.
Example 3
The present invention proposes a storage medium storing one or more programs executable by one or more processors to implement the steps in the full-cycle data governance method of the present invention.
The foregoing is merely illustrative and explanatory of the principles of the invention. Those skilled in the art may make various modifications or additions to the specific embodiments described, or substitute similar arrangements, without departing from the principles of the invention or exceeding the scope defined in the appended claims.

Claims (10)

1. A full-cycle data governance method, comprising:
S1, issuing a data value examination instruction: issuing a data value review instruction to a target enterprise database at a periodic data value review time point;
S2, evaluating the data input value: receiving the data value examination instruction, acquiring the input information of each data through the input terminal of the target enterprise database, and analyzing the input value coefficient of each data for the target enterprise database;
S3, evaluating the data access value: acquiring the access information of each data through the access terminal of the target enterprise database, and analyzing the access value coefficient of each data for the target enterprise database;
S4, evaluating the data association value: analyzing the association value coefficient of each data for the target enterprise database according to the input information and the access information of each data in the target enterprise database;
S5, calculating the comprehensive data value score: calculating the comprehensive value score of each data for the target enterprise database based on its input value coefficient, access value coefficient and association value coefficient, screening out the data whose scores are below a preset comprehensive value score, and generating the low-value data sequence of the target enterprise database;
S6, low-value data sequence feedback: feeding the generated low-value data sequence back to the administrator terminal of the target enterprise database.
2. The full-cycle data governance method according to claim 1, wherein: the input information comprises, for each input, the user identity authority level, the stored data table and its number, the input date, and the data type, format and value of each data item;
the access information comprises, for each access, the terminal IP address, the user identity authority level, the access path, the access date, the access duration, the execution operation type and the number of executed data items.
3. The full-cycle data governance method according to claim 2, wherein analyzing the input value coefficient of each data for the target enterprise database comprises: setting an input authority influence weight corresponding to the user identity authority level of each input in the input information of each data, and taking it as the identity value influence weight $\delta_{ij}$ of each input of each data in the target enterprise database, where $i$ is the number of each data in the target enterprise database, $i = 1, 2, \ldots, a$, and $j$ is the number of each input, $j = 1, 2, \ldots, b$;
obtaining, from the stored data table number of each input in the input information of each data combined with the stored data table numbers corresponding to the various data services of the target enterprise stored in the WEB cloud, the data service type of the target enterprise to which the stored data table number of each input of each data corresponds, and accordingly setting the service value influence weight $\mu_{ij}$ of each input of each data in the target enterprise database;
obtaining the basic input value index of each data in the target enterprise database from the formula [formula not reproduced];
analyzing the input quality index $\beta_i$ of each data in the target enterprise database according to the data type, format and value of each data item of each input in the input information of each data;
subtracting the input date of the previous input from the input date $t_{ij}$ of each input in the input information of each data to obtain the interval in days $\Delta t_{ij}$ of each input of each data, screening out the last input date $t'_i$ of each data, and, combined with the current date $t_0$, calculating the input timeliness index $\varphi_i$ of each data in the target enterprise database [formula not reproduced], where $e$ is the natural constant and $b$ is the number of inputs of the data;
obtaining the input value coefficient of each data for the target enterprise database from the formula [formula not reproduced].
4. The full-cycle data governance method according to claim 3, wherein analyzing the input quality index of each data in the target enterprise database comprises: screening out the format-error data items, value-error data items and missing data items of each input of each data according to the data type, format and value of each data item of each input in the input information of each data, combined with the standard input format, limited valid domain and missing-value display of each data item of the target enterprise database stored in the WEB cloud; counting the number of data items $m_{ij}$, the number of format-error data items $m'_{ij}$, the number of value-error data items $m''_{ij}$ and the number of missing data items $n_{ij}$ of each input of each data; and obtaining the input quality evaluation index of each data of the target enterprise database from the formula [formula not reproduced].
5. The full-cycle data governance method according to claim 3, wherein analyzing the access value coefficient of each data for the target enterprise database comprises: comparing the terminal IP address of each access in the access information of each data with the target enterprise's trusted terminal IP address list stored in the WEB cloud; setting the address reliability of the access to $\sigma$ if its terminal IP address is in the list, and to $\varepsilon$ otherwise, thereby obtaining the address reliability $d_{iw}$ of each access of each data in the target enterprise database, $d_{iw} \in \{\sigma, \varepsilon\}$, $\sigma > \varepsilon$, where $w$ is the number of each access, $w = 1, 2, \ldots, c$;
obtaining the identity value influence weight $\delta'_{iw}$ of each access of each data in the target enterprise database according to the user identity authority level of each access in the access information of each data;
obtaining the basic access value index of each data in the target enterprise database from the formula [formula not reproduced];
screening out the last access date $t'_{iw}$ of each data according to the access date of each access in the access information of each data, and calculating the access timeliness index of each data in the target enterprise database [formula not reproduced], where the formula uses the access date of the $(w+1)$-th access of the $i$-th data in the target enterprise database and a preset threshold for the reasonable interval, in days, between data accesses, and $c$ is the number of data accesses;
obtaining the execution depth $\theta_{iw}$ of each access of each data according to the access duration $\Delta T_{iw}$, the execution operation type and the number of executed data items $v_{iw}$ of each access in the access information of each data, combined with the execution depths of the various data execution operations stored in the WEB cloud, and calculating the access degree index $k_i$ of each data in the target enterprise database [formula not reproduced];
calculating the access value coefficient $FW_i$ of each data for the target enterprise database [formula not reproduced].
6. The full-cycle data governance method according to claim 5, wherein analyzing the association value coefficient of each data for the target enterprise database comprises: extracting, from the stored data table of each input in the input information of each data, the stored data table of the most recent input, and recording it as the reference storage data table of that data;
counting, for the reference storage data table of a given data, the number of records in that table, taking this number minus 1 as the first-order storage association data density of the data, and calculating the first-order storage association index $y$ of the data by combining it with the association depth between data in a shared storage data table stored in the WEB cloud;
further obtaining each foreign key data table corresponding to the reference storage data table of the data, summing the record counts of these foreign key data tables, taking the sum as the second-order storage association data density of the data, and calculating the second-order storage association index $r$ of the data by combining it with the association depth between data in primary-foreign key storage data tables stored in the WEB cloud;
further obtaining each storage data table that uses the same index as the reference storage data table of the data and recording it as an index association data table of that reference table, summing the record counts of these index association data tables, taking the sum as the third-order storage association data density of the data, and calculating the third-order storage association index $s$ of the data by combining it with the association depth between data in same-index storage data tables stored in the WEB cloud;
obtaining the storage association value coefficient of the data as $x = y + r + s$, and thereby the storage association value coefficient $x_i$ of each data;
7. The full-cycle data governance method according to claim 6, wherein analyzing the association value coefficient of each data for the target enterprise database further comprises: extracting the access path of each access in the access information of each data and summarizing the access paths of all accesses of each data; counting the number of occurrences of each access component in the summarized access paths of each data; recording each access component that occurs more than once in the summarized access paths of a data as a conventional access component of that data and integrating these into the data's conventional access component sequence; obtaining the conventional access component sequence of each data and comparing the sequences one by one to obtain the conventional access component sequence similarity between data; accordingly obtaining each access association data of each data and its access association depth $\eta_{il}$, where $l$ is the number of each access association data, $l = 1, 2, \ldots, z$; and calculating the access association value coefficient $f_i$ of each data [formula not reproduced], where $z$ is the number of access association data;
and obtaining the association value coefficient of each data for the target enterprise database from the formula [formula not reproduced].
8. The full-cycle data governance method according to claim 7, wherein the comprehensive value score of each data for the target enterprise database is calculated by the formula [formula not reproduced], in which $\pi$ denotes $180^\circ$.
9. A full-cycle data management system, comprising:
a data value examination instruction issuing module, configured to issue a data value examination instruction to the target enterprise database at each periodic data value examination time point;
a data input value evaluation module, configured to receive the data value examination instruction, acquire the input information of each data through the input terminal of the target enterprise database, and analyze the input value coefficient of each data for the target enterprise database;
a data access value evaluation module, configured to acquire the access information of each data through the access terminal of the target enterprise database and analyze the access value coefficient of each data for the target enterprise database;
a data association value evaluation module, configured to analyze the association value coefficient of each data for the target enterprise database according to the input information and the access information of each data in the target enterprise database;
a data comprehensive value score calculation module, configured to calculate the comprehensive value score of each data for the target enterprise database based on its input value coefficient, access value coefficient and association value coefficient, screen out the data scoring below a preset comprehensive value score, and generate the low-value data sequence of the target enterprise database;
a low-value data sequence feedback module, configured to feed the generated low-value data sequence back to the administrator terminal of the target enterprise database; and
a cloud database, configured to store the target enterprise's trusted terminal IP address list and the storage data table numbers corresponding to its various data services, the execution depths corresponding to the various data execution operations, and the association depths between data in shared storage data tables, primary-foreign key storage data tables and same-index storage data tables.
10. A storage medium, characterized in that the storage medium stores one or more programs executable by one or more processors to implement the steps of the full-cycle data governance method according to any one of claims 1 to 8.
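The screening in claim 4 of format-error, value-error and missing data items for one input can be sketched as follows. The `ItemSpec` record (a regular expression for the standard input format, an allowed set for the limited valid domain, and a marker for the missing-value display), the example specs, counting $m_{ij}$ as the number of specified items, and the rule that a missing or badly formatted item is not checked further are all assumptions made for illustration. Only the counts $m_{ij}$, $m'_{ij}$ (`m_format`), $m''_{ij}$ (`m_value`) and $n_{ij}$ are computed, since the quality formula itself is not given in this text.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class ItemSpec:
    """Hypothetical per-item rules standing in for those stored in the WEB cloud."""
    pattern: str                                    # standard input format (regular expression)
    valid_values: Optional[FrozenSet[str]] = None   # limited valid domain; None means unrestricted
    missing_marker: str = ""                        # how a missing value is displayed


def count_item_errors(entry: Dict[str, str], specs: Dict[str, ItemSpec]) -> Dict[str, int]:
    """Return m (items), m_format, m_value and n (missing) for one input of one data."""
    format_errors = value_errors = missing = 0
    for name, spec in specs.items():
        value = entry.get(name, spec.missing_marker)  # an absent item counts as missing
        if value == spec.missing_marker:
            missing += 1
        elif not re.fullmatch(spec.pattern, value):
            format_errors += 1
        elif spec.valid_values is not None and value not in spec.valid_values:
            value_errors += 1
    return {"m": len(specs), "m_format": format_errors, "m_value": value_errors, "n": missing}
```

With a date spec `\d{4}-\d{2}-\d{2}`, a status spec limited to `active` and `closed`, and an amount spec whose missing marker is `N/A`, the input `{"date": "2023/12/19", "status": "pending", "amount": "N/A"}` has one item of each error kind.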
CN202311755753.9A 2023-12-19 2023-12-19 Full-period data management method, system and storage medium Pending CN117743310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311755753.9A CN117743310A (en) 2023-12-19 2023-12-19 Full-period data management method, system and storage medium

Publications (1)

Publication Number Publication Date
CN117743310A true CN117743310A (en) 2024-03-22

Family

ID=90250424

Country Status (1)

Country Link
CN (1) CN117743310A (en)

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965861A (en) * 2015-06-03 2015-10-07 上海新炬网络信息技术有限公司 Monitoring device for data access
US20160048543A1 (en) * 2014-08-13 2016-02-18 Wipro Limited System and method for determining governance effectiveness of knowledge management system
CN105550511A (en) * 2015-12-11 2016-05-04 北京锐软科技股份有限公司 Data quality evaluation system and method based on data verification technique
KR20180106533A (en) * 2017-03-20 2018-10-01 장경애 Data Value evaluation system through detailed analysis of data governance data
CN109992576A (en) * 2019-03-01 2019-07-09 苏州龙石信息科技有限公司 A kind of government data quality evaluation and abnormal data recovery technique based on big data technology
JP2020035440A (en) * 2018-08-22 2020-03-05 株式会社ビジネスインテリジェンス Business card related information providing method, enterprise value determination method, business card information providing method, business card value determination method, business card related information providing device, enterprise value determination device, business card information providing device, business card value determination device and computer program
CN112306406A (en) * 2020-10-22 2021-02-02 济南华芯算古信息科技有限公司 Intelligent storage automatic grading method and device, storage medium and electronic equipment
US20210035124A1 (en) * 2019-08-02 2021-02-04 Dell Products, Lp System and method for management of sensor data based on high-value data model
KR102213589B1 (en) * 2020-10-07 2021-02-08 주식회사 한국인증융합연구원 Technology value evaluation and enterprise value evaluation platform provision system, and method thereof
CN112380190A (en) * 2020-11-27 2021-02-19 北京三维天地科技股份有限公司 Data quality health degree analysis method and system based on multidimensional analysis technology
CN112433888A (en) * 2020-12-02 2021-03-02 网易(杭州)网络有限公司 Data processing method and device, storage medium and electronic equipment
CN112529449A (en) * 2020-12-20 2021-03-19 大唐互联科技(武汉)有限公司 Supplier quality evaluation method and system based on big data
CN113360548A (en) * 2021-06-29 2021-09-07 平安普惠企业管理有限公司 Data processing method, device, equipment and medium based on data asset analysis
CN114637739A (en) * 2022-03-22 2022-06-17 平安国际融资租赁有限公司 Database management and control method, system, computer equipment and computer storage medium
CN114722434A (en) * 2022-06-09 2022-07-08 江苏荣泽信息科技股份有限公司 Block chain-based ledger data control method and device
CN114997812A (en) * 2022-04-24 2022-09-02 上海蔚屹茗达信息科技有限公司 Human resource comprehensive management big data supervision service system
CN115409419A (en) * 2022-09-26 2022-11-29 河南星环众志信息科技有限公司 Value evaluation method and device of business data, electronic equipment and storage medium
CN115422173A (en) * 2022-08-17 2022-12-02 天元大数据信用管理有限公司 Data management method and system in financial credit field
CN116307919A (en) * 2023-03-30 2023-06-23 四川联欣科技服务有限公司 Enterprise value evaluation method based on fuzzy comprehensive evaluation
CN116450757A (en) * 2023-06-19 2023-07-18 深圳索信达数据技术有限公司 Method, device, equipment and storage medium for determining evaluation index of data asset
CN116467292A (en) * 2023-03-17 2023-07-21 上海市大数据股份有限公司 Data quality assessment method and system based on big data analysis
CN116521660A (en) * 2023-04-18 2023-08-01 万隆(上海)资产评估有限公司 Enterprise value assessment method and related device based on big data
CN116862202A (en) * 2023-08-28 2023-10-10 泉州大数据运营服务有限公司 Enterprise management data management method based on big data analysis
CN117056323A (en) * 2023-07-26 2023-11-14 北京计算机技术及应用研究所 Data quality assessment method based on data access behaviors

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160048543A1 (en) * 2014-08-13 2016-02-18 Wipro Limited System and method for determining governance effectiveness of knowledge management system
CN104965861A (en) * 2015-06-03 2015-10-07 上海新炬网络信息技术有限公司 Monitoring device for data access
CN105550511A (en) * 2015-12-11 2016-05-04 北京锐软科技股份有限公司 Data quality evaluation system and method based on data verification technique
KR20180106533A (en) * 2017-03-20 2018-10-01 장경애 Data Value evaluation system through detailed analysis of data governance data
JP2020035440A (en) * 2018-08-22 2020-03-05 株式会社ビジネスインテリジェンス Business card related information providing method, enterprise value determination method, business card information providing method, business card value determination method, business card related information providing device, enterprise value determination device, business card information providing device, business card value determination device and computer program
CN109992576A (en) * 2019-03-01 2019-07-09 苏州龙石信息科技有限公司 Government data quality evaluation and abnormal data recovery technique based on big data technology
US20210035124A1 (en) * 2019-08-02 2021-02-04 Dell Products, Lp System and method for management of sensor data based on high-value data model
KR102213589B1 (en) * 2020-10-07 2021-02-08 주식회사 한국인증융합연구원 Technology value evaluation and enterprise value evaluation platform provision system, and method thereof
CN112306406A (en) * 2020-10-22 2021-02-02 济南华芯算古信息科技有限公司 Intelligent storage automatic grading method and device, storage medium and electronic equipment
CN112380190A (en) * 2020-11-27 2021-02-19 北京三维天地科技股份有限公司 Data quality health degree analysis method and system based on multidimensional analysis technology
CN112433888A (en) * 2020-12-02 2021-03-02 网易(杭州)网络有限公司 Data processing method and device, storage medium and electronic equipment
CN112529449A (en) * 2020-12-20 2021-03-19 大唐互联科技(武汉)有限公司 Supplier quality evaluation method and system based on big data
CN113360548A (en) * 2021-06-29 2021-09-07 平安普惠企业管理有限公司 Data processing method, device, equipment and medium based on data asset analysis
CN114637739A (en) * 2022-03-22 2022-06-17 平安国际融资租赁有限公司 Database management and control method, system, computer equipment and computer storage medium
CN114997812A (en) * 2022-04-24 2022-09-02 上海蔚屹茗达信息科技有限公司 Human resource comprehensive management big data supervision service system
CN114722434A (en) * 2022-06-09 2022-07-08 江苏荣泽信息科技股份有限公司 Block chain-based ledger data control method and device
CN115422173A (en) * 2022-08-17 2022-12-02 天元大数据信用管理有限公司 Data management method and system in financial credit field
CN115409419A (en) * 2022-09-26 2022-11-29 河南星环众志信息科技有限公司 Value evaluation method and device of business data, electronic equipment and storage medium
CN116467292A (en) * 2023-03-17 2023-07-21 上海市大数据股份有限公司 Data quality assessment method and system based on big data analysis
CN116307919A (en) * 2023-03-30 2023-06-23 四川联欣科技服务有限公司 Enterprise value evaluation method based on fuzzy comprehensive evaluation
CN116521660A (en) * 2023-04-18 2023-08-01 万隆(上海)资产评估有限公司 Enterprise value assessment method and related device based on big data
CN116450757A (en) * 2023-06-19 2023-07-18 深圳索信达数据技术有限公司 Method, device, equipment and storage medium for determining evaluation index of data asset
CN117056323A (en) * 2023-07-26 2023-11-14 北京计算机技术及应用研究所 Data quality assessment method based on data access behaviors
CN116862202A (en) * 2023-08-28 2023-10-10 泉州大数据运营服务有限公司 Enterprise management data management method based on big data analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KERUI HU, LEMIAO QIU, SHUYOU ZHANG, ZILI WANG, NAIYU FANG: "An incremental rare association rule mining approach with a life cycle tree structure considering time-sensitive data", SPRINGER NATURE, 11 May 2023 (2023-05-11), pages 10800 - 10824 *
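徐岚珊, 郭树行: "Research on a data asset quality evaluation model for data governance", 科技资讯 (Science & Technology Information), no. 03, 23 January 2020 (2020-01-23), pages 24 - 25 *
杭聪, 黄连月, 黄鑫: "Research and application of database SQL review and performance optimization technology", 电力信息与通信技术 (Electric Power Information and Communication Technology), no. 04, 15 April 2016 (2016-04-15), pages 151 - 158 *
高方科: "Research on data quality evaluation and control for the data warehouse of Group D's Management Support System (MSS)", 中国知网 (CNKI), 1 June 2023 (2023-06-01), pages 30 - 42 *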

Similar Documents

Publication Publication Date Title
Gil‐Alana Fractional integration and structural breaks at unknown periods of time
Stvilia et al. A framework for information quality assessment
Cebrián et al. Is Google Trends a quality data source?
US7668789B1 (en) Comparing distributions of cases over groups of categories
KR101369020B1 (en) Compensating for unbalanced hierarchies when generating olap queries from report specifications
US10417265B2 (en) High performance parallel indexing for forensics and electronic discovery
Ju et al. A novel method of interestingness measures for association rules mining based on profit
JP2009528639A (en) Social analysis system and method for analyzing conversations in social media
CN114428822B (en) Data processing method and device, electronic equipment and storage medium
CN112948397A (en) Data processing system, method, device and storage medium
CN108108477B (en) Linked KPI system and rights management system
Zhou et al. A comprehensive process similarity measure based on models and logs
US11036701B2 (en) Data sampling in a storage system
Truică et al. TextBenDS: a generic textual data benchmark for distributed systems
Ding et al. Efficient currency determination algorithms for dynamic data
Ramaciotti Morales et al. Role of the website structure in the diversity of browsing behaviors
Sayal Detecting time correlations in time-series data streams
Anas et al. Mean Estimators Using Robust Quantile Regression and L‐Moments’ Characteristics for Complete and Partial Auxiliary Information
CN115982429B (en) Knowledge management method and system based on flow control
CN117743310A (en) Full-period data management method, system and storage medium
Toivonen Big data quality challenges in the context of business analytics
Huang et al. From zero to one: A perspective on citing
KR20160139897A (en) Method and system for providing evaluation service of enterprise value by automated network deduction
US7756854B2 (en) Minimization of calculation retrieval in a multidimensional database
Simonov et al. What Drives Demand for Government-Controlled News? Evidence from Russia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination