CN115310132B

CN115310132B - Data identity identification and data fragmentation method and device

Info

Publication number: CN115310132B
Application number: CN202211027229.5A
Authority: CN
Inventors: 龚䶮; 张微; 陈晓; 王志轩; 李志男
Original assignee: Huajiao Lianchuang Xiamen Technology Co ltd; Beijing Huayixin Technology Co ltd
Current assignee: Beijing Huayixin Technology Co ltd; Lin Shaowei
Priority date: 2022-08-25
Filing date: 2022-08-25
Publication date: 2023-04-25
Anticipated expiration: 2042-08-25
Also published as: CN115310132A

Abstract

The invention discloses a method and a device for data identity identification and data fragmentation. Taking the association between data owner identity data and the data structure as its core, the method fragments that association, fragments the data owner identity private key and the data value associated with the association fragments, and keeps the association tables linking the private key fragments, the data value fragments and the association fragments in offline cold storage. This isolates the data from the data owner and from other third parties, and protects the security and privacy of both the data and the data owner.

Description

Data identity identification and data fragmentation method and device

Technical Field

The invention relates to the technical field of data security and user privacy security, in particular to a method and a device for data identity identification and data fragmentation.

Background

Data security has essentially two dimensions: one is the tamper resistance of the data; the other is its privacy. Blockchains naturally solve the tamper-resistance problem but cannot solve the privacy problem on their own; they must be combined with cryptographic methods such as asymmetric encryption and secure random numbers. Combining blockchain with cryptography helps address problems such as information asymmetry, single-dimensional data, the cost of obtaining effective data, data security and privacy protection. It enables permission management for data access and provides privacy protection and data validation for users in various roles. The blockchain-plus-cryptography approach is therefore the most common one and has had some success, but it still has limitations.

First, although most schemes rely on cryptography, experts point out that every cryptography-based scheme introduces significant computational complexity, that this complexity is difficult to eliminate through hardware improvements, and that in the long run cryptographic methods will always be less efficient than schemes that are not based on cryptography. Many data encryption methods focus on data transmission, data storage and maximizing the time needed to break the encryption, yet essentially all of them require users to send their data to third parties, which conflicts with the security and privacy requirements of the data. Moreover, the higher the security level of a cryptographic method, the slower it runs and the more computation it requires.

Second, the application of emerging technologies is increasingly blurring the boundaries of "personal data". The internet currently aggregates massive amounts of user data, but the resulting crisis of trust has left data holders unable to rebuild trust. Establishing data trust means solving the data privacy problem and the problem that a big data platform can trade data at will once it has been uploaded, so that the interests of the parties to a data transaction cannot be guaranteed. After encryption, the data must still be processed as a whole. Data handled as a whole, however, may allow its source to be guessed, so the security and privacy of the data and the data owner cannot be fully guaranteed. The blockchain-plus-cryptography approach therefore also leaves data owners and big data platforms unwilling to share data, and large numbers of data islands emerge.

Third, new technologies and applications such as big data, blockchain, unmanned systems, facial recognition, wearable devices, smart homes, medical monitoring equipment and behavioral biometric data keep developing, new fields of data-driven innovation keep opening up, and the scale of data keeps growing. Industrial data flow scenarios, such as the evaluation of major projects or key scientific and technological achievements, industrial internet platforms, the processing of core enterprise data and cross-border data flows, also keep raising new requirements for strengthening data protection and releasing data value. In addition, new legal, ethical and governance demands in data fields involving personal privacy, including government administration, healthcare and credit, have partly gone beyond the coverage of traditional information protection mechanisms while also posing new challenges to the availability of data flows. Because the blockchain-plus-cryptography approach is unfavorable to both data sharing and security and privacy, large-scale data value mining is hard to ensure, data value is hard to release fully, and data circulation based on data value is hard to realize.

To solve the above problems, the present invention provides a method and a device for data identity identification and data fragmentation. Taking the association between data owner identity data and the data structure as its core, the method fragments that association, fragments the data owner identity private key and the data value associated with the association fragments, and keeps the association tables linking the private key fragments, the data value fragments and the association fragments in offline cold storage, thereby isolating the data from the data owner and from other third parties and protecting the security and privacy of the data and the data owner. The data fragmentation and data owner identity private key fragmentation used here differ from existing data fragmentation and user key fragmentation. Traditional data fragmentation aims to secure data and its storage and to prevent data leakage; traditional user key fragmentation aims to protect the user's password and its backup and to prevent the password from being cracked. In the invention, data fragmentation and data owner identity private key fragmentation aim mainly to isolate data from the data owner and to prevent a third party from directly looking up or indirectly inferring the data owner from the data. The method centers on fragmenting the association between the data structure and the data owner identity data, and on that basis fragments the data value and the data owner identity private key to sever the association between the data and the data owner.

Disclosure of Invention

The invention aims to provide a data identity identification and data fragmentation method and device that fundamentally solve the problem of the security and privacy of data and data owners, in particular the problem that a third party can directly look up or indirectly infer the data owner from the data. This in turn addresses the problem that data owners and big data platforms are unwilling to share data, which causes large numbers of data islands to emerge. On this basis, large-scale data value mining can be carried out, data value can be fully released, and data circulation based on data value can be realized.

The invention adopts the following technical scheme: a data identity identification and data fragmentation method whose core is cutting off the association between data owner identity data and the data structure. The method fragments that association, fragments the data owner identity private key and the data value associated with the association fragments, and keeps the association tables linking the private key fragments, the data value fragments and the association fragments in offline cold storage, thereby isolating the data from the data owner and from other third parties and protecting the security and privacy of the data and the data owner.

In order to achieve the above object, a first aspect of the embodiment of the present invention discloses a technical solution:

the data identification and data fragmentation method specifically comprises the following steps:

S1, fragmenting the association between the data structure and the data owner identity data;

S2, according to the result of fragmenting the association between the data structure and the data owner identity data, fragmenting the data owner identity private key associated with the association fragments;

S3, randomly associating the data owner identity private key fragments with the data owner identity data fragments, and generating an association table A from the random association result;

S4, according to the result of fragmenting the association between the data structure and the data owner identity data, fragmenting the data value associated with the association fragments;

S5, randomly associating the data value fragments with the data structure fragments, and generating an association table B from the random association result;

S6, storing the association table A and the association table B in offline cold storage on a storage medium; the tables may be called online only for verification and updating when the identity of a data owner needs to be locked, and online calling for download is not allowed.

Preferably, the step S1 specifically includes:

S1.1, establishing a preliminary association set of the data structure set and the data owner identity data set according to the data uploading behavior of the data owners;

S1.2, extracting the association elements of the data structure and the data owner identity data from the preliminary association set, and constructing from them an association set of the data structure and the data owner identity data;

S1.3, setting an allowable range for the association degree value of the data structure and the data owner identity data;

S1.4, mining the association degree values of the data structure and the data owner identity data from the association set;

S1.5, for each association degree value that exceeds the allowable range, marking the association between the data structure and the data owner identity data for fragmentation, and establishing the corresponding association set of the data structure and the data owner identity data requiring fragmentation;

S1.6, fragmenting that association set.

Preferably, the step S2 specifically includes:

S2.1, determining the number of data owner identity data fragments in the association fragments according to the result of fragmenting the association between the data structure and the data owner identity data;

S2.2, determining the number of data owner identity private key fragments, which must be greater than the number of data owner identity data fragments;

S2.3, setting a lower limit on the fragment length of the data owner identity private key;

S2.4, when the number of data owner identity private key fragments multiplied by the lower limit of the fragment length exceeds the length of the data owner identity private key, determining the missing length of the data owner identity private key and filling it with a space private key;

S2.5, fragmenting the data owner identity private key according to the number of data owner identity private key fragments and the fragment length.
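Steps S2.2 to S2.5 can be illustrated with a minimal sketch. Several details below are assumptions made for illustration only and are not specified in the text: the function name `fragment_private_key`, the choice of exactly one more key fragment than identity data fragments, a default lower limit of 3 characters per fragment, padding appended at the end of the key, and random distribution of the spare length.

```python
import secrets


def fragment_private_key(key: str, id_fragment_count: int,
                         min_fragment_len: int = 3,
                         pad_char: str = " ") -> list[str]:
    """Split a key into more fragments than there are identity data fragments."""
    # S2.2: the key fragment count must exceed the identity fragment count.
    # Using "count + 1" is an assumption; any larger number also qualifies.
    fragment_count = id_fragment_count + 1

    # S2.3 / S2.4: if count * lower limit exceeds the key length,
    # fill the missing length with space characters (the "space private key").
    required_len = fragment_count * min_fragment_len
    missing = max(0, required_len - len(key))
    padded = key + pad_char * missing  # appending at the end is an assumption

    # S2.5: every fragment gets the lower-limit length; any spare length is
    # handed out at random so fragment sizes are not predictable.
    slack = len(padded) - required_len
    extras = [0] * fragment_count
    for _ in range(slack):
        extras[secrets.randbelow(fragment_count)] += 1

    fragments = []
    pos = 0
    for extra in extras:
        size = min_fragment_len + extra
        fragments.append(padded[pos:pos + size])
        pos += size
    return fragments
```

For an 8-character key and 3 identity data fragments, the sketch produces 4 fragments of 3 characters each, the last one consisting of padding only.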

Preferably, the step S4 specifically includes:

S4.1, determining the number of data structure fragments in the association fragments according to the result of fragmenting the association between the data structure and the data owner identity data;

S4.2, determining the number of data value fragments, which must be greater than the number of data structure fragments;

S4.3, setting a lower limit on the data value length, which is required to be more than 2 minimum data units;

S4.4, when the number of data value fragments multiplied by the lower limit of the data value length exceeds the length of the data value, determining the missing length of the data value and filling it with a space private key;

S4.5, fragmenting the data value according to the number of data value fragments and the data value length.

Preferably, in step S2.5, when the data owner identity private key associated with the association fragments is fragmented, the data owner identity private key may be divided using a random method or an average field method.

Preferably, in step S4.5, the data value is marked, the data value is fragmented according to each marking result, and the fragmentation results obtained from the individual marks are intersected to obtain the final fragmentation result of the data value.

Preferably, the marking method marks the data value in three dimensions, namely time, space and path, as follows:

the time-dimension mark of the data value includes the times at which the data owner uploads the data value, the data structure is fragmented, and the data is stored, processed and traded, as well as times associated with the data owner;

the space-dimension mark of the data value includes the data structure and the data value;

the path-dimension mark of the data value includes the credibility level, the data processing path and the data transaction path.
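One way to read "intersecting" the per-dimension fragmentation results is as a common refinement: two data units end up in the same final fragment only if they carry the same time, space and path marks. The sketch below illustrates that reading under stated assumptions. The `MarkedUnit` type, its field names and the string-valued marks are hypothetical; the text does not define a concrete data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedUnit:
    """One minimum data unit with its three dimension marks (hypothetical model)."""
    value: str
    time_mark: str   # e.g. an upload or processing time bucket
    space_mark: str  # e.g. the data structure field the value belongs to
    path_mark: str   # e.g. a processing or transaction path label


def fragment_by_marks(units: list[MarkedUnit]) -> dict[tuple[str, str, str], list[str]]:
    """Group units so that each fragment is uniform in all three dimensions."""
    fragments: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for unit in units:
        # The triple of marks identifies the cell of the intersected partition.
        fragments[(unit.time_mark, unit.space_mark, unit.path_mark)].append(unit.value)
    return dict(fragments)
```

Grouping by the full triple is equivalent to fragmenting by each dimension separately and then intersecting the resulting partitions, which is why a single pass suffices here.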

The second aspect of the present invention discloses a data identity and data fragmentation device for data and data owner isolation, the device comprising:

a memory storing executable program code;

a processor coupled to the memory;

the processor calls the executable program code stored in the memory to execute any one of the data identity identification and data fragmentation methods disclosed in the first aspect of the embodiment of the invention.

The third aspect of the present invention discloses a computer storage medium storing computer instructions which, when called, execute any of the data identity identification and data fragmentation methods disclosed in the first aspect of the present invention.

Compared with the prior art, the invention has the beneficial effects that:

(1) To protect the security and privacy of data owners and data, the data identity identification and data fragmentation method provided by the invention is superior to the traditional approach of treating user data as a whole and protecting it with cryptography. The key to the method is that it takes the isolation of data from data owners as its main goal and fragments the association between the data structure and the data owner identity data, the identity private key, and the data value. The data and the data owner's identity are fragmented before entering the platform or the data pool, so the data is isolated from the data owner. This helps the big data platform support anonymous uploading by data owners: the platform knows only the data and not the user, the buyer does not know the data owner, and no third party can directly look up or indirectly infer the data owner from the data, so the data owner's privacy is known to neither the platform nor the buyer.

(2) With the data identity identification and data fragmentation method provided by the invention, a data owner's data is not traded directly but is placed in a very large data pool for processing, so that a data processor cannot know the specific source of any data. This gives data processors a more standardized space in which to process data freely, encourages them to process data on the basis of its value, and thereby gives the big data platform the basic data processing conditions for fully mining and releasing data value.

(3) By protecting the security and privacy of data owners and data while facilitating data value mining, the data identity identification and data fragmentation method provided by the invention makes data owners more willing to upload data, helps the big data platform it serves gather more user data, and makes data processors and the big data platform more willing to carry out data processing and service work. It also supports secure data sharing between the big data platform and other big data platforms, breaking down the data islands among big data platforms and among data owners.

Drawings

FIG. 1 is a schematic diagram of a data identification and data fragmentation method;

FIG. 2 is a flow chart of a method of associated fragmentation of data structures and data owner identity data;

FIG. 3 is a flow chart of a method of fragmentation of data identities for data isolation from data owners;

FIG. 4 is a flow chart of a data value fragmentation method for data isolation from a data owner;

FIG. 5 is a flow chart of a method for fragmenting data values in accordance with the principle that fragmented data values must remain processable.

Detailed Description

The data identity identification and data fragmentation method provided by the invention is further described in detail below with reference to the accompanying drawings and specific embodiments.

The method is suitable for data application, research or transaction fields that have requirements for data security and user privacy. The data uploaded by data owners includes the various types of data commonly recognized in society, such as digital products, data assets and data elements. As a data fragmentation method, the method requires the amount of data to be at least 2 basic units.

For example, assume a big data platform whose data owners are labeled DOn (PS-ID), n = 1, 2, .... Here n is the sequence number of the data owner on the big data platform, PS is the data owner's identity private key on the platform, and ID is the data owner's identity data set.

Data on the big data platform is labeled ADm (DS-DV), m = 1, 2, .... Here m is the sequence number of the data on the big data platform, DS is the data structure, and DV is the data value.

On a big data platform that does not employ the present technology, the association between each data owner and their data is explicit. The platform can determine the owner of any data from the association between data and data owners in its database. This association takes various forms. To protect the security and privacy of data owners, a big data platform usually hides the data owner identity data and associates the data owner identity private key with the data, that is, BR (DOn (PS), ADm (DS-DV)). Inevitably, this association mode allows third parties familiar with the data (data processors, data users, big data platforms, data transaction service parties) to infer the data owner identity data DOn (ID) from the data ADm (DS-DV) and from there work out the data owner DOn (PS-ID), fully exposing the data and the data owner and destroying their security and privacy. This potential risk makes data owners and big data platforms unwilling to share data in data application, research or transaction fields that have requirements for data security and user privacy, so that large numbers of data islands emerge. As a result, these fields cannot realize large-scale data circulation based on data value, cannot carry out large-scale data value mining, and cannot fully release data value.

The present technology includes a data identity fragmentation method for isolating data from data owners and a data fragmentation method for isolating data from data owners. The principle is shown in FIG. 1. The data identity fragmentation method refers to fragmenting the data owner identity private key on the basis of the fragmented association between the data structure and the data owner identity data. The data fragmentation method refers to fragmenting the data value, on the basis of the same association fragmentation, in a way that satisfies the principle that fragmented data values must remain processable. This data fragmentation method prevents the data from being leaked after it is divided, protects the privacy of the data, keeps the data processable, and meets the needs of subsequent data value mining.

The core of the present technology is the association fragmentation of the data structure and the data owner identity data, that is, fragmenting the association BR (DOn (ID), ADm (DS)) between the data structure ADm (DS) and the data owner identity data DOn (ID). After fragmentation, a third party (data processor, data user, big data platform, data transaction service party) cannot infer the data owner identity data DOn (ID) from the data structure ADm (DS) and the association fragments BRs (DOn (ID), ADm (DS)) (association fragment number s = 1, 2, ..., n).

1) The basic flow of the associated fragmentation of the data structure ADm (DS) and the data owner identity data DOn (ID) is shown in fig. 2, and the specific method is as follows:

1-1) Establish a data structure set { ADm (DS) } and a data owner identity data set { DOn (ID) }.

The data structure set { ADm (DS) } is built from the data structures ADm (DS): the data sequence number m is used as the index of the data within the set, the data structure DS is used as the element of the set, and each element is labeled ADm (DS).

The data owner identity data set { DOn (ID) } is built from the data owner identity data DOn (ID): the data owner identity sequence number n is used as the index of the data owner identity within the set, the data owner identity data set ID is used as the element of the set, and each element is labeled DOn (ID).

1-2) mining the relevance of a data structure set { ADm (DS) } and a data owner identity data set { DOn (ID) }, the specific method is as follows:

1-2-1) establishing a data structure set and a preliminary association set of the data owner identity data set according to data owner data uploading behaviors.

From the data owners' data uploading behavior, the basic correspondence between each element ADm (DS) of the data structure set { ADm (DS) } and each element DOn (ID) of the data owner identity data set { DOn (ID) } is determined according to the correspondence m-n between the data sequence number m and the data owner identity sequence number n. These correspondences are combined into the preliminary association set { BR (ADm (DS), DOn (ID), m-n) } of the data structure set and the data owner identity data set.

1-2-2) extracting the association elements of the data structure and the data owner identity data from the preliminary association set of the data structure set and the data owner identity data set, thereby constructing the preliminary association set of the data structure and the data owner identity data.

The association elements BR (adm (DS), don (ID), m-n) of the data structure and the data owner identity data are extracted from the preliminary association set { BR (adm (DS), don (ID), m-n) } of the data structure set and the data owner identity data set. According to the preliminary association m-n in each association element BR (adm (DS), don (ID), m-n), each data structure adm (DS) is matched with the data owner identity data don (ID): the data adm-n (DS) in the data structure adm (DS) that conforms to the preliminary association m-n is selected, and the data owner identity data don-m (ID) in don (ID) that conforms to the preliminary association m-n is selected. The data structure adm-n (DS) and the data owner identity data don-m (ID) are combined into a preliminary association element BR (adm-n (DS), don-m (ID)), from which the preliminary association set { BR (adm-n (DS), don-m (ID)) } of the data structure adm (DS) and the data owner identity data don (ID) is constructed.

1-2-3) mining the association degree value of the data structure and the data owner identity data according to the preliminary association set of the data structure and the data owner identity data.

For the preliminary association set { BR (adm-n (DS), don-m (ID)) } of the data structure and the data owner identity data, the association degree values of the data structure and the data owner identity data are mined using a data mining method. The mining method is as follows: the data in the data structure adm-n (DS) that conforms to the preliminary association m-n is combined randomly and arbitrarily into combinations ac (adm-n (DS)); the data in the data owner identity data don-m (ID) that conforms to the preliminary association m-n is combined randomly and arbitrarily into combinations ac (don-m (ID)); and the association degree value VR (ac (adm-n (DS)), ac (don-m (ID))) between ac (adm-n (DS)) and ac (don-m (ID)) is mined and calculated using clustering, gray relational analysis, the Apriori algorithm or other methods.
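As a rough illustration of how an association degree value might be computed, the sketch below uses rule confidence, the measure used in Apriori-style association rule mining, as a stand-in for VR. This is an assumption: the text names several candidate methods and does not fix a formula. The record layout (each upload as a set of string items), the `ds:` and `id:` item prefixes, and the function names are also hypothetical.

```python
from itertools import combinations


def confidence(records: list[frozenset], ds_items: frozenset, id_items: frozenset) -> float:
    """Fraction of records containing ds_items that also contain id_items."""
    with_ds = [r for r in records if ds_items <= r]
    if not with_ds:
        return 0.0
    both = sum(1 for r in with_ds if id_items <= r)
    return both / len(with_ds)


def association_values(records, ds_pool, id_pool, max_size=2):
    """Score every combination ac(DS) against every combination ac(ID).

    Combinations are enumerated up to max_size items; the text speaks of
    arbitrary combinations, so the size cap is an assumption to keep the
    sketch small.
    """
    values = {}
    for i in range(1, max_size + 1):
        for ds_combo in combinations(sorted(ds_pool), i):
            for j in range(1, max_size + 1):
                for id_combo in combinations(sorted(id_pool), j):
                    values[(ds_combo, id_combo)] = confidence(
                        records, frozenset(ds_combo), frozenset(id_combo))
    return values
```

A confidence close to 1 means the identity items can almost always be predicted from the data structure items, which is the situation the method is meant to break up.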

1-2-4) determining the associations between the data structure and the data owner identity data that require fragmentation.

The higher the association degree value between ac (adm-n (DS)) and ac (don-m (ID)), the stronger their association, the more easily each can be inferred from the other, and the higher the risk to data security and data owner privacy. An allowable range for the association degree value is therefore set according to the numerical distribution of the calculated association degree values between ac (adm-n (DS)) and ac (don-m (ID)). If an association degree value falls below the PA percent level of the probability distribution, the association between ac (adm-n (DS)) and ac (don-m (ID)) is considered to meet the data security and data owner privacy requirements, and the association relationship m-n (ac (adm-n (DS)), ac (don-m (ID))) does not need fragmentation; otherwise it does.
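The allowable-range rule can be sketched as a percentile cut over the computed association degree values. The interpretation below is an assumption: PA is treated as a nearest-rank percentile of the observed values, associations at or below that value are kept, and those above it are marked for fragmentation. The function name and return shape are hypothetical.

```python
import math


def split_by_allowable_range(values: dict, pa_percent: float):
    """Return (threshold, keys to keep, keys requiring fragmentation)."""
    ordered = sorted(values.values())
    # Nearest-rank percentile: the smallest value such that at least
    # pa_percent of all observed values are less than or equal to it.
    rank = max(1, math.ceil(pa_percent / 100 * len(ordered)))
    threshold = ordered[rank - 1]
    keep = {key for key, value in values.items() if value <= threshold}
    needs_fragmentation = {key for key, value in values.items() if value > threshold}
    return threshold, keep, needs_fragmentation
```

With four associations scored 0.1, 0.4, 0.7 and 0.9 and PA set to 50, the threshold is 0.4 and the two higher-scoring associations are marked for fragmentation.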

1-2-5) centering on the association relationship between the data structure needing fragmentation and the identity data of the data owner, and establishing an association set of the data structure and the identity data of the data owner needing fragmentation.

According to step 1-2-4), an association set { l-k (ac (adl-k (DS)), ac (dok-l (ID))), ac (adl-k (DS)), ac (dok-l (ID)) } of the data structure and data owner identity data requiring fragmentation is established, centered on the association relationship l-k (ac (adl-k (DS)), ac (dok-l (ID))) that requires fragmentation. Here the association l-k belongs to the association m-n, and the association k-l belongs to the association n-m.

1-2-6) further mining the set of associations requiring fragmentation of the data structure and the data owner identity data to complete association fragmentation.

The association set of the data structure and data owner identity data requiring fragmentation is re-labeled { l-k^(j-h) (adl-k^(j-h) (DS), dok-l^(h-j) (ID)), adl-k^(j-h) (DS), dok-l^(h-j) (ID) }, where j-h is the association of adl-k (DS) to dok-l (ID) and h-j is the association of dok-l (ID) to adl-k (DS). Then, according to step 1-2-3), the association degree value VR (adl-k^(j-h) (DS), dok-l^(h-j) (ID)) of the association set is calculated.

All association sets of the data structure and data owner identity data that require fragmentation are collected. By comparing the data structures, the data owner identity data and the association degree values contained in these association sets, the association degree values between the data structures (and their sets) and the data owner identity data (and their sets) contained in each association set are calculated using matrix equations. These association degree values determine how the association fragmentation should be performed.

The specific method is as follows. Fragmenting an association set { l-k^(j-h) (adl-k^(j-h) (DS), dok-l^(h-j) (ID)) } of the data structure and data owner identity data means arbitrarily partitioning its data structure set { adl-k^(j-h) (DS) } and its data owner identity data set { dok-l^(h-j) (ID) }. The association degree value of the association set is VR (adl-k^(j-h) (DS), dok-l^(h-j) (ID)). According to step 1-2-3), the series of association degree values for the various partitions of adl-k^(j-h) (DS) and of the data owner identity data set { dok-l^(h-j) (ID) } is calculated. Each partitioning scheme of adl-k^(j-h) (DS) and dok-l^(h-j) (ID), together with its series of association degree values, is then compared with the association degree value VR (adl-k^(j-h) (DS), dok-l^(h-j) (ID)) of the unpartitioned adl-k^(j-h) (DS) and dok-l^(h-j) (ID).

Among the fragmentation alternatives, a scheme is selected in which, for the partial data structure sets obtained by partitioning adl-k^(j-h) (DS) and the partial data owner identity data sets obtained by partitioning dok-l^(h-j) (ID), the maximum, minimum and average of the association degree values between all partial data structure sets and all partial data owner identity data sets are below PA percent and are the lowest compared with the corresponding three values of the other partitioning schemes.
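The selection rule can be sketched as follows, assuming each candidate partitioning scheme has already been scored with a list of pairwise association degree values between its partial sets. Two points are assumptions: the PA limit is passed in as a plain numeric threshold, and because one scheme need not be lowest on all three statistics at once, ties are resolved by ordering on maximum, then average, then minimum. The names `summarize` and `choose_scheme` are hypothetical.

```python
from statistics import mean


def summarize(values: list[float]) -> tuple[float, float, float]:
    """Return (maximum, minimum, average) of a scheme's association values."""
    return max(values), min(values), mean(values)


def choose_scheme(schemes: dict[str, list[float]], pa_threshold: float):
    """Pick the eligible scheme with the lowest association statistics."""
    # A scheme is eligible only if all three statistics are below the limit;
    # checking the maximum is enough, since minimum and average cannot exceed it.
    eligible = {name: summarize(values)
                for name, values in schemes.items()
                if max(values) < pa_threshold}
    if not eligible:
        return None
    # Assumed tie-break order: maximum first, then average, then minimum.
    return min(eligible,
               key=lambda name: (eligible[name][0], eligible[name][2], eligible[name][1]))
```

Returning `None` when no scheme is eligible leaves it to the caller to try finer partitions, which the text does not describe.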

After all association sets requiring fragmentation have been fragmented in this way, all association fragmentation results are collected, and the association fragmentation scheme whose partial data structure sets and partial data owner identity data sets are smallest is selected as the final scheme. The association fragmentation result is labeled as the association fragment set { BRs (DOn (ID), ADm (DS)) (association fragment number s = 1, 2, ..., n) }.

2) A data identity fragmentation method for data isolation from data owners.

Based on the association fragmentation result, the association fragments in the set { BRs (DOn (ID), ADm (DS)) (association fragment number s = 1, 2, ..., n) } cut off the association between the data structure and the data owner identity data. In essence, however, the data owner identity data DOn (ID) in an association fragment remains strongly associated with the data owner identity private key DOn (PS) on the big data platform. To prevent third parties (data processors, data users, big data platforms, data transaction service parties) from inferring the data owner through the link between the association fragments and the data owner identity private key DOn (PS), the association between the private key DOn (PS), which represents the data owner's identity, and the association fragments must be further cut off on the basis of the association fragmentation. This fragmentation of the data owner identity private key is also called data identity fragmentation.

The basic flow is shown in figure 3, and the specific method comprises the following steps:

2-1) Based on the association fragmentation result of the data structure and the data owner identity data, the number of data owner identity data fragments in the association fragments, NUM(brs(DOn(ID))), can be determined. The number of data owner identity private key fragments, NUM(s(DOn(PS))), should be greater than or equal to NUM(brs(DOn(ID))). The data owner identity private key may be divided by a fixed method, a random method, an average field method or another method.

2-2) When the number of data owner identity data fragments NUM(brs(DOn(ID))) in the association fragments is too high, the number of data owner identity private key fragments NUM(s(DOn(PS))) cannot meet the requirement set by NUM(brs(DOn(ID))). Given the requirement that each private key fragment be at least 3 bytes long after fragmentation, the missing private key length is determined to be (NUM(brs(DOn(ID))) - NUM(s(DOn(PS)))) × 3. A space private key of that length is added to the data owner identity private key, either at random positions or at specific positions, to form a new data owner identity private key DOn'(PS). Step 2-1) is then repeated to fragment DOn'(PS).

2-3) The data owner identity private key fragments s(DOn(PS)) or s(DOn'(PS)) obtained in steps 2-1) and 2-2) are randomly associated with the data owner identity data fragments brs(DOn(ID)), and an association table of data owner identity private key fragments and data owner identity data fragments is generated from the random association result. To ensure security and privacy, and to keep the association table safe and reliable, the table is kept in offline cold storage on a permanent storage medium with large capacity and low performance requirements. The table may only be called offline for verification and update when the identity of a data owner needs to be locked; online calling and downloading are not allowed. When the identity of a data owner needs to be locked, the data owner identity data fragments brs(DOn(ID)) corresponding to the data owner identity private key fragments s(DOn(PS)) can be queried through the data owner identity private key DOn(PS) and the association table, thereby recovering the data owner identity DOn(PS-ID).
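Steps 2-1) to 2-3) can be sketched as follows. This is only an illustration under stated assumptions: the private key is handled as a byte string, the space private key is appended at the end as ASCII space bytes (one possible "specific position"), the average field method is read as an even split, and exactly one key fragment is produced per identity data fragment. All function names are hypothetical.

```python
import random

MIN_FRAGMENT_BYTES = 3  # lower limit on private key fragment length from step 2-2)


def pad_private_key(key: bytes, num_id_fragments: int) -> bytes:
    """Append space bytes when the key cannot yield enough 3-byte fragments."""
    achievable = len(key) // MIN_FRAGMENT_BYTES
    if achievable >= num_id_fragments:
        return key
    # Missing length = (required fragments - achievable fragments) * 3 bytes.
    missing = (num_id_fragments - achievable) * MIN_FRAGMENT_BYTES
    return key + b" " * missing


def split_evenly(data: bytes, parts: int) -> list:
    """Split data into `parts` contiguous fragments of near-equal length."""
    base, extra = divmod(len(data), parts)
    fragments, pos = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        fragments.append(data[pos:pos + size])
        pos += size
    return fragments


def build_association_table(key: bytes, id_fragments: list, rng=None) -> list:
    """Randomly pair private key fragments with identity data fragments."""
    rng = rng or random.SystemRandom()
    padded = pad_private_key(key, len(id_fragments))
    key_fragments = split_evenly(padded, len(id_fragments))
    shuffled = list(id_fragments)
    rng.shuffle(shuffled)  # random association of step 2-3)
    return list(zip(key_fragments, shuffled))


id_fragments = ["id-a", "id-b", "id-c", "id-d"]
table_a = build_association_table(b"abcdefg", id_fragments)
```

In this sketch the 7-byte key can yield only two 3-byte fragments, so 6 space bytes are appended before it is split into four fragments. The resulting table is the object that the method keeps in offline cold storage; the sketch does not model the storage itself.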

3) A data fragmentation method for data isolation from data owners.

Based on the association fragmentation result, the association fragments in the set {BRs(DOn(ID), ADm(DS))} (association fragment number s = 1, 2, ..., n) cut off the association between the data structure and the data owner identity data. In essence, however, the data structure ADm(DS) in an association fragment is strongly associated with the data value ADm(DV) in the database of the big data platform. To prevent third parties (data processors, data users, big data platforms, data transaction service parties) from using the association fragments to link to the data value ADm(DV) and thereby infer which data a data owner owns, the association between the data value ADm(DV) and the association fragments must be further cut off on the basis of association fragmentation. This fragmentation method is also referred to as the data fragmentation method for isolating data from data owners.

The basic flow is shown in figure 4, and the specific method comprises the following steps:

3-1) Based on the association fragmentation result of the data structure and the data owner identity data, the number of data structure fragments in the association fragments, NUM(brs(ADm(DS))), can be determined. The number of data value fragments, NUM(s(ADm(DV))), should be greater than or equal to NUM(brs(ADm(DS))). How the data value is divided depends on the standard principle of data value fragmentation processing (described below); the data value fragmentation result is obtained according to that principle, which yields the number of data value fragments NUM(s(ADm(DV))).

3-2) When the number of data structure fragments NUM(brs(ADm(DS))) in the association fragments is too high, the number of data value fragments NUM(s(ADm(DV))) cannot meet the requirement set by NUM(brs(ADm(DS))). Comparing the two numbers, the missing length of the data value is determined to be NUM(brs(ADm(DS))) - NUM(s(ADm(DV))). Space data values of that length are added to the original data value, either at random positions or at specific positions, to form a new data value ADm'(DV), and step 3-1) is repeated to fragment ADm'(DV).

3-3) The data value fragments s(ADm(DV)) or s(ADm'(DV)) obtained in steps 3-1) and 3-2) are randomly associated with the data structure fragments brs(ADm(DS)), and an association table of data value fragments and data structure fragments is generated from the random association result. To ensure security and privacy, and to keep the association table safe and reliable, the table is kept in offline cold storage on a permanent storage medium with large capacity and low performance requirements. The table may only be called offline for verification and update when the data owned by a data owner needs to be determined; online calling and downloading are not allowed. When the data owned by a data owner needs to be determined, the data structure fragments brs(ADm(DS)) corresponding to the data value fragments s(ADm(DV)) can be queried through the data value ADm(DV) and the association table, thereby recovering the data ADm(DS-DV) owned by the data owner.
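The random association of step 3-3) and the later offline lookup can be sketched as follows. The sketch assumes the data value has already been fragmented (and padded if necessary), that fragments are identified by their list index, that every data structure fragment receives at least one data value fragment, and that surplus data value fragments are assigned at random. These choices and the names `build_table_b` and `structures_for` are illustrative assumptions, not details stated in the text.

```python
import random


def build_table_b(value_fragments: list, structure_fragments: list, rng=None) -> dict:
    """Map each data value fragment index to a randomly chosen structure fragment."""
    rng = rng or random.SystemRandom()
    if len(value_fragments) < len(structure_fragments):
        # Step 3-2) applies: the data value must be padded and re-fragmented first.
        raise ValueError("fewer data value fragments than data structure fragments")
    order = list(range(len(value_fragments)))
    rng.shuffle(order)
    table = {}
    for position, value_index in enumerate(order):
        if position < len(structure_fragments):
            # Assumption: cover every structure fragment once before reusing any.
            target = structure_fragments[position]
        else:
            target = rng.choice(structure_fragments)
        table[value_index] = target
    return table


def structures_for(table: dict, value_indices: list) -> list:
    """Offline lookup: structure fragments associated with the given value fragments."""
    return [table[i] for i in value_indices]


values = ["v0", "v1", "v2", "v3", "v4"]
structures = ["name", "age", "city"]
table_b = build_table_b(values, structures, rng=random.Random(0))
looked_up = structures_for(table_b, [0, 1, 2, 3, 4])
```

Passing a seeded `random.Random` makes the pairing reproducible for testing; for real use the default `random.SystemRandom` draws from the operating system's entropy source.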

4) A standard principle for data value fragmentation that keeps the data processable.

Firstly, the data in this technology is the data uploaded by data owners and includes the various types of data commonly recognized in society, such as digital products, data assets and data elements. To ensure data integrity, processability and recoverability, each data value fragment of any type of data must be larger than 2 minimum data units.

Secondly, steps 2) to 4) guarantee the security and privacy of the data and the data owners; in particular, they prevent a third party from finding the data owner directly, or inferring the data owner indirectly, through the data. To further enable large-scale mining of data value, fully release that value and thereby support data circulation based on it, the data value is divided into fragments according to the standard principle of data value fragmentation processing. The specific division method is as follows:

The data value ADm(DV) is marked in three data value dimensions: time, space and path. The basic flow is shown in fig. 5, and the marks include:

The time dimension marks of the data value ADm(DV) include the data owner upload time UT(ADm(DV)), the data structure fragmentation result time FRT(ADm(DS)), the data storage time ST(ADm(DV)), the data processing time PRT(ADm(DS-DV)), the data transaction time TT(ADm(DS-DV)) and the time of association with the data owner ORT(ADm(DS-DV)). The space dimension marks of the data value ADm(DV) include the data structure DSm and the data value DVm. The path dimension marks of the data value include the trust level CL(ADm(DS-DV)), the data processing path PR(ADm(DS-DV)) and the data transaction path TC(ADm(DS-DV)). These marking results are part of the big data platform's routine management data and can be provided directly by the big data platform.

According to each marking result of the data value ADm(DV), the data value ADm(DV) is fragmented correspondingly. For example, if the data value ADm(DV) is uploaded by the data owner in batches, each data owner upload time UT(ADm(DV)) corresponds to a partial set of the data value ADm(DV). The fragmentation result of the data value ADm(DV) for this mark is the partial sets of ADm(DV) corresponding to the upload times, denoted sUT(ADm(DV)). The fragmentation processing results for the other marks are denoted sFRT(ADm(DS)), sST(ADm(DV)), sPRT(ADm(DS-DV)), sTT(ADm(DS-DV)), sORT(ADm(DS-DV)), sDSm, sDVm, sCL(ADm(DS-DV)), sPR(ADm(DS-DV)) and sTC(ADm(DS-DV)) respectively. The intersection of the data value ADm(DV) with the fragmentation processing results of all the above marks is then taken to obtain the final data value fragmentation result s(ADm(DV)).
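The intersection of per-mark fragmentation results can be sketched as follows. The sketch assumes each minimum data unit is a record carrying one value per mark, and it reads "taking the intersection" as forming the common refinement of the per-mark partitions; that reading, the record layout and the function names are assumptions. Only two marks (UT and PR) are used for brevity, and the rule that a fragment must be larger than 2 minimum data units is not enforced.

```python
from collections import defaultdict


def partition_by(records: dict, mark: str) -> list:
    """Fragment the record ids by one mark, e.g. sUT for the upload time mark."""
    parts = defaultdict(set)
    for record_id, marks in records.items():
        parts[marks[mark]].add(record_id)
    return list(parts.values())


def intersect_partitions(records: dict, marks: list) -> list:
    """Intersect the fragmentation results of all marks into final fragments."""
    fragments = [set(records)]  # start from the whole data value
    for mark in marks:
        refined = []
        for fragment in fragments:
            for part in partition_by(records, mark):
                common = fragment & part
                if common:  # keep only non-empty intersections
                    refined.append(common)
        fragments = refined
    return fragments


records = {
    1: {"UT": "t1", "PR": "p1"},
    2: {"UT": "t1", "PR": "p2"},
    3: {"UT": "t2", "PR": "p1"},
    4: {"UT": "t1", "PR": "p1"},
}
final_fragments = intersect_partitions(records, ["UT", "PR"])
```

Records 1 and 4 share both marks and end up in the same fragment, while records 2 and 3 each differ in one mark and form their own fragments.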

The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall be covered by the protection scope of the present invention.

Claims

1. A data identity identification and data fragmentation method, characterized by comprising the following steps:

S1, establishing a preliminary association set of a data structure set and a data owner identity data set according to the data uploading behavior of the data owner;

S2, extracting the association elements of the data structure and the data owner identity data in the preliminary association set, and constructing an association set of the data structure and the data owner identity data;

S3, setting an allowable range for the association degree value of the data structure and the data owner identity data;

S4, mining the association set for the association degree value of the data structure and the data owner identity data;

S5, for association degree values exceeding the allowable range, determining that the relationship between the data structure and the data owner identity data requires fragmentation, and establishing the corresponding relevance set of data structures and data owner identity data to be fragmented;

S6, carrying out fragmentation processing on the relevance set to obtain the association fragmentation result of the data structure and the data owner identity data;

S7, according to the association fragmentation result of the data structure and the data owner identity data, carrying out fragmentation processing on the data owner identity private key associated with the association fragments;

S8, randomly associating the data owner identity private key fragments with the data owner identity data fragments, and generating an association table A from the random association result;

S9, according to the association fragmentation result of the data structure and the data owner identity data, carrying out fragmentation processing on the data value associated with the association fragments;

S10, randomly associating the data value fragments with the data structure fragments, and generating an association table B from the random association result;

S11, storing the association table A and the association table B on a storage medium in an offline cold storage mode, wherein the tables may only be called offline for verification and update when the identity of a data owner needs to be locked, and online calling and downloading are not allowed.

2. The data identity identification and data fragmentation method according to claim 1, wherein step S7 specifically comprises:

S71, determining the number of data owner identity data fragments in the association fragments according to the association fragmentation result of the data structure and the data owner identity data;

S72, determining the number of data owner identity private key fragments, which must be greater than the number of data owner identity data fragments;

S73, setting a lower limit on the length of a data owner identity private key fragment;

S74, when the number of data owner identity private key fragments multiplied by the lower limit of the fragment length exceeds the length of the data owner identity private key, determining the missing length of the data owner identity private key, and filling the missing length with a space private key;

S75, carrying out fragmentation processing on the data owner identity private key according to the number of data owner identity private key fragments and the fragment length of the data owner identity private key.

3. The data identity identification and data fragmentation method according to claim 1, wherein step S9 specifically comprises:

S91, determining the number of data structure fragments in the association fragments according to the association fragmentation result of the data structure and the data owner identity data;

S92, determining the number of data value fragments, which must be greater than the number of data structure fragments;

S93, setting a lower limit on the length of the data value, which must be larger than 2 minimum data units;

S94, when the number of data structure fragments multiplied by the lower limit of the length of the data value exceeds the length of the data structure, determining the missing length of the data value, and filling the missing length of the data value with a space data value;

S95, carrying out fragmentation processing on the data value according to the number of data value fragments and the data value length.

4. The data identity identification and data fragmentation method according to claim 2, wherein in step S75, when the data owner identity private key associated with the association fragments is fragmented, the data owner identity private key may be divided by a random method or an average field method.

5. The data identity identification and data fragmentation method according to claim 3, wherein in step S95, the data value is marked, corresponding fragmentation processing is performed on the data value according to each marking result, and the fragmentation processing results corresponding to the marks are intersected to obtain the final data value fragmentation result.

6. The data identity identification and data fragmentation method according to claim 5, wherein the marking marks the data value in three data value dimensions of time, space and path, wherein:

the data value path dimension marks include a confidence level, a data processing path and a data transaction path.

7. A data identification and data fragmentation device for data and data owner isolation, the device comprising:

a memory storing executable program code;

a processor coupled to the memory;

the processor invokes the executable program code stored in the memory to perform the data identification and data fragmentation method of any of claims 1-6.

8. A computer storage medium storing computer instructions which, when invoked, are operable to perform a data identity and data fragmentation method as claimed in any one of claims 1 to 6.