CN113709090A - System and method for determining group privacy disclosure risk - Google Patents

System and method for determining group privacy disclosure risk

Info

Publication number
CN113709090A
CN113709090A (application CN202011103737.8A)
Authority
CN
China
Prior art keywords
privacy
determining
group
data set
disclosure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011103737.8A
Other languages
Chinese (zh)
Other versions
CN113709090B (en)
Inventor
张颖
张继东
袁海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Smart Family Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Smart Family Technology Co Ltd filed Critical Tianyi Smart Family Technology Co Ltd
Priority to CN202011103737.8A priority Critical patent/CN113709090B/en
Publication of CN113709090A publication Critical patent/CN113709090A/en
Application granted granted Critical
Publication of CN113709090B publication Critical patent/CN113709090B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/185Arrangements for providing special services to substations for broadcast or conference, e.g. multicast with management of multicast group membership
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Systems and methods for determining group privacy disclosure risk are disclosed. The system comprises: a privacy data set preprocessing module for preprocessing the privacy data set; a privacy database for storing the preprocessed privacy data set; a group information preprocessing module for determining the openness of the privacy data sets based on the preprocessed privacy data sets and forming an openness matrix; a privacy disclosure degree calculation module for determining the degree of association of the privacy data sets, forming an association matrix, determining the triplet information of each type of privacy data set, and determining the privacy disclosure degree based on the association matrix and the triplet information; and a privacy disclosure risk determination module for determining the privacy disclosure risk based on the privacy disclosure degree. A corresponding method is also disclosed. The system and method take into account the associations among privacy data and the individualization of users, evaluate group privacy disclosure risk more accurately and objectively, and have strong extensibility and generalizability.

Description

System and method for determining group privacy disclosure risk
Technical Field
The present application relates to the field of information security technology, and more particularly, to a system and method for determining group privacy disclosure risk.
Background
With the rapid development of emerging technologies such as cloud computing, the Internet of Things and the Internet, data volumes have also grown explosively. During data collection, mining, analysis and application, if personal information is contained in the data records, the privacy of the person is easily revealed, which can have undesirable consequences for both individuals and society. Therefore, how to protect personal privacy has received more and more attention. Current research on privacy data protection focuses mainly on desensitizing data through a series of desensitization algorithms, or on reducing the possibility of privacy disclosure by means of privacy-preserving data publishing methods, such as the common Private Aggregation of Teacher Ensembles (PATE) or Differential Privacy (DP) protection methods, while relatively little model or algorithm research has been conducted on determining privacy disclosure risks.
Patent document CN 105871891 A discloses "a DNS privacy disclosure risk assessment method and system", in which quantitative evaluation of DNS privacy disclosure risk can be achieved. The method calculates the total DNS privacy disclosure risk value of a user or user group from the privacy disclosure degree, the privacy infringement difficulty and the privacy disclosure risk value of each privacy disclosure risk. However, its data source mainly consists of monitoring links in the server and collecting accessed domain names, the privacy disclosure risk of the data is judged from the interaction frequency between each application and the server, and the construction and evaluation indexes of the privacy data set are incomplete. For example, when the privacy data is comprehensively evaluated, the degree of openness of a user to a certain type of specific sensitive data and the associations among the data are not considered, so the evaluation method is not suitable for family scenarios and similar scenarios in which the data generated by some applications are strongly correlated due to the kinship or similar relationships among the members.
Patent document CN 110222058 A discloses a multi-source data association privacy disclosure risk assessment system based on FP-growth, in which vulnerability analysis is performed on different data sources, a privacy disclosure risk value RISK of a single data source is calculated to construct a privacy disclosure risk assessment index, a mapping assessment set is constructed by combining an asset influence coefficient C, a threat frequency T and a vulnerability severity V, a privacy disclosure risk factor entropy weight coefficient is calculated through multiple matrix transformations and a Markov chain, and a privacy disclosure risk assessment model is finally obtained. However, that patent only considers privacy disclosure risk from the perspective of data assets, its model ignores the association factors between people, and the algorithm's calculation process is relatively complex and unsuitable for privacy disclosure risk assessment on a per-group basis; with large data volumes the calculation cost is high, which hinders adoption.
Most current privacy disclosure risk assessment methods or systems target a single individual. In group applications, however, such as groups organized by household, the relationships among group members are correlated and often close, so current assessment methods that take the individual as the main research object have at least two problems in dimensions such as data generation, data use and privacy openness. First, because of the relationships among the members of a group, the data generated during application use are strongly associated, and individual-based privacy evaluation often ignores the influence of data association on privacy disclosure. Even where the concept of a user group has been introduced into some privacy evaluation models, the evaluation dimension is still a single group's data, and the associations between the data are ignored. The associations between the data act as enhanced background knowledge, so that the leakage of one member's privacy data may, to some extent, also leak the privacy of other group members. Second, privacy openness is not a strictly quantitative relationship. Owing to individual differences within a group, different individuals are open about their personal privacy to different degrees; from the perspective of personalized sensitive data, different individuals define the same kind of privacy differently and have different personalized privacy requirements when privacy data are published and processed. When the group is taken as the unit of privacy evaluation, these differences in privacy openness must therefore be taken into account.
Therefore, after studying the prior-art techniques for evaluating privacy disclosure risk, the inventors found that most technical solutions have at least the following two problems.
First, the relevance between data is not considered when evaluating the privacy disclosure risk, but in practical application, there often exists a potential invisible relevance between data. For example, some fields in a certain data record are associated with other data in some way, so that more privacy information can be deduced therefrom to cause privacy disclosure.
Second, the differential needs of different groups or individuals for privacy protection are not taken into account when assessing privacy leakage risks. In fact, different people have quite different requirements on privacy protection, and the definition standards of privacy disclosure are different. Therefore, when the privacy disclosure risk assessment is performed in groups (especially, user groups in households), the methods in the prior art cannot completely adapt to the requirements.
Therefore, there is a need in the art for a method and system for comprehensively, objectively and quantifiably determining the risk of privacy disclosure, so that the risk of privacy disclosure can be comprehensively evaluated in groups, particularly in families.
Disclosure of Invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The aim of the application is to provide a method and a system for determining group privacy disclosure risk. Taking the group as the unit, the method classifies the data involved in group applications, proposes a characterized privacy data set, establishes a privacy openness matrix through an algorithm, calculates a privacy disclosure risk coefficient in combination with personal privacy openness factors, and finally obtains a comprehensive determination of the group's privacy data disclosure risk. The result can serve as a reference basis for providing targeted and personalized privacy protection measures for millions of group members. Meanwhile, the method can also be extended to any organization or application scenario that needs to determine privacy disclosure risk on a per-group basis.
According to a first aspect of the present application, there is provided a system for determining a risk of group privacy disclosure, the system comprising: the system comprises a privacy data set preprocessing module, a privacy database, a group information preprocessing module, a privacy disclosure degree calculating module and a privacy disclosure risk determining module.
The privacy data set preprocessing module is used for preprocessing the privacy data set, which comprises: identifying the privacy data related to the various applications and services in the group scenario, i.e. defining the privacy data set, and generating a feature vector of the privacy data set after cleaning, formatting, filtering out useless data, normalizing repeated data, and standardizing the data.
The privacy database is used to store data related to privacy disclosure risks, including but not limited to: personal information of family members (such as age, identity card number, occupation, hobbies and interests, work unit and the like), APP access information and log information, internet access characteristic information and traffic information, basic information of the household smart devices, usage log information of the household smart devices, and other information. Each privacy data record consists of the following tuple:
{ [privacy tag metadata MetaDi]; [privacy tag metadata description MetaDSpec-i, which can be expressed as a regular expression or in Backus-Naur Form]; (keyword list (keyword 1, keyword 2, keyword 3, ..., keyword n; this element is optional)); feature value }.
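Purely as an illustration (not part of the patent), the following Python sketch shows one way such a record tuple could be represented in code; the field names, the example regular expression and the example values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimal sketch of one privacy data record as described above.
# Field names and example values are illustrative assumptions, not the patent's own schema.
@dataclass
class PrivacyRecord:
    meta_d: str                             # privacy tag metadata MetaDi, e.g. "id_card"
    meta_d_spec: str                        # MetaDSpec-i: a regex or BNF description of the tag
    keywords: Optional[List[str]] = None    # optional keyword list (keyword 1..n)
    feature_value: float = 0.0              # feature value attached during preprocessing

record = PrivacyRecord(
    meta_d="id_card",
    meta_d_spec=r"\d{17}[\dXx]",            # assumed example pattern for an 18-character ID number
    keywords=["identity", "ID"],
    feature_value=2.0,
)
```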
The group information preprocessing module is used for determining the openness degree of each group member to the data and forming a privacy open matrix between the group members and the privacy data set according to the individual requirements of different members in the group to privacy protection and by combining the output of the privacy data set preprocessing module.
The privacy disclosure degree calculation module is used for determining the association degree of the privacy data sets and forming an association matrix R, for each type of privacy data set, determining the three-tuple information { disclosure severity Si, disclosure difficulty Bi, openness Fpi of the group }, and determining the privacy disclosure degree of each type of privacy data set based on the association matrix and the three-tuple information.
The privacy disclosure risk determination module is used for determining a privacy disclosure risk quantitative value of the group based on the privacy disclosure degree to determine the privacy disclosure risk of the group.
According to a second aspect of the present application, there is provided a method for determining a risk of privacy leakage for a group, the method comprising: pre-processing a set of privacy data about a group; determining the degree of association of the privacy data sets and forming an association matrix; the method comprises the steps of determining the openness ai of a group to each type of privacy data set based on the preprocessed privacy data sets, and forming a corresponding privacy open matrix; for each type of private data set, determining three-tuple information { leakage severity Si, leakage difficulty Bi, openness Fpi of the group }; determining the privacy disclosure degree of each type of privacy data set based on the incidence matrix and the triplet information; and determining a privacy disclosure risk quantification value of the group based on the privacy disclosure degree to determine the privacy disclosure risk of the group.
To the accomplishment of the foregoing and related ends, the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed and the present description is intended to include all such aspects and their equivalents.
Drawings
So that the manner in which the above recited features of the present application can be understood in detail, a more particular description of the disclosure briefly summarized above may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this application and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
In the drawings:
fig. 1 is a schematic block diagram illustrating a system for determining group privacy disclosure risk according to an embodiment of the present application; and
fig. 2 is a flow diagram illustrating a method 200 for determining group privacy disclosure risk according to an embodiment of the present application.
Detailed Description
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details to provide a thorough understanding of the various concepts. It will be apparent, however, to one skilled in the art that these concepts may be practiced without these specific details. In some instances, well known components are shown in block diagram form in order to avoid obscuring such concepts.
It is to be understood that other embodiments will be evident based on the present disclosure, and that system, structural, process, or mechanical changes may be made without departing from the scope of the present disclosure.
As shown in fig. 1, a system for determining a risk of privacy disclosure of a group is illustrated. The system comprises: the privacy data set preprocessing module 10, the privacy database 20, the group information preprocessing module 30, the privacy disclosure degree calculating module 40, and the privacy disclosure risk determining module 50.
The privacy data set preprocessing module 10 is configured to preprocess the privacy data set, which comprises: identifying the privacy data related to the various applications and services in the group scenario, i.e. defining the privacy data set, and cleaning, formatting, filtering out useless data, normalizing repeated data, and standardizing the privacy data set to generate its feature vector.
The privacy database 20 stores the privacy data, which include but are not limited to: personal information of the members (such as age, identity card number, occupation, hobbies and interests, work unit and the like), access information and log information of various applications, internet access characteristic information and traffic information, basic information of the smart devices, usage log information of the smart devices, and other information.
Each privacy data record consists of the following tuple:
{ [privacy tag metadata MetaDi]; [privacy tag metadata description MetaDSpec-i, which can be expressed as a regular expression or in Backus-Naur Form]; (keyword list (keyword 1, keyword 2, keyword 3, ..., keyword n; this element is optional)); feature value }.
The group information preprocessing module 30 is configured to determine, according to personalized requirements of different members in the group for privacy protection, the openness degree of each member for the data in the group and form a privacy open matrix between the members and the privacy data, in combination with the output of the privacy data preprocessing module.
The privacy disclosure degree calculation module 40 is configured to determine the degree of association of the privacy data sets and form a correlation matrix R, determine, for each type of privacy data set, triplet information { disclosure severity Si, disclosure difficulty Bi, openness degree Fpi of the group }, and determine the degree of privacy disclosure of each type of privacy data set based on the correlation matrix and the triplet information.
The privacy disclosure risk determination module 50 is configured to determine the privacy disclosure risk of the members based on the feature vector values of the privacy data in combination with the privacy disclosure risk vector values of the privacy data, and to output the result.
The detailed implementation process and steps of the method 200 for determining a risk of group privacy disclosure according to the present application are described in detail below with reference to the flowchart in fig. 2.
Step one, S201: preprocessing private data sets for a group
This step includes the following substeps.
Business data, log data and the like that may be involved in the various applications and services of the group are first sorted through; after cleaning, parsing and collation, they are segmented with a tokenizer to obtain specific data sets. For each item of each data set, the privacy database is searched with the data set's keywords to perform feature matching; if the match succeeds, the data is privacy data and the corresponding feature value is added to its tuple, otherwise the tuple is discarded.
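As an illustration of the keyword matching just described, the following hedged Python sketch assumes a simple dictionary-backed privacy database and a trivial comma tokenizer; both are stand-ins for illustration only, not the patent's actual components.

```python
from typing import Dict, List, Optional, Tuple

# Assumed toy privacy database: maps a keyword to the feature value stored for it.
privacy_db: Dict[str, float] = {"search keyword": 2.0, "identity card": 2.0, "visit website": 0.5}

def tokenize(text: str) -> List[str]:
    # Stand-in for the word-segmentation step (a real tokenizer would be used in practice).
    return [t.strip() for t in text.split(",") if t.strip()]

def match_record(raw_record: str) -> Optional[List[Tuple[str, float]]]:
    """Return (keyword, feature value) pairs when the record matches the privacy database,
    otherwise None, meaning the tuple is discarded."""
    matches = [(tok, privacy_db[tok]) for tok in tokenize(raw_record) if tok in privacy_db]
    return matches or None

print(match_record("app open time, search keyword, user"))   # [('search keyword', 2.0)]
print(match_record("weather, news"))                          # None -> discarded
```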
X = {i1, i2, ..., im} is defined as a set of m different privacy data items ij, called a privacy data set, where ij is the feature vector corresponding to the privacy data item.
Assuming that P privacy data sets are finally obtained after sorting all applications and services, they can be expressed as {X1, X2, ..., Xp}. When there is an association between two privacy data sets Xi and Xj, the association condition given by the original formula is satisfied (the formula appears only as an image in the source and is not reproduced here).
When the numbers of privacy data items in the sets differ, take t = Max{ m | m is the number of data items of X1, X2, ..., Xp }. The P privacy data sets can then be represented by the p×t matrix Pr = (i_jk), the j-th row of which holds the feature values of privacy data set Xj.
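A minimal sketch of assembling Pr is shown below, using the feature vectors from the worked example later in the description; the zero padding for shorter sets is an assumption, since the patent only defines t.

```python
import numpy as np

# Assembling the matrix Pr from the per-set feature vectors. The vectors are those of
# the worked example below; padding shorter sets with zeros up to length t is an
# assumption, since the patent only defines t as the maximum number of items.
feature_vectors = [
    [1, 4, 3, 2],        # X1: APP usage
    [0.5, 0.5, 3, 2],    # X2: internet access
    [2, 4, 1, 0],        # X3: device information
    [2, 3, 4, 1],        # X4: personal information
]

t = max(len(v) for v in feature_vectors)    # t = Max{ m | m = number of items of Xi }
Pr = np.array([v + [0] * (t - len(v)) for v in feature_vectors], dtype=float)
print(Pr.shape)                             # (p, t) = (4, 4)
```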
Step two S202: determining the degree of association of the private data sets and generating an association matrix R
The relevance of the privacy data items is calculated from the matrix Pr to form the data relevance coefficient matrix R, as shown in formula I.
Formula I: R = (r_ji)_{t×t},
where each coefficient satisfies r_ij = r_ji and is calculated by formula (2), which appears only as an image in the source and is not reproduced here.
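Since the source shows the r_ij formula only as an image, the sketch below uses a Pearson-style correlation between the rows of Pr purely as a stand-in; it preserves the stated symmetry r_ij = r_ji but is not the patent's actual formula.

```python
import numpy as np

# Stand-in for the association matrix R. The patent's exact r_ij formula is shown only
# as an image, so a Pearson-style correlation between the feature rows of Pr is assumed
# here; it is symmetric with ones on the diagonal, but it is illustrative only.
def association_matrix(Pr: np.ndarray) -> np.ndarray:
    return np.corrcoef(Pr)   # correlation between the feature rows of Pr

Pr = np.array([
    [1.0, 4.0, 3.0, 2.0],
    [0.5, 0.5, 3.0, 2.0],
    [2.0, 4.0, 1.0, 0.0],
    [2.0, 3.0, 4.0, 1.0],
])
R = association_matrix(Pr)
print(np.allclose(R, R.T))   # True: r_ij = r_ji
```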
Step three, S203: determining the openness ai of the group u to a certain type of private data set and forming an open matrix
Due to individual differences among the members of the group, different individuals have different degrees of openness to their privacy data. The openness of each group member to each type of privacy data is divided into grades A, B, C, D, E, F, ..., which may be represented by different numbers {n1, n2, n3, ..., nn}, where each nk < 1 (k = 1, 2, 3, ...) and ni < nj when i < j.
Assuming that there are m members in group u, and that the openness of each member to the data items in the data sets {X1, X2, ..., Xp} is represented by a feature vector ui = (a1, a2, a3, ..., ap), the relationship between the group members' privacy openness and the p data sets can be represented as the m×p matrix U = (a_ik), whose i-th row is the openness vector ui of member i.
The minimum allowable openness of each data set Xi within the group is denoted Fp; then
Fp = ( min_k{uk(a1)}, min_k{uk(a2)}, ..., min_k{uk(ap)} ), where k = 1, ..., m and m is the number of group members.
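The sketch below illustrates the openness matrix and the column-wise minimum Fp; the individual grade values in U are invented for illustration, with only their column minima chosen to match the Fpi values of the worked example.

```python
import numpy as np

# Openness matrix U: one row per group member, one column per privacy data set.
# The individual grade values are invented for illustration; only the column minima
# are chosen to match the Fpi values {1/6, 1/4, 1/5, 1/8} of the worked example.
U = np.array([
    [1 / 6, 1 / 4, 1 / 5, 1 / 8],   # member 1: openness a1..a4 to data sets X1..X4
    [1 / 3, 1 / 4, 2 / 5, 1 / 4],   # member 2
    [1 / 2, 1 / 2, 1 / 5, 1 / 6],   # member 3
    [1 / 4, 1 / 3, 2 / 5, 1 / 5],   # member 4
])

Fp = U.min(axis=0)   # Fp = (min_k uk(a1), ..., min_k uk(ap)): the group's minimum openness per set
print(Fp)            # [0.16666667 0.25 0.2 0.125]
```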
Step four S204: determining three-tuple information { leakage severity Si, leakage difficulty degree Bi, and openness degree min { ai } of group u } of each type of data item of the privacy data set
Each type of privacy data set is evaluated in three dimensions: leakage severity Si, leakage difficulty Bi and data openness Fpi, forming a privacy data set triplet (leakage severity Si, leakage difficulty Bi, data openness Fpi). The higher the Si value, the greater the loss to the group members once the data are leaked and the more serious the consequences; Bi ≥ 1, and the higher the Bi value, the more difficult the data are to leak; Fpi ≤ 1, and the smaller the value, the less open the group members are about the data.
Step five S205: determining a degree of privacy disclosure for each type of privacy data set
For each privacy data set, a privacy disclosure risk coefficient θ is defined as follows:
Formula II: θ = (Si × Fpi)/Bi.
A privacy leakage vector T = [θ1, θ2, ..., θp] is created for the privacy data sets {X1, X2, ..., Xp}.
A privacy disclosure risk vector R_VALUE is then determined for the data sets {X1, X2, ..., Xp}, where R_VALUE is expressed as {Risk1, Risk2, ..., RiskP}:
Formula III: R_VALUE = R * T,
where R is the association matrix and T is the privacy leakage vector.
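Formulas II and III can be illustrated with the following sketch, using the triplet values from the worked example; the identity matrix stands in for the association matrix R, which in practice comes from step two.

```python
import numpy as np

# Illustration of formulas II and III. The S, B and Fp values are those of the worked
# example in the description; R is a placeholder identity matrix here (the real
# association matrix comes from step two, S202).
S = np.array([5, 3, 2, 8], dtype=float)        # leakage severity Si per data set
B = np.array([2, 6, 6, 6], dtype=float)        # leakage difficulty Bi per data set
Fp = np.array([1 / 6, 1 / 4, 1 / 5, 1 / 8])    # group openness Fpi per data set

T = (S * Fp) / B                               # formula II: theta_i = (Si * Fpi) / Bi
print(T)                                       # [0.4166... 0.125 0.0666... 0.1666...] = {5/12, 1/8, 1/15, 1/6}

R = np.eye(4)                                  # placeholder for the association matrix R
R_value = R @ T                                # formula III: R_VALUE = R * T
print(R_value)
```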
Step six S206: determining privacy exposure risk for a group
In combination with the frequency with which data are generated or collected in the actual application or service, let the number of occurrences of a privacy data item i_mn within a given time period be r_mn. The frequency of occurrence FRE_mn of each data item is then determined as
Formula IV: FRE_mn = r_mn / D,
where D is the length of the time period in days.
The privacy weight of data set Xk is defined as Weight_k = max{FRE_mk}.
A quantified value of the group's privacy disclosure risk is then determined according to formula V, so as to determine the group's privacy disclosure risk.
Formula V: [shown only as an image in the source; it weights the privacy disclosure risk vector R_VALUE by the weights Weight_k]
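Formula V appears only as an image in the source, so the exact combination is not recoverable; the sketch below assumes a simple weighted sum of the per-set risks as one plausible reading consistent with claim 5, and it will not necessarily reproduce the 4.49 value reported in the worked example.

```python
import numpy as np

# Formula IV and an assumed reading of formula V. The occurrence counts and the 30-day
# window are those of the worked example; the weighted-sum combination is an assumption
# (the patent's exact formula V may differ, so the printed value need not equal the
# 4.49 reported in the description).
counts = np.array([210, 120, 30, 10], dtype=float)   # max occurrences per data set in the period
days = 30
FRE = counts / days                                   # formula IV: per-day frequency of occurrence
weights = FRE                                         # Weight_k = max{FRE_mk}; one count per set here
R_value = np.array([0.337, 0.456, 0.036, -0.817])     # risk vector from formula III (worked example)

group_risk = float(weights @ R_value)                 # assumed weighted-sum reading of formula V
print(round(group_risk, 2))
```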
the present invention is described in further detail below with reference to the attached drawings. Taking a common family with 4 households as an example, assuming that the daily family service scene is relatively simple, the following record sets in three aspects are obtained by sorting data and logs generated in various applications of the family:
Set one: { APP usage: { APP open time, APP action, user, search keyword }, {1, 4, 3, 2} }
Set two: { internet access: { visit time, visited website, dwell time, keyword topic }, {1/2, 1/2, 3, 2} }
Set three: { device information: { device name, device action, time }, {2, 4, 1, 0} }
Set four: { personal information: { identity card, name, gender, age }, {2, 3, 4, 1} }
Note: the calculation of the feature values in the privacy database is outside the scope of this technical solution.
The private data set Pr can thus be derived:
Figure BDA0002726263490000101
step two: the degree of correlation of the private data sets is determined and a correlation matrix R is generated. The correlation coefficients r_ij are calculated by formula I; the resulting 4×4 matrix appears only as an image in the source and is not reproduced here.
step three: evaluating the openness ai of a family user group u to a certain type of private data set
Assuming that there are 4 members in the family, the openness of each member to the data sets is represented by a matrix, which appears only as an image in the source and is not reproduced here.
step four: for data sets one through four, the triplet information { leakage severity Si, leakage difficulty Bi, openness min{ai} of user group u } is confirmed respectively as follows:
Data set one: {Si = 5, Bi = 2, Fpi = 1/6}
Data set two: {Si = 3, Bi = 6, Fpi = 1/4}
Data set three: {Si = 2, Bi = 6, Fpi = 1/5}
Data set four: {Si = 8, Bi = 6, Fpi = 1/8}
Step five: calculating the privacy disclosure degree of each type of privacy data set, wherein the privacy disclosure risk coefficient formula of each data set is theta (Si) Fpi)/Bi, and the calculation result is as follows
Data set one X1, whose privacy leakage risk factor is calculated as: 5/12
Data set two X2, whose privacy leakage risk factor is calculated as: 3/24
Data set three X3, whose privacy leakage risk factor is calculated as: 1/15
Data set three X4, whose privacy leakage risk factor is calculated as: 1/6
Privacy leakage vectors are created for the data sets { X1, X2, X3, X4}
T[θ1,θ1,θ2,...θp]=5/12,1/8,1/15,1/6};
Defining the privacy leakage risk value vector of the data set { X1, X2, X3} as RVALUE, calculating the result according to formula III as follows:
0.337
0.456
0.036
-0.817
step six: and calculating the quantitative value of the risk of the privacy disclosure of the whole family. Assuming that the maximum number of times the data set { X1, X2, X3, X4} takes within 30 days is {210,120,30,10}, the frequency Weightk of the data set { X1, X2, X3, X4} is {7,4,1,0.33} respectively according to the formula four
And calculating the privacy evaluation risk of the family data according to the formula five, wherein the calculation result is 4.49.
As described above, compared with the prior art, the system and the method for determining the group privacy disclosure risk provided by the present application have the following advantages:
1. It is first proposed to determine privacy disclosure risk in units of a group (such as a household) and to consider the associations among the various privacy data. For privacy determination centered on group members, the invention first proposes determining privacy disclosure risk with the group as the unit; meanwhile, to reduce privacy disclosure caused by the associations among privacy data, an association coefficient matrix among the privacy data items is determined and a corresponding determination method is provided, so that the associations among privacy data are fully considered in privacy protection.
2. In combination with the actual application scenario, the openness of the privacy data is graded, and the degree of openness to the relevant privacy data items is determined for each group member to form an individual privacy openness vector. For multi-user scenarios such as groups, a method for determining the openness matrix of the group privacy data sets is provided in combination with the associations among the group members, so that the personalized requirements of privacy protection can be met and the characteristics of the group can be taken into account in the privacy determination process.
3. The application also provides a triplet representation (disclosure severity Si, disclosure difficulty Bi, data openness Fpi) of a privacy data set: a privacy data set is evaluated according to the severity of disclosure, the difficulty of disclosure and the degree of data openness, and the privacy disclosure risk coefficient is determined by formula, so that the disclosure risk of the privacy data is better quantified.
4. The group privacy disclosure risk determination method of the application has strong extensibility and generalizability: it can be extended to other scenarios requiring group privacy determination, places no limit on the number of users in the group, and can easily be extended to user groups sharing a certain similar characteristic. With the development of the Internet of Things and 5G technology, more and more IoT devices will be connected to the network and more and more organizations will be concerned about the hidden danger of privacy disclosure, so the method has a very broad application prospect.
5. The privacy risk determination system and method of the present application are very versatile, have a long technology application life cycle, and can be embedded in any application or product that requires the use of the method.
By adopting the system and the method, the privacy personalized requirements among different members of the group are considered, the characteristics of the privacy data set and the correlation degree among the data sets are also considered, and the risk of group privacy disclosure is comprehensively determined, so that the privacy disclosure taking the group as a unit is more accurately evaluated.
It is to be understood that the specific order or hierarchy of steps in the methods disclosed is an illustration of exemplary processes. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the methods or methodologies described herein may be rearranged. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented unless specifically recited herein.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" (unless specifically so stated) but rather "one or more". The term "some" means one or more unless specifically stated otherwise. A phrase referring to "at least one of" a list of items refers to any combination of those items, including a single member. By way of example, "at least one of a, b, or c" is intended to encompass: at least one a; at least one b; at least one c; at least one a and at least one b; at least one a and at least one c; at least one b and at least one c; and at least one a, at least one b, and at least one c. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.

Claims (10)

1. A system for determining group privacy exposure risk, comprising:
a privacy data set preprocessing module for preprocessing the privacy data set relating to the group;
a privacy database for storing the pre-processed privacy data set;
the group information preprocessing module is used for determining the openness ai of the group to each type of privacy data set based on the preprocessed privacy data sets and forming a corresponding privacy open matrix;
the privacy disclosure degree calculation module is used for determining the degree of association of the privacy data sets and forming an association matrix R, determining the triplet information { disclosure severity Si, disclosure difficulty Bi, openness degree Fpi of the group } of each type of privacy data set, and determining the privacy disclosure degree of each type of privacy data set based on the association matrix R and the triplet information; and
a privacy disclosure risk determination module to determine a privacy disclosure risk quantification value for the group based on the privacy disclosure degree to determine the privacy disclosure risk for the group.
2. The system of claim 1, wherein pre-processing the private data set comprises:
the private data set is cleaned, formatted, filtered of useless data, normalized by repeated data, and normalized to generate a feature vector of the private data set.
3. The system of claim 1,
determining the openness ai comprises dividing the openness of the members of the group to each type of private data set into a plurality of levels, the maximum value of the levels not exceeding 1, the openness of the group to each type of private data set being the minimum value of the levels, and
wherein Si ≥ 1, Bi ≥ 1, and Fpi ≤ 1.
4. The system of claim 1, wherein determining the degree of privacy disclosure comprises:
defining a privacy disclosure risk coefficient θ = (Si × Fpi)/Bi;
defining a privacy leakage vector T = (θ1, θ2, …) for each type of privacy data set; and
determining the privacy disclosure risk vector R_VALUE = R * T.
5. The system of claim 4, wherein determining the group's privacy-exposure risk quantified value comprises weighting the privacy-exposure risk vector according to a frequency of use of each type of privacy data set to obtain the privacy-exposure risk quantified value.
6. A method for determining group privacy exposure risk, comprising:
pre-processing a set of privacy data about the group;
determining the degree of association of the private data sets and forming an association matrix R;
the method comprises the steps of determining the openness ai of the group to each type of privacy data set based on the preprocessed privacy data sets, and forming a corresponding privacy open matrix;
for each type of private data set, determining three-tuple information { leakage severity Si, leakage difficulty Bi, openness Fpi of the group };
determining a privacy disclosure degree of each type of privacy data set based on the incidence matrix R and the triplet information; and
determining a privacy exposure risk quantification value for the group based on the privacy exposure degree to determine a privacy exposure risk for the group.
7. The method of claim 6, wherein preprocessing the private data set comprises: the private data set is cleaned, formatted, filtered of useless data, normalized by repeated data, and normalized to generate a feature vector of the private data set.
8. The method of claim 6,
determining the openness ai comprises dividing the openness of the members of the group to each type of private data set into a plurality of levels, the maximum value of the levels not exceeding 1, the openness of the group to each type of private data set being the minimum value of the levels, and
wherein Si ≥ 1, Bi ≥ 1, and Fpi ≤ 1.
9. The method of claim 6, wherein determining the degree of privacy disclosure comprises:
defining a privacy disclosure risk coefficient θ = (Si × Fpi)/Bi;
defining a privacy leakage vector T = (θ1, θ2, …) for each type of privacy data set; and
determining the privacy disclosure risk vector R_VALUE = R * T.
10. The method of claim 9, wherein determining the group's privacy-exposure risk quantified value comprises weighting the privacy-exposure risk vector according to a frequency of use of each type of privacy data set to obtain the privacy-exposure risk quantified value.
CN202011103737.8A 2020-10-15 2020-10-15 System and method for determining group privacy disclosure risk Active CN113709090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011103737.8A CN113709090B (en) 2020-10-15 2020-10-15 System and method for determining group privacy disclosure risk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011103737.8A CN113709090B (en) 2020-10-15 2020-10-15 System and method for determining group privacy disclosure risk

Publications (2)

Publication Number Publication Date
CN113709090A true CN113709090A (en) 2021-11-26
CN113709090B CN113709090B (en) 2023-03-17

Family

ID=78646709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011103737.8A Active CN113709090B (en) 2020-10-15 2020-10-15 System and method for determining group privacy disclosure risk

Country Status (1)

Country Link
CN (1) CN113709090B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113904874A (en) * 2021-11-30 2022-01-07 北京中超伟业信息安全技术股份有限公司 Unmanned aerial vehicle data secure transmission method
CN114139213A (en) * 2022-02-07 2022-03-04 广州海洁尔医疗设备有限公司 ICU ward monitoring data processing method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871891A (en) * 2016-05-17 2016-08-17 中国互联网络信息中心 DNS privacy leakage risk assessment method and system
US20170270318A1 (en) * 2016-03-15 2017-09-21 Stuart Ritchie Privacy impact assessment system and associated methods
CN109670342A (en) * 2018-12-30 2019-04-23 北京工业大学 The method and apparatus of information leakage risk measurement
CN109783614A (en) * 2019-01-25 2019-05-21 北京信息科技大学 A kind of the difference privacy leakage detection method and system of social networks text to be released
CN110222058A (en) * 2019-06-05 2019-09-10 深圳市优网科技有限公司 Multi-source data based on FP-growth is associated with privacy leakage risk evaluating system
US20200151351A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Verification of Privacy in a Shared Resource Environment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270318A1 (en) * 2016-03-15 2017-09-21 Stuart Ritchie Privacy impact assessment system and associated methods
CN105871891A (en) * 2016-05-17 2016-08-17 中国互联网络信息中心 DNS privacy leakage risk assessment method and system
US20200151351A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Verification of Privacy in a Shared Resource Environment
CN109670342A (en) * 2018-12-30 2019-04-23 北京工业大学 The method and apparatus of information leakage risk measurement
CN109783614A (en) * 2019-01-25 2019-05-21 北京信息科技大学 A kind of the difference privacy leakage detection method and system of social networks text to be released
CN110222058A (en) * 2019-06-05 2019-09-10 深圳市优网科技有限公司 Multi-source data based on FP-growth is associated with privacy leakage risk evaluating system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AKSHIKA WIJESUNDARA: "Engineering Privacy-aware Smart Home Environments", EICS '20 Companion: Companion Proceedings of the 12th ACM SIGCHI Symposium on Engineering Interactive Computing Systems *
吴丁娟: "Privacy Concerns of Medical Data and Their Influencing Factors in the Context of Big Data", Journal of Henan Normal University *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113904874A (en) * 2021-11-30 2022-01-07 北京中超伟业信息安全技术股份有限公司 Unmanned aerial vehicle data secure transmission method
CN113904874B (en) * 2021-11-30 2022-03-04 北京中超伟业信息安全技术股份有限公司 Unmanned aerial vehicle data secure transmission method
CN114139213A (en) * 2022-02-07 2022-03-04 广州海洁尔医疗设备有限公司 ICU ward monitoring data processing method and system

Also Published As

Publication number Publication date
CN113709090B (en) 2023-03-17

Similar Documents

Publication Publication Date Title
JP6814017B2 (en) Computer implementation systems and methods that automatically identify attributes for anonymization
CN113709090B (en) System and method for determining group privacy disclosure risk
Javadi et al. Monitoring misuse for accountable'artificial intelligence as a service'
Rupa et al. A machine learning driven threat intelligence system for malicious URL detection
EP3567508A1 (en) Detection and prevention of privacy violation due to database release
Veena et al. C SVM classification and KNN techniques for cyber crime detection
Torra Privacy in data mining
Vishva et al. Phisher fighter: website phishing detection system based on url and term frequency-inverse document frequency values
Kasim Automatic detection of phishing pages with event-based request processing, deep-hybrid feature extraction and light gradient boosted machine model
Wang et al. Exploring topic models to discern cyber threats on Twitter: A case study on Log4Shell
Pandey et al. Text and data mining to detect phishing websites and spam emails
de Oliveira Silva et al. Privacy and data mining: Evaluating the impact of data anonymization on classification algorithms
US20200293590A1 (en) Computer-implemented Method and System for Age Classification of First Names
Phua et al. On the communal analysis suspicion scoring for identity crime in streaming credit applications
Schroeder et al. Crimelink explorer: Using domain knowledge to facilitate automated crime association analysis
CN112231746A (en) Joint data analysis method, device and system and computer readable storage medium
Park et al. Evaluating differentially private decision tree model over model inversion attack
Garfinkel et al. Detecting threatening insiders with lightweight media forensics
Sumalatha et al. Data collection and audit logs of digital forensics in cloud
Abid et al. Online testing of user profile resilience against inference attacks in social networks
Mavriki et al. Profiling with Big Data: Identifying Privacy Implication for Individuals, Groups and Society
Canelón et al. Unstructured data for cybersecurity and internal control
Kashid et al. Discrimination-aware data mining: a survey
Kabwe et al. Identity attributes metric modelling based on mathematical distance metrics models
Krüger An Approach to Profiler Detection of Cyber Attacks using Case-based Reasoning.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220208

Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072

Applicant after: Tianyi Digital Life Technology Co.,Ltd.

Address before: 201702 3rd floor, 158 Shuanglian Road, Qingpu District, Shanghai

Applicant before: Tianyi Smart Family Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant