CN117992809A - Hierarchical protection method for operation and maintenance information of multiple databases of bank - Google Patents

Hierarchical protection method for operation and maintenance information of multiple databases of bank

Publication number
CN117992809A
Authority
CN
China
Prior art keywords
user
data point
user data
point set
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410406963.5A
Other languages
Chinese (zh)
Other versions
CN117992809B (en)
Inventor
唐军
陈伟
周超萍
彭向南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Kaibo Technology Co ltd
Original Assignee
Jiangsu Kaibo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Kaibo Technology Co ltd
Priority to CN202410406963.5A
Publication of CN117992809A
Application granted
Publication of CN117992809B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a hierarchical protection method for operation and maintenance information of multiple databases of a bank. The method comprises the following steps: collecting a plurality of user service data for each user service type of each user; acquiring a plurality of user data point sets, and obtaining a plurality of level clusters according to how similarly user service data changes across different user service types; obtaining the degree to be protected of each user data point set according to the overall information distribution trend of the user data point sets in a level cluster and the differences in information change between different user data point sets; and screening and generalizing all user data point sets according to the degree to be protected of each user data point set to obtain a plurality of generalized user service data. The invention improves the accuracy of the protection result and the efficiency of hierarchical protection of operation and maintenance information.

Description

Hierarchical protection method for operation and maintenance information of multiple databases of bank
Technical Field
The invention relates to the technical field of data processing, in particular to a hierarchical protection method for operation and maintenance information of multiple databases of a bank.
Background
A bank's business system holds a large amount of user data of different types, and the volume of this data keeps growing as the business iterates rapidly. To ensure the security of user information, the sensitive data within the user data needs to be protected.
Conventional methods generally apply data generalization to protect all user data uniformly. However, in a bank's multiple databases, data in different dimensions indicate the importance of a user to different degrees, and the data in each dimension also differ in importance to the user, so data in different dimensions influence one another in different ways. A traditional data generalization approach therefore cannot protect each piece of data appropriately, which reduces both the accuracy of the protection result and the efficiency of hierarchical protection of operation and maintenance information.
Disclosure of Invention
The invention provides a hierarchical protection method for operation and maintenance information of multiple databases of a bank, which aims to solve the following existing problem: in a bank's multiple databases, data in different dimensions indicate the importance of a user to different degrees, and the data in each dimension also differ in importance to the user, so data in different dimensions influence one another in different ways, and a traditional data generalization approach cannot protect each piece of data appropriately.
The invention relates to a hierarchical protection method for operation and maintenance information of multiple databases of a bank, which adopts the following technical scheme:
the method comprises the following steps:
Collecting a plurality of user service data of each user service type in each user, wherein each user service type corresponds to a plurality of recording moments, and each recording moment corresponds to a plurality of user service data;
For any user and any recording moment, recording a data set integrally formed by user service data of all user service types of the user at the recording moment as a user data point set of the user at the recording moment, and clustering all user service data according to the change similarity condition of the user service data among different user service types to obtain a plurality of level clustering clusters;
Obtaining the degree of protection to be achieved for each user data point set according to the information distribution trend condition of the whole user data point set in the level cluster and the information change difference condition among different user data point sets;
And screening and generalizing all the user data point sets according to the degree to be protected of each user data point set to obtain a plurality of generalized user service data.
Preferably, the clustering is performed on all user service data according to the change similarity condition of the user service data among different user service types to obtain a plurality of level clustering clusters, and the specific method comprises the following steps:
Acquiring dimension service comprehensive correlation factors according to the correlation conditions of user service data among different user service types;
obtaining the data sensitivity of each user data point set according to the dimension service comprehensive correlation factor and the association condition of the user service data among different user data point sets;
Presetting a cluster number K; taking the absolute value of the difference in data sensitivity between different user data point sets as the distance measure, performing k-means clustering on all user data point sets according to the cluster number K and the distance measure to obtain a plurality of clusters, and recording each cluster as a level cluster.
Preferably, the dimension service comprehensive correlation factor is obtained according to the correlation condition of the user service data among different user service types, and the specific method comprises the following steps:
For any user service type, recording the sequence formed by all user service data under the user service type as a single-dimension service type data sequence, and acquiring all single-dimension service type data sequences; recording the Pearson correlation coefficient between any two single-dimension service type data sequences as a dimension service correlation, and recording the standard deviation of the dimension service correlations over all pairs of single-dimension service type data sequences as the dimension service comprehensive correlation factor.
Preferably, the data sensitivity of each user data point set is obtained according to the dimension service comprehensive correlation factor and the association condition of the user service data among different user data point sets, and the specific method comprises the following steps:
For any user data point set, performing linear normalization on the user service data of all user service types in the user data point set, and recording each normalized user service datum as a user service standard value; recording the absolute value of the difference between any two user service standard values in the user data point set as a local data similarity, and recording the mean of the local data similarities over all pairs of user service standard values in the user data point set as the anti-interference coefficient of the user data point set; and recording the product of the anti-interference coefficient of the user data point set and the dimension service comprehensive correlation factor as the data sensitivity of the user data point set.
Preferably, the method for obtaining the degree of protection of each user data point set according to the information distribution trend condition of the whole user data point set in the level cluster and the information change difference condition among different user data point sets includes the following specific steps:
Obtaining the type trend similarity of each level cluster according to the distribution condition of all user data point sets in each level cluster;
Obtaining the factor to be protected of each user data point set according to the type trend similarity of each level cluster; and performing linear normalization on all factors to be protected, and recording each normalized factor to be protected as the degree to be protected.
Preferably, the obtaining the type trend similarity of each level cluster according to the distribution condition of all the user data point sets in each level cluster includes the following specific methods:
For any one level cluster, acquiring the similarity of service data of all any two user data point sets in the level cluster;
According to the similarity of service data of any two user data point sets in the level cluster and the difference of data sensitivity between any two user data point sets, the type trend similarity of the level cluster is obtained, and the specific method is as follows:
In the formula, the quantities are, in order: the type trend similarity of the level cluster; the number of all user data point sets in the level cluster; the number of all user data point sets in the level cluster other than the a-th user data point set; the service data similarity between the b-th user data point set (other than the a-th user data point set) and the a-th user data point set; the data sensitivity of the a-th user data point set; the data sensitivity of the b-th user data point set other than the a-th user data point set; a preset hyperparameter; and the absolute-value operator.
Preferably, the method for obtaining the similarity of the service data of any two user data point sets in the level cluster includes the following specific steps:
For any user data point set in the level cluster, marking a sequence formed by user service data under all user service types in the user data point set as a user service sequence of the user data point set; acquiring user service sequences of all user data point sets in the level cluster;
For any two user data point sets, the Pearson correlation coefficient between the user service sequences of the two user data point sets is recorded as the service data similarity of the two user data point sets.
Preferably, the obtaining the factor to be protected of each user data point set according to the type trend similarity of each level cluster includes the following specific methods:
acquiring the sensitive contrast of each user data point set;
Recording any user data point set in any level cluster as a target user data point set, and obtaining a factor to be protected of the target user data point set according to the type trend similarity of the level cluster and the sensitive contrast of the target user data point set, wherein the specific method comprises the following steps:
In the formula, the quantities are, in order: the factor to be protected of the target user data point set; the sensitive contrast of the target user data point set; the maximum sensitive contrast over all user data point sets in the level cluster; the type trend similarity of the level cluster; and the Euclidean distance between the cluster center of the level cluster and the target user data point set.
Preferably, the method for acquiring the sensitive contrast of each user data point set includes the following specific steps:
Recording any one user data point set in any one level cluster as a first target user data point set, recording the mean of the data sensitivities of all user data point sets in the level cluster as the cluster sensitivity, and recording the absolute value of the difference between the data sensitivity of the first target user data point set and the cluster sensitivity as the sensitive contrast of the first target user data point set.
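The sensitive contrast defined above is simple enough to sketch directly. The following is a minimal illustration, assuming the data sensitivities of one level cluster are already available as a list of floats; the function name `sensitive_contrast` is hypothetical and does not come from the patent.

```python
def sensitive_contrast(cluster_sensitivities):
    """Return |data sensitivity - cluster mean| for every point set in one level cluster.

    cluster_sensitivities: data sensitivities of all user data point sets in
    the cluster (illustrative input; the name is an assumption).
    """
    # Cluster sensitivity: mean data sensitivity over the whole cluster.
    cluster_sensitivity = sum(cluster_sensitivities) / len(cluster_sensitivities)
    # Sensitive contrast: absolute deviation of each point set from that mean.
    return [abs(s - cluster_sensitivity) for s in cluster_sensitivities]


contrasts = sensitive_contrast([0.2, 0.4, 0.6])
```

For the three example sensitivities the cluster mean is about 0.4, so the first and last point sets have a contrast of about 0.2 and the middle one about 0.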
Preferably, the filtering generalizing is performed on all the user data point sets according to the to-be-protected degree of each user data point set to obtain a plurality of generalized user service data, including the following specific methods:
Presetting a threshold for the degree to be protected; recording each user data point set whose degree to be protected is greater than the threshold as a user data point set to be processed; recording each user data point set whose degree to be protected is less than or equal to the threshold as a pending user data point set; and performing data generalization on each user service datum in each user data point set to be processed to obtain a plurality of generalized user service data.
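The screening step can be sketched as follows. The text does not fix a particular generalization rule, so this sketch assumes numeric data and replaces each value by the fixed-width range that contains it; the function names, the `bucket` width, and the example threshold are all illustrative assumptions.

```python
def generalize_value(value, bucket=1000):
    """Replace a number by the [low, high) range containing it (assumed rule)."""
    low = (value // bucket) * bucket
    return (low, low + bucket)


def screen_and_generalize(point_sets, degrees, threshold, bucket=1000):
    """Generalize only the point sets whose degree to be protected exceeds the threshold."""
    result = []
    for point_set, degree in zip(point_sets, degrees):
        if degree > threshold:
            # User data point set to be processed: generalize every value.
            result.append([generalize_value(v, bucket) for v in point_set])
        else:
            # Degree at or below the threshold: values are left unchanged.
            result.append(list(point_set))
    return result


protected = screen_and_generalize(
    point_sets=[[1250, 300], [4999, 12]],
    degrees=[0.8, 0.3],
    threshold=0.5,
)
```

Only the first point set exceeds the threshold here, so its values become ranges while the second is passed through as is.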
The technical scheme of the invention has the following beneficial effects. Level clusters are obtained by analyzing how similarly user service data changes across different user service types; the degree to be protected of each user data point set is obtained from the overall information distribution trend of the user data point sets in a level cluster and the differences in information change between different user data point sets; screening and generalization are then performed accordingly. First, all user service data are clustered according to the change similarity of user service data across different user service types to obtain a plurality of level clusters; a level cluster groups user service data with similar value changes into the same cluster, which improves the efficiency of hierarchical information protection. Second, the degree to be protected describes how tightly a user data point set is coupled to the other user data point sets in its level cluster, and therefore better reflects how much protection that user data point set needs. By analyzing the numerical association and information differences of user service data across different user service types, together with the distribution of different user data point sets, the invention applies adaptive generalization to the user data point sets, which improves the accuracy of the protection result and the efficiency of hierarchical protection of operation and maintenance information.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of steps of a hierarchical protection method for operation and maintenance information of multiple databases of a bank according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following description refers to the specific implementation, structure, characteristics and effects of a hierarchical protection method for operation and maintenance information of multiple databases of a bank according to the present invention, which are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a specific scheme for a hierarchical protection method for operation and maintenance information of multiple databases of a bank.
Referring to Fig. 1, a flowchart of the steps of a hierarchical protection method for operation and maintenance information of multiple databases of a bank according to an embodiment of the present invention is shown. The method includes the following steps:
Step S001: collect a plurality of user service data for each user service type of each user.
It should be noted that existing methods generally apply data generalization to protect all user data uniformly. However, in a bank's multiple databases, data in different dimensions indicate the importance of a user to different degrees, and the data in each dimension also differ in importance to the user, so data in different dimensions influence one another in different ways. A traditional data generalization approach therefore cannot protect each piece of data appropriately, which reduces the accuracy of the protection result and the efficiency of hierarchical protection of operation and maintenance information.
Specifically, user service data must first be collected, as follows: obtain, from the user database of the banking system, historical user service data for 30 users over the last three months, covering seven user service types: user card number, contact information, financial product investment quantity, number of fund access operations, total amount deposited, total amount withdrawn, and user balance. This yields a plurality of user service data for every user service type of every user. The user database of this embodiment updates and records the user service data of all user service types of all users once per day, and each such update is one recording time.
It should be noted that this embodiment is described with 30 users and 7 user service types (user card number, contact information, financial product investment quantity, number of fund access operations, total amount deposited, total amount withdrawn, and user balance) as an example. This is not a specific limitation; the number of users and the content and number of user service types may be determined according to the specific implementation.
So far, the method obtains a plurality of user service data of each user service type in each user.
Step S002: obtaining the data sensitivity of each user data point set according to the association condition among different user service types and the overall data fluctuation condition of the user service data; and clustering all the user data point sets according to the data sensitivity of the user data point sets to obtain a plurality of level clustering clusters.
It should be noted that, as a bank develops and operates its business, the banking system is continuously subject to intrusion attempts from some external IP addresses, which may leak part of the user service data and thereby expose the personal privacy information of the corresponding users. Each user is influenced by economic factors such as living environment, income level, and consumption habits, so the overall variation of user service data differs from user to user, and the user service characteristics it reveals differ as well. For a single user, different user service types influence the user service data to different degrees, and in a real-life environment the user service types also influence one another. To reduce the possibility of data leakage, it is necessary to analyze the distribution structure of the different user service types and to grade the user service data for subsequent processing.
Preferably, in one embodiment of the present invention, according to the overall change condition of user service data of each user service type in a plurality of users, a data fluctuation factor of each user service type is obtained, including the following specific methods:
taking any user and any user service type as an example, if the user service data of the user service type in the user is not 0, marking the user as a core user of the user service type; and acquiring all core users of the user service type, and acquiring all core users of all user service types.
Further, as an example, the data fluctuation factor of the i-th user service type can be calculated by the following formula:
In the formula, the quantities are, in order: the data fluctuation factor of the i-th user service type; the number of all core users of the i-th user service type; the number of all user service data of the i-th user service type in the j-th core user; the k-th user service datum of the i-th user service type in the j-th core user; the mean of all user service data of the i-th user service type over all core users; and the absolute-value operator.
It should be noted that the data fluctuation factor of the i-th user service type is quantified by the variation of user service data between adjacent core users of that user service type. The larger the data fluctuation factor of the i-th user service type, the more easily the i-th user service type changes for the user, the more susceptible it is to user behavior, and the better it represents the individual service data characteristics of the user.
Preferably, in one embodiment of the present invention, the method for obtaining the cross correlation degree of the user dimension of each user service type according to the difference condition of the data fluctuation factor between each user service type and other user service types includes the following specific steps:
Each user service type other than the i-th user service type is recorded as a comparison user service type; the user dimension cross correlation degree of the i-th user service type is obtained according to the difference in data fluctuation factor between the i-th user service type and each comparison user service type. As an example, the user dimension cross correlation degree of the i-th user service type can be calculated by the following formula:
In the formula, the quantities are, in order: the user dimension cross correlation degree of the i-th user service type; the number of all comparison user service types of the i-th user service type; the data fluctuation factor of the i-th user service type; the data fluctuation factor of the c-th comparison user service type of the i-th user service type; the maximum of the data fluctuation factors of all comparison user service types; the absolute-value operator; and exp(), an exponential function with the natural constant as its base. This embodiment uses the exp(-x) model to express the inverse proportional relationship and to normalize, where x is the model input; the implementer may choose the inverse proportion function and the normalization function according to the actual situation.
It should be noted that the user dimension cross correlation degree of the i-th user service type is quantified by the differences in data fluctuation factor between the comparison user service types and the i-th user service type. The greater the user dimension cross correlation degree of the i-th user service type, the closer the association, for all core users of the i-th user service type, between all comparison user service types and the i-th user service type, the greater the influence of the i-th user service type on the comparison user service types, and the more important the i-th user service type is for presenting the user's personal information.
Preferably, in one embodiment of the present invention, according to a user dimension cross correlation degree of each user service type and a correlation condition between all user service data, clustering the user service data of each user at each recording time to obtain a plurality of level clustering clusters, including the specific method that:
Taking any user, any recording time and any user service type as an example, if the user is a core user of the user service type, marking the user service type as the core user service type of the user; the ratio of the number of all core user service types of the user to the number of all user service types of the user is recorded as the core service degree of the user; recording a data set integrally formed by user service data of all user service types of the user at the recording moment as a user data point set of the user at the recording moment; acquiring a user data point set of the user at all recording moments; and acquiring user data point sets of all users at all recording moments. Wherein each set of user data points corresponds to a user, a recording instant and a plurality of user traffic types.
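The bookkeeping in this paragraph can be sketched as below. The nested-dictionary layout, the function names, and the reading of "not 0" as "at least one non-zero value in the series" are assumptions made for illustration; the text does not specify a storage format.

```python
def core_service_degree(user_records):
    """Ratio of core user service types to all user service types for one user.

    user_records: {service_type: [value at each recording time]} (assumed layout).
    A type counts as a core type if any recorded value is non-zero (assumed reading).
    """
    core_types = [t for t, series in user_records.items()
                  if any(v != 0 for v in series)]
    return len(core_types) / len(user_records)


def user_data_point_sets(records):
    """Build one user data point set per (user, recording time).

    records: {user: {service_type: [value at each recording time]}}.
    Returns {(user, time_index): [value of each service type at that time]}.
    """
    point_sets = {}
    for user, by_type in records.items():
        n_times = len(next(iter(by_type.values())))
        for t in range(n_times):
            point_sets[(user, t)] = [series[t] for series in by_type.values()]
    return point_sets


records = {"u1": {"balance": [100, 120], "deposits": [0, 0], "investments": [2, 2]}}
degree = core_service_degree(records["u1"])
sets_by_time = user_data_point_sets(records)
```

In this toy input the user has two core types out of three, and each recording time yields one point set with one value per service type.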
Further, the mean of the user dimension cross correlation degrees of all user service types in the m-th user data point set is recorded as the comprehensive dimension sensitivity of the m-th user data point set; the data sensitivity of the m-th user data point set is obtained according to the user dimension cross correlation degree of each user service type in all user data point sets, the core service degree of the corresponding users, and the comprehensive dimension sensitivity of the m-th user data point set. As an example, the data sensitivity of the m-th user data point set can be calculated by the following formula:
In the formula, the quantities are, in order: the data sensitivity of the m-th user data point set; the comprehensive dimension sensitivity of the m-th user data point set; the number of all user data point sets; the number of all user service types; the user dimension cross correlation degree of one user data point set under a given user service type; the core service degree of the user corresponding to a user data point set; the user dimension cross correlation degree of another user data point set under a given user service type; and the absolute-value operator.
It should be noted that the data sensitivity of the m-th user data point set is quantified by the differences between the ratios of user dimension cross correlation degree to core service degree for different user data point sets, together with the comprehensive dimension sensitivity of the m-th user data point set. The greater the data sensitivity of the m-th user data point set, the more easily the m-th user data point set is affected by the individual user service types and the more it changes, and hence the more the m-th user data point set needs to be desensitized.
Further, a neighborhood radius and a minimum number of points are preset. This embodiment is described with example values for both; they are not specifically limited and may be determined according to the specific implementation. Taking the absolute value of the difference in data sensitivity between different user data point sets as the distance measure, DBSCAN clustering is performed on all user data point sets according to the neighborhood radius, the minimum number of points, and the distance measure to obtain a plurality of clusters, and each cluster is recorded as a level cluster. Clustering according to the neighborhood radius, the minimum number of points, and a distance measure is well-known content of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, and is not described in detail in this embodiment.
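Because the distance measure here is the absolute difference of two scalar data sensitivities, the clustering reduces to DBSCAN on one-dimensional data. The sketch below is a plain textbook DBSCAN written for that case; the function name and the example values of `eps` and `min_pts` are assumptions, since the embodiment's own values are not given in this text.

```python
def dbscan_1d(values, eps, min_pts):
    """Textbook DBSCAN on scalars with distance |a - b|; label -1 marks noise."""
    n = len(values)
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * n

    def neighbours(i):
        # All points within eps of point i (the point itself is included).
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reached from a core point is a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels


sensitivities = [0.10, 0.11, 0.12, 0.50, 0.90, 0.91, 0.92]
level_labels = dbscan_1d(sensitivities, eps=0.03, min_pts=3)
```

With these example values the three low sensitivities and the three high sensitivities each form a level cluster, and the isolated value 0.50 is labelled as noise. How noise points should be treated in the later steps is not stated in the text.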
Optionally, in other embodiments, taking any one user service type as an example, the sequence formed by all user service data under the user service type is recorded as a single-dimension service type data sequence, and all single-dimension service type data sequences are acquired; the Pearson correlation coefficient between any two single-dimension service type data sequences is recorded as a dimension service correlation, and the standard deviation of the dimension service correlations over all pairs of single-dimension service type data sequences is recorded as the dimension service comprehensive correlation factor. Computing the Pearson correlation coefficient is a known technique and is not described in detail in this embodiment.
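The dimension service comprehensive correlation factor of this alternative embodiment can be sketched in a few lines. The function names are hypothetical; two details the text leaves open are filled with labelled assumptions: the population form of the standard deviation is used, and a constant sequence is treated as having zero correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant sequence is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def comprehensive_correlation_factor(type_sequences):
    """Standard deviation of the Pearson correlations over all pairs of sequences."""
    corrs = [pearson(a, b) for a, b in combinations(type_sequences, 2)]
    mean_c = sum(corrs) / len(corrs)
    # Population standard deviation (assumption; the text does not say which form).
    return sqrt(sum((c - mean_c) ** 2 for c in corrs) / len(corrs))


# One sequence per user service type (illustrative numbers only).
factor = comprehensive_correlation_factor([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
```

The three example sequences give pairwise correlations of 1, -1 and -1, so the factor is the standard deviation of those three numbers.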
Further, taking any user data point set as an example, linear normalization is performed on the user service data of all user service types in the user data point set, and each normalized user service datum is recorded as a user service standard value; the absolute value of the difference between any two user service standard values in the user data point set is recorded as a local data similarity, and the mean of the local data similarities over all pairs of user service standard values in the user data point set is recorded as the anti-interference coefficient of the user data point set; the product of the anti-interference coefficient of the user data point set and the dimension service comprehensive correlation factor is recorded as the data sensitivity of the user data point set.
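A minimal sketch of this data sensitivity computation follows. It assumes every entry of a user data point set is already numeric (non-numeric fields such as contact information would need an encoding the text does not describe), and it maps a point set whose values are all equal to zeros; both are assumptions, as are the function names.

```python
from itertools import combinations


def min_max_normalize(values):
    """Linear (min-max) normalization to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: identical values map to 0
    return [(v - lo) / (hi - lo) for v in values]


def anti_interference_coefficient(point_set):
    """Mean absolute difference over all pairs of user service standard values."""
    standard_values = min_max_normalize(point_set)
    diffs = [abs(a - b) for a, b in combinations(standard_values, 2)]
    return sum(diffs) / len(diffs)


def data_sensitivity(point_set, correlation_factor):
    """Anti-interference coefficient times the comprehensive correlation factor."""
    return anti_interference_coefficient(point_set) * correlation_factor


sensitivity = data_sensitivity([0, 5, 10], correlation_factor=0.3)
```

For the example point set the standard values are 0, 0.5 and 1, the pairwise differences average to 2/3, and multiplying by the assumed factor of 0.3 gives a data sensitivity of about 0.2.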
Further, a cluster number is preset; the value used in this embodiment serves only as an example and is not limiting, and the cluster number may be set according to the specific implementation. Taking the absolute value of the difference in data sensitivity between different user data point sets as the distance measure, k-means clustering is performed on all user data point sets according to the cluster number and the distance measure to obtain a plurality of clusters, and each cluster is recorded as a level cluster. The clustering process based on the cluster number and the distance measure is well-known content of the k-means clustering algorithm and is not described in detail in this embodiment.
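For illustration, k-means over scalar data sensitivities with the absolute difference as distance might look like the sketch below. The deterministic choice of initial centers, the function name `kmeans_1d` and the sample values are assumptions; the text does not specify an initialization or a cluster number for this example.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means over scalars; returns (labels, centers)."""
    # Deterministic start: spread the initial centers over the sorted values (assumption).
    ordered = sorted(values)
    centers = [ordered[(len(ordered) - 1) * c // max(k - 1, 1)] for c in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center by absolute difference.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers

# Hypothetical data sensitivities of six user data point sets.
sensitivities = [0.10, 0.12, 0.11, 0.50, 0.52, 0.51]
labels, centers = kmeans_1d(sensitivities, k=2)
```

Unlike DBSCAN, this variant assigns every user data point set to a level cluster, so no point set is left as noise.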
At this point, all level clusters have been obtained by the above method.
Step S003: obtain the degree to be protected of each user data point set according to the data distribution trend of the user data point sets as a whole in the level cluster and the differences in information change between different user data point sets.
It should be noted that, within the same level cluster, the user service data under each user service type are not identical across different user data point sets, so the distribution trends of the different user data point sets also differ to varying degrees, and some user data point sets show a more pronounced trend distribution. In a practical environment, user data point sets with a pronounced trend distribution are strongly influenced by each user service type and are more susceptible to external interference.
Preferably, in one embodiment of the present invention, the type trend similarity of each level cluster is obtained according to the distribution of all user data point sets in each level cluster, by the following specific method:
Taking any user data point set in any level cluster as an example, the sequence formed by the user service data under all user service types in the user data point set is recorded as the user service sequence of the user data point set, so that each user data point set corresponds to one user service sequence; the user service sequences of all user data point sets in the level cluster are obtained. Taking any two user data point sets as an example, the Pearson correlation coefficient between the user service sequences of the two user data point sets is recorded as the business data similarity of the two user data point sets; the business data similarity of every pair of user data point sets in the level cluster is obtained.
Further, the type trend similarity of the level cluster is obtained according to the business data similarity of any two user data point sets in the level cluster and the difference of data sensitivity between any two user data point sets. As an example, the type trend similarity for the level cluster may be calculated by the following formula:
In the formula, the quantities are as follows: the type trend similarity of the level cluster; the number of all user data point sets in the level cluster; the number of all user data point sets in the level cluster other than the i-th user data point set; the business data similarity between the i-th user data point set and the j-th user data point set other than the i-th; the data sensitivity of the i-th user data point set; the data sensitivity of the j-th user data point set other than the i-th; a preset hyperparameter, whose value is preset in this embodiment to prevent the denominator from being 0; and the absolute value operation.
The type trend similarity of the level cluster is thus measured through the differences in data sensitivity and the business data similarity between different user data point sets in the level cluster. The larger the type trend similarity of the level cluster, the less likely the user service data in the level cluster is to be disturbed by external interference, and the lower the degree to which that user service data needs to be protected.
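One possible reading of the type trend similarity is sketched below. It assumes that, for every pair of distinct user data point sets, the business data similarity is divided by the absolute difference in data sensitivity plus the hyperparameter, and that the result is averaged first over the other point sets and then over all point sets in the level cluster. This exact form, the function names and the hyperparameter value are assumptions based on the symbol descriptions and may differ from the formula of the embodiment.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def type_trend_similarity(sequences, sensitivities, eps=1e-6):
    """Assumed form: mean over i of mean over j != i of r_ij / (|s_i - s_j| + eps)."""
    n = len(sequences)
    total = 0.0
    for i in range(n):
        inner = 0.0
        for j in range(n):
            if j == i:
                continue
            similarity = pearson(sequences[j], sequences[i])
            # eps keeps the denominator away from 0 when sensitivities coincide.
            inner += similarity / (abs(sensitivities[i] - sensitivities[j]) + eps)
        total += inner / (n - 1)
    return total / n

# Two hypothetical user service sequences and their data sensitivities.
similarity = type_trend_similarity([[1, 2, 3], [2, 4, 6]], [0.25, 0.75], eps=0.5)
```

Under this reading, a level cluster whose point sets are strongly correlated and close in data sensitivity scores high, which matches the statement that such clusters need less protection.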
Preferably, in an embodiment of the present invention, the degree to be protected of each user data point set is obtained according to the type trend similarity of each level cluster, by the following specific method:
Taking any user data point set in the level cluster as an example, the mean of the data sensitivities of all user data point sets in the level cluster is recorded as the cluster-like sensitivity, and the absolute value of the difference between the data sensitivity of the user data point set and the cluster-like sensitivity is recorded as the sensitive contrast of the user data point set. The factor to be protected of the user data point set is then obtained according to the type trend similarity of the level cluster and the sensitive contrast of the user data point set. As an example, the factor to be protected of the user data point set may be calculated by the following formula:
In the formula, the quantities are as follows: the factor to be protected of the user data point set; the sensitive contrast of the user data point set; the maximum sensitive contrast among all user data point sets in the level cluster; the type trend similarity of the level cluster; and the Euclidean distance between the cluster center of the level cluster and the user data point set.
The factor to be protected of the user data point set is thus measured by comparing the data sensitivity of the user data point set with that of the user data point sets as a whole in the level cluster, together with the type trend similarity of the level cluster and the distance between the user data point set and its cluster center. The larger the factor to be protected of a user data point set, the less secure and the more prone to loss the user service data in that user data point set is, and the more that user service data needs data protection processing.
In addition, it should be noted that the level clusters are obtained by clustering all user data point sets, and each user data point set in a level cluster is equivalent to a multidimensional data point. Since the Euclidean distance can characterize the distance between multidimensional data points, the distance between a user data point set and its cluster center can be measured directly by the Euclidean distance. Obtaining the Euclidean distance is a known technique and is not described in detail in this embodiment.
Further, the factors to be protected of all user data point sets are obtained, linear normalization is performed on all factors to be protected, and each normalized factor to be protected is recorded as a degree to be protected.
At this point, the degrees to be protected of all user data point sets have been obtained by the above method.
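The two fully specified pieces of this step, the sensitive contrast and the final linear normalization, can be sketched as follows. The factor to be protected is treated as an already computed input, min-max scaling is assumed for "linear normalization", and all names and sample values are illustrative.

```python
def sensitive_contrasts(sensitivities):
    """Absolute deviation of each data sensitivity from the cluster mean."""
    cluster_sensitivity = sum(sensitivities) / len(sensitivities)
    return [abs(s - cluster_sensitivity) for s in sensitivities]

def degrees_to_be_protected(factors):
    """Min-max normalize the factors to be protected into [0, 1]."""
    lo, hi = min(factors), max(factors)
    if hi == lo:
        return [0.0] * len(factors)  # assumption: identical factors map to 0
    return [(f - lo) / (hi - lo) for f in factors]

# Hypothetical data sensitivities within one level cluster.
contrasts = sensitive_contrasts([1.0, 2.0, 3.0])
# Hypothetical factors to be protected over all user data point sets.
degrees = degrees_to_be_protected([2.0, 4.0, 6.0])
```

Normalizing over all user data point sets, rather than per level cluster, keeps the degrees comparable against a single threshold in the next step.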
Step S004: perform data protection processing on each level cluster according to the degree to be protected and the data sensitivity of the user data point sets to obtain a plurality of generalized user service data.
Preferably, in one embodiment of the present invention, all user data point sets are screened and generalized according to the degree to be protected of each user data point set to obtain a plurality of generalized user service data, by the following specific method:
A threshold for the degree to be protected is preset; the value used in this embodiment serves only as an example and is not limiting, and the threshold may be set according to the specific implementation. A user data point set whose degree to be protected is greater than the threshold is recorded as a user data point set to be processed, and a user data point set whose degree to be protected is less than or equal to the threshold is recorded as a pending user data point set. Data generalization processing is performed on each user service data in each user data point set to be processed to obtain a plurality of generalized user service data. All user service data in all pending user data point sets and all generalized user service data are then stored again in the system database, completing the desensitization protection processing based on user information in the bank open environment. The process of performing data generalization processing on data is well known in the art, is not within the scope of protection of the present invention, and is not described in detail here.
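The screening and generalization step can be illustrated with the sketch below. The threshold value, the bucket-based generalization and the function names are assumptions: the text leaves the generalization technique open, and rounding a value down to the lower bound of a fixed-width bucket is only one common choice.

```python
def split_by_threshold(point_sets, degrees, threshold):
    """Separate point sets to be processed (degree > threshold) from pending ones."""
    to_process = [p for p, d in zip(point_sets, degrees) if d > threshold]
    pending = [p for p, d in zip(point_sets, degrees) if d <= threshold]
    return to_process, pending

def generalize(value, bucket=1000):
    """Replace a value with the lower bound of its bucket (illustrative generalization)."""
    return (value // bucket) * bucket

# Hypothetical user data point sets and their degrees to be protected.
point_sets = [[1234, 5678], [200, 300], [9999, 10]]
degrees = [0.9, 0.2, 0.5]

to_process, pending = split_by_threshold(point_sets, degrees, threshold=0.5)
generalized = [[generalize(v) for v in p] for p in to_process]
# Pending point sets are stored unchanged alongside the generalized data.
stored = pending + generalized
```

A point set whose degree equals the threshold stays pending, matching the "less than or equal to" condition in the text.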
Optionally, in other embodiments, data protection processing is performed on each level cluster to obtain a plurality of generalized user service data, by the following specific method:
Data replacement processing is performed on all user service data in each level cluster to obtain a plurality of replaced user service data; data generalization processing is then performed on all replaced user service data to obtain a plurality of generalized user service data, and all generalized user service data are stored again in the system database, completing the hierarchical protection of operation and maintenance information across the multiple databases of the bank. The process of performing data replacement processing on data is well known in the art, is not within the scope of protection of the present invention, and is not described in detail here.
This completes the present embodiment.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements and the like made within the principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A hierarchical protection method for operation and maintenance information of multiple databases of a bank, characterized by comprising the following steps:
Collecting a plurality of user service data of each user service type of each user, wherein each user service type corresponds to a plurality of recording moments, and each recording moment corresponds to a plurality of user service data;
For any user and any recording moment, recording a data set integrally formed by user service data of all user service types of the user at the recording moment as a user data point set of the user at the recording moment, and clustering all user service data according to the change similarity condition of the user service data among different user service types to obtain a plurality of level clustering clusters;
Obtaining the degree of protection to be achieved for each user data point set according to the information distribution trend condition of the whole user data point set in the level cluster and the information change difference condition among different user data point sets;
And screening and generalizing all the user data point sets according to the degree to be protected of each user data point set to obtain a plurality of generalized user service data.
2. The hierarchical protection method for operation and maintenance information of multiple databases of a bank according to claim 1, wherein the clustering is performed on all user service data according to the change similarity condition of the user service data among different user service types to obtain a plurality of level cluster clusters, and the specific method comprises the following steps:
Acquiring dimension service comprehensive correlation factors according to the correlation conditions of user service data among different user service types;
obtaining the data sensitivity of each user data point set according to the dimension service comprehensive correlation factor and the association condition of the user service data among different user data point sets;
Presetting a cluster number; taking the absolute value of the difference in data sensitivity between different user data point sets as a distance measure, and performing k-means clustering on all user data point sets according to the cluster number and the distance measure to obtain a plurality of clusters, each cluster being recorded as a level cluster.
3. The hierarchical protection method for operation and maintenance information of multiple databases of a bank according to claim 2, wherein the step of obtaining the dimension business comprehensive correlation factor according to the association condition of user business data among different user business types comprises the following specific steps:
For any user service type, recording the sequence formed by all user service data under the user service type as a single-dimensional service type data sequence, and acquiring all single-dimensional service type data sequences; the Pearson correlation coefficient between any two single-dimensional service type data sequences is recorded as a dimension service correlation, and the standard deviation of the dimension service correlations over all pairs of single-dimensional service type data sequences is recorded as the dimension service comprehensive correlation factor.
4. The hierarchical protection method for operation and maintenance information of multiple databases in banks according to claim 2, wherein the obtaining the data sensitivity of each user data point set according to the dimension business comprehensive correlation factor and the association condition of user business data among different user data point sets comprises the following specific steps:
For any user data point set, carrying out linear normalization on the user service data of all user service types in the user data point set, and recording each normalized user service data as a user service standard value; the absolute value of the difference between any two user service standard values in the user data point set is recorded as a local data similarity, and the mean of the local data similarities over all pairs of user service standard values in the user data point set is recorded as the anti-interference coefficient of the user data point set; the product of the anti-interference coefficient of the user data point set and the dimension service comprehensive correlation factor is recorded as the data sensitivity of the user data point set.
5. The hierarchical protection method for operation and maintenance information of multiple databases of a bank according to claim 1, wherein the obtaining the degree of protection of each user data point set according to the information distribution trend condition of the whole user data point set in the hierarchical cluster and the information change difference condition between different user data point sets comprises the following specific steps:
Obtaining the type trend similarity of each level cluster according to the distribution condition of all user data point sets in each level cluster;
Obtaining the factor to be protected of each user data point set according to the type trend similarity of each level cluster; and carrying out linear normalization on all the factors to be protected, and recording each normalized factor to be protected as a degree to be protected.
6. The hierarchical protection method for operation and maintenance information of multiple databases of a bank according to claim 2 or 5, wherein the obtaining the type trend similarity of each level cluster according to the distribution condition of all user data point sets in each level cluster comprises the following specific steps:
For any one level cluster, acquiring the similarity of service data of all any two user data point sets in the level cluster;
According to the similarity of service data of any two user data point sets in the level cluster and the difference of data sensitivity between any two user data point sets, the type trend similarity of the level cluster is obtained, and the specific method is as follows:
In the formula, the quantities are as follows: the type trend similarity of the level cluster; the number of all user data point sets in the level cluster; the number of all user data point sets in the level cluster other than the i-th user data point set; the business data similarity between the i-th user data point set and the j-th user data point set other than the i-th; the data sensitivity of the i-th user data point set; the data sensitivity of the j-th user data point set other than the i-th; a preset hyperparameter; and the absolute value operation.
7. The hierarchical protection method for operation and maintenance information of multiple databases in banks according to claim 6, wherein the specific method for obtaining the similarity of service data of all arbitrary two user data point sets in the hierarchical cluster is as follows:
For any user data point set in the level cluster, marking a sequence formed by user service data under all user service types in the user data point set as a user service sequence of the user data point set; acquiring user service sequences of all user data point sets in the level cluster;
For any two user data point sets, the Pearson correlation coefficient between the user service sequences of the two user data point sets is recorded as the service data similarity of the two user data point sets.
8. The hierarchical protection method for operation and maintenance information of multiple databases of a bank according to claim 2 or 5, wherein the obtaining the factors to be protected of each user data point set according to the type trend similarity of each level cluster comprises the following specific steps:
acquiring the sensitive contrast of each user data point set;
Recording any user data point set in any level cluster as a target user data point set, and obtaining a factor to be protected of the target user data point set according to the type trend similarity of the level cluster and the sensitive contrast of the target user data point set, wherein the specific method comprises the following steps:
In the formula, the quantities are as follows: the factor to be protected of the target user data point set; the sensitive contrast of the target user data point set; the maximum sensitive contrast among all user data point sets in the level cluster; the type trend similarity of the level cluster; and the Euclidean distance between the cluster center of the level cluster and the target user data point set.
9. The hierarchical protection method for operation and maintenance information of multiple databases in a bank according to claim 8, wherein the obtaining the sensitive contrast of each user data point set comprises the following specific steps:
And recording any one user data point set in any one level cluster as a first target user data point set, recording the average value of the data sensitivity of all user data point sets in the level cluster as a cluster-like sensitivity, and recording the absolute value of the difference value between the data sensitivity of the first target user data point set and the cluster-like sensitivity as the sensitive contrast of the first target user data point set.
10. The hierarchical protection method for operation and maintenance information of multiple databases in banks according to claim 1, wherein the specific method for filtering and generalizing all user data point sets according to the degree of protection of each user data point set to obtain a plurality of generalized user service data comprises the following steps:
Presetting a threshold for the degree to be protected; a user data point set whose degree to be protected is greater than the threshold is recorded as a user data point set to be processed; a user data point set whose degree to be protected is less than or equal to the threshold is recorded as a pending user data point set; and carrying out data generalization processing on each user service data in each user data point set to be processed to obtain a plurality of generalized user service data.
CN202410406963.5A 2024-04-07 2024-04-07 Hierarchical protection method for operation and maintenance information of multiple databases of bank Active CN117992809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410406963.5A CN117992809B (en) 2024-04-07 2024-04-07 Hierarchical protection method for operation and maintenance information of multiple databases of bank


Publications (2)

Publication Number Publication Date
CN117992809A true CN117992809A (en) 2024-05-07
CN117992809B CN117992809B (en) 2024-06-21

Family

ID=90897876


Country Status (1)

Country Link
CN (1) CN117992809B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant