CN115713249B

CN115713249B - Government satisfaction evaluation system and method based on data security and privacy protection

Info

Publication number: CN115713249B
Application number: CN202211235305.1A
Authority: CN
Inventors: 卢清华; 李梦园; 李方伟
Original assignee: Chongqing Yitong College
Current assignee: Chongqing Yitong College
Priority date: 2022-10-10
Filing date: 2022-10-10
Publication date: 2023-06-13
Anticipated expiration: 2042-10-10
Also published as: CN115713249A

Abstract

The invention discloses a government satisfaction evaluation system and method based on data security and privacy protection, belonging to the technical field of data scoring. The system comprises a database module, a data security module and a data scoring module. The database module comprises a data acquisition module and a data classification module; the data security module comprises an identity authentication module, a data desensitization module and a feedback early warning module; the data scoring module comprises a preprocessing module, a model training module and a scoring module. The data security module uses identity authentication, data desensitization and a feedback early warning mechanism to protect the sensitive information of surveyed members of the public, which strengthens data security and privacy protection; it also forms an accessible system that increases data transparency and improves the fairness and openness of government affairs evaluation. The data scoring module trains a model with the CatBoost machine learning algorithm, establishes a scientific government satisfaction scoring model and scores the relevant data.

Description

Government satisfaction evaluation system and method based on data security and privacy protection

Technical Field

The invention belongs to the technical field of data scoring, and particularly relates to a government satisfaction evaluation system and method based on data security and privacy protection.

Background

Traditional government performance evaluation is mainly top-down, which leads some government workers to answer only to their superiors rather than to the public and to pursue a few evaluated indexes one-sidedly, so that the statistical data is inflated. With the continuous advancement of digital government construction, letting data do the legwork has become a new and efficient mode of government work. Meanwhile, governments at all levels are paying more attention to public opinion, and the public is becoming involved in government affairs evaluation.

Existing government affairs evaluation systems only collect satisfaction data through a simple scan-code questionnaire or web questionnaire, and in data processing take only the average of the satisfaction scores as the final scoring result. The resulting problems are: (1) a complete government satisfaction scoring system is lacking: there is no complete, adaptable and widely applicable evaluation system covering data collection, database establishment, data processing and scoring model establishment; (2) data security management is lacking: sensitive information such as the personal attributes of respondents is not processed, so there is a risk that private information is disclosed; (3) the scoring model is outdated: no scientific and efficient algorithm model has been formed, so it is difficult to mine the problems behind government affairs data and provide guiding suggestions; (4) government satisfaction data is not disclosed, so the public cannot access or query it, and credibility is lacking.

Therefore, scientific evaluation standards, methods and procedures are urgently needed for government satisfaction evaluation work; they are of great significance for government personnel in performing their duties, building harmonious relations between officials and the public, and establishing a good government image.

CN111222753A discloses an e-government performance evaluation system comprising: an evaluation model module, a data acquisition module, a data processing module, an index weight learning and generating module, an evaluation result generating module and an evaluation report natural language generating module. In that system, an index weight training model is established and the weight of each basic evaluation index is learned from data, which improves the certainty, objectivity and rationality of the weights in the evaluation process. Through the weight setting, the final evaluation result fuses the scores on the basic evaluation indexes on a better-founded and more targeted basis; and by setting a plurality of basic evaluation indexes, government affairs modules can be evaluated from all directions, so that public needs are understood precisely and the modules are improved.

The e-government performance evaluation system proposed in CN111222753A has no encryption protection step for the sensitive information of surveyed members of the public. The present patent provides a data security module that uses identity authentication, data desensitization and a feedback early warning mechanism to enhance data security and privacy protection.

The data processing of CN111222753A lacks a data access system. In the present patent, a data access port is arranged in the data security module to form an accessible system, so that the public can access the original government satisfaction survey data, which increases data transparency and improves the fairness and openness of government evaluation work.

CN111222753A focuses on determining the weights of the evaluation indexes through an evidence calculation method and a differential purification algorithm, combining the weights to synthesize the overall evaluation information on the evaluation indexes into a final evaluation result, and generating a text evaluation report in natural language. The present patent focuses on processing government satisfaction data to obtain a scientific government satisfaction scoring result. The index screening process is simplified by using a machine learning algorithm, the CatBoost algorithm, which reduces the influence of human subjective factors; the contribution of each government satisfaction index is obtained with the importance() function provided by the CatBoost algorithm, and the index weights are determined from it. This both improves data processing speed and ensures that the weight assignment is scientific and fair.

Disclosure of Invention

The present invention is directed to solving the above problems of the prior art. A government satisfaction evaluation system and method based on data security and privacy protection are provided. The technical scheme of the invention is as follows:

a government satisfaction evaluation system based on data security and privacy protection, comprising: a database module, a data security module and a data scoring module, wherein,

the database module is used for forming a database according to the collection and classification results of various government satisfaction evaluation data and providing the database to the data security module;

the data security module is used for performing access control, privacy protection and feedback early warning on government satisfaction evaluation data, completing the security management work of the data and providing the processed data set to the data scoring module;

and the data scoring module is used for model training of the government satisfaction evaluation data, constructing a government satisfaction scoring model and outputting scoring results.

Further, the database module comprises a data acquisition module and a data classification module, wherein the data acquisition module is used for collecting various data of government satisfaction; the data classification module is used for dividing the collected government satisfaction data into month data, quarter data and year data according to time, and further dividing the collected government satisfaction data into sub-data sets including safe construction, legal construction and service evaluation according to government theme content.
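The classification described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout (date, topic, score), the sample values and the helper names `time_keys` and `classify` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from datetime import date

# Hypothetical survey records: (response date, government topic, satisfaction score).
records = [
    (date(2021, 1, 15), "service evaluation", 4),
    (date(2021, 2, 3), "safe construction", 5),
    (date(2021, 5, 20), "service evaluation", 3),
    (date(2021, 11, 8), "legal construction", 2),
]


def time_keys(d):
    """Return the month, quarter and year buckets that a date falls into."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }


def classify(rows):
    """Group scores into sub-data sets keyed by (granularity, period, topic)."""
    subsets = defaultdict(list)
    for d, topic, score in rows:
        for granularity, period in time_keys(d).items():
            subsets[(granularity, period, topic)].append(score)
    return dict(subsets)


subsets = classify(records)
print(subsets[("year", "2021", "service evaluation")])
```

Each record lands in three time buckets at once, so the same response can be read as month, quarter or year data without being stored three times in the source table.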

Further, the data security module comprises an identity authentication module, a data desensitization module and a feedback early warning module, wherein the identity authentication module is used for authenticating the identity of a visitor and determining the inquiry authority of the visitor; the data desensitization module is used for carrying out desensitization processing on the data set containing sensitive information; the feedback early warning module records the user behavior to generate a log; when the number of times of identity authentication rejection of the same user exceeds a set threshold, the access port is locked in time and early warning is fed back to the terminal.

Further, the specific steps of the data security module for completing the access control, privacy protection and feedback early warning work are as follows:

(1) When a user applies for access to government satisfaction data resources, the identity of the user is authenticated first; if authentication fails, access is refused; if authentication passes, the user may proceed;

(2) The data resource requested by the user is examined: when it does not contain sensitive data, the required data set is obtained according to the user's authority; when it contains sensitive data, sensitive information including the attributes of respondents is encrypted to obtain a desensitized data set;

(3) The access process of the user is recorded and a corresponding log is generated;

(4) When the number of identity authentication rejections for the same user exceeds a set threshold, the access port is locked in time and an early warning is fed back to the terminal.
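The access-control flow in the steps above might be sketched as follows. This is an illustrative sketch only: the threshold value, the in-memory user store with plain-text passwords, the field names and the function `request_data` are all assumptions made for the example; a real system would use a proper credential store and persistent logs.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("access")

MAX_REJECTIONS = 3                    # assumed lockout threshold
CREDENTIALS = {"alice": "s3cret"}     # hypothetical user store (demo only)
SENSITIVE_FIELDS = {"name", "id_number", "phone", "address"}

failed = {}      # user -> number of rejected authentications
locked = set()   # users whose access port has been locked


def request_data(user, password, rows):
    """Authenticate the user, log the access, and mask sensitive fields."""
    if user in locked:
        log.warning("locked user %s attempted access", user)
        return None
    if CREDENTIALS.get(user) != password:
        failed[user] = failed.get(user, 0) + 1
        log.info("authentication rejected for %s (%d)", user, failed[user])
        if failed[user] > MAX_REJECTIONS:
            locked.add(user)
            log.warning("EARLY WARNING: access port locked for %s", user)
        return None
    log.info("access granted to %s", user)
    return [
        {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in row.items()}
        for row in rows
    ]


rows = [{"name": "Zhang San", "score": 4}]
print(request_data("alice", "s3cret", rows))
```

The lock is applied only once the rejection count exceeds the threshold, matching the wording of the last step; whether a successful login should reset the counter is not specified in the text and is left out here.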

Further, the data desensitizing module is used for desensitizing the data set containing sensitive information, and specifically comprises the following steps:

(1) The sensitive data set enters a data desensitizing module;

(2) Determining a desensitization scheme, and desensitizing sensitive data by using modes of truncation, encryption, hiding and replacement;

(3) Writing a desensitization rule, writing a desensitization rule table, wherein different desensitization rules correspond to different data encryption methods;

(4) According to the sensitive data category (name, ID card number, mobile phone number, address) and the desensitization scheme, the data is associated with the corresponding rule through the primary key, and the sensitive data is desensitized according to the designated scheme;

(5) The desensitized data set is provided to a data scoring module.
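A desensitization rule table of the kind described in these steps could look like the sketch below. The field names, the choice of rule per field and the helper functions are hypothetical. Note in particular that the `encrypt` helper uses a one-way SHA-256 digest as a stand-in, which is an assumption of this example and not the encryption method of the patent.

```python
import hashlib


def truncate(value, keep=9):
    """Truncation: keep only the leading characters."""
    return value[:keep]


def encrypt(value):
    """Stand-in for encryption: a shortened one-way SHA-256 digest (assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def hide(value):
    """Hiding: replace every character with an asterisk."""
    return "*" * len(value)


def replace(value, placeholder="[REDACTED]"):
    """Replacement: substitute a fixed placeholder for the true value."""
    return placeholder


# Rule table: each sensitive data category maps to one desensitization rule.
RULE_TABLE = {
    "name": hide,
    "id_number": encrypt,
    "phone": replace,
    "address": truncate,
}


def desensitize(record):
    """Apply the rule registered for each field; other fields pass through."""
    return {
        field: RULE_TABLE.get(field, lambda v: v)(value)
        for field, value in record.items()
    }


record = {
    "record_id": 1,
    "name": "张三",
    "id_number": "500101199001011234",
    "phone": "13812345678",
    "address": "Chongqing Nan'an District Road 1",
    "score": 5,
}
print(desensitize(record))
```

Keeping the rules in a table means that a new category or a stricter scheme only requires changing one mapping entry, not the code that walks the records.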

Further, the data scoring module comprises a preprocessing module, a model training module and a scoring module. The preprocessing module preprocesses the accessed government satisfaction data set, including data cleaning, unbalanced data processing and data set splitting, and provides the processed data set to the model training module. The model training module trains a model on the data set with the CatBoost machine learning algorithm, obtains the contribution of each government satisfaction index with the importance() function of the CatBoost algorithm, determines the index weights from it and provides them to the scoring module. The scoring module establishes the government satisfaction scoring model from the index weights, scores the relevant data and finally obtains the government satisfaction scoring result.

Furthermore, the model training is completed by using a machine learning algorithm-Catboost algorithm, and the government satisfaction index contribution degree is obtained by using an importance () function in the Catboost algorithm, so that index weights are determined and provided for a scoring module, and the method specifically comprises the following steps:

a government satisfaction evaluation method based on data security and privacy protection based on the system comprises the following steps:

collecting and classifying various government satisfaction evaluation data by utilizing a database module to form a database and providing the database to a data security module;

the data security module is utilized to carry out access control, privacy protection and feedback early warning on government satisfaction evaluation data, the security management work of the data is completed, and the processed data set is provided for the data scoring module;

and performing model training of government satisfaction evaluation data by utilizing a data scoring module, constructing a government satisfaction scoring model, and outputting scoring results.

The invention has the advantages and beneficial effects as follows:

aiming at the defects in the prior art, the invention establishes a complete government satisfaction evaluation system. It comprises the following steps: the system comprises a database module, a data security module and a data scoring module, wherein the database module comprises a data acquisition module and a data classification module and is used for forming a database according to the collection and classification results of various government satisfaction evaluation data and providing the database to the data security module; the data security module comprises an identity authentication module, a data desensitization module and a feedback early warning module, and is used for controlling access to government satisfaction evaluation data, protecting privacy and carrying out feedback early warning, completing data security management work and providing a processed data set to the data scoring module; the data scoring module comprises a preprocessing module, a model training module and a scoring module, and is used for model training of government satisfaction evaluation data, constructing a government satisfaction scoring model and outputting scoring results.

The invention has the following advantages: 1. forming a complete government satisfaction data link, and increasing adaptability and popularization; 2. the data security module is added, the access control, encryption technology and feedback early warning mechanism are utilized to provide effective protection work for the sensitive information of the investigation masses, and the data security and privacy protection are enhanced; 3. an identity authentication module is added to form an accessible system, so that the data transparency is increased, and the fairness and the openness of government affair evaluation work are improved; 4. the government satisfaction scoring model is established by using a machine learning-Catboost algorithm, so that the data processing speed is improved, and the scientificity and fairness of satisfaction index weight assignment are ensured.

Drawings

FIG. 1 is a schematic diagram of a government satisfaction evaluation system based on data security and privacy protection in accordance with a preferred embodiment of the present invention;

FIG. 2 is a schematic flow chart of the database module in the government satisfaction evaluation system based on data security and privacy protection provided by the application;

FIG. 3 is a schematic flow chart of the data security module in the government satisfaction evaluation system based on data security and privacy protection provided by the application;

FIG. 4 is a schematic flow chart of the data scoring module in the government satisfaction evaluation system based on data security and privacy protection provided by the application;

FIG. 5 is a flow chart of data scoring in an embodiment of the present application;

FIG. 6 shows the results of the desensitization process for the data set in an embodiment of the present application;

FIG. 7 is a comparison of the evaluation results of the algorithm models on the data set in an embodiment of the present application;

FIG. 8 is the constructed government satisfaction scoring model.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

as shown in fig. 1-6, a government satisfaction evaluation system based on data security and privacy protection comprises a database module, a data security module and a data scoring module, and the implementation method comprises the following steps:

Preferably, the database module comprises a data acquisition module and a data classification module, wherein the data acquisition module is used for collecting various data of government satisfaction; the data classification module is used for dividing the collected government satisfaction data into month data, quarter data and year data according to time, and further dividing the collected government satisfaction data into sub-data sets such as safe construction, legal construction, service evaluation and the like according to government theme content.

Preferably, the data security module comprises an identity authentication module, a data desensitization module and a feedback early warning module, wherein the identity authentication module is used for authenticating the identity of a visitor and determining the inquiry authority of the visitor; the data desensitization module is used for carrying out desensitization treatment on the data set containing sensitive information, so that the data security is improved; and the feedback early warning module records the user behavior to generate a log. When the number of times of identity authentication rejection of the same user exceeds a set threshold, the access port is locked in time and early warning is fed back to the terminal, so that the access safety is improved.

Preferably, the data scoring module comprises a preprocessing module, a model training module and a scoring module. The preprocessing module cleans the accessed government satisfaction data set, handles unbalanced data, splits the data set and provides the processed data set to the model training module. The model training module trains a model on the data set with the CatBoost machine learning algorithm, obtains the contribution of each government satisfaction index with the importance() function of the CatBoost algorithm, determines the index weights from it and provides them to the scoring module. The scoring module establishes the government satisfaction scoring model from the index weights, scores the relevant data and finally obtains the government satisfaction scoring result.

Preferably, the specific steps for the data security module to complete the access control, privacy protection and feedback early warning work are as follows:

(1) When a user applies for access to government satisfaction data resources, the identity of the user is authenticated first; if authentication fails, access is refused; if authentication passes, the user may proceed.

(2) The data resource requested by the user is examined: when it does not contain sensitive data, the required data set is obtained according to the user's authority; when it contains sensitive data, sensitive information such as the attributes of respondents is encrypted to obtain a desensitized data set.

(3) The access process of the user is recorded and a corresponding log is generated.

(4) When the number of identity authentication rejections for the same user exceeds a set threshold, the access port is locked in time and an early warning is fed back to the terminal.

Preferably, when the sensitive data set is accessed, the data desensitizing module is automatically entered to complete dynamic data desensitization, and the specific steps are as follows:

(1) The sensitive data set enters a data desensitizing module;

(2) A desensitization scheme is determined. Sensitive data is desensitized by means such as truncation, encryption, hiding and replacement, for example by replacing the true value with special characters (such as *);

(3) The desensitization rules are written into a desensitization rule table, in which different desensitization rules correspond to different data encryption methods;

(4) According to the sensitive data category (sensitive information including name, ID card number, mobile phone number and address) and the desensitization scheme, the data is associated with the corresponding rule through the primary key, and the sensitive data is desensitized according to the designated scheme;

(5) The desensitized data set is provided to a data scoring module.

Preferably, the government satisfaction data set enters a data scoring module to finish the related data scoring operation, and the specific steps are as follows:

(1) Data preprocessing: the preprocessing module completes data cleaning, i.e. the data set is inspected and the data is described; unbalanced data is then handled by oversampling or undersampling; finally, the data set is divided into a training set and a test set and provided to the model training module;

(2) Model training: the training set enters the model training module of the data scoring module; model training is completed with the CatBoost machine learning algorithm and the model is evaluated; the contribution of each government satisfaction index is obtained with the importance() function of the CatBoost algorithm, and the index weights are determined and provided to the scoring module;

(3) Data scoring: the scoring module establishes the government satisfaction scoring model from the index weights and scores the relevant data, finally obtaining the government satisfaction scoring result.

Preferably, the data scoring module completes data preprocessing, model training and establishment of the scoring model to obtain the scoring results. Detailed Python-based pseudo-code is as follows:

Part 1: data preprocessing

# data, x, y, dfdata, label_col and cate_cols are assumed to be defined beforehand
import numpy as np
import catboost as cb
from sklearn.model_selection import train_test_split

# Describe the data (data is a pandas DataFrame)
print(data.describe())

# Fill null values with KNN
from fancyimpute import KNN
dataset = KNN(k=3).fit_transform(data)

# Handle unbalanced data by oversampling or undersampling
# Oversampling: the corresponding class in the imbalanced-learn library is RandomOverSampler
from imblearn.over_sampling import RandomOverSampler
ROS = RandomOverSampler(random_state=0)
x_resampled, y_resampled = ROS.fit_resample(x, y)

# Undersampling: the corresponding class in the imbalanced-learn library is RandomUnderSampler
from imblearn.under_sampling import RandomUnderSampler
RUS = RandomUnderSampler(random_state=0)
x_resampled, y_resampled = RUS.fit_resample(x, y)

# Split the data set into a training set and a validation set
dftrain, dfvalid = train_test_split(dfdata, train_size=0.7, random_state=42)
Xtrain, Ytrain = dftrain.drop(label_col, axis=1), dftrain[label_col]
Xvalid, Yvalid = dfvalid.drop(label_col, axis=1), dfvalid[label_col]

# Positions of the categorical columns, then CatBoost data pools
Cate_cols_indexs = np.where(Xtrain.columns.isin(cate_cols))[0]
Data_train = cb.Pool(data=Xtrain, label=Ytrain, cat_features=cate_cols)
Data_valid = cb.Pool(data=Xvalid, label=Yvalid, cat_features=cate_cols)

Part 2: model training

import catboost as cb
import plotly.express as px
from IPython.display import display
from sklearn.metrics import f1_score

# Set parameters
iterations = 1000
early_stopping_rounds = 200
Params = {'learning_rate': 0.05,
          'loss_function': "Logloss",
          'eval_metric': "Accuracy",
          'depth': 6,
          'min_data_in_leaf': 20,
          'random_seed': 42,
          'logging_level': 'Silent',
          'use_best_model': True,
          'one_hot_max_size': 2,
          'boosting_type': "Ordered",
          'max_ctr_complexity': 4}

# Build the model
model = cb.CatBoostClassifier(
    iterations=iterations,
    early_stopping_rounds=early_stopping_rounds,
    train_dir='catboost_info/',
    **Params)

# Train directly
model.fit(
    Data_train,
    eval_set=Data_valid,
    plot=True
)
print("model.get_all_params():")
print(model.get_all_params())

# Evaluate the model
y_pred_train = model.predict(Xtrain)
y_pred_valid = model.predict(Xvalid)
train_score = f1_score(Ytrain, y_pred_train)
valid_score = f1_score(Yvalid, y_pred_valid)
print('train f1_score:{:.5}'.format(train_score))
print('valid f1_score:{:.5}\n'.format(valid_score))

# Determine feature importance (keep the 20 most important features)
dfimportance = model.get_feature_importance(prettified=True)
dfimportance = dfimportance.sort_values(by="Importances").iloc[-20:]
fig_importance = px.bar(dfimportance,
                        x="Importances", y="Feature Id",
                        title="CatBoost Feature Importance Ranking")
display(dfimportance)
display(fig_importance)

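Part 3: data scoring

# Establish the scoring model and determine the scoring result
import numpy as np

# f[i] is the contribution rate of the i-th government index
f = model.get_feature_importance()
# F[i] = f[i] / sum(f) is the weight of the i-th government index
F = f / f.sum()
print(F)

# W = F1*W1 + F2*W2 + F3*W3 + ... + Fn*Wn
# W_scores[i] is the score of the i-th index (assumed to be supplied by the caller)
W = np.dot(F, W_scores)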

Suppose that satisfaction evaluation data of the residents of district J on government food safety work in 2021 is collected by the data acquisition module A1 in the database module of FIG. 2, and that the data set contains sensitive information such as the names, ID card numbers, mobile phone numbers and addresses of the respondents. The data classification module A2 further divides the data into 3 sub-data sets according to time and government topic: 2021 data D1, food safety work data D2, and 2021 food safety work data D3.

Now, as shown in FIG. 3, user R accesses the 2021 food safety work data D3 through the data security module B. After authority is obtained through the identity authentication module B1, the data automatically enters the data desensitization module B2, which desensitizes the sensitive information in the following steps:

(1) According to the sensitive data category (name, ID card number, mobile phone number, address) and the desensitization scheme, the data is matched through the primary key and the sensitive data is desensitized according to the designated scheme.

The desensitization result for data set D3 is shown in FIG. 6: for names, the surname is retained and the given name is hidden; for ID card numbers, the first six digits and the last four digits are retained, so that the number can still be matched with regional information while the information is better protected; for mobile phone numbers, four digits are hidden starting from the fifth digit; address information is truncated so that only the district remains, which makes it convenient to check the compliance of the government work satisfaction survey in the corresponding district while preventing information leakage.

(2) The resulting desensitized dataset D3' is provided to the data scoring module C.
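The masking rules described for data set D3 can be illustrated with the following sketch. The function names and sample values are hypothetical. It also assumes a single-character surname, an 18-digit ID card number, an 11-digit mobile phone number, and an address in which the district ends with the character 区.

```python
def mask_name(name):
    """Keep the surname (assumed to be the first character), hide the rest."""
    return name[:1] + "*" * (len(name) - 1)


def mask_id_number(id_number):
    """Keep the first six and last four digits, hide the digits in between."""
    return id_number[:6] + "*" * (len(id_number) - 10) + id_number[-4:]


def mask_phone(phone):
    """Hide four digits starting from the fifth digit."""
    return phone[:4] + "****" + phone[8:]


def mask_address(address):
    """Truncate the address so that only the part up to the district remains."""
    idx = address.find("区")
    return address[: idx + 1] if idx >= 0 else ""


print(mask_name("张三丰"))
print(mask_id_number("500108199001011234"))
print(mask_phone("13812345678"))
print(mask_address("重庆市南岸区崇文路2号"))
```

Because the first six digits of the ID card number and the district part of the address survive, the desensitized record can still be grouped by region, which is the property the description relies on.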

To further determine the satisfaction score for the residents of the J area with respect to the government food safety work, the data set D3' utilizes the data scoring module C to complete the final scoring work, referring to fig. 4, as follows:

(1) The government satisfaction data set D3' enters the preprocessing module C1 to complete basic data preprocessing, including data cleaning, data inspection, null value filling and data set splitting;

(2) The model training work is completed through the model training module C2. Model training is carried out on the data set D3' with the CatBoost algorithm in Python. Assume that the government satisfaction data set is

$$D_3' = \{(X_i, y_i)\}_{i=1}^{n},$$

where $X_i = (x_{i,1}, \ldots, x_{i,m})$ is the index vector of m government satisfaction features and $y_i$ is the label value corresponding to the government satisfaction indexes. For the $c$-th categorical feature, the CatBoost algorithm uses the mean of the label values of the samples in the same category, i.e.

$$\frac{\sum_{j} [x_{j,c} = x_{k,c}]\, y_j}{\sum_{j} [x_{j,c} = x_{k,c}]},$$

to estimate the frequency of each categorical feature, and encodes it as a new numerical variable $\hat{x}_{k,c}$, i.e.

$$\hat{x}_{k,c} = \frac{\sum_{j} [x_{j,c} = x_{k,c}]\, y_j + \alpha P}{\sum_{j} [x_{j,c} = x_{k,c}] + \alpha},$$

where the sums run over the samples $j$ that precede sample $k$ in the sample ordering used by CatBoost; $[\cdot]$ denotes the indicator function, which returns 1 when the condition $x_{j,c} = x_{k,c}$ on the categorical index is satisfied and 0 otherwise; $P$ is the prior value, a hyperparameter; the parameter $\alpha$ ($\alpha > 0$) is the weight of the prior value; and $x_{j,c}$ and $y_j$ denote the $j$-th categorical index and its corresponding label value, respectively.

After the automatic encoding is finished, the CatBoost algorithm replaces the conventional gradient estimation method with its own ordered boosting method: for each government satisfaction sample $D_k$ ($D_k \in D_3'$) a separate model $M_k$ is trained, and finally $M_n$ is obtained, i.e. an unbiased gradient estimate is found for the samples, on the basis of which the final model is trained.
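The category encoding in step (2) can be made concrete with a small worked example. The sketch below is an assumption-laden illustration, not CatBoost's internal implementation: it computes a smoothed target statistic with prior P and prior weight alpha, using only the samples that come before the current one. It takes the given row order as the ordering, whereas CatBoost itself draws random permutations. The function name and the sample data are hypothetical.

```python
def ordered_target_statistic(categories, labels, prior, alpha):
    """Encode each categorical value from the labels of the preceding samples."""
    sums = {}     # category -> sum of labels seen so far
    counts = {}   # category -> number of samples seen so far
    encoded = []
    for category, label in zip(categories, labels):
        s = sums.get(category, 0.0)
        n = counts.get(category, 0)
        # (sum of matching labels + alpha * P) / (number of matches + alpha)
        encoded.append((s + alpha * prior) / (n + alpha))
        # Only now add the current sample, so it never sees its own label.
        sums[category] = s + label
        counts[category] = n + 1
    return encoded


categories = ["urban", "rural", "urban", "urban", "rural"]
labels = [1, 0, 1, 0, 1]
encoded = ordered_target_statistic(categories, labels, prior=0.5, alpha=1.0)
print(encoded)
```

The first occurrence of each category is encoded as the prior itself, since no earlier sample matches it; later occurrences move toward the running label mean of that category.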

(3) Model evaluation is completed. The model trained with CatBoost by the government satisfaction scoring module is evaluated with four metrics, and FIG. 7 shows the result for each metric. The four metrics are model training speed, accuracy, F1 value and AUC value; the calculation methods and what they measure are as follows:

model training speed refers to the time required by different algorithms to train out a model under the same computer equipment environment and the same amount of data sets;

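Precision refers to the proportion of the samples classified as positive by the model whose government satisfaction score is truly positive, calculated as

$$\text{Precision} = \frac{TP}{TP + FP};$$

Recall refers to the proportion of the samples whose government satisfaction score is truly positive that are correctly identified by the model, calculated as

$$\text{Recall} = \frac{TP}{TP + FN},$$

where $TP$, $FP$ and $FN$ are the numbers of true positives, false positives and false negatives, respectively;

The F1 value refers to the weighted harmonic mean of precision and recall, with the two weighted equally, i.e.

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$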
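As a small worked illustration of the precision, recall and F1 definitions above, the following sketch computes them from true and predicted labels in plain Python. The function name and the sample labels are hypothetical, and the positive class is assumed to be labelled 1.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this example there are three true positives, one false positive and one false negative, so precision and recall are both 3/4 and their harmonic mean is also 3/4.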

(4) A government satisfaction scoring model is constructed. As shown in FIG. 8, the relative importance of the n feature variables in the data set D3' is obtained with the importance function of the CatBoost algorithm, the index contribution rates are determined, and the scoring model is obtained by weighting according to the index weights. The specific steps are as follows:

The index contribution rates $f_i$ of the n different indexes are obtained from the CatBoost model, and the weight of each index is obtained as $F_i = f_i / \sum_{j=1}^{n} f_j$. The resident food safety satisfaction scoring model is then constructed: assuming the resident food safety satisfaction score is $W$ and the score of each index is $W_i$, the final government satisfaction scoring model is $W = F_1 W_1 + F_2 W_2 + \ldots + F_i W_i + \ldots + F_n W_n$.
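The weighting step can be checked with a short numeric sketch. The index names, contribution rates and index scores below are made-up values for illustration only. The code normalizes the contribution rates into weights that sum to 1 and then forms the weighted sum W.

```python
def indicator_weights(contributions):
    """Normalize contribution rates f_i into weights F_i = f_i / sum(f)."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contribution rates must sum to a positive value")
    return {name: value / total for name, value in contributions.items()}


def satisfaction_score(weights, scores):
    """Weighted sum W = F_1*W_1 + ... + F_n*W_n over all indexes."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical contribution rates f_i, e.g. as reported by a trained model.
contributions = {
    "inspection frequency": 50.0,
    "complaint handling": 30.0,
    "information disclosure": 20.0,
}
# Hypothetical per-index satisfaction scores W_i on a 0-100 scale.
scores = {
    "inspection frequency": 80,
    "complaint handling": 90,
    "information disclosure": 70,
}

weights = indicator_weights(contributions)
W = satisfaction_score(weights, scores)
print(weights)
print(round(W, 2))
```

Because the weights are normalized, W always stays within the range of the individual index scores, which keeps scores comparable across data sets with different numbers of indexes.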

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims

1. A government satisfaction evaluation system based on data security and privacy protection, comprising: a database module, a data security module and a data scoring module, wherein,

the data scoring module is used for model training of government satisfaction evaluation data, constructing a government satisfaction scoring model and outputting scoring results;

the database module comprises a data acquisition module and a data classification module, wherein the data acquisition module is used for collecting various data of government satisfaction; the data classification module is used for dividing the collected government satisfaction data into month data, quarter data and year data according to time, and further dividing the collected government satisfaction data into sub-data sets including safe construction, legal construction and service evaluation according to government theme content;

the data security module comprises an identity authentication module, a data desensitization module and a feedback early warning module, wherein the identity authentication module is used for authenticating the identity of a visitor and determining the inquiry authority of the visitor; the data desensitization module is used for carrying out desensitization processing on the data set containing sensitive information; the feedback early warning module records the user behavior to generate a log; when the number of times of identity authentication rejection of the same user exceeds a set threshold, locking an access port in time and feeding back early warning to a terminal;

the data scoring module comprises a preprocessing module, a model training module and a scoring module, wherein the preprocessing module is used for preprocessing the accessed government satisfaction data set, including data cleaning, unbalanced data processing and data set segmentation, and providing the processed data set to the model training module; the model training module is used for carrying out model training on the data set, completing the model training by using a machine learning algorithm, namely the Catboost algorithm, obtaining the contribution degree of each government satisfaction index by using the importance function in the Catboost algorithm, thereby determining the index weights and providing the index weights to the scoring module; and the scoring module establishes a government satisfaction scoring model by using the index weights, completes the scoring of the related data and finally obtains a government satisfaction scoring result.
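By way of non-limiting illustration, the data classification recited in claim 1 (division by time into month, quarter and year data, and by government theme content) can be sketched in Python as follows. The record fields `date`, `theme` and `score` and the key format are assumptions made for this sketch only.

```python
from collections import defaultdict
from datetime import date


def period_keys(d):
    """Return the month, quarter and year keys that a date falls into."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }


def classify(records):
    """Group records into sub-data sets keyed by (granularity, period, theme)."""
    groups = defaultdict(list)
    for rec in records:
        for granularity, period in period_keys(rec["date"]).items():
            groups[(granularity, period, rec["theme"])].append(rec)
    return dict(groups)


# Hypothetical survey records, for illustration only.
records = [
    {"date": date(2022, 10, 3), "theme": "service evaluation", "score": 4},
    {"date": date(2022, 11, 20), "theme": "service evaluation", "score": 5},
    {"date": date(2022, 11, 21), "theme": "legal construction", "score": 3},
]
for key in sorted(classify(records)):
    print(key)
```

Each record appears once per time granularity, so the monthly, quarterly and yearly sub-data sets can be queried independently.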

2. The government satisfaction evaluation system based on data security and privacy protection according to claim 1, wherein the specific steps of the data security module for completing access control, privacy protection and feedback early warning work are as follows:

(1) When a user applies for accessing government satisfaction data resources, firstly, authenticating the identity of the user, and refusing access if the authentication is not passed; if the authentication is passed, further operation is performed;

(2) Judging the data resource requested by the user, and obtaining the required data set according to the authority when the requested data resource does not contain sensitive data; when the requested data resource contains sensitive data, encrypting the sensitive information, including attributes of the surveyed public, to obtain a desensitized data set;

(3) Recording the access process of the user and generating a corresponding log;
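By way of non-limiting illustration, the access flow of steps (1) to (3), combined with the lockout rule of the feedback early warning module in claim 1, can be sketched in Python as follows. The class name, the in-memory credential store, the default threshold, and the use of a truncated SHA-256 digest as the desensitization step are all assumptions of this sketch; the claims do not specify them.

```python
import hashlib


class AccessGateway:
    """Toy gateway: authenticate, desensitize, log, and lock on repeated rejections."""

    def __init__(self, credentials, sensitive_fields, max_rejections=3):
        # Hypothetical plaintext store, kept simple for the sketch only.
        self.credentials = credentials
        self.sensitive_fields = set(sensitive_fields)
        self.max_rejections = max_rejections
        self.rejections = {}
        self.locked = set()
        self.log = []

    def _desensitize(self, row):
        # Assumed stand-in for the desensitization step: hash sensitive values.
        return {
            key: hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
            if key in self.sensitive_fields else value
            for key, value in row.items()
        }

    def request(self, user, password, rows):
        """Return the (desensitized) rows, or None when access is refused."""
        if user in self.locked:
            self.log.append((user, "locked"))
            return None
        if self.credentials.get(user) != password:
            count = self.rejections.get(user, 0) + 1
            self.rejections[user] = count
            self.log.append((user, "rejected"))
            if count > self.max_rejections:  # rejections exceed the set threshold
                self.locked.add(user)
                self.log.append((user, "early_warning"))
            return None
        self.log.append((user, "granted"))
        return [self._desensitize(row) for row in rows]


gateway = AccessGateway({"analyst": "s3cret"}, ["name", "phone"])
rows = [{"name": "Zhang San", "phone": "13800000000", "score": 4}]
print(gateway.request("analyst", "s3cret", rows))
```

Every request appends to the log whatever its outcome, so the log can serve as the record of the user's access process.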

3. The government satisfaction evaluation system based on data security and privacy protection according to claim 1, wherein the data desensitizing module is used for desensitizing a data set containing sensitive information, and the specific steps are as follows:

(1) The sensitive data set enters a data desensitizing module;

(5) The desensitized data set is provided to a data scoring module.

4. The government satisfaction evaluation system based on data security and privacy protection according to claim 1, wherein completing the model training by using a machine learning algorithm, namely the Catboost algorithm, obtaining the contribution degree of each government satisfaction index by using the importance function in the Catboost algorithm, and thereby determining the index weights and providing them to the scoring module, specifically comprises:

assume that the government satisfaction data set is $D = \{(X_k, Y_k)\}_{k=1,2,\ldots,n}$, wherein $X_k$ is an index vector containing $m$ government satisfaction features, $Y_k = (y_1, y_2, \ldots, y_k)$, and $y_k \in \mathbb{R}$ is the label value corresponding to a government satisfaction index; the Catboost algorithm uses the mean value of the feature data of the same category [formula missing in source], i.e. [formula missing in source], to derive the frequency of each categorical feature, and encodes it to form a new numerical variable [formula missing in source], in which the indicator function returns 1 when its condition is satisfied and 0 otherwise; $p$ is the prior value, which is a hyperparameter; the parameter $\alpha$ is the weight of the prior value, where $\alpha > 0$; [symbol missing in source] and $y_j$ respectively represent the j-th categorical variable index and its corresponding label value;

after the automatic encoding is finished, the Catboost algorithm replaces the gradient estimation method with the ordered boosting method: for each government satisfaction sample $D_k$, wherein $D_k \in D$, a separate model $M_i$ is obtained by training, and $M_n$ is finally obtained, i.e. an unbiased gradient estimate of the samples is found, whereby the final model is trained and obtained.
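By way of non-limiting illustration, the categorical encoding step can be sketched in Python. The encoding formula itself is not legible in this text, so the sketch assumes the ordered target-statistic form given in the public Catboost documentation: for each sample, taken in a permutation order, the encoded value is (sum of the labels of earlier samples of the same category + α·p) / (number of earlier samples of the same category + α). The function name and the `order` and `seed` parameters are hypothetical, and the ordered boosting stage is not covered.

```python
import random


def ordered_target_encode(categories, labels, prior, alpha=1.0, order=None, seed=0):
    """Encode one categorical feature using only samples seen earlier in the order."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(categories)
    if order is None:
        # Assumed: a random permutation of the samples, fixed by the seed.
        order = list(range(n))
        random.Random(seed).shuffle(order)
    sums = {}    # running label sum per category
    counts = {}  # running sample count per category
    encoded = [0.0] * n
    for idx in order:
        cat = categories[idx]
        label_sum = sums.get(cat, 0.0)
        count = counts.get(cat, 0)
        # Indicator-weighted mean of earlier labels, smoothed by the prior.
        encoded[idx] = (label_sum + alpha * prior) / (count + alpha)
        sums[cat] = label_sum + labels[idx]
        counts[cat] = count + 1
    return encoded


# Hypothetical district feature and binary satisfaction labels.
districts = ["north", "north", "south", "north"]
satisfied = [1, 0, 1, 1]
print(ordered_target_encode(districts, satisfied, prior=0.5, order=[0, 1, 2, 3]))
```

Because each encoded value depends only on samples that precede it in the order, a sample's own label never leaks into its own feature value.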

5. A method for evaluating government satisfaction based on data security and privacy protection based on the system of any one of claims 1-4, comprising the steps of: