CN115630389A - Data processing method, device, equipment and storage medium - Google Patents


Publication number
CN115630389A
Authority
CN
China
Prior art keywords
log file
desensitization
processed
target
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211006362.2A
Other languages
Chinese (zh)
Inventor
陈启波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202211006362.2A
Publication of CN115630389A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a data processing method, apparatus, device, and storage medium. The data processing method comprises: obtaining a log file to be processed; calling a trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed to obtain a risk level of the log file to be processed; determining a target desensitization strategy for the log file to be processed based on the risk level; and performing desensitization processing on the log file to be processed according to the target desensitization strategy to obtain a desensitized target log file. With the embodiments of the invention, log files to be processed that have different privacy leakage risk levels can be desensitized with different target desensitization strategies, which effectively addresses the technical problem that log files are prone to privacy leakage.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
While an existing terminal is running, a large number of log files are generated automatically, and some of these log files may contain private content, such as identity card numbers, names, mobile phone numbers, ages, and purchase records. If this private content is leaked, the privacy of the user is seriously threatened.
Based on this, a method for effectively preventing the private content of the user in the log file from leaking is needed.
Disclosure of Invention
The embodiment of the invention aims to provide a data processing method, a data processing device, data processing equipment and a storage medium, and aims to solve the technical problem that a log file generated by a terminal is easy to leak privacy at present.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring a log file to be processed;
calling a trained privacy leakage risk assessment model to carry out risk assessment on the log file to be processed to obtain the risk level of the log file to be processed;
determining a target desensitization policy for the pending log file based on the risk level;
and desensitizing the log file to be processed according to the target desensitization strategy to obtain the desensitized target log file.
Further, the risk levels include a first risk level and a second risk level, the first risk level is greater than the second risk level, the target desensitization policy includes a first desensitization policy and a second desensitization policy, the desensitization processing intensity of the first desensitization policy is greater than the desensitization processing intensity of the second desensitization policy, and the determining the target desensitization policy of the pending log file based on the risk levels includes:
if the risk level is a first risk level, determining that a first desensitization strategy is a target desensitization strategy of the log file to be processed;
and if the risk level is a second risk level, acquiring a target comparison table corresponding to the log file to be processed, and taking the second desensitization strategy and the target comparison table as the target desensitization strategy of the log file to be processed.
Further, before the step of calling the trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed, the data processing method further includes:
acquiring a plurality of log files to be trained which have not been subjected to desensitization processing, wherein the field types contained in the log files to be trained comprise at least one of a Chinese type, an English type and a character type;
determining the risk level of a log file to be trained containing one field type as a first risk level, and determining the risk level of a log file to be trained containing two or more different field types as a second risk level;
training a privacy leakage risk assessment model to be trained, using the plurality of log files to be trained and the risk levels corresponding to the log files to be trained as training sample data;
and optimizing model parameters based on a target loss function until the privacy leakage risk assessment model converges, to obtain the trained privacy leakage risk assessment model.
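As a non-limiting illustration, the labeling rule used to build the training sample data can be sketched as follows. The source does not define how characters are assigned to the Chinese, English and character types, so the classification below (CJK Unified Ideographs as Chinese, ASCII letters as English, everything else that is not whitespace as character) is an assumption, and the names `RiskLevel`, `field_types` and `label_risk` are hypothetical.

```python
from enum import Enum


class RiskLevel(Enum):
    FIRST = 1   # higher risk: data of a single field type
    SECOND = 2  # lower risk: data of two or more field types


def field_types(text: str) -> set[str]:
    """Return the field types present in text (assumed classification)."""
    types = set()
    for ch in text:
        if ch.isspace():
            continue  # whitespace is ignored in this sketch
        if "\u4e00" <= ch <= "\u9fff":
            types.add("chinese")    # CJK Unified Ideographs block
        elif ch.isascii() and ch.isalpha():
            types.add("english")    # ASCII letters
        else:
            types.add("character")  # digits, punctuation, other symbols
    return types


def label_risk(text: str) -> RiskLevel:
    """One field type -> FIRST risk level; two or more -> SECOND."""
    found = field_types(text)
    if not found:
        # Empty input is not covered by the source rule.
        raise ValueError("log text contains no classifiable data")
    return RiskLevel.FIRST if len(found) == 1 else RiskLevel.SECOND
```

The resulting (log file, risk level) pairs would then serve as the training sample data; the model architecture and the target loss function are not specified in the source and are left out of the sketch.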
Further, if the risk level is a first risk level, performing desensitization processing on the log file to be processed according to the target desensitization policy, including:
and according to the first desensitization strategy, performing unrecoverable first desensitization treatment on the sensitive data in the log file to be processed.
Further, if the risk level is a second risk level, performing desensitization processing on the log file to be processed according to the target desensitization policy includes:
determining a weight coefficient corresponding to the sensitive data corresponding to each field type in the log file to be processed;
performing recoverable second desensitization processing on corresponding sensitive data for preset times according to the second desensitization strategy, a target comparison table corresponding to each field type in the log file to be processed and a weight coefficient of the sensitive data corresponding to each field type; the weight coefficient is in direct proportion to the preset times, and the preset times are at least one time.
Further, after the step of performing desensitization processing on the log file to be processed according to the target desensitization policy, the data processing method further includes:
acquiring an encryption public key of a target object, and encrypting the target comparison table with the encryption public key to generate an encrypted comparison table;
and sending the encryption comparison table and the desensitized target log file to a target object.
Further, the acquiring the log file to be processed includes:
acquiring an encrypted log file sent by a log generator;
and decrypting the encrypted log file to obtain a log file to be processed.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the first acquisition module is used for acquiring a log file to be processed;
the risk evaluation module is used for calling the trained privacy leakage risk evaluation model to carry out risk evaluation on the log file to be processed to obtain the risk level of the log file to be processed;
a determination module for determining a target desensitization policy for the pending log file based on the risk level;
and the desensitization module is used for performing desensitization treatment on the log file to be processed according to the target desensitization strategy to obtain the desensitized target log file.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor implements the steps in the data processing method described in any one of the above when executing the computer program.
In a fourth aspect, the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the data processing method according to any one of the above.
The embodiment of the invention provides a data processing method, a data processing device, data processing equipment and a storage medium, and by calling a trained privacy leakage risk evaluation model, the privacy leakage risk level of a log file to be processed can be determined, so that corresponding target desensitization strategies can be determined according to different privacy leakage risk levels, desensitization processing can be further performed on the log file to be processed with different privacy leakage risk levels by adopting different target desensitization strategies, and the technical problem that the log file is easy to cause privacy leakage is effectively solved.
Drawings
FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart diagram of a data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for training a privacy leakage risk assessment model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
In the related art, a large number of log files are automatically generated in the running process of the terminal at present, and part of the log files may contain private contents, such as an identification number, a name, a mobile phone number, an age, an article purchase record and the like. If the private contents are leaked, the privacy of the user is greatly threatened.
In order to solve the technical problems in the related art, an embodiment of the present invention provides a data processing method, please refer to fig. 1, where fig. 1 is a schematic flow chart of the data processing method provided in the embodiment of the present invention, and the method includes steps 101 to 104;
Step 101, obtaining a log file to be processed.
In this embodiment, the log file to be processed is a log file automatically generated in the running process of the system. Different log files contain the same or different data, and the data may include private information of the user, such as an identification number, an article purchase record and the like. Therefore, the embodiment mainly processes the log file automatically generated by the terminal system to solve the technical problem that the existing log file automatically generated by the terminal system is easy to leak privacy.
Step 102, calling the trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed to obtain the risk level of the log file to be processed.
In this embodiment, the trained privacy leakage risk assessment model is mainly used for identifying sensitive data in an input log file to be processed, so as to determine a privacy leakage risk level of a corresponding log file to be processed according to the identified sensitive data.
The privacy leakage risk levels of different log files to be processed can be determined through the trained privacy leakage risk evaluation model, so that different desensitization processing can be appointed for different privacy leakage risk levels, and the safety of sensitive data in the files to be processed with different privacy leakage risk levels is guaranteed.
Specifically, a higher privacy leakage risk level indicates that the sensitive data in the corresponding log file to be processed is more easily leaked; conversely, a lower privacy leakage risk level indicates that the sensitive data in the corresponding log file to be processed is harder to leak.
Step 103, determining a target desensitization strategy of the log file to be processed based on the risk level.
After determining the privacy leakage risk levels corresponding to different log files to be processed, according to the different privacy leakage risk levels, a desensitization policy corresponding to the privacy leakage level is selected from a preset desensitization policy pool as a target desensitization policy corresponding to the log files to be processed.
The preset desensitization strategy pool comprises at least two desensitization strategies, for example, a desensitization strategy corresponding to a high privacy leakage risk level which reaches a preset level, and a desensitization strategy corresponding to a low privacy leakage risk level which does not reach the preset level.
It should be noted that the desensitization strategies in the preset desensitization strategy pool provided in this embodiment are not limited to the two desensitization strategies provided in the above embodiment, and may also include three or more desensitization strategies, and the number of the specific desensitization strategies needs to be set according to actual application requirements, and is not specifically limited herein.
Step 104, desensitizing the log file to be processed according to the target desensitization strategy to obtain the desensitized target log file.
After the target desensitization strategy corresponding to the log file to be processed is determined, desensitization processing can be carried out on the log file to be processed through the determined target desensitization strategy. Specifically, the desensitization processing provided by this embodiment mainly performs data deformation, fuzzification, or disguise on sensitive data in a corresponding log file to be processed by determining a target desensitization policy, so as to implement reliable protection on privacy and sensitive data of the log file to be processed, and effectively prevent the log file generated by the terminal system from being prone to privacy leakage.
In summary, the present invention provides a data processing method, which includes obtaining a log file to be processed, calling a trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed, obtaining a risk level of the log file to be processed, determining a target desensitization policy of the log file to be processed based on the risk level, and performing desensitization processing on the log file to be processed according to the target desensitization policy, so as to obtain a target log file after desensitization processing. By adopting the embodiment of the invention, the desensitization treatment can be carried out on the log files to be treated with different privacy leakage risk levels by adopting different target desensitization strategies, and the technical problem that the log files are easy to leak privacy is effectively solved.
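As a non-limiting illustration, steps 102 to 104 can be sketched as a single function that receives the risk assessment model and the preset desensitization strategy pool as parameters. All names (`process_log`, `toy_model`, `toy_pool`) are hypothetical; the toy model and the two toy strategies are placeholders chosen only to make the sketch runnable and do not represent the trained model or the desensitization strategies of the embodiment.

```python
from typing import Callable, Dict

RiskModel = Callable[[str], str]  # maps log text to a risk level label
Strategy = Callable[[str], str]   # maps log text to desensitized log text


def process_log(log_text: str, model: RiskModel, pool: Dict[str, Strategy]) -> str:
    """Steps 102-104: assess risk, select a strategy, desensitize."""
    risk_level = model(log_text)  # step 102: risk assessment
    strategy = pool[risk_level]   # step 103: pick strategy from the pool
    return strategy(log_text)     # step 104: desensitization


def toy_model(log_text: str) -> str:
    """Placeholder for the trained model (assumption, not the real model)."""
    return "first" if log_text.isdigit() else "second"


# Placeholder strategy pool with one strategy per risk level.
toy_pool: Dict[str, Strategy] = {
    "first": lambda text: "*" * len(text),  # stands in for a strong strategy
    "second": lambda text: text[::-1],      # stands in for a weaker strategy
}

target_log = process_log("13800138000", toy_model, toy_pool)
```

Passing the model and the pool as parameters keeps the sketch independent of how many strategies the pool contains, which the embodiment leaves open.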
Referring to fig. 2, fig. 2 is another schematic flow chart of a data processing method according to an embodiment of the present invention, the method includes steps 201 to 210;
Step 201, obtaining the encrypted log file sent by the log generator.
In a terminal system, log files are generated by the log generator of the terminal system. However, the log generator in an existing terminal system generally does not encrypt the log files, so the log files generated by the terminal are prone to privacy leakage.
In order to reduce the possibility of privacy leakage of the log file in the transmission process, the embodiment performs encryption processing on the log file generated by the log generator in the terminal system to obtain an encrypted log file after the encryption processing. Therefore, in the process of acquiring the encrypted log file generated by the log generator, the possibility of privacy leakage of the log file in the transmission process can be effectively reduced.
Step 202, decrypting the encrypted log file to obtain a to-be-processed log file.
In this embodiment, the log file generated by the log generator is encrypted by the encryption public key of the system, so as to obtain an encrypted log file. Therefore, the encrypted log file can be decrypted through the private key of the system, so that the possibility of privacy leakage of the encrypted log file in the transmission process is effectively reduced, and the log file to be processed with low privacy leakage risk can be obtained.
Step 203, calling the trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed to obtain the risk level of the log file to be processed.
As an alternative embodiment, the risk levels include a first risk level and a second risk level, the first risk level being greater than the second risk level. The target desensitization strategy provided by this embodiment includes a first desensitization strategy and a second desensitization strategy, and the desensitization treatment intensity of the first desensitization strategy is greater than the desensitization treatment intensity of the second desensitization strategy.
Specifically, the desensitization processing intensity refers to how difficult it is for the sensitive data in a log file to leak after that sensitive data has been desensitized. For example, if the desensitization processing intensity is high, it is difficult for the sensitive data in the desensitized log file to leak; conversely, if the desensitization processing intensity is low, it is comparatively easy for the sensitive data in the desensitized log file to leak.
Step 204, if the risk level is the first risk level, determining that the first desensitization strategy is the target desensitization strategy of the log file to be processed.
After determining that the first desensitization policy is the target desensitization policy for the pending log file, step 206 is performed.
Step 205, if the risk level is a second risk level, acquiring a target comparison table corresponding to the log file to be processed, and taking the second desensitization policy and the target comparison table as a target desensitization policy of the log file to be processed.
The field types included in the log file to be processed in this embodiment include at least one of a Chinese type, an English type, and a character type. Specifically, when the log file to be processed contains data of only one field type, the data type is uniform, so if only simple desensitization processing is applied, the sensitive data in the log file to be processed is easy to crack once it is leaked; when the log file to be processed contains data of two or more different field types, the data types are numerous and complex, so the sensitive data in the log file to be processed remains difficult to crack even if only simple desensitization processing is applied. Therefore, the present embodiment defines the risk level of a log file to be processed containing one field type as the first risk level, and defines the risk level of a log file to be processed containing two or more different field types as the second risk level.
In this embodiment, the preset desensitization strategy pool includes a first desensitization strategy and a second desensitization strategy. The first desensitization strategy mainly targets log files to be processed with a high privacy leakage risk level, namely log files to be processed that contain data of only one field type; the second desensitization strategy mainly targets log files to be processed with a low privacy leakage risk level, namely log files to be processed that contain data of two or more different field types.
After determining that the second desensitization strategy and the target comparison table are the target desensitization strategy of the log file to be processed, steps 207 through 208 are performed.
Step 206, according to the first desensitization strategy, performing unrecoverable first desensitization processing on the sensitive data in the log file to be processed to obtain a target log file after the first desensitization processing.
In this embodiment, because the first desensitization strategy targets log files to be processed with a high privacy leakage risk level, the log file to be processed is desensitized using the unrecoverable first desensitization processing. This prevents the sensitive data from being easily inferred or reconstructed from non-sensitive data when a log file to be processed that contains only a single data type is leaked, which would expose the user's personal private data. Specifically, the unrecoverable first desensitization processing mainly performs unrecoverable processing on the sensitive data in a log file to be processed that contains data of only a single field type, for example, replacing specific data in the sensitive data with a fixed constant value or an asterisk ("*"), or replacing specific data in the sensitive data with a random value output by a random function.
It should be noted that the unrecoverable first desensitization processing mainly modifies a part of the sensitive data in the log file to be processed, so that the modified part of the sensitive data can be neither restored nor derived.
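As a non-limiting illustration, the two replacement options named above (an asterisk and a random value) can be sketched as follows. The function names and the choice of which positions to replace are hypothetical; the sketch assumes digit-only sensitive data for the random variant and uses Python's `secrets` module as the random function.

```python
import secrets
import string


def _clamp(value: str, start: int, end: int) -> tuple[int, int]:
    """Keep the slice bounds inside the value."""
    end = max(0, min(end, len(value)))
    start = max(0, min(start, end))
    return start, end


def mask_with_stars(value: str, start: int, end: int) -> str:
    """Replace value[start:end] with '*'; the original characters are discarded."""
    start, end = _clamp(value, start, end)
    return value[:start] + "*" * (end - start) + value[end:]


def mask_with_random_digits(value: str, start: int, end: int) -> str:
    """Replace value[start:end] with random digits unrelated to the original."""
    start, end = _clamp(value, start, end)
    randoms = "".join(secrets.choice(string.digits) for _ in range(end - start))
    return value[:start] + randoms + value[end:]


masked_phone = mask_with_stars("13800138000", 3, 7)
```

Because neither function stores the replaced characters or derives the replacement from them, the output alone does not allow the original segment to be restored.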
Since the sensitive data in a log file to be processed that contains only a single data type is itself of a single data type, its data structure is not complex, and the user can easily remember the actual data corresponding to the replaced data. When someone other than the user acquires the sensitive data after the first desensitization processing, that person can neither deduce the original sensitive data from the data that was not desensitized nor find any way to recover the data after the first desensitization processing, so the security of a log file to be processed that contains only a single data type is effectively improved.
Step 207, determining a weight coefficient corresponding to the sensitive data corresponding to each field type in the log file to be processed.
This embodiment provides multiple types of comparison tables, where each field type corresponds to one type of comparison table. Specifically, the comparison table is used during desensitization processing to process the sensitive data in the log file to be processed according to the data in the comparison table. The corresponding processing includes directly replacing part of the sensitive data with part of the data in the comparison table, and applying a preset specific function to part of the data in the comparison table and part of the sensitive data so as to replace that part of the sensitive data with the calculated result. It should be noted that the corresponding processing manner is not limited to the two manners described above; any manner of replacing part of the sensitive data with data obtained from the data in the comparison table and part of the sensitive data is within the protection scope of the embodiment of the present invention, which is not limited herein.
In some embodiments, the weight coefficient of the sensitive data in the log file to be processed is determined in the following manner: first determining the data length of the sensitive data corresponding to each field type in the log file to be processed and the total data length of the sensitive data, and then determining the weight coefficient corresponding to the sensitive data of each field type according to the ratio of the data length of the sensitive data corresponding to that field type to the total data length.
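As a non-limiting illustration, the length-ratio calculation can be sketched as follows. The function name `weight_coefficients` and the dictionary keys are hypothetical, and the sketch assumes the data lengths per field type have already been measured.

```python
def weight_coefficients(lengths: dict[str, int]) -> dict[str, float]:
    """Weight of each field type = its sensitive-data length / total length."""
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("total sensitive-data length must be positive")
    return {field_type: length / total for field_type, length in lengths.items()}


# Example: 8 characters of one field type and 2 characters of another.
weights = weight_coefficients({"first_type": 8, "second_type": 2})
```

With these lengths the two weight coefficients are 0.8 and 0.2, matching the values used in the optional embodiment described later in this section.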
Step 208, performing recoverable second desensitization processing on the corresponding sensitive data for a preset number of times according to the second desensitization strategy, the target comparison table corresponding to each field type in the log file to be processed, and the weight coefficient of the sensitive data corresponding to each field type, to obtain the target log file after the second desensitization processing.
The weight coefficient is in direct proportion to the preset times, and the preset times are at least one time.
In some embodiments, the recoverable second desensitization processing mainly performs recoverable processing on the sensitive data in a log file to be processed that contains data of two or more different field types, for example, replacing the sensitive data with data in the target comparison table corresponding to the field type of the sensitive data, or applying a preset specific function to the data in the target comparison table and part of the sensitive data so as to replace that part of the sensitive data with the calculated result. In this way, the sensitive data after the second desensitization processing can subsequently be restored through the corresponding target comparison table to obtain the original sensitive data.
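As a non-limiting illustration, the direct-replacement variant can be sketched with a comparison table that is a character substitution. The source does not specify the table format, so the sketch assumes a table that permutes its own key set (which guarantees that restoration is unambiguous); `DIGIT_TABLE`, `desensitize` and `restore` are hypothetical names.

```python
# Hypothetical comparison table for digit data: a permutation of "0123456789".
DIGIT_TABLE = dict(zip("0123456789", "7294058163"))


def check_table(table: dict[str, str]) -> None:
    """Require the table to permute its own keys so it can be inverted."""
    if set(table) != set(table.values()):
        raise ValueError("comparison table must be a permutation of its keys")


def desensitize(value: str, table: dict[str, str]) -> str:
    """Replace every character found in the table; leave the rest unchanged."""
    check_table(table)
    return "".join(table.get(ch, ch) for ch in value)


def restore(masked: str, table: dict[str, str]) -> str:
    """Invert the table to recover the original sensitive data."""
    check_table(table)
    inverse = {substitute: original for original, substitute in table.items()}
    return "".join(inverse.get(ch, ch) for ch in masked)


masked = desensitize("13800138000", DIGIT_TABLE)
original = restore(masked, DIGIT_TABLE)
```

Only a holder of the comparison table can perform the restoration, which is why the table itself has to be protected when it is handed to the target object.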
As an optional embodiment, when the log file to be processed includes two different field type data (first type data and second type data), and a first weight coefficient of the first type data is greater than a second weight coefficient of the second type data, performing, according to a second desensitization policy and a target comparison table corresponding to the first type data, recoverable second desensitization processing for a first number of times on the sensitive data of the first type data, and performing, according to the second desensitization policy and the target comparison table corresponding to the second type data, recoverable second desensitization processing for a second number of times on the sensitive data of the second type data, where the first number of times is greater than the second number of times. Specifically, when the first weight coefficient of the first type data is 0.8 and the second weight coefficient of the second type data is 0.2, the sensitive data of the first type data can be subjected to recoverable second desensitization for 4 times, for example, the sensitive data of the same part of the first type data can be subjected to desensitization sequentially by adopting 4 different desensitization processing modes according to the corresponding target comparison table, or the sensitive data of different parts of the first type data can be subjected to desensitization respectively by adopting 4 different desensitization processing modes according to the corresponding target comparison table; and performing 1 time of recoverable second desensitization processing on the sensitive data of the second type data, for example, performing desensitization processing on the sensitive data of the second type data by adopting any desensitization processing mode according to the corresponding target comparison table.
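As a non-limiting illustration, the relationship between the weight coefficient and the number of passes can be sketched as follows. The source states only that the number of passes is proportional to the weight coefficient and is at least one; the scale factor of 5 is an assumption chosen to reproduce the example above (0.8 gives 4 passes, 0.2 gives 1 pass). The digit rotation used as a recoverable pass is likewise a hypothetical stand-in for a comparison-table-based processing mode.

```python
import string


def passes_for_weight(weight: float, scale: int = 5) -> int:
    """Number of passes proportional to the weight, never below one."""
    return max(1, round(weight * scale))


def rotate_digits(value: str, offset: int) -> str:
    """One recoverable pass: shift each ASCII digit by offset modulo 10."""
    return "".join(
        str((int(ch) + offset) % 10) if ch in string.digits else ch
        for ch in value
    )


def apply_passes(value: str, offsets: list[int]) -> str:
    """Apply one pass per offset, in order, to the same data."""
    for offset in offsets:
        value = rotate_digits(value, offset)
    return value


def undo_passes(masked: str, offsets: list[int]) -> str:
    """Undo the passes in reverse order to recover the original data."""
    for offset in reversed(offsets):
        masked = rotate_digits(masked, -offset)
    return masked


count = passes_for_weight(0.8)   # number of passes for the first type data
offsets = [1, 3, 5, 7][:count]   # one hypothetical mode per pass
masked = apply_passes("6222", offsets)
recovered = undo_passes(masked, offsets)
```

This corresponds to the variant in which several different processing modes are applied sequentially to the same part of the data; applying each mode to a different part would require tracking segment boundaries and is not shown.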
It should be noted that the recoverable second desensitization processing mainly modifies a part of the sensitive data in the log file to be processed according to the target comparison table corresponding to the field type, so that the modified part can be restored according to the corresponding target comparison table to obtain the original sensitive data.
The sensitive data in a log file to be processed that contains data of two or more different field types is itself of two or more different data types, so its data structure is complex and easily forgotten even by the user. When someone other than the user obtains the sensitive data after the second desensitization processing, the original sensitive data cannot be easily deduced because of this complex data structure, so the risk of leaking the user's personal sensitive information is reduced. In addition, because the second desensitization processing performed on the log file to be processed is recoverable, the user can be helped to retrieve forgotten personal sensitive information, which improves the user experience.
In this embodiment, the desensitization processing (including the first desensitization processing and the second desensitization processing) processes only a part of the sensitive data in the log file to be processed, and does not change the original characteristics of the sensitive data, its original association with other data, or its original rules. For example, when the sensitive data is a bank card number, the first four digits indicate the bank name, so the desensitization processing leaves the first four digits unprocessed; for the convenience of user confirmation, it also leaves the last four digits unprocessed so that the original data is kept. When the sensitive data is an email address, the data after the "@" is displayed address information, so the desensitization processing leaves the "@" and the data after it unprocessed; for the convenience of user confirmation, it also leaves the first preset number of characters of the email address unprocessed so that the original data is kept. The preset number is determined according to the data length of the sensitive data: if the data length of the email address is greater than 5, the preset number can be set to 1 or 2, and if the data length is greater than 8, the preset number can be set to 3 or 4. Any value is acceptable as long as the number of characters subjected to desensitization processing is not less than the preset number, so the specific value of the preset number is not limited here.
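The partial processing described above can be illustrated with the following sketch, which keeps the first and last four digits of a bank card number and keeps the leading characters and the "@" part of an email address. Several details are assumptions of the sketch: the data length is measured on the part before the "@", the smaller of the two allowed preset numbers is chosen, one character is kept for short values, and "*" is used as the replacement character. The sample values are made up.

```python
def mask_span(text, start, end, mask_char="*"):
    """Replace text[start:end] with mask characters, keeping the length."""
    return text[:start] + mask_char * (end - start) + text[end:]


def mask_bank_card(card_number):
    """Keep the first four and last four digits; mask everything between."""
    if len(card_number) <= 8:
        return card_number  # nothing lies between the kept prefix and suffix
    return mask_span(card_number, 4, len(card_number) - 4)


def preset_keep_count(length):
    """Pick the preset number from the data length (thresholds from the text)."""
    if length > 8:
        return 3   # the text allows 3 or 4
    if length > 5:
        return 1   # the text allows 1 or 2
    return 1       # assumption: short values keep a single character


def mask_email(address):
    """Keep the first preset characters, the "@", and everything after it."""
    local, sep, domain = address.partition("@")
    if not sep:
        return address
    keep = min(preset_keep_count(len(local)), len(local))
    return local[:keep] + "*" * (len(local) - keep) + sep + domain


print(mask_bank_card("6222021234567890"))     # 6222********7890
print(mask_email("zhangsan123@example.com"))  # zha********@example.com
```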
Step 209, obtaining the encryption public key of the target object, and encrypting the target comparison table with the encryption public key to generate an encrypted comparison table.
In this embodiment, when the desensitized target log file needs to be transmitted to the target object, the encryption public key of the target object is obtained and the target comparison table to be transmitted with the target log file is encrypted with that public key. This improves security during transmission: it prevents the target log file and the target comparison table from both leaking in transit, in which case the desensitized sensitive data could be restored directly from the leaked target log file and target comparison table.
Step 210, sending the encrypted comparison table and the desensitized target log file to the target object.
After the target object receives the encrypted comparison table and the target log file, the target object can decrypt the encrypted comparison table with its own private key to obtain the target comparison table corresponding to the sensitive data in the desensitized target log file, and can then restore the desensitized sensitive data according to the target comparison table.
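The public-key round trip of steps 209 and 210 can be illustrated with a deliberately simplified, toy example. The key below is textbook RSA built from the tiny primes 61 and 53 and encrypts one byte at a time; it is insecure and serves only to show that the table encrypted with the public key is recovered with the private key. The embodiment does not specify an algorithm, and a real system would rely on a vetted cryptographic library. All names and the table contents are hypothetical.

```python
import json

# Toy RSA key pair (insecure, illustration only).
N = 61 * 53            # modulus 3233
E = 17                 # public exponent
D = 2753               # private exponent: 17 * 2753 = 1 (mod 3120)
PUBLIC_KEY = (E, N)
PRIVATE_KEY = (D, N)


def encrypt_table(table, public_key):
    """Serialize the comparison table and encrypt it byte by byte."""
    e, n = public_key
    plaintext = json.dumps(table, sort_keys=True).encode("utf-8")
    return [pow(byte, e, n) for byte in plaintext]


def decrypt_table(ciphertext, private_key):
    """Decrypt each block with the private key and rebuild the table."""
    d, n = private_key
    plaintext = bytes(pow(block, d, n) for block in ciphertext)
    return json.loads(plaintext.decode("utf-8"))


table = {"0": "5", "1": "7", "2": "0"}   # hypothetical comparison table
encrypted = encrypt_table(table, PUBLIC_KEY)
recovered = decrypt_table(encrypted, PRIVATE_KEY)
print(recovered == table)
```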
As an alternative embodiment, please refer to fig. 3, which is a schematic flowchart of a training method of a privacy leakage risk assessment model according to an embodiment of the present invention. As shown in fig. 3, the training method of the privacy leakage risk assessment model according to this embodiment includes steps 301 to 304:
Step 301, obtaining a plurality of log files to be trained that have not undergone desensitization processing.
In this embodiment, the log file to be trained is similar to the log file to be processed provided in the above embodiments, and the field types it contains also include at least one of a Chinese type, an English type, and a character type.
Step 302, determining the risk level of the log file to be trained containing one field type as a first risk level, and determining the risk level of the log file to be trained containing two or more different field types as a second risk level.
Step 303, taking a plurality of log files to be trained and the risk level corresponding to each log file to be trained as training sample data, and training the privacy leakage risk assessment model to be trained.
Step 304, optimizing model parameters based on the target loss function until the privacy leakage risk assessment model converges, to obtain the trained privacy leakage risk assessment model.
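The labeling rule of step 302 can be sketched as follows. How a field type is detected is not specified in this passage, so the character-range checks below are assumptions of the sketch: CJK unified ideographs count as the Chinese type, ASCII letters as the English type, and any other non-whitespace character (digits, punctuation) as the character type. The numeric constants are likewise hypothetical labels.

```python
FIRST_RISK_LEVEL = 1    # hypothetical label for the first (higher) risk level
SECOND_RISK_LEVEL = 2   # hypothetical label for the second (lower) risk level


def field_types(text):
    """Return the set of coarse field types that appear in the text."""
    types = set()
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            types.add("chinese")
        elif ch.isascii() and ch.isalpha():
            types.add("english")
        elif not ch.isspace():
            types.add("character")
    return types


def label_risk_level(log_text):
    """One field type -> first risk level; two or more -> second risk level."""
    if len(field_types(log_text)) <= 1:
        return FIRST_RISK_LEVEL
    return SECOND_RISK_LEVEL


samples = ["13812345678", "张三 zhangsan@example.com"]  # made-up log contents
training_data = [(text, label_risk_level(text)) for text in samples]
print(training_data)
```

The resulting pairs of log content and risk level correspond to the training sample data used in step 303.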
In this embodiment, the target loss function may be an L1 function, or may be another loss function that can improve the model accuracy.
In one embodiment, a preset number of training iterations, such as 10000, may be set. Once the privacy leakage risk assessment model has been trained 10000 times, the model can be judged to have converged, and the trained privacy leakage risk assessment model is obtained.
In another embodiment, a preset recognition accuracy, for example 90%, may be set. When the privacy leakage risk assessment model is verified and its recognition accuracy reaches 90% or more, the model can be judged to have converged, and the trained privacy leakage risk assessment model is obtained.
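The L1 loss and the two convergence criteria can be sketched as follows. The embodiments present the iteration limit and the accuracy threshold as alternatives; combining them in one loop is an assumption of this sketch, as are the function names and the stand-in training and validation callbacks.

```python
def l1_loss(predictions, targets):
    """Mean absolute error between predicted and labeled values."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def train_until_converged(train_step, validate, max_steps=10000, target_accuracy=0.90):
    """Run train_step() until validate() reaches the target or steps run out.

    Returns the number of steps taken and which criterion stopped training.
    """
    for step in range(1, max_steps + 1):
        train_step()
        if validate() >= target_accuracy:
            return step, "accuracy"
    return max_steps, "max_steps"


# Stand-in model state: each step classifies 10 more of 100 samples correctly.
state = {"correct": 50}


def fake_train_step():
    state["correct"] = min(100, state["correct"] + 10)


def fake_validate():
    return state["correct"] / 100


result = train_until_converged(fake_train_step, fake_validate)
print(result)
```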
In summary, an embodiment of the present invention provides a data processing method, which includes obtaining a log file to be processed, calling a trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed, obtaining a risk level of the log file to be processed, determining a target desensitization policy of the log file to be processed based on the risk level, and performing desensitization processing on the log file to be processed according to the target desensitization policy, so as to obtain a target log file after desensitization processing. By adopting the embodiment of the invention, the desensitization treatment can be carried out on the log files to be treated with different privacy leakage risk levels by adopting different target desensitization strategies, and the technical problem that the privacy leakage easily occurs to the log files is effectively solved.
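The overall flow summarized above can be sketched end to end as follows. Everything concrete here is an assumption of the sketch: the trained model is replaced by a simple rule that counts field types, any run of 11 digits is treated as sensitive data, the first policy masks with "*" (unrecoverable), and the second policy substitutes digits through a hypothetical comparison table (recoverable).

```python
import re

SENSITIVE = re.compile(r"[0-9]{11}")  # hypothetical: any 11-digit run is sensitive
DIGIT_TABLE = dict(zip("0123456789", "5709138642"))  # hypothetical comparison table


def assess_risk(log_text):
    """Stand-in for the trained model: count the coarse field types present."""
    has_chinese = any("\u4e00" <= ch <= "\u9fff" for ch in log_text)
    has_english = any(ch.isascii() and ch.isalpha() for ch in log_text)
    has_other = any(ch.isdigit() or not (ch.isalnum() or ch.isspace()) for ch in log_text)
    type_count = sum([has_chinese, has_english, has_other])
    return "first" if type_count <= 1 else "second"


def select_policy(risk_level):
    """First risk level -> unrecoverable policy; second -> table-based policy."""
    if risk_level == "first":
        return {"name": "first", "table": None}
    return {"name": "second", "table": DIGIT_TABLE}


def apply_policy(log_text, policy):
    """Desensitize the middle four digits of every sensitive value."""
    def mask(match):
        value = match.group(0)
        middle = value[3:7]
        if policy["table"] is None:
            middle = "*" * len(middle)
        else:
            middle = "".join(policy["table"][d] for d in middle)
        return value[:3] + middle + value[7:]
    return SENSITIVE.sub(mask, log_text)


def process(log_text):
    """Assess risk, pick the target policy, and return the desensitized log."""
    return apply_policy(log_text, select_policy(assess_risk(log_text)))


print(process("13812345678"))                   # 138****5678
print(process("user=alice phone=13812345678"))  # user=alice phone=13870915678
```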
According to the method described in the foregoing embodiment, the embodiment will be further described from the perspective of a data processing apparatus, which may be specifically implemented as an independent entity, or may be implemented by being integrated in an electronic device, such as a terminal, where the terminal may include a mobile phone, a tablet computer, and the like.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, and as shown in fig. 4, a data processing apparatus 400 according to an embodiment of the present invention includes:
the first obtaining module 401 is configured to obtain a log file to be processed.
In this embodiment, the first obtaining module 401 is specifically configured to: acquiring an encrypted log file sent by a log generator; and decrypting the encrypted log file to obtain a log file to be processed.
The risk evaluation module 402 is configured to invoke the trained privacy leakage risk assessment model to perform risk assessment on the log file to be processed, so as to obtain the risk level of the log file to be processed.
A determining module 403, configured to determine a target desensitization policy of the pending log file based on the risk level.
In this embodiment, the risk levels include a first risk level and a second risk level, the first risk level is greater than the second risk level, the target desensitization policy includes a first desensitization policy and a second desensitization policy, the desensitization processing strength of the first desensitization policy is greater than the desensitization processing strength of the second desensitization policy, and the determining module 403 is specifically configured to: if the risk level is a first risk level, determining that a first desensitization strategy is a target desensitization strategy of the log file to be processed; and if the risk level is a second risk level, acquiring a target comparison table corresponding to the log file to be processed, and taking the second desensitization strategy and the target comparison table as the target desensitization strategy of the log file to be processed.
A desensitization module 404, configured to perform desensitization processing on the log file to be processed according to the target desensitization policy, so as to obtain the desensitized target log file.
In one embodiment, if the risk level is the first risk level, the desensitization module 404 is specifically configured to: perform unrecoverable first desensitization processing on the sensitive data in the log file to be processed according to the first desensitization policy.
In another embodiment, if the risk level is the second risk level, the desensitization module 404 is specifically configured to: determining a weight coefficient corresponding to the sensitive data corresponding to each field type in the log file to be processed; performing recoverable second desensitization processing on corresponding sensitive data for preset times according to the second desensitization strategy, a target comparison table corresponding to each field type in the log file to be processed and a weight coefficient of the sensitive data corresponding to each field type; the weight coefficient is in direct proportion to the preset times, and the preset times are at least one time.
As a preferred embodiment, please refer to fig. 5, fig. 5 is another schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the data processing apparatus 400 further includes:
a model training module 405, configured to obtain a plurality of log files to be trained that have not undergone desensitization processing, where the field types contained in the log files to be trained include at least one of a Chinese type, an English type, and a character type; determine the risk level of a log file to be trained containing one field type as a first risk level, and determine the risk level of a log file to be trained containing two or more different field types as a second risk level; train the privacy leakage risk assessment model to be trained, using the plurality of log files to be trained and the risk level corresponding to each log file to be trained as training sample data; and optimize model parameters based on a target loss function until the privacy leakage risk assessment model converges, to obtain the trained privacy leakage risk assessment model.
The second obtaining module 406 is configured to obtain the encryption public key of the target object, and encrypt the target comparison table with the encryption public key to generate an encrypted comparison table.
A sending module 407, configured to send the encrypted comparison table and the desensitized target log file to the target object.
In a specific implementation, each of the modules and/or units may be implemented as an independent entity, or may be implemented as one or several entities by any combination, where the specific implementation of each of the modules and/or units may refer to the foregoing method embodiment, and specific achievable beneficial effects also refer to the beneficial effects in the foregoing method embodiment, which are not described herein again.
In addition, referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device may be a mobile terminal such as a smart phone or a tablet computer. As shown in fig. 6, the electronic device 600 includes a processor 601 and a memory 602. The processor 601 is electrically connected to the memory 602.
The processor 601 is a control center of the electronic device 600, connects various parts of the whole electronic device using various interfaces and lines, and performs various functions of the electronic device 600 and processes data by running or loading an application stored in the memory 602 and calling data stored in the memory 602, thereby performing overall monitoring of the electronic device 600.
In this embodiment, the processor 601 in the electronic device 600 loads instructions corresponding to processes of one or more application programs into the memory 602, and the processor 601 executes the application programs stored in the memory 602, thereby implementing various functions.
The electronic device 600 may implement the steps in any embodiment of the data processing method provided in the embodiment of the present invention, and therefore, beneficial effects that can be achieved by any data processing method provided in the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
Referring to fig. 7, fig. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present invention. Fig. 7 shows a specific structural block diagram of the electronic device, which may be used to implement the data processing method provided in the foregoing embodiments. The electronic device 700 may be a mobile terminal such as a smart phone or a notebook computer.
The RF circuit 710 is used for receiving and transmitting electromagnetic waves, and performing interconversion between the electromagnetic waves and electrical signals, thereby communicating with a communication network or other devices. The RF circuit 710 may include various existing circuit elements for performing these functions, such as antennas, radio frequency transceivers, digital signal processors, encryption/decryption chips, Subscriber Identity Module (SIM) cards, memory, and so forth. The RF circuit 710 may communicate with various networks such as the internet, an intranet, or a wireless network, or with other devices over a wireless network. The wireless network may comprise a cellular telephone network, a wireless local area network, or a metropolitan area network. The wireless network may use various communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communication (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (WCDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Wireless Fidelity (Wi-Fi) (e.g., Institute of Electrical and Electronics Engineers standards IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n), Voice over Internet Protocol (VoIP), Worldwide Interoperability for Microwave Access (Wi-Max), other protocols for e-mail, instant messaging, and short messages, and any other suitable communication protocol, even including protocols that have not yet been developed.
The memory 720 may be used to store software programs and modules, such as program instructions/modules corresponding to the data processing method in the above-described embodiments, and the processor 780 may execute various functional applications and data processing by operating the software programs and modules stored in the memory 720.
The memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 720 may further include memory located remotely from processor 780, which may be connected to electronic device 700 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input unit 730 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 730 may include a touch-sensitive surface 731 as well as other input devices 732. Touch-sensitive surface 731, also referred to as a touch display screen or touch pad, can collect touch operations by a user (e.g., operations by a user using a finger, a stylus, or any other suitable object or attachment to touch-sensitive surface 731 or near touch-sensitive surface 731) on or near touch-sensitive surface 731, and drive corresponding connection devices according to a predetermined program. Alternatively, the touch sensitive surface 731 may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts it to touch point coordinates, and sends the touch point coordinates to the processor 780, and can receive and execute commands from the processor 780. In addition, the touch-sensitive surface 731 can be implemented in a variety of types, including resistive, capacitive, infrared, and surface acoustic wave. The input unit 730 may also include other input devices 732 in addition to the touch-sensitive surface 731. In particular, other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 740 may be used to display information input by or provided to the user and various graphical user interfaces of the electronic device 700, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit 740 may include a Display panel 741, and optionally, the Display panel 741 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, touch-sensitive surface 731 can overlay display panel 741, such that when touch-sensitive surface 731 detects a touch event thereon or nearby, processor 780 can determine the type of touch event, and processor 780 can then provide a corresponding visual output on display panel 741 based on the type of touch event. Although in the figure the touch-sensitive surface 731 and the display panel 741 are shown as two separate components to implement input and output functions, in some embodiments the touch-sensitive surface 731 and the display panel 741 may be integrated to implement input and output functions.
The electronic device 700 may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 741 according to the brightness of ambient light, and a proximity sensor that may generate an interrupt when the flip cover is closed. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications that recognize the posture of the mobile phone (such as horizontal and vertical screen switching, related games, and magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like. Other sensors that may be configured on the electronic device 700, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further here.
The audio circuit 760, speaker 761, and microphone 762 may provide an audio interface between a user and the electronic device 700. The audio circuit 760 can convert received audio data into an electrical signal and transmit it to the speaker 761, which converts the electrical signal into a sound signal for output; conversely, the microphone 762 converts a collected sound signal into an electrical signal, which the audio circuit 760 receives and converts into audio data. The audio data is then output to the processor 780 for processing and sent via the RF circuit 710 to, for example, another terminal, or output to the memory 720 for further processing. The audio circuit 760 may also include an earphone jack to allow a peripheral headset to communicate with the electronic device 700.
Through the transmission module 770 (e.g., a Wi-Fi module), the electronic device 700 can help the user receive requests, send information, and so on, and provides the user with wireless broadband internet access. Although the transmission module 770 is illustrated in the drawings, it is understood that it is not an essential part of the electronic device 700 and may be omitted as needed without changing the essence of the invention.
The processor 780 is a control center of the electronic device 700, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device 700 and processes data by running or executing software programs and/or modules stored in the memory 720 and calling data stored in the memory 720, thereby monitoring the electronic device as a whole. Optionally, processor 780 may include one or more processing cores; in some embodiments, processor 780 may integrate an application processor that primarily handles the operating system, user interface, applications, etc. and a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 780.
The electronic device 700 also includes a power supply 790 (e.g., a battery) that provides power to various components, and in some embodiments may be logically coupled to the processor 780 via a power management system that may perform functions such as managing charging, discharging, and power consumption. Power source 790 may also include any component including one or more DC or AC power sources, a recharging system, power failure detection circuitry, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device 700 further includes a camera (e.g., a front camera, a rear camera), a bluetooth module, and the like, which are not described in detail herein. In this embodiment, the display unit of the electronic device is a touch screen display, and the mobile terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing operations.
In specific implementation, the above modules may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and specific implementation of the above modules may refer to the foregoing method embodiments, which are not described herein again.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor. To this end, the embodiment of the present invention provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps of any embodiment of the data processing method provided by the embodiment of the present invention.
The storage medium may include: Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any embodiment of the data processing method provided in the embodiment of the present invention, the beneficial effects that can be achieved by any data processing method provided in the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing detailed description has provided a data processing method, apparatus, device, and computer-readable storage medium according to embodiments of the present application, and specific examples have been applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application. Moreover, it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention, and such modifications and adaptations are intended to be within the scope of the invention.

Claims (10)

1. A data processing method, comprising:
acquiring a log file to be processed;
calling a trained privacy leakage risk assessment model to carry out risk assessment on the log file to be processed to obtain the risk level of the log file to be processed;
determining a target desensitization policy for the pending log file based on the risk level;
and desensitizing the log file to be processed according to the target desensitization strategy to obtain the desensitized target log file.
2. The data processing method of claim 1, wherein the risk levels include a first risk level and a second risk level, the first risk level is greater than the second risk level, the target desensitization policy includes a first desensitization policy and a second desensitization policy, a desensitization processing intensity of the first desensitization policy is greater than a desensitization processing intensity of the second desensitization policy, and the determining the target desensitization policy for the pending log file based on the risk levels includes:
if the risk level is a first risk level, determining that a first desensitization strategy is a target desensitization strategy of the log file to be processed;
and if the risk level is a second risk level, acquiring a target comparison table corresponding to the log file to be processed, and taking the second desensitization strategy and the target comparison table as the target desensitization strategy of the log file to be processed.
3. The data processing method of claim 2, wherein prior to the step of invoking the trained privacy leakage risk assessment model to perform risk assessment on the pending log file, the data processing method further comprises:
acquiring a plurality of log files to be trained which are not subjected to desensitization treatment, wherein the field types contained in the log files to be trained comprise at least one of Chinese types, English types and character types;
determining the risk grade of the log file to be trained containing one field type as a first risk grade, and determining the risk grade of the log file to be trained containing two or more different field types as a second risk grade;
taking a plurality of log files to be trained and the risk level corresponding to each log file to be trained as training sample data, and training the privacy leakage risk assessment model to be trained;
and optimizing model parameters based on a target loss function until the privacy leakage risk assessment model is converged to obtain a trained privacy leakage risk assessment model.
4. The data processing method of claim 3, wherein if the risk level is a first risk level, performing desensitization processing on the log file to be processed according to the target desensitization policy comprises:
and according to the first desensitization strategy, performing unrecoverable first desensitization treatment on the sensitive data in the log file to be processed.
5. The data processing method of claim 3, wherein if the risk level is a second risk level, performing desensitization processing on the log file to be processed according to the target desensitization policy comprises:
determining a weight coefficient corresponding to the sensitive data corresponding to each field type in the log file to be processed;
performing recoverable second desensitization processing on corresponding sensitive data for preset times according to the second desensitization strategy, a target comparison table corresponding to each field type in the log file to be processed and a weight coefficient of the sensitive data corresponding to each field type; the weight coefficient is in direct proportion to the preset times, and the preset times are at least one time.
6. The data processing method according to claim 5, wherein after the step of performing desensitization processing on the log file to be processed according to the target desensitization policy, the data processing method further comprises:
acquiring an encryption public key of a target object, and encrypting the target comparison table through the encryption public key to generate an encrypted comparison table;
and sending the encrypted comparison table and the desensitized target log file to the target object.
7. The data processing method of claim 1, wherein the obtaining the log file to be processed comprises:
acquiring an encrypted log file sent by a log generator;
and decrypting the encrypted log file to obtain a log file to be processed.
8. A data processing apparatus, characterized by comprising:
the first acquisition module is used for acquiring a log file to be processed;
the risk evaluation module is used for calling the trained privacy leakage risk evaluation model to carry out risk evaluation on the log file to be processed to obtain the risk level of the log file to be processed;
a determination module for determining a target desensitization policy for the pending log file based on the risk level;
and the desensitization module is used for performing desensitization treatment on the log file to be processed according to the target desensitization strategy to obtain the desensitized target log file.
9. An electronic device, characterized in that the electronic device comprises a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the steps in the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps in the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211006362.2A CN115630389A (en) 2022-08-22 2022-08-22 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211006362.2A CN115630389A (en) 2022-08-22 2022-08-22 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115630389A true CN115630389A (en) 2023-01-20

Family

ID=84902127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211006362.2A Pending CN115630389A (en) 2022-08-22 2022-08-22 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115630389A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117313133A (en) * 2023-10-20 2023-12-29 网麒科技(北京)有限责任公司 Data desensitization method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105900466B (en) Message processing method and device
US10530921B2 (en) Method for unlocking terminal screen and terminal
CN106658489B (en) Terminal application processing method and device and mobile terminal
CN104852885B (en) Method, device and system for verifying verification code
CN107885825A (en) A kind of five application page sharing method and mobile terminal
CN104901805B (en) A kind of identification authentication methods, devices and systems
CN106709347B (en) Using the method and device of operation
CN109145552B (en) Information encryption method and terminal equipment
CN105912919B (en) A kind of unlocked by fingerprint method and terminal
CN108475304B (en) Method and device for associating application program and biological characteristics and mobile terminal
CN110457888B (en) Verification code input method and device, electronic equipment and storage medium
CN106255102B (en) Terminal equipment identification method and related equipment
CN106534093B (en) A kind of processing method of terminal data, apparatus and system
WO2014183370A1 (en) Systems and methods for user login
CN110149628B (en) Information processing method and terminal equipment
CN109544172B (en) Display method and terminal equipment
CN107590397A (en) A kind of method and apparatus for showing embedded webpage
CN108491713B (en) Safety reminding method and electronic equipment
CN103455751B (en) Password hint generation method, device and terminal equipment
CN109271779A (en) A kind of installation packet inspection method, terminal device and server
CN108287738A (en) A kind of application control method and device
CN109446794B (en) Password input method and mobile terminal thereof
CN115630389A (en) Data processing method, device, equipment and storage medium
CN109992939B (en) Login method and terminal equipment
CN109657469B (en) Script detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination