CN117688906A - Log desensitization method and device, electronic equipment and nonvolatile storage medium

Info

Publication number
CN117688906A
Authority
CN
China
Prior art keywords
character
data
log
character string
string data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311631522.7A
Other languages
Chinese (zh)
Inventor
冯攀峰
缪镠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202311631522.7A
Publication of CN117688906A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a log desensitization method and device, an electronic device, and a non-volatile storage medium. The method comprises: acquiring log data in a target service system; traversing the character string data in the log data and, starting from a start character, judging in sequence whether each character after the start character satisfies a first matching rule; when every character between the start character and a termination character satisfies the first matching rule, determining the coordinates to be desensitized in the log data from a first position of the start character and a second position of the termination character in the log data; and desensitizing the data at each coordinate to be desensitized to obtain desensitized log data. The application addresses the technical problem in the related art of low log desensitization efficiency caused by regular-expression matching based on a non-deterministic finite automaton (NFA).

Description

Log desensitization method and device, electronic equipment and nonvolatile storage medium
Technical Field
The present disclosure relates to the field of log data processing technologies, and in particular, to a log desensitizing method, a device, an electronic device, and a nonvolatile storage medium.
Background
During operation, a business system may run processing flows that involve personal sensitive information such as names and mobile phone numbers, and the logs of a transaction system may contain that information (names, mobile phone numbers, bank card numbers, and so on) in exact form. If it leaks, the loss to users can be severe. As the protection of personal information receives growing attention, desensitizing log data has become a priority for enterprises.
However, in the related art, log desensitization performs regular-expression matching with a non-deterministic finite automaton (NFA), which makes log desensitization inefficient.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the present application provide a log desensitization method and device, an electronic device, and a non-volatile storage medium, which at least solve the technical problem in the related art of low log desensitization efficiency caused by performing regular-expression matching with a non-deterministic finite automaton (NFA) during log desensitization.
According to one aspect of the embodiments of the present application, there is provided a log desensitizing method, including: acquiring log data in a target service system, wherein the log data comprises a plurality of character string data; traversing the character string data, determining characters which are not preset special characters in the character string data as initial characters, and sequentially judging whether each character positioned behind the initial characters in the character string data meets a first matching rule from the initial characters, wherein the first matching rule is used for judging whether data formed by each character is sensitive data; under the condition that each character positioned between a starting character and a stopping character in the character string data meets a first matching rule, determining a coordinate to be desensitized in the log data according to a first position of the starting character in the log data and a second position of the stopping character in the log data, wherein the stopping character is a first target character positioned behind the starting character in the character string data, and the next character of the target character is a preset special character; and carrying out desensitization treatment on the data corresponding to each coordinate to be desensitized in the log data to obtain the desensitized log data.
Optionally, sequentially determining whether each character of the character string data located after the start character satisfies the first matching rule includes: determining first matching rules corresponding to the log data, wherein each first matching rule corresponds to one sensitive data type, and the sensitive data type comprises at least one of the following: name, mobile phone number, address, bank card number, identification card number, mailbox; and respectively calling a first matching thread corresponding to each first matching rule to match the log data, and judging whether sensitive data of each sensitive data type exists in the log data.
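As a minimal sketch of the "one matching thread per matching rule" idea above, the following Python snippet runs one matcher per sensitive data type on a thread pool and collects the coordinates per type. The matcher `digit_runs`, the type names, and the lengths 11 and 16 are illustrative assumptions, not rules taken from the application; in CPython, threads mainly help when matchers release the interpreter lock, so this shows the structure rather than a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor

DIGITS = "0123456789"

def digit_runs(log, length):
    """Hypothetical matcher: inclusive (start, end) spans of digit runs of exactly `length`."""
    spans, i, n = [], 0, len(log)
    while i < n:
        if log[i] not in DIGITS:
            i += 1
            continue
        j = i
        while j < n and log[j] in DIGITS:
            j += 1
        if j - i == length:
            spans.append((i, j - 1))
        i = j
    return spans

# One matcher per sensitive data type (names and lengths are assumptions).
MATCHERS = {
    "mobile": lambda log: digit_runs(log, 11),
    "bank_card": lambda log: digit_runs(log, 16),
}

def match_all(log, matchers=MATCHERS):
    """Run every matcher on its own thread and collect coordinates per type."""
    with ThreadPoolExecutor(max_workers=len(matchers)) as pool:
        futures = {name: pool.submit(fn, log) for name, fn in matchers.items()}
        return {name: future.result() for name, future in futures.items()}

if __name__ == "__main__":
    print(match_all("tel=15911111111;card=6222000011112222"))
```

Keeping the results keyed by type lets a later step choose a different desensitization mode for each sensitive data type.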
Optionally, determining whether sensitive data of each sensitive data type exists in the log data includes: determining a character in the character string data that is not a preset special character as the start character, and recording a first position of the start character in the log data, wherein the preset special characters comprise at least one of the following: comma, colon, double quotation mark, brackets, middle brackets, stop signs, semicolon; starting from the start character, judging in sequence whether each character in the character string data satisfies the first matching rule, and, when a character does not satisfy the first matching rule, re-determining the next start character in the character string data; when a character satisfies the first matching rule, judging whether the character immediately following it is a preset special character, and, when that following character is not a preset special character, continuing to judge whether the following character satisfies the first matching rule; and, when the following character is a preset special character, determining the current character as the termination character and recording a second position of the termination character in the log data.
Optionally, after determining the current character as the termination character, the method further comprises: determining a character length of a character string located between the start character and the end character; under the condition that the character length is not greater than the data length of the data of the sensitive data type corresponding to the first matching rule, determining the character string between the initial character and the termination character as sensitive data, and determining the coordinate to be desensitized according to the first position and the second position; and when the character length is larger than the data length, the next initial character in the character string data is redetermined for matching.
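The traversal and length check described in the two preceding paragraphs can be sketched in Python as follows. This is one reading under stated assumptions, not the applicant's implementation: `SPECIAL` is an assumed subset of the preset special characters, `phone_rule` is a hypothetical per-position rule for an 11-digit mobile number, a mismatch skips the rest of the current token (one interpretation of "re-determining the next start character"), and the end of the string is treated like a special character. The text only requires that the length not exceed the type's data length; the extra `min_len` bound is an assumption so that short digit runs are not reported.

```python
# Assumed subset of the preset special characters.
SPECIAL = set(',:"()[];')
DIGITS = "0123456789"

def phone_rule(offset, ch):
    """Hypothetical first matching rule: is `ch` valid at `offset` of a mobile number?"""
    if offset == 0:
        return ch == "1"
    if offset == 1:
        return ch in "3456789"
    return offset < 11 and ch in DIGITS

def find_coordinates(log, rule, min_len, max_len):
    """Single pass over `log`; returns inclusive (first_position, second_position) pairs."""
    coords = []
    i, n = 0, len(log)
    while i < n:
        if log[i] in SPECIAL:
            i += 1
            continue
        start = i          # start character: first non-special character of a token
        matched = True
        while i < n and log[i] not in SPECIAL:
            if matched and not rule(i - start, log[i]):
                matched = False   # assumption: give up on the rest of this token
            i += 1
        end = i - 1        # termination character: the next character is special
        if matched and min_len <= end - start + 1 <= max_len:
            coords.append((start, end))
    return coords

if __name__ == "__main__":
    log = '{"phone":"15911111111","note":"15911111111abc1533517"}'
    print(find_coordinates(log, phone_rule, 11, 11))
```

In this sketch, the continuous string `15911111111abc1533517` yields no coordinate, because the character `a` at offset 11 fails the rule before a special character is reached.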
Optionally, the desensitizing processing for the data corresponding to each coordinate to be desensitized in the log data includes: replacing characters of coordinates to be desensitized in the log data with a preset desensitization identifier; or encrypting the characters of the coordinates to be desensitized in the log data to obtain corresponding encrypted characters, and replacing the corresponding characters of the coordinates to be desensitized with the encrypted characters.
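A minimal Python sketch of coordinate-based desensitization follows. It assumes inclusive, non-overlapping `(start, end)` coordinates; the function names are hypothetical. The application does not name an encryption algorithm, so `keyed_digest` uses a truncated HMAC-SHA256 digest purely as a stand-in: it is a one-way keyed hash, not reversible encryption, and a real deployment would substitute the chosen cipher.

```python
import hashlib
import hmac

def desensitize(log, coords, transform):
    """Replace each inclusive (start, end) span of `log` with transform(span_text)."""
    pieces, prev = [], 0
    for start, end in sorted(coords):
        pieces.append(log[prev:start])
        pieces.append(transform(log[start:end + 1]))
        prev = end + 1
    pieces.append(log[prev:])
    return "".join(pieces)

def mask(text, marker="*"):
    """Mode 1: replace every character with a preset desensitization identifier."""
    return marker * len(text)

def keyed_digest(text, key=b"demo-key"):
    """Mode 2 stand-in (assumption): keyed digest in place of an unspecified cipher."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

if __name__ == "__main__":
    log = "phone:15911111111,name:x"
    print(desensitize(log, [(6, 16)], mask))
    print(desensitize(log, [(6, 16)], keyed_digest))
```

Because the replacement is driven only by coordinates, the log is not scanned again at this stage.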
Optionally, the character string data includes: the system comprises single data, combined data and object data, wherein the single data is character string data only comprising one service attribute data, the combined data is character string data comprising a plurality of service attribute data, and the object data is character string data meeting a preset format; after obtaining the log data in the target service system, the method further comprises: determining judgment sub-conditions corresponding to the log data, and determining logic operators among the judgment sub-conditions, wherein the judgment sub-conditions are used for judging whether the character string data has possibility of containing sensitive data or not, and the logic operators comprise at least one of the following: or operation and AND operation; determining a target preprocessing rule according to the judging sub-condition and a logic operator corresponding to the judging sub-condition, and screening the character string data in the log data according to the target preprocessing rule, wherein the target preprocessing rule is used for filtering the character string data which does not contain sensitive data in the log data.
Optionally, filtering the character string data in the log data according to the target preprocessing rule includes: judging whether the first character of the character string data is a first preset character or not, and reserving the character string data under the condition that the first character is the first preset character, wherein the first preset character is the first character in a preset format of the object data; judging whether the data length of the character string data is larger than a first length threshold value or not under the condition that the first character of the character string data is not a first preset character, and reserving the character string data under the condition that the data length is larger than the first length threshold value; judging whether the character string data meets any one of target conditions under the condition that the data length is not larger than a first length threshold, wherein the target conditions comprise: the character string data starts with Chinese characters, the character string data starts with numbers, the data length of the character string data is not less than a second length threshold value, the character string data ends with a second preset character, the second length threshold value is smaller than the first length threshold value, and the second preset character is a suffix character of the electronic mailbox; and when the character string data meets any one of the target conditions, reserving the character string data, and when the character string data does not meet all of the target conditions, screening out the character string data.
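The screening order above can be sketched as a single predicate in Python. All concrete values are assumptions chosen for illustration: `{` as the first preset character of the object format, 64 and 11 as the first and second length thresholds, `.com` and `.cn` as mailbox suffixes, and the CJK range U+4E00 to U+9FFF as "Chinese characters". Treating an empty string as filtered out is also an assumption.

```python
def keep_for_matching(s, first_char="{", first_len=64, second_len=11,
                      mail_suffixes=(".com", ".cn")):
    """Return True if `s` may contain sensitive data and should be kept for matching."""
    if not s:
        return False                      # assumption: nothing to match in an empty string
    if s[0] == first_char:
        return True                       # looks like object data in the preset format
    if len(s) > first_len:
        return True                       # long strings are kept
    starts_with_chinese = "\u4e00" <= s[0] <= "\u9fff"
    starts_with_digit = s[0] in "0123456789"
    # Any one of the target conditions is enough to keep the string.
    return (starts_with_chinese
            or starts_with_digit
            or len(s) >= second_len
            or s.endswith(mail_suffixes))

if __name__ == "__main__":
    for sample in ['{"a":1}', "OK", "13800000000", "a@b.com"]:
        print(sample, keep_for_matching(sample))
```

The cheap checks run first, so most short status strings are discarded without any per-character matching.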
According to another aspect of the embodiments of the present application, there is also provided a log desensitization device, including: a log data acquisition module, configured to acquire log data in a target service system, wherein the log data comprises a plurality of character string data; a sensitive data matching module, configured to traverse the character string data, determine a character in the character string data that is not a preset special character as the start character, and, starting from the start character, judge in sequence whether each character after the start character satisfies a first matching rule, wherein the first matching rule is used for judging whether the data formed by the characters is sensitive data; a sensitive position determining module, configured to determine, when every character between the start character and a termination character in the character string data satisfies the first matching rule, the coordinates to be desensitized in the log data according to a first position of the start character in the log data and a second position of the termination character in the log data, wherein the termination character is the first target character after the start character in the character string data, and the next character of the target character is a preset special character; and a sensitive data desensitization module, configured to desensitize the data corresponding to each coordinate to be desensitized in the log data to obtain desensitized log data.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory and a processor for running a program stored in the memory, wherein the program, when run, executes the log desensitization method.
According to still another aspect of the embodiments of the present application, there is further provided a nonvolatile storage medium, where the nonvolatile storage medium includes a stored computer program, and a device where the nonvolatile storage medium is located executes the log desensitizing method by running the computer program.
In the embodiments of the present application, log data in a target service system is acquired, wherein the log data comprises a plurality of character string data; the character string data is traversed, a character that is not a preset special character is determined as the start character, and, starting from the start character, each character after it is judged in sequence against a first matching rule, wherein the first matching rule is used for judging whether the data formed by the characters is sensitive data; when every character between the start character and a termination character satisfies the first matching rule, the coordinates to be desensitized in the log data are determined from a first position of the start character and a second position of the termination character in the log data, wherein the termination character is the first target character after the start character whose next character is a preset special character; and the data at each coordinate to be desensitized is desensitized to obtain desensitized log data. When transaction log data is desensitized, the data is first preprocessed according to preset rules to filter out data that does not need desensitization; sensitive information in the remaining data is then matched with the SNFA (Simple Non-deterministic Finite Automaton) method and the coordinates of the sensitive information are calculated; finally, an encryption algorithm performs fast encryption-based desensitization at those coordinates. This improves desensitization efficiency and avoids missing sensitive information, and thereby solves the technical problem in the related art of low log desensitization efficiency caused by NFA-based regular-expression matching during log desensitization.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a block diagram of the hardware architecture of a computer terminal (or electronic device) for implementing a method of log desensitization provided in accordance with an embodiment of the present application;
FIG. 2 is a schematic diagram of a method flow for log desensitization provided in accordance with an embodiment of the present application;
FIG. 3 is an NFA state transition diagram of a mobile phone number matching process in the related art according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an SNFA-based log desensitization procedure provided according to an embodiment of the present application;
FIG. 5 is a schematic diagram of SNFA three-party package dependencies provided according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of data preprocessing provided according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of SNFA multithreading rule coordinate aggregation provided in accordance with an embodiment of the present application;
FIG. 8 is a schematic diagram of an SNFA sensitive information matching process according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a log desensitizing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For the convenience of those skilled in the art to better understand the embodiments of the present application, some technical terms or nouns related to the embodiments of the present application will now be explained as follows:
Non-deterministic finite automaton (NFA): a method for implementing regular-expression matching, currently used by most development languages, including Java. It is represented as a five-tuple M = (K, U, F, S, Z), where K is a finite set whose elements are called states; U is a finite alphabet whose elements are input symbols; F is the transition function, with F(k_i, O) = k_j (where k_i, k_j ∈ K and O ∈ U) meaning that reading input character O in state k_i causes a transition to state k_j; S is the initial state in the finite set K; and Z is the final state in the finite set K.
Simple non-deterministic finite automaton (SNFA): based on the characteristics of sensitive data in transaction system logs, it matches by traversing the character string directly, which enables fast matching and accurate desensitization of sensitive information.
Logback: a logging framework commonly used in Java-based web applications.
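To make the NFA five-tuple M = (K, U, F, S, Z) defined above concrete, here is a small Python simulation. It uses the usual textbook form in which F maps a (state, symbol) pair to a set of states and `None` stands for the empty (ε) input; that set-valued form, the function names, and the toy automaton (strings over {a, b} that end in "ab") are assumptions for illustration and do not reproduce the automaton of FIG. 3.

```python
def epsilon_closure(states, F):
    """All states reachable from `states` using only epsilon (None) transitions."""
    stack, seen = list(states), set(states)
    while stack:
        k = stack.pop()
        for nxt in F.get((k, None), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(text, F, S, Z):
    """Simulate the NFA on `text`, tracking the set of currently possible states."""
    current = epsilon_closure({S}, F)
    for ch in text:
        moved = set()
        for k in current:
            moved |= set(F.get((k, ch), ()))
        current = epsilon_closure(moved, F)
    return bool(current & Z)

# Toy automaton: K = {0, 1, 2}, U = {"a", "b"}, S = 0, Z = {2}.
F = {
    (0, "a"): {0, 1},   # non-determinism: stay in 0 or guess that "ab" starts here
    (0, "b"): {0},
    (1, "b"): {2},
}

if __name__ == "__main__":
    for s in ["ab", "aab", "aba"]:
        print(s, accepts(s, F, 0, {2}))
```

Tracking a set of states per input character is what makes general NFA matching more expensive than a single direct pass over the string.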
In the related art, log desensitization is generally performed either synchronously or asynchronously. Synchronous log desensitization suits systems with low concurrency and relaxed response requirements; asynchronous log desensitization suits systems with high concurrency and strict response requirements, such as transaction systems. In both cases the basic process is to receive the log, match sensitive information in it, desensitize the matched results, and finally write the log to a file. The fundamental difference is whether the business thread and the log printing thread are the same thread; the sensitive information matching and log desensitization steps are identical.
Sensitive information matching is the key step of the whole log desensitization process, and its efficiency determines the overall desensitization efficiency. Both desensitization modes in the related art rely on NFA-based regular-expression matching, which is time-consuming and can degrade system performance in high-concurrency scenarios.
Take matching and desensitizing mobile phone numbers as an example. The regular expression for a mobile phone number in the related art is ^1[3456789]\d{9}$, which denotes an 11-character string that begins with the digit 1, whose second character is one of the digits 3, 4, 5, 6, 7, 8, 9, and whose last 9 characters are all digits. The corresponding NFA state flow is shown in FIG. 3, in which a circle represents a state, a double circle represents the end state, and an arrow represents a state transition condition. ε denotes matching the empty string, that is, a transition to another state without matching any character of the input string.
The state flow is divided into three parts, connected by ε transitions. The first part, state nodes S to K1, matches whether the first character is 1; the second part, state nodes K2 to K17, matches whether the second character is one of the digits 3, 4, 5, 6, 7, 8, 9; the third part matches whether the next 9 consecutive characters are digits. When all three conditions are met, or when the input string reaches its end, the flow reaches the end state; otherwise, state transition restarts from the start state.
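For reference, the mobile phone number expression discussed above can be exercised with Python's standard `re` module (used here only for illustration; the text refers to Java's regex engine). The helper name `is_mobile` is hypothetical.

```python
import re

# Same pattern as in the text: 1, then one of 3-9, then nine digits.
MOBILE = re.compile(r"^1[3456789]\d{9}$")

def is_mobile(candidate):
    """True if the whole candidate string is an 11-digit mobile number."""
    return MOBILE.match(candidate) is not None

if __name__ == "__main__":
    for s in ["15911111111", "12911111111", "1591111111"]:
        print(s, is_mobile(s))
```

With the `^` and `$` anchors the pattern validates an isolated field; scanning a whole log line for embedded numbers requires dropping the anchors or splitting the line first.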
The following description will take an input character string 15911111111abc1533517 as an example, with reference to an NFA state transition diagram.
The first character of the character string is 1, which accords with the condition, and the character string is adjusted from a starting state S to a state K1; then judging that the second character is 5 and meets the condition, and transferring the state to K12; then transition to K17 state through E; the string 911111111 then matches the consecutive 9-digit number and the state transitions to K17, which has been matched to the cell phone number. Then, the subsequent character string at the beginning of a is continuously matched from the beginning state according to the previous steps, and the final matching result is that a mobile phone number 15911111111 is matched (it should be noted that, the information such as the mobile phone number and the identity card number mentioned in the embodiment of the present application are imaginary information used for example, and are not actual personal information).
Therefore, in the related art, after matching the sub-strings meeting the conditions by adopting the NFA state flow mode, matching is continued until the strings reach the tail. This approach not only adds invalid matches, but may also match errors. The string 15911111111abc1533517 as described above is a continuous string which cannot be a cell phone number but is matched. The related technology directly adopts NFA regular matching to carry out a large number of invalid matching, so that the matching time consumption is increased, and misjudgment is also possible; even if data preprocessing (e.g., string map mapping or segmentation) is performed to extract valid data, and NFA regular matching is performed, although regular invalid matching is reduced and matching is ensured, the data is traversed once during data preprocessing, which increases matching time consumption and results in low log desensitization efficiency.
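The misjudgment described above is easy to reproduce. The Python sketch below scans the example string with the unanchored form of the pattern and reports a hit inside the continuous string; the helper `is_standalone`, which rejects a hit whose neighbours are letters or digits, is a hypothetical illustration of the boundary check that a plain regex scan lacks, not part of the related art or of the claimed method.

```python
import re

UNANCHORED = re.compile(r"1[3456789]\d{9}")

def regex_hits(text):
    """Inclusive (start, end) spans reported by an unanchored regex scan."""
    return [(m.start(), m.end() - 1) for m in UNANCHORED.finditer(text)]

def is_standalone(text, start, end):
    """Hypothetical check: the span is not glued to letters or digits on either side."""
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) - 1 or not text[end + 1].isalnum()
    return before_ok and after_ok

if __name__ == "__main__":
    text = "15911111111abc1533517"
    for start, end in regex_hits(text):
        print(text[start:end + 1], is_standalone(text, start, end))
```

Here the scan reports the span covering 15911111111, and the boundary check shows why it should not count: the character that follows it is the letter a rather than a delimiter.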
In order to solve the above-mentioned problems, related solutions are provided in the embodiments of the present application, and the following detailed description is provided.
In accordance with the embodiments of the present application, a method embodiment of log desensitization is provided, it being noted that the steps shown in the flowcharts of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is shown in the flowcharts, in some cases, the steps shown or described may be performed in an order other than that shown or described herein.
The method embodiments provided by the embodiments of the present application may be performed in a mobile terminal, a computer terminal, or similar computing device. Fig. 1 shows a block diagram of a hardware structure of a computer terminal (or electronic device) for implementing a log desensitization method. As shown in fig. 1, the computer terminal 10 (or electronic device) may include one or more processors 102 (shown as 102a, 102b, … …,102 n) which may include, but are not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA, a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the method may further include: a display, an input/output interface (I/O interface), a Universal Serial BUS (USB) port (which may be included as one of the ports of the BUS), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits described above may be referred to generally herein as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any other combination. Furthermore, the data processing circuitry may be a single stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computer terminal 10 (or electronic device). As referred to in the embodiments of the present application, the data processing circuit acts as a processor control (e.g., selection of the path of the variable resistor termination to interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the log desensitizing method in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the log desensitizing method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or electronic device).
In the above operating environment, the embodiment of the present application provides a log desensitizing method, and fig. 2 is a schematic diagram of a log desensitizing method provided according to the embodiment of the present application, as shown in fig. 2, where the method includes the following steps:
step S202, acquiring log data in a target service system, wherein the log data comprises a plurality of character string data;
step S204, traversing the character string data, determining that characters which are not preset special characters in the character string data are initial characters, and sequentially judging whether each character positioned behind the initial characters in the character string data meets a first matching rule from the initial characters, wherein the first matching rule is used for judging whether data formed by each character is sensitive data or not;
step S206, under the condition that each character between the initial character and the termination character in the character string data meets a first matching rule, determining the coordinate to be desensitized in the log data according to the first position of the initial character in the log data and the second position of the termination character in the log data, wherein the termination character is the first target character positioned behind the initial character in the character string data, and the next character of the target character is a preset special character;
step S208, performing desensitization processing on the data corresponding to each coordinate to be desensitized in the log data to obtain the desensitized log data.
Through the above steps, the scheme of the present application addresses the low efficiency of NFA-based regular-expression matching of sensitive information in the log desensitization approach of the related art. Based on the characteristics of sensitive data in transaction scenarios, sensitive information is matched in an SNFA manner based on character string traversal, which reduces invalid matching and obtains all sensitive data coordinates quickly and accurately. A flexible desensitization manner is then applied according to the sensitive information coordinates obtained by the traversal matching. This ensures matching accuracy while improving desensitization efficiency, thereby reducing the performance loss of desensitized log printing in high-concurrency, high-traffic scenarios of the transaction system, and solving the technical problem in the related art that log desensitization is inefficient because regular-expression matching is performed with a non-deterministic finite automaton (NFA).
According to the embodiment of the application, firstly, the service log is collected according to the characteristics of transaction data, then the service log is input into a desensitization thread for data preprocessing, data without desensitization is filtered out, then a SNFA-based matching method is constructed, the sensitive information is matched to obtain coordinate information, and finally data desensitization is carried out according to the coordinate information. The method of log desensitization in steps S202 to S208 in the embodiment of the present application is further described below.
Fig. 4 is a schematic diagram of an SNFA-based log desensitizing procedure according to an embodiment of the present application, as shown in fig. 4, where the procedure includes the following steps:
firstly, receiving log data in a target service system, and transmitting the log data to a desensitization thread to perform candidate data desensitization processing; for example, the log data of the target service system can be obtained by introducing an SNFA software package into the target service system and modifying the configuration file of the system log;
specifically, based on the Spring framework and the Logback logging framework, an SNFA software package is provided for use in the background of the target service system. As shown in fig. 5, in the embodiment of the present application a third-party package is provided for the service party to use, and the asynchronous receiving class A in the SNFA third-party package replaces the data receiving class B in the log configuration file of the application, where class A inherits from the OutputStreamAppender class in the open-source Logback package. In this way a custom SNFA-based class adapts Logback, and the data to be desensitized is passed to the desensitization channel of the SNFA. The acquired log data can thus be desensitized merely by modifying the log configuration, without affecting service functions; an extension interface is also provided so that the service party can extend it independently.
The obtained log data comprises a plurality of character string data which, according to the dimension of the data content, are divided into single data, combined data and object data. The single data is character string data containing only one piece of service attribute data, where the service attribute data may be a sensitive service attribute such as a mobile phone number, a bank card number or a name, or may be a non-sensitive service attribute. For example, the character string "153xxxx1234" represents a mobile phone number and is single data; the character string "20ms" represents the time consumed by an interface and is also single data, although it is not sensitive data.
The combined data is character string data containing a plurality of pieces of service attribute data; it may contain only sensitive information, only non-sensitive information, or a mixture of the two. For example, the character string "wang xx;153xxxx1234;jiangsu xxx" represents combined data containing only 3 pieces of sensitive information: a name, a mobile phone number and an address. The character string "xxxxx;2018-05-09 11:04:00" represents combined data containing only 2 pieces of non-sensitive information: a nickname and an account registration time. The character string "153xxxx1234;a book;2;jiangsu xxx" represents combined data mixing sensitive data (a mobile phone number and an address) with non-sensitive data (the book and the purchase quantity).
The object data is character string data satisfying a preset format; in this embodiment it may be character string data in JSON format. JSON has two structures: objects and arrays. A JSON object is an unordered set of "name/value" pairs. An object starts with "{" (left brace) and ends with "}" (right brace). Each "name" is followed by ":" (colon), and the "name/value" pairs are separated by "," (comma). A JSON array is an ordered collection of values. An array starts with "[" (left square bracket) and ends with "]" (right square bracket), and the values are separated by "," (comma). A value may be a string in double quotation marks, a number, true, false, null, an object, or an array. These structures can be nested.
By way of example, {"userName":"Mitsui","userPhone":"15311111111","itemNO":"1bcd5cc3bd9a4003be7c14f53b503f66","zip code":"221300","goodsName":"package 1","goodsId":"1","price":"10.00","count":"2","tatalcount":20.00} is a JSON object representing merchandise information purchased by the user.
For example, [{"itemNo":"1bcd5cc3bd9a4003be7c14f53b503f66","zip code":"221300","goodsName":"merchandise 1","goodsId":"1","price":"10.00","count":"2","tatalAcount":20.00},{"itemNo":"g458ucdk9d9a4507917c14f53ps91f81","zip code":"221300","goodsName":"merchandise 2","goodsId":"2","price":"13.50","count":"2","tatalAcount":27.00}] is a JSON array representing a set of merchandise information purchased by the user.
After the log data is obtained, the data can be preprocessed according to optional preset rules by combining the data characteristics of the character string data, and the data which are not required to be matched are filtered out, wherein the method comprises the following specific steps.
In some embodiments of the present application, after obtaining log data in the target business system, the method further comprises the steps of: determining judgment sub-conditions corresponding to the log data, and determining logic operators among the judgment sub-conditions, wherein the judgment sub-conditions are used for judging whether the character string data has possibility of containing sensitive data or not, and the logic operators comprise at least one of the following: or operation and AND operation; determining a target preprocessing rule according to the judging sub-condition and a logic operator corresponding to the judging sub-condition, and screening the character string data in the log data according to the target preprocessing rule, wherein the target preprocessing rule is used for filtering the character string data which does not contain sensitive data in the log data.
The above-mentioned judgment sub-condition can be configured and selected by the user according to the actual requirement, for example, a preset processing rule is defaulted, namely, the judgment sub-condition 1. According to the service scene, the user can also close the judgment sub-condition 1. Meanwhile, the user can add the judgment sub-condition (the judgment sub-condition 2-the judgment sub-condition n) by himself, for example, add rules for filtering according to special characters of the data prefix. Each judgment sub-condition can adopt an AND mode or an OR mode, if an AND mode is adopted, all the judgment sub-conditions 1 to n are required to be met to enter a matching flow, otherwise, the data is ignored. If the method of OR is adopted, one of the judging sub-condition 1 to the judging sub-condition n is in accordance with the condition, the matching flow can be entered, otherwise, the data is ignored.
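The AND/OR combination of judgment sub-conditions described above can be sketched as follows. This is a minimal illustration only: the function name build_preprocessing_rule and the two sample sub-conditions are hypothetical and are not the defaults of the embodiment.

```python
from typing import Callable, Iterable

# A judgment sub-condition takes one character string and returns True when
# the string might contain sensitive data. The alias name is hypothetical.
SubCondition = Callable[[str], bool]


def build_preprocessing_rule(conditions: Iterable[SubCondition], operator: str) -> SubCondition:
    """Combine sub-conditions with "and" or "or" into one target preprocessing rule."""
    conds = list(conditions)
    if operator == "and":
        # AND mode: every sub-condition must hold to enter the matching flow.
        return lambda s: all(c(s) for c in conds)
    if operator == "or":
        # OR mode: one satisfied sub-condition is enough.
        return lambda s: any(c(s) for c in conds)
    raise ValueError(f"unsupported operator: {operator!r}")


# Two illustrative sub-conditions (assumptions, not taken from the embodiment).
def starts_with_digit(s: str) -> bool:
    return bool(s) and "0" <= s[0] <= "9"


def at_least_11_chars(s: str) -> bool:
    return len(s) >= 11


and_rule = build_preprocessing_rule([starts_with_digit, at_least_11_chars], "and")
or_rule = build_preprocessing_rule([starts_with_digit, at_least_11_chars], "or")

records = ["15311111111", "20ms", "hello"]
kept_by_and = [r for r in records if and_rule(r)]  # strings that enter matching in AND mode
kept_by_or = [r for r in records if or_rule(r)]    # strings that enter matching in OR mode
```

In AND mode only "15311111111" is retained; in OR mode "20ms" is retained as well, because it begins with a digit.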
The process of screening the string data in the log data will be further described below taking the target preprocessing rule composed of the judgment sub-conditions (condition 1, condition 2, and condition 3) shown in fig. 6 as an example.
In some embodiments of the present application, filtering the string data in the log data according to the target preprocessing rule includes the following steps: judging whether the first character of the character string data is a first preset character or not, and reserving the character string data under the condition that the first character is the first preset character, wherein the first preset character is the first character in a preset format of the object data; judging whether the data length of the character string data is larger than a first length threshold value or not under the condition that the first character of the character string data is not a first preset character, and reserving the character string data under the condition that the data length is larger than the first length threshold value; judging whether the character string data meets any one of target conditions under the condition that the data length is not larger than a first length threshold, wherein the target conditions comprise: the character string data starts with Chinese characters, the character string data starts with numbers, the data length of the character string data is not less than a second length threshold value, the character string data ends with a second preset character, the second length threshold value is smaller than the first length threshold value, and the second preset character is a suffix character of the electronic mailbox; and when the character string data meets any one of the target conditions, reserving the character string data, and when the character string data does not meet all of the target conditions, screening out the character string data.
Specifically, condition (judgment sub-rule) 1 shown in fig. 6 is: the data begins with the symbol "{" or "[" (i.e., the first preset character, since JSON data begins with "{" or "["). Condition 2 is: the data length is greater than the first length threshold Y (the default value is 19, and the user can configure it according to the service). Condition 3 is satisfied if the data satisfies any one of target rules a, b and c, where target rule a is: the data begins with a Chinese character (it may be an address or a name); target rule b is: the data begins with a number and its length is from 11 (i.e., the second length threshold) up to the first length threshold Y (it may be a mobile phone number, a bank card number or an identity card number); and target rule c is: the data ends with .cn, .com, .net or .org (i.e., the second preset character) (it may be a mailbox). That is, in this embodiment, because service data in JSON format is complex, data of that type is not filtered, whereas single data and combined data are filtered according to characteristics of sensitive data such as length and suffix symbols.
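The three conditions of fig. 6 can be sketched as one screening function. This is a minimal sketch under stated assumptions: the names should_keep and is_chinese_char are hypothetical, and "Chinese character" is approximated here by the CJK Unified Ideographs block, which the description does not specify.

```python
EMAIL_SUFFIXES = (".cn", ".com", ".net", ".org")  # second preset characters (target rule c)
FIRST_LENGTH_THRESHOLD = 19    # Y, default value given in the description
SECOND_LENGTH_THRESHOLD = 11   # lower length bound of target rule b


def is_chinese_char(ch: str) -> bool:
    # Assumption: approximate "Chinese character" by the CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def should_keep(s: str, y: int = FIRST_LENGTH_THRESHOLD) -> bool:
    """Return True if the character string is retained for sensitive-data matching."""
    if not s:
        return False
    # Condition 1: JSON-like object data is always retained.
    if s[0] in "{[":
        return True
    # Condition 2: long strings are retained.
    if len(s) > y:
        return True
    # Condition 3, target rule a: begins with a Chinese character (address or name).
    if is_chinese_char(s[0]):
        return True
    # Target rule b: begins with a digit, length from 11 up to Y (the upper bound
    # already holds here because condition 2 failed).
    if "0" <= s[0] <= "9" and len(s) >= SECOND_LENGTH_THRESHOLD:
        return True
    # Target rule c: ends with a mailbox suffix.
    if s.endswith(EMAIL_SUFFIXES):
        return True
    return False
```

With the default threshold, "20ms" is screened out, while "15311111111" and any string beginning with "{" are retained.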
After the log data preprocessing is completed, the SNFA method can be adopted to match the sensitive information of the reserved log data, and desensitized coordinates are obtained, which is further described below.
In this embodiment, the target service system is exemplified by a transaction system, in which the sensitive data types of the sensitive information in the log data may include: name, mobile phone number, bank card number, mailbox, address, identity card number and the like, the data to be matched are put into the first matching threads corresponding to different first matching rules for parallel matching, and finally the coordinates to be desensitized are converged, so that subsequent desensitization is facilitated, and the method comprises the following specific steps.
In some embodiments of the present application, sequentially determining whether each character in the character string data located after the start character satisfies the first matching rule includes the steps of: determining first matching rules corresponding to the log data, wherein each first matching rule corresponds to one sensitive data type, and the sensitive data type comprises at least one of the following: name, mobile phone number, address, bank card number, identification card number, mailbox; and respectively calling a first matching thread corresponding to each first matching rule to match the log data, and judging whether sensitive data of each sensitive data type exists in the log data.
Specifically, as shown in fig. 7, after the matching result of each first matching thread is obtained, the coordinates to be desensitized obtained by each thread can be converged to obtain the position coordinates corresponding to all the sensitive data types in the character string data.
In the present embodiment, the first matching rule described above includes, but is not limited to, the first matching rule shown in the following table.
As shown in the above table, log data may include sensitive data such as a name, a mobile phone number, a bank card number, a mailbox, an address and an identity card number. If the NFA processing manner of the related art is adopted, the data is first split by commas or semicolons and then matched with NFA-based regular expressions. In that manner the data splitting is time consuming, and data such as a single number still undergoes backtracking matching even though it does not need to be desensitized. In the embodiment of the present application an SNFA manner is adopted in which only one traversal is required: when a character at some position of a run of continuous data does not conform to the matching rule, matching stops and restarts only when the next run of continuous data is reached. The method for matching sensitive information in the embodiment of the present application is further described below; the specific steps are as follows.
In some embodiments of the present application, determining whether sensitive data of each sensitive data type is present in the log data includes the steps of: determining characters which are not preset special characters in the character string data as initial characters, and recording a first position of the initial characters in the log data, wherein the preset special characters comprise at least one of the following: comma, colon, double quotation, brackets, middle brackets, stop signs, semicolon; starting from the initial character, judging whether each character in the character string data meets a first matching rule in sequence, and re-determining the next initial character in the character string data under the condition that the character does not meet the first matching rule; judging whether the next character next to the character is a preset special character or not under the condition that the character meets the first matching rule, and continuously judging whether the next character meets the first matching rule or not under the condition that the next character next to the character is not the preset special character; and determining the current character as a termination character and recording a second position of the termination character in the log data under the condition that the next character of the immediately adjacent characters is a preset special character.
In some embodiments of the present application, after determining the current character as the termination character, the method further comprises the steps of: determining a character length of a character string located between the start character and the end character; under the condition that the character length is not greater than the data length of the data of the sensitive data type corresponding to the first matching rule, determining the character string between the initial character and the termination character as sensitive data, and determining the coordinate to be desensitized according to the first position and the second position; and when the character length is larger than the data length, the next initial character in the character string data is redetermined for matching.
Specifically, fig. 8 is a schematic diagram of an SNFA sensitive information matching process provided according to an embodiment of the present application, and as shown in fig. 8, the process includes the following steps:
step 1, firstly traversing character string data, determining characters which are not preset special characters in the character string data, determining the characters as initial characters, recording the first position of the initial characters, namely recording the starting position coordinate of one effective data in the character string as i, assigning the current position coordinate j as i, and jumping to the step 2; if the character string data does not contain the character which is not the preset special character, judging that the effective data is not available, and ending the matching of the character string data;
The preset special characters include, but are not limited to: comma (,), colon (:), double quotation mark ("), braces ({ }), square brackets ([ ]), enumeration comma (、), semicolon (;) and the like.
Step 2, judging whether the character of the current position coordinate j (the coordinate value is i at the beginning) accords with a corresponding first matching rule, if the character of the current position coordinate j meets the first matching rule, jumping to step 3, and if the matching fails, jumping to step 1 to re-determine the next initial character in the character string data;
step 3, judging whether the next character next to the current character is a preset special character, if the next character next to the current character is not the preset special character, jumping to step 2 to continuously judge whether the next character meets a first matching rule, if the next character next to the current character is the preset special character or reaches the end of the character string data, determining the current character as a termination character, recording the position coordinate j of the current character as the second position, and jumping to step 4;
step 4, if the character length of the character string between the first position i and the second position j is less than or equal to the data length Y (target length Y) of data of the sensitive data type corresponding to the first matching rule, namely j-i+1 <= Y, storing the coordinates [i, j] to be desensitized into a desensitization array and jumping to step 5; otherwise, jumping to step 1 to re-determine the next initial character in the character string data for matching.
Step 5, after the matching of one effective data is finished, if the character string traverses to the end, finishing the matching of the data; otherwise, jumping to the step 1 to redetermine the next initial character in the character string data for matching.
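Steps 1 to 5 above can be sketched as a single left-to-right scan. The sketch below is an interpretation, not the implementation of the embodiment: the function name snfa_match and the sample mobile_rule are hypothetical, and on a mismatch the remainder of the current run is skipped, which is one reading of "re-determine the next initial character".

```python
from typing import Callable, List

# Preset special characters that delimit runs of valid data.
SPECIAL_CHARS = set(',:"{}[]、;')


def snfa_match(text: str, char_ok: Callable[[str, int], bool], max_len: int) -> List[List[int]]:
    """Return inclusive [i, j] coordinates of runs in which every character passes the rule."""
    coords: List[List[int]] = []
    n = len(text)
    pos = 0
    while pos < n:
        # Step 1: skip special characters to find the next start character.
        while pos < n and text[pos] in SPECIAL_CHARS:
            pos += 1
        if pos >= n:
            break  # no valid data left
        i = j = pos
        matched = True
        while True:
            # Step 2: test the character at j (offset j - i within the run).
            if not char_ok(text[j], j - i):
                matched = False
                break
            # Step 3: a following special character, or the end of the string,
            # makes the current character the termination character.
            if j + 1 >= n or text[j + 1] in SPECIAL_CHARS:
                break
            j += 1
        if matched:
            # Step 4: store the coordinates only if j - i + 1 <= Y.
            if j - i + 1 <= max_len:
                coords.append([i, j])
            pos = j + 1  # Step 5: continue after this run
        else:
            # Assumption: on a mismatch, skip the rest of the current run.
            pos = j
            while pos < n and text[pos] not in SPECIAL_CHARS:
                pos += 1
    return coords


def mobile_rule(ch: str, offset: int) -> bool:
    # Illustrative rule only: first character is "1", the rest are digits.
    return ch == "1" if offset == 0 else "0" <= ch <= "9"


phone_coords = snfa_match("Zhang San,15311111111;20ms", mobile_rule, max_len=11)
```

Each character is visited a bounded number of times, so the scan is linear in the length of the string. Because step 4 only enforces an upper bound on length, a practical rule would likely need further constraints to reject short runs.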
In this embodiment, after the SNFA sensitive information matching process shown in fig. 8, the coordinates to be desensitized obtained by each first matching thread may be in the form of a two-dimensional array, which represents the start and end position information of the data to be desensitized.
For example, taking mobile phone number matching as an example, for a character string A such as "153xxx3517, some places, 1575xxx1234, some places in some city in some provinces, some areas in some city", the coordinates to be desensitized output by the mobile phone number matching thread are [[0,10],[16,26]], which means that 2 mobile phone numbers need to be desensitized: the character string of the first mobile phone number starts at position 0 and ends at position 10 of A, and that of the second starts at position 16 and ends at position 26 of A. In this embodiment, the results of the first matching threads are aggregated into a map-type coordinate result, in which each key identifies the first matching rule that produced the coordinates to be desensitized and each value is the two-dimensional array output by that rule. The format is as follows: {"2":[[0,10],[16,26]],"1":[[12,14]],"3":[[27,36]]}, where "2", "1" and "3" denote the first matching rules numbered 2, 1 and 3 respectively, and the data within [ ] is the coordinate information to be desensitized. This map thus indicates that 2 mobile phone numbers, 1 name and 1 address need to be desensitized.
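The convergence of per-rule results into a map keyed by rule number can be sketched as follows. Everything here is illustrative: find_runs is a simplified whole-run check standing in for the character-by-character matching, the rule number "6" for the mailbox is an assumption, and the threads only show the structure (in Python they are not a performance claim).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

SPECIAL_CHARS = set(',:"{}[]、;')


def find_runs(text: str, accept: Callable[[str], bool]) -> List[List[int]]:
    """Return inclusive [start, end] for each delimiter-free run accepted by the rule."""
    coords: List[List[int]] = []
    start = None
    for idx, ch in enumerate(text + ","):  # sentinel delimiter closes the last run
        if ch in SPECIAL_CHARS:
            if start is not None and accept(text[start:idx]):
                coords.append([start, idx - 1])
            start = None
        elif start is None:
            start = idx
    return coords


# Rule number -> simplified rule. "2" follows the numbering in the example above;
# "6" for the mailbox is a hypothetical number.
RULES: Dict[str, Callable[[str], bool]] = {
    "2": lambda run: len(run) == 11 and run.isdigit(),
    "6": lambda run: "@" in run and run.endswith((".cn", ".com", ".net", ".org")),
}


def match_all(text: str) -> Dict[str, List[List[int]]]:
    """Run one matching task per rule and converge the results into one map."""
    with ThreadPoolExecutor(max_workers=len(RULES)) as pool:
        futures = {key: pool.submit(find_runs, text, rule) for key, rule in RULES.items()}
        return {key: fut.result() for key, fut in futures.items()}


result = match_all("15311111111,a@qq.com,15751111234")
```

For this input the map holds two coordinate pairs under "2" and one under "6", mirroring the shape of the map-type result described above.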
After the coordinates to be desensitized are obtained, data desensitization can be performed according to the coordinates to be desensitized, and the specific steps are as follows.
In some embodiments of the present application, the desensitizing processing for the data corresponding to each coordinate to be desensitized in the log data includes the following steps: replacing characters of coordinates to be desensitized in the log data with a preset desensitization identifier; or encrypting the characters of the coordinates to be desensitized in the log data to obtain corresponding encrypted characters, and replacing the corresponding characters of the coordinates to be desensitized with the encrypted characters.
In particular, the replacement desensitization may be performed with a preset desensitization identifier (e.g., x). This mode is faster and the desensitization effect is shown in the following table.
Data type | Original value | Desensitized value
Name | Wang AA | Wang **
Mobile phone number | 15111111111 | 151*****234
Bank card number | 6220000000000000000 | 6220******0000
Identification card number | 100000000000000000 | 100000*******0000
Address | Room 1-101, Test Community, District C, City B, Province A | District C, City B, Province A
Mailbox | [email protected] | 1****@qq.com
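Replacement desensitization driven by the coordinates to be desensitized can be sketched as follows. This is a minimal sketch: the function name mask_by_coordinates and the keep_head / keep_tail parameters are assumptions, since the table shows that the number of characters kept differs by data type.

```python
from typing import List


def mask_by_coordinates(text: str, coords: List[List[int]], keep_head: int = 3,
                        keep_tail: int = 3, mask_char: str = "*") -> str:
    """Replace the middle of every inclusive [start, end] span with the mask character."""
    chars = list(text)
    for start, end in coords:
        length = end - start + 1
        if length <= keep_head + keep_tail:
            # Span too short to keep both ends: mask it entirely.
            lo, hi = start, end
        else:
            lo, hi = start + keep_head, end - keep_tail
        for k in range(lo, hi + 1):
            chars[k] = mask_char
    return "".join(chars)


masked = mask_by_coordinates("Zhang San,15311111111;20ms", [[10, 20]])
# masked == "Zhang San,153*****111;20ms"
```

Because each replacement keeps the string length unchanged, coordinates produced by different matching rules remain valid while they are applied one after another.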
In addition, the AES (Advanced Encryption Standard ) encryption mode can be adopted for desensitization, the desensitization effect of which is shown in the following table, and the speed of the method is slower than that of the mode of replacing desensitization by using a preset desensitization identifier (for example, x), but the original data can be recovered, so that the problem analysis and positioning are facilitated.
Efficiency comparison tests between the SNFA-based sensitive information matching and the NFA-based sensitive information matching of the related art were conducted on a test platform running Windows 11 with a 4-core CPU and 4 GB of memory allocated to the JVM. With the SNFA manner the matching efficiency is clearly improved: the improvement ranges from 30% to 70% depending on the data type, and the more complex the character string data format, the more pronounced the improvement.
When transaction log data is desensitized, rules can be preset according to the characteristics of the transaction data and the data preprocessed to filter out data that does not need desensitization, which improves overall desensitization efficiency. According to the characteristics of the transaction data, the NFA matching method is replaced by the SNFA manner based on character string traversal, which achieves O(n)-level matching of sensitive information, improves matching accuracy and greatly improves desensitization efficiency. Finally, according to the sensitive data coordinates, an encryption algorithm is used for fast encryption and desensitization, which improves desensitization efficiency while avoiding omission of sensitive information.
According to an embodiment of the present application, an embodiment of a log desensitizing apparatus is also provided. Fig. 9 is a schematic structural diagram of a log desensitizing apparatus according to an embodiment of the present application. As shown in fig. 9, the apparatus includes:
A log data obtaining module 90, configured to obtain log data in the target service system, where the log data includes a plurality of character string data;
optionally, the character string data includes: the system comprises single data, combined data and object data, wherein the single data is character string data only comprising one service attribute data, the combined data is character string data comprising a plurality of service attribute data, and the object data is character string data meeting a preset format; after acquiring the log data in the target service system, the log data acquisition module 90 is further configured to: determining judgment sub-conditions corresponding to the log data, and determining logic operators among the judgment sub-conditions, wherein the judgment sub-conditions are used for judging whether the character string data has possibility of containing sensitive data or not, and the logic operators comprise at least one of the following: or operation and AND operation; determining a target preprocessing rule according to the judging sub-condition and a logic operator corresponding to the judging sub-condition, and screening the character string data in the log data according to the target preprocessing rule, wherein the target preprocessing rule is used for filtering the character string data which does not contain sensitive data in the log data.
Optionally, filtering the character string data in the log data according to the target preprocessing rule includes: judging whether the first character of the character string data is a first preset character or not, and reserving the character string data under the condition that the first character is the first preset character, wherein the first preset character is the first character in a preset format of the object data; judging whether the data length of the character string data is larger than a first length threshold value or not under the condition that the first character of the character string data is not a first preset character, and reserving the character string data under the condition that the data length is larger than the first length threshold value; judging whether the character string data meets any one of target conditions under the condition that the data length is not larger than a first length threshold, wherein the target conditions comprise: the character string data starts with Chinese characters, the character string data starts with numbers, the data length of the character string data is not less than a second length threshold value, the character string data ends with a second preset character, the second length threshold value is smaller than the first length threshold value, and the second preset character is a suffix character of the electronic mailbox; and when the character string data meets any one of the target conditions, reserving the character string data, and when the character string data does not meet all of the target conditions, screening out the character string data.
The sensitive data matching module 92 is configured to traverse the character string data, determine that a character that is not a preset special character in the character string data is a start character, and sequentially determine, from the start character, whether each character that is located after the start character in the character string data meets a first matching rule, where the first matching rule is used to determine whether data formed by each character is sensitive data;
optionally, sequentially determining whether each character of the character string data located after the start character satisfies the first matching rule includes: determining first matching rules corresponding to the log data, wherein each first matching rule corresponds to one sensitive data type, and the sensitive data type comprises at least one of the following: name, mobile phone number, address, bank card number, identification card number, mailbox; and respectively calling a first matching thread corresponding to each first matching rule to match the log data, and judging whether sensitive data of each sensitive data type exists in the log data.
Optionally, determining whether sensitive data of each sensitive data type exists in the log data includes: determining characters which are not preset special characters in the character string data as initial characters, and recording a first position of the initial characters in the log data, wherein the preset special characters comprise at least one of the following: comma, colon, double quotation, brackets, middle brackets, stop signs, semicolon; starting from the initial character, judging whether each character in the character string data meets a first matching rule in sequence, and re-determining the next initial character in the character string data under the condition that the character does not meet the first matching rule; judging whether the next character next to the character is a preset special character or not under the condition that the character meets the first matching rule, and continuously judging whether the next character meets the first matching rule or not under the condition that the next character next to the character is not the preset special character; and determining the current character as a termination character and recording a second position of the termination character in the log data under the condition that the next character of the immediately adjacent characters is a preset special character.
The sensitive position determining module 94 is configured to, when each character located between the start character and the end character in the character string data satisfies the first matching rule, determine coordinates to be desensitized in the log data according to the first position of the start character in the log data and the second position of the end character in the log data, wherein the end character is the first target character located after the start character in the character string data, and the character immediately following the target character is a preset special character.
Optionally, after determining the current character as the end character, the sensitive position determining module 94 is further configured to: determine the character length of the character string located between the start character and the end character; when the character length is not greater than the data length of data of the sensitive data type corresponding to the first matching rule, determine the character string between the start character and the end character as sensitive data, and determine the coordinates to be desensitized according to the first position and the second position; and when the character length is greater than the data length, re-determine the next start character in the character string data for matching.
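The length check can be illustrated with a minimal sketch. The table `MAX_LEN`, its values, and the function name `confirm_span` are assumptions made for this example; positions are treated as inclusive.

```python
# Assumed data lengths per sensitive data type (illustrative values only).
MAX_LEN = {"mobile_phone": 11, "id_card": 18}


def confirm_span(first_pos, second_pos, data_type, max_len=MAX_LEN):
    """Return the coordinate pair if the candidate is short enough, else None.

    None signals that the caller should re-determine the next start
    character and continue matching.
    """
    length = second_pos - first_pos + 1  # both positions are inclusive
    if length <= max_len[data_type]:
        return (first_pos, second_pos)
    return None
```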
The sensitive data desensitizing module 96 is configured to desensitize data corresponding to each coordinate to be desensitized in the log data, so as to obtain desensitized log data.
Optionally, performing desensitization processing on the data corresponding to each coordinate to be desensitized in the log data includes: replacing the characters at the coordinates to be desensitized in the log data with a preset desensitization identifier; or encrypting the characters at the coordinates to be desensitized in the log data to obtain corresponding encrypted characters, and replacing the characters at the coordinates to be desensitized with the encrypted characters.
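The two desensitization options might look as follows in a minimal sketch. The function names, the `*` marker, and the 16-character truncation are assumptions. Because the application does not name a cipher, the second function substitutes a SHA-256 digest for the encryption step; a digest is one-way and is not reversible encryption. Spans are assumed to be inclusive and non-overlapping.

```python
import hashlib


def mask(log_data, spans, marker="*"):
    """Replace every character inside each inclusive span with the marker."""
    chars = list(log_data)
    for first, second in spans:
        for i in range(first, second + 1):
            chars[i] = marker
    return "".join(chars)


def replace_with_digest(log_data, spans):
    """Replace each span with a truncated SHA-256 digest of its content.

    Stand-in for the encryption option; assumes non-overlapping spans.
    """
    out, cursor = [], 0
    for first, second in sorted(spans):
        out.append(log_data[cursor:first])
        digest = hashlib.sha256(
            log_data[first:second + 1].encode("utf-8")
        ).hexdigest()
        out.append(digest[:16])
        cursor = second + 1
    out.append(log_data[cursor:])
    return "".join(out)
```

Masking preserves the length of the log line, whereas the digest variant changes it but lets identical values be correlated across lines.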
It should be noted that each module in the log desensitizing apparatus may be a program module (for example, a set of program instructions that implements a specific function) or a hardware module. In the latter case, it may take, but is not limited to, the following forms: each module is implemented by its own processor, or the functions of all modules are implemented by a single processor.
It should be noted that the log desensitizing apparatus provided in this embodiment may be used to execute the log desensitization method shown in FIG. 2; the explanation of the log desensitization method therefore also applies to this embodiment and is not repeated here.
The embodiment of the application also provides a nonvolatile storage medium comprising a stored computer program, wherein a device in which the nonvolatile storage medium is located executes the following log desensitization method by running the computer program: acquiring log data in a target service system, wherein the log data comprises a plurality of character string data; traversing the character string data, determining a character in the character string data that is not a preset special character as a start character, and, starting from the start character, determining in sequence whether each character located after the start character in the character string data satisfies a first matching rule, wherein the first matching rule is used to determine whether data formed by the characters is sensitive data; when each character located between the start character and an end character in the character string data satisfies the first matching rule, determining a coordinate to be desensitized in the log data according to a first position of the start character in the log data and a second position of the end character in the log data, wherein the end character is the first target character located after the start character in the character string data, and the character immediately following the target character is a preset special character; and performing desensitization processing on the data corresponding to each coordinate to be desensitized in the log data to obtain desensitized log data.
The serial numbers of the foregoing embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present application, and such modifications and refinements shall also fall within the scope of protection of the present application.

Claims (10)

1. A method of log desensitization comprising:
acquiring log data in a target service system, wherein the log data comprises a plurality of character string data;
traversing the character string data, determining that characters which are not preset special characters in the character string data are initial characters, and sequentially judging whether each character positioned behind the initial characters in the character string data meets a first matching rule from the initial characters, wherein the first matching rule is used for judging whether data formed by each character is sensitive data;
under the condition that each character positioned between the initial character and the termination character in the character string data meets the first matching rule, determining a coordinate to be desensitized in the log data according to a first position of the initial character in the log data and a second position of the termination character in the log data, wherein the termination character is a first target character positioned behind the initial character in the character string data, and the next character of the target character is the preset special character;
and performing desensitization processing on the data corresponding to each coordinate to be desensitized in the log data to obtain desensitized log data.
2. The log desensitization method according to claim 1, wherein sequentially judging whether each character located after the start character in the character string data satisfies a first matching rule comprises:
determining the first matching rules corresponding to the log data, wherein each first matching rule corresponds to a sensitive data type, and the sensitive data type comprises at least one of the following: name, mobile phone number, address, bank card number, identification card number, mailbox;
and respectively calling a first matching thread corresponding to each first matching rule to match the log data, and judging whether sensitive data of each sensitive data type exists in the log data.
3. The method of log desensitization according to claim 2, wherein determining whether sensitive data of each of said sensitive data types is present in said log data comprises:
determining that a character which is not a preset special character in the character string data is the initial character, and recording the first position of the initial character in the log data, wherein the preset special character comprises at least one of the following: comma, colon, double quotation mark, parenthesis, square bracket, stop sign, semicolon;
starting from the initial character, sequentially judging whether each character in the character string data meets the first matching rule, and re-determining the next initial character in the character string data under the condition that the character does not meet the first matching rule;
judging whether the character immediately following the character is the preset special character under the condition that the character meets the first matching rule, and continuing to judge whether the following character meets the first matching rule under the condition that the following character is not the preset special character; and
under the condition that the character immediately following the character is the preset special character, determining the current character as the termination character, and recording the second position of the termination character in the log data.
4. A log desensitizing method according to claim 3, wherein after determining a current character as the termination character, the method further comprises:
determining a character length of the character string located between the initial character and the termination character;
under the condition that the character length is not greater than the data length of the data of the sensitive data type corresponding to the first matching rule, determining the character string between the initial character and the termination character as the sensitive data, and determining the coordinate to be desensitized according to the first position and the second position;
and re-determining the next initial character in the character string data for matching under the condition that the character length is greater than the data length.
5. The log desensitizing method according to claim 1, wherein the desensitizing processing of the data corresponding to each of the coordinates to be desensitized in the log data includes:
replacing the character of the coordinate to be desensitized in the log data with a preset desensitization identifier; or encrypting the character of the coordinate to be desensitized in the log data to obtain a corresponding encrypted character, and replacing the corresponding character of the coordinate to be desensitized with the encrypted character.
6. The log desensitization method according to claim 1, wherein the character string data comprises: single data, combined data, and object data, wherein the single data is character string data comprising only one item of service attribute data, the combined data is character string data comprising a plurality of items of service attribute data, and the object data is character string data conforming to a preset format; after acquiring the log data in the target service system, the method further comprises:
determining judgment sub-conditions corresponding to the log data, and determining logic operators among the judgment sub-conditions, wherein the judgment sub-conditions are used for judging whether the character string data possibly contains sensitive data, and the logic operators comprise at least one of the following: an OR operation, an AND operation;
determining a target preprocessing rule according to the judging sub-condition and the logic operator corresponding to the judging sub-condition, and screening the character string data in the log data according to the target preprocessing rule, wherein the target preprocessing rule is used for filtering the character string data which does not contain sensitive data in the log data.
7. The method of log desensitization according to claim 6, wherein filtering said character string data in said log data according to said target preprocessing rules comprises:
judging whether the first character of the character string data is a first preset character, and retaining the character string data under the condition that the first character is the first preset character, wherein the first preset character is the opening character of the preset format of the object data;
judging whether the data length of the character string data is greater than a first length threshold under the condition that the first character of the character string data is not the first preset character, and retaining the character string data under the condition that the data length is greater than the first length threshold;
judging whether the character string data meets any one of target conditions under the condition that the data length is not greater than the first length threshold, wherein the target conditions comprise: the character string data starts with a Chinese character, the character string data starts with a digit, the data length of the character string data is not smaller than a second length threshold, and the character string data ends with a second preset character, wherein the second length threshold is smaller than the first length threshold, and the second preset character is a suffix character of an email address;
and retaining the character string data when the character string data meets any one of the target conditions, and filtering out the character string data when the character string data meets none of the target conditions.
8. A log desensitizing apparatus, comprising:
a log data acquisition module, configured to acquire log data in a target service system, wherein the log data comprises a plurality of character string data;
a sensitive data matching module, configured to traverse the character string data, determine a character which is not a preset special character in the character string data as an initial character, and, starting from the initial character, sequentially judge whether each character located after the initial character in the character string data meets a first matching rule, wherein the first matching rule is used for judging whether data formed by the characters is sensitive data;
a sensitive position determining module, configured to determine a coordinate to be desensitized in the log data according to a first position of the initial character in the log data and a second position of a termination character in the log data under the condition that each character located between the initial character and the termination character in the character string data meets the first matching rule, wherein the termination character is the first target character located after the initial character in the character string data, and the character immediately following the target character is the preset special character;
and a sensitive data desensitization module, configured to perform desensitization processing on the data corresponding to each coordinate to be desensitized in the log data to obtain desensitized log data.
9. An electronic device, comprising: a memory and a processor for executing a program stored in the memory, wherein the program is run to perform the log desensitization method according to any one of claims 1-7.
10. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored computer program, wherein the device in which the non-volatile storage medium is located performs the log desensitization method according to any one of claims 1 to 7 by running the computer program.
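
As an illustration of the screening recited in claim 7, the cascade of checks might be sketched as follows. The function name `keep_string`, both length thresholds, the opening character `{`, and the email suffixes are all assumptions chosen for the example; the claims do not fix these values. The sketch also assumes that an empty string is filtered out.

```python
def keep_string(s, first_len=20, second_len=6,
                object_start="{", mail_suffixes=(".com", ".cn")):
    """Return True if the string may contain sensitive data and is retained."""
    if not s:
        return False                    # assumption: empty strings are dropped
    if s[0] == object_start:
        return True                     # looks like object data
    if len(s) > first_len:
        return True                     # long strings are always retained
    # Otherwise retain the string if any one target condition holds.
    return (
        "\u4e00" <= s[0] <= "\u9fff"    # starts with a Chinese character
        or s[0].isdigit()               # starts with a digit
        or len(s) >= second_len         # not shorter than the second threshold
        or s.endswith(mail_suffixes)    # ends with an email suffix
    )
```

Strings for which the function returns False are removed before character-level matching, which reduces the amount of text the matching rules have to scan.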
CN202311631522.7A 2023-11-30 2023-11-30 Log desensitization method and device, electronic equipment and nonvolatile storage medium Pending CN117688906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311631522.7A CN117688906A (en) 2023-11-30 2023-11-30 Log desensitization method and device, electronic equipment and nonvolatile storage medium

Publications (1)

Publication Number Publication Date
CN117688906A true CN117688906A (en) 2024-03-12

Family

ID=90129354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311631522.7A Pending CN117688906A (en) 2023-11-30 2023-11-30 Log desensitization method and device, electronic equipment and nonvolatile storage medium

Country Status (1)

Country Link
CN (1) CN117688906A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination