CN115827702A - Software white list query method based on bloom filter - Google Patents

Software white list query method based on bloom filter

Info

Publication number
CN115827702A
Authority
CN
China
Prior art keywords
software
white list
data
suspected
legal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310061459.1A
Other languages
Chinese (zh)
Other versions
CN115827702B (en)
Inventor
严锦立
荣星
王平
吴流丽
廖建华
黄河
李彦琛
毛建辉
张永星
季伟
王耀
刘筱明
袁建国
张子文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UNIT 61660 OF PLA
Original Assignee
UNIT 61660 OF PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by UNIT 61660 OF PLA
Priority to CN202310061459.1A
Publication of CN115827702A
Application granted
Publication of CN115827702B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to a bloom filter-based software white list query method, which comprises the following steps: matching fingerprint data of software with data in a terminal white list bloom filter, and if the matching is successful, judging the software to be suspected legal; if the matching is unsuccessful, judging that the software belongs to illegal software; matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the white list data cache of the terminal; if the matching is successful, judging the suspected legal software to be legal software; if the matching is unsuccessful, the fingerprint data of the suspected legal software is matched with a tree-shaped white list data index base in a server, if the matching is successful, the suspected legal software is judged to be legal software, and if the matching is unsuccessful, the suspected legal software is illegal software.

Description

Software white list query method based on bloom filter
Technical Field
The disclosure belongs to the technical field of databases, and particularly relates to a bloom filter-based software white list query method.
Background
With the rapid development of network and information technologies, the scale and complexity of terminal software keep growing, and security control over the execution of terminal software is key to ensuring the security of terminals and of the whole network. In fields such as industrial control and aerospace, the security of terminal software directly affects whether the overall computation and control can execute normally. In these critical facilities, running malware causes significant losses. Therefore, it is important to adopt white list-based software control mechanisms in these fields.
The existing white list management and control system adopts a client-server architecture, in which the server issues a huge white list library to the client in advance. When the client runs a program, it searches the white list library based on the collected software fingerprint information. Although this method achieves white list management and control, it suffers from high white list resource occupation and low query efficiency. Therefore, the key to a white list management system is to reduce the white list resource occupation of the client and to improve the query efficiency of the white list.
The bloom filter is a probabilistic data structure based on a binary vector. It supports efficient insertion and query, and can determine that a given piece of data either definitely does not exist or possibly exists in a set. Based on this feature, the bloom filter is typically used as a blacklist; for example, in DNS query [1], legitimate DNS requests can be quickly filtered based on the blacklist. However, the bloom filter has a certain misjudgment rate, so for a request that hits, it is necessary to further determine whether the request is illegal or legal. In summary, in large-scale data filtering scenarios, the bloom filter can efficiently query data that meets the conditions, but with a certain misjudgment rate.
In a white list software management and control system, the characteristics of the bloom filter also offer great advantages, and the key problem to be solved is eliminating the misjudgment introduced by the bloom filter. The principle of storing white list data in the bloom filter is shown in fig. 2, where the bloom filter is a binary vector with a length of 16 bits. When the bloom filter is updated, the software fingerprint data is taken as input and processed by 4 hash functions to obtain 4 outputs, each of which is an index position in the binary vector; the values at these 4 index positions are set to 1 and the values at the other 12 positions are set to 0, generating the hash feature of the software. The bloom filter after insertion of the white list data is obtained by a bitwise OR of the software's hash feature with the old value of the bloom filter. When a software fingerprint is queried, the fingerprint data is likewise processed by the 4 hash functions to obtain 4 outputs, and it is checked whether the 4 index positions in the binary vector are all 1. If not, the software is judged to be illegal; if so, the software is judged to be suspected legal software, because the bloom filter has a certain misjudgment rate.
For software that hits in the bloom filter, a further determination is needed as to whether the software fingerprint is actually in the white list. If the full set of white list data is cached on the client side, whether the suspected legal software is legal or illegal can be judged through item-by-item matching and searching. However, this approach suffers from high storage occupation in the client and low white list query matching efficiency. Therefore, reducing the amount of white list data held in the client and improving the query efficiency of the white list are critical.
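The source does not quantify the misjudgment rate. As a reference point only, the standard textbook estimate for a bloom filter with m bits, k hash functions and n inserted items (assuming independent, uniformly distributed hash functions) can be written as follows, evaluated for the 16-bit, 4-hash example above:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
\qquad
\begin{aligned}
m = 16,\ k = 4,\ n = 1 &: \quad p \approx \left(1 - e^{-1/4}\right)^{4} \approx 0.0024 \\
m = 16,\ k = 4,\ n = 4 &: \quad p \approx \left(1 - e^{-1}\right)^{4} \approx 0.16
\end{aligned}
```

Under this estimate a 16-bit vector degrades quickly as entries are added, which suggests the 16-bit size serves as an illustration rather than a deployment parameter.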
Disclosure of Invention
The present disclosure is made based on the above-mentioned needs of the prior art. The technical problem to be solved is to provide a bloom filter-based software white list query method that improves the efficiency of white list software judgment and eliminates misjudgment.
In order to solve the above problem, the technical solution provided by the present disclosure includes:
the fingerprint data of the software is matched with the data in the terminal white list bloom filter, and if the matching is successful, the software is judged to be suspected legal software; if the matching is unsuccessful, judging that the software belongs to illegal software;
matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the white list data cache of the terminal; if the matching is successful, judging the suspected legal software to be legal software; if the matching is unsuccessful, matching the fingerprint data of the suspected legal software with the software index information and the fingerprint information of the tree-shaped white list data index database in the server, if the matching is successful, judging the suspected legal software to be legal software, and if the matching is unsuccessful, judging the suspected legal software to be illegal software.
By the method, the software successfully matching the white list bloom filter continues to perform matching query with the data in the white list data cache or the tree-shaped white list data index library, so that misjudgment of the bloom filter is eliminated; the efficiency and the accuracy of judgment can be improved by confirming the white list data under the condition of not calling the server through the judgment of the client cache; and finally confirming a small part of white list software needing complete verification through a tree-shaped white list data index library of the server side and a white list data cache query mechanism of the client side. The advantages of the bloom filter, the client cache and the server are reasonably coordinated through the technical scheme, the bloom filter is used for quickly judging whether the software belongs to non-white list software, the frequently accessed white list data set stored in the client cache is used for quickly confirming whether the software screened by the bloom filter is the white list software, the server stores the full set of the white list library, and further completely confirms whether the software to be judged is the white list software under the condition that the client cache possibly has omission, so that the balance between the white list query efficiency and the storage occupation is realized, the query efficiency is improved, and the misjudgment is eliminated.
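The three-tier decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `query_whitelist`, the stand-in callables and the sample fingerprints are all hypothetical, and the bloom filter and server are replaced by simple set lookups so that only the control flow is shown.

```python
from typing import Callable, Set


def query_whitelist(
    fingerprint: str,
    bloom_may_contain: Callable[[str], bool],
    client_cache: Set[str],
    server_lookup: Callable[[str], bool],
) -> str:
    """Return "legal" or "illegal" for one software fingerprint."""
    # Tier 1: a bloom filter miss is definitive, so reject immediately.
    if not bloom_may_contain(fingerprint):
        return "illegal"
    # Tier 2: frequently accessed entries are confirmed from the client cache.
    if fingerprint in client_cache:
        return "legal"
    # Tier 3: the server holds the full white list and gives the final answer.
    return "legal" if server_lookup(fingerprint) else "illegal"


# Hypothetical stand-ins for the three tiers.
WHITELIST = {"editor|1.0|aaa", "browser|2.3|bbb"}
CACHE = {"editor|1.0|aaa"}


def fake_bloom(fp: str) -> bool:
    # Passes every white list entry plus one deliberate false positive.
    return fp in WHITELIST or fp == "tool|9.9|zzz"


def fake_server(fp: str) -> bool:
    return fp in WHITELIST


# The false positive survives tier 1 but is rejected by the server.
print(query_whitelist("tool|9.9|zzz", fake_bloom, CACHE, fake_server))  # prints: illegal
```

Only fingerprints that pass the filter and miss the cache reach the server, which is the balance between query efficiency and storage occupation that the paragraph above describes.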
Preferably, matching the fingerprint data of the software with the data in the terminal white list bloom filter includes: calculating the fingerprint data of the software with N different hash functions respectively to obtain N different index values; and judging whether the values at the N positions corresponding to the N index values in the binary array of the bloom filter are all 1; when all are 1, the software is judged to be suspected legal software, and otherwise the software is judged to be illegal software.
By the method, the white list data is inquired by using the bloom filter, and the vast majority of illegal software can be quickly filtered through a section of binary vector, so that the efficiency of software filtering inquiry is improved.
Preferably, the tree-shaped white list data index database comprises a root node, child nodes and leaf nodes connected in a tree-shaped data structure; each leaf node is connected with a white list data linked list, and the white list data linked list comprises the fingerprint data and access count data of the software whose hash characteristic value is the same as that of the index path corresponding to the leaf node; the access count data records the number of times the software fingerprint data has been queried.
According to the method, the white list data linked list stores the fingerprint data of the white list software and the access count of that fingerprint data. The access count data records the number of times the software fingerprint data has been queried, and the hash characteristic values of the software fingerprint data correspond one-to-one to the fingerprint data of the software. The access count data is used to judge whether the fingerprint data is frequently accessed, so that it can be issued to the client white list data cache; it thus provides the data source for the client cache.
Preferably, in the tree-shaped white list data index database, the data of the child nodes and the leaf nodes are determined based on hash characteristic values of software in a white list; according to the depth of the tree-shaped data structure, the hash characteristic value of the software listed in a white list is divided into segments corresponding to the child nodes and the leaf nodes; and sequentially stores the segmented data in the corresponding nodes.
By the method, in order to avoid the influence of the misjudgment of the bloom filter on the white list control, the tree-shaped white list data index database is designed and constructed based on the Hash characteristics generated by software, and the storage, the quick index and the query of the white list data can be realized.
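A minimal sketch of how such an index could be built is given below. It assumes the hash characteristic value is handled as a bit string and is cut into equal-length segments, one per tree level; the names `Node`, `Entry`, `split_segments` and `insert_entry` are hypothetical, and a Python list stands in for the white list data linked list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    fingerprint: str
    access_count: int = 0  # number of times this fingerprint has been queried


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    entries: List[Entry] = field(default_factory=list)  # populated only at leaves


def split_segments(hash_feature: str, depth: int) -> List[str]:
    """Cut a bit string into `depth` equal-length segments, one per tree level."""
    if depth <= 0 or not hash_feature or len(hash_feature) % depth != 0:
        raise ValueError("hash feature length must be a positive multiple of depth")
    size = len(hash_feature) // depth
    return [hash_feature[i:i + size] for i in range(0, len(hash_feature), size)]


def insert_entry(root: Node, hash_feature: str, fingerprint: str, depth: int) -> Node:
    """Store each segment in the node at its level and attach the entry to the leaf."""
    node = root
    for segment in split_segments(hash_feature, depth):
        node = node.children.setdefault(segment, Node())
    if all(e.fingerprint != fingerprint for e in node.entries):
        node.entries.append(Entry(fingerprint))
    return node


root = Node()
leaf = insert_entry(root, "0100100000100001", "editor|1.0|aaa", depth=4)
```

With a 16-bit hash feature and a depth of 4, each level consumes 4 bits, so fingerprints that share a hash feature end up in the same leaf list.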
Preferably, the white list bloom filter data is obtained by the following steps: acquiring fingerprint data of software in the white list; calculating the fingerprint data of the software with N different hash functions respectively to obtain N different index values; in a binary array of the same length as the bloom filter, setting the values at the positions corresponding to the index values to 1 and the values at all other positions to 0, to obtain the hash characteristic value of the software in the white list; and performing a bitwise OR of the hash characteristic value of the software in the white list with the old value of the bloom filter to obtain the white list bloom filter data.
By the method, efficient insertion of the white list data of the bloom filter is realized based on the probability type data structure of the binary vector.
Preferably, matching the fingerprint data of the suspected legal software with a tree white list data index database in a server includes:
acquiring fingerprint data of the suspected legal software and acquiring a hash characteristic value of the suspected legal software based on the fingerprint data of the suspected legal software;
according to the depth of the tree-shaped data structure, the hash characteristic value of suspected legal software is divided into segments corresponding to the child nodes and the leaf nodes;
when the segments of the suspected legal software cannot all be matched with corresponding child nodes and leaf nodes in the tree-shaped white list data index database, judging the suspected legal software to be illegal software; when all the segments of the suspected legal software can be matched with corresponding child nodes and leaf nodes in the tree-shaped white list data index database, matching the fingerprint data of the suspected legal software with each piece of fingerprint data under the corresponding leaf node; and when the fingerprint data is successfully matched, judging the suspected legal software to be legal software, and when the fingerprint data is not successfully matched, judging the suspected legal software to be illegal software.
By the method, for software that hits the bloom filter but has a low access count, fast indexing and matching of the white list is performed on the white list server based on the hash value, so that misjudgment by the bloom filter can be eliminated.
Preferably, when the segments of the hash characteristic value of the suspected legal software are matched with the corresponding child nodes and leaf nodes in the tree-shaped white list data index database, the segments are matched against the data of the child nodes and leaf nodes in sequence; when the data of any node fails to match, the nodes of the next level are not matched and the result is judged directly.
By the method, the follow-up matching query process is interrupted in time at the first time when the matching result appears, and the efficiency of white list query is improved.
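The segment-by-segment match with early termination might look like the sketch below. The tree is modelled here as nested dictionaries keyed by segment, with the last level mapping to a list of fingerprints; this representation and the name `server_lookup` are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def server_lookup(tree: dict, segments: Sequence[str], fingerprint: str) -> Tuple[bool, int]:
    """Return (is_legal, levels_examined) for one suspected legal fingerprint."""
    node = tree
    for level, segment in enumerate(segments, start=1):
        # Stop at the first level whose segment has no matching node.
        if not isinstance(node, dict) or segment not in node:
            return False, level
        node = node[segment]
    # All segments matched: compare against each fingerprint under the leaf.
    if not isinstance(node, list):
        return False, len(segments)
    return fingerprint in node, len(segments)


# Hypothetical index holding one white list entry.
TREE = {"0100": {"1000": {"0010": {"0001": ["editor|1.0|aaa"]}}}}

print(server_lookup(TREE, ["0100", "1111", "0010", "0001"], "editor|1.0|aaa"))  # prints: (False, 2)
```

The second element of the result shows how many levels were examined, which makes the early exit visible: a mismatch at level 2 never touches levels 3 and 4.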
Preferably, when the fingerprint data is successfully matched, adding 1 to the access frequency data to obtain new access frequency data; and when the new access times data is greater than or equal to a preset threshold value, the fingerprint data successfully matched is issued to a white list cache of the terminal.
By the method, only the software data with the access times larger than or equal to the threshold value is sent to the white list cache of the terminal, instead of caching the whole white list data to the terminal, and the problem of high storage occupation in the client is solved.
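The counting rule can be illustrated with the short sketch below. The threshold value of 3, the function name `record_hit` and the use of an in-memory set as the terminal cache are assumptions; the source only states that a preset threshold exists.

```python
from typing import Dict, Set

THRESHOLD = 3  # hypothetical preset threshold


def record_hit(
    access_counts: Dict[str, int],
    client_cache: Set[str],
    fingerprint: str,
    threshold: int = THRESHOLD,
) -> int:
    """Add 1 to the access count and issue the fingerprint once the threshold is reached."""
    new_count = access_counts.get(fingerprint, 0) + 1
    access_counts[fingerprint] = new_count
    if new_count >= threshold:
        # Stands in for issuing the fingerprint to the terminal white list cache.
        client_cache.add(fingerprint)
    return new_count


counts: Dict[str, int] = {}
cache: Set[str] = set()
for _ in range(3):
    record_hit(counts, cache, "editor|1.0|aaa")
print("editor|1.0|aaa" in cache)  # prints: True
```

Fingerprints that are matched fewer times than the threshold stay on the server only, which keeps the client cache small.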
Preferably, the data in the terminal white list bloom filter comprises data from a bloom filter in the server.
By the method, the white list software is filtered in the bloom filter at the client, so that the feedback speed of the query result is increased, and the query efficiency is improved.
Preferably, the fingerprint data of the software includes at least a name, a version number, and an MD5 check feature value of the software.
By the method, the necessary technical information of the software is contained in the fingerprint data of the software.
Compared with the prior art, the software white list quick query method based on the bloom filter is characterized in that the software which is successfully matched with the white list bloom filter is continuously matched and queried with the data in the white list data cache or the tree-shaped white list data index library, so that misjudgment of the bloom filter is eliminated; through a tree-shaped white list data index database of the server side and a white list data cache query mechanism of the client side, the server side can store a full set of a white list database, and a frequently accessed white list data set is stored in a client side cache, so that balance between white list query efficiency and storage occupation is realized.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present specification, and other drawings can be obtained by those skilled in the art according to these drawings.
Fig. 1 is a flowchart of a software white list query method based on a bloom filter according to this embodiment;
fig. 2 is a structure diagram for acquiring white list data of software based on a bloom filter according to this embodiment;
fig. 3 is a flowchart for acquiring software white list data based on a bloom filter according to this embodiment;
fig. 4 is a software white list data storage structure diagram based on a tree white list data index database according to this embodiment;
fig. 5 is a flowchart of storing software white list data based on a tree white list data index library according to this embodiment.
Reference numerals:
001. child node; 002. leaf node; 003. white list data linked list; 004. hash characteristic value of the software fingerprint data; 005. bloom filter old value; 006. bloom filter new value.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For the purpose of facilitating understanding of the embodiments of the present application, the following detailed description will be given with reference to the accompanying drawings, which are not intended to limit the embodiments of the present application.
This specific embodiment provides a software white list query method based on a bloom filter. In the prior art, bloom filters are applied to blacklist lookup, but using a bloom filter for blacklist lookup differs from using it for white list lookup. With a bloom filter built from blacklisted software, most white list software can be determined quickly, but every hit request must be further examined to decide whether it is illegal software or white list software; installation requests from a large amount of illegal software therefore cannot be blocked at the first opportunity, and efficiency is low. In the present scheme, most illegal software can be determined quickly by a bloom filter built from the white list, so installation requests from a large amount of illegal software can be blocked in time to ensure terminal security, and the response speed of the client is improved. The small amount of illegal software that may pass the filter is identified by hierarchical queries against the client cache and the server white list database, which eliminates bloom filter misjudgment while ensuring a rapid response to most illegal software requests.
Storing the white list data in a bloom filter makes it possible to quickly filter out software that is not in the white list. However, because the bloom filter has a certain probability of misjudgment, software that hits in the bloom filter must be further checked to determine whether its fingerprint is really in the white list, so as to ensure the accuracy of the white list judgment. If the fingerprint information of the software were compared against the complete database, accuracy would be greatly improved but efficiency would be insufficient. This specific embodiment therefore provides a software white list query method based on a bloom filter that balances the efficiency and the accuracy of white list lookup.
Specifically, the method of this embodiment aims to eliminate the misjudgment caused by the bloom filter, and to solve the high client storage occupation and low white list query matching efficiency that arise when the client caches the full set of white list data in order to eliminate that misjudgment.
To illustrate the technical solution of this specific embodiment, the software M to be judged is taken as an example in describing the bloom filter-based software white list query method.
Specifically, a flowchart of a software white list query method based on a bloom filter according to the present embodiment is shown in fig. 1. The method comprises the following steps:
step S1, matching fingerprint data of software with data in a terminal white list bloom filter, and if matching is successful, judging the software to be suspected legal; and if the matching is unsuccessful, judging that the software belongs to illegal software.
Wherein, the fingerprint data of the software comprises data with software name information; the terminal white list bloom filter is a white list bloom filter applied to a terminal, the bloom filter is a probability type data structure based on a binary vector, can support efficient insertion and query, can be used for determining that a certain piece of data does not exist or possibly exists in a set, and the terminal white list bloom filter is used for quickly filtering illegal software based on a white list.
Preferably, the fingerprint data of the software includes at least a name, a version number, and an MD5 check feature value of the software.
Taking the software M to be judged as an example, the fingerprint data D(M) of the software M is imported at the client. D(M) is a character string that includes the name, version number, MD5 check value and other multidimensional characteristic values of the software M.
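One way to assemble such a fingerprint string is sketched below. The source does not specify the field order or separator, so the `|` delimiter and the function name `build_fingerprint` are assumptions; `hashlib.md5` from the Python standard library computes the MD5 check value of the software's bytes.

```python
import hashlib


def build_fingerprint(name: str, version: str, content: bytes) -> str:
    """Join name, version number and MD5 check value into one fingerprint string."""
    md5_hex = hashlib.md5(content).hexdigest()
    # The "|" separator is an illustrative choice, not taken from the source.
    return "|".join((name, version, md5_hex))


print(build_fingerprint("editor", "1.0", b"hello"))
# prints: editor|1.0|5d41402abc4b2a76b9719d911017c592
```

Including the check value means that two builds with the same name and version but different contents produce different fingerprints.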
Preferably, the data in the terminal white list bloom filter comprises data from a bloom filter in the server.
The terminal white list bloom filter is issued in advance from the server side, and the query based on the white list software is carried out on the terminal white list bloom filter, so that the response speed of the query can be increased, and the query efficiency is improved.
Preferably, the white list bloom filter data is obtained by the following steps:
acquiring fingerprint data of software in the white list; calculating the fingerprint data of the software with N different hash functions respectively to obtain N different index values; in a binary array of the same length as the bloom filter, setting the values at the positions corresponding to the index values to 1 and the values at all other positions to 0, to obtain the hash characteristic value of the software in the white list; and performing a bitwise OR of the hash characteristic value of the software in the white list with the old value of the bloom filter to obtain the white list bloom filter data.
The bloom filter is a probability type data structure based on binary vectors and supports efficient insertion and query of software data. The hash characteristic values of all white list software are stored in the white list bloom filter, and a query basis is provided for the rapid filtering of illegal software.
Taking the white list software K stored in the bloom filter as an example, the structure for acquiring software white list data based on the bloom filter is shown in fig. 2, and the corresponding flow is shown in fig. 3. The flow comprises the following steps:
Step S001, importing the fingerprint data of the white list software K into the server, wherein the fingerprint data of the white list software K is a character string that includes multidimensional characteristic values such as the name, the version number and the MD5 check value of the white list software K.
Step S002, processing the fingerprint data of the white list software K with 4 hash functions (H_1, H_2, H_3 and H_4) to obtain 4 index values h1, h2, h3 and h4.
Step S003, with the binary vector length of the bloom filter set to 16, setting the binary values at positions h1, h2, h3 and h4 to 1 and all other positions to 0 to obtain the hash feature HF(K) of the white list software K.
Step S004, performing a bitwise OR of the hash feature HF(K) of the white list software K with the old value BF_Old of the bloom filter to obtain the updated new value BF_New of the bloom filter.
The white list bloom filter containing the hash characteristic value of the white list software K is obtained through the above steps.
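Steps S001 to S004 can be sketched as below, with the 16-bit vector held in a Python integer. The source does not name the hash functions H_1 to H_4; deriving them by salting SHA-256 with the function number is purely an assumption, and unlike the "N different index values" of the source, these stand-in functions may occasionally return the same index twice.

```python
import hashlib
from typing import List

M = 16  # length of the binary vector, as in fig. 2
K = 4   # number of hash functions


def index_values(fingerprint: str, k: int = K, m: int = M) -> List[int]:
    """Step S002: compute k index positions (salted SHA-256 is an assumed stand-in)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{fingerprint}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(1, k + 1)
    ]


def hash_feature(fingerprint: str) -> int:
    """Step S003: set the indexed bits to 1 and leave all other bits 0."""
    hf = 0
    for h in index_values(fingerprint):
        hf |= 1 << h
    return hf


def insert(bf_old: int, fingerprint: str) -> int:
    """Step S004: BF_New = HF(K) OR BF_Old."""
    return bf_old | hash_feature(fingerprint)


bf_new = insert(0, "editor|1.0|5d41402abc4b2a76b9719d911017c592")
print(format(bf_new, "016b"))  # the 16-bit vector after inserting one entry
```

Because the update is a bitwise OR, inserting the same fingerprint again leaves the filter unchanged, and bits set by earlier entries are never cleared.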
By matching the fingerprint data of the software with the data in the terminal white list bloom filter as described in step S1, illegal software can be filtered quickly and its intrusion prevented.
Preferably, matching the fingerprint data of the software with the data in the terminal white list bloom filter includes: calculating the fingerprint data of the software with N different hash functions respectively to obtain N different index values; and judging whether the values at the N positions corresponding to the N index values in the binary array of the bloom filter are all 1; when all are 1, the software is judged to be suspected legal software, and otherwise the software is judged to be illegal software.
The bloom filter is used for inquiring the white list data, the rapid filtering of most illegal software can be realized through a section of binary vector, and the efficiency of software filtering inquiry is improved.
Taking the software M to be judged as an example, the fingerprint data D(M) of the software M is imported at the client and processed by 4 hash functions (H_1, H_2, H_3 and H_4) to obtain 4 index values h1, h2, h3 and h4.
The white list bloom filter of the client is checked to see whether the binary values at positions h1, h2, h3 and h4 are all 1. If so, the software is suspected legal software and step S2 is executed; if not all are 1, the software is judged to be illegal software, a query failure is returned, and the process ends.
Step S2, matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the white list data cache of the client; if the matching is successful, executing the step S3; if the matching is not successful, step S4 is executed.
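The client-side check can be sketched as follows, again holding the 16-bit filter in an integer. As before, the salted SHA-256 construction of the 4 hash functions and the names `index_values` and `client_check` are assumptions for illustration; any real deployment would have to use the same hash functions as the server that built the filter.

```python
import hashlib
from typing import List


def index_values(fingerprint: str, k: int = 4, m: int = 16) -> List[int]:
    """Compute the k index positions h1..hk (salted SHA-256 is an assumed stand-in)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{fingerprint}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(1, k + 1)
    ]


def client_check(bloom_bits: int, fingerprint: str) -> str:
    """Return "suspected legal" only if every indexed bit of the filter is 1."""
    hit = all((bloom_bits >> h) & 1 for h in index_values(fingerprint))
    return "suspected legal" if hit else "illegal"


# An empty filter rejects everything, so the query fails at once.
print(client_check(0, "tool|9.9|zzz"))  # prints: illegal
```

A result of "suspected legal" is not a final verdict; it only means the query proceeds to step S2.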
When the tree-shaped white list data index base of the server side inquires the fingerprint data of the software, if the access times of the fingerprint data of the software are larger than or equal to a preset threshold value, the fingerprint data of the software are issued to the white list cache of the client side.
Only the fingerprint data of white list software whose access count is greater than or equal to the threshold is issued to the client cache, rather than the fingerprint data of all white list software, which reduces the storage occupied in the client.
The fingerprint data of the suspected legal software is matched with the fingerprint data of the white list software in the white list data cache of the client, whether the suspected legal software is really in a white list or not is further judged, the white list software data with a large number of accesses is cached in the client, and whether the suspected legal software is in the white list software which is frequently inquired or not can be judged through matching and searching.
Taking the software M to be judged as an example, the white list data cache of the client is searched for the fingerprint data D(M) of the software M. If it exists, the software is legal: step S3 is executed, the result is returned, and the process ends. If not, the software remains suspected legal software and step S4 is executed.
Step S3, if the matching is successful, judging the suspected legal software to be legal software.
Step S4, if the matching is unsuccessful, matching the fingerprint data of the suspected legal software with a tree-shaped white list data index database in the server, if the matching is successful, judging the suspected legal software to be legal software, and if the matching is unsuccessful, judging the suspected legal software to be illegal software.
The tree-shaped white list data index database in the server comprises a root node, child nodes, leaf nodes and white list data linked lists, and is used for storing the data of white list software.
Preferably, the tree-shaped white list data index database comprises a root node, child nodes and leaf nodes which are connected in a tree-shaped data structure; each leaf node is connected with a white list data linked list, and the white list data linked list comprises the fingerprint data and access count data of the software whose hash characteristic value is the same as the index path corresponding to the leaf node; the access count data is used for recording the number of times the software fingerprint data has been queried.
The white list data linked list stores the fingerprint data of white list software together with the access count of that fingerprint data. The access count data records the number of times the software fingerprint data has been queried, and the hash characteristic values correspond one to one to the fingerprint data of the software. The access count is used to judge whether a piece of white list software fingerprint data is frequently accessed, so that it can be issued to the client, which provides the data source for the white list data cache of the client.
Preferably, in the tree-shaped white list data index database, the data of the child nodes and the leaf nodes are determined based on hash characteristic values of software in a white list; according to the depth of the tree-shaped data structure, the hash characteristic value of the software listed in a white list is divided into segments corresponding to the child nodes and the leaf nodes; and sequentially stores the segmented data in the corresponding nodes.
In order to avoid the influence of the misjudgment of the bloom filter on the white list control, a tree-shaped white list data index database is designed and constructed based on the Hash characteristics generated by software, and the storage, the quick index and the query of the white list data can be realized.
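As a rough illustration of the segmentation described above, the sketch below cuts a hash characteristic value into one equal-width segment per tree layer. It rests on assumptions that the application does not state: the hash characteristic value is taken to be the first 16 hexadecimal characters of an MD5 digest of the fingerprint string, and the tree depth is taken to be 4. The function names are hypothetical.

```python
import hashlib
from typing import List


def hash_feature(fingerprint: str) -> str:
    # Assumption: a truncated MD5 hex digest stands in for the hash
    # characteristic value HF; the application does not fix this choice.
    return hashlib.md5(fingerprint.encode("utf-8")).hexdigest()[:16]


def split_segments(hf: str, depth: int = 4) -> List[str]:
    """Cut hf into `depth` equal-width segments, one per tree layer."""
    if depth <= 0 or len(hf) % depth != 0:
        raise ValueError("hash length must be a positive multiple of depth")
    width = len(hf) // depth
    return [hf[i:i + width] for i in range(0, len(hf), width)]
```

With a 16-character value and depth 4, the segments start at offsets 0, 4, 8 and 12.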
Taking the white list software K as an example, a software white list data storage structure diagram based on the tree-shaped white list data index database is shown in fig. 4, and a software white list data storage flow diagram based on the tree-shaped white list data index database is shown in fig. 5, and the method comprises the following steps:
Step S101, the hash characteristic value HF (K) of the white list software K is obtained.
Step S102, the hash characteristic value HF (K) of the white list software K is divided into four segments HF (K) [0 ], HF (K) [4 ], HF (K) [8 ] and HF (K) [12 ], one for each of layers 1 to 4 of the index tree.
Step S103, matching HF (K) [0 ] at layer 1 of the index tree (the root node is empty and is layer 0), and if the matching is successful, executing step S104; if the matching fails, adding an HF (K) [0 ] node at layer 1, generating HF (K) [4 ], HF (K) [8 ] and HF (K) [12 ] nodes in turn at layers 2 to 4 below the node, and executing step S107.
Step S104, matching HF (K) [4 ] at layer 2 under the HF (K) [0 ] node in the index tree, and if the matching is successful, executing step S105; if the matching fails, adding an HF (K) [4 ] node at layer 2, generating HF (K) [8 ] and HF (K) [12 ] nodes in turn at layers 3 to 4 below the node, and executing step S107.
Step S105, matching HF (K) [8 ] at layer 3 under the HF (K) [4 ] node in the index tree, and if the matching is successful, executing step S106; if the matching fails, adding an HF (K) [8 ] node at layer 3, generating an HF (K) [12 ] node at layer 4 below the node, and executing step S107.
Step S106, matching HF (K) [12 ] at layer 4 under the HF (K) [8 ] node in the index tree, and if the matching is successful, executing step S107; if the matching fails, generating an HF (K) [12 ] node at layer 4 below the HF (K) [8 ] node, and executing step S107.
Step S107, white list data with the same hash characteristic is stored in the white list data linked list under the HF (K) [12 ] leaf node; the white list data D (K) is matched against each node on the linked list in turn, and if the matching is successful, the white list data is already stored in the database; if the matching fails, step S108 is executed.
Step S108, a node is created at the tail of the white list data linked list of the HF (K) [12 ] leaf node, the white list data D (K) is filled in, the access count is set to 0, and finally the node is linked into the white list data linked list.
Through the steps, the tree-shaped white list data index library containing the hash characteristic value of the white list software K and the fingerprint data of the white list software K is obtained.
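Steps S103 to S108 can be sketched as a single insertion routine. This is only one possible reading: the class names `Node` and `Entry` and the function `insert` are hypothetical, the white list data linked list is modelled as a Python list, and the segments are assumed to have been produced already as in step S102.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    fingerprint: str       # white list data D(K)
    access_count: int = 0  # set to 0 on creation (step S108)


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    entries: List[Entry] = field(default_factory=list)  # used on leaf nodes only


def insert(root: Node, segments: List[str], fingerprint: str) -> bool:
    """Store fingerprint under the path given by segments.

    Returns True if a new entry was created, False if it already existed.
    """
    node = root  # layer 0, empty root
    for seg in segments:
        # S103 to S106: follow the matching child, or create the missing one.
        if seg not in node.children:
            node.children[seg] = Node()
        node = node.children[seg]
    # S107: scan the list under the leaf for an identical fingerprint.
    if any(e.fingerprint == fingerprint for e in node.entries):
        return False
    # S108: append a new entry at the tail with access count 0.
    node.entries.append(Entry(fingerprint))
    return True
```

Creating a missing node inside the loop has the same effect as generating all lower-layer nodes at once, since every later segment of a freshly created branch is necessarily missing too.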
Matching the fingerprint data of the suspected legal software with a tree-shaped white list data index database in a server, if the matching is successful, judging the suspected legal software to be legal software, and if the matching is unsuccessful, judging the suspected legal software to be illegal software.
Suspected legal software that passes the white list bloom filter but is not hit in the white list data cached on the terminal is matched against the tree-shaped white list data index database. First, the hash characteristic value of the suspected legal software is matched against the hash characteristic values of white list software stored in the child nodes and leaf nodes of the tree-shaped white list data index database; if the matching is unsuccessful, the suspected legal software is judged to be illegal software. If the matching is successful, the fingerprint data of the suspected legal software is matched against the fingerprint data in the white list data linked list; if that matching is successful, the suspected legal software is judged to be legal software, and if it is unsuccessful, the suspected legal software is judged to be illegal software.
For such software, fast white list indexing and matching based on the hash value is carried out in the white list server, so that false positives of the bloom filter do not affect the final judgment.
Preferably, matching the fingerprint data of the suspected legal software with a tree white list data index database in a server includes:
acquiring fingerprint data of the suspected legal software and acquiring a hash characteristic value of the suspected legal software based on the fingerprint data of the suspected legal software;
according to the depth of the tree-shaped data structure, the hash characteristic value of suspected legal software is divided into segments corresponding to the child nodes and the leaf nodes;
when any segment of the suspected legal software fails to match the corresponding child node or leaf node in the tree-shaped white list data index database, judging the suspected legal software to be illegal software; when every segment of the suspected legal software is successfully matched with the corresponding child node or leaf node in the tree-shaped white list data index database, matching the fingerprint data of the suspected legal software with each piece of fingerprint data under the corresponding leaf node; and when the fingerprint data match is successful, judging the suspected legal software to be legal software, and when the fingerprint data match is unsuccessful, judging the suspected legal software to be illegal software.
Taking the software M to be judged as an example, the fingerprint data of the software M is matched with the tree-shaped white list data index database in the server as follows.
The client submits the fingerprint data D (M) and the hash characteristic HF (M) of the software M to be judged to the white list server, initiating a white list query request.
The white list server divides the hash feature HF (M) of the software M to be judged into segments beginning with HF (M) [0 ] and HF (M) [4 ], and matches the segments layer by layer against the index tree; if the matching is successful, the corresponding leaf node is found.
In the white list data linked list attached below the leaf node, the fingerprint data D (M) of the software M to be judged is matched against the software fingerprint data in each node in turn. If the matching is successful, the access count on the matched node is incremented by 1 and compared with the threshold value P: if the access count is greater than or equal to P, the fingerprint data D (M) of the software M is issued to the white list cache of the client and a query success is returned; if the access count is less than P, only the query success is returned to the terminal agent. If the matching fails, the software is judged to be illegal software, a query failure is returned to the client, and the method ends.
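The server-side query just described, including the access counter and the threshold-triggered push to the client cache, can be sketched as below. The representation is an assumption made for brevity: the index is modelled as nested dictionaries keyed by segment, the linked list under each leaf as a Python list of small dictionaries, and the client cache as a set; `server_query` is a hypothetical name.

```python
from typing import Any, Dict, List, Set


def server_query(
    index: Dict[str, Any],
    segments: List[str],
    fingerprint: str,
    threshold: int,
    client_cache: Set[str],
) -> bool:
    """Return True (query success) if fingerprint is in the white list."""
    node: Any = index
    for seg in segments:
        # Stop at the first segment that has no matching node.
        if not isinstance(node, dict) or seg not in node:
            return False
        node = node[seg]
    if not isinstance(node, list):
        return False  # path ended above the leaf layer
    for entry in node:
        if entry["fingerprint"] == fingerprint:
            entry["count"] += 1  # access count plus 1
            if entry["count"] >= threshold:
                # Frequently queried: issue the fingerprint to the client cache.
                client_cache.add(fingerprint)
            return True
    return False
```

A usage example under the same assumptions, with threshold P = 2:

```python
index = {"0123": {"4567": {"89ab": {"cdef": [{"fingerprint": "app-1.0", "count": 0}]}}}}
cache: Set[str] = set()
path = ["0123", "4567", "89ab", "cdef"]
server_query(index, path, "app-1.0", 2, cache)  # success, not yet cached
server_query(index, path, "app-1.0", 2, cache)  # success, now pushed to cache
```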
Preferably, when the segments of the hash characteristic value of the suspected legal software are matched with the corresponding child nodes and leaf nodes in the tree-shaped white list data index database, the segments are matched with the data of the child nodes and leaf nodes in sequence, and when any node data fails to match, the result is directly judged to be unmatched and no matching is performed against the next-level node.
The remaining matching steps are abandoned as soon as a mismatch appears, which improves the efficiency of white list queries.
Taking the software M to be judged as an example, the white list server divides the hash feature HF (M) of the software M into segments beginning with HF (M) [0 ] and matches them layer by layer; if the matching is successful, the corresponding leaf node is found.
Preferably, when the fingerprint data is successfully matched, 1 is added to the access count data to obtain new access count data; and when the new access count data is greater than or equal to a preset threshold value, the successfully matched fingerprint data is issued to the white list cache of the terminal.
Only software data whose access count is greater than or equal to the threshold is sent to the white list cache of the terminal, instead of caching the complete white list data on the terminal, which addresses the problem of high storage occupation on the client.
Taking the software M to be judged as an example, the fingerprint data D (M) of the software M is matched against the software fingerprint data in each node in turn. If the matching is successful, the access count on the matched node is incremented by 1 and compared with the threshold value P: if the access count is greater than or equal to P, the fingerprint data D (M) of the software M is issued to the white list cache of the client and a query success is returned; if the access count is less than P, only the query success is returned to the terminal agent. If the matching fails, the software is judged to be illegal software, a query failure is returned to the client, and the method ends.
The above embodiments describe the objects, technical solutions and advantages of the present application in further detail. It should be understood that the above embodiments are merely exemplary embodiments of the present application and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (10)

1. A bloom filter-based software white list query method is characterized by comprising the following steps:
matching fingerprint data of software with data in a terminal white list bloom filter, and if the matching is successful, judging the software to be suspected legal; if the matching is unsuccessful, judging that the software belongs to illegal software;
matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the white list data cache of the terminal; if the matching is successful, judging the suspected legal software to be legal software; if the matching is unsuccessful, matching the fingerprint data of the suspected legal software with the software index information and the fingerprint information in the tree-shaped white list data index database in the server, if the matching is successful, judging the suspected legal software to be legal software, and if the matching is unsuccessful, judging the suspected legal software to be illegal software.
2. The method for querying the white list of software based on the bloom filter as claimed in claim 1, wherein matching the fingerprint data of the software with the data in the white list bloom filter of the terminal comprises:
applying N different hash functions respectively to the fingerprint data of the software to obtain N different index values;
and judging whether the numerical values of N positions corresponding to the N different index values in the binary number group of the bloom filter are all 1, judging the software to be suspected legal software when all the numerical values are 1, and judging the software to be illegal software if not all the numerical values are 1.
3. The method for querying the software white list based on the bloom filter as claimed in claim 1, comprising:
the tree-shaped white list data index database comprises a root node, child nodes and leaf nodes which are connected in a tree-shaped data structure; each leaf node is connected with a white list data linked list, and the white list data linked list comprises fingerprint data and access count data of software with the same hash characteristic value as the index path corresponding to the leaf node; the access count data is used for recording the number of times the software fingerprint data has been queried.
4. The bloom filter-based software white list query method according to claim 1, wherein in the tree-shaped white list data index library, the data of child nodes and leaf nodes are determined based on hash feature values of software in a white list;
wherein,
according to the depth of the tree-shaped data structure, the hash characteristic value of the software listed in a white list is divided into segments corresponding to the child nodes and the leaf nodes; and sequentially stores the segmented data in the corresponding nodes.
5. The method of claim 1, wherein the white list bloom filter data is obtained by the following steps:
acquiring fingerprint data of software in the white list;
applying N different hash functions respectively to the fingerprint data of the software to obtain N different index values;
setting the binary values at the positions of an array corresponding to the index values to 1, and setting the binary values at all other positions to 0, to obtain the hash characteristic value of the software in the white list;
and performing a bitwise OR of the hash characteristic value of the software in the white list and the old value of the bloom filter to obtain the white list bloom filter data.
6. The method as claimed in claim 3, wherein the step of matching the fingerprint data of the suspected legal software with the tree white list data index database in the server comprises:
acquiring fingerprint data of the suspected legal software and acquiring a hash characteristic value of the suspected legal software based on the fingerprint data of the suspected legal software;
according to the depth of the tree data structure, the hash characteristic value of suspected legal software is divided into segments corresponding to the child nodes and the leaf nodes;
when any segment of the suspected legal software fails to match the corresponding child node or leaf node in the tree-shaped white list data index database, judging the suspected legal software to be illegal software; when every segment of the suspected legal software is successfully matched with the corresponding child node or leaf node in the tree-shaped white list data index database, matching the fingerprint data of the suspected legal software with each piece of fingerprint data under the corresponding leaf node; and when the fingerprint data match is successful, judging the suspected legal software to be legal software, and when the fingerprint data match is unsuccessful, judging the suspected legal software to be illegal software.
7. The bloom filter-based software white list query method according to claim 6, wherein when all segments of the suspected legal software hash feature values are matched with corresponding child nodes and leaf nodes in the tree white list data index library, the suspected legal software hash feature values are sequentially matched with the data of the child nodes and the leaf nodes, and when any node data fails to be successfully matched, the suspected legal software hash feature values are directly judged to be unmatched and are not matched with the next-level node any more.
8. The bloom filter-based software white list query method according to claim 6, wherein when the fingerprint data match is successful, the access count data is further incremented by 1 to obtain new access count data; and when the new access count data is greater than or equal to a preset threshold value, the successfully matched fingerprint data is issued to a white list cache of the terminal.
9. The bloom filter-based software white list query method of claim 1, wherein the data in the terminal white list bloom filter comprises data from a bloom filter in the server.
10. The method as claimed in claim 1, wherein the fingerprint data of the software at least includes a name, a version number and an MD5 check feature value of the software.
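For illustration only, the bloom filter construction and query recited in claims 2 and 5 can be sketched as follows. The sketch makes several assumptions that the claims do not specify: the N hash functions are simulated by salting SHA-256 with the function number, the binary array is held in a Python integer, and the size of 1024 bits and N = 3 are arbitrary. The class and method names are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # binary array, all positions initially 0

    def _indexes(self, fingerprint: str) -> Iterator[int]:
        # Assumption: salted SHA-256 stands in for N different hash functions.
        for i in range(self.num_hashes):
            data = f"{i}:{fingerprint}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, fingerprint: str) -> None:
        # Build the per-software bit pattern: 1 at each index, 0 elsewhere.
        pattern = 0
        for idx in self._indexes(fingerprint):
            pattern |= 1 << idx
        # Bitwise OR of the pattern with the old value of the filter.
        self.bits |= pattern

    def may_contain(self, fingerprint: str) -> bool:
        # Suspected legal only if every indexed position holds 1.
        return all((self.bits >> idx) & 1 for idx in self._indexes(fingerprint))
```

A `False` result from `may_contain` is definitive, while a `True` result only marks the software as suspected legal, which is why the later cache and server stages are still needed.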
CN202310061459.1A 2023-01-13 2023-01-13 Software white list query method based on bloom filter Active CN115827702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310061459.1A CN115827702B (en) 2023-01-13 2023-01-13 Software white list query method based on bloom filter

Publications (2)

Publication Number Publication Date
CN115827702A true CN115827702A (en) 2023-03-21
CN115827702B CN115827702B (en) 2023-05-16

Family

ID=85520791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310061459.1A Active CN115827702B (en) 2023-01-13 2023-01-13 Software white list query method based on bloom filter

Country Status (1)

Country Link
CN (1) CN115827702B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116707901A (en) * 2023-06-07 2023-09-05 中国人民解放军61660部队 Asset identification method and system based on Web snapshot

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201772A1 (en) * 2007-02-15 2008-08-21 Maxim Mondaeev Method and Apparatus for Deep Packet Inspection for Network Intrusion Detection
US8306988B1 (en) * 2009-10-26 2012-11-06 Mcafee, Inc. System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database
CN104850783A (en) * 2015-04-30 2015-08-19 中国人民解放军国防科学技术大学 Method and system for cloud detection of malicious software based on Hash characteristic matrix

Also Published As

Publication number Publication date
CN115827702B (en) 2023-05-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant