CN115827702B - Software white list query method based on bloom filter - Google Patents

Software white list query method based on bloom filter

Info

Publication number
CN115827702B
Authority
CN
China
Prior art keywords
software
data
white list
suspected
bloom filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310061459.1A
Other languages
Chinese (zh)
Other versions
CN115827702A (en)
Inventor
严锦立
荣星
王平
吴流丽
廖建华
黄河
李彦琛
毛建辉
张永星
季伟
王耀
刘筱明
袁建国
张子文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UNIT 61660 OF PLA
Original Assignee
UNIT 61660 OF PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by UNIT 61660 OF PLA
Priority to CN202310061459.1A
Publication of CN115827702A
Application granted
Publication of CN115827702B
Legal status: Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to a bloom filter-based software white list query method, comprising the following steps: matching the fingerprint data of the software against the data in the terminal white list bloom filter; if the matching is successful, judging that the software is suspected legal software; if the matching is unsuccessful, judging that the software is illegal software; matching the fingerprint data of the suspected legal software against the fingerprint data of the white list software in the terminal white list data cache; if the matching is successful, judging the suspected legal software to be legal software; if the matching is unsuccessful, matching the fingerprint data of the suspected legal software against a tree-shaped white list data index database in the server; if that matching is successful, judging the suspected legal software to be legal software, and if it is unsuccessful, judging it to be illegal software. This scheme eliminates the misjudgment of the bloom filter while achieving a trade-off between white list query efficiency and memory occupation.

Description

Software white list query method based on bloom filter
Technical Field
The disclosure belongs to the technical field of databases, and particularly relates to a software white list query method based on a bloom filter.
Background
With the rapid development of network and information technologies, the scale and complexity of terminal software keep growing, and controlling the execution of terminal software securely is key to ensuring the security of the terminal and of the whole network. In fields such as industrial control and aerospace, the security of terminal software directly affects whether the whole computation and control process can execute normally. In these critical facilities, the execution of malware can cause significant loss and cost. Therefore, software management and control mechanisms based on white lists are very important in these fields.
Existing white list management systems adopt a client-server architecture, in which the server issues a large white list library to the client in advance. When the client runs a program, it looks the program up in the white list library based on the collected software fingerprint information. This approach achieves white list management and control, but the white list occupies a large amount of resources and query efficiency is low. Therefore, reducing the white list resource occupation of the client and improving white list query efficiency are key to a white list management system.
A bloom filter is a probabilistic data structure based on a binary vector that supports efficient insertion and querying, and can be used to determine that a piece of data definitely does not exist, or may exist, in a set. Based on this feature, bloom filters are typically used as black lists, e.g., in DNS query [1], where legitimate DNS requests can be quickly filtered based on the black list. However, the bloom filter has a certain misjudgment (false positive) rate, so for a request that hits the filter, a further query is needed to determine whether the request is illegal or legal. In summary, in large-scale data filtering scenarios, the bloom filter can efficiently query for data meeting the conditions, but has a certain misjudgment rate.
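As a rough guide to the size of this misjudgment rate, the standard textbook approximation for a bloom filter can be written as follows. The symbols m (number of bits in the binary vector), k (number of hash functions) and n (number of inserted elements) are introduced here for illustration only and do not appear in the patent text; the approximation assumes independent, uniformly distributed hash functions.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
```

For a 16-bit vector with 4 hash functions, as in the example of fig. 2, inserting n = 2 fingerprints gives approximately (1 - e^{-0.5})^4, or about 0.024, under this approximation.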
In a white list software management and control system, this characteristic of the bloom filter is a great advantage, and the key problem to be solved is eliminating the misjudgment caused by the bloom filter. The principle by which a bloom filter stores white list data is shown in fig. 2, where the bloom filter has a binary vector with a length of 16 bits. When the bloom filter is updated, the software fingerprint data is fed as input to 4 hash functions, giving 4 outputs, each of which is an index position in the binary vector. Setting the values at these 4 index positions to 1 and the values at the other 12 positions to 0 generates the hash feature of the software. A bitwise OR of the software hash feature and the old value of the bloom filter gives the bloom filter with the white list data inserted. When a software fingerprint is queried, the fingerprint is fed as input to the 4 hash functions to obtain 4 outputs, and the binary vector is checked to see whether all 4 index positions are 1. If not, the software is illegal; if so, the software is only suspected legal software, because the bloom filter has a certain misjudgment rate.
For software that hits in the bloom filter, it is necessary to further determine whether the software fingerprint is indeed in the white list. One approach is to cache the complete white list data set on the client and determine whether the suspected legal software is legal or illegal by matching against the entries one by one. However, this approach has the problems of high storage occupation in the client and low white list query matching efficiency. It is therefore critical to reduce the storage occupied by white list data in the client and to improve the query efficiency of the white list.
Disclosure of Invention
The present disclosure is made based on the above-mentioned needs of the prior art, and the technical problem to be solved by the present disclosure is to provide a bloom filter-based software whitelist query method, so as to improve the efficiency of judgment and eliminate erroneous judgment in the judgment of whitelist software.
In order to solve the above problems, the technical solution provided by the present disclosure includes:
A software white list query method based on a bloom filter, characterized in that fingerprint data of software is matched with data in the terminal white list bloom filter; if the matching is successful, the software is judged to be suspected legal software; if the matching is unsuccessful, the software is judged to be illegal software;
matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the terminal white list data cache; if the matching is successful, judging the suspected legal software as legal software; and if the matching is unsuccessful, matching the fingerprint data of the suspected legal software with the software index information and the fingerprint information of the tree-shaped white list data index library in the server, if the matching is successful, judging the suspected legal software as legal software, and if the matching is unsuccessful, judging the suspected legal software as illegal software.
With this method, software that successfully matches the white list bloom filter goes on to be matched against the data in the white list data cache or the tree-shaped white list data index database, so that the misjudgment of the bloom filter is eliminated. Checking the client cache allows white list data to be confirmed without calling the server, which improves the efficiency and accuracy of the judgment. Finally, through the tree-shaped white list data index database on the server side and the white list data cache query mechanism on the client side, the small portion of white list software that needs complete verification can be definitively confirmed. The technical scheme coordinates the advantages of the bloom filter, the client cache and the server: the bloom filter quickly determines whether software is non-white-list software; the frequently accessed white list data set stored in the client cache quickly confirms whether software that passes the bloom filter is white list software; and the server, which stores the complete white list library, completes the confirmation when the client cache does not contain the entry. A trade-off is thus achieved between white list query efficiency and storage occupation, query efficiency is improved, and misjudgment is eliminated.
Preferably, matching the fingerprint data of the software with the data in the white list bloom filter of the terminal includes: respectively calculating fingerprint data of the software aiming at N different hash functions to obtain N different index values; judging whether the numerical values of N positions corresponding to the N different index values in the binary number group of the bloom filter are all 1, judging that the software is suspected legal software when the numerical values are all 1, and judging that the software is illegal software when the numerical values are not all 1.
According to the method, the bloom filter is used for inquiring the white list data, and the vast majority of illegal software can be quickly filtered through a section of binary vector, so that the software filtering inquiring efficiency is improved.
Preferably, the tree-shaped white list data index base comprises a root node, a child node and a leaf node which are connected in a tree-shaped data structure; the leaf nodes are connected with a white list data link list, and the white list data link list comprises fingerprint data and access times data of software with the same hash characteristic value as that of an index path corresponding to the leaf nodes; the access times data is used for recording the times of the software fingerprint data being queried.
With this method, the white list data linked list stores the fingerprint data of the white list software together with its access count. The access count records the number of times the software fingerprint data has been queried, and the hash feature values of the software fingerprint data correspond one-to-one to the software fingerprint data. The access count is used to judge whether the fingerprint data of a piece of white list software is frequently accessed, so that it can be issued to the client, providing the data source for the client-side white list cache.
Preferably, in the tree-shaped white list data index base, the data of the child nodes and the leaf nodes are determined based on hash characteristic values of software in a white list; the hash characteristic value of the software listed in the white list is segmented into segments corresponding to the child nodes and the leaf nodes according to the depth of the tree data structure; and sequentially stores the segmented data in the corresponding nodes.
With this method, to prevent bloom filter misjudgment from affecting white list management, a tree-shaped white list data index base is designed and constructed based on the hash features generated from software, enabling storage, rapid indexing and querying of white list data.
Preferably, the white list bloom filter data is obtained by: acquiring fingerprint data of software in the white list; computing the fingerprint data of the software with N different hash functions to obtain N different index values; setting the values at the positions corresponding to the index values in the binary array of the bloom filter to 1 and the values at all other positions to 0, thereby obtaining the hash feature value of the software in the white list; and performing a bitwise OR of the hash feature value of the software in the white list and the old value of the bloom filter to obtain the white list bloom filter data.
By the method, the probability type data structure based on the binary vector realizes the efficient insertion of the bloom filter white list data.
Preferably, matching the fingerprint data of the suspected legal software with a tree-shaped white list data index database in a server includes:
acquiring fingerprint data of the suspected legal software and acquiring a hash characteristic value of the suspected legal software based on the fingerprint data of the suspected legal software;
dividing the hash characteristic value of the suspected legal software into segments corresponding to the child nodes and the leaf nodes according to the depth of the tree data structure;
when the segments of the suspected legal software cannot all be successfully matched with nodes among the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, judging that the suspected legal software is illegal software; when all the segments of the suspected legal software are successfully matched with nodes among the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, matching the fingerprint data of the suspected legal software with each piece of fingerprint data under the corresponding leaf node; judging the suspected legal software to be legal software when the fingerprint data is successfully matched, and judging it to be illegal software when the fingerprint data is not successfully matched.
With this method, for software that hits the bloom filter but has a low access count, rapid indexing and matching of the white list is performed on the white list server based on the hash value, so that the misjudgment of the bloom filter can be eliminated.
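The segment-by-segment lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the function names, the use of a Python list in place of the white list data linked list, and the representation of the hash feature value as a 16-character bit string split into 4 equal segments are all assumptions made for the example.

```python
class Node:
    """One node of the index tree; leaves also carry a fingerprint list."""

    def __init__(self):
        self.children = {}  # segment string -> child Node
        self.entries = []   # stand-in for the white list data linked list


def split_segments(hf_bits, depth=4):
    """Split a hash feature bit string into `depth` equal segments."""
    step = len(hf_bits) // depth
    return [hf_bits[i * step:(i + 1) * step] for i in range(depth)]


def insert(root, hf_bits, fingerprint):
    """Store a white list fingerprint under the path given by its segments."""
    node = root
    for seg in split_segments(hf_bits):
        node = node.children.setdefault(seg, Node())
    node.entries.append(fingerprint)


def lookup(root, hf_bits, fingerprint):
    """Return True only if every segment matches and the fingerprint is stored."""
    node = root
    for seg in split_segments(hf_bits):
        node = node.children.get(seg)
        if node is None:
            return False  # a segment failed to match: judged illegal
    return fingerprint in node.entries  # compare with each stored fingerprint


root = Node()
insert(root, "0001001001001000", "fingerprint-of-K")  # hypothetical values
print(lookup(root, "0001001001001000", "fingerprint-of-K"))
```

A fingerprint that shares the same index path but is not stored under the leaf is rejected by the final comparison, which is how a bloom filter false positive is caught in this sketch.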
Preferably, when the segments of the hash feature value of the suspected legal software are matched with the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, the segments are matched against the data of the child nodes and leaf nodes in sequence, and when any node fails to match, the match is directly judged to have failed and matching at the next level is not performed.
By the method, the follow-up matching query flow is timely interrupted at the first time when the matching result appears, and the white list query efficiency is improved.
Preferably, when the fingerprint data is successfully matched, the access frequency data is added with 1 to obtain new access frequency data; and when the new access frequency data is larger than or equal to a preset threshold value, the fingerprint data successfully matched is issued to a white list cache of the terminal.
By the method, only software data with access times greater than or equal to a threshold value is issued to the white list cache of the terminal, instead of caching the whole set of the white list data to the terminal, and the problem of high storage occupation in the client is solved.
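The access-count rule can be sketched as below. The `Entry` dataclass, the `record_match` function, the threshold value of 3 and the use of a Python set as the terminal's white list cache are hypothetical choices for illustration; the patent only states that a preset threshold exists.

```python
from dataclasses import dataclass

THRESHOLD = 3  # hypothetical preset threshold


@dataclass
class Entry:
    """One white list record on the server: fingerprint plus access count."""
    fingerprint: str
    access_count: int = 0


def record_match(entry, client_cache, threshold=THRESHOLD):
    """Add 1 to the access count; issue the fingerprint once it reaches the threshold."""
    entry.access_count += 1
    if entry.access_count >= threshold:
        client_cache.add(entry.fingerprint)
    return entry.access_count


cache = set()
entry = Entry("fingerprint-of-K")
for _ in range(3):
    record_match(entry, cache)
print(entry.access_count, cache)
```

Only entries that have been matched often enough reach the cache, so the cache stays a small subset of the full white list.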
Preferably, the data in the terminal white list bloom filter comprises data from a bloom filter in the server.
With this method, filtering against the white list bloom filter is carried out at the client, which speeds up the return of query results and further improves query efficiency.
Preferably, the fingerprint data of the software includes at least a name, version number and MD5 verification feature value of the software.
By the method, necessary technical information of the software is contained in fingerprint data of the software.
Compared with the prior art, the software white list quick query method based on the bloom filter is characterized in that software successfully matched with the bloom filter of the white list is continuously subjected to matching query with data in a white list data cache or with the tree-shaped white list data index database, so that misjudgment of the bloom filter is eliminated; through the tree-shaped white list data index library of the server side and the white list data cache query mechanism of the client side, the server side can save the whole set of the white list library, and the client side caches the frequently accessed white list data set, so that trade-off is realized between white list query efficiency and storage occupation.
Drawings
To illustrate the embodiments of this description or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some of the embodiments described in this description, and a person having ordinary skill in the art may obtain other drawings from them.
Fig. 1 is a flowchart of the software white list query method based on a bloom filter provided in this embodiment;
Fig. 2 is a structure diagram of software white list data acquisition based on a bloom filter provided in this embodiment;
Fig. 3 is a flowchart of software white list data acquisition based on a bloom filter provided in this embodiment;
Fig. 4 is a structure diagram of software white list data storage based on a tree-shaped white list data index base provided in this embodiment;
Fig. 5 is a flowchart of software white list data storage based on a tree-shaped white list data index base provided in this embodiment.
Reference numerals:
001: child node; 002: leaf node; 003: white list data linked list; 004: hash feature value of software fingerprint data; 005: bloom filter old value; 006: bloom filter new value.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
To facilitate understanding of the embodiments of the present application, specific embodiments are further explained below with reference to the accompanying drawings; these embodiments do not limit the embodiments of the present application.
The present embodiment provides a software white list query method based on a bloom filter. The prior art applies bloom filters to black list lookup, but white list lookup with a bloom filter differs from black list lookup. A bloom filter built on black list software can rapidly identify most white list software, but for requests that hit the filter it must still be further determined whether the software is illegal software or white list software. As a result, installation requests from a large amount of illegal software cannot be blocked immediately, and efficiency is relatively low.
A bloom filter that stores white list data can rapidly filter out software that is not in the white list. However, because the bloom filter has a certain probability of misjudgment, for software that hits in the bloom filter it must be further determined whether the software fingerprint is truly in the white list, to ensure the accuracy of the white list judgment. If, in that judgment, all fingerprint information of the software were compared against a database, accuracy would improve but efficiency would be insufficient. A software white list query method based on a bloom filter is therefore provided, which balances the efficiency and accuracy of white list lookup.
Specifically, the software white list query method based on a bloom filter in this embodiment aims to eliminate the misjudgment caused by the bloom filter, and to solve the problems of high client storage occupation and low white list query matching efficiency that arise when the client caches the complete white list data set to handle bloom filter misjudgment.
To explain the technical solution of this embodiment, the software M to be judged is taken as an example in describing the software white list query method based on a bloom filter.
Specifically, the flowchart of the software whitelist query method based on the bloom filter in this embodiment is shown in fig. 1. The method comprises the following steps:
Step S1, fingerprint data of software is matched with data in a white list bloom filter of a terminal, and if the matching is successful, the software is judged to be suspected legal software; if the matching is unsuccessful, the software is judged to be illegal software.
The fingerprint data of the software comprises data carrying the software name information. The terminal white list bloom filter is a white list bloom filter deployed on a terminal. A bloom filter is a probabilistic data structure based on a binary vector that supports efficient insertion and querying and can be used to determine that a piece of data definitely does not exist, or may exist, in a set; the terminal white list bloom filter is used to rapidly filter out illegal software based on the white list.
Preferably, the fingerprint data of the software includes at least a name, version number and MD5 verification feature value of the software.
Taking the software M to be judged as an example, fingerprint data D(M) of the software M is imported into the client. D(M) is a character string containing multidimensional feature values such as the name, version number and MD5 check value of the software M.
Preferably, the data in the terminal white list bloom filter comprises data from a bloom filter in the server.
The terminal white list bloom filter is issued in advance by the server. Performing white list queries against the bloom filter on the terminal improves the response speed of the query and thus the query efficiency.
Preferably, the white list bloom filter data is obtained by:
acquiring fingerprint data of software in the white list; computing the fingerprint data of the software with N different hash functions to obtain N different index values; setting the values at the positions corresponding to the index values in the binary array of the bloom filter to 1 and the values at all other positions to 0, thereby obtaining the hash feature value of the software in the white list; and performing a bitwise OR of the hash feature value of the software in the white list and the old value of the bloom filter to obtain the white list bloom filter data.
Bloom filters are a probabilistic data structure based on binary vectors that support efficient insertion and querying of software data. The hash characteristic values of all the white list software are stored in the white list bloom filter, and a query basis is provided for the rapid filtering of illegal software.
Taking the white list software K stored in the bloom filter as an example, a software white list data acquisition structure diagram based on the bloom filter is shown in fig. 2, a software white list data acquisition flow diagram based on the bloom filter is shown in fig. 3, and the method comprises the following steps:
Step S001, fingerprint data of the white list software K is imported into the server. The fingerprint data of the white list software K is a character string containing multidimensional feature values such as the name, version number and MD5 check value of the white list software K.
Step S002, the fingerprint data of the white list software K is processed by 4 hash functions (H_1, H_2, H_3 and H_4) to obtain 4 index values h1, h2, h3 and h4.
Step S003, the binary vector of the bloom filter has length 16; the binary values at positions h1, h2, h3 and h4 are set to 1 and the other positions are set to 0, giving the hash feature HF(K) of the white list software K.
Step S004, a bitwise OR of the hash feature HF(K) of the white list software K and the old value BF_old of the bloom filter gives the updated new value BF_New of the bloom filter.
These steps yield the white list bloom filter containing the hash feature value of the white list software K.
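Steps S001 to S004 can be sketched with integer bit masks as follows. This is an illustration under stated assumptions: the patent does not name its hash functions, so salted SHA-256 is used here as a stand-in, and the function names and the fingerprint string format are hypothetical.

```python
import hashlib

M_BITS = 16    # length of the binary vector in the example
N_HASHES = 4   # number of hash functions in the example


def index_values(fingerprint, n=N_HASHES, m=M_BITS):
    """Return n index positions in [0, m) for a fingerprint string.

    Assumption: hash function i is SHA-256 over the fingerprint prefixed
    with the byte i; the patent does not name its hash functions.
    """
    data = fingerprint.encode("utf-8")
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:8], "big") % m
        for i in range(n)
    ]


def hash_feature(fingerprint):
    """Build HF(K): a 16-bit mask with 1 at each index position, 0 elsewhere."""
    hf = 0
    for h in index_values(fingerprint):
        hf |= 1 << h
    return hf


def update_filter(bf_old, fingerprint):
    """BF_New is the bitwise OR of BF_old and HF(K)."""
    return bf_old | hash_feature(fingerprint)


# Hypothetical fingerprint string: name, version number and an MD5-like value.
fp_k = "example-app|1.0.0|0123456789abcdef0123456789abcdef"
bf_new = update_filter(0, fp_k)
print(format(hash_feature(fp_k), "016b"), format(bf_new, "016b"))
```

With general-purpose hash functions, two of the four index values can coincide, in which case fewer than four bits are set; the sketch does not try to prevent that.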
Matching the fingerprint data of the software with the data in the terminal white list bloom filter, and if the matching is successful, judging that the software is suspected legal software; if the matching is unsuccessful, judging that the software belongs to illegal software. By matching the fingerprint data of the software with the data in the terminal white list bloom filter, the filtering of illegal software can be realized rapidly, and the invasion of illegal software is prevented.
Preferably, matching the fingerprint data of the software with the data in the white list bloom filter of the terminal includes: respectively calculating fingerprint data of the software aiming at N different hash functions to obtain N different index values; judging whether the numerical values of N positions corresponding to the N different index values in the binary number group of the bloom filter are all 1, judging that the software is suspected legal software when the numerical values are all 1, and judging that the software is illegal software when the numerical values are not all 1.
The bloom filter is used for inquiring the white list data, and the quick filtration of most illegal software can be realized through a section of binary vector, so that the efficiency of software filtration inquiry is improved.
Taking the software M to be judged as an example, fingerprint data D(M) of the software M is imported into the client, and D(M) is processed by 4 hash functions (H_1, H_2, H_3 and H_4) to obtain 4 index values h1, h2, h3 and h4.
The white list bloom filter of the client is checked to see whether the binary values at positions h1, h2, h3 and h4 are all 1. If they are all 1, the software is suspected legal software and step S2 is executed; if they are not all 1, the software is illegal software, a query failure is returned, and the process ends.
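The query in step S1 can be sketched as below. As before, salted SHA-256 is an assumed stand-in for the unspecified hash functions, and the function names and fingerprint strings are hypothetical.

```python
import hashlib


def index_values(fingerprint, n=4, m=16):
    """n index positions in [0, m); salted SHA-256 is an assumed stand-in."""
    data = fingerprint.encode("utf-8")
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:8], "big") % m
        for i in range(n)
    ]


def insert(bf, fingerprint):
    """Set the bits for a white list fingerprint (bitwise OR into the filter)."""
    for h in index_values(fingerprint):
        bf |= 1 << h
    return bf


def is_suspected_legal(bf, fingerprint):
    """Step S1: True only if every indexed bit in the filter is 1."""
    return all((bf >> h) & 1 for h in index_values(fingerprint))


bf = insert(0, "white-list-software-K")  # hypothetical fingerprint
print(is_suspected_legal(bf, "white-list-software-K"))  # True: no false negatives
print(is_suspected_legal(0, "software-M"))              # False: the filter is empty
```

A `True` result only means the software may be in the white list; a filter with every bit set would return `True` for any fingerprint, which is why the later steps are needed.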
Step S2, matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the client white list data cache; if the matching is successful, executing the step S3; if the matching is unsuccessful, step S4 is performed.
The fingerprint data of white list software in the client cache is issued from the frequently accessed software fingerprint data in the server-side tree-shaped white list data index database. When the fingerprint data of a piece of software is queried in the server-side tree-shaped white list data index database, if the access count of that fingerprint data is greater than or equal to a preset threshold, the fingerprint data is issued to the white list cache of the client.
Only the fingerprint data of white list software whose access count is greater than or equal to the threshold is issued to the client cache, rather than the fingerprint data of all white list software, which reduces the storage occupied in the client.
Matching the fingerprint data of the suspected legal software against the fingerprint data of the white list software in the client white list data cache further determines whether the suspected legal software is really in the white list. Because the white list software data with high access counts is cached in the client, this matching lookup determines whether the suspected legal software is among the frequently queried white list software.
Taking the software M to be judged as an example, the white list data cache of the client is searched for the fingerprint data D(M) of the software M. If it is found, the software is legal software, step S3 is executed, a query success is returned, and the process ends; if it is not found, the software remains suspected legal software and step S4 is executed.
Step S3, if the matching is successful, judging that the suspected legal software is legal software.
Step S4, if the matching is unsuccessful, matching the fingerprint data of the suspected legal software with a tree-shaped white list data index database in the server, if the matching is successful, judging that the suspected legal software is legal software, and if the matching is unsuccessful, judging that the suspected legal software is illegal software.
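Steps S1 to S4 taken together form a three-tier lookup, which can be sketched as follows. The function `query_white_list` and all stand-in data are hypothetical; the bloom filter and the server index are replaced by simple callables so that the control flow is visible on its own.

```python
def query_white_list(fingerprint, bloom_may_contain, client_cache, server_lookup):
    """Return True for legal software, False for illegal software."""
    # Step S1: a bloom filter miss means the software is definitely not listed.
    if not bloom_may_contain(fingerprint):
        return False
    # Steps S2/S3: a hit in the client cache confirms legal software locally.
    if fingerprint in client_cache:
        return True
    # Step S4: otherwise the server's full index gives the final answer.
    return server_lookup(fingerprint)


# Stand-ins for illustration only (all names and values are hypothetical).
full_white_list = {"fp-a", "fp-b"}        # complete set held by the server
client_cache = {"fp-a"}                   # frequently accessed subset
bloom_hits = full_white_list | {"fp-x"}   # "fp-x" plays a false positive
server_calls = []


def server_lookup(fp):
    server_calls.append(fp)
    return fp in full_white_list


def bloom_may_contain(fp):
    return fp in bloom_hits


results = {fp: query_white_list(fp, bloom_may_contain, client_cache, server_lookup)
           for fp in ["fp-a", "fp-b", "fp-x", "fp-z"]}
print(results, server_calls)
```

In this run the server is consulted only for "fp-b" and "fp-x": the cached entry is confirmed locally, the bloom filter miss is rejected locally, and the simulated false positive is rejected by the server.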
The tree-shaped white list data index base in the server comprises a root node, a child node, a leaf node and a white list data linked list, and is used for storing data of white list software.
Preferably, the tree-shaped white list data index base comprises a root node, a child node and a leaf node which are connected in a tree-shaped data structure; the leaf nodes are connected with a white list data link list, and the white list data link list comprises fingerprint data and access times data of software with the same hash characteristic value as that of an index path corresponding to the leaf nodes; the access times data is used for recording the times of the software fingerprint data being queried.
The white list data linked list stores the fingerprint data of the white list software together with its access times. The hash characteristic values of the software fingerprint data correspond to the software fingerprint data one by one. The access times record how often the software fingerprint data has been queried and are used to judge whether the white list software is frequently accessed, so that its fingerprint data can be issued to the client, providing a data source for the white list data cached in the client.
Preferably, in the tree-shaped white list data index base, the data of the child nodes and the leaf nodes are determined based on hash characteristic values of software in a white list; the hash characteristic value of the software listed in the white list is segmented into segments corresponding to the child nodes and the leaf nodes according to the depth of the tree data structure; and sequentially stores the segmented data in the corresponding nodes.
In order to avoid the influence of bloom filter misjudgments on white list management, a tree-shaped white list data index base is designed and constructed based on the hash features generated from the software, so that storage, quick indexing and query of white list data can be realized.
Taking white list software K as an example, a software white list data storage structure diagram based on a tree-shaped white list data index base is shown in fig. 4, a software white list data storage flow diagram based on a tree-shaped white list data index base is shown in fig. 5, and the method comprises the following steps:
step S101, a hash feature value HF (K) of the white list software K is acquired.
In step S102, the hash feature value HF (K) of the white list software K is segmented into HF (K) [0:3], HF (K) [4:7], HF (K) [8:11], and HF (K) [12:15] according to the depth d=4 of the index tree.
Step S103, matching HF (K) [0:3] in the index tree at the 1st layer (the root node is empty and is the 0th layer), and if the matching is successful, executing step S104; if the matching fails, adding an HF (K) [0:3] node to the 1st layer, sequentially generating HF (K) [4:7], HF (K) [8:11] and HF (K) [12:15] nodes at the 2nd to 4th layers under that node, and executing step S107.
Step S104, matching the HF (K) [4:7] with the layer 2 under the HF (K) [0:3] node in the index tree, and if the matching is successful, executing step S105; if the matching fails, adding HF (K) [4:7] nodes to the layer 2, sequentially generating HF (K) [8:11] and HF (K) [12:15] nodes to the 3-4 layers under the nodes, and executing the step S107.
Step S105, matching the HF (K) [8:11] with the 3 rd layer under the HF (K) [4:7] node in the index tree, and if the matching is successful, executing step S106; if the matching fails, adding HF (K) [8:11] nodes to the 3 rd layer, sequentially generating HF (K) [12:15] nodes to the 4 th layer under the nodes, and executing the step S107.
Step S106, matching the HF (K) [12:15] with the 4 th layer under the HF (K) [8:11] node in the index tree, and if the matching is successful, executing step S107; if the matching fails, generating HF (K) [12:15] node at the 4 th layer under the HF (K) [8:11] node, and executing step S107.
Step S107, white list data with the same hash characteristic is stored in the white list data linked list under the HF (K) [12:15] leaf node; matching the white list data D (K) against each node on the list in turn, and if the matching is successful, the white list data is already stored in the library; if the matching fails, step S108 is performed.
Step S108, creating a node at the tail of the white list data linked list of the HF (K) [12:15] leaf node, filling in the white list data D (K), setting the access times to 0, and finally linking the node into the white list data linked list.
Through the steps, a tree-shaped white list data index base containing hash characteristic values of the white list software K and fingerprint data of the white list software K is obtained.
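Steps S101 to S108 can be sketched as below. This is a hedged illustration under stated assumptions: the hash characteristic value is treated as a 16-byte value split into four 4-byte segments (the text does not state the unit of the [0:3] ... [12:15] ranges), the linked list is modeled as a Python list, and the names `Node`, `Entry`, `split_hash` and `insert` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DEPTH = 4  # depth d of the index tree, as in step S102


@dataclass
class Entry:
    fingerprint: str       # white list data D(K)
    access_count: int = 0  # set to 0 on creation (step S108)


@dataclass
class Node:
    children: Dict[bytes, "Node"] = field(default_factory=dict)
    entries: List[Entry] = field(default_factory=list)  # used on leaf nodes only


def split_hash(hf: bytes, depth: int = DEPTH) -> List[bytes]:
    """Split the hash value into `depth` equal segments (step S102)."""
    if len(hf) % depth != 0:
        raise ValueError("hash length must be a multiple of the tree depth")
    size = len(hf) // depth
    return [hf[i * size:(i + 1) * size] for i in range(depth)]


def insert(root: Node, hf: bytes, fingerprint: str) -> bool:
    """Store one white list record; return False if it was already stored."""
    node = root
    # Steps S103-S106: follow the matching child at each layer, creating it if absent.
    for segment in split_hash(hf):
        node = node.children.setdefault(segment, Node())
    # Step S107: scan the leaf's list for an identical fingerprint.
    if any(entry.fingerprint == fingerprint for entry in node.entries):
        return False
    # Step S108: append a new record at the tail of the list.
    node.entries.append(Entry(fingerprint))
    return True
```

Creating a missing node and then continuing the descent has the same effect as the text's "generate the remaining nodes, then go to step S107", because every layer below a newly created node is necessarily empty.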
And matching the fingerprint data of the suspected legal software with a tree-shaped white list data index library in the server, judging the suspected legal software as legal software if the matching is successful, and judging the suspected legal software as illegal software if the matching is unsuccessful.
Suspected legal software that hits the white list bloom filter but is not found in the white list data cache of the terminal is then queried against the tree-shaped white list data index base. First, the hash characteristic value of the suspected legal software is searched and matched against the hash characteristic values of the white list software stored in the child nodes and leaf nodes of the tree-shaped white list data index base; if the matching is unsuccessful, the suspected legal software is illegal software. If the matching is successful, the fingerprint data of the suspected legal software is searched and matched against the fingerprint data in the white list data linked list; if the matching is successful, the suspected legal software is judged to be legal software, and if the matching is unsuccessful, it is judged to be illegal software.
For suspected legal software that hits the white list bloom filter but is not cached in the terminal, quick indexing and matching of the white list is carried out in the white list server based on the hash value, so that misjudgments caused by the bloom filter can be eliminated.
Preferably, matching the fingerprint data of the suspected legal software with a tree-shaped white list data index database in a server includes:
acquiring fingerprint data of the suspected legal software and acquiring a hash characteristic value of the suspected legal software based on the fingerprint data of the suspected legal software;
dividing the hash characteristic value of the suspected legal software into segments corresponding to the child nodes and the leaf nodes according to the depth of the tree data structure;
when all the segments of the suspected legal software are not successfully matched with any node in the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, judging that the suspected legal software is illegal software; when all the segments of the suspected legal software can be successfully matched with any one of the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, matching the fingerprint data of the suspected legal software with each piece of fingerprint data under the corresponding leaf nodes; and judging the suspected legal software as legal software when the fingerprint data is successfully matched, and judging the suspected legal software as illegal software when the fingerprint data is not successfully matched.
Taking the software M to be judged as an example, matching fingerprint data of the software M to be judged with a tree-shaped white list data index database in a server.
The client submits { fingerprint data D (M) of the software M to be judged, hash characteristics HF (M) } of the software M to be judged to the white list server, and initiates a white list query request.
The white list server splits the hash characteristic HF (M) of the software M to be judged into HF (M) [0:3], HF (M) [4:7], HF (M) [8:11], HF (M) [12:15], and searches and matches sub-nodes and leaf nodes in an index tree of a tree-shaped white list data index structure in sequence, if the matching fails, the software is illegal, and the matching is stopped; if the matching is successful, the corresponding leaf node is found.
In the white list data linked list hung under the leaf node, the fingerprint data D (M) of the software M to be judged is matched against the software fingerprint data in each node in turn. If the matching is successful, the access times counter on the matched node is incremented by 1, and it is judged whether the access times are greater than or equal to a threshold value P: if so, { the fingerprint data D (M) of the software M to be judged } is issued to the white list cache of the client and a successful query is returned; if the access times are smaller than P, only the successful query is returned to the terminal agent. If the matching fails, the software is illegal, a query failure is returned to the client, and the process ends.
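The server-side lookup with the access counter can be sketched as follows. This is an illustration under assumptions, not the patented code: the tree is modeled as nested dictionaries keyed by 4-byte hash segments, each leaf holds a list of `[fingerprint, access_count]` pairs, and `THRESHOLD_P`, `split_hash` and `server_query` are hypothetical names. The value 3 for P is arbitrary, since the text gives no value.

```python
from typing import List, Tuple

THRESHOLD_P = 3  # hypothetical value; the text does not specify P


def split_hash(hf: bytes, depth: int = 4) -> List[bytes]:
    """Split the hash value into `depth` equal segments."""
    size = len(hf) // depth
    return [hf[i * size:(i + 1) * size] for i in range(depth)]


def server_query(tree: dict, hf: bytes, fingerprint: str,
                 threshold: int = THRESHOLD_P) -> Tuple[bool, bool]:
    """Return (is_legal, push_to_client_cache)."""
    node = tree
    # Walk the index layer by layer and stop at the first segment that fails to match.
    for segment in split_hash(hf):
        if segment not in node:
            return False, False
        node = node[segment]
    # After the last segment, `node` is the leaf's list of [fingerprint, access_count].
    for entry in node:
        if entry[0] == fingerprint:
            entry[1] += 1  # count this query
            return True, entry[1] >= threshold
    # The hash path exists but no stored fingerprint matches.
    return False, False
```

The second element of the result tells the caller whether the matched fingerprint should also be issued to the client's white list cache.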
Preferably, when the segments of the hash characteristic value of the suspected legal software are matched with the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, the segments are matched with the data of the child nodes and the leaf nodes in sequence, and when any node data fails to match, the matching is directly judged to have failed and the nodes of the next level are no longer matched.
The subsequent matching query flow is interrupted as soon as a matching result is available, which improves the efficiency of white list queries.
Taking software M to be judged as an example, the white list server splits the hash characteristic HF (M) of the software M to be judged into HF (M) [0:3], HF (M) [4:7], HF (M) [8:11], HF (M) [12:15], and sequentially searches and matches sub-nodes and leaf nodes in an index tree of a tree-shaped white list data index structure, if the matching fails, the software is illegal, and the matching is stopped; if the matching is successful, the corresponding leaf node is found.
Preferably, when the fingerprint data is successfully matched, the access frequency data is added with 1 to obtain new access frequency data; and when the new access frequency data is larger than or equal to a preset threshold value, the fingerprint data successfully matched is issued to a white list cache of the terminal.
Only software data with access times greater than or equal to a threshold value is issued to the white list cache of the terminal, instead of caching the whole set of the white list data to the terminal, and the problem of high storage occupation in the client is solved.
Taking the software M to be judged as an example, the fingerprint data D (M) of the software M to be judged is matched against the software fingerprint data in each node in turn. If the matching is successful, the access times counter on the matched node is incremented by 1, and it is judged whether the access times are greater than or equal to a threshold value P: if so, { the fingerprint data D (M) of the software M to be judged } is issued to the white list cache of the client and a successful query is returned; if the access times are smaller than P, only the successful query is returned to the terminal agent. If the matching fails, the software is illegal, a query failure is returned to the client, and the process ends.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present application, and are not meant to limit the scope of the invention.

Claims (9)

1. A bloom filter-based software whitelist query method, the method comprising:
matching the fingerprint data of the software with the data in the terminal white list bloom filter, and if the matching is successful, judging that the software is suspected legal software; if the matching is unsuccessful, judging that the software belongs to illegal software;
matching the fingerprint data of the suspected legal software with the fingerprint data of the white list software in the terminal white list data cache; if the matching is successful, judging the suspected legal software as legal software; if the matching is unsuccessful, matching the fingerprint data of the suspected legal software with software index information and fingerprint information in a tree-shaped white list data index base in a server, if the matching is successful, judging the suspected legal software as legal software, and if the matching is unsuccessful, judging the suspected legal software as illegal software;
the tree-shaped white list data index library comprises a root node, a child node and a leaf node which are connected in a tree-shaped data structure; the leaf nodes are connected with a white list data link list, and the white list data link list comprises fingerprint data and access times data of software with the same hash characteristic value as that in an index path corresponding to the leaf nodes; the access times data is used for recording the times of the software fingerprint data being queried.
2. The bloom filter-based software whitelist query method of claim 1, wherein matching fingerprint data of the software with data in a whitelist bloom filter of a terminal comprises:
respectively calculating fingerprint data of the software aiming at N different hash functions to obtain N different index values;
judging whether the values of the N positions corresponding to the N different index values in the binary array of the bloom filter are all 1, judging that the software is suspected legal software when the values are all 1, and judging that the software is illegal software when the values are not all 1.
3. The bloom filter-based software whitelist query method of claim 1, wherein in the tree whitelist data index base, child node and leaf node data are determined based on hash feature values of software in a whitelist;
wherein the hash characteristic values of software listed in the white list are divided into segments corresponding to the child nodes and the leaf nodes according to the depth of the tree data structure, and the segmented data are sequentially stored in the corresponding nodes.
4. The bloom filter-based software whitelist query method of claim 1, wherein the whitelist bloom filter data is obtained by:
acquiring fingerprint data of software in the white list;
respectively calculating fingerprint data of the software aiming at N different hash functions to obtain N different index values;
setting the values of the positions corresponding to the index values in the binary array of the bloom filter to 1 and the values of the other positions to 0, to obtain the hash characteristic value of the software in the white list;
and performing a bitwise OR of the hash characteristic value of the software in the white list with the old value of the bloom filter to obtain the white list bloom filter data.
5. The bloom filter-based software whitelist query method of claim 1, wherein matching the fingerprint data of the suspected legitimate software with a tree-shaped whitelist data index library in a server comprises:
acquiring fingerprint data of the suspected legal software and acquiring a hash characteristic value of the suspected legal software based on the fingerprint data of the suspected legal software;
dividing the hash characteristic value of the suspected legal software into segments corresponding to the child nodes and the leaf nodes according to the depth of the tree data structure;
when all the segments of the suspected legal software are not successfully matched with any node in the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, judging that the suspected legal software is illegal software; when all the segments of the suspected legal software can be successfully matched with any one of the corresponding child nodes and leaf nodes in the tree-shaped white list data index base, matching the fingerprint data of the suspected legal software with each piece of fingerprint data under the corresponding leaf nodes; and judging the suspected legal software as legal software when the fingerprint data is successfully matched, and judging the suspected legal software as illegal software when the fingerprint data is not successfully matched.
6. The bloom filter-based software whitelist query method of claim 5, wherein when all segments of the suspected legal software hash feature values are matched with corresponding child nodes and leaf nodes in the tree whitelist data index base, the hash feature values of the segmented suspected legal software are sequentially matched with the data of the child nodes and the leaf nodes, and when any node data fails to be matched successfully, the failure of matching is directly judged, and the node of the next stage is no longer matched.
7. The bloom filter-based software whitelist query method of claim 5, further comprising: adding 1 to the access times data to obtain new access times data when the fingerprint data is successfully matched; and when the new access times data is greater than or equal to a preset threshold value, issuing the successfully matched fingerprint data to a white list cache of the terminal.
8. The bloom filter-based software whitelist query method of claim 1, wherein the data in the terminal whitelist bloom filter includes data from a bloom filter in the server.
9. The bloom filter-based software whitelist query method of claim 1, wherein the fingerprint data of the software includes at least a name, version number, and MD5 check feature value of the software.
CN202310061459.1A 2023-01-13 2023-01-13 Software white list query method based on bloom filter Active CN115827702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310061459.1A CN115827702B (en) 2023-01-13 2023-01-13 Software white list query method based on bloom filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310061459.1A CN115827702B (en) 2023-01-13 2023-01-13 Software white list query method based on bloom filter

Publications (2)

Publication Number Publication Date
CN115827702A CN115827702A (en) 2023-03-21
CN115827702B true CN115827702B (en) 2023-05-16

Family

ID=85520791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310061459.1A Active CN115827702B (en) 2023-01-13 2023-01-13 Software white list query method based on bloom filter

Country Status (1)

Country Link
CN (1) CN115827702B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116707901A (en) * 2023-06-07 2023-09-05 中国人民解放军61660部队 Asset identification method and system based on Web snapshot

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL189530A0 (en) * 2007-02-15 2009-02-11 Marvell Software Solutions Isr Method and apparatus for deep packet inspection for network intrusion detection
US8306988B1 (en) * 2009-10-26 2012-11-06 Mcafee, Inc. System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database
CN104850783B (en) * 2015-04-30 2018-07-13 中国人民解放军国防科学技术大学 Malware cloud detection method and system based on hash feature matrix

Also Published As

Publication number Publication date
CN115827702A (en) 2023-03-21

Similar Documents

Publication Publication Date Title
US10303673B2 (en) Hierarchical data storage
US8180763B2 (en) Cache-friendly B-tree accelerator
US7702640B1 (en) Stratified unbalanced trees for indexing of data items within a computer system
US20070233720A1 (en) Lazy bulk insertion method for moving object indexing
CN107729371B (en) Data indexing and querying method, device, equipment and storage medium of block chain
CN115827702B (en) Software white list query method based on bloom filter
CN110727663A (en) Data cleaning method, device, equipment and medium
CN107256263A (en) Internet hot spots information automatic monitoring method
CN109189759B (en) Data reading method, data query method, device and equipment in KV storage system
CN111930923B (en) Bloom filter system and filtering method
CN111930924A (en) Data duplicate checking system and method based on bloom filter
CN115269631A (en) Data query method, data query system, device and storage medium
CN113821630A (en) Data clustering method and device
CN111061972B (en) AC searching optimization method and device for URL path matching
EP3005161A1 (en) Datasets profiling tools, methods, and systems
CN116361287A (en) Path analysis method, device and system
CN115495462A (en) Batch data updating method and device, electronic equipment and readable storage medium
CN113839940B (en) URL pattern tree-based defense method, device, electronic equipment and readable storage medium
CN107045535B (en) Database table index
CN112199396B (en) Industrial Internet identification query method and system facing MES
CN114416741A (en) KV data writing and reading method and device based on multi-level index and storage medium
JP6323887B2 (en) Method and device for changing root node
JP2000339332A (en) Medium recording retrieval index, method and device for updating retrieval index and medium recording its program
CN113177031B (en) Processing method and device for database shared cache, electronic equipment and medium
US20220188339A1 (en) Network environment synchronization apparatus and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant