CN113051566B - Virus detection method and device, electronic equipment and storage medium - Google Patents

Virus detection method and device, electronic equipment and storage medium

Info

Publication number
CN113051566B
CN113051566B CN202110335752.3A CN202110335752A CN113051566B CN 113051566 B CN113051566 B CN 113051566B CN 202110335752 A CN202110335752 A CN 202110335752A CN 113051566 B CN113051566 B CN 113051566B
Authority
CN
China
Prior art keywords
file
virus
bloom filter
stage
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110335752.3A
Other languages
Chinese (zh)
Other versions
CN113051566A (en)
Inventor
闫华
位凯志
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202110335752.3A
Publication of CN113051566A
Application granted
Publication of CN113051566B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the invention apply to the technical field of computer security and provide a virus detection method and device, electronic equipment, and a storage medium. The virus detection method comprises: processing a file to be tested based on a bloom filter cluster, wherein the bloom filter cluster comprises at least two stages of bloom filters, each stage corresponds to rule segments of a different length in a virus rule base, and the stages are used in sequence to detect whether the file to be tested matches the corresponding rule segments; and determining, based on the matching result of the file to be tested, whether virus detection through the virus rule base is needed. The invention improves the efficiency of virus detection and shortens its duration.

Description

Virus detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer security technologies, and in particular, to a virus detection method, a device, an electronic apparatus, and a storage medium.
Background
In the related art, a virus analyst extracts virus rules from a virus sample, either manually or automatically by means of an algorithm. A virus rule is a continuous character string. Multiple virus rules are aggregated into a virus rule base. To detect viruses in an unknown file, the character strings in the file are matched against the virus rules in the rule base; if a match succeeds, the unknown file is considered a virus file. As the virus rule base keeps growing, virus detection in the related art takes longer and longer.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for detecting viruses, so as to at least solve the problem of long detection time consumption caused by too many rules in a virus rule base in the related art.
The technical scheme of the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a method for detecting a virus, including:
processing a file to be tested based on a bloom filter cluster, wherein the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters respectively corresponds to rule fragments with different lengths in a virus rule base, and each stage of bloom filter is sequentially used for detecting whether the file to be tested is matched with the corresponding rule fragment;
and determining whether virus detection is needed through the virus rule base or not based on the matching result of the file to be detected.
In the above scheme, the determining whether the virus detection is needed through the virus rule base based on the matching result of the file to be detected includes:
and under the condition that the matching result represents that the file to be detected is matched with the rule segments corresponding to the bloom filters of all stages, virus detection is carried out on the file to be detected based on the virus rule base.
In the above scheme, the determining whether the virus detection is needed through the virus rule base based on the matching result of the file to be detected includes:
and under the condition that the matching result represents that the file to be tested is not matched with the rule segment corresponding to any stage of bloom filter in each stage of bloom filter, determining that the file to be tested is a normal file.
In the above scheme, the virus detection on the file to be detected based on the virus rule base includes:
receiving a detection result from the virus rule base; the virus rule base is used for detecting whether the file to be detected is matched with the virus rules in the virus rule base.
In the above scheme, the processing the file to be tested based on the bloom filter cluster includes:
and under the condition that any stage of bloom filter in the bloom filter cluster detects that the file to be detected matches the corresponding rule segment, the next stage of bloom filter is used for matching.
In the above scheme, before the files to be tested are processed based on the bloom filter cluster, the method further comprises:
creating the at least two stages of bloom filters based on the virus rule base;
and splicing the at least two stages of bloom filters in order from short to long, based on the lengths of the rule segments corresponding to each stage of bloom filter, to obtain the bloom filter cluster.
In the above scheme, when each stage of bloom filter detects whether the file to be detected matches the corresponding rule segment, the method includes:
acquiring continuous character strings in the file to be tested based on a sliding window, and matching the continuous character strings with the rule segments corresponding to each stage of bloom filter to determine whether the file to be tested matches the corresponding rule segments; the window width of the sliding window is the same as the length of the rule segment corresponding to each stage of bloom filter.
In a second aspect, an embodiment of the present invention provides a virus detection apparatus, including:
the processing module is used for processing the file to be detected based on a bloom filter cluster, the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters corresponds to rule fragments with different lengths in a virus rule base respectively, and each stage of bloom filter is sequentially used for detecting whether the file to be detected is matched with the corresponding rule fragment;
and a determining module, used for determining whether virus detection is needed through the virus rule base based on the matching result of the file to be tested.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the steps of the virus detection method provided in the first aspect of the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the steps of the virus detection method provided in the first aspect of the embodiment of the present invention.
The embodiment of the invention processes the file to be tested based on the bloom filter cluster, wherein the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters respectively corresponds to rule fragments with different lengths in the virus rule base, and each stage of bloom filter is sequentially used for detecting whether the file to be tested is matched with the corresponding rule fragment. And determining whether virus detection is needed through a virus rule base or not based on a matching result of the file to be detected. According to the embodiment of the invention, the files to be detected are processed through the at least two stages of bloom filters, so that most files to be detected which are not virus files can be removed as soon as possible when virus detection is carried out, the efficiency of virus detection is improved, and the time for virus detection is reduced.
Drawings
FIG. 1 is a schematic flow chart of a related art virus detection;
FIG. 2 is a schematic diagram of an implementation flow of a virus detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a bloom filter cluster provided by an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another implementation of a virus detection method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multistage bloom filter architecture provided by an embodiment of the present invention;
FIG. 6 is a flow chart of the creation of a multi-stage bloom filter provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of a virus detection flow provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of a virus detection device according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of virus detection in the related art. In the related art, known virus samples are analyzed to extract virus rules, where a virus rule is a continuous character string taken from a virus sample, and a large number of virus rules are gathered into a virus rule base. To detect viruses in an unknown file, continuous character strings of a set length are extracted from the file and matched against the virus rules in the rule base. A successful match indicates that the unknown file contains a virus rule and is therefore a virus file. However, as the variety of viruses grows, so does the number of virus rules in the rule base. Virus rule matching generally relies on classical multi-pattern matching algorithms, such as the Aho-Corasick algorithm and the Boyer-Moore algorithm. When the rule base holds a large number of rules, the cache hit rate of the multi-pattern matching algorithm drops and memory thrashing may even occur, which incurs a large time cost, lengthens virus detection, and degrades the user experience.
In the related art, there are several conventional schemes for optimizing the performance of an anti-virus rule engine. The first scheme is a cloud-separated distributed architecture: the full-text hash values of a large number of virus files are stored on a cloud server. When a file is tested, its full-text hash value is calculated and transmitted to the cloud server over the network. The cloud server compares the values and returns a detection result; if the full-text hash value of the tested file is identical to a full-text hash value stored on the cloud server, the tested file is considered a virus file. The disadvantages of this scheme are: 1. a hash value has no generalization capability, so the scheme fails if the virus is even slightly modified; 2. a network connection is required, otherwise the scheme fails. The second scheme is a lightweight virus rule base: the virus rules in the rule base are pruned according to historical experience, keeping the frequently occurring rules with larger impact and deleting the rules with smaller impact, so as to improve detection performance. The disadvantage of this scheme is that the ability to find and remove viruses is weakened. The third scheme classifies virus rules by file type: the virus rules corresponding to virus files are stored in different virus rule bases according to file type, and during virus detection a file is matched only against the rule base for its file type. The disadvantage of this scheme is that performance problems still arise when a given file type has too many virus rules.
In view of the drawbacks of the related art, the embodiment of the present invention provides a virus detection method, which at least can improve the virus detection efficiency and reduce the virus detection duration. In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Fig. 2 is a schematic flow chart of an implementation of a virus detection method according to an embodiment of the present invention, where an execution body of the virus detection method is an electronic device, and the electronic device may be a mobile terminal, a desktop computer, a notebook computer, or a server. Referring to fig. 2, the virus detection method includes:
S201, processing a file to be tested based on a bloom filter cluster, wherein the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters corresponds to rule fragments with different lengths in a virus rule base, and each stage of bloom filter is used in sequence to detect whether the file to be tested matches the corresponding rule fragment.
Before step S201, the file to be tested also needs to be input into the bloom filter cluster.
In the embodiment of the invention, the bloom filter cluster is composed of at least two stages of bloom filters, the rule segments corresponding to each stage have a different length, and the stages are spliced in series.
A bloom filter consists of a long binary vector (bit array) and a set of hash functions, and can be used to quickly check whether an element belongs to a set.
When an element is added to the bloom filter, the following operations are performed:
each hash function in the bloom filter is applied to the element value to obtain a hash value, so that a plurality of hash functions yields a plurality of hash values. The bits at the indexes given by these hash values are then set to 1 in the bit array.
When it is necessary to determine whether an element is present in the bloom filter, the following operations are performed:
the same hash calculations are performed on the given element, and the bits at the resulting indexes of the bit array are checked. If all of these bits are 1, the element is considered to be in the bloom filter (a bloom filter can report false positives, so this means "possibly present"); if any of these bits is not 1, the element is not in the bloom filter.
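The add and query operations described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `BloomFilter`, the bit-array size, the number of hash functions, and the use of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, all 0 initially

    def _indexes(self, item: str):
        # Assumption: k hash functions are derived by salting SHA-256 with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set the bit at every index produced by the hash functions.
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item: str) -> bool:
        # All bits set: possibly present. Any bit clear: definitely absent.
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter()
bf.add("dDR0ZzQ9OFB")
print("dDR0ZzQ9OFB" in bf)  # True: an added element is always reported as present
```

An element that was added is never reported as absent; the only possible error is a false positive, which is why a later exact check is still needed.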
FIG. 3 is a schematic diagram of a bloom filter cluster according to an embodiment of the present invention. The bloom filter cluster includes a plurality of bloom filters spliced in series. In an embodiment, the bloom filters in the cluster are spliced from short to long according to the lengths of the rule segments corresponding to each stage. Here, a rule segment is a continuous character string within a virus rule stored in the virus rule base. For example, suppose one of the virus rules is dDR0ZzQ9OFBzdmVic0ZQMVhuaFJ; one of its rule segments is then dDR0ZzQ9OFB. Assuming that the rule segment length is 5 for the first-stage bloom filter, 7 for the second-stage bloom filter, and 10 for the third-stage bloom filter, the filters are spliced in the order first stage, second stage, third stage to obtain the bloom filter cluster. When a file to be tested undergoes virus detection, it is input into the first-stage bloom filter of the cluster, whose rule segments are the shortest, and detection starts from that stage.
In an embodiment of the present invention, each stage of bloom filter is used for detecting whether a file to be detected matches a corresponding rule segment, and in an embodiment, when each stage of bloom filter detects whether the file to be detected matches a corresponding rule segment, the method includes:
acquiring continuous character strings in the file to be tested based on a sliding window, and matching the continuous character strings with the rule segments corresponding to each stage of bloom filter to determine whether the file to be tested matches the corresponding rule segments; the window width of the sliding window is the same as the length of the rule segment corresponding to each stage of bloom filter.
A sliding window is a way of generating continuous substrings. Given a continuous character string, a virtual window with a fixed width is placed at the start of the string and then slid from left to right, producing in turn every continuous substring whose length equals the window width. The continuous character strings in the file to be tested are obtained through the sliding window. The window width must equal the length of the rule segment corresponding to the bloom filter, so that the extracted strings have the same length as the rule segments they are matched against.
For example, when the first-stage bloom filter detects whether the file to be tested matches the corresponding rule segment, and assuming that the rule segment length for the first-stage bloom filter is 10, the window width is set to 10 and continuous character strings of length 10 are acquired from the file through the sliding window. The first-stage bloom filter then matches these strings against its rule segments, where matching means determining whether a continuous character string in the file is identical to a rule segment corresponding to the first-stage bloom filter.
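The sliding window described above can be illustrated with a short generator. The function name `sliding_windows` is hypothetical, and the input string is the sample rule segment from the text, used here only as example data.

```python
def sliding_windows(data: str, width: int):
    """Yield every contiguous substring of `data` whose length is `width`."""
    # If the data is shorter than the window, the range is empty and nothing is yielded.
    for start in range(len(data) - width + 1):
        yield data[start:start + width]


# An 11-character input with a window width of 10 produces two windows.
windows = list(sliding_windows("dDR0ZzQ9OFB", 10))
print(windows)  # ['dDR0ZzQ9OF', 'DR0ZzQ9OFB']
```

Each window would then be looked up in the bloom filter whose rule segment length equals the window width.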
In an embodiment, the processing the file to be tested based on the bloom filter cluster includes:
and under the condition that any stage of bloom filter in the bloom filter cluster detects that the file to be detected matches the corresponding rule segment, the next stage of bloom filter is used for matching.
Here, the multiple stages of bloom filters do not detect simultaneously; instead, the stages are spliced in series. In an embodiment, when the bloom filter of a certain stage detects that the file to be tested matches the corresponding rule segment, the file to be tested is passed to the next stage of bloom filter for matching, and so on.
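A minimal sketch of this serial, stage-by-stage matching is shown below. To keep it short, plain Python sets stand in for the per-stage bloom filters (an assumption for illustration; sets have no false positives). The names `windows`, `STAGES`, and `stages_passed` are hypothetical, and the segment lengths 5, 7, and 10 are taken from the example in the text.

```python
RULE = "dDR0ZzQ9OFBzdmVic0ZQMVhuaFJ"  # sample virus rule from the text


def windows(data: str, width: int):
    """All contiguous substrings of `data` with the given width."""
    return (data[i:i + width] for i in range(len(data) - width + 1))


# Assumption: plain sets stand in for the per-stage bloom filters.
STAGES = [(width, set(windows(RULE, width))) for width in (5, 7, 10)]


def stages_passed(data: str, stages=STAGES) -> int:
    """Run the stages in order and stop at the first stage with no match."""
    passed = 0
    for width, segments in stages:
        if not any(w in segments for w in windows(data, width)):
            break  # later stages are never consulted
        passed += 1
    return passed


print(stages_passed("plain text file"))          # 0
print(stages_passed("xx dDR0Z yy"))              # 1
print(stages_passed("header " + RULE + " end"))  # 3
```

Only a file that passes every stage (here, a return value of 3) would go on to the full rule-base check.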
Before virus detection can take place, the bloom filter clusters need to be created based on the virus rule base and the length of the rule segments that are set in advance.
Referring to fig. 4, in an embodiment, the method further comprises:
S401, creating the at least two stages of bloom filters based on the virus rule base.
The rule segments corresponding to each stage of bloom filter all come from the virus rule base; they are obtained from the virus rules in the rule base. For example, suppose the rule segment length for the first-stage bloom filter is set to 15 and one virus rule in the rule base is dDR0ZzQ9OFBzdmVic0ZQMVhuaFJ. Rule segments of length 15 are extracted from this virus rule; one such segment is 0ZzQ9OFBzdmVic0. Multiple rule segments of length 15 can be extracted from this virus rule, and rule segments of length 15 can likewise be extracted from the other virus rules in the rule base. The first-stage bloom filter is created from the length-15 rule segments extracted from the virus rules. The bloom filters of the other stages are created in the same way.
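As an illustration of the extraction step, the sketch below collects every length-15 substring of each rule. The text does not state how segments are chosen, so taking all contiguous substrings is an assumption, and the function name `extract_segments` is hypothetical.

```python
def extract_segments(rules, length):
    """Collect every contiguous substring of the given length from each rule.

    Assumption: all windows of each rule are used as rule segments.
    """
    segments = set()
    for rule in rules:
        for start in range(len(rule) - length + 1):
            segments.add(rule[start:start + length])
    return segments


rule = "dDR0ZzQ9OFBzdmVic0ZQMVhuaFJ"  # sample virus rule from the text
segments = extract_segments([rule], 15)
print(len(segments))                   # 13 segments from a 27-character rule
print("0ZzQ9OFBzdmVic0" in segments)   # True: the segment named in the text
```

Each extracted segment would then be added to the bloom filter for that length.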
Here, the more rule fragments are extracted from the virus rules of the virus rule library, the higher the accuracy and the smaller the false alarm rate of the corresponding bloom filter in virus detection.
In practical applications, a bloom filter is created based on set hash functions. The bloom filter is composed of a long binary vector and a plurality of hash functions. When the bloom filter is created, the hash functions can be set to multiplicative hashes, and the hash seed values can be set to values such as 13, 1313, 1331, 17, and 1717, that is, prime numbers or composite numbers with few prime factors. Each stage of bloom filter may have several set hash functions, and the set hash functions may differ from stage to stage.
When a rule segment is added to a bloom filter, the set hash functions first generate different hash values for the rule segment, and the bits at the corresponding indexes of the bit array are then set to 1; when the bit array is initialized, all positions are 0. When the same rule segment is stored a second time, deduplication is straightforward: the corresponding positions were already set to 1, so it is easy to see that the value already exists.
To determine whether a continuous character string in the file to be tested exists in the bloom filter, the same hash calculations are performed on the string and the bits at the resulting indexes of the bit array are checked. If all of these bits are 1, the string is considered to be in the bloom filter, that is, the file to be tested matches the corresponding rule segment. If any of these bits is not 1, the string is not in the bloom filter, that is, the file to be tested does not match the corresponding rule segment. Different continuous character strings may hash to the same positions; to reduce such collisions, the bit array can be enlarged or the set hash functions adjusted as appropriate.
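The seeded multiplicative hashing described above might look like the following sketch. The seed values come from the text; everything else is an assumption for illustration: the recurrence `h = h * seed + byte`, the 32-bit mask, the bit-array size `NUM_BITS`, and the helper names.

```python
SEEDS = (13, 1313, 1331, 17, 1717)  # seed values listed in the text
NUM_BITS = 1 << 16                  # hypothetical bit-array size


def multiplicative_hash(text: str, seed: int) -> int:
    """Multiplicative string hash: h = h * seed + byte, kept to 32 bits."""
    value = 0
    for byte in text.encode("utf-8"):
        value = (value * seed + byte) & 0xFFFFFFFF
    return value


def bit_indexes(text: str) -> list:
    # One index into the bit array per seed, i.e. per hash function.
    return [multiplicative_hash(text, seed) % NUM_BITS for seed in SEEDS]


bits = bytearray(NUM_BITS)  # every position starts at 0


def add(segment: str) -> None:
    for idx in bit_indexes(segment):
        bits[idx] = 1


def might_contain(segment: str) -> bool:
    return all(bits[idx] for idx in bit_indexes(segment))


add("0ZzQ9OFBzdmVic0")
snapshot = bytes(bits)
add("0ZzQ9OFBzdmVic0")  # storing the same segment again changes nothing
print(bytes(bits) == snapshot, might_contain("0ZzQ9OFBzdmVic0"))  # True True
```

Because the second `add` leaves the bit array unchanged, duplicate rule segments cost no extra space.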
S402, splicing the at least two stages of bloom filters in order from short to long, based on the lengths of the rule segments corresponding to each stage of bloom filter, to obtain the bloom filter cluster.
The at least two stages of bloom filters are spliced in order of rule segment length, from short to long. For example, suppose a bloom filter cluster includes three stages of bloom filters, where the rule segment length is 8 for the first-stage bloom filter, 15 for the second-stage bloom filter, and 30 for the third-stage bloom filter. The bloom filter cluster is then obtained by splicing the first-stage, second-stage, and third-stage bloom filters in sequence. Experiments show that splicing the at least two stages of bloom filters in order of rule segment length, from short to long, gives the bloom filter cluster the highest virus detection efficiency and the shortest detection time.
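The short-to-long splicing can be sketched as a sort on segment length. In this illustration a set inside each `Stage` stands in for a bloom filter's bit array, and the names `Stage` and `build_cluster` are hypothetical. Lengths 8 and 15 are taken from the example above; length 30 is left out because the sample rule from the text is only 27 characters long.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    segment_length: int
    # Assumption: a set stands in for the bloom filter's bit array.
    segments: set = field(default_factory=set)


def build_cluster(rules, lengths):
    """Create one stage per segment length, then splice them short to long."""
    stages = []
    for length in lengths:
        stage = Stage(length)
        for rule in rules:
            for start in range(len(rule) - length + 1):
                stage.segments.add(rule[start:start + length])
        stages.append(stage)
    # Splice: the stage with the shortest rule segments comes first.
    return sorted(stages, key=lambda s: s.segment_length)


cluster = build_cluster(["dDR0ZzQ9OFBzdmVic0ZQMVhuaFJ"], lengths=[15, 8])
print([stage.segment_length for stage in cluster])  # [8, 15]
```

The stages can be created in any order; only the final ordering matters for detection.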
S202, determining whether virus detection is needed through the virus rule library or not based on the matching result of the file to be detected.
Here, three kinds of matching result are possible. In the first, the file to be tested matches the rule segments corresponding to every stage of bloom filter. In the second, the file to be tested matches the rule segments corresponding to some bloom filters in the cluster but not those corresponding to others. In the third, the file to be tested matches the rule segments of none of the stages.
In an embodiment, the determining whether the virus detection is needed through the virus rule base based on the matching result of the file to be detected includes:
and under the condition that the matching result represents that the file to be detected is matched with the rule segments corresponding to the bloom filters of all stages, virus detection is carried out on the file to be detected based on the virus rule base.
If each stage of bloom filter in the bloom filter cluster detects that the file to be detected is matched with the corresponding rule segment, the file to be detected is required to be sent to a virus rule base, and virus detection is carried out on the file to be detected by the virus rule base.
In an embodiment, the virus detection on the file to be detected based on the virus rule base includes:
receiving a detection result from the virus rule base; the virus rule base is used for detecting whether the file to be detected is matched with the virus rules in the virus rule base.
Here, the virus rule base may be stored locally or in a cloud server. If the virus rule base is stored locally, the electronic equipment can directly utilize the virus rule base to carry out virus detection, so that a detection result is obtained. If the virus rule base is stored in the cloud server, the file to be tested needs to be sent to the cloud server, and the electronic equipment receives a detection result of the cloud server on the file to be tested.
In an embodiment, the determining whether the virus detection is needed through the virus rule base based on the matching result of the file to be detected includes:
and under the condition that the matching result represents that the file to be tested is not matched with the rule segment corresponding to any stage of bloom filter in each stage of bloom filter, determining that the file to be tested is a normal file.
If the file to be tested does not match the rule segment corresponding to a certain stage of bloom filter in the cluster, or does not match the rule segment of any stage, the file does not contain the rule segments of the virus rules and is therefore determined to be a normal file.
In practical application, as soon as any one stage of bloom filter detects that the file to be tested does not match the corresponding rule segment, the subsequent bloom filters perform no further detection and the file to be tested is determined to be a normal file.
In practical application, assume that the bloom filter cluster includes three stages of bloom filters in total, spliced in order of set length from short to long. When the cluster is used for virus detection, filtering starts with the first-stage bloom filter, whose set length is the shortest. Most files on real devices are normal files and only a small number may be virus files, so the first-stage bloom filter can remove most files to be tested early. In addition, the bloom filter cluster is deployed in memory and is therefore fast, and the first-stage bloom filter has the best performance within the cluster, so virus detection efficiency is improved and most non-virus files are removed. If the first-stage bloom filter intercepts the tested file, the second-stage bloom filter filters the intercepted file; if the second-stage bloom filter also intercepts it, the third-stage bloom filter filters it; and if all bloom filters intercept the tested file, whether it is a virus file is detected based on the virus rule base. Otherwise, as soon as any one stage of bloom filter does not intercept the tested file, the file is determined to be a normal file and no virus detection is performed on it.
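The overall flow described above can be sketched end to end: a cascade of small bloom filters pre-filters each file, and only files that every stage intercepts reach the rule base. This is a sketch under stated assumptions, not the patent's implementation. The names `BloomStage` and `detect`, the salted SHA-256 hashing, the bit-array size, and the plain substring test standing in for the rule-base engine are all hypothetical; the segment lengths 5, 7, and 10 come from the earlier example in the text.

```python
import hashlib


class BloomStage:
    """One cascade stage: a small bloom filter over segments of one length."""

    def __init__(self, rules, width, num_bits=1 << 14, num_hashes=3):
        self.width, self.num_bits, self.num_hashes = width, num_bits, num_hashes
        self.bits = bytearray(num_bits)
        # Add every window of every rule to this stage's bit array.
        for rule in rules:
            for i in range(len(rule) - width + 1):
                for idx in self._indexes(rule[i:i + width]):
                    self.bits[idx] = 1

    def _indexes(self, text):
        # Assumption: hash functions are derived by salting SHA-256.
        for n in range(self.num_hashes):
            digest = hashlib.sha256(f"{n}:{text}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def matches(self, data):
        """True if any window of the file is possibly a known rule segment."""
        return any(
            all(self.bits[idx] for idx in self._indexes(data[i:i + self.width]))
            for i in range(len(data) - self.width + 1)
        )


def detect(data, cluster, rule_base):
    # Any stage without a match clears the file; the rule base is skipped.
    for stage in cluster:
        if not stage.matches(data):
            return "normal"
    # Bloom filters can report false positives, so confirm with the rule base.
    # Assumption: a substring test stands in for the real rule-base engine.
    return "virus" if any(rule in data for rule in rule_base) else "normal"


RULES = ["dDR0ZzQ9OFBzdmVic0ZQMVhuaFJ"]  # sample rule from the text
CLUSTER = [BloomStage(RULES, width) for width in (5, 7, 10)]  # short to long

print(detect("an ordinary document", CLUSTER, RULES))  # normal
print(detect("payload: " + RULES[0], CLUSTER, RULES))  # virus
```

The final rule-base check is what keeps a bloom filter false positive from turning into a wrong verdict.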
Compared with the method that the matching is directly carried out through the virus rule base, the detection speed of the bloom filter cluster is faster, the bloom filter is used for filtering, and as long as any one stage of bloom filter detects that the file to be detected is not matched with the corresponding rule segment, the file to be detected is directly determined to be a normal file, so that most of files to be detected which are not virus files can be removed as early as possible. For the tested file which is possibly a virus file, after the tested file is detected by the primary bloom filter, the virus is detected by the virus rule base, so that the accuracy of a virus detection result is ensured, and misjudgment is avoided.
Compared with a method that uses only a single-stage bloom filter: a single-stage bloom filter can filter out only part of the normal files, and the remaining files to be tested must be checked against the set database, which takes a long time. With multi-stage bloom filters, most normal files are filtered out quickly, which increases detection speed and shortens detection time.
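The benefit of adding stages can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that each stage reports a match for a normal file with a fixed probability and that the stages act independently; neither the pass rates nor the independence assumption comes from this embodiment, and the function name is hypothetical.

```python
from math import prod


def fraction_reaching_rule_base(pass_rates):
    """Fraction of normal files that every stage lets through.

    pass_rates[i] is a hypothetical probability that stage i reports a
    match for a normal file. Independence between stages is assumed.
    """
    return prod(pass_rates)


# Hypothetical pass rates chosen only to show the trend.
single_stage = fraction_reaching_rule_base([0.10])
three_stage = fraction_reaching_rule_base([0.10, 0.20, 0.20])
print(f"single stage: {single_stage:.3f}, three stages: {three_stage:.3f}")
```

Under these assumed numbers, far fewer normal files reach the slower rule-base check when three stages are used than when one stage is used.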
Files to be tested that are matched by all bloom filters undergo virus detection through the virus rule base. For a tested file that may be a virus file, virus detection is performed through the virus rule base after the file has been screened by the bloom filters, which ensures the accuracy of the detection result and avoids misjudgment.
In the embodiment of the invention, as long as any one stage of bloom filter detects that the file to be tested does not match the corresponding rule segment, the file is considered to be a normal file, and detection continues with the next file to be tested. If all bloom filters in the bloom filter cluster detect that the file to be tested matches the corresponding rule segments, virus detection is performed through the virus rule base.
In practical application, when all bloom filters in the bloom filter cluster detect that the file to be detected is matched with the corresponding rule segment, the file to be detected can be directly determined to be a virus file. Or when a bloom filter of a certain stage in the bloom filter cluster detects that the file to be detected is matched with the corresponding rule segment, determining that the file to be detected is a virus file. This can increase the speed of virus detection.
The embodiment of the invention processes the file to be tested based on a bloom filter cluster. The bloom filter cluster comprises at least two stages of bloom filters, each stage corresponds to rule fragments of a different length in the virus rule base, and the stages are used in sequence to detect whether the file to be tested matches the corresponding rule fragment. Whether virus detection through the virus rule base is needed is then determined based on the matching result of the file to be tested. Because the files to be tested are processed through at least two stages of bloom filters, most files that are not virus files can be eliminated as early as possible during virus detection, which improves the efficiency of virus detection and reduces its time.
Referring to fig. 5, fig. 5 is a schematic diagram of a multi-stage bloom filter provided by an embodiment of the present invention. The bloom filter cluster in fig. 5 includes 3 stages of bloom filters in total. When virus detection is performed on a sample, the virus rule base is not queried directly; instead, continuous character strings in the sample are obtained through a sliding window and filtered through the bloom filters. If any stage of bloom filter judges that the sample is a normal sample, the virus detection flow terminates early, which saves detection time. If all bloom filters judge that the sample may be a virus sample, virus detection is performed on the sample through the virus rule base, and the sample is determined to be a virus sample only if the virus rule base also identifies it as one. In other words, a sample that may be a virus file is first screened by the bloom filters and then checked against the virus rule base, which ensures the accuracy of the detection result and avoids misjudgment. Applying the embodiment of the invention therefore improves the efficiency of virus detection while ensuring the accuracy of the detection result.
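The sliding-window step can be sketched as follows. This is a minimal illustration, not the patented implementation: a plain Python set stands in for one stage's bloom filter, and the names sliding_windows and sample_hits_stage are hypothetical.

```python
def sliding_windows(data: bytes, width: int):
    """Yield every contiguous substring of `data` that is `width` bytes long."""
    for start in range(len(data) - width + 1):
        yield data[start:start + width]


def sample_hits_stage(data: bytes, fragments, width: int) -> bool:
    """Return True if any window of the sample matches a fragment of this stage.

    `fragments` is any container supporting `in`; a set is used here as a
    stand-in for a bloom filter (assumption made for illustration).
    """
    return any(window in fragments for window in sliding_windows(data, width))


# Hypothetical stage holding a single 4-byte rule fragment.
stage_fragments = {b"EVIL"}
print(sample_hits_stage(b"hello EVIL payload", stage_fragments, 4))
print(sample_hits_stage(b"hello world", stage_fragments, 4))
```

The window width equals the fragment length of the stage being queried, so a sample shorter than that length yields no windows and cannot hit the stage.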
Referring to fig. 6, fig. 6 is a flow chart of creating a multi-stage bloom filter, i.e. a bloom filter cluster, provided by an application embodiment of the present invention. The bloom filter cluster in fig. 6 includes 3 stages of bloom filters in total. For each stage of bloom filter, rule segments of the virus rules are extracted from the virus rule base according to a preset length: the rule segments corresponding to the level 1 filter have length k1, those corresponding to the level 2 filter have length k2, and those corresponding to the level 3 filter have length k3. A level 1 bloom filter is instantiated from the extracted rule segments of length k1 and the set hash function; the level 2 and level 3 bloom filters are instantiated in the same way. The level 1, level 2 and level 3 bloom filters are then spliced to obtain the multi-stage bloom filter.
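The creation flow can be sketched as below. Several details are assumptions made only for illustration and are not stated in this embodiment: the bit-array size, the number of hash functions, the use of salted BLAKE2b as the hash function, and taking the first k bytes of each rule as its rule segment. The names BloomFilter and build_cluster are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter over byte strings (illustrative sizes only)."""

    def __init__(self, fragment_len, num_bits=1 << 16, num_hashes=3):
        self.fragment_len = fragment_len
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # One salted BLAKE2b digest per hash function (assumed hash choice).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_cluster(rules, lengths):
    """Create one filter per fragment length and splice them short to long."""
    if any(len(rule) < max(lengths) for rule in rules):
        raise ValueError("every rule must be at least as long as the longest fragment")
    cluster = []
    for k in sorted(lengths):
        stage = BloomFilter(fragment_len=k)
        for rule in rules:
            stage.add(rule[:k])  # assumption: the segment is the rule's first k bytes
        cluster.append(stage)
    return cluster


# Hypothetical rules and lengths k1=4, k2=8, k3=12.
cluster = build_cluster([b"MALWARE-SIGNATURE-1", b"TROJAN-PAYLOAD-XYZ"], [8, 4, 12])
print([stage.fragment_len for stage in cluster])
```

Sorting the lengths before instantiating the filters yields the short-to-long splicing order directly, regardless of the order in which the lengths are supplied.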
Referring to fig. 7, fig. 7 is a schematic diagram of a virus detection flow provided by an application embodiment of the present invention. When a tested file is detected, it is first filtered by the level 1 bloom filter; if the level 1 bloom filter does not intercept the tested file, the tested file is determined to be a normal file. If the level 1 bloom filter intercepts the tested file, the tested file is filtered by the level 2 bloom filter; if the level 2 bloom filter does not intercept the tested file, the tested file is determined to be a normal file. If the level 2 bloom filter intercepts the tested file, the tested file is filtered by the level 3 bloom filter; if the level 3 bloom filter does not intercept the tested file, the tested file is determined to be a normal file. If the level 3 bloom filter intercepts the tested file, the tested file is searched for virus rules in the virus rule base: if the tested file does not include any virus rule in the virus rule base, it is determined to be a normal file; if it includes a virus rule in the virus rule base, it is determined to be a virus file. The level 1 bloom filter corresponds to the level 1 filter in fig. 7, where a hit indicates that the level 1 filter intercepts the tested file and a miss indicates that it does not; the same applies to the level 2 filter and the level 3 filter. The application embodiment of the invention processes the files to be tested through the multi-stage bloom filter, so that when virus detection is performed on a computer, most files on the computer that are not virus files can be eliminated as early as possible, which improves the efficiency of virus detection and reduces its time.
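The hit/miss flow described above can be sketched as a short function. To keep the focus on control flow, each stage is modelled here as a hypothetical callable that returns True on a hit; in the embodiment each stage would be a bloom filter queried through a sliding window. The function name detect and the example rule are assumptions.

```python
def detect(data: bytes, stages, rule_base) -> str:
    """Run the stages in order; consult the rule base only if every stage hits."""
    for stage_hits in stages:
        if not stage_hits(data):
            return "normal"  # a miss at any stage ends detection early
    # Every stage hit, so confirm against the full virus rules.
    if any(rule in data for rule in rule_base):
        return "virus"
    return "normal"  # the stages hit, but no complete rule is present


# Hypothetical rule base and three stages ordered short to long.
rule_base = [b"MALWARE-SIGNATURE"]
stages = [
    lambda d: b"MALW" in d,
    lambda d: b"MALWARE-" in d,
    lambda d: b"MALWARE-SIGN" in d,
]

print(detect(b"plain text file", stages, rule_base))
print(detect(b"xx MALWARE-SIGNATURE yy", stages, rule_base))
print(detect(b"MALWARE-SIGNAL", stages, rule_base))
```

The third call shows why the rule base is still consulted: the file hits all three stages because it shares a prefix with the rule, but it does not contain the complete rule and is therefore treated as normal.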
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The technical solutions described in the embodiments of the present invention may be arbitrarily combined provided that they do not conflict.
In addition, in the embodiments of the present invention, "first", "second", etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence.
Referring to fig. 8, fig. 8 is a schematic diagram of a virus detection device according to an embodiment of the present invention, as shown in fig. 8, the device includes: a processing module and a determining module.
The processing module is used for processing the file to be detected based on a bloom filter cluster, the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters corresponds to rule fragments with different lengths in a virus rule base respectively, and each stage of bloom filter is sequentially used for detecting whether the file to be detected is matched with the corresponding rule fragment;
And the determining module is used for determining whether virus detection is needed through the virus rule base or not based on the matching result of the file to be detected.
The determining module is specifically configured to:
and under the condition that the matching result represents that the file to be detected is matched with the rule segments corresponding to the bloom filters of all stages, virus detection is carried out on the file to be detected based on the virus rule base.
The determining module is specifically configured to:
and under the condition that the matching result represents that the file to be tested is not matched with the rule segment corresponding to any stage of bloom filter in each stage of bloom filter, determining that the file to be tested is a normal file.
The apparatus further comprises:
the receiving module is used for receiving the detection result from the virus rule base; the virus rule base is used for detecting whether the file to be detected is matched with the virus rules in the virus rule base.
The processing module is specifically used for:
and under the condition that any stage of bloom filter in the bloom filter cluster detects that the file to be detected matches the corresponding rule segment, the next stage of bloom filter is used for matching.
The apparatus further comprises:
the creation module is used for creating the at least two stages of bloom filters based on the virus rule base;
the splicing module is used for splicing the at least two stages of bloom filters in order of length from short to long, based on the lengths of the rule fragments corresponding to the bloom filters at each stage, to obtain the bloom filter cluster.
The processing module is specifically used for:
acquiring continuous character strings in the file to be tested based on a sliding window, and matching the continuous character strings with rule fragments corresponding to all levels of bloom filters to determine whether the file to be tested is matched with the corresponding rule fragments; the window width of the sliding window is the same as the length of the rule fragment corresponding to each stage of bloom filter.
In practice, the processing module and the determining module may be implemented by a processor in an electronic device, such as a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a micro control unit (MCU, Microcontroller Unit) or a field-programmable gate array (FPGA, Field-Programmable Gate Array).
It should be noted that: in the virus detection device provided in the above embodiment, only the division of the above modules is used as an example, and in practical application, the above processing allocation may be performed by different modules according to needs, that is, the internal structure of the device is divided into different modules, so as to complete all or part of the above processing. In addition, the virus detection device and the virus detection method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein.
Based on the hardware implementation of the program modules, and in order to implement the method of the embodiment of the application, the embodiment of the application also provides an electronic device. Fig. 9 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present application, as shown in fig. 9, the electronic device includes:
a communication interface capable of information interaction with other devices such as a network device and the like;
and a processor, connected with the communication interface to realize information interaction with other devices, and configured to execute, when running a computer program, the method provided by one or more of the above technical solutions on the electronic device side; the computer program is stored in the memory.
Of course, in practice, the various components in the electronic device are coupled together by a bus system. It will be appreciated that a bus system is used to enable connected communications between these components. The bus system includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus systems in fig. 9.
The memory in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM, Read-Only Memory), programmable read-only memory (PROM, Programmable Read-Only Memory), erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), ferroelectric random access memory (FRAM, Ferroelectric Random Access Memory), flash memory (Flash Memory), magnetic surface memory, optical disk, or compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be random access memory (RAM, Random Access Memory), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synchronous link dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory described in the embodiments of the present application is intended to include, without being limited to, these and any other suitable types of memory.
The method disclosed in the embodiments of the present application may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied in a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium having a memory, and the processor reads the program in the memory and performs the steps of the method in combination with its hardware.
Optionally, when the processor executes the program, a corresponding flow implemented by the electronic device in each method of the embodiments of the present application is implemented, and for brevity, will not be described herein again.
In an exemplary embodiment, the present application further provides a storage medium, i.e. a computer storage medium, in particular a computer readable storage medium, for example comprising a first memory storing a computer program, which is executable by a processor of an electronic device to perform the steps of the aforementioned method. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device, and method may be implemented in other manners. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and other divisions are possible in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Alternatively, if the integrated units described above are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The technical solutions described in the embodiments of the present application may be arbitrarily combined without any conflict.
In addition, in the examples of this application, "first," "second," etc. are used to distinguish similar objects and not necessarily to describe a particular order or sequence.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of virus detection, the method comprising:
processing a file to be tested based on a bloom filter cluster, wherein the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters respectively corresponds to rule fragments with different lengths in a virus rule base, and each stage of bloom filter is sequentially used for detecting whether the file to be tested is matched with the corresponding rule fragment; the processing of the files to be tested based on the bloom filter cluster comprises the following steps: inputting the file to be tested into a first stage bloom filter in the bloom filter cluster, wherein the length of a rule segment corresponding to the first stage bloom filter is shortest;
determining whether virus detection is needed through the virus rule base or not based on the matching result of the file to be detected;
before processing the file to be tested based on the bloom filter cluster, the method further comprises:
creating the at least two stages of bloom filters based on the virus rule base;
and splicing the at least two stages of bloom filters according to the sequence from short length to long length based on the lengths of the rule fragments corresponding to the bloom filters at each stage to obtain the bloom filter cluster.
2. The method of claim 1, wherein determining whether virus detection by the virus rule base is required based on the matching result of the file to be tested, comprises:
and under the condition that the matching result represents that the file to be detected is matched with the rule segments corresponding to the bloom filters of all stages, virus detection is carried out on the file to be detected based on the virus rule base.
3. The method of claim 1, wherein determining whether virus detection by the virus rule base is required based on the matching result of the file to be tested, comprises:
and under the condition that the matching result represents that the file to be tested is not matched with the rule segment corresponding to any stage of bloom filter in each stage of bloom filter, determining that the file to be tested is a normal file.
4. The method according to claim 2, wherein the virus detection of the file under test based on the virus rule base comprises:
receiving a detection result from the virus rule base; the virus rule base is used for detecting whether the file to be detected is matched with the virus rules in the virus rule base.
5. The method according to any one of claims 1 to 2, wherein the processing the file to be tested based on bloom filter clusters comprises:
and under the condition that any stage of bloom filter in the bloom filter cluster detects that the file to be detected matches the corresponding rule segment, the next stage of bloom filter is used for matching.
6. A method according to any one of claims 1 to 3, wherein when each stage of bloom filter detects whether the file under test matches the corresponding rule segment, the method comprises:
acquiring continuous character strings in the file to be tested based on a sliding window, and matching the continuous character strings with rule fragments corresponding to all levels of bloom filters to determine whether the file to be tested is matched with the corresponding rule fragments; the window width of the sliding window is the same as the length of the rule fragment corresponding to each stage of bloom filter.
7. A virus detection device, comprising:
the processing module is used for processing the file to be detected based on a bloom filter cluster, the bloom filter cluster comprises at least two stages of bloom filters, each stage of bloom filter in the at least two stages of bloom filters corresponds to rule fragments with different lengths in a virus rule base respectively, and each stage of bloom filter is sequentially used for detecting whether the file to be detected is matched with the corresponding rule fragment; the processing module is specifically configured to: inputting the file to be tested into a first stage bloom filter in the bloom filter cluster, wherein the length of a rule segment corresponding to the first stage bloom filter is shortest;
The determining module is used for determining whether virus detection is needed through the virus rule base or not based on the matching result of the file to be detected;
the creation module is used for creating the at least two stages of bloom filters based on the virus rule base;
and the splicing module is used for splicing the at least two stages of bloom filters according to the sequence from short length to long length based on the lengths of the rule fragments corresponding to the bloom filters at each stage to obtain the bloom filter cluster.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the virus detection method according to any one of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the virus detection method according to any one of claims 1 to 6.
CN202110335752.3A 2021-03-29 2021-03-29 Virus detection method and device, electronic equipment and storage medium Active CN113051566B (en)


Publications (2)

Publication Number Publication Date
CN113051566A CN113051566A (en) 2021-06-29
CN113051566B true CN113051566B (en) 2023-07-14

Family

ID=76516132




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant