CN104978526A - Virus signature extraction method and apparatus - Google Patents

Publication number
CN104978526A
CN104978526A CN201510378081.3A CN201510378081A CN104978526A CN 104978526 A CN104978526 A CN 104978526A CN 201510378081 A CN201510378081 A CN 201510378081A CN 104978526 A CN104978526 A CN 104978526A
Authority
CN
China
Prior art keywords
feature value
virus
computing device
similarity
feature values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510378081.3A
Other languages
Chinese (zh)
Other versions
CN104978526B (en)
Inventor
唐海
陈卓
杨康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongxiang Technical Service Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510378081.3A
Publication of CN104978526A
Application granted
Publication of CN104978526B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method and an apparatus for extracting virus signatures. The method comprises: obtaining a plurality of virus samples; dividing the virus samples into at least one category such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a preset threshold; and, for each category, extracting the common signature of all virus samples belonging to that category. The method and apparatus solve the prior-art problem that relatively fixed classification rules easily lead to redundant information in the virus database, greatly reduce that redundancy, improve the virus classification mechanism, and improve the accuracy and efficiency of virus detection.

Description

Method and apparatus for extracting virus signatures
Technical field
The present invention relates to the field of security technology, and in particular to a method and an apparatus for extracting virus signatures.
Background
A virus signature is a special kind of data that serves as the key to distinguishing virus files and virus-infected files from normal files, and it is usually recorded in a virus database. Almost all antivirus software relies on the virus signatures recorded in a virus database to determine whether a file is a virus file or has been infected by a virus.
In the prior art, viruses are usually categorized according to general classification rules. For example, in the virus name "Trojan.Downloader-1420", the prefix "Trojan" indicates that the virus is a Trojan horse, and "Downloader-1420" is the identifier that distinguishes it from other Trojan horses. Based on such relatively fixed classification rules, the prior art extracts virus signatures separately for each category of virus.
However, relatively fixed classification rules easily lead to redundant information in the virus database. For example, two closely related viruses may in fact share a common virus signature, yet the classification rules place them in different categories for some reason, so each is recorded in the virus database under a different signature. Such redundancy causes a large number of repeated checks during virus detection, which hinders improvements in detection efficiency.
Summary of the invention
In view of the defects in the prior art, the present invention provides a method and an apparatus for extracting virus signatures, which can solve the problem that the relatively fixed classification rules of the prior art easily lead to redundant information in the virus database.
In a first aspect, the present invention provides an apparatus for extracting virus signatures, comprising:
An acquiring unit, configured to obtain a plurality of virus samples;
A classification unit, configured to divide the virus samples obtained by the acquiring unit into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a preset threshold;
An extraction unit, configured to extract, for each category obtained by the classification unit, the common signature of all virus samples belonging to that category.
Optionally, the feature value of a virus sample is the fuzzy hash value of that virus sample in a file format.
Optionally, the classification unit specifically comprises:
An acquisition module, configured to obtain the feature values of the virus samples obtained by the acquiring unit, so as to form a feature value set;
An estimation module, configured to estimate the computing speed of each available computing device;
A repetition module, configured to repeatedly perform the following step until the similarity between any two feature values in the feature value set is less than the preset threshold:
Distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device obtained by the estimation module, so that the at least one computing device screens the feature values assigned to it, on the premise that the processing time satisfies a preset condition, until the similarity between any two of those feature values is less than the preset threshold.
Optionally, the estimation module specifically comprises:
A sending submodule, configured to send a preset number of the feature values obtained by the acquiring unit to any available computing device, so that the computing device screens that preset number of feature values until the similarity between any two of them is less than the preset threshold;
An obtaining submodule, configured to obtain the processing time of that computing device, so as to obtain an estimate of the computing speed of each available computing device.
Optionally, the repetition module specifically comprises:
A determining submodule, configured to determine the number of feature values to be assigned to each computing device according to the preset condition and the computing speed of each available computing device obtained by the estimation module;
A sending submodule, configured to distribute all feature values in the feature value set to at least one computing device according to the numbers obtained by the determining submodule, so that the at least one computing device screens the feature values assigned to it until the similarity between any two of them is less than the preset threshold;
A receiving submodule, configured to receive the screened feature values from the at least one computing device, so as to update the feature value set.
Optionally, screening the assigned feature values so that the similarity between any two of them is less than the preset threshold specifically comprises:
Retaining one feature value, and performing the following steps on each remaining feature value in turn:
Determining whether the similarity between the feature value and any retained feature value is greater than or equal to the preset threshold;
If so, removing the feature value;
If not, retaining the feature value.
Optionally, the classification unit further comprises:
A sending module, configured to divide all samples to be clustered into several parts and send each part, together with the feature value set obtained by the repetition module, to one of several computing devices, so that each computing device calculates in turn the similarity between the feature value of each sample and every feature value in the feature value set, and labels each sample with the category corresponding to the feature value in the set that is most similar to the sample's own feature value;
A receiving module, configured to receive the category label of each sample from the computing devices, so as to classify all samples to be clustered.
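The labeling performed by the sending and receiving modules can be sketched as follows. This is a minimal single-machine illustration, not the patented implementation: the function names are hypothetical, and `difflib.SequenceMatcher` is used only as a stand-in similarity measure, since the text does not prescribe one.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the patent's metric."""
    return SequenceMatcher(None, a, b).ratio()


def label_samples(sample_values: dict[str, str],
                  representatives: list[str]) -> dict[str, int]:
    """Label each sample with the index of its most similar representative."""
    labels = {}
    for name, value in sample_values.items():
        scores = [similarity(value, rep) for rep in representatives]
        # The category is the representative with the highest similarity.
        labels[name] = max(range(len(representatives)), key=scores.__getitem__)
    return labels


# Hypothetical feature values: two category representatives, three samples.
representatives = ["aaaabbbbcccc", "xyzxyzxyzxyz"]
samples = {"s1": "aaaabbbbcccd", "s2": "xyzxyzxyzxyy", "s3": "aaaabbbbcccc"}
labels = label_samples(samples, representatives)
```

In a distributed setting, each computing device would run `label_samples` on its own part of the samples and return the resulting labels to the receiving module.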
Optionally, the preset condition comprises:
The processing time of any computing device is less than a first preset value;
And/or,
The processing times of all computing devices tend to be equal;
And/or,
When the number of feature values in the feature value set is greater than a second preset value, the processing time of any computing device approaches a third preset value.
In a second aspect, the present invention further provides a method for extracting virus signatures, comprising:
Obtaining a plurality of virus samples;
Dividing the virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a preset threshold;
For each category, extracting the common signature of all virus samples belonging to that category.
Optionally, the feature value of a virus sample is the fuzzy hash value of that virus sample in a file format.
As can be seen from the above technical solutions, the present invention uses a clustering method to classify all virus samples by similarity before extracting virus signatures, which avoids extracting different signatures for closely related viruses. The invention therefore not only solves the problem that the relatively fixed classification rules of the prior art easily lead to redundant information in the virus database, greatly reducing that redundancy, but also helps improve the virus classification mechanism and raises the accuracy and efficiency of virus detection.
Numerous specific details are set forth in the description of the present invention. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This manner of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of a device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple submodules, subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device of a browser terminal according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention; they shall all fall within the scope of the claims and description of the invention.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below evidently show some embodiments of the invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the steps of a method for extracting virus signatures in one embodiment of the invention;
Fig. 2 is a flow chart of the steps for classifying a plurality of virus samples in one embodiment of the invention;
Fig. 3 is a flow chart of the steps for performing the clustering computation in one embodiment of the invention;
Fig. 4 is a flow chart of the steps for estimating computing speed in one embodiment of the invention;
Fig. 5 is a structural block diagram of an apparatus for extracting virus signatures in one embodiment of the invention.
Detailed description
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
It should be noted that, in the description of the invention, orientation or positional terms such as "upper" and "lower" are based on the orientations or positions shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the invention. Unless otherwise expressly specified and limited, the terms "mounted", "connected" and "coupled" should be interpreted broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, indirect through an intermediary, or internal between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the circumstances.
Fig. 1 is a flow chart of the steps of a method for extracting virus signatures in one embodiment of the invention. Referring to Fig. 1, the method comprises:
Step 101: obtain a plurality of virus samples;
Step 102: divide the virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a preset threshold;
Step 103: for each category, extract the common signature of all virus samples belonging to that category.
It should be understood that the virus samples obtained in step 101 may be files identified as virus files or as virus-infected files, and that their number may be very large. It should also be understood that, in the embodiments of the invention, the division into categories is based on the feature value of each sample to be clustered and on the similarities between those feature values. Specifically, in different embodiments, the feature value may be a hash value of the sample in any form (the fixed-length output into which a hash algorithm transforms an input of arbitrary length), for example an MD4, MD5, SHA1, N-Hash, RIPE-MD or HAVAL value, each corresponding to an existing hash algorithm. Accordingly, the similarity between feature values can be calculated by comparing the degree of difference between the hash values of two samples, which is well known to those skilled in the art and is not described further here.
It can be seen that the embodiments of the invention use a clustering method to classify all virus samples by similarity before extracting virus signatures, which avoids extracting different signatures for closely related viruses. The embodiments therefore not only solve the problem that the relatively fixed classification rules of the prior art easily lead to redundant information in the virus database, greatly reducing that redundancy, but also help improve the virus classification mechanism and raise the accuracy and efficiency of virus detection.
As a preferred example, the feature value is the fuzzy hash value of the virus sample in a preset file format. The fuzzy hash algorithm, also known as context triggered piecewise hashing (CTPH), works as follows: a weak hash is computed over local file content and used to split the file into pieces under specific conditions; a strong hash is then computed for each piece; a portion of each of these hash values is taken and concatenated, and together with the splitting condition forms the fuzzy hash result. A string similarity algorithm is then used to determine how similar two fuzzy hash values are, which indicates how similar the two files are. For the specific fuzzy hash algorithm, reference may be made to prior art documents; it is not described further here. It will be understood that the fuzzy hash algorithm adopted in the embodiments effectively confines the impact of detail changes in a virus sample to a local part of the overall result, so that the final similarity can be judged effectively, which further ensures the validity of the clustering result.
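The piecewise idea can be illustrated with a deliberately simplified toy. It is not the real CTPH algorithm or any existing library's API: the weak hash here is just the sum of the last few bytes, the strong hash is SHA-256 reduced to one character per piece, and `difflib.SequenceMatcher` stands in for the string similarity step. All names and parameters are assumptions made for illustration.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _piece_char(piece: bytes) -> str:
    """Strong hash of one piece, reduced to a single character."""
    return ALPHABET[hashlib.sha256(piece).digest()[0] % 64]


def toy_fuzzy_hash(data: bytes, block_size: int = 16, window: int = 4) -> str:
    """Split data where a weak local hash hits a trigger value, hash each piece."""
    chars = []
    start = 0
    for i in range(len(data)):
        # Weak hash: depends only on the last `window` bytes (local content).
        weak = sum(data[max(0, i - window + 1): i + 1])
        if weak % block_size == block_size - 1:  # splitting condition
            chars.append(_piece_char(data[start:i + 1]))
            start = i + 1
    if start < len(data):  # trailing piece without a trigger
        chars.append(_piece_char(data[start:]))
    # The splitting condition (block size) is stored with the result.
    return f"{block_size}:{''.join(chars)}"


def fuzzy_similarity(h1: str, h2: str) -> float:
    """Compare two toy fuzzy hashes; different block sizes are not comparable."""
    size1, sig1 = h1.split(":", 1)
    size2, sig2 = h2.split(":", 1)
    if size1 != size2:
        return 0.0
    return SequenceMatcher(None, sig1, sig2).ratio()


original = bytes(range(256)) * 8
variant = original + b"appended tail"
score = fuzzy_similarity(toy_fuzzy_hash(original), toy_fuzzy_hash(variant))
```

Because the split points depend only on nearby bytes, appending or changing bytes in one place leaves the characters for all unaffected pieces unchanged, which is the locality property the paragraph above relies on.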
For example, consider a virus V1 and a variant V2 of V1 whose files differ only in a few details, such that the signature used to detect and remove V1 cannot be used against V2. Under existing relatively fixed classification rules, V1 and V2 would usually be placed in different categories and recorded in the virus database under different signatures. With the fuzzy hash algorithm adopted in the embodiments, however, the local detail changes between V1 and V2 have little effect on the final similarity, so the two can still be placed in the same category and detected with the same signature. Compared with the prior art, this reduces redundancy in the virus database and improves detection efficiency.
Further, as a more specific example, step 102 (dividing the virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a preset threshold) may specifically comprise the following steps, as shown in Fig. 2:
Step 1021: obtain the feature values of the virus samples to form a feature value set;
Step 1022: estimate the computing speed of each available computing device;
Step 1023: determine whether the similarity between any two feature values in the feature value set is less than the preset threshold;
Step 1024: if not, distribute all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device, so that the at least one computing device screens the feature values assigned to it, on the premise that the processing time satisfies a preset condition, until the similarity between any two of them is less than the preset threshold, and return to step 1023.
Step 1021 may specifically comprise calculating the feature value of each virus sample and forming all feature values into a feature value set, while step 1022 may comprise obtaining the computing speed of each available computing device by any means. It will be understood that computing speed here refers to the time a computing device needs to perform the clustering computation on a given number of samples. The estimate can therefore be calculated from the hardware parameters of the device, obtained from actual test results, or derived from a combination of both. It should be understood that steps 1021 and 1022 have no necessary logical order, so their order of execution need not be restricted.
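Estimating speed from an actual test run, one of the options mentioned above, might look like the following sketch. The function name and the idea of passing the device's work as a local callable are assumptions; in practice the test batch would be sent to a remote computing device and its processing time reported back.

```python
import time
from typing import Callable, Sequence


def estimate_speed(process: Callable[[Sequence[str]], object],
                   test_batch: Sequence[str]) -> float:
    """Return feature values processed per second for one timed test batch."""
    started = time.perf_counter()
    process(test_batch)  # stand-in for the device's screening computation
    elapsed = time.perf_counter() - started
    # Guard against a zero reading on very fast runs.
    return len(test_batch) / max(elapsed, 1e-9)


# Hypothetical usage: time a placeholder workload on 1000 feature values.
speed = estimate_speed(sorted, ["a"] * 1000)
```

One caveat on this design: pairwise screening can cost roughly quadratic time in the batch size in the worst case, so a per-item rate measured on a small batch is only an approximation for much larger batches.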
Steps 1023 and 1024 form a loop: until the similarity between any two feature values in the feature value set is less than the preset threshold, the following is performed repeatedly:
Distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device, so that the at least one computing device screens the feature values assigned to it, on the premise that the processing time satisfies a preset condition, until the similarity between any two of them is less than the preset threshold.
For example, step 1024 (distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device, so that the at least one computing device screens the assigned feature values as described above) may specifically comprise the following steps, as shown in Fig. 3:
Step 1024a: determine the number of feature values to assign to each computing device according to the computing speed of each available computing device and the preset condition;
Step 1024b: distribute all feature values in the feature value set to at least one computing device according to the determined numbers, so that the at least one computing device screens the feature values assigned to it until the similarity between any two of them is less than the preset threshold;
Step 1024c: receive the screened feature values from the at least one computing device, so as to update the feature value set.
Here, "screening the assigned feature values so that the similarity between any two of them is less than the preset threshold" in step 1024b is the actual clustering computation that each computing device performs on its assigned feature values: by some algorithm, the assigned feature values are reduced, by removing some of them, to at least one feature value such that the remaining values are pairwise dissimilar. For example, consider four feature values N1, N2, N3 and N4 with the following similarities: N1:N2 = 0.9, N1:N3 = 0.3, N1:N4 = 0.1, N2:N3 = 0.4, N2:N4 = 0.2, N3:N4 = 0.3. With a preset threshold of 0.8, N2 is removed because it is too similar to N1 (that is, it is placed in the same category as N1), leaving the three screened feature values N1, N3 and N4, which satisfy the condition that the similarity between any two feature values is less than 0.8.
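The screening in this example can be reproduced with a short sketch. The function and table names are hypothetical; the similarity values are the ones given in the text, looked up from a table rather than computed.

```python
from typing import Callable, Sequence


def screen(values: Sequence[str],
           similarity: Callable[[str, str], float],
           threshold: float) -> list[str]:
    """Keep a value only if it is below the threshold against every kept value."""
    retained: list[str] = []
    for value in values:
        # The first value is always retained, since nothing has been kept yet.
        if all(similarity(value, kept) < threshold for kept in retained):
            retained.append(value)
    return retained


# Pairwise similarities from the example in the text.
SIMILARITIES = {
    frozenset(("N1", "N2")): 0.9,
    frozenset(("N1", "N3")): 0.3,
    frozenset(("N1", "N4")): 0.1,
    frozenset(("N2", "N3")): 0.4,
    frozenset(("N2", "N4")): 0.2,
    frozenset(("N3", "N4")): 0.3,
}


def table_similarity(a: str, b: str) -> float:
    return SIMILARITIES[frozenset((a, b))]


result = screen(["N1", "N2", "N3", "N4"], table_similarity, 0.8)
# result is ["N1", "N3", "N4"]: N2 is dropped for being too similar to N1.
```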
On this basis, the feature value set is updated in step 1024c many times as step 1024 is repeated. Eventually, when step 1023 finds that the similarity between any two feature values in the set is less than the preset threshold, each feature value in the set can serve as the representative of one category, and most of the feature values removed during earlier screening are sufficiently similar to at least one of them.
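The loop formed by steps 1023 and 1024 can be sketched on a single machine as below. This is an illustration under stated assumptions, not the patented procedure: chunks stand in for computing devices, `difflib.SequenceMatcher` stands in for the similarity measure, and the rule of doubling the chunk size when a round removes nothing is an assumption added here so that similar values in different chunks eventually meet and the loop terminates.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the patent's metric."""
    return SequenceMatcher(None, a, b).ratio()


def screen(values: list[str], threshold: float) -> list[str]:
    """Per-device screening: drop values too similar to an already kept value."""
    retained: list[str] = []
    for value in values:
        if all(similarity(value, kept) < threshold for kept in retained):
            retained.append(value)
    return retained


def all_dissimilar(values: list[str], threshold: float) -> bool:
    """Step 1023: is every pairwise similarity below the threshold?"""
    return all(similarity(a, b) < threshold for a, b in combinations(values, 2))


def reduce_to_representatives(values: list[str], threshold: float,
                              chunk_size: int) -> list[str]:
    values = list(values)
    while not all_dissimilar(values, threshold):
        # Steps 1024a/1024b: split the set into per-device chunks.
        chunks = [values[i:i + chunk_size]
                  for i in range(0, len(values), chunk_size)]
        # Steps 1024b/1024c: screen each chunk, then merge the results.
        merged = [v for chunk in chunks for v in screen(chunk, threshold)]
        if len(merged) == len(values):
            chunk_size *= 2  # assumption: widen chunks when nothing was removed
        values = merged
    return values


feature_values = ["aaaaaaaa", "bbbbbbbb", "aaaaaaab", "cccccccc", "bbbbbbbc"]
representatives = reduce_to_representatives(feature_values, 0.8, chunk_size=2)
```

With these inputs the first round removes nothing because each near-duplicate sits in a different chunk from its neighbor; once the chunks widen, the near-duplicates are screened out and three representatives remain.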
As a specific example of step 1024a, the preset condition may be "the processing time of any computing device is less than the first preset value, and the processing times of all computing devices tend to be equal". When the feature values in the set are distributed to at least one computing device, the computing speeds obtained in step 1022 then determine which computing devices receive feature values and how many each receives. For example, suppose device C1 actually takes 1 hour to process 10000 feature values, device C2 takes 1 hour for 15000, and device C3 takes 1 hour for 6000. Assigning 10000, 15000 and 6000 feature values to them respectively makes the expected processing times of C1, C2 and C3 roughly equal while keeping each below 1.5 hours, which satisfies the preset condition. Likewise, if fewer than 3000 feature values remain in the set at some distribution, they can all be assigned directly to device C2, which avoids an unnecessary increase in the number of repetitions. Of course, the time actually spent on the computation is hard to determine exactly in advance, so the preset condition is only an ideal condition applied when assigning feature values; the time a computing device actually takes may not strictly satisfy it.
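The allocation in this example can be sketched as follows. The function name, the one-hour target and the treatment of rounding are assumptions for illustration; the device speeds and the 3000-value cutoff come from the example above, and the sketch assumes processing time scales linearly with the number of assigned values.

```python
def allocate(total: int, speeds: dict[str, int], target_hours: float,
             small_batch: int = 3000) -> tuple[dict[str, int], int]:
    """Split feature values across devices in proportion to their speed.

    Returns (values per device, values left over for a later round).
    """
    fastest = max(speeds, key=speeds.get)
    if total < small_batch:
        # A small remainder goes to the fastest device in one piece.
        return {fastest: total}, 0
    # Each device receives at most what it can process in target_hours.
    capacity = {dev: int(speed * target_hours) for dev, speed in speeds.items()}
    batch = min(total, sum(capacity.values()))
    total_speed = sum(speeds.values())
    shares = {dev: batch * speed // total_speed for dev, speed in speeds.items()}
    # Values lost to integer rounding are handed to the fastest device.
    shares[fastest] += batch - sum(shares.values())
    return shares, total - batch


# Speeds in feature values per hour, taken from the example in the text.
speeds = {"C1": 10000, "C2": 15000, "C3": 6000}
shares, leftover = allocate(31000, speeds, target_hours=1.0)
```

With a one-hour target, every device is expected to finish in about one hour, which keeps all processing times roughly equal and below the 1.5-hour limit used in the example.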
In addition, the preset condition may include any one of the following conditions, or a combination of any number of them:
Condition F1: the processing time of every computing device is less than a first preset value;
Condition F2: the processing times of all computing devices tend to be equal;
Condition F3: when the number of feature values in the feature value set is greater than a second preset value, the processing time of every computing device approaches a third preset value.
It should be understood that the feature value set is updated jointly by the at least one computing device, so the flow can return to step 1023 only after every computing device has finished processing. Condition F1 therefore ensures that the processing time of each computing device stays below the first preset value, so that a few slow computing devices do not hold up the whole flow. Condition F2 makes the processing times of the computing devices tend to be equal, so that under ideal conditions all computing devices finish at the same time; this raises the utilization of the computing devices as far as possible and improves processing efficiency. In condition F3, the third preset value may be a processing time predetermined as reasonable for the specific scenario. On the one hand, because the number of feature values in the feature value set keeps decreasing as the computation proceeds, this condition causes the number of computing devices occupied by each repetition to decrease as well, which improves the efficiency with which the computing devices are used. On the other hand, this condition keeps the total time of one execution of step 1024 roughly under control, and the third preset value can be adjusted so that a single execution of step 1024 is neither too long nor too short, either of which would reduce processing efficiency. Furthermore, when the number of feature values in the feature value set is less than or equal to the second preset value, that is, when the set is small enough, a small number of computing devices can process it directly, which avoids an unnecessary increase in the number of repetitions.
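Conditions F1 to F3 can be expressed as a simple check on the expected processing times. The sketch below is an illustration only: the text does not quantify "tend to be equal" or "approaches", so the relative `tolerance` parameter is an assumption, as are the function and argument names.

```python
def check_conditions(times, set_size, first_preset, second_preset,
                     third_preset, tolerance=0.1):
    """Evaluate F1, F2 and F3 for a non-empty list of expected processing times.

    tolerance is a hypothetical relative margin used to interpret
    "tend to be equal" (F2) and "approaches" (F3).
    """
    f1 = all(t < first_preset for t in times)
    f2 = max(times) - min(times) <= tolerance * max(times)
    # F3 only constrains the times while the set is larger than the second preset value.
    f3 = set_size <= second_preset or all(
        abs(t - third_preset) <= tolerance * third_preset for t in times
    )
    return {"F1": f1, "F2": f2, "F3": f3}


# C1, C2 and C3 each expected to need 1 hour for a set of 31000 feature values.
result = check_conditions([1.0, 1.0, 1.0], 31000,
                          first_preset=1.5, second_preset=3000, third_preset=1.0)
```

Returning the three flags separately lets a caller apply any combination of the conditions, in line with the statement that the preset condition may combine any number of them.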
It can be seen that the embodiment of the present invention distributes large-scale data, which a single computing device can hardly handle, to different computing devices on the basis of their estimated computing speeds and processes it iteratively, which can greatly improve computational efficiency. At the same time, the different computing devices all remove similar feature values according to the same criterion, which effectively safeguards the clustering result.
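The iteration of steps 1023 and 1024 can be sketched on a single machine by treating each chunk of the set as the share of one computing device. Everything below is an assumption made for illustration: the names, the numeric toy similarity, and in particular the rule of doubling `chunk_size` when a round removes nothing, which is simply one way to guarantee termination and is not stated in the text.

```python
from itertools import combinations


def screen(values, sim, threshold):
    """Keep a value only if it is not similar to any value already kept."""
    kept = []
    for v in values:
        if all(sim(v, k) < threshold for k in kept):
            kept.append(v)
    return kept


def converged(values, sim, threshold):
    """Step 1023: is every pairwise similarity below the threshold?"""
    return all(sim(a, b) < threshold for a, b in combinations(values, 2))


def reduce_to_representatives(values, sim, threshold, chunk_size):
    values = list(values)
    while not converged(values, sim, threshold):
        # Step 1024: each chunk plays the role of one device's share.
        chunks = [values[i:i + chunk_size]
                  for i in range(0, len(values), chunk_size)]
        merged = [v for chunk in chunks for v in screen(chunk, sim, threshold)]
        if len(merged) == len(values):
            # No progress this round: use fewer, larger shares (assumed rule).
            chunk_size *= 2
        values = merged
    return values


def toy_sim(a, b):
    """Hypothetical similarity for numbers: close values score near 1."""
    return 1 / (1 + abs(a - b))


reps = reduce_to_representatives([1, 10, 2, 11, 20, 21], toy_sim, 0.5, chunk_size=2)
# reps == [1, 10, 20]
```

The pairwise check in `converged` is written for clarity rather than efficiency; a real deployment would distribute or avoid that check.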
As a more specific example, step 1022 (estimating the computing speed of each available computing device) may comprise the following flow, shown in Fig. 4:
Step 1022a: send a predetermined number of feature values to any one available computing device, so that this computing device screens the predetermined number of feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
Step 1022b: obtain the processing time of this computing device, so as to obtain an estimate of the computing speed of each available computing device.
For example, step 1022a may comprise sending 10000 feature values to device C1, so that C1 performs on them the same clustering computation as in step 1024. Step 1022b may comprise obtaining the processing time of C1 as a measure of its computing speed, and estimating the computing speed of every other available computing device from the differences between its hardware parameters and those of C1. The measured processing time is thus sufficiently representative of the computing speed of this computing device, and estimating the computing speeds of all computing devices from a test run can improve processing efficiency. Of course, other embodiments of the present invention may adopt more precise or coarser ways of estimating computing speed depending on the application requirements; the present invention places no limitation on this.
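The estimation in steps 1022a and 1022b can be sketched as a timed test run on one reference device followed by scaling. The text does not say how hardware differences are converted into speeds, so the `hardware_scores` ratio below is a hypothetical stand-in, as are the function names.

```python
import time


def timed_hours(task, *args):
    """Run task(*args) and return the elapsed wall-clock time in hours."""
    start = time.perf_counter()
    task(*args)
    return (time.perf_counter() - start) / 3600


def estimate_speeds(n_test, elapsed_hours, reference, hardware_scores):
    """Estimate feature values per hour for every device.

    n_test feature values took elapsed_hours on the reference device;
    hardware_scores is an assumed relative performance score per device.
    """
    ref_speed = n_test / elapsed_hours
    ref_score = hardware_scores[reference]
    return {dev: ref_speed * score / ref_score
            for dev, score in hardware_scores.items()}


# Suppose the test run of 10000 feature values on C1 took 1 hour.
speeds = estimate_speeds(10000, 1.0, "C1", {"C1": 1.0, "C2": 1.5, "C3": 0.6})
```

With these assumed scores the estimates come out at roughly 10000, 15000 and 6000 feature values per hour, matching the figures used in the distribution example.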
In any of the above embodiments, the process of "screening the distributed feature values so that the similarity between any two feature values is less than the predetermined threshold" may comprise the following flow, not shown in the drawings:
Step 201: retain one feature value, and perform the following steps on each of the remaining feature values in turn:
Step 202: determine whether the similarity between the feature value and any retained feature value is greater than or equal to the predetermined threshold;
Step 203: if so, remove the feature value;
Step 204: if not, retain the feature value.
For example, for the feature values N1, N2, N3 and N4 above, N1 is retained first. For N2, it is determined whether the similarity between N1 and N2 is greater than or equal to the predetermined threshold 0.8; since N1:N2=0.9, N2 is removed according to step 203. Next, for N3, it is determined whether the similarity between N3 and N1 is greater than or equal to the predetermined threshold 0.8 (N2 has already been removed, so only the retained N1 is compared); since N1:N3=0.3, N3 is retained according to step 204. Then, for N4, it is determined whether the similarity between N4 and N1, or between N4 and N3, is greater than or equal to the predetermined threshold 0.8; since N1:N4=0.1 and N3:N4=0.3, N4 is retained according to step 204. The screening thus yields the feature values N1, N3 and N4. A larger number of feature values can of course be processed in the same way. On this basis, under a preset ordering of the feature values, whenever two feature values have a similarity greater than or equal to the predetermined threshold, the later one is removed, which improves the consistency of the clustering computations performed on different computing devices.
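Steps 201 to 204 amount to a single greedy pass over the feature values. The sketch below replays the N1 to N4 example; the function name and the lookup table are illustrative, and only the four similarity values quoted in the example are used.

```python
def screen(values, similarity, threshold):
    """Greedy screening: steps 201 to 204 applied in the given order."""
    retained = []
    for value in values:
        # Step 202: compare with every feature value retained so far.
        if any(similarity(value, kept) >= threshold for kept in retained):
            continue  # Step 203: too similar to a retained value, remove it.
        retained.append(value)  # Steps 201 and 204: retain the value.
    return retained


# Similarities quoted in the example (pairs involving the removed N2 are never needed).
SIM = {
    frozenset(("N1", "N2")): 0.9,
    frozenset(("N1", "N3")): 0.3,
    frozenset(("N1", "N4")): 0.1,
    frozenset(("N3", "N4")): 0.3,
}


def similarity(a, b):
    return SIM[frozenset((a, b))]


result = screen(["N1", "N2", "N3", "N4"], similarity, 0.8)
# result == ['N1', 'N3', 'N4']
```

Because the pass depends on the order of the input, fixing one ordering for all computing devices is what keeps their results consistent.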
On the other hand, on the basis of any of the above embodiments, step 102 (dividing the several virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold) may further comprise the following flow, not shown in the drawings:
Step 1025: divide the several virus samples into several parts and send the parts, together with the feature value set, respectively to several computing devices, so that the computing devices compute in turn the similarity between the feature value of each virus sample and every feature value in the feature value set, and label each virus sample with the category corresponding to the feature value most similar to the feature value of that virus sample;
Step 1026: receive the category label of each virus sample from the several computing devices, so as to classify the several virus samples.
It should be understood that each feature value in the feature value set obtained through steps 1023 and 1024 can represent one category of samples divided according to the similarity between feature values, so every sample to be clustered can be assigned to one category on this basis. In step 1025, if the similarities between the feature value of a virus sample and all feature values in the feature value set are all below a preset minimum similarity, the feature value of that virus sample may be added to the feature value set, and such samples may be processed separately or classified again after the feature value set has been updated.
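The labelling performed by each computing device in step 1025 can be sketched as a nearest-representative lookup with a fallback for samples that match nothing. The names and the numeric toy similarity are assumptions made for this illustration.

```python
def label_samples(sample_values, representatives, similarity, min_similarity):
    """Label each sample with its most similar representative feature value.

    Samples whose best similarity is below min_similarity are returned
    separately so they can be handled on their own or reclassified later.
    """
    labels, unmatched = {}, []
    for name, value in sample_values.items():
        best = max(representatives, key=lambda rep: similarity(value, rep))
        if similarity(value, best) < min_similarity:
            unmatched.append(name)
        else:
            labels[name] = best
    return labels, unmatched


def toy_sim(a, b):
    """Hypothetical similarity for numbers: close values score near 1."""
    return 1 / (1 + abs(a - b))


samples = {"s1": 2, "s2": 11, "s3": 19, "s4": 100}
labels, unmatched = label_samples(samples, [1, 10, 20], toy_sim, 0.05)
# labels == {'s1': 1, 's2': 10, 's3': 20}; unmatched == ['s4']
```

Step 1026 then only has to merge the `labels` dictionaries returned by the individual computing devices.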
Thus, in step 103, all virus samples within one category are similar (for example, they are variants of one another, or share the same author, script or source code), so extracting their common features yields the features that distinguish this category from other categories. The resulting virus categories and their common features can then make up a virus database used to detect and remove viruses.
Based on the same inventive concept, Fig. 5 is a structural block diagram of an apparatus for extracting virus signatures according to an embodiment of the present invention. Referring to Fig. 5, the apparatus comprises:
an acquiring unit 51, configured to acquire several virus samples;
a classification unit 52, configured to divide the several virus samples acquired by the acquiring unit 51 into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold;
an extraction unit 53, configured to extract, for each category obtained by the classification unit 52, the common features of all virus samples belonging to that category.
It can be understood that the apparatus can perform the flow of steps 101 to 103 in Fig. 1 and therefore has the corresponding functions and structure, which are not repeated here. It should be noted that all or some of the above computing devices may be included in the virus signature extraction apparatus, or may be external devices independent of it. It should be understood that data may be exchanged between the virus signature extraction apparatus and the computing devices by wired or wireless communication, and the apparatus and the computing devices may each be a network node in a wired or wireless network.
It can be seen that the embodiment of the present invention combines a clustering method: before virus signatures are extracted, all virus samples are first classified by similarity, which prevents viruses of closely related kinds from each having a different virus signature extracted. The embodiment therefore not only solves the prior-art problem that relatively fixed classification rules easily cause redundant information in the virus database, greatly reducing that redundancy, but also helps improve the virus classification mechanism and raises the accuracy and efficiency of virus detection.
As a preferred example, the feature value is the fuzzy hash value of the virus sample in a preset file format. A fuzzy hash algorithm, also called context triggered piecewise hashing (CTPH), works in principle as follows: a weak hash is computed over local file content and is used to split the file into pieces under specific conditions; a strong hash is then computed for each piece; part of each of these hash values is taken and concatenated and, together with the splitting condition, forms the fuzzy hash result. A string similarity comparison algorithm is then used to determine how similar two fuzzy hash values are, from which the degree of similarity between the two files can be judged. For the specific fuzzy hash algorithm, reference may be made to the prior-art literature; it is not repeated here. It can be understood that the fuzzy hash algorithm adopted in the embodiment of the present invention confines the effect of local changes in a virus sample to local parts of the overall result, so that the final similarity can be judged effectively, which further ensures the validity of the virus sample clustering result.
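The principle described above can be illustrated with a deliberately simplified toy. It is not ssdeep or any production CTPH algorithm: the windowed byte sum used as the weak hash, MD5 as the strong hash, the one-character-per-piece encoding, the `window` and `trigger` parameters, and the use of `difflib.SequenceMatcher` as the string comparison are all assumptions chosen to keep the sketch short.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def piece_char(piece):
    """Strong hash of one piece, reduced to a single character."""
    return ALPHABET[hashlib.md5(piece).digest()[0] % 64]


def toy_fuzzy_hash(data, window=3, trigger=16):
    """Toy piecewise hash: the weak hash picks boundaries, the strong hash labels pieces."""
    chars, start = [], 0
    for i in range(len(data)):
        # Weak hash over the last `window` bytes only, so boundaries depend on local content.
        weak = sum(data[max(0, i - window + 1):i + 1])
        if weak % trigger == trigger - 1:
            chars.append(piece_char(data[start:i + 1]))
            start = i + 1
    if start < len(data):
        chars.append(piece_char(data[start:]))  # trailing piece
    # The splitting condition is stored with the result, as the text describes.
    return f"{trigger}:" + "".join(chars)


def fuzzy_similarity(h1, h2):
    """String similarity of two toy fuzzy hashes, in the range 0 to 1."""
    return SequenceMatcher(None, h1, h2, autojunk=False).ratio()


original = bytes(range(256)) * 4
variant = original[:-1] + b"\x00"       # same file with only the last byte changed
h1, h2 = toy_fuzzy_hash(original), toy_fuzzy_hash(variant)
score = fuzzy_similarity(h1, h2)
```

Because the change touches only the last piece, `h1` and `h2` agree everywhere except possibly in their final character, so `score` stays close to 1, whereas an ordinary cryptographic hash of the two files would be unrelated.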
For example, consider virus V1 and a variant V2 of virus V1 whose files differ only in a few details, such that the signature used to detect virus V1 cannot be used to detect virus V2. Under existing, relatively fixed classification rules, virus V1 and virus V2 would usually be placed in different categories and recorded separately in the virus database with different virus signatures. With the fuzzy hash algorithm adopted in the embodiment of the present invention, however, the local differences between virus V1 and virus V2 have little effect on the final similarity, so the two can still be placed in the same category and detected with the same signature, which reduces redundancy in the virus database compared with the prior art and improves virus detection efficiency.
As an example, the classification unit 52 may comprise the following structure, not shown in the drawings:
an acquisition module 521, configured to obtain the feature values of the several virus samples acquired by the acquiring unit 51, so as to form a feature value set;
an estimation module 522, configured to estimate the computing speed of each available computing device;
a repetition module 523, configured to repeat the following step until the similarity between any two feature values in the feature value set is less than the predetermined threshold:
distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device obtained by the estimation module 522, so that the at least one computing device, on the premise that its processing time satisfies the preset condition, screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold.
It can be understood that the acquisition module 521, the estimation module 522 and the repetition module 523 can respectively perform step 1021, step 1022, and the flow of steps 1023 to 1024 in Fig. 2, and therefore have the corresponding functions and structure, which are not repeated here.
Specifically, as an example, the estimation module 522 may comprise the following structure, not shown in the drawings:
a sending submodule 522a, configured to send a predetermined number of the feature values obtained by the acquiring unit 51 to any one available computing device, so that this computing device screens the predetermined number of feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
an obtaining submodule 522b, configured to obtain the processing time of this computing device, so as to obtain an estimate of the computing speed of each available computing device.
It can be understood that the sending submodule 522a and the obtaining submodule 522b can respectively perform the flow of steps 1022a to 1022b in Fig. 4, and therefore have the corresponding functions and structure, which are not repeated here.
Similarly, as an example, the repetition module 523 may comprise the following structure, not shown in the drawings:
a determining submodule 523a, configured to determine the number of feature values to be distributed to each computing device according to the computing speed of each available computing device obtained by the estimation module 522 and the preset condition;
a sending submodule 523b, configured to distribute all feature values in the feature value set to at least one computing device according to the numbers of feature values determined by the determining submodule 523a, so that the at least one computing device screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
a receiving submodule 523c, configured to receive the screened feature values from the at least one computing device, so as to update the feature value set.
It can be understood that the determining submodule 523a, the sending submodule 523b and the receiving submodule 523c can respectively perform the flow of steps 1024a to 1024c in Fig. 3, and therefore have the corresponding functions and structure, which are not repeated here.
As in the above method for extracting virus signatures, in any of the above embodiments the process of "screening the distributed feature values so that the similarity between any two feature values is less than the predetermined threshold" may comprise the following flow, not shown in the drawings:
Step 201: retain one feature value, and perform the following steps on each of the remaining feature values in turn:
Step 202: determine whether the similarity between the feature value and any retained feature value is greater than or equal to the predetermined threshold;
Step 203: if so, remove the feature value;
Step 204: if not, retain the feature value.
Thus, the computing devices can perform the clustering computation according to the flow of steps 201 to 204.
On the basis of any of the above embodiments, the preset condition may include any one of the following conditions, or a combination of any number of them:
Condition F1: the processing time of every computing device is less than a first preset value;
Condition F2: the processing times of all computing devices tend to be equal;
Condition F3: when the number of feature values in the feature value set is greater than a second preset value, the processing time of every computing device approaches a third preset value.
In addition, the classification unit 52 may further comprise the following structure, not shown in the drawings:
a sending module 524, configured to divide all samples to be clustered into several parts and send the parts, together with the feature value set obtained by the repetition module 523, respectively to several computing devices, so that the computing devices compute in turn the similarity between the feature value of each sample and every feature value in the feature value set, and label each sample with the category corresponding to the feature value most similar to the feature value of that sample;
a receiving module 525, configured to receive the category label of each sample from the several computing devices, so as to classify all samples to be clustered.
It can be understood that the sending module 524 and the receiving module 525 can respectively perform the flow of steps 1025 and 1026, and therefore have the corresponding functions and structure, which are not repeated here. On this basis, all virus samples within one category are similar (for example, they are variants of one another, or share the same author, script or source code), so extracting their common features yields the features that distinguish this category from other categories. The resulting virus categories and their common features can then make up a virus database used to detect and remove viruses.
It should be understood that other embodiments of the present invention further disclose the following technical solutions:
A1. An apparatus for extracting virus signatures, characterized by comprising:
an acquiring unit, configured to acquire several virus samples;
a classification unit, configured to divide the several virus samples acquired by the acquiring unit into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold;
an extraction unit, configured to extract, for each category obtained by the classification unit, the common features of all virus samples belonging to that category.
A2. The apparatus according to solution A1, characterized in that the feature value of a virus sample is the fuzzy hash feature value of that virus sample in a file format.
A3. The apparatus according to solution A1 or A2, characterized in that the classification unit comprises:
an acquisition module, configured to obtain the feature values of the several virus samples acquired by the acquiring unit, so as to form a feature value set;
an estimation module, configured to estimate the computing speed of each available computing device;
a repetition module, configured to repeat the following step until the similarity between any two feature values in the feature value set is less than the predetermined threshold:
distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device obtained by the estimation module, so that the at least one computing device, on the premise that its processing time satisfies a preset condition, screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold.
A4. The apparatus according to solution A3, characterized in that the estimation module comprises:
a sending submodule, configured to send a predetermined number of the feature values obtained by the acquiring unit to any one available computing device, so that this computing device screens the predetermined number of feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
an obtaining submodule, configured to obtain the processing time of this computing device, so as to obtain an estimate of the computing speed of each available computing device.
A5. The apparatus according to solution A3, characterized in that the repetition module comprises:
a determining submodule, configured to determine the number of feature values to be distributed to each computing device according to the computing speed of each available computing device obtained by the estimation module and the preset condition;
a sending submodule, configured to distribute all feature values in the feature value set to at least one computing device according to the numbers of feature values determined by the determining submodule, so that the at least one computing device screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
a receiving submodule, configured to receive the screened feature values from the at least one computing device, so as to update the feature value set.
A6. The apparatus according to solution A3, characterized in that screening the distributed feature values so that the similarity between any two feature values is less than the predetermined threshold comprises:
retaining one feature value, and performing the following steps on each of the remaining feature values in turn:
determining whether the similarity between the feature value and any retained feature value is greater than or equal to the predetermined threshold;
if so, removing the feature value;
if not, retaining the feature value.
A7. The apparatus according to solution A3, characterized in that the classification unit further comprises:
a sending module, configured to divide all samples to be clustered into several parts and send the parts, together with the feature value set obtained by the repetition module, respectively to several computing devices, so that the computing devices compute in turn the similarity between the feature value of each sample and every feature value in the feature value set, and label each sample with the category corresponding to the feature value most similar to the feature value of that sample;
a receiving module, configured to receive the category label of each sample from the several computing devices, so as to classify all samples to be clustered.
A8. The apparatus according to any one of solutions A1 to A7, characterized in that the preset condition comprises:
the processing time of every computing device is less than a first preset value;
and/or,
the processing times of all computing devices tend to be equal;
and/or,
when the number of feature values in the feature value set is greater than a second preset value, the processing time of every computing device approaches a third preset value.
B9. A method for extracting virus signatures, characterized by comprising:
acquiring several virus samples;
dividing the several virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold;
for each category, extracting the common features of all virus samples belonging to that category.
B10. The method according to solution B9, characterized in that the feature value of a virus sample is the fuzzy hash feature value of that virus sample in a file format.
B11. The method according to solution B9 or B10, characterized in that dividing the several virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold, comprises:
obtaining the feature values of the several virus samples, so as to form a feature value set;
estimating the computing speed of each available computing device;
repeating the following step until the similarity between any two feature values in the feature value set is less than the predetermined threshold:
distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device, so that the at least one computing device, on the premise that its processing time satisfies a preset condition, screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold.
B12. The method according to solution B11, characterized in that estimating the computing speed of each available computing device comprises:
sending a predetermined number of feature values to any one available computing device, so that this computing device screens the predetermined number of feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
obtaining the processing time of this computing device, so as to obtain an estimate of the computing speed of each available computing device.
B13. The method according to solution B11, characterized in that distributing all feature values in the feature value set to at least one computing device according to the computing speed of each available computing device, so that the at least one computing device, on the premise that its processing time satisfies the preset condition, screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold, comprises:
determining the number of feature values to be distributed to each computing device according to the computing speed of each available computing device and the preset condition;
distributing all feature values in the feature value set to at least one computing device according to the determined numbers of feature values, so that the at least one computing device screens the distributed feature values and the similarity between any two remaining feature values is less than the predetermined threshold;
receiving the screened feature values from the at least one computing device, so as to update the feature value set.
B14. The method according to solution B11, characterized in that screening the distributed feature values so that the similarity between any two feature values is less than the predetermined threshold comprises:
retaining one feature value, and performing the following steps on each of the remaining feature values in turn:
determining whether the similarity between the feature value and any retained feature value is greater than or equal to the predetermined threshold;
if so, removing the feature value;
if not, retaining the feature value.
B15. The method according to solution B11, characterized in that dividing the several virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold, further comprises:
dividing the several virus samples into several parts and sending the parts, together with the feature value set, respectively to several computing devices, so that the computing devices compute in turn the similarity between the feature value of each virus sample and every feature value in the feature value set, and label each virus sample with the category corresponding to the feature value most similar to the feature value of that virus sample;
receiving the category label of each virus sample from the several computing devices, so as to classify the several virus samples.
B16. The method according to any one of solutions B9 to B15, characterized in that the preset condition comprises:
the processing time of every computing device is less than a first preset value;
and/or,
the processing times of all computing devices tend to be equal;
and/or,
when the number of all remaining samples is greater than a second preset value, the processing time of every computing device approaches a third preset value.

Claims (10)

1. an extraction element for virus characteristic, is characterized in that, comprising:
Acquiring unit, for obtaining several Virus Samples;
Taxon, is divided at least one classification for several Virus Samples obtained by described acquiring unit, is more than or equal to a predetermined threshold value to make the similarity belonged between the eigenwert of other any two Virus Samples of same class;
Extraction unit, for each classification obtained for described taxon, extracts the common trait belonging to such other all Virus Samples.
2. device according to claim 1, is characterized in that, the eigenwert of described Virus Sample is the fuzzy Hash eigenwert of this Virus Sample under file layout.
3. device according to claim 1 and 2, is characterized in that, described taxon specifically comprises:
Acquisition module, for obtaining the eigenwert of several Virus Samples that described acquiring unit obtains, with composition characteristic value set;
Estimation module, for estimating the computing velocity of each available computing equipment;
Replicated blocks, before being less than predetermined threshold value, repeatedly perform following step for the similarity between two eigenwerts any in described characteristic value collection:
All eigenwerts in described characteristic value collection are distributed at least one computing equipment by the computing velocity of each the available computing equipment obtained according to described estimation module, to make at least one computing equipment described screen the eigenwert be assigned under the processing time meets pre-conditioned prerequisite, the similarity between any two eigenwerts is made to be less than described predetermined threshold value.
4. The apparatus according to claim 3, characterized in that the estimation module specifically comprises:
A sending submodule, configured to send a predetermined number of the feature values obtained by the acquiring unit to any available computing device, so that the computing device screens the predetermined number of feature values such that the similarity between any two feature values is less than the predetermined threshold;
An obtaining submodule, configured to obtain the processing time of that computing device, so as to obtain an estimate of the computing speed of each available computing device.
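The estimation in claim 4 can be pictured as timing one probe batch on a device. The following is only a minimal sketch: the name `estimate_speed`, the callable `screen_on_device`, and the choice of "feature values per second" as the speed unit are assumptions for illustration and are not taken from the patent text.

```python
import time
from typing import Callable, Sequence


def estimate_speed(screen_on_device: Callable[[Sequence[str]], object],
                   probe: Sequence[str]) -> float:
    """Time one probe batch and return feature values handled per second.

    `screen_on_device` is a hypothetical stand-in for "send the batch to a
    computing device and wait until it has finished screening it".
    """
    start = time.perf_counter()
    screen_on_device(probe)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small probe batches.
    return len(probe) / max(elapsed, 1e-9)


# Usage: any callable that processes the batch can play the device role.
speed = estimate_speed(lambda batch: sorted(batch), ["h1", "h2", "h3"])
```

Because pairwise screening does not scale linearly with batch size, a figure measured this way is at best a rough estimate for larger batches.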
5. The apparatus according to claim 3, characterized in that the repetition module specifically comprises:
A determining submodule, configured to determine the number of feature values to be distributed to each computing device according to the computing speed of each available computing device obtained by the estimation module and the preset condition;
A sending submodule, configured to distribute all feature values in the feature value set to at least one computing device according to the numbers of feature values obtained by the determining submodule, so that the at least one computing device screens the feature values assigned to it such that the similarity between any two feature values is less than the predetermined threshold;
A receiving submodule, configured to receive the screened feature values from the at least one computing device, so as to update the feature value set.
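One way to read the determining and sending submodules of claim 5 is a split of the feature value set in proportion to each device's estimated speed, so that the devices finish at roughly the same time. The sketch below assumes exactly that proportional rule; the function names `quotas_by_speed` and `split` are hypothetical, and the claim itself does not fix a particular formula.

```python
from typing import List, Sequence


def quotas_by_speed(total: int, speeds: Sequence[float]) -> List[int]:
    """Split `total` feature values across devices in proportion to speed."""
    speed_sum = sum(speeds)
    raw = [total * s / speed_sum for s in speeds]
    quotas = [int(r) for r in raw]
    # Give the values lost to rounding to the largest fractional parts.
    leftover = total - sum(quotas)
    order = sorted(range(len(speeds)),
                   key=lambda i: raw[i] - quotas[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas


def split(values: Sequence[str], quotas: Sequence[int]) -> List[List[str]]:
    """Cut the feature value set into consecutive batches, one per device."""
    batches, pos = [], 0
    for q in quotas:
        batches.append(list(values[pos:pos + q]))
        pos += q
    return batches


# Usage: a device that is three times faster receives about three times
# as many feature values.
quotas = quotas_by_speed(10, [1.0, 3.0])
batches = split(list("abcdefghij"), quotas)
```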
6. The apparatus according to claim 3, characterized in that screening the assigned feature values such that the similarity between any two feature values is less than the predetermined threshold specifically comprises:
Retaining one feature value, and successively performing the following steps on each of the remaining feature values:
Judging whether the similarity between the feature value and any retained feature value is greater than or equal to the predetermined threshold;
If so, removing the feature value;
If not, retaining the feature value.
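The screening steps of claim 6 can be sketched as a single greedy pass. In this sketch the `similarity` function is an assumption: it uses Python's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy hash comparison is actually used, and the sample strings are invented for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the patent's fuzzy hash measure."""
    return SequenceMatcher(None, a, b).ratio()


def screen(values: Sequence[str], threshold: float) -> List[str]:
    """Keep a subset whose pairwise similarity is below `threshold`."""
    if not values:
        return []
    retained = [values[0]]  # retain one feature value to start with
    for value in values[1:]:
        # Remove the value if it is too similar to any retained value.
        if any(similarity(value, kept) >= threshold for kept in retained):
            continue
        retained.append(value)
    return retained


# Usage: "abcdeg" is dropped because it is close to the retained "abcdef".
result = screen(["abcdef", "abcdeg", "zzzzzz"], 0.8)
```

Each retained feature value then acts as the representative of one category.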
7. The apparatus according to claim 3, characterized in that the classification unit further comprises:
A sending module, configured to divide all samples to be clustered into several parts and send each part, together with the feature value set obtained by the repetition module, to one of several computing devices, so that each computing device successively calculates the similarity between the feature value of each sample and every feature value in the feature value set, and labels each sample with the category corresponding to the feature value that has the greatest similarity to the feature value of that sample;
A receiving module, configured to receive the category label of each sample from the several computing devices, so as to classify all samples to be clustered.
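The labelling step of claim 7 amounts to assigning each sample to its most similar representative feature value. The sketch below simulates the several computing devices sequentially in one process; `similarity` is again a `difflib`-based stand-in, and the function names, sample identifiers and strings are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the patent's fuzzy hash measure."""
    return SequenceMatcher(None, a, b).ratio()


def label_part(part: Dict[str, str], representatives: List[str]) -> Dict[str, int]:
    """Work done on one device: label each sample with its closest category."""
    labels = {}
    for sample_id, value in part.items():
        scores = [similarity(value, rep) for rep in representatives]
        labels[sample_id] = scores.index(max(scores))
    return labels


def classify(samples: Dict[str, str], representatives: List[str],
             parts: int) -> Dict[str, int]:
    """Split the samples into parts, label each part, and merge the labels."""
    ids = list(samples)
    labels: Dict[str, int] = {}
    for p in range(parts):
        part = {i: samples[i] for i in ids[p::parts]}
        labels.update(label_part(part, representatives))
    return labels


# Usage: category indices refer to positions in the representative list.
labels = classify({"s1": "abcdeg", "s2": "zzzzzy", "s3": "abcxef"},
                  ["abcdef", "zzzzzz"], parts=2)
```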
8. The apparatus according to any one of claims 1 to 7, characterized in that the preset condition comprises:
The processing time of any computing device is less than a first preset value;
And/or,
The processing times of all computing devices converge;
And/or,
When the number of feature values in the feature value set is greater than a second preset value, the processing time of any computing device approaches a third preset value.
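The three alternatives in claim 8 can be written as separate predicates that a scheduler combines as needed. This is a sketch under stated assumptions: the claim does not say how close "converge" or "approaches" must be, so the `tolerance` parameter, like the function names, is introduced here purely for illustration.

```python
from typing import Sequence


def below_limit(times: Sequence[float], first_preset: float) -> bool:
    """Condition 1: every device's processing time is under the first preset value."""
    return all(t < first_preset for t in times)


def converged(times: Sequence[float], tolerance: float) -> bool:
    """Condition 2: the processing times of all devices are close to each other."""
    return max(times) - min(times) <= tolerance


def near_target(times: Sequence[float], set_size: int, second_preset: int,
                third_preset: float, tolerance: float) -> bool:
    """Condition 3: for a large set, each processing time is near the third preset value."""
    if set_size <= second_preset:
        return True  # the condition only constrains large feature value sets
    return all(abs(t - third_preset) <= tolerance for t in times)


# Usage: the claim joins the conditions with "and/or", so any combination may be used.
times = [9.5, 10.2, 10.0]
ok = below_limit(times, 12.0) and converged(times, 1.0)
```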
9. A method for extracting virus features, characterized by comprising:
Obtaining a plurality of virus samples;
Dividing the plurality of virus samples into at least one category, such that the similarity between the feature values of any two virus samples belonging to the same category is greater than or equal to a predetermined threshold;
For each category, extracting the common feature of all virus samples belonging to that category.
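The claim does not state how the common feature of a category is found. As one illustrative possibility only, the sketch below takes the longest byte sequence shared by every sample in a category; the function name `common_feature`, the `min_len` parameter and the sample bytes are assumptions, and a production signature extractor would need a far more efficient and more selective method.

```python
from typing import List, Optional


def common_feature(samples: List[bytes], min_len: int = 4) -> Optional[bytes]:
    """Return the longest byte string present in every sample, if any."""
    if not samples:
        return None
    shortest = min(samples, key=len)
    # Try candidate substrings of the shortest sample, longest first.
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in sample for sample in samples):
                return candidate
    return None


# Usage on three toy "samples" that share one byte sequence.
feature = common_feature([b"xxMALWAREyy", b"zzMALWAREq", b"MALWARE123"])
```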
10. The method according to claim 9, characterized in that the feature value of a virus sample is the fuzzy hash feature value of that virus sample in its file format.
CN201510378081.3A 2015-06-30 2015-06-30 The extracting method and device of virus characteristic Active CN104978526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510378081.3A CN104978526B (en) 2015-06-30 2015-06-30 The extracting method and device of virus characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510378081.3A CN104978526B (en) 2015-06-30 2015-06-30 The extracting method and device of virus characteristic

Publications (2)

Publication Number Publication Date
CN104978526A true CN104978526A (en) 2015-10-14
CN104978526B CN104978526B (en) 2018-03-13

Family

ID=54275020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510378081.3A Active CN104978526B (en) 2015-06-30 2015-06-30 The extracting method and device of virus characteristic

Country Status (1)

Country Link
CN (1) CN104978526B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080066179A1 (en) * 2006-09-11 2008-03-13 Fujian Eastern Micropoint Info-Tech Co., Ltd. Antivirus protection system and method for computers
CN101464893B (en) * 2008-12-31 2010-09-08 清华大学 Method and device for extracting video abstract
CN102930206A (en) * 2011-08-09 2013-02-13 腾讯科技(深圳)有限公司 Cluster partitioning processing method and cluster partitioning processing device for virus files
CN104424190A (en) * 2013-08-20 2015-03-18 富士通株式会社 Method and device for integrating a plurality of databases

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909839A (en) * 2015-12-22 2017-06-30 北京奇虎科技有限公司 A kind of method and device for extracting sample code feature
CN106909839B (en) * 2015-12-22 2020-04-17 北京奇虎科技有限公司 Method and device for extracting sample code features
CN106845229B (en) * 2016-12-28 2019-12-20 哈尔滨安天科技集团股份有限公司 Virus characteristic extraction method and system based on FTS model
CN106845229A (en) * 2016-12-28 2017-06-13 哈尔滨安天科技股份有限公司 A kind of virus characteristic extracting method and system based on FTS models
CN107046532A (en) * 2017-03-09 2017-08-15 湖北工业大学 A kind of Web application system securities detection method
CN107046532B (en) * 2017-03-09 2020-04-17 湖北工业大学 Web application system security detection method
CN107273746A (en) * 2017-05-18 2017-10-20 广东工业大学 A kind of mutation malware detection method based on APK character string features
CN107562618A (en) * 2017-08-07 2018-01-09 北京奇安信科技有限公司 A kind of shellcode detection method and device
CN109522915A (en) * 2017-09-20 2019-03-26 腾讯科技(深圳)有限公司 Virus document clustering method, device and readable medium
CN109522915B (en) * 2017-09-20 2022-08-23 腾讯科技(深圳)有限公司 Virus file clustering method and device and readable medium
CN110392081A (en) * 2018-04-20 2019-10-29 武汉安天信息技术有限责任公司 Virus base method for pushing and device, computer equipment and computer storage medium
CN112579828A (en) * 2019-09-30 2021-03-30 奇安信安全技术(珠海)有限公司 Feature code processing method, device and system, storage medium and electronic device
CN112580039A (en) * 2019-09-30 2021-03-30 奇安信安全技术(珠海)有限公司 Method, device and equipment for processing virus characteristic data
CN112487432A (en) * 2020-12-10 2021-03-12 杭州安恒信息技术股份有限公司 Method, system and equipment for malicious file detection based on icon matching
CN112818347A (en) * 2021-02-22 2021-05-18 深信服科技股份有限公司 File label determination method, device, equipment and storage medium
CN112818347B (en) * 2021-02-22 2024-04-09 深信服科技股份有限公司 File tag determining method, device, equipment and storage medium
CN113360904A (en) * 2021-05-17 2021-09-07 杭州美创科技有限公司 Unknown virus detection method and system

Also Published As

Publication number Publication date
CN104978526B (en) 2018-03-13

Similar Documents

Publication Publication Date Title
CN104978526A (en) Virus signature extraction method and apparatus
CN110417901B (en) Data processing method and device and gateway server
CN106649831B (en) Data filtering method and device
CN110808994B (en) Method and device for detecting brute force cracking operation and server
CN107222511B (en) Malicious software detection method and device, computer device and readable storage medium
CN103617256A (en) Method and device for processing file needing mutation detection
CN110944016B (en) DDoS attack detection method, device, network equipment and storage medium
CN112765161B (en) Alarm rule matching method and device, electronic equipment and storage medium
CN114785567B (en) Flow identification method, device, equipment and medium
WO2021109724A1 (en) Log anomaly detection method and apparatus
CN104980407A (en) Misinformation detecting method and device
CN116186267A (en) Policy data processing method, device, computer equipment and storage medium
CN115394361A (en) Method, apparatus and medium for constructing a microbial genome database
CN111064719A (en) Method and device for detecting abnormal downloading behavior of file
CN105095382A (en) Method and device for sample distributed clustering calculation
EP3287929B1 (en) Virus scanning method and virus scanning apparatus
CN111819559A (en) Using machine learning models with quantized step sizes for malware detection
CN116566766A (en) Intelligent power gateway management and control method and system
WO2020006909A1 (en) Method and device for deduplicating urls
US10783244B2 (en) Information processing system, information processing method, and program
EP3243145A1 (en) Efficiently detecting user credentials
CN112882707B (en) Rendering method and device, storage medium and electronic equipment
CN112818347A (en) File label determination method, device, equipment and storage medium
CN111930612A (en) Method and device for detecting code updating correctness and computing equipment
CN111198900A (en) Data caching method and device for industrial control network, terminal equipment and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211201

Address after: 300450 No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science Park, high tech Zone, Binhai New Area, Tianjin

Patentee after: 3600 Technology Group Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230717

Address after: 1765, floor 17, floor 15, building 3, No. 10 Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: Beijing Hongxiang Technical Service Co.,Ltd.

Address before: 300450 No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science Park, high tech Zone, Binhai New Area, Tianjin

Patentee before: 3600 Technology Group Co.,Ltd.

TR01 Transfer of patent right