CN110135160A - Method, apparatus and system for software detection - Google Patents

Method, apparatus and system for software detection

Info

Publication number
CN110135160A
Authority
CN
China
Prior art keywords
api
sequence
malware
multithreading
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910353079.9A
Other languages
Chinese (zh)
Other versions
CN110135160B (en)
Inventor
徐国爱
徐国胜
孙博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201910353079.9A priority Critical patent/CN110135160B/en
Publication of CN110135160A publication Critical patent/CN110135160A/en
Application granted granted Critical
Publication of CN110135160B publication Critical patent/CN110135160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Stored Programmes (AREA)

Abstract

The present invention provides a method, apparatus and system for software detection. The method comprises: obtaining software data to be detected; selecting system file threads from the software data; extracting the API sequence in the system file threads by means of a sandbox; truncating the API sequence and re-encoding the truncated API sequence by thread to obtain a multi-thread API sequence; and analyzing the multi-thread API sequence with a target detection model to obtain a software detection result. This enables more efficient and accurate malware detection and yields detection results covering more types of malware.

Description

Method, apparatus and system for software detection
Technical field
The present invention relates to the field of information security technology, and in particular to a method, apparatus and system for software detection.
Background art
With the rapid development of information technology and the Internet, network security has attracted growing attention, and malware is undoubtedly the most harmful threat. Malware refers to programs such as viruses, worms or Trojan horses that deliberately perform malicious tasks on a computer system and take control by disrupting software processes. The traditional N-Gram detection method is widely used for detecting malware sequences.
As malware countermeasure techniques continue to develop, malware detection has gradually moved from static detection toward combined dynamic and static detection.
However, most existing software detection methods extract static sequences. When features are extracted with N-Gram, dynamic sequences involve multiple threads and have highly uneven lengths, which makes detection results inaccurate, incurs a high computational cost, and leaves the receptive field too small.
Summary of the invention
The present invention provides a method, apparatus and system for software detection, so as to achieve more efficient and accurate malware detection and to obtain detection results covering more types of malware.
In a first aspect, an embodiment of the present invention provides a method of software detection, comprising:
obtaining software data to be detected;
selecting system file threads from the software data;
extracting the API sequence in the system file threads by means of a sandbox;
truncating the API sequence, and re-encoding the truncated API sequence by thread to obtain a multi-thread API sequence;
analyzing the multi-thread API sequence with a target detection model to obtain a software detection result.
In one possible design, before the multi-thread API sequence is analyzed with the target detection model to obtain the software detection result, the method further comprises:
obtaining malware data, the malware data including infectious viruses, Trojan programs, mining programs and ransomware;
selecting the system file threads of the malware from the malware data;
extracting the API sequence in the system file threads of the malware by means of a sandbox, to obtain the API sequence of the malware;
truncating the API sequence of the malware, and re-encoding the truncated API sequence by thread to obtain the multi-thread API sequence of the malware;
constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
iteratively training the initial detection model on the multi-thread API sequence of the malware, with multi-class logloss as the evaluation metric, to obtain the target detection model.
In one possible design, extracting the API sequence in the system file threads by means of a sandbox comprises:
dynamically executing the system file threads in the sandbox to obtain the name of each API called by the file, the thread ID of the API call, the API return value, and the index of the API call within its thread.
In one possible design, truncating the API sequence and re-encoding the truncated API sequence by thread to obtain the multi-thread API sequence comprises:
when the number of API calls made by the file in a given thread exceeds a preset threshold, truncating the API sequence corresponding to that thread and keeping a preset number of API records, to obtain a truncated API sequence;
re-encoding the truncated API sequence by thread to obtain the multi-thread API sequence.
In one possible design, analyzing the multi-thread API sequence with the target detection model to obtain the software detection result comprises:
detecting the multi-thread API sequence with the target detection model, and judging whether the software data corresponding to the multi-thread API sequence is malware data;
if it is malware data, outputting the type label of the malware data;
if it is not malware data, reporting that the software data is safe.
In a second aspect, an embodiment of the present invention provides an apparatus for software detection, comprising:
an acquisition module, configured to obtain software data to be detected;
a screening module, configured to select system file threads from the software data;
an extraction module, configured to extract the API sequence in the system file threads by means of a sandbox;
a coding module, configured to truncate the API sequence and re-encode the truncated API sequence by thread to obtain a multi-thread API sequence;
an obtaining module, configured to analyze the multi-thread API sequence with a target detection model to obtain a software detection result.
In one possible design, before the multi-thread API sequence is analyzed with the target detection model to obtain the software detection result, the following is further performed:
obtaining malware data, the malware data including infectious viruses, Trojan programs, mining programs and ransomware;
selecting the system file threads of the malware from the malware data;
extracting the API sequence in the system file threads of the malware by means of a sandbox, to obtain the API sequence of the malware;
truncating the API sequence of the malware, and re-encoding the truncated API sequence by thread to obtain the multi-thread API sequence of the malware;
constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
iteratively training the initial detection model on the multi-thread API sequence of the malware, with multi-class logloss as the evaluation metric, to obtain the target detection model.
In one possible design, the extraction module is specifically configured to:
dynamically execute the system file threads in the sandbox to obtain the name of each API called by the file, the thread ID of the API call, the API return value, and the index of the API call within its thread.
In one possible design, the coding module is specifically configured to:
when the number of API calls made by the file in a given thread exceeds a preset threshold, truncate the API sequence corresponding to that thread and keep a preset number of API records, to obtain a truncated API sequence;
re-encode the truncated API sequence by thread to obtain the multi-thread API sequence.
In one possible design, the obtaining module is specifically configured to:
detect the multi-thread API sequence with the target detection model, and judge whether the software data corresponding to the multi-thread API sequence is malware data;
if it is malware data, output the type label of the malware data;
if it is not malware data, report that the software data is safe.
In a third aspect, an embodiment of the present invention provides a system for software detection, comprising a memory and a processor, the memory storing instructions executable by the processor, wherein the processor is configured to perform the method of software detection according to any one of the first aspect by executing the executable instructions.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of software detection according to any one of the first aspect.
The present invention provides a method, apparatus and system for software detection. The method comprises: obtaining software data to be detected; selecting system file threads from the software data; extracting the API sequence in the system file threads by means of a sandbox; truncating the API sequence and re-encoding the truncated API sequence by thread to obtain a multi-thread API sequence; and analyzing the multi-thread API sequence with a target detection model to obtain a software detection result. This enables more efficient and accurate malware detection and yields detection results covering more types of malware.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of the present invention;
Fig. 2 is a flowchart of the software detection method provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of dilated convolution in the software detection method provided by Embodiment 1 of the present invention;
Fig. 4 is a structural schematic diagram of the software detection apparatus provided by Embodiment 2 of the present invention;
Fig. 5 is a structural schematic diagram of the software detection system provided by Embodiment 3 of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can, for example, be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
The technical solution of the present invention, and how it solves the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a schematic diagram of an application scenario of the present invention. As shown in Fig. 1, a software detection system 11 obtains software data 12 to be detected, selects system file threads from the software data to be detected, extracts the API sequence in the system file threads by means of a sandbox, truncates the API sequence, and re-encodes the truncated API sequence by thread to obtain a multi-thread API sequence. The multi-thread API sequence is then analyzed with a target detection model to obtain a software detection result 13. In an optional embodiment, the software detection system includes the target detection model. This detection method enables more efficient and accurate malware detection and yields detection results covering more types of malware.
Fig. 2 is a flowchart of the software detection method provided by Embodiment 1 of the present invention. As shown in Fig. 2, the software detection method may include:
S201: obtain software data to be detected.
In this embodiment, the software detection system obtains the software data to be detected through an acquisition module. In an optional embodiment, the acquisition module captures program information from the running state of a program while it runs. This running-state information may include the program's CPU instruction sequence, system calls, application programming interface (API) calls, or system services at a higher level of abstraction. In an optional embodiment, the software detection system uses API Monitor (API call monitoring software), which can monitor and display the calls made by an application and can also trace any exported API, such as the Win32 API and other third-party APIs. API Monitor can display rich information, including function names, call order, input and output parameters, and function return values, and it predefines 82 DLLs (Dynamic Link Libraries) and the prototypes of about 4000 APIs.
In this embodiment, the software detection system can obtain dynamic information from the running state. Compared with static information, this dynamic information has lower redundancy and can be captured in real time. It is also unaffected by packing and encryption, and because some instruction-level metamorphic techniques are ineffective against system calls or APIs at a higher level of abstraction, it is less affected by metamorphism.
S202: select system file threads from the software data.
In an optional embodiment, system file threads are selected from the software data. The system file threads may include software file threads for the Windows platform, or software file threads for the Android or Linux platforms after simple adaptation of the API sequence.
S203: extract the API sequence in the system file threads by means of a sandbox.
Specifically, the system file threads are dynamically executed in the sandbox to obtain the name of each API called by the file, the thread ID of the API call, the API return value, and the index of the API call within its thread.
In an optional embodiment, the software detection system may include different sandboxes. The system file threads are dynamically executed in a sandbox and the API sequence in them is extracted, mainly retaining the name of each API called by the file, the thread ID of the API call, the API return value, and the index of the API call within its thread. In an optional embodiment, the software detection system uses an extraction module such as a sandbox, which places the untrusted software data to be detected in an isolated environment, executes it dynamically and automatically, and extracts dynamic behavior such as process behavior, network behavior and file behavior during its run. This embodiment does not limit the specific sandbox; it is only necessary to extract the API sequence in the system file threads after the sandbox's dynamic analysis. For example, Cuckoo is an open-source automated malware analysis system written in Python that can trace and record all calls made by malware. Malware file behavior may include creating, modifying, deleting, reading or downloading files during execution. Cuckoo can also obtain a memory image of the malware, record the malware's network traffic in PCAP (packet capture) format, and take screenshots during the malware's execution. The software data can then be analyzed in depth based on the results of the sandbox's dynamic execution.
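The four fields retained in step S203 can be pictured as one record per API call. The following is a minimal sketch, assuming the sandbox report has already been parsed into Python objects; the class name `ApiCall`, its field names and the helper `group_by_thread` are hypothetical and are not taken from the patent or from any sandbox's report format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """One sandbox record (hypothetical layout) holding the four retained fields."""
    api: str           # name of the API called by the file
    tid: int           # thread ID of the call
    return_value: int  # API return value
    index: int         # index of the call within its thread


def group_by_thread(calls):
    """Group records by thread ID and order each group by call index."""
    threads = defaultdict(list)
    for call in calls:
        threads[call.tid].append(call)
    for tid in threads:
        # sort() is stable, so records sharing an index keep their input order.
        threads[tid].sort(key=lambda c: c.index)
    return dict(threads)


calls = [
    ApiCall("CreateFileW", 1, 0, 1),
    ApiCall("LdrLoadDll", 2, 0, 0),
    ApiCall("NtClose", 1, 0, 2),
    ApiCall("RegOpenKeyExW", 1, 0, 0),
]
grouped = group_by_thread(calls)
print([c.api for c in grouped[1]])
```

Grouping by thread first keeps the per-thread order explicit before any truncation or encoding is applied.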
S204: truncate the API sequence, and re-encode the truncated API sequence by thread to obtain a multi-thread API sequence.
Specifically, when the number of API calls made by the file in a given thread exceeds a preset threshold, the API sequence corresponding to that thread is truncated and a preset number of API records is kept, giving a truncated API sequence. The truncated API sequence is then re-encoded by thread to obtain the multi-thread API sequence.
In this embodiment, because the sandbox's execution time has limited precision, several API calls from the same thread or from different threads may appear at the same index. Ordering within the same TID (Thread Identifier) is guaranteed, but the indexes are not guaranteed to be contiguous. When the file makes more than 5000 API calls within one thread TID, the API sequence of that thread is truncated and the first 5000 API records of each TID are kept, giving the truncated API sequence. The truncated API sequence is then re-encoded by thread to obtain the multi-thread API sequence. In an optional embodiment, there is no ordering relationship between different TIDs, while within the same TID an ascending index represents the order of the calls. In an optional embodiment, the software detection system may be extended with different coding schemes, or word-vector techniques may be introduced for the re-encoding.
Step S204 avoids the excessive computational cost caused by overly long API sequences. Re-encoding the truncated API sequence yields a sequence with clear temporal order and strong correlation, which in turn allows more types of malware to be detected.
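Step S204 can be sketched as follows. The limit of 5000 records per TID comes from the text; the encoding itself is left open by the embodiment, so the integer-ID vocabulary used here is only an assumption for illustration, and the function names `truncate_per_thread` and `encode_threads` are hypothetical.

```python
from collections import defaultdict

MAX_CALLS_PER_THREAD = 5000  # per-TID limit stated in the embodiment


def truncate_per_thread(calls, limit=MAX_CALLS_PER_THREAD):
    """calls: iterable of (tid, index, api_name) tuples.

    Returns {tid: [api_name, ...]} with at most `limit` names per thread,
    ordered by ascending index within each thread.
    """
    by_tid = defaultdict(list)
    for tid, index, api in calls:
        by_tid[tid].append((index, api))
    return {
        tid: [api for _, api in sorted(records, key=lambda r: r[0])[:limit]]
        for tid, records in by_tid.items()
    }


def encode_threads(thread_seqs, vocab=None):
    """Assumed encoding: map each API name to an integer ID, one list per thread.

    ID 0 is left free (for example for padding). Threads are emitted in
    ascending TID order only to make the output deterministic; the text
    states that there is no ordering between different TIDs.
    """
    vocab = {} if vocab is None else vocab
    encoded = []
    for tid in sorted(thread_seqs):
        encoded.append([vocab.setdefault(api, len(vocab) + 1)
                        for api in thread_seqs[tid]])
    return encoded, vocab


calls = [
    (7, 0, "LdrLoadDll"),
    (7, 1, "NtCreateFile"),
    (3, 0, "NtCreateFile"),
    (7, 2, "NtClose"),
]
seqs = truncate_per_thread(calls, limit=2)  # tiny limit for demonstration
encoded, vocab = encode_threads(seqs)
print(seqs)
print(encoded)
```

A word-vector model, which the text mentions as an alternative, would replace the integer vocabulary with learned embeddings.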
S205: analyze the multi-thread API sequence with the target detection model to obtain a software detection result.
Specifically, the multi-thread API sequence is detected with the target detection model, and it is judged whether the software data corresponding to the multi-thread API sequence is malware data;
if it is malware data, the type label of the malware data is output;
if it is not malware data, the software data is reported as safe.
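The decision in step S205 can be sketched as a lookup on the model's output. This sketch assumes that the target detection model returns one probability per class and that class 0 stands for safe software; neither the output format nor the label order is specified in the text, so `LABELS` and `report` are hypothetical.

```python
# Hypothetical label order; index 0 is assumed to mean "not malware".
LABELS = [
    "safe",
    "infectious virus",
    "Trojan program",
    "mining program",
    "DDoS Trojan",
    "ransomware",
]


def report(probabilities, labels=LABELS):
    """Return the message for the most probable class."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(labels)), key=probabilities.__getitem__)
    if best == 0:
        return "software data is safe"
    return f"malware detected: {labels[best]}"


print(report([0.90, 0.02, 0.03, 0.02, 0.02, 0.01]))
print(report([0.05, 0.05, 0.10, 0.70, 0.05, 0.05]))
```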
In an optional embodiment, the malware categories may include infectious viruses, Trojan programs, mining programs, DDoS (Distributed Denial of Service) Trojans, ransomware and the like, and the number classified can reach 600 million.
In this embodiment, for example, reconstructing the API calls and data packets shows that the software submits a copy of itself, i.e. a copy of the malware data, which is the typical behavior of the Vflooder Trojan family. The software data corresponding to the multi-thread API sequence is therefore judged to be malware data. Vflooder is a specific type of Flooder (worm) Trojan; a Flooder Trojan sends a large amount of information to a target to interrupt the target's normal operation. The type label of the malware data is then output: worm Trojan. In an optional embodiment, when the target detection model detects the multi-thread API sequence and the software detection system finds data of standard, safe operation, the data is judged not to be malware data and the software detection system reports that the software data is safe.
In an optional embodiment, before the multi-thread API sequence is analyzed with the target detection model to obtain the software detection result, the method further comprises:
obtaining malware data, the malware data including infectious viruses, Trojan programs, mining programs and ransomware;
selecting the system file threads of the malware from the malware data;
extracting the API sequence in the system file threads of the malware by means of a sandbox, to obtain the API sequence of the malware;
truncating the API sequence of the malware, and re-encoding the truncated API sequence by thread to obtain the multi-thread API sequence of the malware;
constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
iteratively training the initial detection model on the multi-thread API sequence of the malware, with multi-class logloss as the evaluation metric, to obtain the target detection model.
In an optional embodiment, the software detection system obtains malware data, which may include infectious viruses, Trojan programs, mining programs, ransomware and the like. The software detection system selects the system file threads of the malware from the malware data. The API sequence in the system file threads of the malware is extracted by means of a sandbox to obtain the API sequence of the malware, including the name of each API called by the file, the thread ID of the API call, the API return value, and the index of the API call within its thread. The API sequence of the malware is truncated, and the truncated API sequence is re-encoded by thread to obtain the multi-thread API sequence of the malware. This embodiment does not limit the coding scheme; a person skilled in the art can choose one according to actual needs, for example by extending the system with different coding schemes or by introducing word-vector techniques for the re-encoding.
An initial detection model is then constructed, which is a classification model based on dilated convolution and TextCNN. With multi-class logloss as the evaluation metric, the initial detection model is iteratively trained on the multi-thread API sequence of the malware to obtain the target detection model.
In an optional embodiment, the software detection system constructs the initial detection model, which is a classification model based on dilated convolution and TextCNN. Dilated convolution (also called atrous convolution) introduces a new parameter into the convolutional layer, the dilation rate, which defines the spacing between the values that the convolution kernel samples when it processes data. Fig. 3 is a schematic diagram of dilated convolution in the software detection method provided by Embodiment 1 of the present invention. Referring to Fig. 3, for a 3x3 2-dilated convolution the actual kernel size is still 3x3, but there is a gap of 1 between kernel elements: within a 7x7 image block, only the 9 marked points are convolved with the 3x3 kernel, and the remaining points are skipped. This can also be understood as a 7x7 kernel in which only the weights at the 9 marked points are nonzero and all others are 0. Although the kernel is only 3x3, the receptive field of this convolution has grown to 7x7. In an optional embodiment, if the layer before this 2-dilated convolution is a 1-dilated convolution, each marked point is itself the output of a 1-dilated convolution and so has a 3x3 receptive field; together, the 1-dilated and 2-dilated layers achieve the effect of a 7x7 convolution. The receptive field of dilated convolution grows exponentially.
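The 7x7 figure quoted above can be checked with a short calculation. The sketch below assumes stride 1 in every layer and the same kernel size throughout; under those assumptions each layer with dilation rate d widens the receptive field by (k − 1)·d. The function name `receptive_field` is hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (per dimension) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        # Each layer reaches (kernel_size - 1) * d further than the layer below.
        rf += (kernel_size - 1) * d
    return rf


for stack in ([1], [1, 2], [1, 2, 4]):
    print(stack, receptive_field(3, stack))
```

With dilation rates doubling from layer to layer, the receptive field for a 3x3 kernel goes 3, 7, 15, which is the exponential growth the text refers to.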
In an optional embodiment, dilated convolution inserts gaps between the kernel elements during convolution. A new hyperparameter d is introduced, where (d-1) is the number of gaps inserted between adjacent elements. If the original kernel size is k, the kernel size n after inserting the (d-1) gaps is n = k + (k-1)(d-1). Further, for an input of size i, stride s and padding p, the feature map size o after dilated convolution is:
o = \left\lfloor \frac{i + 2p - k - (k-1)(d-1)}{s} \right\rfloor + 1
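The kernel-size relation n = k + (k−1)(d−1) and the resulting feature map size can be tried out numerically. The sketch below uses the standard output-size relation for a convolution with an effective kernel of size n; the padding parameter `p` is an assumption (it is not named in the surrounding text), and the function names are hypothetical.

```python
def effective_kernel(k, d):
    """Kernel size n after inserting (d - 1) gaps between adjacent elements."""
    return k + (k - 1) * (d - 1)


def output_size(i, k, d, s=1, p=0):
    """Feature map size o for input size i, stride s and assumed padding p."""
    n = effective_kernel(k, d)
    if i + 2 * p < n:
        raise ValueError("input (plus padding) is smaller than the dilated kernel")
    return (i + 2 * p - n) // s + 1


print(effective_kernel(3, 2))      # 3x3 kernel with dilation rate 2
print(output_size(7, 3, 2))        # 7-wide input, no padding
print(output_size(7, 3, 2, p=2))   # padding chosen to preserve the size
```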
In an optional embodiment, TextCNN, a convolutional neural network for text classification, is used because of advantages such as its simple structure and good results. Hierarchical dilated convolution kernels are used, i.e. the two settings Dilated_size=[1,2,3,4] and Kernel_size=[2,3,4,5] are added to the convolution process, so that the TextCNN network in the present invention can model convolution windows of up to length 20.
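Because an encoded API sequence is one-dimensional, the kernels described above slide along the sequence rather than over an image. The following is a minimal pure-Python sketch of a single 1-D dilated convolution, assuming scalar inputs instead of embedding vectors and omitting bias and activation; it illustrates the sampling pattern only and is not the patent's network. The function name `dilated_conv1d` is hypothetical.

```python
def dilated_conv1d(x, w, d=1):
    """Valid (unpadded) 1-D convolution of sequence x with kernel w and dilation rate d."""
    k = len(w)
    span = k + (k - 1) * (d - 1)  # positions covered by one kernel placement
    return [
        sum(w[j] * x[t + j * d] for j in range(k))
        for t in range(len(x) - span + 1)
    ]


x = [1, 2, 3, 4, 5, 6]
print(dilated_conv1d(x, [1, 1], d=1))  # adjacent elements are combined
print(dilated_conv1d(x, [1, 1], d=2))  # every second element is combined
```

A TextCNN-style model would run several such kernels with different sizes and dilation rates in parallel and pool their outputs before classification.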
In an optional embodiment, with multi-class logloss (logarithmic loss) as the evaluation metric, the initial detection model is iteratively trained on the multi-thread API sequence of the malware to obtain the target detection model. The logloss is:
logloss = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(P_{ij})
where M is the number of classes, N is the number of samples in the test set, y_{ij} indicates whether the i-th sample belongs to class j (1 if yes, 0 if no), and P_{ij} is the predicted probability that the i-th sample belongs to class j. The final logloss is kept to 6 decimal places.
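The multi-class logloss can be computed directly from its definition. In the sketch below the true classes are given as integer indices, so the inner sum over j reduces to the log-probability of the true class; clipping the probabilities with `eps` is a common numerical safeguard and an assumption here, not something stated in the text. The function name `multiclass_logloss` is hypothetical.

```python
import math


def multiclass_logloss(y_true, probs, eps=1e-15):
    """y_true: true class index per sample; probs: one probability row per sample."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # y_ij is 1 only for the true class, so only that term contributes.
        p = min(max(row[label], eps), 1 - eps)
        total += math.log(p)
    # The text keeps the final value to 6 decimal places.
    return round(-total / len(y_true), 6)


y_true = [0, 1]
probs = [[0.5, 0.5], [0.25, 0.75]]
print(multiclass_logloss(y_true, probs))
```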
The software detection method of this embodiment enables more efficient and accurate malware detection and yields detection results covering more types of malware.
Fig. 4 is a structural schematic diagram of the software detection apparatus provided by Embodiment 2 of the present invention. As shown in Fig. 4, the apparatus may include:
an acquisition module 31, configured to obtain software data to be detected;
a screening module 32, configured to select system file threads from the software data;
an extraction module 33, configured to extract the API sequence in the system file threads by means of a sandbox;
a coding module 34, configured to truncate the API sequence and re-encode the truncated API sequence by thread to obtain a multi-thread API sequence;
an obtaining module 35, configured to analyze the multi-thread API sequence with a target detection model to obtain a software detection result.
In an optional embodiment, before the API multithreading sequence is detected and analyzed through the target detection model to obtain the software detection result, the following is further included:
Obtain malware data; the malware data includes: infectious viruses, Trojan horse programs, mining programs, and ransomware;
Filter out the system file thread of the malware from the malware data;
Extract the API sequence in the system file thread of the malware through a sandbox, to obtain the API sequence of the malware;
Truncate the API sequence of the malware, and re-encode the truncated API sequence according to multithreading, to obtain the API multithreading sequence of the malware;
Construct an initial detection model; the initial detection model is a classification model based on dilated convolution and TextCNN;
With the multi-class logloss as the evaluation objective, iteratively train the initial detection model on the API multithreading sequence of the malware, to obtain the target detection model.
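To make the term "dilated convolution" in the model description concrete, the following is a minimal pure-Python sketch of a one-dimensional dilated convolution over an encoded sequence. It illustrates the operation only and is not the patent's model: the function name `dilated_conv1d`, the kernel values, the absence of padding, and the dilation rate are all assumptions chosen for the example.

```python
def dilated_conv1d(seq, kernel, dilation=1):
    """Valid (unpadded) 1-D convolution whose kernel taps are `dilation` apart.

    With dilation=1 this is an ordinary sliding-window convolution; larger
    dilations widen the receptive field without adding kernel weights.
    """
    span = (len(kernel) - 1) * dilation  # distance from first to last tap
    return [
        sum(w * seq[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(seq) - span)
    ]
```

For example, a two-tap kernel with dilation 2 combines each element with the element two positions later, so a long API sequence can be covered with fewer stacked layers than an undilated kernel would need.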
In an optional embodiment, the extraction module 33 is specifically configured to:
Dynamically execute the system file thread in the sandbox, to obtain the names of the APIs called by the file, the thread number of each API call, the API return value, and the sequence number of each API call within its thread.
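The four fields recorded by the sandbox suggest a simple record type. The sketch below groups such records by thread and orders them by their in-thread sequence number. The names `ApiCall` and `group_by_thread`, the field names, and the integer types are hypothetical; the patent does not specify the sandbox's output format.

```python
from collections import defaultdict
from typing import NamedTuple

class ApiCall(NamedTuple):
    api: str           # name of the API called by the file
    tid: int           # thread number that issued the call
    return_value: int  # API return value
    index: int         # sequence number of the call within its thread

def group_by_thread(calls):
    """Return {thread number: [API names in call order]}."""
    per_thread = defaultdict(list)
    for call in calls:
        per_thread[call.tid].append(call)
    return {
        tid: [c.api for c in sorted(items, key=lambda c: c.index)]
        for tid, items in per_thread.items()
    }
```

Keeping the thread number and the in-thread index separate is what later allows each thread's sequence to be truncated and encoded on its own.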
In an optional embodiment, the coding module 34 is specifically configured to:
When the number of API calls made by the file in a given thread exceeds a preset threshold, truncate the API sequence corresponding to that thread and retain a preset number of API records, to obtain the truncated API sequence;
Re-encode the truncated API sequence according to multithreading, to obtain the API multithreading sequence.
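The truncation and re-encoding steps can be sketched as follows. The patent does not state the encoding scheme, so this example assumes one possible reading: each thread's sequence is cut to at most `max_calls` records, API names are mapped to integer tokens (0 reserved for padding), and the per-thread sequences are concatenated in thread-number order. The function name, parameter names, and default threshold are hypothetical.

```python
def truncate_and_encode(per_thread, max_calls=3, vocab=None):
    """Truncate each thread's API list and encode it as integer tokens.

    per_thread: {thread number: [API names in call order]}.
    Returns (encoded token list, vocabulary mapping API name -> token).
    """
    vocab = {} if vocab is None else vocab
    encoded = []
    for tid in sorted(per_thread):
        apis = per_thread[tid]
        if len(apis) > max_calls:      # threshold exceeded: truncate
            apis = apis[:max_calls]    # keep the preset number of records
        for api in apis:
            # Assign the next free token on first sight; 0 stays free for padding.
            token = vocab.setdefault(api, len(vocab) + 1)
            encoded.append(token)
    return encoded, vocab
```

Passing the vocabulary returned from the training data back in through `vocab` keeps token assignments consistent between training and detection.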
In an optional embodiment, the obtaining module 35 is specifically configured to:
Detect the API multithreading sequence through the target detection model, and judge whether the software data corresponding to the API multithreading sequence is malware data;
If it is malware data, output the type label of the malware data;
If it is not malware data, prompt that the software data is safe.
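The decision logic above can be sketched as a small function over the model's class probabilities. The label list, its ordering (with a benign class at index 0), and the message strings are assumptions made for illustration; the patent only states that a type label is output for malware and a safety prompt otherwise.

```python
# Hypothetical label order; index 0 is assumed to be the benign class.
LABELS = ["benign", "infectious virus", "trojan", "mining program", "ransomware"]

def report(probs, labels=LABELS):
    """Map a vector of class probabilities to a detection message."""
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    if labels[best] == "benign":
        return "software data is safe"
    return "malware: " + labels[best]
```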
The software detection apparatus of this embodiment can execute the technical solution of the method shown in Fig. 2. For its specific implementation process and technical principle, refer to the related description of the method shown in Fig. 2; details are not repeated here.
Fig. 5 is a schematic structural diagram of the software detection system provided by Embodiment 3 of the present invention. As shown in Fig. 5, the software detection system 40 of this embodiment may include: a processor 41 and a memory 42.
The memory 42 is configured to store computer programs (such as application programs and functional modules that implement the above software detection method), computer instructions, and the like;
The above computer programs, computer instructions, and the like may be stored in partitions in one or more memories 42, and the above computer programs, computer instructions, data, and the like can be called by the processor 41.
The processor 41 is configured to execute the computer program stored in the memory 42, to implement each step of the method involved in the above embodiments.
For details, refer to the related description in the foregoing method embodiments.
The processor 41 and the memory 42 may be independent structures, or may be integrated into a single structure. When the processor 41 and the memory 42 are independent structures, the memory 42 and the processor 41 may be coupled through a bus 43.
The server of this embodiment can execute the technical solution of the method shown in Fig. 2. For its specific implementation process and technical principle, refer to the related description of the method shown in Fig. 2; details are not repeated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium in which computer-executable instructions are stored. When at least one processor of a user equipment executes the computer-executable instructions, the user equipment performs the various possible methods described above.
Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in a user equipment. Of course, the processor and the storage medium may also exist as discrete components in a communication device.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A method of software detection, characterized by comprising:
obtaining software data to be detected;
filtering out a system file thread from the software data;
extracting the API sequence in the system file thread through a sandbox;
truncating the API sequence, and re-encoding the truncated API sequence according to multithreading, to obtain an API multithreading sequence;
detecting and analyzing the API multithreading sequence through a target detection model, to obtain a software detection result.
2. The method according to claim 1, characterized in that, before detecting and analyzing the API multithreading sequence through the target detection model to obtain the software detection result, the method further comprises:
obtaining malware data; the malware data includes: infectious viruses, Trojan horse programs, mining programs, and ransomware;
filtering out the system file thread of the malware from the malware data;
extracting the API sequence in the system file thread of the malware through a sandbox, to obtain the API sequence of the malware;
truncating the API sequence of the malware, and re-encoding the truncated API sequence according to multithreading, to obtain the API multithreading sequence of the malware;
constructing an initial detection model; the initial detection model is a classification model based on dilated convolution and TextCNN;
with the multi-class logloss as the evaluation objective, iteratively training the initial detection model on the API multithreading sequence of the malware, to obtain the target detection model.
3. The method according to claim 1, characterized in that extracting the API sequence in the system file thread through a sandbox comprises:
dynamically executing the system file thread in the sandbox, to obtain the names of the APIs called by the file, the thread number of each API call, the API return value, and the sequence number of each API call within its thread.
4. The method according to claim 1, characterized in that truncating the API sequence, and re-encoding the truncated API sequence according to multithreading to obtain the API multithreading sequence, comprises:
when the number of API calls made by the file in a given thread exceeds a preset threshold, truncating the API sequence corresponding to the thread and retaining a preset number of API records, to obtain the truncated API sequence;
re-encoding the truncated API sequence according to multithreading, to obtain the API multithreading sequence.
5. The method according to any one of claims 1-4, characterized in that detecting and analyzing the API multithreading sequence through the target detection model to obtain the software detection result comprises:
detecting the API multithreading sequence through the target detection model, and judging whether the software data corresponding to the API multithreading sequence is malware data;
if it is malware data, outputting the type label of the malware data;
if it is not malware data, prompting that the software data is safe.
6. An apparatus for software detection, characterized by comprising:
an acquisition module, configured to acquire software data to be detected;
a screening module, configured to filter out a system file thread from the software data;
an extraction module, configured to extract the API sequence in the system file thread through a sandbox;
a coding module, configured to truncate the API sequence and re-encode the truncated API sequence according to multithreading, to obtain an API multithreading sequence;
an obtaining module, configured to detect and analyze the API multithreading sequence through a target detection model, to obtain a software detection result.
7. The apparatus according to claim 6, characterized in that, before the API multithreading sequence is detected and analyzed through the target detection model to obtain the software detection result, the following is further included:
obtaining malware data; the malware data includes: infectious viruses, Trojan horse programs, mining programs, and ransomware;
filtering out the system file thread of the malware from the malware data;
extracting the API sequence in the system file thread of the malware through a sandbox, to obtain the API sequence of the malware;
truncating the API sequence of the malware, and re-encoding the truncated API sequence according to multithreading, to obtain the API multithreading sequence of the malware;
constructing an initial detection model; the initial detection model is a classification model based on dilated convolution and TextCNN;
with the multi-class logloss as the evaluation objective, iteratively training the initial detection model on the API multithreading sequence of the malware, to obtain the target detection model.
8. The apparatus according to claim 6, characterized in that the extraction module is specifically configured to:
dynamically execute the system file thread in the sandbox, to obtain the names of the APIs called by the file, the thread number of each API call, the API return value, and the sequence number of each API call within its thread.
9. The apparatus according to claim 6, characterized in that the coding module is specifically configured to:
when the number of API calls made by the file in a given thread exceeds a preset threshold, truncate the API sequence corresponding to the thread and retain a preset number of API records, to obtain the truncated API sequence;
re-encode the truncated API sequence according to multithreading, to obtain the API multithreading sequence.
10. The apparatus according to any one of claims 6-9, characterized in that the obtaining module is specifically configured to:
detect the API multithreading sequence through the target detection model, and judge whether the software data corresponding to the API multithreading sequence is malware data;
if it is malware data, output the type label of the malware data;
if it is not malware data, prompt that the software data is safe.
11. A system for software detection, characterized by comprising: a memory and a processor, the memory storing instructions executable by the processor; wherein the processor is configured to perform, by executing the executable instructions, the method of software detection according to any one of claims 1-5.
12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method of software detection according to any one of claims 1-5 is implemented.
CN201910353079.9A 2019-04-29 2019-04-29 Software detection method, device and system Active CN110135160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910353079.9A CN110135160B (en) 2019-04-29 2019-04-29 Software detection method, device and system

Publications (2)

Publication Number Publication Date
CN110135160A true CN110135160A (en) 2019-08-16
CN110135160B CN110135160B (en) 2021-11-30

Family

ID=67575625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910353079.9A Active CN110135160B (en) 2019-04-29 2019-04-29 Software detection method, device and system

Country Status (1)

Country Link
CN (1) CN110135160B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant