CN110210219B - Virus file identification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN110210219B
Authority
CN
China
Prior art keywords
target file
file
operating system
virus
virtual operating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810539850.7A
Other languages
Chinese (zh)
Other versions
CN110210219A (en
Inventor
陈伟平
易洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810539850.7A
Publication of CN110210219A
Application granted
Publication of CN110210219B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a virus file identification method, apparatus, device, and storage medium, and belongs to the field of computer technologies. The method comprises: acquiring stub point data of at least one target file during running, wherein the stub point data comprises an Application Programming Interface (API) called by the target file and content transmitted when the target file calls the API; generating a behavior sequence of each target file according to the stub point data; and determining a target file whose behavior sequence matches a standard feature sequence as a virus file. Because the behavior sequence of each target file is generated from its stub point data and compared with the standard feature sequence, no reverse analysis of the target file is needed. This solves the problem in the related art that encrypted target files are difficult to reverse-analyze, and improves the efficiency with which the identification device identifies virus files.

Description

Virus file identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a virus file.
Background
A virus file refers to a computer file into which malicious code is inserted. After the terminal stores or installs the virus file, when the operating environment of the terminal meets the trigger condition of the malicious code, corresponding destructive behavior can be generated, for example, the terminal function is destroyed when the virus file is started, or user data is stolen when the system clock of the terminal reaches a predetermined time, and the like.
To reduce the harm that virus files cause to users, the terminal needs to identify its stored files. When a stored file is determined to be a virus file, the virus file needs to be isolated or deleted. In the related art, virus files are identified by a technician performing reverse analysis on the target file.
Since some target files are encrypted, it is difficult for a technician to reverse-analyze them, and thus the related art identifies encrypted virus files inefficiently.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for identifying a virus file, which are used for solving the problem that the method for identifying the virus file in the related technology is inaccurate. The technical scheme is as follows:
In one aspect, an embodiment of the present application provides a method for identifying a virus file, where the method includes:
acquiring stub point data of at least one target file during running, wherein the stub point data comprises an Application Programming Interface (API) called by the target file and content transmitted when the target file calls the API;
generating a behavior sequence of each target file in the at least one target file according to the stub point data;
and determining a target file whose behavior sequence matches a standard feature sequence as a virus file, wherein the standard feature sequence is calculated according to the stub point data of sample virus files.
In one aspect, an embodiment of the present application provides a method for identifying a virus file, where the method includes:
displaying prompt information on a first user interface, wherein the prompt information is used for prompting that a target file is being identified;
acquiring an API called by the target file and content transmitted when the target file calls the API;
and displaying a recognition result on a second user interface, wherein the recognition result is obtained according to the API called by the target file and the content transmitted when the API is called by the target file.
In one aspect, an embodiment of the present application provides an apparatus for identifying a virus file, where the apparatus includes:
a collecting unit, configured to acquire stub point data of at least one target file during running, wherein the stub point data comprises an Application Programming Interface (API) called by the target file and content transmitted when the target file calls the API;
a generating unit, configured to generate a behavior sequence of each target file in the at least one target file according to the stub point data;
and an identification unit, configured to determine a target file whose behavior sequence matches a standard feature sequence as a virus file, wherein the standard feature sequence is calculated according to the stub point data of sample virus files.
In one aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the method for identifying a virus file as described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the method for identifying a virus file as described above.
The beneficial effects of the technical solutions provided by the embodiments of the present application include at least the following:
The API called by at least one target file during running and the content transmitted when the API is called are acquired as stub point data of the at least one target file; a behavior sequence of each target file is generated according to the stub point data; and a target file whose behavior sequence matches the standard feature sequence is determined as a virus file. Because the behavior sequence is generated from the target file's API calling behavior and transmitted content, no reverse analysis of the target file is needed. This solves the problem in the related art that encrypted target files are difficult to reverse-analyze, and improves the efficiency with which the identification device identifies virus files.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below illustrate only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a virus file identification method according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method for identifying a virus file according to an exemplary embodiment of the present application;
FIG. 3 is a block diagram illustrating an exemplary embodiment of a system for identifying virus files provided in the present application;
FIG. 4 is a flowchart of a method for identifying a virus file provided by an exemplary embodiment of the present application;
FIG. 5 is an architecture diagram of a virtual operating system provided by an exemplary embodiment of the present application;
FIG. 6 is a flowchart illustrating the operation of a dynamic behavior collection sub-module provided by an exemplary embodiment of the present application;
FIG. 7 is a flowchart of the operation of a behavior sequence generation submodule provided in an exemplary embodiment of the present application;
FIG. 8 is a flowchart of the operation of the dynamic rules engine submodule provided in an exemplary embodiment of the present application;
FIG. 9 is a flowchart of a method for identifying virus files provided by an exemplary embodiment of the present application;
FIG. 10 is a display interface diagram of an application provided by an exemplary embodiment of the present application;
FIG. 11 is a block diagram illustrating an exemplary embodiment of a virus file identification apparatus;
FIG. 12 is a block diagram of a terminal according to an exemplary embodiment of the present application;
FIG. 13 is a block diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Typically, the virus file identification method can be applied to the following product scenarios:
(1) Establishing a virus file database in a server: after the server acquires at least one target file, it acquires stub point data of the target file during running, wherein the stub point data comprises an API called by the target file and content transmitted when the API is called; generates a behavior sequence of each target file in the at least one target file according to the stub point data; determines a target file whose behavior sequence matches the standard feature sequence as a virus file, wherein the standard feature sequence is calculated according to the stub point data of sample virus files; and establishes a virus file database according to the target files determined as virus files.
(2) Identifying virus files with a virus scanning and removal application installed on the terminal: after the terminal acquires at least one target file, it acquires stub point data of the target file during running; generates a behavior sequence of each target file in the at least one target file according to the stub point data; determines a target file whose behavior sequence matches the standard feature sequence as a virus file; and isolates and/or deletes the virus file.
In the following, some terms referred to in the embodiments of the present application are introduced:
Application Programming Interface (API): a function preset in an application program that provides an interface through which other application programs access the functions of that application program. For example, an application can implement a series of program functions of the Android operating system by calling APIs of the Android operating system; for instance, the application implements a positioning service by calling the API corresponding to the satellite positioning function in the Android operating system.
Virtual operating system: a virtual image of another operating system generated in the identification device. After the virtual operating system is entered, all operations are performed within it; application programs can be independently installed and run and data can be stored in the virtual operating system without affecting the real operating system of the identification device. The virtual operating system may also be regarded as an operating system running in a virtual environment, in a simulator, or in a virtual machine; the embodiments of the present application do not limit its specific form.
Stub point technology: a technique that modifies the source code of an operating system to add a self-defined function and records, through the self-defined function, the target file's calls to APIs. With the stub point technology, when the target file calls an API of the operating system, a behavior log of the target file is stored locally on the device; the behavior log stores the target file's calling behavior toward the API and the content transmitted by the target file when the API is called. Illustratively, the behavior log records an identifier of the target file, the called API, and the data transmitted when the API is called. For example, when the target file calls the API corresponding to the satellite positioning function, the entry for this calling behavior in the behavior log includes the identifier of the target file, the identifier of the satellite positioning function, and the location information acquired by the target file.
Hook technique (Hook): a technique that replaces a preset function of the operating system with a self-defined function and records, through the self-defined function, the target file's calls to APIs. Through the hook technique, the identification device can acquire the target file's calling behavior toward the API and the content transmitted when the target file calls the API.
Machine learning model: an operation model formed by a large number of interconnected nodes (or neurons). Each node corresponds to a strategy function, and each connection between two nodes carries a weighting value, called a weight, for the signal passing through the connection. After a sample is input into a node of the machine learning model, the node produces an output that serves as the input of the next node; the machine learning model adjusts the strategy function and weight of each node according to the final output for the sample, and this process is called training.
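The hook idea can be illustrated with a minimal Python sketch. It is not the implementation described in this application, which targets operating system functions; it only shows a preset function being replaced by a self-defined function that records the caller, the API, and the transmitted content before delegating to the original. The names `FakeOS`, `get_location`, `install_hook`, and `behavior_log` are hypothetical.

```python
import functools

behavior_log = []  # hypothetical in-memory behavior log


class FakeOS:
    """Hypothetical stand-in for an operating system API surface."""

    def get_location(self, pid, provider):
        return (22.54, 114.06)


def install_hook(owner, api_name):
    """Replace owner.api_name with a wrapper that logs each call."""
    original = getattr(owner, api_name)

    @functools.wraps(original)
    def hooked(self, pid, *args, **kwargs):
        result = original(self, pid, *args, **kwargs)
        # Record the caller, the API name, the positional arguments,
        # and the value returned to the caller.
        behavior_log.append((pid, api_name, {"args": args, "result": result}))
        return result

    setattr(owner, api_name, hooked)
    return original  # returned so the hook can be removed later


install_hook(FakeOS, "get_location")
FakeOS().get_location("pid1", "gps")
```

After the call, `behavior_log` holds one entry naming the caller `pid1`, the API `get_location`, and the content exchanged, which is the kind of record the behavior log is described as storing.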
Referring to fig. 1, a schematic diagram of an implementation environment of a virus file identification method according to an exemplary embodiment of the present application is shown. As shown in fig. 1, the implementation environment includes: a terminal 110 and a server 120, wherein the terminal 110 establishes a communication connection with the server 120 through a wired or wireless network.
The terminal 110 installs and runs an application program with a virus scanning and removal function. After the terminal 110 downloads or installs a file to be tested, the application program obtains an identifier of the file to be tested and reports the identifier to the server 120 through a wired or wireless network.
After receiving the identifier of the file to be tested reported by the application program, the server 120 queries whether the identifier is in the virus file database. If it is, the server 120 determines that the file to be tested is a virus file and sends a query result confirming this to the application program through the wired or wireless network, and the application program isolates or removes the file to be tested according to the query result fed back by the server 120.
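The report-and-query exchange between the application program and the server can be sketched as follows. The source does not say how the identifier of a file is computed, so the SHA-256 digest used here is an assumption, as are the function names and the use of an in-memory set as the virus file database.

```python
import hashlib


def file_identifier(data: bytes) -> str:
    """Hypothetical identifier: the SHA-256 hex digest of the file content."""
    return hashlib.sha256(data).hexdigest()


def query_virus_database(identifier: str, virus_db: set) -> dict:
    """Server side: report whether the identifier is in the virus database."""
    return {"identifier": identifier, "is_virus": identifier in virus_db}


def handle_result(result: dict) -> str:
    """Terminal side: isolate the file when the server confirms a virus."""
    return "isolate" if result["is_virus"] else "keep"


known_bad = b"malicious sample bytes"
virus_db = {file_identifier(known_bad)}
verdict = handle_result(query_virus_database(file_identifier(known_bad), virus_db))
clean = handle_result(query_virus_database(file_identifier(b"harmless"), virus_db))
```

Only the identifier travels to the server in this sketch, which matches the description that the application program reports the identifier rather than the file itself.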
The virus file database of the server 120 may be constructed as follows: acquiring stub point data of at least one target file during running, wherein the stub point data comprises an API called by the target file and content transmitted when the target file calls the API, and the at least one target file may be stored in the server 120 by a technician or uploaded to the server 120 by at least one terminal through a wired or wireless network; generating a behavior sequence of each target file in the at least one target file according to the stub point data; determining a target file whose behavior sequence matches the standard feature sequence as a virus file, wherein the standard feature sequence is calculated according to the stub point data of sample virus files; and establishing the virus file database according to the identifiers of the target files determined as virus files.
In an optional embodiment, an implementation environment of the virus file identification method provided in an exemplary embodiment of the present application includes a terminal 110 as shown in fig. 1.
An application program with a virus scanning and removal function is installed and runs in the terminal 110, and the application program identifies virus files as follows: acquiring stub point data of at least one target file during running, wherein the stub point data comprises an API called by the target file and content transmitted when the target file calls the API; generating a behavior sequence of each target file in the at least one target file according to the stub point data; and determining a target file whose behavior sequence matches the standard feature sequence as a virus file, wherein the standard feature sequence is calculated according to the stub point data of sample virus files.
The terminal 110 may be generally referred to as one of a plurality of terminals, and the embodiment is only illustrated by the terminal 110. The device types of the terminal 110 include: at least one of a smartphone, a tablet, an e-book reader, an MP3 player, an MP4 player, a laptop portable computer, and a desktop computer. The following embodiments are illustrated with the terminal comprising a smartphone.
Those skilled in the art will appreciate that the number of terminals may be greater or smaller. For example, there may be only one terminal, or tens or hundreds of terminals, or more. The embodiments of the present application do not limit the number of terminals or the device type.
Referring to fig. 2, a flowchart of a virus file identification method according to an exemplary embodiment of the present application is shown. The method is applied to an identification device, which may be the terminal 110 or the server 120 shown in fig. 1, and includes:
Step 201, stub point data of the target file during running is acquired.
After acquiring at least one target file, the identification device installs and/or runs the acquired target file, and uses the stub point technology or the hook technique to intercept the operating system APIs called by each target file during running and the content transmitted by each target file when calling those APIs.
The target file is at least one of an executable program, a text file, a video file, an audio file, an animated image file, a spreadsheet file, a compressed archive file, and a web page file. The executable program may be one that needs to be installed, or one that needs no installation and can run once opened.
Optionally, a virtual operating system is installed on the identification device. After acquiring the at least one target file, the identification device installs and/or stores the at least one target file in the virtual operating system, runs each target file in the virtual operating system, and acquires stub point data of each target file while it runs in the virtual operating system.
Illustratively, after acquiring at least one target file, the identification device installs and/or stores each target file in the virtual operating system, runs each target file in the virtual operating system, and uses the stub point technology or the hook technique to capture the record of each target file's calls to APIs of the virtual operating system, thereby obtaining the stub point data of each target file. The stub point data is output as a behavior log, for example:
sample:{pid1,action1(content1);pid2,action2(content2);……}
where sample is the behavior log formed by the stub point data of each target file; pid1 is the identifier of a first target file in the at least one target file, action1 is the calling behavior of calling a first API of the virtual operating system, and content1 is the content transmitted when the first target file calls the first API; pid2 is the identifier of a second target file in the at least one target file, action2 is the calling behavior of calling a second API of the virtual operating system, and content2 is the content transmitted when the second target file calls the second API.
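A behavior log in the `pid,action(content)` form shown above can be split into records with a short parser. The sketch below is only an illustration: the function name `parse_behavior_log` is hypothetical, and it assumes that the content never contains `;` or `)`, which the source does not guarantee.

```python
import re

# Hypothetical pattern for one entry of the form "pid,action(content)".
ENTRY = re.compile(
    r"\s*(?P<pid>[^,;]+),\s*(?P<action>[^(;]+)\((?P<content>[^)]*)\)\s*"
)


def parse_behavior_log(text):
    """Return a list of (pid, action, content) tuples in log order."""
    body = text.strip().strip("{}")
    records = []
    for chunk in body.split(";"):
        if not chunk.strip():
            continue  # skip empty pieces such as a trailing ';'
        match = ENTRY.fullmatch(chunk)
        if match is None:
            raise ValueError(f"malformed entry: {chunk!r}")
        records.append(
            (match["pid"].strip(), match["action"].strip(), match["content"])
        )
    return records


records = parse_behavior_log("{pid1,action1(content1);pid2,action2(content2)}")
```

Each tuple keeps the identifier, the calling behavior, and the transmitted content together, so later steps can group entries by identifier.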
Step 202, generating a behavior sequence of each target file in the at least one target file according to the stub point data.
From the acquired stub point data, the identification device assembles the stub point data belonging to the same target file into the behavior sequence of that target file.
Illustratively, the behavior log formed by the stub point data of each target file acquired by the identification device includes:
{pid1,action1(content1);pid2,action2(content2);pid1,action2(content3);pid3,action1(content4);pid2,action1(content4)}
The identification device generates, from the stub point data belonging to the same identifier, the behavior sequence of the target file corresponding to that identifier:
the behavior sequence of the first target file includes: pid1:{action1(content1);action2(content3)};
the behavior sequence of the second target file includes: pid2:{action2(content2);action1(content4)};
the behavior sequence of the third target file includes: pid3:{action1(content4)}.
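The grouping in step 202 can be sketched in a few lines of Python. The function name `build_behavior_sequences` and the tuple representation of log entries are assumptions made for illustration; the input reproduces the five-entry behavior log of the example.

```python
from collections import defaultdict


def build_behavior_sequences(records):
    """Group (pid, action, content) records by pid, preserving log order."""
    sequences = defaultdict(list)
    for pid, action, content in records:
        sequences[pid].append((action, content))
    return dict(sequences)


log = [
    ("pid1", "action1", "content1"),
    ("pid2", "action2", "content2"),
    ("pid1", "action2", "content3"),
    ("pid3", "action1", "content4"),
    ("pid2", "action1", "content4"),
]
sequences = build_behavior_sequences(log)
```

The result maps `pid1`, `pid2`, and `pid3` to the same three behavior sequences listed in the example, with entries kept in the order they appear in the log.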
Step 203, determining a target file whose behavior sequence matches the standard feature sequence as a virus file.
The identification device matches the behavior sequence of each target file against the standard feature sequence, and determines a target file whose behavior sequence has a matching degree higher than a preset threshold as a virus file. The standard feature sequence is calculated according to the stub point data of sample virus files.
Optionally, the identification device installs at least one sample virus file in the virtual operating system, acquires stub point data of the at least one sample virus file, and generates a behavior sequence of each sample virus file according to its stub point data; a technician then extracts features from the behavior sequences of the sample virus files to obtain the standard feature sequence of virus files.
Illustratively, the behavior sequence of the first target file acquired by the identification device includes:
pid1:{action1(content1);action2(content3)};
If the standard feature sequence contains {action1(content1);action2(content2)}, only one of its two entries, action1(content1), appears in the behavior sequence of the first target file, so the matching degree between the behavior sequence and the standard feature sequence is 50%. If the preset threshold of the matching degree is 40%, the first target file is determined to be a virus file.
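The source does not define how the matching degree is computed. One reading that reproduces the 50% figure is the fraction of standard feature sequence entries that also occur in the behavior sequence; the sketch below implements that assumption, and the function names are hypothetical.

```python
def matching_degree(behavior_sequence, standard_sequence):
    """Fraction of standard entries that also occur in the behavior sequence.

    Assumption: the source does not define the metric; this ratio is one
    choice that reproduces the 50% figure in the example.
    """
    if not standard_sequence:
        return 0.0
    observed = set(behavior_sequence)
    hits = sum(1 for entry in standard_sequence if entry in observed)
    return hits / len(standard_sequence)


def is_virus(behavior_sequence, standard_sequence, threshold):
    """Flag the file when the matching degree is higher than the threshold."""
    return matching_degree(behavior_sequence, standard_sequence) > threshold


behavior = [("action1", "content1"), ("action2", "content3")]
standard = [("action1", "content1"), ("action2", "content2")]
degree = matching_degree(behavior, standard)
flagged = is_virus(behavior, standard, threshold=0.40)
```

Here the API and its transmitted content must both agree for an entry to count, which is why `action2(content3)` does not match `action2(content2)`.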
To sum up, in the embodiment of the present application, the API called by at least one target file during running and the content transmitted when the API is called are acquired as stub point data of the at least one target file, a behavior sequence of each target file is generated according to the stub point data, and a target file whose behavior sequence matches the standard feature sequence is determined as a virus file. Because no reverse analysis of the target file is needed, the difficulty of reverse-analyzing encrypted target files in the related art is avoided.
Optionally, in this embodiment of the present application, at least one target file is installed in the virtual operating system, and stub point data of the at least one target file while running in the virtual operating system is acquired, so as to generate a behavior sequence of each target file; whether each target file is a virus file is determined according to its behavior sequence.
Referring to fig. 3, a block diagram of a virus file identification system according to an exemplary embodiment of the present application is shown. The system may be implemented in hardware or software in the terminal 110 or the server 120 shown in fig. 1, and includes a dynamic behavior collection sub-module 310, a behavior sequence generation sub-module 320, and a dynamic rule engine sub-module 330.
The dynamic behavior collection sub-module 310 is configured to acquire the API called by each target file during running and the content transmitted when the API is called, obtain the stub point data of each target file, record the stub point data of each target file as a behavior log, and output the behavior log to the behavior sequence generation sub-module 320.
The behavior sequence generation sub-module 320 is configured to generate a behavior sequence of each target file according to the stub point data of each target file, and output the behavior sequence of each target file to the dynamic rule engine sub-module 330. If the target file is a known sample virus file, a technician can extract, from the behavior sequence of the sample virus file, a standard feature sequence capable of distinguishing virus behavior, and input the standard feature sequence into the dynamic rule engine sub-module 330 for identification and recording.
The dynamic rule engine sub-module 330 is configured to detect whether the behavior sequence of each target file matches the standard feature sequence, so as to determine whether the target file is a virus file. Meanwhile, a technician may extract a standard feature sequence from the behavior sequence of a target file determined as a virus file and integrate it into the dynamic behavior collection sub-module 310, thereby continuously optimizing the identification system.
Referring to fig. 4, a flowchart of a virus file identification method according to an exemplary embodiment of the present application is shown. The method is applied to an identification device, which may be the terminal 110 or the server 120 shown in fig. 1, and includes:
Step 401, at least one target file is installed in at least one version of a virtual operating system.
Different virus files have different trigger conditions; for example, some virus files are triggered only in a specific version of the operating system. Meanwhile, different target files have different compatibility with different versions of operating systems; for example, an old version of a target file may not install in a new version of the operating system.
Since the target file may be an executable file that needs no installation, or a non-executable file, installing the target file in the identification device may also mean storing the target file; the embodiments of the present application use installation only as an example.
To improve the identification accuracy for virus files, at least one version of a virtual operating system may be deployed in the identification device; for example, a virtual Android 2.0 operating system, a virtual Android 2.3 operating system, and a virtual Android 4.0 operating system may be pre-installed in the identification device. A technician may emulate the runtime environment of each version of the virtual Android operating system by modifying the Hardware Abstraction Layer (HAL) of that version of the Android operating system.
After acquiring the at least one target file, the identification device installs each of the acquired target files in each version of the virtual operating system.
Step 402, the at least one target file is run in the at least one version of the virtual operating system.
After installing each target file in each version of the virtual operating system, the identification device runs each target file in each version of the virtual operating system.
Step 403, detecting whether each target file has an abnormal operation in each version of the virtual operating system.
Since different target files have different compatibility with different versions of operating systems, when the identification device runs each target file in each version of the virtual operating system, it needs to detect whether each target file runs abnormally in each version of the virtual operating system. If the identification device detects a target file that runs abnormally, the method proceeds to step 404b; if the identification device does not detect a target file that runs abnormally, the method proceeds to step 404a.
Step 404a, a simulation trigger event is executed in each version of the virtual operating system at preset time intervals.
Since the trigger conditions of different virus files are different, the identification device needs to execute the simulation trigger event at preset time intervals in each version of the virtual operating system.
A simulation trigger event is an event that simulates, inside the virtual operating system, an operation on the virtual operating system or on the target file; for example, a simulation trigger event may simulate starting the virtual operating system or clicking on the target file.
Illustratively, the identification device executes, in the first version of the virtual operating system, an instruction that accelerates a counter in the virtual operating system, so as to detect whether a virus file that uses time as its trigger condition exists in the first version of the virtual operating system; or the identification device sequentially clicks the user interface of each target file in the first version of the virtual operating system, so as to detect whether a virus file that uses a click operation as its trigger condition exists; or the identification device restarts the first version of the virtual operating system, so as to detect whether a virus file that uses operating system startup as its trigger condition exists; and so on.
The identification device can execute the simulation trigger event in either of two ways: through the accessibility service plug-in (Accessibility Service) in each version of the virtual operating system, or through the test tool (Monkey) in each version of the virtual operating system.
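The periodic execution of simulation trigger events described in step 404a can be sketched as follows. This is a minimal illustration only: `VirtualOS`, its methods, and `run_triggers` are hypothetical names that do not come from the application, and the virtual operating system is reduced to an object that records which events were fired.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VirtualOS:
    """Hypothetical stand-in for one version of the virtual operating system."""
    version: str
    clock: float = 0.0                       # simulated system time, in seconds
    events: List[str] = field(default_factory=list)

    def advance_clock(self, seconds: float) -> None:
        # Simulates accelerating the system counter (time-based triggers).
        self.clock += seconds
        self.events.append(f"clock+{seconds:g}s")

    def click(self, target: str) -> None:
        # Simulates a click on the user interface of a target file.
        self.events.append(f"click:{target}")

    def reboot(self) -> None:
        # Simulates restarting the virtual operating system.
        self.events.append("reboot")


def run_triggers(vos: VirtualOS,
                 triggers: List[Callable[[VirtualOS], None]],
                 interval_s: float,
                 rounds: int,
                 wait: Callable[[float], None] = time.sleep) -> None:
    """Fire every trigger once per round, pausing interval_s between rounds."""
    for round_no in range(rounds):
        for trigger in triggers:
            trigger(vos)
        if round_no < rounds - 1:
            wait(interval_s)
```

Passing `wait` as a parameter is a design choice made for this sketch so that the preset interval can be replaced by a no-op in a test; a real deployment would more likely rely on the scheduling facilities of the accessibility service or test tool.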
Step 404b, the target file that runs abnormally is uninstalled and/or deleted.
When the identification device detects a target file that runs abnormally, it uninstalls and/or deletes that target file from the virtual operating system in which the target file is installed.
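Steps 401 to 404b can be summarized by the following sketch, which installs and runs every target file in every version and removes the ones that run abnormally. The class `SandboxVersion`, the function `prepare`, and the rule that a file "runs abnormally" when it appears in an incompatibility list are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, List, Set


class SandboxVersion:
    """Hypothetical stand-in for one version of the virtual operating system."""

    def __init__(self, version: str, incompatible: Iterable[str] = ()):
        self.version = version
        self.installed: Set[str] = set()
        # Assumption: files listed here run abnormally on this version.
        self._incompatible = set(incompatible)

    def install(self, target: str) -> None:
        self.installed.add(target)

    def run(self, target: str) -> bool:
        """Return True if the target runs normally, False if it runs abnormally."""
        return target not in self._incompatible

    def uninstall(self, target: str) -> None:
        self.installed.discard(target)


def prepare(systems: List[SandboxVersion], targets: List[str]) -> Dict[str, List[str]]:
    """Install and run each target in each version; drop abnormal ones."""
    healthy: Dict[str, List[str]] = {}
    for vos in systems:
        for target in targets:
            vos.install(target)               # step 401
        healthy[vos.version] = []
        for target in targets:
            if vos.run(target):               # steps 402 and 403
                healthy[vos.version].append(target)
            else:
                vos.uninstall(target)         # step 404b
    return healthy
```

The returned mapping lists, per version, the target files that remain installed and are therefore candidates for the simulation trigger events of step 404a.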
Step 405, stub point data of each target file after the simulation trigger event is executed is obtained.
Illustratively, before acquiring the stub point data of each target file, the identification device injects custom code into each version of the virtual operating system and uses the custom code to replace a preset function in the source code of each version of the virtual operating system. The preset function is selected by a technician according to the APIs called in the standard feature sequence. For example, if the API of the screenshot function is called in the standard feature sequence of a certain sample virus file, the technician can set the screenshot function as a preset function.
Optionally, the virtual operating system is a virtual android operating system, the identification device injects a custom code into a Java Layer (Java Layer) and/or a Native Layer (Native Layer) of the virtual android operating system through a hook technology, and replaces a preset function in each version of the android operating system with the custom function through the custom code.
Optionally, the virtual operating system is a virtual android operating system, and the technician adds a custom function in a Java layer and/or a Native layer of the virtual android operating system.
The identification device captures the stub point data through the custom function in each version of the virtual operating system. The stub point data includes the API called in each version of the virtual operating system, the identifier of the target file that calls the API, the timestamp of each API call, and the content transmitted when the API is called. The identification device generates a behavior log according to the stub point data.
Because the trigger conditions of different virus files are different, the identification device needs to execute the simulation trigger events in each version of the virtual operating system to simulate how the virus files run in a real operating system, and then obtain the API calling behavior and calling content of each target file after the simulation trigger events are executed.
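The idea of replacing a preset function with a custom function that records stub point data can be illustrated in a few lines. The sketch below is an analogy in plain Python rather than a hook into the Java layer or Native layer of an operating system; `SystemApi`, `take_screenshot`, and `install_hook` are hypothetical names.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, List


class SystemApi:
    """Hypothetical stand-in for an API exposed by the virtual operating system."""

    @staticmethod
    def take_screenshot(region: str) -> str:
        return f"screenshot of {region}"


def install_hook(owner: type,
                 name: str,
                 log: List[Dict[str, Any]],
                 current_target: Callable[[], str],
                 clock: Callable[[], float] = time.time) -> Callable:
    """Replace owner.name with a custom function that records each call."""
    original = getattr(owner, name)

    @wraps(original)
    def custom(*args, **kwargs):
        # Record the timestamp, calling target file, API, and transmitted content.
        log.append({
            "time": clock(),
            "pid": current_target(),
            "action": name,
            "content": args,
        })
        return original(*args, **kwargs)      # then behave like the preset function

    setattr(owner, name, staticmethod(custom))
    return original                            # returned so the hook can be undone
```

After `install_hook(SystemApi, "take_screenshot", log, current_target=lambda: "pid1")`, every call to `SystemApi.take_screenshot` appends one entry to `log` and still returns the original result, which mirrors the four fields of the stub point data listed above.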
Illustratively, the stub point data acquired by the identification device is output in the form of a behavior log, which includes:
sample1:{time1,pid1,action1(content1);time2,pid2,action2(content2);……}
sample2:{time1,pid1,action1(content1);time2,pid3,action1(content3);……}
where sample1 denotes the first version of the virtual operating system: action1 denotes the behavior of calling a first API; content1 denotes the content transmitted when the first API is called; time1 denotes the first timestamp, at which the first API is called with content1; pid1 denotes the identifier of the first target file, which calls the first API with content1 at the first timestamp; action2 denotes the behavior of calling a second API; content2 denotes the content transmitted when the second API is called; time2 denotes the second timestamp, at which the second API is called with content2; and pid2 denotes the identifier of the second target file, which calls the second API with content2 at the second timestamp. sample2 denotes the second version of the virtual operating system: action1, time1, content1, and pid1 have the same meanings as in sample1; content3 denotes the content transmitted when the first API is called; time2 denotes the second timestamp, at which the first API is called with content3; and pid3 denotes the identifier of the third target file, which calls the first API with content3 at the second timestamp.
Step 406, a behavior sequence of each target file is generated according to the API called by the target file and the content transmitted when the target file calls the API, in the order of the timestamps at which the target file calls the API.
According to the acquired stub point data of each target file in each version of the virtual operating system, the identification device combines the stub point data belonging to the same target file into the behavior sequence of that target file.
For example, the behavior log that the identification device obtains from the stub point data of each target file in the first version of the virtual operating system is:
{pid1,action1(content1);pid2,action2(content2);pid1,action2(content3);pid3,action1(content4);pid2,action1(content4)}
The identification device combines the stub point data belonging to the same identifier into the behavior sequence of the target file corresponding to that identifier, and obtains the behavior sequence of each target file in the first version of the virtual operating system:
the behavior sequence of the first target file is pid1: {action1(content1); action2(content3)};
the behavior sequence of the second target file is pid2: {action2(content2); action1(content4)};
the behavior sequence of the third target file is pid3: {action1(content4)}.
The behavior log that the identification device obtains from the stub point data of each target file in the second version of the virtual operating system is:
{pid1,action1(content1);pid2,action2(content3);pid1,action2(content2);pid3,action2(content4);pid2,action1(content4)}
The identification device combines the stub point data belonging to the same identifier into the behavior sequence of the target file corresponding to that identifier, and obtains the behavior sequence of each target file in the second version of the virtual operating system:
the behavior sequence of the first target file is pid1: {action1(content1); action2(content2)};
the behavior sequence of the second target file is pid2: {action2(content3); action1(content4)};
the behavior sequence of the third target file is pid3: {action2(content4)}.
Therefore, the behavior sequence of the first target file is pid1: {action1(content1); action2(content3)} in the first version of the virtual operating system and pid1: {action1(content1); action2(content2)} in the second version; the behavior sequence of the second target file is pid2: {action2(content2); action1(content4)} in the first version and pid2: {action2(content3); action1(content4)} in the second version; and the behavior sequence of the third target file is pid3: {action1(content4)} in the first version and pid3: {action2(content4)} in the second version.
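The grouping performed in step 406 can be sketched as follows, using the behavior log of the first version as input. The tuple layout and the integer timestamps are assumptions added for illustration, since the example log above omits the timestamps and lists the entries in call order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str, str, str]          # (timestamp, pid, action, content)


def behavior_sequences(log: List[LogEntry]) -> Dict[str, List[Tuple[str, str]]]:
    """Group log entries by target file identifier, ordered by timestamp."""
    sequences: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for _timestamp, pid, action, content in sorted(log, key=lambda entry: entry[0]):
        sequences[pid].append((action, content))
    return dict(sequences)


# Behavior log of the first version, with assumed timestamps 1..5.
first_version_log: List[LogEntry] = [
    (1, "pid1", "action1", "content1"),
    (2, "pid2", "action2", "content2"),
    (3, "pid1", "action2", "content3"),
    (4, "pid3", "action1", "content4"),
    (5, "pid2", "action1", "content4"),
]

first_version_sequences = behavior_sequences(first_version_log)
```

With this input, `first_version_sequences["pid1"]` is `[("action1", "content1"), ("action2", "content3")]`, which corresponds to the behavior sequence of the first target file given above.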
Step 407, the behavior sequence of each target file in each version of the virtual operating system is matched with the standard feature sequence, and a target file whose matching degree is higher than a preset threshold is determined to be a virus file.
The identification device matches the behavior sequence of each target file in each version of the virtual operating system with the standard feature sequence, and determines the target file corresponding to a behavior sequence whose matching degree is higher than the preset threshold to be a virus file.
It should be noted that, since each target file may have different behavior sequences in multiple versions of the virtual operating system, the target file is determined to be a virus file as long as the matching degree between any one of its behavior sequences in the multiple versions and the standard feature sequence is higher than the preset threshold.
Optionally, the identification device installs at least one sample virus file in the virtual operating system, obtains stub point data of the at least one sample virus file, generates a behavior sequence of each sample virus file according to the stub point data of each sample virus file in the at least one sample virus file, and the technician extracts features from the behavior sequence of each sample virus file to obtain a standard feature sequence of the virus file.
Illustratively, the identification device obtains the behavior sequence of the first target file in the first version of the virtual operating system as pid1: {action1(content1); action2(content3)}, and the behavior sequence of the first target file in the second version of the virtual operating system as pid1: {action1(content1); action2(content2)}.
If the standard feature sequence includes {action1(content4); action2(content2)}, the matching degree between the behavior sequence of the first target file in the first version of the virtual operating system and the standard feature sequence is 0%, and the matching degree between the behavior sequence of the first target file in the second version of the virtual operating system and the standard feature sequence is 50%. If the preset threshold of the matching degree is 40%, the first target file is determined to be a virus file.
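The application does not define how the matching degree is computed. One definition that reproduces the 0% and 50% figures of this example is the fraction of entries of the standard feature sequence that also occur in the behavior sequence; the sketch below uses that definition as an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]                         # (action, content)


def matching_degree(behavior: List[Step], standard: List[Step]) -> float:
    """Fraction of standard-sequence entries found in the behavior sequence."""
    if not standard:
        return 0.0
    observed = set(behavior)
    hits = sum(1 for step in standard if step in observed)
    return hits / len(standard)


def is_virus(sequences_by_version: Dict[str, List[Step]],
             standard: List[Step],
             threshold: float) -> bool:
    """A file is flagged if any version's sequence exceeds the threshold."""
    return any(matching_degree(sequence, standard) > threshold
               for sequence in sequences_by_version.values())


standard = [("action1", "content4"), ("action2", "content2")]
first_target = {
    "version1": [("action1", "content1"), ("action2", "content3")],
    "version2": [("action1", "content1"), ("action2", "content2")],
}
```

Under this assumed definition, the first version yields 0 of 2 matches and the second version 1 of 2, so `is_virus(first_target, standard, 0.4)` flags the first target file, consistent with the example.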
Optionally, the identification device inputs the behavior sequence into a virus file identification model to obtain a matching result of the behavior sequence and the standard feature sequence, wherein the virus file identification model is a machine learning model and is obtained by training according to at least one sample virus file; and determining the target file corresponding to the behavior sequence with the matching result of the preset value as a virus file.
Illustratively, the identification device inputs the behavior sequence of at least one sample virus file into a virus file identification model to obtain a matching result, and if the matching result is different from the calibration result, the strategy function and the weight of each node in the virus identification model are adjusted and trained to obtain a trained virus file identification model.
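The adjust-on-mismatch training loop can be illustrated with a single-node perceptron, which is swapped in here purely for brevity; the application does not specify the model architecture. The feature encoding, the inclusion of a benign sample, and all names in the sketch are assumptions.

```python
from typing import List, Tuple

Step = Tuple[str, str]                         # (action, content)


def featurize(sequence: List[Step], vocabulary: List[Step]) -> List[float]:
    """Encode a behavior sequence as presence flags over a fixed vocabulary."""
    present = set(sequence)
    return [1.0 if step in present else 0.0 for step in vocabulary]


def predict(weights: List[float], bias: float, features: List[float]) -> int:
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0               # 1 = virus file, 0 = not a virus file


def train(samples: List[Tuple[List[Step], int]],
          vocabulary: List[Step],
          epochs: int = 10,
          learning_rate: float = 1.0) -> Tuple[List[float], float]:
    """Adjust weights whenever the prediction differs from the calibrated label."""
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for sequence, label in samples:
            features = featurize(sequence, vocabulary)
            error = label - predict(weights, bias, features)
            if error:
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
    return weights, bias


vocabulary = [("action1", "content1"), ("action1", "content4"), ("action2", "content2")]
samples = [
    ([("action1", "content4"), ("action2", "content2")], 1),   # sample virus file
    ([("action1", "content1")], 0),                             # assumed benign sample
]
weights, bias = train(samples, vocabulary)
```

The two samples are linearly separable, so the loop settles after a couple of passes; a practical model would need far more samples and a richer encoding of call order and content.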
To sum up, in the embodiment of the present application, the stub point data of at least one target file is obtained by acquiring the API called by the at least one target file during running and the content transmitted when the API is called, the behavior sequence of each target file is generated according to the stub point data, and the target file whose behavior sequence matches the standard feature sequence is determined to be a virus file.
Optionally, in this embodiment of the present application, at least one target file is installed in the virtual operating system, and stub data of the at least one target file in the running process of the virtual operating system is obtained, so as to generate a behavior sequence of each target file, and whether each target file is a virus file is determined according to the behavior sequence.
Optionally, in this embodiment of the present application, at least one version of virtual operating system is deployed in the identification device, at least one target file is installed in the at least one version of virtual operating system, stub data of each target file in each virtual operating system is obtained, so as to generate a behavior sequence of each target file in each version of operating system, and whether each target file is a virus file is determined according to the behavior sequence of each target file in each version of operating system, so that a problem of different compatibility of different virus files in different versions of operating systems is solved, and accuracy of the identification method for virus files is improved.
Optionally, in the embodiment of the present application, the identification device detects whether each target file running in each version of the virtual operating system runs abnormally and uninstalls the target files that run abnormally. This solves the problem that, because different target files have different compatibility with different versions of operating systems, target files running abnormally in versions with low compatibility increase the running load of the identification device; uninstalling those target files reduces the running load and improves the working efficiency of the identification device.
Optionally, in the embodiment of the present application, the simulation trigger event is executed, the stub point data of each target file after the simulation trigger event is executed is obtained, and the behavior sequence of each target file is generated according to the stub point data, so that whether each target file is a virus file is determined according to its behavior sequence. This solves the problem that the stub point data of a virus file under its trigger condition cannot be accurately obtained because different virus files have different trigger conditions, and improves the accuracy of virus file identification.
Optionally, in the embodiment of the present application, by calling the virus file identification model, it is determined whether the behavior sequence of the target file matches the standard feature sequence, so that accuracy of virus file identification is improved.
Figs. 5 to 8 illustrate a specific application example of the present application.
Referring to fig. 5, a block diagram of a virtual operating system is shown, according to an exemplary embodiment of the present application. The virtual operating system comprises an Android system image file 500 installed on the identification device, wherein the image file 500 comprises three parts, namely a virtual operating system 510, a custom function 520 and a trigger tool 530.
The virtual operating system 510 is a virtual android operating system, where technicians modify HALs of the operating system to simulate a running environment, and multiple versions of the virtual android operating system are deployed in the identification device, so as to ensure compatibility.
The custom function 520 is a function for capturing stub data of the target file, and can be implemented by modifying a source code of an operating system through a stub technology or a hook technology.
The identification device can inject custom code into the Java layer or the Native layer of the virtual Android operating system through a hook technology, use the custom code to replace preset functions in each version of the Android operating system with the custom function 520, and capture stub point data through the custom function 520.
Alternatively, the identification device may modify the source code of the Java layer or the Native layer of the virtual Android operating system through the stub technology, add the custom function 520 to the source code, and capture stub point data through the custom function 520.
The trigger tool 530 is a plug-in for executing a simulated trigger behavior in the image file, and for the virtual android operating system, the simulated trigger behavior is executed through an Accessibility Service or a Monkey test tool in the system.
As shown in fig. 6, the workflow of the dynamic behavior collection sub-module 310 includes: in step 601, the dynamic behavior collection sub-module 310 installs at least one acquired target file, typically an application program file in Android Package (APK) format, in the image file 500; in step 602, the dynamic behavior collection sub-module 310 executes a simulated trigger behavior through the trigger tool 530 (Accessibility Service or Monkey) in the image file 500, so as to trigger the target file; in step 603, the dynamic behavior collection sub-module 310 detects whether the running target file runs abnormally and, if so, uninstalls it in step 604; in step 605, the dynamic behavior collection sub-module 310 obtains the stub point data of the triggered target file and outputs a behavior log.
As shown in fig. 7, the workflow of the behavior sequence generation sub-module 320 includes: in step 701, the behavior sequence generation sub-module 320 reads the behavior log output by the dynamic behavior collection sub-module 310; in step 702, the behavior sequence generation sub-module 320 detects whether the stub point data in the behavior log belongs to the target file; for example, when generating the behavior sequence of the first target file, the behavior sequence generation sub-module 320 detects whether the identifier corresponding to the stub point data in the behavior log is the identifier of the first target file; if not, it continues to read the behavior log, and if so, it proceeds to step 703; in step 703, the behavior sequence generation sub-module 320 merges the stub point data belonging to the same target file to generate the behavior sequence of that target file, and outputs the behavior sequence of each target file in the at least one target file.
As shown in fig. 8, the workflow of the dynamic rules engine sub-module 330 includes: in step 801, a behavior sequence is input into the dynamic rules engine sub-module 330; in step 802, the dynamic rules engine sub-module 330 matches the behavior sequence with a standard feature sequence, where the standard feature sequence is a feature behavior sequence manually extracted from a sample virus file by a technician; in step 803, the dynamic rules engine sub-module 330 detects whether the behavior sequence matches the standard feature sequence; if so, it determines that the target file corresponding to the behavior sequence is a virus file and ends the process; if not, another behavior sequence is input.
Referring to fig. 9, a flowchart of a virus file identification method according to an exemplary embodiment of the present application is shown. The method can be applied to the terminal 110 shown in fig. 1, and the method includes:
Step 901, displaying prompt information on a first user interface, where the prompt information is used to prompt that a target file is being identified.
As shown in fig. 10, after the terminal 110 acquires the target file, the prompt information is displayed on the first user interface 1100: "target file 1 is being identified …".
The terminal 110 is installed with an application program for identifying virus files, and after the user installs the application program, the application program can automatically identify the target file acquired by the terminal in the terminal background, or the user can open the application program and manually select the target file to be identified. When the application program identifies the target file, prompt information is displayed on the first user interface to prompt that the target file is being identified.
Step 902, obtain the API called by the target file and the content transmitted when the target file calls the API.
The terminal 110 displays the target file to be identified, and simultaneously obtains the API called by the target file and the content transmitted when the target file calls the API.
Step 903, displaying the recognition result on the second user interface, wherein the recognition result is obtained according to the API called by the target file and the content transmitted when the API is called by the target file.
As shown in fig. 10, the terminal 110 displays the recognition result "The target file is not a virus file and is safe to use!". The recognition result is obtained by the terminal 110 according to the API called by the target file and the content transmitted when the target file calls the API.
The terminal 110 may obtain the identification result according to the API called by the target file and the content transmitted when the target file calls the API, according to the identification method of the virus file in the foregoing embodiment.
Referring to fig. 11, a block diagram of a virus file identification apparatus provided in an exemplary embodiment of the present application, which may be implemented by software, hardware, or a combination of the two as all or a part of the terminal 110 or the server 120 shown in fig. 1, includes: acquisition unit 1110, generation unit 1120, and identification unit 1130.
The acquisition unit 1110 is configured to acquire stub point data of the target file during running, where the stub point data includes an API called by the target file and content transmitted when the target file calls the API.
The generating unit 1120 is configured to generate a behavior sequence of each target file in the at least one target file according to the stub point data.
The identifying unit 1130 is configured to determine a target file whose behavior sequence matches the standard feature sequence to be a virus file, where the standard feature sequence is calculated according to the stub point data of the sample virus file.
In an optional embodiment, the acquisition unit 1110 is further configured to install a target file in the virtual operating system; run the target file in the virtual operating system; and acquire stub point data of the target file while it runs in the virtual operating system, where the stub point data includes the API (application programming interface) of the virtual operating system called by the target file and the content transmitted when the target file calls the API.
In an optional embodiment, the acquisition unit 1110 is further configured to inject custom code into the source code of the virtual operating system through a hook technology; replace a preset function in the source code with a custom function through the custom code; and intercept, through the custom function, the call made by the target file to the API corresponding to the preset function and the content transmitted when the target file calls that API, so as to obtain the stub point data of each target file.
In an optional embodiment, the acquisition unit 1110 is further configured to capture, through a custom function in the virtual operating system, the call made by the target file to the API corresponding to a preset function and the content transmitted when the target file calls that API, so as to obtain the stub point data of each target file.
In an optional embodiment, the stub point data further includes a timestamp at which the target file calls the API;
the generating unit 1120 is further configured to generate a behavior sequence of each target file according to the API called by the target file and the content transmitted when the target file calls the API, in the order of the timestamps at which the target file calls the API.
In an optional embodiment, the acquisition unit 1110 is further configured to execute a simulation trigger event in the virtual operating system at preset time intervals, where the simulation trigger event is an event that simulates, in the virtual operating system, an operation on the virtual operating system or the target file; and obtain stub point data of the target file after the simulation trigger event is executed.
In an optional embodiment, the acquisition unit 1110 is further configured to execute, at preset time intervals, a simulation trigger event through an accessibility service plug-in in the virtual operating system; or execute, at preset time intervals, the simulation trigger event through a test tool in the virtual operating system.
In an optional embodiment, the acquisition unit 1110 is further configured to run at least one target file in at least one version of the virtual operating system; and obtain stub point data of the target file while it runs in the at least one version of the virtual operating system.
In an optional embodiment, the acquisition unit 1110 is further configured to determine whether an ith target file of the at least two target files runs abnormally, where i is a natural number; when it is determined that the ith target file runs abnormally, uninstall and/or delete the ith target file in the virtual operating system; and when it is determined that the ith target file does not run abnormally, continue to acquire stub point data of the ith target file while it runs in the virtual operating system.
In an optional embodiment, the identifying unit 1130 is further configured to input the behavior sequence into a virus file identification model, so as to obtain a matching result between the behavior sequence and the standard feature sequence; determining a target file corresponding to the behavior sequence with the matching result of a preset value as a virus file; wherein, the virus file identification model is obtained by training according to at least one sample virus file.
Fig. 12 is a block diagram illustrating a terminal 1200 according to an exemplary embodiment of the present invention. The terminal 1200 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player. Terminal 1200 may also be referred to by other names such as user equipment, portable terminal, and the like.
In general, terminal 1200 includes: a processor 1201 and a memory 1202.
The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, the processor 1201 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be tangible and non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one instruction for execution by processor 1201 to implement the virus file identification methods provided herein.
In some embodiments, the terminal 1200 may further optionally include: a peripheral device interface 1203 and at least one peripheral device. Specifically, the peripheral device includes at least one of radio frequency circuit 1204, touch display 1205, camera assembly 1206, audio circuit 1207, and power supply 1208.
Peripheral interface 1203 may be used to connect at least one peripheral associated with I/O (Input/Output) to processor 1201 and memory 1202. In some embodiments, the processor 1201, memory 1202, and peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices by electromagnetic signals. The radio frequency circuit 1204 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1204 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1204 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The touch display 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display screen 1205 also has the ability to capture touch signals on or over the surface of the touch display screen 1205. The touch signal may be input to the processor 1201 as a control signal for processing. The touch display 1205 is used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the touch display 1205 may be one, providing the front panel of the terminal 1200; in other embodiments, the touch display 1205 can be at least two, respectively disposed on different surfaces of the terminal 1200 or in a folded design; in still other embodiments, the touch display 1205 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 1200. Even more, the touch display panel 1205 may be arranged in a non-rectangular irregular figure, i.e., a shaped panel. The touch Display panel 1205 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.
The camera assembly 1206 is used to capture images or video. Optionally, the camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is used for video calls or selfies, and the rear camera is used for shooting pictures or videos. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera and a wide-angle camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize a panoramic shooting function and a VR (Virtual Reality) shooting function. In some embodiments, the camera assembly 1206 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 1207 is used to provide an audio interface between the user and the terminal 1200. The audio circuit 1207 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals into the processor 1201 for processing or into the radio frequency circuit 1204 to achieve voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided at different locations of the terminal 1200. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into a sound wave audible to humans, but also into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1207 may also include a headphone jack.
The power supply 1208 is used to supply power to the various components in the terminal 1200. The power supply 1208 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 1208 includes a rechargeable battery, the rechargeable battery can be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also support fast charging technology.
In some embodiments, the terminal 1200 also includes one or more sensors 1209. The one or more sensors 1209 include, but are not limited to: acceleration sensor 1210, gyro sensor 1211, pressure sensor 1212, optical sensor 1213 and proximity sensor 1214.
The acceleration sensor 1210 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 1200. For example, the acceleration sensor 1210 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 1201 may control the touch display 1205 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1210. The acceleration sensor 1210 may also be used for game or user motion data acquisition.
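As an illustrative, non-limiting sketch of the landscape/portrait decision described above, the following Python function compares the gravity components on two axes. The axis convention (x along the short edge of the screen, y along the long edge) and the function name `choose_orientation` are assumptions made for this example and are not specified in the application.

```python
def choose_orientation(gx: float, gy: float) -> str:
    """Pick a view from the gravity components (m/s^2) on the x and y axes.

    Assumption: x runs along the short edge of the screen and y along the
    long edge, so gravity falling mostly on y means the device is upright.
    """
    # Whichever axis carries the larger share of gravity is the one
    # pointing toward the ground.
    return "landscape" if abs(gx) > abs(gy) else "portrait"


if __name__ == "__main__":
    print(choose_orientation(0.2, 9.7))   # device held upright
    print(choose_orientation(9.6, 0.4))   # device turned on its side
```

A practical implementation would usually add hysteresis so that the view does not flip back and forth when the two components are nearly equal.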
The gyro sensor 1211 may detect a body direction and a rotation angle of the terminal 1200, and the gyro sensor 1211 may collect a 3D motion of the user with respect to the terminal 1200 in cooperation with the acceleration sensor 1210. From the data collected by the gyro sensor 1211, the processor 1201 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 1212 may be disposed on a side frame of the terminal 1200 and/or on a lower layer of the touch display 1205. When the pressure sensor 1212 is disposed on the side frame of the terminal 1200, a user's grip signal on the terminal 1200 may be detected, and left/right hand recognition or a shortcut operation may be performed according to the grip signal. When the pressure sensor 1212 is disposed on the lower layer of the touch display 1205, an operable control on the UI may be controlled according to a pressure operation performed by the user on the touch display 1205. The operable control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The optical sensor 1213 is used to collect the ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the touch display 1205 according to the ambient light intensity collected by the optical sensor 1213. Specifically, when the ambient light intensity is high, the display brightness of the touch display panel 1205 is increased; when the ambient light intensity is low, the display brightness of the touch display panel 1205 is turned down. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1213.
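The brightness adjustment described above can be sketched, purely for illustration, as a clamped linear mapping from ambient light intensity to a brightness level. The function name `display_brightness`, the lux ceiling, and the brightness range are hypothetical values chosen for this example; the application does not specify a particular mapping.

```python
def display_brightness(ambient_lux: float,
                       max_lux: float = 1000.0,
                       min_level: int = 10,
                       max_level: int = 255) -> int:
    """Map ambient light intensity to a display brightness level.

    Assumption: a linear ramp between min_level and max_level, saturating
    once the ambient light reaches max_lux. All constants are illustrative.
    """
    # Clamp the ratio to [0, 1] so very dark or very bright readings
    # stay inside the supported brightness range.
    ratio = min(max(ambient_lux / max_lux, 0.0), 1.0)
    return round(min_level + ratio * (max_level - min_level))


if __name__ == "__main__":
    for lux in (0, 100, 500, 2000):
        print(lux, display_brightness(lux))
```

Brighter surroundings yield a higher level and darker surroundings a lower one, which matches the behavior described for the processor 1201.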
The proximity sensor 1214, also known as a distance sensor, is typically provided on the front surface of the terminal 1200. The proximity sensor 1214 is used to collect the distance between the user and the front surface of the terminal 1200. In one embodiment, when the proximity sensor 1214 detects that the distance between the user and the front surface of the terminal 1200 gradually decreases, the processor 1201 controls the touch display 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1214 detects that the distance between the user and the front surface of the terminal 1200 gradually increases, the processor 1201 controls the touch display 1205 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 12 does not limit the terminal 1200, which may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.
Fig. 13 shows a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device is used for implementing the virus file identification method provided in the above embodiment, and may be the server 120 in the implementation environment of fig. 1. Specifically:
The computer device 1300 includes a Central Processing Unit (CPU) 1301, a system memory 1304 including a Random Access Memory (RAM) 1302 and a Read Only Memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301. The computer device 1300 also includes a basic input/output system (I/O system) 1306, which facilitates transfer of information between devices within the computer, and a mass storage device 1307 for storing an operating system 1313, application programs 1314, and other program modules 1315.
The basic input/output system 1306 includes a display 1308 for displaying information and an input device 1309, such as a mouse or a keyboard, for a user to input information. The display 1308 and the input device 1309 are both connected to the central processing unit 1301 through an input/output controller 1310 connected to the system bus 1305. The basic input/output system 1306 may also include the input/output controller 1310 for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1310 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the computer device 1300. That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1304 and mass storage device 1307 described above may collectively be referred to as memory.
According to various embodiments of the present application, the computer device 1300 may also be run by a remote computer that is connected to it through a network such as the Internet. That is, the computer device 1300 may be connected to the network 1312 through the network interface unit 1311, which is coupled to the system bus 1305, or the network interface unit 1311 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs stored in the memory and configured to be executed by one or more processors. The one or more programs include instructions for performing the virus file identification method provided in the above embodiments.
The present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the method for identifying a virus file provided by the above method embodiment.
Optionally, the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for identifying a virus file according to the above aspects.
It should be understood that reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method for identifying a virus file, the method comprising:
installing at least one target file in a plurality of versions of the virtual operating system;
running the at least one target file in the multiple versions of virtual operating systems;
acquiring stub point data of the at least one target file in the running process of each version of virtual operating system, wherein the stub point data comprises an Application Programming Interface (API) of the virtual operating system called by the target file and content transmitted when the target file calls the API;
generating a behavior sequence of each target file in the at least one target file in the virtual operating system of each version according to the stub point data;
inputting the behavior sequence into a virus file identification model to obtain a matching result of the behavior sequence and a standard characteristic sequence; determining the target file as the virus file under the condition that the matching result corresponding to the behavior sequence of the target file in any version of the virtual operating system is a preset value;
wherein there is at least one target file that has different behavior sequences in the multiple versions of the virtual operating system; the standard characteristic sequence is calculated from stub point data of a sample virus file; the virus file identification model is obtained by inputting the behavior sequence of at least one sample virus file into the virus file identification model and, under the condition that the obtained matching result corresponding to the sample virus file is different from the calibration result of the sample virus file, training the model by adjusting the policy function and the weight of each node in the virus file identification model; and the virus file identification model is a machine learning model.
2. The method according to claim 1, wherein the acquiring stub point data of the at least one target file during running in each version of virtual operating system comprises:
injecting a custom code into the source code of the virtual operating system through Hook technology;
replacing a preset function in the source code with a custom function through the custom code;
and intercepting the API corresponding to the preset function called by the target file and the content transmitted when the API corresponding to the preset function is called by the target file through the custom function, and obtaining the stub point data of each target file.
3. The method of claim 1, wherein the acquiring stub point data of the at least one target file during running in each version of the virtual operating system comprises:
intercepting, through a custom function in the virtual operating system, the API corresponding to a preset function called by the target file and the content transmitted when the target file calls the API corresponding to the preset function, so as to obtain the stub point data of each target file.
4. The method of claim 1, wherein the stub point data further comprises a timestamp at which the target file calls the API; and the generating a behavior sequence of each target file in the at least one target file in the virtual operating system of each version according to the stub point data comprises:
generating the behavior sequence of each target file from the API called by the target file and the content transmitted when the target file calls the API, arranged in the order of the timestamps at which the target file calls the API.
5. The method of any of claims 1 to 4, further comprising:
executing a simulation trigger event in the virtual operating system at preset time intervals, wherein the simulation trigger event is an event, simulated in the virtual operating system, of an operation on the virtual operating system or on the target file;
and acquiring stub point data of the target file after the simulation trigger event is executed.
6. The method of claim 5, wherein the executing a simulated trigger event in the virtual operating system at preset time intervals comprises:
executing the simulation trigger event at the preset time interval through an accessibility service plug-in in the virtual operating system;
or,
executing the simulation trigger event at the preset time interval through a test tool in the virtual operating system.
7. The method of any of claims 1 to 4, wherein the at least one target file comprises at least two target files, the method further comprising:
determining whether the ith target file in the at least two target files runs abnormally, wherein i is a natural number;
when the ith target file is determined to be abnormal in operation, unloading and/or deleting the ith target file in the virtual operating system;
and when it is determined that the ith target file does not run abnormally, continuing to acquire the stub point data of the ith target file during running in the virtual operating system.
8. An apparatus for identifying a virus file, the apparatus comprising:
the collection unit is used for installing at least one target file in multiple versions of virtual operating systems; running the at least one target file in the multiple versions of virtual operating systems; and acquiring stub point data of the at least one target file in the running process of each version of virtual operating system, wherein the stub point data comprises an Application Programming Interface (API) of the virtual operating system called by the target file and content transmitted when the API is called by the target file;
the generating unit is used for generating a behavior sequence of each target file in the at least one target file in the virtual operating system of each version according to the stub point data;
the identification unit is used for inputting the behavior sequence into a virus file identification model to obtain a matching result of the behavior sequence and a standard characteristic sequence; and determining the target file as the virus file under the condition that the matching result corresponding to the behavior sequence of the target file in any version of the virtual operating system is a preset value; wherein there is at least one target file that has different behavior sequences in the multiple versions of the virtual operating system; the standard characteristic sequence is calculated from stub point data of a sample virus file; the virus file identification model is obtained by inputting the behavior sequence of at least one sample virus file into the virus file identification model and, under the condition that the obtained matching result corresponding to the sample virus file is different from the calibration result of the sample virus file, training the model by adjusting the policy function and the weight of each node in the virus file identification model; and the virus file identification model is a machine learning model.
9. An electronic device, comprising a processor and a memory, wherein the memory stores at least one program, and the at least one program is loaded and executed by the processor to implement the method for identifying a virus file according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein at least one program is stored in the storage medium, and the at least one program is loaded and executed by a processor to implement the method for identifying a virus file according to any one of claims 1 to 7.
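As an illustrative, non-limiting sketch of the flow recited in claims 1 and 4, the following Python code orders stub point data by timestamp to form a behavior sequence for each virtual operating system version, and then treats the target file as a virus file if the matching result for any version equals a preset value. The names `StubPoint`, `behavior_sequence`, `is_virus` and `toy_model`, the example API names, and the version labels are all hypothetical. In particular, `toy_model` is a simple subsequence check used as a stand-in and is not the trained machine learning model described in the claims.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Behavior = Tuple[str, str]  # (API name, content passed to the API)


@dataclass(frozen=True)
class StubPoint:
    """One intercepted API call made by the target file."""
    timestamp: float
    api: str
    content: str


def behavior_sequence(stub_points: List[StubPoint]) -> List[Behavior]:
    """Arrange intercepted calls in timestamp order, keeping API and content."""
    ordered = sorted(stub_points, key=lambda p: p.timestamp)
    return [(p.api, p.content) for p in ordered]


def is_virus(sequences_by_version: Dict[str, List[Behavior]],
             model: Callable[[List[Behavior]], int],
             preset_value: int = 1) -> bool:
    """Flag the file if the result for ANY version equals the preset value."""
    return any(model(seq) == preset_value
               for seq in sequences_by_version.values())


def toy_model(seq: List[Behavior]) -> int:
    """Hypothetical stand-in for the identification model.

    Returns 1 when the APIs of an assumed standard characteristic sequence
    appear in order inside the behavior sequence, otherwise 0.
    """
    standard = ["readContacts", "sendTextMessage"]
    apis = iter(api for api, _ in seq)
    # `name in apis` advances the iterator, so order is respected.
    return 1 if all(name in apis for name in standard) else 0


if __name__ == "__main__":
    # The same file behaves differently on two hypothetical OS versions.
    old_os = [StubPoint(2.0, "sendTextMessage", "10086"),
              StubPoint(1.0, "readContacts", "")]
    new_os = [StubPoint(1.0, "readContacts", "")]
    sequences = {"version_a": behavior_sequence(old_os),
                 "version_b": behavior_sequence(new_os)}
    print(is_virus(sequences, toy_model))
```

Because the verdict uses `any`, a file that shows the suspicious behavior in only one version of the virtual operating system is still identified, which is the reason the claims run the target file in multiple versions.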
CN201810539850.7A 2018-05-30 2018-05-30 Virus file identification method, device, equipment and storage medium Active CN110210219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810539850.7A CN110210219B (en) 2018-05-30 2018-05-30 Virus file identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810539850.7A CN110210219B (en) 2018-05-30 2018-05-30 Virus file identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110210219A CN110210219A (en) 2019-09-06
CN110210219B true CN110210219B (en) 2023-04-18

Family

ID=67778863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810539850.7A Active CN110210219B (en) 2018-05-30 2018-05-30 Virus file identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110210219B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580041B (en) * 2019-09-30 2023-07-07 奇安信安全技术(珠海)有限公司 Malicious program detection method and device, storage medium and computer equipment
CN112580035B (en) * 2019-09-30 2024-02-06 奇安信安全技术(珠海)有限公司 Program shelling method and device, storage medium and computer equipment
CN112580043B (en) * 2019-09-30 2023-08-01 奇安信安全技术(珠海)有限公司 Virtual machine-based disinfection method and device, storage medium and computer equipment
CN112580024B (en) * 2019-09-30 2023-08-01 奇安信安全技术(珠海)有限公司 Simulation method and device of virtual machine, storage medium and computer equipment
CN112580042B (en) * 2019-09-30 2024-02-02 奇安信安全技术(珠海)有限公司 Method and device for combating malicious programs, storage medium and computer equipment
CN113139176A (en) * 2020-01-20 2021-07-20 华为技术有限公司 Malicious file detection method, device, equipment and storage medium
CN111368298B (en) * 2020-02-27 2023-07-21 腾讯科技(深圳)有限公司 Virus file identification method, device, equipment and storage medium
CN114036517A (en) * 2021-11-02 2022-02-11 安天科技集团股份有限公司 Virus identification method and device, electronic equipment and storage medium
CN116150753A (en) * 2022-12-21 2023-05-23 上海交通大学 Mobile end malicious software detection system based on federal learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989283B (en) * 2015-02-06 2019-08-09 阿里巴巴集团控股有限公司 A kind of method and device identifying virus mutation
CN108009424A (en) * 2017-11-22 2018-05-08 北京奇虎科技有限公司 Virus behavior detection method, apparatus and system

Also Published As

Publication number Publication date
CN110210219A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110210219B (en) Virus file identification method, device, equipment and storage medium
CN107979851B (en) Abnormal data reporting method and device
CN111338910B (en) Log data processing method, log data display method, log data processing device, log data display device, log data processing equipment and log data storage medium
CN109117635B (en) Virus detection method and device for application program, computer equipment and storage medium
CN109495616B (en) Photographing method and terminal equipment
CN110532188B (en) Page display test method and device
CN108229171B (en) Driver processing method, device and storage medium
CN111737100A (en) Data acquisition method, device, equipment and storage medium
CN112116690A (en) Video special effect generation method and device and terminal
CN111416996B (en) Multimedia file detection method, multimedia file playing device, multimedia file equipment and storage medium
CN111459466B (en) Code generation method, device, equipment and storage medium
CN112052167A (en) Method and device for generating test script code
CN109684123B (en) Problem resource positioning method, device, terminal and storage medium
CN111753606A (en) Intelligent model upgrading method and device
CN112529871A (en) Method and device for evaluating image and computer storage medium
CN109634872B (en) Application testing method, device, terminal and storage medium
CN112231666A (en) Illegal account processing method, device, terminal, server and storage medium
CN113591090B (en) Program bug reporting method, device, equipment and storage medium
CN114253442A (en) Module processing method and device for foreground and background separation system and storage medium
CN111897726A (en) Abnormity positioning method, abnormity positioning device, storage medium and mobile terminal
CN113268234A (en) Page generation method, device, terminal and storage medium
CN112783533A (en) Version information updating method, version information updating device, terminal and storage medium
CN112560612A (en) System, method, computer device and storage medium for determining business algorithm
CN112308104A (en) Abnormity identification method and device and computer storage medium
CN113342645B (en) Method, device, equipment and storage medium for testing business function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant