US20160042180A1 - Behavior specification, finding main, and call graph visualizations - Google Patents


Publication number
US20160042180A1
Authority
US
United States
Prior art keywords
function
behavior
code
software
functions
Legal status
Abandoned
Application number
US14/820,976
Inventor
Kirk D. Sayre
Richard A. Willems
Stephen Lanse Lindberg
Current Assignee
UT Battelle LLC
Original Assignee
UT Battelle LLC
Application filed by UT Battelle LLC
Priority to US14/820,976
Assigned to U.S. DEPARTMENT OF ENERGY: confirmatory license (see document for details). Assignors: UT-BATTELLE, LLC
Assigned to UT-BATTELLE, LLC: assignment of assignors interest (see document for details). Assignors: LINDBERG, STEPHEN L; SAYRE, KIRK D; WILLEMS, RICHARD A
Publication of US20160042180A1
Priority to US15/906,831 (US10198580B2)
Abandoned (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Definitions

  • The behavior computation process is described through an exemplary process that detects key logging, although the process may detect and identify other security risks through other functionality.
  • Key logging tracks or logs the keys struck on a keyboard, typically in a covert manner.
  • BSU behaviors are represented as pairs. The first element of each pair is a predicate, a Boolean-like expression that states under what conditions certain actions are executed by a program.
  • The second element of each pair is the action the program takes when the predicate is true.
  • The actions are represented as function calls, which are the calls the program makes under the given predicate conditions.
  • Each function call is represented by the name of the called function followed by the parenthesized argument list used in the call.
  • A BSU may be written as a single behavior unit or as a unit linked or associated with many behaviors, such as the hierarchical BSU structures shown in FIGS. 1 and 2.
  • In this example, the behavior computation process looks for malicious behavior that tracks and/or logs key presses.
  • The KeyLoggerToFile BSU (FIG. 3) is the top-level BSU of the key logger hierarchy (FIG. 2).
  • Capturing a key press records the keyboard state (where the capture was matched in the behavior of the local function) and identifies the key that was pressed. A key may be in more than one state at once; for example, it may be both pressed and toggled.
  • Key logger functionality may be implemented in different ways; for example, a key press event may be captured by different means.
  • Two such ways of capturing a key press are represented in the GetKeyPress BSU (FIG. 4).
  • One way to capture a key press is to perform the behavior represented by the GetKeyboardState BSU (FIG. 7) followed by the behavior represented by the ToAscii BSU (FIG. 6).
  • ToAscii is a leaf-level BSU that defines the behavior for translating the raw key press results into an ASCII value.
  • The leaf level is the lowest level in the BSU hierarchy and contains the details of the defined behavior.
  • The behavior computation process binds the KeyboardState value to the arguments of the ToAscii calls, as shown by call line C1 of the ToAscii BSU (FIG. 6).
  • A key press event may also be captured through the behavior represented by the GetKeyNameText BSU (FIG. 5).
  • The GetKeyNameText BSU (FIG. 5) is a leaf-level BSU and a child of the GetKeyPress BSU (FIG. 4).
  • The GetKeyNameText BSU captures the key event and the actual key pressed. For example, if a user presses a backspace key that is tracked by a key logger, the GetKeyNameText BSU (FIG. 5) identifies the key press as a backspace and returns the value to the calling BSU, as shown by line B3 of the GetKeyPress BSU (FIG. 4).
  • In the KeyLoggerToFile BSU (FIG. 3), a key press captured by a key logger is identified by the GetKeyPress BSU (FIG. 4) and written to a file through the FileWriteComplete BSU (FIG. 8).
  • The FileWriteComplete BSU (FIG. 8) references BSUs that represent the behavior of opening the file (FileOpenWrite BSU, FIG. 9), writing to the open file (FileWrite BSU, FIG. 10), and closing the file after it has been written to (FileClose BSU, FIG. 11).
  • The FileWrite BSU (FIG. 10) may invoke any of four different function calls to write to the open file.
  • The FileClose BSU (FIG. 11) may invoke either of two different function calls to close the file.
  • The behavior computation process (also referred to as the BSU classification process flow) classifies the behavior of malware, viruses, and other unauthorized code that disables, interrupts, and/or damages computers.
  • As shown in FIG. 12, an exemplary behavior computation process loads the functional behavior specifications of local functions from the program analysis repository or library into computing memory for analysis at 1202.
  • The BSU hierarchy is also loaded into computing memory.
  • At 1204, the process pushes the leaves, branches, and trunk of the BSU hierarchy onto a program stack in sequence; the stack may comprise a Last-In-First-Out (LIFO) data structure.
  • A stack is a restricted data structure, because only a limited number of operations (push and pop) are performed on it.
  • The last element added to the stack is the first one removed by the processor.
  • Elements are therefore removed in the reverse order of their addition, and the lowest elements are those that have been on the stack the longest.
  • A key logging BSU classification flow may be executed in any valid stack ordering of the BSU tree.
  • A valid stack ordering is one in which all of the lower-level BSUs used by a higher-level BSU are recognized before that higher-level BSU is recognized.
  • At 1208, a BSU is removed by a pop operation, and at 1210 a processor analyzes the functionality of the targeted program against the behavior abstracted by that BSU. When a match is found, the behavior computation process inserts a program marker into the call list and, at 1214, bubbles the match up through the program call tree.
  • The behavior computation process performs BSU matching for the remaining BSUs on the stack in the same way: it collects the markings and bubbles them up through the program call tree (as defined by the local function calls made within the analyzed program) before analyzing the behavior functionality at the next-highest BSU level, until it has traversed the entire BSU hierarchy.
  • Markings are bubbled up in propagation phases: a phase marks every appearance of a marked function, and the phase is repeated to carry the marking up the program call trees.
  • The behavior computation processes and BSU systems have many other uses, such as detecting process software code injection under Windows and detecting various anti-virtualization processes, anti-sandboxing processes, and debugging techniques. Further, in alternative systems, different BSUs recognize FTP and HTTP operations, allowing the BSUs to perform many different detections and classifications.
  • In C and C++ programs, the function called at program startup is named main.
  • The main() function is defined with a return type of “int” and either no parameters, as in int main(void), or two parameters, as in int main(int argc, char *argv[]); many platforms also accept a third, environment parameter.
  • For each local function, the finding main system computes a path-based complexity measure.
  • The path-based complexity measure increases as the number of potential execution paths through the local function increases.
  • The finding main system computes the complexity of the local functions of a program and identifies main() as the function with the highest path-based complexity measure that accepts two or three arguments (zero-argument main() functions tend to be rare). In other words, the system processes all of the local functions and identifies main() as the code section that has the highest path-based complexity and takes two or three arguments.
  • An interactive, dynamic visualization user interface allows users to explore the call graph and view only the functions that can reach, or are reachable from, a selected function.
  • The functions are color coded by their ancestral relation to the selected function and by their function call distance from it.
  • The user interface applies different color scales and adapts a color mapping function through a coloring algorithm to support tasks such as calling out localizations and identifying processes and/or values.
  • The disclosed processes and systems allow the dynamic visualization user interface to identify and differentiate visually, at a glance, how the other functions of the program relate to the selected function.
  • The dynamic visualization user interface may render an entire call graph on a display with no special formatting.
  • The user interface allows the user to select the vertex of a function to be traced.
  • When a function is selected (automatically or by a user), the user interface renders a graph that hides, without deleting them from memory, all functions that neither precede the selected function in the call chain nor can be reached from it through a chain of function calls. The dynamic visualization user interface can therefore switch between a partial view of the call graph related to a selected function and the entire graph without having to recreate it.
  • The function-driven color coding rendered by the coloring algorithm identifies which functions precede the selected function (that is, the functions that can reach it through some chain of function calls) and which functions follow it (that is, the functions that can be reached from it through some chain of function calls).
  • The saturation of the colors on a display indicates the relative call distance of a function to the selected function, that is, the minimum number of function calls needed to reach the selected function or to be reached from it.
  • In one color-coding approach, the selected function is colored white. Preceding functions are colored on one color scale, for example shades of red, with lighter, less saturated reds indicating a shorter function call distance to the selected function. Following functions are colored on an alternative scale, such as shades of blue, with lighter, less saturated blues indicating a shorter function call distance. A function that both precedes and follows the selected function may be colored on yet another scale, such as shades of violet, with lighter, less saturated violets indicating a shorter function call distance.
  • To color ancestors and descendants, the function-driven, distance-measuring color-coding algorithm may first execute a breadth-first search that follows backward edges to identify the functions that precede the selected function, recording each function's call distance in memory along the way.
  • When the breadth-first search encounters a function that has not been marked as an ancestor, it marks that function as an ancestor and sets its function call distance to the current depth of the search.
  • The algorithm then executes a second breadth-first search that identifies the functions that follow the selected function and their function call distance from it.
  • When the second breadth-first search encounters a function that has not been marked as either an ancestor or a descendant, it marks that function as a descendant in memory and sets its function call distance to the current depth of the search.
  • When the second breadth-first search encounters a function that was marked as an ancestor by the first search, it marks that function as both an ancestor and a descendant. If the function's current call distance is greater than the current depth of the second search, its call distance is set to that depth, because a shorter distance has been found.
  • The dynamic visualization user interface allows users to explore the call graph further by selecting functions that precede or follow the selected function in the current visualization. Selecting a preceding function may reveal more of the call graph, because more functions are reachable from ancestral nodes. Selecting a following function automatically hides the paths and functions that were reachable from the previously selected function but are not reachable from the newly selected one. Hidden paths and functions are not deleted; deleting them would force the user interface to recreate them if they were later rendered.
  • In the examples that follow, the “main” function of each visualized program is the root of the function call graph. Because main precedes all other functions, the entire function call graph can be viewed by selecting main.
  • FIG. 13 shows a distant view of the function call graph for an exemplary program with the “main” function selected (it is not shaded). Because all other functions in the program follow main, they are colored shades of blue proportional to their function call distance from main.
  • The arrows show the direction in which functions are called: caller functions are at the tail ends of the arrows and callee functions are at the heads.
  • FIG. 14 shows a close-up view with the local function at 0x004011B2 selected.
  • The dynamic visualization user interface filters the view of the program's function call graph to show only the functions that precede and follow this local function. Functions that precede it, including main, are colored shades of red proportional to their function call distance from local function 0x004011B2. Functions that follow local function 0x004011B2 are colored on a different scale, such as shades of blue, proportional to their function call distance from it.
  • FIG. 15 shows the view with the external function “strtok” selected.
  • strtok is a standard C library function used to tokenize a string.
  • The view is automatically filtered to show only the paths of local functions in the program that lead to it.
  • In another example, the dynamic visualization user interface automatically analyzes finger.exe, a utility program that displays information about a user on a remote computer (usually one running UNIX) that runs the Finger service or daemon.
  • FIG. 16 shows a distant view of the entire function call graph for the program with main selected.
  • FIG. 17 shows a close-up, filtered view of the graph with the local function 0x01001D21 selected.
  • The two local functions that precede it in the program are displayed in red, with main (0x01001493) shown in the deepest shade of red.
  • The functions that follow local function 0x01001D21, including the external functions “FormatMessageA”, “LocalFree”, and “s_perror”, are colored shades of blue.
  • The dynamic visualization user interface may also automatically analyze diagnostic software, such as PING.exe, a program that performs network diagnostics and tests the reachability of a remote computer at an IPv4 or IPv6 address.
  • FIG. 18 shows a distant view of the function call graph for the entire program.
  • FIG. 19 shows a close-up view with the external function “memcpy”, a standard C function used to copy bytes from one block of memory to another, selected.
  • The dynamic visualization user interface automatically filters the view of the program's function call graph to display only the paths of function calls leading to “memcpy” and hides the remaining paths.
  • The methods, devices, systems, and logic described above may be implemented in many different ways and in many different combinations of hardware, software, or both.
  • All or parts of the system may diagnose software or circuitry in one or more controllers, one or more microprocessors (CPUs), one or more signal processors (SPUs), one or more graphics processors (GPUs), one or more application-specific integrated circuits (ASICs), one or more programmable media, or any combination of such hardware.
  • All or part of the logic, specialized processes, and systems described may be implemented as instructions for execution by multi-core processors (e.g., CPUs, SPUs, and/or GPUs), a controller, or another processing device, including exascale computers and compute clusters; may be displayed through a display driver in communication with a remote or local display; or may be stored in a tangible or non-transitory machine-readable or computer-readable medium such as flash memory, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or another machine-readable medium such as a compact disc read-only memory (CD-ROM) or a magnetic or optical disk.
  • A product, such as a computer program product, may include a storage medium and computer-readable instructions stored on the medium that, when executed in an endpoint, computer system, or other device, cause the device to perform operations according to any of the description above.
  • The systems may evaluate software and data structures through processors (e.g., CPUs, SPUs, GPUs), memory, and interconnects shared and/or distributed among multiple system components, such as among multiple processors and memories, including multiple distributed processing systems.
  • Parameters, databases, software, and data structures used to evaluate and analyze these systems or logic may be separately stored and managed, may be incorporated into a single memory or database, may be logically and/or physically organized in many different ways, and may be implemented in many ways, including data structures such as linked lists, programming libraries, or implicit storage mechanisms.
  • Programs may be parts (e.g., subroutines) of a single program, separate programs, or application programs distributed across several memories, processor cores, and/or processing nodes, or may be implemented in many other ways, such as in a library (for example, a shared library).
  • The library may store behavior abstractions that are used to analyze the behavior functionality described herein. While various embodiments have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible.
  • First and second parts are said to be coupled together when they directly contact one another, as well as when the first part couples to an intermediate part that couples, either directly or via one or more additional intermediate parts, to the second part.
  • The term “substantially” or “about” may encompass a range that is largely, but not necessarily wholly, that which is specified; it encompasses all but a significant amount.
  • The actions and/or steps of the devices, such as the operations that the devices perform, necessarily occur as a direct or indirect result of the preceding commands, events, actions, and/or requests. In other words, the operations occur as a result of the preceding operations.
  • A device that is responsive to another requires more than that an action (i.e., the device's response) merely follow another action.
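The key logger BSU hierarchy described in this section (leaf BSUs such as ToAscii, GetKeyNameText, and FileClose composed into GetKeyPress, FileWriteComplete, and KeyLoggerToFile) can be modeled as a small tree of alternatives. The following Python sketch is a simplified, hypothetical illustration: it drops predicates, argument binding (such as binding KeyboardState to the ToAscii call), and call ordering, and the external API names listed for each leaf are assumptions chosen for illustration, not the contents of FIGS. 3-11.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BSU:
    """Hypothetical BSU record: leaves list external calls, parents list alternatives."""
    name: str
    external_calls: List[str] = field(default_factory=list)  # leaf: any one call matches
    alternatives: List[List[str]] = field(default_factory=list)  # parent: every child of one alternative


def recognizes(library: Dict[str, BSU], name: str, calls: Set[str]) -> bool:
    """Return True if BSU `name` matches a function that makes the external `calls`."""
    bsu = library[name]
    if not bsu.alternatives:  # leaf-level BSU
        return any(call in calls for call in bsu.external_calls)
    return any(all(recognizes(library, child, calls) for child in alternative)
               for alternative in bsu.alternatives)


# Assumed API names per leaf; the real BSU figures may list different calls.
LIBRARY: Dict[str, BSU] = {b.name: b for b in [
    BSU("GetKeyboardState", external_calls=["GetKeyboardState"]),
    BSU("ToAscii", external_calls=["ToAscii"]),
    BSU("GetKeyNameText", external_calls=["GetKeyNameTextA", "GetKeyNameTextW"]),
    BSU("FileOpenWrite", external_calls=["CreateFileA", "CreateFileW", "fopen"]),
    BSU("FileWrite", external_calls=["WriteFile", "fwrite", "fputs", "fprintf"]),
    BSU("FileClose", external_calls=["CloseHandle", "fclose"]),
    BSU("GetKeyPress", alternatives=[["GetKeyboardState", "ToAscii"], ["GetKeyNameText"]]),
    BSU("FileWriteComplete", alternatives=[["FileOpenWrite", "FileWrite", "FileClose"]]),
    BSU("KeyLoggerToFile", alternatives=[["GetKeyPress", "FileWriteComplete"]]),
]}

if __name__ == "__main__":
    print(recognizes(LIBRARY, "KeyLoggerToFile",
                     {"GetKeyNameTextW", "fopen", "fputs", "fclose"}))  # True
```

The two alternatives under GetKeyPress mirror the two ways of capturing a key press described above; a function that calls GetKeyboardState but never ToAscii or GetKeyNameText does not satisfy GetKeyPress.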
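The pressed and toggled key states mentioned in the key logger example correspond to the 256-byte buffer filled by the Win32 GetKeyboardState call: per its documentation, the high-order bit of a key's byte is set when the key is down, and the low-order bit is set when the key is toggled. The Python sketch below only decodes a simulated buffer (no Windows call is made, and the buffer contents are made up for illustration).

```python
from typing import Tuple

VK_CAPITAL = 0x14  # virtual-key code for CAPS LOCK


def key_state(keyboard_state: bytes, vk: int) -> Tuple[bool, bool]:
    """Decode (pressed, toggled) for virtual key `vk` from a 256-byte state buffer."""
    value = keyboard_state[vk]
    return bool(value & 0x80), bool(value & 0x01)  # high bit: down; low bit: toggled


# Simulated buffer: CAPS LOCK is held down and currently toggled on.
state = bytearray(256)
state[VK_CAPITAL] = 0x81
print(key_state(bytes(state), VK_CAPITAL))  # (True, True)
```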
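The stack-driven recognition flow of FIG. 12 (push the BSU hierarchy, pop BSUs children-first, match each against the local functions, and bubble each match up the program call tree) can be sketched as follows. This is a simplified, hypothetical model: each local function is reduced to the set of external calls it makes, a leaf BSU matches if any of its calls is present, a higher-level BSU matches when every child of one alternative is already marked on the function, and the stack is built from a post-order traversal so that the pops form a valid stack ordering. The function names and leaf call names are assumptions.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical key logger hierarchy (leaf call names are assumptions).
LEAVES: Dict[str, Set[str]] = {
    "GetKeyboardState": {"GetKeyboardState"}, "ToAscii": {"ToAscii"},
    "GetKeyNameText": {"GetKeyNameTextA", "GetKeyNameTextW"},
    "FileOpenWrite": {"fopen"}, "FileWrite": {"fwrite"}, "FileClose": {"fclose"},
}
PARENTS: Dict[str, List[List[str]]] = {
    "GetKeyPress": [["GetKeyboardState", "ToAscii"], ["GetKeyNameText"]],
    "FileWriteComplete": [["FileOpenWrite", "FileWrite", "FileClose"]],
    "KeyLoggerToFile": [["GetKeyPress", "FileWriteComplete"]],
}


def build_stack(root: str) -> List[str]:
    """Push BSUs so that pops return every child before any BSU that uses it."""
    post_order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for alternative in PARENTS.get(name, []):
            for child in alternative:
                visit(child)
        post_order.append(name)  # a BSU is appended only after all of its children

    visit(root)
    return list(reversed(post_order))  # list.pop() takes from the end: post_order[0] first


def classify(root: str, external: Dict[str, Set[str]],
             local_calls: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return the BSU markers attached to each local function."""
    callers: Dict[str, Set[str]] = {f: set() for f in external}
    for caller, callees in local_calls.items():
        for callee in callees:
            callers[callee].add(caller)
    markers: Dict[str, Set[str]] = {f: set() for f in external}
    stack = build_stack(root)
    while stack:
        bsu = stack.pop()
        for function in external:
            if bsu in LEAVES:
                matched = bool(LEAVES[bsu] & external[function])
            else:
                matched = any(set(alt) <= markers[function] for alt in PARENTS[bsu])
            if not matched:
                continue
            # Mark the function, then bubble the marker up to every transitive caller.
            queue = deque([function])
            while queue:
                current = queue.popleft()
                if bsu in markers[current]:
                    continue  # already marked, so its callers were already queued
                markers[current].add(bsu)
                queue.extend(callers[current])
    return markers


EXTERNAL = {"main": set(), "log_key": set(),
            "read_key": {"GetKeyboardState", "ToAscii"}, "save": {"fopen", "fwrite", "fclose"}}
LOCAL_CALLS = {"main": ["log_key"], "log_key": ["read_key", "save"]}
marks = classify("KeyLoggerToFile", EXTERNAL, LOCAL_CALLS)
print(sorted(f for f, m in marks.items() if "KeyLoggerToFile" in m))  # ['log_key', 'main']
```

Neither read_key nor save alone exhibits the full behavior; only after their markers bubble up does log_key (and, transitively, main) carry both GetKeyPress and FileWriteComplete.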
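The finding-main heuristic can be sketched as follows. The text does not define the exact path-based complexity measure, so this sketch assumes one: the number of entry-to-exit paths through a local function's control-flow graph after loops are cut at their back edges (found by depth-first search). It also assumes each function's argument count has already been recovered; the function names and graphs are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

CFG = Dict[str, List[str]]  # basic block -> successor blocks


def path_count(cfg: CFG, entry: str) -> int:
    """Assumed measure: count entry-to-exit paths, with loops cut at back edges."""
    back_edges: Set[Tuple[str, str]] = set()
    on_stack: Set[str] = set()
    finished: Set[str] = set()

    def find_back_edges(block: str) -> None:
        on_stack.add(block)
        for succ in cfg.get(block, []):
            if succ in on_stack:
                back_edges.add((block, succ))  # edge closes a loop
            elif succ not in finished:
                find_back_edges(succ)
        on_stack.discard(block)
        finished.add(block)

    find_back_edges(entry)
    memo: Dict[str, int] = {}

    def count(block: str) -> int:  # the graph without back edges is acyclic
        if block not in memo:
            succs = [s for s in cfg.get(block, []) if (block, s) not in back_edges]
            memo[block] = 1 if not succs else sum(count(s) for s in succs)
        return memo[block]

    return count(entry)


def find_main(functions: Dict[str, Tuple[int, CFG, str]]) -> Optional[str]:
    """Pick the two- or three-argument function with the highest path count."""
    scored = [(path_count(cfg, entry), name)
              for name, (arg_count, cfg, entry) in functions.items()
              if arg_count in (2, 3)]
    return max(scored)[1] if scored else None


def chained_diamonds(n: int) -> CFG:
    """Hypothetical CFG of n if/else diamonds in sequence (2**n paths)."""
    cfg: CFG = {}
    for i in range(n):
        cfg[f"j{i}"] = [f"then{i}", f"else{i}"]
        cfg[f"then{i}"] = [f"j{i + 1}"]
        cfg[f"else{i}"] = [f"j{i + 1}"]
    return cfg


FUNCTIONS = {
    "sub_401000": (2, chained_diamonds(2), "j0"),  # 4 paths, 2 arguments
    "sub_401200": (1, chained_diamonds(3), "j0"),  # 8 paths, but only 1 argument
    "sub_401300": (3, chained_diamonds(1), "j0"),  # 2 paths, 3 arguments
}
print(find_main(FUNCTIONS))  # sub_401000
```

The argument filter is what keeps sub_401200 from being chosen even though it is the most complex function, matching the two-or-three-argument rule described above.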
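The two breadth-first searches used for ancestor and descendant coloring can be sketched in Python as follows. The call graph is assumed to be a dictionary from each caller to its callees, and the function names in the example are hypothetical. Functions absent from the result are the ones the user interface would hide (but not delete) while the given function is selected.

```python
from collections import deque
from typing import Dict, List, Tuple


def ancestry(graph: Dict[str, List[str]], selected: str) -> Dict[str, Tuple[str, int]]:
    """Map each function related to `selected` to (relation, call distance)."""
    reverse: Dict[str, List[str]] = {}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    info: Dict[str, Tuple[str, int]] = {selected: ("selected", 0)}

    # First BFS: follow backward edges to mark ancestors with their distance.
    queue, seen = deque([(selected, 0)]), {selected}
    while queue:
        node, depth = queue.popleft()
        for caller in reverse.get(node, []):
            if caller not in seen:
                seen.add(caller)
                info[caller] = ("ancestor", depth + 1)
                queue.append((caller, depth + 1))

    # Second BFS: follow forward edges to mark descendants.
    queue, seen = deque([(selected, 0)]), {selected}
    while queue:
        node, depth = queue.popleft()
        for callee in graph.get(node, []):
            if callee in seen:
                continue
            seen.add(callee)
            distance = depth + 1
            relation = info.get(callee, ("", 0))[0]
            if relation == "ancestor":
                # Both ancestor and descendant: keep the shorter distance.
                info[callee] = ("both", min(info[callee][1], distance))
            else:
                info[callee] = ("descendant", distance)
            queue.append((callee, distance))
    return info


CALLS = {"main": ["parse", "run"], "run": ["parse", "helper"], "helper": ["run"],
         "parse": ["strtok"], "unused": ["strtok"]}
print(ancestry(CALLS, "run"))
```

Selecting run marks main as an ancestor, helper as both (it calls run and is called by it), and parse and strtok as descendants; unused is hidden because it neither reaches nor is reachable from run.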
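One way to realize the color scheme described above is to fix a hue per relation and let saturation grow with call distance, so that the selected function (distance 0) is white and nearer functions are lighter. The red, blue, and violet hues follow the example in the text; normalizing distance by the largest distance in the current view is an assumption. The sketch uses Python's standard colorsys module.

```python
import colorsys

# Assumed hues on colorsys's 0-1 scale: red, blue, violet.
HUES = {"ancestor": 0.0, "descendant": 2 / 3, "both": 0.75}


def node_color(relation: str, distance: int, max_distance: int) -> str:
    """Return a hex color: white when selected, more saturated as distance grows."""
    if relation == "selected" or max_distance == 0:
        return "#ffffff"
    saturation = distance / max_distance  # shorter distance -> lighter color
    r, g, b = colorsys.hsv_to_rgb(HUES[relation], saturation, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


print(node_color("ancestor", 1, 2))  # #ff8080
```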

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

A process transforms compiled software into a semantic form. The process analyzes behavior functionality by processing precise programming behavior abstractions stored in a memory and classifies the code as malware based on the code behavior. Another method identifies the starting point of execution of a compiled program. The method calculates a complexity measure by calculating the number of potential execution paths of local functions; identifies the number of arguments passed to local functions; and identifies the starting point of execution of the compiled program. Another method provides interactive, dynamic visualization of a group of related functions wherein a user can explore the rendered graph, select a specific function, and display functions that are color coded by their ancestral relation and their function call distance to the selected function.

Description

    RELATED APPLICATION
  • This application claims the benefit of priority of U.S. Provisional Pat. App. No. 62/034,410 filed Aug. 7, 2014 and titled “Behavior Specification and Finding Main,” which is incorporated by reference.
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
  • This invention was made with United States government support under Contract No. DE-AC05-00OR22725 awarded by the United States Department of Energy. The United States government has certain rights in the invention.
  • BACKGROUND
  • 1. Technical Field
  • This disclosure relates to systems that monitor program behavior and specifically to systems that identify patterns in external function calls, systems that find the starting execution point of a compiled program, and an interactive user interface that differentiates software call functions.
  • 2. Related Art
  • Software controls many aspects of systems used in our daily life. However, most source code is complex, making it difficult to track, identify errors, detect vulnerabilities, or detect malware. Current design and coding methods are vulnerable to malicious software that attempts to disable or damage computer programs or the computers themselves. Such code is usually transformed from source code into a target machine language through compilers that also may insert vulnerabilities into the compiled executable program. Some techniques used to detect malware employ only functional testing. Unfortunately, functional testing alone is incapable of catching many types of errors and vulnerabilities. Many tests are not exhaustive and do not scale to the length and functionality of the machine code.
  • When validating code, sometimes it is necessary to process machine language. Analysis of a compiled code, whether by manual or automated methods, typically focuses on the unique functionality of the executable code, not on the common start up and shut down functionality of the code.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
  • FIG. 1 is a hierarchical view of a Behavior Specification Unit or BSU.
  • FIG. 2 is a key logger BSU hierarchy.
  • FIG. 3 is an exemplary KeyLoggerToFile key logging BSU.
  • FIG. 4 is an exemplary GetKeyPress key logging BSU.
  • FIG. 5 is an exemplary GetKeyNameText key logging BSU.
  • FIG. 6 is an exemplary ToAscii key logging BSU.
  • FIG. 7 is an exemplary GetKeyboardState key logging BSU.
  • FIG. 8 is an exemplary FileWriteComplete key logging BSU.
  • FIG. 9 is an exemplary FileOpenWrite key logging BSU.
  • FIG. 10 is an exemplary FileWrite key logging BSU.
  • FIG. 11 is an exemplary FileClose key logging BSU.
  • FIG. 12 is a BSU recognition process flow.
  • FIG. 13 is a distant view of a function call graph.
  • FIG. 14 is a near view of a local function.
  • FIG. 15 is a selected external function.
  • FIG. 16 is a second distant view of another function call graph.
  • FIG. 17 is a near filtered view of another local function.
  • FIG. 18 is a distant view of another function call graph.
  • FIG. 19 is a near filtered view of another local function.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • This disclosure describes a novel compiler agnostic system that automatically identifies where the functionality of a program (which may include application software, operating system software, and/or software libraries and tools) begins and ends, and detects malicious software by analyzing program behavior. Operating system software manages computer hardware and software resources and provides common services for computer programs. Application software (an application) is a set of computer programs designed to permit the user to perform a group of coordinated functions, tasks, or activities. Application software cannot run on its own; it depends on system or operating software to execute.
  • The novel compiler agnostic system recognizes specific classes of program behavior without decompiling the machine language into its original source code. The system recognizes specific program behaviors by identifying patterns in external function call behavior. The system includes recognizer modules called Behavior Specification Units (BSUs) that abstract targeted program behavior, identify undesired behavior, and add semantic descriptions of the targeted code.
  • Some BSUs are organized in a hierarchical structure. These BSUs abstract program behavior as compositions or sets of lower-level behaviors. A knowledge base stores complex structured and unstructured behavior information in structured form in a local or remote database held in unitary or distributed computing memory. From this knowledge base, high-level, precise behavior abstractions are generated and stored in a BSU repository, library, or BSU enterprise data warehouse. In some systems, a library is a collection of behavior abstractions stored in a file. Each BSU in a library has a name, and each recognizes a specific class of behavior. Some systems define precise behavior abstractions through a Domain Specific Language (DSL) with the help of subject matter experts or an expert system, that is, a computer system that emulates the decision-making ability of a human expert. The BSUs are used to analyze the targeted software's behavioral functionality by first discovering the full, general program behavior and then classifying that behavior against the abstractions stored in the BSU repositories or libraries. Classifying against the selected BSUs yields a behavior analysis that identifies malware and/or classifies program functionality.
  • In the behavior computation process that classifies program behavior against the BSUs, machine code is electronically transformed into a functional specification of the program behavior (a semantic rather than a syntactic representation of the program) by applying the functional effects of the machine instructions contained in the targeted code. The functional semantics of the machine instructions are stored in an instruction semantics repository, which may comprise a library or an instruction semantics enterprise data warehouse. The semantic form of the targeted code is then transformed into a structured form by a Hyperion or Hyperion-like system. The Hyperion system is a static program analysis tool that uses a semantic machine instruction behavior language, such as the semantic language developed by Oak Ridge National Laboratory. It statically extracts the behavior of targeted software to identify software functionality and security properties. The analysis of the functionality reveals security attributes, which are specialized functional behaviors of the targeted software.
  • For instructional purposes, BSU behavior is explained through an exemplary behavior computation process that detects key logging (although the process may detect and identify other security risks with other functionality). Key logging tracks or logs the keys struck on a keyboard, typically covertly. BSU behaviors are represented as pairs. The first element of each pair is a predicate, a Boolean-like expression that states under what conditions certain actions are executed by a program. The second element is the action the program takes when the predicate is true. Actions are represented as function calls, that is, the calls the program makes under the predicate's conditions, each written as the name of the called function followed by a parenthesized argument list.
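As a rough illustration of the predicate/action pairs described above, the C sketch below models a pair as a predicate function over a program state plus the call it guards (callee name and argument list). The `State` fields, the predicates, and the two example pairs (including the `WriteFile` call and its argument names) are hypothetical and are not taken from the BSU figures; real BSU predicates are symbolic conditions over computed program behavior, not C callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical program state inspected by predicates (illustration only). */
typedef struct {
    bool key_event;  /* a keyboard event is pending */
    bool file_open;  /* an output file is open */
} State;

/* One behavior pair: when pred(state) holds, the program performs the
   action callee(args[0], args[1], ...). */
typedef struct {
    bool (*pred)(const State *);
    const char *callee;
    const char *args[3];  /* argument list, NULL-terminated */
} BehaviorPair;

static bool on_key(const State *s)    { return s->key_event; }
static bool can_write(const State *s) { return s->key_event && s->file_open; }

/* Example pairs; the calls and argument names are assumptions. */
static const BehaviorPair kPairs[] = {
    { on_key,    "GetKeyboardState", { "KeyboardState", NULL } },
    { can_write, "WriteFile",        { "hFile", "buffer", NULL } },
};

/* Write the callee of every pair whose predicate holds into out[]
   and return how many actions fire in this state. */
static size_t fired_calls(const BehaviorPair *pairs, size_t n,
                          const State *s, const char **out) {
    assert(s != NULL && (n == 0 || (pairs != NULL && out != NULL)));
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (pairs[i].pred(s))
            out[k++] = pairs[i].callee;
    return k;
}
```

With a pending key event but no open file, only the `GetKeyboardState` action fires; once a file is open, both actions fire.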
  • A BSU may be written as a single behavior unit or linked to many other behaviors, as in the hierarchical BSU structures shown in FIGS. 1 and 2. In FIG. 2, the behavior computation process looks for malicious behavior that tracks and/or logs key presses. When key logging occurs in an analyzed program, the KeyLoggerToFile BSU (FIG. 3) recognizes it as capturing a key press (recognized by the lower-level GetKeyPress BSU) followed by writing that key press to a file (recognized by the lower-level FileWriteComplete BSU). When a key press event occurs, the GetKeyPress BSU (FIG. 4) captures the keyboard state, as matched in the behavior of the local function, and identifies the key that was pressed. A key can be in more than one state at once; for example, it may be both pressed and toggled.
  • In the Windows environment, key logger functionality may be implemented in different ways, for example by capturing a key press event in different ways. Two such ways are represented in the GetKeyPress BSU (FIG. 4). One is the behavior represented by the GetKeyboardState BSU followed by the behavior represented by the ToAscii BSU. ToAscii is a leaf-level BSU that defines the behavior of translating the raw key press result into an ASCII value. The leaf level is the lowest level of the BSU hierarchy and contains the details of the defined behavior. As shown by call line C1 of the ToAscii BSU (FIG. 6), the behavior computation process binds the KeyboardState value to the corresponding argument of the ToAscii call.
  • A key press event may also be captured through the behavior represented by the GetKeyNameText BSU (FIG. 5), a leaf-level BSU and child of the GetKeyPress BSU (FIG. 4). The GetKeyNameText BSU captures the key event and the actual key pressed. For example, if a user presses the backspace key while a key logger is tracking input, the GetKeyNameText BSU (FIG. 5) identifies the key press as a backspace and returns the value to the calling BSU, as shown by line B3 of the GetKeyPress BSU (FIG. 4).
  • As shown by the KeyLoggerToFile BSU (FIG. 3), when a key press is captured by a key logger, it is identified by the GetKeyPress BSU (FIG. 4) and written to a file through the FileWriteComplete BSU (FIG. 8). In operation, the FileWriteComplete BSU (FIG. 8) references BSUs that represent the behavior of opening the file (FileOpenWrite BSU, FIG. 9), writing to the open file (FileWrite BSU, FIG. 10), and closing the file (FileClose BSU, FIG. 11) after it has been written. At the leaf level, the exemplary FileOpenWrite BSU (FIG. 9) lists eleven different ways in which a writable file may be opened in Windows (as shown by its call functions). The FileWrite BSU (FIG. 10) lists four different function calls that may write to the open file, and the FileClose BSU (FIG. 11) lists two different function calls that may close it.
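The hierarchy in FIGS. 3 to 11 can be approximated as a small recursive matcher. This is a simplified sketch, not the patent's recognition algorithm: it ignores predicates and argument binding, treats a behavior as an ordered list of external call names, and uses hypothetical, deliberately incomplete call lists for the leaf BSUs (the eleven open calls, four write calls, and two close calls in the figures are not reproduced). A composite BSU matches if any of its alternatives matches, and an alternative matches if its children match in order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALTS 2

typedef struct Bsu Bsu;
struct Bsu {
    const char *name;
    const char *const *calls;          /* leaf: NULL-terminated external call names */
    const Bsu *const *alts[MAX_ALTS];  /* composite: NULL-terminated child sequences */
};

/* Leaf call lists: illustrative subsets, not the lists in FIGS. 5-11. */
static const char *const kKbd[]   = { "GetKeyboardState", NULL };
static const char *const kAscii[] = { "ToAscii", NULL };
static const char *const kName[]  = { "GetKeyNameTextA", "GetKeyNameTextW", NULL };
static const char *const kOpen[]  = { "CreateFileA", "CreateFileW", "fopen", NULL };
static const char *const kWrite[] = { "WriteFile", "fwrite", NULL };
static const char *const kClose[] = { "CloseHandle", "fclose", NULL };

static const Bsu bsu_GetKeyboardState = { "GetKeyboardState", kKbd,   { NULL } };
static const Bsu bsu_ToAscii          = { "ToAscii",          kAscii, { NULL } };
static const Bsu bsu_GetKeyNameText   = { "GetKeyNameText",   kName,  { NULL } };
static const Bsu bsu_FileOpenWrite    = { "FileOpenWrite",    kOpen,  { NULL } };
static const Bsu bsu_FileWrite        = { "FileWrite",        kWrite, { NULL } };
static const Bsu bsu_FileClose        = { "FileClose",        kClose, { NULL } };

/* GetKeyPress: (GetKeyboardState then ToAscii) OR GetKeyNameText. */
static const Bsu *const kViaState[] = { &bsu_GetKeyboardState, &bsu_ToAscii, NULL };
static const Bsu *const kViaName[]  = { &bsu_GetKeyNameText, NULL };
static const Bsu bsu_GetKeyPress = { "GetKeyPress", NULL, { kViaState, kViaName } };

/* FileWriteComplete: open, then write, then close. */
static const Bsu *const kFileSeq[] = { &bsu_FileOpenWrite, &bsu_FileWrite,
                                       &bsu_FileClose, NULL };
static const Bsu bsu_FileWriteComplete = { "FileWriteComplete", NULL, { kFileSeq } };

/* KeyLoggerToFile: a key press followed by writing it to a file. */
static const Bsu *const kLogSeq[] = { &bsu_GetKeyPress, &bsu_FileWriteComplete, NULL };
static const Bsu bsu_KeyLoggerToFile = { "KeyLoggerToFile", NULL, { kLogSeq } };

/* Return the index just past the earliest-ending match of b in
   trace[start..n), or -1 if b does not occur there. */
static int match_end(const Bsu *b, const char *const *trace, int n, int start) {
    assert(b != NULL && start >= 0);
    if (b->calls != NULL) {                          /* leaf BSU */
        for (int i = start; i < n; i++)
            for (const char *const *c = b->calls; *c != NULL; c++)
                if (strcmp(trace[i], *c) == 0) return i + 1;
        return -1;
    }
    int best = -1;
    for (int a = 0; a < MAX_ALTS && b->alts[a] != NULL; a++) {
        int pos = start;                             /* match children in order */
        for (const Bsu *const *ch = b->alts[a]; *ch != NULL && pos >= 0; ch++)
            pos = match_end(*ch, trace, n, pos);
        if (pos >= 0 && (best < 0 || pos < best)) best = pos;
    }
    return best;
}
```

Keeping the earliest-ending match for each child is a deliberate choice: a later child can only have more room when the earlier one finishes sooner, so no backtracking is needed for in-order matching.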
  • Using the hierarchical BSU structure, the behavior computation process (also referred to as the BSU classification process flow) classifies the behavior of malware, viruses, and other software that is unauthorized or that disables, interrupts, and/or damages computers. As shown in FIG. 12, an exemplary behavior computation process loads the functional behavior specifications of local functions from the program analysis repository or library into computing memory for analysis at 1202. The BSU hierarchy is also loaded into computing memory. At 1204, the process then pushes the BSUs (leaves, branches, and trunk) onto a program stack, which may comprise a Last-In-First-Out (LIFO) data structure. A stack is a restricted data structure because only a limited set of operations can be performed on it. In a LIFO structure, the last element added is the first one removed by the processor; in other words, elements are removed in the reverse order of their addition, and the elements at the bottom are those that have been on the stack the longest. A key logging BSU classification flow, for example, may be executed in any of the three stack orderings shown below, each listing the BSUs in the order in which they are recognized.
  • Stack Ordering 1     Stack Ordering 2     Stack Ordering 3
    FileOpenWrite        GetKeyboardState     GetKeyboardState
    FileWrite            ToAscii              ToAscii
    FileClose            GetKeyNameText       GetKeyNameText
    FileWriteComplete    GetKeyPress          FileOpenWrite
    GetKeyboardState     FileOpenWrite        FileWrite
    ToAscii              FileWrite            FileClose
    GetKeyNameText       FileClose            FileWriteComplete
    GetKeyPress          FileWriteComplete    GetKeyPress
    KeyLoggerToFile      KeyLoggerToFile      KeyLoggerToFile

    In alternative systems and processes, other stack orderings are used. A valid stack ordering is one in which all of the lower-level BSUs used by a higher-level BSU are recognized before that higher-level BSU is recognized.
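The validity rule above amounts to a topological order of the BSU hierarchy in which every BSU comes after the BSUs it uses. The sketch below encodes the FIG. 2 hierarchy as a hypothetical adjacency table (`kUses`), checks an ordering against that rule, and builds one valid ordering with a post-order depth-first traversal. Pushing the resulting sequence onto a LIFO stack in reverse order would make the pops come out in this order.

```c
#include <assert.h>
#include <stdbool.h>

enum { KeyLoggerToFile, GetKeyPress, FileWriteComplete, GetKeyboardState,
       ToAscii, GetKeyNameText, FileOpenWrite, FileWrite, FileClose, N_BSU };

/* kUses[p][c] is true when BSU p is composed from lower-level BSU c. */
static const bool kUses[N_BSU][N_BSU] = {
    [KeyLoggerToFile]   = { [GetKeyPress] = true, [FileWriteComplete] = true },
    [GetKeyPress]       = { [GetKeyboardState] = true, [ToAscii] = true,
                            [GetKeyNameText] = true },
    [FileWriteComplete] = { [FileOpenWrite] = true, [FileWrite] = true,
                            [FileClose] = true },
};

/* True if order lists every BSU exactly once and each BSU appears after
   all of the BSUs it uses. */
static bool is_valid_order(const int order[N_BSU]) {
    int pos[N_BSU];
    for (int i = 0; i < N_BSU; i++) pos[i] = -1;
    for (int p = 0; p < N_BSU; p++) {
        assert(order[p] >= 0 && order[p] < N_BSU);
        pos[order[p]] = p;
    }
    for (int i = 0; i < N_BSU; i++) {
        if (pos[i] < 0) return false;                /* BSU missing */
        for (int j = 0; j < N_BSU; j++)
            if (kUses[i][j] && pos[j] > pos[i]) return false;
    }
    return true;
}

/* Append b after all BSUs it uses (post-order depth-first traversal). */
static void post_order(int b, bool seen[N_BSU], int out[N_BSU], int *k) {
    if (seen[b]) return;
    seen[b] = true;
    for (int c = 0; c < N_BSU; c++)
        if (kUses[b][c]) post_order(c, seen, out, k);
    assert(*k < N_BSU);
    out[(*k)++] = b;
}
```

Starting from `KeyLoggerToFile`, `post_order` produces exactly the sequence of Stack Ordering 2, and all three orderings in the table pass `is_valid_order`.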
  • In FIG. 12, while the BSU stack is not empty (checked at 1206), a BSU is removed by a pop operation at 1208, and a processor analyzes the functionality of the targeted program against the behavior abstracted by that BSU at 1210. When a match is found, the behavior computation process inserts a marker into the call list and bubbles the match up through the program call tree at 1214. While the stack is not empty at 1206, the process continues matching the remaining BSUs on the stack, collecting the markings and bubbling them up through the program call tree (as defined by the local function calls made within the analyzed program) before analyzing the behavior functionality at the next-higher BSU level, until it has traversed the entire BSU hierarchy. In some systems, markings are bubbled up through propagation phases that mark all of their appearances, and the propagation phase is repeated to bubble the markings up the program call trees.
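A minimal sketch of the pop, match, and bubble-up loop follows, under strong simplifying assumptions: external calls and BSU markers are bits in a 32-bit set, a BSU alternative matches when all of its required bits are present in a function (order, predicates, and arguments are ignored), and the call tree is a small caller-to-callee bit matrix. The toy program, the BSU encodings, and all names are hypothetical, and the GetKeyNameText alternative is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NF 4                     /* local functions in the toy program */
typedef uint32_t Set;            /* bit set of external calls and BSU markers */
#define BIT(x) ((Set)1u << (x))

/* Behavior ids: external calls first, then one marker per BSU. */
enum { API_GetKeyboardState, API_ToAscii, API_CreateFileW, API_WriteFile,
       API_CloseHandle, M_GetKeyboardState, M_ToAscii, M_GetKeyPress,
       M_FileOpenWrite, M_FileWrite, M_FileClose, M_FileWriteComplete,
       M_KeyLoggerToFile };
#define MARKERS (~(BIT(M_GetKeyboardState) - 1u))   /* all marker bits */

typedef struct {
    int marker;   /* marker bit set when this BSU matches */
    Set alts[2];  /* alternatives: bits that must all be present; 0 = unused */
} Bsu;

/* Stack of BSUs, bottom first; the last entry is popped first. */
static const Bsu kStack[] = {
    { M_KeyLoggerToFile,   { BIT(M_GetKeyPress) | BIT(M_FileWriteComplete) } },
    { M_FileWriteComplete, { BIT(M_FileOpenWrite) | BIT(M_FileWrite) | BIT(M_FileClose) } },
    { M_FileClose,         { BIT(API_CloseHandle) } },
    { M_FileWrite,         { BIT(API_WriteFile) } },
    { M_FileOpenWrite,     { BIT(API_CreateFileW) } },
    { M_GetKeyPress,       { BIT(M_GetKeyboardState) | BIT(M_ToAscii) } },
    { M_ToAscii,           { BIT(API_ToAscii) } },
    { M_GetKeyboardState,  { BIT(API_GetKeyboardState) } },
};

/* Copy markers from callees to callers until nothing changes. */
static void bubble_up(Set have[NF], const Set callees[NF]) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < NF; f++)
            for (int g = 0; g < NF; g++) {
                Set fresh = have[g] & MARKERS & ~have[f];
                if ((callees[f] & BIT(g)) && fresh) {
                    have[f] |= fresh;
                    changed = true;
                }
            }
    }
}

/* Pop each BSU, mark every function that satisfies one of its
   alternatives, then bubble the new markers up to callers. */
static void classify(const Bsu *stack, int top, Set have[NF], const Set callees[NF]) {
    assert(top >= 0);
    while (top > 0) {
        const Bsu *b = &stack[--top];                /* pop the top BSU */
        for (int f = 0; f < NF; f++)
            for (int a = 0; a < 2 && b->alts[a]; a++)
                if ((have[f] & b->alts[a]) == b->alts[a]) {
                    have[f] |= BIT(b->marker);       /* insert marker */
                    break;
                }
        bubble_up(have, callees);
    }
}
```

In a toy program where function 0 calls one function that reads the keyboard state, one that calls ToAscii, and one that opens, writes, and closes a file, no single callee matches KeyLoggerToFile; the bubbled-up markers let function 0 match it.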
  • Besides key logging, the behavior computation processes and BSU systems have many other uses, such as detecting software code injection into processes under Windows and detecting various anti-virtualization, anti-sandboxing, and debugging techniques. In other alternative systems, different BSUs recognize FTP and HTTP operations, allowing the BSUs to perform many different detections and classifications.
  • Identifying the starting point of execution of a program is especially difficult in compiled software and even more challenging in legacy applications. This problem is solved by an alternative embodiment of this disclosure, referred to as the finding main system. In the C language, the function called at program startup is named main. The main( ) function is defined with a return type of “int” and either with no parameters (as shown below):
      • int main(void){/* . . . */}
        or with two parameters (referred to here as “argc” and “argv,” though any names may be used, as they are local to the function in which they are declared) as shown below:
      • int main(int argc, char *argv[ ]){/* . . . */}
        or, optionally (under Windows),
      • int main(int argc, char *argv[ ], char *envp[ ]){/* . . . */}
        The special function named main is the starting point of execution for C and C++ programs. The main function is not predefined by the compiler; it is supplied in the program text. Thus, execution of the user-defined functionality of a C/C++ program starts at main( ), and main( ) takes 0, 2, or 3 arguments.
  • For each local function, the finding main system computes a path-based complexity measure, which increases with the number of potential software execution paths through the function. The computation steps into called functions, so the complexity of a called function is reflected in the complexity of its caller. Because the complexity is very large for some programs, the finding main system uses a log scale. For example, for the statement ‘if (a) b( ); else c( ); d; if (e) f( ); else g( );’ there are four possible execution paths (two two-way branches in sequence, 2 × 2 = 4), so the complexity is log10(4) ≈ 0.60.
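The path count that underlies this measure can be sketched over a hypothetical statement tree: a sequence multiplies the path counts of its parts, a branch adds the counts of its arms, and a call to a local function steps into the callee's body. The `Stmt` representation is an assumption for illustration, and the sketch assumes an acyclic call graph (recursion would need a cap or a summary). It returns the raw count; applying log10 to it gives the log-scale measure, and a production version might accumulate in log space to avoid overflow.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ATOM, SEQ, BRANCH, CALL } Kind;

/* Hypothetical statement tree for a local function body. */
typedef struct Stmt Stmt;
struct Stmt {
    Kind kind;
    const Stmt *const *kids;  /* SEQ/BRANCH: NULL-terminated children */
    const Stmt *callee;       /* CALL: body of a local callee; NULL if external */
};

/* Count execution paths through s. Assumes no recursive calls. */
static double paths(const Stmt *s) {
    assert(s != NULL);
    double n;
    switch (s->kind) {
    case SEQ:                      /* paths multiply across a sequence */
        n = 1.0;
        for (const Stmt *const *k = s->kids; *k != NULL; k++) n *= paths(*k);
        return n;
    case BRANCH:                   /* paths add across the arms of a branch */
        n = 0.0;
        for (const Stmt *const *k = s->kids; *k != NULL; k++) n += paths(*k);
        return n;
    case CALL:                     /* step into local callees */
        return s->callee != NULL ? paths(s->callee) : 1.0;
    case ATOM:
    default:
        return 1.0;
    }
}
```

The example statement is a sequence of a two-arm branch, a plain statement `d`, and another two-arm branch, for which `paths` returns 4; a branch whose arms are a call into that function and a plain statement has 5 paths.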
  • To detect main, the finding main system computes the complexity of every local function in the program and identifies main( ) as the function with the highest path-based complexity measure that accepts two or three arguments (zero-argument main( ) functions tend to be rare).
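The selection rule itself is short. In this sketch the `LocalFn` record (entry address, argument count, and log10 path complexity) is a hypothetical summary produced by earlier analysis, and the example values in the usage are made up. Because log10 is monotonic, comparing log-scale values picks the same function as comparing raw path counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function summary from earlier analysis. */
typedef struct {
    unsigned long addr;  /* entry address of the local function */
    int nargs;           /* number of arguments it accepts */
    double log_paths;    /* log10 of its path-based complexity */
} LocalFn;

/* Index of the most complex function taking 2 or 3 arguments,
   or -1 if there is none. Ties keep the first candidate. */
static int find_main(const LocalFn *fns, size_t n) {
    assert(fns != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (fns[i].nargs != 2 && fns[i].nargs != 3) continue;
        if (best < 0 || fns[i].log_paths > fns[best].log_paths) best = (int)i;
    }
    return best;
}
```

A function with a higher complexity but one or zero arguments is skipped, so it cannot be chosen as main.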
  • To visualize the function calls in software such as that described above, an interactive, dynamic visualization user interface allows users to explore the call graph and view only the functions that can reach, or are reachable from, a selected function. The functions are color coded by their ancestral relation and their function call distance to the selected function. In some systems, the user interface applies different color scales and adapts a color mapping function through a coloring algorithm to support tasks such as calling out localizations and identifying processes and/or values. The disclosed processes and systems allow the dynamic visualization user interface to render the analysis by visually identifying and differentiating, at a glance, how the other functions of the program relate to the selected function.
  • Initially, the dynamic visualization user interface may render the entire call graph on a display with no special formatting. The user interface allows the user to select the vertex of a function to be traced. When a function is selected (automatically or by a user), the user interface renders a graph that hides, without deleting them from memory, all functions that neither precede the selected function in a call chain nor can be reached from it through a chain of function calls. The dynamic visualization user interface can therefore render a partial view that displays only the part of the call graph related to that function (or to another function), or the entire call graph, without having to recreate it.
  • The function-driven color-coding visualization rendered by the coloring algorithm identifies which functions precede the selected function (that is, the functions that can reach the selected function through some chain of function calls) and which functions follow the selected function (that is, the functions that can be reached from the selected function through some chain of function calls). In one system and process, the saturation of the colors on a display indicates a relative call distance (the minimum number of function calls needed to reach the selected function or to be reached from the selected function) of a function to the selected function.
  • For example, in one color-coding approach, the selected function may be colored white. Preceding functions are colored on one color scale, for example shades of red, with lighter, less saturated reds indicating a shorter function call distance to the selected function. Functions following the selected function may be colored on an alternative scale, for example shades of blue, with lighter, less saturated blues indicating a shorter function call distance. A function that can both precede and follow the selected function may be colored on yet another scale, for example shades of violet, with lighter, less saturated violets indicating a shorter function call distance.
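One way to realize this white-to-hue scheme is a linear blend from white toward a base color, with the blend fraction equal to the call distance divided by the largest distance shown. The base RGB values, the `Relation` enum, and the linear ramp below are assumptions; the description only requires that lighter, less saturated shades mark shorter distances.

```c
#include <assert.h>

typedef enum { SELECTED, ANCESTOR, DESCENDANT, BOTH } Relation;
typedef struct { unsigned char r, g, b; } Rgb;

/* Blend from white (distance 0) toward the relation's base hue
   (distance max_dist), using integer arithmetic. */
static Rgb node_color(Relation rel, int dist, int max_dist) {
    static const Rgb base[] = {
        [SELECTED]   = { 255, 255, 255 },  /* white */
        [ANCESTOR]   = { 255, 0, 0 },      /* red */
        [DESCENDANT] = { 0, 0, 255 },      /* blue */
        [BOTH]       = { 128, 0, 128 },    /* violet */
    };
    assert(max_dist >= 1 && dist >= 0 && dist <= max_dist);
    Rgb h = base[rel];
    Rgb c = {
        (unsigned char)(255 - (255 - h.r) * dist / max_dist),
        (unsigned char)(255 - (255 - h.g) * dist / max_dist),
        (unsigned char)(255 - (255 - h.b) * dist / max_dist),
    };
    return c;
}
```

With a largest distance of 4, a direct callee (distance 1) comes out as the pale blue (192, 192, 255), while an ancestor at distance 4 is fully saturated red.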
  • The function-driven, distance-measuring color-coding algorithm for ancestor and descendant coloring may first execute a breadth-first search that follows backward edges to locate and identify the preceding functions coupled to the selected function, storing each function's call distance in memory as it goes. When this breadth-first search encounters a function that has not yet been marked as an ancestor, it marks that function as an ancestor and sets its function call distance to the current depth of the breadth-first search.
  • After performing the breadth-first search that identifies the preceding functions and their function call distance to the selected function, the function-driven distance measuring color-coding algorithm executes a second breadth-first search that identifies the functions that follow the selected function and their function call distance from the selected function. When the second breadth-first search encounters a function that hasn't been previously marked as either an ancestor or descendant, it marks that function as a descendant in memory and it sets its function call distance to the current depth of the breadth-first search.
  • If the second breadth-first search encounters a function that was previously marked as an ancestor by the first breadth-first search, it marks that function as both an ancestor and a descendant. If that function's current function call distance to the selected function is greater than the current depth of the second breadth-first search, its function call distance is set to that depth, since a shorter distance has been found.
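The two searches can be sketched in C as below. The adjacency-matrix `CallGraph`, the `ANC`/`DESC` flags, and the `MAXN` size limit are assumptions made for a compact example. The first search follows callee-to-caller edges and marks ancestors; the second follows caller-to-callee edges, marks descendants, upgrades functions already marked as ancestors to both, and keeps the shorter distance. Functions left unmarked (other than the selected one) are the ones the user interface would hide without deleting.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16
enum { ANC = 1, DESC = 2 };     /* relation flags; ANC | DESC means both */

typedef struct {
    int n;                      /* number of functions */
    bool calls[MAXN][MAXN];     /* calls[u][v]: function u calls function v */
} CallGraph;

/* One breadth-first search from sel along caller->callee edges, or
   callee->caller edges when backward is true. */
static void bfs(const CallGraph *g, int sel, bool backward, int flag,
                int rel[], int dist[]) {
    int queue[MAXN], depth[MAXN], head = 0, tail = 0;
    bool seen[MAXN] = { false };
    seen[sel] = true;
    queue[tail] = sel;
    depth[tail++] = 0;
    while (head < tail) {
        int u = queue[head], d = depth[head];
        head++;
        for (int v = 0; v < g->n; v++) {
            bool edge = backward ? g->calls[v][u] : g->calls[u][v];
            if (!edge || seen[v]) continue;
            seen[v] = true;
            queue[tail] = v;
            depth[tail++] = d + 1;
            if (rel[v] == 0) {          /* first time this function is related */
                rel[v] = flag;
                dist[v] = d + 1;
            } else {                    /* already an ancestor: now both */
                rel[v] |= flag;
                if (dist[v] > d + 1) dist[v] = d + 1;
            }
        }
    }
}

/* Mark ancestors (first search) and descendants (second search) of sel. */
static void relate(const CallGraph *g, int sel, int rel[], int dist[]) {
    assert(g->n <= MAXN && sel >= 0 && sel < g->n);
    for (int v = 0; v < g->n; v++) { rel[v] = 0; dist[v] = -1; }
    dist[sel] = 0;
    bfs(g, sel, true, ANC, rel, dist);
    bfs(g, sel, false, DESC, rel, dist);
}
```

For a graph where function 0 calls 1 and 4, 1 calls 2, 2 calls 3, and 3 calls 1, selecting function 2 marks 0 as an ancestor at distance 2, marks 1 and 3 as both at distance 1, and leaves 4 unmarked (hidden).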
  • The dynamic visualization user interface allows users to further explore the call graph by selecting functions that precede or follow the selected function in the current function call graph visualization. Selecting a preceding function can reveal more of the call graph, since more functions are reachable from ancestral nodes. Selecting a function that follows the selected function automatically hides paths and functions of the graph that were reachable from the previously selected function but are not reachable from the newly selected one. Hidden paths and functions are not deleted, so the user interface does not have to recreate them if they are later rendered. As a point of reference for the following examples, the “main” function of each visualized program is the root of the function call graph. In these examples, main precedes all other functions in the function call graph, so the entire function call graph can be viewed by selecting main.
  • For example, consider tracing a program called DeterministicDice.exe, which parses a dice-rolling expression given as input by the user (such as 2d6+4, which means roll two six-sided dice and add 4 to the result) and produces a pseudo-random integer result from the dice roll. FIG. 13 shows a distant view of the exemplary function call graph for the program, with the “main” function selected (it is not shaded). Since all functions in the program follow main, all the other functions are colored a shade of blue proportional to their function call distance from main. The arrows show the direction in which functions are called: caller functions are at the tail end of the arrows, and callee functions are at the head end.
  • FIG. 14 shows a close-up view with the local function at 0x004011B2 selected. The dynamic visualization user interface filters the view of the program's function call graph to show only the functions that precede and follow this local function. Functions that precede it, including main, are colored a shade of red proportional to their function call distance from local function 0x004011B2. Functions that follow local function 0x004011B2 are colored on a different color scale, such as a shade of blue, proportional to their function call distance from it.
  • FIG. 15 displays the view with the external function “strtok” selected; strtok is the standard C function used to tokenize a string. Here, the view is automatically filtered to show only the paths through the program's local functions that precede it, that is, the call paths by which the program reaches strtok.
  • In another example, the dynamic visualization user interface automatically analyzes a utility program, which displays information about a user on a remote computer (usually one running UNIX) running the Finger service or daemon. It is called finger.exe. FIG. 16 shows a distant view of the entire function call graph for the program with main selected and FIG. 17 shows a close-up, filtered view of the graph with the local function 0x01001D21 selected. The two local functions that precede it in the program are displayed in red, with main (0x01001493) being the deepest shade of red. The functions that follow local function 0x01001D21, including the external functions “FormatMessageA”, “LocalFree”, and “s_perror” are colored shades of blue.
  • The dynamic visualization user interface may also automatically analyze diagnostic software, such as PING.exe, a program that performs network diagnostics and tests the reachability of a remote computer at an IPv4 or IPv6 address. FIG. 18 shows a distant view of the function call graph for the entire program, and FIG. 19 shows a close-up view with the external function “memcpy” selected; memcpy is a standard C function used to copy bytes from one block of memory to another. With “memcpy” selected, the dynamic visualization user interface automatically filters the view of the program's function call graph to display only the paths of function calls leading to “memcpy”, while hiding the remaining paths.
  • The methods, devices, systems, and logic described above may be implemented in many different ways in many different combinations of hardware, software, or both hardware and software. For example, all or parts of the system may comprise software or circuitry in one or more controllers, one or more microprocessors (CPUs), one or more signal processors (SPUs), one or more graphics processors (GPUs), one or more application specific integrated circuits (ASICs), one or more programmable media, or any and all combinations of such hardware. All or part of the logic, specialized processes, and systems described may be implemented as instructions for execution by multi-core processors (e.g., CPUs, SPUs, and/or GPUs), a controller, or other processing devices, including exascale computers and compute clusters, and may be displayed through a display driver in communication with a remote or local display, or stored in a tangible or non-transitory machine-readable or computer-readable medium such as flash memory, random access memory (RAM) or read only memory (ROM), erasable programmable read only memory (EPROM) or other machine-readable medium such as a compact disc read only memory (CDROM), or magnetic or optical disk. Thus, a product, such as a computer program product, may include a storage medium and computer readable instructions stored on the medium, which when executed in an endpoint, computer system, or other device, cause the device to perform operations according to any of the description above.
  • The systems may evaluate software and data structures through processors (e.g., CPUs, SPUs, GPUs, etc.), memory, and interconnects shared and/or distributed among multiple system components, such as among multiple processors and memories, including multiple distributed processing systems. Parameters, databases, software, and data structures used to evaluate and analyze these systems or logic may be separately stored and managed, may be incorporated into a single memory or database, may be logically and/or physically organized in many different ways, and may be implemented in many ways, including data structures such as linked lists, programming libraries, or implicit storage mechanisms. Programs may be parts (e.g., subroutines) of a single program, separate programs, application programs, or programs distributed across several memories and processor cores and/or processing nodes, or implemented in many different ways, such as in a library, such as a shared library. The library may store behavior abstractions used to analyze the behavior functionality described herein. While various embodiments have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible.
  • The term “coupled” as used in this description may encompass both direct and indirect coupling. Thus, first and second parts are said to be coupled together when they directly contact one another, as well as when the first part couples to an intermediate part which couples either directly or via one or more additional intermediate parts to the second part. The term “substantially” or “about” may encompass a range that is largely, but not necessarily wholly, that which is specified. It encompasses all but a significant amount. When devices are responsive to or occur in response to commands, events, and/or requests, the actions and/or steps of the devices, such as the operations that devices are performing, necessarily occur as a direct or indirect result of the preceding commands, events, actions, and/or requests. In other words, the operations occur as a result of the preceding operations. A device that is responsive to another requires more than that an action (i.e., the device's response) merely follow another action.
  • While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.

Claims (20)

What is claimed is:
1. A behavior computation process comprising:
transforming compiled software code into a semantic form of the compiled software;
transforming the semantic form of the software code into a structured form;
computing code behavior and analyzing behavior functionality by processing precise programming behavior abstractions stored in a data repository; and
classifying the software code as malware based on the computing code behavior and the analysis of the behavior functionality.
2. The process of claim 1 where the process transforms the compiled software code, transforms the semantic form of the software code, computes code behavior and analyzes behavior functionality, and classifies the software code without decompiling the software code.
3. The process of claim 1 where the process that classifies the software code classifies by identifying patterns in external call behavior.
4. The process of claim 1 where the software code comprises operating system software that manages computer hardware and software resources and provides common services for computer programs.
5. The process of claim 4 where the software code further comprises application software.
6. The process of claim 1 where the process that classifies the software code classifies by processing computing libraries comprising behavior abstraction data stored in computer files of the computer's memory.
7. The process of claim 1 where the behavior functionality is accessible through a hierarchical structure of behavior levels stored in a memory stack.
8. The process of claim 1 where the transformed code is statically extracted by Hyperion-like software that determines the compiled software's intentions without running the compiled software itself or processing its source code.
9. The process of claim 1 where the semantic form of the compiled software comprises an intermediary code that represents the meaning of the code rather than a structure of the computer language.
10. A method of identifying the starting point of execution of a compiled program comprising:
calculating a path based complexity by processing the execution paths of the local functions of a program;
identifying the number of arguments passed to each local function;
deriving a logarithm of the complexity measure; and
identifying a first starting point of execution of the compiled program based on the derived logarithm and the arguments passed in the function calls.
11. The method of claim 10 where the path based complexity measures the total number of software execution paths.
12. The method of claim 10 where the path based complexity measures the number of call functions.
13. The method of claim 10 where the path based complexity is a logarithmic measure.
14. The method of claim 10 where the starting point of execution of the compiled program comprises the starting point having the highest path based complexity measure that accepts two or three arguments.
15. A dynamic visualization user interface comprising:
a central processing unit processing executable code accessed from a random access memory, in which the executable code:
identifies the ancestral and dependent paths coupled to a selected computer function;
performs a first breadth-first search that follows backward-edges of a function to locate and identify the preceding functions that are coupled to the selected computer function;
performs a second breadth-first search that identifies the computer functions that follow the selected computer function and their function call distance from the selected computer function; and
renders a color-coded representation of the processing paths that are color coded by their ancestral relation and function call distances to the selected computer function on a display.
16. The process of claim 15 where the dynamic visualization user interface is responsive to the selected computer function by concealing all the call functions that cannot be reached without deleting them from a display.
17. The process of claim 15 where a saturation of the colors indicates a relative call distance of a function to the selected computer function.
18. The process of claim 17 where the relative call distance comprises a minimum number of function calls required to reach the selected function or to be reached from the selected computer function.
19. The process of claim 17 where the ancestral paths comprise paths that can reach the selected computer function through some chain of function calls and dependent paths comprise paths that flow from the selected computer function.
20. The process of claim 17 where the scale of the color-coded representations is responsive to localizations and identification of processes.
US14/820,976 2014-08-07 2015-08-07 Behavior specification, finding main, and call graph visualizations Abandoned US20160042180A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/820,976 US20160042180A1 (en) 2014-08-07 2015-08-07 Behavior specification, finding main, and call graph visualizations
US15/906,831 US10198580B2 (en) 2014-08-07 2018-02-27 Behavior specification, finding main, and call graph visualizations

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462034410P 2014-08-07 2014-08-07
US14/820,976 US20160042180A1 (en) 2014-08-07 2015-08-07 Behavior specification, finding main, and call graph visualizations

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/906,831 Continuation US10198580B2 (en) 2014-08-07 2018-02-27 Behavior specification, finding main, and call graph visualizations

Publications (1)

Publication Number Publication Date
US20160042180A1 true US20160042180A1 (en) 2016-02-11

Family

ID=55267623

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/820,976 Abandoned US20160042180A1 (en) 2014-08-07 2015-08-07 Behavior specification, finding main, and call graph visualizations
US15/906,831 Active US10198580B2 (en) 2014-08-07 2018-02-27 Behavior specification, finding main, and call graph visualizations

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/906,831 Active US10198580B2 (en) 2014-08-07 2018-02-27 Behavior specification, finding main, and call graph visualizations

Country Status (1)

Country Link
US (2) US20160042180A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160042179A1 (en) * 2014-08-11 2016-02-11 Sentinel Labs Israel Ltd. Method of malware detection and system thereof
US20160119366A1 (en) * 2008-10-30 2016-04-28 Mcafee, Inc. Structural recognition of malicious code patterns
US20170351597A1 (en) * 2016-06-02 2017-12-07 International Business Machines Corporation Identifying and isolating library code in software applications
US10073764B1 (en) * 2015-03-05 2018-09-11 National Technology & Engineering Solutions Of Sandia, Llc Method for instruction sequence execution analysis and visualization
US20180285567A1 (en) * 2017-03-31 2018-10-04 Qualcomm Incorporated Methods and Systems for Malware Analysis and Gating Logic
US10102374B1 (en) * 2014-08-11 2018-10-16 Sentinel Labs Israel Ltd. Method of remediating a program and system thereof by undoing operations
US10462171B2 (en) 2017-08-08 2019-10-29 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US10762200B1 (en) 2019-05-20 2020-09-01 Sentinel Labs Israel Ltd. Systems and methods for executable code detection, automatic feature extraction and position independent code detection
CN111628997A (en) * 2020-05-26 2020-09-04 China United Network Communications Group Co., Ltd. (中国联合网络通信集团有限公司) Attack prevention method and device
US20210397710A1 (en) * 2014-08-11 2021-12-23 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US11321155B2 (en) * 2017-08-10 2022-05-03 Bank Of America Corporation Automatic resource dependency tracking and structure for maintenance of resource fault propagation
US11579857B2 (en) 2020-12-16 2023-02-14 Sentinel Labs Israel Ltd. Systems, methods and devices for device fingerprinting and automatic deployment of software in a computing network using a peer-to-peer approach
US11616812B2 (en) 2016-12-19 2023-03-28 Attivo Networks Inc. Deceiving attackers accessing active directory data
US11695800B2 (en) 2016-12-19 2023-07-04 SentinelOne, Inc. Deceiving attackers accessing network data
US11888897B2 (en) 2018-02-09 2024-01-30 SentinelOne, Inc. Implementing decoys in a network environment
US11899782B1 (en) 2021-07-13 2024-02-13 SentinelOne, Inc. Preserving DLL hooks
US11997139B2 (en) 2023-03-13 2024-05-28 SentinelOne, Inc. Deceiving attackers accessing network data

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN109814924B (en) * 2019-01-28 2020-10-02 East China Normal University (华东师范大学) Software complexity calculation method

Citations (11)

Publication number Priority date Publication date Assignee Title
US5894311A (en) * 1995-08-08 1999-04-13 Jerry Jackson Associates Ltd. Computer-based visual data evaluation
US6163882A (en) * 1997-04-03 2000-12-19 Nec Corporation Language processing apparatus for converting source program into object program
US6356285B1 (en) * 1997-12-17 2002-03-12 Lucent Technologies, Inc System for visually representing modification information about an characteristic-dependent information processing system
US6363435B1 (en) * 1998-02-03 2002-03-26 Microsoft Corporation Event sourcing and filtering for transient objects in a hierarchical object model
US20030233640A1 (en) * 2002-04-29 2003-12-18 Hewlett-Packard Development Company, L.P. Structuring program code
US6721275B1 (en) * 1999-05-03 2004-04-13 Hewlett-Packard Development Company, L.P. Bridged network stations location revision
US20040111719A1 (en) * 2002-12-09 2004-06-10 Sun Microsystems, Inc. Method for safely instrumenting large binary code
US20050223238A1 (en) * 2003-09-26 2005-10-06 Schmid Matthew N Methods for identifying malicious software
US20130198841A1 (en) * 2012-01-30 2013-08-01 Cisco Technology, Inc. Malware Classification for Unknown Executable Files
US20150104106A1 (en) * 2013-10-16 2015-04-16 Canon Kabushiki Kaisha Method, system and apparatus for determining a contour segment for an object in an image captured by a camera
US20160357965A1 (en) * 2015-06-04 2016-12-08 Ut Battelle, Llc Automatic clustering of malware variants based on structured control flow

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US5778212A (en) * 1996-06-03 1998-07-07 Silicon Graphics, Inc. Interprocedural analysis user interface
US7996825B2 (en) * 2003-10-31 2011-08-09 Hewlett-Packard Development Company, L.P. Cross-file inlining by using summaries and global worklist
US8990792B2 (en) * 2008-05-26 2015-03-24 Samsung Electronics Co., Ltd. Method for constructing dynamic call graph of application
US8473928B2 (en) * 2010-04-19 2013-06-25 Sap Ag Call graph simplification/comparison and automatic initial suspects finding of performance degradations
US8627291B2 (en) * 2012-04-02 2014-01-07 International Business Machines Corporation Identification of localizable function calls
US9092568B2 (en) * 2012-04-30 2015-07-28 Nec Laboratories America, Inc. Method and system for correlated tracing with automated multi-layer function instrumentation localization
WO2014041561A2 (en) * 2012-08-31 2014-03-20 Iappsecure Solutions Pvt. Ltd. A system for analyzing applications accurately for finding security and quality issues
US8984495B2 (en) * 2013-01-03 2015-03-17 International Business Machines Corporation Enhanced string analysis that improves accuracy of static analysis
US9098627B2 (en) * 2013-03-06 2015-08-04 Red Hat, Inc. Providing a core dump-level stack trace
US8997256B1 (en) * 2014-03-31 2015-03-31 Terbium Labs LLC Systems and methods for detecting copied computer code using fingerprints

Non-Patent Citations (3)

Title
Hall et al., "Efficient Call Graph Analysis," ACM Letters on Programming Languages and Systems, Vol. 1, No. 3, pp. 227-242, ACM, September 1992 *
Hu et al., "Large-Scale Malware Indexing Using Function-Call Graphs," CCS '09, ACM, November 9-13, 2009 *
Joris Kinable, "Malware Detection Through Call Graphs," Master's thesis, Aalto University, June 30, 2010 *

Cited By (43)

Publication number Priority date Publication date Assignee Title
US9680847B2 (en) * 2008-10-30 2017-06-13 Mcafee, Inc. Structural recognition of malicious code patterns
US20160119366A1 (en) * 2008-10-30 2016-04-28 Mcafee, Inc. Structural recognition of malicious code patterns
US11507663B2 (en) * 2014-08-11 2022-11-22 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US9710648B2 (en) * 2014-08-11 2017-07-18 Sentinel Labs Israel Ltd. Method of malware detection and system thereof
US20160042179A1 (en) * 2014-08-11 2016-02-11 Sentinel Labs Israel Ltd. Method of malware detection and system thereof
US11625485B2 (en) 2014-08-11 2023-04-11 Sentinel Labs Israel Ltd. Method of malware detection and system thereof
US10102374B1 (en) * 2014-08-11 2018-10-16 Sentinel Labs Israel Ltd. Method of remediating a program and system thereof by undoing operations
US10417424B2 (en) * 2014-08-11 2019-09-17 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US20210397710A1 (en) * 2014-08-11 2021-12-23 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US11886591B2 (en) * 2014-08-11 2024-01-30 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US10664596B2 (en) 2014-08-11 2020-05-26 Sentinel Labs Israel Ltd. Method of malware detection and system thereof
US10977370B2 (en) * 2014-08-11 2021-04-13 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US10073764B1 (en) * 2015-03-05 2018-09-11 National Technology & Engineering Solutions Of Sandia, Llc Method for instruction sequence execution analysis and visualization
US20170351597A1 (en) * 2016-06-02 2017-12-07 International Business Machines Corporation Identifying and isolating library code in software applications
US10423408B2 (en) * 2016-06-02 2019-09-24 International Business Machines Corporation Identifying and isolating library code in software applications
US11616812B2 (en) 2016-12-19 2023-03-28 Attivo Networks Inc. Deceiving attackers accessing active directory data
US11695800B2 (en) 2016-12-19 2023-07-04 SentinelOne, Inc. Deceiving attackers accessing network data
US20180285567A1 (en) * 2017-03-31 2018-10-04 Qualcomm Incorporated Methods and Systems for Malware Analysis and Gating Logic
US11245715B2 (en) 2017-08-08 2022-02-08 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11876819B2 (en) 2017-08-08 2024-01-16 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11212309B1 (en) 2017-08-08 2021-12-28 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11290478B2 (en) 2017-08-08 2022-03-29 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11973781B2 (en) 2017-08-08 2024-04-30 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US10462171B2 (en) 2017-08-08 2019-10-29 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11522894B2 (en) 2017-08-08 2022-12-06 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11245714B2 (en) 2017-08-08 2022-02-08 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11838306B2 (en) 2017-08-08 2023-12-05 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US10841325B2 (en) 2017-08-08 2020-11-17 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11838305B2 (en) 2017-08-08 2023-12-05 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11722506B2 (en) 2017-08-08 2023-08-08 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11716342B2 (en) 2017-08-08 2023-08-01 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11716341B2 (en) 2017-08-08 2023-08-01 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
US11321155B2 (en) * 2017-08-10 2022-05-03 Bank Of America Corporation Automatic resource dependency tracking and structure for maintenance of resource fault propagation
US11888897B2 (en) 2018-02-09 2024-01-30 SentinelOne, Inc. Implementing decoys in a network environment
US10762200B1 (en) 2019-05-20 2020-09-01 Sentinel Labs Israel Ltd. Systems and methods for executable code detection, automatic feature extraction and position independent code detection
US11790079B2 (en) 2019-05-20 2023-10-17 Sentinel Labs Israel Ltd. Systems and methods for executable code detection, automatic feature extraction and position independent code detection
US11580218B2 (en) 2019-05-20 2023-02-14 Sentinel Labs Israel Ltd. Systems and methods for executable code detection, automatic feature extraction and position independent code detection
US11210392B2 (en) 2019-05-20 2021-12-28 Sentinel Labs Israel Ltd. Systems and methods for executable code detection, automatic feature extraction and position independent code detection
CN111628997A (en) * 2020-05-26 2020-09-04 China United Network Communications Group Co., Ltd. (中国联合网络通信集团有限公司) Attack prevention method and device
US11748083B2 (en) 2020-12-16 2023-09-05 Sentinel Labs Israel Ltd. Systems, methods and devices for device fingerprinting and automatic deployment of software in a computing network using a peer-to-peer approach
US11579857B2 (en) 2020-12-16 2023-02-14 Sentinel Labs Israel Ltd. Systems, methods and devices for device fingerprinting and automatic deployment of software in a computing network using a peer-to-peer approach
US11899782B1 (en) 2021-07-13 2024-02-13 SentinelOne, Inc. Preserving DLL hooks
US11997139B2 (en) 2023-03-13 2024-05-28 SentinelOne, Inc. Deceiving attackers accessing network data

Also Published As

Publication number Publication date
US20180189487A1 (en) 2018-07-05
US10198580B2 (en) 2019-02-05

Legal Events

Date Code Title Description
AS Assignment

Owner name: U.S. DEPARTMENT OF ENERGY, DISTRICT OF COLUMBIA

Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UT-BATTELLE, LLC;REEL/FRAME:037323/0853

Effective date: 20151021

AS Assignment

Owner name: UT-BATTELLE, LLC, TENNESSEE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SAYRE, KIRK D;WILLEMS, RICHARD A;LINDBERG, STEPHEN L;SIGNING DATES FROM 20151023 TO 20160122;REEL/FRAME:037596/0128

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION