EP3876128A1 - Verifying structured data - Google Patents


Info

Publication number
EP3876128A1
Authority
EP
European Patent Office
Prior art keywords
structured data
data
processing hardware
elements
activity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP21171377.1A
Other languages
German (de)
French (fr)
Inventor
Parth Shukla
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Application filed by Google LLC
Publication of EP3876128A1
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/51 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/572 Secure firmware programming, e.g. of basic input output system [BIOS]
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries

Definitions

  • This disclosure relates to verifying structured data.
  • Determining whether or not structured data on a computing device includes malicious or unexpected code can be difficult when the structured data includes mutable elements. For instance, binary data of a computing device associated with a manufacturer inevitably changes to some degree each time the computing device boots. As such, there may be differences between structured data samples taken from the same computing device at different times, or between structured data samples of the same type from different computing devices associated with the same manufacturer, that are not the result of the data being infected with bad or malicious code. Because some portions/elements of structured data are expected to change, and such changes are therefore permissible, merely identifying differences based on a comparison between structured data samples and standard structured data samples provided by a creator/manufacturer is not an accurate technique for identifying bad or malicious code.
  • Determining whether or not a structured data sample has been compromised based solely upon identified element differences in the structured data can therefore be problematic. These difficulties are compounded when verifying larger numbers of structured data samples, such as structured data samples taken from multiple computing devices in a fleet.
  • One aspect of the disclosure provides a method for verifying structured data.
  • The method includes receiving, at data processing hardware, structured data.
  • The method also includes deconstructing, by the data processing hardware, the structured data into corresponding elements.
  • The method further includes obtaining, at the data processing hardware, standard structured data having corresponding standard elements.
  • The method also includes comparing, by the data processing hardware, the elements of the structured data with the standard elements of the standard structured data to identify any element differences.
  • For each element difference, the method includes: comparing, by the data processing hardware, the element difference against a registry of element comparisons; determining, by the data processing hardware, whether the element difference is expected or unexpected based on a heuristic or at least one rule; and, when the element difference is unexpected, generating, by the data processing hardware, a signal indicating the presence of an unexpected element in the structured data.
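The method steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes each sample has already been deconstructed into a hypothetical mapping from element location to element bytes, it omits the registry of element comparisons, and it stands in a plain set of allowed locations for "a heuristic or at least one rule". The names `Elements`, `Signal`, `find_differences`, and `verify` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical representation of deconstructed structured data:
# element location (e.g. a path in the extracted tree) -> element bytes.
Elements = dict[str, bytes]


@dataclass
class Signal:
    """Hypothetical alarm payload listing unexpected element locations."""
    unexpected: list[str]


def find_differences(elements: Elements, standard: Elements) -> list[str]:
    """Locations whose data differs, or that exist on only one side."""
    locations = elements.keys() | standard.keys()
    return sorted(loc for loc in locations if elements.get(loc) != standard.get(loc))


def verify(elements: Elements, standard: Elements, expected: set[str]) -> Signal | None:
    """Return a Signal if any element difference is not covered by the rule."""
    unexpected = [loc for loc in find_differences(elements, standard) if loc not in expected]
    return Signal(unexpected) if unexpected else None


standard = {"E1": b"\x01", "E2": b"\x02", "E3": b"\x03"}
sample = {"E1": b"\xff", "E2": b"\x02", "E3": b"\x04"}
print(verify(sample, standard, expected={"E3"}))  # Signal(unexpected=['E1'])
```

Here both E1 and E3 differ from the standard, but only E1 falls outside the rule, so only E1 is signalled.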
  • Implementations of the disclosure may include one or more of the following optional features.
  • For each element of the structured data, the method includes storing the comparison between that element and the respective standard element of the standard structured data in the registry of element comparisons.
  • The method may further include statistically analyzing, by the data processing hardware, the registry of element comparisons to determine the at least one rule indicating whether the element difference is expected or unexpected.
  • For each element, the method includes determining, by the data processing hardware, whether the element comprises any sub-elements.
  • When the element comprises sub-elements, the method includes deconstructing, by the data processing hardware, the element into the corresponding sub-elements.
  • The deconstructed structured data may include a recursively extracted tree structure.
  • The method may also include receiving, at the data processing hardware, a structured data type, and obtaining, at the data processing hardware, a data structure template based on the structured data type.
  • The method may further include deconstructing, by the data processing hardware, the structured data into corresponding elements based on the data structure template, and determining, by the data processing hardware, whether the element comprises any sub-elements based on the data structure template.
  • The method includes annotating each element of the structured data as matching, differing, missing, or extra based on the comparison of the respective element with the respective standard element.
  • The method may include identifying a hash or a location of each element.
  • The method may include identifying the corresponding standard element based on the hash or the location of each element, and determining whether data of the element is matching, differing, missing, or extra relative to standard data of the corresponding standard element.
  • The method may include marking the annotation of the respective element as expected or unexpected.
  • The structured data includes binary data.
  • Another aspect of the disclosure provides a system for verifying structured data. The system includes data processing hardware and memory hardware in communication with the data processing hardware.
  • The memory hardware stores instructions that, when executed on the data processing hardware, cause the data processing hardware to perform operations.
  • The operations include receiving structured data, deconstructing the structured data into corresponding elements, obtaining standard structured data having corresponding standard elements, and comparing the elements of the structured data with the standard elements of the standard structured data to identify any element differences.
  • For each element difference, the operations include comparing the element difference against a registry of element comparisons and determining whether the element difference is expected or unexpected based on a heuristic or at least one rule.
  • When the element difference is unexpected, the operations include generating a signal indicating the presence of an unexpected element in the structured data.
  • Implementations of the disclosure may include one or more of the following optional features.
  • For each element of the structured data, the operations include storing the comparison between that element and the respective standard element of the standard structured data in the registry of element comparisons.
  • The operations may also include statistically analyzing the registry of element comparisons to determine the at least one rule indicating whether the element difference is expected or unexpected.
  • For each element, the operations include determining whether the element comprises any sub-elements.
  • When the element comprises sub-elements, the operations include deconstructing the element into the corresponding sub-elements.
  • The deconstructed structured data may include a recursively extracted tree structure.
  • The operations may also include receiving a structured data type, obtaining a data structure template based on the structured data type, deconstructing the structured data into corresponding elements based on the data structure template, and determining whether the element comprises any sub-elements based on the data structure template.
  • The operations include annotating each element of the structured data as matching, differing, missing, or extra based on the comparison of the respective element with the respective standard element.
  • The operations may include identifying a hash or a location of each element.
  • The operations may further include identifying the corresponding standard element based on the hash or the location of each element and determining whether data of the element is matching, differing, missing, or extra relative to standard data of the respective standard element.
  • The operations may include marking the annotation of the respective element as expected or unexpected.
  • The structured data includes binary data.
  • Implementations herein are directed toward a verification pipeline configured to, inter alia, determine/detect whether or not structured data includes bad or malicious code that may compromise one or more workstations in a fleet operated by an entity.
  • The structured data may include binary data, such as Basic Input/Output System (BIOS) data, that changes each time a workstation reboots.
  • As a result, comparing elements of structured data with corresponding standard elements from a golden copy of the structured data may not always yield a one-to-one match. The element differences these comparisons reveal across all the workstations within the fleet may be statistically analyzed so that whitelists can be generated automatically.
  • These automatically-generated whitelists may specify whether or not an element difference is expected, i.e., due to mutations that are expected to occur, or unexpected, i.e., due to being infected by bad or malicious code.
  • Existing whitelists may also be updated to fine-tune the verification process for determining whether or not an element difference is expected or unexpected. For instance, if the verification pipeline observes that a majority of samples of structured data in the fleet contain a corresponding element difference that a whitelist specifies as unexpected, the verification pipeline may update the whitelist so that the corresponding element difference is treated as expected. Implementations further include notifying an operator of the fleet (e.g., a verification device) when the presence of an unexpected element difference is detected. The operator of the fleet may assess whether or not the unexpected element difference is the result of bad or malicious code that may compromise the workstations in the fleet.
  • An example system 100 includes one or more user devices 102, 102a-n, each associated with a respective user 10 and in communication with a remote system 110 via a network 120.
  • Each user device 102 may correspond to a computing device, such as a desktop workstation or laptop workstation.
  • The remote system 110 may be a distributed system (e.g., a cloud environment) having scalable/elastic computing resources 112 (e.g., data processing hardware) and/or storage resources 114.
  • The computing resources 112 and/or storage resources 114 may also communicate with a verification device 180 over the network 120.
  • The computing resources 112 of the remote system 110 execute a verifier 150 that receives a sample of structured data 200 from one or more user devices 102.
  • An entity operating the remote system 110 may own a fleet of user devices 102, each associated with a corresponding user 10 employed by the entity, and each user device 102 may provide its sample of structured data 200 to the verifier 150 to verify that the contents of the structured data 200 have not been compromised.
  • The verifier 150 determines whether or not the structured data 200 has been infected with bad or malicious code that may compromise the user device 102 sourcing the structured data 200 and/or compromise multiple user devices 102 among a fleet in communication with each other via the network 120.
  • The storage resources 114 implement data storage hardware 160, and the data processing hardware 112 is in communication with the data storage hardware 160.
  • The verification device 180 is in communication with the verifier 150 (e.g., via the network 120) and provides one or more inputs 190 to the verifier 150.
  • The verification device 180 may send an input 190 to the verifier 150 requesting verification of structured data 200 from one or more user devices 102 in the fleet.
  • The verification device 180 may execute a user interface 182 on a display 184 of the verification device 180 to allow an operator of the verification device 180 to communicate with the verifier 150.
  • The inputs 190 may further include thresholds/constraints for determining whether the structured data 200 includes any element differences 430 when compared to corresponding standard structured data 250.
  • The thresholds/constraints may include a percentage of acceptability used to determine whether the structured data 200 is matching or differing.
  • The inputs 190 may further include a heuristic or at least one rule for determining whether an identified element difference 430 is expected or unexpected.
  • The structured data 200 is associated with one or more attributes 202.
  • The attributes 202 of the structured data 200 include at least one of creator information 202a, version information 202b, or a data type 202c.
  • The creator information 202a may indicate a creator/manufacturer of the user device 102 sourcing the structured data 200, while the version information 202b may indicate a version associated with the structured data 200.
  • The data type 202c specifies the type of data the structured data 200 represents. For instance, the data type 202c may indicate that the structured data 200 represents a Portable Executable (PE) file that encapsulates executable code for loading an operating system on the user device 102.
  • The data type 202c may further indicate that the structured data 200 is associated with an installer, a certificate, a zip file, or Basic Input/Output System (BIOS) firmware.
  • BIOS firmware may be pre-installed on the user device 102 by its manufacturer (e.g., as specified by the creator information 202a) for performing hardware initialization during the boot process and/or providing runtime services for operating systems and programs executing on the user device 102.
  • Structured data 200 associated with BIOS firmware is generally mutable because portions/elements of the structured data 200 may change each time the user device 102 reboots.
  • The verifier 150 of the data processing hardware 112 implements a deconstructor 300, a structured data comparator 400, and an element difference analyzer 500.
  • The deconstructor 300 is configured to deconstruct/extract the structured data 200 received from the user device 102 into corresponding data elements 210, 210a-d.
  • In the example shown, the deconstructed structured data 200 includes a first element 210a, a second element 210b, a third element 210c, and a fourth element 210d.
  • In other examples, the deconstructor 300 may deconstruct each sample of structured data 200 into any number of data elements 210, depending on the structured data 200 under deconstruction.
  • The deconstructed structured data 200 includes a recursively extracted tree structure.
  • The deconstructor 300 includes a structured data type determiner 310 that determines the data type 202c of the received sample of structured data 200 and then provides the data type 202c to a data structure template module 320 configured to obtain a data structure template 340 based on the data type 202c.
  • The data structure template 340 may be provided by the creator/manufacturer of the user device 102 that is the source of the structured data 200.
  • The structured data type determiner 310 may also determine the creator information 202a and the version information 202b of the sample of structured data 200 for use in obtaining the data structure template 340.
  • The data structure template 340 may provide instructions for deconstructing the structured data 200 into the corresponding data elements 210.
  • The data structure template module 320 may reside on the data storage hardware 160 and may store multiple data structure templates 340, each associated with a corresponding data type 202c (and optionally a corresponding creator 202a and/or version 202b), that provide instructions for deconstructing/extracting structured data 200 of the corresponding data type 202c.
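A template store keyed by the attributes 202 can be sketched as follows. This is an illustrative assumption, not the disclosed design: templates are plain dictionaries with a hypothetical `parser` field, and because the text says creator and version are only optionally part of the key, the sketch falls back from the most specific key to a generic per-data-type template (a `None` creator/version). The classes `Attributes` and `TemplateModule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attributes:
    """Attributes 202: creator 202a, version 202b and data type 202c."""
    creator: Optional[str]
    version: Optional[str]
    data_type: str


class TemplateModule:
    """Hypothetical template store; None creator/version marks a generic template."""

    def __init__(self) -> None:
        self._templates: dict[Attributes, dict] = {}

    def register(self, attrs: Attributes, template: dict) -> None:
        self._templates[attrs] = template

    def lookup(self, attrs: Attributes) -> dict:
        # Most specific key first, then fall back to less specific keys.
        for key in (
            attrs,
            Attributes(attrs.creator, None, attrs.data_type),
            Attributes(None, None, attrs.data_type),
        ):
            if key in self._templates:
                return self._templates[key]
        raise KeyError(f"no data structure template for {attrs}")


module = TemplateModule()
module.register(Attributes(None, None, "BIOS Firmware"), {"parser": "generic-bios"})
module.register(Attributes("Manufacturer XYZ", "Version 2.1", "BIOS Firmware"), {"parser": "xyz-2.1"})
print(module.lookup(Attributes("Manufacturer XYZ", "Version 2.1", "BIOS Firmware")))  # {'parser': 'xyz-2.1'}
print(module.lookup(Attributes("Manufacturer XYZ", "Version 3.0", "BIOS Firmware")))  # {'parser': 'generic-bios'}
```

The fallback order is a design choice made for the sketch; a stricter system might refuse to deconstruct a sample when no exact template exists.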
  • The structured data 200 may be deconstructed into a recursively extracted tree structure, with the template 340 used to perform the deconstruction.
  • The deconstructor 300 may further include an element deconstructor 330 that uses the data structure template 340 to deconstruct/extract the structured data 200 into the corresponding data elements 210, 210a-d (e.g., E1, E2, E3, E4).
  • The element deconstructor 330 executes an appropriate parser configured to deconstruct/extract the structured data 200 into the corresponding data elements 210.
  • The deconstructor 300 also implements a sub-element deconstructor 350 that determines whether any of the data elements 210 include sub-elements 220, 220a-c and, for each data element 210 that does, deconstructs the data element 210 into the corresponding sub-elements 220.
  • In the example, the sub-element deconstructor 350 determines that the third element 210c includes sub-elements 220, 220a-c and extracts the sub-elements 220 (e.g., Sub-E1 220a, Sub-E2 220b, Sub-E3 220c) from the third element 210c.
  • The sub-element deconstructor 350 may further determine that the sub-elements 220 are of a data type 202c (e.g., as indicated by the data structure template 340) that requires further extraction/deconstruction. Accordingly, the element deconstructor 330 and the sub-element deconstructor 350 may include appropriate parsers for recursively extracting all of the elements 210 and sub-elements 220 until no further parsing is possible. For instance, structured data 200 having a data type 202c indicative of BIOS firmware or a zip file may necessitate further extraction of sub-elements 220 from within one or more of the data elements 210. The deconstructor 300 may then provide the elements 210 and sub-elements 220 (if any) to the structured data comparator 400.
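Zip files are one of the data types named above as needing recursive extraction, so they make a convenient stand-in. The sketch below uses Python's standard `zipfile` module to flatten a possibly nested zip archive into a mapping from tree path to leaf bytes, descending into any member that is itself a zip archive until no further parsing is possible. It is an illustration only: BIOS firmware would need a format-specific parser that is not shown, and the `/`-joined path scheme and the helper `make_zip` are assumptions.

```python
import io
import zipfile


def deconstruct(data: bytes, prefix: str = "") -> dict[str, bytes]:
    """Recursively extract a zip archive into {tree path: leaf bytes}.

    Members that are themselves zip archives are treated as elements with
    sub-elements and deconstructed further; every other member is a leaf.
    """
    elements: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            payload = archive.read(name)
            path = f"{prefix}/{name}"
            if zipfile.is_zipfile(io.BytesIO(payload)):
                elements.update(deconstruct(payload, path))  # recurse into sub-elements
            else:
                elements[path] = payload
    return elements


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive (helper for the example)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


inner = make_zip({"sub1.bin": b"S1", "sub2.bin": b"S2"})
outer = make_zip({"e1.bin": b"E1", "e3.zip": inner})
print(deconstruct(outer))
# {'/e1.bin': b'E1', '/e3.zip/sub1.bin': b'S1', '/e3.zip/sub2.bin': b'S2'}
```

The path of each leaf doubles as its location in the extracted tree, which is what a later comparison step can use to pair it with a standard element.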
  • The structured data comparator 400 is configured to obtain standard structured data 250 having corresponding standard elements 260, 260a-d and to compare the elements 210 of the structured data 200 with the standard elements 260 of the standard structured data 250 to identify any element differences 430 between them.
  • The structured data comparator 400 may further compare the sub-elements 220 with standard sub-elements 270 of the standard elements 260 to identify element differences 430 between them.
  • Standard structured data 250 refers to a golden copy (for example, a master, authoritative, and/or approved copy) of structured data provided by a manufacturer/creator that specifies paths, hashes, values, objects, or other information or data for each standard element 260 (and standard sub-element 270) associated with it.
  • The data processing hardware 112 may obtain multiple sets of standard structured data 250 from one or more manufacturers/creators and store each set within a structured data registry 162 on the data storage hardware 160.
  • Each set of standard structured data 250 may include corresponding attributes 202 so that it is associated with a corresponding manufacturer/creator (e.g., using creator information 202a), a corresponding version (e.g., using version information 202b), and/or a corresponding data type 202c.
  • The data processing hardware 112 may update the structured data registry 162 continuously as manufacturers/creators provide new sets of standard structured data 250. For instance, new standard structured data 250 associated with BIOS firmware may be uploaded to the structured data registry 162 each time the manufacturer creates a new version of the BIOS firmware.
  • The structured data comparator 400 includes a standard structured data retriever 410 for retrieving corresponding standard structured data 250 from the structured data registry 162 using one or more of the attributes 202 of the sample of structured data 200.
  • For example, the retriever 410 may identify the corresponding standard structured data 250 for retrieval from the structured data registry 162 as the set having the same data type 202c, the same version 202b, and the same creator 202a as the sample of structured data 200.
  • The retriever 410 may provide the standard structured data 250 to the deconstructor 300 for deconstructing/extracting the standard structured data 250 into the corresponding standard elements 260 (and any standard sub-elements 270), as discussed above in FIG.
  • An element comparator 420 may receive the elements/sub-elements 210, 220 of the sample of structured data 200 and the standard elements/sub-elements 260, 270 of the standard structured data 250 after the deconstructor 300 deconstructs the structured data 200 and the standard structured data 250, respectively.
  • The element comparator 420 is configured to compare the elements/sub-elements 210, 220 of the structured data 200 to the corresponding standard elements/sub-elements 260, 270 of the standard structured data 250 on an element-by-element basis to identify element differences 430.
  • The element comparator 420 identifies a hash or location of each element/sub-element 210, 220 within the structured data 200 (e.g., within the recursively extracted tree structure) and then, based on that hash or location, identifies the corresponding standard element/sub-element 260, 270 for comparison.
  • The element comparator 420 may compare each element/sub-element 210, 220 to the corresponding standard element/sub-element 260, 270 to determine a corresponding element comparison 440 indicating whether data of the element/sub-element 210, 220 is matching, differing, missing, or extra relative to standard data of the corresponding standard element/sub-element 260, 270. Accordingly, the element comparator 420 may output a list of element comparisons 440, each annotating the comparison result between a corresponding element/sub-element 210, 220 and a corresponding standard element/sub-element 260, 270 as matching, differing, missing, or extra.
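The four-way annotation can be sketched by pairing elements with standard elements by location and then comparing hashes, since the golden copy is described as specifying paths and hashes. The choice of SHA-256, the hex-digest format of the golden copy, and the names `Annotation` and `annotate` are assumptions made for the example, not details from the disclosure.

```python
import hashlib
from enum import Enum


class Annotation(Enum):
    MATCHING = "matching"
    DIFFERING = "differing"
    MISSING = "missing"  # present in the golden copy, absent from the sample
    EXTRA = "extra"      # present in the sample, absent from the golden copy


def annotate(elements: dict[str, bytes], standard_hashes: dict[str, str]) -> dict[str, Annotation]:
    """Pair elements with standard elements by location, then compare hashes."""
    comparisons: dict[str, Annotation] = {}
    for location in sorted(elements.keys() | standard_hashes.keys()):
        if location not in elements:
            comparisons[location] = Annotation.MISSING
        elif location not in standard_hashes:
            comparisons[location] = Annotation.EXTRA
        elif hashlib.sha256(elements[location]).hexdigest() == standard_hashes[location]:
            comparisons[location] = Annotation.MATCHING
        else:
            comparisons[location] = Annotation.DIFFERING
    return comparisons


golden_data = {"/E1": b"a", "/E2": b"b", "/E4": b"d"}
golden = {loc: hashlib.sha256(data).hexdigest() for loc, data in golden_data.items()}
sample = {"/E1": b"a", "/E2": b"changed", "/E3": b"c"}
for location, result in annotate(sample, golden).items():
    print(location, result.value)
# /E1 matching
# /E2 differing
# /E3 extra
# /E4 missing
```

Comparing digests lets the registry hold only hashes of the golden copy rather than the full standard data.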
  • The element comparator 420 includes a threshold module 422 that sets tolerances/constraints for how much (e.g., a percentage of acceptability) an element/sub-element 210, 220 can differ from a corresponding standard element/sub-element 260, 270 and still be annotated as "matching".
  • The element comparator 420 may employ the threshold module 422 to fine-tune the tolerance/constraint of each element comparison 440, initially requiring the element/sub-element 210, 220 to be within strict bounds (e.g., a narrow set of tolerances/constraints) of the corresponding standard element/sub-element 260, 270 and subsequently permitting the element/sub-element 210, 220 to deviate by some degree (e.g., a wide set of tolerances/constraints) from the standard element/sub-element 260, 270.
  • Accordingly, an element comparison 440 may identify an element difference 430 (e.g., "differing") when the narrow set of tolerances/constraints is used in the comparison but be annotated as "matching" when the wide set of tolerances/constraints is used.
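The threshold module's "percentage of acceptability" can be illustrated with a simple byte-wise measure. This is an assumption, not the disclosed metric: the sketch counts the fraction of byte positions that differ (treating any length mismatch as differing bytes) and annotates the comparison as "matching" when that fraction is within the tolerance. The functions `differing_fraction` and `annotate_with_tolerance` are hypothetical.

```python
def differing_fraction(data: bytes, standard: bytes) -> float:
    """Fraction of byte positions that differ; extra or missing bytes count as differing."""
    length = max(len(data), len(standard))
    if length == 0:
        return 0.0
    same = sum(a == b for a, b in zip(data, standard))
    return (length - same) / length


def annotate_with_tolerance(data: bytes, standard: bytes, tolerance: float) -> str:
    """'matching' if at most `tolerance` (0.0 to 1.0) of the bytes differ, else 'differing'."""
    return "matching" if differing_fraction(data, standard) <= tolerance else "differing"


standard = bytes(100)             # 100 zero bytes
sample = bytes(95) + b"\x01" * 5  # last 5 bytes changed, i.e. 5% differs
print(annotate_with_tolerance(sample, standard, tolerance=0.01))  # differing
print(annotate_with_tolerance(sample, standard, tolerance=0.10))  # matching
```

The same element is "differing" under the narrow 1% tolerance and "matching" under the wide 10% tolerance, which is the behaviour described above.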
  • The verifier 150 may allow the element comparator 420 to self-learn, improving its accuracy and reliability as more samples of the structured data 200 pass through the comparator 420.
  • The verification device 180 (FIG. 1) provides tolerance/constraint inputs 190 to the threshold module 422 for setting initial tolerances/constraints for each element comparison 440 and/or modifying existing tolerances/constraints.
  • In the example, the list of element comparisons 440 indicates that data of the first and third elements (e.g., first element 210a and third element 210c in FIG. 1) is "differing" relative to the corresponding standard data of the first and third standard elements 260 of the standard structured data 250.
  • Each of the "differing" annotations of the element comparisons 440 for elements 1 and 3 is identified as a corresponding element difference 430.
  • The element comparisons 440 further indicate that data of the first sub-element Sub-E1 220a (FIG.
  • Any sub-elements 220 annotated as "extra" or "missing" are identified as a corresponding element difference 430.
  • An annotation of "missing" may indicate that extraction/deconstruction of the structured data 200 does not produce an element/sub-element 210, 220 corresponding to one that appears in the standard structured data 250.
  • The element comparator 420 may store each of the annotated element comparisons 440 in a registry of element comparisons 164 and provide the annotated element comparisons 440 to the analyzer 500, which determines whether each element difference 430 is expected or unexpected based on a heuristic or at least one rule.
  • Each element comparison 440 may include a corresponding identifier 442 indicating the hash or location of the element/sub-element 210, 220 associated with the element comparison 440.
  • For example, BIOS firmware may contain an area for storing machine-specific settings, which will differ for each BIOS firmware sample of structured data 200 when compared with the corresponding standard structured data 250.
  • Accordingly, an element difference 430 identified in an element comparison 440 between an element/sub-element 210, 220 and a corresponding standard element/sub-element 260, 270 may be expected and therefore not indicative of the element/sub-element 210, 220 containing bad or malicious code.
  • The analyzer 500 is configured to determine, for each element difference 430 identified by the structured data comparator 400, whether the element difference 430 is "expected" or "unexpected".
  • An element difference 430 that is "expected" can be deemed allowable, or verified, by the verifier 150.
  • An element difference 430 that is "unexpected" is flagged by the verifier 150 as suspicious and provided to an alarm module 170, which generates a signal 172 indicating the presence of an unexpected element/sub-element 210, 220 in the structured data 200.
  • The alarm module 170 may send the signal 172 to the verification device 180, requesting verification (e.g., via a corresponding input 190) of the sample of structured data 200 sourced from the user device 102.
  • The user interface 182 executing on the verification device 180 may display, on the display 184, the indication of the presence of the unexpected element/sub-element 210, 220 in the structured data 200.
  • In the example, the analyzer 500 determines that the element difference 430 for the first element 210a is "unexpected" and that the element difference 430 for the third element 210c is "expected". Accordingly, the alarm module 170 may generate a signal 172 indicating the presence of the unexpected first element 210a to notify the verification device 180 that the first element 210a of the structured data 200 may include bad or malicious code that may compromise the user device(s) 102.
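One way to express rules such as "the machine-specific settings area may differ" is as location patterns. The sketch below is an assumption, not the disclosed rule format: it uses shell-style patterns from Python's standard `fnmatch` module to label each element difference, then collects the unexpected ones that an alarm module would report. The pattern `/E3/settings/*` and the functions `classify` and `alarm` are hypothetical; the outcome mirrors the example above, where E1 is unexpected and E3 is expected.

```python
from fnmatch import fnmatchcase


def classify(differences: list[str], expected_patterns: list[str]) -> dict[str, str]:
    """Label each differing element location as 'expected' or 'unexpected'."""
    return {
        location: "expected"
        if any(fnmatchcase(location, pattern) for pattern in expected_patterns)
        else "unexpected"
        for location in differences
    }


def alarm(classified: dict[str, str]) -> list[str]:
    """Locations an alarm module would report to the verification device."""
    return sorted(loc for loc, label in classified.items() if label == "unexpected")


# Hypothetical rule: anything under the settings area of element E3 may differ.
rules = ["/E3/settings/*"]
labels = classify(["/E1", "/E3/settings/boot_order"], rules)
print(labels)         # {'/E1': 'unexpected', '/E3/settings/boot_order': 'expected'}
print(alarm(labels))  # ['/E1']
```

Note that `*` in `fnmatch` also matches `/`, so a pattern covers every nested sub-element under that prefix.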
  • Implementations herein are directed toward a self-learning analyzer 500 having heuristic capabilities to not only identify when an element difference 430 is expected or unexpected based on the heuristic or the at least one rule, but to also allow changes/updates to the rule and/or allow identified element differences 430 to change from being "unexpected" to "expected” through statistical analysis of the registry of element comparisons 164.
  • the registry of element comparisons 164 may update continuously as more samples of structured data 200 are received from user devices 102 and pass through the verifier 150.
  • the analyzer 500 may update an element difference 430 identified as "unexpected" to now be "expected" when a threshold number and/or threshold percentage of other user devices 102 also source the same element difference 430.
  • the heuristic rule may indicate that an element difference 430 identified in an element comparison 440 under a wide set of tolerances/constraints is "unexpected", while the same element difference 430 identified under a narrower set of tolerances/constraints is "expected". Accordingly, the tolerances/constraints used by the threshold module 422 of the element comparator 420 may interact or link to the rules indicating whether or not a corresponding element difference 430 is "unexpected" or "expected".
  • the analyzer 500 performs an example analyzation process for determining whether each element difference 430 identified by the structured data comparator 400 is expected or unexpected based on a comparison against the registry of element comparisons 164.
  • FIGS. 5B and 5C show an example registry of element comparisons 164 corresponding to structured data 200 associated with the attributes 202 of manufacturer/creator 202a ("Manufacturer XYZ"), data type 202c ("BIOS Firmware"), and version 202b ("Version 2.1").
  • the registry of element comparisons 164 stores the results of element comparisons 440 (i.e., from the structured data comparator 400) between elements 210 of the structured data 200 and corresponding standard elements 260 of the standard structured data 250.
  • the standard structured data 250 may be provided by the creator/manufacturer, e.g., "Manufacturer XYZ", of multiple user devices 102, 102a-n that source the samples of the structured data 200.
  • the registry of element comparisons 164 may include a timestamp 550.
  • FIG. 5B includes the registry of element comparisons 164 including a timestamp 550 at a first time (Time 1), and FIG. 5C includes the registry of element comparisons 164 including a timestamp 550 at a second time (Time 2) occurring after Time 1.
  • the multiple user devices 102a-n may each be manufactured by the "Manufacturer XYZ" and belong to a fleet of user devices 102 owned and operated by an entity associated with the verification device 180.
  • the registry of element comparisons 164 depicts four element comparisons 440 associated with Elements 1-4 of the structured data 200 provided by each user device 102 in the fleet and corresponding standard structured data 250 having the same manufacturer/creator, version, and data type attributes 202, 202a-c as the structured data.
  • each element comparison 440 annotates a corresponding comparison result for each of the Elements 1-4 from each of the user devices 102a-n as either Matching or Differing.
  • the registry of element comparisons 164 may include more or fewer element comparisons 440, each associated with corresponding elements 210, 260 or any sub-elements 220, 270 deconstructed (e.g., via the deconstructor 300) from each sample of structured data 200 and the standard structured data 250. Accordingly, recursively extracted tree structures requiring element comparisons 440 between sub-elements 220 and corresponding standard sub-elements 270 may include corresponding comparison results annotated as either matching, differing, missing, or extra. Each element comparison 440 stored by the registry of element comparisons 164 may include the corresponding identifier 442 (FIG. 4) indicating the hash or location of the element/sub-element 210, 220 associated with the element comparison 440.
  • the registry of element comparisons 164 further includes a counter 560 that indicates at least one of a percentage of user devices 102 in the fleet or a number of user devices 102 in the fleet that return an element comparison 440 annotated as "differing" for each element comparison 440 associated with Elements 1-4.
  • Other counters 560 may similarly be assigned to other annotations, such as "matching", "extra", or "missing".
  • the verification device 180 may provide inputs 190 that assign annotations for the counter 560 to count.
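A registry with a counter 560 might look like the following sketch. It assumes the registry keeps only the latest annotation per device and element, and that "the fleet" is the set of devices that have reported at least one comparison. `ComparisonRegistry`, `record`, `fleet`, and `counter` are hypothetical names, and the 20-device fleet in the usage lines is illustrative.

```python
from typing import Dict, Set, Tuple


class ComparisonRegistry:
    """Hypothetical sketch of a registry of element comparisons 164 for one
    attribute set (manufacturer/creator, version, data type)."""

    def __init__(self) -> None:
        # (device_id, element_id) -> latest annotation, e.g. "matching" or "differing"
        self._results: Dict[Tuple[str, str], str] = {}

    def record(self, device_id: str, element_id: str, annotation: str) -> None:
        """Store the latest comparison result for one device and element."""
        self._results[(device_id, element_id)] = annotation

    def fleet(self) -> Set[str]:
        """Devices that have reported at least one comparison."""
        return {device for device, _ in self._results}

    def counter(self, element_id: str, annotation: str = "differing") -> Tuple[int, float]:
        """Counter 560: (number, fraction) of fleet devices whose element has `annotation`."""
        count = sum(
            1
            for (_, elem), ann in self._results.items()
            if elem == element_id and ann == annotation
        )
        fleet_size = len(self.fleet())
        return count, (count / fleet_size if fleet_size else 0.0)


# Illustrative fleet of 20 devices in which only device-0 reports Element 1 as differing.
registry = ComparisonRegistry()
for i in range(20):
    device = f"device-{i}"
    registry.record(device, "Element 1", "differing" if i == 0 else "matching")
    registry.record(device, "Element 3", "differing")
print(registry.counter("Element 1"))  # (1, 0.05)
print(registry.counter("Element 3"))  # (20, 1.0)
```

Because the counter takes the annotation as a parameter, the same registry can count "matching", "extra", or "missing" results as well as "differing" ones.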
  • the example registry of element comparisons 164 further includes a corresponding whitelist 540, 540a-d for each element comparison 440 that provides a rule indicating whether an identified element difference 430 is Expected or Unexpected.
  • each whitelist 540 codifies what changes (e.g., element differences 430) are expected and acceptable, and what changes are unexpected and need to be flagged as possibly including bad or malicious code.
  • the element comparison 440 for each of Elements 1-4 includes a corresponding whitelist 540a-d. In FIG. 5B, the registry of element comparisons 164 at Time 1 includes the first, second, third, and fourth whitelists 540a, 540b, 540c, 540d for Elements 1, 2, 3, 4, each including a corresponding rule that indicates that any element comparison 440 annotated as "differing" is Unexpected. Accordingly, the whitelists 540a-d at Time 1 may be initially set with the rule that any element comparison 440 annotated as "differing" is Unexpected. In some examples, the verification device 180 sets the rules for the different annotations as being Unexpected or Expected.
  • the manufacturer/creator 202a associated with the registry of element comparisons 164 provides initial sets of the whitelists 540 for one or more of the Elements 1, 2, 3, 4 that may indicate when element differences 430 are Unexpected or Expected.
  • the whitelists 540 may be updated through the statistical analysis of the element comparisons 440 for all of the samples of structured data 200.
  • a whitelist 540 indicates that an element difference 430 annotated as "differing" is Expected but an element difference 430 annotated as "missing" or "extra" is Unexpected.
  • the analyzer 500 may automatically generate whitelists 540 and/or continuously update existing whitelists 540 for samples of structured data 200 associated with a particular set of one or more unique attributes 202, 202a-c.
  • whereas conventional, manually created whitelists include "static" rules that never change, the rules assigned to whitelists 540 may be automatically generated and/or dynamically updated by statistically analyzing the most recent element comparisons 440 stored in the registry of element comparisons 164.
  • FIG. 5B shows the third whitelist 540c at Time 1 including the rule that the annotation of "differing" for the element comparison 440 associated with Element 3 is Unexpected, while FIG. 5C shows the third whitelist 540c updating the rule at Time 2 to now be Expected after the registry of element comparisons 164 determines that a threshold number of samples also include the corresponding element comparison 440 that annotates Element 3 as "differing".
  • the registry of element comparisons 164 may update the third whitelist 540c so that any subsequent "differing" comparisons associated with Element 3 are Expected.
  • the verification device 180 may send an input 190 to the analyzer 500 that includes a value for the "threshold number of samples" to change a corresponding rule of a whitelist 540 from Unexpected to Expected.
  • the threshold number of samples associated with one whitelist 540 may be the same or different than the threshold number of samples associated with other whitelists 540.
  • the registry of element comparisons 164 will maintain the first whitelist 540a at Time 2 ( FIG. 5C ) since the threshold number of samples (e.g., at least 95% of the fleet of user devices 102) annotating Element 1 as "differing" is not satisfied.
  • the alarm module 170 may generate the signal 172 indicating the presence of an unexpected Element 1 (e.g., the first element 210a of FIG. …).
  • the verification device 180 may assess the signal 172 to determine whether or not the unexpected Element 1 is the result of being infected with bad or malicious code.
  • the analyzer 500 waits until all samples of the same structured data 200 from the user devices 102 in the fleet have passed through the verifier 150 to avoid prematurely sending signals 172 to the verification device 180.
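The whitelist behavior described above (per-annotation rules that start as Unexpected, flip to Expected once enough of the fleet shares a difference, and use a threshold that can differ per whitelist) can be sketched as below. `Whitelist`, `verdict`, `update`, and the default threshold of 0.95 are assumptions for illustration. The Time 2 fractions are also illustrative, except for the 5% figure the source gives for Element 1.

```python
from dataclasses import dataclass, field
from typing import Dict


def _all_unexpected() -> Dict[str, str]:
    # Time 1 default: every non-matching annotation is unexpected.
    return {"differing": "unexpected", "missing": "unexpected", "extra": "unexpected"}


@dataclass
class Whitelist:
    """Hypothetical whitelist 540 attached to one element comparison 440."""
    rules: Dict[str, str] = field(default_factory=_all_unexpected)
    threshold: float = 0.95  # fraction of the fleet; may differ per whitelist

    def verdict(self, annotation: str) -> str:
        # Annotations without a rule fall back to "unexpected" (an assumption).
        return self.rules.get(annotation, "unexpected")

    def update(self, annotation: str, fleet_fraction: float) -> None:
        """Flip a rule to "expected" once enough of the fleet shares the difference."""
        if fleet_fraction >= self.threshold:
            self.rules[annotation] = "expected"


# Time 1: all four whitelists start with "differing" -> unexpected.
whitelists = {f"Element {n}": Whitelist() for n in range(1, 5)}
# Time 2 (illustrative fractions): Element 3 differs on every device, Element 1 on 5%.
whitelists["Element 3"].update("differing", 1.0)
whitelists["Element 1"].update("differing", 0.05)
print(whitelists["Element 3"].verdict("differing"))  # expected
print(whitelists["Element 1"].verdict("differing"))  # unexpected
print(whitelists["Element 3"].verdict("missing"))    # unexpected
```

Using `default_factory` gives each whitelist its own rule dictionary, so promoting Element 3 does not leak into the other whitelists.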
  • the analyzer 500 first determines whether each element difference 430 associated with the sample of structured data 200 received from the first user device 102a (User Device a) is expected or unexpected based on the comparison against the registry of element comparisons 164.
  • an element difference 430 is identified for each of Elements 1 and 3 since Elements 1 and 3 are both annotated as "differing".
  • the analyzation process obtains the registry of element comparisons 164 from the data storage hardware 160 based on the attributes 202 of the sample of the structured data 200.
  • the "registry of element comparisons 164" corresponds to the registry of element comparisons 164 of FIG. 5C at Time 2.
  • the analyzation process compares the element difference 430 against the registry of element comparisons 164 to determine if the "differing" element difference 430 is "unexpected".
  • the corresponding whitelist 540 may include the corresponding rule that indicates whether the "Differing" element difference 430 is "unexpected" or "expected".
  • the first whitelist 540a for Element 1 includes the rule indicating that the "Differing" element difference 430 is "unexpected".
  • the third whitelist 540c for Element 3 includes the rule indicating that the "Differing" element difference 430 is "expected".
  • when step 504 is "No", i.e., the "Differing" element difference 430 is not unexpected (as for Element 3), step 506 determines that the element difference 430 is "expected".
  • when step 504 is "Yes", i.e., the analyzation process determines that the "Differing" element difference 430 is "unexpected" (as for Element 1), the analyzation process proceeds to step 508.
  • step 508 determines whether or not a threshold number of samples also include the corresponding element difference 430.
  • the analyzer 500 may review the counter 560 of the registry of element comparisons 164 that indicates at least one of a percentage of the user devices 102 in the fleet or a number of the user devices 102 in the fleet that return the corresponding element difference 430, i.e., the "Differing" element difference 430 associated with Element 1.
  • the counter 560 indicates that 5-percent (5%), or one (1) user device 102, in the fleet of user devices 102 includes the corresponding "Differing" element difference 430.
  • the "threshold number of samples" requires at least 90 or 95 percent of the user devices 102 in the fleet to return the element difference 430.
  • the "threshold number of samples" may also require at least a minimum number of devices 102 in the fleet to return the corresponding element difference 430 before the threshold is satisfied.
  • the minimum number may be about 10 devices to make sure that the number of samples is robust before overturning a rule specified by the whitelist 540. If the threshold number of samples is satisfied, i.e., step 508 is "Yes", then the analyzation process proceeds to step 510 and changes the rule of the corresponding whitelist 540 from "unexpected" to "expected".
  • when the threshold number of samples is not satisfied at step 508, i.e., step 508 is "No", the analyzation process proceeds to step 512 and flags the corresponding element difference 430 as being "unexpected". Since the counter 560 of the registry of element comparisons 164 identifies that Element 1 is annotated as "differing" in only 5-percent (5%) of the fleet of user devices 102, the analyzation process may determine that the threshold number of samples is not satisfied. Accordingly, the analyzation process may flag the corresponding "Differing" element difference 430 associated with Element 1 and notify the alarm module 170.
  • the alarm module 170 may generate the signal 172 indicating the presence of the unexpected element (Element 1) in the structured data 200 received from the first user device 102a.
  • the verification device 180 may receive the signal 172 to determine whether or not the first user device 102a has been compromised as a result of bad or malicious code identified by the element difference associated with Element 1.
  • the analyzation process executed by the analyzer 500 may repeat for each sample of the structured data 200 received from the other user devices 102b-n in the fleet.
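Steps 504 through 512 can be condensed into one function. This sketch assumes the whitelist rules are held in a nested dictionary and uses the 95-percent and 10-device thresholds mentioned above as defaults. It also assumes that once step 510 overturns a rule, the current difference is treated as expected too. `analyze_difference` and its parameters are hypothetical names.

```python
from typing import Dict


def analyze_difference(
    element_id: str,
    annotation: str,
    rules: Dict[str, Dict[str, str]],
    devices_with_difference: int,
    fleet_size: int,
    min_fraction: float = 0.95,
    min_devices: int = 10,
) -> str:
    """Sketch of steps 504-512 for one element difference of one sample.

    Returns "expected" or "unexpected". At step 510 the rule in `rules` is
    rewritten in place so that later samples with the same difference are expected.
    """
    element_rules = rules.setdefault(element_id, {})
    # Step 504: is the difference unexpected under the current whitelist rule?
    if element_rules.get(annotation, "unexpected") == "expected":
        return "expected"  # step 506
    # Step 508: does a threshold number of fleet devices share this difference?
    fraction = devices_with_difference / fleet_size if fleet_size else 0.0
    if fraction >= min_fraction and devices_with_difference >= min_devices:
        element_rules[annotation] = "expected"  # step 510: overturn the rule
        return "expected"
    return "unexpected"  # step 512: flag it so the alarm module can raise a signal


rules = {"Element 1": {"differing": "unexpected"}, "Element 3": {"differing": "expected"}}
print(analyze_difference("Element 1", "differing", rules, 1, 20))   # unexpected (5% of fleet)
print(analyze_difference("Element 3", "differing", rules, 20, 20))  # expected (step 506)
print(analyze_difference("Element 2", "extra", rules, 5, 5))        # unexpected (fewer than 10 devices)
```

Requiring both a fraction and a minimum count keeps a small fleet (for example, five devices that all differ) from overturning a rule on too few samples.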
  • FIG. 6 is a schematic view of an example computing device 600 that may be used to implement the systems and methods described in this document, such as the computing resource 112.
  • the computing device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • the components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
  • the computing device 600 includes a processor 610 (i.e., data processing hardware), memory 620, a storage device 630, a high-speed interface/controller 640 connecting to the memory 620 and high-speed expansion ports 650, and a low speed interface/controller 660 connecting to a low speed bus 670 and storage device 630.
  • Each of the components 610, 620, 630, 640, 650, and 660 is interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 610 can process instructions for execution within the computing device 600, including instructions stored in the memory 620 or on the storage device 630 to display graphical information for a GUI on an external input/output device, such as a display 680 coupled to a high speed interface 640.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices 600 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • the memory 620 includes hardware (e.g., data storage hardware 160) that stores information non-transitorily within the computing device 600.
  • the memory 620 may be a computer-readable medium, a volatile memory unit(s), or non-volatile memory unit(s).
  • the non-transitory memory 620 may be physical devices (e.g. hardware) used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by the computing device 600.
  • non-volatile memory examples include, but are not limited to, flash memory and read-only memory (ROM) / programmable read-only memory (PROM) / erasable programmable read-only memory (EPROM) / electronically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs) as well as disks or tapes.
  • volatile memory examples include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), phase change memory (PCM).
  • the storage device 630 is capable of providing mass storage for the computing device 600.
  • the storage device 630 is a computer-readable medium.
  • the storage device 630 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 620, the storage device 630, or memory on processor 610.
  • the high speed controller 640 manages bandwidth-intensive operations for the computing device 600, while the low speed controller 660 manages lower bandwidth-intensive operations. Such allocation of duties is exemplary only.
  • the high-speed controller 640 is coupled to the memory 620, the display 680 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 650, which may accept various expansion cards (not shown).
  • the low-speed controller 660 is coupled to the storage device 630 and a low-speed expansion port 670.
  • the low-speed expansion port 670, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device, such as a switch or router, e.g., through a network adapter.
  • the computing device 600 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 600a or multiple times in a group of such servers 600a, as a laptop computer 600b, or as part of a rack server system 600c.
  • a software application may refer to computer software that causes a computing device to perform a task.
  • a software application may be referred to as an "application," an "app," or a "program."
  • Example applications include, but are not limited to, mobile applications, system diagnostic applications, system management applications, system maintenance applications, word processing applications, spreadsheet applications, messaging applications, media streaming applications, social networking applications, and gaming applications.
  • the memory hardware 110 may be physical devices used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by a computing device 110hc.
  • the non-transitory memory hardware 110hm may be volatile and/or non-volatile addressable semiconductor memory. Examples of non-volatile memory include, but are not limited to, flash memory and read-only memory (ROM) / programmable read-only memory (PROM) / erasable programmable read-only memory (EPROM) / electronically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs). Examples of volatile memory include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), phase change memory (PCM) as well as disks or tapes.
  • FIG. 7 is a flowchart of an example method 700 executed by the computing device 600 of FIG. 6 for verifying structured data 200.
  • the flowchart starts at operation 702 by receiving structured data 200 at data processing hardware 112 (executing on the computing device 600) and deconstructing, by the data processing hardware 112, the structured data 200 into corresponding elements 210 (and any sub-elements 220).
  • the data processing hardware 112 may execute a verifier 150 that implements a deconstructor 300, a structured data comparator 400, an analyzer 500, and an alarm module 170.
  • the data processing hardware 112 may use the deconstructor 300 to deconstruct the structured data 200.
  • a verification device 180 in communication with the verifier 150 may request the verifier 150, e.g., via an input 190, to verify the structured data 200 from one or more user devices 102 in a fleet.
  • the data processing hardware 112 obtains standard structured data 250 having corresponding standard elements 260 (and any sub-elements 270).
  • the data processing hardware 112 may obtain the standard structured data 250 from a standard structured data registry 162 residing on the data storage hardware 160.
  • the data processing hardware 112 may retrieve the standard structured data 250 having the same one or more attributes 202 as the attributes 202 associated with the structured data 200.
  • the data processing hardware 112 compares (e.g., using the structured data comparator 400) the elements/sub-elements 210, 220 of the structured data 200 with the standard elements/sub-elements 260, 270 of the standard structured data to identify any element differences 430.
  • the data processing hardware 112 compares the element difference 430 against a registry of element comparisons 164, and at step 710, determines whether the element difference 430 is expected or unexpected based on a heuristic or at least one rule.
  • the registry of element comparisons 164 may include the most current state of element differences 430 and a corresponding whitelist 540 including a rule indicating whether or not the element difference 430 is expected or unexpected.
  • the whitelist 540 may be automatically generated by the data processing hardware 112 as samples of structured data 200 pass through the verifier 150 and/or existing whitelists 540 may be continuously updated based on samples of structured data 200 passing through the verifier 150.
  • the data processing hardware 112 generates (e.g., using the alarm module 170) a signal 172 indicating the presence of an unexpected element/sub-element 210, 220 in the structured data 200.
  • the verification device 180 may receive the signal 172 and cause a user interface 182 to display the indication of the presence of the unexpected element/sub-element 210, 220 in the structured data 200 on a display 184.
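The comparison step, which annotates elements as matching, differing, missing, or extra, can be sketched as follows. The sketch assumes each element is identified by its location (a path in the deconstructed tree, standing in for the identifier 442) and that the standard structured data 250 is stored as SHA-256 digests, so the registry need not keep the standard bytes. `digest`, `compare_elements`, and the sample values are hypothetical.

```python
import hashlib
from typing import Dict


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an element's bytes."""
    return hashlib.sha256(data).hexdigest()


def compare_elements(sample: Dict[str, bytes], standard: Dict[str, str]) -> Dict[str, str]:
    """Annotate each element location as matching, differing, missing, or extra.

    `sample` maps a location (a path in the deconstructed tree, standing in for
    identifier 442) to the element's bytes; `standard` maps the same kind of
    location to the SHA-256 digest of the corresponding standard element.
    """
    annotations: Dict[str, str] = {}
    for location in sorted(sample.keys() | standard.keys()):
        if location not in sample:
            annotations[location] = "missing"
        elif location not in standard:
            annotations[location] = "extra"
        elif digest(sample[location]) == standard[location]:
            annotations[location] = "matching"
        else:
            annotations[location] = "differing"
    return annotations


standard = {
    "/header": digest(b"HDR1"),
    "/body/code": digest(b"\x90\x90"),
    "/body/table": digest(b"\x01\x02"),
}
sample = {"/header": b"HDR1", "/body/code": b"\xcc\x90", "/body/extra": b"\x00"}
print(compare_elements(sample, standard))
# {'/body/code': 'differing', '/body/extra': 'extra', '/body/table': 'missing', '/header': 'matching'}
```

Every annotation other than "matching" is an element difference 430 that would then be checked against the registry of element comparisons 164.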
  • implementations of the systems and techniques described herein can be realized in digital electronic and/or optical circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • one or more aspects of the disclosure can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display) monitor, or touch screen for displaying information to the user and optionally a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Virology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A method (700) for verifying structured data (200) includes receiving structured data, deconstructing the structured data into corresponding elements (210) and obtaining standard structured data (250) having corresponding standard elements (260). The method also includes comparing the elements of the structured data with the standard elements of the standard structured data to identify any element differences (440). For each element difference, the method includes comparing the element difference against a registry of element comparisons (164), determining whether the element difference is expected or unexpected based on a heuristic or at least one rule, and when the element difference is unexpected, generating a signal (172) indicating the presence of an unexpected element in the structured data.

Description

    TECHNICAL FIELD
  • This disclosure relates to verifying structured data.
  • BACKGROUND
  • Determining whether or not structured data on a computing device includes malicious or unexpected code can be difficult when the structured data includes mutable elements. For instance, binary data of a computing device associated with a manufacturer inevitably changes to some degree each time the computing device boots. As such, there may be differences between structured data samples taken from the same computing device at different times, or between structured data samples of the same type from different computing devices associated with the same manufacturer, that are not the result of the data being infected with bad or malicious code. Since some portions/elements of structured data are expected to change, and are therefore permissible, merely identifying differences based on a comparison between structured data samples and standard structured data samples provided by a creator/manufacturer is not an accurate technique for identifying bad or malicious code. Accordingly, without information on which elements of a structured data sample may be different from other corresponding samples and/or may change over time, determining whether or not a structured data sample has been compromised based solely upon identified element differences in the structured data can be problematic. These difficulties are further compounded when verifying larger numbers of structured data samples, such as verifying structured data samples taken from multiple computing devices in a fleet.
  • SUMMARY
  • One aspect of the disclosure provides a method for verifying structured data. The method includes receiving, at data processing hardware, structured data. The method also includes deconstructing, by the data processing hardware, the structured data into corresponding elements. The method further includes obtaining, at the data processing hardware, standard structured data having corresponding standard elements. The method also includes comparing, by the data processing hardware, the elements of the structured data with the standard elements of the standard structured data to identify any element differences. For each element difference, the method includes: comparing, by the data processing hardware, the element difference against a registry of element comparisons; determining, by the data processing hardware, whether the element difference is expected or unexpected based on a heuristic or at least one rule; and when the element difference is unexpected, generating, by the data processing hardware, a signal indicating the presence of an unexpected element in the structured data.
  • Implementations of the disclosure may include one or more of the following optional features. In some examples, for each element difference, the method includes storing the corresponding comparison between the respective element of the structured data with the respective standard element of the standard structured data in the registry of element comparisons. Optionally, the method may further include statistically analyzing, by the data processing hardware, the registry of element comparisons to determine the at least one rule indicating whether the element difference is expected or unexpected.
  • In some implementations, for each element of the structured data, the method includes determining, by the data processing hardware, whether the element comprises any sub-elements. When the element comprises sub-elements, the method includes deconstructing, by the data processing hardware, the element into the corresponding sub-elements. Here, the deconstructed structured data may include a recursively extracted tree structure. The method may also include receiving, at the data processing hardware, a structured data type, and obtaining, at the data processing hardware, a data structure template based on the structured data type. The method may further include deconstructing, by the data processing hardware, the structured data into corresponding elements based on the data structure template, and determining, by the data processing hardware, whether the element comprises any sub-elements based on the data structure template.
  • In some configurations, the method includes annotating each element of the structured data as matching, differing, missing, or extra based on the comparison of the respective element with the respective standard element. When comparing the elements of the structured data with the standard elements of the standard structured data, the method may include identifying a hash or a location of each element. For each element, the method may include identifying the corresponding standard element based on the hash or the location of each element, and determining whether data of the element is matching, differing, missing, or extra relative to standard data of the corresponding standard element. When determining whether the element difference is expected or unexpected, the method may include marking the annotation of the respective element as expected or unexpected. In some examples, the structured data includes binary data.
  • Another aspect of the disclosure provides a system for verifying structured data. The system includes data processing hardware and memory hardware in communication with the data processing hardware. The memory hardware stores instructions that when executed on the data processing hardware cause the data processing hardware to perform operations. The operations include receiving structured data, deconstructing the structured data into corresponding elements, obtaining standard structured data having corresponding standard elements, and comparing the elements of the structured data with the standard elements of the standard structured data to identify any element differences. For each element difference, the operations include comparing the element difference against a registry of element comparisons and determining whether the element difference is expected or unexpected based on a heuristic or at least one rule. When the element difference is unexpected, the operations include generating a signal indicating the presence of an unexpected element in the structured data.
  • Implementations of the disclosure may include one or more of the following optional features. In some implementations, for each element difference, the operations include storing the corresponding comparison between the respective element of the structured data and the respective standard element of the standard structured data in the registry of element comparisons. The operations may also include statistically analyzing the registry of element comparisons to determine the at least one rule indicating whether the element difference is expected or unexpected.
  • In some examples, for each element, the operations include determining whether the element comprises any sub-elements. When the element comprises sub-elements, the operations include deconstructing the element into the corresponding sub-elements. The deconstructed structured data may include a recursively extracted tree structure. Additionally or alternatively, the operations may also include receiving a structured data type, obtaining a data structure template based on the structured data type, deconstructing the structured data into corresponding elements based on the data structure template, and determining whether the element comprises any sub-elements based on the data structure template.
  • In some configurations, the operations include annotating each element of the structured data as matching, differing, missing, or extra based on the comparison of the respective element with the respective standard element. When comparing the elements of the structured data with the standard elements of the standard structured data, the operations may include identifying a hash or a location of each element. For each element, the operations may further include identifying the corresponding standard element based on the hash or the location of each element and determining whether data of the element is matching, differing, missing, or extra relative to standard data of the respective standard element. When determining whether the element difference is expected or unexpected, the operations may include marking the annotation of the respective element as expected or unexpected. In some implementations, the structured data includes binary data.
  • DESCRIPTION OF DRAWINGS
    • FIG. 1 is a schematic view of an example system for verifying structured data.
    • FIG. 2 is a schematic view of attributes associated with structured data.
    • FIG. 3 is a schematic view of example components of a deconstructor of the system of FIG. 1.
    • FIG. 4 is a schematic view of example components of a structured data comparator of the system of FIG. 1.
    • FIG. 5A is a schematic view of an example analyzation process for determining whether or not an identified element difference in the structured data is expected or unexpected.
    • FIGS. 5B and 5C are schematic views of an example registry of element comparisons.
    • FIG. 6 is an example computing device.
    • FIG. 7 is a flowchart of an example method for verifying structured data.
  • Like reference symbols in the various drawings indicate like elements.
  • DETAILED DESCRIPTION
  • Implementations herein are directed toward a verification pipeline configured to, inter alia, determine/detect whether or not structured data includes bad or malicious code that may compromise one or more workstations in a fleet operated by an entity. The structured data may include binary data, such as Basic Input/Output System (BIOS) data, that changes each time a workstation reboots. As a result, comparing elements of structured data with corresponding standard elements from a golden copy of the structured data may not always provide a one-to-one match. While these comparisons may reveal element differences, the element differences identified from structured data received from all the workstations within the fleet may be statistically analyzed so that whitelists can be automatically generated. These automatically-generated whitelists may specify whether or not an element difference is expected, i.e., due to mutations that are expected to occur, or unexpected, i.e., due to being infected by bad or malicious code. Moreover, as more structured data passes through the pipeline, existing whitelists may be updated to fine-tune the verification process for determining whether or not an element difference is expected or unexpected. For instance, if the verification pipeline observes that a majority of samples of structured data in the fleet contain a corresponding element difference specified by a whitelist as being unexpected, the verification pipeline may update the whitelist so that the corresponding element difference is in fact expected. Implementations further include notifying an operator of the fleet (e.g., via a verification device) when a presence of an unexpected element difference is detected. The operator of the fleet may assess whether or not the unexpected element difference is the result of bad or malicious code that may compromise the workstations in the fleet.
  • Referring to FIG. 1, in some implementations, an example system 100 includes one or more user devices 102, 102a-n each associated with a respective user 10 and in communication with a remote system 110 via a network 120. Each user device 102 may correspond to a computing device, such as a desktop workstation or laptop workstation. The remote system 110 may be a distributed system (e.g., a cloud environment) having scalable / elastic computing resources 112 (e.g., data processing hardware) and/or storage resources 114. The computing resources 112 and/or storage resources 114 may also communicate with a verification device 180 over the network 120. In some implementations, computing resources 112 of the remote system 110 execute a verifier 150 that receives a sample of structured data 200 from one or more user devices 102. For example, an entity operating the remote system 110 may own a fleet of user devices 102 each associated with a corresponding user 10 employed by the entity, and each user device 102 may provide a sample of structured data 200 to the verifier 150 for verifying that the contents of the structured data 200 have not been compromised. Put another way, the verifier 150 determines whether or not the structured data 200 has been infected with bad or malicious code that may compromise the user device 102 sourcing the structured data 200 and/or compromise multiple user devices 102 among a fleet in communication with each other via the network 120. In some examples, the storage resources 114 implement data storage hardware 160 and the data processing hardware 112 is in communication with the data storage hardware 160.
  • In some implementations, the verification device 180 is in communication with the verifier 150 (e.g., via the network 120) and provides one or more inputs 190 to the verifier 150. For instance, the verification device 180 may send an input 190 to the verifier 150 requesting verification of structured data 200 from one or more user devices 102 in the fleet. The verification device 180 may execute a user interface 182 on a display 184 of the verification device 180 to allow an operator of the verification device 180 to communicate with the verifier 150. As described in greater detail below, the inputs 190 may further include thresholds/constraints for determining whether the structured data 200 includes any element differences 430 when compared to corresponding standard structured data 250. The thresholds/constraints may include a percentage of acceptability to determine whether the structured data 200 is matching or differing. The inputs 190 may further include a heuristic or at least one rule for determining whether an identified element difference 430 is unexpected or expected.
  • The structured data 200 is associated with one or more attributes 202. Referring to FIG. 2, in some implementations, the attributes 202 of the structured data 200 include at least one of creator information 202a, version information 202b, or a data type 202c. The creator information 202a may indicate a creator/manufacturer of the user device 102 sourcing the structured data 200 while the version information 202b may indicate a version associated with the structured data 200. The data type 202c specifies the type of data the structured data 200 represents. For instance, the data type 202c may indicate that the structured data 200 represents a Portable Executable (PE) file that encapsulates executable code for loading an operating system on the user device 102. The data type 202c may further indicate that the structured data 200 is associated with an installer, a certificate, a zip file, or Basic Input/Output System (BIOS) firmware. BIOS firmware may be pre-installed on the user device 102 by a manufacturer thereof (e.g., as specified by the creator information 202a) for use in performing hardware initialization during the booting process and/or providing runtime services for operating systems and programs executing on the user device 102. Structured data 200 associated with BIOS firmware is generally mutable as portions/elements of the structured data 200 may change each time the user device 102 reboots.
  • Referring back to FIG. 1, in some implementations, the verifier 150 of the data processing hardware 112 implements a deconstructor 300, a structured data comparator 400, and an element difference analyzer 500. The deconstructor 300 is configured to deconstruct/extract the structured data 200 received from the user device 102 into corresponding data elements 210, 210a-d. In the example shown, the deconstructed structured data 200 includes a first element 210a, a second element 210b, a third element 210c, and a fourth element 210d. Other examples may include the deconstructor 300 deconstructing each sample of structured data 200 into any number of data elements 210 corresponding to the structured data 200 under deconstruction. In some implementations, the deconstructed structured data 200 includes a recursively extracted tree structure.
  • Referring to FIG. 3, in some implementations, the deconstructor 300 includes a structured data type determiner 310 that determines the data type 202c of the received sample of structured data 200, and then provides the data type 202c to a data structure template module 320 configured to obtain a data structure template 340 based on the data type 202c. The data structure template 340 may be provided from the creator/manufacturer of the user device 102 that is the source of the structured data 200. Moreover, the structured data type determiner 310 may also determine the creator information 202a and the version information 202b of the sample of structured data 200 for obtaining the data structure template 340. Here, the data structure template 340 may provide instructions for deconstructing the structured data 200 into the corresponding data elements 210. The data structure template module 320 may reside on the data storage hardware 160 and may store multiple data structure templates 340 each associated with a corresponding data type 202c (and optionally a corresponding creator 202a and/or version 202b) that provide instructions for deconstructing/extracting the structured data 200 of the corresponding data type 202c. For instance, the structured data 200 may be a recursively extracted tree structure and the template 340 may be used to deconstruct the structured data 200. The deconstructor 300 may further include an element deconstructor 330 that uses the data structure template 340 to deconstruct/extract the structured data 200 into the corresponding data elements 210, 210a-d (e.g., E1, E2, E3, E4). In some examples, the element deconstructor 330 executes an appropriate parser configured to deconstruct/extract the structured data 200 into the corresponding data elements 210.
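  • The template-driven deconstruction above can be illustrated with a minimal Python sketch; it is not the patented implementation. It assumes a hypothetical template format in which each data structure template 340 is a list of fixed-offset byte fields, and a hypothetical `TEMPLATES` table keyed by creator, version, and data type. Real firmware or PE formats would need a format-specific parser, as the paragraph above notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a hypothetical data structure template: name, byte offset, byte length."""
    name: str
    offset: int
    length: int


# Hypothetical template store (stand-in for the data structure template module 320),
# keyed by (creator 202a, version 202b, data type 202c).
TEMPLATES = {
    ("Manufacturer XYZ", "2.1", "BIOS Firmware"): [
        FieldSpec("E1", 0, 4),
        FieldSpec("E2", 4, 4),
        FieldSpec("E3", 8, 8),
        FieldSpec("E4", 16, 4),
    ],
}


def deconstruct(blob: bytes, creator: str, version: str, data_type: str) -> dict:
    """Split a binary sample into named elements using the template for its attributes."""
    template = TEMPLATES[(creator, version, data_type)]  # KeyError if no template exists
    elements = {}
    for spec in template:
        end = spec.offset + spec.length
        if end > len(blob):
            raise ValueError(f"sample too short for element {spec.name}")
        elements[spec.name] = blob[spec.offset:end]
    return elements
```

  • For example, `deconstruct(bytes(range(20)), "Manufacturer XYZ", "2.1", "BIOS Firmware")["E2"]` returns `b"\x04\x05\x06\x07"`.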
  • In some implementations, the deconstructor 300 also implements a sub-element deconstructor 350 that determines whether or not any of the data elements 210 include any sub-elements 220, 220a-c, and for each data element 210 that includes sub-elements 220, deconstructs the data element 210 into the corresponding sub-elements 220. In the example shown, the sub-element deconstructor 350 determines the third element 210c includes sub-elements 220, 220a-c and deconstructs the sub-elements 220 (e.g., Sub-E1 220a, Sub-E2 220b, Sub-E3 220c) from the third element 210c. The sub-element deconstructor 350 may further determine that the sub-elements 220 are of a data type 202c (e.g., as indicated by the data structure template 340) that requires further extraction/deconstruction. Accordingly, the element deconstructor 330 and the sub-element deconstructor 350 may include appropriate parsers for recursively extracting all of the elements 210 and sub-elements 220 until no more parsing is possible. For instance, structured data 200 having a data type 202c indicative of BIOS firmware or a zip file may necessitate further extraction of sub-elements 220 from within one or more of the data elements 210. Thereafter, the deconstructor 300 may provide the elements 210 and sub-elements 220 (if any) to the structured data comparator 400.
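  • For the zip-file case mentioned above, recursive extraction "until no more parsing is possible" can be sketched with Python's standard `zipfile` module. In this sketch, any archive member that is itself a zip archive is treated as an element with sub-elements, and everything else becomes a raw-bytes leaf. The nested-dict tree layout and the function name `extract_tree` are assumptions made for illustration.

```python
import io
import zipfile
from typing import Union

# A tree node is either raw bytes (a leaf element) or a dict of named children.
Tree = Union[bytes, dict]


def extract_tree(data: bytes) -> Tree:
    """Recursively deconstruct a zip archive into a nested dict tree.

    Members that are themselves zip archives become sub-trees (elements with
    sub-elements); any other member is kept as a raw-bytes leaf.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return data  # leaf: no further parsing is possible
    tree = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            tree[name] = extract_tree(archive.read(name))
    return tree
```

  • Leaves can then be hashed and compared, while inner nodes correspond to elements 210 that contain sub-elements 220.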
  • Referring back to FIG. 1, the structured data comparator 400 is configured to obtain standard structured data 250 having corresponding standard elements 260, 260a-d and compare the elements 210 of the structured data 200 with the standard elements 260 of the standard structured data 250 to identify any element differences 430 therebetween. When the deconstructor 300 has deconstructed sub-elements 220 from one or more of the elements 210 of the structured data 200, the structured data comparator 400 may further compare the sub-elements 220 with standard sub-elements 270 of the standard elements 260 to identify element differences 430 therebetween. As used herein, "standard structured data 250" refers to a golden copy (for example, a master, authoritative, and/or approved copy) of structured data provided by a manufacturer/creator that specifies paths, hashes, values, objects or other information or data for each standard element 260 (and sub-element 270) associated therewith. The data processing hardware 112 may obtain multiple sets of standard structured data 250 from one or more manufacturers/creators and store each set of standard structured data 250 within a structured data registry 162 on the data storage hardware 160. Here, each set of standard structured data 250 may include corresponding attributes 202 so that each set of standard structured data 250 is associated with a corresponding manufacturer/creator (e.g., using creator information 202a), a corresponding version (e.g., using version information 202b), and/or a corresponding data type 202c. The structured data registry 162 may be continuously updated by the data processing hardware 112 as manufacturers/creators provide new sets of standard structured data 250. For instance, new standard structured data 250 associated with BIOS firmware may be uploaded to the structured data registry 162 each time the manufacturer creates a new version of the BIOS firmware.
  • Referring to FIG. 4, in some implementations, the structured data comparator 400 includes a standard structured data retriever 410 for retrieving corresponding standard structured data 250 from the structured data registry 162 using one or more of the attributes 202 of the sample of structured data 200. For instance, the retriever 410 may identify the corresponding standard structured data 250 for retrieval from the structured data registry 162 as having the same data type 202c, the same version 202b, and the same creator 202a as the sample of structured data 200. Upon obtaining the standard structured data 250 from the registry 162, the retriever 410 may provide the standard structured data 250 to the deconstructor 300 for deconstructing/extracting the standard structured data 250 into the corresponding standard elements 260 (and any standard sub-elements 270), as discussed above in FIG. 3 with respect to the sample of structured data 200. An element comparator 420 may receive the elements/sub-elements 210, 220 of the sample of structured data 200 and the standard elements/sub-elements 260, 270 of the standard structured data 250 after the deconstructor 300 deconstructs respective ones of the structured data 200 and the standard structured data 250.
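  • The attribute-based lookup performed by the retriever 410 can be sketched as an exact-match dictionary keyed by the three attributes 202. The class and method names below (`Attributes`, `StructuredDataRegistry`, `upload`, `retrieve`) are hypothetical, and an in-memory dict stands in for the structured data registry 162 on the data storage hardware 160.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Attributes:
    """Attributes 202 used as the lookup key for a golden copy."""
    creator: str    # creator information 202a
    version: str    # version information 202b
    data_type: str  # data type 202c


class StructuredDataRegistry:
    """Hypothetical in-memory stand-in for the structured data registry 162."""

    def __init__(self) -> None:
        self._golden: Dict[Attributes, bytes] = {}

    def upload(self, attrs: Attributes, golden_copy: bytes) -> None:
        # A new firmware version from the manufacturer is simply a new key.
        self._golden[attrs] = golden_copy

    def retrieve(self, attrs: Attributes) -> Optional[bytes]:
        # Exact match on creator, version, and data type; None if no golden copy is known.
        return self._golden.get(attrs)
```

  • Because the key is frozen (hashable), a sample whose version has no golden copy yet simply yields `None` rather than being compared against the wrong standard.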
  • The element comparator 420 is configured to compare the elements/sub-elements 210, 220 of the structured data 200 to the corresponding standard elements/sub-elements 260, 270 of the standard structured data 250 on an element-by-element basis to identify element differences 430. In some examples, the element comparator 420 identifies a hash or location of each element/sub-element 210, 220 within the structured data 200 (e.g., recursively extracted tree structure) and then identifies the corresponding standard element/sub-element 260, 270 for comparison with the element/sub-element 210, 220 based on the hash or location thereof. For instance, the element comparator 420 may compare each element/sub-element 210, 220 to the corresponding standard element/sub-element 260, 270 to determine a corresponding element comparison 440 indicating whether data of the element/sub-element 210, 220 is matching, differing, missing, or extra relative to standard data of the corresponding standard element/sub-element 260, 270. Accordingly, the element comparator 420 may output a list of element comparisons 440 with each element comparison 440 annotating a comparison result between a corresponding element/sub-element 210, 220 and a corresponding standard element/sub-element 260, 270 as either matching, differing, missing, or extra.
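  • A minimal sketch of the four-way annotation follows, assuming that each element is identified by its location as a slash-separated path in the extracted tree (e.g., "E3/Sub-E2") and that SHA-256 is the content hash. Both choices are illustrative assumptions; the description only requires "a hash or location".

```python
import hashlib


def element_hash(data: bytes) -> str:
    """Content hash used to decide whether two elements at the same location agree."""
    return hashlib.sha256(data).hexdigest()


def compare_elements(sample: dict, standard: dict) -> dict:
    """Annotate every element location as matching, differing, missing, or extra.

    Both arguments map a location (here, a slash-separated path in the extracted
    tree, e.g. "E3/Sub-E2") to that element's raw bytes.
    """
    comparisons = {}
    for location in sorted(sample.keys() | standard.keys()):
        if location not in standard:
            comparisons[location] = "extra"      # only in the sample
        elif location not in sample:
            comparisons[location] = "missing"    # only in the golden copy
        elif element_hash(sample[location]) == element_hash(standard[location]):
            comparisons[location] = "matching"
        else:
            comparisons[location] = "differing"
    return comparisons
```

  • Taking the union of both key sets is what lets a single pass surface "extra" and "missing" elements alongside ordinary mismatches.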
  • In some configurations, the element comparator 420 includes a threshold module 422 to set tolerances/constraints for how much (e.g., a percentage of acceptability) an element/sub-element 210, 220 can differ from a corresponding standard element/sub-element 260, 270 and still be annotated as "matching". In these configurations, the element comparator 420 may employ the threshold module 422 to fine-tune the tolerance/constraint of each element comparison 440, initially requiring the element/sub-element 210, 220 to be within strict bounds (e.g., a narrow set of tolerances/constraints) of the corresponding standard element/sub-element 260, 270 and subsequently permitting the element/sub-element 210, 220 to deviate by some degree (e.g., a wide set of tolerances/constraints) from the standard element/sub-element 260, 270. For instance, if the element comparator 420 determines that multiple samples of the same structured data 200, e.g., where each sample is sourced from a different user device 102, consistently (or by some configurable threshold) return "differing" element comparisons 440, then the threshold module 422 may widen the tolerances/constraints to determine whether subsequent results of the same element comparisons 440 change to "matching" or remain "differing". Accordingly, an element comparison 440 may identify the element difference 430 (e.g., differing) when a narrow set of tolerances/constraints is used in the comparison but annotate the element comparison 440 as "matching" when the wider set of tolerances/constraints is used. Thus, the verifier 150 may allow the element comparator 420 to self-learn, improving accuracy and reliability as more samples of the structured data 200 pass through the comparator 420. In some examples, the verification device 180 (FIG. 1) provides tolerance/constraint inputs 190 to the threshold module 422 for setting initial tolerances/constraints for each element comparison 440 and/or modifying existing tolerances/constraints.
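  • One way to picture the narrow-then-wide tolerance scheme is the sketch below. It assumes that the "percentage of acceptability" is the fraction of byte positions that agree, and the defaults (exact match at first, 90% after widening, widening once 80% of samples differ) are hypothetical values that an input 190 might set in practice.

```python
def byte_similarity(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that agree; extra trailing bytes count as disagreements."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


class ThresholdModule:
    """Hypothetical threshold module 422 that starts strict and widens on consistent differences."""

    def __init__(self, narrow: float = 1.0, wide: float = 0.9, consistency: float = 0.8):
        self.required = narrow          # minimum similarity to annotate "matching"
        self.wide = wide                # relaxed minimum used after widening
        self.consistency = consistency  # share of samples that must differ before widening

    def annotate(self, element: bytes, standard: bytes) -> str:
        if byte_similarity(element, standard) >= self.required:
            return "matching"
        return "differing"

    def maybe_widen(self, annotations: list) -> bool:
        """Widen the tolerance if enough samples of the same element were annotated "differing"."""
        if not annotations:
            return False
        share = annotations.count("differing") / len(annotations)
        if share >= self.consistency:
            self.required = min(self.required, self.wide)
            return True
        return False
```

  • With these defaults, a 10-byte element that differs from the golden copy in one byte is "differing" at first and becomes "matching" once nine of ten samples have reported the same difference.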
  • In the example shown, the list of element comparisons 440 indicates that data of the first and third elements (e.g., first element 210a and third element 210c in FIG. 1) are "differing" relative to corresponding standard data of the corresponding first and third standard elements 260 of the standard structured data 250. Here, each of the "differing" annotations of the element comparisons 440 for elements 1 and 3 is identified as a corresponding element difference 430. Moreover, the element comparisons 440 further indicate that data of the first sub-element Sub-E1 220a (FIG. 3) of the third element 210c is "matching" relative to corresponding standard data of a corresponding standard sub-element 270, data of the second sub-element Sub-E2 220b (FIG. 3) of the third element 210c is "differing" relative to corresponding standard data of a corresponding standard sub-element 270, and data of the third sub-element Sub-E3 220c (FIG. 3) of the third element 210c is "extra", indicating that the standard structured data 250 does not include a sub-element 270 corresponding to Sub-E3 220c. In some examples, any sub-elements 220 annotated as "extra" or "missing" are identified as a corresponding element difference 430. An annotation of "missing" may indicate that an element/sub-element surfaces in the standard structured data 250 but extraction/deconstruction of the structured data 200 does not produce a corresponding element/sub-element 210, 220. The element comparator 420 may store each of the annotated element comparisons 440 in a registry of element comparisons 164 and provide the annotated element comparisons 440 to the analyzer 500 for determining whether each element difference 430 is expected or unexpected based on a heuristic or at least one rule. Each element comparison 440 may include a corresponding identifier 442 indicating the hash or location of the element/sub-element 210, 220 associated with the element comparison 440.
  • Mutable types of structured data 200 (e.g., BIOS firmware) are expected to change to some extent each time the user device 102 reboots. For instance, BIOS firmware may contain an area to store machine-specific settings that will differ for each BIOS firmware sample of structured data 200 when compared with the corresponding standard structured data 250. As a result, an element difference 430 identified in an element comparison 440 between an element/sub-element 210, 220 and a corresponding standard element/sub-element 260, 270 may be expected and, therefore, not indicative of the element/sub-element 210, 220 containing bad or malicious code. Referring back to FIG. 1, the analyzer 500 is configured to determine, for each element difference 430 identified by the structured data comparator 400, whether the element difference 430 is "expected" or "unexpected". Here, an element difference 430 that is "expected" can be deemed allowable, or verified, by the verifier 150. On the other hand, an element difference 430 that is "unexpected" is flagged by the verifier 150 as suspicious and provided to an alarm module 170 for generating a signal 172 indicating the presence of an unexpected element/sub-element 210, 220 in the structured data 200. The alarm module 170 may send the signal 172 to the verification device 180 requesting verification (e.g., via a corresponding input 190) of the sample of structured data 200 sourced from the user device 102. When the signal 172 is received, the user interface 182 executing on the verification device 180 may display the indication of the presence of the unexpected element/sub-element 210, 220 in the structured data 200 on the display 184. In the example shown, the analyzer 500 determines that the element difference 430 for the first element 210a is "unexpected" and that the element difference 430 for the third element 210c is "expected". Accordingly, the alarm module 170 may generate a signal 172 indicating the presence of the unexpected first element 210a to notify the verification device 180 that the first element 210a of the structured data 200 may include bad or malicious code that may compromise the user device(s) 102.
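  • The hand-off from the analyzer 500 to the alarm module 170 can be sketched as a filter over one device's element comparisons. The `Signal` payload, the `(location, annotation)` whitelist keys, and the default-deny choice (anything the whitelist does not mention is treated as unexpected) are all assumptions for illustration, not details taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical payload for signal 172 sent to the verification device 180."""
    device_id: str
    location: str
    annotation: str


def analyze(device_id: str, comparisons: dict, whitelist: dict) -> list:
    """Return one Signal per element difference the whitelist does not mark as expected.

    comparisons maps element location -> annotation ("matching", "differing", ...).
    whitelist maps (location, annotation) -> "expected" or "unexpected".
    """
    signals = []
    for location, annotation in comparisons.items():
        if annotation == "matching":
            continue  # not an element difference
        # Assumption: anything the whitelist does not mention is treated as unexpected.
        verdict = whitelist.get((location, annotation), "unexpected")
        if verdict == "unexpected":
            signals.append(Signal(device_id, location, annotation))
    return signals
```

  • Applied to the example above, with only the third element's "differing" result whitelisted, the sketch emits a single signal for the first element.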
  • Implementations herein are directed toward a self-learning analyzer 500 having heuristic capabilities not only to identify when an element difference 430 is expected or unexpected based on the heuristic or the at least one rule, but also to allow changes/updates to the rule and/or allow identified element differences 430 to change from being "unexpected" to "expected" through statistical analysis of the registry of element comparisons 164. For instance, the registry of element comparisons 164 may update continuously as more samples of structured data 200 are received from user devices 102 and pass through the verifier 150. By statistically analyzing a most current state of the registry of element comparisons 164, the analyzer 500 may update an element difference 430 identified as "unexpected" to now be "expected" when a threshold number and/or threshold percentage of other user devices 102 also source the same element difference 430. In some examples, the heuristic rule may indicate that an element difference 430 identified in an element comparison 440 under a wide set of tolerances/constraints is "unexpected" while identifying the element difference 430 under a narrower set of tolerances/constraints is "expected". Accordingly, the tolerances/constraints used by the threshold module 422 of the element comparator 420 may interact or link to the rules indicating whether a corresponding element difference 430 is "unexpected" or "expected".
  • Referring to FIGS. 5A-5C, the analyzer 500 performs an example analyzation process for determining whether each element difference 430 identified by the structured data comparator 400 is expected or unexpected based on a comparison against the registry of element comparisons 164. FIGS. 5B and 5C show an example registry of element comparisons 164 corresponding to structured data 200 associated with the attributes 202 of manufacturer/creator 202a ("Manufacturer XYZ"), data type 202c ("BIOS Firmware"), and version 202b ("Version 2.1"). The registry of element comparisons 164 stores the results of element comparisons 440 (i.e., from the structured data comparator 400) between elements 210 of the structured data 200 and corresponding standard elements 260 of the standard structured data 250. The standard structured data 250 may be provided by the creator/manufacturer, e.g., "Manufacturer XYZ", of multiple user devices 102, 102a-n that source the samples of the structured data 200. The registry of element comparisons 164 may include a timestamp 550. FIG. 5B shows the registry of element comparisons 164 with a timestamp 550 at a first time (Time 1), and FIG. 5C shows the registry of element comparisons 164 with a timestamp 550 at a second time (Time 2) occurring after Time 1.
  • The multiple user devices 102a-n may each be manufactured by "Manufacturer XYZ" and belong to a fleet of user devices 102 owned and operated by an entity associated with the verification device 180. For simplicity, the registry of element comparisons 164 depicts four element comparisons 440 associated with Elements 1-4 of the structured data 200 provided by each user device 102 in the fleet and corresponding standard structured data 250 having the same manufacturer/creator, version, and data type attributes 202, 202a-c as the structured data. Here, each element comparison 440 annotates a corresponding comparison result for each of the Elements 1-4 from each of the user devices 102a-n as either Matching or Differing. However, the registry of element comparisons 164 may include more or fewer element comparisons 440, each associated with corresponding elements 210, 260 or any sub-elements 220, 270 deconstructed (e.g., via the deconstructor 300) from each sample of structured data 200 and the standard structured data 250. Accordingly, recursively extracted tree structures requiring element comparisons 440 between sub-elements 220 and corresponding standard sub-elements 270 may include corresponding comparison results annotated as either matching, differing, missing, or extra. Each element comparison 440 stored by the registry of element comparisons 164 may include the corresponding identifier 442 (FIG. 4) indicating the hash or location of the element/sub-element 210, 220 associated with the element comparison 440. The registry of element comparisons 164 further includes a counter 560 that indicates at least one of a percentage of user devices 102 in the fleet or a number of user devices 102 in the fleet that return an element comparison 440 annotated as "differing" for each element comparison 440 associated with Elements 1-4. Other counters 560 may similarly be assigned to other annotations, such as "matching", "extra", or "missing". For instance, the verification device 180 may provide inputs 190 that assign annotations for the counter 560 to count.
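  • The counter 560 can be sketched as a per-element tally over the fleet. This assumes the registry is represented as a hypothetical mapping from device identifier to that device's element annotations; the counted annotation is a parameter, mirroring the idea that inputs 190 choose which annotation to count.

```python
from collections import Counter


def annotation_counter(registry: dict, annotation: str = "differing") -> dict:
    """Per element, count the devices whose comparison carries `annotation`.

    registry maps device id -> {element: annotation}. Returns
    element -> (number of devices, percentage of the fleet).
    """
    counts = Counter()
    for comparisons in registry.values():
        for element, result in comparisons.items():
            if result == annotation:
                counts[element] += 1
    fleet_size = len(registry)
    elements = sorted({e for comparisons in registry.values() for e in comparisons})
    return {e: (counts[e], 100.0 * counts[e] / fleet_size) for e in elements}
```

  • Reporting both the count and the percentage covers either form of counter 560 described above.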
  • Still referring to FIGS. 5B and 5C, the example registry of element comparisons 164 further includes a corresponding whitelist 540, 540a-d for each element comparison 440 that provides a rule indicating whether an identified element difference 430 is Expected or Unexpected. Thus, each whitelist 540 codifies what changes (e.g., element differences 430) are expected and acceptable, and what changes are unexpected and need to be flagged as possibly including bad or malicious code. In the example shown, the element comparison 440 for each of Elements 1-4 includes a corresponding whitelist 540a-d. In FIG. 5B, the registry of element comparisons 164 at Time 1 includes the first, second, third, and fourth whitelists 540a, 540b, 540c, 540d for Elements 1, 2, 3, 4, all including a corresponding rule that indicates that any element comparisons 440 annotated as "differing" are Unexpected. Accordingly, the whitelists 540a-d at Time 1 may be initially set with the rule that any element comparison 440 annotated as "differing" is Unexpected. In some examples, the verification device 180 sets the rules for the different annotations as being Unexpected or Expected. In other examples, the manufacturer/creator 202a associated with the registry of element comparisons 164 provides initial sets of the whitelists 540 for one or more of the Elements 1, 2, 3, 4 that may indicate when element differences 430 are Unexpected or Expected. In these examples, the whitelists 540 may be updated through the statistical analysis of the element comparisons 440 for all of the samples of structured data 200. In some scenarios, a whitelist 540 indicates that an element difference 430 annotated as "differing" is Expected but an element difference 430 annotated as "missing" or "extra" is Unexpected.
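  • A per-element whitelist 540 with separate rules per annotation can be sketched as a small data class. The `Whitelist` class and its `verdict` method are hypothetical; the default of marking every kind of difference Unexpected mirrors the Time 1 state shown in FIG. 5B.

```python
from dataclasses import dataclass, field

DIFFERENCE_ANNOTATIONS = ("differing", "missing", "extra")


def _all_unexpected() -> dict:
    # Initial state at Time 1: every kind of difference is Unexpected.
    return {annotation: "Unexpected" for annotation in DIFFERENCE_ANNOTATIONS}


@dataclass
class Whitelist:
    """Hypothetical whitelist 540 for one element: difference annotation -> verdict."""
    element: str
    rules: dict = field(default_factory=_all_unexpected)

    def verdict(self, annotation: str) -> str:
        if annotation == "matching":
            return "Expected"  # no element difference to judge
        return self.rules.get(annotation, "Unexpected")
```

  • Setting `rules["differing"] = "Expected"` while leaving the other entries alone reproduces the scenario in which "differing" is tolerated but "missing" and "extra" are still flagged.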
  • Whereas conventional whitelists are created manually, the analyzer 500 (e.g., data processing hardware 112) may automatically generate whitelists 540 and/or continuously update existing whitelists 540 for samples of structured data 200 associated with a particular set of one or more unique attributes 202, 202a-c. Thus, while manually created conventional whitelists contain "static" rules that never change, the rules assigned to whitelists 540 may be automatically generated and/or dynamically updated by statistically analyzing the most recent element comparisons 440 stored in the registry of element comparisons 164. The ability to automatically generate and continuously update multiple whitelists 540 can substantially improve processing time and accuracy when verifying structured data 200, compared to relying on manually created conventional whitelists whose static rules cannot adapt or be tuned for accuracy. For example, FIG. 5B shows the third whitelist 540c at Time 1 with the rule that a "differing" annotation for the element comparison 440 associated with Element 3 is Unexpected, while FIG. 5C shows that rule updated to Expected at Time 2, after the registry of element comparisons 164 determines that a threshold number of samples also include a corresponding element comparison 440 annotating Element 3 as "differing". For instance, when the counter 560 of the registry of element comparisons 164 identifies that Element 3 is annotated as "differing" in at least 95 percent (95%) of the user devices 102 in the fleet, the registry of element comparisons 164 may update the third whitelist 540c so that any subsequent "differing" comparison associated with Element 3 is Expected. The verification device 180 may send an input 190 to the analyzer 500 that includes a value for the "threshold number of samples" required to change a corresponding rule of a whitelist 540 from Unexpected to Expected. The threshold number of samples associated with one whitelist 540 may be the same as or different from the threshold number of samples associated with other whitelists 540.
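The threshold-based update can be sketched as a function that tallies, for one element, the annotation each device's sample received and promotes an Unexpected rule to Expected once the share of the fleet reporting that annotation reaches the threshold. This Python sketch assumes a 95% default threshold (matching the example above), plain strings for the rule values, and a hypothetical `update_rules` helper. Integer arithmetic is used so that a share of exactly 95% is not lost to floating-point rounding.

```python
from __future__ import annotations

from collections import Counter


def update_rules(
    whitelist: dict[str, str],
    annotations_by_device: dict[str, str],
    threshold_percent: int = 95,
) -> dict[str, str]:
    """Return a copy of `whitelist` with qualifying rules promoted to "Expected".

    whitelist: annotation (e.g. "differing") -> "Expected" or "Unexpected".
    annotations_by_device: device ID -> annotation that device's sample
        received for this element; its length is taken as the fleet size.
    threshold_percent: share of the fleet required to flip a rule.
    """
    fleet_size = len(annotations_by_device)
    tally = Counter(annotations_by_device.values())  # plays the role of counter 560
    updated = dict(whitelist)
    for annotation, count in tally.items():
        if annotation == "matching":
            continue  # matching elements are not element differences
        reached = count * 100 >= threshold_percent * fleet_size
        if updated.get(annotation, "Unexpected") == "Unexpected" and reached:
            updated[annotation] = "Expected"
    return updated


# A 20-device fleet: Element 3 differs on 19 devices, Element 1 on one device.
fleet = [f"device-{n}" for n in range(20)]
element_3 = {device: "differing" for device in fleet}
element_3["device-0"] = "matching"
element_1 = {device: "matching" for device in fleet}
element_1["device-0"] = "differing"

time_1 = {"differing": "Unexpected", "missing": "Unexpected", "extra": "Unexpected"}
print(update_rules(time_1, element_3)["differing"])  # Expected (19/20 = 95%)
print(update_rules(time_1, element_1)["differing"])  # Unexpected (1/20 = 5%)
```

Returning a copy rather than mutating the input keeps the Time 1 whitelist available for comparison with the Time 2 result. A per-whitelist `threshold_percent` reflects the note that different whitelists may use different thresholds.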
  • On the other hand, because the counter 560 of the registry of element comparisons 164 identifies that Element 1 is annotated as "differing" in only 5 percent (5%) of the fleet of user devices 102, the registry of element comparisons 164 maintains the first whitelist 540a unchanged at Time 2 (FIG. 5C), since the threshold number of samples (e.g., at least 95% of the fleet of user devices 102) annotating Element 1 as "differing" is not satisfied. Here, only the first user device 102a has Element 1 annotated as "differing". Accordingly, the alarm module 170 may generate the signal 172 indicating the presence of an unexpected Element 1 (e.g., the first element 210a of FIG. 1) in the structured data 200. The verification device 180 may assess the signal 172 to determine whether the unexpected Element 1 results from infection with bad or malicious code. In some implementations, the analyzer 500 waits until all samples of the same structured data 200 from the user devices 102 in the fleet have passed through the verifier 150, to avoid sending signals 172 to the verification device 180 prematurely.
  • Referring back to the analyzation process of FIG. 5A, the analyzer 500 first determines whether each element difference 430 associated with the sample of structured data 200 received from the first user device 102a (User Device a) is expected or unexpected based on the comparison against the registry of element comparisons 164. Here, an element difference 430 is identified for each of Elements 1 and 3 since Elements 1 and 3 are both annotated as "differing". At step 502, the analyzation process obtains the registry of element comparisons 164 from the data storage hardware 160 based on the attributes 202 of the sample of the structured data 200. In these examples, the "registry of element comparisons 164" corresponds to the registry of element comparisons 164 of FIG. 5C at Time 2.
  • At step 504, the analyzation process compares the element difference 430 against the registry of element comparisons 164 to determine whether the "differing" element difference 430 is "unexpected". For instance, the corresponding whitelist 540 may include a rule indicating whether the "differing" element difference 430 is "unexpected" or "expected". Here, the first whitelist 540a for Element 1 includes a rule indicating that the "differing" element difference 430 is "unexpected", while the third whitelist 540c for Element 3 includes a rule indicating that the "differing" element difference 430 is "expected".
  • When the analyzation process determines that the element difference 430 is "expected", i.e., step 504 is "No", the analyzation process proceeds to step 506, where it ignores the element difference 430 and updates the registry 164 to indicate that the element difference 430 is "expected". For the element difference 430 associated with Element 3, the analyzer 500 determines that the "differing" element difference 430 is "expected", i.e., step 504 is "No". Conversely, when the analyzation process determines that the element difference 430 is "unexpected", i.e., step 504 is "Yes", the analyzation process proceeds to step 508 to determine whether a threshold number of samples also include the corresponding element difference 430. For the element difference 430 associated with Element 1, the analyzation process determines that the "differing" element difference 430 is "unexpected", i.e., step 504 is "Yes", and proceeds to step 508.
  • At step 508, the analyzer 500 may review the counter 560 of the registry of element comparisons 164, which indicates a percentage of the user devices 102 in the fleet, a number of the user devices 102 in the fleet, or both, that return the corresponding element difference 430, i.e., the "differing" element difference 430 associated with Element 1. For the element difference 430 associated with Element 1, the counter 560 indicates that 5 percent (5%) of the fleet of user devices 102, i.e., one (1) user device 102, includes the corresponding "differing" element difference 430. In some examples, the "threshold number of samples" requires at least 90 or 95 percent of the user devices 102 in the fleet to return the element difference 430. The "threshold number of samples" may additionally require a minimum number of devices 102 in the fleet to return the corresponding element difference 430 before the threshold is satisfied. For instance, the minimum number may be about 10 devices, to ensure that the sample is robust before overturning a rule specified by the whitelist 540. If the threshold number of samples is satisfied, i.e., step 508 is "Yes", the analyzation process proceeds to step 510 and changes the rule of the corresponding whitelist 540 from "unexpected" to "expected".
  • If, on the other hand, the threshold number of samples is not satisfied, i.e., step 508 is "No", the analyzation process proceeds to step 512 and flags the corresponding element difference 430 as "unexpected". Since the counter 560 of the registry of element comparisons 164 identifies that Element 1 is annotated as "differing" in only 5 percent (5%) of the fleet of user devices 102, the analyzation process may determine that the threshold number of samples is not satisfied. Accordingly, the analyzation process may flag the corresponding "differing" element difference 430 associated with Element 1 and notify the alarm module 170. The alarm module 170 may generate the signal 172 indicating the presence of the unexpected element (Element 1) in the structured data 200 received from the first user device 102a. The verification device 180 may receive the signal 172 and determine whether the first user device 102a has been compromised by bad or malicious code, as identified by the element difference 430 associated with Element 1. The analyzation process executed by the analyzer 500 may repeat for each sample of the structured data 200 received from the other user devices 102b-n in the fleet.
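Steps 504 through 512 can be condensed into a single decision function. The Python sketch below is illustrative only: `ElementEntry` and `analyze_difference` are hypothetical names, the alarm module 170 is modeled as a plain callback, and the defaults (a 95% share and a minimum of 10 devices) are taken from the examples above rather than from any fixed specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ElementEntry:
    """Hypothetical slice of the registry of element comparisons for one element."""

    rules: dict[str, str]  # whitelist: annotation -> "Expected" / "Unexpected"
    counts: dict[str, int] = field(default_factory=dict)  # annotation -> devices reporting it
    fleet_size: int = 0


def analyze_difference(
    entry: ElementEntry,
    annotation: str,
    alarm: Callable[[str], None],
    threshold_percent: int = 95,
    min_devices: int = 10,
) -> str:
    # Step 504: is the difference unexpected under the current whitelist rule?
    if entry.rules.get(annotation, "Unexpected") == "Expected":
        return "ignored"  # step 506 (registry bookkeeping omitted here)
    # Step 508: do enough devices in the fleet report the same difference?
    count = entry.counts.get(annotation, 0)
    share_ok = entry.fleet_size > 0 and count * 100 >= threshold_percent * entry.fleet_size
    if share_ok and count >= min_devices:
        entry.rules[annotation] = "Expected"  # step 510: overturn the rule
        return "rule_updated"
    alarm(annotation)  # step 512: flag it and notify the alarm module
    return "flagged"


alerts: list[str] = []
element_1 = ElementEntry({"differing": "Unexpected"}, {"differing": 1}, fleet_size=20)
element_3 = ElementEntry({"differing": "Expected"}, {"differing": 19}, fleet_size=20)
print(analyze_difference(element_1, "differing", alerts.append))  # flagged
print(analyze_difference(element_3, "differing", alerts.append))  # ignored
```

Requiring both the percentage and the minimum device count means that a tiny fleet in which every device differs still raises an alarm instead of silently rewriting the whitelist.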
  • FIG. 6 is a schematic view of an example computing device 600 that may be used to implement the systems and methods described in this document, such as the computing resource 112. The computing device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
  • The computing device 600 includes a processor 610 (i.e., data processing hardware), memory 620, a storage device 630, a high-speed interface/controller 640 connecting to the memory 620 and high-speed expansion ports 650, and a low-speed interface/controller 660 connecting to a low-speed bus 670 and the storage device 630. Each of the components 610, 620, 630, 640, 650, and 660 is interconnected using various busses and may be mounted on a common motherboard or in other manners as appropriate. The processor 610 can process instructions for execution within the computing device 600, including instructions stored in the memory 620 or on the storage device 630, to display graphical information for a GUI on an external input/output device, such as a display 680 coupled to the high-speed interface 640. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 600 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • The memory 620 includes hardware (e.g., data storage hardware 160) that stores information non-transitorily within the computing device 600. The memory 620 may be a computer-readable medium, a volatile memory unit(s), or a non-volatile memory unit(s). The non-transitory memory 620 may include physical devices (e.g., hardware) used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by the computing device 600. Examples of non-volatile memory include, but are not limited to, flash memory, phase change memory (PCM), and read-only memory (ROM) / programmable read-only memory (PROM) / erasable programmable read-only memory (EPROM) / electrically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs), as well as disks or tapes. Examples of volatile memory include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), and static random access memory (SRAM).
  • The storage device 630 is capable of providing mass storage for the computing device 600. In some implementations, the storage device 630 is a computer-readable medium. In various implementations, the storage device 630 may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. In additional implementations, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 620, the storage device 630, or memory on processor 610.
  • The high-speed controller 640 manages bandwidth-intensive operations for the computing device 600, while the low-speed controller 660 manages less bandwidth-intensive operations. Such allocation of duties is exemplary only. In some implementations, the high-speed controller 640 is coupled to the memory 620, the display 680 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 650, which may accept various expansion cards (not shown). In some implementations, the low-speed controller 660 is coupled to the storage device 630 and a low-speed expansion port 670. The low-speed expansion port 670, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device, such as a switch or router, e.g., through a network adapter.
  • The computing device 600 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 600a or multiple times in a group of such servers 600a, as a laptop computer 600b, or as part of a rack server system 600c.
  • A software application (i.e., a software resource 110) may refer to computer software that causes a computing device to perform a task. In some examples, a software application may be referred to as an "application," an "app," or a "program." Example applications include, but are not limited to, mobile applications, system diagnostic applications, system management applications, system maintenance applications, word processing applications, spreadsheet applications, messaging applications, media streaming applications, social networking applications, and gaming applications.
  • The memory hardware 110 may include physical devices used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by a computing device 110hc. The non-transitory memory hardware 110hm may be volatile and/or non-volatile addressable semiconductor memory. Examples of non-volatile memory include, but are not limited to, flash memory, phase change memory (PCM), and read-only memory (ROM) / programmable read-only memory (PROM) / erasable programmable read-only memory (EPROM) / electrically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs), as well as disks or tapes. Examples of volatile memory include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), and static random access memory (SRAM).
  • FIG. 7 is a flowchart of an example method 700, executed by the computing device 600 of FIG. 6, for verifying structured data 200. The flowchart starts at operation 702 by receiving structured data 200 at data processing hardware 112 (executing on the computing device 600) and deconstructing, by the data processing hardware 112, the structured data 200 into corresponding elements 210 (and any sub-elements 220). The data processing hardware 112 may execute a verifier 150 that implements a deconstructor 300, a structured data comparator 400, an analyzer 500, and an alarm module 170. The data processing hardware 112 may use the deconstructor 300 to deconstruct the structured data 200. A verification device 180 in communication with the verifier 150 may request the verifier 150, e.g., via an input 190, to verify the structured data 200 from one or more user devices 102 in a fleet. At operation 704, the data processing hardware 112 obtains standard structured data 250 having corresponding standard elements 260 (and any sub-elements 270). The data processing hardware 112 may obtain the standard structured data 250 from a standard structured data registry 162 residing on the data storage hardware 160. Here, the data processing hardware 112 may retrieve the standard structured data 250 having the same one or more attributes 202 as those associated with the structured data 200. At operation 706, the data processing hardware 112 compares (e.g., using the structured data comparator 400) the elements/sub-elements 210, 220 of the structured data 200 with the standard elements/sub-elements 260, 270 of the standard structured data 250 to identify any element differences 430.
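Operations 702 through 706 can be illustrated with a small Python sketch. It assumes, hypothetically, that the structured data has already been parsed into nested dictionaries whose leaves are raw bytes, and it identifies corresponding elements by their location (a slash-separated path). The function names `deconstruct` and `compare` and the sample byte values are illustrative, not taken from the description.

```python
from __future__ import annotations


def deconstruct(data: dict, prefix: str = "") -> dict[str, bytes]:
    """Recursively flatten elements and sub-elements into location -> bytes."""
    flat: dict[str, bytes] = {}
    for name, value in data.items():
        location = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):  # element that contains sub-elements
            flat.update(deconstruct(value, location))
        else:
            flat[location] = value
    return flat


def compare(elements: dict[str, bytes], standard: dict[str, bytes]) -> dict[str, str]:
    """Annotate each location as matching, differing, missing, or extra."""
    labels = {}
    for location, standard_value in standard.items():
        if location not in elements:
            labels[location] = "missing"
        elif elements[location] == standard_value:
            labels[location] = "matching"
        else:
            labels[location] = "differing"
    for location in elements:
        if location not in standard:
            labels[location] = "extra"
    return labels


sample = {"header": {"magic": b"FW01", "version": b"\x02"}, "code": b"\x90", "debug": b"\x00"}
standard = {"header": {"magic": b"FW01", "version": b"\x01"}, "code": b"\x90", "sig": b"\xaa"}

labels = compare(deconstruct(sample), deconstruct(standard))
differences = {loc: note for loc, note in labels.items() if note != "matching"}
print(differences)  # {'header/version': 'differing', 'sig': 'missing', 'debug': 'extra'}
```

Only the non-matching entries are treated as element differences 430; each of them would then be checked against the registry of element comparisons in the following operations.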
  • At operation 708, for each element difference 430, the data processing hardware 112 (e.g., using the analyzer 500) compares the element difference 430 against a registry of element comparisons 164 and, at operation 710, determines whether the element difference 430 is expected or unexpected based on a heuristic or at least one rule. The registry of element comparisons 164 may include the most current state of element differences 430 and a corresponding whitelist 540 including a rule indicating whether the element difference 430 is expected or unexpected. The whitelist 540 may be automatically generated by the data processing hardware 112 as samples of structured data 200 pass through the verifier 150, and/or existing whitelists 540 may be continuously updated based on samples of structured data 200 passing through the verifier 150. At operation 712, when the element difference 430 is unexpected, the data processing hardware 112 generates (e.g., using the alarm module 170) a signal 172 indicating the presence of an unexpected element/sub-element 210, 220 in the structured data 200. The verification device 180 may receive the signal 172 and cause a user interface 182 to display, on a display 184, the indication of the presence of the unexpected element/sub-element 210, 220 in the structured data 200.
  • Various implementations of the systems and techniques described herein can be realized in digital electronic and/or optical circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, non-transitory computer readable medium, apparatus and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • To provide for interaction with a user, one or more aspects of the disclosure can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display) monitor, or touch screen for displaying information to the user and optionally a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
  • A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. Accordingly, other implementations are within the scope of the following claims.
  • The following clauses are also disclosed herein:
    1. A method (700) comprising:
      • receiving, at data processing hardware (112), structured data (200);
      • deconstructing, by the data processing hardware (112), the structured data (200) into corresponding elements (210);
      • obtaining, at the data processing hardware (112), standard structured data (250) having corresponding standard elements (260);
      • comparing, by the data processing hardware (112), the elements (210) of the structured data (200) with the standard elements (260) of the standard structured data (250) to identify any element differences (430); and
      • for each element difference (430):
        • comparing, by the data processing hardware (112), the element difference (430) against a registry of element comparisons (164);
        • determining, by the data processing hardware (112), whether the element difference (430) is expected or unexpected based on a heuristic or at least one rule; and
        • when the element difference (430) is unexpected, generating, by the data processing hardware (112), a signal (172) indicating the presence of an unexpected element in the structured data (200).
    2. The method (700) of clause 1, further comprising, for each element difference (430), storing the corresponding comparison between the respective element of the structured data (200) and the respective standard element of the standard structured data (250) in the registry of element comparisons (164).
    3. The method (700) of clause 1 or 2, further comprising statistically analyzing, by the data processing hardware (112), the registry of element comparisons (164) to determine the at least one rule indicating whether the element difference (430) is expected or unexpected.
    4. The method (700) of any of clauses 1-3, further comprising, for each element (210):
      • determining, by the data processing hardware (112), whether the element (210) comprises any sub-elements (220); and
      • when the element comprises sub-elements (220), deconstructing, by the data processing hardware (112), the element (210) into the corresponding sub-elements (220).
    5. The method (700) of clause 4, wherein the deconstructed structured data (200) comprises a recursively extracted tree structure.
    6. The method (700) of clause 4 or 5, further comprising:
      • receiving, at the data processing hardware (112), a structured data type (202c);
      • obtaining, at the data processing hardware (112), a data structure template (340) based on the structured data type (202c);
      • deconstructing, by the data processing hardware (112), the structured data (200) into corresponding elements (210) based on the data structure template (340); and
      • determining, by the data processing hardware (112), whether the element comprises any sub-elements (220) based on the data structure template (340).
    7. The method (700) of any of clauses 1-6, further comprising annotating each element (210) of the structured data (200) as matching, differing, missing, or extra based on the comparison of the respective element (210) with the respective standard element (260).
    8. The method (700) of clause 7, wherein comparing the elements (210) of the structured data (200) with the standard elements (260) of the standard structured data (250) comprises:
      • identifying a hash or a location of each element (210); and
      • for each element (210):
        • identifying the corresponding standard element (260) based on the hash or the location of each element (210); and
        • determining whether data of the element (210) is matching, differing, missing, or extra relative to standard data of the corresponding standard element (260).
    9. The method (700) of clause 7 or 8, wherein determining whether the element difference (430) is expected or unexpected comprises marking the annotation of the respective element (210) as expected or unexpected.
    10. The method (700) of any of clauses 1-9, wherein the structured data (200) comprises binary data.
    11. A system (100) comprising:
      • data processing hardware (112); and
      • memory hardware (110) in communication with the data processing hardware (112), the memory hardware (110) storing instructions that when executed on the data processing hardware (112) cause the data processing hardware (112) to perform operations comprising:
        • receiving structured data (200);
        • deconstructing the structured data (200) into corresponding elements (210);
        • obtaining standard structured data (250) having corresponding standard elements (260);
        • comparing the elements (210) of the structured data (200) with the standard elements (260) of the standard structured data (250) to identify any element differences (430); and
        • for each element difference (430):
          • comparing the element difference (430) against a registry of element comparisons (164);
          • determining whether the element difference (430) is expected or unexpected based on a heuristic or at least one rule; and
          • when the element difference (430) is unexpected, generating a signal (172) indicating the presence of an unexpected element in the structured data (200).
    12. The system (100) of clause 11, wherein the operations further comprise, for each element difference (430), storing the corresponding comparison between the respective element (210) of the structured data (200) and the respective standard element (260) of the standard structured data (250) in the registry of element comparisons (164).
    13. The system (100) of clause 11 or 12, wherein the operations further comprise statistically analyzing the registry of element comparisons (164) to determine the at least one rule indicating whether the element difference (430) is expected or unexpected.
    14. The system (100) of any of clauses 11-13, wherein the operations further comprise, for each element (210):
      • determining whether the element (210) comprises any sub-elements (220); and
      • when the element (210) comprises sub-elements (220), deconstructing the element (210) into the corresponding sub-elements (220).
    15. The system (100) of clause 14, wherein the deconstructed structured data (200) comprises a recursively extracted tree structure.
    16. The system (100) of clause 14 or 15, wherein the operations further comprise:
      • receiving a structured data type (202c);
      • obtaining a data structure template (340) based on the structured data type (202c);
      • deconstructing the structured data (200) into corresponding elements (210) based on the data structure template (340); and
      • determining whether the element (210) comprises any sub-elements (220) based on the data structure template (340).
    17. The system (100) of any of clauses 11-16, wherein the operations further comprise annotating each element (210) of the structured data (200) as matching, differing, missing, or extra based on the comparison of the respective element (210) with the respective standard element (260).
    18. The system (100) of clause 17, wherein comparing the elements (210) of the structured data (200) with the standard elements (260) of the standard structured data (250) comprises:
      • identifying a hash or a location of each element (210); and
      • for each element (210):
        • identifying the corresponding standard element (260) based on the hash or the location of each element (210); and
        • determining whether data of the element (210) is matching, differing, missing, or extra relative to standard data of the corresponding standard element (260).
    19. The system (100) of clause 17 or 18, wherein determining whether the element difference (430) is expected or unexpected comprises marking the annotation of the respective element (210) as expected or unexpected.
    20. The system (100) of any of clauses 11-19, wherein the structured data (200) comprises binary data.
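Clauses 5, 6, and 10 describe deconstructing binary structured data into a recursively extracted tree, guided by a data structure template chosen according to the structured data type. A minimal Python sketch follows; the template format (name, offset, length, sub-fields), the type name `demo-firmware`, and the helpers `extract` and `deconstruct_binary` are all hypothetical, chosen only to illustrate the idea.

```python
from __future__ import annotations

# Hypothetical templates keyed by structured data type (cf. attribute 202c).
# Each field is (name, offset, length, sub_fields); offsets of sub-fields are
# relative to the start of their parent element.
TEMPLATES = {
    "demo-firmware": [
        ("header", 0, 8, [("magic", 0, 4, []), ("version", 4, 4, [])]),
        ("payload", 8, 4, []),
    ],
}


def extract(blob: bytes, fields: list) -> dict:
    """Slice `blob` into elements, recursing where the template lists sub-elements."""
    tree = {}
    for name, offset, length, sub_fields in fields:
        chunk = blob[offset:offset + length]
        # The template, not the data, decides whether an element has sub-elements.
        tree[name] = extract(chunk, sub_fields) if sub_fields else chunk
    return tree


def deconstruct_binary(blob: bytes, data_type: str) -> dict:
    """Obtain the template for `data_type` and deconstruct the binary data with it."""
    return extract(blob, TEMPLATES[data_type])


blob = b"FW01" + (2).to_bytes(4, "big") + b"DATA"
print(deconstruct_binary(blob, "demo-firmware"))
# {'header': {'magic': b'FW01', 'version': b'\x00\x00\x00\x02'}, 'payload': b'DATA'}
```

Slicing never raises on short input, so a truncated sample yields shorter or empty elements; a real deconstructor would likely need to report such elements explicitly so they can be annotated as missing.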

Claims (11)

  1. A method comprising:
    receiving, at data processing hardware, an indication of a possible instance of malicious activity for an element of structured data, the indication indicating that the element of structured data deviates from an assessment standard comprising attributes corresponding to the structured data;
    identifying, by the data processing hardware, a plurality of other instances of activity for the element of structured data, the plurality of other instances of activity stored in a registry in communication with the data processing hardware;
    determining, by the data processing hardware, whether the possible instance of malicious activity for the element of structured data matches other instances of activity for the element of structured data; and
    when the possible instance of malicious activity for the element of structured data fails to match other instances of activity for the element of structured data, communicating, by the data processing hardware, the possible instance of malicious activity for the element of structured data as a security finding to an entity overseeing the structured data.
  2. The method of claim 1, further comprising updating, by the data processing hardware, the registry to include the possible instance of malicious activity for the element of structured data.
  3. The method of any preceding claim, further comprising, when the possible instance of malicious activity for the element of structured data matches other instances of activity for the element of structured data, determining, by the data processing hardware, that the possible instance of malicious activity fails to correspond to a respective security finding.
  4. The method of claim 3, further comprising communicating, by the data processing hardware to a source of the indication of a possible instance of malicious activity for an element of structured data, that the possible instance of malicious activity for the element of structured data corresponds to an expected instance of behavior.
  5. The method of any preceding claim, wherein determining whether the possible instance of malicious activity for the element of structured data matches other instances of activity for the element of structured data comprises:
    identifying that a threshold number of other activity instances match the possible instance of malicious activity; and
    communicating that the possible instance of malicious activity for the element of structured data corresponds to an expected instance of behavior.
  6. The method of any preceding claim, wherein the attributes comprise at least one of creator information, version information, or data type.
  7. The method of any preceding claim, wherein the registry logs instances of activity for the element of structured data over a period of time from multiple computing devices.
  8. The method of any preceding claim, wherein determining whether the possible instance of malicious activity for the element of structured data matches other instances of activity for the element of structured data comprises determining whether the possible instance of malicious activity for the element of structured data is expected or unexpected based on a heuristic or at least one rule.
  9. The method of claim 8, further comprising statistically analyzing, by the data processing hardware, the registry to determine the at least one rule indicating whether the possible instance of malicious activity for the element of structured data is expected or unexpected.
  10. The method of any preceding claim, wherein the structured data comprises binary data.
  11. A system comprising:
    data processing hardware; and
    memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations of the method of any preceding claim.
Application EP21171377.1A (priority date 2017-10-23, filing date 2018-07-12), Verifying structured data: Withdrawn, published as EP3876128A1

Applications Claiming Priority (3)

Application Number Publication Priority Date Filing Date Title
US15/790,453 US10783138B2 2017-10-23 2017-10-23 Verifying structured data
PCT/US2018/041770 WO2019083581A1 2017-10-23 2018-07-12 Verifying structured data
EP18749670.8A EP3616117B1 2017-10-23 2018-07-12 Verifying structured data

Related Parent Applications (1)

Application Number Relation Publication Priority Date Filing Date Title
EP18749670.8A Division EP3616117B1 2017-10-23 2018-07-12 Verifying structured data

Publications (1)

Publication Number Publication Date
EP3876128A1 2021-09-08

Family

ID=63080521

Family Applications (2)

Application Number Status Publication Priority Date Filing Date Title
EP18749670.8A Active EP3616117B1 2017-10-23 2018-07-12 Verifying structured data
EP21171377.1A Withdrawn EP3876128A1 2017-10-23 2018-07-12 Verifying structured data

Family Applications Before (1)

Application Number Status Publication Priority Date Filing Date Title
EP18749670.8A Active EP3616117B1 2017-10-23 2018-07-12 Verifying structured data

Country Status (4)

Country Link
US (3) US10783138B2
EP (2) EP3616117B1
CN (2) CN116975915A
WO (1) WO2019083581A1

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11755745B2 * 2019-01-29 2023-09-12 Johnson Controls Tyco IP Holdings LLP Systems and methods for monitoring attacks to devices
US11693841B2 * 2020-01-03 2023-07-04 International Business Machines Corporation Hash for structural data with same data meaning
CN111986750B * 2020-07-27 2023-12-26 北京天健源达科技股份有限公司 Structural detection method for electronic medical record template

Citations (3)

Publication number Priority date Publication date Assignee Title
US20140096184A1 * 2012-09-28 2014-04-03 Kaspersky Lab Zao System and Method for Assessing Danger of Software Using Prioritized Rules
US20160203316A1 * 2015-01-14 2016-07-14 Microsoft Technology Licensing, Llc Activity model for detecting suspicious user activity
US20170262352A1 * 2014-09-23 2017-09-14 Hewlett-Packard Development Company, L.P. Detecting a change to system management mode bios code

Also Published As

Publication number Publication date
US20190121886A1 2019-04-25
CN116975915A 2023-10-31
US10783138B2 2020-09-22
CN110720101B 2023-08-04
WO2019083581A1 2019-05-02
US20200387499A1 2020-12-10
EP3616117B1 2021-05-05
US20230376478A1 2023-11-23
CN110720101A 2020-01-21
US11748331B2 2023-09-05
EP3616117A1 2020-03-04

Legal Events

Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase. Original code: 0009012
STAA Information on the status of an EP patent application or granted EP patent. Status: the application has been published
AC Divisional application: reference to earlier application. Ref document number: 3616117; country of ref document: EP; kind code of ref document: P
AK Designated contracting states. Kind code of ref document: A1; designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
STAA Information on the status of an EP patent application or granted EP patent. Status: request for examination was made
17P Request for examination filed. Effective date: 2022-01-31
RBV Designated contracting states (corrected). Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
STAA Information on the status of an EP patent application or granted EP patent. Status: examination is in progress
17Q First examination report despatched. Effective date: 2023-05-12
STAA Information on the status of an EP patent application or granted EP patent. Status: the application has been withdrawn
18W Application withdrawn. Effective date: 2023-07-27