WO2024041301A1 - Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus

Info

Publication number
WO2024041301A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
unified
node
syntax
unique
Application number
PCT/CN2023/109532
Other languages
French (fr)
Chinese (zh)
Inventor
李永超
徐兆桂
刘地军
汤震浩
赵泽林
狄鹏
Original Assignee
支付宝(杭州)信息技术有限公司
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2024041301A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms

Definitions

  • The embodiments of this specification belong to the field of computer technology, and in particular relate to methods and devices for generating a unified abstract syntax tree and for program analysis.
  • An abstract syntax tree is an abstract representation of source code. It represents the syntax structure of a programming language in the form of a tree, and each node of the tree represents one syntax structure in the source code.
  • The object of the present invention is to provide methods and devices for generating a unified abstract syntax tree and for program analysis.
  • A method for generating a unified abstract syntax tree, including: obtaining a program file of any programming language, and parsing the program file into an original abstract syntax tree that represents the unique syntax structure of that programming language; and determining the syntax conversion rules corresponding to a unified syntax structure.
  • The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • The unique layer rules include the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • Each original node contained in the original abstract syntax tree is matched against the mapping relationships included in the syntax conversion rules, and each original node is converted into the standard node of its successfully matched mapping relationship, so as to obtain a unified abstract syntax tree that represents the unified syntax structure.
  • A program analysis method, including: obtaining a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure, wherein the standard nodes contained in the unified abstract syntax tree are converted from the original nodes contained in an original abstract syntax tree based on the syntax conversion rules corresponding to the unified syntax structure.
  • The original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure of that programming language.
  • The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • The unique layer rules include the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • Program analysis is then performed on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
  • A device for generating a unified abstract syntax tree, including: a program file acquisition unit, configured to obtain a program file of any programming language and parse the program file into an original abstract syntax tree that represents the unique syntax structure of that programming language; and a syntax conversion rule determination unit, configured to determine the syntax conversion rules corresponding to a unified syntax structure. The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules. The unique layer rules include the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • The device further includes a conversion unit, configured to match each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and to convert each original node into the standard node of its successfully matched mapping relationship, so as to obtain a unified abstract syntax tree that represents the unified syntax structure.
  • A program analysis device, including: a unified abstract syntax tree acquisition unit, configured to obtain a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure, wherein the standard nodes contained in the unified abstract syntax tree are converted from the original nodes contained in an original abstract syntax tree based on the syntax conversion rules corresponding to the unified syntax structure. The original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure of that programming language.
  • The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules. The unique layer rules include the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • The device further includes a program analysis unit, configured to perform program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
  • An electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor executes the executable instructions to implement the method described in any one of the first aspect or the second aspect.
  • A computer-readable storage medium on which computer instructions are stored. When executed by a processor, the instructions implement the steps of the method described in any one of the first aspect or the second aspect.
  • The embodiments of this specification design a unified syntax structure and corresponding syntax conversion rules.
  • The syntax conversion rules include general layer rules and unique layer rules, which are used to convert, respectively, the syntax structures that all programming languages have and the syntax structures that only some programming languages have into the corresponding unified syntax structure. The unified syntax structure can therefore completely represent every programming language supported by the syntax conversion rules, and program files in different programming languages obtain an abstract representation defined by the same unified syntax structure.
  • The original abstract syntax trees obtained by parsing program files of multiple programming languages can be converted, through the syntax conversion rules, into unified abstract syntax trees that represent the unified syntax structure.
  • A unified abstract syntax tree can thus indirectly represent the unique syntax structures of multiple programming languages at the same time, which makes it easier to develop systems that perform further processing on unified abstract syntax trees. For example, developers of a program analysis system only need to develop one set of program analysis algorithms for the unified syntax structure in order to handle program analysis tasks for different programming languages. If modifications are needed later, only that one set of program analysis algorithms has to be modified, which reduces the cost of developing, maintaining, and iterating the program analysis system.
  • Figure 1 is a flow chart of a method for generating a unified abstract syntax tree provided by an exemplary embodiment.
  • Figure 2 is a schematic diagram of node types defined by a unified syntax structure provided by an exemplary embodiment.
  • Figure 3 is a flow chart of a program analysis method provided by an exemplary embodiment.
  • Figure 4 is a schematic structural diagram of a device provided by an exemplary embodiment.
  • Figure 5 is a block diagram of a device for generating a unified abstract syntax tree provided by an exemplary embodiment.
  • Figure 6 is a block diagram of a program analysis device provided by an exemplary embodiment.
  • Figure 1 is a flow chart of a method for generating a unified abstract syntax tree provided by an exemplary embodiment.
  • The method includes: S102: obtain a program file of any programming language, and parse the program file into an original abstract syntax tree that represents the unique syntax structure of that programming language.
  • The program file of any programming language can be parsed, by a parser for that programming language, into an original abstract syntax tree that represents the unique syntax structure of that programming language.
  • Specifically, the program file is first tokenized to obtain a token sequence. The parsing function of the parser then performs syntax analysis on the token sequence and converts every syntax structure contained in the program file into a corresponding syntax node (an abstract stand-in for the syntax structure). These syntax nodes are linked by parent-child relationships, and together they form the original abstract syntax tree that represents the unique syntax structure of that programming language.
  • The syntax nodes contained in the original abstract syntax tree are called original nodes. Each original node has a node type and node attributes defined in the unique syntax structure of that programming language.
  • the node attributes of any original node include the any original node.
  • A syntax node can be, for example, a function declaration, a declaration of a static global variable, or a file import. Syntax structures can be nested.
  • For example, if a function declaration contains a file import, the syntax node of the file import is a child node of the syntax node of the function declaration, and the syntax node of the function declaration is the parent node of the syntax node of the file import.
  • Parent nodes and child nodes related in this way form a tree structure, that is, an abstract syntax tree.
  • Official or community-maintained parsers can also be used.
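
As an illustration of the parsing step, the sketch below uses Python's standard `ast` module as the parser for Python source. The choice of parser, the sample source, and the helper name `parent_child_pairs` are assumptions made for this example and are not taken from the specification. It lists the parent-child relationships between original nodes, including a file import nested inside a function declaration.

```python
import ast

# Sample program file content (illustrative only).
SOURCE = """
def add(a, b):
    import math
    return a + b
"""

def parent_child_pairs(tree):
    """Return (parent type, child type) for every edge of the syntax tree."""
    pairs = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            pairs.append((type(parent).__name__, type(child).__name__))
    return pairs

# ast.parse performs tokenization and syntax analysis in one call and
# returns the root of the original abstract syntax tree.
original_ast = ast.parse(SOURCE)
pairs = parent_child_pairs(original_ast)

for parent_type, child_type in pairs:
    print(f"{parent_type} -> {child_type}")
```

Here `Module`, `FunctionDef`, `Import`, and so on are node types of Python's own syntax structure; a parser for another language would produce differently named original nodes.
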
  • The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • The unique layer rules include the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • The embodiments of this specification define a unified syntax structure. The syntax nodes included in the unified syntax structure are called standard nodes.
  • Any standard node defined in the unified syntax structure has a corresponding node type and node attributes defined in the unified syntax structure, wherein the node attributes of any standard node include the node types of its child nodes and/or the value of that standard node.
  • For example, the node attributes of a binary expression node stipulate that it has two child nodes, both of whose node types are expression nodes, and that its value represents the binary operator applied to those expression nodes.
  • Standard nodes can be divided into general standard nodes and unique standard nodes according to their types.
  • General standard nodes correspond to the common syntax structures that all programming languages have. For example, the If node (one of the branch nodes), which represents the conditional syntax structure, is a general standard node. Unique standard nodes correspond to syntax structures that not all programming languages have.
  • In the embodiments of this specification, a unique standard node more specifically corresponds to a syntax structure that is unique to some (for example, one) of the programming languages already supported by the syntax conversion rules.
  • The embodiments of this specification bring together the common syntax structures that all programming languages have and the unique syntax structures that specific programming languages have, and include both in the unified syntax structure, so that the unified syntax structure can in theory represent the syntax structures of all programming languages at the same time.
  • An abstract syntax tree composed of standard nodes defined in the unified syntax structure is called a unified abstract syntax tree. Through a tree structure composed of standard nodes, it represents all the syntax structures in a program file written in any programming language (or in a mixture of multiple programming languages).
  • Figure 2 is a schematic diagram of node types defined by a unified syntax structure provided by an exemplary embodiment.
  • The node types of the standard nodes defined in the unified syntax structure include general standard nodes and unique standard nodes. The unique standard nodes can be further classified by programming language into Python nodes, C nodes, Java nodes, JavaScript nodes, and nodes corresponding to the other programming languages supported by the syntax conversion rules.
  • The node types of general standard nodes include root nodes, expression nodes, statement nodes, and type nodes:
    • Root nodes: the specific types include CompileUnit.
    • Expression nodes: the subtypes include literal nodes, definition nodes (specific types include ClassDefinition and FunctionDefiniton), unary expression nodes (Unary), binary expression nodes (Binary and Assignment), and ternary expression nodes (Condition).
    • Statement nodes: the subtypes include declaration nodes (VariableDeclaration), branch nodes (If and Switch), loop nodes (For, ForIn, and While), control jump nodes (Return, Break, and Continue), and exception nodes (Throw, Try, and Catch).
    • Type nodes: the subtypes include basic type nodes (integer, float, and string) and aggregate type nodes (Array and Scoped).
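
One possible in-memory shape for standard nodes is sketched below. It is a minimal sketch rather than the structure used in the specification: the class name `StandardNode`, the `CATEGORY` table (which lists only a subset of the types above), and the type name `Literal` for literal nodes are all assumptions introduced for illustration. The check in `__post_init__` encodes the binary expression example, in which the node attributes require two expression children and the value holds the operator.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Specific node type -> top-level category (subset, for illustration).
# "Literal" is a hypothetical name; the text names no specific literal type.
CATEGORY = {
    "CompileUnit": "root",
    "Literal": "expression",
    "Unary": "expression",
    "Binary": "expression",
    "Assignment": "expression",
    "Condition": "expression",
    "VariableDeclaration": "statement",
    "If": "statement",
    "While": "statement",
    "Return": "statement",
    "Array": "type",
    "Scoped": "type",
}

@dataclass
class StandardNode:
    node_type: str                      # e.g. "Binary"
    value: Optional[str] = None         # e.g. the operator "+"
    children: List["StandardNode"] = field(default_factory=list)

    def __post_init__(self):
        if self.node_type not in CATEGORY:
            raise ValueError(f"unknown standard node type: {self.node_type}")
        # A binary expression needs exactly two expression children.
        if self.node_type == "Binary":
            two_expressions = len(self.children) == 2 and all(
                CATEGORY[child.node_type] == "expression"
                for child in self.children
            )
            if not two_expressions:
                raise ValueError("Binary requires two expression children")

# The expression 1 + 2 as a standard node.
one_plus_two = StandardNode(
    "Binary", "+", [StandardNode("Literal", "1"), StandardNode("Literal", "2")]
)
```
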
  • In the embodiments of this specification, corresponding syntax conversion rules are also designed based on the above unified syntax structure, so as to guide the mutual conversion between the original abstract syntax trees of the various programming languages and the unified abstract syntax tree corresponding to the unified syntax structure.
  • The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between each general standard node defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • The unique layer rules include the mapping relationships between each unique standard node defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • Each mapping relationship in the general layer rules is a one-to-N relationship, whereas a mapping relationship in the unique layer rules can be a one-to-one relationship or a one-to-M relationship (1 < M < N).
  • The syntax conversion rules involved in the embodiments of this specification are a set of conversion rules that describe the conversion relationship between the unified syntax structure and the unique syntax structure of each programming language, and they enable mutual conversion between original abstract syntax trees and unified abstract syntax trees.
  • For example, suppose the syntax conversion rules support programming language A. Then, for each general standard node in the unified syntax structure, the syntax conversion rules store the mapping relationship between that general standard node and a specific syntax node (original node) in the unique syntax structure of programming language A.
  • A particular programming language, however, may have no unique syntax structures of its own. The unique layer rules therefore need not maintain, for every programming language supported by the syntax conversion rules, a mapping relationship between an original node in that language's unique syntax structure and a unique standard node.
  • The programming languages supported by the syntax conversion rules include at least Python, C, Java, PHP, Go, and JavaScript. Since the syntax conversion rules can be updated continuously, they can in theory support all programming languages.
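
The two rule layers can be pictured as lookup tables. The sketch below is a minimal illustration under stated assumptions: the table layout, the function names `validate` and `build_index`, and the standard node name `PyListComp` are hypothetical. The original node names come from Python's `ast` module and from the ESTree convention used by common JavaScript parsers, and only two languages are listed to keep the example short.

```python
SUPPORTED = ("python", "javascript")

# General layer: each standard node maps to an original node in EVERY
# supported language (a one-to-N relationship).
GENERAL_LAYER = {
    "If": {"python": "If", "javascript": "IfStatement"},
    "While": {"python": "While", "javascript": "WhileStatement"},
}

# Unique layer: each standard node maps to original nodes in only SOME
# supported languages.
UNIQUE_LAYER = {
    "PyListComp": {"python": "ListComp"},
}

def validate(general, unique, supported):
    """Check that each layer has the coverage its definition requires."""
    all_languages = set(supported)
    for standard, per_language in general.items():
        if set(per_language) != all_languages:
            raise ValueError(f"{standard}: general rule must cover all languages")
    for standard, per_language in unique.items():
        covered = set(per_language)
        if not covered or not covered.issubset(all_languages) or covered == all_languages:
            raise ValueError(f"{standard}: unique rule must cover only some languages")

def build_index(general, unique):
    """Invert both layers into {(language, original type): standard type}."""
    index = {}
    for layer in (general, unique):
        for standard, per_language in layer.items():
            for language, original in per_language.items():
                index[(language, original)] = standard
    return index

validate(GENERAL_LAYER, UNIQUE_LAYER, SUPPORTED)
INDEX = build_index(GENERAL_LAYER, UNIQUE_LAYER)
```

Inverting the tables once gives a direct lookup from an original node to its standard node, which is the direction needed when converting an original abstract syntax tree.
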
  • S106: match each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and convert each original node into the standard node of its successfully matched mapping relationship, so as to obtain a unified abstract syntax tree that represents the unified syntax structure.
  • In the embodiments of this specification, the tree structure of the original abstract syntax tree can be retained, and only the original nodes it contains are converted into the corresponding standard nodes according to the syntax conversion rules, thereby generating a unified abstract syntax tree that represents the program file under the unified syntax structure.
  • By defining unique standard nodes in the unified syntax structure and maintaining the corresponding unique layer rules, the embodiments of this specification establish a unified abstract representation (namely, the unique standard nodes) for the original nodes that are unique to some programming languages.
  • The unified syntax structure is therefore highly extensible (by adding unique standard nodes and updating the unique layer rules), so that the syntax conversion rules can in theory support converting the original abstract syntax tree of any programming language into the corresponding unified abstract syntax tree.
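
The conversion step can be sketched as a recursive walk that keeps the tree shape and swaps each node type. This is a minimal sketch, not the specification's implementation: the `Node` class, the function name `to_unified`, the rule tables, and the standard node name `PyListComp` are assumptions made for illustration, and the rules cover a single language.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)

# Hypothetical rules for one language: original type -> standard type.
GENERAL_LAYER = {"Module": "CompileUnit", "If": "If", "Return": "Return"}
UNIQUE_LAYER = {"ListComp": "PyListComp"}
PYTHON_RULES = {**GENERAL_LAYER, **UNIQUE_LAYER}

def to_unified(node: Node, rules: Dict[str, str]) -> Node:
    """Convert a node and its descendants, preserving the tree structure."""
    if node.type not in rules:
        # No mapping relationship matched this original node.
        raise LookupError(f"no mapping for original node type {node.type!r}")
    return Node(rules[node.type], [to_unified(c, rules) for c in node.children])

original = Node("Module", [Node("If", [Node("Return")]), Node("ListComp")])
unified = to_unified(original, PYTHON_RULES)
```

Raising an error for an unmatched node is one design choice among several; a converter could equally collect unmatched nodes and request new mapping relationships for them.
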
  • The method further includes: obtaining a first mapping relationship between a first undetermined standard node and a first original node defined by the unique syntax structure.
  • In the case where the general standard nodes defined in the unified syntax structure include the first undetermined standard node, the first mapping relationship is added to the general layer rules. In the case where the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, the first mapping relationship is added to the unique layer rules.
  • In the case where neither the general standard nodes nor the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, the first undetermined standard node is defined as a new unique standard node in the unified syntax structure, and the first mapping relationship is added to the unique layer rules.
  • The embodiments of this specification also describe how to update the syntax conversion rules. Since the syntax conversion rules are essentially a set of mapping relationships, they can be updated by obtaining and adding new mapping relationships. Specifically, when the syntax conversion rules do not contain the first mapping relationship, the first mapping relationship is added to the syntax conversion rules, so that the first original node falls within the range that the unified syntax structure can represent, thereby extending the unified syntax structure.
  • The first mapping relationship may come from the conversion rule base corresponding to any programming language. The conversion rule base records undetermined standard nodes together with the original nodes in the unique syntax structure of that programming language.
  • When the conversion rule base of a programming language is updated, the updated first mapping relationship is obtained, which keeps the syntax conversion rules synchronized with that conversion rule base. The first mapping relationship can also come from manual input by the developer of the syntax conversion rules.
  • After the first mapping relationship is obtained, the type of the first undetermined standard node in the unified syntax structure must be determined, so that the first mapping relationship is added to the correct layer, either the general layer rules or the unique layer rules. If the first undetermined standard node does not belong to any standard node defined in the unified syntax structure, the current unified syntax structure cannot yet represent the first original node.
  • In that case, the first undetermined standard node can be defined as a new unique standard node in the unified syntax structure, and the first mapping relationship is added to the unique layer rules. This establishes, in the unified syntax structure, a unified abstract representation of the original node unique to that programming language together with the corresponding conversion rule, and improves the support of the syntax conversion rules for that programming language.
  • Obtaining the first mapping relationship between the first undetermined standard node and the first original node defined by the unique syntax structure includes: in the case where the first original node contained in the original abstract syntax tree fails to match all mapping relationships included in the syntax conversion rules, obtaining the first mapping relationship; or obtaining a rule update instruction, where the rule update instruction contains the first mapping relationship.
  • In the embodiments of this specification, when the first original node cannot be matched during conversion, the first mapping relationship can be obtained and used to update the syntax conversion rules, so that the updated rules support converting the first original node into the first undetermined standard node. Thus, even when the syntax conversion rules do not fully support a programming language (that is, some original nodes cannot yet be converted), the first mapping relationship can be added to the rules immediately, which ensures that the original abstract syntax tree can still be converted into a unified abstract syntax tree.
  • The first mapping relationship may also be obtained at any other time: whenever a rule update instruction is received, the corresponding update can be executed.
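
The three update cases can be sketched as a single function. This is a minimal sketch under stated assumptions: the rule layout (`{"general": ..., "unique": ...}`), the function name `add_mapping`, and the standard node names `With` and `Defer` are hypothetical. The original node names come from Python's `ast` module, the ESTree convention, and Go's `go/ast` package.

```python
def add_mapping(rules, pending, language, original):
    """Add the mapping pending <-> (language, original) to the right layer.

    Returns (layer, created), where created is True when the pending node
    had to be defined as a new unique standard node.
    """
    if pending in rules["general"]:
        layer, created = "general", False
    elif pending in rules["unique"]:
        layer, created = "unique", False
    else:
        # Not defined anywhere yet: define it as a new unique standard node.
        rules["unique"][pending] = {}
        layer, created = "unique", True
    rules[layer][pending][language] = original
    return layer, created

rules = {
    "general": {"If": {"python": "If"}},
    "unique": {"With": {"python": "With"}},
}

case_general = add_mapping(rules, "If", "javascript", "IfStatement")
case_unique = add_mapping(rules, "With", "javascript", "WithStatement")
case_new = add_mapping(rules, "Defer", "go", "DeferStmt")
```

The same function serves both triggers described above: it can be called when a conversion hits an unmatched original node, or when a rule update instruction arrives.
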
  • The method further includes: performing program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
  • In the embodiments of this specification, the program analysis system only needs to work with one unified syntax structure in order to analyze program files of any programming language. That is, it only needs to target the unified abstract syntax tree that represents the unified syntax structure, and no functionality has to be developed repeatedly for the different syntax structures of different programming languages. Later modifications likewise need to be made to only one set of program analysis algorithms, which reduces the cost of developing, maintaining, and iterating the program analysis system.
  • Figure 3 is a flow chart of a program analysis method provided by an exemplary embodiment.
  • The method includes: S302: obtain a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure, wherein the standard nodes contained in the unified abstract syntax tree are converted from the original nodes contained in an original abstract syntax tree based on the syntax conversion rules corresponding to the unified syntax structure.
  • The original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure of that programming language.
  • The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • The unique layer rules include the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • S304: based on the program analysis system corresponding to the unified syntax structure, perform program analysis on the unified abstract syntax tree.
  • This unification Abstract syntax trees can indirectly represent specific grammatical structures of multiple programming languages at the same time, thus bringing convenience to the development of program analysis systems for further program analysis based on unified abstract syntax trees. Developers of program analysis systems only need to develop a set of program analysis algorithms for this unified syntax structure, which can be adapted to program analysis tasks for different programming languages. Even if subsequent modifications are needed, only one set of program analysis algorithms needs to be modified. Modifications are made to reduce the development costs of development, maintenance and iteration of the program analysis system.
  • Figure 4 is a schematic structural diagram of a device provided by an exemplary embodiment.
  • the device includes a processor 402, an internal bus 404, a network interface 406, a memory 408 and a non-volatile memory 410.
  • the processor 402 reads the corresponding computer program from the non-volatile memory 410 into the memory 408 and then runs it.
  • the execution subject of the following processing flow is not limited to logic units; it may also be hardware or a logic device.
  • Figure 5 is a block diagram of a device for generating a unified abstract syntax tree provided in this specification according to an exemplary embodiment.
  • This device can be applied to the equipment shown in Figure 4 to implement the technical solution of this specification.
  • the device includes: a program file acquisition unit 501, used to obtain a program file of any programming language and parse the program file into an original abstract syntax tree used to characterize the unique syntax structure corresponding to that programming language.
  • a syntax conversion rule determining unit 502, used to determine the syntax conversion rules corresponding to the unified syntax structure.
  • the syntax conversion rules include general layer rules and unique layer rules, wherein the general layer rules include: the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • the unique layer rules include: the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • a conversion unit 503, configured to match each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and convert each original node into the standard node of its successfully matched mapping relationship, to obtain a unified abstract syntax tree used to represent the unified syntax structure.
  • the device also includes: a first mapping relationship obtaining unit 504, configured to obtain a first mapping relationship between a first undetermined standard node and a first original node defined by the unique syntax structure.
  • the first mapping relationship updating unit 505 is configured to: when the general standard nodes defined in the unified syntax structure include the first undetermined standard node, update the first mapping relationship into the general layer rules; when the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, update the first mapping relationship into the unique layer rules; and when neither the general standard nodes nor the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, define the first undetermined standard node as a new unique standard node in the unified syntax structure and update the first mapping relationship into the unique layer rules.
  • the first mapping relationship acquisition unit 504 is specifically configured to: in the case where the first original node contained in the original abstract syntax tree fails to match all mapping relationships included in the grammar conversion rule, obtain the first mapping relationship; or, obtain a rule update instruction, where the rule update instruction includes the first mapping relationship.
  • a program analysis execution unit 506 is also included, configured to perform program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
  • any standard node defined in the unified syntax structure has a corresponding node type and node attributes, wherein the node attributes of any standard node include the node types of its child nodes and/or the value of that standard node.
  • the node types include root nodes, expression nodes, statement nodes, and type nodes; the specific types of root nodes include CompileUnit; the subtypes of expression nodes include: literal nodes, definition nodes, unary expression nodes, binary expression nodes, and ternary expression nodes.
  • the specific types of definition nodes include ClassDefinition and FunctionDefinition.
  • the specific types of unary expression nodes include Unary.
  • the specific types of binary expression nodes include Binary and Assignment.
  • the specific types of ternary expression nodes include Condition.
  • the subtypes of statement nodes include: declaration nodes, branch nodes, loop nodes, control jump nodes, and exception nodes; the specific types of declaration nodes include VariableDeclaration.
  • the specific types of branch nodes include If and Switch, the specific types of loop nodes include For, ForIn, and While, the specific types of control jump nodes include Return, Break, and Continue, and the specific types of exception nodes include Throw, Try, and Catch.
  • the subtypes of type nodes include: basic type nodes and aggregate type nodes; the specific types of basic type nodes include integer, float, and string, and the specific types of aggregate type nodes include Array and Scoped.
  • the programming languages supported by the syntax conversion rules include at least: Python, C, Java, PHP, Go, and JavaScript.
  • Figure 6 is a block diagram of a program analysis device provided in this specification according to an exemplary embodiment.
  • This device can be applied to the equipment shown in Figure 4 to implement the technical solution of this specification.
  • the device includes: a unified abstract syntax tree acquisition unit 601, used to obtain a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure, wherein the standard nodes included in the unified abstract syntax tree are converted from the original nodes contained in an original abstract syntax tree based on the syntax conversion rules corresponding to the unified syntax structure.
  • the original abstract syntax tree is obtained by parsing the program file and is used to characterize the unique syntax structure corresponding to that programming language.
  • the syntax conversion rules include general layer rules and unique layer rules, wherein the general layer rules include: the mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of all programming languages supported by the syntax conversion rules.
  • the unique layer rules include: the mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages supported by the syntax conversion rules.
  • the program analysis unit 602 is configured to perform program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • HDL Hardware Description Language
  • the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller.
  • examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller can also be implemented as part of the memory's control logic.
  • in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices it contains for implementing various functions can be regarded as structures within the hardware component. The devices for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
  • the systems, devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions.
  • a typical implementation device is a server system.
  • the computer that implements the functions of the above embodiments may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet, a wearable device, or a combination of any of these devices.
  • for convenience of description, the functions are divided into various modules and described separately.
  • the functions of each module can be implemented in one or more pieces of software and/or hardware, and a module that implements one function can also be implemented by a combination of multiple sub-modules or sub-units.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function. In actual implementation, there may be other ways of dividing them.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical or in other forms.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device.
  • the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing.
  • the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM), and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
  • one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the embodiments may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
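As a purely illustrative sketch, the three cases handled by the first mapping relationship updating unit 505 in the list above could be expressed in Python roughly as follows. The class name RuleSet, the choice of which standard nodes are general or unique, and the node name WithStatement are all assumptions made for the example; the specification does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Hypothetical two-layer rule store; keys are (language, original node type)."""
    general_nodes: set[str]   # general standard nodes of the unified syntax structure
    unique_nodes: set[str]    # unique standard nodes of the unified syntax structure
    general_rules: dict[tuple[str, str], str] = field(default_factory=dict)
    unique_rules: dict[tuple[str, str], str] = field(default_factory=dict)

    def update(self, language: str, original_node: str, standard_node: str) -> str:
        """Store one mapping relationship and report which layer received it."""
        key = (language, original_node)
        if standard_node in self.general_nodes:
            # Case 1: the undetermined standard node is a general standard node.
            self.general_rules[key] = standard_node
            return "general"
        # Case 2: it is already a unique standard node (add() is then a no-op).
        # Case 3: it is unknown, so it is first defined as a new unique standard node.
        self.unique_nodes.add(standard_node)
        self.unique_rules[key] = standard_node
        return "unique"


# Assumed starting state, for illustration only.
rules = RuleSet(general_nodes={"FunctionDefinition", "If"}, unique_nodes={"ForIn"})
layer_for_go_function = rules.update("go", "FuncDecl", "FunctionDefinition")
layer_for_python_with = rules.update("python", "With", "WithStatement")
print(layer_for_go_function, layer_for_python_with)
```

Handling cases 2 and 3 in one branch reflects that both end in the unique layer rules; the only difference is whether the standard node has to be defined first.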

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Provided in the present description are a method and apparatus for generating a unified abstract syntax tree, and a program analysis method and apparatus. The method for generating a unified abstract syntax tree comprises: acquiring a program file in any programming language, and parsing the program file into an original abstract syntax tree, which is used for representing a specific syntax structure corresponding to the programming language; determining a syntax conversion rule corresponding to a unified syntax structure, wherein the syntax conversion rule comprises a general-layer rule and a specific-layer rule, with the general-layer rule comprising: mapping relationships between general standard nodes defined in the unified syntax structure and corresponding original nodes defined in specific syntax structures respectively corresponding to all the programming languages, and the specific-layer rule comprising: mapping relationships between specific standard nodes defined in the unified syntax structure and corresponding original nodes defined in specific syntax structures corresponding to some programming languages; and converting, into a standard node, each original node included in the original abstract syntax tree, so as to obtain a unified abstract syntax tree, which is used for representing the unified syntax structure.

Description

Method and device for generating a unified abstract syntax tree and for program analysis
Technical field
The embodiments of this specification belong to the field of computer technology, and in particular relate to a method and device for generating a unified abstract syntax tree and for program analysis.
Background
In computer science, an abstract syntax tree is an abstract representation of source code. It represents the syntax structure of a programming language in the form of a tree, and each node of the abstract syntax tree represents a syntax structure in the source code.
In the related art, different programming languages have different syntax structures, and the different types of abstract syntax trees obtained by parsing program files of different programming languages each represent the specific syntax structure of one programming language. A system that performs further processing based on abstract syntax trees therefore has to adapt to different types of abstract syntax trees and develop multiple sets of functionally identical algorithms for the syntax structures of different programming languages, which leads to high development costs. For example, developers of a program analysis system have to develop a separate set of program analysis algorithms for each type of abstract syntax tree obtained by parsing program files in each programming language, in order to handle program analysis tasks for different programming languages. This duplicated development increases the size and development cost of the program analysis system and hinders further maintenance and iteration.
Summary of the invention
The object of the present invention is to provide a method and device for generating a unified abstract syntax tree and for program analysis.
According to a first aspect of one or more embodiments of this specification, a method for generating a unified abstract syntax tree is proposed, including: obtaining a program file of any programming language, and parsing the program file into an original abstract syntax tree used to characterize the unique syntax structure corresponding to that programming language; determining syntax conversion rules corresponding to a unified syntax structure, the syntax conversion rules including general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures respectively corresponding to all programming languages supported by the syntax conversion rules, and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures corresponding to some of the programming languages supported by the syntax conversion rules; and matching each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and converting each original node into the standard node of its successfully matched mapping relationship, to obtain a unified abstract syntax tree used to represent the unified syntax structure.
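The matching and conversion step of the first aspect can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the Node class, the rule tables, the original node names (FunctionDef, MethodDeclaration, IfStmt, ListComp) and the standard node name ListComprehension are hypothetical, and the specification does not state which concrete mappings belong to which layer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal tree node: a node type, an optional value, and child nodes."""
    type: str
    value: object = None
    children: list["Node"] = field(default_factory=list)


# Hypothetical general layer rules: (language, original node type) -> general standard node.
GENERAL_RULES = {
    ("python", "FunctionDef"): "FunctionDefinition",
    ("java", "MethodDeclaration"): "FunctionDefinition",
    ("python", "If"): "If",
    ("java", "IfStmt"): "If",
}

# Hypothetical unique layer rules: structures that only some languages have.
UNIQUE_RULES = {
    ("python", "ListComp"): "ListComprehension",
}


def convert(language: str, original: Node) -> Node:
    """Convert an original tree into a unified tree, node by node."""
    key = (language, original.type)
    # Match against the general layer first, then the unique layer.
    standard_type = GENERAL_RULES.get(key) or UNIQUE_RULES.get(key)
    if standard_type is None:
        # No mapping relationship matched; a real system might trigger a rule update here.
        raise LookupError(f"no mapping relationship for {key}")
    children = [convert(language, child) for child in original.children]
    return Node(standard_type, original.value, children)


python_tree = Node("FunctionDef", "f", [Node("If")])
java_tree = Node("MethodDeclaration", "f", [Node("IfStmt")])
unified_from_python = convert("python", python_tree)
unified_from_java = convert("java", java_tree)
print(unified_from_python == unified_from_java)
```

In this sketch the two original trees use different node types for the same structure, yet both convert to the same unified tree, which is the property the later program analysis step relies on.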
According to a second aspect of one or more embodiments of this specification, a method of program analysis is proposed, including: obtaining a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure, wherein the standard nodes included in the unified abstract syntax tree are converted from the original nodes included in an original abstract syntax tree based on the syntax conversion rules corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and is used to characterize the unique syntax structure corresponding to that programming language, and the syntax conversion rules include general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures respectively corresponding to all programming languages supported by the syntax conversion rules, and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures corresponding to some of the programming languages supported by the syntax conversion rules; and performing program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
According to a third aspect of one or more embodiments of this specification, a device for generating a unified abstract syntax tree is proposed, including: a program file acquisition unit, used to obtain a program file of any programming language and parse the program file into an original abstract syntax tree used to characterize the unique syntax structure corresponding to that programming language; a syntax conversion rule determining unit, used to determine syntax conversion rules corresponding to a unified syntax structure, the syntax conversion rules including general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures respectively corresponding to all programming languages supported by the syntax conversion rules, and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures corresponding to some of the programming languages supported by the syntax conversion rules; and a conversion unit, configured to match each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and convert each original node into the standard node of its successfully matched mapping relationship, to obtain a unified abstract syntax tree used to represent the unified syntax structure.
According to a fourth aspect of one or more embodiments of this specification, a program analysis device is proposed, including: a unified abstract syntax tree acquisition unit, used to obtain a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure, wherein the standard nodes included in the unified abstract syntax tree are converted from the original nodes included in an original abstract syntax tree based on the syntax conversion rules corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and is used to characterize the unique syntax structure corresponding to that programming language, and the syntax conversion rules include general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures respectively corresponding to all programming languages supported by the syntax conversion rules, and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures corresponding to some of the programming languages supported by the syntax conversion rules; and a program analysis unit, configured to perform program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
According to a fifth aspect of one or more embodiments of this specification, an electronic device is proposed, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor runs the executable instructions to implement the method described in the first aspect or the second aspect.
According to a sixth aspect of one or more embodiments of this specification, a computer-readable storage medium is proposed, on which computer instructions are stored. When executed by a processor, the instructions implement the steps of the method described in the first aspect or the second aspect.
The embodiments of this specification design a unified syntax structure and corresponding syntax conversion rules. The syntax conversion rules include general layer rules and unique layer rules, which respectively convert the syntax structures that all programming languages have, and the syntax structures that only some programming languages have, into the corresponding unified syntax structure. The unified syntax structure can thereby fully represent every programming language supported by the syntax conversion rules, so program files in different programming languages all obtain an abstract representation defined by the unified syntax structure. That is, the original abstract syntax trees obtained by parsing program files of multiple programming languages can all be converted, through the syntax conversion rules, into unified abstract syntax trees that represent the unified syntax structure. The unified abstract syntax tree can indirectly represent the specific syntax structures of multiple programming languages at the same time, which simplifies the development of systems that perform further processing based on it. For example, developers of a program analysis system only need to develop one set of program analysis algorithms for the unified syntax structure to handle program analysis tasks for different programming languages, and later changes only require modifying that one set, which reduces the development, maintenance and iteration costs of the program analysis system.
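To illustrate the benefit described above, the following Python sketch applies one analysis routine to unified trees that are assumed to have come from two different languages. The tuple-based tree encoding and the two sample trees are assumptions made for the example; only the node type names (CompileUnit, ClassDefinition, FunctionDefinition, If, Return) are taken from the unified syntax structure of this specification.

```python
from collections import Counter

# A unified tree is encoded here as (node type, list of child trees).
# This encoding is hypothetical and chosen only to keep the sketch short.


def count_node_types(tree) -> Counter:
    """Single analysis routine: count how often each standard node type occurs."""
    node_type, children = tree
    counts = Counter({node_type: 1})
    for child in children:
        counts += count_node_types(child)
    return counts


# Unified tree assumed to have been converted from a Python program file.
from_python = ("CompileUnit", [("FunctionDefinition", [("If", []), ("Return", [])])])

# Unified tree assumed to have been converted from a Java program file.
from_java = ("CompileUnit", [("ClassDefinition", [("FunctionDefinition", [("Return", [])])])])

python_counts = count_node_types(from_python)
java_counts = count_node_types(from_java)
print(python_counts["FunctionDefinition"], java_counts["FunctionDefinition"])
```

The routine never inspects which language a tree came from, so supporting another language in this sketch would require new conversion rules but no change to the analysis itself.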
Description of drawings
In order to explain the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in this specification, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of a method for generating a unified abstract syntax tree provided by an exemplary embodiment.
Figure 2 is a schematic diagram of the node types defined by a unified syntax structure provided by an exemplary embodiment.
Figure 3 is a flow chart of a program analysis method provided by an exemplary embodiment.
Figure 4 is a schematic structural diagram of a device provided by an exemplary embodiment.
Figure 5 is a block diagram of a device for generating a unified abstract syntax tree provided by an exemplary embodiment.
Figure 6 is a block diagram of a program analysis device provided by an exemplary embodiment.
Detailed description
In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification, without creative effort, shall fall within the scope of protection of this specification.
Referring to Figure 1, Figure 1 is a flowchart of a method for generating a unified abstract syntax tree provided by an exemplary embodiment. As shown in Figure 1, the method includes:
S102: Obtain a program file in any programming language, and parse the program file into an original abstract syntax tree that represents the unique syntax structure corresponding to that programming language.
In the embodiments of this specification, a program file in any programming language can be parsed, by a parser for that language, into an original abstract syntax tree that represents the unique syntax structure of that language. Specifically, the program file is first tokenized. A parsing function in the parser then performs syntax analysis on the resulting tokens and converts every syntax structure in the program file into a corresponding syntax node (a stand-in for an abstract syntax structure). The parent-child relationships between these syntax nodes make them form the original abstract syntax tree. The syntax nodes in the original abstract syntax tree are called original nodes. Each original node has a node type and node attributes defined in the unique syntax structure of the programming language, where the node attributes of an original node include the node types of its child nodes and/or the value of the original node.
A syntax node can be, for example, a function declaration, a declaration of a static global variable, or a file import. Syntax structures can be nested. For example, if the syntax node for a function declaration contains an attribute that imports a file, the file-import syntax node is a child of the function-declaration syntax node, and the function-declaration syntax node is the parent of the file-import syntax node. Parent and child nodes of this kind form a tree structure, namely the abstract syntax tree.
In practice, an official or community-maintained parser can be used to improve generality and reduce development cost.
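As a minimal illustration of step S102, the sketch below parses Python source with Python's standard `ast` module and lists each original node's type together with the types of its children. Choosing `ast` as the parser is an assumption made for illustration; the specification does not name a particular parser, and `SOURCE` and `node_summary` are hypothetical names.

```python
import ast

# A small program file, held in a string for the purposes of this sketch.
SOURCE = """
import os

def area(w, h):
    return w * h
"""


def node_summary(source: str) -> list[tuple[str, list[str]]]:
    """Parse source and pair each node type with its children's node types."""
    tree = ast.parse(source)  # tokenizes and parses in one call
    summary = []
    for node in ast.walk(tree):  # visits the root first, then descendants
        children = [type(child).__name__ for child in ast.iter_child_nodes(node)]
        summary.append((type(node).__name__, children))
    return summary


summary = node_summary(SOURCE)
root_type, root_children = summary[0]
print(root_type, root_children)
```

Here the node type names (`Module`, `Import`, `FunctionDef`, `BinOp`, and so on) belong to Python's own syntax structure, which is what makes the resulting tree an "original" abstract syntax tree in the sense used above.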
S104: Determine the syntax conversion rules corresponding to the unified syntax structure. The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structure of each programming language that the syntax conversion rules already support. The unique layer rules include mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages that the syntax conversion rules already support.
The embodiments of this specification define a unified syntax structure. The syntax nodes in the unified syntax structure are called standard nodes. Each standard node has a node type and node attributes defined in the unified syntax structure, where the node attributes of a standard node include the node types of its child nodes and/or the value of the standard node. For example, for a standard node whose node type is binary expression node, the node attributes specify that it has two child nodes, both of node type expression node, and that its value is the operator of the binary expression.
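One way to picture a standard node is as a record holding a node type, an optional value, and child nodes. The sketch below is an assumption-laden illustration, not the specification's data model: `StandardNode`, `CHILD_ARITY`, and `is_well_formed` are hypothetical names, and only the child-count part of the node attributes is checked.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandardNode:
    node_type: str                      # e.g. "Binary", "Identifier"
    value: Optional[str] = None         # e.g. the operator of a binary expression
    children: list["StandardNode"] = field(default_factory=list)


# Hypothetical attribute schema: required number of children per node type.
CHILD_ARITY = {"Unary": 1, "Binary": 2, "Condition": 3}


def is_well_formed(node: StandardNode) -> bool:
    """Check the child-count constraint on this node and all descendants."""
    expected = CHILD_ARITY.get(node.node_type)
    if expected is not None and len(node.children) != expected:
        return False
    return all(is_well_formed(child) for child in node.children)


# The binary expression "a + 1": two expression children, operator as value.
expr = StandardNode(
    "Binary", "+", [StandardNode("Identifier", "a"), StandardNode("Literal", "1")]
)
print(is_well_formed(expr))
```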
Standard nodes are divided by type into general standard nodes and unique standard nodes. A general standard node represents a syntax structure that all programming languages have. For example, the If node, a branch node that represents the conditional syntax structure, is a general standard node. A unique standard node represents a syntax structure that not all programming languages have. In the embodiments of this specification, it is a syntax structure unique to some (for example, one) of the programming languages that the syntax conversion rules already support. The embodiments of this specification integrate the general syntax structures shared by all programming languages and the syntax structures unique to particular programming languages into a single unified syntax structure, so that the unified syntax structure can, in theory, represent the syntax structures of all programming languages at the same time. An abstract syntax tree composed of standard nodes defined in the unified syntax structure is called a unified abstract syntax tree. Through a tree of standard nodes, it represents all of the syntax structures in a program file written in any programming language (or in a mixture of several programming languages).
Referring to Figure 2, Figure 2 is a schematic diagram of the node types defined by the unified syntax structure provided by an exemplary embodiment. As shown in Figure 2, the node types of the standard nodes defined in the unified syntax structure include general standard nodes and unique standard nodes. The unique standard nodes can be further classified by programming language into Python nodes, C nodes, Java nodes, JavaScript nodes, and nodes for the other programming languages that the syntax conversion rules already support.
The node types of general standard nodes include root nodes, expression nodes, statement nodes, and type nodes. The concrete types of root nodes include CompileUnit. The subtypes of expression nodes are literal nodes (concrete types Literal and Object), definition nodes (ClassDefinition and FunctionDefiniton), unary expression nodes (Unary), binary expression nodes (Binary and Assignment), ternary expression nodes (Condition), module nodes (Import and Export), call nodes (Call), member access nodes (MemberAccess), heap allocation nodes (New), identifier nodes (Identifier), and keyword nodes (This and Super). The subtypes of statement nodes are declaration nodes (VariableDeclaration), branch nodes (If and Switch), loop nodes (For, ForIn, and While), control jump nodes (Return, Break, and Continue), and exception nodes (Throw, Try, and Catch). The subtypes of type nodes are basic type nodes (integer, float, and string) and aggregate type nodes (Array and Scoped).
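The general standard node types listed above can be written down as a nested table, which makes it easy to look up the category and subtype of a concrete node type. This is only a sketch: the dictionary layout, the lowercase category keys, and the `classify` helper are assumptions for illustration, while the concrete type names are copied from the text.

```python
# category -> subtype -> concrete node types (names taken from the description)
GENERAL_NODE_TYPES = {
    "root": {"root": ["CompileUnit"]},
    "expression": {
        "literal": ["Literal", "Object"],
        "definition": ["ClassDefinition", "FunctionDefiniton"],  # spelled as in the text
        "unary": ["Unary"],
        "binary": ["Binary", "Assignment"],
        "ternary": ["Condition"],
        "module": ["Import", "Export"],
        "call": ["Call"],
        "member_access": ["MemberAccess"],
        "heap_allocation": ["New"],
        "identifier": ["Identifier"],
        "keyword": ["This", "Super"],
    },
    "statement": {
        "declaration": ["VariableDeclaration"],
        "branch": ["If", "Switch"],
        "loop": ["For", "ForIn", "While"],
        "jump": ["Return", "Break", "Continue"],
        "exception": ["Throw", "Try", "Catch"],
    },
    "type": {
        "basic": ["integer", "float", "string"],
        "aggregate": ["Array", "Scoped"],
    },
}


def classify(concrete_type: str):
    """Return (category, subtype) for a general standard node type, else None."""
    for category, subtypes in GENERAL_NODE_TYPES.items():
        for subtype, concrete_types in subtypes.items():
            if concrete_type in concrete_types:
                return category, subtype
    return None  # not a general standard node; it may be a unique standard node


print(classify("If"))
```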
In the embodiments of this specification, corresponding syntax conversion rules are also designed on the basis of the unified syntax structure. They guide conversion, in both directions, between the original abstract syntax trees of the various programming languages and the unified abstract syntax tree of the unified syntax structure. Specifically, the syntax conversion rules include general layer rules and unique layer rules. The general layer rules include mapping relationships between each general standard node defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structure of each programming language that the syntax conversion rules already support. The unique layer rules include mapping relationships between each unique standard node defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the supported programming languages. When the syntax conversion rules support N (N≥2) programming languages, a mapping relationship in the general layer rules is one-to-N, whereas a mapping relationship in the unique layer rules can be one-to-one or one-to-M (1<M<N).
By looking up these mapping relationships, the original nodes in the original abstract syntax tree of any supported programming language can be converted one by one into the corresponding standard nodes of the unified syntax structure, which converts the original abstract syntax tree into a unified abstract syntax tree. Conversely, the standard nodes in a unified abstract syntax tree can be converted one by one into the original nodes of any supported programming language, which converts the unified abstract syntax tree into the original abstract syntax tree of that language. The syntax conversion rules are thus a set of rules that describe the conversion relationship between the unified syntax structure and the unique syntax structure of each programming language, and they enable conversion between original abstract syntax trees and unified abstract syntax trees in both directions.
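The two rule layers can be sketched as lookup tables keyed by language and original node type. Everything in this sketch is an assumption for illustration: the table contents, the unique standard node name `PythonListComprehension`, and the helpers `to_standard` and `to_original` are hypothetical, and the original node names are merely examples in the style of Python's `ast` module and a Java parser.

```python
# (language, original node type) -> standard node type
GENERAL_LAYER = {
    ("python", "If"): "If",
    ("java", "IfStmt"): "If",          # one general node maps to N languages
    ("python", "BinOp"): "Binary",
    ("java", "BinaryExpr"): "Binary",
}
UNIQUE_LAYER = {
    # A construct that only some supported languages have.
    ("python", "ListComp"): "PythonListComprehension",
}


def to_standard(language: str, original_type: str):
    """Original node type -> standard node type, general layer first."""
    key = (language, original_type)
    if key in GENERAL_LAYER:
        return GENERAL_LAYER[key]
    return UNIQUE_LAYER.get(key)


def to_original(language: str, standard_type: str):
    """Standard node type -> original node type of the given language."""
    for layer in (GENERAL_LAYER, UNIQUE_LAYER):
        for (lang, original_type), standard in layer.items():
            if lang == language and standard == standard_type:
                return original_type
    return None  # the language has no counterpart for this standard node


print(to_standard("java", "IfStmt"), to_original("python", "Binary"))
```

Keeping both directions on the same tables reflects the point that one set of mapping relationships serves conversion in either direction.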
In the embodiments of this specification, saying that the syntax conversion rules support programming language A means that, for every general standard node in the unified syntax structure, the syntax conversion rules store a mapping relationship between that general standard node and a particular syntax node (original node) in the unique syntax structure of programming language A. A given programming language may have no syntax structure unique to it, so the unique layer rules do not necessarily maintain, for every supported programming language, a mapping relationship between an original node in that language's unique syntax structure and a unique standard node.
Optionally, the programming languages supported by the syntax conversion rules include at least Python, C, Java, PHP, Go, and JavaScript. Because the syntax conversion rules can be updated continuously, they can in theory support all programming languages.
S106: Match each original node in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and convert each original node into the standard node of the mapping relationship it matches, to obtain a unified abstract syntax tree that represents the unified syntax structure.
After the original abstract syntax tree is obtained, its tree structure can be retained, and only the original nodes it contains are converted, according to the syntax conversion rules, into the corresponding standard nodes. This produces a unified abstract syntax tree that represents the program file under the unified syntax structure.
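Step S106 can be sketched as a recursive walk that keeps the shape of the tree and swaps only the node types. The `Node` class, the `RULES` table, and `convert` are hypothetical names introduced for this sketch, and the rule table is reduced to a single language for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_type: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical mapping: original node type -> standard node type.
RULES = {
    "Module": "CompileUnit",
    "BinOp": "Binary",
    "Name": "Identifier",
    "Constant": "Literal",
}


def convert(original: Node, rules: dict[str, str]) -> Node:
    """Convert every original node to its standard node, preserving structure."""
    if original.node_type not in rules:
        raise KeyError(f"no mapping for original node type {original.node_type!r}")
    return Node(
        rules[original.node_type],
        [convert(child, rules) for child in original.children],
    )


original_tree = Node("Module", [Node("BinOp", [Node("Name"), Node("Constant")])])
unified_tree = convert(original_tree, RULES)
print(unified_tree)
```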
In the related art, abstract syntax trees exist that integrate two or a small number of programming languages. However, they lack a unified abstract representation of the syntax nodes in syntax structures that only some programming languages have (that is, they lack the unique standard nodes and unique layer rules described in the embodiments of this specification), so the syntax structures they represent are not extensible. As a result, the idea of representing program files in a large number of programming languages with unified abstract syntax trees that share one unified syntax structure has not been realized. The embodiments of this specification define unique standard nodes in the unified syntax structure and maintain the corresponding unique layer rules, which establishes a unified abstract representation (the unique standard nodes) of the original nodes unique to some programming languages. The unified syntax structure is therefore highly extensible, by adding unique standard nodes and updating the unique layer rules, and the syntax conversion rules can in theory convert the original abstract syntax trees of all programming languages into corresponding unified abstract syntax trees.
The embodiments of this specification define a unified syntax structure and corresponding syntax conversion rules. The syntax conversion rules include general layer rules and unique layer rules, which convert, respectively, the syntax structures that all programming languages have and the syntax structures that only some programming languages have into the corresponding unified syntax structure. The unified syntax structure can therefore fully represent every programming language that the syntax conversion rules already support, so program files in different programming languages all obtain the abstract representation defined by the unified syntax structure. In other words, the original abstract syntax trees parsed from program files in multiple programming languages can all be converted, through the syntax conversion rules, into unified abstract syntax trees that represent the unified syntax structure, and a unified abstract syntax tree can indirectly represent the language-specific syntax structures of multiple programming languages at the same time. This simplifies the development of systems that perform further processing on unified abstract syntax trees. For example, the developer of a program analysis system only needs to develop one set of program analysis algorithms against the unified syntax structure to handle program analysis tasks for different programming languages, and any later change only has to be made to that one set of algorithms, which reduces the cost of developing, maintaining, and iterating on the program analysis system.
Optionally, the method further includes: obtaining a first mapping relationship between a first undetermined standard node and a first original node defined by the unique syntax structure; if the general standard nodes defined in the unified syntax structure include the first undetermined standard node, updating the first mapping relationship into the general layer rules; if the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, updating the first mapping relationship into the unique layer rules; and if neither the general standard nodes nor the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, defining the first undetermined standard node as a new unique standard node in the unified syntax structure and updating the first mapping relationship into the unique layer rules.
This embodiment describes how the syntax conversion rules are updated. Because the syntax conversion rules are essentially a set of mapping relationships, they can be updated by obtaining and adding new mapping relationships. Specifically, when the syntax conversion rules do not contain the first mapping relationship, the first mapping relationship is added to them, which brings the first original node within the scope of what the unified syntax structure can represent and thereby extends the unified syntax structure.
The first mapping relationship may come from a conversion rule base for the programming language. The conversion rule base contains the mapping relationships between undetermined standard nodes and the original nodes in the unique syntax structure of that programming language. When the conversion rule base of a programming language is updated, the updated first mapping relationship is obtained, which keeps the syntax conversion rules synchronized with the conversion rule base. The first mapping relationship may also be entered manually by a developer of the syntax conversion rules.
After the first mapping relationship is obtained, the type of the first undetermined standard node in the unified syntax structure must be determined so that the first mapping relationship is added to the correct rule layer, either the general layer rules or the unique layer rules. If the first undetermined standard node is not any of the standard nodes defined in the unified syntax structure, the current unified syntax structure cannot yet represent the first original node. In that case, the first undetermined standard node is defined as a new unique standard node in the unified syntax structure and the first mapping relationship is added to the unique layer rules. This establishes, in the unified syntax structure, a unified abstract representation of the original node unique to the programming language, together with the corresponding conversion rule, and improves the syntax conversion rules' support for that language.
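The three update cases can be sketched as a single function that decides which rule layer receives a new mapping relationship. `RuleSet` and `add_mapping` are hypothetical names, and the node names in the usage lines (`IfStmt`, `DeferStmt`, `With`, `PythonWith`, `GoDefer`) are illustrative assumptions rather than names from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    general_nodes: set[str]            # general standard nodes already defined
    unique_nodes: set[str]             # unique standard nodes already defined
    general_rules: dict[tuple[str, str], str] = field(default_factory=dict)
    unique_rules: dict[tuple[str, str], str] = field(default_factory=dict)


def add_mapping(rules: RuleSet, language: str, original_type: str,
                undetermined: str) -> str:
    """Add (language, original_type) -> undetermined to the proper layer."""
    key = (language, original_type)
    if undetermined in rules.general_nodes:
        rules.general_rules[key] = undetermined
        return "general"
    if undetermined in rules.unique_nodes:
        rules.unique_rules[key] = undetermined
        return "unique"
    # Neither layer defines the node yet: define it as a new unique standard node.
    rules.unique_nodes.add(undetermined)
    rules.unique_rules[key] = undetermined
    return "new-unique"


rules = RuleSet(general_nodes={"If", "Binary"}, unique_nodes={"PythonWith"})
first = add_mapping(rules, "go", "IfStmt", "If")
second = add_mapping(rules, "python", "With", "PythonWith")
third = add_mapping(rules, "go", "DeferStmt", "GoDefer")
print(first, second, third)
```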
Optionally, obtaining the first mapping relationship between the first undetermined standard node and the first original node defined by the unique syntax structure includes: obtaining the first mapping relationship when the first original node in the original abstract syntax tree fails to match every mapping relationship included in the syntax conversion rules; or obtaining a rule update instruction that includes the first mapping relationship.
In the embodiments of this specification, when the original abstract syntax tree contains the first original node and the first original node fails to match every mapping relationship in the syntax conversion rules, the first mapping relationship can be obtained and used to update the syntax conversion rules, so that the updated rules support converting the first original node into the first undetermined standard node. Thus, even when the syntax conversion rules do not fully support the programming language (that is, the first original node cannot yet be converted), updating the rules with the first mapping relationship on the spot ensures that the original abstract syntax tree can still be converted into a unified abstract syntax tree.
The first mapping relationship can also be obtained at any other time. For example, when a rule update instruction containing the first mapping relationship is received, the update specified by that instruction is performed.
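Both triggers for obtaining a new mapping relationship can be sketched briefly: a lookup that fetches the missing mapping when matching fails, and a function that applies a rule update instruction. The names `convert_type`, `apply_update_instruction`, and `lookup_in_rule_base`, the dictionary form of the instruction, and the example node names are all assumptions for illustration.

```python
from typing import Callable

Rules = dict[str, str]  # original node type -> standard node type


def convert_type(original_type: str, rules: Rules,
                 obtain_mapping: Callable[[str], str]) -> str:
    """Convert a node type, fetching and storing the mapping on a match failure."""
    if original_type not in rules:
        # Matching failed: obtain the mapping and update the rules on the spot.
        rules[original_type] = obtain_mapping(original_type)
    return rules[original_type]


def apply_update_instruction(rules: Rules, instruction: dict[str, str]) -> None:
    """Apply a rule update instruction that carries new mapping relationships."""
    rules.update(instruction)


requested = []


def lookup_in_rule_base(original_type: str) -> str:
    """Stand-in for a per-language conversion rule base or manual input."""
    requested.append(original_type)
    return {"ListComp": "PythonListComprehension"}[original_type]


rules = {"BinOp": "Binary"}
converted = convert_type("ListComp", rules, lookup_in_rule_base)
converted_again = convert_type("ListComp", rules, lookup_in_rule_base)
apply_update_instruction(rules, {"With": "PythonWith"})
print(converted, rules)
```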
Optionally, the method further includes: performing program analysis on the unified abstract syntax tree with a program analysis system corresponding to the unified syntax structure.
In the embodiments of this specification, the program analysis system only needs to interface with one unified syntax structure to perform program analysis on program files in any programming language. It only has to handle the unified abstract syntax tree that represents the unified syntax structure, and no functionality has to be developed repeatedly for the different syntax structures of different programming languages. Later changes also only have to be made to one set of program analysis algorithms, which reduces the cost of developing, maintaining, and iterating on the program analysis system.
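To illustrate why one analysis suffices, the sketch below runs a single, very simple analysis (collecting the names of called functions) over unified trees that are imagined to come from two different source languages. The `StandardNode` class, the `called_names` function, and the convention that a Call node stores the callee name in `value` are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandardNode:
    node_type: str
    value: Optional[str] = None
    children: list["StandardNode"] = field(default_factory=list)


def called_names(root: StandardNode) -> list[str]:
    """Collect callee names from Call nodes in pre-order."""
    names = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_type == "Call" and node.value is not None:
            names.append(node.value)
        stack.extend(reversed(node.children))  # keep left-to-right order
    return names


# Unified trees imagined to originate from a Python file and a Java file.
from_python = StandardNode("CompileUnit", children=[
    StandardNode("Call", "print", [StandardNode("Literal", "'hi'")]),
    StandardNode("If", children=[StandardNode("Call", "exit")]),
])
from_java = StandardNode("CompileUnit", children=[StandardNode("Call", "println")])

print(called_names(from_python), called_names(from_java))
```

The analysis function never inspects which language a tree came from, which is the property the paragraph above relies on.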
Referring to Figure 3, Figure 3 is a flowchart of a program analysis method provided by an exemplary embodiment. As shown in Figure 3, the method includes:
S302: Obtain a unified abstract syntax tree that corresponds to a program file in any programming language and represents the unified syntax structure. The standard nodes in the unified abstract syntax tree are obtained by converting the original nodes in an original abstract syntax tree according to the syntax conversion rules corresponding to the unified syntax structure. The original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure corresponding to that programming language. The syntax conversion rules include general layer rules and unique layer rules. The general layer rules include mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structure of each programming language that the syntax conversion rules already support. The unique layer rules include mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the supported programming languages.
S304: Perform program analysis on the unified abstract syntax tree with a program analysis system corresponding to the unified syntax structure.
The embodiments of this specification define a unified syntax structure and corresponding syntax conversion rules. The syntax conversion rules include general layer rules and unique layer rules, which convert, respectively, the syntax structures that all programming languages have and the syntax structures that only some programming languages have into the corresponding unified syntax structure. The unified syntax structure can therefore fully represent every programming language that the syntax conversion rules already support, so program files in different programming languages all obtain the abstract representation defined by the unified syntax structure. In other words, the original abstract syntax trees parsed from program files in multiple programming languages can all be converted, through the syntax conversion rules, into unified abstract syntax trees that represent the unified syntax structure, and a unified abstract syntax tree can indirectly represent the language-specific syntax structures of multiple programming languages at the same time. This simplifies the development of program analysis systems that perform further program analysis on unified abstract syntax trees. The developer of a program analysis system only needs to develop one set of program analysis algorithms against the unified syntax structure to handle program analysis tasks for different programming languages, and any later change only has to be made to that one set of algorithms, which reduces the cost of developing, maintaining, and iterating on the program analysis system.
It is easy to understand that the generation of the unified abstract syntax tree in the embodiment shown in FIG. 3 does not differ in essence from the corresponding description of the embodiment shown in FIG. 1; the foregoing description of the embodiment shown in FIG. 1 applies equally to the embodiment shown in FIG. 3.
FIG. 4 is a schematic structural diagram of a device provided by an exemplary embodiment. Referring to FIG. 4, at the hardware level the device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile storage 410, and may of course also include other hardware required by the service. One or more embodiments of this specification may be implemented in software, for example by the processor 402 reading the corresponding computer program from the non-volatile storage 410 into the memory 408 and then running it. In addition to a software implementation, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or a logic device.
FIG. 5 is a block diagram of an apparatus for generating a unified abstract syntax tree provided by this specification according to an exemplary embodiment. The apparatus can be applied to the device shown in FIG. 4 to implement the technical solution of this specification. The apparatus includes: a program file acquisition unit 501, configured to obtain a program file in any programming language and parse the program file into an original abstract syntax tree that represents the unique syntax structure corresponding to that programming language.
A syntax conversion rule determination unit 502, configured to determine the syntax conversion rules corresponding to the unified syntax structure. The syntax conversion rules include general-layer rules and unique-layer rules. The general-layer rules include mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structure of each of the programming languages that the syntax conversion rules support. The unique-layer rules include mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages that the syntax conversion rules support.
A conversion unit 503, configured to match each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and to convert each original node into the standard node of the mapping relationship it matches, thereby obtaining a unified abstract syntax tree that represents the unified syntax structure.
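The matching and conversion performed by units 502 and 503 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the specification: the `Node` class, the two rule tables keyed by `(language, original node type)`, the original node names such as `IfStmt`, and the choice of which constructs sit in which layer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node: a type name, an optional value, and child nodes."""
    type: str
    value: object = None
    children: list = field(default_factory=list)


# Hypothetical rule tables keyed by (language, original node type).
# General-layer rules cover constructs that every supported language has.
GENERAL_RULES = {
    ("python", "If"): "If",
    ("java", "IfStmt"): "If",
    ("python", "Call"): "Call",
    ("java", "MethodCallExpr"): "Call",
}
# Unique-layer rules cover constructs that only some languages have.
UNIQUE_RULES = {
    ("java", "SwitchStmt"): "Switch",
}


def convert(node, language):
    """Recursively map an original AST node to a standard node."""
    key = (language, node.type)
    if key in GENERAL_RULES:
        standard_type = GENERAL_RULES[key]
    elif key in UNIQUE_RULES:
        standard_type = UNIQUE_RULES[key]
    else:
        # No mapping relationship matches this original node.
        raise LookupError(f"no mapping for {key}")
    return Node(standard_type, node.value,
                [convert(child, language) for child in node.children])


# Two original trees with language-specific node names.
java_tree = Node("IfStmt", children=[Node("MethodCallExpr", value="f")])
python_tree = Node("If", children=[Node("Call", value="f")])
unified_from_java = convert(java_tree, "java")
unified_from_python = convert(python_tree, "python")
```

In this sketch both inputs end up as the same unified tree, which is the property that lets one analysis serve several languages. Raising an error on an unmatched node is only one possible policy.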
Optionally, the apparatus further includes: a first mapping relationship acquisition unit 504, configured to obtain a first mapping relationship between a first undetermined standard node and a first original node defined by the unique syntax structure.
A first mapping relationship update unit 505, configured to: when the general standard nodes defined in the unified syntax structure include the first undetermined node, update the first mapping relationship into the general-layer rules; when the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, update the first mapping relationship into the unique-layer rules; and when neither the general standard nodes nor the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, define the first undetermined standard node as a new unique standard node in the unified syntax structure and update the first mapping relationship into the unique-layer rules.
Optionally, the first mapping relationship acquisition unit 504 is specifically configured to: obtain the first mapping relationship when the first original node contained in the original abstract syntax tree fails to match every mapping relationship included in the syntax conversion rules; or obtain a rule update instruction that includes the first mapping relationship.
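The three-way update handled by units 504 and 505 can be sketched as follows. This is an illustrative Python sketch under assumptions: the function name `update_rules`, the set and dictionary representations, the Go-style original node names, and the standard node `Defer` are all hypothetical and do not come from the specification.

```python
def update_rules(general_nodes, unique_nodes, general_rules, unique_rules,
                 pending_node, language, original_type):
    """Place a new (original node -> standard node) mapping in a rule layer.

    Returns the name of the layer that received the mapping.
    """
    key = (language, original_type)
    if pending_node in general_nodes:
        # Case 1: the pending node is already a general standard node.
        general_rules[key] = pending_node
        return "general"
    if pending_node not in unique_nodes:
        # Case 3: defined nowhere yet, so register it as a new
        # unique standard node before adding the mapping.
        unique_nodes.add(pending_node)
    # Cases 2 and 3 both end with the mapping in the unique layer.
    unique_rules[key] = pending_node
    return "unique"


# Hypothetical starting state.
general_nodes = {"If", "Call"}
unique_nodes = {"Switch"}
general_rules, unique_rules = {}, {}

layer_if = update_rules(general_nodes, unique_nodes, general_rules,
                        unique_rules, "If", "go", "IfStmt")
layer_switch = update_rules(general_nodes, unique_nodes, general_rules,
                            unique_rules, "Switch", "go", "SwitchStmt")
layer_defer = update_rules(general_nodes, unique_nodes, general_rules,
                           unique_rules, "Defer", "go", "DeferStmt")
```

The trigger for calling such a function could be either of the two situations described above: a failed match during conversion, or an explicit rule update instruction.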
Optionally, the apparatus further includes: a program analysis execution unit 506, configured to perform program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
Optionally, any standard node defined in the unified syntax structure has a corresponding node type and node attributes, where the node attributes of the standard node include the node types of its child nodes and/or the value of the standard node.
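One way to picture a standard node with a node type and node attributes is the small Python data class below. It is a sketch only: the class name `StandardNode`, its field names, and the decision to store the operator of a `Binary` node as its value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class StandardNode:
    """A standard node: a node type plus attributes (value, children)."""
    node_type: str
    value: Optional[Any] = None
    children: List["StandardNode"] = field(default_factory=list)

    def child_types(self):
        """Return the node types of the direct children, in order."""
        return [child.node_type for child in self.children]


# Hypothetical unified representation of the expression `x + 1`.
expr = StandardNode("Binary", "+", [
    StandardNode("Identifier", "x"),
    StandardNode("Literal", 1),
])
```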
Optionally, the node types include root nodes, expression nodes, statement nodes, and type nodes. The concrete types of the root node include CompileUnit. The subtypes of the expression node include literal nodes, definition nodes, unary expression nodes, binary expression nodes, ternary expression nodes, module nodes, call nodes, member access nodes, heap allocation nodes, identifier nodes, and keyword nodes; the concrete types of literal nodes include Literal and Object, those of definition nodes include ClassDefinition and FunctionDefiniton, those of unary expression nodes include Unary, those of binary expression nodes include Binary and Assignment, those of ternary expression nodes include Condition, those of module nodes include Import and Export, those of call nodes include Call, those of member access nodes include MemberAccess, those of heap allocation nodes include New, those of identifier nodes include Identifier, and those of keyword nodes include This and Super. The subtypes of the statement node include declaration nodes, branch nodes, loop nodes, control jump nodes, and exception nodes; the concrete types of declaration nodes include VariableDeclaration, those of branch nodes include If and Switch, those of loop nodes include For, ForIn, and While, those of control jump nodes include Return, Break, and Continue, and those of exception nodes include Throw, Try, and Catch. The subtypes of the type node include basic type nodes and aggregate type nodes; the concrete types of basic type nodes include integer, float, and string, and those of aggregate type nodes include Array and Scoped.
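The node type hierarchy listed above can be written down as a nested table, which makes lookups straightforward. The Python sketch below copies the concrete type names from the text (including the spelling `FunctionDefiniton` as it appears there); the dictionary layout, the subtype key names, and the `classify` helper are assumptions made for illustration.

```python
# category -> subtype -> concrete node types, as listed in the text.
TAXONOMY = {
    "root": {"root": ["CompileUnit"]},
    "expression": {
        "literal": ["Literal", "Object"],
        "definition": ["ClassDefinition", "FunctionDefiniton"],
        "unary": ["Unary"],
        "binary": ["Binary", "Assignment"],
        "ternary": ["Condition"],
        "module": ["Import", "Export"],
        "call": ["Call"],
        "member_access": ["MemberAccess"],
        "heap_allocation": ["New"],
        "identifier": ["Identifier"],
        "keyword": ["This", "Super"],
    },
    "statement": {
        "declaration": ["VariableDeclaration"],
        "branch": ["If", "Switch"],
        "loop": ["For", "ForIn", "While"],
        "control_jump": ["Return", "Break", "Continue"],
        "exception": ["Throw", "Try", "Catch"],
    },
    "type": {
        "basic": ["integer", "float", "string"],
        "aggregate": ["Array", "Scoped"],
    },
}


def classify(concrete_type):
    """Return (category, subtype) for a concrete type, or None if unknown."""
    for category, subtypes in TAXONOMY.items():
        for subtype, names in subtypes.items():
            if concrete_type in names:
                return category, subtype
    return None
```

A lookup such as `classify("ForIn")` then tells an analysis that it is dealing with a loop statement without caring which source language produced it.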
Optionally, the programming languages supported by the syntax conversion rules include at least Python, C, Java, PHP, Go, and JavaScript.
FIG. 6 is a block diagram of a program analysis apparatus provided by this specification according to an exemplary embodiment. The apparatus can be applied to the device shown in FIG. 4 to implement the technical solution of this specification. The apparatus includes: a unified abstract syntax tree acquisition unit 601, configured to obtain a unified abstract syntax tree that corresponds to a program file in any programming language and represents the unified syntax structure. The standard nodes contained in the unified abstract syntax tree are obtained by converting the original nodes contained in an original abstract syntax tree according to the syntax conversion rules corresponding to the unified syntax structure. The original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure corresponding to that programming language. The syntax conversion rules include general-layer rules and unique-layer rules. The general-layer rules include mapping relationships between the general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structure of each of the programming languages that the syntax conversion rules support. The unique-layer rules include mapping relationships between the unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages that the syntax conversion rules support.
A program analysis unit 602, configured to perform program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
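To make concrete what running one analysis over a unified abstract syntax tree could look like, the Python sketch below collects every call in a tree. It is a toy analysis under assumptions: the dictionary-based node layout with `type`, `value`, and `children` keys and the function name `collect_calls` are hypothetical, and the specification does not prescribe any particular analysis.

```python
def collect_calls(root):
    """Return the values of all Call nodes in a unified AST, in pre-order."""
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        if current["type"] == "Call":
            found.append(current.get("value"))
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(current.get("children", [])))
    return found


# A hypothetical unified tree; the source language is no longer visible.
tree = {
    "type": "CompileUnit",
    "children": [
        {"type": "Call", "value": "open"},
        {"type": "If", "children": [{"type": "Call", "value": "close"}]},
    ],
}
calls = collect_calls(tree)
```

Because the function only looks at standard node types, the same code applies to a tree converted from any supported language.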
The above apparatus embodiments correspond to the foregoing method embodiments and do not differ from them in essence. The foregoing descriptions of the embodiments shown in FIG. 1 and FIG. 3 apply equally to the embodiments shown in FIG. 5 and FIG. 6 and are not repeated here.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). As technology has developed, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained by lightly programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means it contains for implementing various functions can be regarded as structures within the hardware component. The means for implementing various functions can even be regarded both as software modules that implement the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a server system. Of course, the present invention does not exclude that, as computer technology develops, the computer implementing the functions of the above embodiments may be, for example, a personal computer, a laptop computer, an in-vehicle human-computer interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Although one or more embodiments of this specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or end product executes the method, the steps may be executed sequentially or in parallel according to the embodiments or the methods shown in the drawings (for example, in a parallel-processor or multi-threaded environment, or even a distributed data processing environment). The terms "include", "comprise", and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, product, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, product, or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product, or device that includes the stated elements is not excluded. Words such as "first" and "second", where used, denote names and do not indicate any particular order.
For convenience of description, the above apparatus is described as divided into modules by function. Of course, when implementing one or more embodiments of this specification, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware, or a module that implements a given function may be implemented as a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operation steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on how it differs from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments. In this specification, a description that refers to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this specification. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine different embodiments or examples described in this specification, and the features of different embodiments or examples.
The above are merely examples of one or more embodiments of this specification and are not intended to limit one or more embodiments of this specification. Those skilled in the art may make various modifications and changes to one or more embodiments of this specification. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within the scope of the claims.

Claims (12)

  1. A method for generating a unified abstract syntax tree, comprising:
    obtaining a program file in any programming language, and parsing the program file into an original abstract syntax tree that represents the unique syntax structure corresponding to that programming language;
    determining syntax conversion rules corresponding to a unified syntax structure, the syntax conversion rules including general-layer rules and unique-layer rules, wherein the general-layer rules include mapping relationships between general standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structure of each of the programming languages that the syntax conversion rules support, and the unique-layer rules include mapping relationships between unique standard nodes defined in the unified syntax structure and corresponding original nodes defined in the unique syntax structures of some of the programming languages that the syntax conversion rules support;
    matching each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and converting each original node into the standard node of the mapping relationship it matches, to obtain a unified abstract syntax tree that represents the unified syntax structure.
  2. The method according to claim 1, further comprising:
    obtaining a first mapping relationship between a first undetermined standard node and a first original node defined by the unique syntax structure;
    when the general standard nodes defined in the unified syntax structure include the first undetermined node, updating the first mapping relationship into the general-layer rules;
    when the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, updating the first mapping relationship into the unique-layer rules;
    when neither the general standard nodes nor the unique standard nodes defined in the unified syntax structure include the first undetermined standard node, defining the first undetermined standard node as a new unique standard node in the unified syntax structure, and updating the first mapping relationship into the unique-layer rules.
  3. The method according to claim 2, wherein obtaining the first mapping relationship between the first undetermined standard node and the first original node defined by the unique syntax structure comprises:
    obtaining the first mapping relationship when the first original node contained in the original abstract syntax tree fails to match every mapping relationship included in the syntax conversion rules; or
    obtaining a rule update instruction, the rule update instruction including the first mapping relationship.
  4. The method according to claim 1, further comprising:
    performing program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
  5. The method according to claim 1, wherein any standard node defined in the unified syntax structure has a corresponding node type and node attributes, and the node attributes of the standard node include the node types of its child nodes and/or the value corresponding to the standard node.
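One possible shape for such a standard node is sketched below. The class name `StandardNode` and the field names are assumptions made for illustration; the claim only requires a node type plus attributes that carry child node types and/or a value.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StandardNode:
    """A standard node: a node type plus its node attributes."""
    node_type: str
    # Attribute name -> node type expected for that child (hypothetical layout).
    child_types: dict = field(default_factory=dict)
    # Value carried by the node itself, if any (for example a literal or operator).
    value: Optional[Any] = None


# A binary expression carries child node types and an operator value.
binary = StandardNode("Binary", {"left": "Expression", "right": "Expression"}, "+")
# A literal carries only a value.
literal = StandardNode("Literal", value=42)
```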
  6. The method according to claim 5, wherein the node types include root nodes, expression nodes, statement nodes, and type nodes; wherein
    the types of root nodes include CompileUnit;
    the subtypes of expression nodes include: literal nodes, definition nodes, unary expression nodes, binary expression nodes, ternary expression nodes, module nodes, call nodes, member access nodes, heap allocation nodes, identifier nodes, and keyword nodes; wherein the types of literal nodes include Literal and Object, the types of definition nodes include ClassDefinition and FunctionDefiniton, the types of unary expression nodes include Unary, the types of binary expression nodes include Binary and Assignment, the types of ternary expression nodes include Condition, the types of module nodes include Import and Export, the types of call nodes include Call, the types of member access nodes include MemberAccess, the types of heap allocation nodes include New, the types of identifier nodes include Identifier, and the types of keyword nodes include This and Super;
    the subtypes of statement nodes include: declaration nodes, branch nodes, loop nodes, control jump nodes, and exception nodes; wherein the types of declaration nodes include VariableDeclaration, the types of branch nodes include If and Switch, the types of loop nodes include For, ForIn, and While, the types of control jump nodes include Return, Break, and Continue, and the types of exception nodes include Throw, Try, and Catch;
    the subtypes of type nodes include: basic type nodes and aggregate type nodes; wherein the types of basic type nodes include integer, float, and string, and the types of aggregate type nodes include Array and Scoped.
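The taxonomy recited in this claim can be written down as a lookup table. The sketch below copies the node type names as they appear in the claim (including the spelling FunctionDefiniton); the dictionary layout, the lowercase category labels, and the function `classify` are hypothetical.

```python
# Category -> subtype -> node type names, as listed in the claim.
NODE_TAXONOMY = {
    "root": {"root": ["CompileUnit"]},
    "expression": {
        "literal": ["Literal", "Object"],
        "definition": ["ClassDefinition", "FunctionDefiniton"],  # spelled as in the claim
        "unary": ["Unary"],
        "binary": ["Binary", "Assignment"],
        "ternary": ["Condition"],
        "module": ["Import", "Export"],
        "call": ["Call"],
        "member_access": ["MemberAccess"],
        "heap_allocation": ["New"],
        "identifier": ["Identifier"],
        "keyword": ["This", "Super"],
    },
    "statement": {
        "declaration": ["VariableDeclaration"],
        "branch": ["If", "Switch"],
        "loop": ["For", "ForIn", "While"],
        "control_jump": ["Return", "Break", "Continue"],
        "exception": ["Throw", "Try", "Catch"],
    },
    "type": {
        "basic": ["integer", "float", "string"],
        "aggregate": ["Array", "Scoped"],
    },
}


def classify(node_type):
    """Return (category, subtype) for a node type name, or None if unknown."""
    for category, subtypes in NODE_TAXONOMY.items():
        for subtype, names in subtypes.items():
            if node_type in names:
                return category, subtype
    return None
```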
  7. The method according to claim 1, wherein the programming languages supported by the syntax conversion rules include at least: Python, C, Java, PHP, Go, and JavaScript.
  8. A program analysis method, comprising:
    obtaining a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure; wherein the standard nodes contained in the unified abstract syntax tree are obtained by converting the original nodes contained in an original abstract syntax tree based on syntax conversion rules corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure corresponding to that programming language, and the syntax conversion rules include general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the respective unique syntax structures of all programming languages already supported by the syntax conversion rules; and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages already supported by the syntax conversion rules;
    performing program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
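Because the analysis operates on standard nodes only, a single analysis routine can serve every source language. The sketch below is a toy example of that idea, assuming the unified tree is held as nested dictionaries with `type`, `value`, and `children` keys; this layout and the function `collect_calls` are hypothetical and do not describe the program analysis system of the application.

```python
def collect_calls(node):
    """Return the values of all Call nodes in a unified tree, in pre-order."""
    found = []
    if node["type"] == "Call":
        found.append(node.get("value"))
    for child in node.get("children", []):
        found.extend(collect_calls(child))
    return found


# A unified tree that could have come from any supported language.
unified_tree = {
    "type": "CompileUnit",
    "children": [
        {"type": "Call", "value": "open", "children": [
            {"type": "Literal", "value": "data.txt"},
        ]},
        {"type": "If", "children": [
            {"type": "Call", "value": "close"},
        ]},
    ],
}
```

Calling `collect_calls(unified_tree)` walks the tree once and gathers the call targets without any language-specific branching.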
  9. An apparatus for generating a unified abstract syntax tree, comprising:
    a program file acquisition unit, configured to acquire a program file of any programming language and parse the program file into an original abstract syntax tree representing the unique syntax structure corresponding to that programming language;
    a syntax conversion rule determination unit, configured to determine syntax conversion rules corresponding to a unified syntax structure, the syntax conversion rules including general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the respective unique syntax structures of all programming languages already supported by the syntax conversion rules; and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages already supported by the syntax conversion rules;
    a conversion unit, configured to match each original node contained in the original abstract syntax tree against the mapping relationships included in the syntax conversion rules, and convert each original node into the standard node of the mapping relationship it successfully matches, to obtain a unified abstract syntax tree representing the unified syntax structure.
  10. A program analysis apparatus, comprising:
    a unified abstract syntax tree acquisition unit, configured to obtain a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure; wherein the standard nodes contained in the unified abstract syntax tree are obtained by converting the original nodes contained in an original abstract syntax tree based on syntax conversion rules corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and represents the unique syntax structure corresponding to that programming language, and the syntax conversion rules include general layer rules and unique layer rules, wherein the general layer rules include: mapping relationships between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the respective unique syntax structures of all programming languages already supported by the syntax conversion rules; and the unique layer rules include: mapping relationships between unique standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the unique syntax structures of some of the programming languages already supported by the syntax conversion rules;
    a program analysis unit, configured to perform program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
  11. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor implements the method according to any one of claims 1-8 by running the executable instructions.
  12. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1-8.
PCT/CN2023/109532 2022-08-26 2023-07-27 Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus WO2024041301A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211037955.5A CN115390852A (en) 2022-08-26 2022-08-26 Method and device for generating uniform abstract syntax tree and program analysis
CN202211037955.5 2022-08-26

Publications (1)

Publication Number Publication Date
WO2024041301A1 (en) 2024-02-29

Family

ID=84122183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109532 WO2024041301A1 (en) 2022-08-26 2023-07-27 Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus

Country Status (2)

Country Link
CN (1) CN115390852A (en)
WO (1) WO2024041301A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115390852A (en) * 2022-08-26 2022-11-25 支付宝(杭州)信息技术有限公司 Method and device for generating uniform abstract syntax tree and program analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140282444A1 (en) * 2013-03-15 2014-09-18 ArtinSoft Corporation Programming language transformations with abstract syntax tree extensions
CN110825384A (en) * 2019-10-28 2020-02-21 国电南瑞科技股份有限公司 ST language compiling method, system and compiler based on LLVM
CN113467828A (en) * 2021-06-23 2021-10-01 中国海洋大学 Method and system for converting programming language in heterogeneous many-core processor
CN115390852A (en) * 2022-08-26 2022-11-25 支付宝(杭州)信息技术有限公司 Method and device for generating uniform abstract syntax tree and program analysis

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335412B (en) * 2014-07-31 2019-06-11 阿里巴巴集团控股有限公司 For data conversion, the method and apparatus of Data Migration
CN104391730B (en) * 2014-08-03 2017-07-11 浙江网新恒天软件有限公司 A kind of software source codes language translation system and method
CN113535184A (en) * 2020-04-14 2021-10-22 华为技术有限公司 Cross-platform code conversion method and device
CN112269566B (en) * 2020-11-03 2022-09-02 支付宝(杭州)信息技术有限公司 Script generation processing method, device, equipment and system
CN112988163B (en) * 2021-04-01 2024-02-02 中国工商银行股份有限公司 Intelligent adaptation method, intelligent adaptation device, intelligent adaptation electronic equipment and intelligent adaptation medium for programming language
CN113504900A (en) * 2021-07-26 2021-10-15 中国工商银行股份有限公司 Programming language conversion method and device
CN114443041A (en) * 2021-11-30 2022-05-06 阿里云计算有限公司 Method for parsing abstract syntax tree and computer program product
CN114489670A (en) * 2022-01-14 2022-05-13 北京达佳互联信息技术有限公司 Data processing method, device, equipment and storage medium
CN114780100B (en) * 2022-04-08 2023-04-07 芯华章科技股份有限公司 Compiling method, electronic device and storage medium

Also Published As

Publication number Publication date
CN115390852A (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN108628947B (en) Business rule matching processing method, device and processing equipment
CN110764748B (en) Code calling method, device, terminal and storage medium
CN110489323B (en) Visual RPC API debugging method, device, medium and equipment
WO2024041301A1 (en) Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus
US11275568B2 (en) Generating a synchronous digital circuit from a source code construct defining a function call
US20240184543A1 (en) Page multiplexing method, page multiplexing device, storage medium and electronic apparatus
CN112181378B (en) Method and device for realizing business process
CN116483859A (en) Data query method and device
JP2023036634A (en) Access method, device, electronic apparatus, and computer storage medium
CN116028028A (en) Request function generation method, device, equipment and storage medium
CN106681781B (en) Method and system for realizing real-time computing service
US20200110584A1 (en) Automated code generation for functional testing of software applications
CN112269566B (en) Script generation processing method, device, equipment and system
WO2024046015A1 (en) Data query method and apparatus, storage medium, and electronic device
CN110941655A (en) Data format conversion method and device
CN110633162B (en) Remote call implementation method and device, computer equipment and storage medium
CN116436936A (en) Data storage system, method, storage medium and electronic equipment
CN116432185B (en) Abnormality detection method and device, readable storage medium and electronic equipment
CN110874322B (en) Test method and test server for application program
JP7213337B2 (en) Application interface realization method, device, device and medium on upper platform layer
CN115563183B (en) Query method, query device and program product
CN112988260B (en) Application cold start optimization method and device, computer equipment and storage medium
CN117170669B (en) Page display method based on front-end high-low code fusion
CN114047922B (en) Transcoding method, device, medium and equipment for precompiled device
CN116302219A (en) Data stream batch processing method, device, equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856391

Country of ref document: EP

Kind code of ref document: A1