CN115390852A - Method and device for generating uniform abstract syntax tree and program analysis - Google Patents


Info

Publication number
CN115390852A
Authority
CN
China
Prior art keywords
node
grammar
specific
original
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211037955.5A
Other languages
Chinese (zh)
Inventor
李永超
徐兆桂
刘地军
汤震浩
赵泽林
狄鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211037955.5A priority Critical patent/CN115390852A/en
Publication of CN115390852A publication Critical patent/CN115390852A/en
Priority to PCT/CN2023/109532 priority patent/WO2024041301A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present specification provides a method and apparatus for generating a unified abstract syntax tree and for program analysis. The method for generating the unified abstract syntax tree comprises: acquiring a program file of any programming language, and parsing the program file into an original abstract syntax tree that represents the specific syntax structure of that programming language; determining syntax conversion rules corresponding to a unified syntax structure, wherein the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising mapping relations between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages, and the specific layer rule comprising mapping relations between specific standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages; and converting each original node contained in the original abstract syntax tree into a standard node, to obtain a unified abstract syntax tree that represents the unified syntax structure.

Description

Method and device for generating uniform abstract syntax tree and program analysis
Technical Field
The embodiments of this specification belong to the field of computer technology, and in particular relate to a method and a device for generating a unified abstract syntax tree and for program analysis.
Background
In computer science, an abstract syntax tree is an abstract representation of source code that expresses the syntax structure of a program in the form of a tree, with each node of the abstract syntax tree representing one syntax structure in the source code.
In the related art, different programming languages have different syntax structures, and the abstract syntax trees parsed from program files of different programming languages are of different types, each representing the specific syntax structure of its own programming language. A system that performs further processing based on abstract syntax trees therefore has to adapt to each type of abstract syntax tree and implement several sets of algorithms with the same function, one for the syntax structure of each programming language, which raises development cost. For example, a developer of a program analysis system has to develop separate sets of program analysis algorithms for the several types of abstract syntax trees obtained by parsing program files of several programming languages. This repeated development increases the size and the development cost of the program analysis system and hinders later maintenance and iteration.
Disclosure of Invention
The invention aims to provide a method and a device for generating a unified abstract syntax tree and analyzing a program.
According to a first aspect of one or more embodiments of the present specification, there is provided a method of generating a unified abstract syntax tree, comprising:
acquiring a program file of any programming language, and parsing the program file into an original abstract syntax tree that represents the specific syntax structure of that programming language;
determining syntax conversion rules corresponding to a unified syntax structure, wherein the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising: mapping relations between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rules; and the specific layer rule comprising: mapping relations between specific standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages supported by the syntax conversion rules;
and matching each original node contained in the original abstract syntax tree against the mapping relations included in the syntax conversion rules, and converting each original node into the corresponding standard node in the successfully matched mapping relation, to obtain a unified abstract syntax tree that represents the unified syntax structure.
According to a second aspect of one or more embodiments of the present specification, there is provided a program analysis method including:
acquiring a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure; wherein the standard nodes contained in the unified abstract syntax tree are obtained by converting the original nodes contained in an original abstract syntax tree based on syntax conversion rules corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and represents the specific syntax structure of that programming language, and the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising: mapping relations between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rules; and the specific layer rule comprising: mapping relations between specific standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages supported by the syntax conversion rules;
and executing program analysis on the uniform abstract syntax tree based on a program analysis system corresponding to the uniform syntax structure.
According to a third aspect of one or more embodiments of the present specification, there is provided an apparatus for generating a unified abstract syntax tree, including:
a program file acquisition unit, configured to acquire a program file of any programming language and parse the program file into an original abstract syntax tree that represents the specific syntax structure of that programming language;
a syntax conversion rule determining unit, configured to determine syntax conversion rules corresponding to a unified syntax structure, wherein the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising: mapping relations between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rules; and the specific layer rule comprising: mapping relations between specific standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages supported by the syntax conversion rules;
and a conversion unit, configured to match each original node contained in the original abstract syntax tree against the mapping relations included in the syntax conversion rules, and to convert each original node into the corresponding standard node in the successfully matched mapping relation, to obtain a unified abstract syntax tree that represents the unified syntax structure.
According to a fourth aspect of one or more embodiments of the present specification, there is provided an apparatus for program analysis, including:
a unified abstract syntax tree acquisition unit, configured to acquire a unified abstract syntax tree that corresponds to a program file of any programming language and represents a unified syntax structure; wherein the standard nodes contained in the unified abstract syntax tree are obtained by converting the original nodes contained in an original abstract syntax tree based on syntax conversion rules corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and represents the specific syntax structure of that programming language, and the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising: mapping relations between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rules; and the specific layer rule comprising: mapping relations between specific standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages supported by the syntax conversion rules;
and the program analysis unit is used for executing program analysis on the uniform abstract syntax tree based on a program analysis system corresponding to the uniform syntax structure.
According to a fifth aspect of one or more embodiments herein, there is provided an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any of the first or second aspects by executing the executable instructions.
According to a sixth aspect of one or more embodiments herein, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to any one of the first or second aspects.
The embodiments of this specification design a unified syntax structure and corresponding syntax conversion rules. The syntax conversion rules comprise a general layer rule and a specific layer rule, which convert, respectively, the syntax structures shared by all programming languages and the syntax structures specific to some programming languages into the unified syntax structure. The unified syntax structure can therefore completely represent every programming language supported by the syntax conversion rules, and program files of different programming languages obtain an abstract representation defined by the unified syntax structure. In other words, the original abstract syntax trees obtained by parsing program files of multiple programming languages can be converted, through the syntax conversion rules, into unified abstract syntax trees that represent the unified syntax structure, and a unified abstract syntax tree can indirectly represent the specific syntax structures of multiple programming languages at the same time. This simplifies the development of systems that perform further processing based on the unified abstract syntax tree. For example, a developer of a program analysis system needs to develop only one set of program analysis algorithms for the unified syntax structure in order to handle program analysis tasks for different programming languages, and any later modification needs to be made only to that one set, which reduces the cost of developing, maintaining and iterating the program analysis system.
Drawings
In order to illustrate the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some of the embodiments of this specification, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for generating a unified abstract syntax tree, according to an exemplary embodiment.
FIG. 2 is a diagram of node types defined by a unified syntax structure provided by an exemplary embodiment.
FIG. 3 is a flow chart of a method of program analysis provided by an exemplary embodiment.
FIG. 4 is a schematic structural diagram of an apparatus according to an exemplary embodiment.
FIG. 5 is a block diagram of an apparatus for generating a unified abstract syntax tree, according to an exemplary embodiment.
FIG. 6 is a block diagram of an apparatus for program analysis according to an exemplary embodiment.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without making any creative effort shall fall within the protection scope of the present specification.
Referring to FIG. 1, FIG. 1 is a flowchart illustrating a method for generating a unified abstract syntax tree according to an exemplary embodiment. As shown in FIG. 1, the method includes:
S102: acquiring a program file of any programming language, and parsing the program file into an original abstract syntax tree that represents the specific syntax structure of that programming language.
In the embodiments of this specification, a program file of any programming language can be parsed, by a parser for that programming language, into an original abstract syntax tree that represents the specific syntax structure of that language. Specifically, the program file is first tokenized (lexical analysis) to obtain a token sequence; the parsing function of the parser then performs syntax analysis on the token sequence; and finally every syntax structure contained in the program file is converted into a corresponding syntax node (the name given to an abstract syntax structure). Parent-child relations exist among the syntax nodes, which together form the original abstract syntax tree. The syntax nodes contained in the original abstract syntax tree are called original nodes. Each original node has a node type and node attributes defined in the specific syntax structure of the programming language, where the node attributes of an original node include the node types of its child nodes and/or the value of the node itself.
A syntax node may be, for example, a function declaration, a declaration of a static global variable, or a file import. Syntax structures can be nested: for example, if the syntax node for a function declaration has an attribute that refers to a file import, then the file-import node is a child of the function-declaration node and the function-declaration node is its parent. Parent and child nodes related in this way form a tree structure, namely the abstract syntax tree.
In a specific implementation, an official or community-maintained parser can be used to improve generality and reduce development cost.
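As an illustration of step S102, the following minimal sketch uses Python's standard-library `ast` module, which is an official parser for the Python language, to obtain an original abstract syntax tree and list its node types in parent-child order. The sample source string and the helper name `dump_node_types` are assumptions made for this example and are not part of the specification.

```python
import ast

# Hypothetical sample program; any valid Python source would do.
SOURCE = "def add(a, b):\n    return a + b\n"

def dump_node_types(node, depth=0, out=None):
    """Collect (depth, node type name) pairs in pre-order."""
    if out is None:
        out = []
    out.append((depth, type(node).__name__))
    # ast.iter_child_nodes yields the direct child nodes of `node`.
    for child in ast.iter_child_nodes(node):
        dump_node_types(child, depth + 1, out)
    return out

# Tokenization and syntax analysis both happen inside ast.parse.
original_ast = ast.parse(SOURCE)
node_types = dump_node_types(original_ast)

for depth, name in node_types:
    print("  " * depth + name)
```

The node type names printed here (`Module`, `FunctionDef`, `BinOp`, and so on) belong to the specific syntax structure of Python; a parser for another language would produce differently named original nodes for the same program logic.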
S104: determining syntax conversion rules corresponding to a unified syntax structure, wherein the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising: mapping relations between general standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rules; and the specific layer rule comprising: mapping relations between specific standard nodes defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages supported by the syntax conversion rules.
The embodiments of this specification define a unified syntax structure. The syntax nodes contained in the unified syntax structure are called standard nodes, and each standard node has a node type and node attributes defined in the unified syntax structure, where the node attributes of a standard node include the node types of its child nodes and/or the value of the node itself. For example, for a standard node whose node type is a binary expression node, the node attributes specify that it has two child nodes, both of which are expression nodes, and that the value of the node is the operator of the binary expression.
Standard nodes can be divided by type into general standard nodes and specific standard nodes. A general standard node represents a syntax structure common to all programming languages; for example, the If node, which represents a conditional syntax structure among the branch nodes, is a general standard node. A specific standard node represents a syntax structure that not all programming languages have; in the embodiments of this specification it is embodied as a syntax structure specific to some (for example, one) of the programming languages that the syntax conversion rules already support. The embodiments of this specification bring together the general syntax structures shared by all programming languages and the specific syntax structures of particular programming languages, and encapsulate them in the unified syntax structure, so that the unified syntax structure can in theory represent the syntax structures of all programming languages at the same time. An abstract syntax tree composed of standard nodes defined in the unified syntax structure is called a unified abstract syntax tree; it represents, as a tree of standard nodes, all the syntax structures in a program file written in any programming language (or in a mixture of several programming languages).
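The description of a standard node (a node type plus node attributes that constrain its children and carry its value) can be sketched as a small data structure. The class name `StandardNode`, the field names, the `EXPRESSION_TYPES` set and the helper `make_binary` are all hypothetical; the specification does not prescribe a concrete representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed subset of node types that count as expression nodes.
EXPRESSION_TYPES = {"Literal", "Identifier", "Unary", "Binary", "Call"}

@dataclass
class StandardNode:
    node_type: str                      # e.g. "Binary", "If"
    layer: str = "general"              # "general" or "specific"
    value: Optional[str] = None         # e.g. the operator of a binary expression
    children: List["StandardNode"] = field(default_factory=list)

def make_binary(operator: str, left: StandardNode, right: StandardNode) -> StandardNode:
    """Build a binary expression node: exactly two children, both expressions."""
    for operand in (left, right):
        if operand.node_type not in EXPRESSION_TYPES:
            raise ValueError(f"{operand.node_type} is not an expression node")
    return StandardNode("Binary", value=operator, children=[left, right])

expr = make_binary(
    "+",
    StandardNode("Identifier", value="a"),
    StandardNode("Literal", value="1"),
)
```

Enforcing the child constraint in the constructor helper is one possible design; a schema table checked after conversion would serve the same purpose.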
Referring to FIG. 2, FIG. 2 is a schematic diagram of the node types defined by a unified syntax structure according to an exemplary embodiment. As shown in FIG. 2, the standard nodes defined in the unified syntax structure are divided into general standard nodes and specific standard nodes, and the specific standard nodes can be further grouped by the programming language supported by the syntax conversion rules to which they belong, for example Python nodes, C nodes, Java nodes and JavaScript nodes.
The node types of the general standard nodes comprise root nodes, expression nodes, statement nodes and type nodes; wherein,
specific types of root nodes include CompileUnit;
subtypes of expression nodes include: literal nodes, definition nodes, unary expression nodes, binary expression nodes, ternary expression nodes, module nodes, call nodes, member access nodes, heap allocation nodes, identifier nodes and keyword nodes; the specific types of the literal nodes include Literal and Object, the specific types of the definition nodes include ClassDefinition and FunctionDefinition, the specific types of the unary expression nodes include Unary, the specific types of the binary expression nodes include Binary and Assignment, the specific types of the ternary expression nodes include Condition, the specific types of the module nodes include Import and Export, the specific types of the call nodes include Call, the specific types of the member access nodes include MemberAccess, the specific types of the heap allocation nodes include New, the specific types of the identifier nodes include Identifier, and the specific types of the keyword nodes include This and Super;
subtypes of statement nodes include: declaration nodes, branch nodes, loop nodes, control jump nodes and exception nodes; the specific types of the declaration nodes include VariableDeclaration, the specific types of the branch nodes include If and Switch, the specific types of the loop nodes include For, ForIn and While, the specific types of the control jump nodes include Return, Break and Continue, and the specific types of the exception nodes include Throw, Try and Catch;
subtypes of type nodes include: a basic type node and an aggregation type node; the specific types of the basic type nodes comprise integer, float and string, and the specific types of the aggregation type nodes comprise Array and scanned.
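The taxonomy of general standard nodes listed above can be held in a nested lookup table, as in the following sketch. The table name `GENERAL_NODE_TYPES`, the category keys and the function `classify` are assumptions for illustration, identifiers are written in their conventional spelling (for example `Unary` and `Catch`), and the type nodes are left out of this sketch.

```python
# category -> subtype -> concrete node types (root, expression and statement only)
GENERAL_NODE_TYPES = {
    "root": {"root": ["CompileUnit"]},
    "expression": {
        "literal": ["Literal", "Object"],
        "definition": ["ClassDefinition", "FunctionDefinition"],
        "unary": ["Unary"],
        "binary": ["Binary", "Assignment"],
        "ternary": ["Condition"],
        "module": ["Import", "Export"],
        "call": ["Call"],
        "member_access": ["MemberAccess"],
        "heap_allocation": ["New"],
        "identifier": ["Identifier"],
        "keyword": ["This", "Super"],
    },
    "statement": {
        "declaration": ["VariableDeclaration"],
        "branch": ["If", "Switch"],
        "loop": ["For", "ForIn", "While"],
        "jump": ["Return", "Break", "Continue"],
        "exception": ["Throw", "Try", "Catch"],
    },
}

def classify(node_type):
    """Return (category, subtype) for a general standard node type, or None."""
    for category, subtypes in GENERAL_NODE_TYPES.items():
        for subtype, names in subtypes.items():
            if node_type in names:
                return category, subtype
    return None

print(classify("If"))          # a branch statement
print(classify("Assignment"))  # a binary expression
```

A node type that `classify` does not find is either a specific standard node or not part of the unified syntax structure at all.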
In the embodiments of this specification, corresponding syntax conversion rules are further designed for the unified syntax structure, to guide conversion in both directions between the original abstract syntax trees of the various programming languages and the unified abstract syntax tree of the unified syntax structure. Specifically, the syntax conversion rules comprise a general layer rule and a specific layer rule, the general layer rule comprising: mapping relations between each general standard node defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rules; and the specific layer rule comprising: mapping relations between each specific standard node defined in the unified syntax structure and the corresponding original nodes defined in the specific syntax structures of some of the programming languages supported by the syntax conversion rules. When the syntax conversion rules support N programming languages (N ≥ 2), each mapping relation in the general layer rule is one-to-N, whereas a mapping relation in the specific layer rule may be one-to-one or one-to-M (1 < M < N).
By looking up the mapping relations, the original nodes in the original abstract syntax tree of any programming language supported by the syntax conversion rules can be converted one by one into the corresponding standard nodes of the unified syntax structure, which realizes the conversion from the original abstract syntax tree to the unified abstract syntax tree. Conversely, the standard nodes in the unified abstract syntax tree can be converted one by one into the original nodes of any programming language supported by the syntax conversion rules, which realizes the conversion from the unified abstract syntax tree to the original abstract syntax tree of a particular programming language. The syntax conversion rules in the embodiments of this specification are thus a set of rules that describe the conversion relations between the unified syntax structure and the specific syntax structure of each programming language, and they enable conversion in both directions between original abstract syntax trees and the unified abstract syntax tree.
In the embodiments of this specification, saying that the syntax conversion rules support a programming language A means that, for each general standard node in the unified syntax structure, the syntax conversion rules store a mapping relation between that general standard node and an original node in the specific syntax structure of programming language A. A particular programming language may have no syntax structure specific to it, so the specific layer rule need not maintain, for every programming language supported by the syntax conversion rules, a mapping relation between an original node in that language's specific syntax structure and a specific standard node.
Optionally, the programming languages supported by the syntax conversion rules include at least: Python, C, Java, PHP, Go and JavaScript. Because the syntax conversion rules can be continuously updated, they can in theory support all programming languages.
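The two-layer organization of the rules can be sketched as two dictionaries keyed by standard node type. Everything here is illustrative: the table names, the helper functions and the choice of entries are assumptions. The original node names are taken from Python's `ast` module (`If`, `BinOp`, `ListComp`), the ESTree format for JavaScript (`IfStatement`, `BinaryExpression`) and the JavaParser library for Java (`IfStmt`, `BinaryExpr`); the standard node name `ListComprehension` is invented for the example.

```python
# General layer: every supported language has an entry for every standard node (one-to-N).
GENERAL_LAYER = {
    "If": {"python": "If", "javascript": "IfStatement", "java": "IfStmt"},
    "Binary": {"python": "BinOp", "javascript": "BinaryExpression", "java": "BinaryExpr"},
}

# Specific layer: only some languages have an entry (one-to-one or one-to-M).
SPECIFIC_LAYER = {
    "ListComprehension": {"python": "ListComp"},
}

def supports(language):
    """A language is supported if every general standard node maps to one of its nodes."""
    return all(language in per_language for per_language in GENERAL_LAYER.values())

def to_standard(language, original_type):
    """Return (layer, standard node type) for an original node type, or None."""
    for layer_name, layer in (("general", GENERAL_LAYER), ("specific", SPECIFIC_LAYER)):
        for standard_type, per_language in layer.items():
            if per_language.get(language) == original_type:
                return layer_name, standard_type
    return None

print(supports("python"), supports("php"))
print(to_standard("javascript", "IfStatement"))
print(to_standard("python", "ListComp"))
```

With three languages in the general layer, each general mapping is one-to-three, while the single specific mapping is one-to-one.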
S106: and matching each original node contained in the original abstract syntax tree with the mapping relation included in the syntax conversion rule, and converting each original node into a corresponding standard node in the mapping relation which is successfully matched, so as to obtain the uniform abstract syntax tree for representing the uniform syntax structure.
After the original abstract syntax tree is obtained, the tree structure of the original abstract syntax tree can be retained, and only each original node contained in the original abstract syntax tree is converted into a corresponding standard node according to a syntax conversion rule, so that a uniform abstract syntax tree for representing the program file in the uniform syntax structure is generated.
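A minimal sketch of this node-by-node conversion follows. Trees are modeled as `(node_type, children)` pairs, the original node names follow the ESTree convention for JavaScript, and the rule table `RULES` and the functions `convert` and `shape` are hypothetical. Raising an error for an unmatched node is a choice made for this sketch; the specification only describes the successfully matched case.

```python
# Assumed flat rule table for one language: original node type -> standard node type.
RULES = {
    "Program": "CompileUnit",
    "IfStatement": "If",
    "BinaryExpression": "Binary",
    "ReturnStatement": "Return",
    "Identifier": "Identifier",
    "Literal": "Literal",
}

def convert(node, rules):
    """Rename every node according to `rules`, keeping the tree structure unchanged."""
    node_type, children = node
    if node_type not in rules:
        raise KeyError(f"no mapping relation for node type {node_type!r}")
    return (rules[node_type], [convert(child, rules) for child in children])

def shape(node):
    """Structure of a tree with the node types erased."""
    return [shape(child) for child in node[1]]

original = ("Program", [
    ("IfStatement", [
        ("BinaryExpression", [("Identifier", []), ("Literal", [])]),
        ("ReturnStatement", [("Identifier", [])]),
    ]),
])

unified = convert(original, RULES)

# The reverse direction uses the inverted table (valid because RULES is one-to-one).
inverse = {standard: orig for orig, standard in RULES.items()}
restored = convert(unified, inverse)
```

Because only the node labels change, `shape(unified)` equals `shape(original)`, and applying the inverted table restores the original tree.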
In the related art, abstract syntax trees that combine two or a small number of programming languages do exist. However, they lack a unified abstract representation of the syntax nodes in syntax structures that are specific to only some programming languages (that is, they lack the specific standard nodes and the specific layer rule described in the embodiments of this specification). The syntax structure they represent is therefore not extensible, and the idea of representing program files of a large number of programming languages by a unified abstract syntax tree with a unified syntax structure has not been realized. In contrast, the embodiments of this specification define specific standard nodes in the unified syntax structure and maintain the corresponding specific layer rule, thereby establishing a unified abstract representation of the original nodes that are specific to some programming languages. The unified syntax structure is thus highly extensible (by adding specific standard nodes and updating the specific layer rule), so that the syntax conversion rules can in theory support converting the original abstract syntax trees of all programming languages into corresponding unified abstract syntax trees.
Optionally, the method further includes:
acquiring a first mapping relation between a first standard node to be determined and a first original node defined by the specific syntax structure;
updating the first mapping relation to the generic layer rule in a case where the generic standard nodes defined in the unified syntax structure include the first standard node to be determined;
updating the first mapping relation to the specific layer rule in a case where the specific standard nodes defined in the unified syntax structure include the first standard node to be determined;
and, in a case where neither the generic standard nodes nor the specific standard nodes defined in the unified syntax structure include the first standard node to be determined, defining the first standard node to be determined as a new specific standard node in the unified syntax structure, and updating the first mapping relation to the specific layer rule.
This embodiment describes how the syntax conversion rule is updated. Because the syntax conversion rule is essentially a set of mapping relations, it can be updated by acquiring and adding a new mapping relation. Specifically, when the syntax conversion rule does not yet contain the first mapping relation, the first mapping relation is added to the syntax conversion rule, which brings the first original node into the range that the unified syntax structure can represent and thereby extends the unified syntax structure.
The first mapping relation may come from a conversion rule base corresponding to the programming language in question, where the conversion rule base contains the mapping relations between the standard nodes to be determined and the original nodes in the specific syntax structure of that programming language. When that conversion rule base is updated, the updated first mapping relation is acquired so that the syntax conversion rule stays synchronized with the conversion rule base. Alternatively, the first mapping relation may be entered manually by a developer of the syntax conversion rule.
After the first mapping relation is acquired, it is necessary to determine which kind of standard node in the unified syntax structure the first standard node to be determined corresponds to, so that the first mapping relation is updated to the correct layer: the generic layer rule or the specific layer rule. If the first standard node to be determined is not any standard node defined in the unified syntax structure, the current unified syntax structure cannot yet represent the first original node. In that case, the first standard node to be determined is defined as a new specific standard node in the unified syntax structure, and the first mapping relation is updated to the specific layer rule. This establishes, in the unified syntax structure, a unified abstract representation and a corresponding conversion rule for the original node that is specific to the programming language, and improves the support of the syntax conversion rule for that programming language.
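The three update cases above can be sketched as a single routine. This is an illustrative sketch only: `ConversionRules`, `add_mapping`, and the field names are hypothetical, the original node names `IfStmt` and `GoStmt` are taken from Go's `go/ast` package as examples, and the standard node `Spawn` is invented for the demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionRules:
    """Hypothetical container for the unified structure and both rule layers."""
    generic_nodes: set[str]
    specific_nodes: set[str]
    generic_layer: dict[tuple[str, str], str] = field(default_factory=dict)
    specific_layer: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_mapping(self, language: str, original: str, standard: str) -> str:
        """Store one mapping relation and return the layer it was written to."""
        key = (language, original)
        if standard in self.generic_nodes:
            # Case 1: the standard node is an existing generic standard node.
            self.generic_layer[key] = standard
            return "generic"
        if standard not in self.specific_nodes:
            # Case 3: unknown standard node, so define it as a new specific one.
            self.specific_nodes.add(standard)
        # Cases 2 and 3 both end with an update to the specific layer rule.
        self.specific_layer[key] = standard
        return "specific"
```

For example, with `rules = ConversionRules({"If"}, {"Export"})`, the call `rules.add_mapping("go", "GoStmt", "Spawn")` first registers `Spawn` as a new specific standard node and then records the mapping in the specific layer.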
Optionally, acquiring the first mapping relation between the first standard node to be determined and the first original node defined by the specific syntax structure includes:
under the condition that the matching of a first original node contained in the original abstract syntax tree and all mapping relations included in the syntax conversion rule fails, acquiring a first mapping relation; or,
and acquiring a rule updating instruction, wherein the rule updating instruction comprises a first mapping relation.
In this embodiment, when a first original node contained in the original abstract syntax tree fails to match every mapping relation included in the syntax conversion rule, the first mapping relation may be acquired to update the syntax conversion rule, so that the updated syntax conversion rule supports converting the first original node into the first standard node to be determined. Thus, even when the syntax conversion rule does not fully support the programming language in question (that is, conversion of the first original node is not yet supported), adding the first mapping relation to the syntax conversion rule immediately still ensures that the original abstract syntax tree can be successfully converted into a unified abstract syntax tree.
Of course, the first mapping relation may be acquired at any time. For example, when a rule update instruction containing the first mapping relation is acquired, the update specified by that instruction may be executed.
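Both acquisition timings can be sketched as follows. The sketch is deliberately simplified and rests on assumptions: the rule set is a flat dictionary from original node type to standard node type, the tree is reduced to a list of node types, and `convert_kinds`, `apply_update_instruction`, and the `acquire` callback are hypothetical names standing in for a lookup in a conversion rule base or a prompt to a developer.

```python
from typing import Callable, Optional

Rules = dict[str, str]  # original node type -> standard node type


def convert_kinds(kinds: list[str], rules: Rules,
                  acquire: Callable[[str], Optional[str]]) -> list[str]:
    """Convert node types, acquiring a missing mapping when matching fails."""
    converted = []
    for kind in kinds:
        if kind not in rules:
            # Matching failed: try to acquire the first mapping relation now.
            standard = acquire(kind)
            if standard is None:
                raise LookupError(f"no mapping relation available for {kind}")
            rules[kind] = standard
        converted.append(rules[kind])
    return converted


def apply_update_instruction(rules: Rules, instruction: dict[str, str]) -> None:
    """Apply a rule update instruction that carries new mapping relations."""
    rules.update(instruction)
```

A dictionary's `get` method is enough to play the role of `acquire` in a quick experiment, for example `convert_kinds(["NewExpression"], rules, {"NewExpression": "New"}.get)`.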
Optionally, the method further includes: executing program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
In this embodiment, the program analysis system needs to interface with only one unified syntax structure in order to analyze program files of any programming language. That is, it only needs to target the unified abstract syntax tree that represents the unified syntax structure, without repeatedly developing a functional layer for each of the different syntax structures of different programming languages, and any later modification needs to be made to only one set of program analysis algorithms. This reduces the cost of developing, maintaining, and iterating the program analysis system.
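As an illustration of a single analysis written once against the unified syntax structure, the sketch below collects the names of called functions from a unified abstract syntax tree. It is a hedged example: the `StandardNode` layout and the convention that the first child of a `Call` node is the callee `Identifier` are assumptions made for this sketch, not details stated in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class StandardNode:
    """Hypothetical standard node: a node type, an optional value, children."""
    kind: str
    value: str = ""
    children: list["StandardNode"] = field(default_factory=list)


def called_names(root: StandardNode) -> list[str]:
    """Collect callee names of all Call nodes in pre-order.

    Assumes (for this sketch) that a Call node's first child is the callee.
    """
    names = []
    stack = [root]
    while stack:
        node = stack.pop()
        if (node.kind == "Call" and node.children
                and node.children[0].kind == "Identifier"):
            names.append(node.children[0].value)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return names
```

Because the function inspects only standard node types, the same code applies unchanged to a unified tree produced from any supported language.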
Referring to fig. 3, fig. 3 is a flowchart of a method for program analysis according to an exemplary embodiment. As shown in fig. 3, the method includes:
S302: acquiring a unified abstract syntax tree that corresponds to a program file of any programming language and is used for representing a unified syntax structure; wherein the standard nodes included in the unified abstract syntax tree are obtained by converting the original nodes included in an original abstract syntax tree based on a syntax conversion rule corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and is used for representing the specific syntax structure corresponding to the programming language, and the syntax conversion rule includes a generic layer rule and a specific layer rule, wherein the generic layer rule includes: mapping relations between generic standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rule, and the specific layer rule includes: mapping relations between specific standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structures of a part of the programming languages supported by the syntax conversion rule;
S304: executing program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
The embodiments of the present specification design a unified syntax structure and a corresponding syntax conversion rule. The syntax conversion rule comprises a generic layer rule and a specific layer rule, which are used, respectively, to convert syntax structures of all the programming languages and syntax structures of a part of the programming languages into the corresponding unified syntax structure. In this way, the unified syntax structure can completely represent every programming language supported by the syntax conversion rule, and program files of different programming languages obtain an abstract representation defined by the unified syntax structure. That is, the original abstract syntax trees obtained by parsing program files of multiple programming languages can all be converted, through the syntax conversion rule, into unified abstract syntax trees that represent the unified syntax structure, so that a unified abstract syntax tree can indirectly represent the specific syntax structures of multiple programming languages at the same time. This simplifies the development of a program analysis system that performs further program analysis based on the unified abstract syntax tree. A developer of the program analysis system needs to develop only one set of program analysis algorithms for the unified syntax structure to handle program analysis tasks for different programming languages, and any later modification needs to be made only to that one set of algorithms, which reduces the cost of developing, maintaining, and iterating the program analysis system.
It is easily understood that the generation process of the unified abstract syntax tree in the embodiment shown in fig. 3 is not substantially different from the related description in the embodiment shown in fig. 1, and the foregoing description for the embodiment shown in fig. 1 is applicable to the embodiment shown in fig. 3.
FIG. 4 is a schematic structural diagram of a device provided in an exemplary embodiment. Referring to FIG. 4, at the hardware level, the device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile storage 410, and may also include hardware required by other services. One or more embodiments of the present specification may be implemented in software, for example by the processor 402 reading a corresponding computer program from the non-volatile storage 410 into the memory 408 and then running it. Of course, in addition to a software implementation, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or a logic device.
FIG. 5 is a block diagram of an apparatus for generating a unified abstract syntax tree according to an exemplary embodiment. The apparatus may be applied to the device shown in FIG. 4 to implement the technical solution of the present specification. As shown in FIG. 5, the apparatus includes:
a program file obtaining unit 501, configured to obtain a program file of any programming language, and parse the program file into an original abstract syntax tree used for representing the specific syntax structure corresponding to the programming language;
a syntax conversion rule determining unit 502, configured to determine a syntax conversion rule corresponding to the unified syntax structure, where the syntax conversion rule includes a generic layer rule and a specific layer rule, the generic layer rule includes: mapping relations between generic standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rule, and the specific layer rule includes: mapping relations between specific standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structures of a part of the programming languages supported by the syntax conversion rule;
a converting unit 503, configured to match each original node included in the original abstract syntax tree against the mapping relations included in the syntax conversion rule, and convert each original node into the standard node in the successfully matched mapping relation, so as to obtain a unified abstract syntax tree for representing the unified syntax structure.
Optionally, the apparatus further includes:
a first mapping relation obtaining unit 504, configured to obtain a first mapping relation between a first standard node to be determined and a first original node defined by the specific syntax structure;
a first mapping relation updating unit 505, configured to: update the first mapping relation to the generic layer rule in a case where the generic standard nodes defined in the unified syntax structure include the first standard node to be determined; update the first mapping relation to the specific layer rule in a case where the specific standard nodes defined in the unified syntax structure include the first standard node to be determined; and, in a case where neither the generic standard nodes nor the specific standard nodes defined in the unified syntax structure include the first standard node to be determined, define the first standard node to be determined as a new specific standard node in the unified syntax structure, and update the first mapping relation to the specific layer rule.
Optionally, the first mapping relationship obtaining unit 504 is specifically configured to:
under the condition that the matching of a first original node contained in the original abstract syntax tree and all mapping relations included in the syntax conversion rule fails, acquiring a first mapping relation; or,
and acquiring a rule updating instruction, wherein the rule updating instruction comprises a first mapping relation.
Optionally, the apparatus further includes:
a program analysis executing unit 506, configured to execute program analysis on the unified abstract syntax tree based on the program analysis system corresponding to the unified syntax structure.
Optionally, each standard node defined in the unified syntax structure has a corresponding node type and node attributes, where the node attributes of a standard node include the node types of its child nodes and/or a value corresponding to that standard node.
Optionally, the node types include a root node, an expression node, a statement node, and a type node; wherein,
the specific types of the root node include CompileUnit;
subtypes of expression nodes include: a literal node, a definition node, a unary expression node, a binary expression node, a ternary expression node, a module node, a call node, a member access node, a heap allocation node, an identifier node, and a keyword node; the specific types of literal nodes include Literal and Object, the specific types of definition nodes include ClassDefinition and FunctionDefinition, the specific types of unary expression nodes include Unary, the specific types of binary expression nodes include Binary and Assignment, the specific types of ternary expression nodes include Condition, the specific types of module nodes include Import and Export, the specific types of call nodes include Call, the specific types of member access nodes include MemberAccess, the specific types of heap allocation nodes include New, the specific types of identifier nodes include Identifier, and the specific types of keyword nodes include This and Super;
subtypes of statement nodes include: a declaration node, a branch node, a loop node, a control jump node, and an exception node; the specific types of declaration nodes include VariableDeclaration, the specific types of branch nodes include If and Switch, the specific types of loop nodes include For, ForIn, and While, the specific types of control jump nodes include Return, Break, and Continue, and the specific types of exception nodes include Throw, Try, and Catch;
subtypes of type nodes include: a basic type node and an aggregate type node; the specific types of basic type nodes include integer, float and string, and the specific types of aggregate type nodes include Array and scanned.
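The node type hierarchy listed above can be held in a simple lookup structure. The sketch below is illustrative only: `NODE_TAXONOMY` and `classify` are hypothetical names, the short subtype keys are chosen for this example, the spelling of the specific type names is normalized as an assumption, and only the root, expression, and statement categories are included for brevity.

```python
from typing import Optional

# Hypothetical table: node type -> subtype -> specific types.
NODE_TAXONOMY: dict[str, dict[str, list[str]]] = {
    "root": {"root": ["CompileUnit"]},
    "expression": {
        "literal": ["Literal", "Object"],
        "definition": ["ClassDefinition", "FunctionDefinition"],
        "unary": ["Unary"],
        "binary": ["Binary", "Assignment"],
        "ternary": ["Condition"],
        "module": ["Import", "Export"],
        "call": ["Call"],
        "member_access": ["MemberAccess"],
        "heap_allocation": ["New"],
        "identifier": ["Identifier"],
        "keyword": ["This", "Super"],
    },
    "statement": {
        "declaration": ["VariableDeclaration"],
        "branch": ["If", "Switch"],
        "loop": ["For", "ForIn", "While"],
        "control_jump": ["Return", "Break", "Continue"],
        "exception": ["Throw", "Try", "Catch"],
    },
}


def classify(specific_type: str) -> Optional[tuple[str, str]]:
    """Return (node type, subtype) for a specific type, or None if unknown."""
    for node_type, subtypes in NODE_TAXONOMY.items():
        for subtype, specific_types in subtypes.items():
            if specific_type in specific_types:
                return node_type, subtype
    return None
```

A table of this kind lets an analysis dispatch on the coarse node type (for example, all statement nodes) without enumerating every specific type by hand.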
Optionally, the programming languages supported by the syntax conversion rule at least include: the Python language, the C language, the Java language, the PHP language, the Go language, and the JavaScript language.
FIG. 6 is a block diagram of a program analysis apparatus according to an exemplary embodiment. The apparatus may be applied to the device shown in FIG. 4 to implement the technical solution of this specification. The apparatus includes:
a unified abstract syntax tree obtaining unit 601, configured to obtain a unified abstract syntax tree that corresponds to a program file of any programming language and is used for representing a unified syntax structure; wherein the standard nodes included in the unified abstract syntax tree are obtained by converting the original nodes included in an original abstract syntax tree based on a syntax conversion rule corresponding to the unified syntax structure, the original abstract syntax tree is obtained by parsing the program file and is used for representing the specific syntax structure corresponding to the programming language, and the syntax conversion rule includes a generic layer rule and a specific layer rule, wherein the generic layer rule includes: mapping relations between generic standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rule, and the specific layer rule includes: mapping relations between specific standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structures of a part of the programming languages supported by the syntax conversion rule;
a program analysis unit 602, configured to perform program analysis on the unified abstract syntax tree based on a program analysis system corresponding to the unified syntax structure.
The above apparatus embodiments correspond to the above method embodiments without substantial difference. The descriptions given for the embodiments shown in FIG. 1 and FIG. 3 also apply to the embodiments shown in FIG. 5 and FIG. 6 and are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may be regarded as both software modules implementing the method and structures within the hardware component.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. One typical implementation device is a server system. Of course, the present invention does not exclude that as future computer technology develops, the computer implementing the functionality of the above described embodiments may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device or a combination of any of these devices.
Although one or more embodiments of the present specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only order of execution. When an actual apparatus or end product executes, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the figures (e.g., in a parallel-processor or multi-threaded environment, or even a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. Terms such as "first" and "second" are used to denote names and do not denote any particular order.
For convenience of description, the above apparatus is described as being divided into various modules by function. Of course, when implementing one or more embodiments of the present specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing a given function may be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
One or more embodiments of the specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment, and the relevant points can be found in the corresponding description of the method embodiment. In the description of the specification, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.
The above description is merely exemplary of one or more embodiments of the present disclosure and is not intended to limit the scope of one or more embodiments of the present disclosure. Various modifications and alterations to one or more embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present specification should be included in the scope of the claims.

Claims (12)

1. A method of generating a unified abstract syntax tree, comprising:
acquiring a program file of any programming language, and parsing the program file into an original abstract syntax tree for representing the specific syntax structure corresponding to the programming language;
determining a syntax conversion rule corresponding to a unified syntax structure, wherein the syntax conversion rule comprises a generic layer rule and a specific layer rule, the generic layer rule comprises: mapping relations between generic standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structure of each of all the programming languages supported by the syntax conversion rule, and the specific layer rule comprises: mapping relations between specific standard nodes defined in the unified syntax structure and corresponding original nodes defined in the specific syntax structures of a part of the programming languages supported by the syntax conversion rule;
and matching each original node contained in the original abstract syntax tree against the mapping relations included in the syntax conversion rule, and converting each original node into the corresponding standard node in the successfully matched mapping relation, so as to obtain the unified abstract syntax tree for representing the unified syntax structure.
2. The method of claim 1, further comprising:
acquiring a first mapping relationship between a first candidate standard node and a first original node defined in the language-specific grammar structure;
updating the general layer rule with the first mapping relationship in a case where the general standard nodes defined in the unified grammar structure comprise the first candidate standard node;
updating the specific layer rule with the first mapping relationship in a case where the specific standard nodes defined in the unified grammar structure comprise the first candidate standard node; and
in a case where neither the general standard nodes nor the specific standard nodes defined in the unified grammar structure comprise the first candidate standard node, defining the first candidate standard node as a new specific standard node in the unified grammar structure, and updating the specific layer rule with the first mapping relationship.
3. The method of claim 2, wherein acquiring the first mapping relationship between the first candidate standard node and the first original node defined in the language-specific grammar structure comprises:
acquiring the first mapping relationship in a case where the first original node contained in the original abstract syntax tree fails to match any of the mapping relationships included in the grammar conversion rule; or
acquiring a rule update instruction, wherein the rule update instruction comprises the first mapping relationship.
4. The method of claim 1, further comprising:
performing program analysis on the unified abstract syntax tree using a program analysis system corresponding to the unified grammar structure.
5. The method of claim 1, wherein each standard node defined in the unified grammar structure has a corresponding node type and node attributes, and the node attributes of a standard node comprise the node types of its child nodes and/or a value of the standard node.
6. The method of claim 5, wherein the node types comprise a root node, an expression node, a statement node, and a type node; wherein
the specific types of the root node comprise CompileUnit;
the subtypes of the expression node comprise: a literal node, a definition node, a unary expression node, a binary expression node, a ternary expression node, a module node, a call node, a member access node, a heap allocation node, an identifier node, and a keyword node; the specific types of the literal node comprise Literal and Object, the specific types of the definition node comprise ClassDefinition and FunctionDefinition, the specific types of the unary expression node comprise Unary, the specific types of the binary expression node comprise Binary and Assignment, the specific types of the ternary expression node comprise Condition, the specific types of the module node comprise Import and Export, the specific types of the call node comprise Call, the specific types of the member access node comprise MemberAccess, the specific types of the heap allocation node comprise New, the specific types of the identifier node comprise Identifier, and the specific types of the keyword node comprise This and Super;
the subtypes of the statement node comprise: a declaration node, a branch node, a loop node, a control jump node, and an exception node; the specific types of the declaration node comprise VariableDeclaration, the specific types of the branch node comprise If and Switch, the specific types of the loop node comprise For, ForIn, and While, the specific types of the control jump node comprise Return, Break, and Continue, and the specific types of the exception node comprise Throw, Try, and Catch;
the subtypes of the type node comprise: a basic type node and an aggregate type node; the specific types of the basic type node comprise integer, float, and string, and the specific types of the aggregate type node comprise Array and scanned.
7. The method of claim 1, wherein the programming languages supported by the grammar conversion rule comprise at least: Python, C, Java, PHP, Go, and JavaScript.
8. A method of program analysis, comprising:
acquiring a unified abstract syntax tree that corresponds to a program file written in any programming language and represents a unified grammar structure; wherein the standard nodes included in the unified abstract syntax tree are obtained by converting, based on a grammar conversion rule corresponding to the unified grammar structure, the original nodes included in an original abstract syntax tree; the original abstract syntax tree is obtained by parsing the program file and represents the language-specific grammar structure of that programming language; the grammar conversion rule comprises a general layer rule and a specific layer rule; the general layer rule comprises mapping relationships between general standard nodes defined in the unified grammar structure and the corresponding original nodes defined in the language-specific grammar structure of each of the programming languages supported by the grammar conversion rule; and the specific layer rule comprises mapping relationships between specific standard nodes defined in the unified grammar structure and the corresponding original nodes defined in the language-specific grammar structures of only some of the programming languages supported by the grammar conversion rule; and
performing program analysis on the unified abstract syntax tree using a program analysis system corresponding to the unified grammar structure.
9. An apparatus for generating a unified abstract syntax tree, comprising:
a program file acquisition unit, configured to acquire a program file written in any programming language and parse the program file into an original abstract syntax tree that represents the language-specific grammar structure of that programming language;
a grammar conversion rule determining unit, configured to determine a grammar conversion rule corresponding to a unified grammar structure, wherein the grammar conversion rule comprises a general layer rule and a specific layer rule; the general layer rule comprises mapping relationships between general standard nodes defined in the unified grammar structure and the corresponding original nodes defined in the language-specific grammar structure of each of the programming languages supported by the grammar conversion rule; and the specific layer rule comprises mapping relationships between specific standard nodes defined in the unified grammar structure and the corresponding original nodes defined in the language-specific grammar structures of only some of the programming languages supported by the grammar conversion rule; and
a conversion unit, configured to match each original node contained in the original abstract syntax tree against the mapping relationships included in the grammar conversion rule, and convert each original node into the corresponding standard node in the successfully matched mapping relationship, so as to obtain the unified abstract syntax tree representing the unified grammar structure.
10. An apparatus for program analysis, comprising:
a unified abstract syntax tree acquisition unit, configured to acquire a unified abstract syntax tree that corresponds to a program file written in any programming language and represents a unified grammar structure; wherein the standard nodes included in the unified abstract syntax tree are obtained by converting, based on a grammar conversion rule corresponding to the unified grammar structure, the original nodes included in an original abstract syntax tree; the original abstract syntax tree is obtained by parsing the program file and represents the language-specific grammar structure of that programming language; the grammar conversion rule comprises a general layer rule and a specific layer rule; the general layer rule comprises mapping relationships between general standard nodes defined in the unified grammar structure and the corresponding original nodes defined in the language-specific grammar structure of each of the programming languages supported by the grammar conversion rule; and the specific layer rule comprises mapping relationships between specific standard nodes defined in the unified grammar structure and the corresponding original nodes defined in the language-specific grammar structures of only some of the programming languages supported by the grammar conversion rule; and
a program analysis unit, configured to perform program analysis on the unified abstract syntax tree using a program analysis system corresponding to the unified grammar structure.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-8 by executing the executable instructions.
12. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1-8.
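For illustration only, the node conversion of claim 1 and the rule update of claims 2 and 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the specification. The names `Node`, `ConversionRules`, `add_general`, `update`, `lookup`, and `to_unified` are hypothetical. Keying each mapping relationship by a (language, original node type) pair is an assumption, as is leaving an unmatched original node under its original type while recording it for a later rule update. The sample node names (`If` and `Name` for Python, `IfStatement`, `Identifier`, `ThisExpression`, and `DebuggerStatement` for JavaScript) serve only as example data, and treating `This` as a specific standard node is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node: `kind` is the node type, `value` an optional payload."""
    kind: str
    value: object = None
    children: list = field(default_factory=list)


class ConversionRules:
    """Hypothetical two-layer rule set keyed by (language, original node kind)."""

    def __init__(self, languages):
        self.languages = set(languages)
        self.general = {}            # mappings to general standard nodes
        self.specific = {}           # mappings to specific standard nodes
        self.general_nodes = set()   # general standard nodes defined so far
        self.specific_nodes = set()  # specific standard nodes defined so far

    def add_general(self, standard_kind, originals):
        # A general standard node has a counterpart in every supported language.
        if set(originals) != self.languages:
            raise ValueError("general rule must cover every supported language")
        self.general_nodes.add(standard_kind)
        for lang, orig_kind in originals.items():
            self.general[(lang, orig_kind)] = standard_kind

    def update(self, lang, orig_kind, standard_kind):
        # Claim 2: file the new mapping under the layer that owns the node.
        if standard_kind in self.general_nodes:
            self.general[(lang, orig_kind)] = standard_kind
        else:
            # Either an existing specific node or a new one defined as specific.
            self.specific_nodes.add(standard_kind)
            self.specific[(lang, orig_kind)] = standard_kind

    def lookup(self, lang, orig_kind):
        key = (lang, orig_kind)
        return self.general.get(key) or self.specific.get(key)


def to_unified(node, lang, rules, unmatched):
    """Claim 1: convert each original node into its matched standard node."""
    standard_kind = rules.lookup(lang, node.kind)
    if standard_kind is None:
        # Claim 3: a failed match is one trigger for acquiring a new mapping.
        # Assumption: the node keeps its original kind in the meantime.
        unmatched.append(node.kind)
        standard_kind = node.kind
    children = [to_unified(c, lang, rules, unmatched) for c in node.children]
    return Node(standard_kind, node.value, children)


# Example data: two supported languages and a tiny JavaScript-like tree.
rules = ConversionRules(["python", "javascript"])
rules.add_general("If", {"python": "If", "javascript": "IfStatement"})
rules.add_general("Identifier", {"python": "Name", "javascript": "Identifier"})
rules.update("javascript", "ThisExpression", "This")  # becomes a specific rule

tree = Node("IfStatement",
            children=[Node("ThisExpression"), Node("DebuggerStatement")])
unmatched = []
unified = to_unified(tree, "javascript", rules, unmatched)
```

In this sketch the general layer is consulted before the specific layer, and the `unmatched` list collects the original node types for which a first mapping relationship would still have to be acquired, for example through a rule update instruction.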
CN202211037955.5A 2022-08-26 2022-08-26 Method and device for generating uniform abstract syntax tree and program analysis Pending CN115390852A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211037955.5A CN115390852A (en) 2022-08-26 2022-08-26 Method and device for generating uniform abstract syntax tree and program analysis
PCT/CN2023/109532 WO2024041301A1 (en) 2022-08-26 2023-07-27 Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus

Publications (1)

Publication Number Publication Date
CN115390852A (en) 2022-11-25

Family

ID=84122183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211037955.5A Pending CN115390852A (en) 2022-08-26 2022-08-26 Method and device for generating uniform abstract syntax tree and program analysis

Country Status (2)

Country Link
CN (1) CN115390852A (en)
WO (1) WO2024041301A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024041301A1 (en) * 2022-08-26 2024-02-29 支付宝(杭州)信息技术有限公司 Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391730A (en) * 2014-08-03 2015-03-04 浙江网新恒天软件有限公司 Software source code language translation system and method
CN105335412A (en) * 2014-07-31 2016-02-17 阿里巴巴集团控股有限公司 Method and device for data conversion and data migration
CN112269566A (en) * 2020-11-03 2021-01-26 支付宝(杭州)信息技术有限公司 Script generation processing method, device, equipment and system
CN112988163A (en) * 2021-04-01 2021-06-18 中国工商银行股份有限公司 Intelligent programming language adaptation method and device, electronic equipment and medium
CN113504900A (en) * 2021-07-26 2021-10-15 中国工商银行股份有限公司 Programming language conversion method and device
CN113535184A (en) * 2020-04-14 2021-10-22 华为技术有限公司 Cross-platform code conversion method and device
CN114443041A (en) * 2021-11-30 2022-05-06 阿里云计算有限公司 Method for parsing abstract syntax tree and computer program product
CN114489670A (en) * 2022-01-14 2022-05-13 北京达佳互联信息技术有限公司 Data processing method, device, equipment and storage medium
CN114780100A (en) * 2022-04-08 2022-07-22 芯华章科技股份有限公司 Compiling method, electronic device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9239710B2 (en) * 2013-03-15 2016-01-19 ArtinSoft Corporation Programming language transformations with abstract syntax tree extensions
CN110825384A (en) * 2019-10-28 2020-02-21 国电南瑞科技股份有限公司 ST language compiling method, system and compiler based on LLVM
CN113467828B (en) * 2021-06-23 2024-01-12 中国海洋大学 Method and system for converting programming language in heterogeneous many-core processor
CN115390852A (en) * 2022-08-26 2022-11-25 支付宝(杭州)信息技术有限公司 Method and device for generating uniform abstract syntax tree and program analysis

Also Published As

Publication number Publication date
WO2024041301A1 (en) 2024-02-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination