CN114579130A

CN114579130A - Automatic inference method for Node.js code segment environment dependencies based on program analysis

Info

Publication number: CN114579130A
Application number: CN202011374137.5A
Authority: CN
Inventors: 张卫丰; 黄泽龙; ***
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2022-06-03

Abstract

The invention relates to a method for automatically inferring the environment dependencies of Node.js code segments based on program analysis, comprising the following steps. First, a knowledge base of known npm packages is constructed according to their SourceRank in the libraries.io dataset. Second, information related to package dependencies is discovered by combining static analysis, dynamic analysis, and association rule mining; this information is modeled as an interdependence graph according to the relationships among its elements, and the graph is stored in a graph database. Then, for a given new Node.js code segment, the target code is parsed and a list of all imported resources is extracted; the list is mapped back to a set of installable software packages; and the dependencies found are put in the correct order by an inference algorithm for direct and transitive dependencies that follows the installation order, yielding the final result.

Description

Automatic inference method for Node.js code segment environment dependencies based on program analysis

Technical Field

The invention belongs to the field of computer technology, and in particular to the field of software technology. It provides a method for automatically inferring the environment dependencies of Node.js code segments based on program analysis. The method can effectively address the problem that code shared on today's major code-sharing platforms often cannot be run and is difficult to reproduce, thereby promoting the exchange of coding techniques and improving development efficiency.

Background

Node.js is an event-driven I/O server-side JavaScript environment built on Google's V8 engine, which executes JavaScript very quickly and performs very well. As a result, within a few years Node.js has grown into a mature development platform and attracted many developers. In addition, developers can use it to develop mobile Web frameworks.

Node.js succeeded not only because it uses the same syntax as front-end JavaScript, which directly attracted a large number of front-end developers as its initial users, but also because its built-in package manager, npm, is extremely powerful. npm is the largest software registry in the world, with approximately 3 billion downloads per week and more than 600,000 packages. Open source developers from every continent use npm to share packages and borrow from one another. The structure of a package lets developers easily track dependencies and versions. npm manages the dependencies of a Node.js project well and makes it exceptionally easy for developers to publish their own packages. Therefore, the cost is low whether you use other people's packages or distribute your own packages to others.

Currently, code sharing is drawing more and more attention among developers as code-sharing communities proliferate. Platforms such as Stack Overflow, GitHub Gist, and the Jupyter community have very large and active registered user bases, and thousands of shared code fragments have accumulated on them. This amounts to a very useful application store in which every user is free to download the code fragments or projects they need, which is a great convenience for developers.

The large-scale adoption of Node.js has increased the need for communication among developers. Platforms such as Stack Overflow, GitHub Gist, and the Jupyter community give users a way to exchange techniques and code. Answers to many questions on these platforms come with corresponding code fragments attached. These fragments have usually been validated by the respondents and can be used to solve the problem.

While code sharing provides many benefits, shared code often cannot be run and is difficult to reproduce. According to research, more than 50% of the code on the Gist platform does not run successfully in the local environment. Although the code itself may contain errors, the main cause is generally the inconsistency between different users' runtime environments, of which missing dependency libraries are the most significant problem. Parnin's survey shows that developers typically spend no less than 20 minutes configuring the runtime environment dependencies of such code. Therefore, a method for automatically inferring the environment dependencies of Node.js code segments has strong practical significance for promoting technical exchange and improving development efficiency.

Disclosure of Invention

The invention provides a method for automatically inferring the environment dependencies of Node.js code segments based on program analysis. First, the invention is concerned with the relationship between the function calls in a Node.js code fragment and the dependency packages containing the function declarations. Second, the invention uses an offline knowledge base to correctly infer the dependencies of target scripts. This knowledge base contains packages, their versions and resources, and the relationships between them. It is constructed by applying static and dynamic analysis to known packages in the libraries.io dataset: static analysis enumerates the known resources of a package for later retrieval, and dynamic analysis gathers information about transitive dependencies. Then, association rule mining over the dependencies of public Python projects exploits the developer-generated knowledge of system-level transitive dependencies. Finally, for a given unfamiliar code fragment, an inference algorithm for direct and transitive dependencies that follows the installation order is applied on the basis of the offline knowledge base. In view of the above problems, the work and contributions of the present invention are as follows:

1. A technique for computing package dependencies from three knowledge sources: static analysis, dynamic analysis, and mining of the system-level transitive dependencies produced by developer behavior. The invention analyzes the ten thousand most used npm packages as ranked by SourceRank in the libraries.io dataset. Packages are selected by SourceRank so that the most common libraries are covered, because common libraries affect most of the package ecosystem and the ecosystem as a whole is too large to analyze comprehensively. The known resources of these common packages are discovered and enumerated through static analysis. The case where some software packages do not list their dependencies correctly is likewise handled using the top ten thousand packages by SourceRank in the libraries.io dataset.

2. An inference algorithm for direct and transitive dependencies that follows the installation order. The environment dependencies of a code fragment are subject to an additional constraint: after inference, they must be returned in the correct order, otherwise errors can occur because of the direct and transitive dependencies among the packages. The inference algorithm first extracts the imported resources from the target application, then queries the knowledge base generated by static analysis to determine the set of packages to which the resources may belong, and then traverses the dependency graph among those packages to determine transitive dependencies.

Drawings

FIG. 1 is a final modeled interdependence graph in the knowledge base of the present invention

FIG. 2 is a flow chart of the inference algorithm for direct dependency and transitive dependency following installation order according to the present invention

FIG. 3 is a schematic diagram of the automatic inference of Node.js code fragment environment dependencies based on program analysis according to the present invention

Detailed Description

The invention specifically comprises the following steps:

1) The most common packages are first selected according to their SourceRank ranking in the libraries.io dataset. The known resources of each package are enumerated by static analysis for later retrieval.

2) For software packages that do not list their dependencies correctly, we resort to dynamic analysis. We attempt to install the package using npm install, record the resources that install successfully, and parse the error output for the resources that fail to install; based on that output, the dynamic analysis process enters the dependency records into the knowledge base.

3) We model the knowledge base as an interdependence graph and store it in a graph database. Nodes represent the objects in the knowledge base, and directed edges represent the relationships between them.

4) Given a Node.js code fragment, the target code is parsed and a list of all imported resources is extracted, mainly by constructing an Abstract Syntax Tree (AST) of the source code.

5) Once the resources used by the code are known, they can be mapped back to a set of installable software packages. We perform this reverse lookup by querying our knowledge base and the package management system for potentially matching records.

6) After the required dependency packages are obtained in 5), the dependencies found are put in the correct order according to the direct and transitive dependencies of the packages in the interdependence graph of 3), and the final result is returned.

The static analysis in step 1) proceeds as follows: for each of the top ten thousand npm packages by SourceRank in the libraries.io dataset, we attempt an installation using npm install and, if it succeeds, record the package's resources. For the small fraction of packages that cannot be installed, we try to download the package's distribution manually and parse it.
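The resource-recording part of this flow can be sketched as follows. This is a minimal illustration and not the patented implementation: the text does not define what counts as a "resource", so the sketch assumes a resource is an importable module path derived from the `.js` files of an already installed package. The function name `enumerate_resources` and the rule that `index.js` maps to the bare package name are assumptions made for the example.

```python
import os


def enumerate_resources(package_dir, package_name):
    """List importable module paths for one installed package.

    Assumption: every .js file is a resource, and index.js at the
    package root is importable under the bare package name.
    """
    resources = []
    for root, dirs, files in os.walk(package_dir):
        # Skip nested node_modules: those files belong to other packages.
        dirs[:] = [d for d in dirs if d != "node_modules"]
        for filename in files:
            if not filename.endswith(".js"):
                continue
            rel = os.path.relpath(os.path.join(root, filename), package_dir)
            rel = rel.replace(os.sep, "/")[:-3]  # drop the ".js" suffix
            if rel == "index":
                resources.append(package_name)
            else:
                resources.append(f"{package_name}/{rel}")
    return sorted(resources)
```

A real analysis would also have to honor the `main` and `exports` fields of `package.json`, which this sketch ignores.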

In step 2), some software packages may not list their dependencies correctly, which prevents npm from resolving them automatically during installation. When an installation fails, we parse the error output for messages such as "no module name <name>" or "cannot find <name>", which indicate a dependency on a package that is not present, and enter the corresponding dependency records into the knowledge base according to these hints.
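A minimal sketch of this error-output parsing is given below. The patent quotes the messages only approximately, so the pattern here is an assumption: it matches the `Cannot find module '<name>'` message that Node.js prints when a `require` fails. The names `missing_dependencies` and `record_failure`, and the use of a plain dictionary in place of the knowledge base, are hypothetical.

```python
import re

# Assumed message shape; other tools may phrase the failure differently.
MISSING_MODULE = re.compile(r"[Cc]annot find module ['\"]([^'\"]+)['\"]")


def missing_dependencies(error_output):
    """Return the module names reported as missing, in order of appearance."""
    found = []
    for line in error_output.splitlines():
        match = MISSING_MODULE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def record_failure(knowledge_base, package, error_output):
    """Add the undeclared dependencies of `package` to a dict-based store."""
    deps = knowledge_base.setdefault(package, set())
    deps.update(missing_dependencies(error_output))
    return knowledge_base
```

For example, feeding the output of a failed installation of a hypothetical package `demo-pkg` into `record_failure` adds every module named in a "Cannot find module" line to that package's dependency set.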

In step 3), the interdependence graph mainly uses package nodes, version nodes, resource nodes, and association nodes; see FIG. 1 of the drawings for details. Every known version of a package is represented as a version node, which is labeled as a version and stores the package's version number. Resource nodes are owned by version nodes, as indicated by directed edges from the version nodes. Association nodes represent the association rules; they are labeled as associations and carry metadata recording confidence, support, lift, and count.
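The metadata kept on an association node can be illustrated with the standard association-rule definitions. The patent does not give formulas, so the sketch below assumes the usual ones: for a rule A -> B over a set of projects, count is the number of projects containing both, support is that count divided by the number of projects, confidence is the count divided by the number of projects containing A, and lift is confidence divided by the fraction of projects containing B. The function name and the sample data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute count, support, confidence and lift for antecedent -> consequent.

    `transactions` is a list of sets, one set of package names per project.
    """
    total = len(transactions)
    count_a = sum(1 for t in transactions if antecedent in t)
    count_b = sum(1 for t in transactions if consequent in t)
    count_ab = sum(1 for t in transactions if antecedent in t and consequent in t)

    support = count_ab / total if total else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    lift = confidence / (count_b / total) if count_b else 0.0
    return {
        "count": count_ab,
        "support": support,
        "confidence": confidence,
        "lift": lift,
    }
```

A lift above 1 indicates that the two packages co-occur more often than they would if they were independent, which is the kind of signal an association node could use to suggest an undeclared dependency.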

In step 4), the target application is parsed and a list of all imported resources is extracted. We do this by building an Abstract Syntax Tree (AST) of the source code.
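The kind of output this step produces can be shown with a deliberately simplified sketch. Note the substitution: the patent builds an AST, whereas the code below uses regular expressions as an approximation, so it can be fooled by imports inside comments or strings and it misses dynamically computed module names. The function name `imported_resources` is hypothetical.

```python
import re

# CommonJS: require('name')
REQUIRE_RE = re.compile(r"""\brequire\(\s*['"]([^'"]+)['"]\s*\)""")
# ES modules: import x from 'name'  or  import 'name'
IMPORT_RE = re.compile(r"""\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]""")


def imported_resources(source):
    """Return the distinct module specifiers found in JavaScript source text.

    Regex approximation only; a production tool would walk a real AST.
    """
    names = []
    for pattern in (REQUIRE_RE, IMPORT_RE):
        for match in pattern.finditer(source):
            if match.group(1) not in names:
                names.append(match.group(1))
    return names
```

The returned list still contains built-in modules such as `fs` and relative paths such as `./util`; filtering those out before the reverse lookup is left to the caller.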

In step 5), once the resources used by the application are known, they are mapped back to a set of installable software packages. We perform this reverse lookup by querying our knowledge base and the package management system for potentially matching records. A match between a resource required by the application and an installable package may be established by a full or partial match against one or more known resources in the knowledge base. In addition, we check whether a package exists with the same name as the required resource; after the reverse lookup is completed, the package name is normalized to match the name used on the package management system.
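One possible reading of this reverse lookup is sketched below. Everything in it is an assumption made for illustration: the knowledge base is reduced to a dictionary from resource path to package names, a "partial match" is interpreted as one path being a prefix of the other at a `/` boundary, and normalization is reduced to lowercasing the name. The patent does not specify any of these details.

```python
def candidate_packages(resource, resource_index, known_packages):
    """Map one imported resource to the packages that may provide it.

    resource_index: dict of resource path -> set of package names.
    known_packages: set of package names known to the package manager.
    """
    # 1. Full match against a known resource.
    if resource in resource_index:
        return sorted(resource_index[resource])

    # 2. Partial match: one path is a prefix of the other at a "/" boundary.
    partial = set()
    for known, packages in resource_index.items():
        if known.startswith(resource + "/") or resource.startswith(known + "/"):
            partial.update(packages)
    if partial:
        return sorted(partial)

    # 3. Fall back to a package with the same (normalized) name.
    normalized = resource.lower()
    if normalized in known_packages:
        return [normalized]
    return []
```

When several candidates are returned, a further ranking step (for example by SourceRank) would be needed to pick one; that choice is outside this sketch.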

In step 6), knowing only the packages that correspond to the top-level resources is often not sufficient for proper environment configuration, because these packages may themselves depend on other packages. Assuming that the interdependence graph contains all necessary relationships, the set P of packages that must be installed is the union of the set S of resolved direct dependencies and the set R of packages reachable from S.
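The set of packages to install described here can be computed with a plain graph traversal. The sketch below assumes the interdependence graph has been flattened into a dictionary `depends_on` from package name to the names it depends on; that representation, the function name, and the sample edges are illustrative assumptions, not taken from the patent.

```python
from collections import deque


def packages_to_install(direct, depends_on):
    """Return P: the direct dependencies S plus everything reachable from S."""
    seen = set(direct)
    queue = deque(direct)
    while queue:
        package = queue.popleft()
        for dependency in depends_on.get(package, ()):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen
```

Because every package is enqueued at most once, the traversal terminates even if the recorded dependencies contain a cycle.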

However, it is not sufficient to calculate P alone. We must also maintain the correct ordering of dependencies so that each package is installed before any other package that depends on it. We do this by performing a depth-first search rooted at each package p ∈ S.
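The ordering requirement can be met by emitting each package only after all of its dependencies, that is, in depth-first post-order. The sketch below is one way to do that, under the assumption that the graph is available as a dictionary `depends_on` from package name to the names it depends on; the function name and the sample graph are hypothetical.

```python
def install_order(direct, depends_on):
    """Order packages so that each one appears after its dependencies."""
    order = []
    visited = set()

    def visit(package):
        if package in visited:
            return
        # Mark before recursing so a dependency cycle cannot loop forever.
        visited.add(package)
        for dependency in depends_on.get(package, ()):
            visit(dependency)
        # Post-order: all dependencies have already been appended.
        order.append(package)

    for package in direct:
        visit(package)
    return order
```

Installing the packages in the returned order satisfies the constraint whenever the graph is acyclic. If a cycle exists, no valid order exists, and the sketch simply breaks the cycle at the first package it revisits.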

Claims

1. A method for automatically inferring the environment dependencies of Node.js code segments based on program analysis, comprising the following steps: first, constructing a knowledge base of known npm packages according to their SourceRank in the libraries.io dataset; second, discovering information related to package dependencies by combining static analysis, dynamic analysis, and association rule mining, modeling the information as an interdependence graph according to the relationships among its elements, and storing the graph in a graph database; then, for a given new Node.js code segment, parsing the target code and extracting a list of all imported resources, mapping the list back to a set of installable software packages, and putting the dependencies found in the correct order by an inference algorithm for direct and transitive dependencies that follows the installation order, so as to obtain the final result.

2. The method for automatically inferring Node.js code fragment environment dependencies based on program analysis according to claim 1, characterized by the following steps:

1) The most common packages are first selected according to their SourceRank ranking in the libraries.io dataset. The known resources of each package are enumerated by static analysis for later retrieval.

4) Given a Node.js code fragment, the target code is parsed and a list of all imported resources is extracted, mainly by constructing an Abstract Syntax Tree (AST) of the source code. Once the resources used by the code are known, they can be mapped back to a set of installable software packages. We perform this reverse lookup by querying our knowledge base and the package management system for potentially matching records.

5) After the required dependency packages are obtained in 4), the dependencies found are put in the correct order according to the direct and transitive dependencies of the packages in the interdependence graph of 3), and the final result is returned.

3. The method of claim 2, wherein step 1) uses a technique that computes package dependencies from a knowledge source of system-level transitive dependencies generated by static analysis. The most common packages are selected according to their SourceRank ranking in the libraries.io dataset. The known resources of each package are enumerated by static analysis for later retrieval, i.e., an offline knowledge base is built.

4. The method for automatically inferring Node.js code fragment environment dependencies based on program analysis as claimed in claim 2, characterized in that in step 2) dynamic analysis is used for packages that do not list their dependencies correctly. Some software packages may not list their dependencies correctly, which prevents npm from resolving them automatically during installation. When an installation fails, we parse the error output for messages such as "no module name <name>" or "cannot find <name>", which indicate a dependency on a package that is not present, and enter the corresponding dependency records into the knowledge base according to these hints.

5. The method for automatically inferring Node.js code fragment environment dependencies based on program analysis as claimed in claim 2, wherein in step 3) the knowledge base is modeled as an interdependence graph. The interdependence graph mainly uses package nodes, version nodes, resource nodes, and association nodes; see FIG. 1 of the drawings for details. Every known version of a package is represented as a version node, which is labeled as a version and stores the package's version number. Resource nodes are owned by version nodes, as indicated by directed edges from the version nodes. Association nodes represent the association rules; they are labeled as associations and carry metadata recording confidence, support, lift, and count.

6. The method for automatically inferring Node.js code fragment environment dependencies based on program analysis as claimed in claim 2, wherein in step 4), for a given new Node.js code fragment, the target code is parsed and a list of all imported resources is extracted, and the reverse lookup is performed by querying our knowledge base and the package management system for potentially matching records. A match between a resource required by the application and an installable package may be established by a full or partial match against one or more known resources in the knowledge base. In addition, we check whether a package exists with the same name as the required resource; after the reverse lookup is completed, the package name is normalized to match the name used on the package management system.

7. The method for automatically inferring Node.js code fragment environment dependencies based on program analysis according to claim 2, characterized by an inference algorithm for direct and transitive dependencies that follows the installation order in step 5). Knowing only the packages that correspond to the top-level resources is often not sufficient for proper environment configuration, because these packages may themselves depend on other packages. Assuming that the interdependence graph contains all necessary relationships, the set P of packages that must be installed is the union of the set S of resolved direct dependencies and the set R of packages reachable from S.

The invention automatically infers the environment dependencies of Node.js code segments based on program analysis. First, the invention is concerned with the relationship between the function calls in a Node.js code fragment and the dependency packages containing the function declarations. Second, the invention uses an offline knowledge base to correctly infer the dependencies of target scripts. This knowledge base contains packages, their versions and resources, and the relationships between them. It is constructed by applying static and dynamic analysis to known packages in the libraries.io dataset: static analysis enumerates the known resources of a package for later retrieval, and dynamic analysis gathers information about transitive dependencies. Then, association rule mining over the dependencies of public Python projects exploits the developer-generated knowledge of system-level transitive dependencies. Finally, for a given unfamiliar code fragment, an inference algorithm for direct and transitive dependencies that follows the installation order is applied on the basis of the offline knowledge base.