CA3153550A1 - Code recommendation method, device and system - Google Patents

Code recommendation method, device and system

Info

Publication number
CA3153550A1
Authority
CA
Canada
Prior art keywords
code
codes
recommended
feature
searching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3153550A
Other languages
French (fr)
Inventor
Min Lu
Sikai Wang
Zhiliang Geng
Jie Sun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Pertaining to the field of computer technology, the present invention discloses a code recommending method, and corresponding device and system. The method comprises: obtaining a code recommending request, wherein the code recommending request includes therein a searching code; parsing the searching code to obtain a syntax tree; generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code; and matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code. By parsing the searching code into a syntax tree, the present invention renders the code recommending tool applicable to any development language with a grammatical structure, and better applicability is achieved.

Description

CODE RECOMMENDATION METHOD, DEVICE AND SYSTEM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of computer technology, and more particularly to a code recommending method, and corresponding device and system.
Description of Related Art
[0002] As a software development aid, a code recommending tool is mainly used to match the optimal open-source code from a code feature library according to a code segment or a line of code input by the user, and to recommend that code to the user. At present, because codes vary widely in structure and code feature libraries are very large, the currently available recommending tools all suffer from slow matching speed, poor extensibility of the corpus, and poor adaptability to different code structures and types. As a result, code recommending efficiency is low, the recommendations are not very effective, and users have to switch between different code recommending tools.
SUMMARY OF THE INVENTION
[0003] In order to solve the problems pending in the state of the art, embodiments of the present invention provide a code recommending method, and corresponding device and system.
The technical solutions are as follows.
[0004] According to the first aspect, there is provided a code recommending method that comprises:
[0005] obtaining a code recommending request, wherein the code recommending request includes therein a searching code;

Date Recue/Date Received 2022-03-25
[0006] parsing the searching code to obtain a syntax tree;
[0007] generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code; and
[0008] matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code.
[0009] Further, the step of generating a feature vector of the searching code according to the syntax tree includes:
[0010] constructing a sparse matrix of the searching code according to the structure of the syntax tree; and
[0011] generating a sparse vector of the searching code according to the sparse matrix, wherein the sparse vector is the feature vector of the searching code.
[0012] Further, the step of matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code includes:
[0013] performing a dot product calculation on the feature vector of the searching code with respect to the feature vectors of the various codes stored in advance in the code feature library, and obtaining a dot product value; and
[0014] screening the codes stored in advance according to the dot product value, and obtaining the recommended code.
[0015] Further, the step of screening the codes stored in advance according to the dot product value, and obtaining the recommended code includes:
[0016] comparing the dot product value with a first dot product value condition, and taking any code stored in advance that satisfies the first dot product value condition as a candidate code; and
[0017] calculating, if there are two or more candidate codes, a similarity between every two candidate codes, screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code.
[0018] Further, the step of screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code includes:
[0019] comparing the similarities of the candidate codes with a first similarity condition, if the condition is satisfied, retaining from every two candidate codes the candidate code with a higher dot product value, sequencing all the retained candidate codes according to the dot product values, and taking any candidate code that satisfies a first sequencing condition as the recommended code.
[0020] Further, the code feature library consists of at least two pieces of fragmented data, the various pieces of fragmented data are configured in different servers, and the various servers respectively obtain the recommended codes according to the code feature libraries stored thereby.
[0021] Further, the method further comprises:
[0022] summarizing the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
[0023] comparing the similarities of the recommended codes with a second similarity condition, if the condition is satisfied, retaining from every two recommended codes the recommended code with a higher dot product value, sequencing the retained recommended codes according to the dot product values, and taking any recommended code that satisfies a second sequencing condition as a comprehensive recommended code.
[0024] Further, the method further comprises:
[0025] comparing sizes of the code feature libraries stored by the various servers; and
[0026] incrementally updating the code feature libraries in the server that stores the least code feature libraries.

[0027] According to the second aspect, there is provided a code recommending device that comprises:
[0028] a communicating module, for obtaining a code recommending request, wherein the code recommending request includes therein a searching code;
[0029] a parsing module, for parsing the searching code to obtain a syntax tree;
[0030] a vector generating module, for generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code; and
[0031] a recommended code obtaining module, for matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code.
[0032] Further, the vector generating module includes:
[0033] a matrix constructing module, for constructing a sparse matrix of the searching code according to the structure of the syntax tree; and
[0034] a feature vector obtaining module, for generating a sparse vector of the searching code according to the sparse matrix, wherein the sparse vector is the feature vector of the searching code.
[0035] Further, the recommended code obtaining module includes:
[0036] a dot product calculating module, for performing a dot product calculation on the feature vector of the searching code with respect to the feature vectors of the various codes stored in advance in the code feature library, and obtaining a dot product value; and
[0037] a screening module, for screening the codes stored in advance in the code feature library according to the dot product value, and obtaining the recommended code.
[0038] Further, the screening module includes:
[0039] a candidate code determining module, for comparing the dot product value with a first dot product value condition, and taking any code stored in advance that satisfies the first dot product value condition as a candidate code;
[0040] a first similarity calculating module, for calculating, if there are two or more candidate codes, a similarity between every two candidate codes; and
[0041] a first screening sub-module, for screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code.
[0042] Further, the first screening sub-module is specifically employed for:
[0043] comparing the similarities of the candidate codes with a first similarity condition, if the first similarity condition is satisfied, retaining from every two candidate codes the candidate code with a higher dot product value, sequencing all the retained candidate codes according to the dot product values, and taking any candidate code that satisfies a first sequencing condition as the recommended code.
[0044] Further, the code feature library is at least two pieces of fragmented data generated in advance according to feature values of codes, the pieces of fragmented data are respectively configured in the servers, and the various servers respectively obtain the recommended codes according to the code feature libraries stored thereby.
[0045] Further, the device further comprises a comprehensive recommended code obtaining module that includes:
[0046] a second similarity calculating module, for summarizing the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
[0047] a second screening sub-module, for comparing the similarities of the recommended codes with a second similarity condition, if the second similarity condition is satisfied, retaining from every two recommended codes the recommended code with a higher dot product value, sequencing the retained recommended codes according to the dot product values, and taking any recommended code that satisfies a second sequencing condition as a comprehensive recommended code.
[0048] Further, the device further comprises:
[0049] an updating module, for comparing sizes of the code feature libraries stored by the various servers; and
[0050] incrementally updating the code feature libraries in the server that stores the least code feature libraries.
[0051] According to the third aspect, there is provided a computer system that comprises:
[0052] one or more processor(s); and
[0053] a memory, associated with the one or more processor(s), for storing program instructions that, when read and executed by the one or more processor(s), perform the method according to any one of the implementations of the aforementioned first aspect.
[0054] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects.
[0055] By parsing the searching code into a syntax tree, the technical solution disclosed by the present invention renders the code recommending tool applicable to any development language with a grammatical structure, and better applicability is achieved.
[0056] By screening codes in the code feature library through the dot product values of the feature vector of the searching code and the feature vectors of codes in the code feature library, and by further screening the screened codes through the similarities between them, the technical solution disclosed by the present invention makes it easier to obtain from the code feature library the codes that best conform to the searching code, whereby user requirements are better satisfied.
[0057] The technical solution disclosed by the present invention places code feature libraries in plural servers, and the servers operate without obstructing one another, through multiple threads. The comprehensive recommended code can be fed back to the user as soon as the similarities among the recommendation results returned by the servers satisfy the preset conditions and the dot product values of the recommendation results satisfy the preset dot product value conditions. Compared with the prior art, in which the code feature libraries are placed in a single server to obtain recommended codes, the response speed of the present invention is faster.
[0058] Because the technical solution disclosed by the present invention places code feature libraries in plural servers, the servers storing less code feature library data can be selected for update when the code feature libraries are to be updated. The code feature libraries are thereby kept in balance across the servers, so as to ensure the operating capabilities of the servers.
BRIEF DESCRIPTION OF THE DRAWINGS
[0059] To more clearly describe the technical solutions in the embodiments of the present invention, drawings required to be used in the description of the embodiments will be briefly introduced below. Apparently, the drawings introduced below are merely directed to some embodiments of the present invention, while it is possible for persons ordinarily skilled in the art to acquire other drawings based on these drawings without spending creative effort in the process.
[0060] Fig. 1 is a flowchart illustrating the code recommending method provided by an embodiment of the present invention;
[0061] Fig. 2 is a syntax tree example provided by an embodiment of the present invention;
[0062] Fig. 3 is a view schematically illustrating the structure of the code recommending device provided by an embodiment of the present invention; and
[0063] Fig. 4 is a view schematically illustrating the structure of the computer system provided by an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0064] In order to make more lucid and clear the objectives, technical solutions and advantages of the present invention, the technical solutions in the embodiments of the present invention will be more clearly and comprehensively described below with reference to the accompanying drawings in the embodiments of the present invention.
Apparently, the embodiments as described are merely partial, rather than the entire, embodiments of the present invention. All other embodiments obtainable by persons ordinarily skilled in the art on the basis of the embodiments in the present invention without spending creative effort shall all fall within the protection scope of the present invention.
[0065] As noted in the Description of Related Art, currently available code recommending tools are invariably problematic in terms of few applicable code types, slow response and inferior extensibility of code feature libraries, while embodiments of the present invention aim to solve the above problems prevailing in the state of the art, and provide a code recommending method, and corresponding device and system, and the specific technical solutions are as follows:
[0066] As shown in Fig. 1, there is provided a code recommending method that comprises the following steps:
[0067] S1 - obtaining a code recommending request, wherein the code recommending request includes therein a searching code.
[0068] The aforementioned searching code is a code having a grammatical structure. The searching code is usually not a complete code, and can be a line of code or a code snippet. The searching code can be obtained by directly accepting a searching code input by the user, or by monitoring a development tool through a plug-in component and capturing the corresponding code snippets.
[0069] S2 - parsing the searching code to obtain a syntax tree.
[0070] Parsing the searching code into a syntax tree can specifically be carried out with an ANTLR4 syntax parser, which performs the parsing and transformation. Since a code parsing method is employed in the embodiments of the present invention, more types of code are supported than by currently available code recommending tools: any development language is applicable to the recommending method disclosed in the embodiments of the present invention as long as the language has a definite grammatical structure.
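As a minimal sketch of step S2, the following example parses a short searching code into a syntax tree and lists the node types it contains. It assumes Python's built-in `ast` module as a stand-in for the syntax parser named above, and it parses Python source rather than the Java used in the later example; the function name `parse_to_tree_labels` is hypothetical and is not taken from the embodiments.

```python
import ast


def parse_to_tree_labels(searching_code: str) -> list[str]:
    """Parse a code snippet and return the type name of every syntax tree node.

    Assumption: Python's `ast` parser replaces the parser of the embodiment,
    so only snippets that are valid Python can be parsed here.
    """
    tree = ast.parse(searching_code)
    # ast.walk visits the root first and then every descendant node.
    return [type(node).__name__ for node in ast.walk(tree)]


labels = parse_to_tree_labels("for i in range(n):\n    total += i\n")
print(labels)
```

The list of node types is only one possible view of the tree; a structural feature extractor would normally also record parent-child relations.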
[0071] S3 - generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code.
[0072] In addition to expressing the structural feature of the searching code, the aforementioned feature vector of the searching code can also express the invoking method and/or using method of the searching code. The structural feature of the searching code can specifically be the structural feature of its syntax tree, such as the sequencing order of characters in the syntax tree.
[0073] In one embodiment, step S3 includes:
[0074] S31 - constructing a sparse matrix of the searching code according to the structure of the syntax tree; and
[0075] S32 - generating a sparse vector of the searching code according to the sparse matrix, wherein the sparse vector is the feature vector of the searching code.

[0076] The aforementioned sparse matrix is constructed according to the structural feature of the syntax tree. A sparse matrix is a matrix in which the number of zero-valued elements is far larger than the number of non-zero elements, and in which the non-zero elements are irregularly distributed. Once the sparse matrix has been constructed, a set of structural features is extracted from it to generate a sparse vector. Preferably, generation of the sparse vector can additionally be based on such features as the invoking method and/or using method of the searching code. An example is given below.
[0077] Take for example the following segment of the searching code:
if (view instanceof ViewGroup) {
    for (int i = 0; i < ((ViewGroup) view).getChildCount(); i++)
        View innerView = ((ViewGroup) view).getChildAt(i);
[0078]
[0079] The syntax tree generated according to the grammatical structure is as shown in Fig. 2.
[0080] Keywords are retained, non-keywords are expressed as unified symbols as far as possible, and leaf nodes cannot be keywords.
[0081] These structural features have been carefully selected, and can capture the using method, invoking method and structure information of each code. A sparse vector is hence created for each code according to the characteristics of the code. For instance, the index matrix formed by all the feature vectors contains many feature rows. The structural features of the searching code are then looked up among these rows: a row that is present is expressed as 1, and a row that is absent is expressed as 0, and the sparse vector of the searching code is thus formed.
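The 0/1 encoding described above can be sketched as follows. The feature vocabulary and the features of the searching code are illustrative assumptions (they are not the features actually extracted in the embodiments), and `build_sparse_vector` is a hypothetical helper name.

```python
def build_sparse_vector(features: set[str], vocabulary: list[str]) -> list[int]:
    """Return a 0/1 vector with one position per feature in the vocabulary.

    A position is 1 when the searching code contains that feature, else 0.
    """
    return [1 if feature in features else 0 for feature in vocabulary]


# Hypothetical feature index shared by the library and the searching code.
vocabulary = ["if", "for", "instanceof", "while", "getChildCount", "return"]
# Hypothetical features extracted from the syntax tree of the searching code.
query_features = {"if", "for", "instanceof", "getChildCount"}

vector = build_sparse_vector(query_features, vocabulary)
# Because most positions are 0 in a real index, only the non-zero
# positions need to be stored.
nonzero_positions = [i for i, value in enumerate(vector) if value]
print(vector, nonzero_positions)
```

Storing only the non-zero positions is what keeps the vector compact when the vocabulary is very large.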
[0082] S4 - matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code.
[0083] The code feature library stores the feature vectors corresponding to massive quantities of codes stored in advance, and the codes stored in advance can specifically be open-source codes. The code feature library is constructed in advance. It can store both the complete codes stored in advance and the feature vectors corresponding to those codes. Alternatively, it can store only the feature vectors and the correspondence relations between the codes stored in advance and their feature vectors, with the complete codes stored in a code corpus. In the latter case it is possible, during operation, to first determine the feature vectors of the codes stored in advance according to the feature vector of the searching code, and to thereafter determine the recommended code according to the correspondence relations between those feature vectors and the codes stored in advance.
[0084] Since the feature vector of the searching code at least expresses the structural feature of the searching code, the feature vector can be used to match, from the code feature library, the code stored in advance whose structural feature is closest to that of the searching code.
[0085] In one embodiment, step S4 includes:
[0086] S41 - performing a dot product calculation on the feature vector of the searching code with respect to the feature vectors of the various codes stored in advance in the code feature library, and obtaining a dot product value; and
[0087] S42 - screening the codes stored in advance in the code feature library according to the dot product value, and obtaining the recommended code.
[0088] The dot product calculation is performed on the feature vector of the searching code with respect to the feature vector of each code stored in advance in the code feature library. The greater the dot product value, the closer the searching code is to the given code stored in advance, so the codes stored in advance in the code feature library can be screened by dot product value to obtain the recommended code.
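Step S41 can be illustrated with a small sketch. The library contents, the code names and the helper `dot` are assumptions made for illustration; with 0/1 feature vectors the dot product simply counts the features shared by the searching code and a stored code.

```python
def dot(u: list[int], v: list[int]) -> int:
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical code feature library: code name -> 0/1 feature vector.
library = {
    "code_a": [1, 1, 0, 0, 1, 0],
    "code_b": [0, 0, 1, 1, 0, 1],
    "code_c": [1, 1, 1, 0, 1, 1],
}
query = [1, 1, 1, 0, 1, 0]

# One dot product value per stored code.
scores = {name: dot(query, vector) for name, vector in library.items()}
# Stored codes ordered from closest to farthest.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```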
[0089] In one embodiment, step S42 includes:
[0090] comparing the dot product value with a first dot product value condition, and taking any code stored in advance that satisfies the first dot product value condition as a candidate code; and
[0091] calculating, if there are two or more candidate codes, a similarity between every two candidate codes, screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code.
[0092] In the method disclosed by this embodiment, the first dot product value condition includes a threshold of the dot product value. The dot product value is first used for a preliminary selection of the codes stored in advance. After two or more codes have been selected in this way, they can be sequenced according to their dot product values, and those that satisfy the dot product value sequencing condition are taken as candidate codes.
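A minimal sketch of this candidate selection follows. The threshold comparison (greater than or equal to), the parameter names `min_dot` and `max_candidates`, and the sample scores are assumptions; the embodiment only states that the condition includes a threshold and a sequencing condition.

```python
def select_candidates(scores: dict[str, int], min_dot: int, max_candidates: int) -> list[str]:
    """Keep codes whose dot product meets the threshold, best first."""
    # Preliminary selection by dot product threshold.
    passing = [name for name, score in scores.items() if score >= min_dot]
    # Sequence the selected codes by dot product value, highest first.
    passing.sort(key=lambda name: scores[name], reverse=True)
    # Sequencing condition: keep only the leading entries.
    return passing[:max_candidates]


candidates = select_candidates(
    {"code_a": 3, "code_b": 1, "code_c": 4, "code_d": 2},
    min_dot=2,
    max_candidates=2,
)
print(candidates)
```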
[0093] After the candidate codes have been obtained, the candidate codes are pruned and rearranged, whereby portions irrelevant to the searching code are removed from the main body of the candidate codes, and only the portion most matching the searching code is retained.
[0094] When two or more candidate codes have been obtained, the structures of the various candidate codes may be repetitive or similar. In order to enhance the comprehensiveness of code recommendation, the method disclosed by the embodiments of the present invention therefore screens among candidate codes with high similarities. The similarity can be calculated as the cosine similarity between two candidate codes. To screen the candidate codes according to their similarities, a similarity threshold can be preset (a similarity greater than 0.7, for example): when the similarity of any candidate code exceeds 0.7, this candidate code is screened out.
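The cosine similarity mentioned above can be sketched as follows. The two vectors are illustrative assumptions, and the 0.7 threshold is the example value given in the text; the treatment of all-zero vectors is a choice made for this sketch.

```python
import math


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


candidate_a = [1, 1, 0, 0, 1, 0]
candidate_c = [1, 1, 1, 0, 1, 1]
similarity = cosine_similarity(candidate_a, candidate_c)
# Compare against the example threshold of 0.7.
too_similar = similarity > 0.7
print(similarity, too_similar)
```

Here the two candidates share three features, so the similarity is 3 / (sqrt(3) * sqrt(5)), roughly 0.775, which exceeds the example threshold.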
[0095] In one embodiment, the step of screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code includes:
[0096] comparing the similarities of the candidate codes with a first similarity condition, if the first similarity condition is satisfied, retaining from every two candidate codes the candidate code with a higher dot product value, sequencing all the retained candidate codes according to the dot product values, and taking any candidate code that satisfies a first sequencing condition as the recommended code.
[0097] The aforementioned first similarity condition includes a threshold of similarities. In the method disclosed by the embodiments of the present invention, when the similarities of the candidate codes satisfy the first similarity condition, the candidate codes are screened according to dot product values. However, if candidate codes with higher similarities are screened only according to dot product values, the number of finally obtained recommended codes may still be unduly large. Therefore, in order to further enhance the precision of the recommended codes, the method disclosed by the embodiments of the present invention further sequences and screens the retained candidate codes according to dot product values. The first sequencing condition can be either a sequencing order of the candidate codes according to dot product values, or a threshold condition of the dot product values.
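The pairwise screening and sequencing described above might look like the following sketch. The helper names, the sample vectors and scores, the strict "greater than" comparison and the use of a top-k cut as the first sequencing condition are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def screen_candidates(vectors: dict[str, list[int]], scores: dict[str, int],
                      sim_threshold: float, top_k: int) -> list[str]:
    """Drop the lower-scoring code of each overly similar pair, then rank."""
    dropped: set[str] = set()
    for a, b in combinations(vectors, 2):
        if a in dropped or b in dropped:
            continue
        if cosine(vectors[a], vectors[b]) > sim_threshold:
            # Retain the candidate with the higher dot product value.
            dropped.add(a if scores[a] < scores[b] else b)
    retained = [name for name in vectors if name not in dropped]
    retained.sort(key=lambda name: scores[name], reverse=True)
    return retained[:top_k]


vectors = {
    "code_a": [1, 1, 0, 0, 1, 0],
    "code_c": [1, 1, 1, 0, 1, 1],
    "code_e": [0, 0, 1, 1, 0, 1],
}
scores = {"code_a": 3, "code_c": 4, "code_e": 1}
recommended = screen_candidates(vectors, scores, sim_threshold=0.7, top_k=2)
print(recommended)
```

In this example `code_a` and `code_c` are more than 0.7 similar, so only the higher-scoring `code_c` survives, while the dissimilar `code_e` is kept.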
[0098] The recommended code obtained in the aforementioned step S4 can be recommended to the user as the code finally matched to the searching code. However, since the code feature libraries are configured in a single server in currently available code recommending tools, and the code feature libraries are very large, the matching speed is relatively low. Accordingly, in order to enhance the speed of generating the recommended codes, an embodiment of the present invention discloses the following technical solution on the basis of the foregoing embodiment:
[0099] the code feature library is at least two pieces of fragmented data generated in advance according to feature values of codes, the pieces of fragmented data are configured in different servers, and the various servers respectively obtain recommended codes according to the code feature libraries stored thereby.
[0100] The fragmented data means different data portions into which the data is split. The code feature libraries are configured in plural servers, the various servers operate unobstructed through multiple threads, and respectively obtain recommended codes according to the code feature libraries stored thereby by employing the method disclosed by any embodiment of the present invention, and the final comprehensive recommended code is determined from the recommended codes obtained by the various servers.
[0101] On the basis of the embodiment in which the code feature libraries are distributed in at least two servers, in one embodiment, the method further comprises:
[0102] summarizing the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
[0103] comparing the similarities of the recommended codes with a second similarity condition, if the second similarity condition is satisfied, retaining from every two recommended codes the recommended code with a higher dot product value, sequencing the retained recommended codes according to the dot product values, and taking any recommended code that satisfies a second sequencing condition as a comprehensive recommended code.
[0104] The aforementioned second similarity condition includes a threshold of similarities. The second sequencing condition can be either a sequencing order of the recommended codes according to dot product values, or a threshold condition of the dot product values. The recommended codes determined by the various servers are first screened through the similarities of the recommended codes. The recommended codes retained after this screening are then screened again according to dot product values, and the recommended code that better conforms to the feature of the searching code is determined to serve as a comprehensive recommended code for recommendation to the user. The similarity of recommended codes is a cosine similarity between two recommended codes.
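The distributed variant can be sketched by simulating each server with a thread. Everything named here (`search_shard`, `merge_results`, the shard contents and the thresholds) is hypothetical, and the merge is one possible reading of the second similarity and sequencing conditions: results are visited from the highest dot product downward, so the higher-scoring member of a similar pair is the one retained.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def search_shard(shard, query, min_dot):
    """Recommended codes of one server: (name, dot product, vector) tuples."""
    return [(name, dot(query, vec), vec)
            for name, vec in shard.items() if dot(query, vec) >= min_dot]


def merge_results(results, sim_threshold, top_k):
    """Summarize per-server results into comprehensive recommended codes."""
    kept = []
    for item in sorted(results, key=lambda r: r[1], reverse=True):
        # Keep a result only if it is not too similar to a better one.
        if all(cosine(item[2], other[2]) <= sim_threshold for other in kept):
            kept.append(item)
    return [name for name, _, _ in kept[:top_k]]


# Two pieces of fragmented data, one per simulated server.
shards = [
    {"code_a": [1, 1, 0, 0, 1, 0], "code_b": [0, 0, 1, 1, 0, 1]},
    {"code_c": [1, 1, 1, 0, 1, 1], "code_d": [0, 0, 1, 0, 0, 0]},
]
query = [1, 1, 1, 0, 1, 0]

with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    per_shard = list(pool.map(lambda shard: search_shard(shard, query, 1), shards))

merged = merge_results([r for part in per_shard for r in part],
                       sim_threshold=0.7, top_k=3)
print(merged)
```

In a real deployment the shards would live on separate machines and be queried over a network; the thread pool only stands in for that parallelism.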
[0105] On the basis of the embodiment in which the code feature libraries are distributed in at least two servers, in one embodiment, the method further comprises:
[0106] obtaining in real time, during the process in which the various servers obtain recommended codes, the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
[0107] comparing the similarities of the recommended codes with a third similarity condition, if the condition is satisfied, comparing the dot product values of the recommended codes with a second dot product value condition, if the condition is satisfied, counting the number of the recommended codes that satisfy the second dot product value condition, and taking recommended codes that satisfy a preset number condition as comprehensive recommended codes.
[0108] The aforementioned third similarity condition includes a threshold of similarities. The third sequencing condition can be either a sequencing order of the recommended codes according to dot product values, or a threshold condition of the dot product values. In order to further enhance the response speed of the recommending method, the recommended codes obtained by the various servers are obtained in the embodiments of the present invention, similarity calculation is performed in real time, the number of retained recommended codes is counted while the recommended codes are being screened according to the similarities, and the retained recommended codes satisfying a preset number condition are taken as final comprehensive recommended codes. Through the real-time calculation and by employing the preset number condition as the condition to determine the comprehensive recommended codes, this embodiment of the present invention makes it possible to more quickly feed back the comprehensive recommended codes as compared with the previous embodiment.
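The early-return behaviour of this embodiment can be sketched as follows. The function `collect_until_enough`, its parameters and the sample batches are assumptions; the batches stand for recommended codes arriving from the servers over time, and the sketch simplifies the screening by skipping any result that is too similar to one already retained.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def collect_until_enough(batches, sim_threshold, min_dot, required_count):
    """Consume result batches as they arrive and stop once enough are retained."""
    kept = []
    for batch in batches:
        for name, score, vec in batch:
            if score < min_dot:
                continue  # fails the dot product value condition
            if any(cosine(vec, other[2]) > sim_threshold for other in kept):
                continue  # too similar to a result already retained
            kept.append((name, score, vec))
            if len(kept) >= required_count:
                # Preset number condition met: answer without waiting further.
                return [k[0] for k in kept]
    return [k[0] for k in kept]


consumed = []


def arriving_batches():
    """Hypothetical stream of per-server results, recorded as it is consumed."""
    data = [
        [("code_c", 4, [1, 1, 1, 0, 1, 1])],
        [("code_a", 3, [1, 1, 0, 0, 1, 0]), ("code_b", 1, [0, 0, 1, 1, 0, 1])],
        [("code_d", 1, [0, 0, 1, 0, 0, 0])],
    ]
    for index, batch in enumerate(data):
        consumed.append(index)
        yield batch


result = collect_until_enough(arriving_batches(), sim_threshold=0.7,
                              min_dot=1, required_count=2)
print(result, consumed)
```

The third batch is never read, which is the source of the faster response: the answer is returned as soon as the preset number of results has been retained.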
Date Recue/Date Received 2022-03-25
[0109] On the basis of the embodiment in which the code feature libraries are distributed in at least two servers, in one embodiment, the method further comprises:
[0110] comparing sizes of the code feature libraries stored by the various servers; and
[0111] incrementally updating the code feature library in the server whose stored code feature library is the smallest.
[0112] The technical solution of updating the code feature libraries is disclosed above. In prior-art technology the code feature library is stored in only one server, so each update further increases the operating pressure on that server. Since the code feature libraries are configured in plural servers in the embodiments of the present invention, the server under the least operating pressure can be selected as the server to be updated, whereby the extensibility of the code feature libraries is enhanced.
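As a minimal sketch of the update step, and assuming that the size of a code feature library is measured by its number of entries (the embodiments do not fix the measure), the target server could be chosen as follows. The names `pick_server_for_update`, `incremental_update`, and the dictionary layout are hypothetical.

```python
def pick_server_for_update(library_sizes):
    """Return the name of the server whose code feature library is smallest."""
    if not library_sizes:
        raise ValueError("no servers registered")
    return min(library_sizes, key=library_sizes.get)

def incremental_update(shards, new_entries):
    """Add new {code_id: feature_vector} entries to the smallest shard, in place.

    `shards` maps a server name to that server's {code_id: feature_vector} dict.
    Returns the name of the server that received the update.
    """
    sizes = {name: len(entries) for name, entries in shards.items()}
    target = pick_server_for_update(sizes)
    shards[target].update(new_entries)
    return target
```

Routing every increment to the smallest library keeps the libraries roughly balanced over time without moving existing entries between servers.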
[0113] As shown in Fig. 3, on the basis of the code recommending method disclosed in the foregoing embodiment, the present invention further provides a code recommending device that comprises the following modules.
[0114] A communicating module 201 is employed for obtaining a code recommending request, wherein the code recommending request includes therein a searching code.
[0115] The aforementioned searching code is code having a grammatical structure, and can be a single line of code or a code snippet.
[0116] A parsing module 202 is employed for parsing the searching code to obtain a syntax tree.
[0117] The aforementioned searching code can be parsed and transformed to a syntax tree through a common syntax parser.
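For illustration only, and assuming the searching code is written in Python, the parsing step could use the parser of the Python standard library (`ast.parse`). The helper names below are hypothetical; any other common syntax parser for the development language concerned would serve equally.

```python
import ast

def parse_searching_code(source):
    """Parse a Python snippet into a syntax tree; raises SyntaxError if invalid."""
    return ast.parse(source)

def node_types(tree):
    """List the node type names in the tree, in the order ast.walk visits them."""
    return [type(node).__name__ for node in ast.walk(tree)]

tree = parse_searching_code("total = price * quantity")
print(node_types(tree))
```

The printed list contains node types such as `Module`, `Assign`, and `BinOp`, which describe the structure of the snippet independently of the identifiers it uses.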
[0118] A vector generating module 203 is employed for generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code.
[0119] In addition to expressing the structural feature of the searching code, the aforementioned feature vector of the searching code can also express the invoking method and/or using method of the searching code. The structural feature of the searching code can specifically be the structural feature of its syntax tree, such as the sequencing order of characters in the syntax tree.
[0120] In one embodiment, the vector generating module 203 includes:
[0121] a matrix constructing module, for constructing a sparse matrix of the searching code according to the structure of the syntax tree; and
[0122] a feature vector obtaining module, for generating a sparse vector of the searching code according to the sparse matrix, wherein the sparse vector is the feature vector of the searching code.
[0123] Preferably, the aforementioned sparse vector can further be a feature based on the invoking method and/or using method of the searching code.
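The embodiments do not specify how the sparse matrix is populated. One possible reading, given purely as an assumption, counts parent-to-child node-type edges of the syntax tree in a dictionary-of-keys sparse matrix and then flattens that matrix into a sparse vector. The feature naming scheme `Parent>Child` is hypothetical.

```python
import ast
from collections import Counter

def sparse_matrix(source):
    """Count parent->child node-type edges of the syntax tree.

    The Counter acts as a dictionary-of-keys sparse matrix: only the
    (row, column) pairs that actually occur are stored.
    """
    tree = ast.parse(source)
    edges = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges[(type(parent).__name__, type(child).__name__)] += 1
    return edges

def sparse_vector(matrix):
    """Flatten the sparse matrix into a {feature_name: count} sparse vector."""
    return {f"{row}>{col}": float(n) for (row, col), n in matrix.items()}
```

For example, `sparse_vector(sparse_matrix("x = a + b"))` contains the entry `"BinOp>Name": 2.0`, since the binary operation has two name operands. Features derived from the invoking method or using method of the code could be added to the same dictionary under further keys.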
[0124] A recommended code obtaining module 204 is employed for matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code.
[0125] The code feature library is constructed in advance, and can contain the complete codes stored in advance and the feature vectors to which the various codes stored in advance correspond, or can store only the feature vectors to which the various codes stored in advance correspond together with the correspondence relations between the codes stored in advance and the feature vectors.
[0126] In one embodiment, the recommended code obtaining module 204 includes:

[0127] a dot product calculating module, for performing a dot product calculation on the feature vector of the searching code with respect to the feature vectors of the various codes stored in advance in the code feature library, and obtaining a dot product value; and
[0128] a screening module, for screening the codes stored in advance in the code feature library according to the dot product value, and obtaining the recommended code.
[0129] The greater the dot product value is, the closer the searching code is to the given code in the code feature library.
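The dot product calculation and the subsequent screening could be sketched as below, assuming that feature vectors are stored as sparse `{feature: weight}` dictionaries and that the screening condition is a simple lower threshold `min_dot`. Both the storage layout and the names are assumptions of this sketch.

```python
def dot(a, b):
    """Dot product of two sparse vectors stored as {feature: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(k, 0.0) for k, w in a.items())

def screen_by_dot(query, library, min_dot):
    """Return (code_id, dot_value) pairs whose dot value reaches min_dot, best first."""
    scored = ((code_id, dot(query, vec)) for code_id, vec in library.items())
    hits = [pair for pair in scored if pair[1] >= min_dot]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Only features present in both vectors contribute to the sum, so a stored code that shares no structural feature with the searching code scores zero and is screened out.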
[0130] In one embodiment, the screening module includes:
[0131] a candidate code determining module, for comparing the dot product value with a first dot product value condition, and taking any code stored in advance that satisfies the first dot product value condition as a candidate code;
[0132] a first similarity calculating module, for calculating, if there are two or more candidate codes, a similarity between every two candidate codes; and
[0133] a first screening sub-module, for screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code.
[0134] The candidate code determining module is further employed for pruning and rearranging the candidate codes after the candidate codes have been obtained, removing portions irrelevant to the searching code from the main body of the candidate codes, and retaining only the portion that best matches the searching code. The first similarity calculating module is specifically employed for calculating cosine similarities of the candidate codes.
[0135] In one embodiment, the first screening sub-module is specifically employed for:
[0136] comparing the similarities of the candidate codes with a first similarity condition; if the first similarity condition is satisfied, retaining from every two candidate codes the candidate code with a higher dot product value; sequencing all retained candidate codes according to the dot product values; and taking any candidate code that satisfies a first sequencing condition as the recommended code.
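The pairwise screening performed by the first screening sub-module could be sketched as follows. It is assumed here that the first similarity condition is "cosine similarity at or above `sim_threshold`" and that the first sequencing condition is "within the `top_n` highest dot product values"; these readings, the default values, and the function names are illustrative assumptions.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def screen_candidates(candidates, sim_threshold=0.9, top_n=3):
    """candidates maps code_id -> (feature_vector, dot_value).

    For every pair whose cosine similarity reaches sim_threshold, the
    candidate with the lower dot value is dropped; the survivors are then
    ranked by dot value and the first top_n identifiers are returned.
    """
    dropped = set()
    for (id_a, (vec_a, dot_a)), (id_b, (vec_b, dot_b)) in combinations(candidates.items(), 2):
        if id_a in dropped or id_b in dropped:
            continue
        if cosine(vec_a, vec_b) >= sim_threshold:
            dropped.add(id_b if dot_a >= dot_b else id_a)
    kept = [(cid, d) for cid, (_, d) in candidates.items() if cid not in dropped]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in kept[:top_n]]
```

Removing one member of each highly similar pair keeps the recommendation list from being filled with near-identical codes, while the final ranking preserves closeness to the searching code.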
[0137] In one embodiment, the code recommending device disclosed by the present invention comprises at least two servers, the code feature library is at least two pieces of fragmented data generated in advance according to feature values of codes, the pieces of fragmented data are respectively configured in the servers, and the various servers respectively obtain the recommended codes according to the code feature libraries stored thereby.
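As a simplified sketch of the multi-server arrangement, the fragments of the code feature library can be queried concurrently and the per-fragment results gathered afterwards. In this sketch, threads within one process stand in for the separate servers, and the names `search_shard` and `search_all` as well as the threshold `min_dot` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(a, b):
    """Dot product of two sparse vectors stored as {feature: weight} dicts."""
    return sum(w * b.get(k, 0.0) for k, w in a.items())

def search_shard(query, shard, min_dot):
    """Screen one library fragment; returns (code_id, dot_value) pairs."""
    hits = []
    for code_id, vec in shard.items():
        value = dot(query, vec)
        if value >= min_dot:
            hits.append((code_id, value))
    return hits

def search_all(query, shards, min_dot=1.0):
    """Query every fragment concurrently and merge the recommendations, best first."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        parts = pool.map(lambda shard: search_shard(query, shard, min_dot), shards)
        merged = [hit for part in parts for hit in part]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```

Each fragment is searched independently, so the time to answer is governed by the slowest fragment rather than by the size of the whole library.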
[0138] Based on the aforementioned code recommending device that comprises at least two servers, in one embodiment, the device further comprises a comprehensive recommended code obtaining module that includes:
[0139] a second similarity calculating module, for summarizing the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
[0140] a second screening sub-module, for comparing the similarities of the recommended codes with a second similarity condition, if the second similarity condition is satisfied, retaining from every two recommended codes the recommended code with a higher dot product value, sequencing the retained recommended codes according to the dot product values, and taking any recommended code that satisfies a second sequencing condition as a comprehensive recommended code.
[0141] Based on the aforementioned code recommending device that comprises at least two servers, in one embodiment, the device further comprises a comprehensive recommended code obtaining module that includes:
[0142] a third similarity calculating module, for obtaining in real time, during the process in which the various servers obtain recommended codes, the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
[0143] a third screening sub-module, for comparing the similarities of the recommended codes with a third similarity condition; if the third similarity condition is satisfied, comparing the dot product values of the recommended codes with a second dot product value condition; if the second dot product value condition is satisfied, counting the number of the recommended codes that satisfy the second dot product value condition; and taking recommended codes that satisfy a preset number condition as comprehensive recommended codes.
[0144] Based on the aforementioned code recommending device that comprises at least two servers, in one embodiment, the device further comprises:
[0145] an updating module, for comparing sizes of the code feature libraries stored by the various servers; and
[0146] incrementally updating the code feature libraries in the server that stores the least code feature libraries.
[0147] As should be noted, in the aforementioned device disclosed by the present invention, the first similarity calculating module, the second similarity calculating module, and the third similarity calculating module can be integrated as one single similarity calculating module, and the first screening sub-module, the second screening sub-module, and the third screening sub-module can be integrated as one single screening sub-module. The above differentiations in this embodiment of the present invention mainly aim to correspond to the method embodiment.
[0148] Based on the aforementioned code recommending method, the present invention further provides a computer system that comprises:
[0149] one or more processor(s); and
[0150] a memory, associated with the one or more processor(s) for storing a program instruction that executes the aforementioned code recommending method when read and executed by the one or more processor(s).
[0151] Fig. 4 exemplarily illustrates the framework of the computer system that can specifically include a processor 310, a video display adapter 311, a magnetic disk driver 312, an input/output interface 313, a network interface 314, and a memory 320. The processor 310, the video display adapter 311, the magnetic disk driver 312, the input/output interface 313, the network interface 314, and the memory 320 can be communicably connected with one another via a communication bus 330.
[0152] The processor 310 can be embodied as a general CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuit(s) for executing relevant program(s) to realize the technical solutions provided by the present application.
[0153] The memory 320 can be embodied in such a form as a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, or a dynamic storage device. The memory 320 can store an operating system 321 for controlling the running of electronic equipment 300, and a basic input/output system 322 (BIOS) for controlling lower-level operations of the electronic equipment 300. In addition, the memory 320 can also store a web browser 323, a data storage administration system 324, and an equipment identification information processing system 325, etc. The equipment identification information processing system 325 can be an application program that specifically realizes the aforementioned various step operations in the embodiments of the present application. In summary, when the technical solutions provided by the present application are to be realized via software or firmware, the relevant program codes are stored in the memory 320, and invoked and executed by the processor 310.
[0154] The input/output interface 313 is employed to connect with an input/output module to realize input and output of information. The input/output module can be equipped in the device as a component part (not shown in the drawings), and can also be externally connected with the device to provide corresponding functions. The input means can include a keyboard, a mouse, a touch screen, a microphone, and various sensors etc., and the output means can include a display screen, a loudspeaker, a vibrator, an indicator light etc.
[0155] The network interface 314 is employed to connect to a communication module (not shown in the drawings) to realize intercommunication between the current device and other devices. The communication module can realize communication in a wired mode (via USB, network cable, for example) or in a wireless mode (via mobile network, WIFI, Bluetooth, etc.).
[0156] The bus 330 includes a passageway transmitting information between various component parts of the device (such as the processor 310, the video display adapter 311, the magnetic disk driver 312, the input/output interface 313, the network interface 314, and the memory 320).
[0157] Additionally, the electronic equipment 300 may further obtain information of specific collection conditions from a virtual resource object collection condition information database for judgment on conditions, and so on.
[0158] As should be noted, although merely the processor 310, the video display adapter 311, the magnetic disk driver 312, the input/output interface 313, the network interface 314, the memory 320, and the bus 330 are illustrated for the aforementioned device, the device may further include other component parts prerequisite for realizing normal running during specific implementation. In addition, as can be understood by persons skilled in the art, the aforementioned device may as well only include component parts necessary for realizing the solutions of the present application, without including the entire component parts as illustrated.
[0159] As can be seen from the description of the aforementioned embodiments, persons skilled in the art can clearly understand that the present application can be realized through software plus a general hardware platform. Based on such understanding, the technical solutions of the present application, or the contributions made thereby over the state of the art, can be essentially embodied in the form of a software product, and such a computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk etc., and includes plural instructions enabling computer equipment (such as a personal computer, a server, or a network device etc.) to execute the methods described in various embodiments or some sections of the embodiments of the present application.
[0160] The various embodiments are progressively described in the Description, identical or similar sections among the various embodiments can be inferred from one another, and each embodiment stresses what is different from other embodiments. Particularly, with respect to the system or system embodiment, since it is essentially similar to the method embodiment, its description is relatively simple, and the relevant sections thereof can be inferred from the corresponding sections of the method embodiment. The system or system embodiment as described above is merely exemplary in nature: units therein described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is to say, they can be located in a single site, or distributed over a plurality of network units. Some or all of the modules can be selected according to practical requirements to realize the objectives of the embodied solutions. This is understandable and implementable by persons ordinarily skilled in the art without creative effort.
[0161] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects.
[0162] 1. By parsing the searching code into a syntax tree, the technical solution disclosed by the present invention renders the code recommending tool applicable to any development language with a grammatical structure, and better applicability is achieved.

[0163] 2. By screening codes in the code feature library through dot product values of the feature vector of the searching code and feature vectors of codes in the code feature library, and further screening the screened codes through similarities therebetween, the technical solution disclosed by the present invention facilitates obtaining the codes that most conform to the searching code from the code feature library, whereby user requirements are better satisfied.
[0164] 3. The technical solution disclosed by the present invention disposes code feature libraries in plural servers, and the various servers operate without blocking one another and through multiple threads. The comprehensive recommended code can be fed back to the user as soon as the similarities among the recommendation results returned by the servers satisfy the preset similarity condition and the dot product values of the recommendation results satisfy the preset dot product value condition. In comparison with prior-art technology, in which the code feature library is disposed in a single server to obtain recommended codes, the response speed of the present invention is quicker.
[0165] 4. The technical solution disclosed by the present invention disposes code feature libraries in plural servers. When the code feature libraries are to be updated, the server storing the smallest code feature library can be selected to be updated, so that the code feature libraries in the servers remain balanced and the operating capabilities of the servers are ensured.
[0166] All of the aforementioned optional technical solutions can be combined arbitrarily to form optional embodiments of the present invention, which are not described here one by one.
[0167] What the above describes is merely directed to preferred embodiments of the present invention, and is not meant to restrict the present invention. Any amendment, equivalent replacement or improvement makeable within the spirit and scope of the present invention shall all be covered by the protection scope of the present invention.

Claims (10)

What is claimed is:
1. A code recommending method, characterized in that the method comprises:
obtaining a code recommending request, wherein the code recommending request includes therein a searching code;
parsing the searching code to obtain a syntax tree;
generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code; and
matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code.
2. The method according to Claim 1, characterized in that the step of generating a feature vector of the searching code according to the syntax tree includes:
constructing a sparse matrix of the searching code according to the structure of the syntax tree; and
generating a sparse vector of the searching code according to the sparse matrix, wherein the sparse vector is the feature vector of the searching code.
3. The method according to Claim 1, characterized in that the step of matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code includes:
performing a dot product calculation on the feature vector of the searching code with respect to the feature vectors of the various codes stored in advance in the code feature library, and obtaining a dot product value; and
screening the codes stored in advance according to the dot product value, and obtaining the recommended code.

4. The method according to Claim 3, characterized in that the step of screening the codes stored in advance according to the dot product value, and obtaining the recommended code includes:
comparing the dot product value with a first dot product value condition, and taking any code stored in advance that satisfies the first dot product value condition as a candidate code; and
calculating, if there are two or more candidate codes, a similarity between every two candidate codes, screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code.
5. The method according to Claim 4, characterized in that the step of screening the candidate codes according to the similarities of the candidate codes, and obtaining the recommended code includes:
comparing the similarities of the candidate codes with a first similarity condition, if the condition is satisfied, retaining from every two candidate codes the candidate code with a higher dot product value, sequencing the entire retained candidate codes according to the dot product values, and taking any candidate code that satisfies a first sequencing condition as the recommended code.
6. The method according to any one of Claims 1 to 5, characterized in that the code feature library consists of at least two pieces of fragmented data, the various pieces of fragmented data are configured in different servers, and the various servers respectively obtain the recommended codes according to the code feature libraries stored thereby.
7. The method according to Claim 6, characterized in further comprising:
summarizing the recommended codes obtained by the various servers, and calculating a similarity between every two recommended codes; and
comparing the similarities of the recommended codes with a second similarity condition, if the condition is satisfied, retaining from every two recommended codes the recommended code with a higher dot product value, sequencing the retained recommended codes according to the dot product values, and taking any recommended code that satisfies a second sequencing condition as a comprehensive recommended code.
8. The method according to Claim 6, characterized in further comprising:
comparing sizes of the code feature libraries stored by the various servers; and
incrementally updating the code feature libraries in the server that stores the least code feature libraries.
9. A code recommending device, characterized in that the device comprises:
a communicating module, for obtaining a code recommending request, wherein the code recommending request includes therein a searching code;
a parsing module, for parsing the searching code to obtain a syntax tree;
a vector generating module, for generating a feature vector of the searching code according to the syntax tree, wherein the feature vector of the searching code at least expresses a structural feature of the searching code; and
a recommended code obtaining module, for matching the feature vector of the searching code with feature vectors of codes stored in advance in a code feature library, and obtaining a recommended code.
10. A computer system, characterized in comprising:
one or more processor(s); and
a memory, associated with the one or more processor(s) for storing a program instruction that executes the method according to any one of Claims 1 to 8 when read and executed by the one or more processor(s).

CA3153550A 2021-03-25 2022-03-25 Core recommendation method, device and system Pending CA3153550A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110319416.XA CN113344023A (en) 2021-03-25 2021-03-25 Code recommendation method, device and system
CN202110319416.X 2021-03-25

Publications (1)

Publication Number Publication Date
CA3153550A1 true CA3153550A1 (en) 2022-09-25

Family

ID=77467827

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3153550A Pending CA3153550A1 (en) 2021-03-25 2022-03-25 Core recommendation method, device and system

Country Status (2)

Country Link
CN (1) CN113344023A (en)
CA (1) CA3153550A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117289919B (en) * 2023-11-24 2024-02-20 浙江口碑网络技术有限公司 Data processing method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190303141A1 (en) * 2018-03-29 2019-10-03 Elasticsearch B.V. Syntax Based Source Code Search
CN108829764B (en) * 2018-05-28 2021-11-09 腾讯科技(深圳)有限公司 Recommendation information acquisition method, device, system, server and storage medium
CN111723192B (en) * 2020-06-19 2024-02-02 南开大学 Code recommendation method and device
CN112328743A (en) * 2020-11-03 2021-02-05 北京嘀嘀无限科技发展有限公司 Code searching method and device, readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113344023A (en) 2021-09-03
