CN103927177A - Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm

Info

Publication number: CN103927177A
Authority: CN (China)
Prior art keywords: interface, digraph, characteristic, feature, project
Legal status: Granted, active
Application number: CN201410156746.1A
Other languages: Chinese (zh)
Other versions: CN103927177B (en)
Inventors: 孙小兵, 施伟, 李斌, 李云
Current Assignee: Yangzhou University
Original Assignee: Yangzhou University

Links

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a method for establishing a characteristic-interface digraph based on an LDA model and the PageRank algorithm, in the field of software engineering. The method comprises the following steps: first, a suitable open source software library is selected as code support; second, the topics of each project are extracted with the LDA topic model and serve as the characteristic set of that project; third, the interface information in every project is retrieved and associated with the project's characteristic set, forming a characteristic-interface digraph whose edges point from the project characteristic sets to the project interface sets, and the number of times each interface is called within its project is calculated with the PageRank algorithm and used as the edge weight; fourth, once the digraph is formed, the developer matches the characteristics of the project to be developed against the characteristics in the digraph, and a list of the likely best interfaces is recommended, following the direction of the edges, for the developer to choose from. The method improves software development efficiency and can be applied to software development.

Description

Method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm
Technical field
The present invention relates to a method for selecting project interfaces, and in particular to a method for establishing a feature-interface digraph. It belongs to the field of software engineering.
Background
When developing a new program, a developer first divides it into functional modules according to the requirements and then considers steps such as defining data units and designing the database. Interface design is usually also among the developer's concerns during development. Against this background, once the developer has divided the program into functional modules, suitable interfaces can be recommended on the basis of the developer's functional description of the program, for the developer to choose from, which improves software development efficiency. At present, interfaces are usually defined manually by the project leader according to the project requirements document, and interfaces are searched for by keyword. This approach finds interfaces incompletely, has low retrieval efficiency, and supports code reuse poorly.
The core technique of the present invention is the generation of a "feature-interface" digraph, with which interfaces can be recommended according to the features of the project to be developed. It requires extracting the features of the projects in an open source software library and the importance of the interfaces of those projects; the techniques used are the LDA topic model and the PageRank algorithm. The LDA topic model is a probabilistic language model applied in the field of text modeling to discover hidden topic information in text; in the present invention it is used to extract the topic features of the projects in the open source software library. PageRank is a search engine algorithm that measures the importance of a particular web page relative to the other pages indexed by the search engine, using link value as a ranking factor; in the present invention it is applied to compute the number of calls of the different interfaces under a given project feature and to rank them.
Project features are extracted with the LDA model on the basis of the properties of LDA topics. Topic modeling makes it possible both to identify the developer's required features and to obtain the features (topics) of the open source projects in the library; the interfaces in the open source software are found by a simple traversal and used to form the digraph. By analogy with computing the importance of web page links, the PageRank algorithm is used to compute the importance of the interfaces in a project, with the topic feature of the project treated as the home page and the interfaces treated as the sub-pages linked from it. PageRank is then used to retrieve the interfaces corresponding to the features the developer needs, and these are returned to the developer to choose from.
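For reference, the web-page analogy above rests on the standard PageRank recurrence, sketched below. The symbols are those of the usual textbook formulation rather than of the patent: $d$ is the damping factor (commonly 0.85), $N$ the number of nodes, $B(v)$ the set of nodes that link to $v$, and $L(u)$ the number of outgoing links of $u$. The text does not spell out how call counts map onto this recurrence, so reading $v$ as an interface and $u$ as a feature node or caller is an assumption.

```latex
PR(v) = \frac{1 - d}{N} + d \sum_{u \in B(v)} \frac{PR(u)}{L(u)}
```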
Summary of the invention
The object of the present invention is to provide a method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm which, given the program features provided by a developer, automatically recommends from a software library the program interfaces suited to those features, thereby improving software development efficiency and enabling code interfaces to be reused.
The object is achieved by a method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm, characterized by comprising the following steps:
Step 1) select a suitable open source software library as code support;
Step 2) for each project in the selected library, extract the topics corresponding to the project with the LDA topic model as the feature set of that project;
Step 3) retrieve the interface information in each project and associate it with the project's feature set, forming a feature-interface digraph whose edges point from the project feature set to the project interface set; use the PageRank algorithm to compute the number of times each interface is called within the project as the weight of the corresponding edge;
Step 4) once the feature-interface digraph is formed, the developer matches the features of the project to be developed against the features in the digraph, and a list of the likely best interfaces is recommended, following the direction of the digraph edges, for the developer to choose from.
As a further refinement of the invention, which simplifies the steps and improves efficiency, step 2) is carried out as follows: when the LDA model extracts topics from the projects in the open source software library, the LDA parameters are set so that only one project is processed at a time, with that project treated as one document corpus; the topics of the project are extracted with the LDA model, and the same method is then applied in turn to the other open source projects in the library. Once the LDA model has extracted the topics of the current project, the two topics with the highest distribution values are taken as the best feature set BestF_i, which serves as the matching target in step 4).
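The selection of BestF_i can be sketched as follows, assuming the LDA run has already produced the topic word lists and one distribution value per topic. The function name `best_feature_set` and the numeric distribution values are hypothetical; the topic words are taken from the music-player example in the embodiment.

```python
from typing import Dict, List, Set


def best_feature_set(topics: Dict[str, List[str]],
                     distribution: Dict[str, float],
                     top_n: int = 2) -> Set[str]:
    """Union the words of the top_n topics with the highest distribution value."""
    ranked = sorted(topics, key=lambda name: distribution[name], reverse=True)
    best: Set[str] = set()
    for name in ranked[:top_n]:
        best.update(topics[name])
    return best


# Topic words follow the music-player example; the distribution values are made up.
topics_p1 = {
    "F11": ["play", "next", "stop"],
    "F12": ["load", "download"],
    "F13": ["Lyricsdownload", "singer"],
}
distribution_p1 = {"F11": 0.46, "F12": 0.31, "F13": 0.08}

best_f1 = best_feature_set(topics_p1, distribution_p1)
print(sorted(best_f1))  # ['download', 'load', 'next', 'play', 'stop']
```

Keeping the topics as named word lists makes it easy to trace which topics contributed to BestF_i; a real pipeline would fill both dictionaries from the output of whichever LDA implementation is used.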
As a further refinement of the invention, which simplifies the steps and improves efficiency, step 3) is carried out as follows: traverse the current project, retrieve the interfaces in the project files, and locate each interface in the form "filename.interfacename"; establish the feature-interface digraph pointing from the best feature set BestF_i to the interface set. After the digraph is established, use the PageRank algorithm to count the calls of each interface in the project, and rank the interfaces by the computed values.
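A minimal sketch of this locating step, assuming Java-style sources in which interfaces are declared with the `interface` keyword. The helper names `locate_interfaces` and `add_edges`, the sample source text, and the choice of a plain dictionary as the digraph are illustrative assumptions; the location string follows the ProjectName.PackageName.ClassName.InterfaceName format given in the embodiment.

```python
import re
from typing import Dict, List

# Matches declarations such as "interface setTime" in Java-style source text.
INTERFACE_DECL = re.compile(r"\binterface\s+([A-Za-z_]\w*)")


def locate_interfaces(project: str, package: str, class_name: str,
                      source: str) -> List[str]:
    """Return one 'Project.Package.Class.Interface' string per declared interface."""
    return [f"{project}.{package}.{class_name}.{name}"
            for name in INTERFACE_DECL.findall(source)]


def add_edges(digraph: Dict[str, List[str]], feature_node: str,
              locations: List[str]) -> Dict[str, List[str]]:
    """Add a directed edge from the feature node to every located interface."""
    targets = digraph.setdefault(feature_node, [])
    for location in locations:
        if location not in targets:
            targets.append(location)
    return digraph


# Hypothetical class file of the Music project used in the embodiment.
source = """
package Play;
public class SetTime {
    interface setTime { void apply(int seconds); }
}
"""

locations = locate_interfaces("Music", "Play", "SetTime", source)
digraph = add_edges({}, "BestF1", locations)
print(digraph)  # {'BestF1': ['Music.Play.SetTime.setTime']}
```

A regular expression is enough for a sketch; a production traversal would more likely use a real parser so that comments and string literals containing the word "interface" are not picked up.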
As a further refinement of the invention, which simplifies the steps and improves efficiency, step 4) is carried out as follows: when the developer performs feature matching, the feature set f of the project to be developed, selected by the developer, is matched against the best feature set BestF_i of claim 2. Each word in f is matched in turn against the corresponding words in BestF_i; when the successfully matched words account for at least 50% of the total number of words in f, f and BestF_i are judged to match, and the interface list is recommended to the developer, following the direction of the edges in the feature-interface digraph, for the developer to choose from. After the developer selects an interface, the concrete interface is determined by the locating form of claim 3 and recommended to the developer.
Compared with the prior art, the invention has the following beneficial effects. Topic features are extracted with the LDA topic model, which removes the manual step of reading code to work out a project's topics. Interfaces are found by searching for the interface keyword and stored in a fixed format; together with the extracted topics they form a digraph, and when needed the best interfaces are recommended by ranking them with the PageRank algorithm along the edges of the digraph. The main advantages are:
1) recommending functional interfaces to developers simplifies the software development process and improves development efficiency;
2) recommending interfaces from the developer's functional description helps the developer refine the functions and improve the software's functionality;
3) interface recommendation helps developers write more reusable code;
4) once the project interfaces are fixed, the developers of different modules can first develop their own modules against the interface definitions, which improves development efficiency. The present invention can be used in software development.
Brief description of the drawings
Fig. 1 is a flowchart of the use of the present invention.
Fig. 2 is a flowchart of topic extraction with the LDA topic model in the present invention.
Fig. 3 is a flowchart of the establishment of the feature-interface digraph in the present invention.
Fig. 4 is a schematic diagram of recommending interfaces by searching the feature-interface digraph with the features of the project to be developed.
In the figures: 1, LDA topic extraction; 2, computation of distribution values and extraction of BestF_i by LDA; 3, establishment of the feature-interface digraph; 4, interface locating; 5, feature matching.
Embodiment
The technical solution of the present invention is described in detail below with reference to the drawings and a specific embodiment.
The method extracts the topics of the projects in an open source software library with the LDA (Latent Dirichlet Allocation) topic model, giving each project a topic description; the topic word sets extracted by the LDA model serve as the feature description of the project. A digraph is established between the extracted topic feature sets and the interfaces found by traversal, the PageRank algorithm is used to compute the weight of each edge, and the interfaces are ranked by weight. The vertices of the digraph are searched with the feature keywords entered by the developer; on a successful match, the nodes (that is, the interfaces) under the matching vertex are recommended for the developer to choose from, as shown in Fig. 1.
I. Parameter definitions
1. Open source software library
Assume an existing open source software library (here, the whole set of open source projects on Open Source China is taken as the library) containing n projects, P_1, P_2, ..., P_n.
2. LDA (Latent Dirichlet Allocation) model parameters
1) The LDA model is used to extract a topic feature set from each software project in the library, and the sets are numbered by project number: the topic feature set extracted from project 1 is F_1, and so on.
2) The number of topics extracted by the LDA model per project is K, set to 30, so that the extracted topics are as numerous and comprehensive as possible.
3) The number of words w in each extracted topic is restricted to a bounded range (the bounds are not reproduced in this text): with too few words a topic is not summarized comprehensively, and with too many it becomes redundant.
4) The two topics with the highest topic distribution values computed by LDA form the best feature set, denoted BestF_i, where i equals the project number.
3. When carrying out project development, the developer gives a functional description of the project to be developed. The words of this description form the feature set f of the project, which contains N feature words. f is matched against the feature words of each BestF_i: every word in f is compared with every feature word in BestF_i, and each successful match increases the match count n by 1. When n/N ≥ 50%, that BestF_i is defined as a best matching set. Once the best matching set is found, the child nodes of the feature-interface digraph are retrieved, the corresponding project files are located from the interface positions stored in the child nodes, and the interface list is returned to the developer. The developer selects a suitable interface, and the selected interface is then recommended to the developer according to its location.
II. Method flow
The process of establishing the feature-interface digraph with the LDA model and the PageRank algorithm and recommending suitable interfaces to developers is divided into two main steps: establishing the feature-interface digraph, and recommending interfaces on demand.
1. Establishing the feature-interface digraph
1) Select the address of the open source software library and retrieve the projects in it. Apply the LDA model to projects P_1, P_2, ..., P_n in turn to extract topics, setting the LDA parameters so that one project is processed at a time, with that project treated as one document corpus. After the features (also called topics) of one project have been extracted with the LDA model, apply the same method in turn to the other open source projects in the library and number the results, so that the complete topic sets of the projects are numbered F_1, F_2, ..., F_n. For example, suppose the library contains n projects and the first is a music application whose topic set extracted by the LDA model is {F_11: {play, next, stop}, F_12: {load, download}, F_13: {Lyricsdownload, singer}, ...}; this topic feature set is defined as F_1. The same operation is carried out for the other open source projects, and the topic feature sets are distinguished by their subscripts;
2) During topic extraction with the LDA model, the topic distribution value of each topic is obtained, and the two topics with the highest distribution values form the best feature set BestF_i. In the example set F_1, F_11 and F_12 have the two highest topic distribution values; F_1 also shows that F_13 is not essential to a music application but is a special function of only a few players. The best feature set is therefore BestF_1 = F_11 ∪ F_12, as shown in Fig. 2;
3) Traverse the current project, retrieve the interfaces in the project files, and locate each interface in the form "filename.interfacename". The traversal finds interfaces by searching for the keywords interface or class. The "filename.interfacename" form is stored in a fixed format in which the filename part comprises the current project name, the package name within the project, and the class name within the package, separated by ".": ProjectName.PackageName.ClassName.InterfaceName. For example, suppose the library contains a music application named Music, the project has a package named Play (holding the classes for music playback information), the package contains a class named SetTime for adjusting the time axis, and the class file defines an interface setTime. The "filename.interfacename" retrieved for this interface is then Music.Play.SetTime.setTime. A directed edge is drawn from BestF_1 to the interface, forming the digraph BestF_1 -> interface, as shown in Fig. 3;
4) After the digraph is established, the PageRank algorithm is used to compute the number of links between the best feature set BestF_1 and each interface in the digraph. Just as PageRank in web search determines the importance of a page from the number of links between pages, the importance of the interfaces under the same feature can be computed: the larger the computed value, the more often the interface is called, the more important it is, and the earlier it should be recommended. The interfaces are ranked by the computed values. For example, suppose the best feature set BestF_1 has been extracted with the LDA model, and under BestF_1 interface 1 is called n_j times and interface 2 is called n_i times, with n_j > n_i. Interface 1 is then recommended first, because it is called more often than interface 2, as shown in Fig. 3.
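The ranking in item 4) can be sketched with a plain power-iteration PageRank. This is the standard algorithm under stated assumptions, not necessarily the exact computation used by the inventors: the damping factor 0.85, the iteration count, the uniform treatment of dangling nodes, and the modelling of callers as nodes that link to the interfaces they call are all assumptions. Every node name other than Music.Play.SetTime.setTime is hypothetical.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a digraph given as node -> list of targets."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in graph.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Feature node plus hypothetical callers; each edge points at an interface.
call_graph = {
    "BestF1": ["Music.Play.SetTime.setTime", "Music.Play.Loader.load"],
    "Music.Play.Player": ["Music.Play.SetTime.setTime"],
    "Music.Play.Playlist": ["Music.Play.SetTime.setTime"],
    "Music.Play.Cache": ["Music.Play.Loader.load"],
}
interfaces = ["Music.Play.Loader.load", "Music.Play.SetTime.setTime"]

scores = pagerank(call_graph)
ranking = sorted(interfaces, key=scores.get, reverse=True)
print(ranking)  # ['Music.Play.SetTime.setTime', 'Music.Play.Loader.load']
```

In this toy graph setTime is reached from more callers than load, so it receives the larger score and is listed first, which mirrors the n_j > n_i example above.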
2. Recommending interfaces according to user requirements
On the user side, when the developer selects features, the selected topic feature set is defined as f and matched against the best feature set BestF_1. Whether the match succeeds is decided by the criterion given above; on success, the ranked interfaces are recommended to the user by following the direction of the digraph edges, and each interface is looked up from the location stored in its node and recommended to the developer. Continuing the example above, BestF_1 = {play, next, stop, load, download}. If the developer selects the topic feature set f = {Music, play}, what the user needs is a playback function: the word play is found to match, so the interfaces under BestF_1 whose function is playing music are recommended to the user.
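The matching rule applied to this example can be sketched as below. Treating a match as case-insensitive equality of words is an assumption (the text only speaks of matching words), and the names `match_ratio` and `recommend`, as well as the dictionary layout, are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def match_ratio(f: Iterable[str], best_f: Iterable[str]) -> float:
    """Fraction of the distinct words in f that also occur in best_f."""
    f_words = {word.lower() for word in f}
    best_words = {word.lower() for word in best_f}
    if not f_words:
        return 0.0
    return len(f_words & best_words) / len(f_words)


def recommend(f: Iterable[str], best_sets: Dict[str, Set[str]],
              ranked_interfaces: Dict[str, List[str]],
              threshold: float = 0.5) -> List[str]:
    """Return the ranked interfaces of every best feature set that f matches."""
    f = list(f)
    result: List[str] = []
    for name, best_f in best_sets.items():
        if match_ratio(f, best_f) >= threshold:
            result.extend(ranked_interfaces.get(name, []))
    return result


best_sets = {"BestF1": {"play", "next", "stop", "load", "download"}}
ranked_interfaces = {"BestF1": ["Music.Play.SetTime.setTime"]}
f = {"Music", "play"}

print(match_ratio(f, best_sets["BestF1"]))         # 0.5
print(recommend(f, best_sets, ranked_interfaces))  # ['Music.Play.SetTime.setTime']
```

Here one of the two words in f matches, so n/N is exactly 50% and the threshold is met; with f = {Music, video} the ratio would be 0 and nothing would be recommended.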
The present invention is not limited to the embodiment described above. On the basis of the disclosed technical solution, those skilled in the art can, without creative effort, make substitutions and variations to some of the technical features; such substitutions and variations all fall within the scope of protection of the present invention.

Claims (4)

1. A method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm, characterized by comprising the following steps:
Step 1) select a suitable open source software library as code support;
Step 2) for each project in the selected library, extract the topics corresponding to the project with the LDA topic model as the feature set of that project;
Step 3) retrieve the interface information in each project and associate it with the project's feature set, forming a feature-interface digraph whose edges point from the project feature set to the project interface set; use the PageRank algorithm to compute the number of times each interface is called within the project as the weight of the corresponding edge;
Step 4) once the feature-interface digraph is formed, the developer matches the features of the project to be developed against the features in the digraph, and a list of the likely best interfaces is recommended, following the direction of the digraph edges, for the developer to choose from.
2. The method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm according to claim 1, characterized in that step 2) is carried out as follows: when the LDA model extracts topics from the projects in the open source software library, the LDA parameters are set so that only one project is processed at a time, with that project treated as one document corpus; the topics of the project are extracted with the LDA model, and the same method is then applied in turn to the other open source projects in the library; once the LDA model has extracted the topics of the current project, the two topics with the highest distribution values are taken as the best feature set BestF_i, which serves as the matching target in step 4).
3. The method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm according to claim 2, characterized in that step 3) is carried out as follows: traverse the current project, retrieve the interfaces in the project files, and locate each interface in the form "filename.interfacename"; establish the feature-interface digraph pointing from the best feature set BestF_i to the interface set; after the digraph is established, use the PageRank algorithm to count the calls of each interface in the project, and rank the interfaces by the computed values.
4. The method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm according to claim 3, characterized in that step 4) is carried out as follows: when the developer performs feature matching, the feature set f of the project to be developed, selected by the developer, is matched against the best feature set BestF_i of claim 2; each word in f is matched in turn against the corresponding words in BestF_i, and when the successfully matched words account for at least 50% of the total number of words in f, f and BestF_i are judged to match, and the interface list is recommended to the developer, following the direction of the edges in the feature-interface digraph, for the developer to choose from; after the developer selects an interface, the concrete interface is determined by the locating form of claim 3 and recommended to the developer.
CN201410156746.1A 2014-04-18 2014-04-18 Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm Active CN103927177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410156746.1A CN103927177B (en) 2014-04-18 2014-04-18 Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410156746.1A CN103927177B (en) 2014-04-18 2014-04-18 Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm

Publications (2)

Publication Number Publication Date
CN103927177A true CN103927177A (en) 2014-07-16
CN103927177B CN103927177B (en) 2017-01-25

Family

ID=51145409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410156746.1A Active CN103927177B (en) 2014-04-18 2014-04-18 Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm

Country Status (1)

Country Link
CN (1) CN103927177B (en)

Also Published As

Publication number Publication date
CN103927177B (en) 2017-01-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant