CN103927177A - Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm - Google Patents
- Publication number
- CN103927177A
- Authority
- CN
- China
- Prior art keywords
- interface
- digraph
- characteristic
- feature
- project
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Stored Programmes (AREA)
Abstract
The invention discloses a method for establishing a characteristic-interface (feature-interface) digraph based on the LDA model and the PageRank algorithm, in the field of software engineering. The method comprises the following steps: first, a suitable open source software library is selected as code support; second, the topics of each project are extracted with the LDA topic model and serve as that project's feature set; third, the interface information in every project is retrieved and associated with the project's feature set, forming a feature-interface digraph whose edges point from project feature sets to project interface sets, and the number of times each interface is called within the project is calculated with the PageRank algorithm and used as the edge weight; fourth, once the digraph is formed, a developer matches the features of the project to be developed against the features in the digraph, and a list of the likely best interfaces is recommended to the developer following the direction of the edges. The method improves software development efficiency and can be applied in software development.
Description
Technical field
The present invention relates to a method for selecting project interfaces, and in particular to a method for establishing a feature-interface digraph, and belongs to the field of software engineering.
Background art
When developing a new program, a developer first analyzes the requirements and divides the program into functional modules, and then sets up the data units and considers the database. Interface design is usually also within the developer's scope of consideration during development. Against this background, once the functional modules have been divided, suitable interfaces can be recommended according to the developer's functional description of the program for the developer to choose from, which improves software development efficiency. At present, interfaces are usually defined manually by the project leader according to the project requirements document, and interfaces are searched for by keyword. Such searches are incomplete and inefficient, and they serve code reuse poorly.
The core of the present invention is the generation of a "feature-interface" digraph, with which interfaces can be recommended according to the features of the project to be developed. This requires extracting the features of the projects in an open source software library and the importance of their interfaces, using the LDA topic model and the PageRank algorithm. The LDA topic model is a probabilistic language model applied to text modeling; it discovers hidden topic information in text, and the present invention uses it to extract the topic features of the projects in the open source software library. PageRank is a search engine algorithm that measures the importance of a given web page relative to the other pages in a search engine's index, using link value as a ranking factor. In the present invention, PageRank is applied to calculate the number of calls to the different interfaces under a given project feature and to rank them.
Project features are extracted with the LDA model because of the properties of LDA topics. Topic model techniques make it possible to identify the features of a developer's requirements and also to obtain the features (topics) of the open source projects in the software library. The interfaces in the open source software are found by a simple traversal and assembled into a digraph. By analogy with the calculation of link importance between web pages, the PageRank algorithm is used to calculate the importance of the interfaces within a project, treating the project's topic features as a home page and the interfaces as subpages linked from it. PageRank is then used to retrieve the interfaces corresponding to the features the developer needs and to return them to the developer to choose from.
Summary of the invention
The object of the present invention is to provide a method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm which, according to the program features supplied by a developer, automatically recommends interfaces from a software library that suit those features, thereby improving software development efficiency and enabling code interfaces to be reused.
The object of the present invention is achieved as follows: a method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm, characterized in that it comprises the following steps:
Step 1) select a suitable open source software library as code support;
Step 2) for the projects in the selected open source software library, extract the topics of each project with the LDA topic model as the feature set of that project;
Step 3) retrieve the interface information in each project and associate it with the feature set of that project, forming a feature-interface digraph in which project feature sets point to project interface sets, and use the PageRank algorithm to calculate the number of times each interface is called within the project as the weight of the corresponding edge of the digraph;
Step 4) after the feature-interface digraph has been formed, the developer matches the features of the project to be developed against the features in the digraph, and a list of the likely best interfaces is recommended, following the direction of the digraph's edges, for the developer to choose from.
As a further limitation of the present invention, to simplify the steps and improve efficiency, step 2) is carried out as follows: when the LDA model extracts topics from the projects in the open source software library, the LDA parameters are set so that only one project is processed at a time, with that project treated as one document library, and the topics of that project are extracted by the LDA model; the same method is then applied in turn to the other open source projects in the library. When the LDA model extracts the topics of the current project, the two topics with the highest distribution values are taken as the best feature set BestF_i, and BestF_i serves as the object matched against in step 4).
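The per-project extraction described above could be sketched as follows. This is a minimal, illustrative collapsed Gibbs sampler written in plain Python, not the implementation used in the patent. The function names `lda_gibbs` and `best_feature_set`, the hyperparameters `alpha` and `beta`, the toy token lists, and the reading of a topic's "distribution value" as its share of all token assignments are assumptions made for the sketch.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, num_words, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to one project's tokenised files by collapsed Gibbs sampling.

    Returns one (distribution_value, top_words) pair per topic. The
    distribution value is taken here (an assumption) to be the share of
    all tokens currently assigned to the topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, then resample its topic from the conditional.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    total = sum(topic_total)
    topics = []
    for k in range(num_topics):
        words = [w for w, c in topic_word[k].most_common(num_words) if c > 0]
        topics.append((topic_total[k] / total, words))
    return topics


def best_feature_set(topics):
    """Union of the words of the two topics with the highest distribution value."""
    top_two = sorted(topics, key=lambda t: t[0], reverse=True)[:2]
    return set().union(*(words for _, words in top_two))


# One project treated as one document library (toy tokens, hypothetical).
project_docs = [
    ["play", "next", "stop", "play", "stop"],
    ["load", "download", "load", "download"],
    ["lyrics", "download", "singer", "lyrics"],
]
topics = lda_gibbs(project_docs, num_topics=3, num_words=3)
best_f = best_feature_set(topics)
```

In practice an established LDA library would replace the hand-written sampler; the point of the sketch is only the shape of the data: one project in, a list of topic columns with distribution values out, and the two highest-valued columns merged into BestF_i.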
As a further limitation of the present invention, to simplify the steps and improve efficiency, step 3) is carried out as follows: the current project is traversed, the interfaces in the project files are retrieved, and each interface is located in the form "file name.interface name"; a feature-interface digraph is established in which the best feature set BestF_i points to the interface set. After the digraph has been established, the PageRank algorithm is used to count the number of calls to each interface in the project, and the interfaces are sorted by the calculated value.
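A small sketch of the traversal and locating step follows, under stated assumptions: the source files are Java-like text, interfaces are found with a simple regular expression on the `interface` keyword, and the helper names `find_interfaces`, `locator` and `build_digraph` are hypothetical. The project, package and class names reuse the music-player example given in the embodiment.

```python
import re

# Matches declarations such as "interface setTime" in Java-like source text.
INTERFACE_DECL = re.compile(r"\binterface\s+([A-Za-z_]\w*)")


def find_interfaces(source_text):
    """Return the names declared with the `interface` keyword, in order."""
    return INTERFACE_DECL.findall(source_text)


def locator(project, package, class_name, interface_name):
    """Build the fixed 'file name.interface name' locator string."""
    return ".".join([project, package, class_name, interface_name])


def build_digraph(best_features, locators):
    """Map the feature vertex (a frozenset of words) to the interfaces it points to."""
    return {frozenset(best_features): list(locators)}


# Toy class file from a project "Music", package "Play", class "SetTime".
source = (
    "public class SetTime {\n"
    "    interface setTime { void apply(int seconds); }\n"
    "}\n"
)
locators = [locator("Music", "Play", "SetTime", name) for name in find_interfaces(source)]
best_f1 = {"play", "next", "stop", "load", "download"}
digraph = build_digraph(best_f1, locators)
```

A real traversal would walk the project's directory tree and derive the package and class names from each file; a parser would also be more robust than a regular expression, which is used here only to keep the sketch short.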
As a further limitation of the present invention, to simplify the steps and improve efficiency, step 4) is carried out as follows: when a developer performs feature matching, the feature set f of the project to be developed, as selected by the developer, is matched against the best feature set BestF_i of claim 2. The matching procedure is as follows: each word in f is matched in turn against the corresponding words in BestF_i; when the successfully matched words account for 50% or more of the total number of words in f, f and BestF_i are judged to match, and an interface list is recommended to the developer, following the direction of the edges in the feature-interface digraph, for the developer to choose from. After the developer has selected an interface, the specific interface is determined by the locating scheme of claim 3 and recommended to the developer.
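The 50% criterion can be written down in a few lines. The sketch below assumes exact, case-sensitive word equality as the notion of "matching" (the text does not define how similarity between words is judged), and the function names `match_ratio` and `is_match` are hypothetical.

```python
def match_ratio(f, best_f):
    """Fraction of the words in f that also occur in best_f."""
    words = list(f)
    if not words:
        return 0.0
    matched = sum(1 for w in words if w in best_f)
    return matched / len(words)


def is_match(f, best_f, threshold=0.5):
    """True when the matched share of f reaches the threshold (50% by default)."""
    return match_ratio(f, best_f) >= threshold


best_f1 = {"play", "next", "stop", "load", "download"}
ratio = match_ratio(["Music", "play"], best_f1)   # one of two words matches
accepted = is_match(["Music", "play"], best_f1)
rejected = is_match(["Music", "video", "play"], best_f1)
```

With the feature set {Music, play} only "play" is found in BestF_1, so the ratio is exactly one half and the match is accepted; adding a third unmatched word drops the ratio to one third and the match fails.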
Compared with the prior art, the present invention has the following beneficial effects. Topic features are extracted by the LDA topic model, which spares developers the manual steps of reading code and working out a project's topics; interfaces are found by searching for the interface keyword and stored in a fixed format; the extracted topics and the interfaces form a digraph; and, when needed, the best interfaces are recommended by ranking the edges of the digraph with the PageRank algorithm. The main advantages are:
1) functional interfaces are recommended to developers, which simplifies the software development process and improves development efficiency;
2) recommending interfaces according to a developer's functional description helps the developer refine the functionality and improve the functions of the software;
3) interface recommendation helps developers write more reusable code;
4) once the project interfaces have been fixed, the developers of different modules can each develop their own module against the interface definitions, which improves development efficiency. The present invention can be used in software development.
Brief description of the drawings
Fig. 1 is a flow chart of the use of the present invention.
Fig. 2 is a flow chart of topic extraction with the LDA topic model in the present invention.
Fig. 3 is a flow chart of the establishment of the feature-interface digraph in the present invention.
Fig. 4 is a schematic diagram of interface recommendation by searching the feature-interface digraph according to the features of the project to be developed.
In the figures: 1, topic extraction by LDA; 2, calculation of distribution values by LDA and extraction of BestF_i; 3, establishment of the feature-interface digraph; 4, interface locating; 5, feature matching.
Embodiment
The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments.
The method of the present invention extracts the topics of the projects in an open source software library with the LDA (Latent Dirichlet Allocation) topic model to give each project a topic description, and uses the set of topic words extracted by the LDA model as the feature description of the project. A digraph is established between the extracted topic feature sets and the interfaces found by traversal, and the PageRank algorithm is used to calculate the weight of every edge, by which the interfaces are then sorted. The vertices of the digraph are searched using the feature keywords entered by the developer; on a successful match, the nodes (that is, the interfaces) under the matched vertex are recommended to the developer to choose from, as shown in Fig. 1.
I. Parameter definitions
1. Open source software library
Suppose an open source software library exists (the open source projects hosted on Open Source China, taken as a whole, serve as the library) containing n projects, denoted P_1, P_2, …, P_n.
2. LDA (Latent Dirichlet Allocation) model parameters
1) The LDA model is used to extract a topic feature set from each piece of software in the library, and the topic feature sets are numbered by project number; for example, the topic feature set extracted from project 1 is F_1, and so on.
2) The number K of topic columns extracted by the LDA model for each project is set to 30, so that the topics extracted are as numerous and as comprehensive as possible.
3) The number of words w extracted in each topic column is restricted to a range [range missing in the source]: with too few words a topic is not summarized comprehensively, and with too many the words become redundant.
4) The two topic columns with the largest topic distribution values calculated by LDA form the best feature set, denoted BestF_i, where i equals the project number.
3. When a developer starts a project, the developer gives a functional description of the project to be developed, and the words of that description form the feature set f of the project, containing N feature words. f is matched against the feature words of each BestF_i: every word in f is compared with every feature word in BestF_i, and each time a match is found the match count n is increased by 1. When n/N ≥ 50%, that BestF_i is defined as the best matching set. Once the best matching set has been found, the child nodes of the feature-interface digraph are retrieved, the interface locations stored in the child nodes are followed to the corresponding project files, and the interface list is returned to the developer. The developer chooses a suitable interface, and the selected interface is recommended to the developer according to its location.
II. Method flow
The process of establishing the feature-interface digraph with the LDA model and the PageRank algorithm and recommending suitable interfaces to developers consists of two main steps: establishing the feature-interface digraph, and recommending interfaces on demand.
1. Establishing the feature-interface digraph
1) Select the address of the open source software library and retrieve the projects in it. Apply the LDA model to the projects P_1, P_2, …, P_n in turn to extract their topics, setting the LDA parameters so that one project is processed at a time, with that project treated as one document library. After the features (also called topics) of one project have been extracted by the LDA model, the same method is applied in turn to the other open source projects in the library, and the extracted features are numbered, giving the complete sets of topic columns F_1, F_2, …, F_n of the respective projects. For example, suppose the library contains n projects and the first is a music player, for which the LDA model extracts the topic set {F_11: {play, next, stop}, F_12: {load, download}, F_13: {Lyricsdownload, singer}, …}; this topic feature set is defined as F_1. The same operation is carried out for the other open source software, with the topic feature sets distinguished by their subscripts;
2) During topic extraction the LDA model yields a topic distribution value for each topic column; the two topic columns with the highest values are taken as the best feature set BestF_i. In the example set F_1, the topic distribution values of F_11 and F_12 are the two highest; it can also be seen from F_1 that F_13 is not essential to music software in general but is a specific function of only some players. The best feature set is therefore BestF_1 = F_11 ∪ F_12, as shown in Fig. 2;
3) Traverse the current project, retrieve the interfaces in the project files and locate each one in the form "file name.interface name"; the traversal finds interfaces by searching for the keyword Interface or class. The "file name.interface name" form is stored as a fixed format in which the file name comprises the current project name, the package name within the project and the class name within the package, with the items separated by ".", as follows: ProjectName.PackageName.ClassName.InterfaceName. For example, suppose the library contains a music program named Music, the project has a package named Play (the classes that handle music playback), the package contains a class named SetTime for adjusting the time axis, and the class file defines an interface setTime. The "file name.interface name" retrieved for this interface is then Music.Play.SetTime.setTime. A directed edge is drawn from BestF_1 to the interface, forming the digraph BestF_1 -> interface, as shown in Fig. 3;
4) After the digraph has been established, the PageRank algorithm is used to calculate the number of links between the best feature set BestF_1 and each interface in the digraph. Just as PageRank in web search determines the importance of a web page from the number of links between pages, the importance of the interfaces under one feature can be calculated in the same way: the larger the calculated value, the more often the interface is called, the more important it is and the earlier it should be recommended. The interfaces are ranked by the calculated value. For example, suppose the best feature set BestF_1 has been extracted by the LDA model, interface 1 is called n_j times under BestF_1 and interface 2 is called n_i times, with n_j > n_i; interface 1, which is called more often than interface 2, is then recommended first, as shown in Fig. 3.
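The ranking in step 4) can be illustrated with a weighted PageRank computed by power iteration. The text does not spell out the exact graph on which PageRank is run, so this sketch assumes a graph with one feature vertex pointing to its interfaces, edge weights equal to call counts, a damping factor of 0.85, and uniform redistribution of the rank of vertices without outgoing edges. The function name `pagerank`, the vertex labels and the call counts 5 and 2 (standing in for n_j and n_i) are hypothetical.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Weighted PageRank by power iteration.

    edges: list of (source, target, weight) with positive weights.
    Vertices without outgoing edges spread their rank evenly over all vertices.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    count = len(nodes)
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iters):
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src, dst, w in edges:
            # Each vertex passes on its rank in proportion to the edge weight.
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# BestF_1 calls interface 1 five times (n_j) and interface 2 twice (n_i).
edges = [
    ("BestF_1", "interface1", 5),
    ("BestF_1", "interface2", 2),
]
ranks = pagerank(edges)
interfaces = ["interface1", "interface2"]
recommended = sorted(interfaces, key=ranks.get, reverse=True)
```

With a single feature vertex the resulting order simply follows the call counts, which is the behaviour the example in step 4) describes; the PageRank formulation only starts to differ from plain counting when interfaces are reachable from several vertices.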
2. Recommending interfaces according to user requirements
On the user side, when a developer selects features, the topic feature set selected by the developer is defined as f and matched against the best feature set BestF_1. The match is judged successful under the condition n/N ≥ 50% given above, and the sorted interfaces are recommended to the user following the direction of the digraph's edges; each interface is looked up by the location stored in its node and recommended to the developer. Continuing the example above, BestF_1 = {play, next, stop, load, download}. If the developer selects the topic feature set f = {Music, play}, what the user needs is a playback function; after matching, the word play is found to match, and the interfaces under BestF_1 whose function is music playback are recommended to the user.
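The recommendation step for this example can be sketched end to end as follows. The digraph is represented here as a list of (best feature set, {locator: weight}) pairs; that representation, the function name `recommend`, the weights, and every locator other than Music.Play.SetTime.setTime are hypothetical and serve only to make the example runnable.

```python
def recommend(f, digraph, threshold=0.5):
    """Return interface locators for every feature vertex that f matches.

    digraph: list of (best_feature_set, {locator: weight}) pairs.
    Within a matched vertex, interfaces are ordered by descending weight.
    """
    results = []
    for best_f, interfaces in digraph:
        matched = sum(1 for w in f if w in best_f)
        if f and matched / len(f) >= threshold:
            results.extend(sorted(interfaces, key=interfaces.get, reverse=True))
    return results


digraph = [
    (
        {"play", "next", "stop", "load", "download"},  # BestF_1
        {
            "Music.Play.SetTime.setTime": 0.2,
            "Music.Play.Player.play": 0.5,             # hypothetical interface
        },
    ),
    (
        {"edit", "save", "undo"},                      # hypothetical second project
        {"Editor.Core.Document.save": 0.9},
    ),
]
playback = recommend(["Music", "play"], digraph)
nothing = recommend(["video", "render"], digraph)
```

For f = {Music, play} only the first vertex reaches the 50% threshold, so its two interfaces come back in weight order and the second project contributes nothing.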
The present invention is not limited to the embodiment described above. On the basis of the technical solution disclosed herein, a person skilled in the art can, without creative effort, make substitutions and modifications to some of the technical features according to the disclosed technical content, and all such substitutions and modifications fall within the scope of protection of the present invention.
Claims (4)
1. A method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm, characterized in that it comprises the following steps:
Step 1) select a suitable open source software library as code support;
Step 2) for the projects in the selected open source software library, extract the topics of each project with the LDA topic model as the feature set of that project;
Step 3) retrieve the interface information in each project and associate it with the feature set of that project, forming a feature-interface digraph in which project feature sets point to project interface sets, and use the PageRank algorithm to calculate the number of times each interface is called within the project as the weight of the corresponding edge of the digraph;
Step 4) after the feature-interface digraph has been formed, the developer matches the features of the project to be developed against the features in the digraph, and a list of the likely best interfaces is recommended, following the direction of the digraph's edges, for the developer to choose from.
2. The method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm according to claim 1, characterized in that step 2) is carried out as follows: when the LDA model extracts topics from the projects in the open source software library, the LDA parameters are set so that only one project is processed at a time, with that project treated as one document library, and the topics of that project are extracted by the LDA model; the same method is then applied in turn to the other open source projects in the library; when the LDA model extracts the topics of the current project, the two topics with the highest distribution values are taken as the best feature set BestF_i, and BestF_i serves as the object matched against in step 4).
3. The method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm according to claim 2, characterized in that step 3) is carried out as follows: the current project is traversed, the interfaces in the project files are retrieved, and each interface is located in the form "file name.interface name"; a feature-interface digraph is established in which the best feature set BestF_i points to the interface set; after the digraph has been established, the PageRank algorithm is used to count the number of calls to each interface in the project, and the interfaces are sorted by the calculated value.
4. The method for establishing a feature-interface digraph based on the LDA model and the PageRank algorithm according to claim 3, characterized in that step 4) is carried out as follows: when a developer performs feature matching, the feature set f of the project to be developed, as selected by the developer, is matched against the best feature set BestF_i of claim 2; the matching procedure is as follows: each word in f is matched in turn against the corresponding words in BestF_i; when the successfully matched words account for 50% or more of the total number of words in f, f and BestF_i are judged to match, and an interface list is recommended to the developer, following the direction of the edges in the feature-interface digraph, for the developer to choose from; after the developer has selected an interface, the specific interface is determined by the locating scheme of claim 3 and recommended to the developer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410156746.1A CN103927177B (en) | 2014-04-18 | 2014-04-18 | Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410156746.1A CN103927177B (en) | 2014-04-18 | 2014-04-18 | Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103927177A true CN103927177A (en) | 2014-07-16 |
CN103927177B CN103927177B (en) | 2017-01-25 |
Family
ID=51145409
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410156746.1A Active CN103927177B (en) | 2014-04-18 | 2014-04-18 | Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103927177B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572111A (en) * | 2015-01-20 | 2015-04-29 | 扬州大学 | Program understanding and characteristic locating method based on correlated topic model |
CN105487913A (en) * | 2015-12-18 | 2016-04-13 | 浙江工商大学 | Software package importance measurement method based on weighted a index |
CN106294662A (en) * | 2016-08-05 | 2017-01-04 | 华东师范大学 | Inquiry based on context-aware theme represents and mixed index method for establishing model |
CN109814855A (en) * | 2017-11-21 | 2019-05-28 | 南京大学 | A kind of API recommended method based on object classification and adaptive subgraph match |
CN110554868A (en) * | 2019-09-11 | 2019-12-10 | 北京航空航天大学 | Software multiplexing code detection method and system |
CN112051986A (en) * | 2020-08-26 | 2020-12-08 | 西安电子科技大学 | Code search recommendation device and method based on open source knowledge |
CN116820555A (en) * | 2023-08-29 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Application program packetizing method and device, electronic equipment and readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645037B (en) * | 2009-09-11 | 2011-06-29 | 兰雨晴 | Integrated test coverage analysis method of foundational software platform application program interface |
CN102629194B (en) * | 2011-12-26 | 2015-07-01 | 天津大学 | Novel application store adaptor facing mobile terminals |
- 2014-04-18: application CN201410156746.1A filed in CN; patent CN103927177B, status Active
Non-Patent Citations (2)
Title |
---|
Adrien Guille et al.: "SONDY: An Open Source Platform for Social Dynamics Mining and Analysis", SIGMOD'13 * |
Yin Li: "A theoretical model of journal evaluation based on the PageRank algorithm", Information Science (情报科学) * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572111A (en) * | 2015-01-20 | 2015-04-29 | 扬州大学 | Program understanding and characteristic locating method based on correlated topic model |
CN104572111B (en) * | 2015-01-20 | 2017-12-01 | 扬州大学 | A kind of program comprehension and characteristic positioning method based on related subject model |
CN105487913A (en) * | 2015-12-18 | 2016-04-13 | 浙江工商大学 | Software package importance measurement method based on weighted a index |
CN105487913B (en) * | 2015-12-18 | 2018-07-31 | 浙江工商大学 | A kind of software package importance measures method based on weighting a indexes |
CN106294662A (en) * | 2016-08-05 | 2017-01-04 | 华东师范大学 | Inquiry based on context-aware theme represents and mixed index method for establishing model |
CN109814855A (en) * | 2017-11-21 | 2019-05-28 | 南京大学 | A kind of API recommended method based on object classification and adaptive subgraph match |
CN110554868A (en) * | 2019-09-11 | 2019-12-10 | 北京航空航天大学 | Software multiplexing code detection method and system |
CN112051986A (en) * | 2020-08-26 | 2020-12-08 | 西安电子科技大学 | Code search recommendation device and method based on open source knowledge |
CN112051986B (en) * | 2020-08-26 | 2021-07-27 | 西安电子科技大学 | Code search recommendation device and method based on open source knowledge |
CN116820555A (en) * | 2023-08-29 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Application program packetizing method and device, electronic equipment and readable storage medium |
CN116820555B (en) * | 2023-08-29 | 2023-11-28 | 腾讯科技(深圳)有限公司 | Application program packetizing method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103927177B (en) | 2017-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Ad hoc table retrieval using semantic similarity | |
CN101876981B (en) | A kind of method and device building knowledge base | |
CN103927177A (en) | Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm | |
US9495345B2 (en) | Methods and systems for modeling complex taxonomies with natural language understanding | |
CN102184169B (en) | Method, device and equipment used for determining similarity information among character string information | |
US8918348B2 (en) | Web-scale entity relationship extraction | |
US8959080B2 (en) | Search method, search apparatus and search engine system | |
Deshpande et al. | Text summarization using clustering technique | |
US20130311487A1 (en) | Semantic search using a single-source semantic model | |
CN105493075A (en) | Retrieval of attribute values based upon identified entities | |
CN104978332B (en) | User-generated content label data generation method, device and correlation technique and device | |
CN101727447A (en) | Generation method and device of regular expression based on URL | |
CN103324666A (en) | Topic tracing method and device based on micro-blog data | |
CN102419778A (en) | Information searching method for discovering and clustering sub-topics of query statement | |
CN108509405A (en) | A kind of generation method of PowerPoint, device and equipment | |
CN104281702A (en) | Power keyword segmentation based data retrieval method and device | |
US11886515B2 (en) | Hierarchical clustering on graphs for taxonomy extraction and applications thereof | |
CN105389328B (en) | A kind of extensive open source software searching order optimization method | |
US20130151519A1 (en) | Ranking Programs in a Marketplace System | |
Zhang et al. | An approach of service discovery based on service goal clustering | |
CN109271624A (en) | A kind of target word determines method, apparatus and storage medium | |
CN103377225A (en) | Method and device for building knowledge base system | |
CN107784019A (en) | Word treatment method and system are searched in a kind of searching service | |
Eyal-Salman et al. | Feature-to-code traceability in legacy software variants | |
WO2006106740A1 (en) | Information processing device and method, and program recording medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |