CN108182200A - Keyword expanding method and device based on semantic similarity - Google Patents
Keyword expanding method and device based on semantic similarity
- Publication number
- CN108182200A, CN201711229082.7A
- Authority
- CN
- China
- Prior art keywords
- keyword
- candidate
- app
- semantic similarity
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a keyword expanding method and device based on semantic similarity. The method includes: receiving a keyword to be expanded, and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords; obtaining the search index of each candidate keyword on the application library platform, and calculating the similarity score of each candidate keyword relative to the keyword to be expanded from the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword; and, according to the ranking of the similarity scores, selecting a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded. The invention can automatically screen out keywords with higher similarity, which enables volume production while ensuring expansion quality.
Description
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a keyword expanding method and device based on semantic similarity.
Background
The rapid development of intelligent terminals has driven the growth of the mobile Internet software industry. More and more users download APPs (applications) from an application library platform (i.e., an app store) on their intelligent terminals. According to Wikipedia data, 65% of users download the applications they need by searching an app store. To improve how their own APPs perform in app store search, APP developers therefore need to carry out app store optimization. One of its key tasks is APP keyword optimization, and the core of keyword optimization is expanding the APP's key keywords.
At present, because app stores on intelligent terminals involve industry-specific background knowledge, keyword expansion is mostly done by manual judgment. With manual expansion, quality is strongly affected by the individual's subjective level of understanding, so the quality of keyword expansion results is unstable.
Summary of the invention
In view of this, the present invention provides a keyword expanding method and device based on semantic similarity, which can overcome the unstable quality of existing keyword expansion approaches.
The solutions provided by embodiments of the present invention include:
A keyword expanding method based on semantic similarity, including:
Receiving a keyword to be expanded, and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set; the candidate keyword set contains multiple candidate keywords;
Obtaining the search index of each candidate keyword on the application library platform, and calculating the similarity score of each candidate keyword relative to the keyword to be expanded from the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword;
Selecting, according to the ranking of the similarity scores, a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
In one embodiment, before receiving the keyword to be expanded and calculating its semantic similarity with each candidate keyword in the predetermined candidate keyword set, the method further includes:
Obtaining the historical search record information of the application library platform, and determining the first mapping relation corresponding to each keyword from the historical search record information; wherein the historical search record information includes the keyword information used for searching and the search result information of each keyword; the first mapping relation includes the candidate APP set corresponding to the keyword, and further includes the occurrence count of each APP in the candidate APP set;
Determining the second mapping relation corresponding to each APP from the first mapping relations of the multiple keywords in the historical search record information; the second mapping relation includes the keyword set corresponding to the APP;
Obtaining the candidate keyword set of the application library platform from the first mapping relations and the second mapping relations.
In one embodiment, determining the first mapping relation between each keyword and the APPs it covers from the historical search record information includes:
Obtaining, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in those search results;
Selecting, in ranking order, a set number of APPs from each search result of the keyword to obtain the candidate APP set corresponding to the keyword;
Counting the number of times each APP in the candidate APP set occurs across the multiple search results to obtain the feature vector corresponding to the keyword; each element of the feature vector corresponds to the occurrence count of one APP in the candidate APP set;
Obtaining the first mapping relation corresponding to the keyword from its candidate APP set and feature vector.
In one embodiment, obtaining the candidate keyword set of the application library platform from the first mapping relations and the second mapping relations includes:
Obtaining a keyword matrix from the first mapping relations and the second mapping relations, where the number of rows of the keyword matrix equals the number of APPs in the candidate APP set corresponding to the keyword in the first mapping relation, and the number of columns equals the number of keywords in the keyword set corresponding to an APP in the second mapping relation;
Selecting from the keyword matrix, according to the occurrence count of each keyword in it, the keywords whose occurrence count is greater than or equal to a set count, to obtain a temporary keyword set;
Obtaining the search index of each keyword in the temporary keyword set, and selecting from the temporary keyword set the keywords whose search index is greater than or equal to a set search index value, to obtain the candidate keyword set.
In one embodiment, the semantic similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set is calculated by the following formula:
sim(k_i, k_j) = \frac{V(k_i) \cdot V(k_j)}{\|V(k_i)\|_2 \, \|V(k_j)\|_2}
Wherein k_i and k_j denote the i-th keyword and the j-th keyword respectively, V(k_i) and V(k_j) denote the feature vectors of the i-th and j-th keywords respectively, V(k_i) \cdot V(k_j) denotes the inner product of the two vectors, \|V(k_i)\|_2 denotes the 2-norm of the vector V(k_i), \|V(k_i)\|_2 \|V(k_j)\|_2 denotes the product of the 2-norms of the feature vectors V(k_i) and V(k_j), and sim(k_i, k_j) denotes the semantic similarity of the i-th keyword and the j-th keyword.
In one embodiment, the similarity score of each candidate keyword relative to the keyword to be expanded is calculated from the semantic similarity, the preset semantic similarity weight, and the search index of each candidate keyword by the following formula:
Score(k_i) = 100 \times \left[ w \cdot sim(K', k_i) + (1 - w) \cdot \frac{p_i - p_{min}}{p_{max} - p_{min}} \right]
Wherein K' denotes the keyword to be expanded, k_i denotes the i-th candidate keyword in the candidate keyword set, and Score(k_i) denotes the similarity score of the i-th candidate keyword relative to the keyword to be expanded; w denotes the set semantic similarity weight, and (1 - w) denotes the search index weight; sim(K', k_i) denotes the semantic similarity between the keyword to be expanded and the i-th candidate keyword; p_i denotes the search index of the i-th candidate keyword, p_{min} denotes the minimum search index value among all candidate keywords in the candidate keyword set, and p_{max} denotes the maximum search index value; Score(k_i) ∈ [0, 100].
In one embodiment, obtaining the historical search record information of the application library platform includes:
Obtaining, through an interface of the application library platform, the platform's historical search record information for the most recent week.
A keyword expanding device based on semantic similarity, including:
A semantic similarity calculation module, configured to receive a keyword to be expanded and calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set; the candidate keyword set contains multiple candidate keywords;
A similarity score calculation module, configured to obtain the search index of each candidate keyword on the application library platform, and calculate the similarity score of each candidate keyword relative to the keyword to be expanded from the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword;
And an expansion keyword selection module, configured to select, according to the ranking of the similarity scores, a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.
A computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method described above when executing the program.
In the above embodiments, when a keyword to be expanded is received, the semantic similarity between it and each candidate keyword in the predetermined candidate keyword set is calculated first, the candidate keyword set containing multiple candidate keywords; the search index of each candidate keyword on the application library platform is then obtained, and the similarity score of each candidate keyword relative to the keyword to be expanded is calculated from the semantic similarity, the preset semantic similarity weight, and the search index of each candidate keyword; finally, according to the ranking of the similarity scores, a set number of candidate keywords are selected from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded. Based on the keyword entered by the user, this technical solution automatically screens out keywords with higher similarity through semantic analysis and provides their similarity scores at the same time, which improves the efficiency of APP operations staff. The keyword expanding method of the above embodiments also makes it easy to export similar keywords in batches, which further improves efficiency. It thus enables volume production while ensuring expansion quality.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the keyword expanding method based on semantic similarity according to an embodiment;
Fig. 2 is a schematic flow chart of the keyword expanding method based on semantic similarity according to another embodiment;
Fig. 3 is a schematic structural diagram of the keyword expanding device based on semantic similarity according to an embodiment.
Detailed description
To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
Although the steps in the embodiments of the present invention are labeled with numbers, the numbering does not limit their order. Unless the order of the steps is explicitly stated, or the execution of a step depends on other steps, the relative order of the steps may be adjusted.
Fig. 1 is a schematic flow chart of the keyword expanding method based on semantic similarity according to an embodiment. As shown in Fig. 1, the keyword expanding method based on semantic similarity in this embodiment includes the following steps:
S11: receive a keyword to be expanded, and calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set; the candidate keyword set contains multiple candidate keywords;
A keyword in the embodiments of the present invention includes any characters that can be used to search for APPs on the application library platform, such as Chinese characters, English words or letters, digits, or other symbols, and may also be a combination of several kinds of characters.
S12: obtain the search index of each candidate keyword on the application library platform, and calculate the similarity score of each candidate keyword relative to the keyword to be expanded from the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword;
S13: according to the ranking of the similarity scores, select a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
In an optional embodiment, before receiving the keyword to be expanded and calculating its semantic similarity with each candidate keyword in the predetermined candidate keyword set, the method further includes a step of determining the candidate keyword set corresponding to the application library platform. The candidate keyword set contains multiple candidate keywords, which are keywords that have previously been used to search for APPs on the application library platform; their form includes Chinese characters, English words or letters, digits, other symbols, and so on.
The candidate keyword set can be obtained from the historical search record information of the application library platform, for example from the search record information generated on the platform in the most recent week. The search record information includes the keyword information used for searching and the search result information corresponding to each keyword, and may also include the search index of each keyword. The search index is calculated from the cumulative number of APP searches made with the keyword on the application library platform within a set statistical period (the search volume), together with factors such as search magnitude. Search index and search volume are positively related; a rough empirical estimate is: (1) for a keyword with search index < 4605, the daily search volume is generally no more than 1; (2) for a keyword with search index >= 4605 and < 8000, daily search volume ≈ search index - 4604; (3) for a keyword with search index greater than 8000, daily search volume ≈ (search index - 4604) * f(x), where f(x) expresses that the relationship between search index and search volume is not simple linear growth.
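The empirical estimate above can be written as a small piecewise function. This is only a sketch: the function name and the `growth_factor` parameter are hypothetical, the text does not define f(x), and it does not say which band a search index of exactly 8000 belongs to, so the sketch groups it with the top band.

```python
from typing import Callable, Optional


def estimate_daily_searches(
    search_index: float,
    growth_factor: Optional[Callable[[float], float]] = None,
) -> Optional[float]:
    """Rough daily search volume for a given search index.

    Returns 1.0 (an upper bound) below 4605, the linear estimate below 8000,
    and None from 8000 upward unless a growth_factor standing in for f(x)
    is supplied by the caller.
    """
    if search_index < 4605:
        return 1.0  # "no more than 1 search per day", treated as an upper bound
    linear = search_index - 4604
    if search_index < 8000:
        return linear
    # Assumption: index == 8000 is handled like the ">8000" band.
    if growth_factor is None:
        return None  # f(x) is not specified in the text
    return linear * growth_factor(search_index)
```

Because f(x) is left open, the caller has to supply it; without it the sketch declines to guess for high-index keywords.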
In an optional embodiment, the step of determining the candidate keyword set corresponding to the application library platform may include:
First, obtain the historical search record information of the application library platform, and determine the first mapping relation corresponding to each keyword from it; wherein the historical search record information includes the keyword information used for searching and the search result information of each keyword; the first mapping relation includes the candidate APP set corresponding to the keyword, and further includes the occurrence count of each APP in the candidate APP set.
Then, from the first mapping relations of the multiple candidate keywords in the obtained historical search record information, determine the second mapping relation between each APP and the keywords that cover it; the second mapping relation includes the keyword set corresponding to the APP. The candidate keyword set of the application library platform can then be obtained from the first mapping relations and the second mapping relations.
In an optional embodiment, determining the first mapping relation between each keyword and the APPs it covers from the historical search record information includes: obtaining, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in those search results; selecting, in ranking order, a set number of APPs from each search result of the keyword to obtain the candidate APP set corresponding to the keyword; counting the number of times each APP in the candidate APP set occurs across the multiple search results to obtain the feature vector corresponding to the keyword, where each element of the feature vector corresponds to the occurrence count of one APP in the candidate APP set; and obtaining the first mapping relation corresponding to the keyword from its candidate APP set and feature vector.
In an optional embodiment, obtaining the candidate keyword set of the application library platform from the first mapping relations and the second mapping relations includes:
Obtaining a keyword matrix from the first mapping relations and the second mapping relations, where the number of rows of the keyword matrix equals the number of APPs in the candidate APP set corresponding to the keyword in the first mapping relation, and the number of columns equals the number of keywords in the keyword set corresponding to an APP in the second mapping relation. According to the occurrence count of each keyword in the keyword matrix, the keywords whose occurrence count is greater than or equal to a set count are selected from the keyword matrix to obtain a temporary keyword set. The search index of each keyword in the temporary keyword set is then obtained, and the keywords whose search index is greater than or equal to a set search index value are selected from the temporary keyword set to obtain the candidate keyword set.
The keyword expanding method based on semantic similarity of the embodiments of the present invention is further described below with reference to the logic diagram shown in Fig. 2.
First, based on the search result information of keywords on the application library platform over the most recent week, the APPs covered by the search results of the i-th keyword are represented as follows:
S(k_i) = (appid_1, appid_2, …, appid_n) (2-1)
In the formula, i, n ∈ Z, Z is the set of positive integers, k_i denotes the i-th keyword, and n denotes the number of APPs (which can be identified by APP ID) returned in ranked order by a search for the i-th keyword; the value of n may differ between keywords.
Next, the forward mapping relation between keywords and APPs on the application library platform is determined:
Because the same keyword may be searched multiple times within the set historical period (for example, within one week), and the search results change with the search time, the search results are aggregated statistically, finally yielding the APP set A(k_i) corresponding to the i-th keyword and its feature vector V(k_i):
A(k_i) = (appid_1, appid_2, …, appid_n) (2-2)
V(k_i) = (count_1, count_2, …, count_n) (2-3)
Wherein i, n ∈ Z, k_i denotes the i-th keyword, and count_n denotes the number of times the corresponding APP appeared in searches for that keyword within the set historical period.
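The aggregation that produces A(k_i) and V(k_i) can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes each search result is a ranked list of unique APP IDs; ordering the output by descending count is a convenience for the later top-m selection, not something the text requires.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_forward_mapping(
    search_results: Iterable[List[str]],
    top_per_search: int,
) -> Tuple[List[str], List[int]]:
    """Aggregate repeated searches for one keyword into A(k) and V(k).

    search_results: one ranked list of APP IDs per search in the period.
    top_per_search: how many top-ranked APPs to keep from each search.
    """
    counts: Counter = Counter()
    for ranked_app_ids in search_results:
        counts.update(ranked_app_ids[:top_per_search])
    # Order by descending count, breaking ties by APP ID for determinism.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    app_set = [app_id for app_id, _ in ordered]
    feature_vector = [count for _, count in ordered]
    return app_set, feature_vector
```

The two returned lists are aligned position by position, mirroring formulas (2-2) and (2-3).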
Next, the reverse mapping relation between keywords and APPs, i.e., the second mapping relation corresponding to each APP, is determined:
An inverted list is built from the mapping relation S(k_i) above, giving the keyword set K(a_i) corresponding to the i-th APP:
K(a_i) = (keyword_1, ..., keyword_n) (2-4)
Wherein i, n ∈ Z and a_i denotes the i-th APP; different APPs correspond to different n, i.e., the dimension of K(a_i) differs between APPs.
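Building the inverted list amounts to flipping a keyword-to-APP dictionary. A minimal sketch, assuming the forward mapping is held as a plain dictionary from keyword to a list of APP IDs (the names are illustrative only):

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_list(forward: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn keyword -> APP IDs into APP ID -> keywords, i.e. K(a) per APP."""
    inverted: Dict[str, List[str]] = defaultdict(list)
    for keyword, app_ids in forward.items():
        # dict.fromkeys drops duplicate APP IDs while keeping their order.
        for app_id in dict.fromkeys(app_ids):
            inverted[app_id].append(keyword)
    return dict(inverted)
```

Each value list can have a different length, which matches the remark that the dimension of K(a_i) differs between APPs.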
Next, the candidate keyword set is determined:
First, the APP set corresponding to the keyword is obtained from formulas (2-2) and (2-3), and the top m APPs by occurrence count are selected from it to obtain the candidate APP set S_app corresponding to the keyword:
S_app = (appid_(1), …, appid_(m)) (3-1)
Wherein m ∈ Z.
The first mapping relation corresponding to the keyword is obtained from the candidate APP set S_app and its corresponding feature vector.
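Next, the APP set in formula (3-1) is mapped through formula (2-4) to obtain a keyword matrix, denoted M_kw, in which each of the m rows holds the keyword set of one APP in S_app:
M_kw = \begin{pmatrix} keyword_{11} & \cdots & keyword_{1n} \\ \vdots & \ddots & \vdots \\ keyword_{m1} & \cdots & keyword_{mn} \end{pmatrix}
Wherein m, n ∈ Z.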
Next, the keyword matrix is screened:
(1) Merge the keyword matrix M_kw and count the occurrences of each keyword in it, then select the top n keywords by occurrence count to obtain a temporary keyword set;
(2) Remove the keywords in the temporary keyword set whose search index is below β to obtain the candidate keyword set, denoted
S_kw = (keyword_1, keyword_2, …, keyword_n) (4-1)
Wherein n ∈ Z.
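The two screening steps can be sketched as below. The names are hypothetical, and two details are assumptions rather than statements from the text: the keyword matrix is kept as a ragged list of rows (one row per candidate APP), and a keyword with no known search index is treated as having index 0 and is therefore dropped.

```python
from collections import Counter
from typing import Dict, List


def candidate_keyword_set(
    candidate_apps: List[str],
    inverted: Dict[str, List[str]],
    search_index: Dict[str, float],
    top_n: int,
    beta: float,
) -> List[str]:
    """Screen the keyword matrix down to the candidate keyword set S_kw."""
    # Each row is K(a) for one candidate APP; rows may differ in length.
    keyword_matrix = [inverted.get(app_id, []) for app_id in candidate_apps]
    # Step (1): merge the matrix, count occurrences, keep the top_n keywords.
    counts = Counter(kw for row in keyword_matrix for kw in row)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    temporary = [kw for kw, _ in ordered[:top_n]]
    # Step (2): drop keywords whose search index is below beta.
    return [kw for kw in temporary if search_index.get(kw, 0.0) >= beta]
```

Ties in occurrence count are broken alphabetically here only to make the result deterministic.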
It should be noted that the candidate keyword set can be determined by offline calculation and updated periodically, for example once a week, to ensure the quality of the expansion keywords obtained from it.
In an optional embodiment, the semantic similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set can be calculated by the following formula:
sim(k_i, k_j) = \frac{V(k_i) \cdot V(k_j)}{\|V(k_i)\|_2 \, \|V(k_j)\|_2}
Wherein k_i and k_j denote the i-th keyword and the j-th keyword respectively, V(k_i) and V(k_j) denote the feature vectors of the i-th and j-th keywords respectively, V(k_i) \cdot V(k_j) denotes the inner product of the two vectors, \|V(k_i)\|_2 denotes the 2-norm of the vector V(k_i), i.e., the square root of the sum of the squared absolute values of its elements, \|V(k_i)\|_2 \|V(k_j)\|_2 denotes the product of the 2-norms of the feature vectors V(k_i) and V(k_j), and sim(k_i, k_j) denotes the semantic similarity of the i-th keyword and the j-th keyword.
It should be understood that the method for calculating the semantic similarity between two keywords includes, but is not limited to, the cosine-similarity-based algorithm above; other algorithms for calculating semantic similarity can also be used.
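A minimal sketch of the cosine similarity calculation follows. Since the feature vectors of two keywords are defined over their own APP sets, the sketch assumes they are stored as dictionaries keyed by APP ID and that an APP missing from one vector counts as 0; this alignment is an assumption, as the text does not spell it out.

```python
import math
from typing import Dict


def cosine_similarity(v1: Dict[str, float], v2: Dict[str, float]) -> float:
    """Cosine similarity of two sparse feature vectors keyed by APP ID."""
    # Inner product over the APPs the two keywords share.
    dot = sum(value * v2.get(app_id, 0.0) for app_id, value in v1.items())
    norm1 = math.sqrt(sum(value * value for value in v1.values()))
    norm2 = math.sqrt(sum(value * value for value in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty vector shares nothing with any keyword
    return dot / (norm1 * norm2)
```

Two keywords whose searches surface the same APPs with the same counts get a similarity of 1, and keywords with no APP in common get 0.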
In an optional embodiment, the similarity score of each candidate keyword relative to the keyword to be expanded can be calculated by the following formula:
Score(k_i) = 100 \times \left[ w \cdot sim(K', k_i) + (1 - w) \cdot \frac{p_i - p_{min}}{p_{max} - p_{min}} \right] (6-1)
Wherein K' denotes the keyword to be expanded, k_i denotes the i-th candidate keyword in the candidate keyword set, Score(k_i) denotes the similarity score of the i-th candidate keyword relative to the keyword to be expanded, w denotes the set semantic similarity weight, and (1 - w) denotes the search index weight; sim(K', k_i) denotes the semantic similarity between the keyword to be expanded and the i-th candidate keyword; p_i denotes the search index of the i-th candidate keyword, p_{min} denotes the minimum search index value among all candidate keywords in the candidate keyword set, and p_{max} denotes the maximum search index value among all candidate keywords in the candidate keyword set; wherein Score(k_i) ∈ [0, 100].
It should be understood that the calculation of the similarity score includes, but is not limited to, the above formula based on a 100-point scale; other formulas can also be used, provided that they take into account, to some extent, the semantic similarity, the keyword search index, and the relative influence of the two.
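The scoring and selection steps can be sketched as follows. The sketch assumes the score is a weighted sum of the semantic similarity and the min-max-normalised search index, scaled to 100, which is what the symbol definitions (w, 1 - w, the minimum and maximum search index, and the [0, 100] range) suggest. The function names and the handling of the all-equal case are hypothetical.

```python
from typing import Dict, List, Tuple


def similarity_scores(
    similarities: Dict[str, float],
    search_index: Dict[str, float],
    w: float = 0.9,
) -> Dict[str, float]:
    """Score each candidate keyword on a 0-100 scale (assumed formula)."""
    if not similarities:
        return {}
    indexes = [search_index[kw] for kw in similarities]
    p_min, p_max = min(indexes), max(indexes)
    span = p_max - p_min
    scores: Dict[str, float] = {}
    for kw, sim in similarities.items():
        # Min-max normalise the search index; use 0 when all indexes are equal.
        normalised = (search_index[kw] - p_min) / span if span else 0.0
        scores[kw] = 100.0 * (w * sim + (1.0 - w) * normalised)
    return scores


def top_expansions(scores: Dict[str, float], limit: int) -> List[Tuple[str, float]]:
    """Return the `limit` highest-scoring candidates, best first."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]
```

With a weight of 0.9, the search index mainly acts as a tie-breaker between candidates of similar semantic similarity.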
Through the above process, a list of semantically similar keywords can be expanded automatically and in real time for each keyword to be expanded entered by the user, with high quality and efficiency.
With reference to the above embodiments, the application of the keyword expanding method based on semantic similarity of the present invention is illustrated below, taking App Store Optimization (ASO) for the Apple app store as an example:
1) Use the Apple developer API to obtain the historical APP search information of the Apple app store, such as the keyword search results of the most recent week, APP information (which may include dimensions such as APP ID and APP name), and keyword information (including dimensions such as keyword ID, keyword, search index, and search results).
2) Preprocess the historical search information from step 1) and organize it into the following mapping relations, represented as hash tables, as shown in Table 1.
Table 1:
3) Using the mapping relations organized in 2), first look up the keyword ID of the keyword to be expanded, then look up its corresponding APP IDs by that keyword ID, select the top 200 APP IDs by ranking (configurable based on actual conditions) as the candidate APP set corresponding to the keyword, and determine the feature vector corresponding to the candidate APP set.
4) Traverse the feature vector obtained in step 3) and query the mapping relations organized in 2) to obtain the keyword IDs covered by each APP ID, which gives the keyword matrix corresponding to the keyword to be expanded.
5) Merge the keyword matrix from step 4) and count the occurrences of each keyword, then select the top 1000 keywords by occurrence count (configurable based on actual conditions) to obtain a temporary keyword set; then screen the temporary keyword set further, removing the keywords whose search index is below 4605 (configurable based on actual conditions), to obtain the candidate keyword set corresponding to the keyword to be expanded.
6) Query the mapping relations obtained in 2) to get the feature vector corresponding to each candidate keyword in the candidate keyword set from 5).
7) Calculate the cosine similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set, and use it as the pure semantic similarity.
8) Set the semantic similarity weight to 0.9 (configurable based on actual conditions), so that the search index weight is 1 - 0.9 = 0.1; calculate the similarity score of each candidate keyword relative to the keyword to be expanded according to formula (6-1) above, and take the top 200 in descending order (configurable based on actual conditions) to obtain the expansion keyword list of the keyword to be expanded.
In this concrete application, the keyword expanding method based on semantic similarity of the above embodiments was applied to ASO keyword expansion, and its effect was tested on 5 keywords. First, 10 keywords were expanded manually for each keyword; then the top 50 similar keywords were determined automatically for each keyword using the keyword expanding method based on semantic similarity of the above embodiments. The comparison found that 80% of the manually selected keywords were covered by the top 50 automatically selected keywords, which demonstrates the effectiveness of the method. Moreover, compared with manual expansion, the method can produce the top 200 expansion keywords within 3 seconds, a substantial speed improvement.
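The coverage figure reported in this comparison can be computed with a simple set intersection. The sketch below is illustrative only; the function name is hypothetical and the keywords in any usage are made-up placeholders, not data from the test described above.

```python
from typing import Iterable, Sequence


def coverage(
    manual: Iterable[str],
    automatic_ranked: Sequence[str],
    top_k: int = 50,
) -> float:
    """Fraction of manually chosen keywords found in the automatic top_k."""
    manual_set = set(manual)
    if not manual_set:
        return 0.0
    top = set(automatic_ranked[:top_k])
    return len(manual_set & top) / len(manual_set)
```

A value of 0.8 corresponds to the 80% coverage described above.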
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in other orders or simultaneously. In addition, the above embodiments may be combined arbitrarily to obtain other embodiments.
Based on the same idea as the keyword expanding method based on semantic similarity in the above embodiments, the present invention also provides a keyword expanding device based on semantic similarity, which can be used to perform the above method. For ease of description, the structural diagram of the device embodiment shows only the parts relevant to the embodiments of the present invention. Those skilled in the art will understand that the illustrated structure does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
Fig. 3 is a schematic diagram of a keyword expansion device based on semantic similarity according to one embodiment of the present invention. As shown in Fig. 3, the keyword expansion device based on semantic similarity of this embodiment includes:
a semantic similarity calculation module 310, configured to receive a keyword to be expanded and calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
a similarity score calculation module 320, configured to obtain the search index of each candidate keyword on the application library platform and to calculate, from the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded;
and an expansion word selection module 330, configured to select a set number of candidate keywords from the candidate keyword set according to the ranking of the similarity scores, to obtain the expansion keywords of the keyword to be expanded.
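The three modules can be pictured as one small pipeline. The sketch below is only an illustration under stated assumptions: the function names, the sparse dictionary layout of the feature vectors, the cosine measure used for similarity, and in particular the min-max normalization of the search index and the scaling to a 0 to 100 range are choices made for the example, not details stated in this passage.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {app_id: count} dicts."""
    dot = sum(count * v.get(app, 0) for app, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_keyword(target_vec, candidates, search_index, w=0.7, top_n=3):
    """Rank candidates by a weighted mix of similarity and search index.

    candidates:   {keyword: feature vector}      (hypothetical layout)
    search_index: {keyword: search index value}  (hypothetical layout)
    w:            semantic similarity weight; (1 - w) weights the search index
    """
    p_min = min(search_index[k] for k in candidates)
    p_max = max(search_index[k] for k in candidates)
    span = (p_max - p_min) or 1  # avoid division by zero when all indices match

    scores = {}
    for kw, vec in candidates.items():
        sim = cosine_similarity(target_vec, vec)             # role of module 310
        popularity = (search_index[kw] - p_min) / span       # assumed normalization
        scores[kw] = 100 * (w * sim + (1 - w) * popularity)  # role of module 320

    ranked = sorted(scores, key=scores.get, reverse=True)    # role of module 330
    return [(kw, scores[kw]) for kw in ranked[:top_n]]
```

Keeping the weight `w` as a parameter makes the trade-off explicit: a value near 1 favors keywords whose search results overlap with the target, while a value near 0 favors keywords that are simply searched often.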
In an optional embodiment, the keyword expansion device based on semantic similarity further includes:
a candidate keyword set determination module, configured to: obtain historical search record information of the application library platform, and determine a first mapping relationship for each keyword according to the historical search record information, where the historical search record information includes the keywords used for searching and the search results of each keyword, and the first mapping relationship includes the candidate APP set corresponding to a keyword together with the frequency of occurrence of each APP in that candidate APP set; determine a second mapping relationship for each APP according to the first mapping relationships of the multiple keywords in the historical search record information, where the second mapping relationship includes the keyword set corresponding to an APP; and obtain the candidate keyword set of the application library platform according to the first mapping relationships and the second mapping relationships.
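As a rough illustration of how the second mapping relationship could be derived from the first, the sketch below inverts a keyword-to-APP mapping into an APP-to-keyword mapping. The dictionary layout and the name `build_second_mapping` are assumptions made for the example.

```python
from collections import defaultdict


def build_second_mapping(first_mapping):
    """Invert {keyword: {app: frequency}} into {app: set of keywords}.

    first_mapping is a hypothetical in-memory form of the first mapping
    relationship: each keyword maps to its candidate APPs and their counts.
    """
    second_mapping = defaultdict(set)
    for keyword, app_counts in first_mapping.items():
        for app in app_counts:
            second_mapping[app].add(keyword)
    return dict(second_mapping)
```

For instance, if both "photo" and "camera" list `app1` among their candidate APPs, the resulting entry for `app1` contains both keywords.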
In an optional embodiment, the candidate keyword set determination module includes:
a candidate APP determination submodule, configured to obtain, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in those search results, and to select a set number of APPs from each search result of the keyword in ranking order, to obtain the candidate APP set corresponding to the keyword;
a feature vector determination submodule, configured to count the frequency of occurrence of each APP of the candidate APP set in the multiple search results, to obtain the feature vector corresponding to the keyword, each element of the feature vector corresponding to the frequency of occurrence of one APP in the candidate APP set;
and a mapping relationship determination submodule, configured to obtain the first mapping relationship corresponding to the keyword according to the candidate APP set and the feature vector corresponding to the keyword.
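The submodules above can be sketched as a single function that turns the repeated search results of one keyword into a candidate APP set and a feature vector. This is a minimal sketch: the name `build_first_mapping`, the `top_k` parameter, and the choice to count occurrences only within the top-ranked slice of each result are assumptions, since the text does not say whether APPs below the cut-off also contribute to the counts.

```python
from collections import Counter


def build_first_mapping(search_results, top_k=3):
    """Build the candidate APP set and feature vector for one keyword.

    search_results: list of ranked APP lists, one per search of the same
                    keyword within the chosen historical period.
    Returns (candidate_apps, feature_vector), where feature_vector[i] is the
    number of times candidate_apps[i] appeared in the top_k of a result.
    """
    counts = Counter()
    for ranked_apps in search_results:
        counts.update(ranked_apps[:top_k])  # keep the top_k APPs of each result
    candidate_apps = sorted(counts)         # fixed order for the vector elements
    feature_vector = [counts[app] for app in candidate_apps]
    return candidate_apps, feature_vector
```

Two keywords whose searches keep returning the same APPs end up with feature vectors that point in similar directions, which is what a vector-based similarity measure can then exploit.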
In an optional embodiment, the candidate keyword set determination module further includes:
a set determination submodule, configured to obtain the candidate keyword set of the application library platform according to the first mapping relationships and the second mapping relationships, and specifically configured to: obtain a keyword matrix according to the first mapping relationship and the second mapping relationship, where the number of rows of the keyword matrix equals the number of APPs in the candidate APP set corresponding to the keyword in the first mapping relationship, and the number of columns equals the number of keywords in the keyword set corresponding to an APP in the second mapping relationship; according to the frequency of occurrence of each keyword in the keyword matrix, select from the keyword matrix the keywords whose frequency of occurrence is greater than or equal to a set frequency, to obtain an interim keyword set; and obtain the search index of each keyword in the interim keyword set and select from it the keywords whose search index is greater than or equal to a set search index value, to obtain the candidate keyword set.
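One possible reading of the two-stage filter is sketched below. The keyword matrix is modeled as one row per APP in a seed keyword's candidate APP set, each row holding the keywords that the second mapping associates with that APP. The ragged rows, the use of a single seed keyword, the function name, and the threshold parameters `min_freq` and `min_index` are all assumptions made for the example.

```python
from collections import Counter


def candidate_keywords(seed, first_mapping, second_mapping, search_index,
                       min_freq=2, min_index=50):
    """Filter keywords by matrix frequency, then by search index.

    first_mapping:  {keyword: {app: frequency}}  (hypothetical layout)
    second_mapping: {app: set of keywords}       (hypothetical layout)
    search_index:   {keyword: search index value}
    """
    # One row per APP of the seed keyword; each row lists that APP's keywords.
    keyword_matrix = [sorted(second_mapping.get(app, ()))
                      for app in first_mapping[seed]]

    # Stage 1: keep keywords that occur often enough across the matrix.
    freq = Counter(kw for row in keyword_matrix for kw in row)
    interim = {kw for kw, n in freq.items() if n >= min_freq}

    # Stage 2: keep keywords whose search index reaches the set value.
    return {kw for kw in interim if search_index.get(kw, 0) >= min_index}
```

The first stage removes keywords that share only an incidental APP with the seed; the second removes keywords that few users actually search for.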
It should be noted that, in the device embodiments illustrated above, the information exchange between modules, their execution processes, and similar details are based on the same concept as the foregoing method embodiments of the present invention and bring the same technical effects. For specifics, refer to the description of the method embodiments of the present invention, which is not repeated here.
In addition, in the device embodiments illustrated above, the logical division into program modules is only an example. In practical applications, the above functions may be assigned to different program modules as needed, for example to meet the configuration requirements of the corresponding hardware or for convenience of software implementation; that is, the internal structure of the keyword expansion device based on semantic similarity may be divided into different program modules to perform all or part of the functions described above.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and sold or used as an independent product. When executed, the program can perform all or part of the steps of the methods of the above embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In addition, the storage medium may be provided in a computer device that also includes a processor; when the processor executes the program in the storage medium, all or part of the steps of the methods of the above embodiments can be implemented.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments. It will be understood that the terms "first", "second", and the like are used herein to distinguish objects, but the objects should not be limited by these terms.
The embodiments described above express only several implementations of the present invention and should not be construed as limiting the patent scope of the present invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (10)
1. A keyword expansion method based on semantic similarity, characterized by comprising:
receiving a keyword to be expanded, and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
obtaining the search index of each candidate keyword on an application library platform, and calculating, according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded;
selecting, according to the ranking of the similarity scores, a set number of candidate keywords from the candidate keyword set, to obtain the expansion keywords of the keyword to be expanded.
2. The keyword expansion method based on semantic similarity according to claim 1, characterized in that, before receiving the keyword to be expanded and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in the predetermined candidate keyword set, the method further comprises:
obtaining historical search record information of the application library platform, and determining a first mapping relationship for each keyword according to the historical search record information, wherein the historical search record information includes the keywords used for searching and the search results of each keyword, and the first mapping relationship includes the candidate APP set corresponding to a keyword together with the frequency of occurrence of each APP in the candidate APP set;
determining a second mapping relationship for each APP according to the first mapping relationships of the multiple keywords in the historical search record information, the second mapping relationship including the keyword set corresponding to an APP;
obtaining the candidate keyword set of the application library platform according to the first mapping relationships and the second mapping relationships.
3. The keyword expansion method based on semantic similarity according to claim 2, characterized in that determining the first mapping relationship for each keyword according to the historical search record information comprises:
obtaining, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in the multiple search results corresponding to the keyword;
selecting a set number of APPs from each search result of the keyword in ranking order, to obtain the candidate APP set corresponding to the keyword;
counting the frequency of occurrence of each APP of the candidate APP set in the multiple search results, to obtain the feature vector corresponding to the keyword, each element of the feature vector corresponding to the frequency of occurrence of one APP in the candidate APP set;
obtaining the first mapping relationship corresponding to the keyword according to the candidate APP set and the feature vector corresponding to the keyword.
4. The keyword expansion method based on semantic similarity according to claim 3, characterized in that obtaining the candidate keyword set of the application library platform according to the first mapping relationships and the second mapping relationships comprises:
obtaining a keyword matrix according to the first mapping relationship and the second mapping relationship, wherein the number of rows of the keyword matrix equals the number of APPs in the candidate APP set corresponding to the keyword in the first mapping relationship, and the number of columns of the keyword matrix equals the number of keywords in the keyword set corresponding to an APP in the second mapping relationship;
selecting from the keyword matrix, according to the frequency of occurrence of each keyword in the keyword matrix, the keywords whose frequency of occurrence is greater than or equal to a set frequency, to obtain an interim keyword set;
obtaining the search index of each keyword in the interim keyword set, and selecting from the interim keyword set the keywords whose search index is greater than or equal to a set search index value, to obtain the candidate keyword set.
5. The keyword expansion method based on semantic similarity according to claim 3 or 4, characterized in that the semantic similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set is calculated by the following formula:
$$\mathrm{sim}(k_i, k_j) = \frac{V(k_i) \cdot V(k_j)}{\|V(k_i)\|_2 \, \|V(k_j)\|_2}$$
where $k_i$ and $k_j$ denote the i-th keyword and the j-th keyword, respectively; $V(k_i)$ and $V(k_j)$ denote the feature vectors corresponding to the i-th keyword and the j-th keyword, respectively; $V(k_i) \cdot V(k_j)$ denotes the inner product of the two vectors; $\|V(k_i)\|_2$ denotes the 2-norm of the vector $V(k_i)$; $\|V(k_i)\|_2 \, \|V(k_j)\|_2$ denotes the product of the 2-norms of the feature vectors $V(k_i)$ and $V(k_j)$; and $\mathrm{sim}(k_i, k_j)$ denotes the semantic similarity between the i-th keyword and the j-th keyword.
6. The keyword expansion method based on semantic similarity according to any one of claims 1 to 4, characterized in that the similarity score of each candidate keyword relative to the keyword to be expanded is calculated, according to the semantic similarity, the preset semantic similarity weight, and the search index of each candidate keyword, by the following formula:
where $K'$ denotes the keyword to be expanded; $k_i$ denotes the i-th candidate keyword in the candidate keyword set; $\mathrm{Score}(k_i)$ denotes the similarity score of the i-th candidate keyword relative to the keyword to be expanded; $w$ denotes the set semantic similarity weight and $(1-w)$ denotes the search index weight; $\mathrm{sim}(K', k_i)$ denotes the semantic similarity between the keyword to be expanded and the i-th candidate keyword; $p_i$ denotes the search index of the i-th candidate keyword; $p_{\min}$ denotes the minimum search index value among all candidate keywords in the candidate keyword set and $p_{\max}$ denotes the maximum search index value; and $\mathrm{Score}(k_i) \in [0, 100]$.
7. The keyword expansion method based on semantic similarity according to claim 2, characterized in that obtaining the historical search record information of the application library platform comprises:
obtaining, through an interface of the application library platform, the historical search record information of the application library platform for the most recent week.
8. A keyword expansion device based on semantic similarity, characterized by comprising:
a semantic similarity calculation module, configured to receive a keyword to be expanded and calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
a similarity score calculation module, configured to obtain the search index of each candidate keyword on an application library platform and to calculate, according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded;
and an expansion word selection module, configured to select a set number of candidate keywords from the candidate keyword set according to the ranking of the similarity scores, to obtain the expansion keywords of the keyword to be expanded.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
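The formula referred to in claim 6 does not survive in this text. One form consistent with the symbol definitions given there (a weight $w$ and its complement $1-w$, the extremes $p_{\min}$ and $p_{\max}$, and a score range of $[0, 100]$) would be the following. The min-max normalization of the search index and the factor of 100 are assumptions inferred from those definitions, not a quotation of the claim.

```latex
\mathrm{Score}(k_i) = 100 \times \left[ w \cdot \mathrm{sim}(K', k_i)
  + (1 - w) \cdot \frac{p_i - p_{\min}}{p_{\max} - p_{\min}} \right]
```

Under this reading, both bracketed terms lie in $[0, 1]$ when the feature vectors hold non-negative counts, so the weighted sum scaled by 100 stays within the stated range.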
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711229082.7A CN108182200B (en) | 2017-11-29 | 2017-11-29 | Keyword expansion method and device based on semantic similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108182200A (en) | 2018-06-19 |
CN108182200B (en) | 2020-10-23 |
Family
ID=62545546
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711229082.7A Active CN108182200B (en) | 2017-11-29 | 2017-11-29 | Keyword expansion method and device based on semantic similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108182200B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103853722A (en) * | 2012-11-29 | 2014-06-11 | 腾讯科技(深圳)有限公司 | Query based keyword extension method, device and system |
CN106610972A (en) * | 2015-10-21 | 2017-05-03 | 阿里巴巴集团控股有限公司 | Query rewriting method and apparatus |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117475A (en) * | 2018-07-02 | 2019-01-01 | 武汉斗鱼网络科技有限公司 | A kind of method and relevant device of text rewriting |
CN110795534A (en) * | 2019-10-28 | 2020-02-14 | 维沃移动通信有限公司 | Information searching method and mobile terminal |
CN114238619A (en) * | 2022-02-23 | 2022-03-25 | 成都数联云算科技有限公司 | Method, system, device and medium for screening Chinese nouns based on edit distance |
CN114238619B (en) * | 2022-02-23 | 2022-04-29 | 成都数联云算科技有限公司 | Method, system, device and medium for screening Chinese nouns based on edit distance |
CN115630154A (en) * | 2022-12-19 | 2023-01-20 | 竞速信息技术(廊坊)有限公司 | Big data environment-oriented dynamic summary information construction method and system |
CN115630154B (en) * | 2022-12-19 | 2023-05-05 | 竞速信息技术(廊坊)有限公司 | Big data environment-oriented dynamic abstract information construction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108182200B (en) | 2020-10-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA2655196C (en) | System and method for generating a display of tags | |
AU2006266103B2 (en) | Determination of a desired repository | |
CN108182200A (en) | Keyword expanding method and device based on semantic similarity | |
CN110532351B (en) | Recommendation word display method, device and equipment and computer readable storage medium | |
CN109460519B (en) | Browsing object recommendation method and device, storage medium and server | |
CN108509617A (en) | Construction of knowledge base, intelligent answer method and device, storage medium, the terminal in knowledge based library | |
US20180150561A1 (en) | Searching method and searching apparatus based on neural network and search engine | |
CN104462336A (en) | Information pushing method and device | |
CN105224554A (en) | Search word is recommended to carry out method, system, server and the intelligent terminal searched for | |
WO2008084930A1 (en) | Method for offering result of search and system for executing the method | |
CN109933660A (en) | The API information search method based on handout and Stack Overflow towards natural language form | |
JP6728178B2 (en) | Method and apparatus for processing search data | |
CN109933708A (en) | Information retrieval method, device, storage medium and computer equipment | |
CN113806630B (en) | Attention-based multi-view feature fusion cross-domain recommendation method and device | |
CN111061954B (en) | Search result sorting method and device and storage medium | |
CN104077327B (en) | The recognition methods of core word importance and equipment and search result ordering method and equipment | |
CN114330329A (en) | Service content searching method and device, electronic equipment and storage medium | |
CN106294785A (en) | Content Selection method and system | |
CN109543113B (en) | Method and device for determining click recommendation words, storage medium and electronic equipment | |
CN114722086A (en) | Method and device for determining search rearrangement model | |
US20190347295A1 (en) | Display apparatus and display method | |
CN111160699A (en) | Expert recommendation method and system | |
CN110351183B (en) | Resource collection method and device in instant messaging | |
CN108170665A (en) | Keyword expanding method and device based on comprehensive similarity | |
CN104750692B (en) | A kind of information processing method, information retrieval method and its corresponding device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |