CN106649818A

CN106649818A - Recognition method and device for application search intentions and application search method and server

Info

Publication number: CN106649818A
Application number: CN201611246921.1A
Authority: CN
Inventors: 庞伟
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2017-05-10
Anticipated expiration: 2036-12-29
Also published as: CN106649818B

Abstract

The invention discloses a method and device for recognizing application search intent, and an application search method and server. The method comprises: obtaining the search words in each query session from the query session log of an application search engine; mining a label system for each search word according to the search words in the query sessions and a preset strategy; and recognizing the application search intent corresponding to each search word according to its label system. The scheme proposes a labeling approach to user intent recognition that matches the app label system, so that fine-grained user query intent can be expressed flexibly. The label system of user intent is built with unsupervised machine learning instead of traditional user intent classification, giving an automated user intent mining process that can generate user intent label lists with high accuracy and high recall. User intent and apps are mapped into the same label system, so a user searching for apps can quickly and accurately find apps that meet that intent.

Description

Method and device for recognizing application search intent, and application search method and server

Technical field

The present invention relates to data mining, and in particular to a method and device for recognizing application search intent, and an application search method and server.

Background

An application search engine is a search service for mobile software applications that lets users search for and download apps on their phones, for example 360 Mobile Assistant, Tencent MyApp, Google Play and the App Store. An application search engine is a mobile search service installed on the phone, such as the 360 Mobile Assistant app. Because objective constraints such as the small screen limit how many search results can be displayed, only accurate search results deliver the best user experience; this is also one of the important differences between mobile search and PC web search. There is a huge number of mobile apps, in the millions, and an application search engine can accurately present the app a user has in mind only if it understands the user's query intent.

The prerequisite for an application search engine to provide accurate search is a precise understanding of the user's query intent. Every user query carries a latent search intent. If the application search engine can perceive the user's need and map the search word text onto the corresponding app functions or app categories, ranking the app results that better match the user's intent higher, the user's search experience clearly improves. User intent recognition is therefore a core technology of application search engines and the key to implementing functional search.

Traditional web search engines classify user search intent manually into three broad types: navigational, informational and resource. This intent classification, designed for web pages, is not suitable for app scenarios. Each app offers fixed, concrete functions, so mining users' fine-grained functional needs with labels is appropriate, whereas classification-based methods are too coarse and broad to fit. To date, no sufficiently flexible and effective method meets users' growing demand for fast, precise app search.

Summary of the invention

In view of the above problems, the present invention is proposed to provide a method and device for recognizing application search intent, and an application search method and server, that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a method for recognizing application search intent is provided, comprising:

obtaining the search words in each query session from the query session log of an application search engine;

mining the label system of each search word according to the search words in each query session and a preset strategy; and

recognizing the application search intent corresponding to each search word according to its label system.

Optionally, mining the label system of each search word according to the search words in each query session and the preset strategy comprises:

obtaining a training corpus set according to the search words in each query session;

inputting the training corpus set into an LDA model for training, and obtaining the search word-topic probability distribution and the topic-keyword probability distribution output by the LDA model; and

computing the label system of each search word from the search word-topic probability distribution and the topic-keyword probability distribution.

Optionally, obtaining the training corpus set according to the search words in each query session comprises:

obtaining the raw corpus of each search word according to the search words in each query session; and

forming a raw corpus set from the raw corpora of all search words, and preprocessing the raw corpus set to obtain the training corpus set.

Optionally, obtaining the raw corpus of each search word according to the search words in each query session comprises:

obtaining, according to the search words in each query session, the search word sequence set corresponding to multiple query sessions, and obtaining the search word set corresponding to the multiple query sessions;

training on the search word sequence set to obtain an N-dimensional search word vector file; and

for each search word in the search word set, computing the degree of association between that search word and each other search word from the N-dimensional search word vector file, and taking the other search words whose degree of association with it meets a preset condition as its raw corpus.

Optionally, obtaining the search word sequence set corresponding to the multiple query sessions comprises:

for each query session, arranging its search words in order into a sequence; if an app was downloaded after a search word in the sequence, inserting the name of the downloaded app into the sequence immediately after that search word; and thereby obtaining the search word sequence of the query session.

Obtaining the search word set corresponding to the multiple query sessions comprises: taking the set of search words in the multiple query sessions as the search word set.
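As a minimal sketch of the two structures above, assuming each session is available as an ordered list of (search word, downloaded app names) pairs (a hypothetical log format; the source does not specify one), the search word sequences and the search word set could be built like this:

```python
from typing import List, Set, Tuple

# Hypothetical session format: ordered (search word, [downloaded app names]) pairs.
Session = List[Tuple[str, List[str]]]


def build_search_word_sequence(session: Session) -> List[str]:
    """Line up a session's search words in order, inserting each downloaded
    app name directly after the search word that led to the download."""
    sequence: List[str] = []
    for word, downloaded_apps in session:
        sequence.append(word)
        sequence.extend(downloaded_apps)
    return sequence


def build_sequence_and_word_sets(
        sessions: List[Session]) -> Tuple[List[List[str]], Set[str]]:
    """Return the search word sequence set (one sequence per session) and the
    search word set (every distinct search word across all sessions)."""
    sequences = [build_search_word_sequence(s) for s in sessions]
    search_words = {word for s in sessions for word, _ in s}
    return sequences, search_words
```

Each sequence can then be treated as one "sentence" for word2vec, so search words and downloaded app names that co-occur within sessions tend to receive nearby vectors. Note that the search word set here deliberately excludes the inserted app names, following the text.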

Optionally, training on the search word sequence set to obtain the N-dimensional search word vector file comprises:

treating each search word in the search word sequence set as a single word, and training on the search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional search word vector file.

Optionally, computing, for each search word in the search word set, the degree of association between that search word and each other search word from the N-dimensional search word vector file, and taking the other search words whose degree of association meets the preset condition as its raw corpus, comprises:

applying the KNN algorithm to the search word set and the N-dimensional search word vector file, computing the distance between every two search words in the search word set from the N-dimensional search word vector file; and

for each search word in the search word set, sorting the other search words by their distance to it in descending order and selecting the top n search words, where n is a first preset threshold, as its raw corpus.
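The neighbor selection above can be sketched as a brute-force KNN over the search word vectors. This sketch assumes the vectors are already loaded into a dict and scores each pair with cosine similarity, so that sorting from large to small puts the closest search words first (the text says "distance"; the original word2vec `distance` utility likewise ranks words by cosine similarity, but reading the score this way is an assumption here):

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def raw_corpus(vectors: Dict[str, List[float]], top_n: int) -> Dict[str, List[str]]:
    """For every search word, rank all other search words by similarity
    (largest first) and keep the top_n of them as that word's raw corpus."""
    corpus: Dict[str, List[str]] = {}
    for word, vec in vectors.items():
        scored = [(cosine(vec, other_vec), other)
                  for other, other_vec in vectors.items() if other != word]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
        corpus[word] = [other for _, other in scored[:top_n]]
    return corpus
```

This exact search is quadratic in the number of search words; with millions of queries an approximate nearest-neighbor index would be the practical substitute, which the source does not discuss.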

Optionally, preprocessing the raw corpus set comprises:

for each raw corpus in the raw corpus set, performing word segmentation on it to obtain a segmentation result containing multiple lexical items; finding the phrases formed by adjacent lexical items in the segmentation result; and retaining the phrases, together with the lexical items in the segmentation result that are nouns or verbs, as the keywords retained for that raw corpus.

Optionally, finding the phrases formed by adjacent lexical items in the segmentation result comprises:

computing the cPMId value of every pair of adjacent lexical items in the segmentation result, and determining that two adjacent lexical items form a phrase when their cPMId value is greater than a second preset threshold.
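The phrase detection and part-of-speech filter above could look like the sketch below. The source names cPMId without giving its formula; this sketch assumes the document-level form from Damani's 2013 cPMI work as we understand it, cPMId(x, y) = log(d(x,y) / (d(x)d(y)/D + sqrt(d(x)) * sqrt(ln δ / -2))), where d(·) counts documents containing the items (here, adjacent co-occurrence for pairs), D is the number of documents and δ is a confidence parameter (0.9 is an assumed default). It also assumes segmentation and POS tagging are already done, with nouns and verbs tagged "n" and "v" (hypothetical tag names):

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical input: one segmented, POS-tagged raw corpus per document.
Tagged = List[Tuple[str, str]]  # (lexical item, POS tag)


def cpmid(d_xy: int, d_x: int, d_y: int, n_docs: int, delta: float = 0.9) -> float:
    """Assumed document-level cPMI; d_* are document counts, n_docs is D."""
    expected = d_x * d_y / n_docs
    significance = math.sqrt(d_x) * math.sqrt(math.log(delta) / -2)
    return math.log(d_xy / (expected + significance))


def extract_keywords(docs: List[Tagged], threshold: float, sep: str = "",
                     keep_pos: Tuple[str, ...] = ("n", "v")) -> List[List[str]]:
    """Keep noun/verb items plus adjacent pairs whose cPMId exceeds threshold."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for doc in docs:
        items = [item for item, _ in doc]
        doc_freq.update(set(items))                 # d(x): documents containing x
        pair_freq.update(set(zip(items, items[1:])))  # d(x,y): adjacent pair

    keywords = []
    for doc in docs:
        kept = [item for item, pos in doc if pos in keep_pos]
        for (x, _), (y, _) in zip(doc, doc[1:]):
            if cpmid(pair_freq[(x, y)], doc_freq[x], doc_freq[y], n_docs) > threshold:
                kept.append(x + sep + y)  # Chinese items join with no separator
        keywords.append(kept)
    return keywords
```

The text does not say whether a phrase's constituent nouns and verbs should be dropped once the phrase is kept; this sketch keeps both.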

Optionally, preprocessing the raw corpus set further comprises:

taking the keywords retained for the raw corpus of each search word as the first-stage corpus of that search word; and

forming a first-stage corpus set from the first-stage corpora of all search words, and performing data cleansing on the keywords in the first-stage corpus set.

Optionally, performing data cleansing on the keywords in the first-stage corpus set comprises:

in the first-stage corpus set, for the first-stage corpus of each search word, computing the TF-IDF value of each keyword in it, and deleting keywords whose TF-IDF value is higher than a third preset threshold and/or lower than a fourth preset threshold, to obtain the training corpus of the search word; and

forming the training corpus set from the training corpora of all search words.
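A minimal sketch of this cleansing step, treating each search word's first-stage corpus as one document. The source does not specify the TF-IDF variant; this sketch assumes term frequency normalized by corpus length and idf = ln(D / df). Either threshold may be left unset, which covers the "and/or" in the text:

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def clean_corpus(corpora: Dict[str, List[str]], high: Optional[float] = None,
                 low: Optional[float] = None) -> Dict[str, List[str]]:
    """Drop keywords whose TF-IDF is above `high` and/or below `low`."""
    n_docs = len(corpora)
    df: Counter = Counter()
    for words in corpora.values():
        df.update(set(words))  # document frequency of each keyword

    cleaned: Dict[str, List[str]] = {}
    for query, words in corpora.items():
        tf = Counter(words)
        kept = []
        for w in words:
            score = tf[w] / len(words) * math.log(n_docs / df[w])
            if high is not None and score > high:
                continue
            if low is not None and score < low:
                continue
            kept.append(w)  # duplicates are kept: LDA consumes a bag of words
        cleaned[query] = kept
    return cleaned
```

With a small lower threshold, a keyword that appears in every search word's corpus (idf of zero) is removed, since it does not help distinguish one search word from another.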

Optionally, computing the label system of each search word from the search word-topic probability distribution and the topic-keyword probability distribution comprises:

computing a search word-keyword probability distribution from the search word-topic probability distribution and the topic-keyword probability distribution; and

according to the search word-keyword probability distribution, for each search word, sorting the keywords by their probability with respect to that search word in descending order and selecting the top n keywords, where n is a fifth preset threshold.

Optionally, computing the search word-keyword probability distribution from the search word-topic probability distribution and the topic-keyword probability distribution comprises:

for each search word, obtaining the probability of each topic with respect to that search word from the search word-topic probability distribution;

for each topic, obtaining the probability of each keyword with respect to that topic from the topic-keyword probability distribution; and

for each keyword, taking the product of the keyword's probability with respect to a topic and that topic's probability with respect to a search word as the keyword's topic-based probability with respect to the search word, and taking the sum of these topic-based probabilities over all topics as the keyword's probability with respect to the search word.
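In formula form, the step above is P(k | q) = Σ_t P(k | t) · P(t | q), i.e. marginalizing over the LDA topics t. A minimal sketch in plain Python, assuming both LDA outputs are available as nested dicts (a hypothetical representation; a trained model would usually expose them as matrices), including the top-n selection described earlier:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def query_keyword_distribution(
        query_topic: Dict[str, Dict[int, float]],
        topic_keyword: Dict[int, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """P(keyword | query) = sum over topics of P(keyword | topic) * P(topic | query)."""
    result: Dict[str, Dict[str, float]] = {}
    for query, topics in query_topic.items():
        probs: Dict[str, float] = defaultdict(float)
        for topic, p_topic in topics.items():
            for keyword, p_keyword in topic_keyword.get(topic, {}).items():
                probs[keyword] += p_keyword * p_topic  # topic-based contribution
        result[query] = dict(probs)
    return result


def top_keywords(dist: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Sort keywords by probability (descending, ties by name) and keep n."""
    return sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Since each topic distribution sums to 1, the resulting per-query keyword distribution also sums to 1.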

Optionally, computing the label system of each search word from the search word-topic probability distribution and the topic-keyword probability distribution further comprises:

selecting, for each search word, the top n keywords, where n is the fifth preset threshold, as the first-stage label system of that search word; and

for the first-stage label system of each search word: computing the semantic relation value between each keyword in it and the search word; for each keyword, taking the product of its semantic relation value and its probability with respect to the search word as its corrected probability with respect to the search word; and sorting the keywords in the first-stage label system by corrected probability in descending order and selecting the top m keywords, where m is a sixth preset threshold, to form the label system of the search word.

Optionally, computing the semantic relation value between each keyword in the first-stage label system of a search word and the search word comprises:

obtaining, according to the search words in each query session, the search word sequence set corresponding to multiple query sessions, and training on the search word sequence set to obtain an N-dimensional keyword vector file;

computing, from the N-dimensional keyword vector file, the word vector of the keyword and the word vector of each lexical item in the search word;

computing the cosine similarity between the word vector of the keyword and the word vector of each lexical item as the semantic relation value between the keyword and that lexical item; and

taking the sum of the semantic relation values between the keyword and all lexical items as the semantic relation value between the keyword and the search word.
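The semantic relation value and the first-stage correction can be combined into one sketch. It assumes the N-dimensional keyword vectors are already loaded into a dict and the search word has already been segmented into lexical items; a keyword or lexical item missing from the vector file contributes nothing, which is an assumption since the source does not say how to handle it:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_relation(keyword: str, query_items: List[str],
                      vectors: Dict[str, List[float]]) -> float:
    """Sum of cosine similarities between the keyword and each lexical item."""
    if keyword not in vectors:
        return 0.0
    return sum(cosine(vectors[keyword], vectors[item])
               for item in query_items if item in vectors)


def first_stage_rerank(query_items: List[str], candidates: List[Tuple[str, float]],
                       vectors: Dict[str, List[float]],
                       top_m: int) -> List[Tuple[str, float]]:
    """Corrected probability = semantic relation value * P(keyword | search word)."""
    corrected = [(kw, semantic_relation(kw, query_items, vectors) * p)
                 for kw, p in candidates]
    corrected.sort(key=lambda kv: (-kv[1], kv[0]))
    return corrected[:top_m]
```

For a search word segmented as ["photo", "editor"], a keyword such as "music" with a high LDA probability but little semantic overlap gets pushed down the list, which is the intended effect of the correction.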

Optionally, training on the search word sequence set to obtain the N-dimensional keyword vector file comprises:

performing word segmentation on the search word sequence set, and training on the segmented search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional keyword vector file.

Optionally, the top m keywords of each search word, where m is the sixth preset threshold, are selected as the second-stage label system of that search word;

for the second-stage label system of each search word: the TF-IDF value of each keyword in it is counted within the training corpus of the search word; for each keyword, the product of its probability with respect to the search word and its TF-IDF value is taken as its second corrected probability with respect to the search word; and the keywords in the second-stage label system are sorted by second corrected probability in descending order and the top K keywords are selected to form the label system of the search word.

Optionally, selecting the top K keywords to form the label system of the search word comprises:

obtaining, from the query session log of the application search engine, the number of times the search word was queried within a preset time period; and

selecting the top K keywords according to the query count to form the label system of the search word, where K is a piecewise linear function of the search word's query count.
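A sketch of the second-stage re-ranking together with a query-count-dependent K. The source only states that K is a piecewise linear function of the query count; the breakpoints below are purely hypothetical placeholders, and clamping K outside the breakpoints is an assumed design choice:

```python
import bisect
from typing import Dict, List, Sequence, Tuple

# Hypothetical breakpoints (query count, K); the source gives no values.
DEFAULT_POINTS: Tuple[Tuple[int, int], ...] = ((0, 5), (1_000, 10), (100_000, 20))


def piecewise_k(query_count: int,
                points: Sequence[Tuple[int, int]] = DEFAULT_POINTS) -> int:
    """Linearly interpolate K between breakpoints, clamped at both ends."""
    xs = [x for x, _ in points]
    if query_count <= xs[0]:
        return points[0][1]
    if query_count >= xs[-1]:
        return points[-1][1]
    i = bisect.bisect_right(xs, query_count)
    (x0, k0), (x1, k1) = points[i - 1], points[i]
    return round(k0 + (k1 - k0) * (query_count - x0) / (x1 - x0))


def second_stage(candidates: List[Tuple[str, float]], tfidf: Dict[str, float],
                 query_count: int) -> List[str]:
    """Second corrected probability = P(keyword | search word) * TF-IDF; keep top K."""
    k = piecewise_k(query_count)
    rescored = [(kw, p * tfidf.get(kw, 0.0)) for kw, p in candidates]
    rescored.sort(key=lambda kv: (-kv[1], kv[0]))
    return [kw for kw, _ in rescored[:k]]
```

Tying K to query volume lets frequently searched words, whose intents are presumably more varied, keep more labels than rare ones.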

According to another aspect of the present invention, an application search method is provided, comprising:

building a search word tag database that contains the label systems of multiple search words;

receiving a current search word uploaded by a client, and obtaining the label system of the current search word from the search word tag database;

computing the degree of association between the label system of the current search word and the label system of each app; and

when the degree of association between the label system of the current search word and the label system of an app meets a preset condition, returning information about that app to the client for display;

wherein the search word tag database is built by any of the methods according to the first aspect of the present invention.
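The source leaves both the association measure and the "preset condition" open. One plausible sketch, assuming both label systems are dicts of label weights and using cosine similarity with a minimum-score cutoff (both assumptions, as are the app names in the test), is:

```python
import math
from typing import Dict, List, Tuple

Labels = Dict[str, float]  # hypothetical: label -> weight


def label_cosine(a: Labels, b: Labels) -> float:
    """Cosine similarity between two sparse label-weight vectors."""
    dot = sum(w * b.get(label, 0.0) for label, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def match_apps(query_labels: Labels, app_labels: Dict[str, Labels],
               min_score: float) -> List[Tuple[str, float]]:
    """Return apps whose association with the query meets min_score, best first."""
    scored = [(app, label_cosine(query_labels, labels))
              for app, labels in app_labels.items()]
    hits = [(app, score) for app, score in scored if score >= min_score]
    hits.sort(key=lambda kv: (-kv[1], kv[0]))
    return hits
```

Because queries and apps share one label vocabulary, matching reduces to comparing two sparse vectors, which is what makes the common label system useful for relevance computation.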

Optionally, obtaining the label system of the current search word from the search word tag database comprises:

computing the semantic similarity between the current search word and each search word in the search word tag database, sorting by semantic similarity in descending order, and selecting the top n search words, where n is a first preset threshold; and

obtaining the label system of the current search word from the label systems of the selected search words.

Optionally, computing the semantic similarity between the current search word and each search word in the search word tag database comprises: computing the Euclidean distance between the current search word and each search word in the search word tag database, and taking the Euclidean distance between each search word and the current search word as that search word's semantic similarity;

obtaining the label system of the current search word from the label systems of the selected search words comprises: taking each selected search word's semantic similarity as the weight of every label in that search word's label system; adding up the weights of identical labels across the label systems of the selected search words to obtain the final weight of each label; and sorting the labels by final weight in descending order and selecting the top n labels, where n is a second preset threshold, to form the label system of the current search word.
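A sketch of the label-system lookup for a current search word. It assumes each database entry stores a vector for the search word alongside its labels (hypothetical; the source does not say how search words are embedded for the Euclidean distance). The text uses the Euclidean distance itself as the similarity and ranks from large to small; this sketch instead assumes the distance is first converted with 1 / (1 + d) so that a larger score always means a closer search word. That conversion is an assumption, not part of the source:

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical database entry: search word -> (vector, label system).
Database = Dict[str, Tuple[List[float], List[str]]]


def similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """ASSUMPTION: map Euclidean distance to a score where larger means closer."""
    return 1.0 / (1.0 + math.dist(u, v))


def infer_labels(query_vec: Sequence[float], database: Database,
                 n_neighbors: int, n_labels: int) -> List[str]:
    """Weight each neighbor's labels by its similarity, sum per label, keep the top."""
    neighbors = sorted(((similarity(query_vec, vec), word)
                        for word, (vec, _) in database.items()),
                       key=lambda sw: (-sw[0], sw[1]))[:n_neighbors]
    weights: Dict[str, float] = defaultdict(float)
    for sim, word in neighbors:
        for label in database[word][1]:
            weights[label] += sim  # identical labels accumulate weight
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:n_labels]]
```

A label shared by several close neighbors accumulates weight and rises to the top, so the current search word inherits the intents its nearest known search words agree on.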

According to another aspect of the present invention, a device for recognizing application search intent is provided, comprising:

an acquiring unit, adapted to obtain the search words in each query session from the query session log of an application search engine;

a mining unit, adapted to mine the label system of each search word according to the search words in each query session and a preset strategy; and

a recognition unit, adapted to recognize the application search intent corresponding to each search word according to its label system.

Optionally, the mining unit is adapted to obtain a training corpus set according to the search words in each query session; input the training corpus set into an LDA model for training and obtain the search word-topic probability distribution and the topic-keyword probability distribution output by the LDA model; and compute the label system of each search word from the search word-topic probability distribution and the topic-keyword probability distribution.

Optionally, the mining unit is adapted to obtain the raw corpus of each search word according to the search words in each query session; form a raw corpus set from the raw corpora of all search words; and preprocess the raw corpus set to obtain the training corpus set.

Optionally, the mining unit is adapted to obtain, according to the search words in each query session, the search word sequence set and the search word set corresponding to multiple query sessions; train on the search word sequence set to obtain an N-dimensional search word vector file; and, for each search word in the search word set, compute the degree of association between it and each other search word from the N-dimensional search word vector file and take the other search words whose degree of association with it meets a preset condition as its raw corpus.

Optionally, the mining unit is adapted to, for each query session, arrange its search words in order into a sequence; if an app was downloaded after a search word in the sequence, insert the name of the downloaded app into the sequence immediately after that search word, thereby obtaining the search word sequence of the query session; and take the set of search words in the multiple query sessions as the search word set.

Optionally, the mining unit is adapted to treat each search word in the search word sequence set as a single word and train on the search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional search word vector file.

Optionally, the mining unit is adapted to apply the KNN algorithm to the search word set and the N-dimensional search word vector file, computing the distance between every two search words in the search word set from the N-dimensional search word vector file; and, for each search word in the search word set, sort the other search words by distance to it in descending order and select the top n, where n is the first preset threshold, as its raw corpus.

Optionally, the mining unit is adapted to, for each raw corpus in the raw corpus set, perform word segmentation to obtain a segmentation result containing multiple lexical items; find the phrases formed by adjacent lexical items in the segmentation result; and retain the phrases, together with the lexical items in the segmentation result that are nouns or verbs, as the keywords retained for that raw corpus.

Optionally, the mining unit is adapted to compute the cPMId value of every pair of adjacent lexical items in the segmentation result and determine that two adjacent lexical items form a phrase when their cPMId value is greater than the second preset threshold.

Optionally, the mining unit is further adapted to take the keywords retained for the raw corpus of each search word as the first-stage corpus of that search word; form a first-stage corpus set from the first-stage corpora of all search words; and perform data cleansing on the keywords in the first-stage corpus set.

Optionally, the mining unit is adapted to, in the first-stage corpus set, for the first-stage corpus of each search word, compute the TF-IDF value of each keyword in it; delete keywords whose TF-IDF value is higher than the third preset threshold and/or lower than the fourth preset threshold, to obtain the training corpus of the search word; and form the training corpus set from the training corpora of all search words.

Optionally, the mining unit is adapted to compute the search word-keyword probability distribution from the search word-topic probability distribution and the topic-keyword probability distribution; and, according to the search word-keyword probability distribution, for each search word, sort the keywords by their probability with respect to that search word in descending order and select the top n, where n is the fifth preset threshold.

Optionally, the mining unit is adapted to, for each search word, obtain the probability of each topic with respect to it from the search word-topic probability distribution; for each topic, obtain the probability of each keyword with respect to it from the topic-keyword probability distribution; and, for each keyword, take the product of its probability with respect to a topic and that topic's probability with respect to a search word as its topic-based probability with respect to the search word, and take the sum of these probabilities over all topics as its probability with respect to the search word.

Optionally, the mining unit is further adapted to select, for each search word, the top n keywords, where n is the fifth preset threshold, as its first-stage label system; for the first-stage label system of each search word, compute the semantic relation value between each keyword in it and the search word; for each keyword, take the product of its semantic relation value and its probability with respect to the search word as its corrected probability with respect to the search word; and sort the keywords in the first-stage label system by corrected probability in descending order and select the top m, where m is the sixth preset threshold, to form the label system of the search word.

Optionally, the mining unit is adapted to obtain, according to the search words in each query session, the search word sequence set corresponding to multiple query sessions; train on the search word sequence set to obtain an N-dimensional keyword vector file; compute from the N-dimensional keyword vector file the word vector of the keyword and the word vector of each lexical item in the search word; compute the cosine similarity between the word vector of the keyword and the word vector of each lexical item as the semantic relation value between the keyword and that lexical item; and take the sum of the semantic relation values between the keyword and all lexical items as the semantic relation value between the keyword and the search word.

Optionally, the mining unit is adapted to perform word segmentation on the search word sequence set and train on the segmented search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional keyword vector file.

Optionally, the mining unit is further adapted to select, for each search word, the top m keywords, where m is the sixth preset threshold, as its second-stage label system; for the second-stage label system of each search word, count the TF-IDF value of each keyword in it within the training corpus of the search word; for each keyword, take the product of its probability with respect to the search word and its TF-IDF value as its second corrected probability with respect to the search word; and sort the keywords in the second-stage label system by second corrected probability in descending order and select the top K to form the label system of the search word.

Optionally, the mining unit is adapted to obtain, from the query session log of the application search engine, the number of times the search word was queried within a preset time period, and select the top K keywords according to that query count to form the label system of the search word, where K is a piecewise linear function of the search word's query count.

According to another aspect of the invention, an application search server is provided, comprising:

a database building unit, adapted to build a search word tag database that contains the label systems of multiple search words;

an interaction unit, adapted to receive a current search word uploaded by a client; and

a search processing unit, adapted to obtain the label system of the current search word from the search word tag database, and to compute the degree of association between the label system of the current search word and the label system of each app;

wherein the interaction unit is further adapted to return information about an app to the client for display when the degree of association between the label system of the current search word and the label system of that app meets a preset condition; and

the database building unit builds the search word tag database by the same process as the device for recognizing application search intent according to any one of claims 22-39.

Optionally, the search processing unit is adapted to compute the semantic similarity between the current search word and each search word in the search word tag database, sort by semantic similarity in descending order, and select the top n search words, where n is the first preset threshold; and obtain the label system of the current search word from the label systems of the selected search words.

Optionally, the search processing unit is adapted to compute the Euclidean distance between the current search word and each search word in the search word tag database, taking the Euclidean distance between each search word and the current search word as that search word's semantic similarity; take each search word's semantic similarity as the weight of every label in its label system; add up the weights of identical labels across the label systems of the selected search words to obtain the final weight of each label; and sort the labels by final weight in descending order and select the top n, where n is the second preset threshold, to form the label system of the current search word.

The scheme of the invention proposes a user intent recognition method in the form of a labeling method matched to the app label system. It mines the label system of each search word flexibly, effectively and accurately and builds a search word tag database, so that any search word entered by a user can be described accurately by a label system, which solves the user intent recognition problem. Further, user intent and apps are mapped into the same label system, so search matching yields more accurate app search results. The scheme thus solves both the user intent recognition problem and the relevance computation problem of the application search engine, laying the foundation for functional search, a core technology of application search engines.

The above is only an overview of the technical solution of the present invention. To make the technical means of the present invention clearer, so that it can be implemented according to the description, and to make the above and other objects, features and advantages of the present invention more readily apparent, specific embodiments of the present invention are set out below.

Description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:

Fig. 1 shows a flowchart of a method for recognizing app search intent according to an embodiment of the invention;

Fig. 2 shows a flowchart of an app search method according to an embodiment of the invention;

Fig. 3 shows a schematic diagram of a device for recognizing app search intent according to an embodiment of the invention; and

Fig. 4 shows a schematic diagram of an app search server according to an embodiment of the invention.

Specific embodiments

Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

In what follows, "app" denotes an application, "query" a search term, "tag" a label, and "session" a query session.

The invention proposes a new user-intent recognition method for app search engines. Labeling expresses users' fine-grained query intents flexibly and effectively. The tag system of user intents is built with unsupervised machine learning, abandoning traditional user-intent classification in favour of an automated intent-mining pipeline that produces intent tag lists with high precision and high recall. User queries and apps are mapped into a shared tag system, which solves both the user-intent recognition problem and the relevance computation problem of the app search engine, with very good results.

Fig. 1 shows a flowchart of the method for recognizing app search intent according to an embodiment of the invention. As shown in Fig. 1, the method includes:

Step S110: obtain the queries in each query session from the query session log of the app search engine;

Step S120: mine the tag system of each query according to the queries in each query session and a preset strategy;

Step S130: recognize the app search intent corresponding to each query according to its tag system.

Traditional user-intent recognition is a classification method designed for web pages and does not suit app scenarios. Each app has a fixed purpose and provides people with particular functions, so mining users' fine-grained functional needs with tags is appropriate, whereas classification-based methods are too coarse and broad to apply. This scheme proposes labeling, a user-intent recognition method that matches the app tag system: it flexibly and effectively maps user intents and apps into the same tag system, which solves the user-intent recognition problem and, at the same time, the relevance computation problem of the app search engine, and it provides the basis for function-based search, a core technology of app search engines.

Queries are usually short texts. Users compose them from the need in their own minds, so query features are sparse and cannot describe the need completely. However, when a user is looking for an app for a single functional scenario within a short period, the user often keeps rewriting the query around that one need, and the queries issued this way are usually strongly semantically related. This is an important characteristic of app search engines.

In a search engine service, the system automatically records information related to user searches and saves it in the query log. For example, a user opens a Baidu search page and enters queries such as "game", "game software", "fun games" and "game app download" one after another, viewing the result pages, or, after reaching a result page, keeps entering further queries, until the user finishes the search and closes the Baidu search page. This whole process is called one query session.

In one embodiment of the invention, mining the tag system of each query in step S120 according to the queries in each query session and the preset strategy includes: obtaining a training corpus set from the queries in each query session; feeding the training corpus set into an LDA model for training to obtain the query-topic probability distribution and the topic-keyword probability distribution output by the model; and computing the tag system of each query from the query-topic probability distribution and the topic-keyword probability distribution.

The technical difficulty in obtaining the corpus is expanding the short query texts. Once a query has been expanded into a long text it can be treated as a document, which is the key to using the LDA topic model effectively and producing intent tags with high precision and high recall. Intent tags are divided into category tags and feature tags: category tags reflect the kind of app the user needs, and feature tags reflect the user's concrete functional requirements.

Obtaining the training corpus set from the queries in each query session includes:

obtaining the raw corpus of each query from the queries in each query session, the raw corpora of all queries forming a raw corpus set; and preprocessing the raw corpus set to obtain the training corpus set. Specifically, obtaining the raw corpus of each query from the queries in each query session includes: obtaining, from those queries, the query sequence set corresponding to multiple query sessions; and obtaining the query set corresponding to the multiple query sessions.

The order of the queries inside a query session is preserved and each query is treated as a single unit. If the user downloaded an app after some query, the app name is spliced into the sequence immediately after that query. For example, if a user session is query1, query2, query3 and the user downloaded app1 after entering query2, app1 is placed after query2 and before query3, giving query1, query2, app1, query3. Each session sequence is written as one line to the file session_query-app_list.txt, which is the query sequence set, and all queries are written to another file, query_all.txt, which is the query set.

The query sequence set is trained to obtain an N-dimensional query vector file. For each query in the query set, the degree of association between that query and every other query is computed from the N-dimensional query vector file, and the other queries whose degree of association with it satisfies a preset condition are taken as the raw corpus of that query.

In one embodiment of the invention, obtaining the query sequence set corresponding to multiple query sessions includes: for each query session, arranging its queries in order into a sequence; if an app was downloaded after a query in the sequence, inserting the name of the downloaded app into the sequence immediately after that query; and thereby obtaining the query sequence of that session. Obtaining the query set corresponding to multiple query sessions includes: taking the set of queries in the multiple query sessions as the query set of those sessions.

For example, a user enters "query 1", "query 2" and "query 3" in order within one query session and downloads app1 after entering "query 2". The query sequence of this session is therefore: query 1, query 2, app1, query 3. Each session's query sequence occupies one line, so the query sequence set for multiple query sessions has multiple lines.
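The session-to-sequence step can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory session format (a list of queries plus a hypothetical map from a query's position to the apps downloaded after it) is an assumption, since the log schema is not given.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One session: its queries in order, plus a map from a query's index to the
# names of apps downloaded right after that query (hypothetical log format).
Session = Tuple[Sequence[str], Dict[int, List[str]]]


def session_sequence(queries: Sequence[str], downloads: Dict[int, List[str]]) -> List[str]:
    """Keep query order and splice each downloaded app name in directly
    after the query that preceded the download."""
    sequence: List[str] = []
    for index, query in enumerate(queries):
        sequence.append(query)
        sequence.extend(downloads.get(index, []))
    return sequence


def build_sequence_and_query_sets(sessions: Sequence[Session]) -> Tuple[List[List[str]], Set[str]]:
    """Return the query sequence set (one sequence per session, i.e. one line
    of session_query-app_list.txt) and the query set (query_all.txt).
    App names go into the sequences only, not into the query set."""
    sequences: List[List[str]] = []
    all_queries: Set[str] = set()
    for queries, downloads in sessions:
        sequences.append(session_sequence(queries, downloads))
        all_queries.update(queries)
    return sequences, all_queries
```

With the example above, `build_sequence_and_query_sets([(["query1", "query2", "query3"], {1: ["app1"]})])` yields the single sequence `query1, query2, app1, query3`. When writing the sequences to a file, a separator that cannot occur inside a query (for example a tab) keeps multi-word queries as single tokens; that choice is also an assumption.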

Training the query sequence set to obtain the N-dimensional query vector file includes: treating each query in the query sequence set as one word and training the set with the deep learning toolkit word2vec to produce the N-dimensional query vector file. For example, word2vec training produces 300-dimensional query vectors, written to the query vector file query_w2v_300.dict.

In practice, the queries users enter when looking for an app vary in form: a noun (such as "game"), a phrase (such as "fun games"), or a sentence (such as "I want to download a fun game").

In one embodiment of the invention, the query vector file obtained above serves as the basis for computing a vector for each query in the query set. Computing, for each query in the query set, the degree of association between it and every other query from the N-dimensional query vector file, and taking the other queries whose degree of association with it satisfies a preset condition as its raw corpus, specifically includes:

applying the KNN algorithm to the query set and the N-dimensional query vector file, computing the distance between every two queries in the query set from the N-dimensional query vector file; and, for each query in the query set, ranking the other queries by their distance to it, nearest first, and taking the top first-preset-threshold number of queries as the raw corpus of that query.
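The KNN step can be sketched as a brute-force nearest-neighbour search. This assumes the vector file has already been loaded into a dictionary from token to vector (the loader is not shown); because the sequences also contain app names, the search runs over every entry in the vector file, which matches Table 1, where app names appear among the neighbours.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def nearest_neighbours(query: str, vectors: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force KNN: the k entries closest to `query` by Euclidean
    distance, nearest first. `vectors` stands in for query_w2v_300.dict."""
    target = vectors[query]
    scored = [(name, math.dist(target, vec)) for name, vec in vectors.items() if name != query]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:k]


def raw_corpora(query_set: Iterable[str], vectors: Dict[str, Sequence[float]], k: int) -> Dict[str, List[str]]:
    """Raw corpus of each query = its k nearest neighbours (k plays the role
    of the first preset threshold). Queries without a vector are skipped."""
    return {q: [name for name, _ in nearest_neighbours(q, vectors, k)] for q in query_set if q in vectors}
```

Brute force costs O(n) distance computations per query; for millions of queries an index such as a KD-tree or an approximate-nearest-neighbour library would normally replace the linear scan.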

Table 1 shows the 10 nearest neighbours of the query "搜狗" (Sogou) in one embodiment of the invention. The nearest neighbours include both queries and app names, such as "Sogou mobile input method" and "Sogou input method" in the first column of Table 1. Here 10 is the first preset value, and the second column of Table 1 gives the distance statistics between each nearest neighbour and the query "搜狗".

Table 1

Nearest neighbour	Statistical indicators (based on Euclidean distance)
Sogou mobile input method	38	303.827	0.838104
Sogou input method	26	323.494	0.845153
Sogou	20332	372.525	0.778589
收狗 (near-homophone misspelling of 搜狗)	6986	385.809	0.76965
Sogou Pinyin	14577	410.986	0.753037
Sogou input method, Xiaomi edition	4042	423.929	0.746941
Sogou Pinyin input method	4927	435.273	0.736172
Sohu input method	18233	452.955	0.724872
Sogou input	10274	455.505	0.720034
Mobile Sogou input method	3075	476.93	0.721099

Table 2 shows the 10 nearest neighbours of the query "lottery results lookup" in one embodiment of the invention. The columns have the same meaning as in Table 1 and are not described again.

Table 2

In one embodiment of the invention, after the raw corpus set corresponding to the queries is obtained, preprocessing the raw corpus set includes:

for each raw corpus in the raw corpus set, performing word segmentation on it to obtain a segmentation result containing multiple terms; finding phrases formed by adjacent terms in the segmentation result; and retaining the phrases, the noun terms and the verb terms in the segmentation result as the keywords retained for that raw corpus.

For example, if a user enters the query "download game", the noun term of the query is "game" and the verb term is "download".

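Finding the phrases formed by adjacent terms in the segmentation result includes computing the cPMId value of every pair of adjacent terms; when the cPMId value of two adjacent terms exceeds the second preset threshold, the two terms are determined to form a phrase.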

Formula 1 gives the cPMId computation, where d(x, y) is the co-occurrence frequency of terms x and y, d(x) is the occurrence frequency of term x, d(y) is the occurrence frequency of term y, D is the total number of apps, and δ = 0.7.

Formula 1

For example, term pairs are sorted by cPMId value in descending order and those with a cPMId above the threshold 5 are selected as phrases. These phrases are merged with the verbs and nouns retained above to produce a new file, query_corpus_seg_nouns_verb_phrase.txt, which is the preprocessed corpus.
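Formula 1 itself did not survive extraction. The quantities it names (d(x, y), d(x), d(y), D, δ) match the cPMId measure of Damani (2013), cPMId(x, y) = log( d(x, y) / ( d(x)·d(y)/D + √d(x) · √(ln δ / −2) ) ), so the sketch below assumes that form. It also assumes that frequencies are document counts, that the logarithm is natural, and that POS tags mark nouns with a leading "n" and verbs with a leading "v" (the convention of common Chinese tag sets). It covers phrase detection and keyword retention together.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def cpmid(d_xy: int, d_x: int, d_y: int, total: int, delta: float = 0.7) -> float:
    """cPMId in the form of Damani (2013); the natural log is an assumption."""
    if d_xy == 0:
        return float("-inf")
    expected = d_x * d_y / total
    correction = math.sqrt(d_x) * math.sqrt(math.log(delta) / -2.0)
    return math.log(d_xy / (expected + correction))


def find_phrases(docs: List[List[str]], threshold: float = 5.0, delta: float = 0.7) -> Set[Tuple[str, str]]:
    """Adjacent term pairs whose cPMId exceeds `threshold`. Assumed counts:
    d(x) = docs containing x, d(x, y) = docs where x is directly followed by y."""
    doc_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for terms in docs:
        doc_freq.update(set(terms))
        pair_freq.update(set(zip(terms, terms[1:])))
    total = len(docs)
    return {
        (x, y)
        for (x, y), d_xy in pair_freq.items()
        if cpmid(d_xy, doc_freq[x], doc_freq[y], total, delta) > threshold
    }


def keep_keywords(tagged: Sequence[Tuple[str, str]], phrases: Set[Tuple[str, str]]) -> List[str]:
    """Keep nouns, verbs and detected phrases from one segmented, POS-tagged text."""
    terms = [term for term, _ in tagged]
    kept = [term for term, tag in tagged if tag.startswith(("n", "v"))]
    kept += [x + y for x, y in zip(terms, terms[1:]) if (x, y) in phrases]  # Chinese joins without spaces
    return kept
```

Under this form the √d(x) term penalizes pairs whose first term is frequent, so on a small corpus a threshold of 5 is rarely reached; the threshold only becomes meaningful at the corpus scale described in the text.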

Further, in one embodiment of the invention, preprocessing the raw corpus set also includes: taking the keywords retained for each query's raw corpus as that query's first-stage training corpus, the first-stage training corpora of all queries forming a first-stage training corpus set; and cleaning the keywords in the first-stage training corpus set.

Specifically, cleaning the keywords in the first-stage training corpus set includes: for the first-stage training corpus of each query in the set, computing the TF-IDF value of each keyword in it; deleting the keywords whose TF-IDF value is above the third preset threshold and/or below the fourth preset threshold, which yields that query's training corpus; the training corpora of all queries forming the training corpus set.

This step mines the non-tag words in the first-stage training corpus set for data cleaning. A term that occurs with very high or very low frequency is unlikely to be a tag. TF-IDF statistics are computed over the first-stage training corpus set for every term and phrase, and terms or phrases whose weight is above one threshold or below another are treated as non-tag words. The thresholds depend on the corpus, so specific values are not listed here. The non-tag words form a blacklist, black_tag.list, which is used to filter them out of the first-stage training corpus set, producing a new training corpus set in the format: query_id \t term1 term2 ... termn.
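A minimal sketch of the TF-IDF band filter, assuming each query's first-stage corpus is one document, TF is the term's relative frequency in that document and IDF is the natural log of N / df. The patent does not fix these variants, and the thresholds passed in are placeholders for the third and fourth preset thresholds.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_filter(corpora: Dict[str, List[str]], low: float, high: float) -> Dict[str, List[str]]:
    """Drop terms whose TF-IDF in their own query corpus falls outside
    [low, high]; very common terms (IDF near 0) and rare, dominant terms
    both disappear."""
    n_docs = len(corpora)
    doc_freq: Counter = Counter()
    for terms in corpora.values():
        doc_freq.update(set(terms))
    cleaned: Dict[str, List[str]] = {}
    for query_id, terms in corpora.items():
        tf = Counter(terms)
        weights = {t: (c / len(terms)) * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        cleaned[query_id] = [t for t in terms if low <= weights[t] <= high]
    return cleaned
```

Collecting every (query, term) pair that was dropped would give the blacklist described above; the per-document filter here is the simpler equivalent for a single pass.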

Table 3 shows some of the non-tag words that can be discarded during data cleaning. These words occur either too frequently or too rarely and are meaningless for user searches.

Table 3

Once the training corpus set is obtained, the GibbsLDA++ implementation is chosen for the LDA model. Its source code must be modified so that all occurrences of the same term in the query corpus are initialized to the same topic. In the original code every term occurrence is initialized to a random topic, so a repeated term can start out in several topics. In a vertical domain such as apps, term ambiguity is unlikely, so initializing each term to a single topic fits the domain and also improves the LDA model. For example, LDA training uses 120 topics and 300 iterations and outputs two data sets: the topic-term probability distribution and the document-topic probability distribution.
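The initialization change can be illustrated in Python (GibbsLDA++ itself is C++, and this is not a patch to it). The sketch draws one random topic per distinct term and assigns it to every occurrence, then builds the document-topic and topic-term count tables a collapsed Gibbs sampler starts from; only the starting state changes, and later sampling may still move individual occurrences.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def init_same_term_same_topic(
    docs: List[List[str]], num_topics: int, seed: int = 0
) -> Tuple[List[List[int]], List[List[int]], List[Counter]]:
    """Topic assignments where every occurrence of a term starts in the same
    randomly drawn topic, plus the doc-topic and topic-term count tables."""
    rng = random.Random(seed)
    term_topic: Dict[str, int] = {}
    assignments: List[List[int]] = []
    doc_topic = [[0] * num_topics for _ in docs]
    topic_term = [Counter() for _ in range(num_topics)]
    for d, doc in enumerate(docs):
        z_doc: List[int] = []
        for term in doc:
            if term not in term_topic:
                term_topic[term] = rng.randrange(num_topics)  # one draw per term type
            k = term_topic[term]
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_term[k][term] += 1
        assignments.append(z_doc)
    return assignments, doc_topic, topic_term
```

Calling it with `num_topics=120` mirrors the topic count quoted above; the original behaviour would instead call `rng.randrange` once per occurrence.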

This scheme then computes the tag system of each query from the query-topic probability distribution and the topic-keyword probability distribution, which includes:

computing the query-keyword probability distribution from the query-topic and topic-keyword probability distributions; and, for each query, sorting the keywords by their probability with respect to that query in descending order and selecting the top fifth-preset-threshold number of keywords.

Computing the query-keyword probability distribution from the query-topic and topic-keyword probability distributions includes: for each query, obtaining the probability of each topic with respect to the query from the query-topic distribution; for each topic, obtaining the probability of each keyword with respect to the topic from the topic-keyword distribution; for each keyword, taking the product of its probability under a topic and that topic's probability for the query as the keyword's probability for the query via that topic; and summing these products over all topics to obtain the keyword's probability for the query.

This step generates the initial LDA tags. LDA outputs the topic distribution for each query and the term distribution for each topic. To obtain the tags of each query, the topic distribution and the term distribution are each sorted by probability in descending order; the top 50 topics for each query and the top 120 terms under each topic are selected, and the term probabilities are weighted by the topic probabilities. Each tag term thus receives an LDA weight representing its importance for the query, and sorting by this weight in descending order gives the LDA-generated tag list. This list still contains a lot of noise, and the order of its tags is not yet accurate.
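The marginalization P(w | q) = Σ_t P(w | t) · P(t | q), with the truncation to the top 50 topics and top 120 terms, can be sketched as below. The dictionary layout of the two LDA outputs is an assumption about how they have been loaded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def query_term_distribution(
    query_topics: Dict[int, float],
    topic_terms: Dict[int, Dict[str, float]],
    top_topics: int = 50,
    top_terms: int = 120,
) -> List[Tuple[str, float]]:
    """LDA tag weights for one query: sum over its top topics of
    P(topic | query) * P(term | topic), restricted to each topic's top terms,
    sorted by weight in descending order."""
    scores: Dict[str, float] = defaultdict(float)
    best_topics = sorted(query_topics.items(), key=lambda kv: kv[1], reverse=True)[:top_topics]
    for topic, p_topic in best_topics:
        best_terms = sorted(topic_terms[topic].items(), key=lambda kv: kv[1], reverse=True)[:top_terms]
        for term, p_term in best_terms:
            scores[term] += p_topic * p_term
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Slicing the result to the fifth preset threshold gives the first-stage tag list; because of the truncation the weights are no longer an exact probability distribution, which is one reason the later reranking steps are needed.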

The LDA predictions are then fine-tuned so that the important tags of each query move further up. In one embodiment of the invention, computing the tag system of each query from the query-topic and topic-keyword probability distributions further includes: taking the top fifth-preset-threshold number of keywords selected for each query as its first-stage tag system; for the first-stage tag system of each query, computing the semantic relationship value between each keyword in it and the query; for each keyword, taking the product of its semantic relationship value and its probability for the query as its corrected probability for the query; and sorting the keywords in the query's first-stage tag system by corrected probability in descending order, selecting the top sixth-preset-threshold number of keywords as the query's tag system.

Computing the semantic relationship value between each keyword in a query's first-stage tag system and the query includes: obtaining the query sequence set for multiple query sessions from the queries in each query session; training the query sequence set to obtain an N-dimensional keyword vector file; computing from this file the word vector of the keyword and the word vector of each term in the query; computing the cosine similarity between the keyword's vector and each term's vector as the semantic relationship value between the keyword and that term; and summing these values over all terms as the semantic relationship value between the keyword and the query.

For example, the semantic relation between each tag word and the query is computed with the trained term vectors term_w2v_300.dict as follows: the cosine similarity between the tag's vector and the vector of each word in the query is computed and the results are added up. The larger the value, the more important the tag; the value is then multiplied by the tag's LDA weight and the tags are re-sorted in descending order.
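The semantic rerank can be sketched as follows, assuming the query has already been segmented and the term vectors loaded into a dictionary. Treating an out-of-vocabulary tag or query term as contributing zero similarity is an assumption the text does not address.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_rerank(
    query_terms: Sequence[str],
    candidates: List[Tuple[str, float]],
    term_vectors: Dict[str, Sequence[float]],
    keep: int,
) -> List[Tuple[str, float]]:
    """Corrected weight = LDA weight * sum of cosine similarities between the
    tag vector and each query term vector; keep the top `keep` tags."""
    rescored = []
    for tag, lda_weight in candidates:
        relation = 0.0
        if tag in term_vectors:
            relation = sum(cosine(term_vectors[tag], term_vectors[t]) for t in query_terms if t in term_vectors)
        rescored.append((tag, lda_weight * relation))
    rescored.sort(key=lambda kv: kv[1], reverse=True)
    return rescored[:keep]
```

Because cosine similarity can be negative, a tag whose vector points away from the query terms is pushed to the bottom of the list, which is the intended effect of this correction.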

Specifically, training the query sequence set to obtain the N-dimensional keyword vector file includes: performing word segmentation on the query sequence set and training the segmented set with the deep learning toolkit word2vec to produce the N-dimensional keyword vector file.

For example, the query sequence set above is segmented into Chinese words and trained with word2vec to produce 300-dimensional vectors, written to a second vector file, term_w2v_300.dict, which is the keyword vector file.

Further still, in one embodiment of the invention, computing the tag system of each query from the query-topic and topic-keyword probability distributions further includes: taking the top sixth-preset-threshold number of keywords selected for each query as its second-stage tag system; for each query's second-stage tag system, computing the TF-IDF value of each keyword in the query's training corpus; for each keyword, taking the product of its probability for the query and its TF-IDF value as its second corrected probability for the query; and sorting the keywords in the second-stage tag system by second corrected probability in descending order, selecting the top K keywords as the query's tag system.

For example, the tags are weighted by their TF-IDF weights in the expanded query corpus, the weights are normalized, and the tags are re-ordered accordingly.

After these two corrections, the accuracy of the tag order in expressing query intent improves significantly.

In one embodiment of the invention, selecting the top K keywords as the query's tag system includes: obtaining the number of times the query was searched within a preset period from the query session log of the app search engine; and selecting the top K keywords as the query's tag system according to that count, where K is a piecewise-linear (polyline) function of the query's search count.

This step determines how many tags each query gets by keeping its top k tag words, with k a piecewise-linear function of the query's search count. Each query keeps between 2 and 5 tags, with a precision of 88% and a recall of 75%. This step produces the query intent dictionary query_intent_tag.txt.
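The second correction and the choice of K can be sketched together. The breakpoints of the polyline below are hypothetical, chosen only so that K runs from 2 to 5 as stated; the patent gives no breakpoints, and normalizing the kept weights so they sum to 1 is likewise an assumption about the normalization mentioned above.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

# Hypothetical (search_count, K) breakpoints; only the 2..5 range is from the text.
BREAKPOINTS: Sequence[Tuple[float, float]] = [(0, 2.0), (100, 3.0), (1000, 4.0), (10000, 5.0)]


def polyline(x: float, points: Sequence[Tuple[float, float]] = BREAKPOINTS) -> float:
    """Piecewise-linear interpolation, clamped to the first and last points."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect.bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def final_tags(candidates: List[Tuple[str, float]], tfidf: Dict[str, float], search_count: int) -> List[Tuple[str, float]]:
    """Second corrected weight = probability * TF-IDF; keep the top K,
    K = floor(polyline(search_count)), and normalize the kept weights."""
    rescored = sorted(((tag, p * tfidf.get(tag, 0.0)) for tag, p in candidates), key=lambda kv: kv[1], reverse=True)
    kept = rescored[: int(polyline(search_count))]
    total = sum(w for _, w in kept)
    return [(tag, w / total) for tag, w in kept] if total else kept
```

Giving frequently searched queries more tags reflects that their expanded corpora are larger and their tag lists more reliable; rare queries keep only their strongest tags.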

Further, in a specific example, this scheme labeled about 2.6 million queries with tag words expressing user intent, treating each query as a whole. When a user rewrites a query with a synonymous reformulation, the new query is not in the query intent dictionary, so the semantic similarity between the new query and the queries in the dictionary must be computed and the intent tags of similar queries assigned to the new query. The computation is as follows: the term vectors of the words in the new query are summed to form its query vector; the Euclidean distances between this vector and the query vectors of the intent dictionary are computed and the 3 nearest queries are selected (a KD-tree can reduce the computational cost); the Euclidean distances, smoothed with a Gaussian kernel, serve as weights for the tag words; and the intent tags of the 3 neighbouring queries are combined to generate the intent tags of the new query. Keeping the top 3 tags captures the user's search intent, with a precision of 80%.
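A sketch of tag inference for an unseen query, under stated assumptions: the kernel is exp(−d² / 2σ²) with a hypothetical bandwidth σ, each neighbour contributes its kernel weight to every one of its tags, and the dictionary's query vectors are precomputed. A brute-force neighbour search stands in for the KD-tree; at scale, SciPy's `cKDTree` would be a natural replacement.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def query_vector(terms: Sequence[str], term_vectors: Dict[str, Sequence[float]], dim: int) -> List[float]:
    """Sum of the term vectors of the query's words (unknown words are skipped)."""
    vec = [0.0] * dim
    for term in terms:
        for i, x in enumerate(term_vectors.get(term, ())):
            vec[i] += x
    return vec


def infer_intent(
    terms: Sequence[str],
    term_vectors: Dict[str, Sequence[float]],
    intent_vectors: Dict[str, Sequence[float]],
    intent_tags: Dict[str, List[str]],
    k: int = 3,
    sigma: float = 1.0,  # hypothetical kernel bandwidth
    keep: int = 3,
) -> List[str]:
    """Tags of a new query from its k nearest dictionary queries, each
    neighbour weighted by a Gaussian kernel of its Euclidean distance."""
    dim = len(next(iter(intent_vectors.values())))
    q = query_vector(terms, term_vectors, dim)
    distances = {name: math.dist(q, vec) for name, vec in intent_vectors.items()}
    neighbours = sorted(distances, key=distances.get)[:k]
    scores: Dict[str, float] = defaultdict(float)
    for name in neighbours:
        weight = math.exp(-distances[name] ** 2 / (2 * sigma ** 2))
        for tag in intent_tags[name]:
            scores[tag] += weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, _ in ranked[:keep]]
```

The kernel makes an exact match dominate (weight 1 at distance 0) while still letting nearby reformulations add tags the exact match lacks.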

Fig. 2 shows a flowchart of an app search method according to an embodiment of the invention. The method includes:

Step S210: build a query tag database containing the tag systems of multiple queries.

Step S220: receive the current query uploaded by a client and obtain the tag system of the current query from the query tag database.

Step S230: compute the degree of association between the tag system of the current query and the tag system of each app.

Step S240: when the degree of association between the tag system of the current query and the tag system of an app satisfies a preset condition, return the app's information to the client for display.

When the query tag database is built in step S210, the tag system of each query is mined in the same way as in any of the method embodiments shown in Fig. 1.

In one embodiment of the invention, obtaining the tag system of the current query from the query tag database includes: computing the semantic similarity between the current query and each query in the query tag database; sorting by semantic similarity in descending order and selecting the top first-preset-threshold number of queries; and obtaining the tag system of the current query from the tag systems of the selected queries.

In one embodiment of the invention, computing the semantic similarity between the current query and each query in the query tag database includes: computing the Euclidean distance between the current query and each query in the database, and taking the Euclidean distance between each query and the current query as that query's semantic similarity. Obtaining the tag system of the current query from the tag systems of the selected queries includes: using each query's semantic similarity as the weight of every tag in that query's tag system; for the tags of the selected queries' tag systems, adding up the weights of identical tags to obtain the final weight of each tag; and sorting by final weight in descending order, selecting the top second-preset-threshold number of tags as the tag system of the current query.

Table 4 shows the intent tag words of some queries in 360 Mobile Assistant search.

Table 4

Fig. 3 shows a device for recognizing app search intent according to an embodiment of the invention. The app search intent recognition device 300 includes:

an acquisition unit 310, adapted to obtain the queries in each query session from the query session log of the app search engine;

a mining unit 320, adapted to mine the tag system of each query according to the queries in each query session and a preset strategy; and

a recognition unit 330, adapted to recognize the app search intent corresponding to each query according to its tag system.

In one embodiment of the invention, the mining unit 320 is adapted to obtain a training corpus set from the queries in each query session; feed the training corpus set into an LDA model for training to obtain the query-topic probability distribution and the topic-keyword probability distribution output by the model; and compute the tag system of each query from these two distributions.

In one embodiment of the invention, the mining unit 320 is adapted to obtain the raw corpus of each query from the queries in each query session, the raw corpora of all queries forming a raw corpus set, and to preprocess the raw corpus set to obtain the training corpus set.

Specifically, in one embodiment of the invention, the mining unit 320 is adapted to obtain, from the queries in each query session, the query sequence set and the query set corresponding to multiple query sessions; train the query sequence set to obtain an N-dimensional query vector file; for each query in the query set, compute the degree of association between it and every other query from the N-dimensional query vector file; and take the other queries whose degree of association with it satisfies a preset condition as its raw corpus.

That is, the mining unit 320 is adapted to, for each query session, arrange its queries in order into a sequence; if an app was downloaded after a query in the sequence, insert the name of the downloaded app into the sequence immediately after that query, obtaining the query sequence of that session; and take the set of queries in the multiple query sessions as their query set.

For example, the mining unit 320 is adapted to treat each query in the query sequence set as one word and train the set with the deep learning toolkit word2vec to produce the N-dimensional query vector file.

On this basis, in one embodiment of the invention, the mining unit 320 is adapted to apply the KNN algorithm to the query set and the N-dimensional query vector file, computing the distance between every two queries in the query set from the vector file; and, for each query in the query set, rank the other queries by their distance to it, nearest first, and take the top first-preset-threshold number of queries as its raw corpus.

During preprocessing, in one embodiment of the invention, the mining unit 320 is adapted to perform word segmentation on each raw corpus in the raw corpus set, obtaining a segmentation result that contains multiple tokens. It then finds the phrases formed by adjacent tokens in the segmentation result and retains those phrases, together with the tokens in the segmentation result that are nouns or verbs, as the keywords retained for that raw corpus.
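The retention rule can be sketched as below, assuming the raw corpus has already been segmented and part-of-speech tagged by some tokenizer, and that the phrase pairs have already been detected. The `"n"`/`"v"` tag scheme, the space-joined phrase form, and the function name are hypothetical choices for illustration.

```python
from typing import List, Set, Tuple

# (token, part-of-speech tag); "n" = noun and "v" = verb is a hypothetical tag scheme.
Token = Tuple[str, str]
KEEP_TAGS = {"n", "v"}


def retained_keywords(tokens: List[Token], phrases: Set[Tuple[str, str]]) -> List[str]:
    """Keep phrases formed by adjacent tokens, plus every noun or verb token."""
    keywords: List[str] = []
    # Phrases: adjacent token pairs previously judged to form a phrase.
    for (left, _), (right, _) in zip(tokens, tokens[1:]):
        if (left, right) in phrases:
            keywords.append(f"{left} {right}")
    # Single tokens whose part of speech is noun or verb.
    keywords.extend(token for token, tag in tokens if tag in KEEP_TAGS)
    return keywords
```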

Specifically, the mining unit 320 is adapted to calculate the cPMId value of every two adjacent tokens in the segmentation result; when the cPMId value of two adjacent tokens exceeds a second preset threshold, the two tokens are determined to form a phrase.
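The text does not spell out the cPMId formula, so the sketch below swaps in plain document-level pointwise mutual information (PMI) as a simplified stand-in; cPMId additionally corrects for the significance of co-occurrence. Each segmented raw corpus is treated as one document, and `threshold` plays the role of the second preset threshold. Function and parameter names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple


def detect_phrases(docs: List[List[str]], threshold: float) -> Set[Tuple[str, str]]:
    """Mark adjacent token pairs as phrases when their document-level PMI,
    log(d(x,y) * D / (d(x) * d(y))), exceeds `threshold`.
    Plain PMI is a simplified stand-in for cPMId here."""
    n_docs = len(docs)
    token_df: Counter = Counter()  # documents containing each token
    pair_df: Counter = Counter()   # documents containing each adjacent pair
    for doc in docs:
        token_df.update(set(doc))
        pair_df.update(set(zip(doc, doc[1:])))
    phrases: Set[Tuple[str, str]] = set()
    for (x, y), d_xy in pair_df.items():
        pmi = math.log(d_xy * n_docs / (token_df[x] * token_df[y]))
        if pmi > threshold:
            phrases.add((x, y))
    return phrases
```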

Further, in one embodiment of the invention, the mining unit 320 is also adapted to take the keywords retained for the raw corpus of each search term as the first-stage training corpus of that search term. The first-stage training corpora of all search terms form a first-stage training corpus set, and data cleaning is then performed on the keywords in that set.

Specifically, in one embodiment of the invention, the mining unit 320 is adapted to calculate, for the first-stage training corpus of each search term in the first-stage training corpus set, the TF-IDF value of each keyword in that corpus. Keywords whose TF-IDF value is above a third preset threshold and/or below a fourth preset threshold are deleted, yielding the training corpus of the search term. The training corpora of all search terms form the training corpus set.
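The cleaning step can be sketched as follows, treating each search term's keyword list as one document. The TF-IDF variant (term frequency as count divided by length, inverse document frequency as the natural log of document count over document frequency) is an assumption, since the text does not fix one; `upper` and `lower` stand for the third and fourth preset thresholds.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(corpora: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF of each keyword within each search term's keyword list:
    tf = count / length, idf = ln(number of documents / document frequency)."""
    n_docs = len(corpora)
    doc_freq: Counter = Counter()
    for keywords in corpora.values():
        doc_freq.update(set(keywords))
    scores: Dict[str, Dict[str, float]] = {}
    for term, keywords in corpora.items():
        counts = Counter(keywords)
        scores[term] = {kw: (c / len(keywords)) * math.log(n_docs / doc_freq[kw])
                        for kw, c in counts.items()}
    return scores


def clean_corpora(corpora: Dict[str, List[str]], upper: float, lower: float) -> Dict[str, List[str]]:
    """Drop keywords whose TF-IDF is above `upper` or below `lower`;
    repeated keywords that survive keep their counts."""
    scores = tfidf_scores(corpora)
    return {term: [kw for kw in keywords if lower <= scores[term][kw] <= upper]
            for term, keywords in corpora.items()}
```

A keyword that appears in every search term's corpus gets an IDF of zero and is removed by any positive lower bound, which matches the intent of filtering out overly generic words.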

In one embodiment of the invention, the mining unit 320 is adapted to calculate a search term-keyword probability distribution from the search term-topic probability distribution and the topic-keyword probability distribution. Then, for each search term, the keywords are sorted in descending order of their probability with respect to that search term, and the top keywords, their number given by a fifth preset threshold, are selected.

In one embodiment of the invention, the mining unit 320 is adapted to obtain, for each search term, the probability of each topic with respect to that search term from the search term-topic probability distribution, and, for each topic, the probability of each keyword with respect to that topic from the topic-keyword probability distribution. For each keyword, the product of its probability with respect to a topic and that topic's probability with respect to the search term is taken as the keyword's probability with respect to the search term via that topic. The sum of these products over all topics is taken as the keyword's probability with respect to the search term.
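In formula form, the combination described above is P(w | q) = sum over topics t of P(w | t) * P(t | q), where q is a search term, t a topic and w a keyword. The sketch below applies it to plain dictionaries that stand in for the two LDA outputs (the data layout and names are hypothetical) and then keeps the top `n` keywords, with `n` playing the role of the fifth preset threshold.

```python
from typing import Dict, List

TopicDist = Dict[str, float]               # topic -> P(topic | search term)
KeywordDist = Dict[str, Dict[str, float]]  # topic -> {keyword: P(keyword | topic)}


def keyword_distribution(term_topics: TopicDist, topic_keywords: KeywordDist) -> Dict[str, float]:
    """P(keyword | search term) = sum over topics of P(keyword | topic) * P(topic | search term)."""
    result: Dict[str, float] = {}
    for topic, p_topic in term_topics.items():
        for keyword, p_kw in topic_keywords.get(topic, {}).items():
            result[keyword] = result.get(keyword, 0.0) + p_kw * p_topic
    return result


def top_keywords(dist: Dict[str, float], n: int) -> List[str]:
    """The n most probable keywords, highest probability first (ties by name)."""
    ranked = sorted(dist.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:n]]
```

With P(t1 | q) = 0.75, P(t2 | q) = 0.25, P(music | t1) = 0.5 and P(music | t2) = 0.25, the keyword "music" receives 0.5 * 0.75 + 0.25 * 0.25 = 0.4375.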

Further, in one embodiment of the invention, the mining unit 320 is also adapted to take the keywords selected for each search term according to the fifth preset threshold as the first-stage label system of that search term. For the first-stage label system of each search term, it calculates a semantic relationship value between each keyword in that label system and the search term. For each keyword, the product of its semantic relationship value and its probability with respect to the search term is taken as its corrected probability with respect to the search term. The keywords in the first-stage label system are then sorted in descending order of corrected probability, and the top keywords, their number given by a sixth preset threshold, form the label system of the search term.
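The correction is a multiply-and-rerank step, sketched below with a hypothetical `rerank` helper. Giving a factor of 0.0 to any keyword without a semantic relationship value is an assumption of this sketch. The same pattern also fits the second-stage correction described below, with TF-IDF values in place of semantic relationship values.

```python
from typing import Dict, List


def rerank(probabilities: Dict[str, float], factors: Dict[str, float], top_n: int) -> List[str]:
    """Multiply each keyword's probability by its factor (here the semantic
    relationship value; missing factors count as 0.0), then keep the top_n
    keywords by corrected score, ties broken by name."""
    corrected = {kw: p * factors.get(kw, 0.0) for kw, p in probabilities.items()}
    ranked = sorted(corrected.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:top_n]]
```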

In one embodiment of the invention, the mining unit 320 is adapted to obtain, from the search terms in each query session, the search term sequence set corresponding to multiple query sessions, and to train on that sequence set to obtain an N-dimensional keyword vector file. Using the N-dimensional keyword vector file, it computes the word vector of the keyword and the word vector of each token in the search term. The cosine similarity between the keyword's vector and each token's vector is taken as the semantic relationship value between the keyword and that token, and the sum of these values over all tokens is taken as the semantic relationship value between the keyword and the search term.
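A sketch of the summed cosine similarity follows. Toy vectors stand in for the N-dimensional keyword vector file, the search term is assumed to be already segmented into tokens, and skipping tokens that have no vector is an assumption of this sketch rather than something the text specifies.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_relationship(keyword: str, query_tokens: List[str],
                          vectors: Dict[str, Sequence[float]]) -> float:
    """Sum of cosine similarities between the keyword's vector and the vector
    of each token of the search term; tokens without a vector are skipped."""
    kw_vec = vectors[keyword]
    return sum(cosine(kw_vec, vectors[tok]) for tok in query_tokens if tok in vectors)
```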

In one embodiment of the invention, the mining unit 320 is adapted to perform word segmentation on the search term sequence set and to train the deep learning toolkit word2vec on the segmented sequence set, generating the N-dimensional keyword vector file.

Further, in one embodiment of the invention, the mining unit 320 is also adapted to take the keywords selected for each search term according to the sixth preset threshold as the second-stage label system of that search term. For the second-stage label system of each search term, it computes the TF-IDF value of each keyword in that search term's training corpus. For each keyword, the product of its probability with respect to the search term and its TF-IDF value is taken as its second-order corrected probability with respect to the search term. The keywords in the second-stage label system are then sorted in descending order of second-order corrected probability, and the top K keywords form the label system of the search term.

In one embodiment of the invention, the mining unit 320 is adapted to obtain, from the query session log of the application search engine, the number of queries for the search term within a preset time period, and to select the top K keywords according to that query count to form the label system of the search term, where K is a piecewise-linear (polyline) function of the search term's query count.
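A piecewise-linear K can be sketched by linear interpolation between breakpoints. The text gives no breakpoints, so the `(query_count, K)` pairs below are purely hypothetical, as is clamping K at both ends of the range.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical (query_count, K) breakpoints; the source gives no concrete values.
BREAKPOINTS: List[Tuple[float, float]] = [(0, 3), (100, 5), (10_000, 10), (1_000_000, 20)]


def label_count(query_count: float, points: List[Tuple[float, float]] = BREAKPOINTS) -> int:
    """K as a piecewise-linear (polyline) function of the query count,
    interpolated between breakpoints and clamped at both ends."""
    xs = [x for x, _ in points]
    if query_count <= xs[0]:
        return int(points[0][1])
    if query_count >= xs[-1]:
        return int(points[-1][1])
    i = bisect_right(xs, query_count)  # xs[i - 1] <= query_count < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return int(round(y0 + (y1 - y0) * (query_count - x0) / (x1 - x0)))
```

Frequently queried search terms thus receive longer label lists, while rare ones are limited to a few labels.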

Fig. 4 shows an application search server according to one embodiment of the invention. The application search server 400 includes:

Database construction unit 410, adapted to build a search term label database that contains the label systems of multiple search terms;

Interaction unit 420, adapted to receive the current search term uploaded by a client;

Search processing unit 430, adapted to obtain the label system of the current search term from the search term label database, and to calculate the degree of association between the label system of the current search term and the label system of each application;

The interaction unit 420 is further adapted, when the degree of association between the label system of the current search term and the label system of an application meets a preset condition, to return information about that application to the client for display.

The scheme by which the database construction unit 410 mines the label systems of search terms while building the search term label database is the same as the scheme by which the device 300 for recognizing application search intentions mines the label systems of search terms in any of the above embodiments of the invention.

In one embodiment of the invention, the search processing unit 430 is adapted to calculate the semantic similarity between the current search term and each search term in the search term label database, sort the search terms in descending order of semantic similarity, and select the top search terms, their number given by a first preset threshold. The label system of the current search term is then obtained from the label systems of the selected search terms.

In one embodiment of the invention, the search processing unit 430 is adapted to calculate the Euclidean distance between the current search term and each search term in the search term label database, and to use the Euclidean distance between each search term and the current search term as the semantic similarity of that search term. The semantic similarity of each search term serves as the weight of each label in that search term's label system. Across the label systems of all selected search terms, the weights of identical labels are added to obtain the final weight of each label. The labels are then sorted in descending order of final weight, and the top labels, their number given by a second preset threshold, form the label system of the current search term.
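The label aggregation of the two paragraphs above can be sketched as follows. Because a larger Euclidean distance means the terms are less alike, this sketch assumes the distance is first turned into a similarity with 1 / (1 + d) so that closer search terms receive larger weights; that conversion, the database layout, and the names are hypothetical. `first_n` and `second_n` stand for the first and second preset thresholds.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical database layout: search term -> (vector, label list).
LabelDB = Dict[str, Tuple[Sequence[float], List[str]]]


def current_labels(query_vec: Sequence[float], db: LabelDB, first_n: int, second_n: int) -> List[str]:
    """Weight the labels of the first_n closest stored search terms by similarity,
    sum the weights of identical labels, and return the second_n heaviest labels."""
    scored = []
    for term, (vec, labels) in db.items():
        # Assumption: map Euclidean distance to a similarity in (0, 1].
        similarity = 1.0 / (1.0 + math.dist(query_vec, vec))
        scored.append((similarity, term, labels))
    scored.sort(key=lambda item: (-item[0], item[1]))
    weights: Dict[str, float] = defaultdict(float)
    for similarity, _, labels in scored[:first_n]:
        for label in labels:
            weights[label] += similarity
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:second_n]]
```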

It should be noted that the embodiments of the devices shown in Figs. 3 and 4 correspond to the embodiments of the methods shown in Figs. 1 and 2, which have been described in detail above and are not repeated here.

In summary, the method and device for recognizing application search intentions and the application search method and server of the present invention propose a user intention recognition method, namely a labeling method matched to the label systems of apps, which flexibly expresses users' fine-grained query intentions. The label systems of user intentions are built with unsupervised machine learning techniques, abandoning the traditional user intention classification approach and realizing an automated user intention mining pipeline that can generate user intention label lists with high accuracy and high recall. User intentions and apps can be mapped into the same label system, which solves both the user intention recognition problem and the relevance computation problem of the application search engine, and lays the foundation for functional search, a core technology of application search engines.

It should be noted that:

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for recognizing application search intentions and of the application search server according to embodiments of the invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.

Claims

1. A method for recognizing application search intentions, comprising:

2. The method of claim 1, wherein mining the label system of each search term according to the search terms in each query session and a preset strategy comprises:

obtaining a training corpus set according to the search terms in each query session;

inputting the training corpus set into an LDA model for training, to obtain the search term-topic probability distribution and the topic-keyword probability distribution output by the LDA model;

calculating the label system of each search term according to the search term-topic probability distribution and the topic-keyword probability distribution.

3. The method of claim 1 or 2, wherein obtaining the training corpus set according to the search terms in each query session comprises:

the raw corpora of the search terms forming a raw corpus set; and preprocessing the raw corpus set to obtain the training corpus set.

4. The method of any one of claims 1 to 3, wherein obtaining the raw corpus of each search term according to the search terms in each query session comprises:

obtaining, according to the search terms in each query session, the search term sequence set corresponding to multiple query sessions, and obtaining the search term set corresponding to the multiple query sessions;

for each search term in the search term set, calculating the degree of association between the search term and each other search term according to the N-dimensional search term vector file; and taking the other search terms whose degree of association with the search term meets a preset condition as the raw corpus of the search term.

5. An application search method, comprising:

receiving a current search term uploaded by a client, and obtaining the label system of the current search term according to the search term label database;

when the degree of association between the label system of the current search term and the label system of an application meets a preset condition, returning information about the application to the client for display;

wherein the search term label database is built by the method of any one of claims 1 to 4.

6. A device for recognizing application search intentions, comprising:

a mining unit, adapted to mine the label system of each search term according to the search terms in each query session and a preset strategy;

a recognition unit, adapted to recognize the application search intention corresponding to each search term according to the label system of that search term.

7. The device of claim 6, wherein

the mining unit is adapted to obtain a training corpus set according to the search terms in each query session; to input the training corpus set into an LDA model for training, obtaining the search term-topic probability distribution and the topic-keyword probability distribution output by the LDA model; and to calculate the label system of each search term according to the search term-topic probability distribution and the topic-keyword probability distribution.

8. The device of claim 6 or 7, wherein

the mining unit is adapted to obtain the raw corpus of each search term according to the search terms in each query session, the raw corpora of the search terms forming a raw corpus set; and to preprocess the raw corpus set to obtain the training corpus set.

9. The device of any one of claims 6 to 8, wherein the mining unit is adapted to obtain, according to the search terms in each query session, the search term sequence set corresponding to multiple query sessions, and to obtain the search term set corresponding to the multiple query sessions; to train on the search term sequence set to obtain an N-dimensional search term vector file; for each search term in the search term set, to calculate the degree of association between the search term and each other search term according to the N-dimensional search term vector file; and to take the other search terms whose degree of association with the search term meets a preset condition as the raw corpus of the search term.

10. An application search server, comprising:

a database construction unit, adapted to build a search term label database that contains the label systems of multiple search terms;

a search processing unit, adapted to obtain the label system of the current search term according to the search term label database, and to calculate the degree of association between the label system of the current search term and the label system of each application;

the interaction unit, further adapted, when the degree of association between the label system of the current search term and the label system of an application meets a preset condition, to return information about the application to the client for display;

wherein the process by which the database construction unit builds the search term label database is the same as that of the device for recognizing application search intentions of any one of claims 6 to 9.