CN106354852A - Search method and device based on artificial intelligence - Google Patents

Search method and device based on artificial intelligence

Info

Publication number
CN106354852A
Authority
CN
China
Prior art keywords
search results
search
region
similarity
space vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610800766.7A
Other languages
Chinese (zh)
Inventor
李辰
廖梦
姜迪
石磊
王昕煜
何径舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610800766.7A
Publication of CN106354852A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a search method and device based on artificial intelligence. A search operation is not executed by completely relying on a search keyword any more but executed by combining the region where a user is located, and search results similar to the region where the user is located and the search keyword are taken as search results, so that the search results basically meet a real intention of the user. Accordingly, the problem that in the prior art, data interaction between an application and a search engine is increased due to the fact that the user repeatedly performs searching through the application can be avoided, and then the processing burden of the search engine is reduced.

Description

Search method and device based on artificial intelligence
[Technical Field]
The present invention relates to Internet technology, and more particularly to a search method and device based on artificial intelligence.
[Background]
Artificial intelligence, abbreviated AI, is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science. It attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems.
A search engine is a system that collects information from the Internet according to a certain strategy using specific computer programs, organizes and processes that information, provides a search service to users, and presents the information relevant to a user's search. According to the National Bureau of Statistics, the number of Internet users in China has exceeded 400 million, which means that China has surpassed the United States to become the country with the most Internet users in the world, and the total number of websites in China has exceeded 2 million. How to use search services to satisfy user needs to the greatest extent has therefore always been an important problem for Internet companies. A user can provide a search keyword to a related application, and the application sends the search keyword to a search engine. The search engine then searches a database according to the search keyword to obtain search results that match it, and returns them to the application for output.
However, the search keyword provided by the user may have a certain geographic preference, for example a search keyword related to a subway line. Executing the search operation by relying entirely on the search keyword may yield search results that fail to meet the user's real intention, so that the user has to search repeatedly through the application. This increases the data interaction between the application and the search engine and thereby increases the processing load of the search engine.
[Summary of the Invention]
Aspects of the present invention provide a search method and device based on artificial intelligence, so as to reduce the processing load of the search engine.
One aspect of the present invention provides a search method based on artificial intelligence, comprising:
obtaining a search keyword provided by a user;
obtaining at least one search result according to the search keyword;
obtaining a region similarity between the region where the user is located and each of the at least one search result, and a keyword similarity between the search keyword and each of the at least one search result;
obtaining a ranking similarity of each search result according to the region similarity and the keyword similarity;
ranking the at least one search result according to the ranking similarity of each search result; and
outputting the at least one ranked search result.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining the region similarity between the region where the user is located and each of the at least one search result comprises:
obtaining a space vector of the region according to the region, using a first neural network;
obtaining a first space vector of each search result according to each search result, using the first neural network; and
obtaining the region similarity between the region and each search result according to the space vector of the region and the first space vector of each search result.
In the aspect above and any possible implementation, an implementation is further provided in which, before obtaining the space vector of the region according to the region using the first neural network, the method further comprises:
obtaining positive example samples and negative example samples corresponding to the same region according to user historical behavior data;
combining the positive example samples and the negative example samples corresponding to the same region in pairs to form paired samples, which serve as region training data; and
building the first neural network using the region training data.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining the keyword similarity between the search keyword and each of the at least one search result comprises:
obtaining a space vector of the search keyword according to the search keyword, using a second neural network;
obtaining a second space vector of each search result according to each search result, using the second neural network; and
obtaining the keyword similarity between the search keyword and each search result according to the space vector of the search keyword and the second space vector of each search result.
In the aspect above and any possible implementation, an implementation is further provided in which, before obtaining the space vector of the search keyword according to the search keyword using the second neural network, the method further comprises:
obtaining positive example samples and negative example samples corresponding to the same search keyword according to user historical behavior data;
combining the positive example samples and the negative example samples corresponding to the same search keyword in pairs to form paired samples, which serve as keyword training data; and
building the second neural network using the keyword training data.
Another aspect of the present invention provides a search device based on artificial intelligence, comprising:
an acquiring unit, configured to obtain a search keyword provided by a user;
a matching unit, configured to obtain at least one search result according to the search keyword;
a preprocessing unit, configured to obtain a region similarity between the region where the user is located and each of the at least one search result, and a keyword similarity between the search keyword and each of the at least one search result;
an integration unit, configured to obtain a ranking similarity of each search result according to the region similarity and the keyword similarity;
a ranking unit, configured to rank the at least one search result according to the ranking similarity of each search result; and
an output unit, configured to output the at least one ranked search result.
In the aspect above and any possible implementation, an implementation is further provided in which the preprocessing unit is specifically configured to:
obtain a space vector of the region according to the region, using a first neural network;
obtain a first space vector of each search result according to each search result, using the first neural network; and
obtain the region similarity between the region and each search result according to the space vector of the region and the first space vector of each search result.
In the aspect above and any possible implementation, an implementation is further provided in which the preprocessing unit is further configured to:
obtain positive example samples and negative example samples corresponding to the same region according to user historical behavior data;
combine the positive example samples and the negative example samples corresponding to the same region in pairs to form paired samples, which serve as region training data; and
build the first neural network using the region training data.
In the aspect above and any possible implementation, an implementation is further provided in which the preprocessing unit is specifically configured to:
obtain a space vector of the search keyword according to the search keyword, using a second neural network;
obtain a second space vector of each search result according to each search result, using the second neural network; and
obtain the keyword similarity between the search keyword and each search result according to the space vector of the search keyword and the second space vector of each search result.
In the aspect above and any possible implementation, an implementation is further provided in which the preprocessing unit is further configured to:
obtain positive example samples and negative example samples corresponding to the same search keyword according to user historical behavior data;
combine the positive example samples and the negative example samples corresponding to the same search keyword in pairs to form paired samples, which serve as keyword training data; and
build the second neural network using the keyword training data.
As can be seen from the above technical solution, in the embodiments of the present invention, at least one search result is obtained according to the search keyword provided by the user; the region similarity between the region where the user is located and each of the at least one search result, and the keyword similarity between the search keyword and each of the at least one search result, are then obtained; and the ranking similarity of each search result is obtained according to the region similarity and the keyword similarity, so that the at least one search result can be ranked according to the ranking similarity of each search result and the at least one ranked search result can be output. Because the search operation no longer relies entirely on the search keyword but is also performed in combination with the region where the user is located, and search results similar to both the region where the user is located and the search keyword are taken as the search results, the search results basically meet the user's real intention. This avoids the prior-art problem that repeated searches by the user through the application increase the data interaction between the application and the search engine, and thus reduces the processing load of the search engine.
In addition, with the technical solution provided by the present invention, because the search operation no longer relies entirely on the search keyword but is also performed in combination with the region where the user is located, and search results similar to both the region where the user is located and the search keyword are taken as the search results, the search results basically meet the user's real intention, which improves the validity of the search results.
In addition, with the technical solution provided by the present invention, because the search operation no longer relies entirely on the search keyword but is also performed in combination with the region where the user is located, and search results similar to both the region where the user is located and the search keyword are taken as the search results, the search results basically meet the user's real intention, which improves the efficiency of the search.
In addition, with the technical solution provided by the present invention, the user experience can be effectively improved.
[Brief Description of the Drawings]
To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a search method based on artificial intelligence according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a search device based on artificial intelligence according to another embodiment of the present invention.
[Detailed Description]
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the terminal involved in the embodiments of the present invention may include, but is not limited to, a mobile phone, a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a personal computer (PC), an MP3 player, an MP4 player, and a wearable device (for example, smart glasses, a smart watch, or a smart bracelet).
In addition, the term "and/or" herein only describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of the search method based on artificial intelligence according to an embodiment of the present invention. As shown in Fig. 1:
101. Obtain a search keyword provided by a user.
102. Obtain at least one search result according to the search keyword.
103. Obtain a region similarity between the region where the user is located and each of the at least one search result, and a keyword similarity between the search keyword and each of the at least one search result.
104. Obtain a ranking similarity of each search result according to the region similarity and the keyword similarity.
105. Rank the at least one search result according to the ranking similarity of each search result.
106. Output the at least one ranked search result.
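Steps 103 to 106 can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not say how the region similarity and the keyword similarity are combined, so the weighted sum and the `region_weight` parameter are assumptions, and `token_overlap` is a toy stand-in for the neural similarity models described later. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ScoredResult:
    text: str
    region_sim: float
    keyword_sim: float
    ranking_sim: float


def token_overlap(a: str, b: str) -> float:
    """Toy similarity (Jaccard overlap of lowercase tokens); NOT the neural model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_results(
    keyword: str,
    region: str,
    results: Sequence[str],
    region_similarity: Callable[[str, str], float] = token_overlap,
    keyword_similarity: Callable[[str, str], float] = token_overlap,
    region_weight: float = 0.5,  # assumption: the source gives no combination rule
) -> List[ScoredResult]:
    scored = []
    for text in results:  # `results` is the output of step 102
        r_sim = region_similarity(region, text)    # step 103: region similarity
        k_sim = keyword_similarity(keyword, text)  # step 103: keyword similarity
        # Step 104: combine both similarities (weighted sum is an assumption).
        rank = region_weight * r_sim + (1.0 - region_weight) * k_sim
        scored.append(ScoredResult(text, r_sim, k_sim, rank))
    # Step 105: rank by the combined similarity, highest first.
    scored.sort(key=lambda s: s.ranking_sim, reverse=True)
    return scored  # step 106: the caller outputs the ranked list


ranked = rank_results(
    keyword="subway line 1",
    region="Beijing",
    results=["subway line 1 map Shanghai", "subway line 1 map Beijing"],
)
print([r.text for r in ranked])
```

With equal keyword similarity, the result that also matches the user's region moves to the top, which is the effect the method aims for.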
It should be noted that the execution body of some or all of 101 to 106 may be an application located in a local terminal, or a functional unit such as a plug-in or a software development kit (SDK) in an application located in the local terminal, or a search engine located in a network-side server, or a distributed system located on the network side; this embodiment imposes no particular limitation on this.
It can be understood that the application may be a native program (nativeApp) installed on the terminal, or a web program (webApp) of a browser on the terminal; this embodiment imposes no particular limitation on this.
In this way, at least one search result is obtained according to the search keyword provided by the user; the region similarity between the region where the user is located and each of the at least one search result, and the keyword similarity between the search keyword and each of the at least one search result, are then obtained; and the ranking similarity of each search result is obtained according to the region similarity and the keyword similarity, so that the at least one search result can be ranked according to the ranking similarity of each search result and the at least one ranked search result can be output. Because the search operation no longer relies entirely on the search keyword but is also performed in combination with the region where the user is located, and search results similar to both the region where the user is located and the search keyword are taken as the search results, the search results basically meet the user's real intention. This avoids the prior-art problem that repeated searches by the user through the application increase the data interaction between the application and the search engine, and thus reduces the processing load of the search engine.
Optionally, in a possible implementation of this embodiment, in 101, the search keyword provided by the user may be collected. Specifically, this may be done through a search command triggered by the user. The search command may be triggered in, but not limited to, the following ways:
Mode one:
The user may enter the search keyword on the page presented by the current application and then click a search button on that page, for example the Baidu search button, to trigger a search command that contains the search keyword. The user may enter the search keyword in any order. After the search command is received, the search keyword contained in it can be parsed out.
Mode two:
Using asynchronous loading technology, for example Ajax asynchronous loading or JSONP asynchronous loading, the input content that the user enters on the page presented by the current application is obtained in real time. To distinguish it from the search keyword, the input content at this point may be called an input keyword. The user may enter the search keyword in any order. Specifically, interfaces such as an Ajax interface or a JSONP interface may be provided. These interfaces may be written in languages such as Java or Hypertext Preprocessor (PHP), and the calls to them may be written with jQuery or native JavaScript.
Mode three:
The user may press and hold the voice search button on the page presented by the current application, speak the voice content to be entered, and then release the voice search button to trigger a search command. The search command contains the search keyword in text form converted from the spoken voice content. After the search command is received, the search keyword contained in it can be parsed out.
Mode four:
The user may click the voice search button on the page presented by the current application and speak the voice content to be entered. A period of time after the user finishes speaking, for example 2 seconds, the search command is triggered. The search command contains the search keyword in text form converted from the spoken voice content. After the search command is received, the search keyword contained in it can be parsed out.
After the search keyword is obtained, the subsequent operations 102 to 106 can be performed.
Optionally, in a possible implementation of this embodiment, in 102, an existing search method based on artificial intelligence may be used to obtain several search results corresponding to the search keyword. For details, refer to the related prior art, which is not repeated here.
It can be understood that a search result involved in the present invention is a page, which may also be called a web page. It may be a web page written in HyperText Markup Language (HTML), that is, an HTML page; or a web page written in HTML and Java, that is, a Java Server Page (JSP); or a web page written in another language. This embodiment imposes no particular limitation on this.
A page may include display blocks defined by one or more page tags, for example HTML tags or JSP tags. These blocks are called page elements, for example text, pictures, hyperlinks, buttons, edit boxes, and drop-down boxes. To enhance the presentation of the search results provided by the search engine, a search result may include, in addition to the title of the page and the uniform resource locator (URL) of the page, a summary taken from the page.
Optionally, in a possible implementation of this embodiment, in 103, the space vector of the region may be obtained according to the region using the first neural network, and the first space vector of each search result may be obtained according to each search result using the first neural network. The region similarity between the region and each search result, for example a cosine similarity, can then be obtained according to the space vector of the region and the first space vector of each search result.
Specifically, any existing word segmentation method may be used to segment the region and each search result into words; for details, refer to the related prior art, which is not repeated here. Then, according to the word segmentation results of the region, the first neural network is used to obtain the space vector of each word segmentation result of the region, and the space vector of the region is obtained from these. Likewise, according to the word segmentation results of each search result, the first neural network is used to obtain the space vector of each word segmentation result of each search result, and the first space vector of each search result is obtained from these.
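The flow from word segmentation to a cosine similarity can be illustrated with a minimal sketch. Several parts are assumptions made for illustration only: whitespace splitting stands in for a real word segmenter, the small `WORD_VECTORS` table stands in for vectors produced by the first neural network, and averaging the word vectors is just one simple way to obtain a text-level vector (the source does not fix the aggregation at this point).

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy word vectors; in the source they come from the first neural network.
WORD_VECTORS: Dict[str, Vector] = {
    "beijing": [0.9, 0.1, 0.0],
    "subway": [0.1, 0.8, 0.2],
    "shanghai": [0.0, 0.1, 0.9],
}


def segment(text: str) -> List[str]:
    """Whitespace split as a stand-in for a real word segmentation method."""
    return text.lower().split()


def text_vector(text: str, dim: int = 3) -> Vector:
    """Average the vectors of the known words (mean pooling is an assumption)."""
    vectors = [WORD_VECTORS[w] for w in segment(text) if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


region_vec = text_vector("Beijing")
near = cosine_similarity(region_vec, text_vector("Beijing subway"))
far = cosine_similarity(region_vec, text_vector("Shanghai subway"))
print(near > far)
```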
In this implementation, further optionally, before 103, positive example samples and negative example samples corresponding to the same region may be obtained according to user historical behavior data, and the positive example samples and negative example samples corresponding to the same region may be combined in pairs to form paired samples <<r, t, 1><r, t, 0>> (r denotes the region, t denotes the sample data, 0 denotes a negative example, and 1 denotes a positive example), which serve as region training data. The first neural network can then be built using the region training data. The first neural network may include, but is not limited to, a recurrent neural network (RNN), a convolutional neural network (CNN), or a deep neural network (DNN); this embodiment imposes no particular limitation on this.
A positive example sample is a search result that was clicked; a negative example sample is a search result that was not clicked. For the same region, one positive example sample and one negative example sample form one training data sample, that is, one item of region training data. The clicked and unclicked search results here refer to what is recorded in the search log of the search engine: after a user located in a certain region searches for a search keyword (query) and selects one of the search results for further browsing, that search result is called a clicked search result, and the other, unselected search results are called search results that were not clicked.
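The pairing of clicked and unclicked results per region could look like the sketch below. The `LogEntry` record and its field names are hypothetical; the source only says that the search log records which results a user in a given region clicked.

```python
from itertools import product
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    """Hypothetical search-log record; the field names are assumptions."""
    region: str
    result: str
    clicked: bool


Sample = Tuple[str, str, int]  # (r, t, label): label 1 = positive, 0 = negative


def build_region_pairs(log: Iterable[LogEntry]) -> List[Tuple[Sample, Sample]]:
    positives: Dict[str, List[str]] = {}
    negatives: Dict[str, List[str]] = {}
    for entry in log:
        bucket = positives if entry.clicked else negatives
        bucket.setdefault(entry.region, []).append(entry.result)
    pairs = []
    for region, clicked_results in positives.items():
        # Combine every positive with every negative from the same region.
        for pos, neg in product(clicked_results, negatives.get(region, [])):
            pairs.append(((region, pos, 1), (region, neg, 0)))
    return pairs


log = [
    LogEntry("Beijing", "A", True),
    LogEntry("Beijing", "B", False),
    LogEntry("Beijing", "C", False),
    LogEntry("Shanghai", "D", True),  # no negatives for Shanghai, so no pair
]
pairs = build_region_pairs(log)
print(pairs)
```

The same pairing applies to the keyword training data, with the search keyword taking the place of the region.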
After the training data samples, that is, the region training data, are obtained, learning of the parameters of the first neural network can begin. For example, the learning objective of the designed first neural network may be to maximize the difference obtained by subtracting the similarity between the space vector of the region and the space vector of the negative example sample from the similarity between the space vector of the region and the space vector of the positive example sample (hinge loss), so as to learn the parameters of the first neural network.
Specifically, a region or a search result needs to go through input layer processing, hidden layer processing, and output layer processing. The bottom layer is the input layer, which finds the word vector of each word in the sentence by dictionary lookup. Taking an RNN as an example, the hidden layer vectors are obtained step by step through the computation of a recurrent unit. The hidden layer vector obtained at the last word of the sentence is the vector representation of the whole sentence.
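The RNN case can be sketched as a plain Elman-style forward pass. This is a minimal illustration, not the patented network: the tanh activation, the absence of bias terms, the matrix names `w_in` and `w_rec`, and the toy values are all assumptions, and every word is assumed to be present in the dictionary.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_sentence_vector(
    words: Sequence[str],
    embeddings: Dict[str, Vector],  # input layer: word -> word vector dictionary
    w_in: Matrix,                   # input-to-hidden weights (hypothetical name)
    w_rec: Matrix,                  # hidden-to-hidden weights (hypothetical name)
) -> Vector:
    hidden = [0.0] * len(w_rec)
    for word in words:
        x = embeddings[word]  # dictionary lookup of the word vector
        from_input = matvec(w_in, x)
        from_hidden = matvec(w_rec, hidden)
        # Recurrent unit: the new hidden state depends on the word and the old state.
        hidden = [math.tanh(p + q) for p, q in zip(from_input, from_hidden)]
    return hidden  # hidden vector at the last word represents the whole sentence


embeddings = {"beijing": [1.0, 0.0], "subway": [0.0, 1.0]}
w_in = [[0.5, -0.3], [0.2, 0.8]]
w_rec = [[0.1, 0.4], [-0.2, 0.3]]
v1 = rnn_sentence_vector(["beijing", "subway"], embeddings, w_in, w_rec)
v2 = rnn_sentence_vector(["subway", "beijing"], embeddings, w_in, w_rec)
print(v1, v2)
```

Because the hidden state is carried from word to word, the two word orders give different sentence vectors, unlike a simple average of word vectors.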
After the space vectors of the region, the positive example sample, and the negative example sample are obtained, the cosine similarity formula can be used to calculate the similarity between the space vector of the region and the space vector of the positive example sample, and the similarity between the space vector of the region and the space vector of the negative example sample.
The training objective of the first neural network is to maximize the difference obtained by subtracting the similarity between the space vector of the region and the space vector of the negative example sample from the similarity between the space vector of the region and the space vector of the positive example sample (hinge loss). In a concrete implementation, when this difference is less than a preset threshold (margin), backward gradient computation is performed for the sample and the gradient is updated.
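One common way to write this objective is shown below. The notation is assumed for illustration and is not taken from the source: v_r is the space vector of the region, v_{t+} and v_{t-} are the space vectors of the positive and negative example samples, and m is the preset threshold (margin).

```latex
s^{+} = \cos(\mathbf{v}_r, \mathbf{v}_{t^{+}}), \qquad
s^{-} = \cos(\mathbf{v}_r, \mathbf{v}_{t^{-}}), \qquad
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}

\mathcal{L} = \max\bigl(0,\; m - (s^{+} - s^{-})\bigr)
```

Under this form the loss, and therefore the gradient, is zero whenever s+ minus s- is at least m, which matches the rule that a sample triggers a gradient update only when the difference is below the margin.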
To train such a deep learning network, enough data samples, that is, region training data, are needed. After enough paired samples <<r, t, 1><r, t, 0>> (r denotes the region, t denotes the sample data, 0 denotes a negative example, and 1 denotes a positive example) are collected, these paired samples can be used as the training data of the designed first neural network. The network parameters of the first neural network, for example parameter w, parameter wh, and parameter wrec, are learned using at least one of the stochastic gradient descent (SGD) algorithm and the backpropagation (BP) algorithm. For details of the SGD and BP algorithms, refer to the related prior art, which is not repeated here.
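The margin-gated SGD update can be illustrated with a deliberately simplified model. Two substitutions are made here and should not be read as the source's design: each region and each sample gets its own free vector instead of a vector produced by an RNN, and the dot product replaces the cosine similarity so that the gradients can be written by hand. The function name, key prefixes, and hyperparameters are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_pairwise(
    pairs: Sequence[Tuple[str, str, str]],  # (region, positive sample, negative sample)
    dim: int = 4,
    margin: float = 1.0,
    lr: float = 0.1,
    epochs: int = 50,
    seed: int = 0,
) -> Dict[str, Vector]:
    rng = random.Random(seed)
    vectors: Dict[str, Vector] = {}

    def lookup(key: str) -> Vector:
        if key not in vectors:
            vectors[key] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return vectors[key]

    order = list(pairs)
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic: visit the pairs in random order
        for region, pos, neg in order:
            r, p, n = lookup("r:" + region), lookup("t:" + pos), lookup("t:" + neg)
            if dot(r, p) - dot(r, n) >= margin:
                continue  # margin already satisfied: no gradient for this pair
            # Gradients of the loss  margin - (r.p - r.n)
            grad_r = [nj - pj for pj, nj in zip(p, n)]
            grad_p = [-rj for rj in r]
            grad_n = list(r)
            for i in range(dim):
                r[i] -= lr * grad_r[i]
                p[i] -= lr * grad_p[i]
                n[i] -= lr * grad_n[i]
    return vectors


vec = train_pairwise([("Beijing", "A", "B"), ("Beijing", "A", "C")])
print(dot(vec["r:Beijing"], vec["t:A"]) > dot(vec["r:Beijing"], vec["t:B"]))
```

In the network described in the source, the same gated loss would instead be backpropagated through the network to update its parameters.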
So far, described first nerves network struction finishes.
With respect to traditional zone individualty scheme, technical scheme provided by the present invention employs depth learning technology, Avoid relatively complicated Feature Engineering, decrease human cost.
Optionally, in a possible implementation of this embodiment, in 103, the space vector of the search keyword may be obtained from the search keyword by using a second neural network, and the second space vector of each search result may be obtained from that search result by using the second neural network. Then, the keyword similarity between the search keyword and each search result, for example, a cosine similarity, can be obtained according to the space vector of the search keyword and the second space vector of each search result.
Specifically, any of the existing word segmentation methods may be used to segment the search keyword and each search result; for details, refer to the related content in the prior art, which is not repeated here. Then, according to the word segmentation result of the search keyword, the second neural network can be used to obtain the space vector of each segmented word of the search keyword, and the space vector of the search keyword is obtained from the space vectors of its segmented words. Likewise, according to the word segmentation result of each search result, the second neural network can be used to obtain the space vector of each segmented word of that search result, and the second space vector of each search result is obtained from the space vectors of its segmented words.
In this implementation, further optionally, before 103, positive samples and negative samples corresponding to the same search keyword (query) may be obtained from the user's historical behavior data, and the positive samples and negative samples corresponding to the same query are combined pairwise to form paired samples <<q, t, 1><q, t, 0>> (q denotes the query, t denotes the sample data, 0 denotes a negative example, and 1 denotes a positive example), which serve as keyword training data. The keyword training data can then be used to build the second neural network. The second neural network may include, but is not limited to, a recurrent neural network (RNN), a convolutional neural network (CNN), or a deep neural network (DNN); this embodiment does not particularly limit this.
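The pairwise combination of positive and negative samples can be sketched as below. The log format, a list of (key, sample, clicked) tuples, and the rule that a clicked result counts as a positive sample are assumptions made for this example; the key can be a query or, for the region training data, a region.

```python
from collections import defaultdict
from itertools import product

def build_pairs(log):
    """Combine positives and negatives that share a key into paired samples.

    log: iterable of (key, sample, clicked) tuples (hypothetical log format).
    Returns a list of ((key, positive, 1), (key, negative, 0)) pairs.
    """
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for key, sample, clicked in log:
        (positives if clicked else negatives)[key].append(sample)

    pairs = []
    for key, pos_samples in positives.items():
        # Every positive is paired with every negative of the same key.
        for pos, neg in product(pos_samples, negatives.get(key, [])):
            pairs.append(((key, pos, 1), (key, neg, 0)))
    return pairs

log = [
    ("q1", "title a", True),
    ("q1", "title b", True),
    ("q1", "title c", False),
    ("q2", "title d", True),  # no negative sample, so q2 yields no pair
]
pairs = build_pairs(log)
```

Keys that have only positive or only negative samples contribute nothing, since a paired sample needs one of each.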
After the data samples for training, i.e., the keyword training data, are obtained, learning of the parameters of the second neural network can begin. For example, the learning objective of the designed second neural network can be to maximize the difference obtained by subtracting the similarity between the space vector of the query and the space vector of the negative sample from the similarity between the space vector of the query and the space vector of the positive sample (hinge loss), and the parameters of the second neural network are learned under this objective.
Specifically, a query or a search result is processed by an input layer, a hidden layer, and an output layer. The lowest layer is the input layer, which finds the word vector of each word in the sentence by dictionary lookup. Taking an RNN as an example, the hidden-layer vectors are obtained step by step through the computation of the recurrent unit. The hidden-layer vector obtained at the last word of the sentence is the vector representation of the whole sentence.
After the space vectors of the query, the positive sample, and the negative sample are obtained, the cosine similarity formula can be used to calculate the similarity between the space vector of the query and the space vector of the positive sample, and the similarity between the space vector of the query and the space vector of the negative sample.
The training objective of the second neural network is to maximize the difference obtained by subtracting the similarity between the space vector of the query and the space vector of the negative sample from the similarity between the space vector of the query and the space vector of the positive sample (hinge loss). In a concrete implementation, when the positive-sample similarity minus the negative-sample similarity is less than a preset threshold (margin), the backward gradient is computed for that sample and the network parameters are updated accordingly.
To train such a deep learning network, enough data samples, i.e., keyword training data, are required. After enough paired samples <<q, t, 1><q, t, 0>> have been collected (q denotes the query, t denotes the sample data, 0 denotes a negative example, and 1 denotes a positive example), these paired samples can be used as the training data of the designed second neural network. At least one of the stochastic gradient descent (SGD) algorithm and the backpropagation (BP) algorithm is used to learn the network parameters of the second neural network, for example, parameter w, parameter wh, and parameter wrec. For a detailed description of the SGD algorithm and the BP algorithm, refer to the related content in the prior art; details are not repeated here.
At this point, the construction of the second neural network is complete.
After the region similarity between the region where the user is located and each of the at least one search result, and the keyword similarity between the search keyword and each of the at least one search result, are obtained, an integration model can be used to combine the region similarity and the keyword similarity according to certain weights, yielding the ranking similarity used for the final ranking.
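The weighted integration step can be illustrated as follows. The linear form and the weights 0.3 and 0.7 are assumptions chosen for the example; the text says only that the two similarities are combined "according to certain weights".

```python
def ranking_score(region_sim, keyword_sim, w_region=0.3, w_keyword=0.7):
    """Weighted combination of the two similarities (hypothetical weights)."""
    return w_region * region_sim + w_keyword * keyword_sim

def rank_results(results):
    """Sort result records by ranking score, highest first."""
    return sorted(
        results,
        key=lambda r: ranking_score(r["region_sim"], r["keyword_sim"]),
        reverse=True,
    )

results = [
    {"title": "C", "region_sim": 0.5, "keyword_sim": 0.5},
    {"title": "B", "region_sim": 0.1, "keyword_sim": 0.8},
    {"title": "A", "region_sim": 0.9, "keyword_sim": 0.5},
]
ranked_titles = [r["title"] for r in rank_results(results)]
```

In this toy data, result A has a lower keyword similarity than B but a much higher region similarity, so it is ranked first. In practice the weights could themselves be learned rather than fixed by hand.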
To verify the effectiveness of the technical solution provided by the present invention, multiple groups of comparative experiments can be carried out on test data corresponding to the same training data; the experimental results show that the technical solution provided by the present invention achieves the best effect.
Experiment purpose: to verify how well the technical solution provided by the present invention judges text similarity in a real web search environment.
Training data: drawn from the Baidu search log corpus (November 2015).
Test data: drawn from the Baidu search log corpus (the first three days of December 2015).
Evaluation method: the positive-to-inverse pair ratio on the test data is evaluated.
Experimental result 1: the first neural network, the second neural network, and the integration model are each trained on the training data, and the positive-to-inverse pair ratio of each is measured on the test data. The final comparison results are as follows:
Experimental result 2: to verify the effectiveness of the first neural network's space vector representation of regions, the principal component analysis (PCA) algorithm can further be used to reduce the space vectors of regions (256 dimensions) to a 2-dimensional space. The result is that the space vectors of regions obtained by the trained first neural network are basically consistent with the actual distribution of Chinese cities and can roughly restore the map of Chinese cities: the cities of the Beijing area, the Yangtze River Delta, the Pearl River Delta, and the western region each clearly cluster together.
Interpretation of results: with the technical solution provided by the present invention, a representation of all regions in a common vector space is obtained, which effectively improves the evaluation results of personalized search. Reducing the dimensionality of the region space vectors with PCA successfully restores the actual distribution of cities on the map of China, which shows that the space vector representation of regions effectively captures the distances between cities and expresses the semantics of regions well.
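The evaluation metric named in the experiment can be sketched as below. The text does not define the metric, so the definition used here is an assumption: a test pair counts as correctly ordered when the model scores its positive sample above its negative sample, as inversely ordered when it scores it below, and the metric is the ratio of the two counts, with ties ignored.

```python
def positive_inverse_ratio(score_pairs):
    """Ratio of correctly ordered pairs to inversely ordered pairs.

    score_pairs: iterable of (positive_score, negative_score) tuples.
    Ties are ignored; returns infinity when no pair is inversely ordered.
    """
    score_pairs = list(score_pairs)
    correct = sum(1 for pos, neg in score_pairs if pos > neg)
    inverse = sum(1 for pos, neg in score_pairs if pos < neg)
    if inverse == 0:
        return float("inf")
    return correct / inverse

# Hypothetical model scores for four test pairs (not data from the experiment).
example_ratio = positive_inverse_ratio(
    [(0.9, 0.1), (0.8, 0.3), (0.2, 0.6), (0.5, 0.5)]
)
```

A higher ratio means the model orders more pairs the same way as the user behavior data does.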
In this embodiment, at least one search result is obtained according to the search keyword provided by the user. The region similarity between the region where the user is located and each of the at least one search result, and the keyword similarity between the search keyword and each of the at least one search result, are then obtained, and the ranking similarity of each search result is obtained from the region similarity and the keyword similarity, so that the at least one search result can be sorted by ranking similarity and output in sorted order. Because the search no longer depends entirely on the search keyword but also takes the user's region into account, results that are similar to both the user's region and the search keyword are returned, and the search results largely match the user's real intent. This avoids the problem in the prior art that repeated searches by the user through the application increase the data interaction between the application and the search engine, and thereby reduces the processing load on the search engine.
In addition, with the technical solution provided by the present invention, the search no longer depends entirely on the search keyword but also takes the user's region into account, and results similar to both the user's region and the search keyword are returned, so the search results largely match the user's real intent, which improves the effectiveness of the search results.
In addition, with the technical solution provided by the present invention, the search no longer depends entirely on the search keyword but also takes the user's region into account, and results similar to both the user's region and the search keyword are returned, so the search results largely match the user's real intent, which improves the efficiency of the search.
In addition, with the technical solution provided by the present invention, the user experience can be effectively improved.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
FIG. 2 is a schematic structural diagram of an artificial-intelligence-based search device provided by another embodiment of the present invention. As shown in FIG. 2, the artificial-intelligence-based search device of this embodiment may include an acquiring unit 21, a matching unit 22, a preprocessing unit 23, an integration unit 24, a sorting unit 25, and an output unit 26. The acquiring unit 21 is configured to obtain the search keyword provided by the user. The matching unit 22 is configured to obtain at least one search result according to the search keyword. The preprocessing unit 23 is configured to obtain the region similarity between the region where the user is located and each of the at least one search result, and the keyword similarity between the search keyword and each of the at least one search result. The integration unit 24 is configured to obtain the ranking similarity of each search result according to the region similarity and the keyword similarity. The sorting unit 25 is configured to sort the at least one search result according to the ranking similarity of each search result. The output unit 26 is configured to output the sorted at least one search result.
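The cooperation of the units can be sketched as a small pipeline. This is an illustration only: the class `SearchDevice`, its field names, the linear weighting, and the toy similarity functions are all hypothetical stand-ins, and a real device would use the two trained neural networks for the similarities.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SearchDevice:
    match: Callable[[str], List[str]]                # role of matching unit 22
    region_similarity: Callable[[str, str], float]   # preprocessing unit 23 (region)
    keyword_similarity: Callable[[str, str], float]  # preprocessing unit 23 (keyword)
    region_weight: float = 0.5                       # integration unit 24 (hypothetical)

    def search(self, keyword: str, user_region: str) -> List[str]:
        scored = []
        for result in self.match(keyword):
            region_sim = self.region_similarity(user_region, result)
            keyword_sim = self.keyword_similarity(keyword, result)
            score = (self.region_weight * region_sim
                     + (1.0 - self.region_weight) * keyword_sim)
            scored.append((score, result))
        scored.sort(key=lambda pair: pair[0], reverse=True)  # sorting unit 25
        return [result for _, result in scored]              # output unit 26

def toy_region_sim(region, result):
    """Toy stand-in: 1.0 if the region name appears in the result text."""
    return 1.0 if region.lower() in result.lower() else 0.0

def toy_keyword_sim(keyword, result):
    """Toy stand-in: fraction of keyword words that appear in the result text."""
    words = keyword.lower().split()
    hits = sum(1 for w in words if w in result.lower())
    return hits / len(words) if words else 0.0

device = SearchDevice(
    match=lambda kw: ["Shanghai weather", "Beijing weather", "Beijing subway"],
    region_similarity=toy_region_sim,
    keyword_similarity=toy_keyword_sim,
    region_weight=0.4,
)
ordered = device.search("weather", "Beijing")
```

With these toy functions, the result that matches both the keyword and the user's region is ranked first, followed by the keyword-only match and then the region-only match.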
It should be noted that the artificial-intelligence-based search device provided in this embodiment may be, in part or in whole, an application located in a local terminal, or a functional unit such as a plug-in or a software development kit (SDK) provided in an application in the local terminal, or a search engine located in a network-side server, or a distributed system located on the network side; this embodiment does not particularly limit this.
It can be understood that the application may be a native application (nativeApp) installed on the terminal, or a web application (webApp) of a browser on the terminal; this embodiment does not particularly limit this.
Optionally, in a possible implementation of this embodiment, the preprocessing unit 23 may be specifically configured to: obtain the space vector of the region from the region by using a first neural network; obtain the first space vector of each search result from that search result by using the first neural network; and obtain the region similarity between the region and each search result according to the space vector of the region and the first space vector of each search result.
In this implementation, further optionally, the preprocessing unit 23 may be further configured to: obtain positive samples and negative samples corresponding to the same region from the user's historical behavior data; combine the positive samples and negative samples corresponding to the same region pairwise to form paired samples, which serve as region training data; and build the first neural network by using the region training data.
Optionally, in a possible implementation of this embodiment, the preprocessing unit 23 may be specifically configured to: obtain the space vector of the search keyword from the search keyword by using a second neural network; obtain the second space vector of each search result from that search result by using the second neural network; and obtain the keyword similarity between the search keyword and each search result according to the space vector of the search keyword and the second space vector of each search result.
In this implementation, further optionally, the preprocessing unit 23 may be further configured to: obtain positive samples and negative samples corresponding to the same search keyword from the user's historical behavior data; combine the positive samples and negative samples corresponding to the same search keyword pairwise to form paired samples, which serve as keyword training data; and build the second neural network by using the keyword training data.
It should be noted that the method in the embodiment corresponding to FIG. 1 can be implemented by the artificial-intelligence-based search device provided in this embodiment. For details, refer to the related content in the embodiment corresponding to FIG. 1, which is not repeated here.
In this embodiment, the matching unit obtains at least one search result according to the search keyword that the acquiring unit obtained from the user. The preprocessing unit then obtains the region similarity between the region where the user is located and each of the at least one search result, and the keyword similarity between the search keyword and each of the at least one search result, and the integration unit obtains the ranking similarity of each search result from the region similarity and the keyword similarity, so that the sorting unit can sort the at least one search result by ranking similarity and the output unit can output the sorted at least one search result. Because the search no longer depends entirely on the search keyword but also takes the user's region into account, results that are similar to both the user's region and the search keyword are returned, and the search results largely match the user's real intent. This avoids the problem in the prior art that repeated searches by the user through the application increase the data interaction between the application and the search engine, and thereby reduces the processing load on the search engine.
In addition, with the technical solution provided by the present invention, the search no longer depends entirely on the search keyword but also takes the user's region into account, and results similar to both the user's region and the search keyword are returned, so the search results largely match the user's real intent, which improves the effectiveness of the search results.
In addition, with the technical solution provided by the present invention, the search no longer depends entirely on the search keyword but also takes the user's region into account, and results similar to both the user's region and the search keyword are returned, so the search results largely match the user's real intent, which improves the efficiency of the search.
In addition, with the technical solution provided by the present invention, the user experience can be effectively improved.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
It should be understood that, in the several embodiments provided by the present invention, the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An artificial-intelligence-based search method, characterized by comprising:
obtaining a search keyword provided by a user;
obtaining at least one search result according to the search keyword;
obtaining a region similarity between the region where the user is located and each search result of the at least one search result, and a keyword similarity between the search keyword and each search result of the at least one search result;
obtaining a ranking similarity of each search result according to the region similarity and the keyword similarity;
sorting the at least one search result according to the ranking similarity of each search result;
outputting the sorted at least one search result.
2. The method according to claim 1, characterized in that the obtaining a region similarity between the region where the user is located and each search result of the at least one search result comprises:
obtaining a space vector of the region from the region by using a first neural network;
obtaining a first space vector of each search result from that search result by using the first neural network;
obtaining the region similarity between the region and each search result according to the space vector of the region and the first space vector of each search result.
3. The method according to claim 2, characterized in that, before the obtaining a space vector of the region from the region by using a first neural network, the method further comprises:
obtaining positive samples and negative samples corresponding to the same region according to user historical behavior data;
combining the positive samples and negative samples corresponding to the same region pairwise to form paired samples, which serve as region training data;
building the first neural network by using the region training data.
4. The method according to any one of claims 1 to 3, characterized in that the obtaining a keyword similarity between the search keyword and each search result of the at least one search result comprises:
obtaining a space vector of the search keyword from the search keyword by using a second neural network;
obtaining a second space vector of each search result from that search result by using the second neural network;
obtaining the keyword similarity between the search keyword and each search result according to the space vector of the search keyword and the second space vector of each search result.
5. The method according to claim 4, characterized in that, before the obtaining a space vector of the search keyword from the search keyword by using a second neural network, the method further comprises:
obtaining positive samples and negative samples corresponding to the same search keyword according to user historical behavior data;
combining the positive samples and negative samples corresponding to the same search keyword pairwise to form paired samples, which serve as keyword training data;
building the second neural network by using the keyword training data.
6. An artificial-intelligence-based search device, characterized by comprising:
an acquiring unit, configured to obtain a search keyword provided by a user;
a matching unit, configured to obtain at least one search result according to the search keyword;
a preprocessing unit, configured to obtain a region similarity between the region where the user is located and each search result of the at least one search result, and a keyword similarity between the search keyword and each search result of the at least one search result;
an integration unit, configured to obtain a ranking similarity of each search result according to the region similarity and the keyword similarity;
a sorting unit, configured to sort the at least one search result according to the ranking similarity of each search result;
an output unit, configured to output the sorted at least one search result.
7. The device according to claim 6, characterized in that the preprocessing unit is specifically configured to:
obtain a space vector of the region from the region by using a first neural network;
obtain a first space vector of each search result from that search result by using the first neural network; and
obtain the region similarity between the region and each search result according to the space vector of the region and the first space vector of each search result.
8. The device according to claim 7, characterized in that the preprocessing unit is further configured to:
obtain positive samples and negative samples corresponding to the same region according to user historical behavior data;
combine the positive samples and negative samples corresponding to the same region pairwise to form paired samples, which serve as region training data; and
build the first neural network by using the region training data.
9. The device according to any one of claims 6 to 8, characterized in that the preprocessing unit is specifically configured to:
obtain a space vector of the search keyword from the search keyword by using a second neural network;
obtain a second space vector of each search result from that search result by using the second neural network; and
obtain the keyword similarity between the search keyword and each search result according to the space vector of the search keyword and the second space vector of each search result.
10. The device according to claim 9, characterized in that the preprocessing unit is further configured to:
obtain positive samples and negative samples corresponding to the same search keyword according to user historical behavior data;
combine the positive samples and negative samples corresponding to the same search keyword pairwise to form paired samples, which serve as keyword training data; and
build the second neural network by using the keyword training data.
CN201610800766.7A 2016-09-02 2016-09-02 Search method and device based on artificial intelligence Pending CN106354852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610800766.7A CN106354852A (en) 2016-09-02 2016-09-02 Search method and device based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN106354852A true CN106354852A (en) 2017-01-25

Family

ID=57858817

Country Status (1)

Country Link
CN (1) CN106354852A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832476A (en) * 2017-12-01 2018-03-23 北京百度网讯科技有限公司 A kind of understanding method of search sequence, device, equipment and storage medium
CN109410935A (en) * 2018-11-01 2019-03-01 平安科技(深圳)有限公司 A kind of destination searching method and device based on speech recognition
CN109767270A (en) * 2019-01-17 2019-05-17 建信养老金管理有限责任公司 The old information recommendation method of housing support and system are deposited based on artificial intelligence
CN110399568A (en) * 2019-07-04 2019-11-01 Oppo广东移动通信有限公司 Information search method, device, terminal and storage medium
CN110442759A (en) * 2019-07-25 2019-11-12 深圳供电局有限公司 Knowledge retrieval method and system, computer equipment and readable storage medium
CN111382251A (en) * 2018-12-25 2020-07-07 株式会社日立制作所 Text generation method, text generation device, and learned model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073735A (en) * 2011-01-20 2011-05-25 百度在线网络技术(北京)有限公司 Searching method and searching system
CN103049481A (en) * 2012-11-29 2013-04-17 百度在线网络技术(北京)有限公司 Searching method and searching device
CN104462357A (en) * 2014-12-08 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for realizing personalized search
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method
CN104715063A (en) * 2015-03-31 2015-06-17 百度在线网络技术(北京)有限公司 Search ranking method and search ranking device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073735A (en) * 2011-01-20 2011-05-25 百度在线网络技术(北京)有限公司 Searching method and searching system
CN103049481A (en) * 2012-11-29 2013-04-17 百度在线网络技术(北京)有限公司 Searching method and searching device
CN104462357A (en) * 2014-12-08 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for realizing personalized search
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method
CN104715063A (en) * 2015-03-31 2015-06-17 百度在线网络技术(北京)有限公司 Search ranking method and search ranking device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832476A (en) * 2017-12-01 2018-03-23 北京百度网讯科技有限公司 A kind of understanding method of search sequence, device, equipment and storage medium
CN107832476B (en) * 2017-12-01 2020-06-05 北京百度网讯科技有限公司 Method, device, equipment and storage medium for understanding search sequence
CN109410935A (en) * 2018-11-01 2019-03-01 平安科技(深圳)有限公司 Destination search method and device based on speech recognition
CN111382251A (en) * 2018-12-25 2020-07-07 株式会社日立制作所 Text generation method, text generation device, and learned model
CN109767270A (en) * 2019-01-17 2019-05-17 建信养老金管理有限责任公司 Artificial intelligence-based information recommendation method and system for housing-deposit elderly care
CN110399568A (en) * 2019-07-04 2019-11-01 Oppo广东移动通信有限公司 Information search method, device, terminal and storage medium
CN110399568B (en) * 2019-07-04 2022-09-30 Oppo广东移动通信有限公司 Information searching method, device, terminal and storage medium
CN110442759A (en) * 2019-07-25 2019-11-12 深圳供电局有限公司 Knowledge retrieval method and system, computer equipment and readable storage medium
CN110442759B (en) * 2019-07-25 2022-05-13 深圳供电局有限公司 Knowledge retrieval method and system, computer equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN110929164B (en) Point-of-interest recommendation method based on user dynamic preference and attention mechanism
CN106354852A (en) Search method and device based on artificial intelligence
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
CN110489755A (en) Document creation method and device
CN110598206A (en) Text semantic recognition method and device, computer equipment and storage medium
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN109241258A (en) Deep learning intelligent question answering system for the tax field
CN107491547A (en) Searching method and device based on artificial intelligence
CN111368074A (en) Link prediction method based on network structure and text information
CN106951438A (en) Open-domain event extraction system and method
CN110032623B (en) Method and device for matching question of user with title of knowledge point
CN104598611B (en) Method and system for ranking search entries
CN108549658A (en) Deep learning video question answering method and system based on an attention mechanism over a syntactic parse tree
CN111966786A (en) Microblog rumor detection method
CN109597876A (en) Multi-turn dialogue answer selection model and method based on reinforcement learning
CN109255012B (en) Method and device for machine reading understanding and candidate data set size reduction
CN102270212A (en) User interest feature extraction method based on hidden semi-Markov model
US11551114B2 (en) Method and apparatus for recommending test question, and intelligent device
CN110837738A (en) Similarity recognition method and device, computer equipment and storage medium
CN111078546B (en) Page feature expression method and electronic equipment
CN109087205A (en) Public opinion index prediction method and device, computer equipment and readable storage medium
CN108710672B (en) Theme crawler method based on incremental Bayesian algorithm
CN111524593A (en) Medical question-answering method and system based on context language model and knowledge embedding
CN105956011A (en) Method and device for searching
Zhou et al. ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170125