CN115757546A - Data searching method and device - Google Patents


Info

Publication number
CN115757546A
Authority
CN
China
Prior art keywords
data
matching
data set
search
score
Legal status
Pending
Application number
CN202211474766.4A
Other languages
Chinese (zh)
Inventor
王沛锋
宋增超
刘贤
杨孟乐
白存
黄普安
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211474766.4A
Publication of CN115757546A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data searching method and device, and relates to the technical field of computers. One embodiment of the method comprises: determining a search mode according to the obtained search field and screening condition; for an accurate matching search, determining first matching data according to the screening condition, calculating matching scores of the first matching data using a first scoring model, and sorting by the matching scores to obtain an accurate search result; and for a fuzzy matching search, performing intention identification according to the search field and the screening condition, determining second matching data according to the intention identification result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores to obtain a fuzzy search result. This implementation realizes a search method based on user intention identification, which improves search accuracy and efficiency, and sorts with a comprehensive scoring model, so that high-quality relevant search results are returned and the accuracy of the search results is improved.

Description

Data searching method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for searching data.
Background
Elasticsearch (ES) is an open-source, distributed full-text search engine with a RESTful interface (RESTful being a software architecture style), built on Lucene (a full-text search framework). It can store, search and analyze large amounts of data in a very short time. Existing data query platforms use the ES full-text search engine to perform a weighted search over a table's English name, Chinese name, creator, development owner, business description and usage description, and combine this with ES screening conditions to obtain data that meets the requirements of a specific business system and business domain.
In the process of implementing the invention, the inventor finds that the following problems exist in the prior art:
In the prior art, on the one hand, data search relies on relevance matching alone. This ignores other valuable factors (such as a high collection amount or a high browsing amount), which hinders comprehensive search results, and matching on too many relevance fields leaves the search without focus, which reduces the accuracy of the results. On the other hand, a generic search scheme is used that does not recognize the user's search intention, which reduces search accuracy and efficiency.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data search method and apparatus. A search mode is determined according to a search field and a screening condition. When the search mode is accurate matching, a first scoring model calculates matching scores for first matching data that satisfies the screening condition, and the first matching data is sorted to obtain an accurate search result; when the search mode is fuzzy matching, a second scoring model calculates matching scores for second matching data determined by an intention recognition result, and the second matching data is sorted to obtain a fuzzy search result.
To achieve the object, according to an aspect of an embodiment of the present invention, there is provided a data search method including:
determining a search mode according to the obtained search field and screening condition, wherein if the value of the search field is null, the search mode is determined to be accurate matching search, and otherwise it is determined to be fuzzy matching search;
when the search mode is accurate matching search, determining first matching data according to the screening condition, calculating matching scores of the first matching data using a first scoring model, and sorting by the matching scores to obtain an accurate search result;
and when the search mode is fuzzy matching search, performing intention identification according to the search field and the screening condition, determining second matching data according to the intention identification result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores to obtain a fuzzy search result.
Optionally, performing intention identification according to the search field and the screening condition comprises: generating a wildcard expression based on the user's historical behavior data; and identifying the intention according to the wildcard expression, the search field and the screening condition.
Optionally, when the intention recognition result is a first result, determining second matching data according to the intention recognition result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores comprises: determining a first data set that meets a first threshold according to the relevance of data to the screening condition and the intention recognition result; determining a second data set that meets a second threshold according to the relevance of data to the search field and the screening condition, wherein the first threshold is greater than the second threshold and the first data set has a higher sorting priority than the second data set; judging whether the total number of records in the first and second data sets meets a number threshold; if the number threshold is met, taking the first data set and the second data set as the second matching data; calculating a first matching score for each record in the first data set using the second scoring model and sorting the first data set by the first matching score; calculating a second matching score for each record in the second data set using the second scoring model and sorting the second data set by the second matching score; and concatenating the sorted first data set and the sorted second data set in descending order of sorting priority, so as to sort the second matching data.
Optionally, when the intention recognition result is a first result, determining second matching data according to the intention recognition result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores further comprises: if the number threshold is not met, selecting a specified number of records, as a third data set, from the data outside the first and second data sets according to their relevance to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the total number of records in the first and second data sets, and the third data set has a lower sorting priority than the second data set; taking the first, second and third data sets as the second matching data; calculating a first matching score for each record in the first data set using the second scoring model and sorting the first data set by the first matching score; calculating a second matching score for each record in the second data set using the second scoring model and sorting the second data set by the second matching score; sorting the third data set by each record's relevance to the search field and the screening condition; and concatenating the sorted first, second and third data sets in descending order of sorting priority, so as to sort the second matching data.
Optionally, calculating a first matching score for each piece of data in the first set of data using a second scoring model, comprising: calculating a first matching score of each piece of data in the first data set based on the key attribute information of each piece of data in the first data set by using a second scoring model; calculating a second match score for each piece of data in the second set of data using a second scoring model, comprising: calculating a second matching score for each piece of data in the second data set based on the relevance of each piece of data in the second data set to the search field and the screening condition, and the key attribute information of each piece of data in the second data set using a second scoring model.
Optionally, when the intention recognition result is a second result, determining second matching data according to the intention recognition result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores comprises: determining a fourth data set that meets a third threshold according to the relevance of data to the search field and the screening condition; judging whether the number of records in the fourth data set meets the number threshold; if it does, taking the fourth data set as the second matching data; and calculating a third matching score for each record in the fourth data set using the second scoring model and sorting the fourth data set by the third matching score, so as to sort the second matching data.
Optionally, when the intention recognition result is a second result, determining second matching data according to the intention recognition result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores further comprises: if the number threshold is not met, selecting a specified number of records, as a fifth data set, from the data outside the fourth data set according to their relevance to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the number of records in the fourth data set, and the fifth data set has a lower sorting priority than the fourth data set; taking the fourth and fifth data sets as the second matching data; calculating a third matching score for each record in the fourth data set using the second scoring model and sorting the fourth data set by the third matching score; sorting the fifth data set by each record's relevance to the search field and the screening condition; and concatenating the sorted fourth and fifth data sets in descending order of sorting priority, so as to sort the second matching data.
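The fourth/fifth data set logic for the second result can be sketched as follows. This is a minimal illustration, assuming each candidate is a hypothetical `(table_id, relevance)` pair whose relevance to the search field and screening condition lies in [0, 1]; the value of the third threshold is not given in this section, so it is left as a parameter, and `third_score` stands in for the second scoring model.

```python
from typing import Callable, List, Tuple

# Hypothetical candidate shape: (table_id, relevance to search field + screening condition).
Cand = Tuple[str, float]


def assemble_second_result(cands: List[Cand],
                           third_threshold: float,
                           count_threshold: int,
                           third_score: Callable[[Cand], float]) -> List[str]:
    """Fourth data set sorted by the third matching score, topped up by a fifth set."""
    # Fourth data set: candidates meeting the third threshold, highest priority.
    fourth = [c for c in cands if c[1] >= third_threshold]
    fourth.sort(key=third_score, reverse=True)

    # If the number threshold is not met, fill the shortfall from the remaining data.
    shortfall = count_threshold - len(fourth)
    fifth: List[Cand] = []
    if shortfall > 0:
        chosen = {c[0] for c in fourth}
        rest = [c for c in cands if c[0] not in chosen]
        # Fifth data set: the most relevant leftovers, sorted by relevance only.
        fifth = sorted(rest, key=lambda c: c[1], reverse=True)[:shortfall]

    # Concatenate in descending order of sorting priority.
    return [c[0] for c in fourth + fifth]
```

When the fourth data set alone already meets the number threshold, the sketch returns it whole without truncation, since the claim only describes topping up, not cutting down.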
Optionally, calculating a third matching score for each piece of data in the fourth set of data using a second scoring model, comprising: calculating a third matching score for each piece of data in the fourth data set based on the relevance of each piece of data in the fourth data set to the search field and the screening condition, and the key attribute information of each piece of data in the fourth data set using a second scoring model.
Optionally, calculating a matching score for the first matching data using a first scoring model, comprising: calculating a matching score of the first matching data based on key attribute information of the first matching data using a first scoring model.
Optionally, when the first scoring model and the second scoring model calculate the matching score, the matching score is constrained based on an anti-expansion function and a normalization function.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for data search, including:
a search mode determining module, configured to determine a search mode according to the obtained search field and screening condition, wherein if the value of the search field is null, the search mode is determined to be accurate matching search, and otherwise it is determined to be fuzzy matching search;
an accurate search module, configured to, when the search mode is accurate matching search, determine first matching data according to the screening condition, calculate matching scores of the first matching data using a first scoring model, and sort by the matching scores to obtain an accurate search result;
and a fuzzy search module, configured to, when the search mode is fuzzy matching search, perform intention identification according to the search field and the screening condition, determine second matching data according to the intention identification result, calculate matching scores of the second matching data using a second scoring model, and sort by the matching scores to obtain a fuzzy search result.
According to a third aspect of the embodiments of the present invention, there is provided an electronic device for data search, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
One embodiment of the invention has the following advantages or benefits. A search mode is determined according to the obtained search field and screening condition: if the value of the search field is null, the search is an accurate matching search; otherwise it is a fuzzy matching search. For an accurate matching search, first matching data is determined according to the screening condition, a comprehensive matching score is calculated for it using the first scoring model, and the data is sorted to obtain the accurate search result. For a fuzzy matching search, intention identification is performed according to the search field and the screening condition, second matching data is determined from the intention identification result, a comprehensive matching score is calculated for it using the second scoring model, and the data is sorted to obtain the fuzzy search result.
Through user intention identification, the search results that best meet the user's needs are placed at the top, which improves the user experience as well as search accuracy and efficiency; and because a comprehensive scoring model is used for sorting, high-quality relevant results are returned and the accuracy of the search results is improved.
Drawings
The drawings are provided for a better understanding of the invention and are not to be construed as unduly limiting it. In the drawings:
FIG. 1 is a schematic diagram of a main flow of a method of data searching according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method of table model searching in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main modules of an apparatus for data search according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 5 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
In the technical solution of the present disclosure, the collection, storage and use of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
Exemplary embodiments of the invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the prior art, on the one hand, data search relies on relevance matching, which ignores other valuable factors (such as a high collection amount or a high browsing amount) and so hinders comprehensive results; moreover, matching on too many relevance fields leaves the search without focus and reduces the accuracy of the results. On the other hand, a generic search scheme is used that does not identify the user's search intention, which reduces search accuracy and efficiency.
To solve these problems, the invention provides a data search method: a search mode is determined according to the acquired search field and screening condition; when the search mode is accurate matching, a first scoring model calculates matching scores for first matching data that satisfies the screening condition, and the data is sorted to obtain an accurate search result; when the search mode is fuzzy matching, a second scoring model calculates matching scores for second matching data determined by the intention recognition result, and the data is sorted to obtain a fuzzy search result.
In the description of the embodiments of the present invention, the terms and their meanings are as follows:
table model: each data table in the data warehouse is a table, and its metadata is called the table model or model metadata;
API: application programming interface, i.e. a predefined interface (such as a function or an HTTP interface), or a convention for connecting different components of a software system. It provides applications and developers with a set of routines for accessing certain software or hardware without accessing the source code or understanding the details of its internal workings;
searchafter: a paging query method (the search_after parameter of Elasticsearch), which retrieves the next page by passing the sort values of the last hit on the previous page.
Fig. 1 is a schematic diagram of a main flow of a data search method according to an embodiment of the present invention, and as shown in fig. 1, the data search method according to the embodiment of the present invention includes steps S101 to S103 as follows.
Step S101, determining a search mode according to the obtained search field and the screening condition, wherein if the value of the search field is null, the search mode is determined to be accurate matching search, otherwise, the search mode is determined to be fuzzy matching search.
Specifically, when the ES full-text search engine is used for data search, the search field and the screening condition are obtained from the search content entered by the user, and the search mode to be executed is determined. For example, in the table model search interface, the user may restrict the business system of the table models to query and filter by business domain, or restrict the development owner and filter by business owner, and may enter content related to the search in the search field (such as an English table name, or a business or technical description of a table). The device front end passes the user's screening condition and search field to the ES search engine as a filter and a query respectively, determines the user's data-access permissions from the user's identity, and adds the permission information to the filter. ES then examines the received search field and screening condition: if the query (search field) is null and only the filter is present, the data search uses accurate matching search; if the query is not null, that is, whenever a search field is present, the data search uses fuzzy matching search.
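The mode decision and request assembly in step S101 can be sketched as below. This is an illustration under stated assumptions: screening conditions arrive as a dict of field name to allowed values, the permission information is a list of business systems the user may access, and the field name `business_system` and the helper names are hypothetical. Treating a whitespace-only search field as null is also an assumption of this sketch.

```python
from typing import Dict, List, Optional


def determine_search_mode(query: Optional[str]) -> str:
    """Accurate (exact) match when the search field is null or blank, fuzzy otherwise."""
    if query is None or query.strip() == "":
        return "exact"
    return "fuzzy"


def build_request(query: Optional[str],
                  filters: Dict[str, List[str]],
                  permitted_systems: List[str]) -> dict:
    """Assemble the query/filter pair sent to the search engine.

    Permission information derived from the user's identity is merged into the
    filter so that both search modes only see data the user may access.
    "business_system" is a hypothetical field name used for illustration.
    """
    merged = {k: list(v) for k, v in filters.items()}
    if "business_system" in merged:
        # Keep only the systems the user both asked for and is allowed to see.
        merged["business_system"] = [s for s in merged["business_system"]
                                     if s in permitted_systems]
    else:
        merged["business_system"] = list(permitted_systems)
    return {"mode": determine_search_mode(query), "query": query, "filter": merged}
```

Intersecting rather than overwriting the user's own `business_system` restriction keeps the user's screening condition intact while still enforcing permissions.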
Step S102: when the search mode is accurate matching search, determining first matching data according to the screening condition, calculating matching scores of the first matching data using a first scoring model, and sorting by the matching scores to obtain an accurate search result.
Specifically, accurate matching uses the ES termsQuery API to match the filter conditions and obtain the first matching data that satisfies them. The first scoring model calculates a matching score for the first matching data, and the data is ranked by matching score. Based on the ranking, the paging strategy searchafter recalls the specified number of search results from the first matching data, and these are displayed in the search interface as the accurate search result.
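The accurate-match request can be sketched as an Elasticsearch query DSL body: each screening condition becomes a `terms` clause inside a non-scoring `bool.filter`, and `search_after` pages through results in sort order. This is a minimal sketch assuming the first scoring model's result is available as a sortable field; `match_score` and `table_id` are hypothetical field names, not taken from the document.

```python
from typing import Any, Dict, List, Optional


def exact_match_body(filters: Dict[str, List[str]],
                     page_size: int,
                     after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build an Elasticsearch request body for accurate matching search.

    Screening conditions go into bool.filter as `terms` clauses. Results are
    ordered by a hypothetical precomputed `match_score` field, with `table_id`
    as a unique tiebreaker so that search_after paging has a stable order.
    """
    body: Dict[str, Any] = {
        "query": {"bool": {"filter": [
            {"terms": {field: values}} for field, values in sorted(filters.items())
        ]}},
        "sort": [{"match_score": "desc"}, {"table_id": "asc"}],
        "size": page_size,
    }
    if after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = after
    return body
```

For the next page, pass the `sort` array of the last hit returned by the previous request as `after`; Elasticsearch includes that array on each hit whenever the request specifies `sort`.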
According to one embodiment of the invention, calculating a match score for the first match data using a first scoring model comprises: calculating a matching score of the first matching data based on key attribute information of the first matching data using a first scoring model.
According to another embodiment of the invention, when calculating the matching score, the first scoring model constrains the matching score based on an anti-expansion function.
Specifically, to make the search results comprehensive, the key attribute information of the data is incorporated into the first scoring model. In this embodiment, the key attribute information mainly comprises the browsing amount of the data, the collection amount (number of favorites) of the data, the quality score and value score, which reflect how the data is used and created, and the authentication state score, which indicates whether the data has passed certification by the certification department. For the first matching data, the browsing amount, collection amount and authentication state score on its platform, together with its quality score and value score, are obtained; each attribute is compressed with the natural logarithm (anti-expansion processing) and the results are combined by weighted summation. The matching score function is:
$$\mathrm{FunctionScore} = a\ln(1+\mathrm{count}_{pv}) + b\ln(1+\mathrm{count}_{attention}) + c\ln(1+\mathrm{score}_{value}) + d\ln(1+\mathrm{score}_{quality}) + e\cdot\mathrm{isAuth}$$
where the weight coefficients are preferably $a=1$, $b=2$, $c=2$, $d=2$ and $e=4$ in this embodiment; $\mathrm{count}_{pv}$ is the number of views of the data on the platform; $\mathrm{count}_{attention}$ is the number of times users on the platform have favorited (collected) the data; $\mathrm{score}_{value}$ is the value score of the data; $\mathrm{score}_{quality}$ is the quality score of the data; and $\mathrm{isAuth}$ is the authentication state score of the data, equal to 1 if the data has been authenticated and 0 otherwise.
Applying the natural logarithm to the browsing amount, collection amount, value score and quality score compresses each attribute index (anti-expansion processing), with the aim of keeping each index between 0 and 10. Combined with the authentication state score, this yields a matching score that reflects all of these factors and supports more reasonable sorting and recall.
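The first scoring model is a direct translation of the FunctionScore formula above. A minimal sketch, using the preferred weights a=1, b=2, c=2, d=2, e=4; the `KeyAttributes` container and its field names are illustrative assumptions, and all counts and scores are assumed to be non-negative. `math.log1p(x)` computes ln(1+x).

```python
import math
from dataclasses import dataclass


@dataclass
class KeyAttributes:
    """Key attribute information of one data table (field names are illustrative)."""
    page_views: int         # count_pv: views on the platform
    favorites: int          # count_attention: times users favorited the table
    value_score: float      # score_value
    quality_score: float    # score_quality
    is_authenticated: bool  # isAuth: 1 if certified, else 0


# Preferred weights from the embodiment: a=1, b=2, c=2, d=2, e=4.
WEIGHTS = (1.0, 2.0, 2.0, 2.0, 4.0)


def function_score(attrs: KeyAttributes, weights=WEIGHTS) -> float:
    """First scoring model: weighted sum of log-compressed key attributes."""
    a, b, c, d, e = weights
    return (a * math.log1p(attrs.page_views)
            + b * math.log1p(attrs.favorites)
            + c * math.log1p(attrs.value_score)
            + d * math.log1p(attrs.quality_score)
            + e * (1.0 if attrs.is_authenticated else 0.0))  # isAuth is not log-compressed


def rank_exact_matches(records):
    """Sort (record_id, KeyAttributes) pairs by descending FunctionScore."""
    return sorted(records, key=lambda r: function_score(r[1]), reverse=True)
```

With these weights, a certified table with no other activity scores 4.0, which outranks an uncertified table with 20 views (ln 21, about 3.04); the logarithm keeps raw popularity from swamping the authentication signal.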
Step S103: when the search mode is fuzzy matching search, performing intention identification according to the search field and the screening condition, determining second matching data according to the intention identification result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores to obtain a fuzzy search result.
According to one embodiment of the invention, the intention identification is carried out according to the search field and the screening condition, and the intention identification comprises the following steps: generating a wildcard expression based on historical behavior data of a user; and identifying the intention according to the wildcard expression, the search field and the screening condition.
Specifically, when the data search mode is fuzzy matching search, wildcard expressions are generated based on the user's historical behavior data; the data can be matched against the screening condition, and wildcard matching is then performed on the search field to identify the user's search intention. The intention recognition results mainly comprise three types: the first type is a search intention for a data table name, and the second type is a search intention for a database name and a data table name. For example, if the search field involves the data table name adm_d04_trade_ord_det_sku_snapshot, the corresponding wildcard expression is ^([0-9a-zA-Z]{1,}_)+[0-9a-zA-Z]{1,}$, where ^ denotes the start of the string, $ denotes the end, [0-9a-zA-Z] denotes any digit or upper- or lower-case letter, {1,} means the preceding element occurs at least once, and + means the preceding group occurs from one to infinitely many times; a match means the identified search intention is a data table name. If the search field involves the database name adm and the data table name adm_jxpp_d04_trade_ord_det_snapshot, the corresponding wildcard expression is ^(library name)([0-9a-zA-Z]{1,}.)[0-9a-zA-Z]{1,}$, where each symbol has the same meaning as above; a match means the identified search intention is a database name and a data table name.
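Intention recognition by pattern matching can be sketched with Python's `re` module. This sketch uses fixed patterns rather than expressions generated from the user's historical behavior data, which it does not model. The patterns are assumptions: a table name is taken to be alphanumeric segments joined by underscores, and a database-plus-table search is taken to be written as `database.table`. Mapping everything else to `"other"` (handled as the second result) is likewise an assumption, since this section names only two of the three intention types.

```python
import re

# Assumed patterns: table names are alphanumeric segments joined by "_";
# a database-and-table search is written as "database.table".
TABLE_NAME = re.compile(r"^([0-9a-zA-Z]{1,}_)+[0-9a-zA-Z]{1,}$")
DB_AND_TABLE = re.compile(r"^[0-9a-zA-Z]{1,}\.([0-9a-zA-Z]{1,}_)*[0-9a-zA-Z]{1,}$")


def recognize_intent(search_field: str) -> str:
    """Classify a fuzzy-search string as 'db_and_table', 'table', or 'other'.

    'db_and_table' and 'table' together correspond to the first result;
    'other' is treated here as the second result (an assumption).
    """
    text = search_field.strip()
    if DB_AND_TABLE.match(text):
        return "db_and_table"
    if TABLE_NAME.match(text):
        return "table"
    return "other"
```

The database-and-table pattern is checked first because it is the more specific of the two; a free-text query such as a business description falls through to `"other"`.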
According to another embodiment of the present invention, when the intention recognition result is a first result, determining second matching data according to the intention recognition result, calculating matching scores of the second matching data using a second scoring model, and sorting by the matching scores comprises: determining a first data set that meets a first threshold according to the relevance of data to the screening condition and the intention recognition result; determining a second data set that meets a second threshold according to the relevance of data to the search field and the screening condition, wherein the first threshold is greater than the second threshold and the first data set has a higher sorting priority than the second data set; judging whether the total number of records in the first and second data sets meets a number threshold; if the number threshold is met, taking the first data set and the second data set as the second matching data; calculating a first matching score for each record in the first data set using the second scoring model and sorting the first data set by the first matching score; calculating a second matching score for each record in the second data set using the second scoring model and sorting the second data set by the second matching score; and concatenating the sorted first data set and the sorted second data set in descending order of sorting priority, so as to sort the second matching data. Here, the first result covers the first and second types of search intention, namely the search intention for a data table name and the search intention for a database name and a data table name.
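The tiered assembly for the first result, including the third-data-set top-up, can be sketched as follows. The thresholds default to the embodiment's values (first threshold 100%, second threshold 85%). The `Candidate` fields and the scoring callables are hypothetical stand-ins for the relevance values and the second scoring model; excluding records already in the first data set from the second data set is an assumption, since the text does not say how overlaps are handled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    table_id: str
    intent_relevance: float  # relevance to screening condition + recognized intent, in [0, 1]
    query_relevance: float   # relevance to search field + screening condition, in [0, 1]


def assemble_first_result(cands: List[Candidate],
                          count_threshold: int,
                          first_score: Callable[[Candidate], float],
                          second_score: Callable[[Candidate], float],
                          first_threshold: float = 1.0,
                          second_threshold: float = 0.85) -> List[str]:
    """Order first, second (and, if needed, third) data sets by sorting priority."""
    first = [c for c in cands if c.intent_relevance >= first_threshold]
    taken = {c.table_id for c in first}
    # Assumption: a record already in the first data set is not repeated in the second.
    second = [c for c in cands
              if c.table_id not in taken and c.query_relevance >= second_threshold]
    taken |= {c.table_id for c in second}

    first.sort(key=first_score, reverse=True)    # first matching score
    second.sort(key=second_score, reverse=True)  # second matching score
    ranked = first + second

    shortfall = count_threshold - len(ranked)
    if shortfall > 0:
        # Third data set: most relevant remaining records, sorted by relevance only.
        rest = sorted((c for c in cands if c.table_id not in taken),
                      key=lambda c: c.query_relevance, reverse=True)
        ranked += rest[:shortfall]
    return [c.table_id for c in ranked]
```

Because each tier is sorted independently and then concatenated, a high-scoring record in a lower tier never overtakes a record from a higher tier, which is how the sorting priority in the text is enforced.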
According to a further embodiment of the present invention, calculating a first matching score for each piece of data in the first set of data using a second scoring model comprises: calculating a first matching score of each piece of data in the first data set based on the key attribute information of each piece of data in the first data set by using a second scoring model; calculating a second match score for each piece of data in the second set of data using a second scoring model, comprising: calculating a second matching score for each piece of data in the second data set based on the relevance of each piece of data in the second data set to the search field and the screening condition, and the key attribute information of each piece of data in the second data set using a second scoring model.
According to another embodiment of the invention, the second scoring model constrains the matching score based on an anti-expansion function and a normalization function when calculating the matching score.
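The formula for the additional score function F(x) appears only as an image in the original publication. The text does state that F combines natural-logarithm compression with normalisation by the per-index maximum x_max, so the sketch below assumes the form F(x) = ln(1 + x) / ln(1 + x_max); treat it as an illustrative guess, not the patent's exact definition.

```python
import math


def anti_expansion_normalize(x: float, x_max: float) -> float:
    """Assumed F(x): compress with ln(1 + x), then normalise by ln(1 + x_max).

    Maps 0 -> 0 and x_max -> 1, while shrinking the gap between very large and
    moderate values so that one heavily browsed item cannot dominate the score.
    """
    if x_max <= 0:
        return 0.0
    return math.log1p(max(x, 0.0)) / math.log1p(x_max)
```

With x_max = 1000, an item viewed 1000 times scores about 2.9 times one viewed 10 times under this assumed form, rather than 100 times under plain linear scaling.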
Specifically, the intention identification result can be obtained from the search field and the screening condition. When the intention identification result is the first result, that is, the identified intention is a database name and a data table name, or a data table name only, the data are matched for relevance against the screening condition and the identified data table name (together with the database name, where one was identified), and the data meeting the first threshold are taken as the first data set; the embodiment of the present invention requires a complete match, that is, the first threshold is 100%. The data are likewise matched for relevance against the search field and the screening condition, and the data meeting the second threshold are taken as the second data set; in the embodiment of the present invention the second threshold is 85%. Because intention identification yields the matching rule that best reflects the user's personalized search need, the first threshold is normally greater than the second threshold, and the first data set has a higher sorting priority than the second data set. It is then judged whether the total number of pieces of data in the first and second data sets meets the number threshold, which limits the number of search results displayed on the first page of the search interface; if the total is greater than or equal to the number threshold, the first and second data sets are taken as the second matching data. The key attribute information in the embodiment of the present invention mainly covers the browsing amount and collection amount of the data, the quality score and value score representing how the data is used and how it was created, and the authentication state score indicating whether the data has passed certification by the certification department, where an authentication state score of 1 means authenticated and 0 means unauthenticated. The first data set is grouped by authentication state score: authenticated data form the first group and unauthenticated data the second group. For the data in each group, the browsing amount and collection amount on the platform where the data reside, together with the quality score and value score, are obtained; these attribute values are weighted and summed, and anti-expansion compression and normalization are applied through a natural logarithm and a normalization function. The first matching score FunctionScore is calculated as follows:
$$\mathrm{FunctionScore} = a \cdot F(count_{pv}) + b \cdot F(count_{attention}) + c \cdot F(score_{value}) + d \cdot F(score_{quality})$$
where the additional score function F(x) is defined as:
[Formula image in the original publication: definition of F(x); not reproduced in the text.]
In the embodiment of the present invention, the weight coefficients are preferably a = 1, b = 1, c = 2 and d = 2; $x_{max}$ represents the maximum value of each index in the first data set; $count_{pv}$ represents the browsing amount of the piece of data on the platform; $count_{attention}$ represents the number of times users on the platform have collected the piece of data; $score_{value}$ represents the value score of the piece of data; and $score_{quality}$ represents its quality score. Within each group, the data are sorted by the first matching score, and the sorted first group is placed before the sorted second group to obtain the sorted first data set. For the data in the second data set, the second scoring model obtains, for each piece of data, its relevance score with respect to the search field and the screening condition, its platform browsing amount, its platform collection amount, its authentication state score indicating whether it has passed certification by the certification department, and its quality score and value score representing how the data is used and how it was created. These attribute values are weighted and summed, and anti-expansion compression and normalization are applied through a natural logarithm and a normalization function. The second matching score FunctionScore is calculated as follows:
$$\mathrm{FunctionScore} = k \cdot G(score) + a \cdot F(count_{pv}) + b \cdot F(count_{attention}) + c \cdot F(score_{value}) + d \cdot F(score_{quality}) + e \cdot isAuth$$
where the additional score function F(x) is as defined above, and the correlation score function G(score) is:
[Formula image in the original publication: definition of G(score); not reproduced in the text.]
In the embodiment of the present invention, the weight coefficients are preferably k = 10, a = 1, b = 1, c = 2, d = 2 and e = 4; $score$ represents the relevance value obtained from relevance matching; $score_{max}$ represents the maximum relevance in the second data set; $count_{pv}$ represents the browsing amount of the piece of data on the platform; $count_{attention}$ represents the number of times users on the platform have collected the piece of data; $score_{value}$ represents the value score of the piece of data; $score_{quality}$ represents its quality score; and $isAuth$ is the authentication state score (1 for authenticated, 0 for unauthenticated). The data in the second data set are sorted by the second matching score. The sorted first data set and the sorted second data set are then arranged in order, the first data set having the higher sorting priority, to obtain the sorted second matching data, which is displayed on the search page. When the number of pieces of data in the first data set is greater than the number threshold, the leading items of the sorted first data set, up to the number threshold, are selected as a first data subset and displayed on the first page, and the sorted second data set is displayed on the subsequent search result pages. In addition, if the second data set contains data duplicating data in the first data set, the duplicates are deleted from the second data set.
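The scoring and first-page arrangement for the first result can be sketched as follows. Because F(x) and G(score) are given only as formula images in the original publication, this sketch assumes F(x) = ln(1 + x) / ln(1 + x_max) and G(score) = score / score_max; the record layout (dicts with hypothetical keys `name`, `pv`, `attention`, `value`, `quality`, `is_auth`, `relevance`) and all function names are likewise illustrative. The weights are the preferred values of this embodiment.

```python
import math

# Example weights from the embodiment.
A, B, C, D = 1, 1, 2, 2   # browsing amount, collection amount, value score, quality score
K, E = 10, 4              # relevance, authentication state
ATTRS = ("pv", "attention", "value", "quality")


def f(x, x_max):
    # ASSUMED form of F(x): log compression normalised by the set maximum.
    return math.log1p(x) / math.log1p(x_max) if x_max > 0 else 0.0


def g(score, score_max):
    # ASSUMED form of G(score): linear normalisation by the set maximum.
    return score / score_max if score_max > 0 else 0.0


def set_maxima(records, keys):
    """Per-index maxima (x_max, score_max) over one data set."""
    return {k: max(r[k] for r in records) for k in keys}


def attribute_score(r, m):
    return (A * f(r["pv"], m["pv"]) + B * f(r["attention"], m["attention"])
            + C * f(r["value"], m["value"]) + D * f(r["quality"], m["quality"]))


def sort_first_set(records):
    """First matching score; the authenticated group precedes the unauthenticated one."""
    if not records:
        return []
    m = set_maxima(records, ATTRS)  # maxima over the whole first data set

    def score(r):
        return attribute_score(r, m)

    authed = sorted((r for r in records if r["is_auth"] == 1), key=score, reverse=True)
    others = sorted((r for r in records if r["is_auth"] != 1), key=score, reverse=True)
    return authed + others


def sort_second_set(records, first_sorted):
    """Second matching score, after deleting duplicates of first-set records."""
    seen = {r["name"] for r in first_sorted}
    records = [r for r in records if r["name"] not in seen]
    if not records:
        return []
    m = set_maxima(records, ATTRS + ("relevance",))

    def score(r):
        return K * g(r["relevance"], m["relevance"]) + attribute_score(r, m) + E * r["is_auth"]

    return sorted(records, key=score, reverse=True)


def split_first_page(first_sorted, second_sorted, number_threshold):
    """First page gets the top `number_threshold` items; the rest go to later pages."""
    merged = first_sorted + second_sorted
    return merged[:number_threshold], merged[number_threshold:]
```

Note that in this sketch the authentication state acts as a hard grouping key for the first data set but only as a weighted term (e) for the second, mirroring the two formulas above.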
According to still another embodiment of the present invention, when the intention recognition result is the first result, determining the second matching data according to the intention recognition result, calculating the matching score of the second matching data using the second scoring model, and sorting according to the matching score further includes: if the number threshold is not met, determining a specified number of pieces of data as a third data set from the data outside the first data set and the second data set, according to the relevance of the data to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the total number of pieces of data in the first and second data sets, and the third data set has a lower sorting priority than the second data set; taking the first, second and third data sets as the second matching data; calculating a first matching score for each piece of data in the first data set using the second scoring model, and sorting the data in the first data set by the first matching score; calculating a second matching score for each piece of data in the second data set using the second scoring model, and sorting the data in the second data set by the second matching score; sorting the data in the third data set by the relevance of each piece of data to the search field and the screening condition; and arranging the sorted first, second and third data sets in descending order of sorting priority, thereby sorting the second matching data.
According to yet another embodiment of the present invention, calculating a first match score for each piece of data in the first set of data using a second scoring model comprises: calculating a first matching score of each piece of data in the first data set based on key attribute information of each piece of data in the first data set by using a second scoring model; calculating a second match score for each piece of data in the second set of data using a second scoring model, comprising: calculating a second matching score of each piece of data in the second data set based on the relevance of each piece of data in the second data set to the search field and the screening condition and the key attribute information of each piece of data in the second data set by using a second scoring model.
Specifically, when the total number of pieces of data in the first and second data sets does not meet the number threshold, a fallback recall scheme is used: the specified number of pieces of data is selected, as a third data set, from the data outside the first and second data sets according to their relevance to the search field and the screening condition. For this selection the relevance threshold may be lowered somewhat below the second threshold, or data matching any single field of the search field and the screening condition may be selected. The first, second and third data sets are taken as the second matching data, so that their combined number of pieces of data equals the preset number threshold; because of its lower relevance, the third data set has a lower sorting priority than the second data set. Calculating the first and second matching scores to sort the first and second data sets has been described above and is not repeated here. The data in the third data set are sorted by their relevance to the search field and the screening condition. Finally, the sorted first, second and third data sets are arranged from top to bottom in order of sorting priority to obtain the sorting result of the second matching data, which is displayed on the search interface.
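A minimal sketch of the fallback recall, assuming each candidate carries a pre-computed relevance to the search field and screening condition. The `relaxed_threshold` parameter is hypothetical: it stands in for a relevance cut-off "appropriately reduced on the basis of the second threshold", whose value the embodiment does not fix.

```python
def fallback_recall(candidates, taken_names, current_count, number_threshold,
                    relaxed_threshold=0.5):
    """Pick the third (or fifth) data set that tops the first page up to the number threshold.

    candidates: dicts with hypothetical "name" and "relevance" keys.
    taken_names: names already present in the higher-priority data sets.
    """
    needed = number_threshold - current_count
    if needed <= 0:
        return []
    pool = [c for c in candidates
            if c["name"] not in taken_names and c["relevance"] >= relaxed_threshold]
    # The recalled set is ordered by relevance alone, not by the scoring model.
    pool.sort(key=lambda c: c["relevance"], reverse=True)
    return pool[:needed]
```

The same helper would serve the no-intention branch, where the fifth data set tops up the fourth data set.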
According to another embodiment of the present invention, when the intention recognition result is a second result, determining the second matching data according to the intention recognition result, calculating the matching score of the second matching data using the second scoring model, and sorting according to the matching score includes: determining a fourth data set meeting a third threshold according to the relevance of data to the search field and the screening condition; judging whether the number of pieces of data in the fourth data set meets the number threshold; if the number threshold is met, taking the fourth data set as the second matching data; and calculating a third matching score for each piece of data in the fourth data set using the second scoring model, and sorting the data in the fourth data set by the third matching score, thereby sorting the second matching data. The second result means that the intention recognition result is null, that is, no search intention is identified from the search field.
According to a further embodiment of the present invention, calculating a third matching score for each piece of data in the fourth set of data using a second scoring model comprises: calculating a third matching score for each piece of data in the fourth data set based on the relevance of each piece of data in the fourth data set to the search field and the screening condition, and the key attribute information of each piece of data in the fourth data set using a second scoring model.
Specifically, when the intention identification result is the second result, that is, a null value because no search intention was identified, the data are matched for relevance against the search field and the screening condition, and the data meeting the third threshold are taken as the fourth data set; in the embodiment of the present invention the third threshold is 85%, the same as the second threshold. If the number of pieces of data in the fourth data set is greater than or equal to the preset number threshold, the fourth data set is taken as the second matching data. The second scoring model is then applied to the data in the fourth data set to obtain, for each piece of data, its relevance score with respect to the search field and the screening condition together with its key attribute information, which comprises the platform browsing amount, the platform collection amount, the authentication state score indicating whether the data has passed certification by the certification department, and the quality score and value score representing how the data is used and how it was created. These attribute values are weighted and summed, and anti-expansion compression and normalization are applied through a natural logarithm and a normalization function. The third matching score FunctionScore is calculated as follows:
$$\mathrm{FunctionScore} = k \cdot G(score) + a \cdot F(count_{pv}) + b \cdot F(count_{attention}) + c \cdot F(score_{value}) + d \cdot F(score_{quality}) + e \cdot isAuth$$
where the additional score function F(x) is as defined above, and the correlation score function G(score) is:
[Formula image in the original publication: definition of G(score); not reproduced in the text.]
In the embodiment of the present invention, the weight coefficients are preferably k = 10, a = 1, b = 1, c = 2, d = 2 and e = 4; $score$ represents the relevance value obtained from relevance matching; $score_{max}$ represents the maximum relevance in the fourth data set; $count_{pv}$ represents the browsing amount of the piece of data on the platform; $count_{attention}$ represents the number of times users on the platform have collected the piece of data; $score_{value}$ represents the value score of the piece of data; $score_{quality}$ represents its quality score; and $isAuth$ is the authentication state score. The data in the fourth data set are sorted by the third matching score to obtain the sorting of the second matching data, and the sorted second matching data are displayed on the search page.
According to still another embodiment of the present invention, in a case where the intention recognition result is a second result, determining second matching data according to the intention recognition result, calculating a matching score of the second matching data using a second scoring model, and sorting according to the matching score, further includes: determining a specified number of pieces of data from the data except the fourth data set as a fifth data set according to the correlation degree of the data with the search field and the screening condition under the condition that the number threshold is not met, wherein the specified number is the difference between the number threshold and the number of the pieces of data of the fourth data set, and the fifth data set is lower in sorting priority than the fourth data set; taking the fourth data set and the fifth data set as the second matching data; calculating a third matching score of each piece of data in the fourth data set by using a second scoring model, and sorting the data in the fourth data set according to the third matching score; sorting the data in the fifth data set according to the relevance of each piece of data in the fifth data set to the search field and the screening condition; and sorting the data in the sorted fourth data set and the data in the sorted fifth data set according to the order from the top to the bottom of the sorting priority, so as to sort the second matching data.
According to a further embodiment of the present invention, calculating a third matching score for each piece of data in the fourth data set using the second scoring model includes: calculating, with the second scoring model, the third matching score of each piece of data in the fourth data set based on its relevance to the search field and the screening condition and on its key attribute information.
Specifically, when the intention identification result is the second result, that is, null, and the number of pieces of data in the fourth data set is less than the preset number threshold, a fallback recall scheme is used: the specified number of pieces of data is selected, as a fifth data set, from the data outside the fourth data set according to their relevance to the search field and the screening condition. For this selection the relevance threshold may be lowered somewhat below the second threshold, or data matching any single field of the search field and the screening condition may be selected. The fourth and fifth data sets are taken as the second matching data, so that their combined number of pieces of data equals the preset number threshold; because of its lower relevance, the fifth data set has a lower sorting priority than the fourth data set. Sorting the fourth data set by the third matching score has been described above and is not repeated here. The data in the fifth data set are sorted by their relevance to the search field and the screening condition. Finally, the sorted fourth and fifth data sets are arranged from top to bottom in order of sorting priority to obtain the sorting result of the second matching data, which is displayed on the search interface.
Through intention identification in fuzzy matching search, the relevance matching rules for the search field and the screening condition, and the scores calculated by the scoring models, a data search scheme that meets users' actual needs is realized: the best search results are placed at the very top of the result list, which improves the user experience as well as the accuracy and efficiency of the search.
FIG. 2 is a flowchart of a method for searching table models according to an embodiment of the present invention. A user performs a search operation on the table-model search interface; the system verifies the user's permissions based on the search field and screening condition entered by the user together with the user's identity, and opens the databases corresponding to those permissions for searching. The data index location related to the table model in ES (Elasticsearch) is then determined. Next, the search field and the screening condition are used to decide between accurate matching search and fuzzy matching search. If accurate matching applies, the accurate matching search is executed: the first matching data are determined, matching scores are calculated with the first scoring model, the data are sorted by score, and results are recalled in sorted order to obtain the accurate search result. If fuzzy matching applies, the fuzzy matching search is executed: a threshold on the number of pieces of data displayed on the first page is set, and intention identification is performed for the search results to be displayed on the first page.

If the intention identification result is the first result, which in this embodiment may be a database name and a data table name, or a data table name only, the first data set and the second data set are sorted separately. The first data set, which matches the identified intention, is scored with the first matching score calculated by the second scoring model and sorted accordingly. The second data set is obtained by relevance matching against the search field and the screening condition, scored with the second matching score calculated by the second scoring model, and sorted accordingly. The first and second data sets are then arranged in order of sorting priority and de-duplicated to obtain the sorted second matching data. It is judged whether the number of pieces of data in the second matching data meets the preset number threshold; if so, the second matching data are returned to the front-end search interface as the fuzzy matching search result. If not, a fallback recall is executed: a third data set of the specified number is obtained, by relevance matching against the search field and the screening condition, from the data outside the first and second data sets, so that the total number of pieces of data in the first, second and third data sets equals the preset number threshold.

If the intention identification result is the second result, namely a null value, a fourth data set is obtained by relevance matching against the search field and the screening condition, using the same relevance matching rule as for the second data set; third matching scores are calculated with the second scoring model, and the fourth data set is sorted by them. It is then judged whether the number of pieces of data in the fourth data set meets the preset number threshold; if so, the sorting result of the fourth data set is returned to the front-end search interface as the fuzzy matching search result. If not, a fallback recall is executed in the same way: a fifth data set of the specified number is obtained, by relevance matching against the search field and the screening condition, from the data outside the fourth data set, so that the total number of pieces of data in the fourth and fifth data sets equals the preset number threshold.

If the intention identification result is neither the first result nor the second result, the search state is judged abnormal, a state-exception message is returned to the front end, and the search is performed again. For search result pages other than the first page, data are searched, scored and sorted using the relevance matching rules of the search field and the screening condition.
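The first-page branching of FIG. 2 can be summarised in a short Python sketch. The dict shape of the intention result and the string labels are hypothetical; the sketch only encodes which data sets the flow combines, and in what priority order.

```python
from typing import Optional


def classify_intent(intent: Optional[dict]) -> str:
    """Map an intention identification result onto the branches of FIG. 2.

    intent: hypothetical dict such as {"table": ...} or {"database": ..., "table": ...}.
    """
    if not intent:
        return "second"      # null result: no search intention identified
    if set(intent) in ({"table"}, {"database", "table"}):
        return "first"       # table name, or database name plus table name
    return "abnormal"        # reported to the front end as a state exception


def first_page_sets(result: str, threshold_met: bool) -> list:
    """Data sets that make up the first page, in descending sorting priority."""
    if result == "first":
        sets, fallback = ["first", "second"], "third"
    elif result == "second":
        sets, fallback = ["fourth"], "fifth"
    else:
        raise ValueError("abnormal search state; the search must be re-run")
    return sets if threshold_met else sets + [fallback]
```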
Fig. 3 is a schematic diagram of main blocks of an apparatus for data search according to an embodiment of the present invention. As shown in fig. 3, the data search apparatus 300 mainly includes a search mode determining module 301, a precise search module 302, and a fuzzy search module 303.
A search mode determining module 301, configured to determine a search mode according to the obtained search field and the screening condition, where if the value of the search field is null, the search mode is determined to be an accurate matching search, and otherwise, the search mode is determined to be a fuzzy matching search;
the precise search module 302 is configured to, when the search mode is an accurate matching search, determine first matching data according to the screening condition, calculate a matching score of the first matching data by using a first scoring model, and rank according to the matching score to obtain an accurate search result;
and the fuzzy search module 303 is configured to, when the search mode is a fuzzy matching search, perform intention identification according to the search field and the screening condition, determine second matching data according to an intention identification result, calculate a matching score of the second matching data by using a second scoring model, and perform ranking according to the matching score to obtain a fuzzy search result.
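The rule applied by the search mode determining module 301 can be stated directly in code. This is a hedged sketch: the `SearchMode` enum and the `determine_search_mode` name are hypothetical, and treating an empty or whitespace-only string as a null search field is an assumption. The screening condition does not affect the choice of mode; it is applied within both modes.

```python
from enum import Enum
from typing import Optional


class SearchMode(Enum):
    ACCURATE = "accurate matching search"
    FUZZY = "fuzzy matching search"


def determine_search_mode(search_field: Optional[str]) -> SearchMode:
    """A null search field selects accurate matching; anything else selects fuzzy matching.

    Treating an empty or whitespace-only string as null is an assumption.
    """
    if search_field is None or not search_field.strip():
        return SearchMode.ACCURATE
    return SearchMode.FUZZY
```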
According to an embodiment of the present invention, the fuzzy search module 303 may be further configured to generate a wildcard expression based on the user's historical behavior data, and to perform intention identification according to the wildcard expression, the search field and the screening condition.
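The source does not specify how the wildcard expression is built from historical behavior data. Purely as an illustration, the sketch below assumes the history is a list of `database.table` names the user has opened, turns the user's most frequently used databases into wildcard patterns such as `sales.*`, and uses Python's `fnmatch.fnmatchcase` to decide whether a search field names a database and table, a table inside one of those databases, or neither (a null result). Screening conditions are omitted for brevity, and every name here is hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase
from typing import Optional


def generate_wildcards(history: list, top_n: int = 3) -> list:
    """Turn the user's most-used databases into wildcard patterns like "sales.*".

    history: hypothetical list of "database.table" names the user has opened.
    """
    counts = Counter(name.split(".", 1)[0] for name in history if "." in name)
    return [f"{db}.*" for db, _ in counts.most_common(top_n)]


def identify_intent(search_field: str, catalog: set, wildcards: list) -> Optional[dict]:
    """Return a database+table intent, a table-only intent, or None (null result).

    catalog: hypothetical set of existing "database.table" names.
    """
    term = search_field.strip()
    if "." in term and term in catalog:
        database, table = term.split(".", 1)
        return {"database": database, "table": table}
    for full_name in sorted(catalog):
        _, table = full_name.split(".", 1)
        # Table-name intent only inside databases the user habitually works in.
        if table == term and any(fnmatchcase(full_name, w) for w in wildcards):
            return {"table": table}
    return None
```

For example, a user whose history is dominated by `sales.*` tables who searches for `orders` would get a table-name intent for `sales.orders`, even if another database also has an `orders` table.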
According to still another embodiment of the present invention, when the intention recognition result is the first result, the fuzzy search module 303 may be further configured to: determine a first data set meeting a first threshold according to the relevance of data to the screening condition and the intention recognition result; determine a second data set meeting a second threshold according to the relevance of data to the search field and the screening condition, wherein the first threshold is greater than the second threshold and the first data set has a higher sorting priority than the second data set; judge whether the total number of pieces of data in the first and second data sets meets a number threshold; if the number threshold is met, take the first data set and the second data set as the second matching data; calculate a first matching score for each piece of data in the first data set using the second scoring model, and sort the data in the first data set by the first matching score; calculate a second matching score for each piece of data in the second data set using the second scoring model, and sort the data in the second data set by the second matching score; and arrange the sorted first data set and the sorted second data set in descending order of sorting priority, thereby sorting the second matching data.
According to another embodiment of the present invention, when the intention recognition result is the first result, the fuzzy search module 303 may be further configured to: if the number threshold is not met, determine a specified number of pieces of data as a third data set from the data outside the first and second data sets according to the relevance of the data to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the total number of pieces of data in the first and second data sets, and the third data set has a lower sorting priority than the second data set; take the first, second and third data sets as the second matching data; calculate a first matching score for each piece of data in the first data set using the second scoring model, and sort the data in the first data set by the first matching score; calculate a second matching score for each piece of data in the second data set using the second scoring model, and sort the data in the second data set by the second matching score; sort the data in the third data set by the relevance of each piece of data to the search field and the screening condition; and arrange the sorted first, second and third data sets in descending order of sorting priority, thereby sorting the second matching data.
According to another embodiment of the present invention, the fuzzy search module 303 is further configured to: calculate, using the second scoring model, the first matching score of each piece of data in the first data set based on its key attribute information; and calculate, using the second scoring model, the second matching score of each piece of data in the second data set based on its relevance to the search field and the screening condition and on its key attribute information.
According to still another embodiment of the present invention, when the intention recognition result is the second result, the fuzzy search module 303 may be further configured to: determine a fourth data set meeting a third threshold according to the relevance of data to the search field and the screening condition; judge whether the number of pieces of data in the fourth data set meets a number threshold; if the number threshold is met, take the fourth data set as the second matching data; and calculate a third matching score for each piece of data in the fourth data set using the second scoring model, and sort the data in the fourth data set by the third matching score, thereby sorting the second matching data.
According to another embodiment of the present invention, when the intention recognition result is the second result, the fuzzy search module 303 may be further configured to: if the number threshold is not met, determine a specified number of pieces of data as a fifth data set from the data outside the fourth data set according to the relevance of the data to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the number of pieces of data in the fourth data set, and the fifth data set has a lower sorting priority than the fourth data set; take the fourth and fifth data sets as the second matching data; calculate a third matching score for each piece of data in the fourth data set using the second scoring model, and sort the data in the fourth data set by the third matching score; sort the data in the fifth data set by the relevance of each piece of data to the search field and the screening condition; and arrange the sorted fourth and fifth data sets in descending order of sorting priority, thereby sorting the second matching data.
According to another embodiment of the present invention, the fuzzy search module 303 is further configured to calculate, using the second scoring model, the third matching score of each piece of data in the fourth data set based on its relevance to the search field and the screening condition and on its key attribute information.
According to still another embodiment of the present invention, the precise search module 302 may be further configured to calculate, using the first scoring model, the matching score of the first matching data based on the key attribute information of the first matching data.
According to a further embodiment of the invention, when calculating the matching score, the first scoring model and the second scoring model constrain it using an anti-expansion function and a normalization function.
Fig. 4 shows an exemplary system architecture to which embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use terminal devices 401, 402, 403 to interact with a server 405 via a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as a data search application, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, for example (by way of example only) a background management server that supports data searches performed by users through the terminal devices 401, 402, 403. The background management server can determine a search mode according to the obtained search field and the screening condition: if the value of the search field is null, the search mode is determined to be accurate matching search; otherwise, it is determined to be fuzzy matching search. When the search mode is accurate matching search, the server determines first matching data according to the screening condition, calculates matching scores of the first matching data using a first scoring model, and sorts by the matching scores to obtain an accurate search result. When the search mode is fuzzy matching search, the server performs intention recognition according to the search field and the screening condition, determines second matching data according to the intention recognition result, calculates matching scores of the second matching data using a second scoring model, and sorts by the matching scores to obtain a fuzzy search result. It then feeds the processing result (for example, the search result data) back to the terminal device.
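The mode decision and dispatch performed by the background management server can be sketched as follows. The sketch assumes that a blank search string counts as null, that screening conditions are exact field/value equality filters, and that the fuzzy path is supplied by the caller; `SearchMode`, `determine_search_mode` and `search` are hypothetical names.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Mapping, Optional

Record = Dict[str, Any]


class SearchMode(Enum):
    ACCURATE = "accurate matching search"
    FUZZY = "fuzzy matching search"


def determine_search_mode(search_field: Optional[str]) -> SearchMode:
    # Assumption: a blank string is treated as null, the same as None.
    if search_field is None or not search_field.strip():
        return SearchMode.ACCURATE
    return SearchMode.FUZZY


def search(
    records: List[Record],
    search_field: Optional[str],
    screening: Mapping[str, Any],
    first_score: Callable[[Record], float],
    fuzzy_search: Callable[[List[Record], str, Mapping[str, Any]], List[Record]],
) -> List[Record]:
    """Dispatch to accurate or fuzzy matching search (hypothetical sketch)."""
    if determine_search_mode(search_field) is SearchMode.ACCURATE:
        # Assumption: screening conditions are exact field/value equality filters.
        first_matching = [
            r for r in records if all(r.get(k) == v for k, v in screening.items())
        ]
        # Sort by the first scoring model, highest matching score first.
        return sorted(first_matching, key=first_score, reverse=True)
    # Fuzzy path: intention recognition, second matching data and the second
    # scoring model are delegated to the caller-supplied function.
    return fuzzy_search(records, search_field, screening)
```

Keeping the fuzzy branch behind a callable mirrors the split between the accurate search module and the fuzzy search module described for the apparatus.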
It should be noted that the method for searching data provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the data searching apparatus is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for implementing a terminal device or server according to an embodiment of the invention is shown. The terminal device or server shown in fig. 5 is only an example and does not limit the functions or scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor comprising a search mode determining module, an accurate searching module, and a fuzzy search module.
The names of these modules do not, in some cases, limit the modules themselves; for example, the search mode determining module may also be described as "a module for determining a search mode according to the acquired search field and the screening condition".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: determine a search mode according to the obtained search field and the screening condition, where the search mode is determined to be accurate matching search if the value of the search field is null and fuzzy matching search otherwise; when the search mode is accurate matching search, determine first matching data according to the screening condition, calculate matching scores of the first matching data using a first scoring model, and sort by the matching scores to obtain an accurate search result; and when the search mode is fuzzy matching search, perform intention recognition according to the search field and the screening condition, determine second matching data according to the intention recognition result, calculate matching scores of the second matching data using a second scoring model, and sort by the matching scores to obtain a fuzzy search result.
The technical solution of the embodiments of the present invention has the following advantages or beneficial effects. The search mode is determined from the obtained search field and screening condition: if the value of the search field is null, the search is treated as an accurate matching search; otherwise, it is treated as a fuzzy matching search. For an accurate matching search, first matching data are determined according to the screening condition, a comprehensive matching score is calculated for them with the first scoring model, and the data are sorted by that score to obtain the accurate search result. For a fuzzy matching search, intention recognition is performed on the search field and the screening condition, second matching data are determined from the intention recognition result, a comprehensive matching score is calculated for them with the second scoring model, and the data are sorted by that score to obtain the fuzzy search result.
Through user intention recognition, this technical solution places the search results that best meet the user's needs at the top, which improves the user experience as well as search accuracy and efficiency; and because a comprehensive scoring model is used for sorting, high-quality relevant results can be returned, further improving the accuracy of the search results.
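Claim 2 states that intention recognition uses wildcard expressions generated from the user's historical behavior data, but the patent does not say how those expressions are built or matched. The sketch below assumes the history is a list of (query, category acted on) pairs, turns each query term into a `*term*` pattern, and treats the matched category as the recognized intent (the first result), with no match meaning the second result. The function names and the `category` screening key are hypothetical.

```python
import glob
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Mapping, Optional, Tuple


def build_wildcards(history: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Derive wildcard expressions from (query, category) history pairs (hypothetical)."""
    patterns: Dict[str, str] = {}
    for query, category in history:
        term = query.strip().lower()
        if term:
            # glob.escape keeps characters such as '*' or '?' in the query literal;
            # the first category seen for a term wins.
            patterns.setdefault(f"*{glob.escape(term)}*", category)
    return patterns


def recognize_intent(
    search_field: str,
    screening: Mapping[str, str],
    patterns: Mapping[str, str],
) -> Optional[str]:
    """Return the recognized intent (first result) or None (second result)."""
    text = search_field.strip().lower()
    for pattern, category in patterns.items():
        if not fnmatchcase(text, pattern):
            continue
        # A screening condition on 'category' that contradicts the pattern vetoes it.
        if "category" in screening and screening["category"] != category:
            continue
        return category
    return None
```

For instance, a history containing `("phone", "electronics")` yields the pattern `*phone*`, so a later search for "iPhone 14" is recognized as the `electronics` intent unless the screening condition names a different category.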
The specific embodiments are not to be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method of data searching, comprising:
determining a search mode according to the obtained search field and the screening condition, wherein if the value of the search field is null, the search mode is determined to be accurate matching search, otherwise, the search mode is determined to be fuzzy matching search;
under the condition that the search mode is accurate matching search, determining first matching data according to the screening condition, calculating matching scores of the first matching data by using a first scoring model, and sorting by the matching scores to obtain an accurate search result;
and under the condition that the search mode is fuzzy matching search, performing intention recognition according to the search field and the screening condition, determining second matching data according to an intention recognition result, calculating a matching score of the second matching data by using a second scoring model, and sorting by the matching score to obtain a fuzzy search result.
2. The method of claim 1, wherein performing intent recognition based on the search field and the filtering condition comprises:
generating a wildcard expression based on historical behavior data of a user;
and identifying the intention according to the wildcard expression, the search field and the screening condition.
3. The method of claim 1, wherein in the case that the intention recognition result is a first result, determining second matching data according to the intention recognition result, calculating a matching score of the second matching data using a second scoring model, and sorting according to the matching score comprises:
determining a first data set meeting a first threshold according to the relevance of data to the screening condition and the intention identification result;
determining a second data set meeting a second threshold according to the relevance of data to the search field and the screening condition, wherein the first threshold is greater than the second threshold, and the first data set is higher in sorting priority than the second data set;
determining whether the total number of pieces of data in the first data set and the second data set meets a number threshold;
if the number threshold is met, taking the first data set and the second data set as the second matching data;
calculating a first matching score of each piece of data in the first data set by using a second scoring model, and sorting the data in the first data set according to the first matching score;
calculating a second matching score of each piece of data in the second data set by using a second scoring model, and sorting the data in the second data set according to the second matching score;
and sorting the data in the sorted first data set and the data in the sorted second data set in descending order of sorting priority, so as to sort the second matching data.
4. The method according to claim 3, wherein in a case where the intention recognition result is a first result, determining second matching data according to the intention recognition result, calculating a matching score of the second matching data using a second scoring model, and sorting according to the matching score, further comprising:
if the number threshold is not met, determining a specified number of pieces of data as a third data set from data other than the first data set and the second data set according to the relevance of the data to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the total number of pieces of data in the first data set and the second data set, and the third data set has a lower sorting priority than the second data set;
taking the first data set, the second data set and the third data set as the second matching data;
calculating a first matching score of each piece of data in the first data set by using a second scoring model, and sorting the data in the first data set according to the first matching score;
calculating a second matching score of each piece of data in the second data set by using a second scoring model, and sorting the data in the second data set according to the second matching score;
sorting the data in the third data set according to the relevance of each piece of data in the third data set to the search field and the screening condition;
and sorting the data in the sorted first data set, the sorted second data set and the sorted third data set in descending order of sorting priority, so as to sort the second matching data.
5. The method of claim 3 or 4, wherein calculating a first match score for each piece of data in the first set of data using a second scoring model comprises:
calculating a first matching score of each piece of data in the first data set based on the key attribute information of each piece of data in the first data set by using a second scoring model;
calculating a second match score for each piece of data in the second set of data using a second scoring model, comprising:
calculating a second matching score for each piece of data in the second data set based on the relevance of each piece of data in the second data set to the search field and the screening condition, and the key attribute information of each piece of data in the second data set using a second scoring model.
6. The method of claim 1, wherein in the case that the intention recognition result is a second result, determining second matching data according to the intention recognition result, calculating a matching score of the second matching data using a second scoring model, and sorting according to the matching score, comprises:
determining a fourth data set meeting a third threshold according to the relevance of data to the search field and the screening condition;
determining whether the number of pieces of data in the fourth data set meets a number threshold;
if the number threshold is met, taking the fourth data set as the second matching data;
and calculating a third matching score of each piece of data in the fourth data set by using a second scoring model, and sorting the data in the fourth data set according to the third matching score so as to sort the second matching data.
7. The method according to claim 6, wherein in a case where the intention recognition result is a second result, determining second matching data according to the intention recognition result, calculating a matching score of the second matching data using a second scoring model, and sorting according to the matching score, further comprising:
if the number threshold is not met, determining a specified number of pieces of data from the data other than the fourth data set as a fifth data set according to the relevance of the data to the search field and the screening condition, wherein the specified number is the difference between the number threshold and the number of pieces of data in the fourth data set, and the fifth data set has a lower sorting priority than the fourth data set;
taking the fourth data set and the fifth data set as the second matching data;
calculating a third matching score of each piece of data in the fourth data set by using a second scoring model, and sorting the data in the fourth data set according to the third matching score;
sorting the data in the fifth data set according to the relevance of each piece of data in the fifth data set to the search field and the screening condition;
and sorting the data in the sorted fourth data set and the data in the sorted fifth data set in descending order of sorting priority, so as to sort the second matching data.
8. The method of claim 6 or 7, wherein calculating a third match score for each piece of data in the fourth set of data using a second scoring model comprises:
calculating a third matching score for each piece of data in the fourth data set based on the relevance of each piece of data in the fourth data set to the search field and the screening condition, and the key attribute information of each piece of data in the fourth data set using a second scoring model.
9. The method of claim 1, wherein calculating a match score for the first match data using a first scoring model comprises:
calculating a matching score of the first matching data based on key attribute information of the first matching data using a first scoring model.
10. The method of claim 1, wherein the first scoring model and the second scoring model constrain the match score based on an anti-expansion function and a normalization function when computing the match score.
11. A data search apparatus, comprising:
the search mode determining module is used for determining a search mode according to the obtained search field and the screening condition, wherein if the value of the search field is null, the search mode is judged to be accurate matching search, and otherwise, the search mode is judged to be fuzzy matching search;
the accurate searching module is used for determining first matching data according to the screening condition when the search mode is accurate matching search, calculating the matching score of the first matching data by using a first scoring model, and sorting by the matching score to obtain an accurate search result;
and the fuzzy search module is used for performing intention recognition according to the search field and the screening condition when the search mode is fuzzy matching search, determining second matching data according to an intention recognition result, calculating a matching score of the second matching data by using a second scoring model, and sorting by the matching score to obtain a fuzzy search result.
12. A mobile electronic device terminal, comprising:
one or more processors;
a storage device to store one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-10.
13. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.
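The first-result branch of claims 3 to 5 (first, second and optional third data sets) can be illustrated as follows. The sketch assumes that the two relevance measures and the two scores from the second scoring model are supplied as callables, and that a record already placed in the first data set is not repeated in the second, which the claims do not state explicitly. `rank_first_result` and its parameters are hypothetical names.

```python
from typing import Callable, List, Sequence

Score = Callable[[str], float]


def rank_first_result(
    candidates: Sequence[str],
    intent_relevance: Score,  # relevance to the screening condition and recognized intent
    query_relevance: Score,   # relevance to the search field and screening condition
    first_score: Score,       # second scoring model, key attributes only (claim 5)
    second_score: Score,      # second scoring model, relevance + key attributes (claim 5)
    first_threshold: float,
    second_threshold: float,
    number_threshold: int,
) -> List[str]:
    """Order the second matching data for the 'first result' branch (hypothetical sketch)."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must be greater than the second threshold")
    first = [c for c in candidates if intent_relevance(c) >= first_threshold]
    taken = set(first)
    # Assumption: records already in the first data set are not repeated here.
    second = [
        c for c in candidates if c not in taken and query_relevance(c) >= second_threshold
    ]
    taken.update(second)
    first.sort(key=first_score, reverse=True)
    second.sort(key=second_score, reverse=True)
    specified_number = number_threshold - len(first) - len(second)
    third: List[str] = []
    if specified_number > 0:
        # Third data set: backfill from the remaining records by relevance alone.
        remaining = [c for c in candidates if c not in taken]
        third = sorted(remaining, key=query_relevance, reverse=True)[:specified_number]
    # Descending sorting priority: first set, then second set, then third set.
    return first + second + third
```

When the number threshold is already met, `specified_number` is zero or negative and no third data set is built, which corresponds to claim 3; otherwise the backfill of claim 4 applies.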
CN202211474766.4A 2022-11-23 2022-11-23 Data searching method and device Pending CN115757546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211474766.4A CN115757546A (en) 2022-11-23 2022-11-23 Data searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211474766.4A CN115757546A (en) 2022-11-23 2022-11-23 Data searching method and device

Publications (1)

Publication Number Publication Date
CN115757546A true CN115757546A (en) 2023-03-07

Family

ID=85336014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211474766.4A Pending CN115757546A (en) 2022-11-23 2022-11-23 Data searching method and device

Country Status (1)

Country Link
CN (1) CN115757546A (en)

Similar Documents

Publication Publication Date Title
CN108304444B (en) Information query method and device
EP2336905A1 (en) A searching method and system
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
CN112035599B (en) Query method and device based on vertical search, computer equipment and storage medium
CN110111167A (en) A kind of method and apparatus of determining recommended
CN110968789B (en) Electronic book pushing method, electronic equipment and computer storage medium
CN109558384B (en) Log classification method, device, electronic equipment and storage medium
CN111913954B (en) Intelligent data standard catalog generation method and device
CN110362601A (en) Mapping method, device, equipment and the storage medium of metadata standard
CN109885651B (en) Question pushing method and device
CN110909222A (en) User portrait establishing method, device, medium and electronic equipment based on clustering
CN114330329A (en) Service content searching method and device, electronic equipment and storage medium
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
CN110516062B (en) Method and device for searching and processing document
CN110569419A (en) question-answering system optimization method and device, computer equipment and storage medium
CN110245357B (en) Main entity identification method and device
CN115630144A (en) Document searching method and device and related equipment
CN109344232B (en) Public opinion information retrieval method and terminal equipment
CN104881447A (en) Searching method and device
CN111177372A (en) Scientific and technological achievement classification method, device, equipment and medium
WO2023151576A1 (en) Search recommendation method, search recommendation system, computer device and storage medium
CN109918420B (en) Competitor recommendation method and server
CN111831819B (en) Text updating method and device
CN111737607A (en) Data processing method, data processing device, electronic equipment and storage medium
CN111126034A (en) Medical variable relation processing method and device, computer medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination