CN111694929A - Data map-based searching method, intelligent terminal and readable storage medium - Google Patents

Info

Publication number
CN111694929A
Authority
CN
China
Prior art keywords
content
user
path length
data
recall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010471335.7A
Other languages
Chinese (zh)
Other versions
CN111694929B (en
Inventor
邹杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010471335.7A priority Critical patent/CN111694929B/en
Priority to PCT/CN2020/098816 priority patent/WO2021139105A1/en
Publication of CN111694929A publication Critical patent/CN111694929A/en
Application granted granted Critical
Publication of CN111694929B publication Critical patent/CN111694929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of big data and discloses a data-map-based search method comprising the following steps: acquiring personal data and application data of all registered users in a database, and constructing a data map; acquiring an input search keyword, determining the current user and the corresponding identity, acquiring first historical click content of the current user, and acquiring second historical click content of other users similar to the current user from the data map; calculating a first path length of the first historical click content and a second path length of the second historical click content, and taking the minimum of the first path length and the second path length as a sorting path length; and ranking at least one recalled content according to a recall path length and the sorting path length. The method yields data-map-based search results tailored to different users, effectively improves working efficiency, and reduces space complexity.

Description

Data map-based searching method, intelligent terminal and readable storage medium
Technical Field
The invention relates to the technical field of big data, in particular to a searching method based on a data map, an intelligent terminal and a readable storage medium.
Background
The current search scheme is implemented on the ES (Elasticsearch) full-text search engine. Content such as articles, functions, and questions and answers is segmented into words by ES, an inverted index is built, and the index is placed in a repository. When a user searches, the input search keywords undergo the same word segmentation and are matched against the inverted index, and the relevant content is recalled from the repository. The recalled content is then sorted by traffic weight and returned to the front end. This scheme does not consider user behavior information and does not rank search results based on a data map.
The data-map-based search ranking methods commonly used in the industry at present mainly include user collaborative filtering, content collaborative filtering, latent semantic models, and graph-based PageRank. These methods have high time and space complexity and are difficult to deploy in environments with massive data.
Disclosure of Invention
Based on this, it is necessary to propose a data map-based search method, a smart terminal, and a readable storage medium to address the above problems.
A data-map-based search method, the method comprising: acquiring personal data and application data of all registered users in a database, and constructing a data map according to the personal data and the application data; acquiring an input search keyword, and determining the current user who input the search keyword and the corresponding identity; acquiring at least one recall content from the database according to the search keyword, calculating the content path length of the at least one recall content, and taking the minimum of the content path lengths as the recall path length; acquiring first historical click content of the current user, and acquiring second historical click content of other users similar to the current user from the data map; calculating a first path length of the first historical click content and a second path length of the second historical click content, and taking the minimum of the first path length and the second path length as a sorting path length; and ranking the at least one recall content according to the recall path length and the sorting path length to serve as a final search result.
The personal data comprises content clicked by the registered user, the number of the content clicked by the registered user, a label of the registered user, and a retrieval keyword input by the registered user; the application data comprises application content and hot content in the application content; the step of constructing a data map from the personal data and the application data comprises: taking the application content, the label of the registered user, the retrieval keyword and the registered user as nodes; and drawing the relation among the nodes.
Wherein the step of acquiring at least one recall content according to the search keyword comprises: recalling a first type of content through an open source search engine according to the search keyword; and/or obtaining a derivative word according to the search keyword, and recalling a second type of content through the open source search engine according to the derivative word; and/or deriving a third type of content from the first type of content; and/or deriving a fourth type of content from the second type of content; and/or acquiring a fifth type of content whose number of clicks is greater than a first preset threshold.
Wherein the other users similar to the current user include: a first user having the same label as the current user; and/or a second user who has clicked the same content as the current user; and/or a third user who has input the same search keyword as the current user.
Wherein the step of acquiring second historical click content of other users similar to the current user from the data map comprises: respectively calculating second scores of the paths from the first user and/or the second user and/or the third user to the current user, and obtaining the similarity between the first user and/or the second user and/or the third user and the current user according to the second scores; and if the similarity is greater than a second preset threshold, determining that the first user and/or the second user and/or the third user are other users similar to the current user.
Wherein the step of obtaining the similarity between the first user and/or the second user and/or the third user and the current user according to the second scores comprises: taking the average of the second scores as the similarity between the first user and/or the second user and/or the third user and the current user.
Wherein the step of ranking the at least one recall content according to the recall path length and the sorting path length comprises: calculating a first score of the at least one recall content according to the recall path length and the sorting path length; and arranging the at least one recall content in descending order of the first score to serve as the final search result; the method further comprising uploading the final search result to a blockchain.
An intelligent terminal, comprising: the map module is used for acquiring personal data and application data of all registered users in a database and constructing a data map according to the personal data and the application data; the acquisition module is used for acquiring the input search keywords and determining the current user who inputs the search keywords and the corresponding identity; the recall module is used for acquiring at least one recall content from the database according to the retrieval key words, calculating the content path length of the at least one recall content and taking the minimum value of the content path length as the recall path length; the history module is used for acquiring first history click content of the current user and acquiring second history click content of other users similar to the current user from the data map; the sorting module is used for calculating a first path length of the first historical click content and a second path length of the second historical click content, and taking the minimum value of the first path length and the second path length as a sorting path length; and the result module is used for arranging the at least one recall content according to the recall path length and the sorting path length to serve as a final search result.
An intelligent terminal comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method as described above.
A readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
The invention has the following beneficial effects:
The method constructs a data map from the personal data of the registered users of an application program and the application data of that program, obtains the similarity between registered users and the current user from the data map, calculates a score for each recalled content according to the correlation between search keywords and the correlation between contents, and ranks the recalled content by score. Personalized search results can thus be obtained for different users, working efficiency can be effectively improved, and space complexity can be reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Wherein:
FIG. 1 is a schematic flow chart of a first embodiment of a data map-based search method provided by the present invention;
FIG. 2 is a schematic representation of a data map provided by the present invention;
FIG. 3 is a flowchart illustrating an embodiment of a method for retrieving recall path length in a data-map-based search method according to the present invention;
FIG. 4 is a flowchart illustrating an embodiment of a method for obtaining a length of a ranking path in the data-map-based search method according to the present invention;
FIG. 5 is a flowchart illustrating an embodiment of a method for obtaining other users similar to a currently used user in the data-map-based searching method according to the present invention;
FIG. 6 is a flow chart of a second embodiment of the data-map-based searching method provided by the invention;
FIG. 7 is a schematic structural diagram of a first embodiment of the intelligent terminal provided by the invention;
FIG. 8 is a schematic structural diagram of a second embodiment of the intelligent terminal provided by the invention;
FIG. 9 is a schematic structural diagram of an embodiment of a readable storage medium provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The current search scheme is implemented on the ES (Elasticsearch) full-text search engine. This scheme does not consider user behavior information and does not rank search results based on a data map. The data-map-based search ranking methods commonly used in the industry at present have very high time and space complexity and are difficult to deploy in environments with massive data.
In this embodiment, in order to solve the above problems, a data-map-based search method is provided, which can obtain data-map-based search results for different users, effectively improve working efficiency, and reduce space complexity.
Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of a data-map-based search method according to the present invention. The searching method based on the data map provided by the invention comprises the following steps:
S101: acquiring personal data and application data of all registered users in a database, and constructing a data map according to the personal data and the application data.
In a specific implementation scenario, personal data of all registered users and application data of the application program are obtained from the database. The data in the database corresponds to one application program. Referring to fig. 2, fig. 2 is a schematic diagram of a data map constructed from personal data and application data. In this embodiment, the map is a "user-search keyword-content (UKC)" map. Personal data and application data can be collected through a streaming data processing framework such as Storm or Spark Streaming and quickly stored in a Neo4j database to construct the UKC graph.
The personal data of a registered user includes: the registered user (identified as User in fig. 2), comprising the ID (identifier) of the registered user and the out-degree of the registered user's node, i.e. the number of contents clicked by the registered user; the label provided for the registered user by the user profiling team (identified as Keyword in fig. 2); the search keyword input by the registered user (identified as Keyword in fig. 2); and the contents clicked by the registered user. The application data includes the application content (identified as HotContent in fig. 2), comprising a content identifier and a content type, and the hot content of the application (identified as IS_HOT in fig. 2).
In this implementation scenario, the application content (HotContent), the label (Keyword) of the registered user, the search keyword (Keyword) and the registered user (User) are used as nodes, and the relationships among the nodes are drawn to construct the data map.
With continued reference to fig. 2, the relationships in the data map are shown in the following table.
Figure BDA0002514386810000061
It should be noted that the IS_HOT relationship is not shown in fig. 2, because in Neo4j it can be attached to the Content node as a label. UKC_KEYWORD_CONVERSION_TIMES and UKC_Content_CONVERSION_TIMES are intermediate relationships used to calculate ratios and are likewise not shown in fig. 2.
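As an illustration only, the node and relationship structure described above can be sketched in memory in Python. This is a minimal sketch, not the Neo4j storage the embodiment describes: the class `DataMap`, the function `build_data_map`, and the relationship names `HAS_TAG`, `SEARCHED` and `CLICKED` are hypothetical, since the actual relationship table is only available as an image.

```python
from collections import defaultdict


class DataMap:
    """Hypothetical in-memory stand-in for the UKC graph (the source uses Neo4j)."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"kind": ..., other properties}
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, kind, **props):
        self.nodes[node_id] = {"kind": kind, **props}

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, src, relation):
        return [dst for rel, dst in self.edges.get(src, []) if rel == relation]


def build_data_map(click_log, search_log, user_tags, hot_content_ids):
    """Build the graph from (user, content) clicks, (user, keyword) searches,
    per-user labels, and the set of hot content ids."""
    graph = DataMap()
    for user, tags in user_tags.items():
        graph.add_node(user, "User")
        for tag in tags:
            graph.add_node(tag, "Keyword")          # labels are Keyword nodes
            graph.add_edge(user, "HAS_TAG", tag)    # assumed relation name
    for user, keyword in search_log:
        graph.nodes.setdefault(user, {"kind": "User"})
        graph.add_node(keyword, "Keyword")
        graph.add_edge(user, "SEARCHED", keyword)   # assumed relation name
    for user, content in click_log:
        graph.nodes.setdefault(user, {"kind": "User"})
        # IS_HOT is kept as a property on the content node, as the text suggests.
        graph.add_node(content, "HotContent", is_hot=content in hot_content_ids)
        graph.add_edge(user, "CLICKED", content)    # assumed relation name
    return graph
```

In this sketch the out-degree of a user node with respect to `CLICKED` corresponds to the number of contents clicked by that user.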
S102: acquiring the input search keyword, and determining the current user who input the search keyword and the corresponding identity.
In this implementation scenario, the search keyword and the identity of the current user of the application program are obtained. Any existing technique may be used to obtain them, and the details are not described again here.
S103: acquiring at least one recall content from the database according to the search keyword, calculating the content path length of the at least one recall content, and taking the minimum of the content path lengths as the recall path length.
In this implementation scenario, at least one type of recall content is obtained according to the search keyword. For example, one type may be recalled through an open source search engine such as ES (Elasticsearch) according to the search keyword; another type may be obtained from the click habits of the current user; yet another may be obtained from the click habits of other registered users who input the same search keyword; and so on. The content path length of each type of recall content is calculated separately, and the minimum of these content path lengths is taken as the recall path length. The shorter the content path length, the better the recalled content matches the needs of the current user.
S104: and acquiring first historical click content of the current user, and acquiring second historical click content of other users similar to the current user from the data map.
In this implementation scenario, the first historical click content that the current user has clicked is obtained, and other users similar to the current user are obtained from the data map. Specifically, the USER_SIM value between each registered user in the data map and the current user can be calculated; when a USER_SIM value meets a preset condition, the corresponding registered user is regarded as another user similar to the current user, and the second historical click content of these other users is obtained.
The obtaining of the first historical click content and the second historical click content may be implemented by a data map or by using the prior art, which is not limited herein.
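The selection of similar users can be sketched as follows. This is a minimal sketch under stated assumptions: the source says only that second scores are computed for the paths between a candidate and the current user, that their average is the similarity, and that the similarity is compared with a second preset threshold. The use of a Jaccard overlap as the per-path score, the `profiles` structure, and the names `jaccard` and `similar_users` are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets, used here as an assumed per-path score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(current, profiles, second_threshold=0.3):
    """Return {user: similarity} for users similar to `current`.

    `profiles` maps a user id to {"tags": set, "clicks": set, "keywords": set},
    mirroring the three ways a candidate can be connected to the current user
    (same label, same clicked content, same search keyword).
    """
    me = profiles[current]
    result = {}
    for user, other in profiles.items():
        if user == current:
            continue
        scores = [jaccard(me[key], other[key]) for key in ("tags", "clicks", "keywords")]
        if not any(scores):
            continue  # no shared label, content, or keyword: not a candidate
        similarity = sum(scores) / len(scores)  # average of the second scores
        if similarity > second_threshold:
            result[user] = similarity
    return result
```

The second historical click content would then be the union of the `clicks` sets of the returned users.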
S105: calculating a first path length of the first historical click content and a second path length of the second historical click content, and taking the minimum of the first path length and the second path length as the sorting path length.
In this implementation scenario, the first path length of the first historical click content and the second path length of the second historical click content are calculated, and the smaller of the two is used as the sorting path length.
S106: arranging the at least one recall content according to the recall path length and the sorting path length to serve as a final search result.
In this implementation scenario, a ranking score is calculated for each of the at least one type of recall content obtained in step S103 according to the recall path length and the sorting path length. The higher the ranking score, the better the recall content matches the search requirement of the current user. The at least one type of recall content is sorted in descending order of ranking score to form the final search result.
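The ranking step can be illustrated with a short Python sketch. The source does not give the scoring function, so the formula in `first_score` is purely an assumption chosen so that shorter path lengths produce higher scores; the names `first_score` and `rank_recalled` are hypothetical.

```python
def first_score(recall_path_length, sorting_path_length):
    """Assumed scoring function: shorter paths give a higher score.

    The source states only that the score is computed from both lengths.
    """
    return 1.0 / (1.0 + recall_path_length + sorting_path_length)


def rank_recalled(candidates):
    """Sort recalled content in descending order of score.

    `candidates` is a list of (content_id, recall_path_length, sorting_path_length).
    """
    scored = [(first_score(recall, sort), content_id)
              for content_id, recall, sort in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content_id for _, content_id in scored]
```

Any monotonically decreasing function of the two lengths would preserve the property stated in the text that better-matching content is ranked first.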
As can be seen from the above description, in this embodiment a data map is constructed from the personal data of the registered users of an application and the application data of that application, and other users similar to the current user are obtained from the data map. First historical click content of the current user and second historical click content of the other users are obtained, and the minimum of the first path length of the first historical click content and the second path length of the second historical click content is taken as the sorting path length. At least one recall content is obtained according to the search keyword input by the current user, the content path length of the at least one recall content is calculated, and the minimum of the content path lengths is taken as the recall path length. The at least one recall content is then arranged according to the recall path length and the sorting path length to serve as the final search result. In this way, data-map-based search results can be obtained for different users, working efficiency can be effectively improved, and space complexity is reduced.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of a method for obtaining a recall path length in a data-map-based search method according to the present invention. The method for acquiring the length of the recall path comprises the following steps:
S201: recalling a first type of content through an open source search engine according to the search keyword.
In this implementation scenario, the first type of content is recalled by searching an open source search engine, such as ES (Elasticsearch), with the search keyword currently input by the user. Elasticsearch is a Lucene-based search server that provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch was developed in Java and released as open source under the Apache license terms, and is a popular enterprise-level search engine.
S202: obtaining a derivative word according to the search keyword, and recalling a second type of content through the open source search engine according to the derivative word.
In this implementation scenario, the derivative word is derived from the search keyword. The search records of registered users can be counted to find the other search keywords that registered users also search for when they search for the search keyword, and the derivative word is obtained from these other search keywords. For example, the other search keywords are ranked by their counted number of searches or search frequency, and the top one or several are selected as derivative words. The second type of content is recalled through an open source search engine, such as ES, according to the derivative words.
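The derivation of derivative words from search records can be sketched as a co-occurrence count. This is a minimal sketch assuming the search records are available as a mapping from user to the keywords that user has searched; the function name `derivative_words` and the parameter `top_n` are hypothetical.

```python
from collections import Counter


def derivative_words(keyword, search_records, top_n=2):
    """Return the keywords most often searched by users who also searched `keyword`.

    `search_records` maps a user id to an iterable of that user's search keywords.
    """
    counts = Counter()
    for keywords in search_records.values():
        if keyword in keywords:
            # Count every other keyword searched by this user.
            counts.update(k for k in keywords if k != keyword)
    return [word for word, _ in counts.most_common(top_n)]
```

For example, if two users who searched "loan" also searched "mortgage" and one of them also searched "rate", then "mortgage" is ranked ahead of "rate".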
S203: deriving a third type of content from the first type of content.
In this implementation scenario, a third type of content is derived from the first type of content. The click records of registered users can be counted to find the contents frequently clicked by registered users who clicked the first type of content; these contents are the third type of content. For example, the registered users who clicked the first type of content are identified, the contents they frequently click are collected, and the collected contents are screened by a preset screening condition, such as selecting content for which the interval between the time it was clicked and the time the first type of content was clicked is smaller than a preset threshold, or selecting content whose number of clicks is higher than a preset threshold. The screened content is taken as the third type of content.
S204: deriving a fourth type of content from the second type of content.
In this implementation scenario, similar to step S203, a fourth type of content is derived from the second type of content. The click records of registered users can be counted to find the contents frequently clicked by registered users who clicked the second type of content; these contents are the fourth type of content. For example, the registered users who clicked the second type of content are identified, the contents they frequently click are collected, and the collected contents are screened by a preset screening condition, such as selecting content for which the interval between the time it was clicked and the time the second type of content was clicked is smaller than a preset threshold, or selecting content whose number of clicks is higher than a preset threshold. The screened content is taken as the fourth type of content.
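The derivation in steps S203 and S204 can be sketched with a single helper that takes the seed content (first or second type) and a click log. This is a sketch under stated assumptions: the text presents the time-interval screen and the click-count screen as alternatives, whereas this example applies both together for brevity; the log format and the names `derive_content`, `max_gap_seconds` and `min_clicks` are hypothetical.

```python
from collections import Counter


def derive_content(seed_ids, click_log, max_gap_seconds=600, min_clicks=1):
    """Derive content co-clicked with the seed content.

    `click_log` is a list of (user, content_id, timestamp_in_seconds).
    A click counts if the same user clicked a seed item less than
    `max_gap_seconds` before or after it; content is kept if it collects
    more than `min_clicks` such clicks.
    """
    seed_ids = set(seed_ids)
    seed_clicks = {}  # user -> timestamps at which that user clicked seed content
    for user, content, ts in click_log:
        if content in seed_ids:
            seed_clicks.setdefault(user, []).append(ts)

    counts = Counter()
    for user, content, ts in click_log:
        if content in seed_ids or user not in seed_clicks:
            continue
        if any(abs(ts - seed_ts) < max_gap_seconds for seed_ts in seed_clicks[user]):
            counts[content] += 1
    return sorted(c for c, n in counts.items() if n > min_clicks)
```

Calling the helper with the first type of content as seeds gives a third-type candidate list, and calling it with the second type gives a fourth-type candidate list.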
S205: acquiring a fifth type of content whose number of clicks is greater than a first preset threshold.
In this implementation scenario, the click records of registered users are obtained, and the contents whose number of clicks is greater than the first preset threshold are counted as the fifth type of content. The fifth type of content is the popular content in the application program.
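Step S205 amounts to a click count followed by a threshold test, which can be sketched as below. The log format and the name `hot_content` are assumptions made for the example.

```python
from collections import Counter


def hot_content(click_log, first_threshold):
    """Return the set of content ids clicked more than `first_threshold` times.

    `click_log` is a list of (user, content_id) pairs.
    """
    counts = Counter(content for _user, content in click_log)
    return {content for content, n in counts.items() if n > first_threshold}
```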
It should be noted that, in this embodiment, steps S201 to S205 may be executed simultaneously or sequentially, the execution order is not limited, and only one or several of the steps may be executed.
S206: calculating the content path length of the at least one recall content.
In this implementation scenario, the first to fifth types of recall content are obtained according to the search keyword, and the content path length of each of the five types is calculated separately.
Specifically, the content path length of the first type of recalled content is calculated according to equation (1):
Figure BDA0002514386810000101
where $pl_1(k, c_1)$ is the content path length of the first type of recalled content, $k$ is the search keyword, $c_1$ is the first type of recalled content, $r_1$ is the relationship between the search keyword $k$ and the first type of recalled content $c_1$, and $r_1.cr$ denotes the conversion of the relationship $r_1$.
Calculating a content path length for the second type of recalled content according to equation (2):
Figure BDA0002514386810000102
where $pl_2(k, c_2)$ is the content path length of the second type of recalled content, $k$ is the search keyword, $c_2$ is the second type of recalled content, $r_1$ is the relationship between the search keyword $k$ and the first type of recalled content $c_1$, $r_2$ is the relationship between the search keyword $k$ and the second type of recalled content $c_2$, $r_1.cr$ denotes the conversion of the relationship $r_1$, and $r_2.cr$ denotes the conversion of the relationship $r_2$.
Calculating the content path length of the third type of recalled content according to formula (3):
Figure BDA0002514386810000103
where $pl_3(k, c_3)$ is the content path length of the third type of recalled content, $k$ is the search keyword, $c_3$ is the third type of recalled content, $r_1$ is the relationship between the search keyword $k$ and the first type of recalled content $c_1$, $r_2$ is the relationship between the search keyword $k$ and the second type of recalled content $c_2$, $r_1.cr$ denotes the conversion of the relationship $r_1$, and $r_2.cr$ denotes the conversion of the relationship $r_2$.
Calculating the content path length of the fourth type of recalled content according to formula (4):
Figure BDA0002514386810000104
where $pl_4(k, c_4)$ is the content path length of the fourth type of recalled content, $k$ is the search keyword, $c_4$ is the fourth type of recalled content, $r_1$ is the relationship between the search keyword $k$ and the first type of recalled content $c_1$, $r_2$ is the relationship between the search keyword $k$ and the second type of recalled content $c_2$, $r_3$ is the relationship between the search keyword $k$ and the third type of recalled content $c_3$, $r_1.cr$ denotes the conversion of the relationship $r_1$, $r_2.cr$ denotes the conversion of the relationship $r_2$, and $r_3.cr$ denotes the conversion of the relationship $r_3$.
The content path length $pl_5(k, c_5)$ of the fifth type of recalled content is calculated according to the data map.
S207: the minimum value of the content path length is taken as the recall path length.
In this implementation scenario, the recall path length is calculated according to equation (5).
pl(k,c)=min(pl1(k,c1),pl2(k,c2),pl3(k,c3),pl4(k,c4),pl5(k,c5)) (5)
wherein pl(k,c) is the recall path length, pl1(k,c1) is the content path length of the first type of recalled content, pl2(k,c2) is the content path length of the second type of recalled content, pl3(k,c3) is the content path length of the third type of recalled content, pl4(k,c4) is the content path length of the fourth type of recalled content, and pl5(k,c5) is the content path length of the fifth type of recalled content.
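Formula (5) can be illustrated with a minimal Python sketch. It assumes that the individual content path lengths have already been computed by formulas (1) to (4) and from the data map, and that only the recall types actually used are present. The function name `recall_path_length` and the sample values are hypothetical.

```python
from typing import Mapping


def recall_path_length(content_path_lengths: Mapping[str, float]) -> float:
    """Return the smallest available content path length, as in formula (5).

    The mapping holds one entry per recall type that was actually used,
    for example {"pl1": ..., "pl2": ...}. Types that were not recalled are
    simply absent (an assumption of this sketch).
    """
    if not content_path_lengths:
        raise ValueError("at least one content path length is required")
    return min(content_path_lengths.values())


# Hypothetical path lengths for one (keyword, content) pair.
lengths = {"pl1": 1.0, "pl2": 2.5, "pl4": 3.2}
print(recall_path_length(lengths))
```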
As can be seen from the above description, in this embodiment, the first type of content is recalled by an open source search engine according to the search keyword; a derived word is obtained according to the search keyword, and the second type of content is recalled by the open source search engine according to the derived word; the third type of content is derived from the first type of content; the fourth type of content is derived from the second type of content; and the fifth type of content, whose number of clicks is greater than a first preset threshold, is obtained. The content path length of the at least one type of recalled content is calculated, and the minimum value of the content path lengths is used as the recall path length, so that the content required by the currently used user can be obtained.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of a method for obtaining a sorting path length in a data map-based search method according to the present invention. The method for acquiring the sorting path length comprises the following steps:
S301: acquiring first historical click content of the current user, and acquiring second historical click content of other users similar to the current user from the data map.
In a specific implementation scenario, an identity of a currently used user is obtained, and first historical click content corresponding to the identity is obtained. The obtaining of the first historical click content of the currently used user can be realized by the prior art, and details are not repeated here.
According to the data map in step S101 in the first embodiment of the data map-based search method provided by the present invention, other users similar to the currently used user are acquired. Specifically, the registered user whose path length meets the preset condition with the currently used user can be obtained as another user similar to the currently used user according to the data map. For example, a first user having the same tag as the currently used user may be obtained; and/or a second user who has clicked the same content as the currently using user; and/or a third user who has input the same search keyword as the current user as other users similar to the current user. Further, the first user, the second user and the third user can be screened according to a preset rule, and the screening result is used as other users similar to the current user.
S302: a first path length of the first historical click content and a second path length of the second historical click content are calculated.
In this implementation scenario, the first path length is calculated according to equation (6):
Figure BDA0002514386810000121
wherein u1 is the currently used user, c6 is the first historical click content, and sl1(u1,c6) is the first path length.
Calculating a second path length according to equation (7):
Figure BDA0002514386810000122
wherein u2 is another user similar to the currently used user, c7 is the second historical click content, r.sim denotes the user similarity between the other user and the currently used user, and sl2(u2,c7) is the second path length.
In this implementation scenario, when other users similar to the currently used user are acquired in step S301, the user similarity between the other users and the currently used user may be calculated.
S303: taking the minimum value of the first path length and the second path length as the sorting path length.
In this implementation scenario, the sort path length is calculated according to equation (8):
sl(u,c)=min(sl1(u1,c6),sl2(u2,c7)) (8)
wherein sl(u,c) is the sorting path length, sl1(u1,c6) is the first path length, and sl2(u2,c7) is the second path length.
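Formula (8) can be sketched in Python as follows. The sketch assumes that the first and second path lengths have already been computed per content item by formulas (6) and (7), and that a content item missing from one history is treated as having an infinite path length on that side. That treatment, the function name `sorting_path_lengths`, and the sample values are assumptions made for illustration.

```python
import math
from typing import Dict


def sorting_path_lengths(
    first: Dict[str, float], second: Dict[str, float]
) -> Dict[str, float]:
    """Combine per-content first and second path lengths as in formula (8).

    `first` maps content clicked by the current user to sl1, and `second`
    maps content clicked by similar users to sl2. Content that appears in
    only one mapping keeps the value from that mapping.
    """
    combined: Dict[str, float] = {}
    for content in first.keys() | second.keys():
        sl1 = first.get(content, math.inf)
        sl2 = second.get(content, math.inf)
        combined[content] = min(sl1, sl2)
    return combined


# Hypothetical path lengths for three content items.
print(sorting_path_lengths({"a": 1.0, "b": 4.0}, {"b": 2.0, "c": 3.0}))
```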
As can be seen from the above description, in this embodiment, second historical click content of another user similar to the currently used user is obtained from the data map, the first path length of the first historical click content of the currently used user and the second path length of the second historical click content are calculated, and the minimum value is used as the sorting path length, so that the usage preference of the currently used user can be obtained, which is beneficial to obtaining the search result of the user based on the data map.
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of a method for obtaining other users similar to a currently used user in the data-map-based searching method according to the present invention. The method for acquiring other users similar to the currently used user comprises the following steps:
S401: acquiring a first user who has the same label as the currently used user; and/or a second user who has clicked the same content as the currently used user; and/or a third user who has entered the same search keyword as the currently used user.
In a specific implementation scenario, a first user having the same label as a currently used user, a second user having the same content as the currently used user clicked, and a third user having the same search keyword as the currently used user input are obtained according to the data map. As above, the data map is constructed based on personal data of the registered user and application data of the application, the personal data including contents clicked by the registered user, the number of contents clicked by the registered user, a tag of the registered user, and a search keyword input by the registered user, and thus, the first user, the second user, and the third user can be acquired based on the data map.
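The candidate lookup in step S401 can be sketched in Python with plain sets standing in for the data map. The three dictionaries (user to labels, user to clicked content, user to search keywords) and the function name `candidate_users` are hypothetical; a graph database query would play the same role in a real deployment.

```python
from typing import Dict, Set


def candidate_users(
    current: str,
    labels: Dict[str, Set[str]],
    clicks: Dict[str, Set[str]],
    keywords: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Find users sharing a label, a clicked content, or a keyword with `current`."""

    def sharing(attribute: Dict[str, Set[str]]) -> Set[str]:
        mine = attribute.get(current, set())
        # A user qualifies when the intersection with the current user's set is non-empty.
        return {
            user
            for user, values in attribute.items()
            if user != current and mine & values
        }

    return {
        "first": sharing(labels),     # same label
        "second": sharing(clicks),    # same clicked content
        "third": sharing(keywords),   # same search keyword
    }


# Hypothetical data for three registered users.
labels = {"u": {"finance"}, "a": {"finance", "sports"}, "b": {"travel"}}
clicks = {"u": {"c1", "c2"}, "a": {"c3"}, "b": {"c2"}}
keywords = {"u": {"loan"}, "a": {"loan"}, "b": {"loan"}}
print(candidate_users("u", labels, clicks, keywords))
```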
S402: respectively calculating a second score of the path from the first user and/or the second user and/or the third user to the currently used user, and acquiring the similarity between the first user and/or the second user and/or the third user and the currently used user according to the second score.
In this implementation scenario, a path from a first user to a currently used user is first acquired: u → Label ← v, path of the second user to the currently using user: u → Content ← v, and the path of the third user to the currently using user: u → Keyword ← v.
And then constructing a function to calculate second scores of paths from the first user, the second user and the third user to the current user, wherein the function needs to meet the following conditions:
i. when the number of co-clicked contents is 0, the user similarity is 0;
ii. when the number of co-clicked contents is 1 or 2, the difference in user similarity is very small;
iii. when the number of co-clicked contents is 10 or 100, the difference in user similarity is large;
iv. when the number of co-clicked contents is 100,000 or 1,000,000, the difference in user similarity is small;
v. when the number of co-clicked contents is very large, the user similarity approaches 1.
In summary, the function is S-shaped.
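The five conditions can be checked against a concrete curve. The following Python sketch is not formula (9), which is not reproduced in this text; it only shows one function that meets conditions i to v, namely a Hill-type curve n^p / (n^p + c^p), which is S-shaped on a logarithmic axis. The name `co_click_similarity` and the parameters `half_point = 30` and `steepness = 2` are assumptions chosen for illustration.

```python
def co_click_similarity(
    count: int, half_point: float = 30.0, steepness: float = 2.0
) -> float:
    """Map a co-click count to a similarity in [0, 1) with an S-shaped curve.

    The value is exactly 0 for count 0, equals 0.5 at `half_point`, and
    approaches 1 as the count grows.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    powered = float(count) ** steepness
    return powered / (powered + half_point ** steepness)


# Print the curve at the counts named in conditions i to v.
for n in (0, 1, 2, 10, 100, 100_000, 1_000_000):
    print(n, round(co_click_similarity(n), 6))
```

With these parameters the values at 1 and 2 differ by less than 0.01, the values at 10 and 100 differ by more than 0.8, and the values at 100,000 and 1,000,000 are practically identical.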
In this implementation scenario, the second scores of the paths from the first user, the second user, and the third user to the currently used user, each of the form u → (M) ← v, are calculated according to formula (9):
Figure BDA0002514386810000141
wherein f(u, v, M) is the second score, u is the currently used user, v is one of the first user, the second user, and the third user, M is the type of data that v shares with the currently used user, for example Label, Content, or Keyword, and M.id denotes the in-degree of node M.
In the present embodiment, the second score is used as the similarity.
S403: if the similarity is greater than a second preset threshold, determining that the first user and/or the second user and/or the third user are other users similar to the currently used user.
In this implementation scenario, the similarity of the first user, the second user, and the third user is obtained through step S402, and it is determined whether each similarity is greater than the second preset threshold; the first user, the second user, or the third user whose similarity is greater than the second preset threshold is taken as another user similar to the currently used user.
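The threshold test of step S403 can be sketched in Python. The sketch assumes that the second scores are already available per candidate user and per path type, and it averages them into one similarity, a variant that the claims of this document describe; the present embodiment uses the second score directly. The function name `similar_users` and the sample scores are hypothetical.

```python
from statistics import mean
from typing import Dict


def similar_users(
    second_scores: Dict[str, Dict[str, float]], threshold: float
) -> Dict[str, float]:
    """Keep the candidates whose similarity exceeds the second preset threshold.

    `second_scores` maps each candidate user to the second scores of the
    paths that connect that user to the current user, keyed by path type
    (Label, Content, Keyword). The similarity is the mean of those scores.
    """
    kept: Dict[str, float] = {}
    for user, scores in second_scores.items():
        if not scores:
            continue  # no connecting path, so no similarity can be derived
        similarity = mean(scores.values())
        if similarity > threshold:
            kept[user] = similarity
    return kept


# Hypothetical second scores for two candidates.
scores = {"a": {"Label": 0.75, "Content": 0.5}, "b": {"Keyword": 0.25}}
print(similar_users(scores, threshold=0.5))
```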
As can be seen from the above description, in this embodiment, a first user having the same label as a currently used user, a second user having the same content as the currently used user and clicked, and a third user having the same search keyword as the currently used user are obtained according to the data map, second scores of paths from the first user and/or the second user and/or the third user to the currently used user are respectively calculated, and a similarity between the users is obtained according to the second scores, so that other users similar to the current user can be accurately and quickly obtained, and the work efficiency is improved.
Referring to fig. 6, fig. 6 is a schematic flow chart of a data-map-based search method according to a second embodiment of the present invention. The searching method based on the data map provided by the invention comprises the following steps:
S501: acquiring personal data and application data of all registered users in a database, and constructing a data map according to the personal data and the application data.
S502: acquiring the input search keyword, and determining the currently used user who input the search keyword and the corresponding identity.
In this implementation scenario, steps S501 to S502 are substantially the same as steps S101 to S102 in the first embodiment of the data map-based search method provided by the present invention, and are not described herein again.
S503: recalling the first type of content through the open source search engine according to the search keyword.
S504: acquiring a derived word according to the search keyword, and recalling the second type of content through the open source search engine according to the derived word.
S505: deriving a third type of content from the first type of content.
S506: deriving a fourth type of content from the second type of content.
S507: acquiring a fifth type of content whose number of clicks is greater than a first preset threshold.
S508: calculating the content path length of the at least one type of recalled content.
S509: the minimum value of the content path length is taken as the recall path length.
In this implementation scenario, steps S503 to S509 are substantially the same as steps S201 to S207 in an embodiment of the method for acquiring the recall path length in the data map-based search method provided by the present invention, and are not described herein again.
S510: acquiring a first user with the same label as a current user; and/or a second user who has clicked the same content as the currently using user; and/or a third user who has entered the same search keyword as the currently used user.
S511: respectively calculating a second score of the path from the first user and/or the second user and/or the third user to the currently used user, and acquiring the similarity between the first user and/or the second user and/or the third user and the currently used user according to the second score.
S512: if the similarity is greater than a second preset threshold, determining that the first user and/or the second user and/or the third user are other users similar to the currently used user.
In this implementation scenario, steps S510 to S512 are substantially the same as steps S401 to S403 in an embodiment of the method for acquiring other users similar to the currently used user in the data map-based search method provided by the present invention, and are not described herein again.
S513: acquiring first historical click content of the current user, and acquiring second historical click content of other users similar to the current user from the data map.
S514: a first path length of the first historical click content and a second path length of the second historical click content are calculated.
S515: taking the minimum value of the first path length and the second path length as the sorting path length.
In this implementation scenario, steps S513 to S515 are substantially the same as steps S301 to S303 in an embodiment of the method for acquiring the sorting path length in the data map-based search method provided by the present invention, and are not described herein again.
S516: a first score for the at least one recalled content is calculated based on the recall path length and the ranked path length.
In this implementation scenario, a first score of at least one recalled content is calculated according to equation (10):
Figure BDA0002514386810000161
where rl (u, k, c) is the first score, pl (k, c) is the recall path length of the type of recall content, and sl (u, c) is the sort path length.
As can be seen from equation (10), the higher the first score, the lower the match with the search requirements of the currently using user.
S517: arranging the at least one type of recalled content in descending order of the first score as the final search result.
In this implementation scenario, according to the first score calculated in step S516, the at least one kind of recall content is sorted in a descending order to serve as a final search result, so that the sorting of the at least one kind of recall content corresponds to the matching degree of the search requirement of the currently used user, thereby realizing personalized sorting for the currently used user, satisfying the use requirement of the user, and improving the use experience of the user.
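The ordering in step S517 can be sketched in Python. The first scores are taken as given inputs because formula (10) is not reproduced in this text; the function name `rank_by_first_score` and the sample scores are hypothetical.

```python
from typing import Dict, List


def rank_by_first_score(first_scores: Dict[str, float]) -> List[str]:
    """Return content identifiers ordered by first score, highest first.

    `sorted` is stable, so items with equal scores keep their input order.
    """
    return sorted(first_scores, key=first_scores.get, reverse=True)


# Hypothetical first scores for three recalled content items.
print(rank_by_first_score({"c1": 0.2, "c2": 0.9, "c3": 0.5}))
```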
S518: uploading the final search result to the blockchain.
In this implementation scenario, corresponding digest information is obtained from the final search result; specifically, the digest information is obtained by hashing the final search result, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures security as well as fairness and transparency for the user. The user equipment may download the digest information from the blockchain to verify whether the final search result has been tampered with.
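The digest step can be sketched in Python with the standard `hashlib` module. The sketch assumes that the final search result is a list of content identifiers serialized as compact JSON before hashing; the serialization choice and the function names `digest_of_results` and `is_untampered` are assumptions, and the upload to and download from the blockchain are outside its scope.

```python
import hashlib
import hmac
import json
from typing import List


def digest_of_results(results: List[str]) -> str:
    """Return the SHA-256 hex digest of an ordered list of search results."""
    # Compact JSON keeps the serialization deterministic for a given list.
    payload = json.dumps(results, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_untampered(results: List[str], published_digest: str) -> bool:
    """Compare the digest of `results` with the digest read back from the chain."""
    return hmac.compare_digest(digest_of_results(results), published_digest)


# Hypothetical final search result.
final_results = ["c2", "c3", "c1"]
published = digest_of_results(final_results)
print(published)
print(is_untampered(final_results, published))
```

Because the digest depends on the order of the list, reordering the results also counts as tampering in this sketch.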
The blockchain referred to in this embodiment is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
As can be seen from the above description, in this embodiment, the first score of at least one type of recall content is calculated according to the recall path length and the sorting path length, and the at least one type of recall content is sorted according to the first score in a descending order, so as to serve as a final search result, thereby realizing personalized sorting for a currently used user, effectively improving user experience, and uploading the final search result to a block chain, which can ensure security and fair transparency to the user.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an intelligent terminal according to a first embodiment of the present invention. The intelligent terminal 10 includes a map module 11, an acquisition module 12, a recall module 13, a history module 14, a sorting module 15, and a result module 16.
The map module 11 is configured to obtain personal data and application data of all registered users in the database, and construct a data map according to the personal data and the application data. The obtaining module 12 is configured to obtain an input search keyword, and determine a currently used user and a corresponding identity of the input search keyword. The recall module 13 is configured to obtain at least one type of recall content from the database according to the search keyword, calculate a content path length of the at least one type of recall content, and use a minimum value of the content path length as the recall path length. The history module 14 is used for obtaining a first history click content of the current using user and obtaining a second history click content of other users similar to the current using user from the data map. The sorting module 15 is configured to calculate a first path length of the first historical click content and a second path length of the second historical click content, and use a minimum value of the first path length and the second path length as a sorting path length. The result module 16 is configured to rank the at least one recalled content according to the recall path length and the sort path length as a final search result.
The personal data comprises content clicked by a registered user, the number of the content clicked by the registered user, labels of the registered user and retrieval keywords input by the registered user; the application data comprises application contents of the application program and hot contents in the application contents.
The map module 11 is further configured to take the application content, the label of the registered user, the search keyword, and the registered user as nodes; and drawing the relation between the nodes.
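The node-and-relation construction can be sketched in Python with an in-memory adjacency structure. The record layout (`labels`, `clicked`, `keywords`), the class name `DataMap`, and the function name `build_data_map` are hypothetical; a graph database would normally hold this structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # (node type, identifier), e.g. ("User", "u")


class DataMap:
    """A minimal directed graph that also tracks the in-degree of every node."""

    def __init__(self) -> None:
        self.out_edges: Dict[Node, Set[Node]] = defaultdict(set)
        self.in_degree: Dict[Node, int] = defaultdict(int)

    def add_relation(self, source: Node, target: Node) -> None:
        # Count each distinct edge once so the in-degree stays accurate.
        if target not in self.out_edges[source]:
            self.out_edges[source].add(target)
            self.in_degree[target] += 1


def build_data_map(personal_data: Dict[str, Dict[str, Iterable[str]]]) -> DataMap:
    """Create User, Label, Content, and Keyword nodes and link users to them."""
    graph = DataMap()
    for user, record in personal_data.items():
        user_node = ("User", user)
        for label in record.get("labels", ()):
            graph.add_relation(user_node, ("Label", label))
        for content in record.get("clicked", ()):
            graph.add_relation(user_node, ("Content", content))
        for keyword in record.get("keywords", ()):
            graph.add_relation(user_node, ("Keyword", keyword))
    return graph


# Hypothetical personal data for two registered users.
data = {
    "u": {"labels": ["finance"], "clicked": ["c1"], "keywords": ["loan"]},
    "v": {"labels": ["finance"], "clicked": ["c2"], "keywords": []},
}
graph = build_data_map(data)
print(graph.in_degree[("Label", "finance")])
```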
The recall module 13 is further configured to recall the first type of content through an open source search engine according to the retrieval key word; and/or obtaining a derived word according to the retrieval key word, and recalling the second type of content through an open source search engine according to the derived word; and/or deriving according to the first type of content to obtain a third type of content; and/or deriving according to the second type of content to obtain a fourth type of content; and/or acquiring a fifth type of content of which the clicked times are greater than a first preset threshold.
Wherein, other users similar to the currently used user include: a first user having the same tag as a currently used user; and/or a second user who has clicked the same content as the currently using user; and/or a third user who has entered the same search keyword as the currently used user.
The history module 14 is further configured to calculate a second score of a path from the first user and/or the second user and/or the third user to the currently used user, respectively, and obtain a similarity between the first user and/or the second user and/or the third user and the currently used user according to the second score; and if the similarity is greater than a second preset threshold, the first user and/or the second user and/or the third user are other users similar to the currently used user.
The history module 14 is further configured to use the average of the second scores as the similarity between the first user and/or the second user and/or the third user and the current user.
The result module 16 is further configured to calculate a first score of at least one recalled content according to the recall path length and the ranking path length; and arranging the at least one type of recall content in descending order according to the first scores to serve as a final search result, and uploading the final search result to the block chain.
As can be seen from the above description, in this embodiment, the intelligent terminal constructs a data map according to the personal data of the registered user of the application program and the application data of the application program, acquires the similarity between the registered user and the current user according to the data map, calculates the score of each recalled content according to the correlation between the search keywords and the correlation between the contents, and performs ranking according to the scores, so that the search results based on the data map for different users can be obtained, thereby effectively improving the work efficiency and reducing the space complexity.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an intelligent terminal according to a second embodiment of the present invention. The intelligent terminal 20 includes a processor 21 and a memory 22. The processor 21 is coupled to a memory 22. The memory 22 has stored therein a computer program which is executed by the processor 21 in operation to implement the method as shown in fig. 1-2. The detailed methods can be referred to above and are not described herein.
As can be seen from the above description, in this embodiment, the intelligent terminal constructs a data map according to the personal data of the registered user of the application program and the application data of the application program, acquires the similarity between the registered user and the current user according to the data map, calculates the score of each recalled content according to the correlation between the search keywords and the correlation between the contents, and performs ranking according to the scores, so that the search results based on the data map for different users can be obtained, thereby effectively improving the work efficiency and reducing the space complexity.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a readable storage medium according to an embodiment of the present invention. The readable storage medium 30 stores at least one computer program 31, and the computer program 31 is used for being executed by a processor to implement the method shown in fig. 1-2, and the detailed method can be referred to above and is not described herein again. In one embodiment, the readable storage medium 30 may be a memory chip in a terminal, a hard disk, or other readable and writable storage tool such as a mobile hard disk or a flash drive, an optical disk, or the like, and may also be a server or the like.
As can be seen from the above description, in this embodiment, the computer program in the readable storage medium may be configured to construct a data graph according to personal data of a registered user of the application program and application data of the application program, acquire a similarity between the registered user and a currently used user according to the data graph, calculate a score of each recalled content by combining a correlation between search keywords and a correlation between contents, and sort according to the scores, so as to obtain a search result based on the data graph for different users, thereby effectively improving work efficiency and reducing space complexity.
Unlike the prior art, the present invention calculates the similarity among users, the correlation among search keywords, and the correlation among contents based on user portrait data, user search data, and user click data, and uses the fast path matching of Neo4j to realize data map-based search that is personalized for each user. Meanwhile, the time and space complexity of the scheme is low, making it suitable for large-scale deployment in a production environment.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and their description is specific and detailed, but should not be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A data map-based search method, the method comprising:
acquiring personal data and application data of all registered users in a database, and constructing a data map according to the personal data and the application data;
acquiring an input search keyword, and determining a current user who inputs the search keyword and a corresponding identity;
acquiring at least one recall content from the database according to the retrieval key words, calculating the content path length of the at least one recall content, and taking the minimum value of the content path length as the recall path length;
acquiring first historical click content of the current user, and acquiring second historical click content of other users similar to the current user from the data map;
calculating a first path length of the first historical click content and a second path length of a second historical click content, and taking the minimum value of the first path length and the second path length as a sorting path length;
and sequencing the at least one recalling content according to the recalling path length and the sequencing path length to serve as a final search result.
2. The data-graph-based searching method of claim 1, wherein the personal data comprises contents clicked by the registered user, the number of contents clicked by the registered user, a tag of the registered user, and a retrieval keyword inputted by the registered user; the application data comprises application content and hot content in the application content;
the step of constructing a data map from the personal data and the application data comprises:
taking the application content, the label of the registered user, the retrieval keyword and the registered user as nodes;
and drawing the relation among the nodes.
3. The data-graph-based search method according to claim 2, wherein the step of retrieving at least one recalled content from the database according to the retrieval keyword comprises:
recalling the first type of content through an open source search engine according to the retrieval key words; and/or
obtaining a derived word according to the retrieval key word, and recalling second-class content through an open source search engine according to the derived word; and/or
deriving according to the first type of content to obtain a third type of content; and/or
deriving according to the second type of content to obtain a fourth type of content; and/or
acquiring a fifth type of content of which the clicked times are greater than a first preset threshold value.
4. A data-graph-based search method according to claim 2, wherein said other users similar to the currently used user comprise:
a first user having the same tag as the currently used user; and/or
a second user who has clicked the same content as the currently used user; and/or
a third user who inputs the same search keyword as the current user.
5. The data-graph-based searching method according to claim 4, wherein the step of obtaining second historical click contents of other users similar to the currently-used user from the data graph comprises:
respectively calculating second scores of paths from the first user and/or the second user and/or the third user to the current user, and acquiring the similarity between the first user and/or the second user and/or the third user and the current user according to the second scores;
and if the similarity is greater than a second preset threshold, the first user and/or the second user and/or the third user are other users similar to the current user.
6. The data-graph-based searching method according to claim 5, wherein the step of obtaining the similarity between the first user and/or the second user and/or the third user and the current user according to the second score comprises:
and taking the average value of the second scores as the similarity of the first user and/or the second user and/or the third user and the current user.
7. The data-graph-based search method of claim 1, wherein said step of ranking said at least one recalled content according to said recall path length and said ranked path length comprises:
calculating a first score for the at least one recalled content as a function of the recall path length and the ranked path length;
arranging the at least one recalled content in descending order according to the first score as a final search result;
further comprising uploading the final search result into a blockchain.
8. An intelligent terminal, comprising:
the map module is used for acquiring personal data and application data of all registered users in a database and constructing a data map according to the personal data and the application data;
the acquisition module is used for acquiring the input search keywords and determining the current user who inputs the search keywords and the corresponding identity;
the recall module is used for acquiring at least one recall content from the database according to the retrieval key words, calculating the content path length of the at least one recall content and taking the minimum value of the content path length as the recall path length;
the history module is used for acquiring first history click content of the current user and acquiring second history click content of other users similar to the current user from the data map;
the sorting module is used for calculating a first path length of the first historical click content and a second path length of the second historical click content, and taking the minimum value of the first path length and the second path length as a sorting path length;
and the result module is used for arranging the at least one recall content according to the recall path length and the sorting path length to serve as a final search result.
9. An intelligent terminal, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 7.
10. A readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 7.
CN202010471335.7A 2020-05-29 2020-05-29 Data map-based searching method, intelligent terminal and readable storage medium Active CN111694929B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010471335.7A CN111694929B (en) 2020-05-29 2020-05-29 Data map-based searching method, intelligent terminal and readable storage medium
PCT/CN2020/098816 WO2021139105A1 (en) 2020-05-29 2020-06-29 Data map-based search method, smart terminal, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010471335.7A CN111694929B (en) 2020-05-29 2020-05-29 Data map-based searching method, intelligent terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN111694929A true CN111694929A (en) 2020-09-22
CN111694929B CN111694929B (en) 2023-04-07

Family

ID=72478755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010471335.7A Active CN111694929B (en) 2020-05-29 2020-05-29 Data map-based searching method, intelligent terminal and readable storage medium

Country Status (2)

Country Link
CN (1) CN111694929B (en)
WO (1) WO2021139105A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190182059A1 (en) * 2017-12-12 2019-06-13 Facebook, Inc. Utilizing machine learning from exposed and non-exposed user recall to improve digital content distribution
CN110083688A (en) * 2019-05-10 2019-08-02 北京百度网讯科技有限公司 Search result recalls method, apparatus, server and storage medium
CN110096655A (en) * 2019-04-29 2019-08-06 北京字节跳动网络技术有限公司 Sort method, device, equipment and the storage medium of search result

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609887A (en) * 2019-09-18 2019-12-24 中科赛思联科(苏州)网络科技有限公司 Scientific and technological resource big data query recommendation system and method based on knowledge graph
CN111046188A (en) * 2019-11-15 2020-04-21 北京三快在线科技有限公司 User preference degree determining method and device, electronic equipment and readable storage medium
CN111191042A (en) * 2019-12-10 2020-05-22 同济大学 Knowledge graph path semantic relation-based search accuracy evaluation method
CN111159431A (en) * 2019-12-30 2020-05-15 深圳Tcl新技术有限公司 Knowledge graph-based information visualization method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2021139105A1 (en) 2021-07-15
CN111694929B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110532451A (en) Search method and device for policy text, storage medium, electronic device
CN109255586B (en) Online personalized recommendation method for e-government affairs handling
CN111782965A (en) Intention recommendation method, device, equipment and storage medium
US20100293179A1 (en) Identifying synonyms of entities using web search
Shi et al. Learning-to-rank for real-time high-precision hashtag recommendation for streaming news
Reinanda et al. Mining, ranking and recommending entity aspects
TW200945079A (en) Search results ranking using editing distance and document information
CN103562916A (en) Hybrid and iterative keyword and category search technique
KR101543780B1 (en) System and method for expert search by dynamic profile and social network reliability
CN112508609B (en) Crowd expansion prediction method, device, equipment and storage medium
CN103309886A (en) Trading-platform-based structural information searching method and device
CN101140588A (en) Method and apparatus for ordering incidence relation search result
CN108647322B (en) Method for identifying similarity of mass Web text information based on word network
CN110232126B (en) Hot spot mining method, server and computer readable storage medium
CN110222260A (en) A kind of searching method, device and storage medium
CN106354867A (en) Multimedia resource recommendation method and device
CN115905489B (en) Method for providing bidding information search service
CN111651670A (en) Content retrieval method, device terminal and storage medium based on user behavior map
CN103761286B (en) A kind of Service Source search method based on user interest
CN115374781A (en) Text data information mining method, device and equipment
CN116455861A (en) Big data-based computer network security monitoring system and method
WO2015084757A1 (en) Systems and methods for processing data stored in a database
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
CN111930949B (en) Search string processing method and device, computer readable medium and electronic equipment
CN105447013A (en) News recommendation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant