CN108062418B - Data searching method and device and server - Google Patents

Data searching method and device and server

Info

Publication number
CN108062418B
CN108062418B
Authority
CN
China
Prior art keywords
page information
index data
access
data
preset
Prior art date
Legal status
Active
Application number
CN201810011936.2A
Other languages
Chinese (zh)
Other versions
CN108062418A (en)
Inventor
高大陆
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810011936.2A
Publication of CN108062418A
Application granted
Publication of CN108062418B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation of access to content, e.g. by caching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data search method, a data search apparatus, and a server, wherein the data search method is applied to the server and comprises the following steps: receiving a search request; searching, in first-type index data, for index data matching the search request, wherein the first-type index data is: data stored in a non-cache area of the memory; if no index data matching the search request exists in the first-type index data, searching, in second-type index data, for index data matching the search request, wherein the second-type index data is: data stored in a cache area of the memory; and if index data matching the search request exists in the second-type index data, obtaining a search result from the second-type index data. The technical scheme provided by the embodiments of the invention can shorten the time the server spends searching for index data and reduce the time a user waits when searching for information.

Description

Data searching method and device and server
Technical Field
The present invention relates to the field of search engine technology, and in particular, to a data search method, a data search apparatus, and a server.
Background
A search engine is a type of website that provides a search service on the Internet; the server corresponding to such a website collects page information from websites on the Internet to local storage, by means of web search software or by websites registering themselves. To facilitate information searches by users, the server builds an index database for the collected page information and stores the index data of the index database on a disk.
In the prior art, when a server starts up, it loads into the memory the index data corresponding to page information with a high access frequency. After the server receives a user's search request, it first needs to find index data matching the search request, and then provides the user with the required information according to the found index data. When looking for index data matching the search request, the server first searches the memory; if no matching index data is found in the memory, the server searches the disk for index data matching the search request and loads the matching index data from the disk into the memory.
However, in the process of implementing the invention, the inventor found that the prior art has at least the following problems:
if the index data corresponding to a search request is not stored in the memory when the server starts, then after receiving the search request the server needs to perform the following two steps. The first step is to search, among the large amount of index data stored on the disk, for index data matching the search request; the second step is to load the index data matching the search request into the memory. Both steps take a relatively long time, which increases the time the server spends searching for index data and therefore makes the user wait longer when searching for information.
Disclosure of Invention
The embodiments of the invention aim to provide a data search method, a data search apparatus, and a server, so as to shorten the time the server spends searching for index data and thereby reduce the time a user waits when searching for information. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a data search method, which is applied to a server, and the method includes:
receiving a search request;
searching, in first-type index data, for index data matching the search request, wherein the first-type index data is: data stored in a non-cache area of the memory;
if no index data matching the search request exists in the first-type index data, searching, in second-type index data, for index data matching the search request, wherein the second-type index data is: data stored in a cache area of the memory, and the data in the cache area is: index data that is loaded from a disk according to a preset data loading rule and corresponds to page information whose access probability is greater than a preset probability;
and if index data matching the search request exists in the second-type index data, obtaining a search result from the second-type index data.
Optionally, the access probability of the target page information is calculated according to the following method, where the target page information is any page information:
calculating the access probability of the target page information by using the access amount of the user to the target page information within a first preset time length;
or calculating the access probability of the target page information according to the characteristics of the target page information.
Optionally, the calculating, according to the characteristics of the target page information, the access probability of the target page information includes:
extracting the characteristics of the target page information;
estimating the access frequency corresponding to each extracted characteristic by using the access probability of the page information corresponding to that characteristic, wherein the page information corresponding to a characteristic is: the page information, in the candidate page information set, that has the characteristic;
and calculating the access probability of the target page information by using each estimated access frequency.
Optionally, the page information in the candidate page information set is:
page information whose access amount is greater than a preset access amount, selected from all the page information stored on the disk according to the access amount of users to the page information within a second preset time length;
or,
all the page information stored on the disk.
Optionally, the characteristics of the target page information include at least one of the following characteristics: popularity, release time, source, type, and title characteristics.
In a second aspect, an embodiment of the present invention provides a data search apparatus, which is applied to a server, and the apparatus includes:
a request receiving module for receiving a search request;
a first data searching module, configured to search, in first-type index data, for index data matching the search request, where the first-type index data is: data stored in a non-cache area of the memory;
a second data searching module, configured to search, if no index data matching the search request exists in the first-type index data, for index data matching the search request in second-type index data, where the second-type index data is: data stored in a cache area of the memory, and the data in the cache area is: index data that is loaded from a disk according to a preset data loading rule and corresponds to page information whose access probability is greater than a preset probability;
and a search result acquisition module, configured to obtain a search result from the second-type index data if index data matching the search request exists in the second-type index data.
Optionally, the apparatus further comprises:
the access probability calculation module is used for calculating the access probability of target page information according to the following modes, wherein the target page information is any page information:
calculating the access probability of the target page information by using the access amount of the user to the target page information within a first preset time length;
or calculating the access probability of the target page information according to the characteristics of the target page information.
Optionally, the access probability calculation module is specifically configured to:
extract the characteristics of the target page information;
estimate the access frequency corresponding to each extracted characteristic by using the access probability of the page information corresponding to that characteristic, where the page information corresponding to a characteristic is: the page information, in the candidate page information set, that has the characteristic;
and calculate the access probability of the target page information by using each estimated access frequency.
Optionally, the set of candidate page information is:
page information whose access amount is greater than a preset access amount, selected from all the page information stored on the disk according to the access amount of users to the page information within a second preset time length;
or,
all the page information stored on the disk.
Optionally, the characteristics of the target page information include at least one of the following characteristics: popularity, release time, source, type, and title characteristics.
In a third aspect, an embodiment of the present invention provides a server, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
a processor, configured to implement any of the data search methods described in the first aspect above when executing a program stored in a memory.
In another aspect of the present invention, there is also provided a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform any one of the data searching methods described in the first aspect.
In another aspect of the present invention, there is also provided a computer program product including instructions, which when executed on a computer, make the computer perform any one of the data searching methods described in the first aspect.
Compared with the prior art, in the technical scheme provided by the embodiments of the invention, after the server receives a search request, it first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability, so the probability of finding index data matching the search request in the cache area of the memory is relatively high. Therefore, the time the server spends searching for index data can be shortened, and the time a user waits when searching for information can be reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flow chart of a data searching method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a data search apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to solve the technical problems in the prior art that the server takes a long time to search for index data and the user therefore has to wait a long time when searching for information, embodiments of the present invention provide a data search method, apparatus, and server, so as to shorten the time the server spends searching for index data and thereby shorten the time a user waits when searching for information.
In order to more clearly and completely describe the embodiments of the present invention, the concepts involved in the embodiments of the present invention will be described first.
Non-cache area: a part of the memory, used for storing the first-type index data.
Cache area: another part of the memory, defined relative to the non-cache area, used for storing the second-type index data.
First-type index data: index data corresponding to page information with a high access frequency, loaded from a disk into the memory when the server providing the search service starts up; or index data loaded from the disk into the memory after the started server receives a search request and fails to find the corresponding index data in the memory.
Second-type index data: index data that the server loads from a disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability.
the preset data loading rule may be a preset time interval, for example, the preset time interval is two days, then the server calculates page information with an access probability greater than the preset access probability within two days before the current time, and loads the index data corresponding to the page information with the access probability greater than the preset access probability into the cache. Of course, the preset data loading rule is not specifically limited in the present invention.
After introducing the concepts related to the embodiments of the present invention, a data search method, a data search apparatus, a server, and the like provided by the embodiments of the present invention will be introduced.
In a first aspect, a data search method provided in an embodiment of the present invention is introduced.
As shown in fig. 1, a data searching method provided in an embodiment of the present invention includes the following steps:
s101, receiving a search request;
s102, searching index data matched with the search request in the first-class index data, wherein the first-class index data is as follows: storing the data in a non-cache region in the memory;
s103, if the index data matched with the search request does not exist in the first-class index data, searching the index data matched with the search request in the second-class index data, wherein the second-class index data is as follows: the data stored in the cache area in the memory are: index data which is loaded from a disk according to a preset data loading rule and corresponds to page information with access probability greater than preset probability;
and S104, if index data matched with the index request exists in the second-class index data, obtaining a search result from the second-class index data.
With the data search method shown in fig. 1, after receiving a search request, the server first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability; in other words, the probability of finding index data matching the search request in the cache area of the memory is relatively high.
This differs from the prior art, in which the server loads the index data corresponding to frequently accessed page information into the memory only when it starts up and does not update the index data in the memory afterwards. In the prior art, if the index data corresponding to a search request is index data, stored on the disk, that corresponds to page information whose access frequency has since become high, the server has to search the large amount of index data stored on the disk for index data matching the search request and then load the matching index data into the memory; both the search and the loading take a long time, which increases the time the server spends searching for index data and makes the user wait longer when searching for information.
Therefore, the technical scheme provided by the embodiments of the invention can shorten the time the server spends searching for index data and reduce the time a user waits when searching for information. A minimal sketch of this two-level lookup is given below.
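For illustration only, the following sketch shows how the two-level lookup of steps S102 to S104 could be organized; the function names and the keyword-based matching criterion are assumptions made for the example, and the on-disk fallback is omitted because the cache area is kept populated by the preset data loading rule.

```python
def find_matching_index_data(index_area, search_request):
    """Return the index data entries in one memory area whose keywords match the request."""
    return [entry for entry in index_area.values()
            if search_request in entry.get("keywords", [])]


def search(search_request, non_cache_area, cache_area):
    """Two-level lookup: first the first-type index data in the non-cache area,
    then the second-type index data in the cache area, which is pre-loaded with
    index data for pages whose access probability exceeds the preset probability."""
    # S102: search the first-type index data stored in the non-cache area.
    results = find_matching_index_data(non_cache_area, search_request)
    if results:
        return results
    # S103/S104: search the second-type index data stored in the cache area.
    return find_matching_index_data(cache_area, search_request)
```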
The data search method provided by the embodiment of the present invention will be described in detail below.
S101, receiving a search request;
the search engine provides a page containing a search box, when a user needs to search information, the user inputs a search word in the search box and clicks on the search, and the server receives a search request.
For example, the search engine is hundred degrees, and when the user needs to view the latest news, the user inputs the "latest news" in the search box and clicks "hundred degrees", and the server receives the search request.
It should be noted that the search request is used to search page information, and when searching for page information, index data corresponding to the page information needs to be searched first, so in the embodiment of the present invention, the search request may be used to search for index data.
S102, searching, in first-type index data, for index data matching the search request, where the first-type index data is: data stored in a non-cache area of the memory;
Because the first-type index data may be the index data corresponding to page information that had a high access frequency when the server started up, after receiving a search request the server first checks whether index data matching the search request exists in the first-type index data; if such index data exists, a search result can be obtained from the first-type index data. If no index data matching the search request is found in the first-type index data, step S103 is executed.
It should be understood that there may be one or more items of index data matching the search request; in general, there are several. For example, if the search engine is Baidu and the user enters "latest news" in the search box and clicks the search button, the index data matching the search request may include "latest sports news", "latest stock information", "latest entertainment information", and the like.
S103, if no index data matching the search request exists in the first-type index data, searching, in second-type index data, for index data matching the search request, where the second-type index data is: data stored in a cache area of the memory, and the data in the cache area is: index data that is loaded from a disk according to a preset data loading rule and corresponds to page information whose access probability is greater than a preset probability;
In this step, if no index data matching the search request exists in the first-type index data, that is, the server has not found matching index data there, the server searches for index data matching the search request in the second-type index data stored in the cache area of the memory. The index data stored in the cache area is index data that the server has loaded from the disk according to the preset data loading rule and that corresponds to page information whose access probability is greater than the preset probability, so the probability of finding index data matching the search request in the cache area is relatively high. This is unlike the prior art, where, if the server does not find matching index data among the index data loaded into the memory at startup, it has to search the large amount of index data stored on the disk for index data matching the search request and load the found index data into the memory, which increases the time spent searching for index data and prolongs the time the user waits when searching for information.
It should be emphasized that, in practical applications, after receiving the search request, the server may search the first-type index data and the second-type index data simultaneously for index data matching the search request. Alternatively, the server may first search the second-type index data and, only if no matching index data exists there, search the first-type index data; both approaches are reasonable. That is to say, the embodiments of the present invention do not specifically limit the order in which the first-type index data and the second-type index data are searched.
It should be understood that the above-mentioned disk is any disk that stores index data; it may be a disk of the server itself or another disk that stores index data.
It should be noted that the second type of index data is index data corresponding to page information with an access probability greater than a preset probability, and how to calculate the access probability of the page information will be described in detail below.
In an implementation manner, the access probability of the target page information may be calculated by using the access amount of the user to the target page information within a first preset time length, where the target page information is any page information stored in a disk.
The first preset time length may be longer or shorter; for example, it may be 30 days, 15 days, 7 days, 3 days, 2 days, 1 day, 3 hours, 1 hour, and so on. The first preset time length is not specifically limited in the embodiments of the present invention; the following description takes a first preset time length of 2 days as an example.
The server records the users' access amount to the target page information within the 2 days. After obtaining the access amount of the target page information, the server determines whether it is greater than or equal to a preset access amount: if so, the target page information is determined to be page information with a high access frequency; if the access amount is less than the preset access amount, the target page information is determined to be page information with a low access frequency.
Of course, since the target page information is any page information stored on the disk, the access amount may also be evaluated as follows: the server sorts the page information stored on the disk in descending order of access amount, determines the page information whose rank is less than or equal to a preset rank as page information with a high access probability, and determines the page information whose rank is greater than the preset rank as page information with a low access probability. Similarly, the server may sort the page information in ascending order of access amount, determine the page information whose rank is less than or equal to the preset rank as page information with a low access probability, and determine the page information whose rank is greater than the preset rank as page information with a high access probability. A minimal sketch of these two selection strategies is given below.
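Purely as an illustration, the following sketch shows the threshold-based and rank-based selection strategies just described; the preset values and names are assumptions made for the example, not values given by the patent.

```python
PRESET_ACCESS_AMOUNT = 1000   # preset access amount threshold (assumed value)
PRESET_RANK = 100             # preset rank (assumed value)


def is_high_frequency(access_amount):
    """Threshold strategy: a page whose access amount within the first preset
    time length reaches the preset access amount is treated as high-frequency."""
    return access_amount >= PRESET_ACCESS_AMOUNT


def high_probability_pages_by_rank(access_amounts):
    """Rank strategy: sort pages by access amount in descending order and treat
    the top PRESET_RANK pages as having a high access probability.

    access_amounts: dict mapping page_id -> access amount within the time window.
    """
    ranked = sorted(access_amounts.items(), key=lambda item: item[1], reverse=True)
    return {page_id for page_id, _ in ranked[:PRESET_RANK]}
```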
In another embodiment, the access probability of the target page information may be calculated according to the characteristics of the target page information, where the target page information is any page information stored in the disk.
In this embodiment, the features of the target page information are first extracted; these may be the release time, the popularity, the source, the type, the title features, and so on. Specifically, the release time may be within the last month, the last week, the last three days, the last day, and so on; the popularity may be a popularity value; the source may be whether the content originates inside or outside the site, and whether it was uploaded by an ordinary user or by an editor; the type may be of various kinds, for example a television series, a movie, or a variety show; the title features may be the length, wording, semantics, and so on of the title. Of course, any feature that can characterize the page information may serve as a feature of the target page information, and the features of the target page information are not specifically limited in the embodiments of the present invention.
After the features of the target page information are extracted, for each feature, the access probability of users to the page information having that feature is looked up in the candidate page information set, so that the access frequency corresponding to that feature can be estimated. The access frequency corresponding to each feature may be estimated with a machine learning method, which improves the accuracy of the estimated access frequencies; a simplified sketch of this estimation step is given below.
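As a simple stand-in for the machine learning estimation mentioned above (an assumption made only for illustration, not the patent's method), the following sketch estimates the access frequency of each feature value as the average access probability of the candidate pages that have that feature value.

```python
from collections import defaultdict


def estimate_feature_access_frequencies(candidate_pages):
    """Estimate, for every (feature name, feature value) pair, an access frequency
    equal to the mean access probability of the candidate pages having that value.

    candidate_pages: iterable of dicts such as
        {"features": {"release_time": "last_week", "type": "movie"},
         "access_probability": 0.8}
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for page in candidate_pages:
        for name, value in page["features"].items():
            totals[(name, value)] += page["access_probability"]
            counts[(name, value)] += 1
    return {key: totals[key] / counts[key] for key in totals}
```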
It should be noted that the page information in the candidate page information set may be all page information stored in the disk, or may be page information whose access amount is greater than the preset access amount selected from all page information stored in the disk according to the access amount of the user to the page information within the second preset time period. The second preset duration may also be larger or smaller, and the size of the second preset duration is not specifically limited in the embodiment of the present invention.
For example, assume that the extracted feature of the target page information is the release time and that the candidate page information set contains all the page information stored on the disk. In this case, the server extracts the release time and the access probability of the page information, other than the target page information, stored on the disk, and estimates the access frequency corresponding to each release time by machine learning, based on the extracted release times and access probabilities. It can be understood that, if page information whose release time is within the last week has a higher access probability, the estimated access frequency for a release time within the last week is correspondingly higher.
With this method of calculating access frequencies, the access frequency corresponding to each feature of the target page information can be estimated, and finally the user's access probability for the target page information can be calculated from the estimated access frequencies. It is understood that the access probability of the user to the target page information may be obtained by summing the estimated access frequencies of the features, or by multiplying each estimated access frequency by a weighting coefficient and then summing the weighted access frequencies; both are reasonable. The embodiments of the present invention do not specifically limit the method used to calculate the access probability of the target page information. A minimal sketch of the weighted-sum variant follows.
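Continuing the illustration, and again with hypothetical names and weights, the feature-based access probability of a target page can be computed as a weighted sum of the estimated per-feature access frequencies; with all weights equal to 1 this reduces to the plain summation mentioned above.

```python
def access_probability_from_features(page_features, feature_frequencies, feature_weights):
    """Weighted sum of the estimated access frequencies of the page's features.

    page_features: dict mapping feature name -> feature value,
        e.g. {"release_time": "last_week", "type": "tv_series"}.
    feature_frequencies: dict mapping (feature name, feature value) -> estimated
        access frequency, e.g. as produced by estimate_feature_access_frequencies().
    feature_weights: dict mapping feature name -> weighting coefficient.
    """
    probability = 0.0
    for name, value in page_features.items():
        frequency = feature_frequencies.get((name, value), 0.0)
        probability += feature_weights.get(name, 1.0) * frequency
    return probability
```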
In another embodiment, the access probability of the target page information may be calculated by using the access amount of the user to the target page information within the first preset time length and the characteristics of the target page information.
In this embodiment, the first access probability of the target page information can be calculated from the users' access amount to the target page information within the first preset time length, and the second access probability can be calculated according to the features of the target page information; finally, the access probability of the target page information is calculated by using the first access probability and the second access probability.
It should be noted that the access probability of the target page information can be calculated through the following three algorithms.
The first algorithm: the weight corresponding to the first access probability is a first weight and the weight corresponding to the second access probability is a second weight, with the first weight equal to the second weight; in this case, the first access probability and the second access probability can be summed to obtain the access probability of the target page information.
The second algorithm: the weight corresponding to the first access probability is a first weight and the weight corresponding to the second access probability is a second weight, with the first weight greater than the second weight; in this case, the first access probability is multiplied by the first weight to obtain a first probability, the second access probability is multiplied by the second weight to obtain a second probability, and the first probability and the second probability are summed to obtain the access probability of the target page information.
The third algorithm: the weight corresponding to the first access probability is a first weight and the weight corresponding to the second access probability is a second weight, with the first weight less than the second weight; in this case, the first access probability is multiplied by the first weight to obtain a first probability, the second access probability is multiplied by the second weight to obtain a second probability, and the first probability and the second probability are summed to obtain the access probability of the target page information.
In practical applications, which of the above algorithms is used to calculate the access probability of the target page information may be decided according to the actual situation; the embodiments of the present invention do not specifically limit the algorithm used to calculate the access probability of the target page information. A short sketch of the weighted combination is shown below.
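The following sketch, given only as an illustration with assumed parameter names, shows the weighted combination described by the three algorithms above; equal weights of 1.0 (the defaults) give the plain sum of the first algorithm, while unequal weights correspond to the second and third algorithms.

```python
def combined_access_probability(first_probability, second_probability,
                                first_weight=1.0, second_weight=1.0):
    """Combine the access-amount-based first access probability and the
    feature-based second access probability with weighting coefficients."""
    return first_weight * first_probability + second_weight * second_probability
```

For example, combined_access_probability(0.8, 0.4, 0.7, 0.3) weights the access-amount signal more heavily, as in the second algorithm.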
And S104, if index data matching the search request exists in the second-type index data, obtaining a search result from the second-type index data.
If index data matching the search request exists in the second-type index data, the search result can be obtained from the second-type index data. The second-type index data is the index data that the server has loaded from the disk according to the preset data loading rule and that corresponds to page information whose access probability is greater than the preset probability; in other words, the probability of finding index data matching the search request in the second-type index data is relatively high.
Compared with the prior art, in the technical scheme provided by the embodiments of the invention, after the server receives a search request, it first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability, so the probability of finding index data matching the search request in the cache area of the memory is relatively high. Therefore, the time the server spends searching for index data can be shortened, and the time a user waits when searching for information can be reduced.
In a second aspect, an embodiment of the present invention further provides a data search apparatus, which is applied to a server, and as shown in fig. 2, the apparatus includes:
a request receiving module 210, configured to receive a search request;
a first data searching module 220, configured to search, in first-type index data, for index data matching the search request, where the first-type index data is: data stored in a non-cache area of the memory;
a second data searching module 230, configured to search, if no index data matching the search request exists in the first-type index data, for index data matching the search request in second-type index data, where the second-type index data is: data stored in a cache area of the memory, and the data in the cache area is: index data that is loaded from a disk according to a preset data loading rule and corresponds to page information whose access probability is greater than a preset probability;
and a search result obtaining module 240, configured to obtain a search result from the second-type index data if index data matching the search request exists in the second-type index data.
Compared with the prior art, in the technical scheme provided by the embodiments of the invention, after the server receives a search request, it first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability, so the probability of finding index data matching the search request in the cache area of the memory is relatively high. Therefore, the time the server spends searching for index data can be shortened, and the time a user waits when searching for information can be reduced.
Optionally, the apparatus further comprises:
the access probability calculation module is used for calculating the access probability of target page information according to the following modes, wherein the target page information is any page information:
calculating the access probability of the target page information by using the access amount of the user to the target page information within a first preset time length;
or calculating the access probability of the target page information according to the characteristics of the target page information.
Optionally, the access probability calculation module is specifically configured to:
extract the characteristics of the target page information;
estimate the access frequency corresponding to each extracted characteristic by using the access probability of the page information corresponding to that characteristic, where the page information corresponding to a characteristic is: the page information, in the candidate page information set, that has the characteristic; and calculate the access probability of the target page information by using each estimated access frequency.
Optionally, the set of candidate page information is:
page information whose access amount is greater than a preset access amount, selected from all the page information stored on the disk according to the access amount of users to the page information within a second preset time length;
or,
all the page information stored on the disk.
Optionally, the characteristics of the target page information include at least one of the following characteristics: popularity, release time, source, type, and title characteristics.
In a third aspect, an embodiment of the present invention further provides a server, as shown in fig. 3, including a processor 301, a communication interface 302, a memory 303 and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 complete mutual communication through the communication bus 304,
a memory 303 for storing a computer program;
the processor 301 is configured to implement the data searching method according to the above method embodiment when executing the program stored in the memory 303.
The communication bus mentioned in the above server may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the server and other devices.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Compared with the prior art, in the technical scheme provided by the embodiments of the invention, after the server receives a search request, it first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability, so the probability of finding index data matching the search request in the cache area of the memory is relatively high. Therefore, the time the server spends searching for index data can be shortened, and the time a user waits when searching for information can be reduced.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein; when the instructions are run on a computer, the computer is caused to execute the data search method described in the above method embodiment.
Compared with the prior art, in the technical scheme provided by the embodiments of the invention, after the server receives a search request, it first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability, so the probability of finding index data matching the search request in the cache area of the memory is relatively high. Therefore, the time the server spends searching for index data can be shortened, and the time a user waits when searching for information can be reduced.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the data search method described in the above method embodiment.
Compared with the prior art, in the technical scheme provided by the embodiments of the invention, after the server receives a search request, it first checks whether index data matching the search request exists in the non-cache area of the memory; if no matching index data is found in the non-cache area, it then searches the cache area of the memory for index data matching the search request. The index data stored in the cache area of the memory is index data that the server has loaded from the disk according to a preset data loading rule and that corresponds to page information whose access probability is greater than a preset probability, so the probability of finding index data matching the search request in the cache area of the memory is relatively high. Therefore, the time the server spends searching for index data can be shortened, and the time a user waits when searching for information can be reduced.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave, and the like). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the apparatus, the server, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is simple, and the relevant points can be referred to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A data searching method is applied to a server, and the method comprises the following steps:
receiving a search request;
searching, in first-type index data, for index data matching the search request, wherein the first-type index data is: index data corresponding to page information with a high access frequency, loaded from a disk into the memory;
if no index data matching the search request exists in the first-type index data, searching, in second-type index data, for index data matching the search request, wherein the second-type index data is: data stored in a cache area of the memory, and the data in the cache area is: index data that is loaded from a disk according to a preset data loading rule and corresponds to page information whose access probability is greater than a preset probability; the preset data loading rule is that the server calculates, for a preset time interval before the current moment, the page information whose access probability is greater than the preset access probability, and loads the index data corresponding to that page information into the cache area;
and if index data matching the search request exists in the second-type index data, obtaining a search result from the second-type index data.
2. The method of claim 1, wherein the access probability of the target page information is calculated according to the following method, wherein the target page information is any page information:
calculating the access probability of the target page information by using the access amount of the user to the target page information within a first preset time length;
or calculating the access probability of the target page information according to the characteristics of the target page information.
3. The method of claim 2, wherein the calculating the access probability of the target page information according to the characteristics of the target page information comprises:
extracting the characteristics of the target page information;
estimating the access frequency corresponding to each extracted characteristic by using the access probability of the page information corresponding to that characteristic, wherein the page information corresponding to a characteristic is: the page information, in the candidate page information set, that has the characteristic;
and calculating the access probability of the target page information by using each estimated access frequency.
4. The method of claim 3, wherein the page information in the candidate page information set is:
page information whose access amount is greater than a preset access amount, selected from all the page information stored on the disk according to the access amount of users to the page information within a second preset time length;
or,
all the page information stored on the disk.
5. The method according to claim 2 or 3, characterized in that the characteristics of the target page information comprise at least one of the following characteristics: popularity, release time, source, type, title characteristics.
6. A data search apparatus, applied to a server, the apparatus comprising:
the request receiving module is used for receiving a search request;
a first data searching module, configured to search, in first-type index data, for index data matching the search request, wherein the first-type index data is: index data corresponding to page information with a high access frequency, loaded from a disk into the memory;
a second data searching module, configured to search, if no index data matching the search request exists in the first-type index data, for index data matching the search request in second-type index data, wherein the second-type index data is: data stored in a cache area of the memory, and the data in the cache area is: index data that is loaded from a disk according to a preset data loading rule and corresponds to page information whose access probability is greater than a preset probability; the preset data loading rule is that the server calculates, for a preset time interval before the current moment, the page information whose access probability is greater than the preset access probability, and loads the index data corresponding to that page information into the cache area;
and a search result acquisition module, configured to obtain a search result from the second-type index data if index data matching the search request exists in the second-type index data.
7. The apparatus of claim 6, further comprising:
the access probability calculation module is used for calculating the access probability of target page information according to the following modes, wherein the target page information is any page information:
calculating the access probability of the target page information by using the access amount of the user to the target page information within a first preset time length;
or, according to the characteristics of the target page information, calculating the access probability of the target page information.
8. The apparatus of claim 7, wherein the access probability calculation module is specifically configured to:
extract the characteristics of the target page information;
estimate the access frequency corresponding to each extracted characteristic by using the access probability of the page information corresponding to that characteristic, wherein the page information corresponding to a characteristic is: the page information, in the candidate page information set, that has the characteristic;
and calculate the access probability of the target page information by using each estimated access frequency.
9. The apparatus of claim 8, wherein the set of candidate page information is:
page information whose access amount is greater than a preset access amount, selected from all the page information stored on the disk according to the access amount of users to the page information within a second preset time length;
or,
all the page information stored on the disk.
10. The apparatus according to claim 7 or 8, wherein the characteristics of the target page information comprise at least one of the following characteristics: popularity, release time, source, type, title characteristics.
11. A server, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.
CN201810011936.2A 2018-01-05 2018-01-05 Data searching method and device and server Active CN108062418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810011936.2A CN108062418B (en) 2018-01-05 2018-01-05 Data searching method and device and server


Publications (2)

Publication Number Publication Date
CN108062418A CN108062418A (en) 2018-05-22
CN108062418B true CN108062418B (en) 2022-07-22

Family

ID=62141361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810011936.2A Active CN108062418B (en) 2018-01-05 2018-01-05 Data searching method and device and server

Country Status (1)

Country Link
CN (1) CN108062418B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704492A (en) * 2018-06-25 2020-01-17 中兴通讯股份有限公司 Data acquisition method and device and computer readable storage medium
CN108897886B (en) * 2018-07-09 2019-09-24 掌阅科技股份有限公司 Page display method calculates equipment and computer storage medium
CN109284236B (en) * 2018-08-28 2020-04-17 北京三快在线科技有限公司 Data preheating method and device, electronic equipment and storage medium
CN109063199B (en) * 2018-09-11 2022-10-25 优视科技有限公司 Resource filtering method and device, electronic equipment and computer readable medium
CN109933585B (en) * 2019-02-22 2021-11-02 京东数字科技控股有限公司 Data query method and data query system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080183663A1 (en) * 2007-01-31 2008-07-31 Paul Reuben Day Dynamic Index Selection for Database Queries
CN103500213A (en) * 2013-09-30 2014-01-08 北京搜狗科技发展有限公司 Page hot-spot resource updating method and device based on pre-reading
CN104572643A (en) * 2013-10-10 2015-04-29 北大方正集团有限公司 Search method and search engine
CN105653646A (en) * 2015-12-28 2016-06-08 北京中电普华信息技术有限公司 Dynamic query system and method under concurrent query condition


Also Published As

Publication number Publication date
CN108062418A (en) 2018-05-22

Similar Documents

Publication Publication Date Title
CN108062418B (en) Data searching method and device and server
US11580168B2 (en) Method and system for providing context based query suggestions
CN106202394B (en) Text information recommendation method and system
US9965209B2 (en) Large-scale, dynamic graph storage and processing system
CN107180093B (en) Information searching method and device and timeliness query word identification method and device
WO2018192496A1 (en) Trend information generation method and device, storage medium and electronic device
CN111597449B (en) Candidate word construction method and device for search, electronic equipment and readable medium
US20100191758A1 (en) System and method for improved search relevance using proximity boosting
CN111159563B (en) Method, device, equipment and storage medium for determining user interest point information
CN105302807B (en) Method and device for acquiring information category
US10146872B2 (en) Method and system for predicting search results quality in vertical ranking
CN110968765B (en) Book searching method, computing device and computer storage medium
CN110674400B (en) Sorting method, sorting device, electronic equipment and computer-readable storage medium
CN111291258A (en) Recommendation method and device for searching hot words, electronic equipment and readable medium
US20150302088A1 (en) Method and System for Providing Personalized Content
CN107885875B (en) Synonymy transformation method and device for search words and server
CN112328889A (en) Method and device for determining recommended search terms, readable medium and electronic equipment
CN107291835B (en) Search term recommendation method and device
CN108667875B (en) Information updating method and device
CN110334267B (en) Content searching method and device based on blockchain and electronic equipment
CN113177169B (en) Method, device, equipment and storage medium for acquiring category of network address
US20160055203A1 (en) Method for record selection to avoid negatively impacting latency
CN113536138A (en) Network resource recommendation method and device, electronic equipment and readable storage medium
CN113868373A (en) Word cloud generation method and device, electronic equipment and storage medium
CN109657129B (en) Method and device for acquiring information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant