CN111831876A - Query method, device and storage medium - Google Patents

Publication number
CN111831876A
Authority
CN
China
Prior art keywords
index
query
word
words
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910297346.5A
Other languages
Chinese (zh)
Inventor
李世峰
李中男
朱宏波
于严
赵帅领
王鹏
郭艳民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Navinfo Co Ltd
Original Assignee
Navinfo Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Navinfo Co Ltd
Priority to CN201910297346.5A
Publication of CN111831876A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a query method, a query device and a storage medium. The method comprises: acquiring a query word; if a query list contains a target index word whose code value is the same as that of the query word, determining, according to the query list, the index identifier of the target index word and the number of candidate index words containing the query word, wherein the query list indicates the correspondence among the code value of the target index word, the index identifier of the target index word and the number of candidate index words; acquiring all candidate index words according to the index identifier of the target index word and the number of candidate index words, wherein the index identifiers of adjacent candidate index words are consecutive; and pushing all candidate index words. Because this correspondence is established in advance, all candidate index words can be determined quickly, independently of the total number of index words, which improves indexing efficiency.

Description

Query method, device and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a query method, a query device and a storage medium.
Background
With the rapid growth of information on the Internet, users depend more and more on search engines, whose commercial value is immeasurable. Typically, when a user enters characters to be queried on a mobile terminal or a PC, the search engine needs to suggest, quickly and intelligently, the remainder of what the user intends to enter; this is the associative-word index function.
In the prior art, the associative-word index function is generally implemented with a sequence-table index or a hash-table index. A sequence-table index stores all index words in sorted order and finds index words by binary search. A hash-table index uses a hash function to map each query word to its list of index words, and obtains the index words through that mapping.
The query time complexity of the sequence-table index is O(log₂ n), where n is the number of index words, so lookup is inefficient, particularly when the number of index words is very large. The query time complexity of the hash-table index is O(1), so lookup is efficient, but when the data volume is large the hash function cannot avoid hash collisions, which increase the amount of computation.
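As an illustration of the two prior-art approaches just described, the following Python sketch answers a prefix query first with a sorted list and binary search, and then with a hash table keyed by prefix. The word list, the function name `prefix_range` and the use of U+10FFFF as an upper sentinel are assumptions made for this example; none of them comes from the publication.

```python
from bisect import bisect_left

# Sequence-table index: index words kept in sorted order (hypothetical data).
sorted_words = ["apple", "apply", "apricot", "banana"]

def prefix_range(words, prefix):
    """Return the half-open range [lo, hi) of words starting with prefix.

    Uses two binary searches. Assumes no word contains U+10FFFF, which
    serves here as a sentinel that sorts after any other character.
    """
    lo = bisect_left(words, prefix)
    hi = bisect_left(words, prefix + "\U0010FFFF")
    return lo, hi

# Hash-table index: every prefix maps directly to its list of index words.
hash_index = {}
for word in sorted_words:
    for end in range(1, len(word) + 1):
        hash_index.setdefault(word[:end], []).append(word)

lo, hi = prefix_range(sorted_words, "ap")
print(sorted_words[lo:hi])
print(hash_index.get("app", []))
```

The sorted list costs two binary searches per query, whereas the dictionary answers in a single lookup but stores an entry for every prefix of every word.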
Disclosure of Invention
The invention provides a query method, a query device and a storage medium whose lookup cost does not depend on the number of index words, thereby improving indexing efficiency.
A first aspect of the present invention provides a query method, including:
acquiring a query word;
if the query list contains the target index words with the same code values as the query words, determining the index identifications of the target index words and the number of candidate index words containing the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the code values of the target index words, the index identifications of the target index words and the number of the candidate index words;
acquiring all candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous;
and pushing all the candidate index words.
A second aspect of the present invention provides a query device, comprising:
the query term acquisition module is used for acquiring query terms;
a determining module, configured to determine, according to a query list, an index identifier of a target index word and a number of candidate index words including the query word if the query list includes the target index word having a same code value as the query word, where the query list is used to indicate a correspondence between the code value of the target index word, the index identifier of the target index word, and the number of the candidate index words;
the candidate index word acquisition module is used for acquiring all the candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous;
and the pushing module is used for pushing all the candidate index words.
A third aspect of the present invention provides a query device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the querying device to perform the querying method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement the above-mentioned query method.
The invention provides a query method, a query device and a storage medium, wherein the method comprises the following steps: acquiring a query word; if the query list contains the target index words with the same code values as the query words, determining the index identifications of the target index words and the number of candidate index words containing the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the code values of the target index words, the index identifications of the target index words and the number of the candidate index words; acquiring all candidate index words according to the index identification of the target index word and the number of the candidate index words, wherein the index identifications of adjacent candidate index words are continuous; and pushing all candidate index words. According to the invention, all candidate index words can be quickly determined by pre-establishing the corresponding relation among the code value of the target index word, the index identification of the target index word and the number of the candidate index words, and the index efficiency is improved, regardless of the number of the index words.
Drawings
FIG. 1 is a schematic diagram of a system architecture suitable for the query method provided by the present invention;
FIG. 2 is a first flowchart illustrating a query method according to the present invention;
FIG. 3 is a schematic view of an interface change of a terminal corresponding to the query method provided by the present invention;
FIG. 4 is a flowchart illustrating a second exemplary query method according to the present invention;
FIG. 5 is a schematic flow chart illustrating the process of creating a query list, a dictionary file, and a query file in the query method according to the present invention;
FIG. 6 is an exemplary diagram of an index tree provided by the present invention;
FIG. 7 is an exemplary diagram of a query file provided by the present invention;
FIG. 8 is an exemplary diagram of a dictionary file provided by the present invention;
FIG. 9 is a first schematic structural diagram of a query device according to the present invention;
FIG. 10 is a schematic structural diagram of a query device according to the present invention;
FIG. 11 is a schematic structural diagram of a querying device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of a system architecture to which the query method provided by the present invention is applicable. As shown in FIG. 1, the query scenario includes a terminal and a query device; the query device may be a server. The terminal and the query device are communicatively connected in a wired or wireless manner. A user enters a query word through the terminal; the terminal then sends the query word to the query device, and the query device returns the associative words (index words) corresponding to the query word to the terminal. In the following embodiments, the query method provided by the present invention is described from the perspective of the query device.
The terminal in the present invention may be, but is not limited to, a mobile terminal or a fixed terminal. The mobile terminal may be a mobile device, such as a smartphone or a tablet, that lets a user enter a query word and has a display function. The fixed terminal may be a fixed device, such as a desktop computer, that lets a user enter a query word and has a display function.
FIG. 2 is a first schematic flowchart of the query method provided by the present invention. The method flow shown in FIG. 2 may be executed by a query device, which may be the server described above. As shown in FIG. 2, the query method provided in this embodiment may include:
S101, acquiring a query word.
In this embodiment, the terminal may acquire the query word as follows: the terminal displays an interface for acquiring the query word, and acquires the query word either from text the user enters on that interface or from audio input by the user. After acquiring the query word, the terminal may send it to the server, so that the server acquires the query word.
S102, if the query list contains the target index words with the same code values as the query words, determining the index identifications of the target index words and the number of the candidate index words containing the query words according to the query list, wherein the query list is used for indicating the corresponding relations among the code values of the target index words, the index identifications of the target index words and the number of the candidate index words.
In this embodiment, the query list may store the code values of the index words in advance. After acquiring the query word, the server encodes it using an agreed encoding scheme to obtain its code value, and then determines whether the query list contains a target index word with the same code value. It should be understood that the agreed encoding scheme is the same scheme used to encode the index words in the query list.
The query list indicates the correspondence among the code value of the target index word, the index identifier of the target index word and the number of candidate index words. When the query list contains a target index word with the same code value as the query word, the index identifier of the target index word and the number of candidate index words containing the query word can be determined from the query list. The target index word is the candidate index word with the smallest index identifier among all candidate index words, and the number of candidate index words is the number of index words that include the query word. For example, if the query word is "Beijing" and 4 index words include "Beijing", the number of candidate index words is 4.
Specifically, all index words held by the server are stored in a fixed order. For example, the index words in the query database may be arranged according to the Unicode code of each character in each index word, with the index identifiers of adjacent index words being consecutive. Table 1 lists index words, their code values and their index identifiers. This embodiment does not limit how the index words are encoded; in Table 1 the code values are represented by letters.
For example, if there are four index words, "Beijing City", "Beijing University", "Beijing Book Building" and "Beijing City Government", then after the four index words are sorted according to the Unicode code of each character, the index words and their index identifiers are as shown in Table 1:
Table 1
Index identifier    Index word                 Code value
1000                Beijing University         A
1001                Beijing City               B
1002                Beijing City Government    C
1003                Beijing Book Building      D
For example, if the query word is "北" ("north", the first character of "Beijing"), the candidate index words including the query word are the four index words "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building"; the number of candidate index words is 4, and the target index word is the candidate with the smallest index identifier, so its index identifier is 1000. Likewise, if the query word is "Beijing City", the candidate index words are "Beijing City" and "Beijing City Government"; the number of candidate index words is 2, and the index identifier of the target index word is 1001.
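The two worked examples above can be reproduced with a short Python sketch that scans the ordered index words and reports the smallest matching index identifier together with the number of matches. The Chinese strings are the four index words of Table 1, written with the characters whose Unicode codes are listed later in the description; treating "includes the query word" as "begins with the query word", and the function name `target_and_count`, are assumptions made for this illustration.

```python
# Table 1, keyed by index identifier.
INDEX_WORDS = {
    1000: "北京大学",      # Beijing University
    1001: "北京市",        # Beijing City
    1002: "北京市政府",    # Beijing City Government
    1003: "北京图书大厦",  # Beijing Book Building
}

def target_and_count(query):
    """Return (smallest matching identifier, number of matches), or None."""
    ids = [i for i, word in sorted(INDEX_WORDS.items())
           if word.startswith(query)]
    if not ids:
        return None
    # The scheme relies on the matches occupying consecutive identifiers.
    assert ids == list(range(ids[0], ids[0] + len(ids)))
    return ids[0], len(ids)

print(target_and_count("北"))      # single-character query ("north")
print(target_and_count("北京市"))  # "Beijing City"
```

A linear scan is used only to show where the two numbers come from; the method itself reads them from the pre-built query list.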
The query list is used to indicate the corresponding relationship among the target index words, the index identifiers of the target index words, and the number of all candidate index words, and the form of the query list stored in the server may be specifically as shown in the following table two:
Table 2
[Table 2 is an image in the original publication and is not reproduced in this text.]
In this embodiment, to obtain the index words corresponding to the query word more accurately, the index word corresponding to each entry may also be stored in addition to the fields of Table 2; in that case the query list may take the form shown in Table 3 below:
Table 3
[Table 3 is an image in the original publication and is not reproduced in this text.]
In this embodiment, the server stores the query list in advance, so that after the server obtains the query term, the code value of the query term can be matched with the code value of the index term in the query list to obtain the target index term, and the index identifier of the target index term and the number of candidate index terms including the query term are obtained in the query list.
S103, acquiring all candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous.
In this embodiment, the query database stores in advance a plurality of index words arranged in order; the index identifiers of adjacent index words are consecutive, and the index identifier of each index word is the same identifier used in the query list. Once the server has the index identifier of the target index word and the number of candidate index words, it takes the index identifier of the target index word as the starting identifier in the query database and reads that many consecutive index words; these are all of the candidate index words.
Illustratively, when the query word is "Beijing", the number of candidate index words including the query word is 4 and the index identifier of the target index word is 1000, so the index words with index identifiers 1000 through 1003 are all candidate index words, namely "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building".
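Step S103 amounts to reading a run of consecutive identifiers. The minimal Python sketch below assumes the query database can be modelled as a dictionary from index identifier to index word; the names `candidate_ids` and `fetch_candidates` are hypothetical.

```python
# Query database modelled as {index identifier: index word} (Table 1).
DATABASE = {
    1000: "北京大学",      # Beijing University
    1001: "北京市",        # Beijing City
    1002: "北京市政府",    # Beijing City Government
    1003: "北京图书大厦",  # Beijing Book Building
}

def candidate_ids(start_id, count):
    """Identifiers of all candidates: count consecutive values from start_id."""
    return list(range(start_id, start_id + count))

def fetch_candidates(database, start_id, count):
    """Read the candidate index words; no search over the database is needed."""
    return [database[i] for i in candidate_ids(start_id, count)]

print(candidate_ids(1000, 4))
print(fetch_candidates(DATABASE, 1001, 2))
```

With a starting identifier s and a count c, the last identifier read is s + c - 1, so the cost of this step depends on the number of candidates rather than on the size of the database.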
S104, pushing all candidate index words.
After acquiring all candidate index words from the query database, the server pushes them to the terminal.
For example, FIG. 3 is a schematic view of the interface changes of a terminal under the query method provided by the present invention. As shown in FIG. 3, in interface 201 the user has entered the query word "北" (the first character of "Beijing") in the query box, and the server pushes the candidate index words "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building" to the terminal; in interface 202 the query word is "Beijing", and the server pushes the same four candidate index words; in interface 203 the query word is "Beijing City", and the server pushes the candidate index words "Beijing City" and "Beijing City Government".
This embodiment provides a query method comprising: acquiring a query word; if the query list contains a target index word whose code value is the same as that of the query word, determining, according to the query list, the index identifier of the target index word and the number of candidate index words containing the query word, wherein the query list indicates the correspondence among the code value of the target index word, the index identifier of the target index word and the number of candidate index words; acquiring all candidate index words according to the index identifier of the target index word and the number of candidate index words, wherein the index identifiers of adjacent candidate index words are consecutive; and pushing all candidate index words. Because each index word is associated in advance with the index identifier of its target index word and with the number of candidate index words that include it, all candidate index words corresponding to a query word can be obtained quickly once the query word is acquired, which improves indexing efficiency.
On the basis of the foregoing embodiment, the query method and its query list are further described below with reference to FIG. 4, which is a second schematic flowchart of the query method provided by the present invention. As shown in FIG. 4, the query method provided in this embodiment may include:
S301, establishing a query list, a dictionary file and a query file.
In the embodiment, a pre-established query file is stored in an index database, and the query file comprises a query index area and a query content area; the query index area is used for storing a query list, the query index area is also used for indicating the storage position of the target index word in the query content area, and the query content area is used for storing all the index words in the index database.
The index database also stores a pre-established dictionary file, which comprises a dictionary element area and a dictionary content area. The dictionary element area stores the code values of all index words, together with the storage positions, in the dictionary content area, of the double-array base value and the double-array check value corresponding to the code value of each index word. The dictionary content area stores the base value and the check value corresponding to the code value of each index word. It should be understood that double-array encoding of the index words yields a base array and a check array. The base array contains the code value of each index word and the corresponding base value; similarly, the check array contains the code value of each index word and the corresponding check value.
The specific process of creating the query list, the dictionary file, and the query file in this embodiment is described with reference to fig. 5. Fig. 5 is a schematic flow chart of establishing a query list, a dictionary file, and a query file in the query method provided by the present invention, and as shown in fig. 5, a specific process of S301 in this embodiment is as follows:
S3011, sorting all index words according to the Unicode code of each character in each index word and the length of each index word, and generating an index word list.
In this embodiment, the server first sorts the index words stored in the query database, as follows: it traverses the Unicode code of each character of each index word in turn and, from the order of those Unicode codes, obtains a first ordering of the index words in the index word list to be generated.
Illustratively, suppose the database holds four index words, "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building". Comparing the Unicode codes of the first character, the second character, and so on up to the last character gives a first ordering in which "Beijing University" comes first, followed by the two index words beginning with "Beijing City", followed by "Beijing Book Building".
In this embodiment, if at least two index words share the same Unicode codes over part of their characters, the order of those index words within the first ordering is adjusted according to their lengths, giving an adjusted first ordering. The length of an index word may be the number of characters it contains. For example, the index word "Beijing" (北京) has length 2, and the index word "Beijing City" (北京市) has length 3.
Illustratively, "Beijing City" and "Beijing City Government" in the first ordering above share the same Unicode codes over part of their characters, so the two are ordered from shorter to longer, giving the adjusted first ordering "Beijing University", "Beijing City", "Beijing City Government", "Beijing Book Building".
After the server obtains the ordering of the index words, it generates the index word list from the adjusted first ordering, each index word, and the index identifier of each index word. Specifically, the index identifier of an index word may be its sequence number among all index words under the above ordering.
For example, the index word list may be as shown in Table 1 of the above embodiment: "Beijing University" is ranked 1000th among all index words, so its index identifier is set to 1000.
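A minimal Python sketch of step S3011 follows, under the assumption that "Unicode code" means the code point returned by `ord`. Python compares lists element by element and places a proper prefix before any longer list that extends it, which yields the shorter-first tie-break described above. The function names and the starting identifier of 1000 are hypothetical.

```python
def sort_index_words(words):
    """Sort by per-character code point; a prefix sorts before its extensions."""
    return sorted(words, key=lambda word: [ord(ch) for ch in word])

def assign_identifiers(words, first_id=1000):
    """Give consecutive index identifiers to the sorted index words."""
    ordered = sort_index_words(words)
    return {first_id + n: word for n, word in enumerate(ordered)}

# Beijing City Government, Beijing City, Beijing University (unsorted input)
words = ["北京市政府", "北京市", "北京大学"]
for index_id, word in assign_identifiers(words).items():
    print(index_id, word)
```

Because the identifiers are assigned after sorting, every set of index words sharing a prefix ends up on consecutive identifiers, which is the property the later range lookup depends on.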
S3012, according to the index word list, obtaining an index tree corresponding to the index words with the same first character.
In this embodiment, the server obtains, according to the index word list, the index tree corresponding to the index words that share the same first character. FIG. 6 is an exemplary diagram of an index tree according to the present invention; it illustrates the index tree formed when the index words are "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building".
The index tree in this embodiment includes a plurality of nodes. Each node holds one character of the index words sharing the same first character, and the characters corresponding to the nodes are arranged in the order in which they appear in the index words. For example, for the index words above the shared first character is "北"; the nodes corresponding to the index word "Beijing University" (北京大学) are, in order, "北", "京", "大" and "学". The characters in the nodes for "Beijing City", "Beijing City Government" and "Beijing Book Building" are likewise arranged in the order of the characters in those index words.
The index identifier corresponding to each node is the index identifier, in the index word list, of the index word formed by the characters of all of the node's ancestor nodes followed by the character in the node. Illustratively, for the node "市" (the third character of "Beijing City", 北京市), the index identifier of "Beijing City" in the index word list is 1001, so the index identifier of that node is 1001.
The number corresponding to each node is the number of index words in the index word list that begin with the index word formed in this way. Illustratively, for the node "市", the index word formed by its ancestor nodes "北" and "京" and the node itself is "Beijing City"; the index words in the index word list that begin with "Beijing City" are "Beijing City" and "Beijing City Government", so the number corresponding to the node is 2.
According to the mode, the server sequentially traverses each node in the index tree, and obtains the index identification corresponding to the index word formed by the characters in each node and the father node of the node, and the number of the index words.
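The node annotations described above can be sketched in Python as a character tree built from the ordered index word list. Nested dictionaries stand in for the tree of FIG. 6, and each node records the smallest index identifier and the number of index words that pass through it; that reading of the node's index identifier, along with the names `build_index_tree` and `find`, is an assumption made for this illustration.

```python
INDEX_WORDS = {
    1000: "北京大学",      # Beijing University
    1001: "北京市",        # Beijing City
    1002: "北京市政府",    # Beijing City Government
    1003: "北京图书大厦",  # Beijing Book Building
}

def build_index_tree(index_words):
    """Build the tree; each node holds (index_id, count) for its prefix."""
    root = {"children": {}, "index_id": None, "count": 0}
    # Ascending identifier order means the first word to create a node
    # carries the smallest identifier for that prefix.
    for index_id in sorted(index_words):
        node = root
        for ch in index_words[index_id]:
            node = node["children"].setdefault(
                ch, {"children": {}, "index_id": index_id, "count": 0})
            node["count"] += 1
    return root

def find(root, prefix):
    """Return (index identifier, number) for the node reached by prefix."""
    node = root
    for ch in prefix:
        node = node["children"].get(ch)
        if node is None:
            return None
    return node["index_id"], node["count"]

tree = build_index_tree(INDEX_WORDS)
print(find(tree, "北京市"))
```

Traversing every node of such a tree gives one (index identifier, number) pair per prefix, which is the information the query list stores.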
S3013, performing double-array coding on the index word corresponding to each node of the index tree, and acquiring a coding value of each index word, and a double-array coding base value and a double-array coding check value corresponding to the coding value of each index word.
It should be understood that the index word corresponding to each node is an index word composed of characters in each node and its parent node. In this embodiment, the server performs double-array coding on each index word of each index tree to obtain a base array and a check array. The base array and the check array comprise the coding value of each index word, and a double-array coding base value and a double-array coding check value corresponding to the coding value of each index word.
Specifically, the server encodes the single character corresponding to each node as the Unicode code of that character. For example, the index tree shown in FIG. 6 contains 10 distinct characters, whose Unicode codes are: 北 (21271), 京 (20140), 大 (22823), 市 (24066), 图 (22270), 学 (23398), 政 (25919), 书 (20070), 府 (24220) and 厦 (21414). Double-array encoding is then performed on each index word in the index tree. The resulting code values are listed below; after the first character, each is written as a sum whose second term is the Unicode code of the appended character: 北 (21271), 北京 (40907 = 20767 + 20140), 北京大 (41461 = 18638 + 22823), 北京市 (42704 = 18638 + 24066), 北京图 (40908 = 18638 + 22270), 北京大学 (40909 = 17511 + 23398), 北京市政 (40910 = 14991 + 25919), 北京图书 (40911 = 20841 + 20070), 北京市政府 (40912 = 16692 + 24220), 北京图书大 (40913 = 18090 + 22823) and 北京图书大厦 (40914 = 19500 + 21414).
While double-array encoding each index word, it must be checked whether the current code value exceeds the length of the base and check arrays, to prevent array subscripts from going out of bounds. 40907 slots are reserved after the maximum code value in the double array, trading a small loss of space for higher efficiency.
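The sums listed above follow the usual double-array transition: the code value of a longer prefix equals the base value of the shorter prefix plus the Unicode code of the appended character. The Python sketch below walks that transition using the base values that can be read off those sums. Two points are assumptions rather than statements of the publication: the check array is taken to store the parent's code value (the check table is only available as an image), and dictionaries stand in for the arrays, so the out-of-bounds test becomes a missing-key test.

```python
FIRST_CODES = {21271}  # code of a first character is its Unicode code (北)

# base[code of prefix], read off the sums in the description.
BASE = {
    21271: 20767,  # 北
    40907: 18638,  # 北京
    41461: 17511,  # 北京大
    42704: 14991,  # 北京市
    40908: 20841,  # 北京图
    40910: 16692,  # 北京市政
    40911: 18090,  # 北京图书
    40913: 19500,  # 北京图书大
}

# Assumed: check[code of child] == code of parent.
CHECK = {
    40907: 21271, 41461: 40907, 42704: 40907, 40908: 40907,
    40909: 41461, 40910: 42704, 40911: 40908, 40912: 40910,
    40913: 40911, 40914: 40913,
}

def encode(word):
    """Return the code value of word, or None if the tree has no such path."""
    if not word or ord(word[0]) not in FIRST_CODES:
        return None
    code = ord(word[0])
    for ch in word[1:]:
        if code not in BASE:            # leaf: no outgoing transitions
            return None
        nxt = BASE[code] + ord(ch)      # double-array transition
        if CHECK.get(nxt) != code:      # slot empty or owned by another parent
            return None
        code = nxt
    return code

print(encode("北京市"))
```

Each character costs one addition and one comparison, so the encoding time depends on the length of the query word and not on the number of index words.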
The specific base array of the double-array coding is shown in the following table four, and the check array of the double-array coding is shown in the following table five:
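Table 4
[Table 4 (base array) is an image in the original publication and is not reproduced in this text.]
Table 5
[Table 5 (check array) is an image in the original publication and is not reproduced in this text.]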
As shown in the fourth and fifth tables, the lower areas in the corresponding tables of the base array and the check array are respectively a base value and a check value; the upper areas in the corresponding tables of the base array and the check array are all the coding values of the index words.
S3014, establishing a query list and a query file according to the coding value of the index word corresponding to each node in the index tree and the index identification and number corresponding to each node.
After obtaining the code value corresponding to each index word, the server may establish a mapping relationship between each index word of the index tree, the code value of each index word of the index tree, and the index identifier and the number corresponding to the node to which each index word of the index tree belongs, to obtain the query list. It should be understood that each index word of each index tree corresponds to an index identifier of a node to which the index word belongs, that is, an index identifier of a target index word when the index word is the target index word.
For example, the query list may be as shown in Table six below:
Table six
Index word | Code value | Index identifier | Number
北 ("north") | 21271 | 1000 | 4
北京 ("Beijing") | 40907 | 1000 | 4
北京市 ("Beijing City") | 42704 | 1001 | 2
北京大 | 41461 | 1000 | 1
Similarly, the server establishes the query file according to the code value of the index word corresponding to each node in the index tree and the index identifier and number corresponding to each node. The query file may be constructed in the query database and is divided into a query index area and a query content area. The established query list is stored in the query index area of the query file, and all index words in the index database are stored in the query content area. The storage location of each index word in the query content area is also stored in the query index area, and the query file is constructed from these storage locations.
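The two-area layout can be illustrated with a small Python sketch. The byte layout used here (a 4-byte count, then an index area of offset/length pairs, then a content area of UTF-8 strings) is purely an assumption for illustration, as are the function names; the text does not specify the on-disk format, and the query list itself is omitted from this sketch.

```python
import struct


def build_query_file(words):
    """Pack index words into bytes: count, index area, content area."""
    index_area = bytearray()
    content_area = bytearray()
    for word in words:
        data = word.encode("utf-8")
        # The index area records where each word sits in the content area.
        index_area += struct.pack("<II", len(content_area), len(data))
        content_area += data
    header = struct.pack("<I", len(words))
    return bytes(header + index_area + content_area)


def read_word(blob, i):
    """Fetch the i-th index word using only its entry in the index area."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset, length = struct.unpack_from("<II", blob, 4 + 8 * i)
    start = 4 + 8 * count + offset
    return blob[start:start + length].decode("utf-8")


blob = build_query_file(["北京大学", "北京市", "北京市政府"])
print(read_word(blob, 2))
```

Because every index-area entry has a fixed size, the entry for position i can be located directly without scanning the content area.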
Optionally, after the server generates the query list, the server may further store the query list in double-array form. Accordingly, the resulting double array includes a from array (i.e., the index identifier) and a length array (i.e., the number of candidate index words including the index word) corresponding to each index word.
Specifically, the from array is shown in the seventh table below, and the length array is shown in the eighth table below:
Table seven
Figure BDA0002027021570000111
Table eight
Figure BDA0002027021570000112
As shown in Tables seven and eight above, the upper area of the table for each of the from array and the length array is the array subscript, that is, the code value of the corresponding index word. The lower area is the array content: the index identifier for the from array, and the number of candidate index words including the index word for the length array.
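A minimal Python sketch of the from and length arrays follows, using the four rows of Table six. The variable names `from_arr` and `length_arr` are illustrative (`from` is a reserved word in Python), and filling unused slots with 0 is an assumption.

```python
# (code value, index identifier, number) rows taken from Table six.
ROWS = [
    (21271, 1000, 4),
    (40907, 1000, 4),
    (42704, 1001, 2),
    (41461, 1000, 1),
]

size = max(code for code, _, _ in ROWS) + 1
from_arr = [0] * size    # subscript: code value, content: index identifier
length_arr = [0] * size  # subscript: code value, content: number of candidates

for code, index_id, count in ROWS:
    from_arr[code] = index_id
    length_arr[code] = count

# Looking up a code value is a direct array access.
print(from_arr[42704], length_arr[42704])
```

Indexing both arrays by code value is what lets the index identifier and the number be read directly once the code value of the query word is known.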
Fig. 7 is an exemplary diagram of a query file according to the present invention. As shown in fig. 7, the query index area stores the code value of each index word and the storage location of each index word in the query content area; for simplicity, fig. 7 shows only the number of index words in the query index area. The query content area stores, for each index word of each index tree, the index identifier of the corresponding target index word and the number of index words. Illustratively, from21271 and length21271 are the index identifier and the number of index words for the target index word 北 ("north"), and from40907 and length40907 are the index identifier and the number of index words for the index word 北京 ("Beijing"). In fig. 7, the index identifier of the target index word is abbreviated f and the number of index words is abbreviated l.
S3015, establishing a dictionary file according to the coding value of the index word corresponding to each node in the index tree, and the base value and check value corresponding to the coding value of each index word.
In the same way as the query file described above, a dictionary file may be created in the query database and divided into a dictionary element area and a dictionary content area. The base value and check value corresponding to the code value of each index word are stored in the dictionary content area, and their storage locations in the dictionary content area are stored in the dictionary element area. The dictionary file constructed in this way makes it convenient to query the base value and check value corresponding to the code value of each index word.
Fig. 8 is an exemplary diagram of a dictionary file provided by the present invention. As shown in fig. 8, the dictionary element area stores the code value of each index word and the storage locations, in the dictionary content area, of the base value and check value corresponding to that code value; for simplicity, fig. 8 shows only the number of index words in the dictionary element area. The dictionary content area stores the base value and check value corresponding to each index word of each index tree. Illustratively, base21271 and check21271 are the base value and check value of the index word 北 ("north"), and base40907 and check40907 are the base value and check value of the index word 北京 ("Beijing"). In fig. 8, the base value is abbreviated b and the check value is abbreviated c.
It should be understood that S3014 and S3015 need not be performed in any particular order, and the two may be performed simultaneously.
S302, obtaining the query words.
S303, determining, according to the query word and the base value and check value corresponding to the code value of each index word, that the query list contains a target index word with the same code value as the query word.
If the query word contains one character, the Unicode code of the query word is its code value. The server compares the code value of the query word with the code values of the index words stored in the query list to determine that the query list contains a target index word with the same code value as the query word.
If the query word contains i characters, where i is an integer greater than 1 (that is, the query word contains at least two characters), the code value of the second character string may be obtained in the dictionary element area from the Unicode code of the ith character and the base value corresponding to the code value of the first character string. Here, the first character string is the character string formed by the first through (i-1)th characters of the query word, and the second character string is the character string formed by the first through ith characters.
If the check value corresponding to the code value of the second character string is the Unicode code of the (i-1)th character, it is determined that the query list contains a target index word with the same code value as the query word, the code value of the second character string being the code value of the query word.
For example, suppose the query word is 北京 ("Beijing"). The Unicode code of its first character 北 ("north") is 21271, and the Unicode code of 京 ("jing") is 20140. The code value of the preset index word 北京, whose first character is 北, is 40907, so the base value corresponding to 北 is the difference between the code value 40907 and the Unicode code 20140 of 京, namely 20767. The server determines that the query word has a second character 京 in addition to the first character, and obtains the code value of the character string 北京, formed by the first and second characters, from the Unicode code of the second character and the base value corresponding to the first character. The server then judges whether the check value corresponding to this code value is equal to the Unicode code of the first character. If it is, the server determines that the query list contains a target index word with the same code value as the query word, this code value being the code value of the query word. Optionally, if the check value is not equal to the Unicode code of the first character, the server determines that the query list does not contain the target index word and stops the query.
It should be understood that, in this scenario, the server may go on to determine whether the query word contains a third character and, if so, whether the check value corresponding to the code value of the three-character string is equal to the Unicode code of the second character. If it is, the server determines whether there is a fourth character, and so on until the last character of the query word has been checked, at which point it is determined that the query list contains a target index word with the same code value as the query word.
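The character-by-character check described in S303 can be sketched in Python as follows. This is an illustration under stated assumptions: the base values are those implied by the sums in the fig. 6 example, the check values are derived from the rule in the text (the check value of a string's code is the Unicode code of the previous character) because Table five is not reproduced here, and the dictionaries and the function name `find_code` are hypothetical.

```python
# code value -> base value, implied by the sums in the fig. 6 example.
BASE = {21271: 20767, 40907: 18638, 41461: 17511, 42704: 14991,
        40908: 20841, 40910: 16692, 40911: 18090, 40913: 19500}

# code value -> check value, derived from the rule in the text (assumption):
# the Unicode code of the character preceding the last one.
CHECK = {40907: ord("北"), 41461: ord("京"), 42704: ord("京"),
         40908: ord("京"), 40909: ord("大"), 40910: ord("市"),
         40911: ord("图"), 40912: ord("政"), 40913: ord("书"),
         40914: ord("大")}

# Code values present in the query list: the root character and every string.
CODES = {ord("北")} | set(CHECK)


def find_code(query, base=BASE, check=CHECK, codes=CODES):
    """Return the code value of `query`, or None if no index word matches."""
    if not query:
        return None
    code = ord(query[0])           # one character: code is its Unicode code
    if code not in codes:
        return None
    for prev, ch in zip(query, query[1:]):
        if code not in base:       # the current prefix has no children
            return None
        nxt = base[code] + ord(ch)
        if check.get(nxt) != ord(prev):
            return None            # check mismatch: stop the query
        code = nxt
    return code


print(find_code("北京"), find_code("北京市政府"), find_code("北大"))
```

The loop performs one addition and one comparison per character, so the cost depends on the length of the query word rather than on how many index words are stored.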
S304, determining the storage position of the target index word in the query content area in the query index area according to the index identifier of the target index word; and acquiring all candidate index words in the query content area according to the storage positions of the target index words in the query content area and the number of the candidate index words.
In the above steps, the server obtains the code value of the query word, and determines that the query list contains the target index word having the same code value as the query word, so that the index identifier of the target index word having the same code value as the query word and the number of candidate index words including the query word can be determined in the query list. Because the query index area contains the storage position of each index word in the query content area, the storage position of the target index word in the query content area can be obtained according to the index mark of the target index word.
Optionally, the index words in the query content area may be sorted in the same way as in the query list, that is, the index identifiers of adjacent index words are consecutive. The server may obtain all candidate index words in the query content area according to the storage location of the target index word in the query content area and the number of candidate index words including the query word. For example, if the number of candidate index words including the query word is 3 and the index identifier of the target index word is 1000, all the corresponding candidate index words are the index words with index identifiers 1000, 1001, and 1002.
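Because the index identifiers of adjacent index words are consecutive, fetching the candidates amounts to reading a contiguous range. A minimal Python sketch is given below; the word list `CONTENT` is hypothetical and only the identifiers 1000, 1001 and 1002 come from the example above.

```python
FIRST_ID = 1000  # index identifier of the first stored index word (assumed)

# Hypothetical index words, stored in index-identifier order.
CONTENT = ["北京大学", "北京市", "北京市政府", "北京图书大厦"]


def candidates(target_id, count, content=CONTENT, first_id=FIRST_ID):
    """Return (index identifier, index word) pairs for `count` consecutive
    entries starting at the target index word."""
    start = target_id - first_id
    return [(first_id + i, content[i]) for i in range(start, start + count)]


# Number of candidates 3, target index identifier 1000, as in the example.
print(candidates(1000, 3))
```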
S305, pushing all candidate index words.
The specific implementation of S302 and S305 in this embodiment may specifically refer to the related descriptions in S101 and S104 in the foregoing embodiment, which are not described herein again.
In this embodiment, the index tree makes it possible to quickly determine the index word corresponding to each node, the index identifier corresponding to each index word, and the number of index words, and thus to establish the query list and the query file. Furthermore, double-array coding of the index words in the index tree facilitates establishing the dictionary file. In this embodiment, once it is determined from the dictionary file that the query list contains the target index word, all candidate index words are obtained from the query file. The structures of the dictionary file and the query file make it convenient to determine that the query list contains the target index word and to acquire all candidate index words. The indexing speed of the query method provided in this embodiment is independent of the number of index words, so indexing efficiency can be improved.
Fig. 9 is a schematic structural diagram of a query device provided in the present invention. As shown in fig. 9, the query device 400 includes: a query term obtaining module 401, a determining module 402, a candidate index term obtaining module 403, and a pushing module 404.
A query term obtaining module 401, configured to obtain a query term.
A determining module 402, configured to determine, according to the query list, an index identifier of the target index word and the number of candidate index words including the query word if the query list includes the target index word having the same code value as the query word, where the query list is used to indicate a correspondence between the code value of the target index word, the index identifier of the target index word, and the number of candidate index words.
The candidate index word obtaining module 403 is configured to obtain all candidate index words according to the index identifier of the target index word and the number of the candidate index words, where the index identifiers of adjacent candidate index words are consecutive.
A pushing module 404, configured to push all candidate index words.
The principle and technical effect of the query device provided in this embodiment are similar to those of the query method, and are not described herein again.
Optionally, fig. 10 is a schematic structural diagram of the query device provided by the present invention. As shown in fig. 10, the query device 400 further includes: a setup module 405.
Optionally, the index database stores query files, and the query files include a query index area and a query content area; the query index area is used for storing a query list, the query index area is also used for indicating the storage position of the target index word in the query content area, and the query content area is used for storing all the index words in the index database.
Optionally, a dictionary file is further stored in the index database, and the dictionary file includes: the dictionary element area is used for storing the coding values of all the index words and the storage positions of the base values and the check values corresponding to the coding values of all the index words in the dictionary content area, and the dictionary content area is used for storing the base values and the check values corresponding to the coding values of all the index words.
The determining module 402 is further configured to determine, according to the query term, a base value and a check value corresponding to the code value of each index term, that the query list contains a target index term that is the same as the code value of the query term.
Optionally, the query term includes i characters.
The determining module 402 is specifically configured to obtain, according to the Unicode code of the ith character and a base value corresponding to the coded value of the first character string, the coded value of the second character string in the dictionary element area, where the first character string is: a character string formed from the first character to the i-1 th character of the query word, the second character string being: a character string formed from the first character to the ith character;
if the check value corresponding to the coding value of the second character string is the Unicode code of the (i-1) th character, determining that the query list contains the target index word which is the same as the coding value of the query word, wherein the coding value of the second character string is the coding value of the query word, and i is an integer greater than 1;
and when i is equal to 1, the Unicode code of the query word is the code value of the query word.
A candidate index word obtaining module 403, configured to determine, in the query index area, a storage location of the target index word in the query content area according to the index identifier of the target index word; and acquiring all candidate index words in the query content area according to the storage positions of the target index words in the query content area and the number of the candidate index words.
The establishing module 405 is configured to sort all the index words according to the Unicode code of each character in each index word and the length of each index word, and generate an index word list; and establishing a query list, a dictionary file and a query file according to the index word list.
Optionally, the establishing module 405 is specifically configured to sequentially traverse Unicode codes of each character in each index word, and obtain a first sequence of the index words in the to-be-generated index word list according to the order of the Unicode codes of each character in each index word; if the sequence of the Unicode codes of at least one character in at least two index words is the same, adjusting the sequence of the at least two index words in the first sequence according to the lengths of the at least two index words to obtain the adjusted first sequence; and generating an index word list according to the adjusted first sequence, each index word and the index identification of each index word.
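The sorting step can be sketched in Python as follows. The sketch assumes that, when one index word is a prefix of another, the shorter word is placed first, and that index identifiers are assigned consecutively starting from 1000; neither detail is fixed by the text, and the input words and the function name `build_index_word_list` are illustrative.

```python
def build_index_word_list(words, first_id=1000):
    """Sort words by the Unicode code of each character in turn, with a
    shorter word placed before any longer word that it prefixes, then
    assign consecutive index identifiers (assumed to start at first_id)."""
    ordered = sorted(words, key=lambda w: [ord(c) for c in w])
    return [(first_id + i, w) for i, w in enumerate(ordered)]


print(build_index_word_list(["北京市", "北京大学", "北", "北京"]))
```

Comparing the lists of Unicode codes element by element handles both rules at once, since a list that is a prefix of another compares as smaller.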
Optionally, the establishing module 405 is specifically configured to obtain, according to the index word list, an index tree corresponding to the index words with the same first character, where the index tree includes a plurality of nodes, and each node includes one character of the index word with the same first character; wherein, the index identifier corresponding to each node is: in the index word list, index marks of index words are formed by characters in the nodes and all characters of father nodes of the nodes, and the number corresponding to each node is the number of candidate index words comprising the index words in the index word list;
performing double-array coding on the index words corresponding to each node of the index tree to obtain a coding value of each index word and a base value and a check value corresponding to the coding value of each index word;
establishing a query list and a query file according to the coding value of the index word corresponding to each node in the index tree and the index identification and number corresponding to each node;
and establishing a dictionary file according to the coding value of the index word corresponding to each node in the index tree, and the base value and the check value corresponding to the coding value of each index word.
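How the index identifier and number of each node could be derived from the index word list is sketched below in Python. The index word list is hypothetical, chosen so that the results agree with Table six, and the sketch assumes that the identifier of a node is the identifier of the first index word beginning with the node's string.

```python
# Hypothetical index word list (index identifier, index word), in list order.
INDEX_WORD_LIST = [
    (1000, "北京大学"),
    (1001, "北京市"),
    (1002, "北京市政府"),
    (1003, "北京图书大厦"),
]


def node_table(index_word_list):
    """Map each tree node (identified by its prefix string) to a pair
    (index identifier, number of index words containing the prefix)."""
    table = {}
    for index_id, word in index_word_list:
        for end in range(1, len(word) + 1):
            prefix = word[:end]
            # Keep the identifier of the first word seen with this prefix.
            first_id, count = table.get(prefix, (index_id, 0))
            table[prefix] = (first_id, count + 1)
    return table


nodes = node_table(INDEX_WORD_LIST)
print(nodes["北"], nodes["北京市"], nodes["北京大"])
```

Combining these pairs with the code value of each prefix gives rows of the form shown in Table six.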
Fig. 11 is a schematic structural diagram of a third querying device provided in the present invention, as shown in fig. 11, the querying device 500 includes: a memory 501 and at least one processor 502.
A memory 501 for storing program instructions.
The processor 502 is configured to implement the query method in this embodiment when the program instructions are executed, and specific implementation principles may be referred to in the foregoing embodiments, which are not described herein again.
The querying device 500 may also include an input/output interface 503.
The input/output interface 503 may include a separate output interface and input interface, or may be an integrated interface that combines input and output. The output interface is used for outputting data and the input interface is used for acquiring input data; the output data is a collective term for the outputs in the method embodiments, and the input data is a collective term for the inputs in the method embodiments.
The present invention also provides a readable storage medium in which execution instructions are stored; when at least one processor of the query device executes the execution instructions, the query method in the above embodiments is implemented.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the querying device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the querying device to implement the querying method provided by the various embodiments described above.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In the foregoing embodiments of the network device or the terminal device, it should be understood that the Processor may be a Central Processing Unit (CPU), or may be another general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor, or in a combination of the hardware and software modules in the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of querying, comprising:
acquiring a query word;
if the query list contains the target index words with the same code values as the query words, determining the index identifications of the target index words and the number of candidate index words containing the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the code values of the target index words, the index identifications of the target index words and the number of the candidate index words;
acquiring all candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous;
and pushing all the candidate index words.
2. The method according to claim 1, wherein the index database stores a query file, and the query file comprises a query index area and a query content area; the query index area is used for storing the query list, the query index area is also used for indicating the storage position of the target index word in the query content area, and the query content area is used for storing all the index words in the index database.
3. The method of claim 2, wherein the index database further stores a dictionary file, the dictionary file comprising: the dictionary element area is used for storing the coded values of all the index words, and the storage positions of the double-array coded base value and the double-array coded check value corresponding to the coded value of each index word in the dictionary content area, and the dictionary content area is used for storing the base value and the check value corresponding to the coded value of each index word;
before determining the index identifier of the target index word and the number of candidate index words including the query word according to the query list, the method further includes:
and determining that the query list contains the target index words with the same code values as the query words according to the query words, and the base values and check values corresponding to the code values of the index words.
4. The method of claim 3, wherein the query word comprises i characters, and the determining that the query list comprises the target index word having the same code value as the query word comprises:
acquiring a coding value of a second character string in the dictionary element area according to a Unicode code of the ith character of the query word and a base value corresponding to a coding value of a first character string, wherein the first character string is as follows: a character string formed from the first character to the (i-1) th character of the query word, wherein the second character string is: a character string formed starting from the first character to the ith character;
if the check value corresponding to the coding value of the second character string is the Unicode code of the (i-1) th character, determining that the query list contains a target index word which is the same as the coding value of the query word, wherein the coding value of the second character string is the coding value of the query word, and i is an integer greater than 1;
and when i is equal to 1, the Unicode code of the query word is the code value of the query word.
5. The method according to claim 4, wherein the obtaining all the candidate index words according to the index identifiers of the target index words and the number of the candidate index words comprises:
determining the storage position of the target index word in the query content area in the query index area according to the index identifier of the target index word;
and acquiring all the candidate index words in the query content area according to the storage positions of the target index words in the query content area and the number of the candidate index words.
6. The method of claim 3, wherein before obtaining the query term, further comprising:
sequencing all the index words according to the Unicode code of each character in each index word and the length of each index word to generate an index word list;
and establishing the query list, the dictionary file and the query file according to the index word list.
7. The method of claim 6, wherein generating the list of index words comprises:
sequentially traversing the Unicode codes of each character in each index word, and acquiring a first sequence of the index words in the index word list to be generated according to the sequence of the Unicode codes of each character in each index word;
if the sequence of the Unicode codes of at least one character in at least two index words is the same, adjusting the sequence of the at least two index words in the first sequence according to the lengths of the at least two index words to obtain an adjusted first sequence;
and generating an index word list according to the adjusted first sequence, each index word and the index identification of each index word.
8. The method according to claim 6 or 7, wherein the creating the query list, the dictionary file and the query file according to the index word list comprises:
acquiring an index tree corresponding to index words with the same first character according to the index word list, wherein the index tree comprises a plurality of nodes, and each node comprises a character of the index word with the same first character; wherein, the index identifier corresponding to each node is: in the index word list, index marks of index words are formed by characters in the nodes and all characters of father nodes of the nodes, and the number corresponding to each node is the number of candidate index words comprising the index words in the index word list;
performing double-array coding on the index words corresponding to each node of the index tree, and acquiring a coding value of each index word, and a base value and a check value corresponding to the coding value of each index word;
establishing the query list and the query file according to the coding value of the index word corresponding to each node in the index tree and the index identification and the number corresponding to each node;
and establishing the dictionary file according to the coding value of the index word corresponding to each node in the index tree and the base value and check value corresponding to the coding value of each index word.
9. A query device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the querying device to perform the method of any one of claims 1-8.
10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-8.
CN201910297346.5A 2019-04-15 2019-04-15 Query method, device and storage medium Pending CN111831876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910297346.5A CN111831876A (en) 2019-04-15 2019-04-15 Query method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910297346.5A CN111831876A (en) 2019-04-15 2019-04-15 Query method, device and storage medium

Publications (1)

Publication Number Publication Date
CN111831876A true CN111831876A (en) 2020-10-27

Family

ID=72914521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910297346.5A Pending CN111831876A (en) 2019-04-15 2019-04-15 Query method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111831876A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398830A (en) * 2007-09-27 2009-04-01 阿里巴巴集团控股有限公司 Thesaurus fuzzy enquiry method and thesaurus fuzzy enquiry system
KR20100053269A (en) * 2008-11-12 2010-05-20 엔에이치엔(주) Method and system for providing recommendation query
CN102768681A (en) * 2012-06-26 2012-11-07 北京奇虎科技有限公司 Recommending system and method used for search input
CN103092992A (en) * 2013-02-17 2013-05-08 南京师范大学 Vector data preorder quadtree coding and indexing method based on Key / Value type NoSQL (Not only SQL)
CN107341165A (en) * 2016-04-29 2017-11-10 上海京东到家元信信息技术有限公司 The method and apparatus for prompting display are carried out at search box
CN108227954A (en) * 2017-12-29 2018-06-29 北京奇虎科技有限公司 A kind of method, apparatus and electronic equipment that search input associational word is provided
US20180260469A1 (en) * 2017-03-08 2018-09-13 Centri Technology, Inc. Fast indexing and searching of encoded documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曹广顺; 呙维; 朱欣焰; 佘冰: "A fast place-name and address input suggestion method based on a key-value database", 计算机应用研究, no. 11, pages 3334-3338 *

Similar Documents

Publication Publication Date Title
CN108304444B (en) Information query method and device
US8171029B2 (en) Automatic generation of ontologies using word affinities
CN108712519B (en) Method and device for positioning IP address and storage medium
CN111460311A (en) Search processing method, device and equipment based on dictionary tree and storage medium
EP3072076B1 (en) A method of generating a reference index data structure and method for finding a position of a data pattern in a reference data structure
CN101794307A (en) Vehicle navigation POI (Point of Interest) search engine based on internetwork word segmentation idea
CN111339382A (en) Character string data retrieval method and device, computer equipment and storage medium
Haj Rachid et al. A practical and scalable tool to find overlaps between sequences
CN111198936B (en) Voice search method and device, electronic equipment and storage medium
CN111582967A (en) Content search method, device, equipment and storage medium
CN106407221B (en) Address data retrieval method and device
CN102385597B (en) The fault-tolerant searching method of a kind of POI
CN111797279B (en) Method and device for storing data
CN108345607B (en) Searching method and device
CN106844553B (en) Data detection and expansion method and device based on sample data
CN111190937B (en) Method and device for inquiring native information, electronic equipment and storage medium
CN111831876A (en) Query method, device and storage medium
CN108776705B (en) Text full-text accurate query method, device, equipment and readable medium
CN110347925A (en) Information processing method and computer readable storage medium
CN108376054B (en) Processing method and device for indexing identification data
CN116521733A (en) Data query method and device
CN113297204B (en) Index generation method and device
JP2012069059A (en) Specific character string exclusion character string retrieval support system and retrieval support method and program for the same
CN113918796A (en) Information searching method, device, server and storage medium
CN109726254B (en) Method and device for constructing triple knowledge base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination