CN108563762A - Inverted index method and device - Google Patents

Inverted index method and device

Info

Publication number
CN108563762A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810346228.4A
Other languages
Chinese (zh)
Inventor
梁希云
秦锋剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Green Bay Network Technology Co., Ltd.
Original Assignee
Grass Count Language (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Grass Count Language (beijing) Technology Co Ltd filed Critical Grass Count Language (beijing) Technology Co Ltd
Priority to CN201810346228.4A priority Critical patent/CN108563762A/en
Publication of CN108563762A publication Critical patent/CN108563762A/en
Withdrawn legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention proposes an inverted index method and device. The method includes: obtaining a search condition, where the search condition includes at least one keyword to be retrieved; querying an inverted index structure according to the search condition to obtain a level-one index term that matches the search condition; querying the secondary index data corresponding to the level-one index term according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition; and determining a retrieval result according to the secondary inverted data corresponding to the secondary index word. This avoids using all ad targeting conditions as level-one index terms, and also avoids using some ad targeting conditions as filter conditions on the retrieval result. Instead, some ad targeting conditions serve as level-one index terms and others serve as secondary index words, which improves retrieval efficiency while keeping the memory footprint low.

Description

Inverted index method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to an inverted index method and device.
Background art
At present, the inverted index structure used in systems such as search engines and targeted ad delivery systems is generally a level-one (single-level) inverted index structure. For example, in a targeted ad delivery system, one solution is to use all ad targeting conditions as level-one index terms, with the information of the ads that match those conditions as the level-one inverted data. In this solution, however, the memory footprint of the inverted index structure grows sharply as the number of ad targeting conditions increases. Another solution is to use one part of the ad targeting conditions as level-one index terms and to use the other part to filter the retrieval results. In this solution, however, filtering is inefficient, which reduces retrieval efficiency.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose an inverted index method that addresses the problem that prior-art inverted index structures have difficulty achieving both a low memory footprint and high retrieval efficiency.
A second object of the present invention is to propose an inverted index device.
A third object of the present invention is to propose another inverted index device.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes an inverted index method, including:
obtaining a search condition, where the search condition includes at least one keyword to be retrieved;
querying an inverted index structure according to the search condition to obtain a level-one index term that matches the search condition;
querying the secondary index data corresponding to the level-one index term according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition;
determining a retrieval result according to the secondary inverted data corresponding to the secondary index word.
Further, the inverted index structure includes:
a level-one index term and its corresponding level-one inverted data and secondary index data;
the level-one inverted data includes the information of each object related to the level-one index term;
the secondary index data includes each secondary index word and its corresponding secondary inverted data.
Further, the secondary inverted data corresponding to a secondary index word includes the information of each object in the level-one inverted data that is related to the secondary index word.
Further, in the secondary inverted data, the information of an object is the sequence number of that object in the level-one inverted data.
Further, determining the retrieval result according to the secondary inverted data corresponding to the secondary index word includes:
obtaining the sequence number of each object in the secondary inverted data;
querying the level-one inverted data according to the sequence number of each object to obtain the information of each object in the level-one inverted data;
determining the retrieval result according to the information of each object in the level-one inverted data.
Further, the secondary index data also includes the byte type of the sequence numbers; the byte type of the sequence numbers is determined according to the number of objects in the level-one inverted data.
Correspondingly, obtaining the sequence number of each object in the secondary inverted data includes:
obtaining the byte type of the sequence numbers from the secondary index data;
reading the sequence number of each object in turn from the secondary inverted data according to the byte type.
In the inverted index method of the embodiment of the present invention, a search condition is obtained, where the search condition includes at least one keyword to be retrieved; the inverted index structure is queried according to the search condition to obtain a level-one index term that matches the search condition; the secondary index data corresponding to the level-one index term is queried according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition; and a retrieval result is determined according to the secondary inverted data corresponding to the secondary index word. This avoids using all ad targeting conditions as level-one index terms, and also avoids using some ad targeting conditions as filter conditions on the retrieval result. Instead, some ad targeting conditions serve as level-one index terms and others serve as secondary index words, which improves retrieval efficiency while keeping the memory footprint low.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes an inverted index device, including:
an acquisition module, configured to obtain a search condition, where the search condition includes at least one keyword to be retrieved;
a query module, configured to query an inverted index structure according to the search condition to obtain a level-one index term that matches the search condition;
the query module being further configured to query the secondary index data corresponding to the level-one index term according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition;
a determining module, configured to determine a retrieval result according to the secondary inverted data corresponding to the secondary index word.
Further, the inverted index structure includes:
a level-one index term and its corresponding level-one inverted data and secondary index data;
the level-one inverted data includes the information of each object related to the level-one index term;
the secondary index data includes each secondary index word and its corresponding secondary inverted data.
Further, the secondary inverted data corresponding to a secondary index word includes the information of each object in the level-one inverted data that is related to the secondary index word.
Further, in the secondary inverted data, the information of an object is the sequence number of that object in the level-one inverted data.
Further, the determining module includes:
an acquiring unit, configured to obtain the sequence number of each object in the secondary inverted data when the information of each object in the secondary inverted data is the sequence number of that object in the level-one inverted data corresponding to the level-one index term;
the acquiring unit being further configured to query the level-one inverted data according to the sequence number of each object to obtain the information of each object in the level-one inverted data;
a determining unit, configured to determine the retrieval result according to the information of each object in the level-one inverted data.
Further, the secondary index data also includes the byte type of the sequence numbers; the byte type of the sequence numbers is determined according to the number of objects in the level-one inverted data.
Correspondingly, the acquiring unit is specifically configured to:
obtain the byte type of the sequence numbers from the secondary index data;
read the sequence number of each object in turn from the secondary inverted data according to the byte type.
In the inverted index device of the embodiment of the present invention, a search condition is obtained, where the search condition includes at least one keyword to be retrieved; the inverted index structure is queried according to the search condition to obtain a level-one index term that matches the search condition; the secondary index data corresponding to the level-one index term is queried according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition; and a retrieval result is determined according to the secondary inverted data corresponding to the secondary index word. This avoids using all ad targeting conditions as level-one index terms, and also avoids using some ad targeting conditions as filter conditions on the retrieval result. Instead, some ad targeting conditions serve as level-one index terms and others serve as secondary index words, which improves retrieval efficiency while keeping the memory footprint low.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes another inverted index device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the inverted index method described above when executing the program.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium, wherein the method described above is implemented when instructions in the storage medium are executed by a processor.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a computer program product, wherein the method described above is implemented when instructions in the computer program product are executed by a processor.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow diagram of an inverted index method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a level-one index term and its corresponding level-one inverted data and secondary index data;
Fig. 3 is a flow diagram of another inverted index method provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram in which the information of the objects in the secondary inverted data of Fig. 2 is a sequence number;
Fig. 5 is a structural schematic diagram of an inverted index device provided in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of another inverted index device provided in an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of yet another inverted index device provided in an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and should not be construed as limiting it.
The inverted index method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flow diagram of an inverted index method provided in an embodiment of the present invention. As shown in Fig. 1, the inverted index method includes the following steps:
S101, obtain a search condition; the search condition includes at least one keyword to be retrieved.
The inverted index method provided by the present invention is executed by an inverted index device. The inverted index device may specifically be a system that performs retrieval based on an inverted index structure, such as a search engine or a targeted ad delivery system. In a search engine, the search condition may be a search term, and the search term may be a single keyword or multiple keywords. In a targeted ad delivery system, the search condition may be one or more ad targeting conditions. Ad targeting conditions include, for example, region targeting, audience targeting, and interest targeting.
S102, query the inverted index structure according to the search condition to obtain a level-one index term that matches the search condition.
In this embodiment, the inverted index structure may include a level-one index term and its corresponding level-one inverted data and secondary index data. The level-one inverted data includes the information of each object related to the level-one index term; the secondary index data includes each secondary index word and its corresponding secondary inverted data.
In a search engine, an object may be, for example, a document, and a level-one index term may be a keyword; the information of each object related to the level-one index term may be the information of each document that contains the level-one index term. The information of a document may be, for example, the document's number, or the difference between the document's number and that of the previous document. The information of a document may also include information such as the number of times the level-one index term occurs in the document and the positions at which it occurs.
In a targeted ad delivery system, an object may be, for example, an ad, and a level-one index term may be an ad targeting condition; the information of each object related to the level-one index term may be the information of each ad that satisfies the ad targeting condition. The information of an ad may be, for example, the ad's number, or the difference between the ad's number and that of the previous ad.
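The option of storing the difference between an object's number and the previous object's number can be illustrated with a minimal sketch. It assumes the numbers are kept in ascending order and that the first number is stored unchanged; neither detail is stated in the text. The function names `delta_encode` and `delta_decode` are hypothetical, and the ad numbers are the ones used in the Fig. 2 example of this description.

```python
from itertools import accumulate

def delta_encode(numbers):
    """Keep the first number as-is and store each later number as the gap to its predecessor."""
    ordered = sorted(numbers)  # assumption: numbers are stored in ascending order
    return [n - prev for prev, n in zip([0] + ordered[:-1], ordered)]

def delta_decode(gaps):
    """Recover the original numbers as a running sum of the stored gaps."""
    return list(accumulate(gaps))

ad_numbers = [12768, 12769, 13530, 13546]  # ad numbers from the Fig. 2 example
gaps = delta_encode(ad_numbers)            # [12768, 1, 761, 16]
restored = delta_decode(gaps)              # same list as ad_numbers
```

The gaps are much smaller than the numbers themselves, which is why a difference-based representation can take less space when combined with a compact integer encoding.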
Further, on the basis of the above embodiment, the secondary inverted data corresponding to a secondary index word may include the information of each object in the level-one inverted data that is related to the secondary index word.
In a search engine, the secondary inverted data may include the information of each document that contains both the level-one index term and the secondary index word. In a targeted ad delivery system, the secondary inverted data may include the information of each ad that satisfies both the level-one ad targeting condition and the secondary ad targeting condition.
It should be noted that, in the inverted index structure of this embodiment, the level-one index term and its corresponding level-one inverted data and secondary index data are stored in a key-value distributed storage system. The level-one index term is the key; the corresponding level-one inverted data and secondary index data are the value. The data structure of the value is a structure (struct).
Where the level-one inverted data includes only the numbers of the objects, its data structure may be an array; where it also includes information such as the number of occurrences of the level-one index term in each object and the positions of those occurrences, its data structure may be a structure. The data structure of the secondary index data may be a structure that contains multiple key-value pairs, in which each secondary index word is a key and its secondary inverted data is the value.
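The key-value layout described above might be modelled as follows. This is only a sketch: a plain dictionary stands in for the key-value distributed storage system, the class and field names (`Posting`, `IndexValue`, `level_one_inverted`, `secondary_index`) are hypothetical, and the second entry uses made-up values purely to show the variant with occurrence counts and positions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Posting:
    """One object with extra details (the 'structure' variant of level-one inverted data)."""
    number: int
    occurrences: int = 0
    positions: List[int] = field(default_factory=list)

@dataclass
class IndexValue:
    """The value stored under one level-one index term (the key)."""
    # Either a plain array of object numbers or a list of richer Posting records.
    level_one_inverted: Union[List[int], List[Posting]]
    # Secondary index word (key) -> secondary inverted data (value).
    secondary_index: Dict[str, List[int]] = field(default_factory=dict)

# A dict stands in for the key-value distributed storage system.
store: Dict[str, IndexValue] = {}

# Array variant: only object numbers are kept (numbers from the Fig. 2 example).
store["fresh flowers"] = IndexValue(
    level_one_inverted=[12768, 12769, 13530, 13546],
    secondary_index={"Beijing": [12768, 13530]},
)

# Structure variant: made-up document with an occurrence count and positions.
store["index"] = IndexValue(
    level_one_inverted=[Posting(number=7, occurrences=2, positions=[3, 41])],
)
```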
Compared with the level-one inverted index structure of the prior art, the inverted index structure of this embodiment adds secondary index data corresponding to each level-one index term; the secondary index data includes each secondary index word and its corresponding secondary inverted data. Compared with a scheme that uses the secondary index words as filter conditions on the retrieval result, retrieval efficiency is greatly improved while the increase in memory footprint is small. Compared with a scheme that uses both the level-one index terms and the secondary index words as level-one index terms, the memory footprint is greatly reduced. The structure can therefore improve retrieval efficiency while keeping the memory footprint low.
S103, query the secondary index data corresponding to the level-one index term according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition.
In this embodiment, where the search condition includes multiple search terms or ad targeting conditions, the inverted index structure may contain both a level-one index term that matches the search condition and, in the secondary index data corresponding to that level-one index term, a secondary index word that also matches it. Fig. 2 is a schematic diagram of a level-one index term and its corresponding level-one inverted data and secondary index data. Fig. 2 takes a targeted ad delivery system as an example: "fresh flowers" is the level-one index term, and the region targeting values for fresh flowers, such as Beijing, Shandong, and Shanghai, are the secondary index words. In Fig. 2, the numbers of the ads related to fresh flowers may be 12768, 12769, 13530, and 13546. What matches the search condition may be the level-one index term "fresh flowers" together with the secondary index word "Beijing", or the level-one index term "fresh flowers" together with the secondary index word "Shandong", or the level-one index term "fresh flowers" together with the secondary index word "Shanghai".
S104, determine a retrieval result according to the secondary inverted data corresponding to the secondary index word.
In this embodiment, the secondary inverted data corresponding to a secondary index word may include the information of the objects that contain or satisfy both the level-one index term and the secondary index word. For example, in Fig. 2, the secondary inverted data corresponding to the secondary index word "Beijing" contains the ad numbers 12768 and 13530; the secondary inverted data corresponding to the secondary index word "Shandong" contains the ad numbers 12769, 13530, and 13546; and the secondary inverted data corresponding to the secondary index word "Shanghai" contains the ad numbers 12768, 12769, and 13546.
In this embodiment, the inverted index device may obtain the corresponding ads or documents according to the ad or document numbers in the secondary inverted data corresponding to the secondary index word, and determine the set of those ads or documents as the retrieval result. Taking ads as an example, the inverted index device may query an ad database according to the number of each ad, obtain the content of each ad, and determine the set of ad contents as the retrieval result.
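Steps S101 to S104 can be traced with a small sketch that uses the Fig. 2 data. The nested dictionary layout, the name `search`, and the decision to merge the ad numbers when several secondary index words match are assumptions made for illustration; the text does not say how multiple matches are combined or what is returned when only a level-one index term matches (this sketch returns an empty list in that case).

```python
# Fig. 2 data: one level-one index term with three region-targeting secondary index words.
INDEX = {
    "fresh flowers": {
        "level_one_inverted": [12768, 12769, 13530, 13546],
        "secondary_index": {
            "Beijing": [12768, 13530],
            "Shandong": [12769, 13530, 13546],
            "Shanghai": [12768, 12769, 13546],
        },
    }
}

def search(index, keywords):
    """Follow S101-S104 for a search condition given as a list of keywords (S101)."""
    results = set()
    for term in keywords:
        entry = index.get(term)           # S102: match a level-one index term
        if entry is None:
            continue
        for word in keywords:
            postings = entry["secondary_index"].get(word)  # S103: match a secondary index word
            if postings is not None:
                results.update(postings)  # S104: take the secondary inverted data
    return sorted(results)

beijing_ads = search(INDEX, ["fresh flowers", "Beijing"])    # [12768, 13530]
shanghai_ads = search(INDEX, ["fresh flowers", "Shanghai"])  # [12768, 12769, 13546]
```

The returned numbers would then be used to look up ad content in an ad database, which is outside the scope of this sketch.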
Further, the inverted index structure in this embodiment may be updated as follows: when a level-one index term or an object is added to the inverted index structure, the level-one inverted data corresponding to the affected level-one index term is completely rebuilt, and the secondary index data is then completely rebuilt on the basis of that level-one index term and its rebuilt level-one inverted data.
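The two-stage rebuild can be sketched as below. The input format (a mapping from object number to the set of conditions the object satisfies), the name `rebuild_entry`, and the choice to treat every other condition of an object as a secondary index word are all assumptions; the text only states the order of the rebuild. Ad 13600 is a made-up addition.

```python
def rebuild_entry(term, objects):
    """Rebuild the level-one inverted data for `term`, then the secondary index data from it.

    `objects` maps an object number to the set of conditions it satisfies
    (hypothetical input format).
    """
    # Stage 1: completely rebuild the level-one inverted data.
    level_one = sorted(n for n, conds in objects.items() if term in conds)
    # Stage 2: completely rebuild the secondary index data from the rebuilt list.
    secondary = {}
    for n in level_one:
        for cond in objects[n]:
            if cond != term:
                secondary.setdefault(cond, []).append(n)
    return {"level_one_inverted": level_one, "secondary_index": secondary}

ads = {
    12768: {"fresh flowers", "Beijing", "Shanghai"},
    12769: {"fresh flowers", "Shandong", "Shanghai"},
    13530: {"fresh flowers", "Beijing", "Shandong"},
    13546: {"fresh flowers", "Shandong", "Shanghai"},
}
before = rebuild_entry("fresh flowers", ads)

ads[13600] = {"fresh flowers", "Beijing"}  # made-up new ad triggers a full rebuild
after = rebuild_entry("fresh flowers", ads)
```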
In the inverted index method of the embodiment of the present invention, a search condition is obtained, where the search condition includes at least one keyword to be retrieved; the inverted index structure is queried according to the search condition to obtain a level-one index term that matches the search condition; the secondary index data corresponding to the level-one index term is queried according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition; and a retrieval result is determined according to the secondary inverted data corresponding to the secondary index word. This avoids using all ad targeting conditions as level-one index terms, and also avoids using some ad targeting conditions as filter conditions on the retrieval result. Instead, some ad targeting conditions serve as level-one index terms and others serve as secondary index words, which improves retrieval efficiency while keeping the memory footprint low.
Fig. 3 is a flow diagram of another inverted index method provided in an embodiment of the present invention. As shown in Fig. 3, on the basis of the embodiment shown in Fig. 1, the information of an object in the secondary inverted data is the sequence number of that object in the level-one inverted data.
Correspondingly, step S104 may specifically include the following steps:
S1041, obtain the sequence number of each object in the secondary inverted data.
For example, Fig. 4 is a schematic diagram in which the information of the objects in the secondary inverted data of Fig. 2 is a sequence number. In that figure, the level-one inverted data corresponding to the level-one index term contains 4 ads, so the sequence numbers of the 4 ads may be 0, 1, 2, and 3 in turn. Correspondingly, the secondary inverted data corresponding to the secondary index word "Beijing" may contain the sequence numbers 0 and 2; the secondary inverted data corresponding to the secondary index word "Shandong" may contain the sequence numbers 1, 2, and 3; and the secondary inverted data corresponding to the secondary index word "Shanghai" may contain the sequence numbers 0, 1, and 3.
Further, to reduce the memory footprint of the inverted index structure even more, the sequence numbers of the objects in level-one inverted data that contains few objects can be represented with fewer bytes or bits. That is, the byte type of the sequence numbers is determined according to the number of objects in the level-one inverted data; correspondingly, the secondary index data also includes the byte type of the sequence numbers.
For example, when the number of objects in the level-one inverted data is less than or equal to 16, each sequence number is represented with 4 bits; when the number of objects is greater than 16 and less than or equal to 256, each sequence number is represented with one byte; when the number of objects is greater than 256 and less than or equal to 65536, each sequence number is represented with two bytes; and when the number of objects is greater than 65536, each sequence number is represented with four bytes. The sequence numbers of the objects can thus be stored in the secondary inverted data in a variable-width form.
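The thresholds in this example translate directly into a small selection function. The sketch below expresses the byte type as a width in bits; the function name `sequence_number_bits` is hypothetical, and the thresholds are the ones given in the example.

```python
def sequence_number_bits(object_count):
    """Return the width, in bits, of one sequence number for level-one inverted data of this size."""
    if object_count <= 16:
        return 4    # sequence numbers 0..15 fit in 4 bits
    if object_count <= 256:
        return 8    # one byte
    if object_count <= 65536:
        return 16   # two bytes
    return 32       # four bytes

fig4_width = sequence_number_bits(4)  # the Fig. 4 example has 4 ads, so 4 bits per sequence number
```

Each threshold is the largest object count whose sequence numbers, counted from 0, still fit in the chosen width: 16 objects use sequence numbers 0 to 15, 256 objects use 0 to 255, and so on.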
In this embodiment, where the secondary index data includes the byte type of the sequence numbers, the inverted index device may perform step S1041 as follows: obtain the byte type of the sequence numbers from the secondary index data, and read the sequence number of each object in turn from the secondary inverted data according to the byte type.
In this embodiment, the sequence numbers of the objects are usually stored in the secondary inverted data in binary form, so if the byte type of the sequence numbers is not known, incorrect sequence numbers may be read. The sequence number of each object therefore needs to be read from the secondary inverted data in turn according to the byte type. For example, in Fig. 4, suppose the secondary inverted data corresponding to "Shandong" contains the sequence numbers of the ads numbered 12769, 13530, and 13546, and each sequence number is represented with 4 bits; the inverted index device can then read 4 bits at a time from the secondary inverted data to determine the sequence number of each ad.
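Reading fixed-width sequence numbers from a binary buffer might look like the sketch below. The bit order (most significant bits first), the zero padding of the last byte, and passing the number of stored sequence numbers as `count` are assumptions, since the text does not describe the exact binary layout. The names `pack` and `unpack` are hypothetical.

```python
def pack(sequence_numbers, bits):
    """Pack sequence numbers into bytes, `bits` per number, most significant bits first."""
    value = 0
    for n in sequence_numbers:
        if not 0 <= n < (1 << bits):
            raise ValueError(f"{n} does not fit in {bits} bits")
        value = (value << bits) | n
    total_bits = bits * len(sequence_numbers)
    padding = (-total_bits) % 8  # zero bits added so the buffer ends on a byte boundary
    return (value << padding).to_bytes((total_bits + padding) // 8, "big")

def unpack(data, bits, count):
    """Read `count` sequence numbers of `bits` bits each from the start of `data`."""
    value = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << bits) - 1
    return [(value >> (total_bits - bits * (i + 1))) & mask for i in range(count)]

shandong = pack([1, 2, 3], bits=4)            # sequence numbers from the Fig. 4 example
decoded = unpack(shandong, bits=4, count=3)   # read 4 bits at a time
misread = unpack(shandong, bits=8, count=2)   # wrong width yields wrong sequence numbers
```

Reading the same buffer with a width of one byte instead of 4 bits produces values that are not valid sequence numbers for a list of four ads, which is the failure the byte type is stored to prevent.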
Compared with the inverted index structure in the embodiment shown in Fig. 1, in the inverted index structure of this embodiment the secondary inverted data corresponding to a secondary index word contains the sequence number of each object, and the byte type of the sequence numbers is determined according to the number of objects in the level-one inverted data. Because the number of objects in the level-one inverted data is generally small in a search engine or a targeted ad delivery system, the inverted index structure of this embodiment can further reduce the memory footprint without affecting retrieval efficiency.
S1042, query the level-one inverted data according to the sequence number of each object to obtain the information of each object in the level-one inverted data.
In this embodiment, the level-one inverted data is queried according to the sequence number of each object to obtain the information of the object corresponding to that sequence number. For example, in Fig. 4, suppose the sequence numbers of the ads numbered 12769, 13530, and 13546 in the secondary inverted data corresponding to "Shandong" are 1, 2, and 3 respectively; the inverted index device can then obtain, in turn, the number of the second ad, the number of the third ad, and the number of the fourth ad in the level-one inverted data.
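Because a sequence number is a zero-based position in the level-one inverted data, step S1042 reduces to list indexing. A minimal sketch with the Fig. 4 values follows; the function name `resolve` is hypothetical.

```python
def resolve(sequence_numbers, level_one_inverted):
    """Map each sequence number to the object information at that position (S1042)."""
    return [level_one_inverted[s] for s in sequence_numbers]

level_one_inverted = [12768, 12769, 13530, 13546]  # ads related to "fresh flowers"
shandong_sequence_numbers = [1, 2, 3]              # secondary inverted data for "Shandong"

shandong_ads = resolve(shandong_sequence_numbers, level_one_inverted)  # [12769, 13530, 13546]
```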
S1043, determine the retrieval result according to the information of each object in the level-one inverted data.
In the inverted index method of the embodiment of the present invention, a search condition is obtained, where the search condition includes at least one keyword to be retrieved; the inverted index structure is queried according to the search condition to obtain a level-one index term that matches the search condition; the secondary index data corresponding to the level-one index term is queried according to the search condition to obtain a secondary index word, in the secondary index data, that matches the search condition; the sequence number of each object in the secondary inverted data is obtained; the level-one inverted data is queried according to the sequence number of each object to obtain the information of each object in the level-one inverted data; and the retrieval result is determined according to the information of each object in the level-one inverted data. This avoids using all ad targeting conditions as level-one index terms, and also avoids using some ad targeting conditions as filter conditions on the retrieval result. Instead, some ad targeting conditions serve as level-one index terms and others serve as secondary index words, which improves retrieval efficiency while keeping the memory footprint low.
Fig. 5 is a schematic structural diagram of an inverted index device provided by an embodiment of the present invention. As shown in Fig. 5, the device includes an acquisition module 51, a query module 52 and a determination module 53.
The acquisition module 51 is configured to obtain a search condition, the search condition including at least one keyword to be retrieved.
The query module 52 is configured to query the inverted index structure according to the search condition to obtain the first-level index term matching the search condition.
The query module 52 is further configured to query, according to the search condition, the second-level index data corresponding to the first-level index term, to obtain the second-level index term in the second-level index data that matches the search condition.
The determination module 53 is configured to determine the retrieval result according to the second-level inverted list data corresponding to the second-level index term.
The inverted index device provided by the present invention may specifically be a system that performs retrieval based on an inverted index structure, such as a search engine or a targeted advertising system. In a search engine, the search condition may be a search query consisting of a single keyword or multiple keywords. In a targeted advertising system, the search condition may be one or more ad targeting conditions, for example region targeting, audience targeting or interest targeting.
In this embodiment, the inverted index structure may include first-level index terms together with their corresponding first-level inverted list data and second-level index data. The first-level inverted list data includes the information of each object related to the first-level index term; the second-level index data includes each second-level index term and its corresponding second-level inverted list data.
In a search engine, an object may be, for example, a document, and a first-level index term may be a keyword; the information of each object related to the first-level index term may be the information of the documents containing that term. The information of a document may be, for example, the document's number or the difference between its number and that of the previous document. It may also include the number of times the first-level index term occurs in the document and the positions at which it occurs.
In a targeted advertising system, an object may be, for example, an advertisement, and a first-level index term may be an ad targeting condition; the information of each object related to the first-level index term may be the information of the advertisements that satisfy that targeting condition. The information of an advertisement may be, for example, the advertisement's number or the difference between its number and that of the previous advertisement.
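Storing the difference between an object's number and the previous object's number is commonly called gap (delta) encoding. The sketch below illustrates the idea under the assumption that the numbers in a list are sorted in ascending order; the function names `to_gaps` and `from_gaps` are hypothetical, and the advertisement numbers are borrowed from the Fig. 4 example purely for illustration.

```python
from itertools import accumulate
from typing import List


def to_gaps(numbers: List[int]) -> List[int]:
    """Encode ascending numbers as the first number followed by differences."""
    return [n - prev for n, prev in zip(numbers, [0] + numbers[:-1])]


def from_gaps(gaps: List[int]) -> List[int]:
    """Rebuild the original numbers by taking a running sum of the gaps."""
    return list(accumulate(gaps))


ad_numbers = [12769, 13530, 13546]
gaps = to_gaps(ad_numbers)
print(gaps)             # first number, then differences
print(from_gaps(gaps))  # original numbers restored
```

The differences are usually much smaller than the numbers themselves, so they tend to fit in fewer bits.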
Further, on the basis of the above embodiment, the second-level inverted list data corresponding to a second-level index term may include the information of each object in the first-level inverted list data that is related to the second-level index term.
In a search engine, the second-level inverted list data may include the information of each document that contains both the first-level index term and the second-level index term. In a targeted advertising system, it may include the information of each advertisement that satisfies both the first-level and the second-level ad targeting conditions.
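One possible way to build such a two-level structure for advertisements is sketched below. Everything here is an assumption made for illustration: the function `build_two_level_index`, the dictionary layout with `"postings"` and `"second"` keys, the targeting conditions "female", "sports" and "Beijing", and advertisement 14002 do not come from the original text. In this variant the second-level lists store advertisement numbers directly.

```python
from typing import Dict, Iterable, List, Tuple

# (advertisement number, first-level conditions, second-level conditions)
Ad = Tuple[int, List[str], List[str]]


def build_two_level_index(ads: Iterable[Ad]) -> Dict[str, dict]:
    """Build first-level lists and, under each, second-level lists."""
    index: Dict[str, dict] = {}
    for number, first_terms, second_terms in sorted(ads):
        for first in first_terms:
            entry = index.setdefault(first, {"postings": [], "second": {}})
            # First-level inverted list data: every ad matching the first-level term.
            entry["postings"].append(number)
            # Second-level inverted list data: ads matching both terms.
            for second in second_terms:
                entry["second"].setdefault(second, []).append(number)
    return index


ads = [
    (12769, ["female"], ["Shandong"]),
    (13530, ["female"], ["Shandong", "Beijing"]),
    (13546, ["female", "sports"], ["Shandong"]),
    (14002, ["sports"], ["Beijing"]),
]
index = build_two_level_index(ads)
print(index["female"]["second"]["Beijing"])
```

Because each second-level list lives under one first-level term, it only ever covers objects that already match that term.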
In this embodiment, the inverted index device can obtain the corresponding advertisements or documents according to the advertisement or document numbers in the second-level inverted list data corresponding to the second-level index term, and determine the set of those advertisements or documents as the retrieval result. Taking advertisements as an example, the device can query an advertisement database by the number of each advertisement to obtain its content, and determine the set of advertisement contents as the retrieval result.
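The retrieval flow for advertisements might look like the following sketch. It assumes that search conditions are matched against index terms by exact string comparison and that the advertisement database is a simple mapping from number to content; `INDEX`, `AD_DATABASE`, `retrieve`, the condition names and the ad contents are all hypothetical.

```python
from typing import List

# Hypothetical two-level index: first-level term -> lists.
INDEX = {
    "female": {
        "postings": [12769, 13530, 13546],
        "second": {"Shandong": [12769, 13530, 13546], "Beijing": [13530]},
    },
}

# Hypothetical advertisement database: number -> content.
AD_DATABASE = {12769: "ad A", 13530: "ad B", 13546: "ad C"}


def retrieve(conditions: List[str]) -> List[str]:
    """Return ad contents matching a first-level and a second-level condition."""
    results: List[str] = []
    for first in conditions:
        entry = INDEX.get(first)  # match a first-level index term
        if entry is None:
            continue
        for second in conditions:  # match a second-level index term under it
            for number in entry["second"].get(second, []):
                content = AD_DATABASE[number]
                if content not in results:
                    results.append(content)
    return results


print(retrieve(["female", "Beijing"]))
```

No filtering pass over the first-level list is needed here, because the second-level list already holds only the advertisements that satisfy both conditions.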
The inverted index device of this embodiment of the present invention obtains a search condition including at least one keyword to be retrieved; queries the inverted index structure according to the search condition to obtain the first-level index term matching the search condition; queries, according to the search condition, the second-level index data corresponding to the first-level index term to obtain the matching second-level index term; and determines the retrieval result according to the second-level inverted list data corresponding to the second-level index term. This avoids using all ad targeting conditions as first-level index terms, and also avoids using some ad targeting conditions as filter conditions applied to the retrieval result. Instead, some ad targeting conditions serve as first-level index terms and others as second-level index terms, which improves retrieval efficiency while keeping memory usage low.
Further, referring also to Fig. 6, on the basis of the embodiment shown in Fig. 5, the information of an object in the second-level inverted list data may be the sequence number of that object in the first-level inverted list data.
Correspondingly, the determination module 53 may include an acquiring unit 531 and a determination unit 532.
The acquiring unit 531 is configured to obtain the sequence number of each object in the second-level inverted list data when the information of each object in the second-level inverted list data is the sequence number of that object in the first-level inverted list data corresponding to the first-level index term.
The acquiring unit 531 is further configured to query the first-level inverted list data according to the sequence number of each object, to obtain the information of each such object in the first-level inverted list data.
The determination unit 532 is configured to determine the retrieval result according to the information of each such object in the first-level inverted list data.
Further, to reduce the memory footprint of the inverted index structure even more, the sequence numbers of objects in first-level inverted list data that contains few objects can be represented with fewer bytes or bits. That is, the byte type of the sequence numbers is determined by the number of objects in the first-level inverted list data; correspondingly, the second-level index data further includes the byte type of the sequence numbers.
For example, when the number of objects in the first-level inverted list data is less than or equal to 16, each sequence number is represented with 4 bits; when it is greater than 16 and less than or equal to 256, with one byte; when it is greater than 256 and less than or equal to 65536, with two bytes; and when it is greater than 65536, with four bytes. The sequence number of each object can thus be stored in the second-level inverted list data in a variable-width form.
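The thresholds in this example translate directly into a small selection function. The sketch below is illustrative only; the function name `sequence_number_bits` and the choice to express the byte type as a bit width are assumptions.

```python
def sequence_number_bits(object_count: int) -> int:
    """Pick the width, in bits, of one sequence number.

    The width depends only on how many objects the first-level
    inverted list data contains.
    """
    if object_count <= 16:
        return 4    # half a byte
    if object_count <= 256:
        return 8    # one byte
    if object_count <= 65536:
        return 16   # two bytes
    return 32       # four bytes


for count in (10, 200, 50000, 100000):
    print(count, sequence_number_bits(count))
```

With 0-based sequence numbers, a list of 16 objects uses the values 0 through 15, which is exactly the range of 4 bits; the other thresholds line up with 8 and 16 bits in the same way.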
In this embodiment, when the second-level index data includes the byte type of the sequence numbers, the acquiring unit 531 may specifically be configured to obtain the byte type of the sequence numbers from the second-level index data and, according to that byte type, read the sequence number of each object in turn from the second-level inverted list data.
In this embodiment, the sequence numbers of the objects are usually stored in binary form in the second-level inverted list data, so if the byte type of the sequence numbers is not known, incorrect sequence numbers may be read. The sequence numbers therefore need to be read in turn from the second-level inverted list data according to the byte type.
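The following sketch shows why the byte type is needed when reading the binary data. It is an illustration under stated assumptions: the functions `pack_sequence_numbers` and `unpack_sequence_numbers` are hypothetical, little-endian byte order is an arbitrary choice, two 4-bit values are packed per byte with the first value in the high half, and the number of stored sequence numbers is assumed to be known to the reader.

```python
import struct
from typing import List

# struct format characters for the whole-byte widths (unsigned integers).
_FORMATS = {8: "B", 16: "H", 32: "I"}


def pack_sequence_numbers(seqs: List[int], bits: int) -> bytes:
    """Serialize sequence numbers using the given width in bits."""
    if bits == 4:
        padded = seqs + [0] * (len(seqs) % 2)  # pad to an even count
        return bytes((hi << 4) | lo for hi, lo in zip(padded[0::2], padded[1::2]))
    return struct.pack(f"<{len(seqs)}{_FORMATS[bits]}", *seqs)


def unpack_sequence_numbers(data: bytes, bits: int, count: int) -> List[int]:
    """Read `count` sequence numbers back using the given width in bits."""
    if bits == 4:
        nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
        return nibbles[:count]
    return list(struct.unpack(f"<{count}{_FORMATS[bits]}", data))


data = pack_sequence_numbers([1, 2, 3], bits=4)
print(unpack_sequence_numbers(data, bits=4, count=3))  # correct byte type
print(unpack_sequence_numbers(data, bits=8, count=2))  # wrong byte type
```

Reading the same two bytes with the wrong width yields values that are not valid sequence numbers, which is the failure the stored byte type prevents.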
The inverted index device of this embodiment of the present invention obtains a search condition including at least one keyword to be retrieved; queries the inverted index structure according to the search condition to obtain the first-level index term matching the search condition; queries, according to the search condition, the second-level index data corresponding to the first-level index term to obtain the matching second-level index term; obtains the sequence number of each object in the second-level inverted list data; queries the first-level inverted list data according to those sequence numbers to obtain the information of each object; and determines the retrieval result from that information. This avoids using all ad targeting conditions as first-level index terms, and also avoids using some ad targeting conditions as filter conditions applied to the retrieval result. Instead, some ad targeting conditions serve as first-level index terms and others as second-level index terms, which improves retrieval efficiency while keeping memory usage low.
Fig. 7 is a schematic structural diagram of another inverted index device provided by an embodiment of the present invention. The inverted index device includes:
A memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002.
The processor 1002 implements the inverted index method provided in the above embodiments when executing the program.
Further, the inverted index device also includes:
A communication interface 1003 for communication between the memory 1001 and the processor 1002.
The memory 1001 is configured to store the computer program executable on the processor 1002.
The memory 1001 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk storage device.
The processor 1002 is configured to implement the inverted index method described in the above embodiments when executing the program.
If the memory 1001, the processor 1002 and the communication interface 1003 are implemented independently, they can be connected to one another and communicate with one another through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, the bus is drawn as a single thick line in Fig. 7, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002 and the communication interface 1003 are integrated on a single chip, they can communicate with one another through internal interfaces.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
This embodiment also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the inverted index method described above.
This embodiment also provides a computer program product; when the instructions in the computer program product are executed by a processor, the inverted index method described above is implemented.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logic functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, may each exist physically on their own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present invention.

Claims (15)

1. An inverted index method, characterized by comprising:
obtaining a search condition, the search condition comprising at least one keyword to be retrieved;
querying an inverted index structure according to the search condition to obtain a first-level index term matching the search condition;
querying, according to the search condition, second-level index data corresponding to the first-level index term, to obtain a second-level index term in the second-level index data that matches the search condition; and
determining a retrieval result according to second-level inverted list data corresponding to the second-level index term.
2. The method according to claim 1, characterized in that the inverted index structure comprises:
first-level index terms and their corresponding first-level inverted list data and second-level index data;
the first-level inverted list data comprising information of each object related to the first-level index term; and
the second-level index data comprising each second-level index term and its corresponding second-level inverted list data.
3. The method according to claim 2, characterized in that the second-level inverted list data corresponding to the second-level index term comprises information of each object in the first-level inverted list data that is related to the second-level index term.
4. The method according to any one of claims 1-3, characterized in that, in the second-level inverted list data, the information of an object is the sequence number of the object in the first-level inverted list data.
5. The method according to claim 4, characterized in that determining the retrieval result according to the second-level inverted list data corresponding to the second-level index term comprises:
obtaining the sequence number of each object in the second-level inverted list data;
querying the first-level inverted list data according to the sequence number of each object, to obtain the information of each such object in the first-level inverted list data; and
determining the retrieval result according to the information of each such object in the first-level inverted list data.
6. The method according to claim 5, characterized in that the second-level index data further comprises a byte type of the sequence numbers, the byte type being determined according to the number of objects in the first-level inverted list data;
correspondingly, obtaining the sequence number of each object in the second-level inverted list data comprises:
obtaining the byte type of the sequence numbers from the second-level index data; and
reading the sequence number of each object in turn from the second-level inverted list data according to the byte type.
7. An inverted index device, characterized by comprising:
an acquisition module, configured to obtain a search condition, the search condition comprising at least one keyword to be retrieved;
a query module, configured to query an inverted index structure according to the search condition to obtain a first-level index term matching the search condition;
the query module being further configured to query, according to the search condition, second-level index data corresponding to the first-level index term, to obtain a second-level index term in the second-level index data that matches the search condition; and
a determination module, configured to determine a retrieval result according to second-level inverted list data corresponding to the second-level index term.
8. The device according to claim 7, characterized in that the inverted index structure comprises:
first-level index terms and their corresponding first-level inverted list data and second-level index data;
the first-level inverted list data comprising information of each object related to the first-level index term; and
the second-level index data comprising each second-level index term and its corresponding second-level inverted list data.
9. The device according to claim 8, characterized in that the second-level inverted list data corresponding to the second-level index term comprises information of each object in the first-level inverted list data that is related to the second-level index term.
10. The device according to any one of claims 7-9, characterized in that, in the second-level inverted list data, the information of an object is the sequence number of the object in the first-level inverted list data.
11. The device according to claim 10, characterized in that the determination module comprises:
an acquiring unit, configured to obtain the sequence number of each object in the second-level inverted list data when the information of each object in the second-level inverted list data is the sequence number of that object in the first-level inverted list data corresponding to the first-level index term;
the acquiring unit being further configured to query the first-level inverted list data according to the sequence number of each object, to obtain the information of each such object in the first-level inverted list data; and
a determination unit, configured to determine the retrieval result according to the information of each such object in the first-level inverted list data.
12. The device according to claim 11, characterized in that the second-level index data further comprises a byte type of the sequence numbers, the byte type being determined according to the number of objects in the first-level inverted list data;
correspondingly, the acquiring unit is specifically configured to:
obtain the byte type of the sequence numbers from the second-level index data; and
read the sequence number of each object in turn from the second-level inverted list data according to the byte type.
13. An inverted index device, characterized by comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the inverted index method according to any one of claims 1-6 when executing the program.
14. A non-transitory computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the inverted index method according to any one of claims 1-6.
15. A computer program product, wherein the inverted index method according to any one of claims 1-6 is implemented when the instructions in the computer program product are executed by a processor.
CN201810346228.4A 2018-04-18 2018-04-18 Inverted index method and device Withdrawn CN108563762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810346228.4A CN108563762A (en) 2018-04-18 2018-04-18 Inverted index method and device

Publications (1)

Publication Number Publication Date
CN108563762A true CN108563762A (en) 2018-09-21

Family

ID=63535230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810346228.4A Withdrawn CN108563762A (en) 2018-04-18 2018-04-18 Inverted index method and device

Country Status (1)

Country Link
CN (1) CN108563762A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111580881A (en) * 2020-04-30 2020-08-25 支付宝(杭州)信息技术有限公司 File loading method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169321A1 (en) * 2008-12-30 2010-07-01 Nec (China)Co., Ltd. Method and apparatus for ciphertext indexing and searching
CN106408320A (en) * 2015-07-31 2017-02-15 北京奇虎科技有限公司 Advertisement index construction method and apparatus and advertisement retrieval method and system
CN106445953A (en) * 2015-08-07 2017-02-22 北京奇虎科技有限公司 Advertisement creative information retrieval method and system
CN107341221A (en) * 2017-06-28 2017-11-10 百度在线网络技术(北京)有限公司 Foundation, associative search method, apparatus, equipment and the storage medium of index structure

Legal Events

Date Code Title Description
PB01 Publication

SE01 Entry into force of request for substantive examination

TA01 Transfer of patent application right
Effective date of registration: 20190903
Address after: 100192 Dongsheng Science Park, Zhongguancun, 66 Xixiaokou Road, Haidian District, Beijing
Applicant after: Green Bay Network Technology Co., Ltd.
Address before: 100089 Beijing Haidian District Xixiaokou Road 66 Zhongguancun Dongsheng Science Park B-6 Building B 5 floors
Applicant before: Grass count language (Beijing) Technology Co., Ltd.

WW01 Invention patent application withdrawn after publication
Application publication date: 20180921
WW01 Invention patent application withdrawn after publication