CN104765782B - Index ranking update method and device - Google Patents

Info

Publication number: CN104765782B
Authority: CN (China)
Application number: CN201510125423.0A
Other languages: Chinese (zh)
Other versions: CN104765782A
Inventor: 杨逸
Current assignee: Beijing 58 Information Technology Co Ltd
Original assignee: Beijing 58 Information Technology Co Ltd
Prior art keywords: ranking results, caching, index, segment, publisher
Legal status: Active
Application filed by Beijing 58 Information Technology Co Ltd
Priority to CN201510125423.0A
Publication of CN104765782A
Application granted
Publication of CN104765782B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an index ranking update method and device in the fields of computing and search. It addresses the prior-art problem that search results cannot promptly reflect factors that change in real time, such as the state of an information publisher, which makes the search results insufficiently accurate. The method includes: performing an inverted-index computation on a data segment according to a first query request to obtain a first ranking result, and storing the first ranking result in a cache; and performing a forward-index computation on the first ranking result in the cache according to a publisher state refreshed in real time, so as to update the first ranking result in real time.

Description

Index ranking update method and device
Technical field
The present invention relates to the fields of computing and information technology, and in particular to an information display method and device.
Background
The ranking of listings in the search results of a classified-information website is influenced by many factors: besides properties of the listing itself, such as relevance and update time, there are also the state of the listing's publisher, the category the listing belongs to, the region where it is posted, and so on.
In the prior art, however, when a factor such as the publisher's state changes, the large data volume and the high real-time requirements make it difficult for the system to reflect these factors in the ranking of search results promptly, so the search results are not accurate enough.
Summary of the invention
The technical problem addressed by the present invention is to provide an index ranking update method and device that solve the prior-art problem that factors updated in real time, such as the state of an information publisher, cannot be reflected in search results promptly, which makes the search results insufficiently accurate.
In one aspect, the present invention provides an index ranking update method, comprising: performing an inverted-index computation on a data segment according to a first query request to obtain a first ranking result, and storing the first ranking result in a cache; and performing a forward-index computation on the first ranking result in the cache according to a publisher state, so as to update the first ranking result in real time.
Optionally, the data segment is managed as multiple segments, each segment stores data generated within a preset time range, and different segments correspond to different preset time ranges.
Optionally, performing the inverted-index computation on the data segment according to the first query request to obtain the first ranking result comprises: performing the inverted-index computation on each segment according to the first query request to obtain the first ranking result.
Optionally, performing the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time, comprises: when a document in the first ranking result of the data segment has been deleted, deleting the corresponding document from the cache; and performing the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
Optionally, the publisher state includes a user attribute of the publisher or an operation behavior of the publisher.
Further, after the forward-index computation has been performed on the first ranking result in the cache according to the publisher state to update the first ranking result in real time, the method further comprises: querying the cache for a result set according to a second query request; if the result set exists in the cache, obtaining the result set from the data in the cache; and if the result set does not exist in the cache, performing the inverted-index computation and the forward-index computation in sequence to obtain a second ranking result.
In another aspect, the present invention further provides an index ranking update device, comprising: an inverted-index computing unit, configured to perform an inverted-index computation on a data segment according to a first query request to obtain a first ranking result, and to store the first ranking result in a cache; and a forward-index computing unit, configured to perform a forward-index computation on the first ranking result in the cache according to a publisher state, so as to update the first ranking result in real time.
Optionally, the data segment is managed as multiple segments, each segment stores data generated within a preset time range, and different segments correspond to different preset time ranges.
Optionally, the forward-index computing unit is specifically configured to: when a document in the first ranking result of the data segment has been deleted, delete the corresponding document from the cache; and perform the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
Further, the device further comprises: a query unit, configured to query the cache for a result set according to a second query request after the forward-index computation on the first ranking result in the cache, performed according to the publisher state, has updated the first ranking result in real time; and an acquiring unit, configured to obtain the result set from the data in the cache if the result set exists in the cache, and, if the result set does not exist in the cache, to trigger the inverted-index computing unit and the forward-index computing unit in sequence and obtain a second ranking result after the inverted-index and forward-index computations.
With the index ranking update method and device provided in the embodiments of the present invention, an inverted-index computation is performed on a data segment according to a first query request to obtain a first ranking result, which is stored in a cache; a forward-index computation is then performed on the cached first ranking result according to the publisher state, updating it in real time. Because the cache offers higher data throughput and only a simple forward-index computation is needed, updates to the publisher state can be reflected in the ranking results promptly, which substantially improves the accuracy of the search results.
Brief description of the drawings
Fig. 1 is a flowchart of an index ranking update method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the operating process of an index ranking update method in a preferred embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data segment in a preferred embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an index ranking update device provided in an embodiment of the present invention.
Detailed description of embodiments
The present invention is described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to explain the present invention and do not limit it.
As shown in Fig. 1, an embodiment of the present invention provides an index ranking update method, comprising:
S11: performing an inverted-index computation on a data segment according to a first query request to obtain a first ranking result, and storing the first ranking result in a cache;
S12: performing a forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
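Steps S11 and S12 can be illustrated with a minimal in-memory sketch. It assumes a toy segment holding an inverted index (term to document IDs) and a forward index (document ID to fields such as a timestamp and a publisher score), with a plain dict as the cache; all names here (`Doc`, `Segment`, `coarse_rank`, `fine_rank`, `search`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: int
    timestamp: int          # main coarse-ranking dimension (e.g. posting time)
    publisher_score: float  # forward-index field refreshed in real time


@dataclass
class Segment:
    postings: dict = field(default_factory=dict)  # inverted index: term -> set of doc IDs
    forward: dict = field(default_factory=dict)   # forward index: doc ID -> Doc

    def add(self, doc, terms):
        self.forward[doc.doc_id] = doc
        for term in terms:
            self.postings.setdefault(term, set()).add(doc.doc_id)


def coarse_rank(segment, terms):
    """S11: inverted-index lookup (AND of all terms), newest documents first."""
    hits = [segment.postings.get(t, set()) for t in terms]
    matched = set.intersection(*hits) if hits else set()
    return sorted(matched, key=lambda d: segment.forward[d].timestamp, reverse=True)


def fine_rank(segment, doc_ids):
    """S12: re-rank cached IDs using forward-index fields only (no inverted lookup)."""
    return sorted(doc_ids, key=lambda d: segment.forward[d].publisher_score, reverse=True)


cache = {}  # query key -> coarse-ranked doc IDs


def search(segment, terms):
    key = tuple(sorted(terms))
    if key not in cache:  # cache miss: run the inverted-index step once
        cache[key] = coarse_rank(segment, terms)
    return fine_rank(segment, cache[key])
```

In this sketch, changing a document's `publisher_score` changes the next `search` result without touching the inverted index or the cached coarse list, which is the effect step S12 aims for. Python's `sorted` is stable, so documents with equal scores keep their coarse (time) order.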
With the index ranking update method provided in the embodiment of the present invention, an inverted-index computation is performed on a data segment according to a first query request to obtain a first ranking result, which is stored in a cache; a forward-index computation is then performed on the cached first ranking result according to the publisher state refreshed in real time, updating it in real time. Because the cache offers higher data throughput and only a simple forward-index computation is needed, updates to the publisher state can be reflected in the ranking results promptly, which substantially improves the accuracy of the search results.
Optionally, the publisher state may include features that can change in real time, such as a user attribute of the publisher or an operation behavior of the publisher.
Operations on the index may include queries and updates. To support concurrent multi-threaded queries and real-time indexing, the index data can be divided into multiple segments for management.
For the search function, the documents in an index data segment can be ranked in real time according to the user's query request, and the ranked documents are stored in the cache. For the document update function, the index data segment can itself perform operations on its documents, such as document updates and batch heterogeneous refreshes.
Specifically, a data segment is a way of managing massive index data. To support concurrent multi-threaded queries and real-time indexing, the data segment is managed as multiple segments; each segment stores only data generated within a preset time range, and different segments correspond to different preset time ranges. Generally, these segments are divided into reader segments and writer segments. A reader segment serves user queries and supports deleting and modifying data; a writer segment only supports adding, deleting, and modifying data and provides no query function. After a batch of index data is generated, a document-add request is generated and the new documents are added to the writer segment. When the writer segment's life cycle reaches its upper limit (for example, 3 seconds), the writer segment is converted into a reader segment that serves queries, and a new writer segment is created at the same time. The reader segments can be scheduled among themselves and merged dynamically.
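The writer-to-reader life cycle described above can be sketched as follows. The 3-second lifetime comes from the text; the class names (`Seg`, `SegmentManager`), keeping the reader segments in a list, and passing the current time `now` explicitly (so the sketch is deterministic) are assumptions for illustration. Deletion and modification inside reader segments are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Seg:
    created_at: float
    docs: dict = field(default_factory=dict)  # doc ID -> document fields
    read_only: bool = False                   # True once converted to a reader segment


class SegmentManager:
    def __init__(self, writer_lifetime=3.0, now=0.0):
        self.writer_lifetime = writer_lifetime
        self.writer = Seg(created_at=now)
        self.readers = []  # newest (smallest) reader first, i.e. nearest the writer

    def add_document(self, doc_id, fields, now):
        """Document-add request: new documents always go to the writer segment."""
        self.maybe_rotate(now)
        self.writer.docs[doc_id] = fields

    def maybe_rotate(self, now):
        """When the writer's life cycle reaches its limit, turn it into a reader
        segment that serves queries and open a fresh writer segment."""
        if now - self.writer.created_at >= self.writer_lifetime:
            self.writer.read_only = True
            self.readers.insert(0, self.writer)
            self.writer = Seg(created_at=now)

    def query(self, predicate, now):
        """Queries only see reader segments; the writer offers no query function."""
        self.maybe_rotate(now)
        return [d for seg in self.readers for d, f in seg.docs.items() if predicate(f)]
```

A document added to the writer becomes searchable only after the next rotation, so in this sketch the writer lifetime bounds the indexing delay.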
Specifically, in one embodiment of the present invention, the reader segments can be scanned periodically; when the life cycle of a reader segment reaches its upper limit, that reader segment is merged with an adjacent reader segment whose life cycle is longer.
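The periodic scan-and-merge policy can be sketched as below. Segments are ordered from the one nearest the writer (shortest life cycle) to the farthest (longest). Representing a segment as a dict of documents with a creation time and a lifetime, letting the newer copy of a document win on merge, and doubling the lifetime of the last segment (which has no longer-lived neighbour) as a stand-in for "life-cycle promotion" are all assumptions, not details given in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ReaderSegment:
    created_at: float
    lifetime: float                           # life-cycle upper limit of this segment
    docs: dict = field(default_factory=dict)  # doc ID -> document fields


def scan_and_merge(readers, now):
    """Fold each expired reader segment into its adjacent, longer-lived neighbour.
    readers[0] is nearest the writer; later entries are larger and longer-lived."""
    i = 0
    while i < len(readers):
        seg = readers[i]
        if now - seg.created_at < seg.lifetime:
            i += 1
            continue
        if i + 1 < len(readers):
            older = readers[i + 1]
            merged = dict(older.docs)
            merged.update(seg.docs)  # the newer segment's version of a document wins
            older.docs = merged
            del readers[i]           # do not advance i: the neighbour now sits at index i
        else:
            seg.lifetime *= 2        # hypothetical promotion rule for the last segment
            i += 1
    return readers
```

Because every loop iteration either advances `i` or removes a segment, the scan always terminates, and an expired segment can cascade into progressively larger neighbours in a single pass.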
Preferably, as shown in Fig. 2, where w denotes the writer segment and r and R denote reader segments, the reader segment adjacent to the writer segment has the smallest capacity and the shortest life cycle; the farther a reader segment is from the writer segment, the larger its capacity and the longer its life cycle.
Preferably, the larger reader segments can be built by full offline indexing, and the smaller segments can be online real-time indexing segments. Each segment has its own life cycle; once a segment reaches it, the segment's life cycle is promoted or the segment is merged into a segment with a longer life cycle.
Since the data segment is managed as multiple segments, optionally, performing the inverted-index computation on the data segment according to the first query request to obtain the first ranking result may specifically include: performing the inverted-index computation on all segments simultaneously according to the first query request to obtain the first ranking result, which effectively speeds up the search.
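Querying all segments at the same time and then merging the per-segment hits can be sketched with a thread pool from the Python standard library. Each segment is modelled here as a dict mapping a term to `{doc_id: timestamp}` postings, and the merge keeps the `top_k` newest hits; that layout and the helper names (`query_segment`, `parallel_query`) are assumptions for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def query_segment(postings, terms):
    """Inverted-index lookup inside one segment: docs containing every term,
    returned as (timestamp, doc_id) pairs."""
    lists = [postings.get(t, {}) for t in terms]
    if not lists:
        return []
    common = set(lists[0]).intersection(*lists[1:])
    return [(lists[0][d], d) for d in common]


def parallel_query(segments, terms, top_k):
    """Run the lookup on every segment concurrently, then keep the top_k newest."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(lambda seg: query_segment(seg, terms), segments))
    merged = heapq.nlargest(top_k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]
```

In CPython the global interpreter lock limits the speed-up for pure-Python lookups, so this only shows the fan-out/merge structure; an engine built for throughput would run the per-segment work in native threads or processes.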
Specifically, a document update can be initiated by a heterogeneous document refresh request. Optionally, such a request may contain two parts: a query condition and an update condition. For example, in one embodiment of the present invention, the query condition and the update condition are:
Query=day:Friday AND gender:male&&&update=valid_days:100, price: 19.22
When the updater receives a heterogeneous refresh request, it can proceed as follows:
1) according to the query condition in the request, execute a query on each reader segment to obtain the set of all matching documents;
2) traverse the document set and, for each field-name/value pair in the request, refresh the corresponding forward-index field in each document.
Because the updater executes document update requests serially, the document set does not change during this process, which guarantees the integrity of the batch refresh.
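A batch heterogeneous refresh can be sketched as below, using the example request shown earlier. The parsing rules (conditions joined by ` AND `, update pairs separated by commas, values kept as strings) are inferred from that single example and are assumptions, as are the helper names. Requests are taken from a queue one at a time, mirroring the serial execution that keeps each matched document set stable.

```python
from collections import deque


def parse_refresh(request):
    """Split 'Query=...&&&update=...' into query conditions and field updates."""
    query_part, update_part = request.split("&&&", 1)
    query = query_part.split("=", 1)[1]
    update = update_part.split("=", 1)[1]
    conditions = {}
    for cond in query.split(" AND "):
        name, value = cond.split(":", 1)
        conditions[name.strip()] = value.strip()
    updates = {}
    for pair in update.split(","):
        name, value = pair.split(":", 1)
        updates[name.strip()] = value.strip()
    return conditions, updates


def apply_refresh(reader_segments, request):
    """1) query every reader segment for matching documents;
    2) overwrite the corresponding forward-index fields of each match."""
    conditions, updates = parse_refresh(request)
    matched = [doc for seg in reader_segments for doc in seg.values()
               if all(doc.get(k) == v for k, v in conditions.items())]
    for doc in matched:
        doc.update(updates)
    return len(matched)


def run_updater(reader_segments, requests):
    """Execute queued refresh requests strictly one after another."""
    queue = deque(requests)
    total = 0
    while queue:
        total += apply_refresh(reader_segments, queue.popleft())
    return total
```

Each reader segment is modelled as a dict from document ID to a dict of forward-index fields; the matched set is collected completely before any field is written.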
A user's operations on the data are reflected directly in the data segment, but not necessarily in the cache. So that the data in the cache is also updated promptly, preferably, in step S12, performing the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time, may specifically include:
when a document in the first ranking result of the data segment has been deleted, deleting the corresponding document from the cache;
performing the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
Because some ranking results already exist in the cache after step S12, the next query or search can look up a result set in the cache according to a second query request: if the result set exists in the cache, it is obtained from the data in the cache; if not, the inverted-index computation and the forward-index computation are performed in sequence to obtain a second ranking result.
That is, for a query, if there is no result set in the cache, computations such as inverted-index lookups are performed first and the results are ranked by the main ranking dimension (for example, time), yielding an initial ranked query result set that is stored in the cache; this step is coarse ranking. This result set must contain enough redundancy. For example, even if the query request asks only for the first page of listings, coarse ranking still saves the result set for the first several pages, so that the cached result can be reused when the user turns to later pages; in addition, when documents in the cache are deleted in real time or temporarily dropped, the redundant documents in the cache can fill the gap. The number of redundant pages can be chosen according to how many pages users typically turn, since users usually do not browse pages very far back.
Once the coarse-ranked result set is in the cache, the documents in it are reranked with a more elaborate scoring method. This scoring method can use forward-index fields, such as the document's publisher score, as scoring factors, so when the score is updated, the ranking result is updated as well; this step is fine ranking.
It should be noted that if a matching result set exists in the cache, the documents deleted in real time are filtered out of it, and the required result set is then cut out according to the number of documents specified by the query. A document in a segment may be deleted because of business needs, or because the whole document was refreshed: the document is deleted from its segment and its new data is added to a new segment. If the cache is used at that point, it may still hold the document, so the cache is inconsistent with the state of the segment; the deleted documents therefore need to be filtered out.
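On a cache hit, the steps in the last few paragraphs (drop documents deleted since the result was cached, re-score the rest with real-time forward-index fields, cut out the number of documents the query asks for) can be sketched as one function. Cached entries are modelled as hypothetical `(segment_id, doc_id)` pairs so that a whole-document refresh, which deletes the document from its old segment, also invalidates the stale cache entry; the `deleted` set, the `publisher_score` callable and the `offset`/`limit` parameters are assumed stand-ins for the engine's deletion markers, forward index and query parameters.

```python
def serve_from_cache(cached, deleted, publisher_score, offset, limit):
    """cached: coarse-ranked list of (segment_id, doc_id) pairs.
    deleted: set of (segment_id, doc_id) pairs removed from their segment since
             the result was cached (e.g. a full refresh moved the document).
    publisher_score: callable doc_id -> current score from the forward index."""
    live = [entry for entry in cached if entry not in deleted]
    # Fine ranking: the sort is stable, so equal scores keep their coarse (time) order.
    live.sort(key=lambda entry: publisher_score(entry[1]), reverse=True)
    return [doc_id for _, doc_id in live[offset:offset + limit]]
```

Python's sort keeps equal keys in their original order even with `reverse=True`, which is why ties fall back to the coarse ranking here.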
In the embodiments of the present invention, besides the relatively coarse inverted-index ranking performed in the data segment, a more precise forward-index ranking can be performed at the cache layer. The cache layer avoids frequent operations such as inverted-index queries, which preserves query performance. Fine ranking reranks the cached result using forward-index fields refreshed in real time, such as the publisher score, as scoring factors, which ensures that the ranking results can be updated in real time. Moreover, because fine ranking only sorts the result set in the cache and requires no further inverted-index query, it does not consume excessive resources.
Correspondingly, as shown in Fig. 3, an embodiment of the present invention further provides an index ranking update device, comprising:
an inverted-index computing unit 41, configured to perform an inverted-index computation on a data segment according to a first query request to obtain a first ranking result, and to store the first ranking result in a cache;
a forward-index computing unit 42, configured to perform a forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
In the index ranking update device provided in the embodiment of the present invention, the inverted-index computing unit 41 performs an inverted-index computation on a data segment according to a first query request to obtain a first ranking result and stores it in a cache; the forward-index computing unit 42 then performs a forward-index computation on the cached first ranking result according to the publisher state refreshed in real time, updating it in real time. Because the cache offers higher data throughput and only a simple forward-index computation is needed, updates to the publisher state can be reflected in the ranking results promptly, which substantially improves the accuracy of the search results.
Optionally, the data segment is managed as multiple segments, each segment stores only data generated within a preset time range, and different segments correspond to different preset time ranges.
Optionally, the forward-index computing unit is specifically configured to: when a document in the first ranking result of the data segment has been deleted, delete the corresponding document from the cache; and perform the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
In another embodiment, as shown in Fig. 4, the index ranking update device provided by the present invention may further include:
a query unit 43, configured to query the cache for a result set according to a second query request after the forward-index computation on the first ranking result in the cache, performed according to the publisher state, has updated the first ranking result in real time;
an acquiring unit 44, configured to obtain the result set from the data in the cache if the result set exists in the cache, and, if it does not, to trigger the inverted-index computing unit 41 and the forward-index computing unit 42 in sequence and obtain a second ranking result after the inverted-index and forward-index computations.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will recognize that various improvements, additions, and substitutions are also possible; therefore, the scope of the present invention should not be limited to the above embodiments.

Claims (8)

1. a kind of index order update method characterized by comprising
Inverted index is carried out to data segment according to the first inquiry request, the first sequence is calculated as a result, and described first sorts As a result deposit caching;
Forward index is carried out to first ranking results in caching according to publisher's state to calculate to first sequence As a result real-time update is carried out;
Wherein, the multiple segmentations of the data segment point are managed, and generation in preset time range is stored in each segmentation Data, the corresponding preset time range of each segmentation is different, and multiple segmentations include: writer segment and multiple The corresponding preset time range of reader segment, the reader segment is with the reader segment Increase with the writer segment distance and increase.
2. The method according to claim 1, characterized in that performing the inverted-index computation on the data segment according to the first query request to obtain the first ranking result comprises:
performing the inverted-index computation on each segment according to the first query request to obtain the first ranking result.
3. The method according to claim 1, characterized in that performing the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time, comprises:
when a document in the first ranking result of the data segment has been deleted, deleting the corresponding document from the cache;
performing the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
4. The method according to claim 1, characterized in that the publisher state includes a user attribute of the publisher or an operation behavior of the publisher.
5. The method according to any one of claims 1 to 4, characterized in that, after performing the forward-index computation on the first ranking result in the cache according to the publisher state so as to update the first ranking result in real time, the method further comprises:
querying the cache for a result set according to a second query request;
if the result set exists in the cache, obtaining the result set from the data in the cache;
if the result set does not exist in the cache, performing the inverted-index computation and the forward-index computation in sequence to obtain a second ranking result.
6. An index ranking update device, characterized by comprising:
an inverted-index computing unit, configured to perform an inverted-index computation on a data segment according to a first query request to obtain a first ranking result, and to store the first ranking result in a cache;
a forward-index computing unit, configured to perform a forward-index computation on the first ranking result in the cache according to a publisher state, so as to update the first ranking result in real time;
wherein the data segment is managed as multiple segments, each segment stores data generated within a preset time range, and different segments correspond to different preset time ranges; the multiple segments include a writer segment and multiple reader segments, and the preset time range corresponding to a reader segment increases as the distance between that reader segment and the writer segment increases.
7. The device according to claim 6, characterized in that the forward-index computing unit is specifically configured to:
when a document in the first ranking result of the data segment has been deleted, delete the corresponding document from the cache;
perform the forward-index computation on the first ranking result in the cache according to the publisher state refreshed in real time, so as to update the first ranking result in real time.
8. The device according to claim 6 or 7, characterized by further comprising:
a query unit, configured to query the cache for a result set according to a second query request after the forward-index computation on the first ranking result in the cache, performed according to the publisher state, has updated the first ranking result in real time;
an acquiring unit, configured to obtain the result set from the data in the cache if the result set exists in the cache, and, if it does not, to trigger the inverted-index computing unit and the forward-index computing unit in sequence and obtain a second ranking result after the inverted-index and forward-index computations.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510125423.0A CN104765782B (en) 2015-03-20 2015-03-20 A kind of index order update method and device

Publications (2)

Publication Number Publication Date
CN104765782A 2015-07-08
CN104765782B 2019-06-21

Family

ID=53647613

Country Status (1)

Country Link
CN (1) CN104765782B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677813A (en) * 2015-12-30 2016-06-15 五八有限公司 Information display method and device
CN106294691B (en) * 2016-08-04 2020-03-03 广州交易猫信息技术有限公司 List refreshing method and device and server
CN110750535B (en) * 2019-09-27 2024-02-02 上海麦克风文化传媒有限公司 Ordering result updating method
CN111787351B (en) * 2020-07-01 2022-09-06 百度在线网络技术(北京)有限公司 Information query method, device, equipment and computer storage medium
CN116303140B (en) * 2023-05-19 2023-08-29 珠海妙存科技有限公司 Hardware-based sorting algorithm optimization method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867070A (en) * 2012-09-29 2013-01-09 瑞庭网络技术(上海)有限公司 Method for updating cache of key-value distributed memory system
CN103177117A (en) * 2013-04-08 2013-06-26 北京奇虎科技有限公司 Information index system and information index update method
CN103218423A (en) * 2013-04-02 2013-07-24 中国科学院信息工程研究所 Data inquiry method and device
CN103970853A (en) * 2014-05-05 2014-08-06 浙江宇视科技有限公司 Method and device for optimizing search engine

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100573520C (en) * 2006-08-29 2009-12-23 国际商业机器公司 For retrieval is carried out pretreated method and apparatus to a plurality of documents
US9424351B2 (en) * 2010-11-22 2016-08-23 Microsoft Technology Licensing, Llc Hybrid-distribution model for search engine indexes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant