CN107066527B - Method and system for an index cache based on off-heap memory - Google Patents

Method and system for an index cache based on off-heap memory

Info

Publication number
CN107066527B
CN107066527B (granted publication of application CN201710104402.XA)
Authority
CN
China
Prior art keywords
index
heap
memory
lucene
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710104402.XA
Other languages
Chinese (zh)
Other versions
CN107066527A (en)
Inventor
何小成 (He Xiaocheng)
黄三伟 (Huang Sanwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Ant Software Co., Ltd.
Original Assignee
Hunan Ant Software Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Ant Software Co., Ltd. filed Critical Hunan Ant Software Co., Ltd.
Priority to CN201710104402.XA priority Critical patent/CN107066527B/en
Publication of CN107066527A publication Critical patent/CN107066527A/en
Application granted granted Critical
Publication of CN107066527B publication Critical patent/CN107066527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • G06F16/134Distributed indices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a method and system for an index cache based on off-heap memory. The method of the present invention comprises the following steps: when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index; when Lucene is in the indexing state, check the capacity of the off-heap memory index and, if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data; when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read and, if it does, open an input stream on the off-heap memory index to read the index data. The system of the present invention corresponds to the method. The invention greatly reduces the performance bottleneck caused by disk I/O while still being able to persist the data; at the same time, segment merging is not slowed by a disk I/O bottleneck, so real-time data writes are not delayed.

Description

Method and system for an index cache based on off-heap memory
Technical field
The invention belongs to the field of computer information storage and indexing technology, and more particularly relates to a method and system for an Apache Lucene index cache implemented in off-heap memory.
Background
Lucene indexes fall into three kinds: in-heap memory indexes, file system indexes, and HDFS indexes. An in-heap memory index is implemented on the JVM heap; since it never interacts with the disk, it has no disk I/O performance bottleneck, but it cannot persist data, so there is a risk of data loss, and when the data volume is large there are severe performance problems from long GC pauses. A file system index or HDFS index can persist data, but suffers a disk I/O performance bottleneck in environments with data volumes in the hundreds of millions of records.
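The heap/off-heap distinction above can be made concrete with a minimal JVM sketch (not from the patent): `ByteBuffer.allocateDirect` reserves native memory outside the garbage-collected heap, so a large index buffer held there does not lengthen GC pauses the way an equally large on-heap array does.

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Direct buffer: backed by native memory outside the Java heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1 << 20); // 1 MiB, off-heap
        // Heap buffer: backed by a byte[] that the garbage collector scans.
        ByteBuffer onHeap = ByteBuffer.allocate(1 << 20);        // 1 MiB, on-heap
        offHeap.putInt(0, 42);
        System.out.println(offHeap.isDirect() + " " + onHeap.isDirect()
                + " " + offHeap.getInt(0)); // prints: true false 42
    }
}
```

Off-heap memory must still be sized explicitly (here "1 MiB" is an arbitrary illustration), which is why the patent's method allocates a specified size up front.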
Summary of the invention
The purpose of the present invention is to provide a method and system for an index cache based on off-heap memory, intended to address the above shortcomings of the prior art described in the background.
The invention is realized as follows: a method for an index cache based on off-heap memory, comprising the following steps:
when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index;
when Lucene is in the indexing state, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data;
when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read; if it does, open an input stream on the off-heap memory index to read the index data.
Preferably, preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index.
Preferably, after the step of opening an output stream on the off-heap memory index to write index data, the method further comprises: when data is committed, synchronizing the index data into the file system index.
The present invention further discloses a system for an index cache based on off-heap memory, the system comprising:
a Lucene startup module, configured to, when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index;
a Lucene indexing module, configured to, when Lucene is in the indexing state, check the capacity of the off-heap memory index and, if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data;
a Lucene search module, configured to, when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read and, if it does, open an input stream on the off-heap memory index to read the index data.
Preferably, in the Lucene startup module, preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index.
Preferably, the Lucene indexing module is configured to, when Lucene is in the indexing state, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data and, when data is committed, synchronize the index data into the file system index.
Compared with the shortcomings and deficiencies of the prior art, the present invention has the following advantages: when the off-heap memory capacity is large enough, all reads and writes happen in memory without disk interaction, which greatly reduces the performance bottleneck caused by disk I/O; at the same time, the index files in the off-heap memory index can also be synchronized into the file system index, so the system both avoids the disk I/O bottleneck and can persist the data. Lucene segment merging is likewise performed in memory, so a disk I/O bottleneck cannot slow down merging and thereby delay real-time data writes.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of one embodiment of the method for an index cache based on off-heap memory of the present invention;
Fig. 2 is a flow chart of the steps of another embodiment of the method for an index cache based on off-heap memory of the present invention;
Fig. 3 is a structural diagram of one embodiment of the system for an index cache based on off-heap memory of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.
Referring to Fig. 1, the present invention discloses a method for an index cache based on off-heap memory according to a first embodiment. When the off-heap memory capacity is large enough, all reads and writes happen in memory without disk interaction, greatly reducing the performance bottleneck caused by disk I/O. The method comprises:
S1: when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index;
S2: when Lucene is in the indexing state, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data;
S3: when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read; if it does, open an input stream on the off-heap memory index to read the index data.
In this embodiment, Lucene refers to Apache Lucene, an open-source search engine project of the Apache Software Foundation.
As described in step S1, preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index.
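The preheat step just described can be sketched as follows. This is a hedged illustration, not the patent's implementation: the two indexes are modeled as maps from file name to contents, and the `preheat` method name and capacity accounting are assumptions introduced here.

```java
import java.nio.ByteBuffer;
import java.util.Map;

public class Preheater {
    // Copy index files from the file-system index into the off-heap index
    // while capacity remains; returns how many files were copied.
    static int preheat(Map<String, byte[]> fsIndex,
                       Map<String, ByteBuffer> offHeapIndex,
                       int offHeapCapacityBytes) {
        int used = 0, copied = 0;
        for (Map.Entry<String, byte[]> e : fsIndex.entrySet()) {
            byte[] data = e.getValue();
            // capacity check: only preheat files that still fit off-heap
            if (used + data.length > offHeapCapacityBytes) continue;
            ByteBuffer block = ByteBuffer.allocateDirect(data.length);
            block.put(data).flip(); // stand-in for "open input stream and write"
            offHeapIndex.put(e.getKey(), block);
            used += data.length;
            copied++;
        }
        return copied;
    }
}
```

After preheating, hot index files are already resident off-heap before the first search arrives, which is the point of doing this at startup.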
As described in step S1, on startup Lucene requests one block of off-heap memory of a specified size and splits it into unit-sized blocks that are placed into a memory pool. To preheat the off-heap cache index, a thread is started that opens an input stream on the file system index and, if the off-heap memory index has enough capacity, writes the data into the off-heap memory index.
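The "one block split into unit-sized blocks in a pool" idea above might look like the following sketch. The class and method names (`BlockPool`, `acquire`, `release`) are illustrative assumptions; only the scheme itself (one direct allocation, fixed-size slices, a free queue) comes from the text.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class BlockPool {
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    public BlockPool(int totalBytes, int blockBytes) {
        // One off-heap region of the specified size, requested once at startup.
        ByteBuffer region = ByteBuffer.allocateDirect(totalBytes);
        // Slice it into unit-sized blocks and queue them in the pool.
        for (int off = 0; off + blockBytes <= totalBytes; off += blockBytes) {
            region.position(off).limit(off + blockBytes);
            free.add(region.slice()); // a view over one unit-sized block
        }
    }

    public ByteBuffer acquire() { return free.poll(); } // null when exhausted
    public void release(ByteBuffer b) { b.clear(); free.add(b); }
    public int available() { return free.size(); }
}
```

Pooling fixed-size slices avoids repeated native allocations and makes the "is there enough off-heap capacity?" check from the later steps a simple count of free blocks.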
As described in step S2, during indexing, if the off-heap memory index capacity is large enough, an output stream is opened on the off-heap memory index and the index data is written into it; if the off-heap memory capacity is insufficient, an output stream is opened on the file system index instead and the index data is written into the file system index.
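The write-path decision above reduces to a capacity test that picks one of two output targets. In this hedged sketch the two index targets are plain byte sinks; real code would presumably go through Lucene's Directory/IndexOutput abstractions, and the names here are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WritePath {
    // Step S2's capacity check: write off-heap only if the data still fits,
    // otherwise fall back to the file-system index.
    static <T extends OutputStream> T openOutput(int offHeapFreeBytes, int neededBytes,
                                                 T offHeapOut, T fsOut) {
        return offHeapFreeBytes >= neededBytes ? offHeapOut : fsOut;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream offHeap = new ByteArrayOutputStream();
        ByteArrayOutputStream fs = new ByteArrayOutputStream();
        openOutput(1024, 100, offHeap, fs).write(new byte[100]); // fits off-heap
        openOutput(64, 100, offHeap, fs).write(new byte[100]);   // too big, file system
        System.out.println(offHeap.size() + " " + fs.size());    // prints: 100 100
    }
}
```

The fallback keeps indexing correct when the cache is full; only the fast path changes.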
As described in step S3, during search, if the off-heap memory index contains the index file that currently needs to be read, an input stream is opened on the off-heap memory index; otherwise an input stream is opened on the file system index to read the index data.
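The read path above is a cache-aside lookup: check the off-heap index first, fall back to the file-system index on a miss. The map-backed "indexes" in this sketch are stand-ins introduced for illustration, not the patent's actual structures.

```java
import java.util.Map;

public class ReadPath {
    static byte[] read(String file, Map<String, byte[]> offHeapIndex,
                       Map<String, byte[]> fsIndex) {
        // Step S3: existence check in the off-heap index first.
        byte[] cached = offHeapIndex.get(file);
        // Miss: fall back to the (slower, persistent) file-system index.
        return cached != null ? cached : fsIndex.get(file);
    }
}
```

Because preheating copied hot files off-heap at startup, most reads take the first branch and never touch the disk.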
The present invention implements the Lucene memory index in off-heap memory, which solves the problem of long GC pauses. In addition, it implements an index list that internally holds one off-heap memory index and one file system index: the off-heap memory index improves Lucene's search speed, while the file system index guarantees that the data can be persisted.
In addition, when the off-heap memory capacity is large enough, all reads and writes happen in memory without disk interaction, greatly reducing the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can be synchronized into the file system index, so the system both avoids the disk I/O bottleneck and can persist the data. Lucene segment merging is also performed in memory, so a disk I/O bottleneck cannot slow down merging and thereby delay real-time data writes.
Referring to Fig. 2, the invention discloses a method for an index cache based on off-heap memory according to another embodiment. The method comprises:
S1: when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index;
S2: when Lucene is in the indexing state, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data;
S20: when data is committed, synchronize the index data into the file system index;
S3: when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read; if it does, open an input stream on the off-heap memory index to read the index data.
As described in step S1, preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index.
As described in step S1, on startup Lucene requests one block of off-heap memory of a specified size and splits it into unit-sized blocks that are placed into a memory pool. To preheat the off-heap cache index, a thread is started that opens an input stream on the file system index and, if the off-heap memory index has enough capacity, writes the data into the off-heap memory index.
As described in step S2, during indexing, if the off-heap memory index capacity is large enough, an output stream is opened on the off-heap memory index and the index data is written into it; if the off-heap memory capacity is insufficient, an output stream is opened on the file system index instead and the index data is written into the file system index.
As described in step S20, after the index data has been written into the off-heap memory index, the index files in the off-heap memory index are synchronized into the file system index when the data is committed.
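The commit-time synchronization can be sketched as a copy from an off-heap buffer to a file. This is an illustration under assumptions: the single-buffer-per-file model and the names (`syncToFileSystem`, `segments_1`) are invented here, and a real implementation would sync every index file, not one.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CommitSync {
    // Step S20 sketch: flush one off-heap index file into the file-system
    // index so the data survives a restart.
    static void syncToFileSystem(ByteBuffer offHeapFile, Path fsIndexFile) throws Exception {
        offHeapFile.flip(); // switch the buffer from writing to reading
        try (FileChannel ch = FileChannel.open(fsIndexFile,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            while (offHeapFile.hasRemaining()) ch.write(offHeapFile);
            ch.force(true); // durability: push data and metadata to the disk
        }
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        buf.put("segments_1".getBytes());
        Path out = Files.createTempFile("fs-index", ".bin");
        syncToFileSystem(buf, out);
        System.out.println(new String(Files.readAllBytes(out))); // prints: segments_1
    }
}
```

Doing this only at commit time keeps the hot path disk-free while still bounding how much data a crash could lose.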
As described in step S3, during search, if the off-heap memory index contains the index file that currently needs to be read, an input stream is opened on the off-heap memory index; otherwise an input stream is opened on the file system index to read the index data.
The present invention implements the Lucene memory index in off-heap memory, which solves the problem of long GC pauses. In addition, it implements an index list that internally holds one off-heap memory index and one file system index: the off-heap memory index improves Lucene's search speed, while the file system index guarantees that the data can be persisted.
In addition, when the off-heap memory capacity is large enough, all reads and writes happen in memory without disk interaction, greatly reducing the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can be synchronized into the file system index, so the system both avoids the disk I/O bottleneck and can persist the data. Lucene segment merging is also performed in memory, so a disk I/O bottleneck cannot slow down merging and thereby delay real-time data writes.
Referring to Fig. 3, the present invention further discloses a system for an index cache based on off-heap memory, the system comprising:
a Lucene startup module 1, configured to, when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index;
a Lucene indexing module 2, configured to, when Lucene is in the indexing state, check the capacity of the off-heap memory index and, if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data;
a Lucene search module 3, configured to, when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read and, if it does, open an input stream on the off-heap memory index to read the index data.
In this embodiment, Lucene refers to Apache Lucene, an open-source search engine project of the Apache Software Foundation.
In the Lucene startup module 1, preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index.
In the Lucene startup module 1, on startup Lucene requests one block of off-heap memory of a specified size and splits it into unit-sized blocks that are placed into a memory pool. To preheat the off-heap cache index, a thread is started that opens an input stream on the file system index and, if the off-heap memory index has enough capacity, writes the data into the off-heap memory index.
In the Lucene indexing module 2, during indexing, if the off-heap memory index capacity is large enough, an output stream is opened on the off-heap memory index and the index data is written into it; if the off-heap memory capacity is insufficient, an output stream is opened on the file system index instead and the index data is written into the file system index.
In the Lucene search module 3, during search, if the off-heap memory index contains the index file that currently needs to be read, an input stream is opened on the off-heap memory index; otherwise an input stream is opened on the file system index to read the index data.
In an embodiment of the present invention, more specifically, the Lucene indexing module 2 is configured to, when Lucene is in the indexing state, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data and, when data is committed, synchronize the index data into the file system index.
The present invention implements the Lucene memory index in off-heap memory, which solves the problem of long GC pauses. In addition, it implements an index list that internally holds one off-heap memory index and one file system index: the off-heap memory index improves Lucene's search speed, while the file system index guarantees that the data can be persisted.
In addition, when the off-heap memory capacity is large enough, all reads and writes happen in memory without disk interaction, greatly reducing the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can be synchronized into the file system index, so the system both avoids the disk I/O bottleneck and can persist the data. Lucene segment merging is also performed in memory, so a disk I/O bottleneck cannot slow down merging and thereby delay real-time data writes.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A method for an index cache based on off-heap memory, characterized in that the method comprises the following steps:
when Lucene is in the starting state, allocating memory of a specified size for index data in off-heap memory, placing it into a memory pool, and then preheating the off-heap cache index, wherein preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index;
when Lucene is in the indexing state, checking the capacity of the off-heap memory index and, if the capacity meets the requirement, opening an output stream on the off-heap memory index to write index data;
when Lucene is in the searching state, checking whether the off-heap memory index contains the index data that currently needs to be read and, if it does, opening an input stream on the off-heap memory index to read the index data.
2. The method for an index cache based on off-heap memory according to claim 1, characterized in that after the step of opening an output stream on the off-heap memory index to write index data, the method further comprises: when data is committed, synchronizing the index data into the file system index.
3. A system for an index cache based on off-heap memory, characterized in that the system comprises:
a Lucene startup module, configured to, when Lucene is in the starting state, allocate memory of a specified size for index data in off-heap memory, place it into a memory pool, and then preheat the off-heap cache index, wherein preheating the off-heap cache index specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the off-heap memory index has sufficient capacity, writing the index data from the file system index into the off-heap memory index;
a Lucene indexing module, configured to, when Lucene is in the indexing state, check the capacity of the off-heap memory index and, if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data;
a Lucene search module, configured to, when Lucene is in the searching state, check whether the off-heap memory index contains the index data that currently needs to be read and, if it does, open an input stream on the off-heap memory index to read the index data.
4. The system for an index cache based on off-heap memory according to claim 3, characterized in that the Lucene indexing module is configured to, when Lucene is in the indexing state, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index to write index data and, when data is committed, synchronize the index data into the file system index.
CN201710104402.XA 2017-02-24 2017-02-24 Method and system for an index cache based on off-heap memory Active CN107066527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710104402.XA CN107066527B (en) 2017-02-24 2017-02-24 Method and system for an index cache based on off-heap memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710104402.XA CN107066527B (en) 2017-02-24 2017-02-24 Method and system for an index cache based on off-heap memory

Publications (2)

Publication Number Publication Date
CN107066527A CN107066527A (en) 2017-08-18
CN107066527B true CN107066527B (en) 2019-10-29

Family

ID=59621323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710104402.XA Active CN107066527B (en) 2017-02-24 2017-02-24 Method and system for an index cache based on off-heap memory

Country Status (1)

Country Link
CN (1) CN107066527B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844579B * 2017-11-10 2021-10-26 SF Technology Co., Ltd. Method, system and equipment for optimizing distributed database middleware access
CN110580241B * 2018-05-22 2023-09-01 Microsoft Technology Licensing, LLC Preheating index files
CN108763572B * 2018-06-06 2021-06-22 Hunan Ant Software Co., Ltd. Method and device for realizing Apache Solr read-write separation
CN109101554A * 2018-07-12 2018-12-28 Xiamen ZKTeco Information Technology Co., Ltd. Data caching system, method and terminal for the Java platform
CN110895475B * 2018-09-10 2023-03-31 Shenzhen Intellifusion Technologies Co., Ltd. Search server starting method and device and search server
CN109902032B * 2019-01-31 2021-05-25 Taikang Insurance Group Co., Ltd. Off-heap memory management method, device, medium and electronic equipment
CN113626446B * 2021-10-09 2022-09-20 Alibaba Cloud Computing Co., Ltd. Data storage and search method, device, electronic equipment and medium
CN113608804B * 2021-10-11 2022-01-04 Beijing Huapin Borui Network Technology Co., Ltd. Persistent Java off-heap cache system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779134A * 2011-05-12 2012-11-14 Suzhou Tongcheng Travel Network Technology Co., Ltd. Lucene-based distributed search method
CN102843396A * 2011-06-22 2012-12-26 ZTE Corporation Data writing and reading method and device in a distributed caching system
CN103399915A * 2013-07-31 2013-11-20 Beijing Huayi Interactive Technology Co., Ltd. Optimized reading method for the index files of a search engine
CN106021484A * 2016-05-18 2016-10-12 The 32nd Research Institute of China Electronics Technology Group Corporation Customizable multi-mode big data processing system based on in-memory computing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9171077B2 (en) * 2009-02-27 2015-10-27 International Business Machines Corporation Scaling dynamic authority-based search using materialized subgraphs

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779134A * 2011-05-12 2012-11-14 Suzhou Tongcheng Travel Network Technology Co., Ltd. Lucene-based distributed search method
CN102843396A * 2011-06-22 2012-12-26 ZTE Corporation Data writing and reading method and device in a distributed caching system
CN103399915A * 2013-07-31 2013-11-20 Beijing Huayi Interactive Technology Co., Ltd. Optimized reading method for the index files of a search engine
CN106021484A * 2016-05-18 2016-10-12 The 32nd Research Institute of China Electronics Technology Group Corporation Customizable multi-mode big data processing system based on in-memory computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Merging Lucene in-memory indexes and file indexes ("lucene内存索引和文件索引合并"); mdong; https://www.cnblogs.com/nulisaonian/p/6257309.html; 2017-01-06; pp. 1-2 *

Also Published As

Publication number Publication date
CN107066527A (en) 2017-08-18

Similar Documents

Publication Publication Date Title
CN107066527B (en) Method and system for an index cache based on off-heap memory
CN103856567B (en) Small file storage method based on Hadoop distributed file system
US9418013B2 (en) Selective prefetching for a sectored cache
CN103885728B (en) Disk caching system based on solid-state disks
US8443149B2 (en) Evicting data from a cache via a batch file
CN102016790B (en) Cache coherency protocol in a data processing system
CN106503051B (en) Greedy prefetching data recovery system and recovery method based on metadata classification
US20160019254A1 (en) Tiered data storage architecture
CN104657366B (en) Method, apparatus, and log disaster-tolerance system for writing massive logs into a database
CN103777969B (en) Method and device for deploying server parameters without restarting
CN106227794A (en) Method and apparatus for storing dynamic attribute data in temporal graph data
CN107038206A (en) Method for building an LSM tree, method for reading data from an LSM tree, and server
CN107273522A (en) Data storage system and data invocation method for multi-application scenarios
US9135177B2 (en) Scheme to escalate requests with address conflicts
US9817754B2 (en) Flash memory management
US20120158742A1 (en) Managing documents using weighted prevalence data for statements
CN106648442A (en) Metadata node memory mirroring method and device
US9558123B2 (en) Retrieval hash index
CN111367991B (en) MongoDB data real-time synchronization method and system based on message queue
US20200201690A1 (en) Method, computer program product, and apparatus for acceleration of simultaneous access to shared data
US20070233965A1 (en) Way hint line replacement algorithm for a snoop filter
CN104407990B (en) Disk access method and device
CN105260139A (en) Magnetic disk management method and system
CN109478164A (en) For storing the system and method for being used for the requested information of cache entries transmission
US10180901B2 (en) Apparatus, system and method for managing space in a storage device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant