CN107066527B - Method and system for caching an index in off-heap memory - Google Patents
Method and system for caching an index in off-heap memory
- Publication number
- CN107066527B CN107066527B CN201710104402.XA CN201710104402A CN107066527B CN 107066527 B CN107066527 B CN 107066527B CN 201710104402 A CN201710104402 A CN 201710104402A CN 107066527 B CN107066527 B CN 107066527B
- Authority
- CN
- China
- Prior art keywords
- index
- off-heap
- memory
- lucene
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a method and system for caching an index in off-heap memory. The method of the invention comprises the following steps: when Lucene is starting, allocate off-heap memory of a specified size for the index data, place it into a memory pool, and warm up the off-heap index cache; when Lucene is indexing, check the capacity of the off-heap memory index, and if the capacity meets the requirement, open an output stream on the off-heap memory index and write the index data; when Lucene is searching, check whether the off-heap memory index contains the index data to be read, and if so, open an input stream on the off-heap memory index to read it. The system of the invention corresponds to the method. The invention largely eliminates the performance bottleneck caused by disk I/O while still persisting the data; at the same time, segment merging is not slowed down by disk I/O, so real-time writes are not delayed.
Description
Technical field
The invention belongs to the field of computer information storage and indexing, and in particular relates to a method and system for caching an Apache Lucene index in off-heap memory.
Background technique
Lucene indexes fall into three types: on-heap memory indexes, file system indexes, and HDFS indexes. An on-heap memory index is built on the JVM heap; it never touches the disk and therefore has no disk I/O bottleneck, but it cannot persist data, carries a risk of data loss, and suffers severe GC pauses when the data volume grows large. File system and HDFS indexes can persist data, but they hit disk I/O bottlenecks at the scale of hundreds of millions of records.
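For background illustration only, the distinction the invention builds on is visible directly in the JVM's standard buffer API: a buffer from `ByteBuffer.allocate` lives on the GC-managed heap, while `allocateDirect` reserves native (off-heap) memory that the collector does not traverse. This is a generic JDK example, not code from the patent; sizes are arbitrary.

```java
import java.nio.ByteBuffer;

public class HeapVsOffHeap {
    public static void main(String[] args) {
        ByteBuffer onHeap  = ByteBuffer.allocate(1 << 20);       // JVM heap, scanned by the GC
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1 << 20); // native memory, outside the GC
        System.out.println(onHeap.isDirect());   // false
        System.out.println(offHeap.isDirect());  // true
    }
}
```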
Summary of the invention
The purpose of the present invention is to provide a method and system for caching an index in off-heap memory, intended to overcome the shortcomings of the prior art described in the background above.
The invention is realized as follows: a method for caching an index in off-heap memory, comprising the following steps:
when Lucene is starting, allocating off-heap memory of a specified size for the index data, placing it into a memory pool, and warming up the off-heap index cache;
when Lucene is indexing, checking the capacity of the off-heap memory index and, if the capacity meets the requirement, opening an output stream on the off-heap memory index and writing the index data;
when Lucene is searching, checking whether the off-heap memory index contains the index data to be read and, if it does, opening an input stream on the off-heap memory index to read the index data.
Preferably, warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index.
Preferably, after the step of opening an output stream on the off-heap memory index to write the index data, the method further comprises: when the data is committed, synchronizing the index data into the file system index.
The invention further discloses a system for caching an index in off-heap memory, the system comprising:
a Lucene startup module for, when Lucene is starting, allocating off-heap memory of a specified size for the index data, placing it into a memory pool, and warming up the off-heap index cache;
a Lucene indexing module for, when Lucene is indexing, checking the capacity of the off-heap memory index and, if the capacity meets the requirement, opening an output stream on the off-heap memory index to write the index data;
a Lucene search module for, when Lucene is searching, checking whether the off-heap memory index contains the index data to be read and, if so, opening an input stream on the off-heap memory index to read the index data.
Preferably, in the Lucene startup module, warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index.
Preferably, the Lucene indexing module is further configured to, when Lucene is indexing, check the capacity of the off-heap memory index, open an output stream on the off-heap memory index to write the index data if the capacity meets the requirement, and synchronize the index data into the file system index when the data is committed.
Compared with the prior art, the invention has the following advantages: when the off-heap memory capacity is large enough, all reads and writes happen in memory without touching the disk, which largely eliminates the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can still be synchronized into the file system index, so the invention both avoids the disk I/O bottleneck and persists the data. Lucene segment merging also runs in memory, so merging is not slowed down by disk I/O and real-time writes are not delayed.
Detailed description of the invention
Fig. 1 is a flow chart of one embodiment of the method for caching an index in off-heap memory according to the present invention;
Fig. 2 is a flow chart of another embodiment of the method for caching an index in off-heap memory according to the present invention;
Fig. 3 is a structural diagram of one embodiment of the system for caching an index in off-heap memory according to the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.
Referring to Fig. 1, a first embodiment of the invention discloses a method for caching an index in off-heap memory. When the off-heap memory capacity is large enough, all reads and writes happen in memory without touching the disk, which largely eliminates the performance bottleneck caused by disk I/O. The method comprises:
S1: when Lucene is starting, allocate off-heap memory of a specified size for the index data, place it into a memory pool, and warm up the off-heap index cache;
S2: when Lucene is indexing, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index and write the index data;
S3: when Lucene is searching, check whether the off-heap memory index contains the index data to be read; if it does, open an input stream on the off-heap memory index to read the index data.
In this embodiment, Lucene refers to Apache Lucene, an open-source search engine project of the Apache Software Foundation.
As described in step S1, warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index.
As described in step S1, on startup Lucene requests a block of off-heap memory of a specified size, divides it into unit-sized blocks, and places the blocks into a memory pool. To warm up the off-heap index cache, a thread is started that opens an input stream on the file system index and, if the off-heap memory index has enough capacity, writes the data into the off-heap memory index.
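The startup step above can be sketched in Java, assuming `ByteBuffer.allocateDirect` as the off-heap allocation primitive: one native region of the specified size is reserved, sliced into unit-sized blocks, and the blocks queued as the memory pool. The class and method names are illustrative; the patent does not name an implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class OffHeapBlockPool {
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    public OffHeapBlockPool(int totalBytes, int blockBytes) {
        // One native allocation; the GC never scans this region.
        ByteBuffer region = ByteBuffer.allocateDirect(totalBytes);
        for (int off = 0; off + blockBytes <= totalBytes; off += blockBytes) {
            region.limit(off + blockBytes);
            region.position(off);
            free.add(region.slice()); // each slice is one unit-sized pool block
        }
    }

    public ByteBuffer acquire() { return free.poll(); } // null when the pool is empty

    public void release(ByteBuffer block) {
        block.clear();
        free.add(block);
    }

    public int freeBlocks() { return free.size(); }
}
```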
As described in step S2, during indexing, if the off-heap memory index has enough capacity, an output stream is opened on the off-heap memory index and the index data is written into it; if the off-heap memory is insufficient, an output stream is opened on the file system index instead and the index data is written there.
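The write-path decision of step S2 can be sketched as follows. The `ByteArrayOutputStream` is an illustrative stand-in for the file system index, and all names here are assumptions rather than the patent's own API.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class WritePath {

    // Minimal OutputStream view over a direct (off-heap) buffer.
    static OutputStream offHeapStream(ByteBuffer block) {
        return new OutputStream() {
            @Override public void write(int b) { block.put((byte) b); }
        };
    }

    // Capacity check: write off-heap when the data fits, else fall back to disk.
    static OutputStream route(ByteBuffer offHeapBlock, ByteArrayOutputStream fsIndex,
                              int incomingBytes) {
        return incomingBytes <= offHeapBlock.remaining()
                ? offHeapStream(offHeapBlock)   // enough room: stream into off-heap memory
                : fsIndex;                      // otherwise: stream into the file system index
    }
}
```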
As described in step S3, during a search, if the off-heap memory index holds the index file to be read, an input stream is opened on the off-heap memory index; otherwise an input stream is opened on the file system index to read the index data.
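The read path of step S3 can be sketched as follows, with two maps standing in for the off-heap and file-system index directories; all names are illustrative. A lookup in the off-heap index wins on a hit, and the file-system copy is the fallback.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class ReadPath {
    public final Map<String, ByteBuffer> offHeapIndex = new HashMap<>();   // name -> flipped buffer
    public final Map<String, byte[]> fileSystemIndex = new HashMap<>();

    public InputStream open(String fileName) {
        ByteBuffer hit = offHeapIndex.get(fileName);
        if (hit != null) {
            ByteBuffer view = hit.duplicate(); // independent cursor over the same off-heap bytes
            return new InputStream() {
                @Override public int read() {
                    return view.hasRemaining() ? (view.get() & 0xFF) : -1;
                }
            };
        }
        byte[] onDisk = fileSystemIndex.get(fileName);
        return onDisk == null ? null : new ByteArrayInputStream(onDisk);
    }
}
```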
The invention implements the Lucene memory index in off-heap memory, which eliminates long GC pauses. In addition, it implements a composite index that internally holds an off-heap memory index and a file system index: the off-heap memory index speeds up Lucene searches, while the file system index guarantees that the data is persisted.
Furthermore, when the off-heap memory capacity is large enough, all reads and writes happen in memory without touching the disk, which largely eliminates the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can still be synchronized into the file system index, so the invention both avoids the disk I/O bottleneck and persists the data. Lucene segment merging also runs in memory, so merging is not slowed down by disk I/O and real-time writes are not delayed.
Referring to Fig. 2, the invention discloses another embodiment of the method for caching an index in off-heap memory, comprising:
S1: when Lucene is starting, allocate off-heap memory of a specified size for the index data, place it into a memory pool, and warm up the off-heap index cache;
S2: when Lucene is indexing, check the capacity of the off-heap memory index; if the capacity meets the requirement, open an output stream on the off-heap memory index and write the index data;
S20: when the data is committed, synchronize the index data into the file system index;
S3: when Lucene is searching, check whether the off-heap memory index contains the index data to be read; if it does, open an input stream on the off-heap memory index to read the index data.
As described in step S1, warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index.
As described in step S1, on startup Lucene requests a block of off-heap memory of a specified size, divides it into unit-sized blocks, and places the blocks into a memory pool. To warm up the off-heap index cache, a thread is started that opens an input stream on the file system index and, if the off-heap memory index has enough capacity, writes the data into the off-heap memory index.
As described in step S2, during indexing, if the off-heap memory index has enough capacity, an output stream is opened on the off-heap memory index and the index data is written into it; if the off-heap memory is insufficient, an output stream is opened on the file system index instead and the index data is written there.
As described in step S20, after the index data has been written into the off-heap memory index, the index files in the off-heap memory index are synchronized into the file system index when the data is committed.
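A sketch of this commit-time synchronization, assuming each off-heap index file is held in a direct `ByteBuffer`; `FileChannel` can write such a buffer to disk without an intermediate on-heap copy. Method and parameter names are illustrative, not taken from the patent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CommitSync {
    public static void syncToFileSystem(String fileName, ByteBuffer offHeapFile, Path fsIndexDir)
            throws IOException {
        ByteBuffer view = offHeapFile.duplicate(); // leave the cached buffer's cursor untouched
        try (FileChannel ch = FileChannel.open(fsIndexDir.resolve(fileName),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (view.hasRemaining()) {
                ch.write(view);
            }
            ch.force(true); // flush to disk: this is the persistence guarantee
        }
    }
}
```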
As described in step S3, during a search, if the off-heap memory index holds the index file to be read, an input stream is opened on the off-heap memory index; otherwise an input stream is opened on the file system index to read the index data.
The invention implements the Lucene memory index in off-heap memory, which eliminates long GC pauses. In addition, it implements a composite index that internally holds an off-heap memory index and a file system index: the off-heap memory index speeds up Lucene searches, while the file system index guarantees that the data is persisted.
Furthermore, when the off-heap memory capacity is large enough, all reads and writes happen in memory without touching the disk, which largely eliminates the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can still be synchronized into the file system index, so the invention both avoids the disk I/O bottleneck and persists the data. Lucene segment merging also runs in memory, so merging is not slowed down by disk I/O and real-time writes are not delayed.
Referring to Fig. 3, the invention further discloses a system for caching an index in off-heap memory, the system comprising:
a Lucene startup module 1 for, when Lucene is starting, allocating off-heap memory of a specified size for the index data, placing it into a memory pool, and warming up the off-heap index cache;
a Lucene indexing module 2 for, when Lucene is indexing, checking the capacity of the off-heap memory index and, if the capacity meets the requirement, opening an output stream on the off-heap memory index to write the index data;
a Lucene search module 3 for, when Lucene is searching, checking whether the off-heap memory index contains the index data to be read and, if so, opening an input stream on the off-heap memory index to read the index data.
In this embodiment, Lucene refers to Apache Lucene, an open-source search engine project of the Apache Software Foundation.
In the Lucene startup module 1, warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index.
In the Lucene startup module 1, on startup Lucene requests a block of off-heap memory of a specified size, divides it into unit-sized blocks, and places the blocks into a memory pool. To warm up the off-heap index cache, a thread is started that opens an input stream on the file system index and, if the off-heap memory index has enough capacity, writes the data into the off-heap memory index.
In the Lucene indexing module 2, during indexing, if the off-heap memory index has enough capacity, an output stream is opened on the off-heap memory index and the index data is written into it; if the off-heap memory is insufficient, an output stream is opened on the file system index instead and the index data is written there.
In the Lucene search module 3, during a search, if the off-heap memory index holds the index file to be read, an input stream is opened on the off-heap memory index; otherwise an input stream is opened on the file system index to read the index data.
More specifically, in an embodiment of the invention, the Lucene indexing module 2 is configured to, when Lucene is indexing, check the capacity of the off-heap memory index, open an output stream on the off-heap memory index to write the index data if the capacity meets the requirement, and synchronize the index data into the file system index when the data is committed.
The invention implements the Lucene memory index in off-heap memory, which eliminates long GC pauses. In addition, it implements a composite index that internally holds an off-heap memory index and a file system index: the off-heap memory index speeds up Lucene searches, while the file system index guarantees that the data is persisted.
Furthermore, when the off-heap memory capacity is large enough, all reads and writes happen in memory without touching the disk, which largely eliminates the performance bottleneck caused by disk I/O; meanwhile the index files in the off-heap memory index can still be synchronized into the file system index, so the invention both avoids the disk I/O bottleneck and persists the data. Lucene segment merging also runs in memory, so merging is not slowed down by disk I/O and real-time writes are not delayed.
The above describes only preferred embodiments of the present invention and does not limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (4)
1. A method for caching an index in off-heap memory, characterized in that the method comprises the following steps:
when Lucene is starting, allocating off-heap memory of a specified size for the index data, placing it into a memory pool, and warming up the off-heap index cache, wherein warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index;
when Lucene is indexing, checking the capacity of the off-heap memory index and, if the capacity meets the requirement, opening an output stream on the off-heap memory index and writing the index data;
when Lucene is searching, checking whether the off-heap memory index contains the index data to be read and, if so, opening an input stream on the off-heap memory index to read the index data.
2. The method for caching an index in off-heap memory according to claim 1, characterized in that after the step of opening an output stream on the off-heap memory index to write the index data, the method further comprises: when the data is committed, synchronizing the index data into the file system index.
3. A system for caching an index in off-heap memory, characterized in that the system comprises:
a Lucene startup module for, when Lucene is starting, allocating off-heap memory of a specified size for the index data, placing it into a memory pool, and warming up the off-heap index cache, wherein warming up the off-heap index cache specifically comprises: opening an input stream on the file system index, checking the capacity of the off-heap memory index, and, if the capacity meets the requirement, writing the index data from the file system index into the off-heap memory index;
a Lucene indexing module for, when Lucene is indexing, checking the capacity of the off-heap memory index and, if the capacity meets the requirement, opening an output stream on the off-heap memory index to write the index data;
a Lucene search module for, when Lucene is searching, checking whether the off-heap memory index contains the index data to be read and, if so, opening an input stream on the off-heap memory index to read the index data.
4. The system for caching an index in off-heap memory according to claim 3, characterized in that the Lucene indexing module is further configured to, when Lucene is indexing, check the capacity of the off-heap memory index, open an output stream on the off-heap memory index to write the index data if the capacity meets the requirement, and synchronize the index data into the file system index when the data is committed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710104402.XA CN107066527B (en) | 2017-02-24 | 2017-02-24 | Method and system for caching an index in off-heap memory
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710104402.XA CN107066527B (en) | 2017-02-24 | 2017-02-24 | Method and system for caching an index in off-heap memory
Publications (2)
Publication Number | Publication Date |
---|---|
CN107066527A CN107066527A (en) | 2017-08-18 |
CN107066527B true CN107066527B (en) | 2019-10-29 |
Family
ID=59621323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710104402.XA Active CN107066527B (en) | Method and system for caching an index in off-heap memory
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107066527B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107844579B (en) * | 2017-11-10 | 2021-10-26 | 顺丰科技有限公司 | Method, system and equipment for optimizing distributed database middleware access |
CN110580241B (en) * | 2018-05-22 | 2023-09-01 | 微软技术许可有限责任公司 | Preheating index files |
CN108763572B (en) * | 2018-06-06 | 2021-06-22 | 湖南蚁坊软件股份有限公司 | Method and device for realizing Apache Solr read-write separation |
CN109101554A (en) * | 2018-07-12 | 2018-12-28 | 厦门中控智慧信息技术有限公司 | For the data buffering system of JAVA platform, method and terminal |
CN110895475B (en) * | 2018-09-10 | 2023-03-31 | 深圳云天励飞技术有限公司 | Search server starting method and device and search server |
CN109902032B (en) * | 2019-01-31 | 2021-05-25 | 泰康保险集团股份有限公司 | Out-of-heap memory management method, device, medium and electronic equipment |
CN113626446B (en) * | 2021-10-09 | 2022-09-20 | 阿里云计算有限公司 | Data storage and search method, device, electronic equipment and medium |
CN113608804B (en) * | 2021-10-11 | 2022-01-04 | 北京华品博睿网络技术有限公司 | Persistent Java off-heap cache system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102779134A (en) * | 2011-05-12 | 2012-11-14 | 苏州同程旅游网络科技有限公司 | Lucene-based distributed search method |
CN102843396A (en) * | 2011-06-22 | 2012-12-26 | 中兴通讯股份有限公司 | Data writing and reading method and device in distributed caching system |
CN103399915A (en) * | 2013-07-31 | 2013-11-20 | 北京华易互动科技有限公司 | Optimal reading method for index file of search engine |
CN106021484A (en) * | 2016-05-18 | 2016-10-12 | 中国电子科技集团公司第三十二研究所 | Customizable multi-mode big data processing system based on memory calculation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9171077B2 (en) * | 2009-02-27 | 2015-10-27 | International Business Machines Corporation | Scaling dynamic authority-based search using materialized subgraphs |
-
2017
- 2017-02-24 CN CN201710104402.XA patent/CN107066527B/en active Active
Non-Patent Citations (1)
Title |
---|
Merging Lucene in-memory and file indexes; mdong; 《https://www.cnblogs.com/nulisaonian/p/6257309.html》; 2017-01-06; pages 1-2 *
Also Published As
Publication number | Publication date |
---|---|
CN107066527A (en) | 2017-08-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||