CN113722072B - Storage system file merging method and device based on intelligent shunting

Storage system file merging method and device based on intelligent shunting

Info

Publication number
CN113722072B
Authority
CN
China
Prior art keywords
water level
front-end service
cache pool
data pool
service pressure
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202111074845.1A
Other languages: Chinese (zh)
Other versions: CN113722072A
Inventor
杨宁
周文明
曹羽中
魏洪锦
Current Assignee (the listed assignee may be inaccurate)
Huarui Index Cloud Technology Shenzhen Co ltd
Original Assignee
Huarui Index Cloud Technology Shenzhen Co ltd
Priority date: 2021-09-14
Filing date: 2021-09-14
Publication date: 2024-02-13
Application filed by Huarui Index Cloud Technology Shenzhen Co ltd filed Critical Huarui Index Cloud Technology Shenzhen Co ltd
Priority to CN202111074845.1A
Publication of CN113722072A (2021-11-30)
Application granted
Publication of CN113722072B (2024-02-13)
Legal status: Active
Anticipated expiration

Classifications

    • G06F9/4806: Program control; multiprogramming arrangements; program initiating or switching; task transfer initiation or dispatching
    • G06F9/5016: Allocation of resources to service a request, the resource being the memory
    • G06F9/505: Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • G06F2209/5011: Indexing scheme relating to G06F9/50; Pool
    • G06F2209/508: Indexing scheme relating to G06F9/50; Monitor
    (All within G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING.)

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of storage, and particularly relates to a storage system file merging method and device based on intelligent shunting. When the cache pool water level is high and the data pool load is moderate, the recent front-end service pressure is compared for similarity with the front-end service pressure of the same historical period. If the similarity is high, the change trend of the historical contemporaneous cache pool water level is predicted, and the prediction result determines whether to split; if the similarity is low, the current change trend of the cache pool water level is predicted, and that prediction result determines whether to split. This prevents system performance bottlenecks and keeps the merging service running smoothly, so that even cheaper SATA SSDs or SAS SSDs can reach NVMe SSD performance, effectively reducing cost. During splitting, the split bandwidth is automatically adjusted according to the set split ratio, ensuring stable front-end service latency under high-pressure scenarios.

Description

Storage system file merging method and device based on intelligent shunting
Technical Field
The invention belongs to the technical field of storage, and particularly relates to a storage system file merging method and device based on intelligent shunting.
Background
Among today's various distributed storage products, storing massive numbers of small files is a recognized technical problem in the industry. There are three main difficulties: (1) storing massive small files frequently generates large numbers of small IOs on the storage medium, so storage performance is low; (2) the actual space a small file occupies on the storage medium is much larger than the file itself, so storage space is wasted; (3) full-text retrieval over massive stored small files performs poorly.
To solve the first two problems, storage vendors in the industry have each introduced their own small-file merging scheme. The mainstream approach places a cache pool (high performance, low capacity) created on SSD storage media in front of a data pool (low performance, high capacity) created on HDD storage media, and migrates data to the data pool after it has been merged in the cache pool. Concrete implementations fall into two main types: online merging schemes and offline merging schemes.
The online merging scheme, shown schematically in fig. 1, can be summarized as follows: after small files are received by the storage system, they are merged in memory, and the merged large file is written into the cache pool; a background asynchronous task then reads the large file out of the cache pool and writes it into the data pool.
The offline merging scheme, shown schematically in fig. 2, can be summarized as follows: small files are written into the cache pool in the order in which they arrive at the storage system; a background asynchronous task then reads multiple small files from the cache pool, merges them into a large file in memory, and writes the large file into the data pool.
In the online merging scheme, data has already been merged into a large file by the time it lands on the SSD storage medium, so write performance is better; in the offline merging scheme, the write-and-merge process produces multiple writes and reads on the SSD medium, so write performance is worse. Although the online scheme therefore has a certain performance advantage over the offline scheme, both suffer when the cache pool capacity is small and the data pool capacity is large, that is, when the total capacity of the SSD cache media is far below that of the HDD main storage media. In particular, mainstream distributed storage systems today rely on large-capacity hard disks and high-density servers, so the total SSD bandwidth is necessarily much smaller than the total HDD bandwidth. The total bandwidth the cache pool can provide is therefore limited, the cache pool easily becomes the performance bottleneck of the whole system, the hardware cannot be used effectively, and front-end services are blocked.
Disclosure of Invention
The invention provides a storage system file merging method and device based on intelligent shunting, to solve the problem that the prior art blocks front-end services.
To solve this technical problem, the technical scheme and its corresponding beneficial effects are as follows:
the invention discloses a storage system file merging method based on intelligent shunting, which comprises the following steps:
1) When small files need to be merged, obtain the cache pool water level and the data pool load, and judge whether the cache pool water level exceeds a water level threshold and whether the data pool load exceeds a load threshold. The data pool load is the utilization rate of the storage media in the data pool; the cache pool water level is the total capacity of dirty data in the cache pool divided by the total capacity of the cache pool; dirty data is data that has been written into the cache pool but not yet migrated to the data pool.
2) When the cache pool water level exceeds the water level threshold and the data pool load does not exceed the load threshold, obtain the recent front-end service pressure and compare its similarity with the front-end service pressure of the same historical period:
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is less than or equal to a similarity threshold, predict the current change trend of the cache pool water level based on the recent cache pool water level, and determine from the prediction result whether to perform splitting or non-splitting processing;
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is greater than the similarity threshold, predict the change trend of the historical contemporaneous cache pool water level based on the historical contemporaneous cache pool water level, and determine from the prediction result whether to perform splitting or non-splitting processing;
The front-end service pressure is the amount of data written into the cache pool per second by front-end services. Non-splitting processing means that small files are placed directly in the cache pool for merging. Splitting processing means that small files are placed into the cache pool and the data pool respectively, according to the set split ratio, for merging.
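As a minimal illustration only (not part of the claimed method), the decision logic of steps 1) and 2) can be sketched in Python as follows; the thresholds, the least-squares trend stand-in, and the Pearson-based similarity stand-in are all assumptions, not values from the patent:

```python
import numpy as np

def similarity(recent, historical):
    """Stand-in similarity: Pearson correlation clamped to [0, 1]."""
    return max(np.corrcoef(recent, historical)[0, 1], 0.0)

def predict_trend(levels):
    """Crude trend stand-in: sign of a least-squares slope over the series."""
    slope = np.polyfit(np.arange(len(levels)), levels, 1)[0]
    return "rising" if slope > 0 else "falling"

def decide_split(cache_level, data_load, recent_pressure, hist_pressure,
                 recent_levels, hist_levels,
                 level_thresh=0.7, load_thresh=0.8, sim_thresh=0.95):
    """Return True to split merging between the cache pool and the data pool."""
    # Step 1): only consider splitting when the cache pool water level is
    # high and the data pool still has headroom.
    if cache_level <= level_thresh or data_load > load_thresh:
        return False
    # Step 2): if recent pressure resembles the same historical period,
    # extrapolate the historical water-level series first.
    if similarity(recent_pressure, hist_pressure) > sim_thresh:
        if predict_trend(hist_levels) == "rising":
            return True
        # Falling historical trend: defer and check the current trend below.
    return predict_trend(recent_levels) == "rising"
```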
The beneficial effects of this technical scheme are as follows: when the cache pool water level is high and the data pool load is moderate, the recent front-end service pressure is compared for similarity with the historical contemporaneous front-end service pressure. If the similarity is high, the change trend of the historical contemporaneous cache pool water level is predicted and the prediction result determines whether to split; if the similarity is low, the current change trend of the cache pool water level is predicted and that prediction result determines whether to split. The method thus splits traffic intelligently by jointly using the similarity between the recent and the historical contemporaneous front-end service pressure, the predicted trend of the historical contemporaneous cache pool water level, and the predicted trend of the current cache pool water level. This prevents system performance bottlenecks, keeps the merging service smooth, stabilizes front-end service latency under high pressure, and allows even cheaper SATA SSDs or SAS SSDs to reach NVMe SSD performance, effectively reducing cost. During splitting, the split bandwidth is automatically adjusted according to the set split ratio, further ensuring stable front-end service latency under high-pressure scenarios.
Further, in step 2), if the predicted change trend of the historical contemporaneous cache pool water level is an upward trend, splitting processing is performed.
Further, in step 2), if the predicted change trend of the historical contemporaneous cache pool water level is a downward trend, splitting is deferred for the moment: the current change trend of the cache pool water level is predicted based on the recent cache pool water level, and the prediction result determines whether to perform splitting or non-splitting processing.
Further, in step 2), if the predicted current change trend of the cache pool water level is an upward trend, splitting processing is performed; if it is a downward trend, non-splitting processing is performed.
Further, in step 1), if the cache pool water level does not exceed the water level threshold, or the data pool load exceeds the load threshold, non-splitting processing is performed.
Further, in order to accurately calculate the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure, a similarity calculation model based on the Pearson correlation coefficient and/or a Euclidean distance algorithm is used for the comparison.
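For example, a similarity model along these lines might blend the Pearson correlation coefficient with a normalized Euclidean distance; the 50/50 blend below is an illustrative assumption, not a value taken from the patent:

```python
import numpy as np

def pressure_similarity(recent, historical):
    """Similarity of two equal-length front-end pressure series, in [0, 1]."""
    recent = np.asarray(recent, dtype=float)
    historical = np.asarray(historical, dtype=float)
    pearson = max(np.corrcoef(recent, historical)[0, 1], 0.0)  # shape similarity
    dist = np.linalg.norm(recent - historical) / len(recent)   # magnitude gap
    closeness = 1.0 / (1.0 + dist)
    # Equal weighting of the two measures is an assumption for illustration.
    return 0.5 * pearson + 0.5 * closeness
```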
Further, in order to accurately predict the current and historical contemporaneous change trends of the cache pool water level, a time-series analysis model is used: it predicts the historical contemporaneous trend from the historical contemporaneous cache pool water level, or the current trend from the recent cache pool water level. The time-series analysis model is trained with historical cache pool water level trend data.
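A time-series prediction of this kind could be sketched with an ARIMA model, for example via statsmodels; the model order (1, 1, 1) and the forecast horizon are assumptions for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def water_level_trend(levels, horizon=12):
    """Forecast a cache-pool water-level series and classify its direction."""
    fit = ARIMA(np.asarray(levels, dtype=float), order=(1, 1, 1)).fit()
    forecast = fit.forecast(steps=horizon)
    return "rising" if forecast[-1] > levels[-1] else "falling"
```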
Further, in order to ensure stable front-end service latency under high-pressure scenarios, the split ratio is calculated using a split evaluation model together with the current front-end service pressure, the current cache pool water level, and the current data pool load. The inputs of the split evaluation model are the products of the front-end service pressure, the cache pool water level, and the data pool load with their corresponding weights; its output includes a split ratio. The split evaluation model is trained on historical front-end service pressure, historical cache pool water level, and historical data pool load, together with the split ratios used at those times.
Further, in order to ensure system stability, before step 1) the method also includes: calculating in real time the maximum front-end service pressure that the system can currently bear, and judging whether the current front-end service pressure exceeds it: if it does, triggering a front-end flow control mechanism to limit the write bandwidth of front-end services; if not, executing step 1).
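As a sketch, this pre-check amounts to a guard executed before step 1); max_pressure below stands in for the real-time calculation, whose formula the text does not specify:

```python
def admit(current_pressure, max_pressure, throttle_frontend):
    """Trigger front-end flow control when pressure exceeds system capacity."""
    if current_pressure > max_pressure:
        throttle_frontend()   # limit the write bandwidth of front-end services
        return False          # do not proceed to step 1)
    return True               # safe to continue with step 1)
```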
The storage system file merging device based on intelligent shunting of the invention comprises a memory and a processor, the processor being configured to execute instructions stored in the memory to implement the storage system file merging method based on intelligent shunting described above, achieving the same beneficial effects as the method.
Drawings
FIG. 1 is a schematic diagram of an online merge scheme of the prior art;
FIG. 2 is a schematic diagram of an offline consolidation scheme of the prior art;
FIG. 3 is a schematic diagram of a file merging system according to the present invention;
FIG. 4 is a schematic diagram of the intelligent shunt module of the present invention;
FIG. 5 is a flow chart of the storage system file merging method based on intelligent shunting of the present invention;
FIG. 6 is a block diagram of the storage system file merging device based on intelligent shunting of the present invention.
Detailed Description
The basic idea of the invention is as follows: the invention uses the data pool to share the small-file merging pressure of the cache pool, so that some small files are merged directly in the data pool without passing through the cache pool; this is the 'splitting' referred to in the following paragraphs. On this basis, when the cache pool water level exceeds the water level threshold and the data pool load does not exceed the load threshold, the method jointly uses the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure, the predicted change trend of the historical contemporaneous cache pool water level, and the predicted current change trend of the cache pool water level to determine whether to perform splitting processing; splitting processing distributes the small files, according to the set split ratio, between merging in the cache pool and merging in the data pool.
The storage system file merging method and the storage system file merging device based on intelligent shunting are described in detail below with reference to the drawings and embodiments.
Method embodiment:
To realize the storage system file merging method based on intelligent shunting, the file merging system shown in fig. 3 is designed. It comprises a service request detection module, a cache pool merging module, a data pool merging module, a small file migration module, a key index monitoring module, an intelligent flow control module, an intelligent shunt module, and a metadata management module, all of which are software modules. The function of each module and the data interactions between the modules are described in detail below.
(1) Service request detection module. This module detects each request in real time and judges, according to a preset strategy, whether files need to be merged.
(2) Cache pool merging module. This module merges small files in the cache pool. If small files need to be merged in the cache pool, the module mounts them into the memory queue corresponding to the cache pool; once enough small files are queued, it merges the small files on the queue into a large file and writes the large file to the SSD storage medium corresponding to the cache pool.
(3) Data pool merging module. This module merges small files in the data pool. If small files need to be merged directly in the data pool, the module mounts them into the memory queue corresponding to the data pool; once enough small files are queued, it merges the small files on the queue into a large file and writes the large file to the HDD storage medium corresponding to the data pool.
(4) Small file migration module. This module periodically reads the large files of the cache pool from the SSD storage medium and writes them to the data pool.
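Purely as an illustration of the queue-based merging these modules describe (the batch size and the pool write interface are assumptions, not details from the patent):

```python
class MergeQueue:
    """Accumulate small files in a memory queue; flush them as one large file."""

    def __init__(self, pool, batch_size=64):
        self.pool = pool              # object exposing write(data: bytes)
        self.batch_size = batch_size
        self.pending = []

    def mount(self, small_file: bytes):
        """Mount a small file onto the queue; merge once enough have queued."""
        self.pending.append(small_file)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Merge the queued small files into one large file and write it out
        (to SSD for the cache pool instance, to HDD for the data pool one)."""
        if self.pending:
            self.pool.write(b"".join(self.pending))
            self.pending.clear()
```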
(5) Key index monitoring module. This module monitors and calculates the key indicators: front-end service pressure, cache pool water level, and data pool load. Front-end service pressure is the amount of data written into the cache pool per second by front-end services, i.e., the data bandwidth. Cache pool water level is the proportion of dirty data in the cache pool, i.e., the capacity of dirty data as a percentage of the total cache pool capacity; dirty data is data just written into the cache pool that has not yet been migrated to the data pool. Data pool load is the utilization of the storage media in the data pool. The indicators calculated by this module are used by the intelligent flow control module and the intelligent shunt module to decide whether and how to split.
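The three indicators can be expressed roughly as follows (the sampled field names are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class KeyIndicators:
    bytes_written_last_sec: int   # front-end writes into the cache pool
    dirty_bytes: int              # written to the cache pool, not yet migrated
    cache_capacity_bytes: int     # total cache pool capacity
    data_media_utilization: float

    @property
    def front_end_pressure(self) -> int:
        return self.bytes_written_last_sec        # i.e. the data bandwidth

    @property
    def cache_water_level(self) -> float:
        return self.dirty_bytes / self.cache_capacity_bytes

    @property
    def data_pool_load(self) -> float:
        return self.data_media_utilization
```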
(6) Intelligent flow control module. Based on the key indicators calculated by the key index monitoring module, this module calculates in real time the maximum service bandwidth the system can bear; if the system's maximum capacity is exceeded, a front-end flow control mechanism is triggered and the write bandwidth of front-end services is limited to keep the system stable.
(7) Intelligent shunt module. Its main function is as follows: when the front-end service pressure is too high and the cache pool reaches its performance bottleneck, the module, guided by the key indicators calculated by the key index monitoring module, diverts part of the front-end service pressure to the data pool, where small files are merged directly without passing through the cache pool. The structure of the intelligent shunt module is shown in fig. 4; it comprises a trend prediction unit, a machine learning unit, a split ratio calculation unit, and a model configuration unit. The function of each unit is described in detail below.
(1) Trend prediction unit. This unit predicts the change trend of the cache pool water level. It adopts a trend prediction algorithm based on time-series analysis, with an ARIMA (autoregressive integrated moving average) model as its core, to analyze the write bandwidth of front-end services: the front-end service pressure is decomposed into a trend part, a periodic part, and a residual sequence, and the analysis of the trend part, together with the data pool load, is used to judge whether splitting is necessary. The periodic part refers to the periodic pattern that the front-end service pressure produces in the cache pool's monitoring indicators; the period may be a day, a week, or a month; for example, the pressure may peak at 8 a.m. at the start of each working day and bottom out at 6 p.m. at its end. The trend part refers to the tendency of the front-end service pressure to rise, fall, or fluctuate within a certain range. The residual sequence is the original training sequence minus the sequence fitted on the training data; the closer it is to a random error distribution (a normal distribution with mean 0), the better the model fit. The unit's input is a time series of the form {time 1, cache pool water level 1}, {time 2, cache pool water level 2}, {time 3, cache pool water level 3}, and so on. Its output is the cache pool water level trend: {rising, time required, confidence}, {falling, time required, confidence}, or {holding, duration, confidence}.
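The decomposition into trend, periodic, and residual parts described here resembles a classical seasonal decomposition; a sketch using statsmodels follows, where the period of 24 samples (one day of hourly data) is an assumption:

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose_pressure(samples, period=24):
    """Split a front-end pressure series into trend, seasonal (periodic),
    and residual components, as the trend prediction unit does before
    judging the water-level trend."""
    parts = seasonal_decompose(pd.Series(samples), model="additive",
                               period=period)
    return parts.trend, parts.seasonal, parts.resid
```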
(2) Machine learning unit. This unit has two functions. The first is to summarize historical data into a split evaluation model. The model's inputs are {front-end service pressure × its weight, cache pool water level × its weight, data pool load × its weight}; its outputs are {split ratio, time required for the cache pool water level to drop below a safety threshold, confidence}. The model is trained on historical front-end service pressure, historical cache pool water level, and historical data pool load, together with the split ratios used at those times. The value of each weight mainly reflects how strongly the factor influences the prediction result and how stable the factor is, stability here meaning the degree of randomness in the factor's changes. In general the weights can be set as follows: 30% for front-end service pressure, 40% for cache pool water level, and 40% for data pool load; these values can of course be adjusted to the actual situation. The second function is to provide a similarity calculation model based on the Pearson correlation coefficient and the Euclidean distance algorithm, used to compare the current front-end service pressure with the historical contemporaneous front-end service pressure as reference data for the splitting decision. Its inputs are two time series: one historical, {historical time 1, historical service pressure 1}, {historical time 2, historical service pressure 2}, {historical time 3, historical service pressure 3}, and so on, and one recent, {recent time 1, recent service pressure 1}, {recent time 2, recent service pressure 2}, {recent time 3, recent service pressure 3}, and so on. Its output is the similarity percentage of the two series; when the similarity is above 95%, the current front-end service pressure is considered very similar to the historical pressure.
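Summarizing historical data into the split evaluation model could, for instance, be a regression fit; scikit-learn and the linear model form are illustrative assumptions, and this pairs with the compute_split_ratio sketch given earlier:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

W_PRESSURE, W_LEVEL, W_LOAD = 0.30, 0.40, 0.40   # weights suggested in the text

def train_split_model(hist_pressure, hist_level, hist_load, hist_ratio):
    """Fit historical (pressure, water level, load) samples, each scaled by
    its weight, against the split ratios that were used at the time."""
    X = np.column_stack([
        np.asarray(hist_pressure, dtype=float) * W_PRESSURE,
        np.asarray(hist_level, dtype=float) * W_LEVEL,
        np.asarray(hist_load, dtype=float) * W_LOAD,
    ])
    return LinearRegression().fit(X, np.asarray(hist_ratio, dtype=float))
```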
(3) Split ratio calculation unit. This unit automatically calculates an optimal split ratio from the analysis results provided by the trend prediction unit, the machine learning unit, and the model configuration unit. Its calculation flow is as follows: first, using the machine learning unit's similarity calculation, judge whether the current service pressure is similar to the pressure in some historical period; if the similarity is high, split according to the historical split ratio first; if not, judge from the trend prediction unit's result whether the pressure will rise in the future, and if it will, split using the ratio calculated by the machine learning unit. The detailed flow is shown in fig. 5.
(4) Model configuration unit. This unit provides a way to manually tune parameters of the trend prediction unit and the machine learning unit; it also supports manually marking the business peaks and troughs of holidays, effectively supplementing the machine learning unit's algorithm.
(8) Metadata management module. This module records key information such as the position and size of each small file, as well as the mapping from small files to large files, the positions of the large files, and the hole information left in a large file after a small file in it is deleted.
Based on the file merging system introduced above, the storage system file merging method based on intelligent shunting can be realized. The overall flow is described below with reference to fig. 5.
Step one: while the system is running, the key indicators (front-end service pressure, cache pool water level, and data pool load) are monitored and calculated in real time by the key index monitoring module. From these indicators, the intelligent flow control module calculates the maximum service bandwidth the system can currently bear and judges: if the current front-end service pressure exceeds the maximum front-end service pressure, the front-end flow control mechanism is triggered and the write bandwidth of front-end services is limited to keep the system stable; if it does not, the service request detection module inspects each request in real time and, whenever a request arrives, judges according to the preset strategy whether the files need to be merged; if merging is needed, execute step two, otherwise end.
Step two: check the cache pool water level calculated by the key index monitoring module against the water level threshold:
if the cache pool water level does not exceed the water level threshold, perform non-splitting processing and execute step eight;
if the cache pool water level exceeds the water level threshold, execute step three.
Step three: check the data pool load calculated by the key index monitoring module against the load threshold:
if the data pool load exceeds the load threshold, perform non-splitting processing and execute step eight;
if the data pool load does not exceed the load threshold, execute step four.
Step four: using the machine learning unit in the intelligent shunt module, judge whether the recent front-end service pressure (the front-end service pressure over the last few periods) is similar to the historical contemporaneous front-end service pressure:
if the machine learning unit outputs dissimilar (the similarity is less than or equal to the similarity threshold), execute step five;
if the machine learning unit outputs similar (the similarity is greater than the similarity threshold), execute step six.
Step five: when the recent front-end service pressure is dissimilar to the historical contemporaneous front-end service pressure, the trend prediction unit predicts the current change trend of the cache pool water level from the recent cache pool water level data:
if the trend prediction unit predicts a rising trend, perform splitting processing and execute step nine;
if the trend prediction unit predicts a falling trend, perform non-splitting processing and execute step eight.
Step six: when the recent front-end service pressure is similar to the historical contemporaneous front-end service pressure, the trend prediction unit predicts the change trend of the historical contemporaneous cache pool water level from the historical contemporaneous cache pool water level data:
if the trend prediction unit predicts a rising trend, perform splitting processing and execute step nine;
if the trend prediction unit predicts a falling trend, do not split for now and execute step seven.
Step seven: the trend prediction unit predicts the current change trend of the cache pool water level from the recent cache pool water level data:
if the trend prediction unit predicts a rising trend, perform splitting processing and execute step nine;
if the trend prediction unit predicts a falling trend, perform non-splitting processing and execute step eight.
Step eight: perform non-splitting processing. The cache pool merging module mounts the small files into the memory queue corresponding to the cache pool; once enough small files are queued, it merges them into a large file and writes it to the SSD storage medium corresponding to the cache pool, after which the small file migration module periodically reads the large files of the cache pool and writes them to the data pool.
Step nine: perform splitting processing. The machine learning unit first calculates the split ratio: the products of the front-end service pressure, the cache pool water level, and the data pool load with their corresponding weights are fed into the split evaluation model, which outputs the split ratio. According to this ratio, one part of the small files is merged through the cache pool as in step eight, while the other part is mounted by the data pool merging module into the memory queue corresponding to the data pool; once enough small files are queued, they are merged into a large file and written to the HDD storage medium corresponding to the data pool.
This completes the storage system file merging method based on intelligent shunting. Overall, the file merging system, and the merging method realized with it, have the following characteristics:
1) An intelligent shunt module is provided. Its trend prediction unit can predict the water level change trend; its machine learning unit can calculate the split ratio, so that splitting can be performed according to the calculated ratio, and can also perform the similarity calculation that compares the current front-end service pressure with the historical contemporaneous front-end service pressure. Using these data, intelligent splitting is achieved and the split bandwidth ratio is adjusted automatically, ensuring stable front-end service latency under high-pressure scenarios.
2) With this splitting method, even cheaper SATA SSDs or SAS SSDs can reach NVMe SSD performance, effectively reducing cost and fully exploiting the hardware's capability.
Device embodiment:
An embodiment of the storage system file merging device based on intelligent shunting of the invention is shown in fig. 6. It comprises a memory, a processor, and an internal bus; the processor and the memory communicate and exchange data over the internal bus. The memory stores at least one software functional module. By running the software programs and modules stored in the memory, the processor performs the functional applications and data processing that implement the storage system file merging method based on intelligent shunting of the method embodiment above; that is, a program instructs the relevant hardware to carry out the steps of the method embodiment.
The processor may be a microprocessor (MCU), a programmable logic device (FPGA), or another processing device. The memory may be any memory that stores information electrically, such as RAM or ROM; any device that stores information magnetically, such as a hard disk, floppy disk, magnetic tape, magnetic core memory, bubble memory, or USB drive; any memory that stores information optically, such as a CD or DVD; or, of course, another type of memory, such as quantum memory or graphene memory.

Claims (10)

1. A storage system file merging method based on intelligent shunting, characterized by comprising the following steps:
1) when small files need to be merged, obtaining the cache pool water level and the data pool load, and judging whether the cache pool water level exceeds a water level threshold and whether the data pool load exceeds a load threshold; the data pool load being the utilization rate of the storage media in the data pool, the cache pool water level being the total capacity of dirty data in the cache pool divided by the total capacity of the cache pool, and dirty data being data written into the cache pool but not yet migrated to the data pool;
2) when the cache pool water level exceeds the water level threshold and the data pool load does not exceed the load threshold, obtaining the recent front-end service pressure and comparing its similarity with the front-end service pressure of the same historical period:
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is less than or equal to a similarity threshold, predicting the current change trend of the cache pool water level based on the recent cache pool water level, and determining from the prediction result whether to perform splitting or non-splitting processing;
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is greater than the similarity threshold, predicting the change trend of the historical contemporaneous cache pool water level based on the historical contemporaneous cache pool water level, and determining from the prediction result whether to perform splitting or non-splitting processing;
the front-end service pressure being the amount of data written into the cache pool per second by front-end services; non-splitting processing meaning that small files are placed directly in the cache pool for merging; splitting processing meaning that, according to the set split ratio, one part of the small files is placed in the cache pool for merging while the other part is merged directly in the data pool without passing through the cache pool.
2. The storage system file merging method based on intelligent shunting according to claim 1, characterized in that in step 2), if the predicted change trend of the historical contemporaneous cache pool water level is an upward trend, splitting processing is performed.
3. The storage system file merging method based on intelligent shunting according to claim 1, characterized in that in step 2), if the predicted change trend of the historical contemporaneous cache pool water level is a downward trend, splitting is deferred for the moment: the current change trend of the cache pool water level is predicted based on the recent cache pool water level, and the prediction result determines whether to perform splitting or non-splitting processing.
4. The storage system file merging method based on intelligent shunting according to claim 1 or 3, characterized in that in step 2), if the predicted current change trend of the cache pool water level is an upward trend, splitting processing is performed; if it is a downward trend, non-splitting processing is performed.
5. The storage system file merging method based on intelligent shunting according to claim 1, characterized in that in step 1), if the cache pool water level does not exceed the water level threshold or the data pool load exceeds the load threshold, non-splitting processing is performed.
6. The storage system file merging method based on intelligent shunting according to claim 1, characterized in that a similarity calculation model based on the Pearson correlation coefficient and/or a Euclidean distance algorithm is used to compare the similarity of the recent front-end service pressure with the historical contemporaneous front-end service pressure.
7. The storage system file merging method based on intelligent shunting according to claim 1, characterized in that a time-series analysis model is used to predict the change trend of the historical contemporaneous cache pool water level based on the historical contemporaneous cache pool water level, or to predict the current change trend of the cache pool water level based on the recent cache pool water level; the time-series analysis model is trained with historical cache pool water level trend data.
8. The storage system file merging method based on intelligent shunting according to claim 1, characterized in that the split ratio is calculated using a split evaluation model together with the current front-end service pressure, the current cache pool water level, and the current data pool load; the inputs of the split evaluation model are the products of the front-end service pressure, the cache pool water level, and the data pool load with their corresponding weights, its output includes a split ratio, and the split evaluation model is trained on historical front-end service pressure, historical cache pool water level, and historical data pool load, together with the split ratios used at those times.
9. The storage system file merging method based on intelligent shunting according to claim 1, characterized by further comprising, before step 1): calculating in real time the maximum front-end service pressure that can currently be borne, and judging whether the current front-end service pressure exceeds it: if it does, triggering a front-end flow control mechanism to limit the write bandwidth of front-end services; if not, executing step 1).
10. A storage system file merging device based on intelligent shunting, characterized by comprising a memory and a processor, the processor being configured to execute instructions stored in the memory to implement the storage system file merging method based on intelligent shunting according to any one of claims 1-9.
CN202111074845.1A 2021-09-14 2021-09-14 Storage system file merging method and device based on intelligent shunting Active CN113722072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111074845.1A CN113722072B (en) 2021-09-14 2021-09-14 Storage system file merging method and device based on intelligent shunting

Publications (2)

Publication Number Publication Date
CN113722072A CN113722072A (en) 2021-11-30
CN113722072B (en) 2024-02-13

Family

ID=78683638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111074845.1A Active CN113722072B (en) 2021-09-14 2021-09-14 Storage system file merging method and device based on intelligent shunting

Country Status (1)

Country Link
CN (1) CN113722072B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114465957B (en) * 2021-12-29 2024-03-08 天翼云科技有限公司 Data writing method and device
CN117648297B (en) * 2024-01-30 2024-06-11 中国人民解放军国防科技大学 Method, system, equipment and medium for offline merging of small files based on object storage

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083344A1 (en) * 2007-09-26 2009-03-26 Hitachi, Ltd. Computer system, management computer, and file management method for file consolidation
CN108595567A (en) * 2018-04-13 2018-09-28 郑州云海信息技术有限公司 A kind of merging method of small documents, device, equipment and readable storage medium storing program for executing
CN112631521A (en) * 2020-12-25 2021-04-09 苏州浪潮智能科技有限公司 Method, system, equipment and medium for controlling water level of cache pool

Also Published As

Publication number Publication date
CN113722072A (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20221011
Address after: 518071 4011, Block A, Zhongguan Times Square, No. 4168, Liuxian Avenue, Pingshan Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong
Applicant after: Huarui Index Cloud Technology (Shenzhen) Co.,Ltd.
Address before: 471399 No. 1, Huali electronic technology Animation Industrial Park, Hebin sub district office, Yichuan County, Luoyang City, Henan Province
Applicant before: Huarui index cloud (Henan) Technology Co.,Ltd.
GR01 Patent grant