CN109634914B - Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files


Info

Publication number
CN109634914B
Authority
CN
China
Prior art keywords
files
talkback voice
talkback
small
small files
Prior art date
Legal status
Active
Application number
CN201811390509.6A
Other languages
Chinese (zh)
Other versions
CN109634914A (en)
Inventor
方国栋
张育钊
袁科
刘昊天
张鑫
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN201811390509.6A priority Critical patent/CN109634914B/en
Publication of CN109634914A publication Critical patent/CN109634914A/en
Application granted granted Critical
Publication of CN109634914B publication Critical patent/CN109634914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files, which comprises: classifying the talkback voice small files; sorting each class of talkback voice small files by file size; for each class, selecting, in the sorted order, the maximum number of files that can be accommodated by an integral multiple of the HDFS block space, and merging and storing the selected talkback voice small files; setting classification levels for the talkback voice small files remaining after selection, classifying them according to the set levels, and merging and storing the classified files; and establishing a bifurcation index mechanism for the stored merged files that records the information of each talkback voice small file within them. The invention has the advantages that the number of occupied block spaces is reduced, which lowers the excessive memory usage of the NameNode when maintaining metadata, and that the size of the metadata information is reduced, which accelerates reading.

Description

Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files
Technical Field
The invention relates to the field of performance optimization of distributed file systems, in particular to an optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files.
Background
With the rapid development of Internet technology, the communication industry has also changed dramatically. Network intercoms based on the Internet Protocol (IP) are used more and more widely, and frequent use of the intercom by users produces an ever larger volume of talkback voice small files, so how to manage these small files effectively has become an urgent problem for Internet intercom service providers.
The Hadoop Distributed File System (HDFS) is a core component of the Hadoop distributed computing framework of the Apache open-source organization. Modeled on Google's GFS (Google File System) and implemented as open-source Java, it serves as a reference for institutions and companies building cloud storage solutions. Since its release, HDFS has been widely used to store massive data at Internet companies such as Facebook, Yahoo, Alibaba, Tencent, and Baidu. It is designed to run stably on low-cost commodity servers and also offers high fault tolerance, good scalability, and other advantages.
HDFS adopts a master-slave architecture consisting of a NameNode and a large number of DataNode nodes. The NameNode is the core of HDFS: it maintains the metadata information of files and coordinates and manages all DataNode nodes, while the DataNodes store the actual file data. After the Hadoop cluster starts, all metadata information is loaded into the memory of the NameNode. When a client accesses HDFS, it first obtains the metadata information of the relevant file from the NameNode, then finds the DataNode that actually stores the file according to that metadata, and finally retrieves the requested file from the DataNode.
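For context, a minimal sketch of this read flow using the standard HDFS Java client API is given below; the NameNode address and file path are placeholders, not values taken from the invention:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // placeholder NameNode address

        // FileSystem.get contacts the NameNode; fs.open fetches the file's block
        // metadata from the NameNode, and reads then stream from the DataNodes.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/talkback/example.amr"))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                // process n bytes of voice data
            }
        }
    }
}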
This master-slave architecture has several problems. First, each file corresponds to a piece of metadata information occupying about 150 bytes; as the number of small files stored in HDFS grows, the metadata the NameNode must maintain also grows sharply and consumes large amounts of its memory, but the NameNode's memory is limited, so it eventually becomes a performance bottleneck. Second, every write of a small file requires requesting block allocation from the NameNode, and every read requires requesting metadata from it, so frequent reads and writes of small files degrade the NameNode's performance and may even cause network congestion. Third, each small file is small, yet transferring an actual file always involves the three steps of requesting file metadata, locating the data block, and establishing a connection between the client and the DataNode, so the time spent reading or writing a small file may be shorter than the time spent establishing the network connection, which reduces the efficiency of HDFS.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files, so as to solve the problems caused when the conventional HDFS stores a large number of small files, such as excessive memory usage of the NameNode and performance degradation from reading and writing small files.
The invention is realized by the following steps: a method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files comprises the following steps:
step S1, classifying the talkback voice small files;
step S2, after the classification is finished, sequencing each class of talkback voice small files in sequence according to the size of the files;
step S3, selecting the maximum number of files which can be accommodated by the integral multiple of the HDFS block space according to the sorting sequence for each type of small talkback voice files, and merging and storing the selected small talkback voice files into the HDFS block space;
step S4, setting classification levels for the remaining talkback voice small files after selection, classifying the remaining talkback voice small files according to the set classification levels, and merging and storing the classified talkback voice small files;
and step S5, establishing a bifurcation index mechanism for the stored merged file, and recording the information of each talkback voice small file in the merged file.
Further, the step S1 specifically includes:
step S11, when the talkback voice initiator uploads the talkback voice small files, the talkback voice server places all talkback voice small files belonging to the initiator under a specified folder according to the initiator information;
and step S12, marking the appointed folders of each initiator as a type separately according to the initiator information.
Further, the designated folders are each named by an initiator name.
Further, the step S2 is specifically:
after the classification is finished, for each designated folder, all the talkback voice small files under it are traversed, and all the talkback voice small files under the same designated folder are sorted in order from largest to smallest.
Further, in step S4, the setting of the classification level for the remaining talkback voice small files after selection is specifically:
according to the relations among the initiators, three classification levels are set for the talkback voice small files remaining after selection, and each classification level is assigned a priority; from highest to lowest priority, the three levels are: talkback voice small files with a group relationship, talkback voice small files from the same time period, and the other talkback voice small files.
Further, in step S4, the merging and storing of the classified talkback voice small files specifically includes:
step B11, creating a buffer area;
step B12, filling the talkback voice small files with the group relationship into a cache region, judging whether the cache region can not be filled with the next talkback voice small files with the group relationship, if so, directly storing the talkback voice small files filled into the cache region into an HDFS block space, emptying the cache region, and entering step B13; if not, go directly to step B13;
step B13, judging whether the talkback voice small files with group relation are filled, if yes, entering step B14; if not, return to step B12;
step B14, filling talkback voice small files with the same time period into the cache region, judging whether the cache region can not be refilled with talkback voice small files with the same time period in the next time period, if so, directly storing the talkback voice small files filled into the cache region into an HDFS block space, emptying the cache region, and entering step B15; if not, go directly to step B15;
step B15, judging whether all the talkback voice small files in the same time period have been filled in, if so, entering step B16; if not, return to step B14;
step B16, filling other talkback voice small files into the cache region, judging whether the cache region can not be filled with the next other talkback voice small files or not, if so, directly storing the talkback voice small files filled into the cache region into the HDFS block space, emptying the cache region, and entering step B17; if not, go directly to step B17;
step B17, judging whether other talkback voice small files are filled up, if so, entering step B18; if not, return to step B16;
and step B18, storing the small talkback voice files filled in the cache area into the HDFS block space, and emptying the cache area.
Further, the step S5 is specifically:
storing the metadata information of each talkback voice small file by using a hash table structure, wherein in the hash table structure, a key is a hash value of the name information of the talkback voice small file, and the structure of the key is < user name | file name >; the value is metadata information of the talkback voice small file; for the talkback voice small files selected according to the sorting sequence, the metadata information comprises the range, the initial position and the length of the small files; for the remaining talkback voice small files after selection, the metadata information comprises the range of the small files, the file names after scattered combination, the initial positions and the lengths.
The invention has the following advantages:
1. In the original HDFS block management mode, the waste of block space is reduced by merging small files, and an integration-and-dispersion strategy is added so that edge files do not waste block space. The specific implementation is as follows: the small files are integrated and dispersed according to the initiators and the relations among them, that is, files belonging to the same initiator are merged first, and the small files exceeding an integral multiple of the block space are then classified and merged according to the relations among the initiators. In this way, the number of occupied block spaces can be reduced, lowering the excessive memory usage of the NameNode when maintaining metadata.
2. By establishing a bifurcation index mechanism and using different classification methods for different files, the size of the metadata information can be reduced and reading can be accelerated. The specific implementation is as follows: the range to which a small file belongs, its start position, and its length are recorded. The method is convenient and simple for users, requires only deploying the software to complete the related functions, and is easy to popularize and use.
Drawings
The invention will be further described with reference to the following examples with reference to the accompanying drawings.
FIG. 1 is a functional block diagram of an implementation of the present invention.
FIG. 2 is a flowchart illustrating an optimization method for whole storage, dispersion and bifurcation retrieval of a small talk-back voice file according to the present invention.
Fig. 3 is a schematic diagram of classifying the talkback voice small files according to the present invention.
Fig. 4 is a schematic diagram of sorting the talkback voice small files according to the present invention.
Fig. 5 is a schematic diagram of the correspondence between the metadata information structure of a talkback voice small file and the talkback voice small file itself according to the present invention.
FIG. 6 is a diagram illustrating an integrated storage structure according to the present invention.
FIG. 7 is a diagram of a distributed storage structure according to the present invention.
Detailed Description
Referring to fig. 1 to 7, the present invention provides a method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files, which comprises the following steps:
step S1, classifying the talkback voice small files;
in this embodiment, the step S1 specifically includes:
step S11, when the talkback voice initiator uploads the talkback voice small files, the talkback voice server places all talkback voice small files belonging to the initiator under a specified folder according to the initiator information;
step S12, according to the initiator information, the designated folder of each initiator is individually marked as one class; that is, the initiator is used as the classification criterion.
As shown in fig. 3, suppose initiator A and initiator B each upload 6 talkback voice small files. After receiving them, the talkback voice server stores the 6 talkback voice small files of initiator A under the designated folder corresponding to initiator A, and stores the 6 talkback voice small files of initiator B under the designated folder corresponding to initiator B.
The designated folders are each named after the initiator. A voice talkback session mainly involves an initiator and a receiver, so the initiator is used for naming and as the classification criterion: on the one hand, this keeps the classification rule clear; on the other hand, the location of the small file to be accessed can be found quickly from the initiator information, which improves the efficiency of accessing the talkback voice small files.
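For illustration only, a minimal sketch of this classification step on the talkback voice server's local staging directory; the class name, root directory and method signature are assumptions, not part of the invention:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TalkbackClassifier {
    private final Path rootDir; // staging directory on the talkback voice server

    public TalkbackClassifier(Path rootDir) {
        this.rootDir = rootDir;
    }

    /** Moves an uploaded talkback voice small file into the folder named after its initiator. */
    public Path classify(String initiatorName, Path uploadedFile) throws IOException {
        Path folder = rootDir.resolve(initiatorName);   // one designated folder per initiator
        Files.createDirectories(folder);                // created on the initiator's first upload
        Path target = folder.resolve(uploadedFile.getFileName());
        return Files.move(uploadedFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}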
Step S2, after the classification is finished, sequencing each class of talkback voice small files in sequence according to the size of the files;
in this embodiment, the step S2 specifically includes:
after the classification is finished, for each designated folder, all the talkback voice small files under it are traversed, and all the talkback voice small files under the same designated folder are sorted in order from largest to smallest. In a specific implementation, all the talkback voice small files in the same designated folder can be sorted uniformly by a single sorting module.
As shown in fig. 4, suppose 6 talkback voice small files are stored in the designated folder of initiator A, and their sizes from largest to smallest are talkback file 1, talkback file 2, talkback file 3, talkback file 4, talkback file 5 and talkback file 6; the sorting module then sorts them in the order talkback file 1, talkback file 2, talkback file 3, talkback file 4, talkback file 5, talkback file 6.
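A minimal sketch of what such a sorting module might do for one designated folder; the names are illustrative:

import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class SortingModule {
    /** Returns the small files of one designated folder ordered from largest to smallest. */
    public static File[] sortBySizeDescending(File designatedFolder) {
        File[] files = designatedFolder.listFiles(File::isFile);
        if (files == null) {
            return new File[0];                 // folder missing or not a directory
        }
        Arrays.sort(files, Comparator.comparingLong(File::length).reversed());
        return files;
    }
}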
Step S3, selecting the maximum number of files which can be accommodated by the integral multiple of the HDFS block space according to the sorting sequence for each type of talkback voice small files, and merging and storing the selected talkback voice small files into the HDFS block space. That is, each type of talkback voice small files is stored in an integrated manner: the talkback voice small files whose total size is closest to an integral multiple of the HDFS block space are selected from largest to smallest and merged into the HDFS block space (the size of each block space in HDFS is fixed, and since several block spaces are usually occupied during storage, an integral multiple of the block space is used).
For example, as shown in fig. 4, only talkback files 1 to 5 can be accommodated by the integral multiple of the HDFS block space; talkback files 1 to 5 are therefore stored in the HDFS block space in an integrated manner, and talkback file 6 is handed to the scatter module for processing. If talkback file 1 were handed to the scatter module instead, the utilization of the block space would be reduced, because the talkback voice small files are sorted from largest to smallest, which gives:
File1.Size≥File6.Size (1)
this can be further derived from equation (1):
File2.Size+File3.Size+File4.Size+File5.Size+File6.Size ≤ File1.Size+File2.Size+File3.Size+File4.Size+File5.Size (2)
in specific implementation, since there is very large randomness in the user's talk-back time, the probability of being equal to true in equations (1) and (2) is very low, and therefore, storing the talk-back voice small file in this way can maximize the use of the block space.
Step S4, setting classification levels for the remaining talkback voice small files after selection, classifying the remaining talkback voice small files according to the set classification levels, and merging and storing the classified talkback voice small files; in a specific implementation, the remaining talkback voice small files can be processed uniformly by a single scatter module.
In step S4, the setting of the classification levels for the remaining talkback voice small files after selection is specifically:
according to the relations among the initiators, three classification levels are set for the talkback voice small files remaining after selection, and each classification level is assigned a priority; from highest to lowest priority, the three levels are: talkback voice small files with a group relationship, talkback voice small files from the same time period, and the other talkback voice small files.
In step S4, the merging and storing of the classified talkback voice small files specifically includes:
step B11, creating a buffer area;
step B12, filling the talkback voice small files with the group relationship into a cache region, judging whether the cache region can not be filled with the next talkback voice small files with the group relationship, if so, directly storing the talkback voice small files filled into the cache region into an HDFS block space, emptying the cache region, and entering step B13; if not, go directly to step B13;
step B13, judging whether the talkback voice small files with group relation are filled, if yes, entering step B14; if not, return to step B12;
step B14, filling talkback voice small files with the same time period into the cache region, judging whether the cache region can not be refilled with talkback voice small files with the same time period in the next time period, if so, directly storing the talkback voice small files filled into the cache region into an HDFS block space, emptying the cache region, and entering step B15; if not, go directly to step B15;
step B15, judging whether all the talkback voice small files in the same time period have been filled in, if so, entering step B16; if not, return to step B14;
step B16, filling other talkback voice small files into the cache region, judging whether the cache region can not be filled with the next other talkback voice small files or not, if so, directly storing the talkback voice small files filled into the cache region into the HDFS block space, emptying the cache region, and entering step B17; if not, go directly to step B17;
step B17, judging whether other talkback voice small files are filled up, if so, entering step B18; if not, return to step B16;
and step B18, storing the small talkback voice files filled in the cache area into the HDFS block space, and emptying the cache area.
The invention merges the talkback voice small files remaining after selection in this scattered manner to reduce the waste of block space, thereby reducing the number of occupied blocks and, in turn, the memory consumed by the NameNode.
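A sketch of steps B11 to B18 follows, assuming the cache region holds at most one HDFS block and that the three priority groups have already been separated; the class name, the one-block assumption and the abstract flush step are illustrative rather than prescribed by the invention.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ScatterModule {
    private final long blockSize;                              // HDFS block size, e.g. 128 MB
    private final ByteArrayOutputStream cacheRegion = new ByteArrayOutputStream();

    public ScatterModule(long blockSize) {
        this.blockSize = blockSize;
    }

    /**
     * Fills the cache region with the three priority levels in order: group-related
     * files, same-time-period files, then the other files (steps B12-B17), and
     * finally flushes whatever is left (step B18).
     */
    public void merge(List<Path> groupRelated, List<Path> sameTimePeriod,
                      List<Path> others) throws IOException {
        fill(groupRelated);
        fill(sameTimePeriod);
        fill(others);
        flushToHdfsBlock();
    }

    private void fill(List<Path> files) throws IOException {
        for (Path file : files) {
            if (cacheRegion.size() + Files.size(file) > blockSize) {
                flushToHdfsBlock();                            // next file no longer fits
            }
            cacheRegion.write(Files.readAllBytes(file));
        }
    }

    private void flushToHdfsBlock() throws IOException {
        if (cacheRegion.size() == 0) {
            return;
        }
        // Here the buffered bytes would be written into an HDFS block space,
        // e.g. via FileSystem.create(...); the actual write is omitted in this sketch.
        cacheRegion.reset();                                   // empty the cache region
    }
}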
And step S5, establishing a bifurcation index mechanism for the stored merged file, and recording the information of each talkback voice small file in the merged file.
The step S5 specifically includes:
a hash table (HashMap) of Key-Value pairs is used to store the metadata information of each talkback voice small file. The Key is the hash code (HashCode) of the small file's name information and has the structure <user name | file name>; the Value is the metadata information of the talkback voice small file;
for the talkback voice small files selected according to the sorting order (i.e. the talkback voice small files stored in an integrated manner, whose Scope is Whole), the metadata information includes the scope (Scope) to which the small file belongs, its start position (Offset) and its length (Length), as shown in fig. 6; for the talkback voice small files remaining after selection (i.e. those stored in a scattered manner, whose Scope is Apart), the metadata information includes the scope (Scope) to which the small file belongs, the file name after scattered merging (MergeFileName), the start position (Offset) and the length (Length), as shown in fig. 7.
Because the talkback voice small files are stored with both the integrated storage structure and the scattered storage structure, when a client (i.e. an accessor) accesses a talkback voice small file, it first obtains the Scope (the range to which the small file belongs) from the hash table (HashMap) according to the Key. After the return value is obtained, if the small file was merged in an integrated manner, the corresponding small file is read directly from the file designated by the user name in the Key, using the start position (Offset) and length (Length) in the Value; if the small file was merged in a scattered manner, the corresponding small file is read from the designated file according to the merged file name (MergeFileName), the start position (Offset) and the length (Length), as shown in fig. 5. The small files are read directly through the HDFS API.
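A sketch of this bifurcation index and read path is given below; the field names follow fig. 6 and fig. 7, while the directory layout of the merged files is an assumption made only for illustration.

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BifurcationIndex {
    /** Value of the hash table: the metadata of one talkback voice small file. */
    static class FileMeta {
        String scope;          // "Whole" for integrated storage, "Apart" for scattered storage
        String mergeFileName;  // used only when scope is "Apart"
        long offset;           // start position inside the merged file
        long length;           // length of the small file
    }

    private final Map<Integer, FileMeta> index = new HashMap<>();

    /** The Key is the hash code of "<user name|file name>", as described above. */
    public void put(String userName, String fileName, FileMeta meta) {
        index.put((userName + "|" + fileName).hashCode(), meta);
    }

    /** Looks up the metadata and reads the small file out of its merged file via the HDFS API. */
    public byte[] read(FileSystem fs, String userName, String fileName) throws Exception {
        FileMeta meta = index.get((userName + "|" + fileName).hashCode());
        // Whole: the merged file of this initiator; Apart: the recorded scattered merge file.
        // The directory layout below is an assumption for the sketch.
        Path mergedFile = "Whole".equals(meta.scope)
                ? new Path("/talkback/whole/" + userName)
                : new Path("/talkback/apart/" + meta.mergeFileName);
        byte[] data = new byte[(int) meta.length];
        try (FSDataInputStream in = fs.open(mergedFile)) {
            in.readFully(meta.offset, data);   // jump to Offset and read Length bytes
        }
        return data;
    }
}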
In summary, the invention has the following advantages:
1. In the original HDFS block management mode, the waste of block space is reduced by merging small files, and an integration-and-dispersion strategy is added so that edge files do not waste block space. The specific implementation is as follows: the small files are integrated and dispersed according to the initiators and the relations among them, that is, files belonging to the same initiator are merged first, and the small files exceeding an integral multiple of the block space are then classified and merged according to the relations among the initiators. In this way, the number of occupied block spaces can be reduced, lowering the excessive memory usage of the NameNode when maintaining metadata.
2. By establishing a bifurcation index mechanism and using different classification methods for different files, the size of the metadata information can be reduced and reading can be accelerated. The specific implementation is as follows: the range to which a small file belongs, its start position, and its length are recorded. The method is convenient and simple for users, requires only deploying the software to complete the related functions, and is easy to popularize and use.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (6)

1. An optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files, characterized by comprising the following steps:
step S1, classifying the talkback voice small files;
step S2, after the classification is finished, sequencing each class of talkback voice small files in sequence according to the size of the files;
step S3, selecting the maximum number of files which can be accommodated by the integral multiple of the HDFS block space according to the sorting sequence for each type of small talkback voice files, and merging and storing the selected small talkback voice files into the HDFS block space;
step S4, setting classification levels for the remaining talkback voice small files after selection, classifying the remaining talkback voice small files according to the set classification levels, and merging and storing the classified talkback voice small files;
step S5, establishing a bifurcation index mechanism for the stored merged file, and recording the information of each talkback voice small file in the merged file;
in step S4, the merging and storing of the classified talkback voice small files specifically includes:
step B11, creating a buffer area;
step B12, filling the talkback voice small files with the group relationship into a cache region, judging whether the cache region can not be filled with the next talkback voice small files with the group relationship, if so, directly storing the talkback voice small files filled into the cache region into an HDFS block space, emptying the cache region, and entering step B13; if not, go directly to step B13;
step B13, judging whether the talkback voice small files with group relation are filled, if yes, entering step B14; if not, return to step B12;
step B14, filling talkback voice small files with the same time period into the cache region, judging whether the cache region can not be refilled with talkback voice small files with the same time period in the next time period, if so, directly storing the talkback voice small files filled into the cache region into an HDFS block space, emptying the cache region, and entering step B15; if not, go directly to step B15;
step B15, judging whether all the talkback voice small files in the same time period have been filled in, if so, entering step B16; if not, return to step B14;
step B16, filling other talkback voice small files into the cache region, judging whether the cache region can not be filled with the next other talkback voice small files or not, if so, directly storing the talkback voice small files filled into the cache region into the HDFS block space, emptying the cache region, and entering step B17; if not, go directly to step B17;
step B17, judging whether other talkback voice small files are filled up, if so, entering step B18; if not, return to step B16;
and step B18, storing the small talkback voice files filled in the cache area into the HDFS block space, and emptying the cache area.
2. The method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files according to claim 1, wherein: the step S1 specifically includes:
step S11, when the talkback voice initiator uploads the talkback voice small files, the talkback voice server places all talkback voice small files belonging to the initiator under a specified folder according to the initiator information;
and step S12, marking the appointed folders of each initiator as a type separately according to the initiator information.
3. The method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files according to claim 2, wherein: the designated folders are each named after the initiator.
4. The method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files according to claim 2, wherein: the step S2 specifically includes:
after the classification is finished, for each designated folder, all the talkback voice small files under it are traversed, and all the talkback voice small files under the same designated folder are sorted in order from largest to smallest.
5. The method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files according to claim 1, wherein: in step S4, the setting of the classification levels for the remaining talkback voice small files after selection is specifically:
according to the relations among the initiators, three classification levels are set for the talkback voice small files remaining after selection, and each classification level is assigned a priority; from highest to lowest priority, the three levels are: talkback voice small files with a group relationship, talkback voice small files from the same time period, and the other talkback voice small files.
6. The method for optimizing whole storage, dispersion and bifurcation retrieval of talkback voice small files according to claim 1, wherein: the step S5 specifically includes:
storing the metadata information of each talkback voice small file by using a hash table structure, wherein in the hash table structure, a key is a hash value of the name information of the talkback voice small file, and the structure of the key is < user name | file name >; the value is metadata information of the talkback voice small file; for the talkback voice small files selected according to the sorting sequence, the metadata information comprises the range, the initial position and the length of the small files; for the remaining talkback voice small files after selection, the metadata information comprises the range of the small files, the file names after scattered combination, the initial positions and the lengths.
CN201811390509.6A 2018-11-21 2018-11-21 Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files Active CN109634914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811390509.6A CN109634914B (en) 2018-11-21 2018-11-21 Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811390509.6A CN109634914B (en) 2018-11-21 2018-11-21 Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files

Publications (2)

Publication Number Publication Date
CN109634914A CN109634914A (en) 2019-04-16
CN109634914B true CN109634914B (en) 2021-11-30

Family

ID=66068643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811390509.6A Active CN109634914B (en) 2018-11-21 2018-11-21 Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files

Country Status (1)

Country Link
CN (1) CN109634914B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112235422B (en) * 2020-12-11 2021-03-30 浙江大华技术股份有限公司 Data processing method and device, computer readable storage medium and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820717A (en) * 2015-05-22 2015-08-05 国网智能电网研究院 Massive small file storage and management method and system
CN105631010A (en) * 2015-12-29 2016-06-01 成都康赛信息技术有限公司 Optimization method based on HDFS small file storage
CN107103095A (en) * 2017-05-19 2017-08-29 成都四象联创科技有限公司 Method for computing data based on high performance network framework
CN108710639A (en) * 2018-04-17 2018-10-26 桂林电子科技大学 A kind of mass small documents access optimization method based on Ceph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232457A1 (en) * 2015-02-11 2016-08-11 Skytree, Inc. User Interface for Unified Data Science Platform Including Management of Models, Experiments, Data Sets, Projects, Actions and Features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820717A (en) * 2015-05-22 2015-08-05 国网智能电网研究院 Massive small file storage and management method and system
CN105631010A (en) * 2015-12-29 2016-06-01 成都康赛信息技术有限公司 Optimization method based on HDFS small file storage
CN107103095A (en) * 2017-05-19 2017-08-29 成都四象联创科技有限公司 Method for computing data based on high performance network framework
CN108710639A (en) * 2018-04-17 2018-10-26 桂林电子科技大学 A kind of mass small documents access optimization method based on Ceph

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An access optimization method for massive small files in HDFS; Gu Yuwan et al.; Application Research of Computers; 2017-08-31; Vol. 34, No. 8; full text *

Also Published As

Publication number Publication date
CN109634914A (en) 2019-04-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant