CN114840488A - Distributed storage method, system and storage medium based on super-fusion structure - Google Patents

Info

Publication number
CN114840488A
Authority
CN
China
Prior art keywords
file
statistical information
resource pool
uniform resource
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210778538.XA
Other languages
Chinese (zh)
Other versions
CN114840488B (en)
Inventor
刘江
龚立义
郭军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baike Data Technology Shenzhen Co ltd
Original Assignee
Baike Data Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baike Data Technology Shenzhen Co ltd
Priority to CN202210778538.XA
Publication of CN114840488A
Application granted
Publication of CN114840488B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed storage method, a distributed storage system and a storage medium based on a super-fusion structure, wherein the method comprises the following steps: acquiring data to be stored, and generating log statistical information according to the data to be stored; determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool or not according to the log statistical information, and integrating and marking data to be stored to obtain an integrated marking file when the same or similar file does not exist in the uniform resource pool; and splitting the integrated marked file through the commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk. The invention can automatically realize distributed storage of data, realize automatic resource allocation and realize high-efficiency communication.

Description

Distributed storage method, system and storage medium based on super-fusion structure
Technical Field
The invention relates to the technical field of data storage, in particular to a distributed storage method, a distributed storage system and a storage medium based on a super-fusion structure.
Background
Storage systems are one of the important components of a computer. The storage system provides the ability to write and read the information (programs and data) required for the operation of the computer, and realizes the information memory function of the computer. Modern computer systems often adopt a multi-level storage architecture of registers, cache, main memory and external memory. The core of the computer storage system is the memory, the device that stores programs and data. The internal memory (memory for short) mainly stores the programs and data required by the current work of the computer, and comprises a cache memory (cache for short) and a main memory. The main memory elements currently in use are semiconductor memories. The external memory has three main implementations: magnetic memory, optical memory and semiconductor memory; its storage media include hard disks, optical disks, magnetic tapes and removable memory.
However, in the prior art, the storage of data is inefficient, and when data changes or needs to be updated, all data may need to be redistributed.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a distributed storage method based on a super-fusion structure, aiming at solving the problems that the efficiency of storing data is low and all data may need to be redistributed when the data changes or needs to be updated in the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a distributed storage method based on a super-fusion structure, where the method includes:
acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored;
determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool or not according to the log statistical information, and integrating and marking the data to be stored to obtain an integrated marking file when determining that the file which is the same as or similar to the log statistical information does not exist in the uniform resource pool;
and splitting the integrated mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.
In one implementation, the generating log statistical information according to the data to be stored includes:
acquiring a file name, a keyword, a file size and a file type in the data to be stored;
and generating the log statistical information according to the file name, the keyword, the file size and the file type.
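The two steps above can be sketched as a small record type plus a builder function. This is a minimal illustration, not the patented implementation: the names `LogStats` and `generate_log_stats` are hypothetical, keywords are assumed to be supplied by the caller, and the file type is assumed to come from the file extension, since the text does not say how either is obtained.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LogStats:
    """Attribute record for one piece of data to be stored (hypothetical name)."""
    file_name: str
    keywords: tuple
    file_size: int
    file_type: str


def generate_log_stats(file_name: str, content: bytes, keywords: list) -> LogStats:
    """Collect the four attributes named in the text into one record."""
    # Assumption: the type is derived from the extension.
    _, ext = os.path.splitext(file_name)
    return LogStats(
        file_name=file_name,
        # Normalise keywords so that later comparisons ignore order and case.
        keywords=tuple(sorted({k.lower() for k in keywords})),
        file_size=len(content),
        file_type=ext.lstrip(".").lower() or "unknown",
    )


stats = generate_log_stats("Report.PDF", b"hello world", ["Storage", "storage", "cluster"])
print(stats)
```

Keeping the record immutable makes it safe to use as a lookup key when the uniform resource pool is searched later.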
In an implementation manner, the determining whether a file identical or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information includes:
searching in the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and determining candidate files in the uniform resource pool that respectively match the file name, the keyword, the file size and the file type;
if the candidate files include a file whose file name, keyword, file size and file type are all the same as those in the log statistical information, determining that a file identical to the log statistical information exists in the uniform resource pool;
and if the candidate files include no such file, determining that no file identical to the log statistical information exists in the uniform resource pool.
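A minimal sketch of this identical-file check follows. It assumes the uniform resource pool can be represented as a list of attribute records, and it reads "respectively matched" as "matches at least one of the four attributes"; both are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogStats:
    """Hypothetical attribute record; equality compares all four fields."""
    file_name: str
    keywords: frozenset
    file_size: int
    file_type: str


def find_candidates(stats: LogStats, pool: list) -> list:
    """Return pool entries that match the new data on at least one attribute."""
    return [
        f for f in pool
        if f.file_name == stats.file_name
        or f.keywords == stats.keywords
        or f.file_size == stats.file_size
        or f.file_type == stats.file_type
    ]


def has_identical(stats: LogStats, pool: list) -> bool:
    """A file is identical only if all four attributes are the same."""
    return any(f == stats for f in find_candidates(stats, pool))


pool = [
    LogStats("a.txt", frozenset({"notes"}), 120, "txt"),
    LogStats("b.pdf", frozenset({"report"}), 4096, "pdf"),
]
print(has_identical(LogStats("b.pdf", frozenset({"report"}), 4096, "pdf"), pool))
```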
In an implementation manner, the determining whether a file identical or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information includes:
carrying out similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool in sequence;
if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold value, determining that a file similar to the log statistical information exists in the uniform resource pool;
if the similarity between the existing files and the file name, the keyword, the file size and the file type does not exceed the threshold value, determining that no file similar to the log statistical information exists in the uniform resource pool.
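The text does not specify how similarity is computed or what the threshold is. The sketch below is one possible reading under stated assumptions: each attribute gets a score in [0, 1] (string ratio for names, Jaccard overlap for keywords, size ratio for sizes, equality for types), the four scores are averaged, and the threshold of 0.8 is arbitrary. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def keyword_similarity(a, b) -> float:
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)  # Jaccard overlap


def size_similarity(a: int, b: int) -> float:
    if a == b:
        return 1.0  # also covers two empty files
    return min(a, b) / max(a, b)


def similarity(new: dict, existing: dict) -> float:
    """Average of the four per-attribute scores (an assumed combination rule)."""
    scores = [
        name_similarity(new["file_name"], existing["file_name"]),
        keyword_similarity(new["keywords"], existing["keywords"]),
        size_similarity(new["file_size"], existing["file_size"]),
        1.0 if new["file_type"] == existing["file_type"] else 0.0,
    ]
    return sum(scores) / len(scores)


def has_similar(new: dict, pool: list, threshold: float = 0.8) -> bool:
    """True if any existing file scores above the threshold."""
    return any(similarity(new, f) > threshold for f in pool)


new_file = {"file_name": "report_v2.pdf", "keywords": ["storage", "cluster"],
            "file_size": 1000, "file_type": "pdf"}
pool = [{"file_name": "report_v1.pdf", "keywords": ["storage", "cluster"],
         "file_size": 990, "file_type": "pdf"}]
print(has_similar(new_file, pool))
```

A weighted average, or a per-attribute threshold, would fit the text equally well; the choice here only keeps the example short.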
In one implementation, the method further includes:
if the uniform resource pool has a file which is the same as or similar to the log statistical information, prompting a selection item, wherein the selection item comprises: replacing similar files, saving as new files or not saving files;
receiving an input instruction, determining a selection item corresponding to the instruction, and executing an operation corresponding to the selection item.
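The three selection items map naturally onto an enumeration and a dispatch function. The sketch below assumes the uniform resource pool can be modelled as a dictionary from file name to content; `Choice` and `apply_choice` are hypothetical names, and the numeric suffix used when saving a new file is an illustrative convention only.

```python
from enum import Enum


class Choice(Enum):
    REPLACE = "replace"          # replace the similar file
    SAVE_AS_NEW = "save_as_new"  # keep both files
    DISCARD = "discard"          # do not save the uploaded file


def apply_choice(choice: Choice, pool: dict, similar_name: str,
                 new_name: str, data: bytes) -> dict:
    """Execute the operation that corresponds to the selected item."""
    if choice is Choice.REPLACE:
        pool[similar_name] = data
    elif choice is Choice.SAVE_AS_NEW:
        # Pick an unused name so the existing file is never overwritten.
        name, n = new_name, 1
        while name in pool:
            name = f"{new_name}.{n}"
            n += 1
        pool[name] = data
    # Choice.DISCARD: nothing is written.
    return pool


pool = {"a.txt": b"old"}
# The input instruction is assumed to arrive as one of the enum values.
apply_choice(Choice("save_as_new"), pool, "a.txt", "a.txt", b"new")
print(sorted(pool))
```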
In one implementation, the splitting, by the commercial server, the integrated markup file to obtain a split file includes:
determining different locations of the integrated markup file through a compute node in the commercial server, and determining the same locations of the integrated markup file through a fusion node in the commercial server;
and splitting the integration mark file based on the same position and the different positions to obtain the split file.
In one implementation, the obtaining type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file in the target storage disk includes:
determining the type information of the split file based on the file type in the log statistics;
according to the type information, finding out the target storage disk with the same storage type as the type information from the uniform resource pool;
and storing the split file into the target storage disk.
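The disk selection steps above can be sketched as a lookup by storage type. The sketch assumes each storage disk in the uniform resource pool is tagged with exactly one storage type; `StorageDisk`, `select_target_disk` and `store_split_file` are hypothetical names, and the text does not say what happens when no disk matches, so an error is raised here.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDisk:
    """One disk in the uniform resource pool (hypothetical model)."""
    disk_id: str
    storage_type: str
    files: dict = field(default_factory=dict)


def select_target_disk(pool: list, type_info: str) -> StorageDisk:
    """Return the first disk whose storage type equals the type information."""
    for disk in pool:
        if disk.storage_type == type_info:
            return disk
    raise LookupError(f"no storage disk for type {type_info!r}")


def store_split_file(pool: list, name: str, data: bytes, type_info: str) -> str:
    """Store one split file on the matching disk and return the disk id."""
    disk = select_target_disk(pool, type_info)
    disk.files[name] = data
    return disk.disk_id


pool = [StorageDisk("disk-1", "pdf"), StorageDisk("disk-2", "txt")]
print(store_split_file(pool, "report.part0", b"...", "pdf"))
```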
In a second aspect, an embodiment of the present invention further provides a distributed storage system based on a super-fusion structure, where the system includes: a super-fusion all-in-one machine, a commercial server connected with the super-fusion all-in-one machine, and a uniform resource pool connected with the commercial server; wherein the super-fusion all-in-one machine includes:
the log statistical information acquisition module is used for acquiring data to be stored, temporarily storing the data to be stored and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored;
an integration mark file obtaining module, configured to determine whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information, and when it is determined that the file that is the same as or similar to the log statistical information does not exist in the uniform resource pool, integrate and mark the data to be stored to obtain an integration mark file;
and the file splitting and storing module is used for splitting the integration mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.
In a third aspect, an embodiment of the present invention further provides a super-fusion all-in-one machine, where the super-fusion all-in-one machine includes a memory, a processor, and a distributed storage program based on a super-fusion structure, where the distributed storage program based on a super-fusion structure is stored in the memory and is executable on the processor, and when the processor executes the distributed storage program based on a super-fusion structure, the steps of the distributed storage method based on a super-fusion structure according to any one of the above schemes are implemented.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where a distributed storage program based on a super-fusion structure is stored, and when being executed by a processor, the computer-readable storage medium implements the steps of the distributed storage method based on a super-fusion structure according to any one of the above schemes.
Beneficial effects: compared with the prior art, the invention provides a distributed storage method based on a super-fusion structure. The method first acquires data to be stored, temporarily stores the data to be stored, and generates log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored. Then, according to the log statistical information, it determines whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool, and integrates and marks the data to be stored to obtain an integrated marking file when no such file exists in the uniform resource pool. Finally, it splits the integrated marking file through a commercial server to obtain a split file, acquires the type information of the split file, selects a target storage disk matched with the type information from the uniform resource pool, and stores the split file into the target storage disk. The invention can automatically realize distributed storage of data and automatic resource allocation. The super-fusion structure has no master node or slave node; each computing/data node can take over the function of another computing/data node, and the nodes cooperate through an internal efficient distributed protocol to realize efficient communication.
Drawings
Fig. 1 is a flowchart of a specific implementation of a distributed storage method based on a super-fusion structure according to an embodiment of the present invention.
Fig. 2 is a schematic frame diagram of a distributed storage system based on a super-fusion structure according to an embodiment of the present invention.
Fig. 3 is a schematic block diagram of a super-fusion all-in-one machine in a distributed storage system based on a super-fusion structure according to an embodiment of the present invention.
Fig. 4 is a functional schematic diagram of the super-fusion all-in-one machine provided in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment provides a distributed storage method based on a super-fusion structure, and the method based on the embodiment can realize high-efficiency storage of data. In specific implementation, the embodiment acquires data to be stored, temporarily stores the data to be stored, and generates log statistical information according to the data to be stored, where the log statistical information is used to reflect attribute information in the data to be stored. And then, according to the log statistical information, determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool, and integrating and marking the data to be stored to obtain an integrated marking file when determining that the file which is the same as or similar to the log statistical information does not exist in the uniform resource pool. And finally, splitting the integration mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk. The embodiment can automatically realize distributed storage of data and automatic resource allocation, the super-fusion structure is not provided with a master node and a slave node, each computing/data node has the capability of bearing the function of the other computing/data node, and the nodes complete mutual cooperation through an internal efficient distributed protocol to realize efficient communication.
Exemplary method
The distributed storage method based on the super-fusion structure can be applied to terminal equipment, and the terminal equipment can be a super-fusion all-in-one machine. A cluster of super-fusion all-in-one machines has very good elastic expansion capability: when nodes or hard disks are added or removed while the system is running, the super-fusion structure optimizes data placement within the cluster and redistributes and rebalances data automatically. The data migration and rebalancing process does not affect the application's access to the data. During redistribution and rebalancing, the system ensures that as little data as possible is redistributed, so that not all data in the system has to be adjusted and migrated, which improves the stability and performance of the system. Specifically, as shown in fig. 1, the distributed storage method based on the super-fusion structure of the present embodiment includes the following steps:
Step S100, obtaining data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored.
In this embodiment, as shown in fig. 2, the PC terminal first uploads the data to be stored, and the data is received and temporarily stored by the super-fusion all-in-one machine. The PC terminal is connected with a plurality of super-fusion all-in-one machines through a protocol channel, and the protocol channel uses the TCP/IP protocol for data transmission. After acquiring the data to be stored uploaded by the PC terminal over TCP/IP, the super-fusion all-in-one machine temporarily stores it. The super-fusion all-in-one machine then generates log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored.
Specifically, the super-fusion all-in-one machine of this embodiment obtains a file name, a keyword, a file size, and a file type in the data to be stored, and then generates the log statistical information according to the file name, the keyword, the file size, and the file type.
Step S200, determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool or not according to the log statistical information, and integrating and marking the data to be stored to obtain an integrated marking file when determining that the file which is the same as or similar to the log statistical information does not exist in the uniform resource pool.
After the log statistical information is obtained, the super-fusion all-in-one machine of this embodiment can search a preset uniform resource pool according to the log statistical information, and determine whether a file identical or similar to the log statistical information exists in the uniform resource pool. Specifically, the super-fusion all-in-one machine may search the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and determine candidate files in the uniform resource pool that respectively match the file name, the keyword, the file size and the file type. If the candidate files include a file whose file name, keyword, file size and file type are all the same as those in the log statistical information, it is determined that a file identical to the log statistical information exists in the uniform resource pool. If the candidate files include no such file, it is determined that no file identical to the log statistical information exists in the uniform resource pool. Alternatively, the super-fusion all-in-one machine may carry out similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool in sequence. If the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold value, it is determined that a file similar to the log statistical information exists in the uniform resource pool. If the similarity does not exceed the threshold value, it is determined that no file similar to the log statistical information exists in the uniform resource pool.
If the uniform resource pool contains a file which is the same as or similar to the log statistical information, a selection item is prompted, wherein the selection item comprises: replacing the similar file, saving as a new file, or not saving the file. The super-fusion all-in-one machine then receives an input instruction, determines the selection item corresponding to the instruction, and executes the operation corresponding to the selection item. Specifically, the super-fusion all-in-one machine may receive the instruction input through the PC terminal and, according to the instruction, replace the similar file in the uniform resource pool with the uploaded file, save the uploaded file as a new file, or not save it. When the uniform resource pool contains no file which is the same as or similar to the log statistical information, the data to be stored is judged to be a new file, and the data to be stored is integrated and marked to obtain an integrated marking file. The data to be stored is integrated and marked so that it can be distinguished from existing files and not confused with them, which helps to store the data to be stored better.
Step S300, splitting the integration mark file through a commercial server to obtain a split file, obtaining type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.
In this embodiment, each super-fusion all-in-one machine is connected with the same resource pool through a commercial server, and after the integration mark file is obtained, the integration mark file can be split through the commercial server to obtain a split file. Then, the type information of the split file can be obtained, a target storage disk matched with the type information is selected from the uniform resource pool, and the split file is stored in the target storage disk. That is to say, in the embodiment, when the integration mark file is stored, the integration mark file is firstly split and then stored according to the type information, which is convenient for data management.
In one implementation, the step S300 specifically includes the following steps:
step S301, determining different positions of the integrated markup file through a computing node in the commercial server, and determining the same position of the integrated markup file through a fusion node in the commercial server;
step S302, splitting the integration mark file based on the same position and the different positions to obtain the split file.
In specific implementation, the commercial server in this embodiment includes a computing node and a fusion node, where the computing node is configured to determine the different locations of the integrated markup file, and the fusion node is configured to determine the same locations of the integrated markup file. After the different locations and the same locations are determined, the embodiment may split the integrated markup file based on the same locations and the different locations to obtain the split file. In other words, in this embodiment, the parts of the integrated markup file that are the same are split into one file, and the parts that are different are split into another file. In this embodiment, the uniform resource pool is composed of a plurality of storage disks, and each storage disk is used for storing a different type of data file. Therefore, after the split file is obtained, the type information of the split file can be further obtained, and the split file is then stored in the corresponding storage disk according to the type information, so that distributed storage is realized.
Specifically, the present embodiment may determine the type information of the split file based on the file type in the log statistical information. The log statistical information is obtained based on the file name, the keyword, the file size and the file type in the data to be stored, so the log statistical information comprises the file type. The split file is obtained by splitting an integration mark file formed by integrating marks of data to be stored, so that the type information of the split file can be determined after the file type is determined according to the log statistical information. Then, according to the type information, the target storage disk having the same storage type as the type information is found from the uniform resource pool; and finally, storing the split file into the target storage disk, so that different types of information can be stored into the corresponding storage disks in a distributed and ordered manner.
In an implementation manner, after the split file is stored in the target storage disk, the embodiment may perform an encryption operation on data in each storage disk in the uniform resource pool, and incorporate the identity information during encryption. Only after passing the identity authentication, the PC terminal can call the data in the uniform resource pool, thereby ensuring the security of the data.
The super-fusion all-in-one machine in this embodiment adopts a distributed, shared-nothing design. Data are distributed across all nodes in the cluster by a distributed algorithm, and a redundancy mode of two or three copies across nodes can be used, which greatly improves data reliability. The super-fusion architecture has no master node or slave node; each computing/data node can take over the function of another computing/data node, and the nodes cooperate and communicate through an internal efficient distributed protocol. The super-fusion all-in-one machine deploys compute virtualization and distributed storage on the same server hardware. For applications with strict I/O latency requirements, such as virtualization and databases, it stores data on the local physical server, which reduces the network overhead of traditional external shared storage (SAN/NAS). Users can set service levels for computing and storage resources according to their own needs, and the actual allocation of resources is completed automatically by the management platform, which keeps management simple.
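The patent does not name the distributed algorithm it uses. As an illustration only, the sketch below uses rendezvous (highest-random-weight) hashing, a well-known technique that has the properties described here: every node can compute placements on its own without a master node, replicas land on distinct nodes, and adding a node moves only a small share of the data. The function names and node identifiers are hypothetical.

```python
import hashlib


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(key: str, nodes: list, replicas: int = 3) -> list:
    """Return the nodes that hold the copies of `key`, highest weight first."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: _score(n, key), reverse=True)
    return ranked[:replicas]


nodes = ["node-a", "node-b", "node-c", "node-d"]
before = place_replicas("file-42", nodes)
after = place_replicas("file-42", nodes + ["node-e"])
# When a node is added, a key either keeps its placement or hands exactly
# one replica to the new node; no other copies move.
print(before, after)
```

Because the ranking depends only on the key and the node names, any node that knows the cluster membership reaches the same placement, which matches the absence of master and slave roles described above.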
In summary, in this embodiment, first, data to be stored is obtained, the data to be stored is temporarily stored, and log statistical information is generated according to the data to be stored, where the log statistical information is used to reflect attribute information in the data to be stored. And then, according to the log statistical information, determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool, and integrating and marking the data to be stored to obtain an integrated marking file when determining that the file which is the same as or similar to the log statistical information does not exist in the uniform resource pool. And finally, splitting the integration mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk. The embodiment can automatically realize distributed storage of data and automatic resource allocation, the super-fusion structure is not provided with a master node and a slave node, each computing/data node has the capability of bearing the function of the other computing/data node, and the nodes complete mutual cooperation through an internal efficient distributed protocol to realize efficient communication.
Exemplary System
Based on the above embodiment, the present invention further provides a distributed storage system based on a super-fusion structure, where the system includes: a super-fusion all-in-one machine, a commercial server connected with the super-fusion all-in-one machine, and a uniform resource pool connected with the commercial server. As shown in fig. 3, the super-fusion all-in-one machine includes: a log statistical information acquisition module 10, an integration mark file acquisition module 20 and a file splitting and storing module 30. Specifically, the log statistical information acquisition module 10 in this embodiment is configured to obtain data to be stored, temporarily store the data to be stored, and generate log statistical information according to the data to be stored, where the log statistical information is used to reflect attribute information in the data to be stored. The integration mark file acquisition module 20 is configured to determine whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information, and integrate and mark the data to be stored to obtain an integration mark file when it is determined that no such file exists in the uniform resource pool. The file splitting and storing module 30 is configured to split the integration mark file through a commercial server to obtain a split file, acquire type information of the split file, select a target storage disk matched with the type information from the uniform resource pool, and store the split file in the target storage disk.
In one implementation, the log statistical information obtaining module 10 includes:
the information acquisition unit is used for acquiring the file name, the keyword, the file size and the file type in the data to be stored;
and the information generating unit is used for generating the log statistical information according to the file name, the keyword, the file size and the file type.
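As a rough illustration of how the four attributes could be gathered into one record, the following is a minimal Python sketch. The `LogStats` class, the `generate_log_stats` function, and the choice to derive the file type from the file extension are all assumptions made for this example; the source does not say how the keyword or the file type is obtained.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class LogStats:
    """Hypothetical record holding the log statistical information."""
    file_name: str
    keywords: frozenset
    file_size: int
    file_type: str


def generate_log_stats(file_name, content, keywords=()):
    """Build the record from a file name, its raw bytes and caller-supplied keywords."""
    # Assumption: the file type is read from the extension; files without
    # an extension are labelled "unknown".
    suffix = PurePath(file_name).suffix.lower().lstrip(".")
    return LogStats(
        file_name=file_name,
        keywords=frozenset(k.lower() for k in keywords),
        file_size=len(content),
        file_type=suffix or "unknown",
    )
```

The record is frozen so that it can be compared and hashed directly when it is later checked against files already in the resource pool.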
In one implementation, the integration mark file obtaining module 20 includes:
a candidate matching unit, configured to search the uniform resource pool by the file name, the keyword, the file size and the file type in sequence, and to determine the candidate files in the uniform resource pool that match the file name, the keyword, the file size and the file type respectively;
an identical-file judging unit, configured to determine that a file identical to the log statistical information exists in the uniform resource pool if the candidate files include a file whose file name, keyword, file size and file type are all the same;
and a non-identical-file judging unit, configured to determine that no file identical to the log statistical information exists in the uniform resource pool if the candidate files include no file whose file name, keyword, file size and file type are all the same.
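One possible reading of the sequential search is a step-by-step narrowing of candidates, sketched below in Python. The dictionary layout of the pool entries and the name `find_identical` are assumptions for illustration only; the source does not describe how the pool is indexed.

```python
def find_identical(pool, name, keywords, size, ftype):
    """Return a pool entry matching all four attributes, or None.

    pool is assumed to be a list of dicts with the keys
    "name", "keywords", "size" and "type".
    """
    candidates = pool
    # Narrow the candidates attribute by attribute, in the order the text gives.
    for key, wanted in (("name", name), ("keywords", keywords),
                        ("size", size), ("type", ftype)):
        candidates = [f for f in candidates if f[key] == wanted]
        if not candidates:
            return None  # no identical file exists in the pool
    return candidates[0]
```

Stopping as soon as the candidate list is empty avoids comparing the remaining attributes for files that have already been ruled out.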
In one implementation, the integration mark file obtaining module 20 includes:
the similarity analysis unit is used for carrying out similarity analysis on the file name, the keyword, the file size and the file type and existing files in the uniform resource pool in sequence;
a similarity judging unit, configured to determine that a file similar to the log statistical information exists in the uniform resource pool if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold;
and a dissimilarity judging unit, configured to determine that no file similar to the log statistical information exists in the uniform resource pool if the similarity between the existing files and the file name, the keyword, the file size and the file type does not exceed the threshold.
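The source does not specify the similarity measures or the threshold value, so the following Python sketch picks illustrative ones: a `difflib` ratio for file names, Jaccard overlap for keywords, a size ratio, and type equality. The requirement that every score exceed the threshold, the default of 0.8, and all function names are assumptions.

```python
from difflib import SequenceMatcher


def attribute_similarities(a, b):
    """Return (name, keyword, size, type) similarity scores in [0, 1]."""
    name = SequenceMatcher(None, a["name"], b["name"]).ratio()
    union = a["keywords"] | b["keywords"]
    # Jaccard overlap of the keyword sets; two empty sets count as identical.
    kw = len(a["keywords"] & b["keywords"]) / len(union) if union else 1.0
    big = max(a["size"], b["size"])
    size = min(a["size"], b["size"]) / big if big else 1.0
    ftype = 1.0 if a["type"] == b["type"] else 0.0
    return name, kw, size, ftype


def has_similar(pool, new, threshold=0.8):
    """True if some existing file exceeds the threshold on every attribute."""
    return any(
        all(score > threshold for score in attribute_similarities(new, old))
        for old in pool
    )
```

A weighted average of the four scores would be an equally plausible reading of "the similarity exceeds a threshold"; the per-attribute check is simply the stricter of the two.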
In one implementation, the system further includes:
a selection prompting module, configured to prompt a selection item if a file that is the same as or similar to the log statistical information exists in the uniform resource pool, where the selection item includes: replacing the similar files, saving the similar files as new files or not saving the files;
and the selection operation module is used for receiving an input instruction, determining a selection item corresponding to the instruction and executing the operation corresponding to the selection item.
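The three selection items map naturally onto a small dispatch routine. The Python sketch below is illustrative only: the `Choice` enum, the instruction strings, the dict-based pool, and the numeric-suffix naming used for "save as new" are assumptions, not details taken from the source.

```python
from enum import Enum


class Choice(Enum):
    REPLACE = "replace"          # replace the similar file
    SAVE_AS_NEW = "save_as_new"  # keep both, store the data under a new name
    DISCARD = "discard"          # do not save the file


def apply_choice(pool, existing_name, new_name, data, instruction):
    """Return an updated copy of pool (a name -> bytes dict)."""
    choice = Choice(instruction)  # raises ValueError for an unknown instruction
    updated = dict(pool)
    if choice is Choice.REPLACE:
        updated[existing_name] = data
    elif choice is Choice.SAVE_AS_NEW:
        # Assumed naming rule: append a counter until the name is free.
        candidate, n = new_name, 1
        while candidate in updated:
            candidate = f"{new_name}.{n}"
            n += 1
        updated[candidate] = data
    return updated
```

Returning a copy rather than mutating the pool keeps the "do not save" branch trivially side-effect free.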
In one implementation manner, the file splitting and storing module 30 further includes:
a file analysis unit, configured to determine the different positions of the integration mark file through a computing node in the commercial server, and to determine the same positions of the integration mark file through a fusion node in the commercial server;
and the file splitting unit is used for splitting the integration mark file based on the same position and the different positions to obtain the split file.
In one implementation manner, the file splitting and storing module 30 further includes:
a type determining unit, configured to determine the type information of the split file based on the file type in the log statistical information;
the type analysis unit is used for finding out the target storage disk with the same storage type as the type information from the uniform resource pool according to the type information;
and the file storage unit is used for storing the split file into the target storage disk.
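Selecting a disk whose storage type equals the type information can be sketched as a simple filter. In the Python example below, the disk dictionaries, the `free_bytes` field, and the tie-breaking rule (prefer the disk with the most free space) are assumptions; the source only requires that the storage type match.

```python
def select_target_disk(disks, type_info):
    """Pick a disk from the pool whose storage type equals type_info.

    disks is assumed to be a list of dicts with the keys
    "id", "storage_type" and "free_bytes".
    """
    matching = [d for d in disks if d["storage_type"] == type_info]
    if not matching:
        raise LookupError(f"no disk with storage type {type_info!r}")
    # Assumed tie-break: among matching disks, use the one with most free space.
    return max(matching, key=lambda d: d["free_bytes"])
```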
The working principle of each module in the distributed storage system based on the super-fusion structure of this embodiment is the same as the principle of each step in the above method embodiments, and details are not described here.
Based on the above embodiment, the invention also provides a super-fusion all-in-one machine, a schematic block diagram of which may be as shown in fig. 4. The super-fusion all-in-one machine comprises a processor and a memory connected through a system bus, both arranged in a host. The processor provides computing and control capability. The memory comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the super-fusion all-in-one machine is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a distributed storage method based on a super-fusion structure.
It will be understood by those skilled in the art that the schematic block diagram shown in fig. 4 shows only the part of the structure relevant to the solution of the present invention and does not limit the super-fusion all-in-one machine to which the solution is applied; a specific super-fusion all-in-one machine may include more or fewer components than those shown in the drawing, may combine some components, or may have a different arrangement of components.
In one embodiment, a super-fusion all-in-one machine is provided, where the super-fusion all-in-one machine includes a memory, a processor, and a distributed storage method program based on a super-fusion structure, where the distributed storage method program based on a super-fusion structure is stored in the memory and is executable on the processor, and when the processor executes the distributed storage method program based on a super-fusion structure, the following operation instructions are implemented:
acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored;
determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information, and integrating and marking the data to be stored to obtain an integration mark file when determining that the file which is the same as or similar to the log statistical information does not exist in the uniform resource pool;
and splitting the integration mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.
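The three operations above can be strung together as in the following Python sketch. It is a simplified stand-in rather than the patented procedure: the pool layout, the `store` function, the exact-match-only duplicate check, and the fixed-size chunking used in place of the commercial server's split are all assumptions made to keep the example short.

```python
def store(name, content, ftype, pool, chunk_size=4):
    """Run the three steps on one file; return the number of pieces stored.

    pool is assumed to look like
    {"files": {name: stats}, "disks": {storage_type: [pieces...]}}.
    Returns None when an identical record already exists.
    """
    # Step 1: hold the data and derive its log statistical information.
    stats = {"name": name, "size": len(content), "type": ftype}

    # Step 2: continue only if the pool holds no identical record, then
    # wrap the data together with its statistics (stand-in for marking).
    if pool["files"].get(name) == stats:
        return None
    marked = {"mark": stats, "payload": content}

    # Step 3: split (fixed-size chunks stand in for the real split), pick
    # the disk whose storage type matches, and store the pieces on it.
    payload = marked["payload"]
    pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    disk = pool["disks"][ftype]  # KeyError if no disk has this storage type
    disk.extend(pieces)
    pool["files"][name] = stats
    return len(pieces)
```

For example, storing a ten-byte text file with `chunk_size=4` places three pieces on the text disk, and a second call with the same file returns `None` without writing anything.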
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the present invention discloses a distributed storage method, system and storage medium based on a super-fusion structure. The method includes: acquiring data to be stored, and generating log statistical information according to the data to be stored; determining, according to the log statistical information, whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool, and integrating and marking the data to be stored to obtain an integration mark file when no such file exists in the uniform resource pool; and splitting the integration mark file through the commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk. The invention can automatically realize distributed storage of data, automatic resource allocation, and efficient communication.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A distributed storage method based on a super-fusion structure is characterized by comprising the following steps:
acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored;
determining whether a file which is the same as or similar to the log statistical information exists in a preset uniform resource pool or not according to the log statistical information, and integrating and marking the data to be stored to obtain an integration mark file when determining that the file which is the same as or similar to the log statistical information does not exist in the uniform resource pool;
and splitting the integration mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.
2. The distributed storage method based on the super-fusion structure according to claim 1, wherein the generating log statistical information according to the data to be stored comprises:
acquiring a file name, a keyword, a file size and a file type in the data to be stored;
and generating the log statistical information according to the file name, the keyword, the file size and the file type.
3. The distributed storage method based on the super-fusion structure according to claim 2, wherein the determining whether a file identical or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information comprises:
searching in the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and determining candidate files in the uniform resource pool that are respectively matched with the file name, the keyword, the file size and the file type;
if the candidate files include a file whose file name, keyword, file size and file type are all the same, determining that a file which is the same as the log statistical information exists in the uniform resource pool;
and if the candidate files include no file whose file name, keyword, file size and file type are all the same, determining that no file which is the same as the log statistical information exists in the uniform resource pool.
4. The distributed storage method based on the super-fusion structure according to claim 3, wherein the determining whether a file identical or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information comprises:
carrying out similarity analysis on the file name, the keyword, the file size and the file type and existing files in the uniform resource pool in sequence;
if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold value, determining that a file similar to the log statistical information exists in the uniform resource pool;
if the similarity between the existing files and the file name, the keyword, the file size and the file type does not exceed the threshold value, determining that no file similar to the log statistical information exists in the uniform resource pool.
5. The distributed storage method based on super-fusion structure according to claim 1, further comprising:
if the uniform resource pool has a file which is the same as or similar to the log statistical information, prompting a selection item, wherein the selection item comprises: replacing similar files, saving as new files or not saving files;
receiving an input instruction, determining a selection item corresponding to the instruction, and executing an operation corresponding to the selection item.
6. The distributed storage method based on the super-fusion structure according to claim 1, wherein the splitting the integration mark file through a commercial server to obtain a split file comprises:
determining different positions of the integration mark file through a computing node in the commercial server, and determining the same positions of the integration mark file through a fusion node in the commercial server;
and splitting the integration mark file based on the same position and the different positions to obtain the split file.
7. The distributed storage method based on the super-fusion structure according to claim 2, wherein the obtaining of the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file in the target storage disk comprises:
determining the type information of the split file based on the file type in the log statistics;
according to the type information, finding out the target storage disk with the same storage type as the type information from the uniform resource pool;
and storing the split file into the target storage disk.
8. A distributed storage system based on a super-fusion structure, the system comprising: a super-fusion all-in-one machine, a commercial server connected with the super-fusion all-in-one machine and a uniform resource pool connected with the commercial server; wherein the super-fusion all-in-one machine includes:
the log statistical information acquisition module is used for acquiring data to be stored, temporarily storing the data to be stored and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored;
an integration mark file obtaining module, configured to determine whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool according to the log statistical information, and when it is determined that the file that is the same as or similar to the log statistical information does not exist in the uniform resource pool, integrate and mark the data to be stored to obtain an integration mark file;
and the file splitting and storing module is used for splitting the integration mark file through a commercial server to obtain a split file, acquiring the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.
9. A super-fusion all-in-one machine, characterized by comprising a memory, a processor and a distributed storage program based on a super-fusion structure, the program being stored in the memory and operable on the processor, wherein the processor implements the steps of the distributed storage method based on the super-fusion structure according to any one of claims 1 to 7 when executing the distributed storage program based on the super-fusion structure.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores thereon a distributed storage program based on a super-fusion structure, and when the distributed storage program based on a super-fusion structure is executed by a processor, the steps of the distributed storage method based on a super-fusion structure according to any one of claims 1 to 7 are implemented.
CN202210778538.XA 2022-07-04 2022-07-04 Distributed storage method, system and storage medium based on super fusion structure Active CN114840488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210778538.XA CN114840488B (en) 2022-07-04 2022-07-04 Distributed storage method, system and storage medium based on super fusion structure

Publications (2)

Publication Number Publication Date
CN114840488A true CN114840488A (en) 2022-08-02
CN114840488B CN114840488B (en) 2023-05-02

Family

ID=82574251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210778538.XA Active CN114840488B (en) 2022-07-04 2022-07-04 Distributed storage method, system and storage medium based on super fusion structure

Country Status (1)

Country Link
CN (1) CN114840488B (en)

Also Published As

Publication number Publication date
CN114840488B (en) 2023-05-02


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant