CN117555493B

CN117555493B - Data processing method, system, device, storage medium and electronic equipment

Info

Publication number: CN117555493B
Application number: CN202410044488.1A
Authority: CN
Inventors: 刘名欣; 王晓亮; 张旭明; 王豪迈; 胥昕
Original assignee: Beijing Xingchen Tianhe Technology Co ltd
Current assignee: Beijing Xingchen Tianhe Technology Co ltd
Priority date: 2024-01-11
Filing date: 2024-01-11
Publication date: 2024-05-10
Anticipated expiration: 2044-01-11
Also published as: CN117555493A

Abstract

The invention discloses a data processing method, a data processing system, a data processing device, a storage medium and electronic equipment. The method relates to the field of data processing, and comprises the following steps: under the condition that a data storage request of a client is received, acquiring data to be stored from the data storage request; judging whether the data to be stored is in the management range of the processing unit or not according to the information of the data to be stored; under the condition that the data to be stored is in the management range of the processing unit, acquiring resource allocation information matched with the data to be stored, and storing the data to be stored based on the resource allocation information; in the case where the data to be stored is not within the management range of the processing units, a target processing unit for managing the data to be stored is determined from the processing units in the data processing system, and information of the target processing unit is fed back to the client. The invention solves the technical problem that the data processing efficiency is low because the related technology adopts a shared-nothing architecture to process the data.

Description

Data processing method, system, device, storage medium and electronic equipment

Technical Field

The present invention relates to the field of data processing, and in particular, to a data processing method, system, device, storage medium and electronic apparatus.

Background

In today's digital age, data is a valuable asset for businesses and organizations, and the rapid growth and diversity of data present a significant challenge to data processing systems (e.g., storage systems). To address these challenges, distributed storage systems have been developed. These systems generally employ a Shared-Nothing architecture, i.e., each node works independently and has its own storage resources, which are not shared with other nodes. This architecture design aims to improve the scalability and fault tolerance of the system and, to some extent, its performance.

In a shared-nothing architecture, the data to be processed is typically mapped in slices, and each node is responsible for managing only a portion of the data. Such data slicing allows the system to distribute data and load evenly, so that data access is faster and more balanced. Meanwhile, to ensure the reliability of the data, the data is usually stored redundantly, for example by mapping it to a plurality of storage nodes using multiple copies or an Erasure Code (EC). When I/O (Input/Output) is executed, the nodes therefore need to be coordinated, which increases the corresponding management cost.

For example, the widely used distributed storage system Ceph is a typical implementation of the Shared-Nothing architecture, in which each node manages its internal object storage devices (Object Storage Daemon, OSD). The monitoring node of Ceph gathers the information of all OSDs and synchronizes it to the clients and the storage nodes. Fig. 1 is a schematic diagram of a data writing process in the related art. As shown in fig. 1, the data to be processed is mapped onto a placement group by a DHT (Distributed Hash Table) algorithm. Each placement group includes a plurality of OSDs (i.e., the storage disks in fig. 1), one of which is the master OSD while the rest are called slave OSDs, and the master OSD writes the data onto the plurality of OSDs according to a redundancy policy set by the user. Fig. 2 is a schematic diagram II of a data writing process in the related art. As shown in fig. 2, in order to ensure data consistency among the multiple OSDs, the master OSD (i.e., the master storage disk in fig. 2) needs to allocate a unique version number to each write request, pack metadata such as version information and operation information together with the data portion into a transaction, and then send the transaction to the slave OSDs (i.e., the slave storage disks in fig. 2) through the network to perform the data writing. Ceph adopts a strongly consistent synchronization manner, so the master OSD needs to wait for all the slave OSDs to complete writing before returning the write-completion information to the client. Once the write-completion information has been returned to the client, the written data must not be lost and must remain consistent. Accordingly, in the related art, frequent communication is required between the OSDs when processing data, which results in low data processing efficiency.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the invention provides a data processing method, a system, a device, a storage medium and electronic equipment, which at least solve the technical problem that the data processing efficiency is low due to the fact that a shared-nothing architecture is adopted to process data in the related technology.

According to an aspect of an embodiment of the present invention, there is provided a data processing method including: under the condition that a data storage request of a client is received, acquiring data to be stored from the data storage request; judging whether the data to be stored is in the management range of the processing unit or not according to the information of the data to be stored; under the condition that the data to be stored is in the management range of the processing unit, acquiring resource allocation information matched with the data to be stored, and storing the data to be stored based on the resource allocation information, wherein the resource allocation information comprises storage path information of N storage disks, the N storage disks are used for storing the data to be stored, and N is a positive integer; and under the condition that the data to be stored is not in the management range of the processing units, determining a target processing unit for managing the data to be stored from the processing units in the data processing system, and feeding back the information of the target processing unit to the client, wherein the client sends a data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request.
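As a non-limiting illustration, the store-or-redirect behavior described above may be sketched as follows. The class `ProcessingUnit`, the method `handle_store`, and the `directory` mapping are hypothetical names introduced only for this sketch; in particular, `directory` merely stands in for whatever lookup determines the target processing unit, and the storage disks are modeled as in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    name: str
    managed_ids: set = field(default_factory=set)  # data identifiers this unit manages
    disks: dict = field(default_factory=dict)      # storage path -> {data_id: content}

    def handle_store(self, data_id, content, allocation, directory):
        """Store the data if this unit manages it, otherwise redirect the client."""
        if data_id in self.managed_ids:
            # allocation models the resource allocation information:
            # the storage paths of the N disks that receive the data.
            for path in allocation:
                self.disks.setdefault(path, {})[data_id] = content
            return {"status": "stored", "unit": self.name, "paths": list(allocation)}
        # Not in this unit's management range: feed the target unit back to the client.
        return {"status": "redirect", "target": directory[data_id]}
```

On receiving a `redirect` result, the client would resend the same data storage request to the named target processing unit.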

Further, the data processing method further comprises: judging whether the processing unit stores the storage information of the data to be stored; in the case where the storage information is stored in the processing unit, determining that the data to be stored is within the management range of the processing unit; in the case where the storage information is not stored in the processing unit, judging whether the data to be stored is requested to be stored for the first time; in the case where the data to be stored is requested to be stored for the first time, determining that the data to be stored is within the management range of the processing unit; and in the case where the data to be stored is not requested to be stored for the first time, determining that the data to be stored is not within the management range of the processing unit.
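The management-range judgment above reduces to a two-level decision, which may be sketched as follows. The function and parameter names are hypothetical; how a processing unit learns whether a request is the first one for a given piece of data is outside the scope of this sketch.

```python
def in_management_range(has_storage_info: bool, is_first_request: bool) -> bool:
    """Decide whether the data to be stored is managed by this processing unit."""
    if has_storage_info:
        # The unit already holds storage information for the data.
        return True
    # No storage information: only data stored for the first time is accepted.
    return is_first_request
```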

Further, the data processing method further comprises: generating a target request according to the data to be stored and sending the target request to a total processing unit in the data processing system, wherein the target request is used for requesting to acquire resource allocation information matched with the data to be stored, and the total processing unit is used for determining a storage strategy matched with the processing unit and generating the resource allocation information according to the storage strategy; and under the condition that the resource allocation information fed back by the total processing unit is received, determining the received resource allocation information as the resource allocation information matched with the data to be stored.
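A minimal sketch of the request for resource allocation information follows. The text above does not specify the form of the storage policy or how disks are selected, so this sketch assumes that the policy is simply a number of disks N per processing unit and that the least-used disks are chosen first; the class `TotalProcessingUnit` and the method `allocate` are hypothetical names.

```python
class TotalProcessingUnit:
    def __init__(self, disk_paths, policies):
        self.disk_paths = list(disk_paths)   # storage paths of all M disks
        self.policies = dict(policies)       # unit name -> N (assumed policy form)
        self.usage = {path: 0 for path in self.disk_paths}

    def allocate(self, unit_name):
        """Return resource allocation information for one target request."""
        n = self.policies[unit_name]
        if n > len(self.disk_paths):
            raise ValueError("storage policy needs more disks than the pool holds")
        # Assumed selection rule: least-used disks first, ties broken by path.
        chosen = sorted(self.disk_paths, key=lambda p: (self.usage[p], p))[:n]
        for path in chosen:
            self.usage[path] += 1
        return {"paths": chosen}
```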

Further, the data processing method further comprises: storing the content of the data to be stored in the N storage disks according to the storage path information in the resource allocation information; for each storage disk, detecting whether the storage disk has a fault or not in the process of storing the content of data to be stored; transmitting failure information of the storage disk to a total processing unit of the data processing system under the condition that the storage disk is detected to have a failure; under the condition that the failure of the storage disk is detected to be relieved, the content of the data to be stored is restored to the storage disk; and storing the content of the data to be stored according to the update information when the update information of the total processing unit on the resource allocation information is received, wherein the update information is determined based on the fault information, and the update information at least comprises storage path information of storage disks except for N storage disks.
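The fault handling during storage may be illustrated by the following sketch. The callables `write`, `report_fault` and `get_update` are hypothetical stand-ins for, respectively, writing to a storage disk, sending failure information to the total processing unit, and receiving update information that names a replacement storage path. Restoring content to a disk after its failure is relieved is omitted for brevity.

```python
def store_with_fault_handling(content, paths, write, report_fault, get_update):
    """Write content to each allocated disk and fall back to a replacement on failure."""
    placed = []
    for path in paths:
        if write(path, content):
            placed.append(path)
            continue
        # The disk is considered faulty: report it to the total processing unit.
        report_fault(path)
        # The update information may name a disk outside the original N disks.
        replacement = get_update(path)
        if replacement is not None and write(replacement, content):
            placed.append(replacement)
    return placed
```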

Further, the data processing method further comprises at least one of the following: acquiring the writing speed of the storage disk for writing data to be stored, and detecting whether the storage disk has faults or not according to the size relation between the writing speed and the preset speed; detecting whether storage feedback information of the storage disk is received or not, obtaining a detection result, and detecting whether the storage disk has faults or not according to the detection result; and detecting whether the storage disk has a fault according to the storage feedback information under the condition that the storage feedback information is received.
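The detection criteria above may be sketched as two small predicates. The text does not state the direction of the speed comparison or the structure of the storage feedback information, so this sketch assumes that a write speed below the preset speed indicates a fault, that missing feedback indicates a fault, and that the feedback carries a hypothetical boolean field `ok`.

```python
from typing import Optional


def faulty_by_speed(write_speed: float, preset_speed: float) -> bool:
    """Assumed rule: a disk writing slower than the preset speed is faulty."""
    return write_speed < preset_speed


def faulty_by_feedback(feedback: Optional[dict]) -> bool:
    """Assumed rule: no feedback, or feedback without ok=True, means a fault."""
    if feedback is None:
        return True
    return not feedback.get("ok", False)
```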

Further, the data processing method further comprises: after the target processing unit stores the data to be stored according to the data storage request, under the condition that a data reading request of the client side for the stored data to be stored is received, acquiring a data identifier of the data to be stored from the data reading request; acquiring storage path information of the data to be stored according to a data identifier of the data to be stored, acquiring content of the data to be stored and at least one version number corresponding to the data to be stored according to the storage path information, wherein the acquired content of the data to be stored consists of sub-content acquired from at least one storage disk, the version number is generated by a target processing unit according to the storage times of the data to be stored, and the version number is stored to the storage disk along with the sub-content; comparing the obtained version number with a target version number of the data to be stored, determining target storage data according to the comparison result, and feeding back the target storage data to the client, wherein the target version number is the version number which is issued to the processing unit after the target processing unit successfully stores the data to be stored.

Further, the data processing method further comprises: in the case where the comparison result indicates that all the acquired version numbers are lower than or equal to the target version number, determining the target storage data according to the acquired content of the data to be stored; and in the case where the comparison result indicates that a version number higher than the target version number exists among the acquired version numbers, waiting for the target processing unit to update the target version number until all the acquired version numbers are lower than or equal to the updated target version number, re-acquiring the content of the data to be stored, and determining the target storage data according to the re-acquired content of the data to be stored.
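The version comparison on the read path may be sketched as follows. The names `fetch`, `get_target_version` and `max_polls` are hypothetical, and the rule used to determine the target storage data (taking the sub-content with the highest acquired version, as in a multi-copy layout) is an assumption of this sketch rather than something the text prescribes.

```python
def read_target_data(fetch, get_target_version, max_polls=10):
    """Return stored content once no acquired version exceeds the target version."""
    replicas = fetch()  # list of (version_number, sub_content) read from the disks
    waited = False
    for _ in range(max_polls):
        target = get_target_version()
        if all(version <= target for version, _ in replicas):
            if waited:
                # A write was in flight earlier: re-acquire the content.
                replicas = fetch()
            # Assumed selection rule: the newest acquired copy is the target data.
            return max(replicas, key=lambda item: item[0])[1]
        # Some disk holds a version newer than the target: wait for the update.
        waited = True
    raise TimeoutError("target version number was not updated in time")
```

A real reader would block or back off between polls; the loop here only bounds the number of attempts.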

Further, the target processing unit determines whether the data to be stored is successfully stored by: after the content of the data to be stored is stored in the N storage disks, counting the number of the storage disks which are successfully stored; under the condition that the number of the storage disks which are successfully stored is larger than or equal to the preset number, determining that the data to be stored are successfully stored; and under the condition that the number of the storage disks which are successfully stored is smaller than the preset number, determining that the data to be stored fails to be stored.
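The success criterion above is a simple threshold on the number of successful disk writes, as in the following sketch. The function name and the representation of per-disk results as booleans are hypothetical.

```python
def store_succeeded(results, preset_number):
    """results holds one boolean per storage disk; True means the write succeeded."""
    successful = sum(1 for ok in results if ok)
    return successful >= preset_number
```

For example, with N = 3 disks and a preset number of 2, one failed disk does not cause the storage to fail.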

Further, the data processing method further comprises: after data to be stored is stored based on resource allocation information, under the condition that a first instruction of a total processing unit in a data processing system is received, storage path information of a target storage disk is obtained from the first instruction, wherein the first instruction is generated under the condition that a storage disk is newly added in the data processing system, the target storage disk is the newly added storage disk, and the first instruction is used for migrating the data to be stored which are stored by the processing unit to the newly added storage disk; and migrating the content of the data to be stored in at least one storage disk in the N storage disks to the target storage disk according to the storage path information of the target storage disk.
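The migration triggered by the first instruction may be illustrated as follows. The text leaves open which of the N storage disks gives up its content, so this sketch assumes the first listed disk is the source; `migrate` and the dictionary model of the disks are hypothetical.

```python
def migrate(disks, data_id, source_paths, target_path):
    """Move one copy of data_id to the newly added disk and return the new path list."""
    source = source_paths[0]  # assumed choice of the disk to migrate from
    content = disks[source].pop(data_id)
    disks.setdefault(target_path, {})[data_id] = content
    # The storage path information now names the new disk instead of the source.
    return [target_path if path == source else path for path in source_paths]
```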

Further, the data processing method further comprises: after the target processing unit stores the data to be stored according to the data storage request, in the case where a second instruction of the total processing unit in the data processing system is received, acquiring a data identifier of the data to be stored from the second instruction, wherein the second instruction is generated in the case where the target processing unit fails, and the second instruction is used for replacing the processing unit that manages the data to be stored; and determining that the processing unit is used for managing the data to be stored that matches the data identifier.

According to another aspect of an embodiment of the present invention, there is also provided a data processing system including: M storage disks for storing data, wherein M is a positive integer; a plurality of processing units, wherein each processing unit is used for acquiring data to be stored from a data storage request in the case where the data storage request of a client is received, storing the data to be stored based on resource allocation information matched with the data to be stored in the case where the data to be stored is within the management range of the processing unit, and, in the case where the data to be stored is not within the management range of the processing unit, sending a query request to the total processing unit, receiving information of the target processing unit fed back by the total processing unit, and feeding back the information of the target processing unit to the client, wherein the resource allocation information comprises storage path information of the N storage disks that need to store the data to be stored, N is a positive integer, M is larger than N, and the client sends the data storage request to the target processing unit according to the information of the target processing unit; and the total processing unit, which is used for determining the target processing unit for managing the data to be stored from the plurality of processing units according to the query request and feeding back the information of the target processing unit to the processing unit.

According to another aspect of the embodiment of the present invention, there is also provided a data processing apparatus including: the first acquisition module is used for acquiring data to be stored from the data storage request under the condition that the data storage request of the client is received; the judging module is used for judging whether the data to be stored is in the management range of the processing unit according to the information of the data to be stored; the storage module is used for acquiring resource allocation information matched with the data to be stored under the condition that the data to be stored is in the management range of the processing unit, and storing the data to be stored based on the resource allocation information, wherein the resource allocation information comprises storage path information of N storage discs, the N storage discs are used for storing the data to be stored, and N is a positive integer; and the processing module is used for determining a target processing unit for managing the data to be stored from the processing units in the data processing system under the condition that the data to be stored is not in the management range of the processing units, and feeding back the information of the target processing unit to the client, wherein the client sends a data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described data processing method when run.

According to another aspect of an embodiment of the present invention, there is also provided an electronic device including one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method for running the program, wherein the program is configured to perform the data processing method described above when run.

In the embodiment of the invention, a mode of writing data through a unique processing unit is adopted, data to be stored is obtained from a data storage request under the condition that a data storage request of a client is received, then whether the data to be stored is in a management range of the processing unit is judged according to information of the data to be stored, so that resource allocation information matched with the data to be stored is obtained under the condition that the data to be stored is in the management range of the processing unit, the data to be stored is stored based on the resource allocation information, a target processing unit for managing the data to be stored is determined from the processing units in a data processing system under the condition that the data to be stored is not in the management range of the processing unit, and information of the target processing unit is fed back to the client, wherein the resource allocation information comprises storage path information of N storage discs, N is a positive integer, the client sends the data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request.

In the above process, when the processing unit receives the data storage request, it writes the data if the data to be stored is determined to be within its management range; otherwise, it sends information to the client so that the client sends the data to be stored to the target processing unit that manages the data. The data to be stored is thus written by a unique processing unit, which improves the data processing efficiency and avoids the low efficiency caused in the related art by data interaction among a plurality of storage disks during writing. In addition, the processing unit writes the data while the storage disks store the data, so that storage and computing are separated. This avoids the situation in the related art, where both writing and storing depend on the storage disk and data is difficult to recover in time if a fault occurs, and thereby improves the stability of data processing.

Therefore, the scheme provided by the application achieves the aim of writing the data through the unique processing unit, thereby realizing the technical effect of improving the data processing efficiency, and further solving the technical problem that the data processing efficiency is low because the related technology adopts a shared-nothing architecture to process the data.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a schematic diagram of a related art data writing process;

FIG. 2 is a schematic diagram II of a related art data writing process;

FIG. 3 is a schematic diagram of an alternative data processing system in accordance with an embodiment of the present invention;

FIG. 4 is a schematic diagram of an alternative data processing method according to an embodiment of the invention;

FIG. 5 is a schematic diagram of an alternative data writing process according to an embodiment of the invention;

FIG. 6 is a schematic diagram of an alternative storage disk failure handling according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a related art system for detecting disk failures;

FIG. 8 is a schematic diagram of an alternative data migration in accordance with an embodiment of the present invention;

FIG. 9 is a schematic diagram of an alternative data processing apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of an alternative electronic device according to an embodiment of the invention;

In the figures: 100, storage disk; 200, processing unit; 300, total processing unit; 400, read-write manager.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related region, and provide corresponding operation entries for the user to select authorization or rejection.

First, some of the terms appearing in the description of the embodiments of the application are explained as follows:

Shared-Nothing architecture (Shared-Nothing): a distributed system architecture in which nodes are independent of each other, each node having its own independent storage and memory, and not sharing these resources with other nodes.

Shared-Everything architecture (Shared-Everything): a distributed system architecture in which multiple nodes share storage resources, including memory and storage devices, etc.

Transaction (Transaction): an atomic sequence of operations that is either fully executed or not executed at all, and is never left in a partially executed state. Transactions are typically applied in situations where data integrity and consistency need to be ensured.

Distributed transaction (Distributed Transaction): a transaction operation that spans multiple compute nodes or data storage locations. It either completes successfully on all participating nodes or rolls back on all participating nodes, to ensure consistency of the data.

Erasure Code (EC): a data encoding technique for improving redundancy and fault tolerance of data. It is able to recover the original data when the data is lost or corrupted by dividing the original data into a number of blocks and calculating additional redundant blocks.
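As a minimal illustration of the idea, the sketch below implements the simplest possible erasure code: k data blocks plus one XOR parity block, which tolerates the loss of any single block. It is not the coding scheme of any particular storage system (production systems commonly use codes that tolerate multiple losses), and the function names are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def encode(data, k):
    """Split data into k equally sized blocks and append one parity block."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # zero-pad the last block
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks, missing):
    """Rebuild the block at index `missing` from all the other blocks."""
    return xor_blocks([b for i, b in enumerate(blocks) if i != missing])
```

Because the parity block is the XOR of all data blocks, XOR-ing the surviving blocks reproduces whichever single block was lost.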

Remote direct memory access (Remote Direct Memory Access, RDMA): a network communication technique that allows two computers to directly access each other's memory without involving the host CPU. RDMA is commonly used in high performance computing, large-scale data centers, and storage systems to improve the efficiency and performance of network communications.

NVMe (Non-Volatile Memory Express): a high-performance storage protocol suitable for solid state disks and flash memory devices.

NVMe-oF (NVMe over Fabrics): a protocol for extending NVMe storage devices over a storage network. NVMe-oF allows remote access to NVMe devices using standard network protocols (e.g., RDMA or the Transmission Control Protocol), enabling storage devices to communicate with hosts with low latency and high throughput while maintaining the performance advantages of the NVMe protocol.

Example 1

According to an embodiment of the present invention, there is provided an embodiment of a data processing system, FIG. 3 is a schematic diagram of an alternative data processing system according to an embodiment of the present invention, as shown in FIG. 3, the system including:

M storage disks 100 for storing data, wherein M is a positive integer;

A plurality of processing units 200, wherein each processing unit 200 is used for acquiring data to be stored from a data storage request when receiving the data storage request of a client, storing the data to be stored based on resource allocation information matched with the data to be stored when the data to be stored is within the management range of the processing unit 200, and, when the data to be stored is not within the management range of the processing unit 200, sending a query request to the total processing unit 300, receiving information of the target processing unit 200 fed back by the total processing unit 300, and feeding back the information of the target processing unit 200 to the client, wherein the resource allocation information comprises storage path information of the N storage disks 100 required to store the data to be stored, N is a positive integer, M is greater than N, and the client sends the data storage request to the target processing unit 200 according to the information of the target processing unit 200;

The total processing unit 300 is configured to determine the target processing unit 200 for managing the data to be stored from the plurality of processing units 200 according to the query request, and feed back the information of the target processing unit 200 to the processing unit 200.

In this embodiment, the data processing system employs a Shared-Everything architecture. Optionally, as shown in fig. 3, the data processing system further includes a read-write manager 400. The data processing system comprises a plurality of nodes, each of which is a server. At least one of a processing unit 200, a total processing unit 300 and a read-write manager 400 can be arranged on each server, at least one storage disk 100 is arranged on each server provided with a read-write manager 400, and all the storage disks 100 in the data processing system form a storage resource pool.

The read-write manager 400 is a process that manages and monitors the local storage disks 100. It is responsible for detecting and reporting the access information, health status and performance statistics of the storage disks 100, where the access information refers to the information required to access a storage disk 100 and may include the access address, port and the like of the storage disk 100. The total processing unit 300 is used for managing the storage resources, the storage policy configuration and the allocation of physical storage space in the data processing system. The processing units 200 are service processes that handle data reading and writing and maintain consistency. Each processing unit 200 is only responsible for writing the data that it manages, and the written data is shared among the processing units 200, that is, every processing unit 200 can read the written data. As shown in fig. 3, the read-write manager 400 may exchange information with the total processing unit 300: in operation, the read-write manager 400 reports information such as the access information to the total processing unit 300, and the total processing unit 300 then aggregates the storage disk information in the data processing system and shares it with all the processing units 200, so that each processing unit 200 writes and reads data according to the storage disk information. As shown in fig. 3, the read-write manager 400 may also communicate with each processing unit 200 and the total processing unit 300 to transmit the health status and performance statistics of the storage disks 100.
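The reporting and sharing of storage disk information may be sketched as follows. `DiskReport` and `DiskDirectory` are hypothetical names; `DiskDirectory` stands in for the aggregation performed by the total processing unit 300, and the choice to share only healthy disks with the processing units is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskReport:
    """What a read-write manager reports for one local storage disk."""
    path: str       # storage path used as the disk's key
    address: str    # access address
    port: int       # access port
    healthy: bool   # health status


class DiskDirectory:
    """Stand-in for the disk information aggregated by the total processing unit."""

    def __init__(self):
        self._reports = {}

    def report(self, disk_report):
        # A newer report for the same disk replaces the older one.
        self._reports[disk_report.path] = disk_report

    def shared_view(self):
        # Assumed sharing rule: processing units only see healthy disks.
        return {p: r for p, r in self._reports.items() if r.healthy}
```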

Optionally, connections are established between the nodes in the data processing system over a high-speed network that supports the RDMA (Remote Direct Memory Access) high-speed transport protocol and may also support the conventional TCP (Transmission Control Protocol). As shown in fig. 3, a processing unit 200 may connect to any storage disk 100 in the data processing system via a remote access protocol, which may be the NVMe-oF access protocol, and access it as if it were a local disk. The data processing system may implement a user-space NVMe-oF protocol processing end in the read-write manager 400 and, optionally, may also configure other protocol processing ends or add other types of access protocols in the form of plug-ins.

In this embodiment, if the scale of the data processing system keeps growing, the data processing system may also perform partition management. For example, a plurality of partitions are provided, each partition having its own independent total processing unit 300, processing units 200, and storage resource pool, which helps reduce the management complexity of a large-scale system. Each piece of data is placed in one partition, and a processing unit 200 only performs network communication with the total processing unit 300, the read-write managers 400 and the storage disks 100 of the corresponding storage resource pool in its own partition, so that the number of network connections is reduced and the performance problems caused by an excessive number of network connections are avoided. When a fault occurs, this arrangement affects only the partition where the fault is located, so that the influence range of the fault is reduced and fault convergence is accelerated. In addition, the data processing system can migrate data from one partition to another according to actual requirements during operation, so that storage resources can be fully utilized.

Therefore, the scheme provided by the application writes each piece of data through a unique processing unit 200, thereby improving data processing efficiency and solving the technical problem that data processing efficiency is low because the related art processes data with a shared-nothing architecture.

Example 2

According to an embodiment of the present invention, an embodiment of a data processing method is provided. It should be noted that the steps shown in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown herein.

FIG. 4 is a schematic diagram of an optional data processing method according to an embodiment of the present invention. As shown in FIG. 4, the method is applied to any one of the processing units in the data processing system of embodiment 1 and includes the following steps:

Step S401, in the case of receiving a data storage request of a client, acquiring data to be stored from the data storage request.

Optionally, the client may be an electronic device such as a computer, a notebook, or a mobile phone, and the form of the data to be stored includes, but is not limited to, text, video, pictures, and audio. The data storage request may include the data to be stored and data information of the data to be stored, where the data information includes at least an initial identifier of the data to be stored and may also include information such as the data size.

Step S402, judging whether the data to be stored is within the management range of the processing unit according to the information of the data to be stored.

The information of the data to be stored may include storage information and storage times of the data to be stored, where the storage information may be generated by a processing unit in the data processing system when the data to be stored is stored, and the storage information includes, but is not limited to, a data identifier set by the data processing system for the data to be stored, a version number of the data to be stored, a data size, a timestamp, and the like. The client can request the data processing system to store the data to be stored of the same initial identifier for a plurality of times, so that the data processing system can distinguish the data to be stored received each time through the version number.

Optionally, the processing unit may determine whether the storage information of the data to be stored is stored in the processing unit, so as to determine whether the data to be stored is within the management range of the processing unit according to the determination result and the storage times of the data to be stored.

Step S403, under the condition that the data to be stored is within the management range of the processing unit, acquiring the resource allocation information matched with the data to be stored, and storing the data to be stored based on the resource allocation information, where the resource allocation information includes storage path information of N storage disks, the N storage disks are used for storing the data to be stored, and N is a positive integer.

Optionally, each processing unit is only responsible for writing the data it manages. Therefore, in the case where the data to be stored is within the management range of the processing unit, the processing unit can acquire the resource allocation information matched with the data to be stored. When the same data is stored multiple times, the processing unit can store the data received each time in the same area. When the data to be stored is stored for the first time, the processing unit obtains the matching resource allocation information from the total processing unit and then keeps it in memory, so that for every later storage of that data the processing unit can read the resource allocation information from memory.

After the resource allocation information is acquired, the processing unit can store the data to be stored in the N storage disks in a redundant storage mode, based on the storage path information in the resource allocation information, so as to improve the reliability of the data. For example, the processing unit may generate multiple copies from the data to be stored, where the content of each copy is identical to the complete content of the data to be stored, and store one copy in each storage disk. For another example, the processing unit may derive a plurality of data blocks with different contents from the data to be stored and store one data block in each storage disk. In this case, the processing unit may determine the data blocks by erasure coding, or in any other manner that allows the complete content of the data to be stored to be determined from only some of the data blocks; for example, each data block may contain only part of the content of the data to be stored, with the contents of the data blocks partially overlapping.
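The two redundant storage modes described above can be illustrated with a minimal sketch. The function names are hypothetical, and the single XOR parity block is only the simplest possible stand-in for an erasure code (two data blocks plus one parity block); the source does not specify which code is used.

```python
def make_replicas(data, n):
    """Copy mode: every storage disk receives the complete content."""
    return [data for _ in range(n)]


def make_parity_blocks(data):
    """Block mode (assumed 2+1 XOR scheme): two halves plus one parity block.

    For simplicity this sketch requires an even-length payload.
    """
    if len(data) % 2:
        raise ValueError("this sketch only handles even-length data")
    half = len(data) // 2
    first, second = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(first, second))
    return [first, second, parity]


def rebuild_from_two(known_block, parity):
    """Recover the missing half from the surviving half and the parity block."""
    return bytes(x ^ y for x, y in zip(known_block, parity))


replicas = make_replicas(b"abcdef", 3)
blocks = make_parity_blocks(b"abcdef")
# Pretend the disk holding blocks[0] failed; rebuild it from the other two.
recovered = rebuild_from_two(blocks[1], blocks[2]) + blocks[1]
```

In copy mode any single disk can serve the full content, whereas in block mode the full content is reassembled from a subset of the blocks, which is the property the paragraph above relies on.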

Step S404, in the case that the data to be stored is not within the management range of the processing unit, determining a target processing unit for managing the data to be stored from the processing units in the data processing system, and feeding back information of the target processing unit to the client, where the client sends the data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request.

Optionally, the processing unit may generate a query request according to the data information of the data to be stored, where the query request is used to query the target processing unit that manages the data to be stored and the information of that target processing unit. After generating the query request, the processing unit may send it to the total processing unit, which determines the target processing unit according to the query request and feeds back the information of the target processing unit to the processing unit. The information of the target processing unit may be, for example, an access address of the target processing unit.

Then, the processing unit can determine the target processing unit according to the information fed back by the total processing unit and feed back the information of the target processing unit to the client, so that the client initiates a data storage request to the target processing unit according to the received information, and the data to be stored is effectively stored.
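The interaction in steps S401 to S404 can be sketched as follows. This is an illustrative sketch only: the class, the `directory` lookup table standing in for the query to the total processing unit, and the tuple-based replies are all assumptions, and actual disk writes are omitted.

```python
class ProcessingUnit:
    """Hypothetical processing unit; `managed` holds the identifiers it manages."""

    def __init__(self, address, managed, directory):
        self.address = address
        self.managed = set(managed)
        self.directory = directory  # stand-in for querying the total processing unit
        self.written = {}

    def handle_store(self, initial_id, payload):
        if initial_id in self.managed:           # S402 yes -> S403: store locally
            self.written[initial_id] = payload   # real disk writes omitted
            return ("stored", self.address)
        # S402 no -> S404: report the target processing unit to the client
        return ("redirect", self.directory[initial_id])


def client_store(units, first_address, initial_id, payload):
    """Client side: send to any unit and follow a single redirect if told to."""
    status, address = units[first_address].handle_store(initial_id, payload)
    if status == "redirect":
        status, address = units[address].handle_store(initial_id, payload)
    return status, address


directory = {"obj-1": "unit-b"}
units = {
    "unit-a": ProcessingUnit("unit-a", [], directory),
    "unit-b": ProcessingUnit("unit-b", ["obj-1"], directory),
}
result = client_store(units, "unit-a", "obj-1", b"hello")
```

Here the client first contacts `unit-a`, which does not manage `obj-1`, and is redirected to `unit-b`, which performs the write.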

Based on the scheme defined in steps S401 to S404 above, the embodiment of the present invention writes each piece of data through a unique processing unit. When a data storage request of a client is received, the data to be stored is acquired from the data storage request, and whether the data to be stored is within the management range of the processing unit is judged according to the information of the data to be stored. In the case that the data to be stored is within the management range of the processing unit, the resource allocation information matched with the data to be stored is acquired and the data to be stored is stored based on the resource allocation information, where the resource allocation information includes storage path information of N storage disks and the N storage disks are used for storing the data to be stored. In the case that the data to be stored is not within the management range of the processing unit, a target processing unit for managing the data to be stored is determined from the processing units in the data processing system, and information of the target processing unit is fed back to the client; the client then sends the data storage request to the target processing unit according to that information, and the target processing unit stores the data to be stored according to the data storage request.

In an optional embodiment, in the process of judging whether the data to be stored is within the management range of the processing unit according to the information of the data to be stored, the processing unit may first judge whether the storage information of the data to be stored is stored in the processing unit. If the storage information is stored in the processing unit, the data to be stored is determined to be within the management range of the processing unit. If the storage information is not stored in the processing unit, the processing unit judges whether the data to be stored is being requested for storage for the first time: if so, the data to be stored is determined to be within the management range of the processing unit; if not, the data to be stored is determined not to be within the management range of the processing unit.

The processing unit keeps the storage information each time it stores data. Therefore, if the storage information of the data to be stored is found in the processing unit, the processing unit has stored this data before and the data is within its management range; otherwise, the processing unit has not stored this data before. The processing unit may look up the storage information corresponding to the data to be stored according to the initial identifier and other contents in the data information of the data to be stored.

If the processing unit has not previously stored the data to be stored, the processing unit needs to further determine whether the data to be stored is being requested for storage for the first time. For example, the processing unit may send a query to the total processing unit according to the data information of the data to be stored. If the total processing unit cannot find the initial identifier of the data to be stored, it determines that the data is being requested for storage for the first time; otherwise it determines that this is not the first request. The total processing unit then feeds back the result to the processing unit. Further, if the data to be stored is being requested for storage for the first time, the processing unit determines that the data belongs to its own management range; otherwise, the data has previously been stored by another processing unit, that is, the data is not within the management range of this processing unit. The total processing unit stores the management correspondence between each processing unit and the stored data.
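The two-stage judgement described above reduces to a small decision function. In this sketch, `local_storage_info` stands for the storage information held by the processing unit and `total_unit_index` for the initial identifiers known to the total processing unit; both names and the set-based representation are assumptions made for illustration.

```python
def within_management_range(initial_id, local_storage_info, total_unit_index):
    """Return True if this processing unit should store the data itself."""
    # Stage 1: the unit has stored this data before, so it manages it.
    if initial_id in local_storage_info:
        return True
    # Stage 2: ask whether any unit has stored it; if not, this is the
    # first storage request and the receiving unit takes over management.
    first_request = initial_id not in total_unit_index
    return first_request


local_info = {"obj-1"}
global_index = {"obj-1", "obj-2"}

previously_stored_here = within_management_range("obj-1", local_info, global_index)
stored_by_another_unit = within_management_range("obj-2", local_info, global_index)
never_stored_before = within_management_range("obj-3", local_info, global_index)
```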

It should be noted that, through the above process, accurate judgment of whether the data to be stored is within the management range of the processing unit is realized.

In an alternative embodiment, in the case that the data to be stored is the first request for storage, in the process of obtaining the resource allocation information matched with the data to be stored, the processing unit may generate a target request according to the data to be stored, and send the target request to a total processing unit in the data processing system, so that in the case that the resource allocation information fed back by the total processing unit is received, the received resource allocation information is determined to be the resource allocation information matched with the data to be stored. The target request is used for requesting to acquire the resource allocation information matched with the data to be stored, and the total processing unit is used for determining the storage strategy matched with the processing unit and generating the resource allocation information according to the storage strategy.

Optionally, fig. 5 is a schematic diagram of an optional data writing process according to an embodiment of the present invention. As shown in fig. 5, in the case where the data to be stored is being requested for storage for the first time, the processing unit may generate a target request according to the data to be stored and its data information and send the target request to the total processing unit. On receiving the target request, the total processing unit allocates a unique data identifier to the data to be stored and allocates a corresponding storage space (i.e. the resource allocation information in fig. 5), and records the correspondence between the initial identifier of the data and the data identifier. Since resources in the data processing system are fully shared, having the total processing unit decide on the allocation of resources avoids contention for resources among the individual processing units.
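A minimal sketch of this centralized allocation is shown below. The class name, the least-used-disk selection rule, and the `(disk, segment)` form of the storage path information are assumptions for illustration; the source only states that a unique data identifier and a storage space are allocated by one central unit.

```python
import itertools


class TotalProcessingUnit:
    """Hypothetical central allocator over a fully shared disk pool."""

    def __init__(self, disk_names):
        self.next_segment = {name: 0 for name in disk_names}  # disk -> next free segment
        self.id_map = {}                                      # initial id -> data id
        self._ids = itertools.count(1)

    def allocate(self, initial_id, n):
        if n > len(self.next_segment):
            raise ValueError("not enough storage disks in the pool")
        data_id = next(self._ids)
        self.id_map[initial_id] = data_id
        # Assumed rule: pick the n disks with the fewest allocated segments.
        chosen = sorted(self.next_segment,
                        key=lambda d: (self.next_segment[d], d))[:n]
        paths = []
        for disk in chosen:
            paths.append((disk, self.next_segment[disk]))
            self.next_segment[disk] += 1
        return {"data_id": data_id, "paths": paths}


unit = TotalProcessingUnit(["disk-1", "disk-2", "disk-3"])
first = unit.allocate("report.txt", 2)
second = unit.allocate("video.mp4", 2)
```

Because every allocation passes through one object, two processing units can never be handed the same segment, which is the contention-avoidance property described above.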

The total processing unit can set different storage policies for different processing units according to actual requirements. When a target request sent by a processing unit is received, the total processing unit determines the storage policy matched with that processing unit and then generates the resource allocation information according to the storage policy. The storage path information in the resource allocation information is the position information of the data to be stored in a storage disk, and the position information may be a segment address. Optionally, the storage policy indicates the redundant storage mode, performance requirements, security policy, and data size of each data block or copy. For example, the redundant storage mode may be copy mode storage, EC (erasure code) mode storage, or another mode. If the redundant storage mode is copy mode storage, the security policy may be "determining that a version of the data meets consistency when the number of successfully stored copies of that version reaches a preset number". If the redundant storage mode is EC mode storage, the security policy may be "determining that a version of the data meets consistency when the number of successfully stored data blocks of that version reaches a preset number". In other words, the security policy defines when a version of the data meets consistency, which is equivalent to the condition that must be satisfied for the data to be regarded as successfully stored.
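As an illustration, a storage policy and its security policy check could be represented as follows. The field names and the example numbers are hypothetical; only the idea of a preset minimum number of successful writes comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    mode: str         # "copy" or "ec" (redundant storage mode)
    width: int        # number of copies or data blocks written per version
    min_success: int  # preset number required by the security policy

    def __post_init__(self):
        if not 1 <= self.min_success <= self.width:
            raise ValueError("min_success must be between 1 and width")


def meets_security_policy(policy, successful_writes):
    """A version is consistent once enough copies or blocks were stored."""
    return successful_writes >= policy.min_success


copy_policy = StoragePolicy(mode="copy", width=3, min_success=2)
ec_policy = StoragePolicy(mode="ec", width=6, min_success=4)

two_of_three_copies = meets_security_policy(copy_policy, 2)
three_of_six_blocks = meets_security_policy(ec_policy, 3)
```

Since both policies draw on the same disk pool, switching a processing unit from one policy to the other does not require reserving separate storage resources.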

The total processing unit may then send the resource allocation information to the processing unit, which determines the received resource allocation information to be the resource allocation information matched with the data to be stored. The total processing unit may also send the storage policy to the processing unit. Further, as shown in fig. 5, the processing unit may store the content of the data to be stored (i.e. C1 and C2 in fig. 5) in the storage disks according to the resource allocation information, and send corresponding information to the client after confirming that the writing is completed.

In the data storage process, the processing unit may store the storage information in the storage disks together with the data, and the storage mode of the data may be preset in the processing unit, where the storage mode includes, but is not limited to, a storage rate of the data to be stored and a storage rate of the storage information. The processing unit feeds back successful data storage to the client only after the stored data satisfies the security policy, that is, after the number of storage disks that have been successfully written reaches the preset number in the security policy. The storage times correspond to the number of times the data processing system has received a data storage request for the data to be stored. Each time the data to be stored is stored, the processing unit may communicate with the total processing unit, so that the storage times of the data to be stored can be recorded by the total processing unit.

In the shared-nothing architecture, resources cannot be flexibly allocated due to resource independence, so that resources are difficult to fully utilize, and a large amount of resources are wasted. For example, different services have a need to use different storage policies, and fixed resources are often required to be partitioned for the different storage policies, making capacity planning and capacity balancing difficult. In the full-sharing architecture provided by the embodiment, through the above process, different storage strategies are conveniently selected for data storage according to actual demands, and the different storage strategies can be shared for one storage resource pool, so that the flexibility of the application can be improved on one hand, and the resource utilization rate can be improved on the other hand.

In an optional embodiment, in the process of storing the data to be stored based on the resource allocation information, the processing unit may store the content of the data to be stored in the N storage disks according to the storage path information in the resource allocation information and, for each storage disk, detect whether the storage disk fails while the content is being stored. In the case that a storage disk is detected to have failed, the processing unit sends failure information of the storage disk to the total processing unit of the data processing system. In the case that the failure of the storage disk is detected to have been cleared, the processing unit restores the content of the data to be stored to that storage disk. In the case that update information of the resource allocation information is received from the total processing unit, the processing unit stores the content of the data to be stored according to the update information, where the update information is determined based on the failure information and includes at least the storage path information of storage disks other than the N storage disks.

Optionally, during the storage process, the processing unit may detect whether a storage disk has failed according to whether and how the storage disk responds, for example by determining that the storage disk has failed when a data write fails. Further, fig. 6 is a schematic diagram of an optional storage disk failure handling process according to an embodiment of the present invention. As shown in fig. 6, in the case where a storage disk failure is detected, the processing unit may send failure information of the disk whose write failed (i.e. the storage disk located in the middle in fig. 6) to the total processing unit.

The total processing unit may remove the failed disk from the mapping set of the data to be stored, where the mapping set records the storage disks used for storing the data to be stored, and judge whether the data to be stored still conforms to the security policy after the disk is removed, for example by judging whether the number of remaining normal storage disks reaches the preset number in the security policy. If the number is reached, the data to be stored can still conform to the security policy, and the total processing unit may take no action for the time being. If the number is not reached, the data to be stored cannot conform to the security policy. In this case, as shown in fig. 6, the total processing unit may add a storage disk other than the N storage disks to the mapping set of the data to be stored, generate update information according to the storage path information of the newly added storage disk, and send the update information to the processing unit, so that the processing unit stores the content that previously failed to be stored in the new storage disk and then feeds back write completion to the client as usual. For example, the rightmost storage disk in fig. 6 is the new storage disk, C1 and C2 form the content of the data to be stored, C2 is the content whose storage previously failed, and C2' is identical in content to C2. In this way, the fast switching of data mappings available under the fully shared storage architecture is used to store the content of the data to be stored effectively.
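The mapping-set update performed on a disk failure can be sketched as below. The function name, the list-based mapping set, and the return convention are assumptions for illustration; the threshold `min_disks` plays the role of the preset number in the security policy.

```python
def handle_disk_failure(mapping_set, failed_disk, spare_disks, min_disks):
    """Return (new_mapping_set, added_disk); added_disk is None if no change
    beyond removing the failed disk was needed."""
    remaining = [d for d in mapping_set if d != failed_disk]
    if len(remaining) >= min_disks:
        # Security policy can still be met: no replacement for now.
        return remaining, None
    for spare in spare_disks:
        if spare not in mapping_set:
            # Update information would carry this disk's storage path.
            return remaining + [spare], spare
    raise RuntimeError("no spare storage disk available")


# Three copies, two required: losing one disk needs no immediate action.
tolerated = handle_disk_failure(["d1", "d2", "d3"], "d2", ["d4"], min_disks=2)
# Two copies, two required: a new disk must be mapped in.
replaced = handle_disk_failure(["d1", "d2"], "d2", ["d1", "d4"], min_disks=2)
```

In the second case the processing unit would then write the previously failed content (C2' in fig. 6) to the newly mapped disk.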

Optionally, in the case of detecting that the failure of a storage disk has been cleared, the processing unit may also restore to that storage disk the data content whose storage previously failed.

It should be noted that fig. 7 is a schematic diagram of detecting a failure of a storage disk in the related art. As shown in fig. 7, in the related art, in order to cope with failures, the OSDs in a system probe one another at a certain frequency. If a probe message is not answered, the OSD reports this to a monitoring node; the monitoring node makes a judgment according to the overall reports, marks the failed OSD as unable to provide service, and informs each OSD and the clients. The whole failure reaction generally takes several seconds. In this embodiment, the processing unit itself determines whether a storage disk has failed, so that mutual probing between storage disks is avoided, thereby reducing the occupation of system resources and improving processing efficiency.

In addition, in the related art, after OSD failure information is received, each placement group needs to negotiate the current state of the data it maintains, comparing version information and operation logs to determine whether the OSDs that can still provide service in the placement group hold complete data and on which OSDs each piece of data resides. After the negotiation of the data state is completed, the placement group selects a primary OSD to serve clients, and this failover typically takes several seconds. If some copies of the data are lost, data recovery is needed to bring the system back to a safe and reliable state, which may take from several minutes to several days depending on the amount of data to be recovered. In this embodiment, as long as the processing unit does not fail, the processing unit can accurately determine whether the latest version of the data to be stored satisfies the security policy and whether each data storage request has been successfully processed, thereby avoiding negotiation between OSDs, further reducing the occupation of system resources, and improving processing efficiency.

In an alternative embodiment, detecting whether a storage disk is faulty includes at least one of: acquiring the writing speed of the storage disk for writing data to be stored, and detecting whether the storage disk has faults or not according to the size relation between the writing speed and the preset speed; detecting whether storage feedback information of the storage disk is received or not, obtaining a detection result, and detecting whether the storage disk has faults or not according to the detection result; and detecting whether the storage disk has a fault according to the storage feedback information under the condition that the storage feedback information is received.
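The three checks listed above can be combined in a single function, as in the following sketch. It assumes that a write speed below the preset speed is what indicates a fault and represents the storage feedback information as `None` (no feedback received), `True` (storage succeeded), or `False` (storage failed); these representations are illustrative choices, not part of the source.

```python
def disk_is_faulty(write_speed, preset_speed, feedback):
    """Decide from service quality alone whether a storage disk is faulty.

    write_speed and preset_speed use the same unit (e.g. MB/s).
    """
    if feedback is None:        # no storage feedback information received
        return True
    if feedback is False:       # feedback says the storage failed
        return True
    # Feedback reports success: still treat an abnormally slow disk as faulty.
    return write_speed < preset_speed


healthy = disk_is_faulty(write_speed=450.0, preset_speed=100.0, feedback=True)
too_slow = disk_is_faulty(write_speed=20.0, preset_speed=100.0, feedback=True)
no_reply = disk_is_faulty(write_speed=450.0, preset_speed=100.0, feedback=None)
write_error = disk_is_faulty(write_speed=450.0, preset_speed=100.0, feedback=False)
```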

Many factors affect the I/O of a storage disk, such as problems with the remote disk medium, problems with the access protocol processing software, and problems with network transport, and the problems themselves are of many kinds, including permanent failures, intermittent failures, and degradation of quality of service. Covering so many components and so many types of faults is very difficult, so failure detection in the related art is typically slow, which affects service I/O.

In this embodiment, whether a storage disk has failed is determined based on the quality of service observed by the service. For example, in the case where the writing speed is lower than the preset speed, it is determined that the storage disk has failed; otherwise, it is determined that the storage disk has not failed. For another example, if the storage feedback information of the storage disk is not received, it is determined that the storage disk has failed, and if the storage feedback information of the storage disk is received, it is determined that the storage disk has not failed. For another example, in the case that the storage feedback information indicates that the storage failed, it is determined that the storage disk has failed, and in the case that the storage feedback information indicates that the storage succeeded, it is determined that the storage disk has not failed.

Through the above process, whether a storage disk has failed can be detected rapidly, thereby further improving data processing efficiency and the stability of data processing.

In an alternative embodiment, after storing the data to be stored based on the resource allocation information, the processing unit may acquire, in the case of receiving a first instruction of the overall processing unit in the data processing system, storage path information of the target storage disk from the first instruction, so as to migrate, according to the storage path information of the target storage disk, content of the data to be stored in at least one storage disk of the N storage disks to the target storage disk. The first instruction is generated under the condition that a storage disk is newly added in the data processing system, the target storage disk is the newly added storage disk, and the first instruction is used for migrating the data to be stored, which is stored by the processing unit, to the newly added storage disk.

Optionally, in this embodiment, the data processing system supports dynamically adding nodes and adding new storage disks to nodes, where the added nodes and new storage disks are aggregated by the total processing unit to expand the storage resource pool formed by the storage disks. FIG. 8 is a schematic diagram of an optional data migration according to an embodiment of the present invention. As shown in FIG. 8, when the storage resource pool is expanded, newly added data (i.e. data 4 in FIG. 8) may use the storage space on the newly added storage disk, and in order to keep the storage disks balanced, a portion of the already stored data (i.e. data 3 in FIG. 8) may be gradually migrated to the newly added storage disk.

Therefore, in the case that a storage disk is newly added to the data processing system, the total processing unit can generate the first instruction according to the storage path information of the newly added storage disk and send the first instruction to the processing unit. After receiving the first instruction, the processing unit migrates the content of the data to be stored in at least one of the N storage disks to the target storage disk. The first instruction may further specify which of the N storage disks the data should be migrated from.
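One way the total processing unit might decide what to migrate is sketched below. The balancing rule (move one item at a time from the most loaded disk until loads differ by at most one) and the `max_moves` budget are assumptions chosen to illustrate gradual, rate-limited migration; the source does not prescribe a particular rule.

```python
def plan_migration(disk_load, new_disk, max_moves):
    """Return a list of (source_disk, target_disk) moves of one item each.

    disk_load maps existing disk names to the number of stored items.
    """
    load = dict(disk_load)
    load.setdefault(new_disk, 0)
    sources = [d for d in load if d != new_disk]
    moves = []
    if not sources:
        return moves
    for _ in range(max_moves):
        source = max(sources, key=lambda d: (load[d], d))
        if load[source] - load[new_disk] <= 1:
            break  # already balanced within one item
        load[source] -= 1
        load[new_disk] += 1
        moves.append((source, new_disk))
    return moves


full_plan = plan_migration({"d1": 4, "d2": 3}, "d3", max_moves=10)
throttled_plan = plan_migration({"d1": 4, "d2": 3}, "d3", max_moves=1)
```

Each planned move would correspond to one first instruction naming the source disk and the target storage disk; limiting `max_moves` per round is one way to bound the impact of migration on client requests.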

It should be noted that, because the mapping relationship between data and storage disks is uniformly managed by the total processing unit, the frequency, timing, and data volume of each migration can be freely controlled by the total processing unit, so that the influence of data migration on client requests can be precisely controlled and the quality of service is further ensured. In addition, in this embodiment, since the data processing system has a fully shared architecture, the storage resources represented by the storage disks in the storage resource pool and the computing power represented by the processing units can be expanded independently, which greatly improves the scalability of the data processing system. As the storage resource pool expands and the overall hardware performance of the data processing system improves, the number of processing units can be increased flexibly, so that data access performance matches the hardware performance and performance bottlenecks are avoided.

In an optional embodiment, the stored data is shared between the processing units in the data processing system. After the target processing unit stores the data to be stored according to the data storage request, the processing unit may, on receiving a data reading request of the client for the stored data, acquire the data identifier of the data to be stored from the data reading request, acquire the storage path information of the data to be stored according to the data identifier, and acquire the content of the data to be stored and at least one version number corresponding to the data to be stored according to the storage path information. The processing unit then compares the acquired version number with the target version number of the data to be stored, determines the target storage data according to the comparison result, and feeds back the target storage data to the client. The acquired content of the data to be stored is composed of sub-content acquired from at least one storage disk, the version number is generated by the target processing unit according to the storage times of the data to be stored, and the target version number is published to the processing unit after the target processing unit successfully stores the data to be stored.

In this embodiment, the data processing system has a fully shared architecture. Because resources are shared, each processing unit can access all storage disks in the data processing system, multiple processing units are allowed to read the same data concurrently, and a processing unit can flexibly read from any storage disk in the mapping set corresponding to the data, with selection policies including nearest-disk reading, round-robin polling, and selection by disk load or latency. While a processing unit is reading certain data, the processing unit managing that data may be writing it concurrently. It is therefore necessary to prevent the reading processing unit from reading data that is still being written and has not yet satisfied the security policy, and returning it to the client.

In order to solve the foregoing problem, in this embodiment, in the case where the data to be stored is not within the management range of the processing unit, after determining that the data to be stored is successfully stored, that is, after determining that the data to be stored in this time satisfies the security policy, the target processing unit may issue version numbers of the data to be stored to all processing units except the target processing unit, and record the version numbers as target version numbers by each processing unit, where the target version numbers are used for the processing unit to determine whether the data may be being written concurrently. The release form includes, but is not limited to, directly sending the version number to the processing unit, storing the version number in the target area for the processing unit to obtain, sending the version number to the third party device, and sending the version number to the processing unit through the third party device.

Optionally, on receiving a data reading request of the client for the stored data, the processing unit may acquire the storage path information of the data to be stored according to the data identifier of the data to be stored, and acquire, from at least one storage disk according to the storage path information, the sub-content in the storage disk and the version number corresponding to each sub-content, so as to obtain the content of the data to be stored and at least one version number corresponding to the data to be stored. Depending on the redundant storage mode, the sub-content may be a copy of the data to be stored or a data block of the data to be stored. In the process of storing the data to be stored, the target processing unit stores the version number of the data together with the sub-content at a designated position in the storage disk, that is, the sub-content in the storage disk corresponds one-to-one with a version number. In this embodiment, the storage path information includes a segment address of the data to be stored in the storage disk, and for the same storage disk each version of the data to be stored is stored at the same segment address, that is, the storage path information of every version is the same.

Further, the processing unit may compare the obtained version numbers with the target version number of the data to be stored, and determine the target storage data according to the comparison result, so as to feed back the target storage data to the client. After the data is stored, any processing unit in the data processing system can feed back to the client the unique data identifier allocated to the data by the data processing system, so that the client can use it the next time it reads or stores the data.

It should be noted that, in the related art, each OSD in a shared-nothing architecture is independent, so OSDs must exchange information frequently to negotiate the authoritative version of the data and to elect a primary OSD and secondary OSDs, where the primary OSD handles client requests. New faults may occur during this interaction and force renegotiation, and two elected primary copies may diverge, so extra steps such as two-phase commit are needed to cope with overlapping faults. In this embodiment, distributed transactions are implemented on a fully shared architecture, and a unique processing unit is assigned to manage each piece of data. That processing unit participates in the whole data writing process, including determining the execution result of the data storage request, recording the disks on which writes failed, and maintaining the version number of the data that satisfies the security policy. Consequently, when another processing unit reads data in response to a client request, it only needs to exchange information with the processing unit that manages the data in order to determine the exact version of the data that satisfies the security policy. This greatly simplifies data version determination, helps the processing unit determine the target storage data accurately, and makes data maintenance and recovery more efficient.

In an alternative embodiment, in determining the target storage data according to the comparison result, the processing unit may proceed as follows. When the comparison result indicates that all the obtained version numbers are lower than or equal to the target version number, the processing unit determines the target storage data according to the obtained content of the data to be stored. When the comparison result indicates that a version number higher than the target version number exists among the obtained version numbers, the processing unit waits for the target processing unit to update the target version number until all the obtained version numbers are lower than or equal to the updated target version number, re-obtains the content of the data to be stored, and determines the target storage data according to the re-obtained content.

Optionally, if every version number obtained from the storage disks is lower than or equal to the target version number, the data read from the storage disks has necessarily been stored by the target processing unit, that is, it satisfies the security policy, and the target storage data may be determined directly according to the obtained content of the data to be stored. If the redundant storage mode is copy (replica) storage, the sub-content acquired from one of the storage disks is determined as the target storage data; if the redundant storage mode is erasure coding (EC), the target storage data is determined from the sub-contents acquired from a plurality of storage disks according to the EC principle. In either case, the content of the target storage data is consistent with the complete content of the data to be stored.

Optionally, if a version number higher than the target version number exists among the version numbers obtained from the storage disks, it is determined that the target processing unit may be writing the data to be stored at this time. The processing unit may therefore wait for the target processing unit to update the target version number until the version number of each sub-content is lower than or equal to the updated target version number, and then acquire the content of the data to be stored from the storage disks again according to the storage path information of the data to be stored, thereby determining the target storage data.
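As a non-limiting illustration, the read-side comparison described above might be sketched for the copy (replica) mode as follows. The function `read_target_data` and its callback parameters are hypothetical; bounding the wait with `max_waits` and returning the copy with the highest version number are assumptions made for the sketch and are not stated in this disclosure.

```python
from typing import Callable, List, Tuple

# (sub-content, version number stored alongside it on the disk)
SubContent = Tuple[bytes, int]


def read_target_data(read_disks: Callable[[], List[SubContent]],
                     get_target_version: Callable[[], int],
                     wait_for_update: Callable[[], None],
                     max_waits: int = 10) -> bytes:
    """Return the target storage data for a replica-mode read."""
    subs = read_disks()
    highest = max(version for _, version in subs)
    if highest > get_target_version():
        # A higher on-disk version suggests a write may be in progress:
        # wait until the published target version catches up, then re-read.
        for _ in range(max_waits):
            wait_for_update()
            if highest <= get_target_version():
                break
        else:
            raise TimeoutError("target version number was not updated in time")
        subs = read_disks()
    # Assumption: in replica mode, use the copy with the highest version number.
    return max(subs, key=lambda sub: sub[1])[0]
```

For the EC mode, the final step would instead decode the complete content from several data blocks; that decoding is omitted here.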

The above process ensures the accuracy of the target storage data determined by the processing unit, and prevents a version of the data that does not satisfy the security policy from being fed back to the client.

In an alternative embodiment, the target processing unit may determine whether the data to be stored was successfully stored by: after the content of the data to be stored is stored in the N storage disks, counting the number of the storage disks which are successfully stored; under the condition that the number of the storage disks which are successfully stored is larger than or equal to the preset number, determining that the data to be stored are successfully stored; and under the condition that the number of the storage disks which are successfully stored is smaller than the preset number, determining that the data to be stored fails to be stored.

The target processing unit may acquire the corresponding storage policy and then obtain the preset number from the security policy contained in that storage policy. It determines that the data to be stored is stored successfully when the number of storage disks on which storage succeeded is greater than or equal to the preset number, and otherwise determines that storage of the data to be stored has failed.
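As a non-limiting illustration, the success determination described above reduces to counting per-disk results against the preset number. The function name `is_store_successful` and the representation of per-disk results as booleans are assumptions made for the sketch.

```python
from typing import Iterable


def is_store_successful(disk_results: Iterable[bool], preset_number: int) -> bool:
    """Return True when enough of the N storage disks were written successfully.

    disk_results holds one boolean per storage disk (True means the write
    succeeded); preset_number is taken from the security policy.
    """
    successes = sum(1 for ok in disk_results if ok)
    return successes >= preset_number
```

For example, with three storage disks and a preset number of 2, one failed disk still yields a successful store, while two failed disks do not.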

Through the above process, data that is determined to be stored successfully always satisfies data consistency, which effectively ensures the reliability of the data fed back to the client.

In an alternative embodiment, after the target processing unit stores the data to be stored according to the data storage request, the processing unit may, upon receiving a second instruction from the total processing unit in the data processing system, obtain the data identifier of the data to be stored from the second instruction, and thereby determine that the processing unit is to manage the data to be stored that matches the data identifier. The second instruction is generated when the target processing unit fails, and is used for replacing the processing unit that manages the data to be stored.

To avoid the situation in which, after the target processing unit fails, the stored data to be stored is no longer managed by any processing unit, or no processing unit executes the request when the client stores the data again, the total processing unit may, when the target processing unit fails, generate a second instruction according to the data identifier of the data to be stored and send it to the processing unit that is to manage the data next. The processing unit that receives the second instruction is then responsible for subsequent writes and data recovery of the data to be stored. Optionally, a processing unit that receives the second instruction may determine that it is to manage the data to be stored that matches the data identifier.

When the target processing unit fails and cannot provide service, the failure information of the storage disks used for storing the data to be stored is already recorded in the total processing unit. Therefore, if the number of storage disks not recorded as failed meets the preset number in the security policy, the data to be stored can be recovered directly by the processing unit that receives the second instruction, which effectively improves data recovery efficiency.
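As a non-limiting illustration, the failover decision described above might be sketched as follows. The class `TotalProcessingUnit`, its fields and its method names are hypothetical, and the second instruction is modeled simply as the list of data identifiers handed to the replacement processing unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TotalProcessingUnit:
    """Hypothetical coordinator holding ownership and disk failure records."""
    owners: Dict[str, str] = field(default_factory=dict)              # data id -> unit name
    failed_disks: Dict[str, Set[str]] = field(default_factory=dict)   # data id -> failed disks

    def on_unit_failure(self, failed_unit: str, replacement: str) -> List[str]:
        """Reassign data managed by the failed unit; return the reassigned ids."""
        moved = [data_id for data_id, unit in self.owners.items() if unit == failed_unit]
        for data_id in moved:
            self.owners[data_id] = replacement
        return moved

    def can_recover_directly(self, data_id: str, disks: List[str],
                             preset_number: int) -> bool:
        """True when enough disks not recorded as failed remain for this data."""
        failed = self.failed_disks.get(data_id, set())
        healthy = [disk for disk in disks if disk not in failed]
        return len(healthy) >= preset_number
```

In this sketch the decision involves only the records held by the total processing unit, with no negotiation among the remaining processing units.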

It should be noted that, through the above process, when the target processing unit fails, the decision on data management can be made by the total processing unit alone, which avoids the resource occupation and low efficiency of the related art, in which a plurality of OSDs must negotiate.

Example 3

According to an embodiment of the present invention, there is provided an embodiment of a data processing apparatus, wherein fig. 9 is a schematic diagram of an alternative data processing apparatus according to an embodiment of the present invention, as shown in fig. 9, the apparatus includes:

a first obtaining module 901, configured to obtain, when receiving a data storage request of a client, data to be stored from the data storage request;

a judging module 902, configured to judge whether the data to be stored is within the management range of the processing unit according to the information of the data to be stored;

a storage module 903, configured to obtain, when the data to be stored is within the management range of the processing unit, resource allocation information matched with the data to be stored, and store the data to be stored based on the resource allocation information, where the resource allocation information includes storage path information of N storage disks, the N storage disks are used to store the data to be stored, and N is a positive integer;

and a processing module 904, configured to, when the data to be stored is not within the management range of the processing unit, determine, from the processing units in the data processing system, a target processing unit for managing the data to be stored, and feed back information of the target processing unit to the client, where the client sends the data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request.

It should be noted that the first obtaining module 901, the judging module 902, the storage module 903, and the processing module 904 correspond to steps S401 to S404 in the above embodiment. The examples and application scenarios implemented by the four modules are the same as those of the corresponding steps, but are not limited to what is disclosed in embodiment 2 above.

Optionally, the judging module 902 further includes: a first judging submodule, used for judging whether storage information of the data to be stored is stored in the processing unit; a first determining submodule, used for determining that the data to be stored is within the management range of the processing unit when the storage information is stored in the processing unit; a second judging submodule, used for judging whether the data to be stored is requested to be stored for the first time when the storage information is not stored in the processing unit; a second determining submodule, used for determining that the data to be stored is within the management range of the processing unit when the data to be stored is requested to be stored for the first time; and a third determining submodule, used for determining that the data to be stored is not within the management range of the processing unit when the data to be stored is not requested to be stored for the first time.
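As a non-limiting illustration, the management-range judgment performed by these submodules might be sketched as follows. The function name and parameters are hypothetical; in particular, how a processing unit learns whether the data is being requested for storage for the first time is not shown, so `is_first_request` is supplied by the caller.

```python
from typing import Set


def in_management_range(stored_info_ids: Set[str], data_id: str,
                        is_first_request: bool) -> bool:
    """Decide whether this processing unit manages the data to be stored.

    stored_info_ids: identifiers of data whose storage information is held
    by this processing unit (an assumed representation).
    """
    if data_id in stored_info_ids:
        # Storage information is held locally, so this unit manages the data.
        return True
    # No local storage information: the data is in range only when it is
    # being requested for storage for the first time.
    return is_first_request
```

When the function returns False, the processing unit would determine the target processing unit and feed back its information to the client instead of storing the data itself.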

Optionally, the storage module 903 further includes: the first sending sub-module is used for generating a target request according to the data to be stored and sending the target request to a total processing unit in the data processing system, wherein the target request is used for requesting to acquire resource allocation information matched with the data to be stored, and the total processing unit is used for determining a storage strategy matched with the processing unit and generating the resource allocation information according to the storage strategy; and the fourth determining submodule is used for determining the received resource allocation information as the resource allocation information matched with the data to be stored under the condition that the resource allocation information fed back by the total processing unit is received.

Optionally, the storage module 903 further includes: the first storage sub-module is used for storing the content of the data to be stored in the N storage disks according to the storage path information in the resource allocation information; the detection submodule is used for detecting whether the storage disk has faults or not in the process of storing the content of the data to be stored for each storage disk; the second sending submodule is used for sending the fault information of the storage disc to a total processing unit of the data processing system under the condition that the storage disc is detected to have faults; the second storage sub-module is used for re-storing the content of the data to be stored to the storage disk under the condition that the storage disk is detected to be free from faults; and the third storage sub-module is used for storing the content of the data to be stored according to the update information under the condition that the update information of the total processing unit on the resource allocation information is received, wherein the update information is determined based on the fault information, and the update information at least comprises the storage path information of the storage disks except the N storage disks.

Optionally, the detection submodule includes at least one of: the first detection unit is used for acquiring the writing speed of the storage disk for writing the data to be stored, and detecting whether the storage disk has faults or not according to the size relation between the writing speed and the preset speed; the second detection unit is used for detecting whether the storage feedback information of the storage disk is received or not, obtaining a detection result, and detecting whether the storage disk has faults or not according to the detection result; and the third detection unit is used for detecting whether the storage disk has faults or not according to the storage feedback information under the condition that the storage feedback information is received.
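As a non-limiting illustration, the three detection approaches might be combined as follows. The function name, the `{"ok": bool}` shape of the storage feedback information, and the decisions to treat a write speed below the preset speed, or missing feedback, as a fault are all assumptions made for the sketch; the disclosure only states that detection depends on these quantities.

```python
from typing import Optional


def disk_is_faulty(write_speed: Optional[float], preset_speed: float,
                   feedback: Optional[dict]) -> bool:
    """Return True when any of the three checks indicates a faulty disk."""
    # Second detection unit: no storage feedback was received at all
    # (assumed here to indicate a fault).
    if feedback is None:
        return True
    # Third detection unit: feedback was received, so inspect what it reports.
    if not feedback.get("ok", False):
        return True
    # First detection unit: compare the write speed with the preset speed
    # (assumed direction: slower than the preset speed means faulty).
    if write_speed is not None and write_speed < preset_speed:
        return True
    return False
```

Because the detection submodule includes at least one of the detection units, an implementation could equally apply only a subset of these checks.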

Optionally, the data processing apparatus further includes: the second acquisition module is used for acquiring a data identifier of the data to be stored from the data reading request under the condition that the data reading request of the client side for the stored data to be stored is received; the third acquisition module is used for acquiring storage path information of the data to be stored according to the data identifier of the data to be stored, acquiring the content of the data to be stored and at least one version number corresponding to the data to be stored according to the storage path information, wherein the acquired content of the data to be stored consists of sub-content acquired from at least one storage disk, the version number is generated by the target processing unit according to the storage times of the data to be stored, and the version number is stored to the storage disk along with the sub-content; the comparison module is used for comparing the acquired version number with a target version number of the data to be stored, determining target storage data according to the comparison result, and feeding back the target storage data to the client, wherein the target version number is the version number which is issued to the processing unit after the target processing unit successfully stores the data to be stored.

Optionally, the comparison module further includes: a fifth determining sub-module, configured to determine target storage data according to the content of the obtained data to be stored when the comparison result indicates that all the obtained version numbers are lower than or equal to the target version number; and the sixth determining submodule is used for waiting for the target processing unit to update the target version number under the condition that the version number higher than the target version number exists in the version number obtained by the comparison result characterization until all the obtained version numbers are lower than or equal to the updated target version number, re-obtaining the content of the data to be stored, and determining the target storage data according to the re-obtained content of the data to be stored.

Optionally, the data processing apparatus further includes: the statistics module is used for counting the number of the storage disks which are successfully stored after the content of the data to be stored is stored in the N storage disks; the first determining module is used for determining that the data to be stored is successfully stored under the condition that the number of the storage disks which are successfully stored is greater than or equal to the preset number; and the second determining module is used for determining that the data to be stored fails to be stored under the condition that the number of the storage disks which are successfully stored is smaller than the preset number.

Optionally, the data processing apparatus further includes: a fourth obtaining module, configured to obtain, when a first instruction of a total processing unit in the data processing system is received, storage path information of a target storage disk from the first instruction, where the first instruction is generated when a storage disk is newly added in the data processing system, the target storage disk is the newly added storage disk, and the first instruction is used to migrate data to be stored that is already stored in the processing unit to the newly added storage disk; and the data migration module is used for migrating the content of the data to be stored in at least one storage disk in the N storage disks to the target storage disk according to the storage path information of the target storage disk.

Optionally, the data processing apparatus further includes: a fifth obtaining module, configured to obtain, when a second instruction of a total processing unit in the data processing system is received, a data identifier of data to be stored from the second instruction, where the second instruction is generated when a target processing unit fails, and the second instruction is used to replace a processing unit that manages the data to be stored; and a third determining module, used for determining that the processing unit is used for managing the data to be stored that matches the data identifier.

Example 4

According to another aspect of the embodiments of the present invention, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described data processing method when run.

Example 5

According to another aspect of an embodiment of the present invention, there is also provided an electronic device, wherein fig. 10 is a schematic diagram of an alternative electronic device according to an embodiment of the present invention. As shown in fig. 10, the electronic device includes one or more processors, and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the data processing method described above.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described apparatus embodiments are merely exemplary. For example, the division of units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims

1. A data processing method, the method being applied to any one of processing units in a data processing system, wherein the method comprises:

Under the condition that a data storage request of a client is received, acquiring data to be stored from the data storage request;

Judging whether the data to be stored is in the management range of the processing unit or not according to the information of the data to be stored;

Acquiring resource allocation information matched with the data to be stored under the condition that the data to be stored is in the management range of the processing unit, and storing the data to be stored based on the resource allocation information, wherein the resource allocation information comprises storage path information of N storage disks, the N storage disks are used for storing the data to be stored, and N is a positive integer;

Determining a target processing unit for managing the data to be stored from processing units in the data processing system and feeding back information of the target processing unit to the client under the condition that the data to be stored is not in the management range of the processing units, wherein the client sends the data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request;

The judging whether the data to be stored is in the management range of the processing unit according to the information of the data to be stored comprises the following steps:

Judging whether the processing unit stores the storage information of the data to be stored or not;

Under the condition that the storage information is stored in the processing unit, determining that the data to be stored is in the management range of the processing unit;

Judging whether the data to be stored is requested to be stored for the first time under the condition that the storage information is not stored in the processing unit;

under the condition that the data to be stored is requested to be stored for the first time, determining that the data to be stored is in the management range of the processing unit;

And under the condition that the data to be stored is not requested to be stored for the first time, determining that the data to be stored is not in the management range of the processing unit.

2. The method according to claim 1, wherein, in the case where the data to be stored is requested to be stored for the first time, obtaining the resource allocation information matched with the data to be stored includes:

Generating a target request according to the data to be stored and sending the target request to a total processing unit in the data processing system, wherein the target request is used for requesting to acquire resource allocation information matched with the data to be stored, and the total processing unit is used for determining a storage strategy matched with the processing unit and generating the resource allocation information according to the storage strategy;

and under the condition that the resource allocation information fed back by the total processing unit is received, determining the received resource allocation information as the resource allocation information matched with the data to be stored.

3. The method of claim 1, wherein storing the data to be stored based on the resource allocation information comprises:

Storing the content of the data to be stored in the N storage disks according to the storage path information in the resource allocation information;

For each storage disk, detecting whether the storage disk has a fault or not in the process of storing the content of the data to be stored;

transmitting fault information of the storage disk to a total processing unit of the data processing system under the condition that the storage disk is detected to have faults;

Re-storing the content of the data to be stored to the storage disk under the condition that the storage disk is detected to be free from faults;

And storing the content of the data to be stored according to the update information when the update information of the total processing unit on the resource allocation information is received, wherein the update information is determined based on the fault information, and the update information at least comprises storage path information of storage disks except the N storage disks.

4. A method according to claim 3, wherein detecting whether the storage disk is faulty comprises at least one of:

Acquiring the writing speed of the storage disk for writing the data to be stored, and detecting whether the storage disk has faults or not according to the size relation between the writing speed and a preset speed;

Detecting whether storage feedback information of the storage disk is received or not, obtaining a detection result, and detecting whether the storage disk has faults or not according to the detection result;

And under the condition that the storage feedback information is received, detecting whether the storage disk has faults or not according to the storage feedback information.

5. The method of claim 1, wherein the stored data to be stored is shared between processing units in the data processing system, wherein after the target processing unit stores the data to be stored in accordance with the data storage request, the method further comprises:

Under the condition that a data reading request of the client for the stored data to be stored is received, acquiring a data identifier of the data to be stored from the data reading request;

Acquiring storage path information of the data to be stored according to the data identifier of the data to be stored, and acquiring content of the data to be stored and at least one version number corresponding to the data to be stored according to the storage path information, wherein the acquired content of the data to be stored consists of sub-content acquired from at least one storage disk, the version number is generated by the target processing unit according to the storage times of the data to be stored, and the version number is stored to the storage disk along with the sub-content;

Comparing the obtained version number with the target version number of the data to be stored, determining target storage data according to the comparison result, and feeding back the target storage data to the client, wherein the target version number is the version number which is issued to the processing unit after the target processing unit successfully stores the data to be stored.

6. The method of claim 5, wherein determining the target storage data based on the comparison result comprises:

Under the condition that the comparison result characterizes that all the obtained version numbers are lower than or equal to the target version number, determining the target storage data according to the content of the obtained data to be stored;

And under the condition that the version number higher than the target version number exists in the version numbers obtained by the comparison result characterization, waiting for the target processing unit to update the target version number until all the obtained version numbers are lower than or equal to the updated target version number, re-obtaining the content of the data to be stored, and determining the target storage data according to the re-obtained content of the data to be stored.

7. The method of claim 5, wherein the target processing unit determines whether the data to be stored was successfully stored by:

after the content of the data to be stored is stored in the N storage disks, counting the number of the storage disks which are successfully stored;

determining that the data to be stored is successfully stored under the condition that the number of the storage disks which are successfully stored is larger than or equal to the preset number;

and under the condition that the number of the storage disks which are successfully stored is smaller than the preset number, determining that the data to be stored fails to be stored.

8. The method of claim 1, wherein after storing the data to be stored based on the resource allocation information, the method further comprises:

Under the condition that a first instruction of a total processing unit in the data processing system is received, obtaining storage path information of a target storage disk from the first instruction, wherein the first instruction is generated under the condition that a storage disk is newly added in the data processing system, the target storage disk is the newly added storage disk, and the first instruction is used for migrating the data to be stored, which is stored by the processing unit, to the newly added storage disk;

And migrating the content of the data to be stored in at least one storage disk in the N storage disks to the target storage disk according to the storage path information of the target storage disk.

9. The method of claim 1, wherein after the target processing unit stores the data to be stored in accordance with the data storage request, the method further comprises:

acquiring a data identifier of the data to be stored from a second instruction of a total processing unit in the data processing system under the condition that the second instruction is received, wherein the second instruction is generated under the condition that the target processing unit fails, and the second instruction is used for replacing the processing unit for managing the data to be stored;

and determining that the processing unit is used for managing the data to be stored that matches the data identifier.
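The takeover described in claim 9 can be sketched in Python as follows. The `SecondInstruction` and `ProcessingUnit` classes and the `managed_ids` set are illustrative assumptions; the claim requires only that the receiving processing unit reads the data identifier from the second instruction and becomes the manager of the matching data.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SecondInstruction:
    # Data identifier of the data formerly managed by the failed target unit.
    data_id: str


class ProcessingUnit:
    def __init__(self, name: str) -> None:
        self.name = name
        # Assumption: the management range is kept as a set of identifiers.
        self.managed_ids: Set[str] = set()

    def manages(self, data_id: str) -> bool:
        return data_id in self.managed_ids

    def on_second_instruction(self, instr: SecondInstruction) -> None:
        # Take over management of the data matching the identifier.
        self.managed_ids.add(instr.data_id)
```

After the instruction is handled, later storage requests for that identifier fall within this unit's management range in the sketch.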

10. A data processing system, comprising:

M storage disks for storing data, wherein M is a positive integer;

a plurality of processing units, wherein each processing unit is used for acquiring data to be stored from a data storage request of a client when the data storage request of the client is received, storing the data to be stored based on resource allocation information matched with the data to be stored when the data to be stored is within the management range of the processing unit, sending a query request to a total processing unit when the data to be stored is not within the management range of the processing unit, receiving information of a target processing unit fed back by the total processing unit, and feeding back the information of the target processing unit to the client, wherein the resource allocation information comprises storage path information of N storage disks that are to store the data to be stored, N is a positive integer, M is greater than N, and the client sends the data storage request to the target processing unit according to the information of the target processing unit;

the total processing unit is used for determining a target processing unit for managing the data to be stored from the plurality of processing units according to the query request and feeding back information of the target processing unit to the processing unit;

the processing unit is further used for judging whether the processing unit stores the storage information of the data to be stored; under the condition that the storage information is stored in the processing unit, determining that the data to be stored is in the management range of the processing unit; under the condition that the storage information is not stored in the processing unit, judging whether the data to be stored is requested to be stored for the first time; under the condition that the data to be stored is requested to be stored for the first time, determining that the data to be stored is in the management range of the processing unit; and under the condition that the data to be stored is not requested to be stored for the first time, determining that the data to be stored is not in the management range of the processing unit.
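The interaction among the client, the processing units and the total processing unit in claim 10 can be illustrated with a small Python sketch. All class, method and variable names are hypothetical. In particular, the way a first-time request is detected (here, by asking the total processing unit whether any owner is already registered) is an assumption, since the claim does not say how that judgment is made.

```python
from typing import Dict, Optional, Tuple


class TotalProcessingUnit:
    """Assumed registry mapping data identifiers to the managing unit."""

    def __init__(self) -> None:
        self.owners: Dict[str, str] = {}

    def owner_of(self, data_id: str) -> Optional[str]:
        return self.owners.get(data_id)

    def register(self, data_id: str, unit_name: str) -> None:
        self.owners[data_id] = unit_name


class ProcessingUnit:
    def __init__(self, name: str, total: TotalProcessingUnit) -> None:
        self.name = name
        self.total = total
        self.storage_info: Dict[str, int] = {}  # data_id -> stored size

    def handle(self, data_id: str, content: bytes) -> Tuple[str, str]:
        if data_id not in self.storage_info:
            # Query request to the total processing unit.
            owner = self.total.owner_of(data_id)
            if owner is not None:
                # Not a first-time request: feed back the target unit.
                return ("redirect", owner)
            # First-time request: this unit takes the data into its range.
            self.total.register(data_id, self.name)
        # In range: store (the writes to the N disks are elided here).
        self.storage_info[data_id] = len(content)
        return ("stored", self.name)


def client_store(units: Dict[str, ProcessingUnit], entry: str,
                 data_id: str, content: bytes) -> Tuple[str, str]:
    status, name = units[entry].handle(data_id, content)
    if status == "redirect":
        # The client resends the request to the target processing unit.
        status, name = units[name].handle(data_id, content)
    return status, name
```

In this sketch a request that reaches the wrong unit costs one extra round trip, after which the client could cache the target unit for later requests on the same data.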

11. A data processing apparatus, comprising:

the first acquisition module is used for acquiring data to be stored from a data storage request of a client under the condition of receiving the data storage request;

the judging module is used for judging whether the data to be stored is in the management range of the processing unit according to the information of the data to be stored;

the storage module is used for acquiring resource allocation information matched with the data to be stored under the condition that the data to be stored is in the management range of the processing unit, and storing the data to be stored based on the resource allocation information, wherein the resource allocation information comprises storage path information of N storage disks, the N storage disks are used for storing the data to be stored, and N is a positive integer;

the processing module is used for determining a target processing unit for managing the data to be stored from processing units in a data processing system and feeding back information of the target processing unit to the client when the data to be stored is not in the management range of the processing unit, wherein the client sends the data storage request to the target processing unit according to the information of the target processing unit, and the target processing unit stores the data to be stored according to the data storage request;

wherein the judging module further comprises:

the first judging submodule is used for judging whether the processing unit stores the storage information of the data to be stored or not;

the first determining submodule is used for determining that the data to be stored is in the management range of the processing unit under the condition that the storage information is stored in the processing unit;

the second judging submodule is used for judging whether the data to be stored is requested to be stored for the first time under the condition that the storage information is not stored in the processing unit;

the second determining submodule is used for determining that the data to be stored is in the management range of the processing unit when the data to be stored is requested to be stored for the first time;

and the third determining submodule is used for determining that the data to be stored is not in the management range of the processing unit when the data to be stored is not requested to be stored for the first time.
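The decision made by the judging module and its five submodules reduces to a two-input rule, which the following Python sketch spells out. The function name and the boolean parameters are illustrative assumptions; how each input is obtained is left open by the claim.

```python
def in_management_range(has_storage_info: bool, is_first_request: bool) -> bool:
    """Return True when the data to be stored is in the unit's range."""
    # First judging submodule: is storage information already held?
    if has_storage_info:
        # First determining submodule: in range.
        return True
    # Second judging submodule: is this a first-time storage request?
    if is_first_request:
        # Second determining submodule: in range.
        return True
    # Third determining submodule: not in range.
    return False
```

Only the combination of no stored information and a repeated request leads to the out-of-range result, which is the case handled by the processing module.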

12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, wherein the computer program is configured to execute, when run, the data processing method according to any one of claims 1 to 9.

13. An electronic device, the electronic device comprising one or more processors; a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a method for running a program, wherein the program is configured to perform the data processing method of any of claims 1 to 9 when run.