CN115993932A - Data processing method, device, storage medium and electronic equipment - Google Patents

Data processing method, device, storage medium and electronic equipment

Info

Publication number
CN115993932A
CN115993932A (Application No. CN202211475701.1A)
Authority
CN
China
Prior art keywords
data
determining
mapping
access
cold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211475701.1A
Other languages
Chinese (zh)
Inventor
周兆星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202211475701.1A
Publication of CN115993932A
Legal status: Pending

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing device, a storage medium and an electronic device. The method comprises the following steps: acquiring the data transmitted by each data node; mapping each piece of data into a corresponding data fragment according to a mapping relation; determining the priority of the storage areas, wherein the storage areas comprise a solid-state drive and a hard disk drive; and preferentially writing the data stored in the data fragments into a plurality of buffer areas in the solid-state drive, and writing the remaining data into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold. The method and device address the technical problem that, in big-data cloud computing, multi-user, multi-task and multi-priority access flows skew access to data, which severely skews the stored data and causes contention for hot data and waste of cold-data storage resources.

Description

Data processing method, device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of big data, and in particular, to a data processing method, apparatus, storage medium, and electronic device.
Background
In the related art, data produced at a computing node during computation is encrypted and transmitted to a cloud platform for storage. However, in big-data cloud computing, multi-user, multi-task and multi-priority access flows skew access to data, which severely skews the stored data under unified management and causes contention for hot data and waste of cold-data storage resources; this problem remains unsolved.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide a data processing method, a device, a storage medium and an electronic device, which at least solve the technical problem that the access-skew phenomenon produced by multi-user, multi-task and multi-priority access flows in big-data cloud computing severely skews the stored data and causes contention for hot data and waste of cold-data storage resources.
According to an aspect of the embodiments of the present application, there is provided a data processing method, including: acquiring the data transmitted by each data node; mapping each piece of data into a corresponding data fragment according to a mapping relation; determining the priority of the storage areas, wherein the storage areas comprise a solid-state drive and a hard disk drive; and preferentially writing the data stored in the data fragments into a plurality of buffer areas in the solid-state drive, and writing the remaining data into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold.
Optionally, mapping each piece of data into a corresponding data fragment according to the mapping relationship includes: determining the initial key-value pair corresponding to each piece of data, and mapping the initial key-value pair into a target two-tuple; and determining the data fragment to which each piece of data belongs according to the key value in the target two-tuple.
Optionally, the method further comprises: the access condition of each data is determined, each data is divided according to the access condition, and each data is classified as hot data or cold data.
Optionally, after classifying the respective data as hot data or cold data, the method further comprises: acquiring a global data-copy load value, determining the data block corresponding to the hot data when the task executed in the current period is a non-local task, and automatically copying the data block from other nodes.
Optionally, after classifying the respective data as hot data or cold data, the method further comprises: detecting, at preset intervals, the load of the data blocks stored on a data node, and acquiring the total number of data copies corresponding to the data node when the difference between the data-block load and the normal load is smaller than a preset threshold; issuing an erasure-code operation to the data node when the total number of data copies equals a predetermined number; and receiving the data information returned for the data block and storing it separately in the cold-data independent disk array.
Optionally, when the total number of data copies is not the predetermined number, the file in the data block on the data node is deleted and deletion information is reported, wherein the deletion information includes the file name and the location corresponding to the file.
Optionally, determining the access condition of each piece of data, dividing the data according to the access condition, and classifying each piece of data as hot data or cold data includes: acquiring at least the file name and the access times corresponding to each piece of data; determining the access count corresponding to each file name, and determining that the data is hot data when the access count is larger than a preset access count, or when the access time belongs to a target period; and determining that the data is cold data when the access count is smaller than the preset access count, or when the access time does not belong to the target period.
According to another aspect of the embodiments of the present application, there is also provided a data processing apparatus, including: an acquiring module, configured to acquire the data transmitted by each data node; a mapping module, configured to map and store each piece of data into a corresponding data fragment according to the mapping relation; and a determining module, configured to determine the priority of the storage areas, wherein the storage areas comprise a solid-state drive and a hard disk drive, to preferentially write the data stored in the data fragments into a plurality of buffer areas in the solid-state drive, and to write the remaining data into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold.
According to another aspect of the embodiments of the present application, there is also provided a nonvolatile storage medium including: the storage medium includes a stored program, wherein the program, when run, controls a device in which the storage medium resides to execute any one of the data processing methods.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute instructions to implement any of the data processing methods.
In the embodiments of the application, data is divided by access heat: the data transmitted by each data node is acquired; each piece of data is mapped into a corresponding data fragment according to the mapping relation; the priority of the storage areas is determined, wherein the storage areas comprise a solid-state drive and a hard disk drive; the data stored in the data fragments is preferentially written into a plurality of buffer areas in the solid-state drive, and the remaining data is written into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold. This reduces data redundancy in big-data cloud-computing storage, thereby reducing contention for hot data and avoiding waste of cold-data storage resources, and solves the technical problem that the access-skew phenomenon produced by multi-user, multi-task and multi-priority access flows in big-data cloud computing severely skews the stored data and causes contention for hot data and waste of cold-data storage resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a flow diagram of an alternative data processing method according to an embodiment of the present application;
FIG. 3 is a flow chart of data execution of a data processing method according to an embodiment of the present application;
FIG. 4 is a data flow diagram of a data processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus structure of a data processing method according to an embodiment of the present application;
fig. 6 is a schematic block diagram of an example electronic device 600 according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with the embodiments of the present application, there is provided a method embodiment of data processing, it being noted that the steps shown in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order other than that shown.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
step S102, acquiring each data transmitted by each data node;
step S104, mapping each data into a corresponding data fragment according to the mapping relation;
step S106, determining the priority of the storage area, wherein the storage area comprises: solid state drives and hard disk drives;
it should be noted that, the priority of the solid state drive is higher than the priority of the hard disk drive.
Step S108, the data stored in the data fragments are input to a plurality of buffer areas in the solid state drive preferentially, and after the occupancy rate of the buffer areas of the solid state drive reaches a preset threshold, the rest data are written into the hard disk drive.
It should be noted that, the preset threshold may be 80%, that is, when the occupancy rate of the buffer area reaches 80%, the remaining data is written into the hard disk drive.
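As an illustrative sketch only (not part of the claimed solution), the following Python snippet mimics this tiered write path with in-memory stand-ins for the drives; the class name TieredWriter, the per-buffer capacity and the fixed 80% threshold are assumptions made for the example.

```python
# Minimal sketch of the tiered write path described above (illustrative only).
# TieredWriter, the buffer capacity and the 80% threshold are assumptions.

class TieredWriter:
    def __init__(self, ssd_buffer_capacity: int, num_buffers: int = 4, threshold: float = 0.8):
        self.ssd_buffers = [[] for _ in range(num_buffers)]  # buffer areas in the solid-state drive
        self.capacity = ssd_buffer_capacity                  # records per buffer area
        self.threshold = threshold                           # preset occupancy threshold (e.g. 80%)
        self.hdd = []                                        # stand-in for the hard disk drive

    def _occupancy(self, buf) -> float:
        return len(buf) / self.capacity

    def write(self, shard_id: int, record) -> str:
        """Prefer the SSD buffer of the data fragment; spill to the HDD once the threshold is reached."""
        buf = self.ssd_buffers[shard_id % len(self.ssd_buffers)]
        if self._occupancy(buf) < self.threshold:
            buf.append(record)
            return "ssd"
        self.hdd.append(record)  # remaining data goes to the hard disk drive
        return "hdd"

writer = TieredWriter(ssd_buffer_capacity=10)
placements = [writer.write(shard_id=0, record=f"row-{i}") for i in range(15)]
print(placements.count("ssd"), "records on SSD,", placements.count("hdd"), "records on HDD")
```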
In the embodiments of the application, data is divided by access heat: the data transmitted by each data node is acquired; each piece of data is mapped into a corresponding data fragment according to the mapping relation; the priority of the storage areas is determined, wherein the storage areas comprise a solid-state drive and a hard disk drive; the data stored in the data fragments is preferentially written into a plurality of buffer areas in the solid-state drive, and the remaining data is written into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold. This reduces data redundancy in big-data cloud-computing storage, thereby reducing contention for hot data and avoiding waste of cold-data storage resources, and solves the technical problem that the access-skew phenomenon produced by multi-user, multi-task and multi-priority access flows in big-data cloud computing severely skews the stored data and causes contention for hot data and waste of cold-data storage resources.
In an exemplary embodiment of the present application, mapping each piece of data into a corresponding data fragment according to the mapping relationship includes: determining the initial key-value pair corresponding to each piece of data, and mapping the initial key-value pair into a target two-tuple; and determining the data fragment to which each piece of data belongs according to the key value in the target two-tuple.
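As a hedged illustration of this mapping step, the sketch below remaps a raw record into a <Key, Value> two-tuple and derives the data fragment from the key; the comma-splitting rule, the MD5 hash and the fragment count of eight are assumptions for the example, since the text only states that the fragment is determined from the key value.

```python
# Illustrative sketch: initial pair -> target two-tuple -> data fragment.
# The hash-modulo rule and NUM_FRAGMENTS are assumptions.

import hashlib

NUM_FRAGMENTS = 8  # assumed fragment count for the example

def to_target_tuple(raw_record: str, line_number: int) -> tuple[str, str]:
    """Remap the initial pair <RAWdata, line number> into a meaningful <Key, Value> two-tuple."""
    # line_number comes from the initial <RAWdata, line number> pair; unused in this simple example.
    key, _, value = raw_record.partition(",")
    return key.strip(), value.strip()

def fragment_for_key(key: str) -> int:
    """Determine the data fragment a record belongs to from its key value."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_FRAGMENTS

key, value = to_target_tuple("user42, clicked item 7", line_number=1)
print(key, "->", fragment_for_key(key))
```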
Optionally, the method further comprises: the access condition of each data is determined, each data is divided according to the access condition, and each data is classified as hot data or cold data.
It will be appreciated that data with a higher access count is hot data, and data with a lower access count is cold data.
In some optional embodiments of the present application, after classifying each piece of data as hot data or cold data, the method further comprises: acquiring a global data-copy load value, determining the data block corresponding to the hot data when the task executed in the current period is a non-local task, and automatically copying the data block from other nodes.
In an exemplary embodiment of the present application, after classifying each piece of data as hot data or cold data, the method further includes: detecting, at preset intervals, the load of the data blocks stored on a data node, and acquiring the total number of data copies corresponding to the data node when the difference between the data-block load and the normal load is smaller than a preset threshold; issuing an erasure-code operation to the data node when the total number of data copies equals a predetermined number; and receiving the data information returned for the data block and storing it separately in the cold-data independent disk array.
In an alternative embodiment, if the total number of data copies is not the predetermined number, the file in the data block on the data node is deleted and deletion information is reported, where the deletion information includes the file name and the location corresponding to the file.
It should be noted that the predetermined number of data copies is three, replica operations cannot be performed on an individual file, and a random allocation policy is adopted for the storage positions of the copies.
In some optional embodiments of the present application, determining the access condition of each piece of data, dividing the data according to the access condition, and classifying each piece of data as hot data or cold data includes: acquiring at least the file name and the access times corresponding to each piece of data; determining the access count corresponding to each file name, and determining that the data is hot data when the access count is larger than a preset access count, or when the access time belongs to a target period; and determining that the data is cold data when the access count is smaller than the preset access count, or when the access time does not belong to the target period.
It will be appreciated that when the access time falls within a peak period, the data may be determined to be hot data; when the access time belongs to an off-peak period, the data may be determined to be cold data.
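A minimal sketch of this hot/cold rule follows; the access-count threshold of 100 and the 09:00-21:00 peak window are assumed values used only for illustration.

```python
# Hedged sketch of the hot/cold classification rule described above.
# PRESET_ACCESS_COUNT and PEAK_PERIOD are assumptions for the example.

from datetime import datetime, time

PRESET_ACCESS_COUNT = 100                 # assumed preset access-count threshold
PEAK_PERIOD = (time(9, 0), time(21, 0))   # assumed target (peak) period

def classify(access_times: list[datetime]) -> str:
    """Return 'hot' or 'cold' for one file from its recorded access times."""
    count = len(access_times)
    in_peak = any(PEAK_PERIOD[0] <= t.time() <= PEAK_PERIOD[1] for t in access_times)
    if count > PRESET_ACCESS_COUNT or in_peak:
        return "hot"
    return "cold"

print(classify([datetime(2022, 11, 23, 14, 5)]))   # accessed in the peak period -> hot
print(classify([datetime(2022, 11, 23, 3, 15)]))   # few accesses, off-peak -> cold
```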
In order to facilitate a better understanding of the technical solutions of the present application, a specific embodiment will now be described.
FIG. 2 is a schematic flow chart of an alternative data processing method according to an embodiment of the present application, as shown in FIG. 2, the flow mainly includes the following steps:
(1) Partitioning the acquired data content according to the data requirement, and establishing a plurality of databases through the partitioned content;
(2) The data fragments are stored on one or more data nodes; the data is divided into fragments by the mapping technique in partitioned processing, then sent to a plurality of buffer areas in the solid-state drive, and when a buffer area is full, the background writes the data in that buffer area into the hard disk drive;
(3) The information acquisition module collects the upper-layer data-access logs of nodes in the system and provides information for dynamic data division; a dynamic cold/hot copy distinguishing module dynamically divides data by access heat, and a dynamic copy storage module manages and maintains the number of copies;
It should be noted that the default copy count of all files in the dynamic cold/hot copy distinguishing module is three, replica operations cannot be performed on an individual file, a completely random allocation strategy is adopted for the storage positions of the copies, and the dynamic cold/hot copy distinguishing module manages data storage and data access in a unified manner.
It should be noted that the dynamic copy storage module marks data as cold or hot according to its access condition. Its data copies are fully dynamic, and it adopts a feedback adjustment mechanism, mainly comprising a copy-increase mechanism for zero-copy data blocks and an automatic copy-attenuation mechanism, to change the number of data blocks.
It can be understood that the copy-increase mechanism for zero-copy data blocks acquires the global data-copy load value from the log record module. If a non-local task is executed, the corresponding data block is hot data, and when the data mapping completes, the mapping task automatically copies the data block from other nodes. The copy increase takes effect after the data-block copy completes: whereas a conventional mapping task would discard the mapping input data as a temporary file, the copy-increase mechanism persists it locally and reports it to the server, so that the new data-block copy becomes visible.
The automatic copy-attenuation mechanism is based on the calculation of data-block load. The load of the data blocks stored on each node is scanned periodically; when a data-block load is found to be clearly lower than the normal load value, the dynamic copy storage module is accessed preferentially and the total number of copies is obtained. If the number of copies is not three, the file corresponding to the data block is deleted directly and reported, making the deletion globally visible; if the number of copies equals three, the data is handed to the cold-data independent disk array module for processing.
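To make the feedback adjustment concrete, the sketch below combines the two mechanisms in a toy replica manager; the class name, the load bookkeeping and the threshold comparison are assumptions, and only the default copy count of three is taken from the text above.

```python
# Illustrative sketch of the feedback adjustment on replica counts: a copy increase when a
# non-local task touches a block, and attenuation when a block's load falls well below normal.
# ReplicaManager, its fields and the scan parameters are assumptions.

DEFAULT_COPIES = 3

class ReplicaManager:
    def __init__(self):
        self.copies = {}   # block_id -> current copy count
        self.load = {}     # block_id -> observed data-block load

    def on_non_local_task(self, block_id: str):
        """Copy increase: a non-local task marks the block hot and pulls a copy from another node."""
        self.copies[block_id] = self.copies.get(block_id, DEFAULT_COPIES) + 1

    def periodic_scan(self, normal_load: float, threshold: float):
        """Attenuation: blocks whose load is clearly below normal shed copies or go to cold storage."""
        actions = {}
        for block_id, load in self.load.items():
            if normal_load - load < threshold:
                continue  # load is close to normal; leave the block alone
            if self.copies.get(block_id, DEFAULT_COPIES) != DEFAULT_COPIES:
                actions[block_id] = "delete-and-report"      # drop the copy's file and report it
            else:
                actions[block_id] = "erasure-code-to-cold"   # hand over to the cold-data disk array
        return actions

mgr = ReplicaManager()
mgr.load = {"blk-1": 0.1, "blk-2": 0.9}
mgr.copies = {"blk-1": 3, "blk-2": 4}
mgr.on_non_local_task("blk-2")
print(mgr.periodic_scan(normal_load=1.0, threshold=0.5))
```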
(4) Additional reliable data-block storage is carried out for attenuated, rarely accessed data by the cold-data independent disk array module;
It should be noted that the cold-data independent disk array module stores data blocks in a lazy-loading manner: when the current number of all copies tracked by the dynamic cold/hot copy distinguishing module is three, the master node issues an erasure-code calculation operation to the data node; upon receiving this message, the data node submits the information of the data block, and the cold-data independent disk array module stores the data reliably.
It can be understood that the RAID storage of data blocks in the cold-data independent disk array module uses lazy loading. For the data blocks of a file, the master node periodically gathers all copy positions of each data block; if the number of available copies falls below three, a copy is added automatically to ensure data reliability. In the dynamic cold/hot copy distinguishing module, after the master node receives a message that the life cycle of a data-block copy has ended, it monitors the copy count of that data block. If the current number of all copies is found to be three, an erasure-code calculation operation is issued to the data node. The data node does not delete the corresponding copy immediately upon receiving this message; instead it submits the information of the data block, including the file name and split ID number of the data block, and sends the original data-block data to the cold-data independent disk array module for reliable storage. After the data block has been stored in the cold-data independent disk array module, the copy is deleted, the data block whose life cycle has ended is reported back to the master node, and the whole cold-storage flow of the data block is complete.
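The sketch below illustrates this hand-off of a decayed block to the cold-data disk array; a single XOR parity shard stands in for a real erasure code such as Reed-Solomon, and the shard count, dataclass fields and function names are assumptions rather than the claimed scheme.

```python
# Toy hand-off to the cold-data disk array: the block's metadata and bytes are submitted
# for erasure-coded storage. One XOR parity shard is used as a stand-in for a real code.

from dataclasses import dataclass
from functools import reduce

@dataclass
class ColdBlockSubmission:
    file_name: str
    split_id: int
    shards: list[bytes]   # data shards plus one parity shard

def erasure_encode(block: bytes, data_shards: int = 4) -> list[bytes]:
    """Split the block into equal data shards and append one XOR parity shard."""
    size = -(-len(block) // data_shards)  # ceiling division
    shards = [block[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(data_shards)]
    parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))
    return shards + [parity]

def submit_to_cold_array(file_name: str, split_id: int, block: bytes) -> ColdBlockSubmission:
    return ColdBlockSubmission(file_name, split_id, erasure_encode(block))

sub = submit_to_cold_array("sales.log", split_id=7, block=b"rarely accessed block contents")
print(sub.file_name, sub.split_id, len(sub.shards), "shards")
```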
(5) The data is divided into a plurality of segments and written segment by segment. When the mapping input is complete, a plurality of temporary buffer files exist in the solid-state drive; the reorganization end strictly sorts and merges them by key value to form an intermediate data file containing a plurality of partitions, which is stored in the hard disk drive;
(6) During file transfer, the reorganization end globally merges the files, aggregates the key-value pairs that share a key into a key group indexed by that key, and transfers the key-group content to the reduction end for use.
By dividing data according to access heat, the method has the following beneficial effects:
(1) The scheme adopts dynamic storage: data is stored through a fully dynamic copy mode combined with an independent-redundant-disk-array strategy. Compared with a static scheme, dynamic copies adapt efficiently to changes in upper-layer file access and thus provide adaptive data storage. For hot-spot data, increasing the number of dynamic copies improves data availability under concurrency, reduces the generation of non-local tasks, lowers network transmission overhead, relieves unbalanced node load, and improves the overall performance of the system. For cold data, compared with a static independent-redundant-disk-array operation, the DHS can reduce the copy count in time according to changes in data access, saving data storage cost across the whole system.
(2) The scheme adopts the dynamic cold/hot copy distinguishing module, which works in a dynamic copy mode. The load of a data block effectively depends on its current number of backups: for files in the same access state, the access load changes with the copy count, because more backups let more data blocks share the pressure of upper-layer access, giving a lower load, while fewer backups have the opposite effect. For a given file, upper-layer access depends on the user and cannot be interfered with, so the dynamic cold/hot copy distinguishing module uses the concept of data-block load and adjusts the copy count to adapt to upper-layer access.
(3) The scheme adopts the cold-data independent disk array module. For big-data cloud computing, the core mechanism relies on abstracting and computing the load of data blocks: the data-block load directly determines the copy count of the data blocks, the disk load and the node load; the disk load is the core parameter of the multi-disk scheduler, and the node load affects the priority of task scheduling and therefore where copies are added. Using the cold-data independent disk array module changes the task load of the nodes, which addresses the redundancy caused by data skew in big-data cloud computing, reduces redundancy to a minimum, lowers the load of the big-data cloud-computing server and increases the speed of big-data cloud computing.
Fig. 3 is a schematic diagram of a data execution flow of a data processing method according to an embodiment of the present application, as shown in fig. 3, where the flow mainly includes the following steps:
(1) The plurality of databases are distributed to a plurality of servers for network interconnection, and each data fragment is analyzed one by one in the process of data analysis by a mapping technology;
(2) The information acquisition module records, for each access, the data that needs to be logged: the file name of the current access, the node positions of the data blocks after file division, and the time of the current access;
It should be noted that the format of the collected file-access information is <file name, list<access time>>. The access information is used to divide files by heat; the mapping between a file name and the node positions of its split data blocks is used in the heat calculation, and the node heat is calculated from the data-block distribution and the file heat, which supports load balancing in the subsequent task scheduler.
(3) The input of the mapping task is usually text data. The initial key-value pair is <RAWdata, line number>, and the mapping end remaps each <RAWdata, line number> pair into one or more meaningful <Key, Value> tuples;
It should be noted that the output of the mapping is partitioned and then transferred to a buffer area in the solid-state drive; whenever a buffer area is about 80% full, a background process writes the data in the current buffer area into the hard disk drive. When all mapping input has been processed, there may be multiple temporary buffer files in the hard disk drive that need to be merged; during merging it is ensured that the data within each partition of the final merged file is strictly ordered by key value.
It can be understood that, to guarantee the speed of processing massive data, all key-value pairs output by the mapping are arranged strictly in ascending order of key. The benefit of this strict ascending order is that the reduction side can quickly locate a given key-value pair, which speeds up user queries on the result key-value pairs.
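The following sketch imitates the spill-and-merge behaviour just described: map output accumulates in a buffer, is spilled as a key-sorted run at roughly 80% occupancy, and the runs are merged so the final output stays strictly ordered by key; the buffer size and the in-memory "spill files" are simplifications assumed for the example.

```python
# Sketch of sorted spills at ~80% buffer occupancy followed by a k-way merge (illustrative).

import heapq

BUFFER_LIMIT = 10     # assumed buffer capacity (records)
SPILL_RATIO = 0.8     # spill once the buffer is about 80% full

buffer, spills = [], []

def emit(key, value):
    buffer.append((key, value))
    if len(buffer) >= BUFFER_LIMIT * SPILL_RATIO:
        spills.append(sorted(buffer))   # each spill file is strictly ordered by key
        buffer.clear()

for i in range(20):
    emit(key=f"k{i % 5}", value=i)
if buffer:
    spills.append(sorted(buffer))       # final flush

merged = list(heapq.merge(*spills))     # k-way merge keeps the merged output ordered by key
print(merged[:5])
```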
(4) When the mapping end processes data, there are multiple reductions, and each piece of mapping data must be sent to the partition of the corresponding reduction task. Each partition guarantees that the data on the mapping end is mapped to a unique reduction task, and each key-value pair output by the mapping task is assigned to a unique partition according to its key;
(5) The reorganization end copies data that exceeds the available solid-state-drive storage space to the hard disk drive as temporary files, and the result of the reduction is organized as key-value pairs and written to the server side.
It will be appreciated that during the copy stage the reduction end receives a large number of mapping results, split into several different files. After all copying is complete, the files must be merged globally to generate the final reduction input data, in which key-value pairs sharing a key are aggregated into a set of values indexed by that key. At this point the reorganization end has finished transferring the mapping output to the reduction end.
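A short sketch of the partition-and-merge step follows: each map output pair is routed to exactly one reduction task by its key, and after copying, pairs sharing a key are grouped into a value set indexed by that key; the hash-modulo partitioner and the reducer count of four are assumptions for the example.

```python
# Illustrative partitioner and group-by-key step (assumed hash-modulo rule).

from collections import defaultdict

NUM_REDUCERS = 4   # assumed number of reduction tasks

def partition(key: str) -> int:
    """Assign a key-value pair to a unique reduction task from its key."""
    return hash(key) % NUM_REDUCERS

map_output = [("apple", 1), ("pear", 1), ("apple", 2), ("plum", 1), ("pear", 3)]

# Shuffle: route each pair to its partition, then group by key inside each reducer.
reducer_input = defaultdict(lambda: defaultdict(list))
for key, value in map_output:
    reducer_input[partition(key)][key].append(value)

for reducer_id, groups in sorted(reducer_input.items()):
    print(reducer_id, dict(groups))
```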
Fig. 4 is a data flow diagram of a data processing method according to an embodiment of the present application, as shown in fig. 4, the process mainly includes the following steps:
(1) Because there are multiple reductions when the mapping end processes data, the mapping data must be sent to the partitions of the corresponding reduction tasks, such as partition A, partition B, partition C and partition D in the solid-state drive shown in Fig. 4. Each partition guarantees that the data on the mapping end is mapped to a unique reduction task, and each key-value pair output by the mapping task is assigned to a unique partition according to its key. The reorganization end copies data that exceeds the available solid-state-drive storage space to the hard disk drive as temporary files, for example temporary file A, temporary file B, temporary file C and temporary file D in Fig. 4;
(2) The reduction end receives a large number of mapping results during the copy stage; because they are split into several different files, the files must be merged globally after all copying is complete to generate the reduction input data;
(3) The reduction input data is input to a reduction task.
Fig. 5 is a schematic device structure diagram of a data processing method according to an embodiment of the present application, as shown in fig. 5, where the device includes:
an acquiring module 50, configured to acquire each data transmitted by each data node;
the mapping module 52 is configured to map each data into a corresponding data slice according to the mapping relationship;
a determining module 54, configured to determine the priority of the storage areas, where the storage areas comprise a solid-state drive and a hard disk drive, to preferentially write the data stored in the data fragments into a plurality of buffer areas in the solid-state drive, and to write the remaining data into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold.
In the apparatus, the acquiring module 50 acquires the data transmitted by each data node; the mapping module 52 maps each piece of data into a corresponding data fragment according to the mapping relation; and the determining module 54 determines the priority of the storage areas, where the storage areas comprise a solid-state drive and a hard disk drive, preferentially writes the data stored in the data fragments into a plurality of buffer areas in the solid-state drive, and writes the remaining data into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold. This reduces data redundancy in big-data cloud-computing storage, thereby reducing contention for hot data and avoiding waste of cold-data storage resources, and solves the technical problem that the access-skew phenomenon produced by multi-user, multi-task and multi-priority access flows in big-data cloud computing severely skews the stored data and causes contention for hot data and waste of cold-data storage resources.
According to another aspect of the embodiments of the present application, there is also provided a non-volatile storage medium comprising a stored program, where when the program runs it controls the device in which the non-volatile storage medium resides to execute any one of the data processing methods.
Specifically, the storage medium is configured to store program instructions that implement the following functions:
acquiring each data transmitted by each data node; mapping each data into a corresponding data fragment according to the mapping relation; determining a priority of a storage area, wherein the storage area comprises: solid state drives and hard disk drives; the data stored in the data fragments are input into a plurality of buffer areas in the solid-state drive preferentially, and after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold, the rest data are written into the hard disk drive.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In an exemplary embodiment of the present application, a computer program product is also provided, comprising a computer program which, when executed by a processor, implements any of the above-mentioned data processing methods.
Optionally, the computer program may, when executed by a processor, implement the steps of:
acquiring each data transmitted by each data node; mapping each data into a corresponding data fragment according to the mapping relation; determining a priority of a storage area, wherein the storage area comprises: solid state drives and hard disk drives; the data stored in the data fragments are input into a plurality of buffer areas in the solid-state drive preferentially, and after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold, the rest data are written into the hard disk drive.
There is provided, according to an embodiment of the present application, an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the data processing methods described above.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input device is connected to the processor.
Fig. 6 is a schematic block diagram of an example electronic device 600 according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When a computer program is loaded into RAM 603 and executed by computing unit 601, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application and are intended to be comprehended within the scope of the present application.

Claims (10)

1. A method of data processing, comprising:
acquiring each data transmitted by each data node;
mapping and storing each data to a corresponding data fragment according to the mapping relation;
determining a priority of a storage area, wherein the storage area comprises: solid state drives and hard disk drives;
and preferentially inputting the data stored in the data fragments into a plurality of buffer areas in the solid-state drive, and writing the residual data into the hard disk drive after the occupancy rate of the buffer areas of the solid-state drive reaches a preset threshold.
2. The method of claim 1, wherein mapping the respective data into the corresponding data slices according to the mapping relationship comprises:
determining an initial key-value pair corresponding to each piece of data, and mapping the initial key-value pair into a target two-tuple;
and determining the data fragment to which each piece of data belongs according to the key value in the target two-tuple.
3. The method according to claim 1, wherein the method further comprises:
and determining the access condition of each data, dividing each data according to the access condition, and classifying each data into hot data or cold data.
4. A method according to claim 3, wherein after classifying the respective data as hot data or cold data, the method further comprises:
and acquiring a global data copy load value, determining a data block corresponding to the hot data under the condition that the task executed in the current period is a non-local task, and automatically copying the data block from other nodes.
5. A method according to claim 3, wherein after classifying the respective data as hot data or cold data, the method further comprises:
detecting the data block load stored on a data node at intervals of a preset period, and acquiring the total number of data copies corresponding to the data node under the condition that the difference value between the data block load and the normal load is smaller than a preset threshold value;
under the condition that the total number of the data copies is a preset number, transmitting erasure codes to the data nodes;
and receiving the data information returned by the data block, and independently storing the data information to a cold data independent disk array.
6. The method according to claim 5, wherein, if the total number of data copies is not the predetermined number, deleting files in the data block on the data node and reporting deletion information, wherein the deletion information includes: and the file name and the position corresponding to the file.
7. A method according to claim 3, wherein determining the access condition of the respective data, dividing the respective data according to the access condition, classifying the respective data as hot data or cold data, comprises:
at least acquiring each file name and each access time corresponding to each data;
determining the access times corresponding to each file name, and determining the data as hot data when the access times are larger than preset access times; or, in the case that the access time belongs to a target period, determining that the data is hot data;
when the access times are smaller than preset access times, determining that the data are cold data; or, in the case that the access time does not belong to the target period, determining that the data is cold data.
8. A data processing apparatus, comprising:
the acquisition module is used for acquiring each data transmitted by each data node;
the mapping module is used for mapping and storing each data into a corresponding data fragment according to the mapping relation;
a determining module, configured to determine a priority of a storage area, where the storage area includes: the data stored in the data fragments are input to a plurality of buffer areas in the solid state drive preferentially, and after the occupancy rate of the buffer areas of the solid state drive reaches a preset threshold, the rest data are written into the hard disk drive.
9. A non-volatile storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the data processing method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data processing method of any of claims 1 to 7.
CN202211475701.1A 2022-11-23 2022-11-23 Data processing method, device, storage medium and electronic equipment Pending CN115993932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211475701.1A CN115993932A (en) 2022-11-23 2022-11-23 Data processing method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211475701.1A CN115993932A (en) 2022-11-23 2022-11-23 Data processing method, device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115993932A true CN115993932A (en) 2023-04-21

Family

ID=85989598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211475701.1A Pending CN115993932A (en) 2022-11-23 2022-11-23 Data processing method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115993932A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555491A (en) * 2024-01-11 2024-02-13 武汉麓谷科技有限公司 Method for realizing encryption function of ZNS solid state disk
CN117555491B (en) * 2024-01-11 2024-03-15 武汉麓谷科技有限公司 Method for realizing encryption function of ZNS solid state disk

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination