CN115114240A

CN115114240A - Data processing method and device

Info

Publication number: CN115114240A
Application number: CN202110309817.7A
Authority: CN
Inventors: 郑永广; 武宗涛; 胡奎; 赵锡成
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2022-09-27

Abstract

Embodiments of the application provide a data processing method and a data processing device. When data is processed, a data processing request is first received, wherein the data processing request comprises an identifier of the data to be processed. A target file storage path of the data to be processed is searched for in a relational database management system according to the identifier of the data to be processed. The data to be processed is acquired from a distributed file system according to the target file storage path, wherein the distributed file system stores the data and the file storage paths where the data are located. The data to be processed is then processed. Because the data is stored in a distributed manner in the distributed file system, which supports large-volume data storage and has high throughput, the data can be read quickly when it is processed, so that data processing efficiency is effectively improved.

Description

Data processing method and device

Technical Field

The present invention relates to the field of data management technologies, and in particular, to a data processing method and apparatus.

Background

In general, when data storage is required, data is stored in a relational database management system (MySQL). When the data is processed, the data to be processed is read from the MySQL, the data to be processed is analyzed and processed, and then the data analysis result is stored in the MySQL.

However, when MySQL stores massive amounts of data, reading the data to be processed from MySQL and analyzing it places a heavy processing load on MySQL, and data processing efficiency is therefore low.

Disclosure of Invention

The embodiment of the application provides a data processing method and device, which can effectively improve the data processing efficiency.

In a first aspect, an embodiment of the present application provides a data processing method, where the data processing method includes:

receiving a data processing request, wherein the data processing request comprises an identifier of the data to be processed;

searching, in a relational database management system, for a target file storage path of the data to be processed according to the identifier of the data to be processed, wherein the relational database management system stores an identifier of each of a plurality of data and the file storage path of each data, and the plurality of data include the data to be processed;

acquiring the data to be processed from a distributed file system according to the target file storage path, wherein the distributed file system stores the plurality of data and the file storage paths where the data are located; and

processing the data to be processed.

In one possible implementation, the method further includes:

marking the state of the data to be processed in the relational database management system according to the processing result of the data to be processed; wherein the state of the data to be processed comprises processed or unprocessed.

In a possible implementation manner, before searching, in a relational database management system, a target file storage path where the to-be-processed data is located according to the identifier of the to-be-processed data, the method further includes:

acquiring the plurality of data;

storing the plurality of data in the distributed file system, and generating the file storage path where each of the plurality of data is located; and

establishing the relational database management system according to the identifier of each of the plurality of data and the file storage path of each data.

In one possible implementation manner, the storing the plurality of data in the distributed file system includes:

determining the data capacity that each file storage unit in a plurality of file storage containers can store; and

storing the plurality of data in the distributed file system according to the data capacity that each file storage unit can store, wherein different file storage units correspond to different file storage paths.

In one possible implementation, the method further includes:

outputting the processing result of the data to be processed; and

updating the data to be processed in the distributed file system according to the processing result of the data to be processed.

In a possible implementation manner, the data processing request further includes a target processing method, and the processing the to-be-processed data includes:

processing the data to be processed according to the target processing method.

In a second aspect, an embodiment of the present application provides a data processing apparatus, including:

a receiving unit, configured to receive a data processing request, wherein the data processing request comprises an identifier of the data to be processed;

an acquisition unit, configured to search, in a relational database management system, for a target file storage path of the data to be processed according to the identifier of the data to be processed, wherein the relational database management system stores an identifier of each of a plurality of data and the file storage path of each data, and the plurality of data include the data to be processed;

the acquisition unit being further configured to acquire the data to be processed from a distributed file system according to the target file storage path, wherein the distributed file system stores the plurality of data and the file storage paths where the data are located; and

a processing unit, configured to process the data to be processed.

In a possible implementation manner, the processing unit is further configured to mark, in the relational database management system, a state of the data to be processed according to a processing result of the data to be processed; wherein the status of the data to be processed comprises processed or unprocessed.

In a possible implementation manner, the apparatus further includes a storage unit, configured to acquire the plurality of data; store the plurality of data in the distributed file system and generate the file storage path where each of the plurality of data is located; and establish the relational database management system according to the identifier of each of the plurality of data and the file storage path of each data.

In a possible implementation manner, the storage unit is specifically configured to determine the data capacity that each file storage unit in a plurality of file storage containers can store, and to store the plurality of data in the distributed file system according to the data capacity that each file storage unit can store, wherein different file storage units correspond to different file storage paths.

In a possible implementation manner, the processing unit is further configured to output a processing result of the data to be processed, and to update the data to be processed in the distributed file system according to the processing result of the data to be processed.

In a possible implementation manner, the processing unit is specifically configured to process the data to be processed according to the target processing method.

In a third aspect, an embodiment of the present application further provides a data processing apparatus, where the data processing apparatus may include a memory and a processor, wherein:

the memory is configured to store a computer program; and

the processor is configured to read the computer program stored in the memory and, according to the computer program, execute the data processing method in any one of the possible implementation manners of the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer-executable instruction is stored in the computer-readable storage medium, and when a processor executes the computer-executable instruction, the data processing method described in any one of the foregoing possible implementation manners of the first aspect is implemented.

In a fifth aspect, an embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the data processing method described in any one of the possible implementation manners of the first aspect.

Therefore, embodiments of the application provide a data processing method and a data processing device. When data is processed, a data processing request is first received, wherein the data processing request comprises an identifier of the data to be processed. A target file storage path of the data to be processed is searched for in a relational database management system according to the identifier of the data to be processed. The data to be processed is acquired from a distributed file system according to the target file storage path, wherein the distributed file system stores the data and the file storage paths where the data are located. The data to be processed is then processed. Because the data is stored in a distributed manner in the distributed file system, which supports large-volume data storage and has high throughput, the data can be read quickly when it is processed, so that data processing efficiency is effectively improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another data processing method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.

Specific embodiments of the disclosure are shown in the foregoing drawings and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.

In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may be singular or plural. In the description of the present invention, the character "/" generally indicates an "or" relationship between the objects before and after it.

The technical solution provided by the embodiments of the application can be applied to data processing scenarios. In the digital information age, when data needs to be stored, processed and updated, the data is generally stored in a relational database management system (MySQL). When the data is processed, the data to be processed is read from MySQL and analyzed, and the analysis result is then stored back in MySQL.

However, when the relational database management system MySQL stores mass data, if data to be processed is read from the MySQL and analyzed, the data processing amount of the MySQL is large, and the data processing efficiency is low.

To solve the problem of low data processing efficiency, and considering that a distributed file system supports large-volume data storage and has high throughput, the data can be stored in the high-throughput distributed file system in the form of file storage units, and the corresponding file storage paths can be stored in the relational database management system. When data is processed, the data is acquired from the distributed file system according to the file storage path recorded in the relational database management system and is then processed. Because the data is read from the high-throughput distributed file system, data processing efficiency can be effectively improved.

Based on the technical concept, the embodiment of the application provides a data processing method, when data processing is carried out, a data processing request is received firstly; the data processing request comprises an identifier of data to be processed; searching a target file storage path of the data to be processed in the relational database management system according to the identifier of the data to be processed; the relational database management system stores the identification of each data in a plurality of data and the file storage path of each data, and the plurality of data comprise data to be processed; acquiring data to be processed from a distributed file system according to a target file storage path; the distributed file system stores all data and file storage paths where all data are located; and processing the data to be processed.

The relational database management system MySQL is a relational database management system that can store data in different tables, rather than putting all data in one large warehouse.

A distributed file system (DFS) is a file system in which the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to nodes (which can be simply understood as computers) through a computer network; alternatively, it is a complete hierarchical file system formed by combining several different logical disk partitions or volume labels. A distributed file system can store very large files and offers advantages such as high fault tolerance and high throughput.

Therefore, in the embodiment of the application, when data is processed, a data processing request is first received, wherein the data processing request comprises an identifier of the data to be processed. A target file storage path of the data to be processed is searched for in the relational database management system according to the identifier of the data to be processed. The data to be processed is acquired from the distributed file system according to the target file storage path, wherein the distributed file system stores the data and the file storage paths where the data are located. The data to be processed is then processed. Because the data is stored in a distributed manner in the distributed file system, which supports large-volume data storage and has high throughput, the data can be read quickly when it is processed, so that data processing efficiency is effectively improved.

Hereinafter, the data processing method provided in the present application will be described in detail by specific examples. It is to be understood that the following detailed description may be combined with the accompanying drawings, and that the same or similar concepts or processes may not be described in detail in connection with certain embodiments.

Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application. The data processing method may be performed by software and/or a hardware device. For example, the hardware device may be a data processing apparatus, which may be a terminal or a processing chip in the terminal. Referring to Fig. 1, the data processing method may include:

S101, receiving a data processing request.

The data processing request comprises the identification of the data to be processed and a target processing method. The identifier of the data to be processed can be used for identifying the data; the target processing method is a specific way of processing data, and in the embodiment of the present application, no limitation is imposed on a specific method of processing data.

For example, the data processing request may take the form of code, that is, code that includes the identifier of the data to be processed and the target processing method; it may also take other forms, which is not limited in this embodiment of the present application. To further improve data processing efficiency, the data processing request may be set up as sharded tasks. For example, the data processing request is split into 5 shard tasks according to the different data processing methods, and the 5 shard tasks are executed simultaneously.
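As a non-limiting sketch, the sharded-task idea can be illustrated in Python as follows. The request format (a list of identifier/method pairs), the function names `split_into_shards`, `run_shard` and `run_request`, and the two placeholder processing methods are all assumptions made for illustration; the application does not prescribe them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical processing methods; the application does not limit what they are.
PROCESSING_METHODS = {
    "count": len,
    "upper": lambda rows: [row.upper() for row in rows],
}

def split_into_shards(items):
    """Group (data_id, method) pairs into one shard task per processing method."""
    by_method = defaultdict(list)
    for data_id, method in items:
        by_method[method].append(data_id)
    return [{"method": m, "data_ids": ids} for m, ids in by_method.items()]

def run_shard(shard, load_rows):
    """Run one shard: load each data item and apply the shard's method."""
    process = PROCESSING_METHODS[shard["method"]]
    return {
        (data_id, shard["method"]): process(load_rows(data_id))
        for data_id in shard["data_ids"]
    }

def run_request(items, load_rows, max_workers=5):
    """Execute all shard tasks concurrently and merge their results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(lambda s: run_shard(s, load_rows),
                                split_into_shards(items)):
            results.update(partial)
    return results

# In-memory stand-in for the stored data, used only for this example.
fake_store = {"d1": ["a", "b"], "d2": ["c"], "d3": ["d"]}
request_items = [("d1", "count"), ("d2", "upper"), ("d3", "count")]
results = run_request(request_items, fake_store.__getitem__)
```

Threads are used here because fetching data is typically I/O-bound; a process pool or a job scheduler would be an equally valid choice.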

S102, searching a target file storage path of the data to be processed in the relational database management system according to the identification of the data to be processed.

The relational database management system stores the identification of each data in the plurality of data and the file storage path of each data, and the plurality of data comprise the data to be processed.

For example, before searching a target file storage path where to-be-processed data is located in a relational database management system according to an identifier of the to-be-processed data, a plurality of data may be obtained first; storing a plurality of data in a distributed file system, and generating file storage paths where the plurality of data are respectively located; and establishing a relational database management system according to the identification of each data in the plurality of data and the file storage path of each data.

When acquiring a plurality of data, the data may be acquired through a plurality of channels, for example, a web browser, a server, and the like. In the communication field, the acquired data may also be service data generated by a mobile device of a user, for example, when a communication operator recommends a service for a communication user, historical service data of a number used by the user may be acquired and processed by the communication operator. The embodiment of the present application does not set any limit to the specific content of the plurality of data.

It is understood that the identifier of each data in the relational database management system may be represented by the source of each data or the use of each data, which is not limited in this embodiment of the present application. In addition, the relational database management system can also comprise a data processing state for indicating whether the data in the storage path is processed or not.

In the embodiment of the application, a plurality of data are stored in a distributed file system; generating a file storage path where a plurality of data are respectively located; and establishing a relational database management system according to the identification of each data in the plurality of data and the file storage path of each data, so that the database does not store a large amount of data any more, and the storage pressure of the relational database management system can be greatly reduced. In addition, the distributed file system has high throughput, so that the data acquisition speed can be increased, and the data processing efficiency can be effectively improved.
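As a non-limiting sketch, the three preparation steps (acquire the data, store it and generate file storage paths, then build the relational index) might look like the following. To keep the example self-contained, a temporary local directory stands in for the distributed file system and Python's built-in `sqlite3` stands in for MySQL; the table name `data_index`, its columns, and the function `build_index` are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

def build_index(records, storage_root, conn):
    """Write each data item to its own file and record identifier -> path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_index ("
        "data_id TEXT PRIMARY KEY, "
        "file_path TEXT NOT NULL, "
        "status INTEGER NOT NULL DEFAULT 0)"  # 0 = unprocessed, 1 = processed
    )
    for data_id, payload in records.items():
        # The bulk data goes to the file store, not to the database.
        path = Path(storage_root) / f"{data_id}.txt"
        path.write_text(payload, encoding="utf-8")
        # The database keeps only the identifier and the storage path.
        conn.execute(
            "INSERT INTO data_index (data_id, file_path) VALUES (?, ?)",
            (data_id, str(path)),
        )
    conn.commit()

storage_root = tempfile.mkdtemp()   # stand-in for the distributed file system
conn = sqlite3.connect(":memory:")  # stand-in for MySQL
build_index({"d1": "row-a\nrow-b", "d2": "row-c"}, storage_root, conn)
indexed = conn.execute(
    "SELECT data_id, status FROM data_index ORDER BY data_id"
).fetchall()
```

Only the small identifier/path rows land in the database, which is the property the paragraph above relies on to reduce storage pressure.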

For example, the distributed file system includes a plurality of file storage units, and when a plurality of data are stored in the distributed file system, the data capacity storable by each file storage unit in a plurality of file storage containers may be determined; storing a plurality of data in a distributed file system according to the data capacity which can be stored in each file storage unit; the file storage paths corresponding to different file storage units are different.

The file storage container may be the distributed file system itself or a folder in the distributed file system, which may be set according to the actual situation and is not limited in this embodiment of the present application. The distributed file system may be a lightweight distributed file system, the Fast Distributed File System (FastDFS for short), so that data is processed faster and data processing efficiency is improved.

It can be understood that the file storage unit may be a text file used to store data, or a file in another format. When text files are used to store data, the data capacity of each text file may be set according to actual conditions; for example, each text file may be set to hold 500 or 1000 lines of data. The embodiment of the present application does not limit the form of the file storage unit or its storable data capacity.
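As a non-limiting sketch, splitting data into file storage units by line capacity could be written as follows. The function name `chunk_by_capacity` and the path pattern `/dfs/unit_NNNN.txt` are hypothetical; the 500-line capacity is the example value given above.

```python
def chunk_by_capacity(lines, capacity=500):
    """Split data lines into consecutive units of at most `capacity` lines."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [lines[i:i + capacity] for i in range(0, len(lines), capacity)]

# 1200 example lines become two full units and one partial unit.
units = chunk_by_capacity([f"record-{n}" for n in range(1200)], capacity=500)
unit_sizes = [len(unit) for unit in units]

# Each unit gets its own, distinct file storage path (hypothetical naming).
unit_paths = [f"/dfs/unit_{index:04d}.txt" for index in range(len(units))]
```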

For example, when a plurality of data are stored in a file storage unit in the distributed file system, the data may be classified first, and the data of the same type may be stored in the same file storage unit. For example, the data may be classified according to the source of the data, the data acquired through the browser may be stored in the same file storage unit, and the data acquired through the terminal device may be stored in the same file storage unit, which may be specifically classified according to the actual situation.
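As a non-limiting sketch, classifying data by source before writing it to file storage units could look like this. The record layout (dictionaries with `source` and `value` keys) and the function name `group_by_source` are assumptions for illustration only.

```python
from collections import defaultdict

def group_by_source(records):
    """Group record values by their source so each group can share a storage unit."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["source"]].append(record["value"])
    return dict(grouped)

records = [
    {"source": "browser", "value": "b1"},
    {"source": "terminal", "value": "t1"},
    {"source": "browser", "value": "b2"},
]
grouped = group_by_source(records)
```

Each resulting group can then be written to its own file storage unit, subject to the per-unit capacity.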

In the embodiment of the application, the plurality of data are stored in the distributed file system in the form of file storage units, the data can be classified, and the data stored in each file storage unit stays within the storable data capacity, which facilitates data acquisition and effectively improves data processing efficiency.

After the relational database management system is established, the target file storage path where the data to be processed is located can be looked up in the relational database management system according to the identifier of the data to be processed in the data processing request. Specifically, the identifier of the data to be processed is searched for among the identifiers of the data stored in the relational database management system, and the file storage path corresponding to that identifier is acquired; this is the target file storage path of the data to be processed.

It can be understood that when a large amount of data is processed at the same time, since each target file stores limited data, when the file storage path where the data to be processed is located is obtained, a plurality of file storage paths can be obtained at the same time.
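As a non-limiting sketch, looking up target file storage paths by identifier, including the case where one identifier maps to several paths, might be written as follows. Python's built-in `sqlite3` stands in for MySQL; the table `data_index`, its columns, the example paths and the function `find_target_paths` are hypothetical.

```python
import sqlite3

def find_target_paths(conn, data_ids):
    """Return {data_id: [file_path, ...]} for the requested identifiers."""
    if not data_ids:
        return {}
    # One "?" placeholder per identifier keeps the query parameterized.
    placeholders = ",".join("?" for _ in data_ids)
    rows = conn.execute(
        "SELECT data_id, file_path FROM data_index "
        f"WHERE data_id IN ({placeholders}) ORDER BY data_id, file_path",
        list(data_ids),
    )
    result = {}
    for data_id, file_path in rows:
        result.setdefault(data_id, []).append(file_path)
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_index (data_id TEXT, file_path TEXT)")
conn.executemany(
    "INSERT INTO data_index VALUES (?, ?)",
    [("d1", "/dfs/a.txt"), ("d1", "/dfs/b.txt"), ("d2", "/dfs/c.txt")],
)
paths = find_target_paths(conn, ["d1", "d2"])
```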

After the target file storage path where the data to be processed is located is obtained, the following step S103 may be executed:

S103, acquiring data to be processed from the distributed file system according to the target file storage path.

The distributed file system stores all data and file storage paths where all data are located.

Illustratively, when data to be processed is acquired from a distributed file system according to a target file storage path, a target file corresponding to the target file storage path is searched in the distributed file system according to the target file storage path; and acquiring data to be processed from the target file.

It can be understood that the acquired data to be processed may be data in one target file, or may be data in multiple target files, which is not limited in this embodiment of the present application.
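As a non-limiting sketch, acquiring the data to be processed from one or more target files could look like the following. A temporary local directory stands in for the distributed file system, and the function name `read_target_files` and the file names are hypothetical; a real deployment would use the client of the chosen distributed file system instead of local file reads.

```python
import tempfile
from pathlib import Path

def read_target_files(paths):
    """Read every target file and return all data lines, in path order."""
    rows = []
    for path in paths:
        rows.extend(Path(path).read_text(encoding="utf-8").splitlines())
    return rows

# Two small example target files in a temporary directory.
root = Path(tempfile.mkdtemp())
(root / "unit_0.txt").write_text("r1\nr2\n", encoding="utf-8")
(root / "unit_1.txt").write_text("r3\n", encoding="utf-8")
rows = read_target_files([root / "unit_0.txt", root / "unit_1.txt"])
```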

S104, processing the data to be processed.

When the data to be processed is processed, the data can be processed according to the target processing method in the data processing request; and acquiring a processing result of the data to be processed.

For example, after the processing result of the data to be processed is obtained, the state of the data to be processed may be marked in the relational database management system according to the processing result of the data to be processed; wherein the state of the data to be processed comprises processed or unprocessed. The state of the data to be processed may be represented by a special symbol, for example, "0" represents that the state of the data to be processed is unprocessed, and "1" represents that the state of the data to be processed is processed.

When the status of the data to be processed is marked in the relational database management system, the status of the data to be processed corresponding to the identification of the data to be processed can be marked according to the identification of the data to be processed in the data processing request. For example, the status of the data to be processed is changed from unprocessed to processed.

In the embodiment of the application, the state of the data to be processed in the relational database management system is marked according to the processing result of the data to be processed, so that a user can monitor the state of the data to be processed, and the data after being processed is prevented from being reprocessed.
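As a non-limiting sketch, marking the state from "0" (unprocessed) to "1" (processed) in a way that also guards against reprocessing might be written as follows. Python's built-in `sqlite3` stands in for MySQL; the table `data_index`, its columns and the function `mark_processed` are hypothetical.

```python
import sqlite3

UNPROCESSED, PROCESSED = 0, 1

def mark_processed(conn, data_id):
    """Flip the status 0 -> 1; return True only if the row actually changed."""
    cursor = conn.execute(
        "UPDATE data_index SET status = ? WHERE data_id = ? AND status = ?",
        (PROCESSED, data_id, UNPROCESSED),
    )
    conn.commit()
    return cursor.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_index (data_id TEXT PRIMARY KEY, status INTEGER)")
conn.execute("INSERT INTO data_index VALUES ('d1', 0)")

first_attempt = mark_processed(conn, "d1")   # status changes from 0 to 1
second_attempt = mark_processed(conn, "d1")  # already processed, nothing changes
```

Because the update only matches rows that are still unprocessed, a caller can use the return value to skip data that has already been handled.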

For example, after the processing result of the data to be processed is obtained, the processing result of the data to be processed may also be output; and updating the data to be processed in the distributed file system according to the processing result of the data to be processed. When the processing result of the data to be processed is output, the processing result can be output through one or more modes of short messages, telephones, WeChat or application programs and the like, so that the processing result can be sent to a user in time.

It can be understood that the to-be-processed data in the distributed file system may be updated by deleting the target file in which the to-be-processed data is stored, or by marking that target file, which is not limited in this embodiment of the present application.

In the embodiment of the application, the data to be processed in the distributed file system can be updated according to the processing result of the data to be processed, so that the data stored in the distributed file system is always the latest data, the accuracy of the data in the distributed file system is ensured, and the accuracy of the data processing result is improved.
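As a non-limiting sketch, the two update options (deleting the target file or marking it) could be illustrated as follows. A temporary local directory stands in for the distributed file system, and marking is implemented here by appending a `.processed` suffix to the file name, which is purely an assumption; the application does not specify how a file is marked. The function name `update_target_file` is likewise hypothetical.

```python
import tempfile
from pathlib import Path

def update_target_file(path, mode="delete"):
    """Update a processed target file by deleting it or by marking it."""
    path = Path(path)
    if mode == "delete":
        path.unlink()
        return None
    if mode == "mark":
        # Assumed marking scheme: rename the file with a ".processed" suffix.
        marked = path.with_name(path.name + ".processed")
        path.rename(marked)
        return marked
    raise ValueError(f"unknown mode: {mode!r}")

root = Path(tempfile.mkdtemp())
first, second = root / "unit_0.txt", root / "unit_1.txt"
first.write_text("r1\n", encoding="utf-8")
second.write_text("r2\n", encoding="utf-8")

update_target_file(first, mode="delete")
marked_path = update_target_file(second, mode="mark")
```

If a file is marked by renaming, the file storage path recorded in the relational database management system would need to be updated to match.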

Therefore, in the data processing method provided by the embodiment of the application, when data is processed, a data processing request is first received; a target file storage path of the data to be processed is searched for in the relational database management system according to the identifier of the data to be processed; the data to be processed is acquired from the distributed file system according to the target file storage path; and the data to be processed is processed. Because the data is stored in a distributed manner in the distributed file system, which supports large-volume data storage and has high throughput, the data can be read quickly when it is processed, so that data processing efficiency is effectively improved.

To facilitate understanding of the data processing method provided in the embodiment of the present application, the technical solution is described in detail below using its application in the communication field as an example. Specifically, refer to Fig. 2, which is a schematic flowchart of another data processing method provided in the embodiment of the present application.

As shown in Fig. 2, a large amount of service data may be obtained through a search server (ES), an FTP server, and a user terminal, and the acquired service data is stored in a data storage module. The service data may also be obtained in other ways; the sources above are only an example and do not limit the embodiment of the present application.

Further, after the large amount of service data is acquired, the data storage module stores the data in a lightweight distributed file system as text files and generates the corresponding file storage paths; the relational database management system is then established according to the file storage paths, and the file storage paths are stored in it. Assuming that 5 text files of service data are stored in the lightweight distributed file system and 5 file storage paths are generated, where each text file contains 500 items of service data, the established relational database management system is as shown in Table 1.

As shown in Table 1, the relational database management system includes a data identifier, a processing method identifier, a file storage path, a short message status, a telephone status, an application status, and a WeChat status. The processing method identifier indicates the processing method for the data under the file storage path. The short message status, telephone status, application status, and WeChat status indicate the status after data processing, that is, whether the user has been notified of the data processing result through one or more of short message, telephone, application, and WeChat. In Table 1, "0" indicates that the data is unprocessed, and "1" indicates that the data is processed.

TABLE 1

According to Fig. 2, when data is processed, the data processing module receives a data processing request and, according to the identifier of the data to be processed in the request, looks up that identifier and its corresponding target file storage path in the relational database management system. It then acquires the data to be processed from the target file in the lightweight distributed file system according to the target file storage path, processes the data according to the target processing method in the data processing request to obtain the corresponding data processing result, and sends the result to the user by short message, telephone, application, or WeChat. In one example, the target file in the lightweight distributed file system can be deleted according to the data processing result, and the data processing status in the relational database management system is updated.

Assuming that the received data processing request includes data identifier 1, data identifier 2, data identifier 3, data identifier 4, and data identifier 5, the updated relational database management system is shown in table 2 below.

TABLE 2

According to Table 2, the data processing result of data identifier 1 is sent to the user by telephone and WeChat, the data processing result of data identifier 2 is sent to the user by short message and telephone, the data processing result of data identifier 3 is sent to the user by short message and WeChat, the data processing result of data identifier 4 is sent to the user by telephone and application program, and the data processing result of data identifier 5 is sent to the user by telephone and WeChat.
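The reading of the status columns can be illustrated with a short sketch. The flag tuples below are inferred from the preceding paragraph under the assumption that a "1" in a status column marks a channel through which the result was sent; they are not copied from Table 2, and the names `CHANNELS`, `table2_status`, and `notified_channels` are hypothetical.

```python
CHANNELS = ("sms", "phone", "app", "wechat")

# Status flags in the order (sms, phone, app, wechat), inferred from the prose
# description of Table 2; the original table values are not reproduced here.
table2_status = {
    1: (0, 1, 0, 1),
    2: (1, 1, 0, 0),
    3: (1, 0, 0, 1),
    4: (0, 1, 1, 0),
    5: (0, 1, 0, 1),
}


def notified_channels(flags):
    """Return the names of the channels whose status flag is set to 1."""
    return [name for name, flag in zip(CHANNELS, flags) if flag == 1]


summary = {data_id: notified_channels(flags) for data_id, flags in table2_status.items()}
```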

To sum up, in the data processing method provided in the embodiment of the present application, a large amount of service data is stored in a plurality of files in a distributed file system, and the storage path of the file in which each piece of data is stored is recorded in the relational database management system, so that the data is acquired from the distributed file system when the data is processed.

Fig. 3 is a schematic structural diagram of a data processing apparatus 30 according to an embodiment of the present application. For example, referring to Fig. 3, the data processing apparatus 30 may include:

a receiving unit 301, configured to receive a data processing request; the data processing request comprises an identifier of data to be processed.

An obtaining unit 302, configured to search, in a relational database management system, a target file storage path where to-be-processed data is located according to an identifier of the to-be-processed data; the relational database management system stores the identification of each data in the plurality of data and the file storage path of each data, and the plurality of data comprise the data to be processed.

The obtaining unit 302 is further configured to obtain data to be processed from the distributed file system according to the target file storage path; the distributed file system stores all data and file storage paths where all data are located.

A processing unit 303, configured to process the data to be processed.

Optionally, the processing unit 303 is further configured to mark a state of the data to be processed in the relational database management system according to a processing result of the data to be processed; wherein the status of the data to be processed comprises processed or unprocessed.

Optionally, the apparatus further includes a storage unit 304, configured to obtain a plurality of data; store the plurality of data in a distributed file system, and generate the file storage paths where the plurality of data are respectively located; and establish a relational database management system according to the identifier of each data in the plurality of data and the file storage path of each data.

Optionally, the storage unit 304 is specifically configured to determine the storable data capacity of each of the plurality of file storage units; and store the plurality of data in the distributed file system according to the data capacity that each file storage unit can store; the file storage paths corresponding to different file storage units are different.
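One possible way to distribute data according to the capacity of each file storage unit is sketched below. The application does not specify an allocation strategy, so the in-order filling used here, the `allocate` function, and the unit paths are all assumptions made purely for illustration.

```python
def allocate(records, capacities):
    """Assign records to storage units in order, never exceeding a unit's capacity.

    capacities maps a (hypothetical) unit path to the number of records it can hold.
    Returns a dict mapping each used unit path to the records placed there.
    """
    if len(records) > sum(capacities.values()):
        raise ValueError("not enough total capacity for all records")
    placement = {}
    position = 0
    for unit_path, capacity in capacities.items():
        if position >= len(records):
            break
        chunk = records[position:position + capacity]
        if chunk:
            placement[unit_path] = chunk
        position += len(chunk)
    return placement


# Three hypothetical storage units, each with its own distinct path.
capacities = {"/group1/unit_a": 500, "/group1/unit_b": 300, "/group2/unit_c": 500}
placement = allocate([f"record-{i}" for i in range(1000)], capacities)
sizes = {unit: len(chunk) for unit, chunk in placement.items()}
```

Because every unit has a distinct path, the key of each entry in `placement` can serve directly as the file storage path recorded in the index.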

Optionally, the processing unit 303 is further configured to output a processing result of the data to be processed; and updating the data to be processed in the distributed file system according to the processing result of the data to be processed.

Optionally, the processing unit 303 is specifically configured to process the data to be processed according to a target processing method.

The data processing apparatus provided in this embodiment of the present application may execute the technical solution of the data processing method in any embodiment, and the implementation principle and the beneficial effect of the data processing apparatus are similar to those of the data processing method, and reference may be made to the implementation principle and the beneficial effect of the data processing method, which is not described herein again.

Fig. 4 is a schematic structural diagram of another data processing apparatus 40 provided in the embodiment of the present application. For example, referring to Fig. 4, the data processing apparatus 40 may include a processor 401 and a memory 402, wherein:

the memory 402 is used for storing computer programs.

The processor 401 is configured to read the computer program stored in the memory 402, and execute the technical solution of the data processing method in any of the embodiments according to the computer program in the memory 402.

Alternatively, the memory 402 may be separate or integrated with the processor 401. When the memory 402 is a device independent of the processor 401, the data processing apparatus 40 may further include: a bus for connecting the memory 402 and the processor 401.

Optionally, this embodiment further includes: a communication interface that may be connected to the processor 401 through a bus. The processor 401 may control the communication interface to implement the receiving and transmitting functions of the data processing apparatus 40 described above.

The data processing apparatus 40 shown in the embodiment of the present application can execute the technical solution of the data processing method in any of the above embodiments, and the implementation principle and the beneficial effect thereof are similar to those of the data processing method, and reference may be made to the implementation principle and the beneficial effect of the data processing method, which are not described herein again.

An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When a processor executes the computer-executable instructions, the technical solution of the data processing method in any of the above embodiments is implemented. The implementation principle and beneficial effects are similar to those of the data processing method, to which reference may be made, and are not described herein again.

The embodiment of the present application further provides a computer program product including a computer program. When the computer program is executed by a processor, the technical solution of the data processing method in any of the above embodiments is implemented. The implementation principle and beneficial effects are similar to those of the data processing method, to which reference may be made, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present application.

It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules within the processor.

The memory may comprise a high-speed RAM, and may further comprise non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, etc.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A data processing method, comprising:

receiving a data processing request; wherein, the data processing request comprises the identifier of the data to be processed;

searching a target file storage path of the data to be processed in a relational database management system according to the identifier of the data to be processed; the relational database management system stores the identification of each data in a plurality of data and the file storage path of each data, and the plurality of data comprise the data to be processed;

acquiring the data to be processed from a distributed file system according to the target file storage path; the distributed file system stores the data and file storage paths where the data are located;

and processing the data to be processed.

2. The method of claim 1, further comprising:

3. The method according to claim 1, wherein before searching a storage path of a target file in which the data to be processed is located in a relational database management system according to the identifier of the data to be processed, the method further comprises:

acquiring the plurality of data;

storing the plurality of data in the distributed file system, and generating respective file storage paths of the plurality of data;

4. The method of claim 3, wherein the distributed file system includes a plurality of file storage units therein, and wherein storing the plurality of data in the distributed file system comprises:

determining the data capacity which can be stored by each file storage unit in the file storage containers;

storing the plurality of data in the distributed file system according to the data capacity which can be stored by each file storage unit; the file storage paths corresponding to different file storage units are different.

5. The method according to any one of claims 1-4, further comprising:

outputting a processing result of the data to be processed;

6. The method according to any one of claims 1 to 4, wherein the data processing request further includes a target processing method, and the processing the data to be processed includes:

7. A data processing apparatus, characterized by comprising:

a receiving unit for receiving a data processing request; wherein, the data processing request comprises the identifier of the data to be processed;

the acquisition unit is used for searching a target file storage path of the data to be processed in a relational database management system according to the identifier of the data to be processed; the relational database management system stores the identification of each data in a plurality of data and the file storage path of each data, and the data to be processed are included in the data;

the acquisition unit is further configured to acquire the data to be processed from a distributed file system according to the target file storage path; the distributed file system stores the data and file storage paths where the data are located;

and the processing unit is used for processing the data to be processed.

8. A data processing apparatus, comprising a memory and a processor; wherein,

the memory for storing a computer program;

the processor is configured to read the computer program stored in the memory and execute a data processing method according to any one of claims 1 to 6 according to the computer program in the memory.

9. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out a data processing method as claimed in any one of the preceding claims 1 to 6.