CN117194014A - Data processing method, device and storage medium - Google Patents


Info

Publication number
CN117194014A
Authority
CN
China
Prior art keywords
data
processed
processing
determining
parallel processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311075597.1A
Other languages
Chinese (zh)
Inventor
李思民
马立志
曾鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202311075597.1A
Publication of CN117194014A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a data processing method comprising the following steps: determining attribute information of data to be processed; and determining a parallel processing strategy for the data according to the attribute information and historical processing information. The attribute information comprises at least the data quantity of the data to be processed and the completion time limit of the data to be processed.

Description

Data processing method, device and storage medium
Technical Field
The application relates to a data processing method, device and storage medium.
Background
At present, most scheduling optimization schemes for big data platforms adopt a peak-clipping and valley-filling approach, the aim being to move jobs that would run during scheduling peaks into scheduling valleys, so that the platform's resource utilization is balanced across time periods and data job running efficiency is optimized. Another approach is to calculate the resource gap during the platform's job scheduling peaks and, when a gap exists, adjust the parallelism of data job tasks or add more resources to ensure that peak-period jobs complete.
Both approaches work well under specific conditions but differ somewhat from actual practice. For example, resources in a big data platform tend to be limited; in particular, during peak usage there may be insufficient resources, or insufficient resources that can be allocated to each job in time. On the other hand, moving a job's scheduling time from the peak period to the valley period may yield a completion time that fails to meet the business requirement. Therefore, how to use a big data platform's limited resources to make decisions that maximize business revenue is a problem that currently needs to be solved.
Disclosure of Invention
In view of this, the embodiments of the present application desire to provide a data processing method, apparatus, and storage medium.
In order to achieve the above purpose, the technical scheme of the application is realized as follows:
according to an aspect of the present application, there is provided a data processing method including:
determining attribute information of data to be processed;
determining a parallel processing strategy of the data according to the attribute information and the historical processing information;
the attribute information at least comprises the data quantity of the data to be processed and the completion time limit of the data to be processed.
In the above solution, the determining attribute information of the data to be processed includes:
determining the data quantity of the data to be processed through the historical processing information, and determining the completion time limit of the data to be processed based on the data quantity; wherein the history processing information includes history processing efficiency information and history processing data amount information of the processed data.
In the above scheme, the determining a parallel processing policy of the data according to the attribute information and the history processing information includes:
determining data processing efficiency and data processing amount corresponding to different parallel processing strategies based on the historical processing information, wherein the historical processing information characterizes the historical processing efficiency of the processed data;
determining the expected completion time and the expected completion data amount of the data to be processed under different parallel processing strategies based on the data processing efficiency corresponding to the different parallel processing strategies;
and determining a parallel processing strategy of the data to be processed based on the expected completion time and the expected completion data amount.
In the above scheme, the attribute information further includes a data priority weight value;
said determining a parallel processing strategy for said data to be processed based on said expected completion time and said expected completion data amount comprising:
And determining a parallel processing strategy of which the expected completion time is within the completion time limit of the data to be processed and the expected completion data quantity meets the processing condition as the parallel processing strategy of the data based on the priority weight value.
In the above aspect, the method further comprises at least one of:
determining a data priority value of the data to be processed based on a data source of the data to be processed;
determining a data priority value of the data to be processed based on the data amount of the data to be processed and the completion time limit;
and determining a data priority weight value of the data to be processed based on the sensitivity value of the data to be processed.
In the above scheme, the method further comprises:
and processing the data to be processed based on the parallel processing strategy.
In the above scheme, the data to be processed includes a plurality of data priority weight values, and each priority weight value corresponds to different types of data;
the processing the data to be processed based on the parallel processing strategy comprises the following steps:
comparing the priority weight values to obtain a priority sequence;
and processing the data to be processed based on the parallel processing strategy and the priority sequence.
According to another aspect of the present application, there is provided an electronic apparatus including:
a determining unit for determining attribute information of the data to be processed, and for determining a parallel processing strategy for the data according to the attribute information and the historical processing information;
the attribute information at least comprises the data quantity of the data to be processed and the completion time limit of the data to be processed.
According to a third aspect of the present application, there is provided an electronic device comprising: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is adapted to perform the method steps of any of the above data processing methods when the computer program is run.
According to a fourth aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of any of the above-described data processing methods.
Drawings
FIG. 1 is a schematic diagram I of a flow implementation of a data processing method according to the present application;
FIG. 2 is a schematic diagram II of a flow implementation of a data processing method according to the present application;
FIG. 3 is a schematic diagram III of a flow implementation of a data processing method according to the present application;
FIG. 4 is a schematic diagram of the structural components of an electronic device according to the present application;
fig. 5 is a schematic diagram showing a structural composition of an electronic device according to the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Embodiments of the application and features of the embodiments may be combined with one another arbitrarily without conflict. The steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.
The technical scheme of the application is further elaborated below by referring to the drawings in the specification and the specific embodiments.
Fig. 1 is a schematic diagram of a flow implementation of a data processing method in the present application, where the method may be applied to a data platform, and the data platform includes a physical platform and a cloud platform. As shown in fig. 1, the method includes:
step 101, determining attribute information of data to be processed;
here, the attribute information includes at least a data amount of the data to be processed, a completion time limit of the data to be processed.
In one implementation of the application, the data volume of the data to be processed can be determined through the historical processing information of the processed data in the data platform, and the completion time limit of the data to be processed is determined based on the data volume; wherein the historical processing information includes, but is not limited to, historical processing efficiency information and historical processing data volume information of the processed data.
Here, the data platform may obtain a first history log and a second history log of processed data, wherein the first history log characterizes the historical processing efficiency of the processed data and the second history log characterizes the historical processed data amount of the processed data. From the two logs, the number of data entries each job processed at different times can be obtained and, combined with the size of the tables each job processes, the amount of data each job task processed in different time slots becomes known. A first prediction model giving the data quantity of each task at different times can then be established through machine learning algorithms such as linear regression or a neural network, and the data quantity of the data to be processed can be determined based on this first prediction model.
For example, taking the job name of the data to be processed (e.g., Job1) and the working time (e.g., 2023.08.15, 10:00-10:10) as the input of the first prediction model, the data quantity of the data to be processed (e.g., 100 entries) can be output.
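The first prediction model can be sketched in a few lines. The patent trains a regression or neural-network model from the history logs; the per-(job, hour) historical average below is only a stand-in under that assumption, and the job names, hours, and volumes are invented:

```python
from collections import defaultdict

def build_volume_model(history):
    # First prediction model (sketch): average historical data volume
    # per (job, hour-of-day) key, standing in for the patent's
    # regression / neural-network model.
    sums = defaultdict(lambda: [0.0, 0])
    for job, hour, volume in history:
        entry = sums[(job, hour)]
        entry[0] += volume
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical history rows: (job name, hour of day, entries processed).
history = [
    ("Job1", 10, 100),
    ("Job1", 10, 120),
    ("Job2", 10, 300),
]
model = build_volume_model(history)
print(model[("Job1", 10)])  # -> 110.0, the predicted volume for Job1 at 10:00
```

A learned model would generalize to unseen (job, time) inputs, which a plain average cannot; the lookup only illustrates the interface.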
Here, the second history log may refer to data from an upstream node of the data platform. The upstream node is a data integration point that obtains data from a data source and transmits the data to a data platform.
In the application, the data platform can also perform feature extraction on the first history log of the processed data to obtain feature information such as the job processing time, the parallel processing strategy, and the job data volume of the processed data. Processing this feature information through machine learning algorithms such as linear regression or a neural network yields a second prediction model characterizing the data platform's data processing capability under different parallel processing strategies at different times. Based on the second prediction model, the data processing efficiency of the data platform under different job parallel processing strategies during different periods of the day can be determined.
For example, the data processing capability of the data platform under different parallel processing strategies at different times may be:
"time period: 0:00 to 1:00; concurrency: 10; processing speed: 1 GB/s"
"time period: 0:00 to 1:00; concurrency: 11; processing speed: 1.1 GB/s"
"time period: 0:00 to 1:00; concurrency: 12; processing speed: 1 GB/s"
"time period: 1:00 to 2:00; concurrency: 10; processing speed: 1.2 GB/s"
"time period: 1:00 to 2:00; concurrency: 11; processing speed: 1.3 GB/s"
"time period: 1:00 to 2:00; concurrency: 12; processing speed: 1.2 GB/s"
For example, taking the working time "0:00 to 1:00" of the data to be processed and the concurrency "10" as the input of the second prediction model, the processing speed "1 GB/s" of the data to be processed is output.
For another example, taking the working time "1:00 to 2:00" and the concurrency "10" as the input of the second prediction model, the processing speed "1.2 GB/s" is output.
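The second prediction model can be illustrated as a lookup over the capability data listed above. The real model would be learned from the history logs; the table values here simply mirror the examples in the text:

```python
# (period start hour, concurrency) -> processing speed in GB/s,
# copied from the example capability data above.
speed_model = {
    (0, 10): 1.0, (0, 11): 1.1, (0, 12): 1.0,
    (1, 10): 1.2, (1, 11): 1.3, (1, 12): 1.2,
}

def predict_speed(hour, concurrency):
    # Stand-in for the learned second prediction model.
    return speed_model[(hour, concurrency)]

print(predict_speed(0, 10))  # -> 1.0, matching the first example
print(predict_speed(1, 10))  # -> 1.2, matching the second example
```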
Based on the data amount of the data to be processed and the data processing efficiency of the data platform under different operation parallel processing strategies in different time periods of the day, the data platform can determine the expected completion time limit of the data to be processed under the corresponding parallel processing strategies.
Here, when the data platform performs feature extraction in the first history log of the processed data, feature extraction may be performed by an outlier processing manner, a missing value processing manner, a feature construction manner, a feature screening manner, and a feature dimension reduction manner. The specific implementation means of feature extraction is not limited herein.
In another implementation of the application, the data platform may receive the completion time limit and data volume of the data to be processed as inputs. That is, the data platform has an input interface through which a business department can input the attribute information of the data to be processed.
Step 102, determining a parallel processing strategy of the data according to the attribute information and the historical processing information;
according to the application, the data platform determines the data processing efficiency and data processing amount corresponding to different parallel processing strategies based on the historical processing information of the processed data, wherein the historical processing information characterizes the historical processing efficiency of the processed data. The expected completion time and expected completion data amount of the data to be processed under different parallel processing strategies can then be determined from the data processing efficiency corresponding to each strategy, and a parallel processing strategy for the data to be processed can be determined from the expected completion time and expected completion data amount.
Here, the historical processing information includes, but is not limited to, a first historical log in the data platform that characterizes historical processing efficiency of the processed data and a second historical log that characterizes a historical amount of processed data of the processed data.
In the application, the attribute information of the data to be processed also comprises a data priority weight value; the data platform may further determine, based on the priority weight value, a parallel processing policy that an expected completion time is within a completion time limit of the data to be processed and an expected completion data amount satisfies a processing condition as a parallel processing policy of the data.
Here, the data platform may determine a parallel processing policy in which the maximum amount of completion data is expected within the completion time limit of the data to be processed as the parallel processing policy of the data to be processed.
Or, the data platform may determine a parallel processing policy for the data to be processed, where the parallel processing policy predicts that the amount of the completed data reaches a preset proportion within a completion time limit of the data to be processed.
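The selection rule in the two preceding paragraphs — among strategies whose expected completion time fits the deadline, take the one with the largest expected completed data amount — can be sketched as follows (policy names and numbers are illustrative):

```python
def choose_policy(candidates, deadline):
    # Keep only policies whose expected completion time is within the
    # completion time limit, then pick the one with the largest
    # expected completed data amount; None if no policy is feasible.
    feasible = {policy: data for policy, (time, data) in candidates.items()
                if time <= deadline}
    if not feasible:
        return None
    return max(feasible, key=feasible.get)

# Hypothetical (expected completion time, expected completed data GB).
candidates = {
    "concurrency=10": (50.0, 60.0),
    "concurrency=11": (45.0, 59.0),
    "concurrency=12": (70.0, 84.0),  # most data, but misses the deadline
}
print(choose_policy(candidates, deadline=60.0))  # -> concurrency=10
```

The preset-proportion variant mentioned above would replace the `max` with a check that `data` reaches the required fraction of the total.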
In the application, the data platform can determine the data priority weight value of the data to be processed based on the data source of the data to be processed; for example, a plurality of important data sources may be preset in the data platform, and each important data source has a corresponding priority weight value. When the data source of the data to be processed is at least one of a plurality of preset important data sources, determining the priority weight value corresponding to the data source as the data priority weight value of the data to be processed.
In the application, the data platform can also determine the data priority weight value of the data to be processed based on its data quantity and completion time limit. For example, the data platform may compare the completion time limit of the data to be processed with the current time. If the difference between the completion time limit and the current time is greater than a preset value and the data quantity is smaller than a preset value, the data is judged to be neither particularly large nor urgent, and its priority weight may be set to level B. If the difference is smaller than or equal to the preset value and the data quantity is greater than or equal to the preset value, the data is judged to be particularly large and particularly urgent, and its priority weight may be set to level A. The priority of level A is greater than that of level B.
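The A/B rule above can be sketched as follows. The thresholds are assumptions, and the mixed combinations the text leaves unspecified default to level B in this sketch:

```python
def priority_level(deadline, now, volume, slack_threshold, volume_threshold):
    # Level "A": little slack before the completion time limit AND a
    # large data volume (particularly large and particularly urgent).
    # Everything else, including the mixed cases the text leaves open,
    # defaults to level "B" here.
    slack = deadline - now
    if slack <= slack_threshold and volume >= volume_threshold:
        return "A"
    return "B"

# Hypothetical units: times in minutes since midnight, volume in entries.
print(priority_level(deadline=100, now=90, volume=500,
                     slack_threshold=30, volume_threshold=200))  # -> A
```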
In the application, the user can set the data priority weight value of the data to be processed according to the requirement, and the method for determining the data priority weight value is not limited.
In the application, the data platform can also determine the data priority weight value of the data to be processed based on its sensitivity value. Here, the sensitivity value may represent the degree of privacy of the data and may be a preset value. For example, if the sensitivity value of data A is 5 and that of data B is 2, and a sensitivity value of 5 indicates a greater degree of privacy than 2, the priority weight value of data A may be set higher than that of data B.
In the application, the data platform can process the data to be processed based on the parallel processing strategy under the condition of obtaining the parallel processing strategy of the data to be processed.
Here, if the data to be processed includes a plurality of data priority weight values, the data platform can compare the priority weight values to obtain a priority sequence, and then process the data to be processed based on the parallel processing strategy and the priority sequence, where each priority weight value corresponds to a different type of data. In this way the data with high priority is processed first, ensuring its processing efficiency and quality.
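The priority-sequence step can be sketched as a simple sort by descending weight (the job names and weight values are invented for illustration):

```python
def process_order(jobs):
    # Compare the priority weight values to obtain the priority
    # sequence: a higher weight is processed first.
    return sorted(jobs, key=lambda job: job["weight"], reverse=True)

# Hypothetical per-type weights for one batch of data to be processed.
jobs = [
    {"name": "export", "weight": 2},
    {"name": "ingest", "weight": 5},
    {"name": "report", "weight": 3},
]
print([job["name"] for job in process_order(jobs)])
# -> ['ingest', 'report', 'export']
```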
According to the data processing method provided by the application, the data platform predicts the parallel processing strategy of the data to be processed from the historical processing information of the processed data and the attribute information of the data to be processed, so that both the data processing amount of the current job and the job processing speeds of different data amounts under different parallel processing strategies can be predicted. Combining the processing priority weight values of different data yields an optimal parallel processing strategy that ensures high-priority data is processed first, guaranteeing its processing efficiency and quality, and a number of problems affecting the platform's data processing capability are overcome. For example, for the case of limited resources in a data platform, by considering how the data volume changes dynamically during processing and combining the history logs of processed data, the application establishes the efficiency prediction model and the data volume prediction model for different parallel processing strategies. This not only provides a sufficient basis for evaluating job completion time, but also effectively calculates the platform's efficiency when processing different data volumes in different time periods and deduces the completion time of the data, so that decisions maximizing business benefit can be made with limited resources.
Fig. 2 is a schematic diagram II of a flow implementation of a data processing method in the present application, as shown in fig. 2, the method includes:
in step 201, a first prediction model and a second prediction model are constructed from the historical processing information of the processed data.
Here, the history processing information characterizes the history processing efficiency and the amount of history data of the processed data. The first prediction model is used for predicting and evaluating the data quantity to be processed in each scheduling of each task in the platform; the second prediction model is used for predicting and evaluating the data processing efficiency of the data platform under different parallel processing strategies.
As shown in fig. 3, a data amount prediction model (i.e., the first prediction model) is constructed based on the history log of the operation efficiency and data amount of each job in the data platform (including the job name, the job start and end times, and the number of data entries and data size the job processed in each run), and is used for predicting and evaluating the data amount to be processed by each task in the platform in each subsequent schedule. In fig. 3: Job(1)_Num1 is the amount of data Job(1) is to process at time (1), Job(1)_Num2 the amount at time (2), ..., and Job(1)_NumN the amount at time (N); the entries for Job(2) likewise give its data amounts at times (1) through (N), and so on up to Job(N) across the N schedules.
As shown in fig. 3, based on the history log of each operation of the processed data in the data platform, the concurrency of jobs in each time period and the platform's overall data-volume processing speed for that period are extracted through feature engineering; a data processing capability prediction model of the platform under different concurrency levels at different times (i.e., the second prediction model) is then constructed through machine learning algorithms such as linear regression or a neural network. In fig. 3: Concur(m+1)_V1 is the processing speed V1 when the platform processes m+1 jobs simultaneously (concurrency m+1) in the first period, Concur(m+1)_V2 the speed V2 in the second period, ..., and Concur(m+1)_Vm+1 the speed Vm+1; Concur(m+2)_Vm+2 is the speed Vm+2 when the platform processes m+2 jobs simultaneously (concurrency m+2) in the (m+2)th period; and Concur(m+n)_Vm+n is the speed Vm+n when the platform processes m+n jobs simultaneously (concurrency m+n) in the (m+n)th period.
Step 202, obtaining the predicted completion time of the data to be processed under different parallel processing strategies based on the first prediction model and the second prediction model.
As shown in fig. 3, the data processing efficiency of the data platform under different concurrency levels in the current time period is obtained through the second prediction model. Combined with the data volume each queued job task is predicted by the first prediction model to process, and the job completion times that business staff entered into the platform in advance, the concurrency that completes the most tasks in the current time period can be calculated, and the platform's current number of concurrent jobs can be dynamically adjusted.
Here, a corresponding priority weight policy may be set in the task-completion-time information supplied by the operator, so that the operator defines a priority weight for each task: N represents the highest weight, with one such task counting as N task units, and the priority decreases from high to low down to 1, the lowest weight, counting as 1 task unit. The data platform then completes as many task units as possible in the current time period.
Here, the data platform predicts the data volume to be processed by each current task according to the first prediction model, and calculates the predicted completion time of each queued task under each parallelism with the second prediction model. In fig. 3, Concur(m+1)_Job(1)_Time is the predicted completion time of job 1 when m+1 jobs are processed in parallel, Concur(m+1)_Job(2)_Time that of job 2, ..., and Concur(m+1)_Job(n)_Time that of job n; the corresponding entries for concurrency m+2 and m+n give the predicted completion times of jobs 1 through n when m+2 and m+n jobs, respectively, are processed in parallel.
Step 203, determining the parallel processing strategy of the data to be processed based on the expected completion time under different parallel processing strategies, the required completion time of the data to be processed, and the priority information of the data to be processed.
The predicted completion time of each queued task under each concurrency is calculated with the second prediction model from the data volume the first prediction model predicts each task will process; combined with each task's required completion time, the platform can judge under which concurrency the most queued tasks can be completed within their required times. When counting completed tasks, a higher weight N may be assigned to high-priority tasks according to task priority, so that one high-priority task counts as N low-priority jobs; the concurrency that completes the highest-priority tasks can thereby be obtained. Jobs that can finish sooner within their required time are scheduled earlier, and when comparing the efficiency of jobs with different priorities, the comparison and re-ordering must combine the weight ratio N between priorities with the second prediction model. To prevent a job with a large data volume from being delayed indefinitely, each time a job is postponed its weight is increased.
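The step-203 decision — pick the concurrency under which the most queued tasks, counted in priority-weighted units, finish within their required times — can be sketched as follows (all task names, times, and weights are illustrative):

```python
def pick_concurrency(predicted_times, required_times, weights):
    # predicted_times: {concurrency: {task: predicted completion time}}.
    # A task with weight w counts as w low-priority task units; pick
    # the concurrency whose on-time tasks sum to the most units.
    best, best_score = None, -1
    for concurrency, times in predicted_times.items():
        score = sum(weights[task] for task, finish in times.items()
                    if finish <= required_times[task])
        if score > best_score:
            best, best_score = concurrency, score
    return best

# Hypothetical predicted completion times (minutes) per concurrency.
predicted = {
    3: {"job1": 40, "job2": 55, "job3": 90},  # job3 misses its deadline
    4: {"job1": 35, "job2": 58, "job3": 80},  # all three on time
}
required = {"job1": 50, "job2": 60, "job3": 85}
weights = {"job1": 3, "job2": 1, "job3": 1}   # job1 is high priority
print(pick_concurrency(predicted, required, weights))  # -> 4
```

The delay rule in the text would be layered on top: any task postponed by the chosen schedule gets its weight incremented before the next round.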
In fig. 3, job (1) _time refers to: job (1) requires completion Time, job (2) _time means: job (2) requires completion Time, …, job (n) _time refers to: job (n) requires a completion time.
In fig. 3, job (1) _priority refers to: job (1) Priority, job (2) _priority means: the Priority of Job (2), …, job (n) _priority refers to the Priority of Job (n).
A job scheduling policy is then obtained through a sorting algorithm. In fig. 3, Job(1)_Schedule refers to the scheduling policy of job (1), Job(2)_Schedule to that of job (2), ..., and Job(n)_Schedule to that of job (n).
In this way, the data platform completes on time, within its limited resources, the number of concurrent jobs that maximizes task benefit, and jobs that can finish sooner within the business-required time are scheduled earlier. When job scheduling is adjusted, the weight of each task that is postponed is increased by 1 each time, so that low-priority jobs and jobs with large processing volumes still get scheduled.
According to the data processing method provided by the application, the data amount to be processed by each job in the current schedule is predicted by combining the historical log data and the upstream logs, which provides a sufficient basis for evaluating job completion time. By building the efficiency prediction model and the data amount prediction model for different concurrency conditions from the platform's history logs, the efficiency of the platform when processing different numbers of jobs simultaneously can be effectively calculated and job completion times deduced, so that the data platform can make decisions that maximize business benefit with limited resources.
Fig. 4 is a schematic diagram of the structural composition of an electronic device according to the present application, as shown in fig. 4, the electronic device includes:
a determining unit 401, configured to determine attribute information of data to be processed, and to determine a parallel processing policy for the data according to the attribute information and historical processing information;
the attribute information at least comprises the data quantity of the data to be processed and the completion time limit of the data to be processed.
In a preferred embodiment, the determining unit 401 is further configured to determine the data amount of the data to be processed according to the historical processing information, and to determine the completion time limit of the data to be processed based on the data amount;
wherein the historical processing information includes historical processing efficiency information and historical processing data amount information of the processed data.
In a preferred embodiment, the determining unit 401 may be configured to: determine the data processing efficiency and data processing amount corresponding to different parallel processing policies based on the historical processing information, wherein the historical processing information characterizes the historical processing efficiency of the processed data; determine the expected completion time and expected completion data amount of the data to be processed under different parallel processing policies based on the data processing efficiency corresponding to those policies; and determine a parallel processing policy for the data to be processed based on the expected completion time and the expected completion data amount.
In a preferred embodiment, the attribute information further includes a data priority weight value; the determining unit 401 is further configured to determine, as the parallel processing policy of the data, a parallel processing policy whose expected completion time is within the completion time limit of the data to be processed and whose expected completion data amount satisfies a processing condition based on the priority weight value.
In a preferred embodiment, the determining unit 401 is further configured to determine a data priority weight value of the data to be processed based on a data source of the data to be processed; or based on the data amount and completion time limit of the data to be processed; or based on a sensitivity value of the data to be processed.
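The three alternative ways of deriving a priority value can be sketched as follows. The function names, the source-weight table, the urgency ratio, and the sensitivity thresholds are all illustrative assumptions, not taken from the application:

```python
def priority_from_source(source, source_weights):
    """Priority by data source; unknown sources get a default weight of 1."""
    return source_weights.get(source, 1)

def priority_from_urgency(data_rows, deadline_secs, rate_rows_per_sec):
    """Urgency-based priority: ratio of needed processing time to the deadline.
    A value near (or above) 1 means the job is tight and should rank high."""
    return (data_rows / rate_rows_per_sec) / deadline_secs

def priority_from_sensitivity(sensitivity, levels=(0.3, 0.7)):
    """Map a sensitivity score in [0, 1] onto low/medium/high priority 1/2/3."""
    low, high = levels
    if sensitivity >= high:
        return 3
    return 2 if sensitivity >= low else 1
```

Any of the three signals (source, urgency, sensitivity) can feed the priority weight value used by the scheduling policy above; the application presents them as alternatives.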
In a preferred embodiment, the electronic device further includes: the processing unit 402 is configured to process the data to be processed based on the parallel processing policy.
In a preferred embodiment, the data to be processed includes a plurality of data priority weight values, each priority weight value corresponding to a different type of data;
the processing unit 402 is further configured to compare the priority weight values to obtain a priority sequence, and to process the data to be processed based on the parallel processing policy and the priority sequence.
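Comparing the per-type weight values to obtain a priority sequence, then processing under the parallel policy, might look like the following sketch. The data structures are hypothetical, and `max_parallel` stands in for whatever concurrency the chosen parallel processing policy allows:

```python
def priority_sequence(weights):
    """weights: {data_type: priority weight}. Returns types in descending
    weight order - the priority sequence referred to above."""
    return [t for _, t in sorted((-w, t) for t, w in weights.items())]

def process(batches, weights, max_parallel):
    """Order batches by type, highest-weight types first, then split them
    into waves of at most `max_parallel` batches (the parallel policy)."""
    order = priority_sequence(weights)
    queue = [b for t in order for b in batches.get(t, [])]
    return [queue[i:i + max_parallel] for i in range(0, len(queue), max_parallel)]
```

Here each wave runs concurrently and waves run in sequence, so higher-weight data types are processed first without exceeding the policy's concurrency limit.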
It should be noted that the division into program modules in the electronic device of the above embodiment is only an example when performing data processing; in practical applications, the processing may be allocated to different program modules as needed, that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above. In addition, the electronic device of the above embodiment and the embodiment of the processing method belong to the same concept; its specific implementation is detailed in the method embodiment and is not repeated here.
The embodiment of the application also provides an electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of any of the above processing methods when running the computer program.
Fig. 5 is a schematic diagram of a second structural composition of an electronic device according to the present application. The electronic device 500 may be a mobile phone, a computer, a digital broadcast terminal, an information transceiver, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, a server, or a cloud server. The electronic device 500 shown in fig. 5 includes: at least one processor 501, a memory 502, at least one network interface 504, and a user interface 503. The various components in the electronic device 500 are coupled together by a bus system 505. It is understood that the bus system 505 is used to enable connection and communication between these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the bus system 505 in fig. 5.
The user interface 503 may include, among other things, a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, or touch screen, etc.
It is to be appreciated that the memory 502 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM, Read-Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferroelectric random access memory (FRAM, Ferroelectric Random Access Memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disk, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synclink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 502 described in embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 502 in embodiments of the present application is used to store various types of data to support the operation of the electronic device 500. Examples of such data include: any computer programs for operation on the electronic device 500, such as an operating system 5021 and application 5022; contact data; telephone book data; a message; a picture; audio, etc. The operating system 5021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 5022 may include various application programs such as a Media Player (Media Player), a Browser (Browser), and the like for implementing various application services. A program for implementing the method according to the embodiment of the present application may be included in the application 5022.
The method disclosed in the above embodiment of the present application may be applied to the processor 501 or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or instructions in software in the processor 501. The processor 501 may be a general purpose processor, a digital signal processor (DSP, digital Signal Processor), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 501 may implement or perform the methods, steps and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the application can be directly embodied in the hardware of the decoding processor or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium in memory 502 and processor 501 reads information in memory 502 to perform the steps of the method described above in connection with its hardware.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more application specific integrated circuits (ASIC, application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, programmable Logic Device), complex programmable logic devices (CPLD, complex Programmable Logic Device), field-programmable gate arrays (FPGA, field-Programmable Gate Array), general purpose processors, controllers, microcontrollers (MCU, micro Controller Unit), microprocessors (Microprocessor), or other electronic components for performing the aforementioned methods.
In an exemplary embodiment, the present application also provides a computer-readable storage medium, such as a memory 502 including a computer program executable by the processor 501 of the electronic device 500 to perform the steps described in the foregoing methods. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM; but may be a variety of devices including one or any combination of the above-described memories, such as a mobile phone, computer, tablet device, personal digital assistant, or the like.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs any of the method steps of the above-described processing method.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The methods disclosed in the method embodiments provided by the application can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.
The features disclosed in the several product embodiments provided by the application can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.
The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is merely an embodiment of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method, comprising:
determining attribute information of data to be processed;
determining a parallel processing strategy of the data according to the attribute information and the historical processing information;
the attribute information at least comprises the data quantity of the data to be processed and the completion time limit of the data to be processed.
2. The method of claim 1, wherein determining the attribute information of the data to be processed comprises:
determining the data quantity of the data to be processed according to the historical processing information;
determining a completion time limit of the data to be processed based on the data amount;
wherein the historical processing information includes historical processing efficiency information and historical processing data amount information of the processed data.
3. The method of claim 1, wherein determining a parallel processing policy for data based on the attribute information and historical processing information comprises:
determining data processing efficiency and data processing amount corresponding to different parallel processing policies based on the historical processing information, wherein the historical processing information characterizes the historical processing efficiency of the processed data;
determining the expected completion time and the expected completion data amount of the data to be processed under different parallel processing strategies based on the data processing efficiency corresponding to the different parallel processing strategies;
and determining a parallel processing strategy of the data to be processed based on the expected completion time and the expected completion data amount.
4. A method according to claim 3, wherein the attribute information further includes a data priority weight value;
said determining a parallel processing strategy of said data to be processed based on the expected completion time and the expected completion data amount comprises:
determining, as the parallel processing strategy of the data, a parallel processing strategy whose expected completion time is within the completion time limit of the data to be processed and whose expected completion data amount satisfies a processing condition based on the priority weight value.
5. The method of claim 4, further comprising at least one of:
determining a data priority value of the data to be processed based on a data source of the data to be processed;
determining a data priority value of the data to be processed based on the data amount of the data to be processed and the completion time limit;
and determining a data priority weight value of the data to be processed based on the sensitivity value of the data to be processed.
6. The method of claim 1, the method further comprising:
and processing the data to be processed based on the parallel processing strategy.
7. The method of claim 6, wherein the data to be processed includes a plurality of data priority weight values, each priority weight value corresponding to a different type of data;
the processing the data to be processed based on the parallel processing strategy comprises the following steps:
comparing the priority weight values to obtain a priority sequence;
and processing the data to be processed based on the parallel processing strategy and the priority sequence.
8. An electronic device, comprising:
a determining unit for determining attribute information of data to be processed, and for determining a parallel processing strategy of the data according to the attribute information and historical processing information;
the attribute information at least comprises the data quantity of the data to be processed and the completion time limit of the data to be processed.
9. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is adapted to perform the method steps of any of claims 1 to 7 when the computer program is run.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, performs the method steps of any of claims 1 to 7.
CN202311075597.1A 2023-08-24 2023-08-24 Data processing method, device and storage medium Pending CN117194014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311075597.1A CN117194014A (en) 2023-08-24 2023-08-24 Data processing method, device and storage medium


Publications (1)

Publication Number Publication Date
CN117194014A 2023-12-08

Family

ID=89000818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311075597.1A Pending CN117194014A (en) 2023-08-24 2023-08-24 Data processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117194014A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination