CN110399089B

CN110399089B - Data storage method, device, equipment and medium

Info

Publication number: CN110399089B
Application number: CN201810355324.5A
Authority: CN
Inventors: 高海军; 彭自然; 沈华品
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-04-19
Filing date: 2018-04-19
Publication date: 2023-05-05
Anticipated expiration: 2038-04-19
Also published as: CN110399089A

Abstract

The embodiments of the application disclose a data storage method and device. The method comprises: acquiring attribute information of a data processing node, determining a persistence storage rule according to the attribute information, and performing persistence storage on result data of the data processing node based on the persistence storage rule. The persistence storage can thus change as the attribute information of the data processing node changes, so that a flow engine can provide flexible and changeable persistence storage rules, which improves the adaptability of the flow engine's persistence storage to various complex business products.

Description

Data storage method, device, equipment and medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data storage method, a data storage device, a computer device, and a computer readable storage medium.

Background

With the development of informatization, people increasingly rely on computers to process various data at work. Work follows a flow composed of a plurality of nodes for processing data, and each node handles its own data processing task. Executing a flow requires moving between different nodes. A flow engine generally determines, according to the current state of the flow and the operation of the user, which node the flow should move to next; that is, the flow engine controls the scheduling of the relevant nodes.

The flow engine is part of the application system. It provides core solutions, such as determining the information transmission route and the content level according to different roles, divisions of labor, and conditions, and these solutions play a decisive role for each application system. The flow engine includes important functions such as node management and flow direction management.

Persisting the result of a node's data processing is an essential component of a flow engine. "Persistence" is defined relative to "non-persistence": data in memory is persistent relative to data in cache, and data on hard disk is persistent relative to data in memory. In other words, persistent data is less volatile than non-persistent data.

The applicant has found that a large number of complex internet products need a flow engine to drive their core services, but a conventional flow engine cannot provide flexible and varied persistence modes to meet the diverse persistence requirements of such products. In short, the persistence modes supported by a conventional flow engine are limited and cannot adapt to complex requirements.

Disclosure of Invention

In view of the above problems, the present application provides a data storage method, a data storage device, a computer device, and a computer-readable storage medium that overcome or at least partially solve the above problems.

According to one aspect of the present application, there is provided a data storage method comprising:

acquiring attribute information of a data processing node;

determining a persistence storage rule according to the attribute information;

and based on the persistence storage rule, persistence storage is carried out on the result data of the data processing node.

Optionally, the persistence storage rule includes serialization storage or normal form storage.

Optionally, the attribute information includes at least one of a flow concurrency number of the data processing flow, a node number, and a task concurrency number of the data processing node.

Optionally, the determining the persistence storage rule according to the attribute information includes:

and determining a persistence storage rule according to the numerical range of the attribute information.

Optionally, the determining the persistence storage rule according to the numerical range of the attribute information includes:

if the attribute information exceeds the corresponding set numerical range, determining that the persistence storage rule comprises serialization storage;

and if the attribute information does not exceed the corresponding set numerical range, determining that the persistence storage rule comprises normal form storage.
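The numerical-range test above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the attribute names, the limits in `SET_LIMITS`, and the choice to select serialization storage as soon as any one attribute exceeds its limit are assumptions, since the text only speaks of a "corresponding set numerical range".

```python
# Hypothetical upper bounds of the "set numerical range" for each attribute.
SET_LIMITS = {
    "flow_concurrency": 100,
    "node_count": 10,
    "task_concurrency": 50,
}

SERIALIZED = "serialized"
NORMAL_FORM = "normal_form"


def rule_from_numeric_range(attributes: dict[str, int]) -> str:
    """Return serialization storage if any supplied attribute exceeds its limit.

    Attributes without a configured limit are ignored (an assumption).
    """
    for name, value in attributes.items():
        limit = SET_LIMITS.get(name)
        if limit is not None and value > limit:
            return SERIALIZED
    return NORMAL_FORM
```

Under these assumed limits, a node whose flow concurrency number is 150 would be stored in serialized form, while a node with a flow concurrency number of 20 in a 4-node flow would be stored in normal form.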

Optionally, the attribute information includes at least one of a task type, a data type, and a node identifier of the data processing node;

The determining a persistence storage rule according to the attribute information includes:

and determining a persistence storage rule according to whether the attribute information comprises the set attribute information.

Optionally, the attribute information includes at least one of historical attribute information and current attribute information;

when the attribute information includes historical attribute information, the acquiring attribute information of the data processing node includes:

acquiring a history record of flow execution, and computing the historical attribute information of the data processing node from statistics of the history record.

Optionally, the method further comprises:

and adding the persistence storage rule to a configuration file of a data processing flow, so that when the result data of the data processing node is subjected to persistence storage, the persistence storage rule is read from the configuration file.
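One way to picture the configuration file is a small JSON document that maps node identifiers to rules. The sketch below is illustrative only: the JSON layout, the key names `default_rule` and `node_rules`, and the fallback to a flow-wide default are assumptions, because the text does not specify a file format.

```python
import json

# Hypothetical configuration layout: one rule per node id plus a flow-wide default.
CONFIG_TEXT = json.dumps({
    "flow": "employee_leave",
    "default_rule": "normal_form",
    "node_rules": {"department_manager_approval": "serialized"},
})


def add_rule(config_text: str, node_id: str, rule: str) -> str:
    """Return new configuration text with the rule for node_id added or replaced."""
    config = json.loads(config_text)
    config.setdefault("node_rules", {})[node_id] = rule
    return json.dumps(config, indent=2)


def read_rule(config_text: str, node_id: str) -> str:
    """Read the rule for node_id, falling back to the flow-wide default."""
    config = json.loads(config_text)
    return config.get("node_rules", {}).get(node_id, config["default_rule"])


updated = add_rule(CONFIG_TEXT, "group_leader_approval", "normal_form")
```

Because the rule lives in the file rather than in engine code, a service party could change `node_rules` without redeploying the flow engine.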

Optionally, the method further comprises:

and determining a storage container of the persistent storage according to the attribute information of the data processing node.

Optionally, the determining the storage container of the persistent storage according to the attribute information of the data processing node includes:

acquiring the life cycle of a data processing flow to which the data processing node belongs;

and selecting a storage container suitable for the life cycle, wherein the storage container comprises a cache, a memory, a hard disk or an external memory.
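A possible reading of "a storage container suitable for the life cycle" is that short-lived flows use more volatile containers and long-lived flows use more durable ones. The following sketch assumes that reading; the cut-off durations in `CONTAINER_LIMITS` are invented for illustration and do not come from the text.

```python
from datetime import timedelta

# Hypothetical cut-offs, ordered from the shortest-lived to the longest-lived container.
CONTAINER_LIMITS = [
    (timedelta(seconds=1), "cache"),
    (timedelta(minutes=10), "memory"),
    (timedelta(days=30), "hard_disk"),
]


def select_container(life_cycle: timedelta) -> str:
    """Pick the first container whose cut-off covers the flow's life cycle."""
    for limit, container in CONTAINER_LIMITS:
        if life_cycle <= limit:
            return container
    # Anything longer than the last cut-off goes to external memory (assumed).
    return "external_memory"
```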

Optionally, the method further comprises:

and determining the file form of the persistent storage according to the attribute information of the data processing node.

Optionally, the determining the file form of the persistent storage according to the attribute information of the data processing node includes:

detecting whether a data processing task of the data processing node comprises a database call or not;

if the database call is included, determining that the file form of the persistent storage includes a database;

if the database call is not included, it is determined that the persisted file form includes a data file.
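The file-form decision above reduces to a single check. In the sketch below, a data processing task is modeled as a list of call names and a database call is recognized by a name prefix; both the model and the prefixes in `DATABASE_CALL_PREFIXES` are assumptions, since the text does not say how a database call is detected.

```python
# Hypothetical markers of a database call inside a task definition.
DATABASE_CALL_PREFIXES = ("db.", "sql.")


def includes_database_call(task_calls: list[str]) -> bool:
    """Return True if any call name of the task looks like a database call."""
    return any(call.startswith(DATABASE_CALL_PREFIXES) for call in task_calls)


def select_file_form(task_calls: list[str]) -> str:
    """Choose a database when the task calls a database, otherwise a data file."""
    return "database" if includes_database_call(task_calls) else "data_file"
```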

Optionally, the performing the persistence storage on the result data of the data processing node based on the persistence storage rule includes:

and calling a function plug-in corresponding to the persistence storage rule to perform persistence storage on the result data of the data processing node.

Optionally, the serializing storing includes:

obtaining result data of the data processing node, and reading various sub-data from the result data;

and sequentially combining the plurality of sub data and converting the sub data into a character string with a set format.

Optionally, the serializing storage further includes:

and adding a serialization version identifier into the converted character string.
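The serialization steps above (read the sub-data, combine them in order, convert to a string of a set format, add a version identifier) might look like the following. The field order, the `|` separator, and the `v1` identifier are hypothetical choices made for this sketch; the text leaves the concrete format open.

```python
SERIAL_VERSION = "v1"                            # hypothetical version identifier
FIELD_ORDER = ("event", "leave_days", "status")  # hypothetical agreed field order
SEPARATOR = "|"


def serialize(result: dict[str, object]) -> str:
    """Combine the sub-data in the agreed order and prefix the version identifier."""
    parts = [str(result[name]) for name in FIELD_ORDER]
    if any(SEPARATOR in part for part in parts):
        raise ValueError("sub-data must not contain the separator")
    return SEPARATOR.join([SERIAL_VERSION, *parts])


def deserialize(text: str) -> dict[str, str]:
    """Check the version identifier, then map the parts back to field names."""
    version, *parts = text.split(SEPARATOR)
    if version != SERIAL_VERSION:
        raise ValueError(f"unsupported serialization version: {version}")
    return dict(zip(FIELD_ORDER, parts))
```

The version identifier lets a reader reject or convert strings written under an older format, which matters once the agreed field order changes. Note that this simple format returns every value as a string.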

Optionally, before the persisting of the result data of the data processing node based on the persisting rule, the method further comprises:

driving the data processing flow to a current data processing node according to the result data of the last data processing node;

and executing the data processing task corresponding to the current data processing node.

Correspondingly, the application also provides a data storage device, which comprises:

the information acquisition module is used for acquiring attribute information of the data processing node;

the rule determining module is used for determining a persistence storage rule according to the attribute information;

and the persistence storage module is used for carrying out persistence storage on the result data of the data processing node based on the persistence storage rule.

Optionally, the rule determining module includes:

and the first rule determining submodule is used for determining a persistence storage rule according to the numerical range of the attribute information.

Optionally, the first rule determining submodule includes:

the serialization storage determining unit is used for determining that the persistence storage rule comprises serialization storage if the attribute information exceeds the corresponding set numerical value range;

and the normal form storage determining unit is used for determining that the persistence storage rule comprises normal form storage if the attribute information does not exceed the corresponding set numerical value range.

Optionally, the rule determining module includes:

and the second rule determining submodule is used for determining the persistence storage rule according to whether the attribute information comprises the set attribute information.

Optionally, when the attribute information includes historical attribute information, the information acquisition module includes:

and the information acquisition sub-module is used for acquiring a history record of flow execution and computing the historical attribute information of the data processing node from statistics of the history record.

Optionally, the apparatus further comprises:

and the file adding module is used for adding the persistence storage rule to a configuration file of a data processing flow so as to read the persistence storage rule from the configuration file when the result data of the data processing node is subjected to persistence storage.

Optionally, the apparatus further comprises:

and the container determining module is used for determining a storage container of the persistent storage according to the attribute information of the data processing node.

Optionally, the container determining module includes:

the period acquisition sub-module is used for acquiring the life period of the data processing flow to which the data processing node belongs;

the storage container selecting submodule is used for selecting a storage container suitable for the life cycle, and the storage container comprises a cache, a memory, a hard disk or an external memory.

Optionally, the apparatus further comprises:

and the file form determining module is used for determining the file form of the persistent storage according to the attribute information of the data processing node.

Optionally, the file form determining module includes:

the call detection sub-module is used for detecting whether a data processing task of the data processing node comprises a database call or not;

the database determining submodule is used for determining that the file form of the persistent storage comprises a database if the database call is included;

and the data file determination submodule is used for determining that the file form of the persistent storage comprises the data file if the database call is not included.

Optionally, the persistent storage module includes:

and the plug-in calling sub-module is used for calling the functional plug-in corresponding to the persistence storage rule and carrying out persistence storage on the result data of the data processing node.

Optionally, the persistent storage module includes:

the sub-data reading sub-module is used for acquiring the result data of the data processing node and reading various sub-data from the result data;

and the character string conversion sub-module is used for sequentially combining various sub-data and converting the sub-data into character strings with set formats.

Optionally, the persistent storage module further includes:

the identification adding sub-module is used for adding the serialization version identification into the converted character string.

Optionally, the apparatus further comprises:

the flow driving module is used for driving the data processing flow to the current data processing node according to the result data of the last data processing node before the result data of the data processing node is subjected to persistent storage based on the persistent storage rule;

and the processing task execution module is used for executing the data processing task corresponding to the current data processing node.

Accordingly, the present application also provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements one or more of the methods described above when executing the computer program.

Accordingly, the present application also provides a computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements one or more of the methods described above.

According to the embodiments of the application, attribute information of a data processing node is acquired, a persistence storage rule is determined according to the attribute information, and the result data of the data processing node is persistently stored based on the persistence storage rule. The persistence storage can thus change as the attribute information of the data processing node changes, so that a flow engine can provide flexible and changeable persistence storage rules, which improves the adaptability of the flow engine's persistence storage to various complex business products.

Further, the storage container and the file form of the persistent storage are determined according to the attribute information of the data processing node, so that the storage container and the file form change as the attribute information of the data processing node changes. The flow engine can therefore choose a storage container and file form suited to each data processing node, which improves its flexibility in selecting them.

Further, the persistence storage rule is added to a configuration file of the data processing flow, so that the rule can be read from the configuration file when the result data of the data processing node is persistently stored. A service party can customize and modify the configuration file according to its own requirements, so as to define persistence storage rules that meet those requirements, which improves the flexibility and configurability of the persistence storage.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the application are easier to understand, specific embodiments of the application are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 shows a schematic diagram of a data persistence storage process;

FIG. 2 illustrates a flow chart of an embodiment of a data storage method according to a first embodiment of the present application;

FIG. 3 illustrates a flow chart of an embodiment of a data storage method according to a second embodiment of the present application;

FIG. 4 illustrates a flow chart of one embodiment of a data storage method according to a third embodiment of the present application;

FIG. 5 shows an architectural diagram of a flow engine;

FIG. 6 shows a schematic diagram of a data processing flow;

FIG. 7 illustrates a block diagram of an embodiment of a data storage device according to a fourth embodiment of the present application;

FIG. 8 illustrates an exemplary system that can be used to implement various embodiments described in this disclosure.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

For a better understanding of the present application, the concepts to which the present application relates are described below:

A data processing flow includes a series of interconnected, automatically executed data processing nodes, through which documents, information, or tasks are passed automatically by computer according to a predetermined procedure. For example, an employee who needs to take leave submits a leave application on the enterprise management platform. The leave application occurs at the start node of the leave flow. The flow engine obtains the data of the start node, such as "leave event", "leave days", and "pending approval", and drives the flow to the corresponding node according to that data. The flow may have two branches: if the number of leave days exceeds 3, the next node is the department manager approval node; otherwise, the next node is the group leader approval node. On either branch, the flow waits for the approval result of the department manager or the group leader. The flow engine then drives the flow to the end node according to the data of the department manager approval node or the group leader approval node, and an email is sent to inform the employee of the approval result. Any suitable data processing flow may be included, which is not limited in this embodiment of the present application.

A data processing node obtains result data by processing input data. Typically, the input data of a data processing node comes from the previous data processing node, and the result data can be passed to the next data processing node. For example, in the employee leave flow, the input data of the department manager approval node is the result data of the start node ("leave event", "leave days", "pending approval", etc.); after the department manager enters the approval result, the result data includes "leave event", "leave days", "approved/rejected", etc. Any suitable data processing node and its result data may be included, which is not limited in the embodiments of the present application.
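The branching in the leave example can be written as a small routing function. This is a sketch under stated assumptions: the node identifiers and the `leave_days` key are hypothetical names, and a real flow engine would read the branches from a flow definition rather than hard-code them.

```python
def next_node(current: str, result: dict) -> str:
    """Drive the leave flow to the next node from the current node's result data."""
    if current == "start":
        # More than 3 days of leave goes to the department manager.
        if result["leave_days"] > 3:
            return "department_manager_approval"
        return "group_leader_approval"
    if current in ("department_manager_approval", "group_leader_approval"):
        # Either approval branch ends the flow once a result is available.
        return "end"
    raise ValueError(f"no outgoing branch from node: {current}")
```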

Accordingly, the processing tasks that are specifically performed on the data at the data processing node are data processing tasks. For example, "department manager approval", "group leader approval", "financial approval", "bank payment", etc., or any other suitable data processing task, embodiments of the present application are not limited in this regard.

The attribute information of the data processing node may include the flow concurrency number, the number of nodes, the task concurrency number of the data processing node, and the like of the data processing flow, and may also include the task type, the data type, the node identifier, and the like of the data processing node, or any other applicable attribute information, which is not limited in this embodiment of the present application.

In an optional embodiment of the present application, the attribute information may include at least one of a flow concurrency number of the data processing flow, a node number, and a task concurrency number of the data processing node. The flow concurrency number characterizes the number of instances of a given data processing flow running simultaneously in the application system; for example, the employee leave flow may have multiple flow instances executing at the same time, i.e., multiple employee leave flows. The node number characterizes the number of data processing nodes included in a given data processing flow; for example, the employee leave flow includes 4 nodes: a "start node", a "department manager approval node", a "group leader approval node", and an "end node". The task concurrency number of a data processing node characterizes the number of tasks of that node running simultaneously, for example, the number of department manager approval tasks running at the same time in the employee leave flow.

In an alternative embodiment of the present application, the attribute information may include at least one of a task type, a data type, and a node identification of the data processing node. The task type is used to characterize the type of task being performed, e.g., query task, list generation task, etc. The data type is used for representing the type of data related to the executed task, for example, data belonging to the leave type, such as 'leave days' in the leave process, and data belonging to the reimbursement type, such as 'reimbursement amount' in the reimbursement process. The node identification is used to uniquely identify the data processing node, e.g., a unique serial number assigned to each data processing node.

The result data of a data processing node is used not only to drive the flow to the next corresponding data processing node, but also to record the node to which the flow has currently run. The result data therefore needs to be stored, that is, persistently stored.

Persistent storage is a relative concept in that data in memory is persistent with respect to data in cache, and data in hard disk is persistent with respect to data in memory, i.e., persistent data is more nonvolatile than non-persistent data.

Different persistent storage modes can be adopted for different data processing nodes and data processing flows, so as to meet the operating requirements of different and increasingly complex software. Accordingly, persistent storage is controlled by persistence storage rules, which include serialization storage, normal form storage, or any other applicable rule; this is not limited in this embodiment of the present application.

In an alternative embodiment of the present application, the persistence storage rules may include serialization storage or normal form storage. Serialization storage stores the result data in an ordered format or as a byte sequence; for example, the result data is converted by convention into a character string, and the character string is stored to represent the result data. A customized serialization protocol can be designed for each data processing flow to meet the needs of different data processing flows or data processing nodes. Any applicable method of serializing and storing the result data may be used, which is not limited in the embodiment of the present application. Serialization storage can reduce redundant data storage, save storage space, reduce the amount of data transmitted, and improve data transmission efficiency.

Normal form storage includes persistent storage using normal form (database design normal form) tables. Storing result data requires a plurality of cooperating normal form tables to establish the relationships between data. For example, a persistence mechanism typically uses about 10 cooperating normal form tables in an RDBMS (Relational Database Management System) as its underlying layer: normal form tables are created for the relationships among the various data in the result data of the data processing node, and the data is persistently stored according to those tables. Any suitable manner of storing the result data in normal form may be included, which is not limited by the embodiments of the present application. Normal form storage preserves the direct relationships between data, which facilitates querying and can improve query speed.
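To illustrate how storing result data across related tables keeps the relationships queryable, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The three-table schema, the column names, and the sample values (including the employee name) are hypothetical and far smaller than the roughly 10 tables mentioned above.

```python
import sqlite3

# Hypothetical schema: each kind of sub-data gets its own related table.
SCHEMA = """
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE leave_request (
    request_id  INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    leave_days  INTEGER NOT NULL
);
CREATE TABLE approval (
    approval_id INTEGER PRIMARY KEY,
    request_id  INTEGER NOT NULL REFERENCES leave_request(request_id),
    node_id     TEXT NOT NULL,
    approved    INTEGER NOT NULL
);
"""


def store_normal_form(conn: sqlite3.Connection, result: dict) -> None:
    """Split one node's result data across the related tables in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO employee (employee_id, name) VALUES (?, ?)",
            (result["employee_id"], result["name"]),
        )
        cur = conn.execute(
            "INSERT INTO leave_request (employee_id, leave_days) VALUES (?, ?)",
            (result["employee_id"], result["leave_days"]),
        )
        conn.execute(
            "INSERT INTO approval (request_id, node_id, approved) VALUES (?, ?, ?)",
            (cur.lastrowid, result["node_id"], int(result["approved"])),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_normal_form(conn, {  # made-up sample data
    "employee_id": 7, "name": "Zhang", "leave_days": 5,
    "node_id": "department_manager_approval", "approved": True,
})

# The relationships between tables make the stored data directly queryable.
rows = conn.execute(
    "SELECT e.name, r.leave_days, a.approved FROM approval a "
    "JOIN leave_request r ON r.request_id = a.request_id "
    "JOIN employee e ON e.employee_id = r.employee_id "
    "WHERE a.node_id = ?",
    ("department_manager_approval",),
).fetchall()
```

The join at the end is the kind of query that a single serialized string would not support without first being parsed.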

In an alternative embodiment of the present application, the storage container may include a cache, a memory, a hard disk, a flash memory, or the like, or any other suitable container, which is not limited in this embodiment of the present application.

In an alternative embodiment of the present application, the file form may include a database, a data file, etc., or any other suitable form, which is not limited by the embodiments of the present application. A database is a repository built on a computer storage device that organizes, stores, and manages data according to a data structure, for example, NoSQL (Not Only SQL, generally referring to non-relational databases), RDBMS, and the like. The data files may include non-database files such as XML (eXtensible Markup Language) files, TXT (text) files, or any other suitable files, without limitation.

In an alternative embodiment of the present application, the persistence storage rules may be implemented by functional plug-ins, and the flow engine may persistently store the result data by invoking the functional plug-in corresponding to the persistence storage rule. For example, serialization storage, normal form storage, storage on a mechanical hard disk, storage on a solid state drive, and storage in memory may each be written as a separate functional plug-in, as may any other suitable function, which is not limited by the embodiments of the present application. Because the functional plug-ins are independent of one another, the persistence storage rules are easy to modify, flexible, and highly maintainable.
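The plug-in arrangement can be sketched as a registry that maps each rule name to an independent function. Everything named here (`PLUGINS`, the `plugin` decorator, the rule names) is hypothetical, and the two plug-in bodies are stand-ins that return a description string instead of writing to real storage.

```python
from typing import Callable

# Registry of functional plug-ins, keyed by persistence storage rule (hypothetical).
PLUGINS: dict[str, Callable[[dict], str]] = {}


def plugin(rule: str):
    """Register the decorated function as the plug-in for the given rule."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        PLUGINS[rule] = func
        return func
    return register


@plugin("serialized")
def serialized_plugin(result: dict) -> str:
    # Stand-in: a real plug-in would write the string to a storage container.
    return "serialized:" + ",".join(f"{k}={result[k]}" for k in sorted(result))


@plugin("normal_form")
def normal_form_plugin(result: dict) -> str:
    # Stand-in: a real plug-in would insert rows into normal form tables.
    return f"normal_form:{len(result)} column(s)"


def persist(rule: str, result: dict) -> str:
    """Invoke the plug-in that corresponds to the persistence storage rule."""
    try:
        handler = PLUGINS[rule]
    except KeyError:
        raise ValueError(f"no plug-in registered for rule: {rule}") from None
    return handler(result)
```

Adding a new rule only requires registering another function; `persist` and the existing plug-ins stay unchanged, which is the independence the paragraph above describes.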

According to one embodiment of the present application, the flow engine needs to persist data. The application provides a data storage scheme, as shown in the schematic diagram of a data persistence storage process in FIG. 1. The process can be applied to the flow engine of an application system: the flow engine acquires attribute information of a data processing node, determines a persistence storage rule according to the attribute information, and then persistently stores the result data of the data processing node based on the persistence storage rule. The persistence storage can thus change as the attribute information of the data processing node changes, the flow engine can provide flexible and changeable persistence storage rules, and the adaptability of the flow engine's persistence storage to various complex business products is improved. The application is applicable to, but not limited to, the application scenario described above.

Referring to FIG. 2, a flowchart of an embodiment of a data storage method according to a first embodiment of the present application is shown. The method may specifically include the following steps:

Step 101, obtaining attribute information of a data processing node.

In this embodiment of the present application, the flow engine may obtain multiple kinds of attribute information. Depending on the type of attribute information, it may be obtained before the flow runs to the data processing node, after the flow runs to the data processing node, or in any other applicable manner, which is not limited in this embodiment of the present application.

For example, in the leave flow, the history record of the leave flow is first obtained, and then attribute information such as the flow concurrency number of the leave flow, the number of nodes it includes, and the task concurrency number of each of its data processing nodes is computed from the record. As another example, for the "department manager approval node" in the leave flow, the acquired attribute information indicates that the task type of the data processing node is an approval task, the data type is leave-related data, and the node identifier is that of the "department manager approval node".
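Deriving historical attribute information from a history record could be done as in the sketch below. The `FlowRun` record shape, the timestamps, and the use of peak overlap as the flow concurrency number are assumptions for illustration; the text does not define how the statistics are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRun:
    """One hypothetical history entry: when a flow instance ran and which nodes it visited."""
    start: float
    end: float
    nodes: tuple[str, ...]


def peak_concurrency(runs: list[FlowRun]) -> int:
    """Largest number of flow instances that were running at the same time."""
    events = []
    for run in runs:
        events.append((run.start, 1))
        events.append((run.end, -1))
    # Tuples sort by time first; at equal times an end (-1) sorts before a start (+1),
    # so back-to-back runs are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def node_count(runs: list[FlowRun]) -> int:
    """Number of distinct data processing nodes seen across the history."""
    return len({node for run in runs for node in run.nodes})


HISTORY = [  # made-up sample history
    FlowRun(0.0, 4.0, ("start", "group_leader_approval", "end")),
    FlowRun(1.0, 6.0, ("start", "department_manager_approval", "end")),
    FlowRun(3.0, 5.0, ("start", "group_leader_approval", "end")),
    FlowRun(6.0, 8.0, ("start", "department_manager_approval", "end")),
]
```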

Step 102, determining a persistence storage rule according to the attribute information.

In this embodiment of the present application, the persistence storage rule may be determined according to one or more kinds of attribute information, and different attribute information affects the rule differently. For example, when the flow concurrency number reaches a certain threshold, serialization storage is adopted; or, if the data processing node involves a query-type task, normal form storage is adopted.

The persistence storage rules may be determined by a numerical range of the attribute information, by whether the attribute information includes set attribute information, or by any other suitable manner, which is not limited in this embodiment of the present application.

For example, in the employee leave flow, if the number of simultaneously running leave flows exceeds a set threshold, the serialization storage rule is adopted for the leave flow. As another example, if the "department manager approval node" of the employee leave flow contains a query task, the normal form storage rule may be adopted.

In the embodiment of the application, the persistence storage rule is determined for data processing nodes. Within one data processing flow, all data processing nodes may share a single persistence storage rule, or a persistence storage rule may be determined separately for each data processing node.

Step 103, performing persistence storage on the result data of the data processing node based on the persistence storage rule.

In the embodiment of the application, the data processing node produces result data after it runs. After the flow engine obtains the result data, it persistently stores the result data based on the persistence storage rule of the data processing node. For example, the result data is serialized, and the serialized data is then stored in a database on the hard disk.
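The closing example (serialize the result data, then store it in a database on the hard disk) can be sketched with JSON and a SQLite file. JSON as the serialization format, the `node_result` table, and the file name are assumptions chosen for this sketch; a temporary directory stands in for the real storage location.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def persist_serialized(db_path: Path, node_id: str, result: dict) -> None:
    """Serialize the result data and store the string in an on-disk database."""
    payload = json.dumps(result, sort_keys=True)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS node_result "
                "(node_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO node_result (node_id, payload) VALUES (?, ?)",
                (node_id, payload),
            )
    finally:
        conn.close()


def load_result(db_path: Path, node_id: str):
    """Read the stored string back and deserialize it, or return None if absent."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT payload FROM node_result WHERE node_id = ?", (node_id,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else None


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "flow_engine.db"  # hypothetical file name
    persist_serialized(db_file, "department_manager_approval",
                       {"leave_days": 5, "approved": True})
    restored = load_result(db_file, "department_manager_approval")
    missing = load_result(db_file, "group_leader_approval")
```

Storing the whole result as one string keeps the write to a single row, at the cost of not being able to query individual fields in the database.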

In one implementation, the persistent storage rule may be implemented by a functional plug-in, and when the flow engine drives the data processing flow to the data processing node and obtains the result data, the functional plug-in implementing the persistent storage rule is called to complete the persistent storage of the result data. Any other suitable implementation may be specifically included, and embodiments of the present application are not limited thereto.

In an optional embodiment of the present application, the attribute information may include at least one of a task type, a data type, and a node identifier of the data processing node, and correspondingly, an implementation manner of determining the persistent storage rule according to the attribute information may include: determining the persistence storage rule according to whether the attribute information includes set attribute information.

The set attribute information is preset and may be modified by changing the configuration. It may include set task types (such as query tasks and execution list generation tasks), set data types, set node identifiers, and the like. The set attribute information may have a set correspondence with the persistent storage rule: for example, if the attribute information includes the set attribute information, normal form storage is determined as the persistent storage rule, or, if the attribute information does not include the set attribute information, serialization storage is determined. The persistent storage rule may also be determined in any other applicable manner, which is not limited in this embodiment of the present application.
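The matching of a node's attributes against set attribute information can be sketched as follows. This is a minimal illustration only: the names `NodeAttributes`, `SET_TASK_TYPES`, `SET_NODE_IDS`, and `rule_from_set_attributes`, and the string labels for the two rules, are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two storage manners named in the text.
SERIALIZED = "serialized"
NORMAL_FORM = "normal_form"

@dataclass
class NodeAttributes:
    node_id: str
    task_types: set = field(default_factory=set)
    data_types: set = field(default_factory=set)

# Set attribute information; assumed here to come from editable configuration.
SET_TASK_TYPES = {"query", "list_generation"}
SET_NODE_IDS = set()

def rule_from_set_attributes(attrs: NodeAttributes) -> str:
    """Return normal form storage when the node matches any set attribute."""
    if attrs.task_types & SET_TASK_TYPES or attrs.node_id in SET_NODE_IDS:
        return NORMAL_FORM
    return SERIALIZED
```

Because the set attributes are plain data, changing the configuration changes the outcome without touching the decision function.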

In one implementation manner, the persistence storage rule can be determined by combining attribute information such as the flow concurrency number of the data processing flow, the number of nodes, and the task concurrency number of the data processing nodes with attribute information such as the task type, the data type, and the node identifier of the data processing nodes. For example, in the employee leave process, if the flow concurrency number of the leave process exceeds 100 per second and the task types do not include a query task, it is determined that the persistence storage rule adopts serialization storage. For another example, in the employee leave process, if the "department manager approval" node includes a list generation task and the whole leave process contains no more than 10 nodes, it is determined that the persistent storage rule adopts normal form storage. Any applicable manner may be used, which is not limited in the embodiment of the present application.
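The two combined examples above can be written as a small decision function. This is a sketch under stated assumptions: the function name, parameter names, and task-type labels are hypothetical, and the fallback is left open because the text does not say which rule applies when neither example matches.

```python
def combined_rule(flow_concurrency, node_count, task_types):
    """Combine numeric and categorical attribute information into one rule."""
    # More than 100 concurrent flows per second and no query task.
    if flow_concurrency > 100 and "query" not in task_types:
        return "serialized"
    # A list generation task in a flow of at most 10 nodes.
    if "list_generation" in task_types and node_count <= 10:
        return "normal_form"
    # Neither example applies; the text leaves this case open.
    return None
```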

In an optional embodiment of the present application, the attribute information may include at least one of historical attribute information and current attribute information, and when the attribute information includes historical attribute information, an implementation manner of obtaining the attribute information of the data processing node may include: acquiring a history record of flow execution, and computing the historical attribute information of the data processing node from statistics over the history record.

The attribute information of the data processing node currently running in the flow engine is recorded as current attribute information, and the attribute information acquired before the data processing node runs is recorded as historical attribute information. The attribute information acquired by the flow engine may include at least one of the two. When the attribute information includes historical attribute information, it is obtained from the history record generated by executing the data processing flow. For example, in the employee leave process, the history record includes the number of leave processes executed each day over the past year; from this record, statistics such as the average number of leave processes executed per day or the highest number of leave processes on a single day are computed, and these statistics are the historical attribute information.
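As an illustration of deriving historical attribute information from a history record, the sketch below computes a daily average and a single-day peak. The record format (one start date per executed flow) and the names `historical_attributes`, `avg_flows_per_day`, and `peak_flows_per_day` are assumptions for the example. The average is taken over days that have at least one flow, and a non-empty history is assumed.

```python
from collections import Counter
from datetime import date
from statistics import mean

def historical_attributes(start_dates):
    """Summarize flow start dates into historical attribute information."""
    per_day = Counter(start_dates)      # number of flows started on each day
    counts = list(per_day.values())
    return {
        "avg_flows_per_day": mean(counts),
        "peak_flows_per_day": max(counts),
    }

# Hypothetical history: 3 leave flows on one day, 5 on the next.
history = [date(2018, 4, 1)] * 3 + [date(2018, 4, 2)] * 5
stats = historical_attributes(history)
```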

Referring to fig. 3, a flowchart of an embodiment of a data storage method according to a second embodiment of the present application is shown, where the method specifically may include the following steps:

Step 201, obtaining attribute information of a data processing node.

In the embodiments of the present application, the specific implementation manner of this step may refer to the description in the foregoing embodiments, which is not repeated herein.

Step 202, determining a persistence storage rule according to the numerical range of the attribute information.

In the embodiment of the application, when the attribute information is a numerical value, the persistent storage rule may be determined according to the numerical range in which the attribute information falls. In general, the higher the flow concurrency number or the task concurrency number of the data processing nodes, the stronger the case for serialization storage.

In specific implementation, a numerical range is preset for each kind of attribute information and can be adjusted according to actual conditions. When the attribute information falls into a set numerical range, the persistence storage rule corresponding to that range is determined.

In an alternative embodiment of the present application, an implementation of determining the persistence storage rule according to the numerical range of the attribute information may include: if the attribute information does not exceed the corresponding set numerical range, determining that the persistence storage rule comprises normal form storage.

For example, if the flow concurrency number exceeds 100 per second, serialization storage is determined, and if the flow concurrency number does not exceed 100 per second, normal form storage is determined.

The attribute information may be of a plurality of types, each with its own corresponding numerical range, and the numerical ranges of several types of attribute information may be considered together when determining the persistence storage rule. In one implementation, priorities may be set for the different attribute information: if attribute information with a high priority exceeds its set numerical range while attribute information with a low priority does not, the high priority prevails and the persistence storage rule is determined to include serialization storage. In another implementation, normal form storage is determined only if no attribute information exceeds its set numerical range, and serialization storage is determined otherwise. The persistence storage rule may be determined in any applicable manner, which is not limited by the embodiments of the present application.
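The two ways of combining numerical ranges described above can be sketched side by side. The limit of 100 flows per second comes from the example in the text; the other limits, the attribute names, and the reading of "priority" as "the highest-priority attribute that is present decides" are assumptions made for illustration.

```python
SERIALIZED = "serialized"
NORMAL_FORM = "normal_form"

# Upper bounds listed from highest to lowest priority.
LIMITS = [
    ("flow_concurrency", 100),  # flows per second, from the example in the text
    ("task_concurrency", 50),   # assumed value
    ("node_count", 10),         # assumed value
]

def rule_by_priority(attrs):
    """Let the highest-priority attribute that is present decide the rule."""
    for name, limit in LIMITS:
        if name in attrs:
            return SERIALIZED if attrs[name] > limit else NORMAL_FORM
    return NORMAL_FORM

def rule_all_within(attrs):
    """Normal form storage only if every present attribute is within its limit."""
    within = all(attrs[name] <= limit for name, limit in LIMITS if name in attrs)
    return NORMAL_FORM if within else SERIALIZED
```

The two strategies can disagree: with a flow concurrency of 80 and 20 nodes, the priority strategy chooses normal form storage while the all-within strategy chooses serialization storage.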

Step 203, determining a storage container of the persistent storage according to the attribute information of the data processing node.

In this embodiment of the present application, the attribute information may also determine the storage container for persistent storage. For example, the data type may determine the storage container: if the data relates to leave requests, it may only need to be stored temporarily, and the storage container may be determined to be the memory; if the data relates to reimbursement, it may need to be stored for a long time, and the storage container may be determined to be the hard disk. Any suitable manner may be adopted, which is not limited in the embodiments of the present application.

In an alternative embodiment of the present application, an implementation of determining the storage container of the persistent storage according to the attribute information of the data processing node may include: acquiring the life cycle of the data processing flow to which the data processing node belongs, and selecting a storage container suitable for the life cycle, wherein the storage container comprises a cache, a memory, a hard disk, or an external memory.

The life cycle of a data processing flow is the time span from its start to its end. Because the life cycle may differ between executions, the average life cycle, the longest life cycle, or the like may be used, which is not limited in the embodiments of the present application.

Generally, the longer the life cycle, the longer the data needs to be stored. Volatile storage media such as the cache and the memory have limited space and poor stability, and their data is easily lost. Ordered from long life cycle to short, storage containers such as the hard disk or external memory, the cache, and the like correspond to the life cycle in turn, and the storage container suited to the life cycle is selected as the storage container for persistent storage. For a data processing flow with a short life cycle, selecting the cache or the memory improves running speed; for a data processing flow with a long life cycle, selecting the hard disk or external memory improves data stability and prevents data from being lost on power failure.
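A minimal sketch of selecting a storage container by life cycle follows. The text gives only the ordering of containers, so the cut-off durations below are hypothetical, as are the names `CONTAINER_BY_MAX_LIFE_CYCLE` and `select_container`.

```python
from datetime import timedelta

# Hypothetical cut-offs, ordered from short life cycle to long.
CONTAINER_BY_MAX_LIFE_CYCLE = [
    (timedelta(minutes=1), "cache"),
    (timedelta(hours=1), "memory"),
    (timedelta(days=30), "hard_disk"),
]

def select_container(life_cycle: timedelta) -> str:
    """Pick the first container whose cut-off covers the life cycle."""
    for max_life_cycle, container in CONTAINER_BY_MAX_LIFE_CYCLE:
        if life_cycle <= max_life_cycle:
            return container
    return "external_storage"  # anything longer than the last cut-off
```

The `life_cycle` argument could be the average or the longest observed life cycle of the flow, depending on which statistic is chosen.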

Step 204, determining the file form of the persistent storage according to the attribute information of the data processing node.

In this embodiment of the present application, the attribute information may also determine a file form of persistent storage, for example, the task type may determine a file form, where the file form uses a database if the data processing node includes a query task, and uses a data file if the data processing node does not include a query task. Any suitable manner may be specifically adopted, and the embodiments of the present application do not limit this.

In an alternative embodiment of the present application, an implementation of determining a file form of the persistent storage according to attribute information of the data processing node may include: detecting whether a data processing task of the data processing node comprises a database call, if so, determining that the file form of the persistent storage comprises the database, and if not, determining that the file form of the persistent storage comprises the data file.

The data processing node may include one or more data processing tasks. Whether any data processing task includes a database call is detected. If so, the file form of the persistent storage is determined to include a database, which speeds up queries and improves operating efficiency; if not, the file form of the persistent storage is determined to include a data file, which reduces the storage volume and saves storage space.

In particular implementations, the attribute information of the data processing node may determine the persistence storage rule, the storage container, and the file form together. For example, in the employee leave process, the attribute information includes the flow concurrency number, the life cycle of the data processing flow, the task types included in the data processing node, and the like. If the flow concurrency number exceeds 100 flows per second, the life cycle exceeds 1 day, and the data processing node includes no query task, then serialization storage is adopted, the storage container is determined to be the hard disk, and the data is stored in the file form of a data file.
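The database-call check can be sketched as below. How a database call is detected is not specified in the text; here it is assumed that each task carries a declared flag, and the names `Task`, `calls_database`, and `select_file_form` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    calls_database: bool  # assumed to be declared in the task definition

def select_file_form(tasks) -> str:
    """Return "database" if any task of the node calls a database, else "data_file"."""
    return "database" if any(t.calls_database for t in tasks) else "data_file"
```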

Step 205, driving the data processing flow to the current data processing node according to the result data of the previous data processing node.

In the embodiment of the application, the flow engine acquires the result data of the last data processing node, and drives the data processing flow to the current data processing node according to the result data of the last data processing node.

Step 206, executing the data processing task corresponding to the current data processing node.

In this embodiment of the present application, the current data processing node may include one or more data processing tasks, and the corresponding data processing tasks are executed to obtain the result data of the current data processing node.

Step 207, based on the persistence storage rule, persistence storing the result data of the data processing node.

According to the embodiment of the application, attribute information of the data processing node is acquired, and a persistence storage rule is determined according to the numerical range of the attribute information; the data processing flow is driven to the current data processing node according to the result data of the previous data processing node, and the data processing task corresponding to the current data processing node is executed; and the result data of the data processing node is persistently stored based on the persistence storage rule. This realizes a mechanism in which the persistent storage changes as the attribute information of the data processing node changes, so that the flow engine can provide flexible and changeable persistence storage rules, and the adaptability of the flow engine's persistent storage to various complex business products is improved.

Referring to fig. 4, a flowchart of an embodiment of a data storage method according to a third embodiment of the present application is shown, and the method may specifically include the following steps:

Step 301, obtaining attribute information of a data processing node.

Step 302, determining a persistence storage rule according to the attribute information.

Step 303, adding the persistence storage rule to a configuration file of the data processing flow, so that when the result data of the data processing node is persistently stored, the persistence storage rule is read from the configuration file.

In the embodiment of the present application, the configuration file is used for controlling the data processing flow, including controlling the persistent storage. Specifically, the persistent storage rule may be added to the configuration file of the data processing flow, and the configuration file may include a persistent storage rule for each data processing node in the data processing flow.

In this way, when the result data of the data processing node is persistently stored, the persistent storage rule can be read from the configuration file. The configuration file can be customized and modified by a service party according to its own requirements, so as to customize persistent storage rules that meet those requirements, which improves the flexibility and configurability of the persistent storage.
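Adding a rule to a configuration file and reading it back at persistence time might look like the sketch below. The application does not specify the format of the configuration file; JSON, the key names `default_rule` and `node_rules`, the flow-wide default itself, and the functions `add_rule` and `read_rule` are all assumptions chosen for illustration.

```python
import json

# Hypothetical configuration of one data processing flow.
CONFIG_TEXT = """
{
  "flow": "employee_leave",
  "default_rule": "serialized",
  "node_rules": {"department_manager_approval": "normal_form"}
}
"""

def add_rule(config: dict, node_id: str, rule: str) -> None:
    """Record the persistent storage rule determined for one node."""
    config.setdefault("node_rules", {})[node_id] = rule

def read_rule(config: dict, node_id: str) -> str:
    """Read a node's rule, falling back to the assumed flow-wide default."""
    return config.get("node_rules", {}).get(node_id, config["default_rule"])

config = json.loads(CONFIG_TEXT)
add_rule(config, "hr_filing", "serialized")
```

A service party that edits `node_rules` changes how each node's result data is stored without changing the engine code.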

Step 304, calling a function plug-in corresponding to the persistence storage rule to persistently store the result data of the data processing node.

In the embodiment of the application, in order to realize different persistence storage rules, corresponding function plug-ins are respectively provided, and the function plug-ins are called to perform persistence storage on the result data of the data processing nodes.

It should be noted that the persistent storage rule, the storage container, and the file form may each have their own function plug-ins, in which case several function plug-ins are called together to complete the persistent storage. For example, serialization storage, normal form storage, the mechanical hard disk, the solid-state disk, the database, and the data file may each be written as a separate function plug-in, and the serialization storage plug-in, the solid-state disk plug-in, and the database plug-in may then be called together to perform the persistent storage. Alternatively, a single function plug-in covering a persistent storage rule, a storage container, and a file form may be written to complete the persistent storage: for example, one function plug-in for serialization storage to a data file on a solid-state disk, and another for serialization storage to a data file on a mechanical hard disk. Any suitable function plug-in may be used, which is not limited by the embodiments of the present application.
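The first arrangement, with separate plug-ins called together, can be sketched with a simple registry. Everything here is hypothetical: the registry `PLUGINS`, the `plugin` decorator, the plug-in names, and the use of JSON text and an in-memory list as stand-ins for a real serialization protocol and a real storage container.

```python
import json

PLUGINS = {}

def plugin(kind, name):
    """Register a function plug-in under the key (kind, name)."""
    def register(func):
        PLUGINS[(kind, name)] = func
        return func
    return register

@plugin("rule", "serialized")
def serialize(result):
    # Stand-in serialization: JSON text with sorted keys.
    return json.dumps(result, sort_keys=True)

@plugin("file_form", "data_file")
def as_data_file(payload):
    return {"form": "data_file", "payload": payload}

@plugin("container", "memory")
def to_memory(record, store):
    # Stand-in container: append to an in-memory list.
    store.append(record)

def persist(result, rule, file_form, container, store):
    """Call the three plug-ins in turn to persist one node's result data."""
    payload = PLUGINS[("rule", rule)](result)
    record = PLUGINS[("file_form", file_form)](payload)
    PLUGINS[("container", container)](record, store)

store = []
persist({"leave_days": 3}, "serialized", "data_file", "memory", store)
```

Under this arrangement a new container or file form is supported by registering one more plug-in; the combined single-plug-in arrangement trades that flexibility for fewer calls.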

In an alternative embodiment of the present application, an implementation of the serialization storage may include: obtaining result data of the data processing node, reading the various sub-data from the result data, combining the sub-data in a set order, and converting the combination into a character string in a set format. The result data may include various sub-data, such as "leave days" and "leave events" in the employee leave process.

In an alternative embodiment of the present application, the serialization storage may further include: adding a serialization version identifier to the converted character string.

The serialization version identifier identifies the version of the serialization protocol. It is added because the serialization protocol may change; during deserialization, the original result data can then be recovered using the serialization protocol that corresponds to the identifier.
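The serialization steps and the version identifier can be illustrated as follows. The string format (a `v1` prefix and `|` separators), the field order, and the function names are assumptions; the application does not define a concrete format. In this simplified sketch values are returned as strings on deserialization, and separators inside values are not escaped.

```python
SERIAL_VERSION = "1"
FIELD_ORDER = ["leave_days", "leave_events"]  # the set order of the sub-data
SEPARATOR = "|"

def serialize(result: dict) -> str:
    """Combine sub-data in the set order and prefix the version identifier."""
    values = [str(result[name]) for name in FIELD_ORDER]
    return SEPARATOR.join([f"v{SERIAL_VERSION}"] + values)

def deserialize(text: str) -> dict:
    """Check the version identifier, then rebuild the sub-data by position."""
    version, *values = text.split(SEPARATOR)
    if version != f"v{SERIAL_VERSION}":
        raise ValueError(f"unsupported serialization version: {version}")
    return dict(zip(FIELD_ORDER, values))
```

If the protocol later changes, a reader can dispatch on the version prefix to the matching decoder instead of rejecting older strings.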

According to the embodiment of the application, the attribute information of the data processing node is obtained, the persistence storage rule is determined according to the attribute information and added to the configuration file of the data processing flow, and the function plug-in corresponding to the persistence storage rule is called to persistently store the result data of the data processing node. This realizes a mechanism in which the persistent storage changes as the attribute information of the data processing node changes, so that the flow engine can provide flexible and changeable persistence storage rules, and the adaptability of the flow engine's persistent storage to various complex business products is improved.

For a better understanding of the present application by those skilled in the art, one implementation of the present application is described below by way of specific examples.

An architecture diagram of the flow engine is shown in fig. 5.

The bottom layer of the flow engine is the underlying settings layer, whose responsibilities mainly include providing XML parsing functionality and loading the various data processing flow nodes as plug-ins.

The flow definition parsing layer is mainly responsible for determining the parse-time models and the parsers corresponding to the models. A flow definition describes a complete data processing flow composed of a plurality of data processing nodes, and includes the basic information of the data processing flow, its start and end conditions, its constituent data processing nodes, the rules for transitions between data processing nodes, the tasks users need to perform, the application programs that may be called, flow-related data, and other information.

After the flow definition is parsed, the parsed data is converted into a runtime model for execution. The runtime model has corresponding runtime behavior, which includes the persistent storage of the result data. The process virtual machine is the basis for multiple flow languages: native support for any flow language may be built on top of the process virtual machine. The runtime behavior of each activity in the flowchart is delegated to a Java interface.

The flow engine serves as the entry point for all services and provides an API (Application Programming Interface).

A schematic diagram of the data processing flow is shown in fig. 6.

Step 1, starting or driving the data processing flow: a newly generated data processing flow is started, and a pre-existing data processing flow is driven.

Step 2, configuring parameters, including the parameters of the current service request.

Step 3, searching for the flow definition, namely searching for the corresponding flow definition according to the configured parameters.

Step 4, creating an instance object: according to the current service request parameters and the flow definition, the flow engine creates a new flow instance object (namely, a data processing flow) or drives the data processing flow to the corresponding data processing node.

Step 5, calculating the link state: the data processing task in the data processing node is run to obtain the result data (namely, the link state).

Step 6, storing instance state, namely, persistent storage of result data, wherein DB storage (database storage), memory storage, custom storage (including serialization storage) and the like can be selected.
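The six steps can be traced in a toy request handler. This is a heavily simplified sketch: the dictionaries `FLOW_DEFINITIONS`, `INSTANCES`, and `STORE`, the request parameter names, and the placeholder "task" in step 5 are all hypothetical, and driving a flow past its last node is not handled.

```python
FLOW_DEFINITIONS = {
    "employee_leave": ["submit", "department_manager_approval"],
}
INSTANCES = {}  # instance id -> {"flow": name, "position": index of current node}
STORE = {}      # stands in for DB, memory, or custom storage

def handle_request(params):
    # Step 2: configure the parameters of the current service request.
    flow_name, instance_id = params["flow"], params["instance_id"]
    # Step 3: look up the flow definition.
    nodes = FLOW_DEFINITIONS[flow_name]
    # Steps 1 and 4: start a new instance, or drive an existing one to its next node.
    instance = INSTANCES.setdefault(instance_id, {"flow": flow_name, "position": -1})
    instance["position"] += 1
    node = nodes[instance["position"]]
    # Step 5: run the node's task to compute the link state (placeholder task).
    state = {"node": node, "input": params.get("data")}
    # Step 6: store the instance state.
    STORE[(instance_id, node)] = state
    return node
```

Calling `handle_request` twice with the same `instance_id` first starts the flow at its first node and then drives it to the second.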

Referring to fig. 7, a block diagram of an embodiment of a data storage device according to a fourth embodiment of the present application is shown, which may specifically include:

an information acquisition module 401, configured to acquire attribute information of a data processing node;

a rule determining module 402, configured to determine a persistence storage rule according to the attribute information;

a persistence storage module 403, configured to perform persistence storage on the result data of the data processing node based on the persistence storage rule.

In an alternative embodiment of the present application, the persistent storage rule comprises serialization storage or normal form storage.

In an optional embodiment of the present application, the attribute information includes at least one of a flow concurrency number of the data processing flow, a node number, and a task concurrency number of the data processing node.

In an alternative embodiment of the present application, the rule determining module includes:

In an alternative embodiment of the present application, the first rule determining submodule includes:

In an optional embodiment of the present application, the attribute information includes at least one of a task type, a data type, and a node identifier of the data processing node;

the rule determination module includes:

In an optional embodiment of the present application, the attribute information includes at least one of historical attribute information and current attribute information;

In an alternative embodiment of the present application, the apparatus further comprises:

In an alternative embodiment of the present application, the container determining module includes:

In an alternative embodiment of the present application, the file form determining module includes:

In an alternative embodiment of the present application, the persistent storage module includes:

In an alternative embodiment of the present application, the persistent storage module further includes:

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

Embodiments of the present disclosure may be implemented as a system configured as desired using any suitable hardware, firmware, software, or any combination thereof. Fig. 8 schematically illustrates an example system (or apparatus) 500 that may be used to implement various embodiments described in this disclosure.

For one embodiment, FIG. 8 illustrates an exemplary system 500 having one or more processors 502, a system control module (chipset) 504 coupled to at least one of the processor(s) 502, a system memory 506 coupled to the system control module 504, a non-volatile memory (NVM)/storage device 508 coupled to the system control module 504, one or more input/output devices 510 coupled to the system control module 504, and a network interface 512 coupled to the system control module 504.

The processor 502 may include one or more single-core or multi-core processors, and the processor 502 may include any combination of general-purpose or special-purpose processors (e.g., graphics processor, application processor, baseband processor, etc.). In some embodiments, the system 500 can function as a browser as described in embodiments of the present application.

In some embodiments, the system 500 can include one or more computer-readable media (e.g., system memory 506 or NVM/storage 508) having instructions and one or more processors 502, in combination with the one or more computer-readable media, configured to execute the instructions to implement the modules to perform the actions described in this disclosure.

For one embodiment, the system control module 504 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 502 and/or any suitable device or component in communication with the system control module 504.

The system control module 504 may include a memory controller module to provide an interface to the system memory 506. The memory controller modules may be hardware modules, software modules, and/or firmware modules.

The system memory 506 may be used, for example, to load and store data and/or instructions for the system 500. For one embodiment, system memory 506 may include any suitable volatile memory, such as, for example, a suitable DRAM. In some embodiments, system memory 506 may comprise a double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).

For one embodiment, the system control module 504 may include one or more input/output controllers to provide an interface to the NVM/storage 508 and the input/output device(s) 510.

For example, NVM/storage 508 may be used to store data and/or instructions. NVM/storage 508 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).

NVM/storage 508 may include storage resources that are physically part of the device on which system 500 is installed or which may be accessed by the device without being part of the device. For example, NVM/storage 508 may be accessed over a network via input/output device(s) 510.

Input/output device(s) 510 may provide an interface for system 500 to communicate with any other suitable device. Input/output device(s) 510 may include communication components, audio components, sensor components, and the like. Network interface 512 may provide an interface for system 500 to communicate over one or more networks. System 500 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, or 3G, or a combination thereof.

For one embodiment, at least one of the processor(s) 502 may be packaged together with logic of one or more controllers (e.g., memory controller modules) of the system control module 504. For one embodiment, at least one of the processor(s) 502 may be packaged together with logic of one or more controllers of the system control module 504 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 502 may be integrated on the same die with logic of one or more controllers of the system control module 504. For one embodiment, at least one of the processor(s) 502 may be integrated on the same die with logic of one or more controllers of the system control module 504 to form a system on chip (SoC).

In various embodiments, system 500 may be, but is not limited to being: a browser, workstation, desktop computing device, or mobile computing device (e.g., a laptop computing device, handheld computing device, tablet, netbook, etc.). In various embodiments, system 500 may have more or fewer components and/or different architectures. For example, in some embodiments, system 500 includes one or more cameras, keyboards, Liquid Crystal Display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, Application-Specific Integrated Circuits (ASICs), and speakers.

Wherein if the display comprises a touch panel, the display screen may be implemented as a touch screen display to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.

The embodiment of the application also provides a non-volatile readable storage medium in which one or more modules (programs) are stored. When the one or more modules are applied to a terminal device, the terminal device may be caused to execute the instructions of each method step in the embodiments of the application.

In one example, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a method as in embodiments of the present application when executing the computer program.

There is also provided in one example a computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements a method as in one or more of the embodiments of the present application.

The embodiments of the application disclose a data storage method and device. Example 1 includes a data storage method, including:

acquiring attribute information of a data processing node;

determining a persistence storage rule according to the attribute information;

Example 2 may include the method of example 1, wherein the persistent storage rule comprises serialization storage or normal form storage.

Example 3 may include the method of example 1 and/or example 2, wherein the attribute information includes at least one of a flow concurrency number of the data processing flows, a number of nodes, and a task concurrency number of the data processing nodes.

Example 4 may include the method of one or more of examples 1-3, wherein the determining a persistence storage rule from the attribute information comprises:

Example 5 may include the method of one or more of examples 1-4, wherein the determining a persistence storage rule from the range of values of the attribute information comprises:

Example 6 may include the method of one or more of examples 1-5, wherein the attribute information includes at least one of a task type, a data type, a node identification of the data processing node;

Example 7 may include the method of one or more of examples 1-6, wherein the attribute information includes at least one of historical attribute information and current attribute information;

Example 8 may include the method of one or more of examples 1-7, wherein the method further comprises:

Example 9 may include the method of one or more of examples 1-8, wherein the method further comprises:

Example 10 may include the method of one or more of examples 1-9, wherein the determining a storage container of persistent storage from attribute information of the data processing node comprises:

Example 11 may include the method of one or more of examples 1-10, wherein the method further comprises:

Example 12 may include the method of one or more of examples 1-11, wherein the determining a file form of the persistent storage based on attribute information of the data processing node comprises:

Example 13 may include the method of one or more of examples 1-12, wherein persisting the result data of the data processing node based on the persistence storage rule comprises:

Example 14 may include the method of one or more of examples 1-13, wherein the serialization store comprises:

Example 15 may include the method of one or more of examples 1-14, wherein the serialization store further comprises:

Example 16 may include the method of one or more of examples 1-15, wherein, prior to persisting the result data of the data processing node based on the persistence storage rule, the method further comprises:

Example 17 includes a data storage device, comprising:

Example 18 may include the apparatus of example 17, wherein the persistent storage rule comprises a serialization store or a paradigm store.

Example 19 may include the apparatus of example 17 and/or example 18, wherein the attribute information includes at least one of a flow concurrency number of a data processing flow, a number of nodes, and a task concurrency number of the data processing node.

Example 20 may include the apparatus of one or more of examples 17-19, wherein the rule determination module comprises:

Example 21 may include the apparatus of one or more of examples 17-20, wherein the first rule determination submodule includes:

Example 22 may include the apparatus of one or more of examples 17-21, wherein the attribute information includes at least one of a task type, a data type, and a node identification of the data processing node;

the rule determination module includes:

Example 23 may include the apparatus of one or more of examples 17-22, wherein the attribute information includes at least one of historical attribute information and current attribute information;

Example 24 may include the apparatus of one or more of examples 17-23, wherein the apparatus further comprises:

Example 25 may include the apparatus of one or more of examples 17-24, wherein the apparatus further comprises:

Example 26 may include the apparatus of one or more of examples 17-25, wherein the container determination module comprises:

Example 27 may include the apparatus of one or more of examples 17-26, wherein the apparatus further comprises:

Example 28 may include the apparatus of one or more of examples 17-27, wherein the file form determination module comprises:

Example 29 may include the apparatus of one or more of examples 17-28, wherein the persistent storage module comprises:

Example 30 may include the apparatus of one or more of examples 17-29, wherein the persistent storage module comprises:

Example 31 may include the apparatus of one or more of examples 17-30, wherein the persistent storage module further comprises:

Example 32 may include the apparatus of one or more of examples 17-31, wherein the apparatus further comprises:

Example 33 includes a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of one or more of examples 1-16 when the computer program is executed.

Example 34 includes a computer-readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a method as in one or more of examples 1-16.
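
As a non-limiting illustration, the flow of examples 1-5 (acquire attribute information, determine a persistence storage rule from its value range, then persist the result data) might be sketched as follows. The names `NodeAttributes`, `StorageRule`, `determine_rule`, `persist`, and the constant `CONCURRENCY_THRESHOLD` are hypothetical and do not appear in the application. The threshold value, the mapping of high concurrency to the serialization store, and the use of an in-memory dictionary in place of a real storage container are assumptions made only for this sketch.

```python
import json
from dataclasses import dataclass
from enum import Enum


class StorageRule(Enum):
    SERIALIZATION = "serialization"  # result data stored as one serialized value
    PARADIGM = "paradigm"            # result data stored field by field


@dataclass
class NodeAttributes:
    flow_concurrency: int  # flow concurrency number of the data processing flow
    node_count: int        # number of nodes in the data processing flow
    task_concurrency: int  # task concurrency number of the data processing node


# Hypothetical cut-off; the application does not give a concrete value here.
CONCURRENCY_THRESHOLD = 100


def determine_rule(attrs: NodeAttributes) -> StorageRule:
    """Pick a rule from the value range of the attribute information.

    Assumption: values above the threshold select the serialization store,
    all other values select the paradigm store.
    """
    if (attrs.flow_concurrency > CONCURRENCY_THRESHOLD
            or attrs.task_concurrency > CONCURRENCY_THRESHOLD):
        return StorageRule.SERIALIZATION
    return StorageRule.PARADIGM


def persist(result: dict, rule: StorageRule, store: dict, key: str) -> None:
    """Persist result data into `store`, a dict standing in for real storage."""
    if rule is StorageRule.SERIALIZATION:
        # One opaque serialized value per node result.
        store[key] = json.dumps(result, sort_keys=True)
    else:
        # One entry per field, loosely mimicking structured columns.
        for field, value in result.items():
            store[(key, field)] = value
```

In this sketch a flow engine would call `determine_rule` each time a node finishes, so the rule can change whenever the attribute information of the data processing node changes.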

While certain embodiments have been illustrated and described herein for purposes of description, a wide variety of alternative and/or equivalent implementations calculated to achieve the same purposes may be substituted for the embodiments shown and described without departing from the scope of the present application. This application is intended to cover any adaptations or variations of the embodiments discussed herein. It is therefore manifestly intended that the embodiments described herein be limited only by the claims and the equivalents thereof.

Claims

1. A method of data storage, comprising:

acquiring attribute information of a data processing node, wherein the attribute information comprises at least one of the flow concurrency number of a data processing flow, the number of nodes and the task concurrency number of the data processing node;

determining a persistence storage rule according to the attribute information;

2. The method of claim 1, wherein the persistent storage rule comprises a serialization store or a paradigm store.

3. The method of claim 1, wherein said determining a persistence storage rule based on said attribute information comprises:

4. The method of claim 3, wherein said determining a persistence storage rule based on a range of values of said attribute information comprises:

5. The method of claim 1, wherein the attribute information comprises at least one of a task type, a data type, and a node identification of the data processing node;

6. The method of claim 1, wherein the attribute information includes at least one of historical attribute information and current attribute information;

7. The method according to claim 1, wherein the method further comprises:

8. The method according to claim 1, wherein the method further comprises:

9. The method of claim 8, wherein the determining a storage container for persistent storage based on attribute information of the data processing node comprises:

10. The method according to claim 1, wherein the method further comprises:

11. The method of claim 10, wherein said determining a file form of a persistent store based on attribute information of the data processing node comprises:

12. The method of claim 1, wherein the persisting the result data of the data processing node based on the persistence storage rule comprises:

13. The method of claim 2, wherein the serialization store comprises:

14. The method of claim 13, wherein the serialization store further comprises:

15. The method of claim 1, wherein prior to said persisting the result data of the data processing node based on the persistence storage rule, the method further comprises:

driving the data processing flow to the current data processing node according to the result data of the previous data processing node;

16. A data storage device, comprising:

an information acquisition module, configured to acquire attribute information of a data processing node, wherein the attribute information comprises at least one of the flow concurrency number of a data processing flow, the number of nodes and the task concurrency number of the data processing node;

17. The apparatus of claim 16, wherein the persistent storage rule comprises a serialization store or a paradigm store.

18. The apparatus of claim 16, wherein the rule determination module comprises:

19. The apparatus of claim 18, wherein the first rule determination submodule comprises:

20. The apparatus of claim 16, wherein the attribute information comprises at least one of a task type, a data type, and a node identification of the data processing node;

the rule determination module includes:

21. The apparatus of claim 16, wherein the attribute information comprises at least one of historical attribute information and current attribute information;

22. The apparatus of claim 16, wherein the apparatus further comprises:

23. The apparatus of claim 16, wherein the apparatus further comprises:

24. The apparatus of claim 23, wherein the container determination module comprises:

25. The apparatus of claim 16, wherein the apparatus further comprises:

26. The apparatus of claim 25, wherein the file form determination module comprises:

27. The apparatus of claim 16, wherein the persistent storage module comprises:

28. The apparatus of claim 17, wherein the persistent storage module comprises:

29. The apparatus of claim 28, wherein the persistent storage module further comprises:

30. The apparatus of claim 16, wherein the apparatus further comprises:

31. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-15 when the computer program is executed.

32. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-15.