
CN105956087A - Data and code version management system and method

Info

Publication number: CN105956087A
Application number: CN201610282533.2A
Authority: CN
Inventors: 徐葳; 徐方舟; 张炀
Original assignee: Tsinghua University
Current assignee: Cross Information Core Technology Research Institute (Xi'an) Co., Ltd.
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2016-09-21
Anticipated expiration: 2036-04-29
Also published as: CN105956087B; CN110119393A; CN110119393B

Abstract

The invention provides a data and code version management system and method. The system comprises a data management module, a code management module, an execution engine module and a system core module. The data management module stores at least one data set. The code management module stores at least one piece of execution code, and receives and stores code pushed by a user or sends a code processing request based on the pushed code. The execution engine module is configured with at least one execution back-end engine; on receiving an execution instruction, it calls an execution back-end engine and runs a piece of execution code to operate on the at least one data set in the data management module. On receiving a data processing request submitted by the user, the system core module processes the data set in the data management module, establishes a data workflow for the data set, and records the resulting data version information and code version information. The system and method address problems such as inefficient and disordered version management of data and code.

Description

Data and code version management system and method

Technical field

The present invention relates to the field of data analysis, and in particular to a data and code version management system and method.

Background technology

In recent years, people have collected large amounts of data. Meanwhile, data scientist has become one of the most sought-after jobs at major companies. However, there are not yet enough tools to help data scientists analyze data flows. As data science tasks grow more complex, many data analysts have begun adapting code versioning tools such as Git. However, Git cannot fully handle data science tasks.

First, data science is data-centric. A data set may go through several operations such as cleaning, labeling and preprocessing, so a data set ends up with multiple versions. Data scientists need to record these versions and be able to revise the data at any time. A common but discouraged practice is to keep multiple copies, named for example data.csv, data-version1.csv, data-final-version.csv and data-last-version.csv. This naming scheme is often confusing, and mixing up versions or data sets frequently leads to errors.

Second, a machine learning model generally has many parameters, and training these parameters is the most common task in data science. Parameters such as the learning rate, initial values and regularization are often bewildering, so those who later take over the work tend to forget their meaning and importance.

Third, as data sets grow, data scientists often need to build a distributed platform and repeat their experiments iteratively on it. They may also use third-party software packages. Unfortunately, installing and configuring these packages across different software/hardware environments is usually tedious.

Finally, sharing data sets and experience among data scientists is difficult. They can of course share their code and results, but this does not help others understand their data sets in depth or make full use of others' code and results.

MIT's DataHub project supports data set version control, but cannot handle the whole data set analysis and development process, so it is more a database management tool than a software development kit (SDK). Harvard's Dataverse, on the other hand, is a data publishing and sharing platform, but lacks version control and data analysis functions.

Summary of the invention

In view of the above shortcomings of the prior art, an object of the invention is to provide a data and code version management system and method that solve problems in the prior art such as inefficient or disordered version management of data and code.

To achieve the above and other related objects, a first aspect of the application provides a data version management system comprising a data management module, a code management module, an execution engine module and a system core module, wherein: the data management module stores at least one data set; the code management module stores at least one piece of execution code, the execution code being used to operate on the at least one data set; the execution engine module is configured with at least one execution back-end engine, calls an execution back-end engine according to a received command, and runs a piece of execution code to perform an operation on at least one data set in the data management module; and the system core module, on receiving a data processing request submitted by a user, processes the data set in the data management module, creates a data workflow for the data set, and records the resulting data version information.

In an embodiment of the application, the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated by a data ID.

In an embodiment of the application, when the data processing request received by the system core module is a request to submit a new data set, the system core module extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID that associates the data and the metadata of the new data set, creates a data workflow for the data set, and records the resulting data version information.

In an embodiment of the application, when the data processing request received by the system core module is a request to modify a data set stored in the data management module, the system core module copies a piece of execution code into the execution engine module according to the data processing request and sends an execution command to the execution engine module so that it runs the execution code to form a new data set. The system core module then extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID that associates the data and the metadata of the new data set, forms a code ID that associates the execution code with the new data set, creates a data workflow for the data set, and records the resulting data version information.

Another aspect of the application provides a data version management method comprising the following steps: pre-storing at least one data set and at least one piece of execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and, on receiving a data processing request submitted by a user, calling an execution back-end engine to run the at least one piece of execution code so as to process the at least one data set, creating a data workflow for the data set, and recording the resulting data version information.

In an embodiment of the application, when the received data processing request submitted by the user is a request to submit a new data set, the data of the new data set is extracted and stored in the first storage unit, the metadata of the new data set is extracted and stored in the second storage unit, a new data ID is formed that associates the data and the metadata of the new data set, and the resulting data version information is recorded.

In an embodiment of the application, when the received data processing request submitted by the user is a request to modify a data set stored in the data management module, a piece of execution code is copied into the execution engine according to the data processing request, and an execution command is sent to the execution engine so that it runs the execution code to form a new data set. The data of the new data set is extracted and stored in the first storage unit, the metadata of the new data set is extracted and stored in the second storage unit, a new data ID is formed that associates the data and the metadata of the new data set, a code ID is formed that associates the execution code with the new data set, and the resulting data version information is recorded.

A further aspect of the application provides a code version management system comprising a data management module, a code management module, an execution engine module and a system core module, wherein: the data management module stores at least one data set; the code management module stores at least one piece of execution code for operating on at least one data set stored in the data management module, and is further configured to receive and store code pushed by a user, or to send a code processing request according to the code pushed by the user; the execution engine module is configured with at least one execution back-end engine and, on receiving an execution command, calls an execution back-end engine according to the command and runs a piece of execution code to operate on a data set in the data management module; and the system core module records the code pushed by the user and forms code version information, and, on receiving a code processing request from the code management module, sends an execution command to the execution engine module so that it runs the execution code in the code management module, and records the resulting code version information after the execution code has operated on a data set in the data management module.

In an embodiment of the application, the system core module is further configured to copy a piece of execution code into the execution engine module according to a data processing request submitted by the user, send an execution command to the execution engine module so that it runs the execution code to form a new data set, form a code ID that associates the execution code with the new data set, and record the resulting code version information.

In an embodiment of the application, the execution code that the system core module copies to the execution engine module according to the data processing request is either new execution code submitted by the user or execution code retrieved from those stored in the code management module.

Another aspect of the application provides a code version management method comprising the following steps: pre-storing at least one data set and at least one piece of execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and receiving and storing code pushed by a user and recording the resulting code version information; or sending a code processing request according to the code pushed by the user, sending an execution command to the execution back-end engine so that it runs the pre-stored execution code, and recording the resulting code version information after the execution code has operated on the pre-stored data set.

In an embodiment of the application, the code version management method further comprises the step of copying a piece of execution code into the execution back-end engine according to a data processing request submitted by the user, sending an execution command so that the engine runs the execution code to form a new data set, forming a code ID that associates the execution code with the new data set, and recording the resulting code version information.

In an embodiment of the application, the execution code copied into the execution back-end engine according to the data processing request is either new execution code submitted by the user or execution code retrieved from those stored in the code management module.

In an embodiment of the application, the code version management method further comprises the step of configuring multiple user UIs, which respectively receive requests submitted by different users and feed request information back to different users.

As described above, the data and code version management system and method of the invention have the following beneficial effects: by providing separate version management for data sets and code, providing a directed acyclic workflow for each data set and code, and building the association between the two, the invention effectively solves problems such as inefficient or disordered version management of data and code; in addition, the UI design gives users a convenient way to compare and analyze historical data sets; and, since the units are distributed across different servers, the workload on each server can be eased.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the data version management system of the invention.

Fig. 2 is a flow chart of the data version management method of the invention.

Fig. 3 is a schematic structural diagram of the code version management system of the invention.

Fig. 4 is a flow chart of the code version management method of the invention.

Fig. 5 is a schematic diagram of the composition of a data workflow in a specific embodiment of the invention.

Description of reference numerals

1 data version management system

11 data management module

12 code management module

13 execution engine module

14 system core module

2 code version management system

21 data management module

22 code management module

23 execution engine module

24 system core module

S11~S12, S21~S22 steps

Detailed description of the invention

Embodiments of the invention are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another.

It should be noted that the drawings provided with the following embodiments illustrate the basic concept of the invention only schematically. They show only the components relevant to the invention and are not drawn according to the number, shape and size of the components in an actual implementation; in practice the form, quantity and proportion of each component may vary freely, and the component layout may be more complex.

Embodiment one

Refer to Fig. 1, which shows a schematic structural diagram of the data version management system of the invention. As shown, the first aspect of the application provides a data version management system, which can be deployed on a single server, a server cluster, a server based on a cloud computing architecture, or distributed servers. A server cluster groups many servers together to perform data version management and can use multiple computers for parallel computation to improve processing speed. A server based on a cloud computing architecture pools the storage of the individual servers through virtualization technology, so that the servers hosting the modules of the data version management system share computing resources. Distributed servers spread the data and programs of the data version management system across multiple servers that operate in coordination.

Each module of the data version management system can be deployed on any of the above kinds of server according to actual design needs. Specifically, the data version management system 1 comprises a data management module 11, a code management module 12, an execution engine module 13 and a system core module 14.

The data management module 11 stores at least one data set. A data set is a collection of data under version management. The data includes, but is not limited to, text data and/or multimedia data. In a specific embodiment, the text data is for example code data or system logs, and the multimedia data is for example image data or video data. If the data management module 11 holds multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and is associated with A1 and A2 through an index or an association field.

The data set may also contain metadata for indexing or describing the data. Each item of data in the data set and its corresponding metadata can be associated by a data ID. Specifically, metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes the properties of data and supports functions such as indicating storage location, history, resource lookup and file records. Metadata can be regarded as a kind of electronic catalogue: to serve the purpose of cataloguing, it must describe and record the content or characteristics of the data, and thereby assist data retrieval.

In one alternative, the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated by a data ID. The first and second storage units can be configured on the same database server, or placed on different servers as needed. For example, in an optional embodiment, the first storage unit is configured in the Hadoop Distributed File System (a distributed file system for big data) and the second storage unit in a NoSQL (non-relational) database.

The distributed file system (Distributed File System) is based on a client/server design: the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to the node over a computer network. The NoSQL database is, for example, a key-value store, a column-oriented database, a document database, a graph database, or a MongoDB database.

In this application, a distributed file system can scan a data set efficiently, but random access is inefficient. To solve this problem, the application stores the tags and annotations of each picture, such as file name, size and content description, in the NoSQL database to speed up queries; that is, the raw data and the metadata are linked by data ID.
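The split between raw data and queryable metadata can be sketched as follows. This is a minimal illustration only: the class name `TwoUnitStore` and its methods are hypothetical, and both storage units are modeled as in-memory dictionaries rather than HDFS and a NoSQL database.

```python
import uuid


class TwoUnitStore:
    """Toy model of the two storage units; both are in-memory dicts here."""

    def __init__(self):
        self.data_unit = {}  # stands in for the first storage unit (e.g. HDFS)
        self.meta_unit = {}  # stands in for the second storage unit (e.g. NoSQL)

    def put(self, payload: bytes, **metadata) -> str:
        """Store payload and metadata separately under one new data ID."""
        data_id = uuid.uuid4().hex
        self.data_unit[data_id] = payload
        self.meta_unit[data_id] = metadata
        return data_id

    def find(self, **criteria) -> list:
        """Query only the metadata unit; the raw data is never scanned."""
        return [
            data_id
            for data_id, meta in self.meta_unit.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]

    def get(self, data_id: str) -> bytes:
        """Fetch the raw data for a data ID found through the metadata."""
        return self.data_unit[data_id]
```

A lookup such as `store.find(filename="cat.png")` touches only the metadata, and the returned data ID is then used to fetch the raw bytes, which mirrors the role the text gives to the data ID.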

The data version management system of the application records each field of the data; for example, a new data item is a set consisting of a new name and a version number. The embodiment specifically used in the application stores metadata in a MongoDB database, but is not limited to this; in other implementations the metadata may be moved to a column-oriented database, a key-value store, a document database or a graph database to improve efficiency.

The code management module 12 stores at least one piece of execution code, which is used to operate on the at least one data set. When called, the execution code performs operations such as adding, deleting and modifying data sets and the data within them. For example, the execution code includes, but is not limited to: code that adds a new data set, code that deletes a data set, code that adds labels/characters to data in a given data set, code that deletes labels/characters from data in a given data set, and code that replaces labels/characters in data in a given data set.

In an optional embodiment, the execution code is stored in, for example, GitLab, and the system interacts with it through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails; it implements a self-hosted Git project repository whose public or private projects can be accessed through a web interface. GitLab offers functionality similar to GitHub: it can browse source code and manage defects and comments. It can manage team access to repositories, makes it easy to browse committed versions, and provides a file history. Team members can communicate using the built-in simple chat facility (Wall). GitLab also provides a code snippet collection feature that makes code reuse easy and snippets easy to find later when needed.

The execution engine module 13 is configured with at least one execution back-end engine; according to a received command it calls an execution back-end engine and runs a piece of execution code to operate on at least one data set in the data management module 11. The execution back-end engine is chosen according to the programming language of each piece of execution code. Execution back-end engines include standalone (single-machine) engines and distributed engines.

The standalone engine is, for example, Python on a single machine. Python is free software; its source code and the CPython interpreter are distributed under the GPL-compatible Python Software Foundation License. CPython first compiles the source code in a .py file into Python bytecode, which is then executed by the Python Virtual Machine.

The distributed engine is, for example, Spark on a cluster. Spark is a fast, general-purpose cluster computing framework whose core is written in Scala. It provides high-level APIs for Scala, Java and Python, with which parallel applications can be developed easily.
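How an execution engine module might select a back-end engine by name can be sketched as below. The names `ExecutionEngineModule` and `standalone_python_backend`, and the convention that execution code defines a `run(dataset)` function, are assumptions made for illustration; a distributed back end such as Spark would be registered in the same way but is not shown.

```python
from typing import Callable, Dict, List

# A back end takes execution code and a data set and returns a new data set.
Backend = Callable[[str, List], List]


def standalone_python_backend(code: str, dataset: List) -> List:
    """Run execution code in-process; the code must define run(dataset)."""
    namespace: dict = {}
    exec(code, namespace)  # assumption: the execution code is trusted
    return namespace["run"](dataset)


class ExecutionEngineModule:
    """Hypothetical registry that dispatches to a named back-end engine."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def execute(self, backend_name: str, code: str, dataset: List) -> List:
        if backend_name not in self._backends:
            raise KeyError(f"no execution back-end engine named {backend_name!r}")
        return self._backends[backend_name](code, dataset)
```

Keeping the back ends behind one `execute` call means the caller only names the engine that matches the code's language, which is the selection rule the text describes.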

In a specific embodiment, configuring the execution back-end engine is necessary: it not only makes it convenient to set up a distributed cluster environment for the user, but also automatically links code to result data sets. That is, the data version management system of the application can obtain any intermediate result by executing code, as long as the user has retained the original data and code.

On receiving a data processing request submitted by a user, the system core module 14 processes the data set in the data management module 11, creates a data workflow for the data set, and records the resulting data version information.

The data workflow (Data Work Flow, DWF) is a directed acyclic version graph that marks the versions of data sets and/or of data within data sets during version management. For a new data set, the corresponding version marked in the data workflow is a root node. For a data set or data that has version updates, the data workflow represents the subordinate (derivation) relationship between two items of data in a data set and/or between two data sets. The subordinate relationship includes the execution history and version of a data set. The execution history includes, but is not limited to: the pointing relationship (i.e. the parent-child node relationship) between the data in the data set before and after a change, the execution code called for the change, and the execution time. In short, the data workflow captures the logical relationships between data sets in the data version management system of the application; data sets form dependencies on this basis. The data workflow is the core function by which the data version management system reproduces data.

In a data workflow, a node represents a particular version of a data set. A directed edge connecting two nodes indicates that one data set is derived from another. The label on the edge indicates the code version of the experiment concerned. Refer to Fig. 5, which shows an example of a data workflow; a data workflow is a directed acyclic graph. Fig. 5 illustrates two common data workflow structures: one-to-one and many-to-one.

In the one-to-one structure, a data set is derived from one other data set. For example, a user can create a new data set based on an existing one, add some new labels to it, and share it with other users. The many-to-one structure indicates that a data set can be derived from two or more data sets; operations such as merging two data tables are of this kind.
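The node and edge structure described above can be sketched as a small graph class. The name `DataWorkflow`, the `(name, version)` node tuples and the method names are hypothetical; the sketch only assumes what the text states, namely that nodes are data set versions and that each edge carries a code version label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (data set name, version number)


@dataclass
class DataWorkflow:
    # Maps each node to its incoming edges: (parent node, code version label).
    parents: Dict[Node, List[Tuple[Node, str]]] = field(default_factory=dict)

    def add_root(self, node: Node) -> None:
        """Register a newly submitted data set as a root node."""
        if node in self.parents:
            raise ValueError(f"{node} already exists")
        self.parents[node] = []

    def derive(self, child: Node, sources: List[Node], code_version: str) -> None:
        """Add a child derived from one source (one-to-one) or several (many-to-one)."""
        if child in self.parents:
            raise ValueError(f"{child} already exists")
        missing = [s for s in sources if s not in self.parents]
        if missing:
            raise KeyError(f"unknown parent data sets: {missing}")
        # A new node only ever points at existing nodes, so no cycle can form.
        self.parents[child] = [(s, code_version) for s in sources]

    def ancestors(self, node: Node) -> List[Node]:
        """Return every data set version that `node` was derived from."""
        found: List[Node] = []
        stack = [node]
        while stack:
            for parent, _code in self.parents[stack.pop()]:
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
        return found
```

Because `derive` refuses to modify an existing node, the graph stays acyclic by construction, and `ancestors` recovers the full lineage of any version.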

When implementing the data workflow structure, the data version management system of the application introduces a parent-child node relationship attribute, which records which data set a given data set was derived from. The system can also compare the differences between parent and child data sets. These functions help users find more easily what results their code changes produced. The data workflow graph therefore not only clarifies the relationships between data sets, but also helps manage users' execution records, including producing results according to version number.
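The text does not specify how parent and child data sets are compared. One simple possibility, assuming each data set is a collection of hashable, sortable rows, is a set difference; the function name `diff_datasets` is hypothetical.

```python
def diff_datasets(parent: list, child: list) -> dict:
    """Compare two versions of a data set treated as collections of rows.

    Assumption: rows are hashable and sortable, and order and duplicates
    do not matter for the comparison.
    """
    parent_rows, child_rows = set(parent), set(child)
    return {
        "added": sorted(child_rows - parent_rows),
        "removed": sorted(parent_rows - child_rows),
    }
```

For large data sets a real system would more likely compare stored metadata or checksums than load both versions into memory.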

The system core module 14 obtains and records the data version information after a change according to the data workflow and the label information (such as ID values) of the data set and data before the change. The data version information includes, but is not limited to, at least one of: data set name, data ID, code ID, formation time and running log.

Here, the system core module 14 can capture the data processing request submitted by a user through a network/submission interface and, according to the request, send the execution engine module 13 a corresponding execution command together with the execution code selected from the code management module 12, so that the execution engine module 13 calls the corresponding execution back-end engine to run the selected execution code and thereby carry out version management of the data set.

For example, in a concrete implementation, whenever a user pushes execution code to the GitLab server, the GitLab server notifies the system core module 14 through a web hook. The system core module 14 places the user's request in its own queue and takes requests from the head of the queue for processing. It copies the execution code of the request to the execution engine module 13, which then runs the code with the parameters and input provided by the user. When the task finishes, the system core module 14 records the information of the current request, including the commit ID of the current push on the GitLab server, the parameters specified by the user, and any information specific to the experiment. In some cases an experiment produces new data sets; the system core module 14 then also records the relationships between these data sets, i.e. the data workflow described above.
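The queueing behaviour described above can be sketched as follows. The class `SystemCore`, its method names and the shape of the request dictionary are hypothetical; the web hook itself and the GitLab payload format are not modeled, and the execution engine is passed in as a plain callable.

```python
from collections import deque
from typing import Callable, Optional


class SystemCore:
    """Hypothetical core that queues push notifications and records each run."""

    def __init__(self, run: Callable[[str, dict], object]) -> None:
        self._queue = deque()
        self._run = run  # stands in for the execution engine module
        self.records = []

    def on_push(self, commit_id: str, code: str, params: dict) -> None:
        """Web hook handler: enqueue the request at the tail of the queue."""
        self._queue.append({"commit_id": commit_id, "code": code, "params": params})

    def process_next(self) -> Optional[dict]:
        """Take the request at the head of the queue, run it, and record it."""
        if not self._queue:
            return None
        request = self._queue.popleft()
        output = self._run(request["code"], request["params"])
        record = {
            "commit_id": request["commit_id"],
            "params": request["params"],
            "output": output,
        }
        self.records.append(record)
        return record
```

Requests are handled strictly in arrival order, and every record keeps the commit ID next to the user's parameters so that a run can be traced back to the exact code version.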

The data processing request submitted by a user falls into several cases, which are described in turn below:

In one case, when the data processing request is a request to submit a new data set, the system core module 14 can save the new data set directly in the first storage unit, create a data workflow for the new data set, and record the resulting data version information. Alternatively, the system core module 14 selects the corresponding execution code from the code management module 12 according to the data processing request and sends the execution engine module 13 a corresponding execution command to submit the new data set. The execution engine module 13 executes the selected execution code according to the received execution command and stores the data of the acquired new data set in the first storage unit. Meanwhile, the system core module 14 creates a data workflow for the data set and records the resulting data version information.

In one alternative, the new data set includes data and metadata. When selecting execution code, the system core module 14 chooses code capable of saving both data and metadata. Executing the selected code extracts the data and the metadata separately from the new data set, stores the extracted data in the first storage unit and the extracted metadata in the second storage unit, and forms a new data ID that associates the data and the metadata of the new data set. The system core module 14 creates a data workflow for the data set and records the resulting data version information. Here, the data workflow contains the data and its corresponding metadata as the root node. The recorded data version information includes: data set name, data ID, metadata ID, the correspondence between data ID and metadata ID, the code ID of the code that added the data set, formation time and running log.

In another case, when the data processing request submitted by the user is to modify a data set stored in the data management module 11, the system core module 14 copies an execution code to the execution engine module 13 according to the data processing request and sends an execution command to the execution engine module 13, causing it to run the execution code to form a new data set. The new data set is then saved to the first storage unit; at the same time, the data workflow linking the new data set to the data set before modification is created, and the resulting data version information is recorded.

Optionally, the system core module 14 extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID to associate the data and metadata of the new data set, forms a code ID to associate the execution code with the new data set, creates the data workflow of the data set, and records the resulting data version information.

More preferably, the execution code that the system core module 14 copies to the execution engine module 13 according to the data processing request is either new execution code submitted by the user or execution code retrieved from the code management module 12.

Specifically, the user may also submit new execution code in advance according to his or her own needs, and the correspondence between the new execution code and the execution back-end engines in the execution engine module 13 is adjusted either manually or by the system core module 14. Thus, when the data processing request submitted by the user is to modify a data set stored in the data management module 11, the system core module 14, according to the data processing request, copies the new execution code to the execution engine module 13, where the corresponding execution back-end engine runs it. The new data set is saved, the data workflow linking the new data set to the data set before modification is created, and the resulting data version information is recorded.
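As a rough sketch of the modification case, the following Python fragment runs a piece of execution code against a stored data set, saves the result as a new data set, and links it to its predecessor. The function `strip_labels` is a hypothetical example of user-submitted execution code, and the dictionaries and field names are assumptions, not structures defined by the embodiment.

```python
def strip_labels(rows):
    """Hypothetical execution code: remove the 'label' field from every row."""
    return [{k: v for k, v in row.items() if k != "label"} for row in rows]


datasets = {"v1": [{"text": "a", "label": 1}, {"text": "b", "label": 0}]}
workflow = {"v1": None}  # data workflow: data ID -> data ID it was derived from
versions = {}            # data ID -> data version information


def modify_dataset(parent_id, new_id, code, code_id):
    """Run the execution code on a stored data set and record the new version."""
    datasets[new_id] = code(datasets[parent_id])  # the parent is left untouched
    workflow[new_id] = parent_id
    versions[new_id] = {"data_id": new_id, "code_id": code_id, "parent": parent_id}
    return versions[new_id]


info = modify_dataset("v1", "v2", strip_labels, code_id="c-strip")
```

Keeping the original data set unchanged and writing the result under a new ID is what allows two versions to be compared later.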

When the user needs to analyze and compute over data sets of different versions, in a preferred mode of this embodiment the data version management system 1 further includes a user interface module (as illustrated). The user interface module is configured with multiple user UIs to receive requests submitted by different users and to feed the requested information back to them.

Specifically, users often need to analyze data sets and compute certain metrics, such as accuracy in natural language processing or daily return on investment in stock market prediction. After the data workflow and data version information of a data set have been created, the user interface module can present to the user the data in each data set associated through the data workflow. The displayed data helps the user compare a pair of historical analysis results and shows the differences in code and/or parameters. UI designs tailored to different kinds of users help each user arrive at the best algorithm and parameters.

In summary, the data version management system provided by the present application manages data versions within one integrated system and runs user code inside that system. It retains both the user's code and data, and any two versions can be compared to find their differences. In addition, the data version management system of the present invention stores data and metadata separately, so that data can be filtered more efficiently. Furthermore, by providing separate version management for data sets and code, providing each data set and each code with a directed acyclic workflow, and building the association between the two, the present invention effectively addresses problems such as inefficient or disordered version management of data and code. The multiple UI designs give users a convenient way to compare and analyze historical data sets. Finally, distributing the units across different servers helps relieve the load on each server.

Embodiment two

Referring to Fig. 2, which shows a flow chart of the data version management method of the present invention, the second aspect of the present application provides a data version management method. The data version management method is mainly performed by a data version management system. The data version management system may be deployed on a single server, a server cluster, servers based on a cloud computing architecture, or distributed servers. A server cluster refers to many servers brought together to perform data version management jointly; it can use multiple computers for parallel computation to increase processing speed. Servers based on a cloud computing architecture pool the storage of the individual servers through virtualization technology, so that the servers hosting the modules of the data version management system share computing resources. Distributed servers spread the data and programs of the data version management system across multiple servers that operate in coordination.

Each module of the data version management system may be deployed on any of the above kinds of servers according to actual design needs. The data version management system performs the method according to the following steps.

In step S11, at least one data set and at least one execution code for operating on the at least one data set are stored in advance, and at least one execution back-end engine for running the execution code is configured.

Here, a data set is a collection of data under version management. The data includes but is not limited to text data and/or multimedia data. In a specific embodiment, examples of text data are code data and system logs, and examples of multimedia data are image data and video data. If the data management module stores multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and is associated with them through an index or an association field.

Here, the execution code is used to operate on the at least one data set. When called, the execution code performs operations such as adding, deleting and modifying data sets and the data in them. For example, the execution code includes but is not limited to: execution code that adds a new data set, execution code that deletes a data set, execution code that adds labels/characters to the data in a preset data set, execution code that deletes labels/characters from the data in a preset data set, and execution code that replaces labels/characters in the data in a preset data set.

In an optional embodiment, the execution code is stored in, for example, GitLab, and is accessed through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails that provides a self-hosted Git project repository; public or private projects can be accessed through a web interface. GitLab offers functions similar to GitHub: it can browse source code and manage defects and comments. It can manage team access to repositories, makes it easy to browse committed versions, and provides a file history. Team members can communicate using the built-in simple chat program (Wall). GitLab also provides a code snippet collection function that makes code reuse easy and lets snippets be looked up when needed later. Here, an execution back-end engine is set up for the programming language of each execution code. The execution back-end engines include single-machine engines and distributed engines.

In step S12, when a data processing request submitted by the user is received, an execution back-end engine is called to run the at least one execution code so as to process the at least one data set, the data workflow of the data set is created, and the resulting data version information is recorded.

Specifically, when the data version management system receives a data processing request submitted by the user, it processes the data set in the data management module, creates the data workflow of the data set, and records the resulting data version information.

The data workflow (Data Work Flow, DWF) is a directed acyclic version graph that marks the data sets, and/or the data within them, over the course of version management. For a new data set, the corresponding version in the data workflow is marked as a root node. For a data set or data that has undergone version updates, the data workflow represents the predecessor-successor relationship between two pieces of data in a data set and/or between two data sets.

The predecessor-successor relationship includes the execution history and the versions of a data set. The execution history includes but is not limited to: the pointing relationship between the data before and after a change (that is, the parent-child node relationship), the execution code called for the change, the execution time, and so on. In short, the data workflow serves as the logical relationship among data sets in the data version management system of the present application, and the data sets form their dependencies according to it. The data workflow is the core function that allows the data version management system of the present application to reproduce data.
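To make the role of the data workflow in reproducing data more concrete, the following Python sketch stores the workflow as a mapping from each data set to its parents and the code that produced it, then walks the graph to list the steps needed to rebuild a data set. The data set names follow the A1/A2/A3 example used elsewhere in this description; `A4`, the code IDs and the function `lineage` are assumptions for illustration only.

```python
# Data workflow: data set -> (parent data sets, ID of the code that produced it).
workflow = {
    "A1": ((), None),            # root node
    "A2": ((), None),            # root node
    "A3": (("A1", "A2"), "c-join"),
    "A4": (("A3",), "c-clean"),
}


def lineage(dataset_id, seen=None):
    """Return the (data set, code ID) steps needed to rebuild dataset_id, parents first."""
    seen = set() if seen is None else seen
    if dataset_id in seen:
        return []
    seen.add(dataset_id)
    parents, code_id = workflow[dataset_id]
    steps = []
    for parent in parents:
        steps.extend(lineage(parent, seen))
    steps.append((dataset_id, code_id))
    return steps


steps = lineage("A4")
```

Replaying `steps` in order, with the recorded code and the original root data, is one way the dependency information could be used to regenerate an intermediate data set.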

According to the identifying information (such as ID values) of the data set and data before the change and according to the data workflow, the data version management system derives and records the data version information after the change. The data version information includes but is not limited to at least one of: data set name, data ID, code ID, creation time, and run log.

Here, the data version management system can capture the data processing request submitted by the user through a network/submission interface and, according to the request, send to the execution engine module the corresponding execution command together with the execution code selected from the code management module, so that the execution engine module calls the corresponding execution back-end engine to run the selected execution code and thereby carry out version management of the data set.

In one case, when the data processing request is to submit a new data set, the data version management system may save the new data set directly to the first storage unit, create the data workflow of the new data set, and record the resulting data version information. Alternatively, the data version management system selects the corresponding execution code according to the data processing request and runs it to store the data of the obtained new data set in the first storage unit. At the same time, the data version management system creates the data workflow of the data set and records the resulting data version information.

In an alternative, the new data set includes data and metadata. When selecting execution code, the data version management system selects execution code capable of saving both data and metadata. Running the selected execution code extracts the data and the metadata separately from the new data set; the extracted data is stored in the first storage unit and the extracted metadata in the second storage unit. A new data ID is formed to associate the data and metadata of the new data set, the data workflow of the data set is created, and the resulting data version information is recorded. Here, the data workflow contains the data and the corresponding metadata as the root node. The recorded data version information includes: data set name, data ID, metadata ID, the correspondence between the data ID and the metadata ID, the code ID of the code executed to add the data set, creation time, and run log.

In another case, when the data processing request submitted by the user is to modify a stored data set, the data version management system copies an execution code to the corresponding execution back-end engine according to the data processing request and sends an execution command to that execution back-end engine, causing it to run the execution code to form a new data set. The new data set is then saved to the first storage unit; at the same time, the data workflow linking the new data set to the data set before modification is created, and the resulting data version information is recorded.

Optionally, the data version management system extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID to associate the data and metadata of the new data set, forms a code ID to associate the execution code with the new data set, creates the data workflow of the data set, and records the resulting data version information.

More preferably, the execution code that the data version management system copies to the execution back-end engine according to the data processing request is either new execution code submitted by the user or execution code retrieved from the code management module.

Specifically, the user may also submit new execution code in advance according to his or her own needs, and the data version management system adjusts the correspondence between the new execution code and each execution back-end engine. Thus, when the data processing request submitted by the user is to modify a stored data set, the data version management system, according to the data processing request, copies the new execution code to the corresponding execution back-end engine, which runs it. The new data set is saved, the data workflow linking the new data set to the data set before modification is created, and the resulting data version information is recorded.

When the user needs to analyze and compute over data sets of different versions, in a preferred mode of this embodiment the data version management method further includes a step of configuring multiple user UIs, which receive requests submitted by different users and feed the requested information back to them.

Specifically, users often need to analyze data sets and compute certain metrics, such as accuracy in natural language processing or daily return on investment in stock market prediction. After the data workflow and data version information of a data set have been created, the data version management system can present to the user the data in each data set associated through the data workflow. The displayed data helps the user compare a pair of historical analysis results and shows the differences in code and/or parameters. UI designs tailored to different kinds of users help each user arrive at the best algorithm and parameters.
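The comparison of a pair of historical analysis results can be sketched as a field-by-field difference between two recorded runs. The record layout below (`code_id`, `params`, `accuracy`) and the function `diff_runs` are assumptions made for illustration; the description does not prescribe how a UI computes or displays the differences.

```python
def diff_runs(run_a, run_b):
    """Return the fields whose values differ between two run records."""
    changed = {}
    for key in sorted(set(run_a) | set(run_b)):
        if run_a.get(key) != run_b.get(key):
            changed[key] = (run_a.get(key), run_b.get(key))
    return changed


# Two hypothetical historical results produced by the same code with different parameters.
run_1 = {"code_id": "c1", "params": {"lr": 0.1}, "accuracy": 0.81}
run_2 = {"code_id": "c1", "params": {"lr": 0.01}, "accuracy": 0.86}
differences = diff_runs(run_1, run_2)
```

Here `differences` contains only `params` and `accuracy`, which tells the user that the change in result came from a parameter change rather than a code change.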

Embodiment three

Referring to Fig. 3, which shows a structural diagram of the code version management system of the present invention, the third aspect of the present application provides a code version management system. The code version management system may be deployed on a single server, a server cluster, servers based on a cloud computing architecture, or distributed servers. A server cluster refers to many servers brought together to perform data version management jointly; it can use multiple computers for parallel computation to increase processing speed. Servers based on a cloud computing architecture pool the storage of the individual servers through virtualization technology, so that the servers hosting the modules of the code version management system share computing resources. Distributed servers spread the data and programs of the code version management system across multiple servers that operate in coordination.

Each module of the code version management system may be deployed on any of the above kinds of servers according to actual design needs. Specifically, the code version management system 2 includes a data management module 21, a code management module 22, an execution engine module 23 and a system core module 24.

The data management module 21 stores at least one data set.

Here, a data set is a collection of data under version management. The data includes but is not limited to text data and/or multimedia data. In a specific embodiment, examples of text data are code data and system logs, and examples of multimedia data are image data and video data. If the data management module 21 stores multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and is associated with them through an index or an association field.

The code version management system of the present application records each field of the data; for example, a new piece of data is a set consisting of a new name and a version number. The specific embodiment of the present application uses a MongoDB database to store the metadata, but is not limited to this; in other implementations, to improve efficiency, the metadata may instead be moved to a column-store database, a key-value database, a document database, or a graph database.

The code management module 22 stores at least one execution code, which is used to operate on at least one data set stored in the data management module 21.

When called, the execution code performs operations such as adding, deleting and modifying data sets and the data in them. For example, the execution code includes but is not limited to: execution code that adds a new data set, execution code that deletes a data set, execution code that adds labels/characters to the data in a preset data set, execution code that deletes labels/characters from the data in a preset data set, and execution code that replaces labels/characters in the data in a preset data set.

In an optional embodiment, the execution code is stored in, for example, GitLab, and is accessed through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails that provides a self-hosted Git project repository; public or private projects can be accessed through a web interface. GitLab offers functions similar to GitHub: it can browse source code and manage defects and comments. It can manage team access to repositories, makes it easy to browse committed versions, and provides a file history. Team members can communicate using the built-in simple chat program (Wall). GitLab also provides a code snippet collection function that makes code reuse easy and lets snippets be looked up when needed later.

In addition, the code management module 22 is further used to receive and store code pushed by the user, or to send a code processing request based on the code pushed by the user.

In one case, the code management module 22 receives and stores the code pushed by the user.

Specifically, the user uploads code to the code management module 22 from a user terminal, and the code management module 22 saves the received code. The code is execution code that the user has newly written or adapted according to the API provided by the code version management system 2. For example, the user improves an execution code in the code management module 22 and, through the system core module 24 described in detail below, uploads the improved execution code to the code management module 22, which then updates and saves the received code.

In another case, the code management module 22 sends a code processing request based on the code pushed by the user.

Here, the code pushed by the user has not been configured in advance, in the system core module 24, with respect to the execution engine module 23. Therefore, when the user selects the previously pushed execution code to manage a data set, the code management module 22 sends a code processing request to the system core module 24 to inform the system core module 24 of the corresponding execution back-end engine.

The execution engine module 23 is configured with at least one execution back-end engine and, on receiving an execution command, calls one of the execution back-end engines according to the command to run an execution code so as to operate on a data set in the data management module 21.

Here, an execution back-end engine is set up for the programming language of each execution code. The execution back-end engines include single-machine engines and distributed engines.
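One way to picture the selection of an execution back-end engine by programming language is a small registry keyed by language name. The Python sketch below registers a single-machine engine that runs Python execution code in a subprocess; the class `LocalPythonEngine`, the registry `ENGINES` and the function `execute` are assumptions for illustration, and a distributed engine would be a further entry in the same registry.

```python
import subprocess
import sys


class LocalPythonEngine:
    """Single-machine engine: runs Python execution code in a subprocess."""

    def run(self, source, args):
        result = subprocess.run(
            [sys.executable, "-c", source, *args],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


# Registry of execution back-end engines, keyed by the language of the execution code.
ENGINES = {"python": LocalPythonEngine()}


def execute(language, source, args=()):
    """Dispatch the execution code to the engine configured for its language."""
    if language not in ENGINES:
        raise ValueError(f"no execution back-end engine configured for {language!r}")
    return ENGINES[language].run(source, list(args))


output = execute("python", "import sys; print(sum(map(int, sys.argv[1:])))", ["2", "3"])
```

The user-supplied parameters are passed as command-line arguments, so the same execution code can be rerun later with the recorded parameters.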

In specific embodiments, configuring the execution back-end engines is necessary: it not only makes it convenient to set up a distributed cluster environment for the user, but also allows code and result data sets to be linked automatically. That is, by executing code, the code version management system of the present application can obtain any intermediate result, as long as the user has kept the original data and code.

The system core module 24 records the code pushed by the user and forms code version information; and, on receiving a code processing request from the code management module 22, sends an execution command to the execution engine module 23, causing it to run the execution code in the code management module 22, and records the resulting code version information after the execution code has operated on a data set in the data management module 21.

Here, the functions of the system core module 24 are described in turn. In one mode, when the user pushes code, the system core module 24 passes the received code to the code management module 22 for storage and at the same time forms the code version information for that code. Similar to the data version information in the above embodiments, the code version information includes but is not limited to at least one of: code name, code ID, creation time, specified parameters, and run log. The system core module 24 may manage code versions in the same way that data sets are managed with the data set workflow. For example, when the user improves a stored execution code, the code version management system 2 updates and saves the received code. At the same time, the code version management system records the correspondence between the two execution codes before and after the modification to form a code workflow, and determines and records the code version information after the modification on the basis of the version information of the execution code before the modification. In another mode, when the user selects pushed code to manage a data set, the system core module 24, based on the code processing request from the code management module 22, sends an execution command to the execution engine module 23, causing it to run the execution code in the code management module 22, and records the resulting code version information after the execution code has operated on a data set in the data management module 21. For example, when the user, by entering parameters, code version information and the like, selects previously pushed code to change the data in a data set, the system core module 24 supplies the received parameters, code version information and the like to the code management module 22. The code management module 22 determines the execution code to be run according to this information and sends the corresponding code processing request to the system core module 24. The system core module 24 sends an execution command to the execution engine module 23 according to the code processing request. The execution engine module 23 then selects the corresponding execution back-end engine according to the execution command and runs the specified execution code; after the execution code has operated on the data set in the data management module 21, the resulting code version information is recorded. At least one of the code version information, the data workflow and the data version information also records the correspondence between the executed code and the data set.
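The code version information and code workflow described above can be sketched in Python as two small mappings: one from code ID to version information, and one from each code ID to the ID of the version it was modified from. Deriving the code ID from a hash of the source, as well as the function name `push_code` and the sample file name, are assumptions made for this illustration rather than details given in the embodiment.

```python
import hashlib
import time

code_versions = {}  # code ID -> code version information
code_workflow = {}  # code ID -> code ID of the version it was modified from


def push_code(name, source, parent_id=None, params=None):
    """Record a pushed execution code and link it to its predecessor, if any."""
    code_id = hashlib.sha1(source.encode("utf-8")).hexdigest()[:8]
    code_versions[code_id] = {
        "code_name": name,
        "code_id": code_id,
        "created_at": time.time(),
        "params": params or {},
        "run_log": "",
    }
    code_workflow[code_id] = parent_id
    return code_id


v1 = push_code("clean.py", "rows = [r for r in rows if r]")
v2 = push_code("clean.py", "rows = [r.strip() for r in rows if r]", parent_id=v1)
```

Following `code_workflow` from `v2` back to `v1` gives the modification history of the execution code, in the same way that the data workflow gives the history of a data set.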

In a concrete implementation, whenever the user submits execution code to the GitLab server, the GitLab server notifies the system core module 24 through a web hook. The system core module 24 pushes the user request onto its own queue while taking requests from the head of the queue for processing. The system core module 24 copies the execution code of the request to the execution engine module 23 and sends a code processing request to the execution engine module 23 to inform it of the selected execution back-end engine. The execution engine module 23 then runs the execution code with the parameters and input provided by the user. After the task finishes, the system core module 24 records the information of the current request, including the commit ID of the current push on the GitLab server, the execution code version information, the parameters specified by the user, and any specific information related to the experiment. In some cases an experiment produces new data sets; the system core module 24 then also records the relations among these data sets, namely the data workflow described in the foregoing embodiments.
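The queue-based handling of push notifications can be sketched as follows. This is a simplified illustration: the payload is a hypothetical dictionary rather than the actual GitLab web hook format, and the names `on_push_notification`, `process_next` and the `run_code` callback are assumptions standing in for the system core module and the execution engine module.

```python
from collections import deque

request_queue = deque()  # requests waiting to be processed, oldest first
run_records = []         # information recorded after each finished task


def on_push_notification(payload):
    """Called when the code server reports a push; enqueue the request."""
    request_queue.append(payload)


def process_next(run_code):
    """Take the request at the head of the queue, run it, and record it."""
    request = request_queue.popleft()
    result = run_code(request["code"], request["params"])
    run_records.append(
        {
            "commit_id": request["commit_id"],  # ID of the push on the code server
            "params": request["params"],        # parameters specified by the user
            "result": result,
        }
    )
    return run_records[-1]


on_push_notification({"commit_id": "abc123", "code": "scale", "params": {"factor": 2}})
on_push_notification({"commit_id": "def456", "code": "scale", "params": {"factor": 3}})
first = process_next(lambda code, params: params["factor"] * 10)
```

Requests are handled in arrival order, so the record for `abc123` is written before `def456` is run.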

It should be noted that the execution code that the system core module 24 copies to the execution engine module 23 according to the data processing request may be, besides new execution code submitted by the user, execution code already stored in the code management module 22. For example, the user specifies an execution code in the code management module 22 to process the corresponding data set, rather than using the default execution code in the code management module 22.

In an alternative, the system core module 24 is further used to copy an execution code to the execution engine module 23 according to a data processing request submitted by the user, send an execution command to the execution engine module 23 causing it to run the execution code to form a new data set, form a code ID to associate the execution code with the new data set, and record the resulting code version information.

The data processing request includes but is not limited to: submitting a new data set, or modifying a data set stored in the data management module 21.

Specifically, building on the data workflow and data version information formed by the data version management system described above, and on the system core module 24 of this embodiment copying an execution code to the execution engine module 23 and sending an execution command that causes it to run the execution code to form a new data set, the ID of the executed code is additionally associated with the new data set and the resulting code version information is recorded. Thus, when the user needs to analyze the execution history of several associated data sets, information about the execution code corresponding to each data set is available, giving the user more intermediate information for data set analysis and statistics.

As a preferred scheme, the code version management system 2 further includes a user interface module configured with multiple user UIs to receive requests submitted by different users and to feed the requested information back to them.

Specifically, users often need to analyze data sets and compute certain metrics, such as accuracy in natural language processing or daily return on investment in stock market prediction. After the data workflow and data version information of a data set have been created, the user interface module can present to the user the data, execution code and the like of each data set associated through the data workflow. The displayed data helps the user compare a pair of historical analysis results and shows the differences in code and/or parameters. UI designs tailored to different kinds of users help each user arrive at the best algorithm and parameters.

In summary, the code version management system provided by the present application manages code versions within one integrated system and runs user code inside that system. It retains both the user's code and data, and any two versions can be compared to find their differences. In addition, the code version management system of the present invention stores data and metadata separately, so that data can be filtered more efficiently. Furthermore, by providing separate version management for data sets and code, providing each data set and each code with a directed acyclic workflow, and building the association between the two, the present invention effectively addresses problems such as inefficient or disordered version management of data and code. The multiple UI designs give users a convenient way to compare and analyze historical data sets. Finally, distributing the units across different servers helps relieve the load on each server.

Embodiment four

Referring to Fig. 4, which shows a flowchart of the code version management method of the present invention, the fourth aspect of the present application provides a code version management method. The method is mainly performed by a code version management system. The code version management system may be deployed on a single server, a server cluster, servers based on a cloud computing architecture, or distributed servers. A server cluster brings many servers together to perform version management jointly and can use multiple computers for parallel computation to increase computing speed. Servers based on a cloud computing architecture are pooled through virtualization technology, so that the servers hosting the modules of the code version management system share computing resources. Distributed servers spread the data and programs of the code version management system across multiple servers, which operate in coordination.

Each module of the code version management system can be deployed on any of the above types of server according to actual design needs. Specifically, the code version management system carries out the method by performing the following steps.

In step S21, at least one data set and at least one execution code for operating on the at least one data set are stored in advance, and at least one execution back-end engine for running the execution code is configured.

The execution code describes operations such as adding, deleting, and modifying data sets and the data within them. For example, the execution code includes but is not limited to: execution code that adds a new data set, execution code that deletes a data set, execution code that adds labels/characters and the like to the data in a preset data set, execution code that deletes labels/characters and the like from the data in a preset data set, and execution code that replaces labels/characters and the like in the data in a preset data set.
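The kinds of execution code listed above might look like the following small functions. This is a sketch under stated assumptions: a data set is represented here as a list of dictionaries with a `labels` field, and the function names are hypothetical; the patent does not prescribe any particular data representation. Each function returns a new data set rather than modifying its input, so that the earlier version remains available.

```python
def add_label(records, label):
    """Return a new data set in which every record carries `label`."""
    return [
        {**r, "labels": sorted(set(r.get("labels", [])) | {label})}
        for r in records
    ]


def delete_label(records, label):
    """Return a new data set with `label` removed from every record."""
    return [
        {**r, "labels": [l for l in r.get("labels", []) if l != label]}
        for r in records
    ]


def replace_label(records, old, new):
    """Return a new data set in which label `old` is replaced by `new`."""
    return [
        {**r, "labels": [new if l == old else l for l in r.get("labels", [])]}
        for r in records
    ]


# Illustrative data set; the original list is left untouched by each step.
dataset = [
    {"text": "good", "labels": ["pos"]},
    {"text": "bad", "labels": ["neg"]},
]
tagged = add_label(dataset, "reviewed")
replaced = replace_label(tagged, "pos", "positive")
cleaned = delete_label(replaced, "reviewed")
```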

In step S22, code pushed by the user is received and stored, and the resulting code version information is recorded; or a code processing request is sent according to the code pushed by the user, an execution command is sent to the execution back-end engine so that it runs the pre-stored execution code, and the resulting code version information is recorded after the execution code has performed an operation on the pre-stored data set.

The code version management system receives and stores the code pushed by the user and records the resulting code version information as follows:

The user uses a user terminal to upload code to the code version management system; the code management module then saves the received code and at the same time forms the code version information corresponding to that code. The code may be execution code that the user has newly created or adapted according to the API provided by the code version management system. In addition, similar to the data version information in the foregoing embodiments, the code version information includes but is not limited to at least one of the following: code name, code ID, formation time, specified parameters, and running log. The code version management system performs version management on the code by following the way data sets are managed through the data set workflow.

For example, when the user improves stored execution code, the code version management system updates and saves the received code. At the same time, the code version management system records the correspondence between the execution code before and after the modification, to form a code workflow; and, on the basis of the code version information of the execution code before the modification, determines and records the code version information after the modification.
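One way to record the correspondence between the execution code before and after a modification is to give each code version a pointer to its predecessor. The sketch below is only an illustration under that assumption; `CodeVersion`, `CodeStore`, and the hash-based code ID are hypothetical choices made here, not details taken from the patent.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CodeVersion:
    """Hypothetical code version information for one pushed code."""
    code_id: str
    name: str
    parent_id: Optional[str]   # the version this one was modified from
    created_at: float          # formation time


class CodeStore:
    """Stores pushed code and the before/after links between versions."""

    def __init__(self) -> None:
        self.versions: Dict[str, CodeVersion] = {}
        self.sources: Dict[str, str] = {}

    def push(self, name: str, source: str,
             parent_id: Optional[str] = None) -> CodeVersion:
        if parent_id is not None and parent_id not in self.versions:
            raise KeyError(f"unknown parent version: {parent_id}")
        # Assumption: derive the code ID from the parent ID and the source text.
        digest = hashlib.sha1(f"{parent_id}:{source}".encode("utf-8"))
        version = CodeVersion(digest.hexdigest()[:12], name, parent_id, time.time())
        self.versions[version.code_id] = version
        self.sources[version.code_id] = source
        return version

    def history(self, code_id: str) -> List[str]:
        """Walk from a version back to the first one (newest first)."""
        chain: List[str] = []
        current: Optional[str] = code_id
        while current is not None:
            chain.append(current)
            current = self.versions[current].parent_id
        return chain


store = CodeStore()
v1 = store.push("clean.py", "print(1)")
v2 = store.push("clean.py", "print(2)", parent_id=v1.code_id)
chain = store.history(v2.code_id)
```

Because every version can only name an already stored version as its parent, the links cannot form a cycle, which matches the directed acyclic workflow the description calls for.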

Alternatively, the code version management system sends a code processing request according to the code pushed by the user, sends an execution command to the execution back-end engine so that it runs the pre-stored execution code, and records the resulting code version information after the execution code has performed an operation on the pre-stored data set.

Specifically, the code pushed by the user does not need to be configured with a corresponding execution back-end engine in advance. Instead, when the user selects previously pushed execution code while managing a data set, the code version management system generates a code processing request that specifies the corresponding execution back-end engine and starts that engine, so that it runs the pushed execution code; after the execution code has performed an operation on the data set, the resulting code version information is recorded.

For example, the user selects previously pushed code by entering parameters, code version information, and the like, in order to manage changes to the data in a data set. The code version management system determines the execution code to be run based on the received parameters, code version information, and the like, and generates a corresponding code processing request. The system core module sends an execution command to the corresponding execution back-end engine according to the code processing request. The execution back-end engine then runs the specified execution code according to the execution command, and after the execution code has performed an operation on the data set in the data management module, the resulting code version information is recorded. At least one of the code version information, the data workflow, and the data version information also records the correspondence between the executed code and the data set.

In one concrete implementation, whenever a user submits execution code to the Gitlab server, the Gitlab server notifies the code version management system through a Web hook. The code version management system pushes the user request onto its own queue, and at the same time takes requests from the head of the queue for processing; it copies the execution code of the request to the corresponding execution back-end engine, thereby notifying the selected engine. The execution back-end engine then runs the execution code with the parameters and inputs provided by the user. After the task ends, the code version management system records the details of the request, including the commit ID of the current push on the Gitlab server, the execution code version information, the parameters specified by the user, and any other information specific to the experiment. In some cases an experiment produces new data sets. The code version management system then also records the relationships between these data sets, i.e. the data workflow described in the foregoing embodiments.
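The queue handling in this implementation can be sketched roughly as below. This is a simplified, single-process illustration: the event fields (`commit_id`, `backend`, `code`, `params`, `inputs`) are hypothetical and are not the actual Gitlab Web hook payload, the execution code is represented as a plain Python callable, and `local_backend` stands in for an execution back-end engine.

```python
from collections import deque


def local_backend(code, params, inputs):
    """Stand-in for a single-machine execution back-end engine."""
    return code(inputs, **params)


class RequestQueue:
    """Queues push notifications and processes them from the head."""

    def __init__(self, backends):
        self.pending = deque()
        self.backends = backends   # backend name -> callable
        self.log = []              # recorded request details

    def on_push(self, event):
        # Called when a push notification arrives (hypothetical event dict).
        self.pending.append(event)

    def process_next(self):
        if not self.pending:
            return None
        event = self.pending.popleft()           # take the request at the head
        backend = self.backends[event["backend"]]
        output = backend(event["code"], event["params"], event["inputs"])
        record = {
            "commit_id": event["commit_id"],     # commit ID of the push
            "params": dict(event["params"]),     # user-specified parameters
            "inputs": list(event["inputs"]),
            "output": output,                    # possibly a new data set
        }
        self.log.append(record)
        return record


def scale(values, factor):
    """Example execution code: multiply every value by `factor`."""
    return [v * factor for v in values]


queue = RequestQueue({"local": local_backend})
queue.on_push({
    "commit_id": "abc123",
    "backend": "local",
    "code": scale,
    "params": {"factor": 2},
    "inputs": [1, 2, 3],
})
record = queue.process_next()
```

In a real deployment the queue would more likely be persistent and the back-end engine a separate process or cluster; the sketch only shows the order of steps: enqueue, dequeue from the head, run, record.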

It should be noted that the execution code that the code version management system copies to the corresponding execution back-end engine according to the data processing request may be new execution code submitted by the user, or stored execution code that is called. For example, the user specifies stored execution code to process the corresponding data set, rather than using the default execution code in the code management module.

In an alternative, the code version management system copies an execution code to the execution back-end engine according to the data processing request submitted by the user, sends an execution command to the execution back-end engine so that it runs the execution code to form a new data set, forms a code ID that associates the execution code with the new data set, and records the resulting code version information.

The data processing request includes but is not limited to: submitting a new data set, or modifying a data set stored in the data management module.

Specifically, on the basis of the data workflow and data version information formed by the data version management system described above, and after the code version management system has copied an execution code to the execution back-end engine and sent an execution command so that the engine runs the code to form a new data set, the code version management system also associates the ID of the executed code with the new data set and records the resulting code version information. In this way, when the user needs to analyze the execution history of multiple associated data sets, the information about the execution code corresponding to each data set is available. This gives the user more intermediate information for analyzing and compiling statistics on the data sets.
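The association between a new data set and the code ID that produced it can be pictured as a directed acyclic graph in which each data set records its parent data sets and the execution code used. The following sketch assumes that representation; `DatasetVersion`, `DataWorkflow`, and the example IDs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DatasetVersion:
    """Hypothetical node of the data workflow."""
    data_id: str
    parent_ids: Tuple[str, ...] = ()   # data sets this one was derived from
    code_id: Optional[str] = None      # execution code that produced it


class DataWorkflow:
    """Directed acyclic graph of data sets and the code that produced them."""

    def __init__(self) -> None:
        self.nodes: Dict[str, DatasetVersion] = {}

    def add(self, data_id: str, parent_ids: Tuple[str, ...] = (),
            code_id: Optional[str] = None) -> DatasetVersion:
        if data_id in self.nodes:
            raise ValueError(f"data ID already exists: {data_id}")
        for parent in parent_ids:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent data set: {parent}")
        # Parents must already exist, so no cycle can ever be created.
        node = DatasetVersion(data_id, tuple(parent_ids), code_id)
        self.nodes[data_id] = node
        return node

    def lineage(self, data_id: str) -> List[Tuple[str, Optional[str]]]:
        """Return (data ID, code ID) pairs for a data set and its ancestors."""
        seen = set()
        order: List[Tuple[str, Optional[str]]] = []
        stack = [data_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            node = self.nodes[current]
            order.append((node.data_id, node.code_id))
            stack.extend(node.parent_ids)
        return order


workflow = DataWorkflow()
workflow.add("raw")
workflow.add("clean", ("raw",), code_id="code-1")
workflow.add("features", ("clean",), code_id="code-2")
history = workflow.lineage("features")
```

Walking the lineage of a data set then yields, for each ancestor, the execution code that produced it, which is the kind of intermediate information the paragraph above refers to.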

As a preferred solution, the code version management method further includes a step of configuring multiple user UIs. In this step, multiple user UIs are configured to respectively receive requests submitted by different users or to feed requested information back to different users.

Specifically, users often need to analyze several data sets and compute certain metrics, such as the accuracy in natural language processing or the daily return on investment in stock market prediction. After the data workflow and the data version information of a data set have been created, the code version management system can provide the user with the data, execution code, and so on of each data set associated through the data workflow. The displayed data can help the user compare a pair of historical analysis results and show the differences in code and/or parameters. Providing multiple UI designs for multiple users can help each user find the best algorithm and parameters.

It should be noted that modules with the same name in the data version management system and the code version management system described in the embodiments of the present invention can be shared, so that the two version management systems can simultaneously manage the respective versions of data sets and execution code.

In summary, the code version management system provided by the present application can manage code versions within a single integrated system and run user code in that system. It retains both the user's code and data, and two versions can be compared to find their differences. In addition, by providing separate version management for data sets and code, providing a directed acyclic workflow for each data set and each code, and building the association between the two, the present invention effectively solves problems such as inefficient or disordered version management of data and code. The UI design gives users a convenient way to compare and analyze historical data sets. Furthermore, distributing the units across different servers helps relieve the workload on each server. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial utility.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes made by those with ordinary knowledge in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims

1. A data version management system, characterized by comprising:

a data management module storing at least one data set;

a code management module storing at least one execution code, the execution code being used to operate on the at least one data set;

an execution engine module configured with at least one execution back-end engine, which calls an execution back-end engine according to a received command and runs an execution code to perform an operation on at least one data set in the data management module; and

a system core module which, upon receiving a data processing request submitted by a user, processes the data set in the data management module, creates the data workflow of the data set, and records the resulting data version information.

2. The data version management system according to claim 1, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated through a data ID.

3. The data version management system according to claim 2, characterized in that: when the data processing request submitted by the user and received by the system core module is to submit a new data set, the system core module extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID that associates the data and the metadata of the new data set, creates the data workflow of the data set, and records the resulting data version information.

4. The data version management system according to claim 2, characterized in that: when the data processing request submitted by the user and received by the system core module is to modify a data set stored in the data management module, the system core module copies an execution code to the execution engine module according to the data processing request and sends an execution command to the execution engine module so that it runs the execution code to form a new data set; the system core module extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID that associates the data and the metadata of the new data set, forms a code ID that associates the execution code with the new data set, creates the data workflow of the data set, and records the resulting data version information.

5. The data version management system according to claim 4, characterized in that: the execution code that the system core module copies to the execution engine module according to the data processing request is new execution code submitted by the user, or execution code called from those stored in the code management module.

6. The data version management system according to claim 2, 3 or 4, characterized in that: the first storage unit is configured in the Hadoop distributed file system, and the second storage unit is configured in a NoSQL database.

7. The data version management system according to claim 1, 2, 3 or 4, characterized in that: the data workflow represents a subordinate relationship between at least two data sets, the subordinate relationship including the execution history and the version of a data set.

8. The data version management system according to claim 1, 2, 3 or 4, characterized by further comprising a user interface module configured with multiple user UIs, which respectively receive requests submitted by different users or feed requested information back to different users.

9. The data version management system according to claim 1, 2, 3 or 4, characterized in that: the data version information includes at least one of data set name, data ID, code ID, formation time, and running log.

10. The data version management system according to claim 1, 2, 3 or 4, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.

11. A data version management method, characterized in that the method comprises the following steps:

storing in advance at least one data set and at least one execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and

upon receiving a data processing request submitted by a user, calling an execution back-end engine to run the at least one execution code so as to process the at least one data set, creating the data workflow of the data set, and recording the resulting data version information.

12. The data version management method according to claim 11, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated through a data ID.

13. The data version management method according to claim 12, characterized in that: when the received data processing request submitted by the user is to submit a new data set, the data of the new data set is extracted and stored in the first storage unit, the metadata of the new data set is extracted and stored in the second storage unit, a new data ID is formed to associate the data and the metadata of the new data set, and the resulting data version information is recorded.

14. The data version management method according to claim 12, characterized in that: when the received data processing request submitted by the user is to modify a stored data set, an execution code is copied to the corresponding execution back-end engine according to the data processing request, and an execution command is sent to the execution back-end engine so that it runs the execution code to form a new data set; the data of the new data set is extracted and stored in the first storage unit, the metadata of the new data set is extracted and stored in the second storage unit, a new data ID is formed to associate the data and the metadata of the new data set, a code ID is formed to associate the execution code with the new data set, and the resulting data version information is recorded.

15. The data version management method according to claim 14, characterized in that: the execution code copied to the execution back-end engine according to the data processing request is new execution code submitted by the user, or pre-stored execution code that is called.

16. The data version management method according to claim 12, 13 or 14, characterized in that: the first storage unit is configured in the Hadoop distributed file system, and the second storage unit is configured in a NoSQL database.

17. The data version management method according to claim 10, 11, 12 or 13, characterized in that: the data workflow represents a subordinate relationship between at least two data sets, the subordinate relationship including the execution history and the version of a data set.

18. The data version management method according to claim 11, 12, 13 or 14, characterized by further comprising a step of configuring multiple user UIs, which respectively receive requests submitted by different users or feed requested information back to different users.

19. The data version management system according to claim 11, 12, 13 or 14, characterized in that: the data version information includes at least one of data set name, data ID, code ID, formation time, and running log.

20. The data version management system according to claim 11, 12, 13 or 14, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.

21. A code version management system, characterized by comprising:

a data management module storing at least one data set;

a code management module storing at least one execution code, the execution code being used to operate on at least one data set stored in the data management module; the code management module is further used to receive and store code pushed by a user, or to send a code processing request according to the code pushed by the user;

an execution engine module configured with at least one execution back-end engine, which, upon receiving an execution command, calls the execution back-end engine according to the execution command and runs an execution code to perform an operation on the data set in the data management module; and

a system core module used to record the code pushed by the user and form code version information, and, upon receiving the code processing request from the code management module, to send an execution command to the execution engine module so that it runs the execution code in the code management module, and to record the resulting code version information after the execution code has performed an operation on the data set in the data management module.

22. The code version management system according to claim 21, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated through a data ID.

23. The code version management system according to claim 21, characterized in that: the system core module is further used to copy an execution code to the execution engine module according to a data processing request submitted by the user, to send an execution command to the execution engine module so that it runs the execution code to form a new data set, to form a code ID that associates the execution code with the new data set, and to record the resulting code version information.

24. The code version management system according to claim 21, characterized in that: the execution code that the system core module copies to the execution engine module according to the data processing request is new execution code submitted by the user, or execution code called from those stored in the code management module.

25. The code version management system according to claim 21, 22, 23 or 24, characterized by further comprising a user interface module configured with multiple user UIs, which respectively receive requests submitted by different users or feed requested information back to different users.

26. The code version management system according to claim 21, 22, 23 or 24, characterized in that: the code version information includes at least one of code name, code ID, formation time, specified parameters, and running log.

27. The code version management system according to claim 21, 22, 23 or 24, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.

28. A code version management method, characterized by comprising the following steps:

receiving and storing code pushed by a user, and recording the resulting code version information; or sending a code processing request according to the code pushed by the user, sending an execution command to the execution back-end engine so that it runs the pre-stored execution code, and recording the resulting code version information after the execution code has performed an operation on the pre-stored data set.

29. The code version management method according to claim 28, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated through a data ID.

30. The code version management method according to claim 28, characterized by further comprising the step of: copying an execution code to the execution back-end engine according to a data processing request submitted by the user, sending an execution command so that the execution back-end engine runs the execution code to form a new data set, forming a code ID that associates the execution code with the new data set, and recording the resulting code version information.

31. The code version management method according to claim 28, characterized in that: the execution code copied to the execution back-end engine according to the data processing request is new execution code submitted by the user, or execution code called from those stored in the code management module.

32. The code version management method according to claim 28, 29, 30 or 31, characterized by further comprising a step of configuring multiple user UIs, which respectively receive requests submitted by different users or feed requested information back to different users.

33. The code version management method according to claim 28, 29, 30 or 31, characterized in that: the code version information includes at least one of code name, code ID, formation time, specified parameters, and running log.

34. The code version management method according to claim 28, 29, 30 or 31, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.