CN105956087A - Data and code version management system and method - Google Patents
Data and code version management system and method
- Publication number: CN105956087A
- Application number: CN201610282533.2A
- Authority: CN (China)
- Prior art keywords: data, code, data set, execution, module
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/219—Managing data history or versioning
Abstract
The invention provides a data and code version management system and method. The system comprises a data management module, a code management module, an execution engine module and a system core module. The data management module stores at least one data set. The code management module stores at least one execution code; it also receives and stores code pushed by a user, or sends a code processing request based on the pushed code. The execution engine module is configured with at least one execution back-end engine; based on a received execution instruction, it calls an execution back-end engine and runs an execution code to operate on at least one data set in the data management module. After receiving a data processing request submitted by the user, the system core module processes the data set in the data management module, establishes a data workflow for the data set, and records the resulting data version information and code version information. The system and method address problems such as inefficient and disorganized version management of data and code.
Description
Technical field
The present invention relates to the field of data analysis, and in particular to a data and code version management system and method.
Background technology
In recent years, people have collected large amounts of data. At the same time, data scientist has become one of the most sought-after jobs at major companies. However, there are currently not enough tools to help data scientists analyze data flows. As data science tasks grow more complex, many data analysts have begun adapting code versioning tools such as Git. Git, however, cannot fully handle data science tasks.
First, data science is data-centric. A data set may go through several operations such as cleaning, labeling and preprocessing, so it acquires multiple versions. Data scientists need to record these versions and revise the data at any time. A common but discouraged approach is to keep multiple copies, named for example data.csv, data-Version1.csv, data-final-version.csv and data-last-version.csv. This naming scheme is often confusing, and mixing up versions or data sets frequently causes mistakes.
Second, a machine learning model usually has many parameters, and training these parameters is one of the most common tasks in data science. Parameters such as the learning rate, initial values and regularization are often bewildering, and people soon forget their meaning and importance.
Third, as data sets grow larger, data scientists often need to build a distributed platform and repeat their experiments on it iteratively. They may also use third-party software packages. Unfortunately, installing and configuring these packages across different software and hardware environments is usually tedious.
Finally, sharing data sets and experience among data scientists is difficult. They can of course share their code and results, but this does not help them understand one another's data sets in depth or make full use of others' code and results.
MIT's DataHub project supports version control of data sets, but it cannot handle the whole process of data set analysis and evolution, so it is more a database management tool than a software development kit. Harvard Dataverse, on the other hand, is a data publishing and sharing platform, but it lacks version control and analysis functions for data.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a data and code version management system and method that solve problems such as inefficient or disorganized version management of data and code in the prior art.
To achieve the above and other related objects, a first aspect of the application provides a data version management system comprising a data management module, a code management module, an execution engine module and a system core module. The data management module stores at least one data set. The code management module stores at least one execution code, which operates on the at least one data set. The execution engine module is configured with at least one execution back-end engine; according to a received command, it calls an execution back-end engine and runs an execution code to operate on at least one data set in the data management module. When the system core module receives a data processing request submitted by a user, it processes the data set in the data management module, creates a data workflow for the data set, and records the resulting data version information.
In an embodiment of the application, the data of the data set are stored in a first storage unit, the metadata of the data set are stored in a second storage unit, and the data and metadata of the data set are associated through a data ID.
In an embodiment of the application, when the data processing request received by the system core module is to submit a new data set, the system core module extracts the data of the new data set and stores them in the first storage unit, extracts the metadata of the new data set and stores them in the second storage unit, forms a new data ID that associates the data and metadata of the new data set, creates the data workflow of the data set, and records the resulting data version information.
In an embodiment of the application, when the data processing request received by the system core module is to modify a data set stored in the data management module, the system core module copies an execution code into the execution engine module according to the data processing request and sends an execution command to the execution engine module, so that it runs the execution code to form a new data set. The system core module then extracts the data of the new data set and stores them in the first storage unit, extracts the metadata of the new data set and stores them in the second storage unit, forms a new data ID that associates the data and metadata of the new data set, forms a code ID that associates the execution code with the new data set, creates the data workflow of the data set, and records the resulting data version information.
Another aspect of the application provides a data version management method comprising the following steps: pre-storing at least one data set and at least one execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and, when a data processing request submitted by a user is received, calling an execution back-end engine to run the at least one execution code so as to process the at least one data set, creating a data workflow for the data set, and recording the resulting data version information.
In an embodiment of the application, the data of the data set are stored in a first storage unit, the metadata of the data set are stored in a second storage unit, and the data and metadata of the data set are associated through a data ID.
In an embodiment of the application, when the received data processing request is to submit a new data set, the data of the new data set are extracted and stored in the first storage unit, the metadata of the new data set are extracted and stored in the second storage unit, a new data ID is formed to associate the data and metadata of the new data set, and the resulting data version information is recorded.
In an embodiment of the application, when the received data processing request is to modify a data set stored in the data management module, an execution code is copied into the execution engine according to the data processing request, and an execution command is sent to the execution engine so that it runs the execution code to form a new data set. The data of the new data set are extracted and stored in the first storage unit, the metadata of the new data set are extracted and stored in the second storage unit, a new data ID is formed to associate the data and metadata of the new data set, a code ID is formed to associate the execution code with the new data set, and the resulting data version information is recorded.
A further aspect of the application provides a code version management system comprising a data management module, a code management module, an execution engine module and a system core module. The data management module stores at least one data set. The code management module stores at least one execution code for operating on at least one data set stored in the data management module; it also receives and stores code pushed by a user, or sends a code processing request according to the pushed code. The execution engine module is configured with at least one execution back-end engine; upon receiving an execution command, it calls an execution back-end engine according to the command and runs an execution code to operate on a data set in the data management module. The system core module records the code pushed by the user and forms code version information; upon receiving a code processing request from the code management module, it sends an execution command to the execution engine module so that it runs the execution code in the code management module, and it records the resulting code version information after the execution code has operated on a data set in the data management module.
In an embodiment of the application, the system core module also copies an execution code into the execution engine module according to a data processing request submitted by the user, sends an execution command to the execution engine module so that it runs the execution code to form a new data set, forms a code ID that associates the execution code with the new data set, and records the resulting code version information.
In an embodiment of the application, the execution code that the system core module copies into the execution engine module according to the data processing request is either new execution code submitted by the user or execution code retrieved from those stored in the code management module.
Yet another aspect of the application provides a code version management method comprising the following steps: pre-storing at least one data set and at least one execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and receiving and storing code pushed by a user and recording the resulting code version information, or sending a code processing request according to the code pushed by the user, sending an execution command to the execution back-end engine so that it runs the pre-stored execution code, and recording the resulting code version information after the execution code has operated on the pre-stored data set.
In an embodiment of the application, the data of the data set are stored in a first storage unit, the metadata of the data set are stored in a second storage unit, and the data and metadata of the data set are associated through a data ID.
In an embodiment of the application, the code version management method further includes the steps of copying an execution code into the execution back-end engine according to a data processing request submitted by the user, sending an execution command so that the engine runs the execution code to form a new data set, forming a code ID that associates the execution code with the new data set, and recording the resulting code version information.
In an embodiment of the application, the execution code copied into the execution back-end engine according to the data processing request is either new execution code submitted by the user or execution code retrieved from those stored in the code management module.
In an embodiment of the application, the code version management method further includes the step of configuring multiple user UIs, which respectively receive requests submitted by different users or feed request information back to different users.
As described above, the data and code version management system and method of the present invention have the following beneficial effects. By providing separate version management for data sets and code, providing a directed acyclic workflow for each data set and code, and building the association between the two, the invention effectively solves problems such as inefficient or disorganized version management of data and code. In addition, the UI design gives users a convenient way to compare and analyze historical data sets. Furthermore, distributing the units across different servers eases the workload on each server.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the data version management system of the present invention.
Fig. 2 is a flow chart of the data version management method of the present invention.
Fig. 3 is a schematic structural diagram of the code version management system of the present invention.
Fig. 4 is a flow chart of the code version management method of the present invention.
Fig. 5 is a schematic diagram of the composition of a data workflow in a specific embodiment of the present invention.
Description of reference numerals
1 data version management system
11 data management module
12 code management module
13 execution engine module
14 system core module
2 code version management system
21 data management module
22 code management module
23 execution engine module
24 system core module
S11~S12, S21~S22 steps
Detailed description of the invention
The embodiments of the present invention are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another.
It should be noted that the drawings provided with the following embodiments illustrate the basic concept of the invention only schematically. They show only the components relevant to the invention rather than the number, shape and size of the components in an actual implementation; in practice, the form, number and proportion of each component may vary freely, and the component layout may be more complex.
Embodiment one
Refer to Fig. 1, which shows a schematic structural diagram of the data version management system of the present invention. As shown in the figure, a first aspect of the application provides a data version management system, which can be deployed on a single server, a server cluster, a server based on a cloud computing architecture, or distributed servers. A server cluster brings many servers together to perform data version management jointly and can use multiple computers for parallel computation to increase computing speed. A server based on a cloud computing architecture pools the storage of each server through virtualization technology, so that the servers hosting the modules of the data version management system share computing resources. Distributed servers spread the data and programs of the data version management system across multiple servers, which run in coordination.
Each module of the data version management system can be deployed on any of the above kinds of server according to actual design needs. Specifically, the data version management system 1 includes a data management module 11, a code management module 12, an execution engine module 13 and a system core module 14.
The data management module 11 stores at least one data set. A data set is a collection of data under version management. The data include but are not limited to text data and/or multimedia data. In a specific embodiment, the text data are, for example, code data or system logs, and the multimedia data are, for example, image data or video data. If the data management module 11 holds multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and A3 is associated with A1 and A2 through an index or an association field.
A data set can also contain metadata for indexing or describing the data. Each item of data in the data set and its corresponding metadata can be associated through a data ID. Specifically, metadata, also called intermediary data or relay data, are data that describe data (data about data). They mainly describe the properties of the data and support functions such as indicating storage location, historical data, resource lookup and file records. Metadata serve as a kind of electronic catalog: to achieve the purpose of cataloging, the content or characteristics of the data must be described and collected, which in turn assists data retrieval.
In one alternative, the data of the data set are stored in a first storage unit, the metadata of the data set are stored in a second storage unit, and the data and metadata are associated through a data ID. The first and second storage units may be deployed on the same database server, or on different servers as needed. For example, in an optional embodiment, the first storage unit is deployed in the Hadoop distributed file system (a distributed file system for big data), and the second storage unit is deployed in a NoSQL database (a non-relational database).
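The split between the two storage units can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two in-memory dictionaries stand in for the Hadoop file system and the NoSQL database, and the class name, method names and the UUID-based data ID are hypothetical, not taken from the patent.

```python
import uuid


class DataManagementModule:
    """Toy stand-in: two dicts play the roles of the two storage units."""

    def __init__(self):
        self.first_storage = {}   # data ID -> raw data (HDFS in the embodiment)
        self.second_storage = {}  # data ID -> metadata (NoSQL in the embodiment)

    def put(self, data, metadata):
        """Store data and metadata separately and link them by one data ID."""
        data_id = uuid.uuid4().hex  # hypothetical ID scheme
        self.first_storage[data_id] = data
        self.second_storage[data_id] = dict(metadata)
        return data_id

    def get(self, data_id):
        """Fetch both halves of a data item through the shared data ID."""
        return self.first_storage[data_id], self.second_storage[data_id]

    def find_ids(self, **criteria):
        """Look up data IDs by metadata only, without touching the raw data."""
        return [
            data_id
            for data_id, meta in self.second_storage.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]
```

Queries such as `find_ids(filename="cat.jpg")` touch only the metadata store, which is the reason the text gives for keeping metadata apart from the bulk data.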
A distributed file system is designed on the client/server model: the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to the node through a computer network. The NoSQL database is, for example, a key-value store, a column store, a document database, a graph database, or MongoDB.
In this application, the distributed file system can scan a data set efficiently, but random access is inefficient. To solve this problem, the application stores the labels and annotations of each picture, such as file name, size and content description, in the NoSQL database to speed up queries; that is, the raw data and the metadata are linked through the data ID.
The data version management system of the application records each field of the data; for example, a new item of data is a set consisting of a new name and a version number. The specific embodiment uses MongoDB to store the metadata, but the application is not limited to this; in other implementations, the metadata may be moved to a column store, a key-value store, a document database or a graph database to improve efficiency.
The code management module 12 stores at least one execution code for operating on the at least one data set. When an execution code is called, it performs operations such as adding, deleting and modifying data sets and the data in them. For example, the execution code includes but is not limited to: execution code that adds a new data set; execution code that deletes a data set; execution code that adds labels, characters and the like to the data of a given data set; execution code that deletes labels, characters and the like from the data of a given data set; and execution code that replaces labels, characters and the like in the data of a given data set.
In an optional embodiment, the execution code is stored in, for example, GitLab, and the system interacts with it through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails that implements a self-hosted Git project repository; public or private projects can be accessed through a web interface. GitLab has functions similar to GitHub: it can browse source code and manage defects and comments. It can manage a team's access to repositories, makes it easy to browse submitted versions, and provides a file history. Team members can communicate through a built-in simple chat program (Wall). GitLab also provides a code snippet collection feature that makes code reuse easy and snippets convenient to look up later.
The execution engine module 13 is configured with at least one execution back-end engine. According to a received command, it calls an execution back-end engine and runs an execution code to operate on at least one data set in the data management module 11. Here, the execution back-end engine is chosen according to the programming language of each execution code. Execution back-end engines include single-machine engines and distributed engines.
The single-machine engine is, for example, Python on a single machine. Python is free software; its source code and the CPython interpreter are released under the Python Software Foundation License, which is compatible with the GPL (GNU General Public License). CPython first compiles the source code in a .py file into Python bytecode, which is then executed by the Python Virtual Machine.
The distributed engine is, for example, Spark on a cluster. Spark is a fast, general-purpose cluster computing framework whose core is written in Scala. It provides high-level APIs for Scala, Java and Python, with which parallel-processing applications can be developed easily.
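One way to picture how an execution command selects a back-end engine is a small registry that maps a back-end name to the command line that would launch the execution code. This is a sketch under assumptions: the back-end names `python-local` and `spark-cluster` and the function `build_command` are hypothetical, and the sketch only builds the command without running it. `spark-submit` is Spark's standard launcher script.

```python
import sys

# Hypothetical registry: back-end name -> function that builds a launch command.
BACKENDS = {
    # Single-machine engine: run the script with the current Python interpreter.
    "python-local": lambda script, args: [sys.executable, script, *args],
    # Distributed engine: hand the script to Spark's launcher.
    "spark-cluster": lambda script, args: ["spark-submit", script, *args],
}


def build_command(backend, script, args=()):
    """Return the argv list that would run `script` on the chosen back-end."""
    try:
        builder = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"no execution back-end engine named {backend!r}") from None
    return builder(script, list(args))
```

The resulting list can be passed to `subprocess.run` by whatever component plays the role of the execution engine module.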
In specific embodiments, configuring the execution back-end engine is necessary: it not only makes it easy for users to set up a distributed cluster environment, but also automatically links code with the resulting data sets. That is, by executing code, the data version management system of the application can reproduce any intermediate result, as long as the user has kept the original data and code.
When the system core module 14 receives a data processing request submitted by a user, it processes the data set in the data management module 11, creates the data workflow of the data set, and records the resulting data version information.
The data workflow (Data Work Flow, DWF) marks, during version management, the directed acyclic version relationships of data sets and/or of the data in them. For a new data set, the data workflow marks the corresponding version as a root node. For a data set or data with version updates, the data workflow represents the derivation relationship between two items of data in a data set and/or between two data sets. This relationship includes the execution history and the version of a data set. The execution history includes but is not limited to: the pointing relationship between the data before and after a change (i.e. the parent-child node relationship), the execution code called for the change, and the execution time. In short, the data workflow is the logical relationship among the data sets in the data version management system of the application, and it defines the dependencies among them. The data workflow is the core function through which the system reproduces data.
In a data workflow, a node represents a particular version of a data set. A directed edge connecting two nodes indicates that one data set is derived from the other. The label on an edge indicates the code version of that experiment. Refer to Fig. 5, which shows an example of a data workflow; a data workflow is a directed acyclic graph. Fig. 5 illustrates two common data workflow structures: one-to-one and many-to-one.
In the one-to-one structure, a data set is derived from one other data set. For example, a user can create a new data set based on an existing one, add new labels to it and share it with other users. The many-to-one structure indicates that a data set is derived from two or more data sets, as in operations that merge two data tables.
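The workflow structure described above (nodes as data set versions, directed edges labeled with a code version) can be sketched as a small graph class. The class and method names are hypothetical. The sketch keeps the graph acyclic by a simple rule that is an assumption of this example, not a statement from the patent: a derived version must be new, and all of its parents must already exist.

```python
from collections import defaultdict


class DataWorkflow:
    """Directed acyclic graph of data set versions."""

    def __init__(self):
        self.nodes = set()
        # child version -> list of (parent version, code version) edges
        self.parents = defaultdict(list)

    def add_root(self, version):
        """Register a newly submitted data set as a root node."""
        self.nodes.add(version)

    def derive(self, child, parents, code_version):
        """Record that `child` was produced from `parents` by `code_version`."""
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent version {parent!r}")
        if child in self.nodes:
            raise ValueError(f"version {child!r} already exists")
        self.nodes.add(child)
        for parent in parents:
            self.parents[child].append((parent, code_version))

    def lineage(self, version):
        """Return every ancestor version of `version`."""
        seen, stack, order = set(), [version], []
        while stack:
            for parent, _code in self.parents.get(stack.pop(), []):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    stack.append(parent)
        return order
```

A one-to-one derivation is a call to `derive` with a single parent, and a many-to-one derivation (such as a table merge) passes two or more parents.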
To implement the data workflow structure, the data version management system of the application introduces a parent-child node relationship attribute, which records the data sets from which a data set was derived. The system can also compare the differences between parent and child data sets. These functions make it easier for users to see what results their code changes produced. The data workflow graph therefore not only clarifies the relationships among data sets but also helps manage users' execution records, including reproducing results by version number.
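A parent-child comparison of the kind mentioned above could be as simple as the following sketch. It assumes, purely for illustration, that each data set version is available as a dictionary keyed by record name; the function name `diff_datasets` is hypothetical.

```python
def diff_datasets(parent, child):
    """Compare two data set versions given as {record key: record value} dicts."""
    added = {key: child[key] for key in child.keys() - parent.keys()}
    removed = {key: parent[key] for key in parent.keys() - child.keys()}
    changed = {
        key: (parent[key], child[key])
        for key in parent.keys() & child.keys()
        if parent[key] != child[key]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

For large data sets a real system would compare hashes or partitions rather than load both versions into memory; this sketch only shows the shape of the result.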
Based on the label information (such as ID values) of the data set and data before the change and on the data workflow, the system core module 14 obtains and records the data version information corresponding to the change. The data version information includes but is not limited to at least one of: data set name, data ID, code ID, creation time and run log.
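The fields named above map naturally onto a small immutable record. The following sketch is an assumption about one possible layout; the class name, field names and the ISO 8601 timestamp format are illustrative choices, not part of the patent.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Creation time as an ISO 8601 string in UTC (an illustrative choice)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class DataVersionInfo:
    """One recorded version of a data set; frozen so history cannot be edited."""

    dataset_name: str
    data_id: str
    code_id: str
    created_at: str = field(default_factory=_utc_now)
    run_log: str = ""
```

`asdict(DataVersionInfo(...))` yields a plain dictionary that could be written to a document store such as the MongoDB instance mentioned earlier.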
Here, the system core module 14 can capture the data processing request submitted by a user through a web page or submission interface. According to the request, it sends the execution engine module 13 the corresponding execution command together with the execution code selected from the code management module 12, so that the execution engine module 13 calls the corresponding execution back-end engine to run the selected execution code and thereby performs version management of the data set.
For example, in a specific implementation, whenever a user pushes execution code to the GitLab server, the GitLab server notifies the system core module 14 through a web hook. The system core module 14 pushes the user request onto its own queue while taking requests from the head of the queue for processing. It copies the execution code of the request to the execution engine module 13, which then runs the execution code with the parameters and inputs provided by the user. After the task ends, the system core module 14 records the information of the current request, including the commit ID of the current push on the GitLab server, the parameters specified by the user, and any information specific to the experiment. In some cases an experiment produces new data sets; the system core module 14 then also records the relationships among these data sets, i.e. the data workflow described above.
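The queueing behaviour described above can be sketched as follows. Everything here is a simplified assumption: `on_push` stands in for whatever handler receives the web hook, `run_code` stands in for the execution engine module, and the request is reduced to a commit ID plus user parameters.

```python
from collections import deque


class SystemCore:
    """Toy request queue: pushes enter at the tail, work is taken from the head."""

    def __init__(self, run_code):
        self.queue = deque()
        self.run_code = run_code  # stand-in for the execution engine module
        self.history = []         # recorded request information

    def on_push(self, commit_id, params):
        """Enqueue a request; called by the (hypothetical) web hook handler."""
        self.queue.append({"commit_id": commit_id, "params": params})

    def process_next(self):
        """Run the request at the head of the queue and record what happened."""
        if not self.queue:
            return None
        request = self.queue.popleft()
        output = self.run_code(request["commit_id"], request["params"])
        record = {**request, "output": output}
        self.history.append(record)
        return record
```

Because each history entry keeps the commit ID next to the parameters, a past run can be looked up by the code version that produced it.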
The data processing request submitted by a user can take several forms, which are described in turn below:
In one case, when the data processing request is to submit a new data set, the system core module 14 can save the new data set directly into the first storage unit, create the data workflow of the new data set, and record the resulting data version information. Alternatively, the system core module 14 selects the corresponding execution code from the code management module 12 according to the request and sends the execution engine module 13 the corresponding execution command for submitting the new data set. The execution engine module 13 executes the selected execution code according to the received command and stores the data of the new data set in the first storage unit. At the same time, the system core module 14 creates the data workflow of the data set and records the resulting data version information.
In an alternative, the new data set includes data and metadata. When selecting the execution code, the system core module 14 selects execution code capable of saving both data and metadata. By executing the selected execution code, the data and the metadata are extracted separately from the new data set; the extracted data is stored in the first storage unit and the extracted metadata in the second storage unit; a new data ID is formed to associate the data and the metadata of the new data set; the data workflow of the data set is created; and the data version information thus formed is recorded. Here, the data workflow contains the data, as the root node, and the corresponding metadata. The recorded data version information includes: dataset name, data ID, metadata ID, the correspondence between data ID and metadata ID, the code ID of the code that added the data set, formation time, and run log.
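
As a rough sketch of the record described above, the following assumes two in-memory dictionaries in place of the first and second storage units. The function name `submit_new_dataset`, the field names, and the use of random UUIDs as IDs are illustrative assumptions, not details given in the source.

```python
import uuid
from datetime import datetime, timezone

data_store = {}      # stand-in for the first storage unit (data)
metadata_store = {}  # stand-in for the second storage unit (metadata)
version_log = []     # recorded data version information


def submit_new_dataset(name, data, metadata, code_id):
    """Store data and metadata separately and record one version entry."""
    data_id = uuid.uuid4().hex
    metadata_id = uuid.uuid4().hex
    data_store[data_id] = data
    metadata_store[metadata_id] = metadata
    record = {
        "dataset_name": name,
        "data_id": data_id,
        "metadata_id": metadata_id,   # kept next to data_id: the correspondence
        "code_id": code_id,           # code that added this data set
        "parents": [],                # empty: a new data set is a root node
        "created_at": datetime.now(timezone.utc).isoformat(),
        "run_log": f"stored {len(data)} items",
    }
    version_log.append(record)
    return record


record = submit_new_dataset("pictures", ["cat.png", "dog.png"],
                            {"count": 2, "source": "upload"},
                            code_id="add-dataset-v1")
```

Keeping both IDs in a single record is one simple way to hold the data-to-metadata correspondence; a real deployment could equally use the data ID as the key in both stores.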
In another case, when the data processing request submitted by the user is to modify a data set stored in the data management module 11, the system core module 14 copies an execution code into the execution engine module 13 according to the data processing request and sends an execution command to the execution engine module 13, causing it to run the execution code and form a new data set. The new data set is then saved in the first storage unit; at the same time, the data workflow of the new data set relative to the data set before modification is created, and the data version information thus formed is recorded.
Optionally, the system core module 14 extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID to associate the data and the metadata of the new data set, forms a code ID to associate the execution code with the new data set, creates the data workflow of the data set, and records the data version information thus formed.
More preferably, the execution code that the system core module 14 copies to the execution engine module 13 according to the data processing request is either new execution code submitted by the user or execution code retrieved from those stored in the code management module 12.
Specifically, the user can also submit new execution code in advance according to their own needs, and the correspondence between the new execution code and the execution backend engines in the execution engine module 13 is adjusted either manually or by the system core module 14. Thus, when the data processing request submitted by the user is to modify a data set stored in the data management module 11, the system core module 14 determines, according to the data processing request, that the new execution code is to be copied into the execution engine module 13; the corresponding execution backend engine executes that code, the new data set is saved, the data workflow of the new data set relative to the data set before modification is created, and the data version information thus formed is recorded.
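
One way to picture the adjustable correspondence between execution code and execution backend engines is a small lookup table. The sketch below is an assumption-laden illustration: `run_local` and `run_chunked` are hypothetical backends (the chunked one only imitates a distributed engine), and `register_code` and `modify_dataset` are invented names.

```python
def run_local(func, records):
    """Hypothetical single-machine backend: one call over the whole data set."""
    return func(records)


def run_chunked(func, records, chunk_size=2):
    """Hypothetical stand-in for a distributed backend: process chunk by chunk."""
    out = []
    for start in range(0, len(records), chunk_size):
        out.extend(func(records[start:start + chunk_size]))
    return out


BACKENDS = {"local": run_local, "chunked": run_chunked}
code_registry = {}    # code ID -> callable submitted by the user
code_to_backend = {}  # code ID -> name of the backend that should run it


def register_code(code_id, func, backend="local"):
    if backend not in BACKENDS:
        raise KeyError(f"unknown backend: {backend}")
    code_registry[code_id] = func
    code_to_backend[code_id] = backend


def modify_dataset(code_id, parent_id, records):
    """Run the registered code on its mapped backend and describe the new version."""
    backend = BACKENDS[code_to_backend[code_id]]
    new_records = backend(code_registry[code_id], records)
    return {"parent": parent_id, "code_id": code_id, "records": new_records}


register_code("strip-v1", lambda rows: [r.strip() for r in rows])
v2 = modify_dataset("strip-v1", "names@v1", [" ann ", "bob ", " cy"])
code_to_backend["strip-v1"] = "chunked"   # adjust the code-to-backend mapping
v3 = modify_dataset("strip-v1", "names@v1", [" ann ", "bob ", " cy"])
```

Because the example code transforms each record independently, both backends produce the same new data set; only the mapping entry changes.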
When a user needs to analyze and compute over data sets of different versions, a preferred approach in this embodiment is that the data version management system 1 further includes a user interface module (not illustrated). The user interface module is configured with multiple user UIs, used respectively to receive requests submitted by different users or to return request information to different users.
Specifically, users often need to analyze data sets and compute certain quantities, such as the accuracy in natural language processing or the daily return on investment in stock market backtesting. After the data workflow and the data version information of the data sets have been created, the user interface module can present to the user the data in each data set associated through the data workflow. The presented data helps the user compare a pair of historical analysis results and shows the differences in code and/or parameters. Multiple UI designs for multiple kinds of users help each user arrive at the best algorithm and parameters.
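
The comparison of a pair of historical results can be illustrated with a small helper that lists what differs between two recorded runs. The record layout (`code_id`, `params`, `metric`) and the numbers are made up for illustration and are not taken from the source.

```python
def diff_runs(run_a, run_b):
    """List what changed between two recorded runs: code, parameters, metric."""
    changes = {}
    if run_a["code_id"] != run_b["code_id"]:
        changes["code_id"] = (run_a["code_id"], run_b["code_id"])
    for key in sorted(set(run_a["params"]) | set(run_b["params"])):
        old, new = run_a["params"].get(key), run_b["params"].get(key)
        if old != new:
            changes[f"params.{key}"] = (old, new)
    if run_a["metric"] != run_b["metric"]:
        changes["metric"] = (run_a["metric"], run_b["metric"])
    return changes


# Illustrative values only.
run_1 = {"code_id": "c1", "params": {"lr": 0.1, "epochs": 5}, "metric": 0.81}
run_2 = {"code_id": "c2", "params": {"lr": 0.05, "epochs": 5}, "metric": 0.84}
changes = diff_runs(run_1, run_2)
```

Unchanged parameters (here `epochs`) are omitted, so the result contains only what a user would need to inspect.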
In summary, the data version management system provided by the present application manages data versions within one integrated system and runs user code within that system; it retains both the user's code and data, and can compare two versions to find their differences. In addition, the data version management system of the present invention stores data and metadata separately, so that data can be filtered more efficiently. Furthermore, by providing separate version management for data sets and code, providing a directed acyclic workflow for each data set and code, and building the association between the two, the present invention effectively solves problems such as inefficient or chaotic version management of data and code. In addition, the multiple UI designs give users a convenient way to compare and analyze historical data sets. Finally, the units are distributed over different servers, which eases the workload on each server.
Embodiment two
Referring to Fig. 2, which shows a flow chart of the data version management method of the present invention, the second aspect of the present application provides a data version management method. The data version management method is mainly performed by a data version management system. The data version management system can be deployed on a single server, a server cluster, servers based on a cloud computing architecture, or distributed servers. A server cluster gathers many servers to perform data version management together and can use multiple computers for parallel computation to increase computing speed. Servers based on a cloud computing architecture pool the storage of the servers through virtualization technology, so that the servers hosting the modules of the data version management system share computing resources. Distributed servers spread the data and programs of the data version management system over multiple servers, which run in coordination.
Each module of the data version management system can be deployed on any of the above kinds of server according to actual design needs. The data version management system performs the method according to the following steps.
In step S11, at least one data set and at least one execution code for operating on the at least one data set are stored in advance, and at least one execution backend engine for running the execution code is configured.
Here, a data set is a collection of data under version management. The data includes, but is not limited to, text data and/or multimedia data. In a specific embodiment, examples of text data are code data, system logs, and the like; examples of multimedia data are image data, video data, and the like. If the data management module holds multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and is associated with A1 and A2 through an index or an association field.
A data set can also contain metadata for indexing or describing the data. Each piece of data in the data set and its corresponding metadata can be associated through a data ID. Specifically, metadata, also called intermediary data or relay data, is data that describes data (data about data); it mainly describes the attributes (properties) of data and supports functions such as indicating storage location, historical data, resource lookup, and file records. Metadata can be regarded as a kind of electronic catalogue: to serve the purpose of cataloguing, it must describe and collect the content or characteristics of the data, and thereby assist data retrieval.
Alternatively, the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated through a data ID. Here, the first storage unit and the second storage unit can be deployed on the same database server, or on different servers according to actual needs. For example, in an optional embodiment, the first storage unit is deployed in the Hadoop distributed file system (a distributed file system for big data), and the second storage unit is deployed in a NoSQL database (a non-relational database).
A distributed file system (Distributed File System) is designed on the client/server model: the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to the node through a computer network. The NoSQL database is, for example, a key-value (Key-Value) store, a column-store database, a document database, a graph (Graph) database, or a MongoDB database.
In the present application, the distributed file system allows a data set to be scanned efficiently, but random access is inefficient. To solve this problem, the scheme provided by the present application stores the identifier and annotations of each picture, such as file name, size, and content description, in the NoSQL database to speed up queries; the original data and the metadata are then linked through the data ID.
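
The idea of answering queries from the small metadata records and touching the bulk store only by data ID can be sketched as below. Both stores are plain in-memory structures standing in for the distributed file system and the NoSQL database; the field names and the helper names `find_data_ids` and `load` are assumptions.

```python
blob_store = {            # data ID -> raw content (stand-in for the file system)
    "d1": b"<cat image bytes>",
    "d2": b"<dog image bytes>",
    "d3": b"<large cat image bytes>",
}
metadata_index = [        # one small document per picture (stand-in for NoSQL)
    {"data_id": "d1", "filename": "cat.png", "size": 2048, "description": "a cat"},
    {"data_id": "d2", "filename": "dog.png", "size": 4096, "description": "a dog"},
    {"data_id": "d3", "filename": "cat_big.png", "size": 9000,
     "description": "a cat outdoors"},
]


def find_data_ids(keyword, min_size=0):
    """Filter on metadata only; no raw data is read in this step."""
    return [doc["data_id"] for doc in metadata_index
            if keyword in doc["description"] and doc["size"] >= min_size]


def load(data_ids):
    """Fetch raw data for the selected IDs from the bulk store."""
    return {data_id: blob_store[data_id] for data_id in data_ids}


big_cats = load(find_data_ids("cat", min_size=4000))
```

Only the one matching item is read from the bulk store; the two other pictures are excluded using metadata alone.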
The data version management system of the present application records each field of the data; for example, a new piece of data is a combination of a new name and a version number. The embodiment of the present application uses a MongoDB database to store the metadata, but is not limited to this; in other implementations, the metadata can also be migrated to a column-store database, a key-value store, a document database, or a graph (Graph) database to improve efficiency.
Here, the execution code is used to operate on the at least one data set. When called, the execution code performs operations such as adding, deleting, and modifying data sets and the data in them. For example, the execution code includes, but is not limited to: execution code that adds a new data set; execution code that deletes a data set; execution code that adds labels/characters to the data of a given data set; execution code that deletes labels/characters from the data of a given data set; and execution code that replaces labels/characters in the data of a given data set.
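
The label operations listed above might look like the following when written as small execution codes. This is a sketch under the assumption that each item is a dictionary with a `labels` list; every function returns a new data set rather than editing the old one in place, so that the earlier version stays available.

```python
def add_label(dataset, label, predicate=lambda item: True):
    """Return a new data set in which every matching item carries `label`."""
    return [{**item, "labels": sorted(set(item["labels"]) | {label})}
            if predicate(item) else item
            for item in dataset]


def delete_label(dataset, label):
    """Return a new data set with `label` removed from every item."""
    return [{**item, "labels": [l for l in item["labels"] if l != label]}
            for item in dataset]


def replace_label(dataset, old, new):
    """Return a new data set with label `old` renamed to `new`."""
    return [{**item, "labels": [new if l == old else l for l in item["labels"]]}
            for item in dataset]


v1 = [{"text": "good film", "labels": []},
      {"text": "bad film", "labels": ["neg"]}]
v2 = add_label(v1, "pos", lambda item: "good" in item["text"])
v3 = replace_label(v2, "neg", "negative")
v4 = delete_label(v3, "pos")
```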
In an optional embodiment, the execution code is stored in, for example, GitLab, and interaction takes place through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails; it implements a self-hosted Git project repository whose public or private projects can be accessed through a web interface. GitLab offers functions similar to GitHub: browsing source code and managing defects and comments. It can manage team access to repositories, makes it very easy to browse submitted versions, and provides a file history. Team members can communicate using the built-in simple chat program (Wall). GitLab also provides a code snippet collection function that makes code reuse easy and snippets convenient to look up when needed later.
Here, the execution backend engine is the programming language set up for each execution code. The execution backend engines include single-machine engines and distributed engines.
The single-machine engine is, for example, Python on a single machine. Python is free software; its source code and the CPython interpreter are released under a GPL (GNU General Public License)-compatible license. CPython first compiles the source code in a .py file into Python byte code (bytecode), and the Python Virtual Machine then executes the compiled byte code.
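
The two stages mentioned above, compilation to byte code and execution by the virtual machine, can be observed from within Python itself using the built-in `compile` and `exec` functions and the standard `dis` module. The source string below is an arbitrary example.

```python
import dis

source = "result = sum(x * x for x in range(4))"

# Stage 1: compile the source text into a code object holding byte code.
code_object = compile(source, "<example>", "exec")

# Stage 2: let the interpreter's virtual machine execute that byte code.
namespace = {}
exec(code_object, namespace)

# The byte code can be inspected; instruction names vary between Python versions.
opnames = [instruction.opname for instruction in dis.get_instructions(code_object)]
```

After execution, `namespace["result"]` holds 0 + 1 + 4 + 9 = 14.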
The distributed engine is, for example, Spark on a cluster. Spark is a fast, general-purpose cluster computing framework whose core is written in Scala; it provides high-level APIs for the Scala, Java, and Python programming languages, with which parallel applications can be developed easily.
In a specific embodiment, configuring the execution backend engines is necessary, because it not only makes it convenient to set up the distributed cluster environment for the user, but also allows code and result data sets to be linked automatically. That is, by executing code, the data version management system of the present application can obtain any intermediate result, as long as the user has kept the original data and the code.
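
The claim that any intermediate result can be obtained from the original data plus the code can be illustrated by replaying a recorded sequence of steps. This sketch assumes each step is a deterministic function of its input; `replay` and the step list are hypothetical.

```python
def replay(raw_data, steps, upto):
    """Recompute the data set produced after the first `upto` recorded steps."""
    data = list(raw_data)
    for _code_id, func in steps[:upto]:
        data = func(data)
    return data


# Recorded (code ID, code) pairs in the order they were run.
steps = [
    ("drop-negatives-v1", lambda rows: [x for x in rows if x >= 0]),
    ("double-v1", lambda rows: [x * 2 for x in rows]),
]
raw = [3, -1, 2]

after_step_1 = replay(raw, steps, 1)
after_step_2 = replay(raw, steps, 2)
```

Nothing but `raw` and `steps` needs to be kept; the intermediate data sets are recomputed on demand.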
In step S12, when a data processing request submitted by a user is received, an execution backend engine is called to run the at least one execution code so as to process the at least one data set; the data workflow of the data set is created, and the data version information thus formed is recorded.
Specifically, when the data version management system receives the data processing request submitted by the user, it processes the data set in the data management module, creates the data workflow of the data set, and records the data version information thus formed.
The data workflow (Data Work Flow, DWF) is a directed acyclic graph that marks data sets and/or the data within data sets in the course of version management. For a new data set, the corresponding mark in the data workflow is a root node. For a data set/data that has undergone a version update, the data workflow represents the derivation relationship between two pieces of data in a data set and/or between two data sets.
The derivation relationship includes the execution history and the version of a data set. The execution history includes, but is not limited to: the pointing relationship (i.e. the parent-child node relationship) between the data before and after the change, the execution code called for the change, the execution time, and so on. In short, the data workflow captures the logical relationships between data sets in the data version management system of the present application, and the data sets form their dependencies accordingly. The data workflow is the core feature by which the data version management system of the present application reproduces data.
In a data workflow, a node represents a particular version of a data set. A directed edge connecting two nodes indicates that one data set is derived from another. The label on the edge denotes the code version of the experiment concerned. Referring to Fig. 5, which shows an example of a data workflow: a data workflow is a directed acyclic graph. Fig. 5 shows the two common data workflow structures, one-to-one and many-to-one.
In the one-to-one structure, a data set is derived from one other data set. For example, a user can create a new data set based on an existing one, attach some new labels to the new data set, and share it with other users. The many-to-one structure indicates that a data set is derived from two or more data sets; operations such as merging two data tables are of this kind.
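
A data workflow with nodes for data set versions and labelled edges for code versions can be sketched as a small class. The class name `DataWorkflow`, its methods, and the `name@version` node naming are assumptions made for illustration. Because a child may only be added once and only with parents that already exist, the graph stays acyclic by construction.

```python
class DataWorkflow:
    """Directed acyclic graph: node = data set version, edge label = code version."""

    def __init__(self):
        self.parents = {}   # version -> list of (parent version, code version)

    def add_root(self, version):
        if version in self.parents:
            raise ValueError(f"{version} already exists")
        self.parents[version] = []

    def derive(self, child, parent_versions, code_version):
        if child in self.parents:
            raise ValueError(f"{child} already exists")
        missing = [p for p in parent_versions if p not in self.parents]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self.parents[child] = [(p, code_version) for p in parent_versions]

    def ancestors(self, version):
        """All versions from which `version` was directly or indirectly derived."""
        seen, stack = set(), [version]
        while stack:
            for parent, _code in self.parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


flow = DataWorkflow()
flow.add_root("A1@v1")
flow.add_root("A2@v1")
flow.derive("A1@v2", ["A1@v1"], code_version="relabel-c7")       # one-to-one
flow.derive("A3@v1", ["A1@v2", "A2@v1"], code_version="merge-c9")  # many-to-one
```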
To realize the above data workflow structures, the data version management system of the present application introduces the parent-child node relationship, an attribute that records which data set a given data set was derived from. The data version management system of the present application can also compare the differences between parent and child data sets. These functions make it easier for users to find out what results their code changes produced. Therefore, the structure graph of the data workflow not only clarifies the relationships between data sets, but also helps manage the user's execution records, including producing results by version number.
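
Comparing a parent data set with its child can be reduced, in the simplest case, to three sets of keys. The sketch assumes each data set version is a mapping from item key to value; the source does not specify how the comparison is implemented.

```python
def diff_datasets(parent, child):
    """Compare two data set versions given as {item key: value} mappings."""
    added = {k: child[k] for k in child.keys() - parent.keys()}
    removed = {k: parent[k] for k in parent.keys() - child.keys()}
    changed = {k: (parent[k], child[k])
               for k in parent.keys() & child.keys()
               if parent[k] != child[k]}
    return {"added": added, "removed": removed, "changed": changed}


parent = {"img1": "cat", "img2": "dog", "img3": "unknown"}
child = {"img1": "cat", "img3": "bird", "img4": "fish"}
difference = diff_datasets(parent, child)
```

The result shows at a glance what a code change did: `img4` was added, `img2` was removed, and the label of `img3` changed.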
The data version management system obtains and records the data version information formed after the corresponding change according to the label information (such as ID values) of the data set and the data before the change and according to the data workflow. The data version information includes, but is not limited to, at least one of: dataset name, data ID, code ID, formation time, and run log.
Here, the data version management system can capture, through a network/submission interface, the data processing request submitted by the user. According to the captured request, it sends to the execution engine module a corresponding execution command together with the execution code selected from the code management module, so that the execution engine module calls the corresponding execution backend engine to run the selected execution code and thereby performs version management of the data set.
For example, in one concrete implementation, whenever a user pushes execution code to the GitLab server, the GitLab server notifies the system core module 14 through a Web hook. The system core module 14 appends the user request to its own queue and, at the same time, takes requests from the head of the queue for processing. The system core module 14 copies the execution code of the request to the execution engine module 13, which then runs the execution code with the parameters and inputs provided by the user. After the task finishes, the system core module 14 records the information of the current request, including the commit ID of the current push on the GitLab server, the parameters specified by the user, and any other information specific to the experiment. In some cases an experiment produces a new data set; the system core module 14 then also records the relationship between these data sets, that is, the aforementioned data workflow.
The data processing request submitted by the user may fall into several cases, which are described separately below.
In one case, when the data processing request is to submit a new data set, the data version management system can save the new data set directly in the first storage unit, create the data workflow of the new data set, and record the data version information thus formed. Alternatively, the data version management system selects the corresponding execution code according to the data processing request, so as to store the data of the acquired new data set in the first storage unit. Meanwhile, the data version management system also creates the data workflow of the data set and records the data version information thus formed.
In an alternative, the new data set includes data and metadata. When selecting the execution code, the data version management system selects execution code capable of saving both data and metadata. By executing the selected execution code, the data and the metadata are extracted separately from the new data set; the extracted data is stored in the first storage unit and the extracted metadata in the second storage unit; a new data ID is formed to associate the data and the metadata of the new data set; the data workflow of the data set is created; and the data version information thus formed is recorded. Here, the data workflow contains the data, as the root node, and the corresponding metadata. The recorded data version information includes: dataset name, data ID, metadata ID, the correspondence between data ID and metadata ID, the code ID of the code that added the data set, formation time, and run log.
In another case, when the data processing request submitted by the user is to modify a stored data set, the data version management system copies an execution code into the corresponding execution backend engine according to the data processing request and sends an execution command to that execution backend engine, causing it to run the execution code and form a new data set. The new data set is then saved in the first storage unit; at the same time, the data workflow of the new data set relative to the data set before modification is created, and the data version information thus formed is recorded.
Optionally, the data version management system extracts the data of the new data set and stores it in the first storage unit, extracts the metadata of the new data set and stores it in the second storage unit, forms a new data ID to associate the data and the metadata of the new data set, forms a code ID to associate the execution code with the new data set, creates the data workflow of the data set, and records the data version information thus formed.
More preferably, the execution code that the data version management system copies to the execution backend engine according to the data processing request is either new execution code submitted by the user or execution code retrieved from those stored in the code management module.
Specifically, the user can also submit new execution code in advance according to their own needs, and the data version management system adjusts the correspondence between the new execution code and each execution backend engine. Thus, when the data processing request submitted by the user is to modify a stored data set, the data version management system determines, according to the data processing request, that the new execution code is to be copied into the corresponding execution backend engine; that execution backend engine executes the code, the new data set is saved, the data workflow of the new data set relative to the data set before modification is created, and the data version information thus formed is recorded.
When a user needs to analyze and compute over data sets of different versions, a preferred approach in this embodiment is that the data version management method further includes a step of configuring multiple user UIs, used respectively to receive requests submitted by different users or to return request information to different users.
Specifically, users often need to analyze data sets and compute certain quantities, such as the accuracy in natural language processing or the daily return on investment in stock market backtesting. After the data workflow and the data version information of the data sets have been created, the data version management system can present to the user the data in each data set associated through the data workflow. The presented data helps the user compare a pair of historical analysis results and shows the differences in code and/or parameters. Multiple UI designs for multiple kinds of users help each user arrive at the best algorithm and parameters.
In summary, the data version management system provided by the present application manages data versions within one integrated system and runs user code within that system; it retains both the user's code and data, and can compare two versions to find their differences. In addition, the data version management system of the present invention stores data and metadata separately, so that data can be filtered more efficiently. Furthermore, by providing separate version management for data sets and code, providing a directed acyclic workflow for each data set and code, and building the association between the two, the present invention effectively solves problems such as inefficient or chaotic version management of data and code. In addition, the multiple UI designs give users a convenient way to compare and analyze historical data sets. Finally, the units are distributed over different servers, which eases the workload on each server.
Embodiment three
Referring to Fig. 3, which shows a structural diagram of the code version management system of the present invention, the third aspect of the present application provides a code version management system. The code version management system can be deployed on a single server, a server cluster, servers based on a cloud computing architecture, or distributed servers. A server cluster gathers many servers to perform data version management together and can use multiple computers for parallel computation to increase computing speed. Servers based on a cloud computing architecture pool the storage of the servers through virtualization technology, so that the servers hosting the modules of the code version management system share computing resources. Distributed servers spread the data and programs of the code version management system over multiple servers, which run in coordination.
Each module of the code version management system can be deployed on any of the above kinds of server according to actual design needs. Specifically, the code version management system 2 includes: a data management module 21, a code management module 22, an execution engine module 23, and a system core module 24.
The data management module 21 stores at least one data set.
A data set is a collection of data under version management. The data includes, but is not limited to, text data and/or multimedia data. In a specific embodiment, examples of text data are code data, system logs, and the like; examples of multimedia data are image data, video data, and the like. If the data management module 21 holds multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and is associated with A1 and A2 through an index or an association field.
A data set can also contain metadata for indexing or describing the data. Each piece of data in the data set and its corresponding metadata can be associated through a data ID. Specifically, metadata, also called intermediary data or relay data, is data that describes data (data about data); it mainly describes the attributes (properties) of data and supports functions such as indicating storage location, historical data, resource lookup, and file records. Metadata can be regarded as a kind of electronic catalogue: to serve the purpose of cataloguing, it must describe and collect the content or characteristics of the data, and thereby assist data retrieval.
Alternatively, the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated through a data ID. Here, the first storage unit and the second storage unit can be deployed on the same database server, or on different servers according to actual needs. For example, in an optional embodiment, the first storage unit is deployed in the Hadoop distributed file system (a distributed file system for big data), and the second storage unit is deployed in a NoSQL database (a non-relational database).
A distributed file system (Distributed File System) is designed on the client/server model: the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to the node through a computer network. The NoSQL database is, for example, a key-value (Key-Value) store, a column-store database, a document database, a graph (Graph) database, or a MongoDB database.
In the present application, the distributed file system allows a data set to be scanned efficiently, but random access is inefficient. To solve this problem, the scheme provided by the present application stores the identifier and annotations of each picture, such as file name, size, and content description, in the NoSQL database to speed up queries; the original data and the metadata are then linked through the data ID.
The code version management system of the present application records each field of the data; for example, a new piece of data is a combination of a new name and a version number. The embodiment of the present application uses a MongoDB database to store the metadata, but is not limited to this; in other implementations, the metadata can also be migrated to a column-store database, a key-value store, a document database, or a graph (Graph) database to improve efficiency.
The code management module 22 stores at least one execution code, which is used to operate on the at least one data set stored in the data management module 21.
When called, the execution code performs operations such as adding, deleting, and modifying data sets and the data in them. For example, the execution code includes, but is not limited to: execution code that adds a new data set; execution code that deletes a data set; execution code that adds labels/characters to the data of a given data set; execution code that deletes labels/characters from the data of a given data set; and execution code that replaces labels/characters in the data of a given data set.
In an optional embodiment, the execution code is stored in, for example, GitLab, and is accessed through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails that implements a self-hosted Git project repository; public or private projects can be accessed through a web interface. GitLab offers functions similar to GitHub: it can browse source code and manage defects and comments. It can manage team access to repositories, makes it easy to browse submitted versions, and provides a file history. Team members can communicate through a built-in simple chat facility (Wall). GitLab also provides a code snippet collection feature that makes code reuse easy and simplifies later lookup.
In addition, the code management module 22 is also used to receive and store code pushed by a user, or to send a code processing request based on code pushed by a user.
In one case, the code management module 22 receives and stores the code pushed by the user.
Specifically, the user uploads code to the code management module 22 from a user terminal, and the code management module 22 saves the received code. The code is execution code that the user has created or adapted according to the API provided by the code version management system 2. For example, the user improves execution code already in the code management module 22 and uploads the improved execution code to the code management module 22 through the system core module 24, described in detail below; the code management module 22 then updates and saves the received code.
In another case, the code management module 22 sends a code processing request based on the code pushed by the user.
Here, the code pushed by the user does not need to be configured in advance in the system core module 24 for use with the execution engine module 23. When the user selects previously pushed execution code to manage a data set, the code management module 22 sends a code processing request to the system core module 24 to inform it of the corresponding execution back-end engine.
The execution engine module 23 is configured with at least one execution back-end engine. On receiving an execution command, it calls one of the execution back-end engines according to the command and runs a piece of execution code to perform an operation on a data set in the data management module 21.
Here, the execution back-end engine is chosen according to the programming language of each piece of execution code. The execution back-end engines include a standalone (single-machine) engine and a distributed engine.
The standalone engine is, for example, Python on a single machine. Python is free software; its source code and the CPython interpreter are distributed under an open-source license compatible with the GPL (GNU General Public License). The interpreter first compiles the source code in a .py file into Python bytecode, which is then executed by the Python Virtual Machine.
The distributed engine is, for example, Spark on a cluster. Spark is a fast, general-purpose cluster computing framework whose core is written in Scala. It provides high-level APIs for the Scala, Java and Python programming languages, which make it easy to develop parallel applications.
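Selecting between a standalone engine and a distributed engine could be sketched as below. This is a hedged illustration, not the patented implementation: the engine names "standalone" and "distributed" and the function build_command are assumptions, and the sketch only builds the command line (the local Python interpreter, or Spark's spark-submit launcher) without running it.

```python
import sys

# Hypothetical mapping from engine name to the launcher that runs a script.
ENGINE_COMMANDS = {
    "standalone": [sys.executable],   # Python on a single machine
    "distributed": ["spark-submit"],  # Spark on a cluster
}


def build_command(engine, script_path, args=()):
    """Return the command line that would run script_path on the chosen engine."""
    if engine not in ENGINE_COMMANDS:
        raise ValueError(f"unknown execution back-end engine: {engine}")
    return [*ENGINE_COMMANDS[engine], script_path, *args]


local_cmd = build_command("standalone", "job.py")
cluster_cmd = build_command("distributed", "job.py", ["--n", "3"])
```

Keeping the engine choice in a single lookup table means a new back-end engine can be added without changing the code that issues execution commands.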
In a specific embodiment, configuring the execution back-end engine is necessary, because it not only spares the user from setting up a distributed cluster environment, but also allows code and result data sets to be linked automatically. That is, by executing code, the code version management system of this application can reproduce any intermediate result, as long as the user has kept the original data and code.
The system core module 24 is used to record the code pushed by the user and form code version information; and, on receiving a code processing request from the code management module 22, to send an execution command to the execution engine module 23 so that it runs the execution code in the code management module 22, and to record the code version information formed after the execution code has performed an operation on a data set in the data management module 21.
The functions of the system core module 24 are described separately here.
In the first mode, when the user pushes code, the system core module 24 passes the received code to the code management module 22 for saving and, at the same time, forms the code version information for that code. Similar to the data version information in the embodiments above, the code version information includes but is not limited to at least one of the following: code name, code ID, creation time, specified parameters and run log. The system core module 24 manages code versions in the same way that it uses a data set workflow to manage data sets. For example, when the user improves stored execution code, the code version management system 2 updates and saves the received code. At the same time, the code version management system also records the correspondence between the execution code before and after the modification to form a code workflow, and determines and records the modified code version information on the basis of the code version information before the modification.
In the other mode, when the user selects pushed code to manage a data set, the system core module 24, based on the code processing request from the code management module 22, sends an execution command to the execution engine module 23 so that it runs the execution code in the code management module 22, and records the code version information formed after the execution code has performed an operation on a data set in the data management module 21. For example, the user selects previously pushed code by entering parameters, code version information and the like in order to change the data in a data set; the system core module 24 then supplies the received parameters, code version information and the like to the code management module 22. The code management module 22 determines the execution code to run from this information and sends the corresponding code processing request to the system core module 24. The system core module 24 sends an execution command to the execution engine module 23 according to the code processing request. The execution engine module 23 then selects the corresponding execution back-end engine according to the execution command and runs the specified execution code, and after the execution code has performed an operation on the data set in the data management module 21, the code version information formed is recorded. At least one of the code version information, the data workflow and the data version information also records the correspondence between the executed code and the data set.
In a concrete implementation, whenever the user submits execution code to the GitLab server, the GitLab server notifies the system core module 24 through a web hook. The system core module 24 pushes the user request onto its own queue while taking requests from the head of the queue for processing. The system core module 24 copies the execution code of the request to the execution engine module 23 and sends a code processing request to the execution engine module 23 to inform it of the selected execution back-end engine. The execution engine module 23 then runs the execution code with the parameters and input provided by the user. When the task finishes, the system core module 24 records the information of the current request, including the commit ID of the current push on the GitLab server, the execution code version information, the parameters specified by the user, and any information specific to the experiment. In some cases an experiment produces new data sets; the system core module 24 then also records the relationships among these data sets, that is, the data workflow described in the preceding embodiments.
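The request flow in this implementation (notification, queueing, execution, recording) can be sketched as follows. The sketch rests on stated assumptions: the classes PushRequest and SystemCore and their fields are hypothetical, the web hook is represented by a plain method call, and the execution engine module is replaced by a callable passed in by the caller.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PushRequest:
    commit_id: str   # commit ID reported by the Git server
    engine: str      # hypothetical engine name, e.g. "python" or "spark"
    params: dict     # parameters specified by the user


@dataclass
class SystemCore:
    queue: deque = field(default_factory=deque)
    records: list = field(default_factory=list)

    def on_webhook(self, request: PushRequest) -> None:
        """Called when the Git server reports a push: enqueue the request."""
        self.queue.append(request)

    def process_next(self, run_engine) -> dict:
        """Take the request at the head of the queue, run it, and record it."""
        request = self.queue.popleft()
        output = run_engine(request)  # stands in for the execution engine module
        record = {
            "commit_id": request.commit_id,
            "engine": request.engine,
            "params": request.params,
            "output": output,
        }
        self.records.append(record)
        return record


core = SystemCore()
core.on_webhook(PushRequest("abc123", "python", {"threshold": 0.5}))
core.on_webhook(PushRequest("def456", "spark", {"threshold": 0.7}))
first = core.process_next(lambda req: f"ran {req.commit_id} on {req.engine}")
```

Requests are handled in arrival order, and each record keeps the commit ID together with the parameters, so a run can later be matched to the exact code that produced it.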
It should be noted that the execution code that the system core module 24 copies to the execution engine module 23 according to the data processing request may be, besides new execution code submitted by the user, called execution code stored in the code management module 22. For example, the user designates execution code in the code management module 22 to process the corresponding data set, rather than using the default execution code in the code management module 22.
In an alternative, the system core module 24 is also used to copy a piece of execution code to the execution engine module 23 according to a data processing request submitted by the user, to send an execution command to the execution engine module 23 so that it runs the execution code to form a new data set, to form a code ID that associates the execution code with the new data set, and to record the code version information formed.
The data processing request includes but is not limited to: submitting a new data set, or modifying a data set stored in the data management module 21.
Specifically, the data version management system described above forms a data workflow and data version information. Building on this, the system core module 24 in this embodiment copies a piece of execution code to the execution engine module 23 and sends an execution command to the execution engine module 23 so that it runs the execution code to form a new data set; in addition, it associates the ID of the executed code with the new data set and records the code version information formed. Thus, when the user needs to analyze the execution history of several associated data sets, the information about the execution code corresponding to each data set is available. This gives the user more intermediate information for analyzing and compiling statistics on the data sets.
As a preferred scheme, the code version management system 2 also includes a user interface module configured with multiple user UIs, which respectively receive requests submitted by different users or feed requested information back to different users.
Specifically, users often need to analyze some data sets and compute certain metrics, such as accuracy in natural language processing or daily return on investment in stock market prediction. After the data workflow and data version information of a data set have been created, the user interface module can present to the user the data, execution code and so on of each data set associated through the data workflow. The presented data help the user compare a pair of historical analysis results and show the differences in code and/or parameters. Multiple UI designs for multiple users help each user find the best algorithm and parameters.
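Comparing the parameters of two historical runs, as described above, might look like the following sketch. The function diff_params and the parameter names are hypothetical and chosen only for illustration; a missing parameter is reported as None.

```python
def diff_params(old, new):
    """Return {name: (old value, new value)} for every parameter that differs."""
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


run_a = {"learning_rate": 0.1, "epochs": 10}
run_b = {"learning_rate": 0.05, "epochs": 10, "dropout": 0.2}
changes = diff_params(run_a, run_b)
```

A user interface could render such a dictionary next to the two results, so that a change in accuracy can be traced to the parameters that changed.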
In summary, the code version management system provided by this application can manage code versions in one integrated system and run user code within that system. It retains both the user's code and data, and can compare two versions to find their differences. In addition, the code version management system of the present invention stores data and metadata separately, so that data can be filtered more efficiently. Furthermore, by providing version management for both data sets and code, providing each data set and piece of code with a directed acyclic workflow, and building the association between the two, the present invention effectively solves problems such as inefficient or chaotic version management of data and code. Moreover, the multiple UI designs give users a convenient way to compare and analyze historical data sets. Finally, distributing the units across different servers helps relieve the workload on each server.
Embodiment four
Refer to Fig. 4, be shown as the flow chart of the code release management method of the present invention, as it can be seen, the of the application
Four aspects are to provide a kind of code release management method.Described code release management method is mainly managed system by code release
Perform.Wherein, described code release management system can be only fitted to single server, server cluster, based on cloud computing frame
In the server of structure or distributed server.Wherein, described server cluster refers to get up a lot of server centered carry out together
Data version management, described server cluster can utilize multiple computer to carry out parallel computation, to improve arithmetic speed.Described
Server based on cloud computing framework passes through Intel Virtualization Technology by each server storage pool so that in code release management system
Each module place server is shared and is calculated resource.Described distributed server is by the data in described code release management system
It is dispersed on multiple server with program and carries out coordinated operation.
Each module of the code version management system can be deployed on any of the above kinds of server according to actual design needs. Specifically, the code version management system carries out the method by performing the following steps.
In step S21, at least one data set and at least one piece of execution code for operating on the at least one data set are stored in advance, and at least one execution back-end engine for running the execution code is configured.
The data set is a collection of data under version management. The data include but are not limited to text data and/or multimedia data. In a specific embodiment, the text data are, for example, code data and system logs, and the multimedia data are, for example, image data and video data. If the data management module 21 stores multiple data sets, the data sets may be unrelated or related to one another. For example, among data sets A1, A2 and A3, data set A3 is derived from data sets A1 and A2, and is associated with data sets A1 and A2 through an index or an association field.
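The derivation relationship in this example (A3 derived from A1 and A2) can be held as a small directed acyclic graph. The sketch below is an assumption about one possible representation, a dictionary mapping each data set to the data sets it was derived from; the function name ancestors is hypothetical.

```python
# Each data set maps to the data sets it was derived from (its parents).
parents = {"A1": [], "A2": [], "A3": ["A1", "A2"]}


def ancestors(name, graph):
    """Return every data set that `name` was directly or indirectly derived from."""
    seen = set()
    stack = list(graph[name])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


a3_sources = ancestors("A3", parents)
```

Walking the graph in this way answers the question of which original data sets a derived data set depends on.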
The data set may also contain metadata for indexing or describing the data. Each piece of data in the data set and its corresponding metadata can be associated through a data ID. Specifically, metadata, also called intermediary data or relay data, are data that describe data (data about data). They mainly describe the attributes (properties) of the data and support functions such as indicating storage location, historical data, resource lookup and file records. Metadata serve as a kind of electronic catalog: to support cataloging, they must describe and capture the content or characteristics of the data, and thereby assist data retrieval.
In an alternative, the data of the data set are stored in a first storage unit, the metadata of the data set are stored in a second storage unit, and the data and metadata of the data set are associated through a data ID. Here, the first storage unit and the second storage unit may be configured in the same database server, or deployed on different servers according to actual needs. For example, in an optional embodiment, the first storage unit is configured in the Hadoop distributed file system (a distributed file system for big data), and the second storage unit is configured in a NoSQL (non-relational) database.
The distributed file system (Distributed File System) is designed on the client/server model; that is, the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to the node through a computer network. The NoSQL database is, for example, a key-value store, a column-store database, a document database, a graph database, or a MongoDB database.
In this application, a distributed file system can scan a data set efficiently, but random access to it is inefficient. To solve this problem, the scheme provided by this application stores the identifiers and annotations of each item (for example, the file name, size and content description of each picture) in the NoSQL database to speed up queries; that is, the raw data and the metadata are linked by the data ID.
The code version management system of this application records every field of the data; for example, a new piece of data is a set consisting of a new name and a version number. The embodiment described here uses a MongoDB database to store the metadata, but is not limited to this; in other implementations, to improve efficiency, the metadata may be migrated to a column-store database, a key-value store, a document database or a graph database.
The execution code describes operations such as adding, deleting and modifying a data set or the data in a data set. For example, the execution code includes but is not limited to: execution code that adds a new data set, execution code that deletes a data set, execution code that adds labels/characters to the data in a preset data set, execution code that deletes labels/characters from the data in a preset data set, and execution code that replaces labels/characters in the data in a preset data set.
In an optional embodiment, the execution code is stored in, for example, GitLab, and is accessed through the GitLab API. GitLab is an open-source version management system built with Ruby on Rails that implements a self-hosted Git project repository; public or private projects can be accessed through a web interface. GitLab offers functions similar to GitHub: it can browse source code and manage defects and comments. It can manage team access to repositories, makes it easy to browse submitted versions, and provides a file history. Team members can communicate through a built-in simple chat facility (Wall). GitLab also provides a code snippet collection feature that makes code reuse easy and simplifies later lookup.
Here, the execution back-end engine is chosen according to the programming language of each piece of execution code. The execution back-end engines include a standalone (single-machine) engine and a distributed engine.
The standalone engine is, for example, Python on a single machine. Python is free software; its source code and the CPython interpreter are distributed under an open-source license compatible with the GPL (GNU General Public License). The interpreter first compiles the source code in a .py file into Python bytecode, which is then executed by the Python Virtual Machine.
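The two stages named above, compilation to bytecode and execution by the virtual machine, can be observed directly from Python itself. The sketch uses only the built-in compile and exec functions and the standard dis module; the one-line source string is an arbitrary example.

```python
import dis

source = "result = 2 + 3\n"

# Stage 1: compile() turns source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# Stage 2: exec() hands the code object to the interpreter's virtual machine.
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])

# dis.dis() lists the bytecode instructions; the exact opcodes vary by Python version.
dis.dis(code_obj)
```

The raw bytecode is available as the bytes object code_obj.co_code, which is what a cached .pyc file preserves between runs.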
The distributed engine is, for example, Spark on a cluster. Spark is a fast, general-purpose cluster computing framework whose core is written in Scala. It provides high-level APIs for the Scala, Java and Python programming languages, which make it easy to develop parallel applications.
In a specific embodiment, configuring the execution back-end engine is necessary, because it not only spares the user from setting up a distributed cluster environment, but also allows code and result data sets to be linked automatically. That is, by executing code, the code version management system of this application can reproduce any intermediate result, as long as the user has kept the original data and code.
In step S22, code pushed by a user is received and stored, and the code version information formed is recorded; or a code processing request is sent based on the code pushed by the user, an execution command is sent to the execution back-end engine so that it runs the pre-stored execution code, and the code version information formed is recorded after the execution code has performed an operation on the pre-stored data set.
The code version management system receives and stores the code pushed by the user and records the code version information formed, specifically as follows:
The user uploads code to the code version management system from a user terminal; the code management module saves the received code and at the same time forms the code version information for that code. The code may be execution code that the user has created or adapted according to the API provided by the code version management system. In addition, similar to the data version information in the preceding embodiments, the code version information includes but is not limited to at least one of the following: code name, code ID, creation time, specified parameters and run log. The code version management system manages code versions in the same way that a data set workflow is used to manage data sets.
For example, when the user improves stored execution code, the code version management system updates and saves the received code. At the same time, the code version management system also records the correspondence between the execution code before and after the modification to form a code workflow, and determines and records the modified code version information on the basis of the code version information before the modification.
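Recording the correspondence between a piece of code before and after modification can be sketched as a chain of version records, each pointing to its predecessor. This is an illustration under stated assumptions: the functions commit and history are hypothetical, and a truncated SHA-1 hash of the source text merely stands in for the code ID.

```python
import hashlib
import time

versions = {}  # code ID -> version record


def commit(name, source, parent_id=None):
    """Save a code version and link it to the version it was modified from."""
    code_id = hashlib.sha1(source.encode("utf-8")).hexdigest()[:12]
    versions[code_id] = {
        "name": name,
        "code_id": code_id,
        "parent": parent_id,
        "created": time.time(),
    }
    return code_id


def history(code_id):
    """Follow parent links from a version back to the first one."""
    chain = []
    while code_id is not None:
        chain.append(code_id)
        code_id = versions[code_id]["parent"]
    return chain


v1 = commit("clean.py", "print('v1')")
v2 = commit("clean.py", "print('v2')", parent_id=v1)
chain = history(v2)
```

The parent links play the role of the code workflow: starting from any version, its full modification history can be recovered.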
Alternatively, the code version management system sends a code processing request based on the code pushed by the user, sends an execution command to the execution back-end engine so that it runs the pre-stored execution code, and records the code version information formed after the execution code has performed an operation on the pre-stored data set.
Specifically, the code pushed by the user does not need to be configured in advance with the corresponding execution back-end engine. Instead, when the user selects previously pushed execution code to manage a data set, the code version management system generates a code processing request that identifies the corresponding execution back-end engine and starts that engine so that it runs the pushed execution code; after the execution code has performed an operation on a data set, the code version information formed is recorded.
For example, the user selects previously pushed code by entering parameters, code version information and the like in order to change the data in a data set. The code version management system determines the execution code to run from the received parameters, code version information and the like, and generates the corresponding code processing request. The system core module sends an execution command to the corresponding execution back-end engine according to the code processing request. The execution back-end engine then runs the specified execution code according to the execution command, and after the execution code has performed an operation on the data set in the data management module, the code version information formed is recorded. At least one of the code version information, the data workflow and the data version information also records the correspondence between the executed code and the data set.
In a concrete implementation, whenever the user submits execution code to the GitLab server, the GitLab server notifies the code version management system through a web hook. The code version management system pushes the user request onto its own queue while taking requests from the head of the queue for processing, and copies the execution code of the request to the corresponding execution back-end engine to inform the selected execution back-end engine. The execution back-end engine then runs the execution code with the parameters and input provided by the user. When the task finishes, the code version management system records the information of the specific request, including the commit ID of the current push on the GitLab server, the execution code version information, the parameters specified by the user, and any information specific to the experiment. In some cases an experiment produces new data sets; the code version management system then also records the relationships among these data sets, that is, the data workflow described in the preceding embodiments.
It should be noted that the execution code that the code version management system copies to the corresponding execution back-end engine according to the data processing request may be, besides new execution code submitted by the user, called execution code that has already been stored. For example, the user designates stored execution code to process the corresponding data set, rather than using the default execution code in the code management module.
In an alternative, the code version management system copies a piece of execution code to the execution back-end engine according to a data processing request submitted by the user, sends an execution command to the execution back-end engine so that it runs the execution code to form a new data set, forms a code ID that associates the execution code with the new data set, and records the code version information formed.
The data processing request includes but is not limited to: submitting a new data set, or modifying a data set stored in the data management module.
Specifically, the data version management system described above forms a data workflow and data version information, and the code version management system copies a piece of execution code to the execution back-end engine and sends an execution command to the execution back-end engine so that it runs the execution code to form a new data set. Building on this, the code version management system also associates the ID of the executed code with the new data set and records the code version information formed. Thus, when the user needs to analyze the execution history of several associated data sets, the information about the execution code corresponding to each data set is available. This gives the user more intermediate information for analyzing and compiling statistics on the data sets.
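Associating the ID of the executed code with the data set it produces can be sketched as a provenance record written at the moment the new data set is formed. The function run_and_record, the ID scheme and the sample data are all hypothetical and serve only to illustrate the association.

```python
datasets = {"raw-v1": ["  Hello ", "WORLD  "]}  # data set ID -> data
provenance = {}                                  # data set ID -> how it was produced


def run_and_record(code_id, transform, input_id):
    """Run execution code on a data set and link the result to the code ID."""
    new_id = f"{input_id}/{code_id}"  # hypothetical ID scheme
    datasets[new_id] = [transform(item) for item in datasets[input_id]]
    provenance[new_id] = {"code_id": code_id, "input": input_id}
    return new_id


clean_id = run_and_record("strip-lower-01", lambda s: s.strip().lower(), "raw-v1")
```

Because the input data set is left untouched and the record names both the input and the code, any derived data set can be regenerated or audited later.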
As a preferred scheme, the code version management method also includes a step of configuring multiple user UIs. In this step, multiple user UIs are configured to respectively receive requests submitted by different users or to feed requested information back to different users.
Specifically, users often need to analyze some data sets and compute certain metrics, such as accuracy in natural language processing or daily return on investment in stock market prediction. After the data workflow and data version information of a data set have been created, the code version management system can present to the user the data, execution code and so on of each data set associated through the data workflow. The presented data help the user compare a pair of historical analysis results and show the differences in code and/or parameters. Multiple UI designs for multiple users help each user find the best algorithm and parameters.
It should be noted that the modules of the data version management system and of the code version management system described in the embodiments of the present invention can be shared where they bear the same name, so that the two version management systems can together manage the respective versions of data sets and execution code.
In summary, the code version management system provided by this application can manage code versions in one integrated system and run user code within that system. It retains both the user's code and data, and can compare two versions to find their differences. In addition, by providing version management for both data sets and code, providing each data set and piece of code with a directed acyclic workflow, and building the association between the two, the present invention effectively solves problems such as inefficient or chaotic version management of data and code. Moreover, the UI design gives users a convenient way to compare and analyze historical data sets. Finally, distributing the units across different servers helps relieve the workload on each server. Therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.
Claims (34)
1. A data version management system, characterized by comprising:
a data management module storing at least one data set;
a code management module storing at least one piece of execution code, the execution code being used to operate on the at least one data set;
an execution engine module configured with at least one execution back-end engine, which calls an execution back-end engine according to a received command and runs a piece of execution code to perform an operation on at least one data set in the data management module; and
a system core module which, on receiving a data processing request submitted by a user, processes the data set in the data management module, creates a data workflow of the data set, and records the data version information formed.
2. The data version management system according to claim 1, characterized in that: the data of the data set are stored in a first storage unit, the metadata of the data set are stored in a second storage unit, and the data and metadata of the data set are associated through a data ID.
Data version management system the most according to claim 2, it is characterised in that: described system core module receives use
When the data processing request that family is submitted to is for submitting a new data set to, described system core module extracts the number of described new data set
According to and be stored in described first memory element, extract the metadata of described new data set and be stored in described second memory element,
And formation one new data ID associates data and the metadata of described new data set, create the datamation stream of described data set, and
The data version information that record is formed.
Data version management system the most according to claim 2, it is characterised in that: described system core module receives use
When the data processing request that family is submitted to is the data set revising storage in described data management module, described system core mould
Block performs Code copying in described enforcement engine module according to described data processing request by one, and send an execution order to
Described enforcement engine module makes its described execution code of operation to form a new data set, and described system core module is extracted described
The data of new data set are also stored in described first memory element, extract the metadata of described new data set and are stored in described
Two memory element, and form data and the metadata of a new data ID described new data set of association, and formation one code ID will
Described execution code is associated with described new data set, creates the datamation stream of described data set, and records the data of formation
Version information.
Data version management system the most according to claim 4, it is characterised in that: described system core module is according to described
Data processing request be copied to described enforcement engine module perform that code is that user submits to new perform code or call
The execution code of storage in described code administration module.
6. The data version management system according to claim 2, 3 or 4, characterized in that: the first storage unit is configured in a Hadoop distributed file system, and the second storage unit is configured in a NoSQL database.
7. The data version management system according to claim 1, 2, 3 or 4, characterized in that: the data workflow represents a secondary relationship between at least two data sets, the secondary relationship including the execution history and the versions of a data set.
8. The data version management system according to claim 1, 2, 3 or 4, characterized by further comprising a user interface module configured with a plurality of user UIs for respectively receiving requests submitted by different users or feeding back request information to different users.
9. The data version management system according to claim 1, 2, 3 or 4, characterized in that: the data version information includes at least one of a data set name, a data ID, a code ID, a formation time and a running log.
10. The data version management system according to claim 1, 2, 3 or 4, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.
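As an illustration only, the storage and versioning behaviour recited in claims 2 to 4 and 9 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the names `SystemCore`, `DataVersionInfo`, `submit` and `modify` are hypothetical, plain dictionaries stand in for the first storage unit (HDFS in claim 6) and the second storage unit (a NoSQL database), and an execution code is modelled as a Python callable.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataVersionInfo:
    """One data version record; the field names are illustrative."""
    dataset_name: str
    data_id: str
    code_id: Optional[str]    # None when the data set was submitted directly
    parent_id: Optional[str]  # data ID of the data set it was derived from
    formed_at: float
    run_log: str = ""


class SystemCore:
    """Hypothetical stand-in for the system core module."""

    def __init__(self) -> None:
        self.data_store: Dict[str, list] = {}   # stand-in for the first storage unit
        self.meta_store: Dict[str, dict] = {}   # stand-in for the second storage unit
        self.versions: List[DataVersionInfo] = []
        self._ids = itertools.count(1)

    def _store(self, name, rows, code_id=None, parent_id=None, log=""):
        rows = list(rows)
        data_id = f"data-{next(self._ids)}"     # new data ID linking data and metadata
        self.data_store[data_id] = rows
        self.meta_store[data_id] = {"name": name, "row_count": len(rows)}
        self.versions.append(
            DataVersionInfo(name, data_id, code_id, parent_id, time.time(), log)
        )
        return data_id

    def submit(self, name: str, rows: list) -> str:
        """Request type of claim 3: submit a new data set."""
        return self._store(name, rows)

    def modify(self, data_id: str, code_id: str, code: Callable[[list], list]) -> str:
        """Request type of claim 4: run an execution code to form a new data set."""
        new_rows = code(self.data_store[data_id])
        name = self.meta_store[data_id]["name"]
        return self._store(name, new_rows, code_id, data_id,
                           log=f"ran {code_id} on {data_id}")


core = SystemCore()
raw_id = core.submit("sales", [1, 2, 3])
doubled_id = core.modify(raw_id, "code-1", lambda rows: [r * 2 for r in rows])
```

In this sketch a modification never overwrites the stored data set; it always produces a new data ID, which is what allows every historical version to remain available for comparison.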
11. A data version management method, characterized in that the method comprises the following steps:
pre-storing at least one data set and at least one execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and
upon receiving a data processing request submitted by a user, calling an execution back-end engine to run the at least one execution code so as to process the at least one data set, creating a data workflow of the data set, and recording the resulting data version information.
12. The data version management method according to claim 11, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated by a data ID.
13. The data version management method according to claim 12, characterized in that: when the received data processing request submitted by the user is a request to submit a new data set, the data of the new data set is extracted and stored in the first storage unit, the metadata of the new data set is extracted and stored in the second storage unit, a new data ID is formed to associate the data and the metadata of the new data set, and the resulting data version information is recorded.
14. The data version management method according to claim 12, characterized in that: when the received data processing request submitted by the user is a request to modify a stored data set, an execution code is copied into the corresponding execution back-end engine according to the data processing request, and an execution command is sent to the execution back-end engine so that it runs the execution code to form a new data set; the data of the new data set is extracted and stored in the first storage unit, the metadata of the new data set is extracted and stored in the second storage unit, a new data ID is formed to associate the data and the metadata of the new data set, a code ID is formed to associate the execution code with the new data set, and the resulting data version information is recorded.
15. The data version management method according to claim 14, characterized in that: the execution code copied into the execution back-end engine according to the data processing request is either a new execution code submitted by the user or a pre-stored execution code that is called.
16. The data version management method according to claim 12, 13 or 14, characterized in that: the first storage unit is configured in a Hadoop distributed file system, and the second storage unit is configured in a NoSQL database.
17. The data version management method according to claim 10, 11, 12 or 13, characterized in that: the data workflow represents a secondary relationship between at least two data sets, the secondary relationship including the execution history and the versions of a data set.
18. The data version management method according to claim 11, 12, 13 or 14, characterized by further comprising a step of configuring a plurality of user UIs for respectively receiving requests submitted by different users or feeding back request information to different users.
19. The data version management system according to claim 11, 12, 13 or 14, characterized in that: the data version information includes at least one of a data set name, a data ID, a code ID, a formation time and a running log.
20. The data version management system according to claim 11, 12, 13 or 14, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.
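The data workflow of claims 7 and 17, which relates data sets through their execution history and versions, can be pictured as a lineage graph. The following Python sketch is only one possible reading: the `Step` record, the `trace_history` helper and the sample IDs are hypothetical, and each data set is assumed to have at most one parent.

```python
from typing import Dict, List, NamedTuple, Optional


class Step(NamedTuple):
    """How a data set came to exist: its parent data set and the code that produced it."""
    parent_id: Optional[str]
    code_id: Optional[str]


def trace_history(workflow: Dict[str, Step], data_id: str) -> List[str]:
    """Walk from a data set back to the originally submitted data set."""
    history = []
    current: Optional[str] = data_id
    while current is not None:
        step = workflow[current]
        if step.code_id is None:
            history.append(current)  # submitted directly, no producing code
        else:
            history.append(f"{current} (from {step.parent_id} via {step.code_id})")
        current = step.parent_id
    return history


# Hypothetical workflow: data-1 was submitted, then transformed twice.
workflow = {
    "data-1": Step(None, None),
    "data-2": Step("data-1", "code-1"),
    "data-3": Step("data-2", "code-2"),
}
lineage = trace_history(workflow, "data-3")
```

Because each step stores both the parent data ID and the code ID, a user interface built on such a structure could show which code version produced which data version.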
21. A code version management system, characterized by comprising:
a data management module storing at least one data set;
a code management module storing at least one execution code, the execution code being used to operate on at least one data set stored in the data management module; the code management module is further used to receive and store code pushed by a user, or to send a code processing request according to the code pushed by the user;
an execution engine module configured with at least one execution back-end engine which, upon receiving an execution command, calls the execution back-end engine according to the execution command and runs an execution code to perform an operation on the data set in the data management module; and
a system core module for recording the code pushed by the user and forming code version information, and, upon receiving the code processing request from the code management module, sending an execution command to the execution engine module so that it runs the execution code in the code management module, and recording the resulting code version information after the execution code has performed the operation on the data set in the data management module.
22. The code version management system according to claim 21, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated by a data ID.
23. The code version management system according to claim 21, characterized in that: the system core module is further used to copy an execution code into the execution engine module according to a data processing request submitted by the user, to send an execution command to the execution engine module so that it runs the execution code to form a new data set, to form a code ID that associates the execution code with the new data set, and to record the resulting code version information.
24. The code version management system according to claim 21, characterized in that: the execution code that the system core module copies to the execution engine module according to the data processing request is either a new execution code submitted by the user or an execution code called from those stored in the code management module.
25. The code version management system according to claim 21, 22, 23 or 24, characterized by further comprising a user interface module configured with a plurality of user UIs for respectively receiving requests submitted by different users or feeding back request information to different users.
26. The code version management system according to claim 21, 22, 23 or 24, characterized in that: the code version information includes at least one of a code name, a code ID, a formation time, specified parameters and a running log.
27. The code version management system according to claim 21, 22, 23 or 24, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.
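To make the code version information of claims 21 and 26 concrete, the sketch below records one entry each time a user pushes code. It is an assumption-laden illustration: `CodeManager`, `CodeVersionInfo` and `push` are hypothetical names, and deriving the code ID from a SHA-1 hash of the source is a choice made here for the example, not something the claims specify.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CodeVersionInfo:
    """Fields mirror claim 26: code name, code ID, formation time, parameters, running log."""
    code_name: str
    code_id: str
    formed_at: float
    parameters: Dict[str, str]
    run_log: str = ""


class CodeManager:
    """Hypothetical stand-in for the code management module."""

    def __init__(self) -> None:
        self.sources: Dict[str, str] = {}          # code ID -> stored source text
        self.history: List[CodeVersionInfo] = []   # one record per push

    def push(self, code_name: str, source: str,
             parameters: Optional[Dict[str, str]] = None) -> CodeVersionInfo:
        # Assumption: identical source text maps to the same code ID.
        code_id = hashlib.sha1(source.encode("utf-8")).hexdigest()[:12]
        self.sources[code_id] = source
        info = CodeVersionInfo(code_name, code_id, time.time(), dict(parameters or {}))
        self.history.append(info)
        return info


manager = CodeManager()
first = manager.push("clean.py", "rows = [r for r in rows if r]")
second = manager.push("clean.py", "rows = [r for r in rows if r is not None]")
repeat = manager.push("clean.py", "rows = [r for r in rows if r]", {"mode": "strict"})
```

With a content-derived ID, pushing unchanged code again yields the same code ID while still adding a history entry, so the record of who ran what and with which parameters is preserved.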
28. A code version management method, characterized by comprising the following steps:
pre-storing at least one data set and at least one execution code for operating on the at least one data set, and configuring at least one execution back-end engine for running the execution code; and
receiving and storing code pushed by a user, and recording the resulting code version information; or sending a code processing request according to the code pushed by the user, sending an execution command to the execution back-end engine so that it runs the pre-stored execution code, and recording the resulting code version information after the execution code has performed the operation on the pre-stored data set.
29. The code version management method according to claim 28, characterized in that: the data of the data set is stored in a first storage unit, the metadata of the data set is stored in a second storage unit, and the data and the metadata of the data set are associated by a data ID.
30. The code version management method according to claim 28, characterized by further comprising the step of: copying an execution code into the execution back-end engine according to a data processing request submitted by the user, sending an execution command instructing it to run the execution code to form a new data set, forming a code ID that associates the execution code with the new data set, and recording the resulting code version information.
31. The code version management method according to claim 28, characterized in that: the execution code copied into the execution back-end engine according to the data processing request is either a new execution code submitted by the user or an execution code called from those stored in the code management module.
32. The code version management method according to claim 28, 29, 30 or 31, characterized by further comprising a step of configuring a plurality of user UIs for respectively receiving requests submitted by different users or feeding back request information to different users.
33. The code version management method according to claim 28, 29, 30 or 31, characterized in that: the code version information includes at least one of a code name, a code ID, a formation time, specified parameters and a running log.
34. The code version management method according to claim 28, 29, 30 or 31, characterized in that: the execution back-end engine includes a single-machine engine and a distributed engine.
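The two branches of claim 28 (store the pushed code only, or also run it on a back-end engine and log the run) together with the engine choice of claim 34 can be sketched as follows. All names here (`ENGINES`, `handle_code_push`, the engine functions) are hypothetical, and the "distributed" engine is only simulated in-process by splitting the rows into chunks; it assumes the execution code works row by row.

```python
from typing import Callable, Dict, List, Optional

Rows = List[int]
Code = Callable[[Rows], Rows]


def single_machine_engine(code: Code, rows: Rows) -> Rows:
    """Run the execution code on the whole data set at once."""
    return code(rows)


def distributed_engine(code: Code, rows: Rows, partitions: int = 2) -> Rows:
    """Simulated distribution: run the code per chunk and concatenate the results."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    out: Rows = []
    for start in range(0, len(rows), size):
        out.extend(code(rows[start:start + size]))
    return out


ENGINES: Dict[str, Callable[[Code, Rows], Rows]] = {
    "single": single_machine_engine,
    "distributed": distributed_engine,
}


def handle_code_push(records: List[dict], code_name: str, code: Code,
                     data: Optional[Rows] = None,
                     engine: str = "single") -> Optional[Rows]:
    """Record a pushed code; if data is given, also execute it and log the run."""
    record = {"code_name": code_name,
              "code_id": f"code-{len(records) + 1}",
              "run_log": ""}
    result = None
    if data is not None:
        result = ENGINES[engine](code, data)  # the "execution command"
        record["run_log"] = f"{engine} engine: {len(data)} rows in, {len(result)} rows out"
    records.append(record)
    return result


def double(rows: Rows) -> Rows:
    return [r * 2 for r in rows]


records: List[dict] = []
stored_only = handle_code_push(records, "double", double)
executed = handle_code_push(records, "double", double, data=[1, 2, 3], engine="distributed")
```

Keeping the engines behind a single lookup table means the caller selects a back end by name, which is one simple way to read "calls an execution back-end engine according to a received command".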
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610282533.2A CN105956087B (en) | 2016-04-29 | 2016-04-29 | Data version management system and method |
CN201910359068.1A CN110119393B (en) | 2016-04-29 | 2016-04-29 | Code version management system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610282533.2A CN105956087B (en) | 2016-04-29 | 2016-04-29 | Data version management system and method |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910359068.1A Division CN110119393B (en) | 2016-04-29 | 2016-04-29 | Code version management system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105956087A (en) | 2016-09-21 |
CN105956087B (en) | 2019-08-30 |
Family
ID=56914515
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610282533.2A Active CN105956087B (en) | 2016-04-29 | 2016-04-29 | Data version management system and method |
CN201910359068.1A Active CN110119393B (en) | 2016-04-29 | 2016-04-29 | Code version management system and method |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910359068.1A Active CN110119393B (en) | 2016-04-29 | 2016-04-29 | Code version management system and method |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN105956087B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170756A (en) * | 2017-12-22 | 2018-06-15 | 南京邮电大学 | The implementation method of various dimensions, various visual angles and visualization annotation system based on Git warehouses |
CN108228231A (en) * | 2018-01-08 | 2018-06-29 | 南京邮电大学 | A kind of visualization shift algorithm of Git warehouses file annotation system |
CN108268275A (en) * | 2017-06-12 | 2018-07-10 | 平安普惠企业管理有限公司 | Software version control method and software version control device |
CN109032592A (en) * | 2018-08-23 | 2018-12-18 | 常熟市盛铭信息技术有限公司 | A kind of method that software code is shared mutually |
CN109302448A (en) * | 2018-08-27 | 2019-02-01 | 华为技术有限公司 | A kind of data processing method and device |
CN110059096A (en) * | 2019-03-16 | 2019-07-26 | 平安城市建设科技(深圳)有限公司 | Data version management method, apparatus, equipment and storage medium |
CN111198711A (en) * | 2020-01-13 | 2020-05-26 | 陕西心像信息科技有限公司 | Collection version control method and system based on MongoDB |
CN111221566A (en) * | 2019-12-28 | 2020-06-02 | 华为技术有限公司 | Method and device for combining multiple and changeable versions of software code |
CN111506779A (en) * | 2020-04-20 | 2020-08-07 | 东云睿连(武汉)计算技术有限公司 | Object version and associated information management method and system facing data processing |
CN112698866A (en) * | 2021-01-06 | 2021-04-23 | 中国科学院软件研究所 | Code line life cycle tracing method based on Git and electronic device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1592291A (en) * | 2003-08-28 | 2005-03-09 | 国际商业机器公司 | Method and system for managing service state data |
CN101076793A (en) * | 2004-08-31 | 2007-11-21 | 国际商业机器公司 | System structure for enterprise data integrated system |
US20070282927A1 (en) * | 2006-05-31 | 2007-12-06 | Igor Polouetkov | Method and apparatus to handle changes in file ownership and editing authority in a document management system |
CN101770608A (en) * | 2008-12-26 | 2010-07-07 | 新奥特(北京)视频技术有限公司 | Management method and device of engineering versions |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020078205A1 (en) * | 2000-11-17 | 2002-06-20 | Lloyd Nolan | Resource control facility |
US20030005408A1 (en) * | 2001-07-02 | 2003-01-02 | Pradeep Tumati | System and method for creating software modifiable without halting its execution |
CN101246420A (en) * | 2007-12-29 | 2008-08-20 | 中国建设银行股份有限公司 | Method and system for multi-language system implementing unified development |
CN101276279B (en) * | 2008-05-21 | 2010-12-08 | 天柏宽带网络科技(北京)有限公司 | Unified development system and method |
CN103049268B (en) * | 2012-12-25 | 2016-08-03 | 中国科学院深圳先进技术研究院 | A kind of application and development based on Naplet management system |
CN103729195B (en) * | 2014-01-15 | 2017-04-05 | 北京奇虎科技有限公司 | A kind of control method and system of software version |
CN103970579B (en) * | 2014-05-29 | 2017-05-03 | 中国银行股份有限公司 | Application version deploying method and application version deploying device |
CN105094851A (en) * | 2015-09-06 | 2015-11-25 | 浪潮软件股份有限公司 | Method for momentarily issuing codes based on Git |
Non-Patent Citations (1)
Title |
---|
李涛: "基于着色时间工作流网的产品数据管理***的研究", 《中国优秀博硕士学位论文全文数据库 (博士)信息科技辑》 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268275A (en) * | 2017-06-12 | 2018-07-10 | 平安普惠企业管理有限公司 | Software version control method and software version control device |
CN108170756B (en) * | 2017-12-22 | 2021-12-03 | 南京邮电大学 | Implementation method of multidimensional, multi-view and visual annotation system based on Git warehouse |
CN108170756A (en) * | 2017-12-22 | 2018-06-15 | 南京邮电大学 | The implementation method of various dimensions, various visual angles and visualization annotation system based on Git warehouses |
CN108228231B (en) * | 2018-01-08 | 2021-07-27 | 南京邮电大学 | Visualization drifting method of Git warehouse file annotation system |
CN108228231A (en) * | 2018-01-08 | 2018-06-29 | 南京邮电大学 | A kind of visualization shift algorithm of Git warehouses file annotation system |
CN109032592A (en) * | 2018-08-23 | 2018-12-18 | 常熟市盛铭信息技术有限公司 | A kind of method that software code is shared mutually |
CN109302448A (en) * | 2018-08-27 | 2019-02-01 | 华为技术有限公司 | A kind of data processing method and device |
CN110059096A (en) * | 2019-03-16 | 2019-07-26 | 平安城市建设科技(深圳)有限公司 | Data version management method, apparatus, equipment and storage medium |
CN111221566B (en) * | 2019-12-28 | 2021-10-22 | 华为技术有限公司 | Method and device for combining multiple and changeable versions of software code |
CN111221566A (en) * | 2019-12-28 | 2020-06-02 | 华为技术有限公司 | Method and device for combining multiple and changeable versions of software code |
CN111198711A (en) * | 2020-01-13 | 2020-05-26 | 陕西心像信息科技有限公司 | Collection version control method and system based on MongoDB |
CN111198711B (en) * | 2020-01-13 | 2023-02-28 | 陕西心像信息科技有限公司 | Collection version control method and system based on MongoDB |
CN111506779B (en) * | 2020-04-20 | 2021-03-16 | 东云睿连(武汉)计算技术有限公司 | Object version and associated information management method and system facing data processing |
CN111506779A (en) * | 2020-04-20 | 2020-08-07 | 东云睿连(武汉)计算技术有限公司 | Object version and associated information management method and system facing data processing |
CN112698866A (en) * | 2021-01-06 | 2021-04-23 | 中国科学院软件研究所 | Code line life cycle tracing method based on Git and electronic device |
CN112698866B (en) * | 2021-01-06 | 2022-06-17 | 中国科学院软件研究所 | Code line life cycle tracing method based on Git and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN105956087B (en) | 2019-08-30 |
CN110119393A (en) | 2019-08-13 |
CN110119393B (en) | 2021-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20190603; Address after: 710077 Room 101, Block B, Yunhui Valley, 156 Tiangu Eighth Road, New Town, Yuhua Street Software, Xi'an High-tech Zone, Shaanxi Province; Applicant after: Cross Information Core Technology Research Institute (Xi'an) Co., Ltd.; Address before: 100084 Qinghua Garden, Haidian District, Haidian District, Beijing; Applicant before: Tsinghua University |
TA01 | Transfer of patent application right | ||
GR01 | Patent grant | ||