Summary of the Invention
To solve the problems of existing upload methods, such as low upload efficiency and repeated uploading, the present invention discloses a data uploading method and system. Approaching the problem from the handling of data files, the invention fragments each data file and manages fragment integrity, then performs an integrity check on the transmitted data fragments and on the received file. This achieves high-speed transfer of large data and incremental data transfer, thereby improving upload efficiency and solving problems such as the excessive system resources consumed when uploading batches of small files.
To achieve the above technical purpose, the invention discloses a data uploading method comprising the following steps:
Step 1: read a data file;
Step 2: perform a data fragmentation operation on the data file to generate data fragments;
Step 3: perform an integrity management operation on the data fragments;
Step 4: upload the data file by uploading the data fragments;
Step 5: after the data fragments are received, verify the integrity of all received data fragments.
Through data fragmentation, integrity management, fragment upload and integrity verification, the present invention solves the low upload efficiency of conventional transfer processes. In particular, for uploads of batches of small files and of large files, it greatly improves the upload efficiency of large-scale data. In addition, the invention effectively avoids repeated uploading of data and, because uploads complete promptly, allows supercomputing jobs to be submitted in real time.
Further, in step 1, it is determined whether the data file is being uploaded for the first time: if so, all data fragments are uploaded in step 4; if not, only the data fragments that have changed are uploaded in step 4.
In large-scale remote supercomputing, repeatedly uploading client data causes many useless operations that occupy the computing and network resources of the client or server and lower upload efficiency. To address this, for a data file that has already been uploaded the present invention uploads only the data fragments that have changed. That is, the invention provides an incremental file upload mode based on a task-to-file feature value list and a one-to-one correspondence between each file and its unique feature value, thereby avoiding repeated uploading and improving upload efficiency.
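The incremental upload decision described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how a feature value is computed, so a SHA-256 digest per data fragment is assumed here, and the names `feature_list` and `fragments_to_upload` are hypothetical.

```python
import hashlib
from typing import List, Optional


def feature_list(data: bytes, fragment_size: int) -> List[str]:
    """Return one feature value per fragment (assumed here: SHA-256 hex digest)."""
    return [
        hashlib.sha256(data[i:i + fragment_size]).hexdigest()
        for i in range(0, len(data), fragment_size)
    ]


def fragments_to_upload(previous: Optional[List[str]], current: List[str]) -> List[int]:
    """Return the numbers of the fragments that must be uploaded."""
    if previous is None:
        # First upload: every fragment is sent.
        return list(range(len(current)))
    # Later uploads: only fragments that are new or whose feature value changed.
    return [
        n for n, value in enumerate(current)
        if n >= len(previous) or previous[n] != value
    ]


old = b"A" * 8 + b"B" * 8 + b"C" * 8
new = b"A" * 8 + b"X" * 8 + b"C" * 8 + b"D" * 4
old_list = feature_list(old, 8)
new_list = feature_list(new, 8)
print(fragments_to_upload(None, new_list))      # [0, 1, 2, 3]
print(fragments_to_upload(old_list, new_list))  # [1, 3]
```

In this sketch the second call uploads only fragment 1 (modified) and fragment 3 (appended), which is the saving the incremental mode is meant to provide.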
Further, in step 3 the data fragments to be uploaded are numbered; in step 5 the integrity of the received data file is verified according to the numbering.
Numbering the data fragments is a simple mechanism that makes it easier for the server to verify integrity and makes the invention easier to implement.
Further, in step 2 the generated data fragments are cached; in step 5, after a successful upload, the uploaded data fragments are deleted from the cache.
Through communication between the client and the server, once the server has received the relevant data fragments it notifies the client of the reception result, and the client decides, according to that result, whether to delete or retransmit those data fragments. This prevents redundant data from occupying the client cache and affecting the upload of other files.
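The delete-or-retransmit decision on the client side can be sketched as below. The class `FragmentCache` and its method names are hypothetical, and the reception result is reduced to a boolean for illustration; the text does not specify the format of the server's notification.

```python
class FragmentCache:
    """Client-side cache of data fragments awaiting a reception result (illustrative)."""

    def __init__(self):
        self._pending = {}  # fragment number -> payload bytes

    def put(self, number, payload):
        self._pending[number] = payload

    def handle_result(self, number, received):
        """Delete the fragment on success; otherwise return it for retransmission."""
        if received:
            self._pending.pop(number, None)
            return None
        return self._pending.get(number)

    def pending_numbers(self):
        return sorted(self._pending)


cache = FragmentCache()
cache.put(0, b"alpha")
cache.put(1, b"beta")
resend = cache.handle_result(0, received=True)   # server confirmed fragment 0
retry = cache.handle_result(1, received=False)   # fragment 1 must be sent again
print(cache.pending_numbers())  # [1]
```

Only the unconfirmed fragment stays in the cache, so confirmed data does not keep occupying space needed by other uploads.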
Further, in step 2 the data fragment size meets the requirements of a high-speed transfer protocol; in step 4 the data fragments are uploaded concurrently based on the high-speed transfer protocol.
By cutting the data file into data fragments that meet the requirements of the high-speed transfer protocol, the invention prepares the data file for high-speed upload.
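A minimal sketch of cutting a file into protocol-sized data fragments follows. The text gives no concrete size, so the 1400-byte limit is an assumption, chosen so that a fragment plus headers fits a typical 1500-byte Ethernet MTU; `split_into_fragments` and `MAX_PAYLOAD` are hypothetical names.

```python
from typing import List

MAX_PAYLOAD = 1400  # assumed per-fragment payload limit, not taken from the text


def split_into_fragments(data: bytes, max_payload: int) -> List[bytes]:
    """Cut data into consecutive fragments of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


fragments = split_into_fragments(bytes(3000), MAX_PAYLOAD)
print([len(f) for f in fragments])  # [1400, 1400, 200]
```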
Further, the high-speed transfer protocol is a connectionless protocol.
In large-scale remote supercomputing, uploading a single very large file or a batch of small files suffers from low transfer speed and from poor stability because the transfer is affected by network conditions. To address this, the present invention provides a reliable file transfer method based on a connectionless network protocol, which improves network transfer speed and ensures the integrity of file transfer.
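Because a connectionless protocol gives no delivery or ordering guarantee, each data fragment has to carry enough information for the receiver to check it. The sketch below shows one possible datagram layout. The header fields (fragment number, total count, payload length, CRC32) are assumptions made for illustration and are not specified in the text; such a datagram could then be sent over UDP, which is used here only as an example of a connectionless protocol.

```python
import struct
import zlib

# Assumed header layout, network byte order:
# fragment number (4 bytes), total fragments (4), payload length (2), CRC32 (4).
HEADER = struct.Struct("!IIHI")


def pack_fragment(number, total, payload):
    """Prefix the payload with a header carrying its number and checksum."""
    return HEADER.pack(number, total, len(payload), zlib.crc32(payload)) + payload


def unpack_fragment(datagram):
    """Parse a datagram and verify its length and checksum."""
    number, total, length, checksum = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != checksum:
        raise ValueError("corrupted or truncated fragment")
    return number, total, payload


datagram = pack_fragment(3, 10, b"hello")
print(unpack_fragment(datagram))  # (3, 10, b'hello')
```

The number and total fields let the receiver detect missing fragments, and the checksum lets it reject damaged ones, which together stand in for the reliability a connection-oriented protocol would otherwise provide.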
Another object of the invention is to provide a data uploading system comprising a client and a server. The client includes a reading module, a fragmentation module, a management module and an uploading module: the reading module reads the data file; the fragmentation module performs the data fragmentation operation on the data file to generate data fragments; the management module performs the integrity management operation on the data fragments; and the uploading module uploads the data file by uploading the data fragments. The server includes a receiving module and a verification module: the receiving module receives the data fragments uploaded by the uploading module, and the verification module verifies the integrity of all received data fragments.
Further, the client also includes a judging module, which determines whether the data file is being uploaded for the first time: if so, the uploading module uploads all data fragments; if not, the uploading module uploads only the data fragments that have changed.
Further, the client also includes a division module, which numbers the data fragments to be uploaded; the server communicates with the client, and the verification module verifies the integrity of the received data file according to the numbering.
Further, the client also includes a cache module for storing the generated data fragments.
The beneficial effects of the invention are as follows: the invention provides an efficient and fast data uploading method and system for supercomputing, avoids repeated uploading of data, effectively reduces data transfer time and greatly improves the upload efficiency of large-scale data, thereby allowing supercomputing jobs to be submitted in real time.
The invention provides a complete solution to the data upload problem in large-scale remote supercomputing. It solves the low upload efficiency of the connection-oriented file upload used in conventional remote supercomputing systems and their inability to prevent repeated uploading, so that large-scale data processing jobs no longer need to wait a long time for data upload and can be submitted in real time.
Detailed Description of the Embodiments
The data uploading method and system of the invention are explained in detail below with reference to the accompanying drawings.
As shown in Figures 1, 2, 3 and 4, the invention discloses a data uploading method. It is an efficient upload method, and the uploaded data is used for remote supercomputing. The efficient data uploading method and system for remote supercomputing are mainly an upload and transfer method for large-scale data oriented toward scientific or engineering computing. High-speed upload of a whole large data set is achieved by building a file feature value list, fragmenting large files, transferring the fragments in parallel over a high-speed transfer protocol, and then reassembling the files on the server. To maintain good transfer efficiency, the method avoids repeated uploading of data, and it also addresses overload. Meanwhile, an adaptive transfer system monitors the status of the various transfer modes to keep the transfer controllable and reliable. The method specifically includes the following steps:
Step 1: read the data file. In this embodiment the data file is read by the client. The client communicates with the server to determine whether the data file is being uploaded for the first time. If so, all data fragments are uploaded. If not, the data file to be uploaded is incremental data and only the data fragments that have changed are uploaded, which reduces the system resources wasted by repeatedly uploading redundant data; the server receives only the changed data and then reassembles the file, completing reception of the incremental file.
Step 2: perform the data fragmentation operation on the data file to generate data fragments. To meet the requirements of the subsequent high-speed transfer, the data file is cut into data fragments whose size meets the requirements of the high-speed transfer protocol. The generated data fragments are cached; in other words, the data file is cached. To facilitate the later remote supercomputing, a computation is also performed on the data file to be uploaded in this step.
Step 3: perform the integrity management operation on the data fragments, so that the server can subsequently verify the integrity of the data fragments.
Step 4: based on the high-speed transfer protocol, upload the data file by uploading the data fragments. In addition, the client numbers the data fragments to be uploaded, and the server checks integrity based on this numbering. In this embodiment the high-speed transfer protocol is a connectionless protocol. A file cache area is set up to hold the data to be uploaded, and multiple threads are created to upload the files in the cache.
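The multithreaded upload of cached data fragments in step 4 might look like the sketch below. It uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the `send` callable stands in for the actual network transmission and, like the function name `upload_all` and the worker count, is a hypothetical placeholder rather than part of the described system.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(fragments, send, workers=4):
    """Upload numbered fragments concurrently; return the numbers whose send failed."""
    def task(item):
        number, payload = item
        try:
            send(number, payload)
            return number, True
        except OSError:
            return number, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so failures come out sorted.
        results = list(pool.map(task, enumerate(fragments)))
    return [number for number, ok in results if not ok]


sent = {}


def fake_send(number, payload):
    """Stand-in for the real transmission; fragment 2 fails on purpose."""
    if number == 2:
        raise OSError("simulated network failure")
    sent[number] = payload


failed = upload_all([b"a", b"b", b"c", b"d"], fake_send)
print(failed)  # [2]
```

The returned list gives the client the fragment numbers it should keep in the cache and try again.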
Step 5: after the server receives the data fragments, it verifies their integrity, that is, it checks whether any of the data fragments are missing. In this embodiment the integrity of the data fragments or of the data can be verified by algorithms such as data comparison, for example by verifying the integrity of the received data file according to the numbering from step 4. If some of the data fragments sent by the client were not received by the server, for example only 8 of 10 data fragments were received, the server sends a retransmission request to the client, thereby ensuring the integrity of the data file or data fragments. The retransmission may cover all data fragments or only the fragments from the previous transmission that were not received. If the corresponding data fragments are uploaded successfully, that is, the client receives the server's reply confirming successful reception, the uploaded data fragments are automatically deleted from the cache of step 2.
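The numbering-based check in step 5 can be illustrated with the 8-of-10 case mentioned above. This is a sketch under assumptions: fragment numbers are taken to start at 0, and the reply format and the names `missing_fragments` and `reception_reply` are hypothetical.

```python
def missing_fragments(total, received_numbers):
    """Return the fragment numbers in 0..total-1 that were not received."""
    return sorted(set(range(total)) - set(received_numbers))


def reception_reply(total, received_numbers):
    """Build the server's reply: a retransmission request or a success notice."""
    missing = missing_fragments(total, received_numbers)
    if missing:
        return {"type": "retransmit", "fragments": missing}
    return {"type": "success"}


# Only 8 of the 10 fragments arrived; numbers 4 and 7 were lost.
reply = reception_reply(10, [0, 1, 2, 3, 5, 6, 8, 9])
print(reply)  # {'type': 'retransmit', 'fragments': [4, 7]}
```

Listing the missing numbers in the request corresponds to the option of retransmitting only the fragments that were not received rather than the whole file.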
In addition, after the server receives the data fragments, it merges them into the data file, completing reception of the incremental file, and then stores the data file. The supercomputing cluster performs computation on the stored data file to complete the job.
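Merging received data fragments back into a data file, including the incremental case where only changed fragments arrive, can be sketched as follows. Keeping previously stored fragments in a dictionary keyed by fragment number is an assumption made for illustration, and `apply_increment` and `reassemble` are hypothetical names.

```python
def apply_increment(stored, changed):
    """Overlay newly received fragments on the fragments stored from earlier uploads."""
    merged = dict(stored)
    merged.update(changed)
    return merged


def reassemble(total, fragments):
    """Join fragments 0..total-1 in numeric order; refuse if any is missing."""
    missing = [n for n in range(total) if n not in fragments]
    if missing:
        raise ValueError(f"cannot reassemble, missing fragments: {missing}")
    return b"".join(fragments[n] for n in range(total))


stored = {0: b"AAAA", 1: b"BBBB", 2: b"CC"}   # from the first upload
changed = {1: b"XXXX"}                         # only fragment 1 was re-uploaded
rebuilt = reassemble(3, apply_increment(stored, changed))
print(rebuilt)  # b'AAAAXXXXCC'
```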
As shown in Figures 2, 1, 3 and 4, the invention also discloses a data uploading system. To address the slow file upload speed in remote supercomputing and the low upload efficiency caused by repeated uploading, it proposes a complete system solution, based on a C/S (client/server) architecture, for efficient data upload for supercomputing.
The data uploading system of the invention includes a client and a server. As shown in Figures 2, 3 and 1, the client includes a reading module, a fragmentation module, a management module and an uploading module, which may be connected in sequence or according to the data transfer relationship. The reading module reads the data file. The fragmentation module performs the data fragmentation operation on the data file to generate data fragments. The management module performs the integrity management operation on the data fragments and manages the buffer pool of data fragments to be sent and the concurrent thread pool. The uploading module uploads the data file by uploading the data fragments. The client also includes a judging module, which determines whether the data file is being uploaded for the first time: if so, the uploading module uploads all data fragments; if not, the uploading module uploads only the data fragments that have changed. The client also includes a division module, which numbers the data fragments to be uploaded; the server communicates with the client, and the verification module verifies the integrity of the received data file according to the numbering. The client also includes a cache module, that is, a cache, which stores the generated data fragments; after the data fragments are uploaded successfully, the cache module is controlled to automatically delete the uploaded data fragments from the cache.
As shown in Figures 2, 4 and 1, the server includes a receiving module and a verification module. The receiving module receives the data fragments uploaded by the uploading module. The verification module verifies the integrity of the data fragments and requests retransmission of files that were not uploaded successfully; it also reassembles the data fragments into the data file and writes it to storage. After the file upload finishes, the supercomputing cluster completes the job using the data in storage.
In addition, the client may also include a network detection module, which works together with the management module and the uploading module. Based on the transfer mode of the connectionless network protocol, the network detection module pushes the connection status between the client and the server to the uploading module, and the uploading module transfers data only after confirming that the network connection is reliable. Compared with the TCP-based FTP file upload used in conventional systems, the invention substantially increases the transfer speed of large files and improves the operating efficiency of the supercomputing job system.
Based on the above data uploading method and system, the invention can be implemented as follows.
(1) Set the file fragment size according to the transfer protocol and use this value as the fragmentation threshold; fragment every file larger than the threshold and number the file fragments.
(2) Start multiple transfer threads on the client to transfer the files.
(3) Establish a receive window on the server and receive the files; for fragmented files, verify file integrity according to the fragment numbering, and send a retransmission request to the client for any file fragments not received.
(4) Reassemble the large files whose fragments have all been received in the window.
(5) Check whether file transfer is complete against the file list synchronized between the server and the client; if not, send a retransmission request to the client.
(6) When the file transfer is complete, return a completion response to the client.
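The six implementation steps can be tied together in a small in-memory simulation. It is a sketch only: the network is replaced by a random loss model, the transfer threads of step (2) are omitted for brevity, the file-list check of step (5) is reduced to comparing fragment numbers, and the function name `simulate_upload` and all parameters are hypothetical.

```python
import random


def simulate_upload(data, threshold, loss_rate, seed=0):
    """Simulate fragment, send, check and retransmit; return (file, rounds used)."""
    if threshold <= 0 or not 0 <= loss_rate < 1:
        raise ValueError("threshold must be positive and loss_rate in [0, 1)")
    rng = random.Random(seed)
    # (1) Fragment only files larger than the threshold, and number the fragments.
    if len(data) > threshold:
        fragments = {n: data[i:i + threshold]
                     for n, i in enumerate(range(0, len(data), threshold))}
    else:
        fragments = {0: data}
    window = {}                  # (3) server-side receive window
    pending = sorted(fragments)
    rounds = 0
    while pending:
        rounds += 1
        for number in pending:   # (2) client sends (sequentially in this sketch)
            if rng.random() >= loss_rate:
                window[number] = fragments[number]
        # (3)/(5) server lists what is still missing and requests retransmission.
        pending = [n for n in sorted(fragments) if n not in window]
    # (4) Reassemble in numeric order; (6) returning stands in for the completion response.
    return b"".join(window[n] for n in sorted(window)), rounds


original = bytes(range(256)) * 4
restored, rounds = simulate_upload(original, threshold=100, loss_rate=0.3)
print(restored == original, rounds >= 1)  # True True
```

Each pass resends only the fragments the server still lacks, so a lossy channel costs extra rounds rather than a full re-upload.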
In the present invention, unless otherwise expressly specified and limited, terms such as "installed", "connected with", "connected" and "fixed" should be interpreted broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements or an interaction between two elements, unless otherwise expressly limited. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "this embodiment", "a specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the invention. Schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or simple improvement made within the substance of the invention shall fall within its scope of protection.