CN104765876A - Massive GNSS small file cloud storage method - Google Patents


Info

Publication number
CN104765876A
Authority
CN
China
Prior art keywords
index
file
gnss
observation
small documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510204235.7A
Other languages
Chinese (zh)
Other versions
CN104765876B (en)
Inventor
吕志平
李林阳
陈正生
崔阳
黄令勇
王宇谱
吕浩
孙大双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Information Engineering University filed Critical PLA Information Engineering University
Priority to CN201510204235.7A priority Critical patent/CN104765876B/en
Publication of CN104765876A publication Critical patent/CN104765876A/en
Application granted granted Critical
Publication of CN104765876B publication Critical patent/CN104765876B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a cloud storage method for massive numbers of small GNSS files and addresses the problem of efficiently storing, managing, publishing and sharing such files. In the method, the massive GNSS small files are first merged into large files, and an index is built for each merged large file. The index block storage strategy is then optimized: the file blocks and index blocks obtained after splitting are stored on the data nodes that hold the data blocks or on the data nodes nearest to them, while the index of GNSS data types is stored on the name node. This lowers storage capacity consumption and name node memory consumption and improves the performance of writing, accessing and deleting massive small files. The method is simple and easy to operate, saves storage space, lowers memory consumption, and improves the efficiency of writing, reading and deleting. It effectively achieves efficient storage, management, publication and sharing of massive GNSS small files, is an innovation in the management of such files, and offers high economic and social benefits.

Description

Cloud storage method for massive GNSS small files
Technical field
The present invention relates to the technical field of geodesy and surveying engineering within the discipline of surveying and mapping science and technology, and in particular to a cloud storage method for massive GNSS small files.
Background art
With the development of science and technology, global, national and regional Continuously Operating Reference Station (CORS) networks are steadily being built, and Global Navigation Satellite Systems (GNSS) are widely used in every field. In particular, as existing CORS networks are integrated into networks with more reference stations, and higher-level joint CORS networks are successively put into operation and observe continuously, the volume of GNSS data keeps growing.
Massive data pose challenges for storage and management, with large volumes of data at the TB level and above waiting to be processed. For GNSS observation data, one day of continuous observation at a 1-second sampling interval produces up to 80 MB of data for GPS satellites alone; with tens of thousands of stations worldwide, the data volume of a single day can reach tens to hundreds of TB. In addition, unlike web logs and remote sensing imagery, GNSS data come in a rich variety of types and formats, and GNSS data, typified by GNSS observation files and solution product files, all fall into the category of small files.
Faced with the storage and management challenges of massive GNSS small files, the traditional storage area network (SAN, Storage Area Network) and network attached storage (NAS, Network-Attached Storage) hit bottlenecks when capacity and performance are expanded. The File Transfer Protocol (FTP) and relational databases currently used by GNSS data centers have many limitations in managing massive GNSS data, and centralized storage cannot meet the needs of large-scale GNSS data storage applications. Research institutions and researchers in China and abroad have paid wide attention to the storage of massive small files. The published literature mainly includes, from abroad: "An Optimized Approach for Storing and Accessing Small Files on Cloud Storage" in "Journal of Network and Computer Applications", "Metadata-Aware Small Files Storage Architecture on Hadoop" in "Web Information Systems and Mining", and "Hmfs: Efficient Support of Small Files Processing over HDFS" in "Algorithms and Architectures for Parallel Processing"; and, from China: "A scheme for improving the small file storage efficiency of cloud storage" in "Journal of Xi'an Jiaotong University", and "A massive small file storage method combining RDBMS and Hadoop" and "A storage strategy for spatio-temporal small files in a cloud environment" in "Geomatics and Information Science of Wuhan University".
Existing solutions focus on metadata models, analysis of the correlations among massive small files, adjustment of the system structure, and user access patterns, but pay little attention to data types and characteristics or to the placement strategy for the index of the merged files, and so cannot be fully applied to the management of GNSS small files. Given the storage demands of massive GNSS data typified by small files, designing a cloud storage method for massive GNSS small files that builds on an underlying open-source cloud platform and takes GNSS data types and characteristics into account is an effective way to store, manage, publish and share massive GNSS small files efficiently.
Summary of the invention
In view of the above, and to overcome the defects of the prior art, the object of the present invention is to provide a cloud storage method for massive GNSS small files that effectively solves the problem of efficiently storing, managing, publishing and sharing such files.
The technical solution of the present invention addresses the defects and bottlenecks of centralized storage of massive GNSS small files. A cloud storage method for massive GNSS small files is designed and built on an underlying open-source cloud platform (Hadoop) to realize efficient cloud storage of these files. The massive GNSS small files are first merged into large files, and an index is built for each merged large file. The index block storage strategy is then optimized: the file blocks and index blocks obtained after splitting are stored on the data node holding the data block or on the data node (DataNode) nearest to the data block, and the index of GNSS data types is stored on the name node (NameNode). This reduces storage capacity consumption and name node (NameNode) memory consumption and improves the performance of writing, accessing and deleting large numbers of small files. The method specifically comprises the following steps:
(1) Merge the massive GNSS small files into large files so as to reduce the name node (NameNode) memory occupied by the large number of small files. Small files of the same observation period or solution epoch and of the same type are merged first: GNSS observation files are merged in the order of the four characters of the station name, and solution product files are merged in the order of the three characters of the GNSS analysis center name. In this way the large number of GNSS observation files is merged into one large observation file that is continuous over the observation periods, and the solution product files are merged into one large solution product file that is continuous in solution-time sequence;
(2) Build an index for the merged GNSS large files, that is, build indexes separately for observation files and for solution products, with one index level per character. For observation files, a five-level index is built from the file sequence number, the day of year and the station name, and the location of the observation file is stored in the last index level. For solution product files, a six-level index is built from the GPS week, the day of week and the analysis center name, and the location of the solution product file is stored in the last index level;
(3) Split the index by the data block size. Because GNSS data processing software can merge the observation data of a single day, the file sequence number can be uniformly set to 0, so the first-level index of observation files (the file sequence number) is also 0. When splitting, the second to fifth index levels of observation files and the first to sixth index levels of solution products are traversed from the bottom up, the index size is computed, and the index is split into index blocks of 64 MB;
(4) Place each index block on the node that stores the corresponding data block or on the node nearest to that data block, which increases file reading speed and further reduces the memory consumption of the name node (NameNode);
(5) Store the file-type index of the merged GNSS large files on the name node (NameNode). The file block/path mapping, and the mapping from the three characters identifying the observation file or solution product type to the index block path, are stored on the name node (NameNode), while the file blocks and index blocks themselves are stored on data nodes (DataNode), thereby realizing cloud storage of massive GNSS small files.
The method of the invention is simple and easy to operate, saves storage space, reduces memory consumption, and improves the efficiency of writing, reading and deleting. It effectively achieves efficient storage, management, publication and sharing of massive GNSS small files, is a significant innovation in the management of such files, and offers great economic and social benefits.
Brief description of the drawings
Fig. 1 is a functional schematic of the small file storage platform of the present invention.
Fig. 2 shows the construction of the observation file index of the present invention.
Fig. 3 shows the construction of the solution product index of the present invention.
Fig. 4 is a schematic of the storage locations of the observation files and solution product files of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Figs. 1 to 4, a specific implementation of the present invention comprises the following steps:
Step 1: Merge the massive GNSS small files into large files so as to reduce the name node (NameNode) memory occupied by the large number of small files. Massive GNSS small files comprise two classes: observation files, typified by observation data, navigation ephemeris and meteorological files; and solution product files, typified by coordinate files, precise ephemerides and precise clock corrections. Both classes use internationally unified standard formats. Observation files use the Receiver Independent Exchange Format (RINEX); solution products use the Solution (Software/technique) Independent Exchange Format (SINEX), the Ionosphere Exchange Format (IONEX) and the precise ephemeris data format (SP3, NGS Standard GPS Format). Suppose n GNSS small files are stored in the system. Every GNSS small file carries three parameters, position, time and file type, by which the data are distinguished. The GNSS small file data set D is expressed as:
$$D = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j \in T,\ I_k \in I \,\}, \quad i, j, k \in \mathbb{Z}$$ Formula (1)
Here L denotes the position information of where a file is produced, mainly the station that collects an observation file or the institution that computes a solution product file. T denotes the time tag of when a file is produced; because stations observe continuously 24 hours a day and data centers compute and publish solutions on a regular, continuous schedule, T is a continuous time series. I denotes the file type, as defined by the standard formats above. L and T are both obtained from the file name and from the header section of the file record; I is obtained from the file extension and from the header section of the file record. D denotes the set; i, j and k are the sequence numbers of the position, time and type parameters respectively; and Z is the set of integers;
When small files are merged, files of the same observation period or solution epoch and of the same type are first merged in the order of the four characters of the station name or the three characters of the analysis center name. The merged GNSS small file set is expressed as:
$$D_{T_j, I_k} = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j,\ I_k \,\}, \quad i \in \mathbb{Z}$$ Formula (2)
Here T_j denotes the j-th observation period or solution epoch, I_k denotes the k-th file type, and Z is the set of integers;
Then the small files of each type are merged along the continuous sequence of observation periods or solution epochs. Because weekly solutions are of general significance in GNSS surveying, the observation files of 7 consecutive days and the daily solution files of 7 days are each merged into one large file, which can be expressed as:
$$D_{I_k} = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j \in T,\ I_k \,\}, \quad i, j \in \mathbb{Z}$$ Formula (3)
Through the two merging steps above, the GNSS observation files of 7 consecutive days are merged into one large observation file that is continuous over the observation periods, and the solution product files of 7 days are merged into one large solution product file that is continuous in solution-time sequence. The file name of the large file is identified by the file type, the start and end observation or solution times, and the first and last station names or analysis center names. The merged file is stored in the cloud storage system in blocks, with the data block size set to 64 MB. Each data block is a collection of many small files and occupies 150 B of name node (NameNode) memory; compared with the 150 B occupied by every individual small file before merging, this greatly reduces the memory consumption of the name node (NameNode);
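The two merging steps of Step 1 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the `SmallFile` record, the integer `day` counter, the alignment of the 7-day windows to multiples of 7, and simple byte concatenation are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class SmallFile(NamedTuple):
    """Hypothetical record for one GNSS small file d(L_i, T_j, I_k)."""
    site: str       # L: 4-character station name or 3-character analysis center
    day: int        # T: running day number of the observation period / solution epoch
    ftype: str      # I: file type
    payload: bytes  # file content


def merge_small_files(files, days_per_large_file=7):
    """Merge small files into large files, one per (file type, 7-day window).

    Inside a large file the members are ordered by day first and by site
    name second, mirroring Formula (2) followed by Formula (3).
    """
    groups = defaultdict(list)
    for f in files:
        # Assumption: windows are aligned to multiples of days_per_large_file.
        groups[(f.ftype, f.day // days_per_large_file)].append(f)

    merged = {}
    for key, members in groups.items():
        members.sort(key=lambda f: (f.day, f.site))
        merged[key] = b"".join(f.payload for f in members)
    return merged
```

In a real deployment the merged content would be written to the distributed file system in 64 MB blocks rather than held in memory; the sketch only shows the grouping and ordering.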
The massive GNSS small files comprise GNSS observation files and solution product files, all of which follow internationally unified standard formats. Because GNSS data and product formats are continually updated, updated file formats and newly proposed file types can all be brought into the category of GNSS small files;
The files of the same observation period or solution epoch and of the same type may also first be merged by identical observation period or solution date, and then merged by consecutive observation periods or solution cycles. The file name of the large file is identified by the file type, the start and end observation or solution times, and the first and last station names or analysis center names. After merging, the large file is stored in the cloud storage system in blocks, with the data block size set to 64 MB, and each data block is a collection of many GNSS small files;
Step 2: Build an index for the merged GNSS large files, that is, build indexes from L and T separately for observation files and for solution products, as follows:
For observation files, which are saved in RINEX format, the 8.3 naming convention of RINEX is used. Here 8 denotes the 8-character base name indicating what the file belongs to, and 3 denotes the 3-character extension indicating the file type. The concrete form is ssssdddf.yyt, where ssss is the 4-character station name, ddd is the day of year, and f is the file sequence number within the day. Using the character f, the string ddd and the string ssss, a five-level index is built from the top down with one index level per character, and the location of the observation file is stored in the end node of the last index level. The first-level index is the file sequence number, whose range consists of the two intervals [0, 9] and [a, z], where [0, 9] denotes the 10 Arabic digits and [a, z] the 26 lowercase English letters. The second-level index is the hundreds digit of the day of year, with range [0, 3], that is, 4 digits. The third-level index is the tens digit of the day of year, with range [0, 9]. The fourth-level index is the units digit of the day of year, with range [0, 9]. The fifth-level index is the four-character station name, each character of which falls in the intervals [0, 9] and [a, z];
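As an illustration of the five-level observation index, the following sketch derives the index levels from a file name of the form ssssdddf.yyt. The function name, the regular expression and the choice to return the name node key as the extension characters `yyt` are assumptions for the example; the text only states that the key consists of the file type letter and the last two digits of the year.

```python
import re

# ssssdddf.yyt: station (4), day of year (3), file sequence (1), year (2), type (1)
_RINEX_SHORT_NAME = re.compile(
    r"([0-9a-z]{4})([0-3][0-9]{2})([0-9a-z])\.([0-9]{2})([a-z])"
)


def observation_index_key(filename):
    """Return (five index levels, name-node type key) for an observation file."""
    m = _RINEX_SHORT_NAME.fullmatch(filename.lower())
    if m is None:
        raise ValueError(f"not an 8.3 observation file name: {filename!r}")
    site, doy, seq, yy, ftype = m.groups()
    if not 1 <= int(doy) <= 366:
        raise ValueError(f"day of year out of range: {doy}")
    # Level 1: file sequence; levels 2-4: hundreds, tens, units of the
    # day of year; level 5: station name.
    levels = (seq, doy[0], doy[1], doy[2], site)
    # Assumed layout of the three-character key kept on the name node.
    type_key = yy + ftype
    return levels, type_key
```

For example, a hypothetical file `wuhn0010.15o` maps to the levels `("0", "0", "0", "1", "wuhn")`, and the end node of the fifth level would hold the location of that file inside the merged large file.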
Solution product files are saved in the form sssddddd.ttt, where sss is the three-character abbreviation of the analysis center, the first four d characters of ddddd are the GPS week counted from 0h on January 6, 1980, the last d is the day of week, and ttt is the type of solution product. A six-level index is built from the GPS week, the day of week and the analysis center name, and the location of the solution product file is stored in the end node of the index. The first to fourth index levels are the thousands, hundreds, tens and units digits of the GPS week respectively, each with range [0, 9]. The fifth-level index is the day within the GPS week, with range [0, 7], where the 7 integers of [0, 6] denote the daily solution files of a week and the digit 7 denotes the weekly solution file. The sixth-level index is the three-character analysis center name, each character of which falls in the interval [a, z];
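The six-level index for solution products can be illustrated in the same spirit. The sketch below is an assumption-laden example rather than the disclosed code: the function name and regular expression are hypothetical, and the extension `ttt` is taken unchanged as the three-letter type key.

```python
import re

# sssddddd.ttt: analysis center (3), GPS week (4), day of week (1), type (3)
_PRODUCT_NAME = re.compile(r"([a-z]{3})([0-9]{4})([0-7])\.([0-9a-z]{3})")


def solution_index_key(filename):
    """Return (six index levels, name-node type key) for a solution product file."""
    m = _PRODUCT_NAME.fullmatch(filename.lower())
    if m is None:
        raise ValueError(f"not a solution product file name: {filename!r}")
    center, week, day, ftype = m.groups()
    # Levels 1-4: thousands, hundreds, tens, units of the GPS week;
    # level 5: day within the week (7 marks a weekly solution);
    # level 6: analysis center abbreviation.
    levels = (week[0], week[1], week[2], week[3], day, center)
    return levels, ftype


def is_weekly_solution(levels):
    """True when the fifth index level is 7, i.e. a weekly solution file."""
    return levels[4] == "7"
```

A hypothetical file `igs18341.sp3` therefore maps to `("1", "8", "3", "4", "1", "igs")` with type key `sp3`.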
A five-level index and a six-level index are thus built for observation files and solution product files respectively. The indexes are built according to the standard file formats, and the path information of each file is stored in the last index level;
Step 3: Split the index built above by the data block size of 64 MB. For observation files, because GNSS data processing software can merge the observation data of a single day, the file sequence number is uniformly 0, so the corresponding first-level index (the file sequence number) is also 0. When splitting, the second to fifth index levels of observation files and the first to sixth index levels of solution products are traversed from the bottom up and the size of the index block is accumulated. The split point i is reached when the sizes (IBlock) of the first i-1 and the first i indexes satisfy the following formula
$$\begin{cases} \sum_{1}^{i-1} \mathrm{IBlock} \le 64\ \mathrm{MB} \\ \sum_{1}^{i} \mathrm{IBlock} > 64\ \mathrm{MB} \end{cases}$$ Formula (4)
The first i-1 indexes are then saved as one independent index block. Proceeding in this way, all the indexes built in Step 2 are split;
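The splitting rule of Formula (4) amounts to a greedy accumulation, which might be sketched as follows. The example treats the index as a flat sequence of entry sizes in traversal order, which is a simplification of the bottom-up traversal described above; the function name and the error raised for an oversized single entry are assumptions.

```python
def split_index(entry_sizes, block_limit=64 * 1024 * 1024):
    """Split index entries into blocks no larger than block_limit bytes.

    entry_sizes: sizes in bytes of consecutive index entries (IBlock values).
    Returns a list of blocks, each a list of entry positions. A block is
    closed as soon as adding the next entry would exceed the limit, so the
    first i-1 entries form the block, as in Formula (4).
    """
    blocks, current, current_size = [], [], 0
    for pos, size in enumerate(entry_sizes):
        if size > block_limit:
            raise ValueError(f"entry {pos} alone exceeds the block limit")
        if current and current_size + size > block_limit:
            blocks.append(current)
            current, current_size = [], 0
        current.append(pos)
        current_size += size
    if current:
        blocks.append(current)
    return blocks
```

With a small illustrative limit of 10 bytes, sizes `[4, 4, 4, 10, 1]` are split into `[[0, 1], [2], [3], [4]]`: the third entry would push the first block to 12 bytes, so the block is closed after the first two entries.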
Step 4: Place each index block on the data node (DataNode) that stores the corresponding data block or on the data node (DataNode) nearest to that data block, which increases reading speed and further reduces the memory consumption of the name node (NameNode). The content of each split index block is matched against the names of the blocks of the merged GNSS large files, level by level from top to bottom. Where the index branches, the proportion of each index character at the branch point is counted, and the character with the largest share of the index block is matched against the data blocks on the data nodes (DataNode); the node with the highest matching rate is used as the storage node of the index block. Placing an index block on the node storing the data block or on the nearest node has two effects. On the one hand, the communication overhead of reading data is reduced: once an index entry is found, the corresponding file content can be found on the local or an adjacent node, which increases reading speed. On the other hand, because the index is kept on data nodes (DataNode) rather than on the name node (NameNode), the memory consumption of the name node (NameNode) is further reduced;
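The matching procedure of Step 4 is described only in outline, so the following is one possible, deliberately simplified reading rather than the disclosed algorithm. It assumes that an index block is represented by the keys it contains, that the branch point is the first character of those keys, and that data block names begin with the key of the data they hold; all names are hypothetical.

```python
from collections import Counter


def choose_storage_node(index_keys, node_block_names):
    """Pick the data node whose data blocks best match an index block.

    index_keys: keys held in the index block (e.g. station names).
    node_block_names: mapping of node name -> names of data blocks on it.
    """
    if not index_keys or not node_block_names:
        raise ValueError("need at least one index key and one data node")

    # Share of each character at the (assumed) branch point; take the largest.
    dominant, _ = Counter(key[0] for key in index_keys).most_common(1)[0]

    def match_rate(block_names):
        if not block_names:
            return 0.0
        hits = sum(name.startswith(dominant) for name in block_names)
        return hits / len(block_names)

    # Highest matching rate wins; ties go to the alphabetically first node.
    return max(sorted(node_block_names),
               key=lambda node: match_rate(node_block_names[node]))
```

A production system would also need the "nearest node" fallback mentioned in the text, for instance by consulting the cluster's rack topology, which this sketch does not model.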
Step 5: Store the file-type index of the merged GNSS large files on the name node (NameNode). For GNSS observation files, the index stored on the name node (NameNode) comprises the one-letter file type and also the last two digits of the observation year. For solution product files, the index stored on the name node (NameNode) comprises only the three-letter file type. Therefore, in addition to the number of data block replicas and the large file name/path mapping, the file type/index block path mapping, keyed by three digits or letters, is also stored on the name node (NameNode), thereby realizing cloud storage of massive GNSS small files.
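The metadata that Step 5 keeps on the name node can be pictured as two small lookup tables. The class below is a hypothetical in-memory stand-in for illustration only; it does not use or imitate any Hadoop API, and the example paths are invented.

```python
class NameNodeCatalog:
    """Toy model of the mappings Step 5 keeps on the name node."""

    def __init__(self):
        self.large_file_paths = {}   # large file name -> path of the file
        self.index_block_paths = {}  # 3-character type key -> index block paths

    def register_large_file(self, name, path):
        self.large_file_paths[name] = path

    def register_index_block(self, type_key, path):
        if len(type_key) != 3:
            raise ValueError("type key must be three digits or letters")
        self.index_block_paths.setdefault(type_key, []).append(path)

    def locate_index_blocks(self, type_key):
        """Return the data-node paths of the index blocks for a file type."""
        return list(self.index_block_paths.get(type_key, []))
```

A read would then proceed in two hops: the client asks the name node for the index block paths of, say, key `15o`, and resolves the five-level index on the data nodes to find the file content.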
In a specific implementation, the present invention can also be realized as follows:
As shown in Fig. 1, the invention mainly comprises one name node (NameNode) as the master node and several data nodes (DataNode) as the storage nodes for file blocks and index blocks. The tasks of each data node (DataNode) include merging small files and building indexes, while one specific data node (DataNode) is responsible for merging the indexes and splitting them into index blocks. The concrete steps are:
1) Merge the massive GNSS small files. Massive GNSS small files comprise two classes, GNSS observation files and solution product files. Observation files are received by receivers of various kinds and converted by data format conversion software into standard RINEX files, mainly in the two formats RINEX 2.0 and 3.0; the file types comprise four classes: multi-system multi-frequency observation data, the navigation ephemeris of each system, satellite clock corrections, and observation summaries (summary files). Solution product files include precise ephemerides, precise clock corrections, Earth rotation parameters, satellite yaw rates and coordinate files; they are computed by the analysis centers of the International GNSS Service (IGS) using high-precision GNSS data processing software, and their formats follow the SP3, SINEX and IONEX standards;
Each observation file corresponds to an observation period and contains information such as the start time, end time and sampling interval, so the observation files of the same period can first be merged by station name, and the observation files of different periods then merged along the continuous sequence of observation times. Each solution product corresponds to the period of the data solved and contains the start and end times of those data, so the solution products corresponding to observation data of the same period can be merged first, and the solution products of different periods then merged by consecutive solution cycles. The file name of the large file is identified by the file type, the start and end observation or solution times, and the first and last station names or analysis center names;
Each data node (DataNode) is responsible for merging the small files on that node;
2) Build indexes separately for the merged observation files and solution products. For observation files, which generally use the RINEX format with its 8.3 naming convention, 8 denotes the 8-character base name indicating what the file belongs to and 3 denotes the 3-character extension indicating the file type; the concrete form is ssssdddf.yyt. The file sequence number within the day given by the character f, the day of year given by the string ddd, and the station name given by the string ssss can therefore be used to build a five-level index from the top down, with one index level per character, and the path information of the observation file is stored in the end node of the last index level. As shown in the observation file index of Fig. 2, the first-level index is the file sequence number, whose range consists of the two intervals [0, 9] and [a, z], where [0, 9] denotes the 10 Arabic digits and [a, z] the 26 lowercase English letters. The second-level index is the hundreds digit of the day of year, with range [0, 3], that is, 4 digits. The third-level index is the tens digit of the day of year, with range [0, 9]. The fourth-level index is the units digit of the day of year, with range [0, 9]. The fifth-level index is the four-character station name, each character of which falls in the intervals [0, 9] and [a, z];
Solution product files are saved in the form sssddddd.ttt, where sss is the three-character abbreviation of the analysis center, the first four d characters of ddddd are the GPS week counted from 0h on January 6, 1980, the last d is the day of week, and ttt is the type of solution product. A six-level index is built from the GPS week, the day of week and the analysis center name, and the location of the solution product file is stored in the end node of the index. As shown in the solution product file index of Fig. 3, the first to fourth index levels are the thousands, hundreds, tens and units digits of the GPS week respectively, each with range [0, 9]. The fifth-level index is the day within the GPS week, with range [0, 7], where the 7 integers of [0, 6] denote the daily solution files of a week and the digit 7 denotes the weekly solution file. The sixth-level index is the three-character analysis center name, each character of which falls in the interval [a, z];
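The GPS week and day of week used in the first five index levels follow directly from the epoch stated above. A minimal sketch, assuming calendar dates are sufficient (it counts whole days and ignores leap seconds and time of day):

```python
from datetime import date

GPS_EPOCH = date(1980, 1, 6)  # 0h on January 6, 1980, a Sunday


def gps_week_and_day(d):
    """Return (GPS week, day of week) for a calendar date; Sunday is day 0."""
    days = (d - GPS_EPOCH).days
    if days < 0:
        raise ValueError("date precedes the GPS epoch")
    return divmod(days, 7)
```

For example, April 27, 2015 falls in GPS week 1842 on day 1, so a daily product of a center abbreviated `igs` for that date would carry the name stem `igs18421`.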
Each data node (DataNode) is responsible for building the index of the small files on that node. After the indexes have been built, they are merged on another, specific data node (DataNode);
3) Split the index into index blocks. The index built in step 2 is split by the data block size (64 MB). For observation files, because GNSS data processing software can merge the observation data of a single day, the file sequence number can be uniformly set to 0, so the corresponding first-level index (the file sequence number) is also 0. When splitting, the second to fifth index levels of observation files and the first to sixth index levels of solution products are traversed from the bottom up and the size of the index block is accumulated. When the accumulated size first exceeds the data block size, the procedure steps back one index and saves the indexes up to that point as one independent index block. Proceeding in this way, all the indexes built in step 2 are split;
The splitting into index blocks is carried out on the data node (DataNode) that merged the indexes in step 2;
4) Store the index blocks. The index blocks produced in step 3 are stored on the data node (DataNode) holding the corresponding data block or on the data node (DataNode) nearest to that data block. The content of each index block is matched against the names of the blocks of the merged GNSS large files, level by level. Where the index block branches, the proportion of each index character at the branch point is counted, and the character with the largest share at that index level is matched against the names of the data blocks on the data nodes (DataNode); the node with the highest matching rate is used as the storage node of the index block;
5) Store the file type index/index block path mapping on the name node (NameNode), as shown in the storage location schematic of observation files and solution product files in Fig. 4. For GNSS observation files, the index stored on the name node (NameNode) comprises the one-letter file type and also the last two digits of the observation year. For solution product files, the index stored on the name node (NameNode) comprises only the three-letter file type. Storing the one-to-one address mapping between file type indexes and index blocks on the name node (NameNode) completes the mapping of the indexes built above. Therefore, in addition to the number of data block replicas and the large file name/path mapping, the file type/index block path mapping, keyed by three digits or letters, is also stored on the name node (NameNode), thereby realizing cloud storage of massive GNSS small files.
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any simple change or equivalent replacement of the technical solution that a person skilled in the art could readily obtain within the technical scope disclosed by the present invention falls within the scope of protection of the present invention.
As can be seen from the above, the present invention is a new method for cloud storage of massive GNSS small files that supports efficient storage, management, querying and sharing of such files. Tests were run on a cluster of 9 nodes, 1 serving as the name node (NameNode) and the other 8 as data nodes (DataNode), with the replication factor set to 3, measuring the write, read and delete speeds for massive GNSS small files. In these tests, compared with the conventional HDFS method, the small file storage method proposed by the present invention greatly saved storage space, reduced memory consumption by 1/2, and improved write speed by about 4 times, read speed by about 3 times and delete speed by about 2.5 times. The effect in practical applications depends closely on the scale of the storage system, the performance of each node, the network environment, and the size and type of the data. Compared with the prior art, the present invention therefore has the following outstanding advantageous technical effects:
(1) Storage space is saved
According to the data types and characteristics of GNSS, the present invention merges the observation data and solution products of consecutive observation periods into large files. This improves on the situation in the Hadoop Distributed File System (HDFS, Hadoop Distributed File System) in which each small file occupies an entire data block: after the merged large file is split, each of its blocks fills the size of one data block, which effectively saves storage space on the data nodes (DataNode) and improves the utilization of storage space.
(2) Memory consumption is reduced
The present invention builds an index for the merged large files according to the naming rules of GNSS observation files and solution products, and stores the path under which each file is saved in the end node of the index. On the one hand, merging the small files greatly reduces the number of data blocks in the storage system and hence the memory overhead of the name node (NameNode). On the other hand, after an index has been built for the merged large files and split, the index blocks are kept on data nodes (DataNode); the name node (NameNode) holds only the mapping from the file-type index (taken from the file extension) to the index path and the mapping from large file names to file paths, which further reduces the memory consumption of the name node (NameNode).
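The memory argument can be made concrete with the 150 B per-object figure quoted in Step 1. The following back-of-the-envelope sketch uses hypothetical workload numbers (one million small files of 1 MiB each) and counts only block entries, ignoring the comparatively small file-name and file-type mappings.

```python
import math

NAMENODE_BYTES_PER_OBJECT = 150   # figure quoted in Step 1
BLOCK_SIZE = 64 * 1024 * 1024     # 64 MB data block


def namenode_bytes_unmerged(n_small_files):
    """Name node memory when every small file has its own entry."""
    return n_small_files * NAMENODE_BYTES_PER_OBJECT


def namenode_bytes_merged(total_bytes):
    """Name node memory when the merged data is tracked per 64 MB block."""
    return math.ceil(total_bytes / BLOCK_SIZE) * NAMENODE_BYTES_PER_OBJECT


# Hypothetical workload: 1,000,000 small files of 1 MiB each.
n_files = 1_000_000
unmerged = namenode_bytes_unmerged(n_files)        # 150,000,000 bytes
merged = namenode_bytes_merged(n_files * 2**20)    # 15,625 blocks
```

Under these assumed numbers the block entries shrink from about 150 MB to about 2.3 MB. The measured saving reported in the test above (memory consumption reduced by 1/2) is smaller, which is consistent with real workloads carrying other per-file and per-index overheads that this estimate leaves out.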
(3) Write, read, and delete efficiency is improved
By merging small GNSS files and indexing the merged files, the proposed method establishes an efficient storage mechanism. It reduces communication between the client and the name node (NameNode), between the name node (NameNode) and the data nodes (DataNode), and between the client and the data nodes (DataNode), and shortens the response time of index lookups. Write, read, and delete efficiency improve as a result.
(4) Easy to extend
The proposed method is widely applicable: after merging, indexing, and splitting into blocks, all kinds of GNSS observation files and solution results can be stored efficiently. Newly added GNSS data and product formats can be merged according to their data type and characteristics and, after index building and block storage, incorporated into the small-file storage system of the present invention. The method therefore has broad applicability and strong extensibility, resolves the bottlenecks and challenges facing existing GNSS small-file storage, and delivers high storage efficiency. It can be applied effectively in the technical field of "Geodesy and Survey Engineering" within the discipline of "Surveying Science and Technology" to achieve efficient storage, management, publication, and sharing of massive numbers of small GNSS files, with substantial economic and social benefits.
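As a non-limiting illustration of effects (1) and (2), the following sketch estimates name-node memory before and after merging. It assumes the figures stated in the claims (a 64 MB data block and about 150 B of name-node memory per tracked entry); the function names and the workload of one million 1 MB files are hypothetical and are not measurements from the test cluster.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # data block size used by the method (64 MB)
BYTES_PER_ENTRY = 150           # assumed name-node memory per tracked entry (150 B)


def namenode_bytes_unmerged(num_small_files: int) -> int:
    """Before merging: every small file costs one entry on the name node."""
    return num_small_files * BYTES_PER_ENTRY


def namenode_bytes_merged(num_small_files: int, avg_file_size: int) -> int:
    """After merging: only the blocks of the merged large file are tracked."""
    total_bytes = num_small_files * avg_file_size
    blocks = -(-total_bytes // BLOCK_SIZE)  # integer ceiling division
    return blocks * BYTES_PER_ENTRY


if __name__ == "__main__":
    n, size = 1_000_000, 1024 * 1024        # hypothetical workload
    print(namenode_bytes_unmerged(n))       # 150000000
    print(namenode_bytes_merged(n, size))   # 2343750
```

Under these assumptions the name-node footprint scales with the number of 64 MB blocks rather than with the number of small files; index-block placement and replication are deliberately left out of this estimate.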

Claims (5)

1. A cloud storage method for massive numbers of small GNSS files, characterized in that the small GNSS files are first merged into large files and an index is built over each merged large file; the index-block storage policy is optimized so that the file blocks and index blocks obtained after splitting are stored on the node holding the data block or on the data node nearest to the data block, and the index of the GNSS data type is stored on the name node, which reduces storage consumption and name-node memory consumption and improves the performance of writing, accessing, and deleting massive numbers of small files; the method specifically comprises the following steps:
(1) Merge the massive small GNSS files into large files to reduce the name-node memory they occupy. Files of the same observation period or solution time and of the same type are merged first: GNSS observation files are merged in the order of the four letters of the station name, and solution result files are merged in the order of the three letters of the GNSS analysis center name. In this way the many GNSS observation files are merged into one large observation file whose observation periods are continuous, and the solution result files are merged into one large solution result file whose solution times are sequential and continuous;
(2) Build an index over the merged GNSS large files, i.e. build indexes separately for observation files and solution results, with characters mapped one-to-one to index entries. For observation files, a five-level index is built from the file sequence number, the day of year, and the station name, and the location of the observation file is stored in the last index level. For solution result files, a six-level index is built from the GPS week, the day of week, and the analysis center name, and the location of the solution result file is stored in the last index level;
(3) Split the index by data block size. Because GNSS data processing software can merge the observation data of a single day, the file sequence number can be uniformly set to 0, so the first-level index (file sequence number) of observation files is also 0. When splitting, the second to fifth index levels of observation files and the first to sixth index levels of solution results are processed bottom-up: the index size is computed and the index is split into index blocks of 64 MB;
(4) Place each index block on the node that stores the data block or on the node nearest to the data block, which improves file read speed and further reduces the memory consumption of the name node;
(5) Store the file-type index of the merged GNSS large files on the name node: the file-block path mapping, together with the mapping from the three characters that identify the observation file or solution result type to the index-block path, is stored on the name node, while the file blocks and index blocks themselves are all stored on data nodes, thereby realizing cloud storage of massive numbers of small GNSS files.
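As a non-limiting illustration, the merge order of step (1) can be sketched as follows. The sketch assumes observation files named in the 8.3 form ssssdddf.yyt; the helper names `merge_key` and `group_for_merge` and the sample file names are hypothetical, and an actual merge would concatenate file contents rather than only order the names.

```python
from collections import defaultdict


def merge_key(name: str) -> tuple:
    """Return (two-digit year, day of year, file type) for a name ssssdddf.yyt."""
    root, ext = name.lower().split(".")
    return ext[:2], root[4:7], ext[2]


def group_for_merge(names):
    """Group files of the same period and type, ordered by station name."""
    groups = defaultdict(list)
    for name in names:
        groups[merge_key(name)].append(name.lower())
    # Within each group, sort by the four-character station name.
    return {
        key: sorted(files, key=lambda n: n[:4])
        for key, files in sorted(groups.items())
    }


if __name__ == "__main__":
    sample = ["wuhn0010.15o", "bjfs0010.15o", "bjfs0020.15o", "bjfs0010.15n"]
    for key, files in group_for_merge(sample).items():
        print(key, files)
```

Because the groups come out ordered by year and day of year, concatenating consecutive groups of one file type yields the period-continuous large file that the step describes.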
2. The cloud storage method for massive numbers of small GNSS files according to claim 1, characterized in that it comprises the following steps:
Step 1: Merge the massive small GNSS files into large files to reduce the name-node memory they occupy. The small GNSS files fall into two classes: observation files, typified by observation data, navigation ephemeris, and meteorological files; and solution result files, typified by coordinate files, precise ephemerides, and precise clock offsets. Both classes use internationally standardized formats: observation files use the Receiver Independent Exchange Format (RINEX), and solution results use the Solution Independent Exchange Format (SINEX), the Ionosphere Exchange Format (IONEX), and the precise ephemeris data format. Suppose n small GNSS files are stored in the system. Each small GNSS file carries three parameters (position, time, and file type), and files are distinguished by these parameters. The small GNSS file data set D is expressed as:
$$D = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j \in T,\ I_k \in I \,\}, \quad i, j, k \in \mathbb{Z} \qquad \text{Formula (1)}$$
where L denotes the location at which a file was produced, chiefly the station that collected an observation file or the institution that computed a solution result file; T denotes the time stamp at which a file was produced, and because stations observe continuously for 24 h and data centers compute and publish solutions on a regular, continuous schedule, T is a continuous time series; I denotes the file type, defined by the standard formats above. L and T are both obtained from the file name and the header of the file record, and I is obtained from the file extension and the header of the file record. D denotes the set; i, j, and k are the indices of the location, time, and type parameters, respectively; and Z is the set of integers;
When merging small files, files of the same observation period or solution time and of the same type are first merged in the order of the four characters of the station name or the three characters of the analysis center name. The merged set of small GNSS files is expressed as:
$$D_{T_j, I_k} = \{\, d(L_i, T_j, I_k) \mid L_i \in L;\ T_j,\ I_k \text{ fixed} \,\}, \quad i \in \mathbb{Z} \qquad \text{Formula (2)}$$
where $T_j$ denotes the j-th observation period or solution epoch, $I_k$ denotes the k-th file type, and Z is the set of integers;
Then the small files of each type are merged along the continuous sequence of observation periods or solution times. Because weekly solutions are of general significance in GNSS surveying, the observation files of 7 consecutive days and the daily solution files of 7 days are each merged into one large file, which can be expressed as:
$$D_{I_k} = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j \in T;\ I_k \text{ fixed} \,\}, \quad i, j \in \mathbb{Z} \qquad \text{Formula (3)}$$
Through these two merging steps, the GNSS observation files of 7 consecutive days are merged into one large observation file with continuous observation periods, and the solution result files of 7 days are merged into one large solution result file with sequential, continuous solution times. The name of a large file is composed of the file type, the start and end observation or solution times, and the first and last station names or analysis center names. The merged file is stored in the cloud storage system in blocks, with the data block size set to 64 MB. Each data block is a set of multiple small files and occupies 150 B of name-node memory; compared with the situation before merging, in which every small file occupied 150 B of memory, the memory consumption of the name node is greatly reduced;
Step 2: Build an index over the merged GNSS large files, i.e. build indexes by L and T separately for observation files and solution results, as follows:
To index observation files: observation files are stored in RINEX format, which uses 8.3 naming, where 8 stands for the 8-character root name identifying the file and 3 stands for the 3-character extension identifying the file type. The specific form is ssssdddf.yyt, where ssss is the 4-character station name, ddd is the day of year, and f is the file sequence number within the day. Using the character f for the file sequence number within the day, the string ddd for the day of year, and the string ssss for the station name, a five-level index is built top-down with characters mapped one-to-one to index entries, and the leaf nodes of the last index level store the location of the observation file. The first-level index is the file sequence number, whose range consists of the two intervals [0, 9] and [a, z], where [0, 9] denotes the 10 Arabic digits and [a, z] the 26 lowercase English letters. The second-level index is the hundreds digit of the day of year, with range [0, 3], i.e. 4 digits. The third-level index is the tens digit of the day of year, with range [0, 9]. The fourth-level index is the units digit of the day of year, with range [0, 9]. The fifth-level index is the four-character station name, each character of which falls in the two intervals [0, 9] and [a, z];
Solution result files are stored under names of the form sssddddd.ttt, where sss is the three-character abbreviation of the analysis center, the first four d's of ddddd are the GPS week counted from 0h on January 6, 1980, the last d is the day of week, and ttt is the type of solution result. A six-level index is built from the GPS week, the day of week, and the analysis center name, and the leaf nodes of the index store the location of the solution result file. The first to fourth index levels are the thousands, hundreds, tens, and units digits of the GPS week, each with range [0, 9]. The fifth-level index is the day of the GPS week, with range [0, 7], where the 7 integers in [0, 6] denote the daily solution files of a week and the digit 7 denotes the weekly solution file. The sixth-level index is the three-character analysis institution name, each character of which falls in the interval [a, z];
Step 3: Split the index by the 64 MB data block size. For observation files, because GNSS data processing software can merge the observation data of a single day, the file sequence number is uniformly 0, so the corresponding first-level index (file sequence number) is also 0. When splitting, the second to fifth index levels of observation files and the first to sixth index levels of solution results are processed bottom-up and the size of the index blocks is computed; the cut is made when the sizes of the first i-1 and the first i indexes satisfy the following formula:
$$\sum_{1}^{i-1} \mathrm{IBlock} \le 64\,\mathrm{MB}, \qquad \sum_{1}^{i} \mathrm{IBlock} > 64\,\mathrm{MB} \qquad \text{Formula (4)}$$
The first i-1 indexes are then saved as one independent index block; all indexes built in Step 2 are split in this way;
Step 4: Place each index block on the data node that stores the data block or on the data node nearest to the data block, which improves read speed and further reduces the memory consumption of the name node. The content of each split index block is matched against the names of the blocks of the merged GNSS large file, level by level from top to bottom. Where the index branches, the proportion of each index character at the branch point is computed, the character accounting for the largest share of the index block is matched against the data blocks on the data nodes, and the node with the highest match rate is chosen to store the index block. Placing the index block on the node that stores the data block, or on the node nearest to it, has two benefits. First, it reduces communication overhead when reading data: once an index entry is found, the corresponding file content can be found on the local or an adjacent node, which improves read speed. Second, because the index is kept on data nodes rather than on the name node, the memory consumption of the name node is further reduced;
Step 5: Store the file-type index of the merged GNSS large files on the name node. For GNSS observation files, the index stored on the name node comprises the one-letter file type and also the last two digits of the observation year. For solution result files, the index stored on the name node comprises only the three-letter file type. Thus, in addition to the number of data block replicas and the large-file name/path mapping, the name node also stores the file-type/index-block-path mapping keyed by three digits or letters, thereby realizing cloud storage of massive numbers of small GNSS files.
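As a non-limiting illustration, the cut rule of formula (4) in Step 3 amounts to a greedy packing: index entries are accumulated until adding the next one would exceed the block size. The sketch below assumes the size in bytes of each index entry is already known; `split_index` and the sample sizes are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the data block size used in Step 3


def split_index(entry_sizes, block_size=BLOCK_SIZE):
    """Pack consecutive index entries into blocks of at most block_size bytes."""
    blocks, current, used = [], [], 0
    for size in entry_sizes:
        if size > block_size:
            raise ValueError("a single index entry exceeds the block size")
        # Formula (4): the first i-1 entries fit, but adding entry i would not,
        # so the first i-1 entries are closed off as one index block.
        if used + size > block_size:
            blocks.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Toy sizes with a block size of 64 units instead of 64 MB.
    print(split_index([30, 30, 30, 10], block_size=64))  # [[30, 30], [30, 10]]
```

A block that sums to exactly the block size is kept whole, which matches the "less than or equal" side of formula (4).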
3. The cloud storage method for massive numbers of small GNSS files according to claim 2, characterized in that the massive small GNSS files of Step 1 comprise GNSS observation files and solution result files, all of which follow internationally standardized formats; because GNSS data and product formats are continually updated, updated file formats and newly proposed file types can all be brought within the scope of GNSS small files.
4. The cloud storage method for massive numbers of small GNSS files according to claim 2, characterized in that, in Step 1, files of the same observation period or solution time and of the same type are merged; alternatively, files may first be merged by identical observation period or solution date and then merged by consecutive observation periods or solution cycles. The name of a large file is composed of the file type, the start and end observation or solution times, and the first and last station names or analysis center names. After merging, the large file is stored in the cloud storage system in blocks, with the data block size set to 64 MB, and each data block is a set of multiple small GNSS files.
5. The cloud storage method for massive numbers of small GNSS files according to claim 2, characterized in that, in Step 2, five-level and six-level indexes are built for observation files and solution result files, respectively; the indexes are built according to the standard file formats, and the path information of the file is stored in the last index level.
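As a non-limiting illustration, the five-level and six-level index paths of Step 2 can be derived directly from the standard file names. The sketch below assumes the observation name form ssssdddf.yyt and the solution result name form sssddddd.ttt described in claim 2; the function names and the sample file names are hypothetical.

```python
def _root(name: str) -> str:
    """Return the lower-cased 8-character root of an 8.3 file name."""
    root = name.lower().split(".")[0]
    if len(root) != 8:
        raise ValueError(f"expected an 8-character root name, got {root!r}")
    return root


def observation_index_path(name: str) -> list:
    """Five levels: file sequence, day-of-year hundreds, tens, units, station."""
    root = _root(name)
    station, doy, seq = root[:4], root[4:7], root[7]
    return [seq, doy[0], doy[1], doy[2], station]


def product_index_path(name: str) -> list:
    """Six levels: four GPS-week digits, day of week, analysis center."""
    root = _root(name)
    center, week, day = root[:3], root[3:7], root[7]
    return [week[0], week[1], week[2], week[3], day, center]


if __name__ == "__main__":
    print(observation_index_path("bjfs1230.15o"))  # ['0', '1', '2', '3', 'bjfs']
    print(product_index_path("igs18393.sp3"))      # ['1', '8', '3', '9', '3', 'igs']
```

Each returned list is a root-to-leaf path; in the claimed method the leaf reached by this path would hold the location of the file inside the merged large file.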
CN201510204235.7A 2015-04-24 2015-04-24 Massive GNSS small file cloud storage method Expired - Fee Related CN104765876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510204235.7A CN104765876B (en) 2015-04-24 2015-04-24 Massive GNSS small file cloud storage method

Publications (2)

Publication Number Publication Date
CN104765876A (en) 2015-07-08
CN104765876B (en) 2017-11-10

Family

ID=53647703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510204235.7A Expired - Fee Related CN104765876B (en) 2015-04-24 2015-04-24 Massive GNSS small file cloud storage method

Country Status (1)

Country Link
CN (1) CN104765876B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332027A (en) * 2011-10-15 2012-01-25 西安交通大学 Mass non-independent small file associated storage method based on Hadoop
CN102662992A (en) * 2012-03-14 2012-09-12 北京搜狐新媒体信息技术有限公司 Method and device for storing and accessing massive small files
WO2014000458A1 (en) * 2012-06-28 2014-01-03 华为技术有限公司 Small file processing method and device
CN103577123A (en) * 2013-11-12 2014-02-12 河海大学 Small file optimization storage method based on HDFS
CN103856567A (en) * 2014-03-26 2014-06-11 西安电子科技大学 Small file storage method based on Hadoop distributed file system
CN104346384A (en) * 2013-07-31 2015-02-11 上海云端广告有限公司 Method and device for processing small files

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Bo Dong et al.: "An optimized approach for storing and accessing small files on cloud storage", Journal of Network and Computer Applications *
Shi Qian et al.: "Research on a Hadoop-based storage method for massive small files", Digital Technology and Application *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608212A (en) * 2015-12-30 2016-05-25 成都创智云科技股份有限公司 Method and system for guaranteeing MapReduce data input fragment to contain complete records
CN105608212B (en) * 2015-12-30 2020-02-07 成都国腾实业集团有限公司 Method and system for ensuring that MapReduce data input fragment contains complete record
CN106970928A (en) * 2016-01-14 2017-07-21 平安科技(深圳)有限公司 File management method and system
CN106970928B (en) * 2016-01-14 2020-12-29 平安科技(深圳)有限公司 File management method and system
CN105843841A (en) * 2016-03-07 2016-08-10 青岛理工大学 Small file storage method and system
CN107402924A (en) * 2016-05-19 2017-11-28 普天信息技术有限公司 MR files apply the implementation method and device in HDFS
CN106528451B (en) * 2016-11-14 2019-09-03 哈尔滨工业大学(威海) The cloud storage frame and construction method prefetched for the L2 cache of small documents
CN106528451A (en) * 2016-11-14 2017-03-22 哈尔滨工业大学(威海) Cloud storage framework for second level cache prefetching for small files and construction method thereof
CN107391423A (en) * 2017-07-26 2017-11-24 Tcl移动通信科技(宁波)有限公司 Method, storage medium and the mobile terminal of file are transmitted by OTG functions
CN109947703A (en) * 2017-11-09 2019-06-28 北京京东尚科信息技术有限公司 File system, file memory method, storage device and computer-readable medium
CN109947721A (en) * 2017-12-01 2019-06-28 北京安天网络安全技术有限公司 A kind of small documents treating method and apparatus
CN109947721B (en) * 2017-12-01 2021-08-17 北京安天网络安全技术有限公司 Small file processing method and device
CN108460121A (en) * 2018-01-22 2018-08-28 重庆邮电大学 Space-time data small documents merging method in smart city
CN108460121B (en) * 2018-01-22 2022-02-08 重庆邮电大学 Little file merging method for space-time data in smart city
CN108470577A (en) * 2018-02-02 2018-08-31 重庆金山医疗器械有限公司 Capsule endoscope system date storage method
CN108470577B (en) * 2018-02-02 2021-07-27 重庆金山医疗器械有限公司 Capsule endoscopy system data storage method
CN109033137A (en) * 2018-06-06 2018-12-18 千寻位置网络有限公司 Dynamic RINEX date storage method and device
CN109033137B (en) * 2018-06-06 2021-11-05 千寻位置网络有限公司 Dynamic RINEX data storage method and device
CN109800184A (en) * 2018-12-12 2019-05-24 平安科技(深圳)有限公司 For the caching method of fritter input, system, device and can storage medium
CN110795391A (en) * 2019-10-28 2020-02-14 深圳市元征科技股份有限公司 Automobile repair data processing method and device, electronic equipment and storage medium
CN111159120A (en) * 2019-12-16 2020-05-15 西门子电力自动化有限公司 Method, device and system for processing files in power system
CN111461537A (en) * 2020-03-31 2020-07-28 山东胜软科技股份有限公司 Oil gas production data based classified quantity counting method and control system
CN111475463A (en) * 2020-04-01 2020-07-31 中国人民解放***箭军工程大学 GNSS observation data digital relation storage method
CN111400247A (en) * 2020-04-13 2020-07-10 杭州九州方园科技有限公司 User behavior auditing method and file storage method
CN112347045A (en) * 2020-11-30 2021-02-09 长春工程学院 Storage method of massive cable tunnel state signal data
CN112347045B (en) * 2020-11-30 2022-07-26 长春工程学院 Storage method of mass cable tunnel state signal data
CN113032348A (en) * 2021-05-25 2021-06-25 湖南省第二测绘院 Spatial data management method, system and computer readable storage medium
CN113420186A (en) * 2021-06-18 2021-09-21 自然资源部第三地形测量队 Data storage method, data storage device, computer readable storage medium and data reading method
CN113420186B (en) * 2021-06-18 2022-10-04 自然资源部第三地形测量队 Data storage method, data storage device, computer readable storage medium and data reading method
CN114416811A (en) * 2021-12-07 2022-04-29 中国科学院国家授时中心 Distributed storage system for GNSS data
CN116150113A (en) * 2023-04-17 2023-05-23 江苏北斗信创科技发展有限公司 Data storage method for GNSS

Also Published As

Publication number Publication date
CN104765876B (en) 2017-11-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171110

Termination date: 20180424