CN112463905B - Vector data SHP file parallel writing method - Google Patents

Publication number
CN112463905B
Authority
CN
China
Prior art keywords
file
writing
block
shp
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011366649.7A
Other languages
Chinese (zh)
Other versions
CN112463905A (en)
Inventor
李三玉
郑波
郑良
李金振
胡剑锋
吴宇峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Kingtopware Information Technology Co ltd
Original Assignee
Hubei Kingtopware Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-30
Filing date
2020-11-30
Publication date
2022-06-03
Application filed by Hubei Kingtopware Information Technology Co ltd
Priority to CN202011366649.7A
Publication of CN112463905A
Application granted
Publication of CN112463905B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of geographic information, and in particular to a parallel writing method for vector data SHP files. When an SHP file is written in parallel, sufficient space is first allocated in advance, the file is written in blocks, and statistical information is recorded. After writing, the file header information is reconstructed, and the actual content of each block file is then merged in a streaming manner.

Description

Vector data SHP file parallel writing method
Technical Field
The invention relates to the technical field of geographic information systems (GIS), and in particular to a parallel writing method for vector data SHP files.
Background
The SHP file (shapefile) is a published format of the ArcGIS products of the American company Esri and is widely used in the geographic information field as a de facto standard for spatial vector data. For historical reasons, however, SHP files can only be read and written serially, which is relatively inefficient. In a cloud GIS environment, SHP writing can therefore only be serial, and the available resources cannot be fully utilized.
Because an SHP data set stores geometry and attributes separately, it consists of at least three files: SHP (the geometry format, which stores the geometric entities of the elements), SHX (the geometry index format), and DBF (the attribute data format). The hardest part of parallel writing is the SHP file itself, because a file lock is generated when writing through the ArcGIS interface. The SHP file stores spatial coordinates, and because the geometries are of various types, the start and stop positions of each block cannot easily be determined during parallel writing; parallel reading and writing of SHX and DBF is comparatively simple. In a cloud GIS environment, parallel writing performs far better than serial writing, and the conventional serial mode cannot exploit the performance of cloud computing.
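To illustrate why block start and stop positions are hard to fix in advance, the following minimal sketch computes record sizes from the published shapefile layout (a 100-byte main file header and an 8-byte header before every record). The function names are illustrative and do not come from the patent.

```python
MAIN_HEADER_BYTES = 100   # fixed-length main file header
RECORD_HEADER_BYTES = 8   # record number + content length (two big-endian int32)


def point_record_bytes() -> int:
    """Size of one Point record: shape type (4) + X and Y doubles (16)."""
    return RECORD_HEADER_BYTES + 4 + 16


def polygon_record_bytes(num_parts: int, num_points: int) -> int:
    """Size of one Polygon (or PolyLine) record.

    Content: shape type (4) + bounding box (32) + NumParts (4) + NumPoints (4)
    + one int32 per part + two doubles per point.
    """
    return RECORD_HEADER_BYTES + 44 + 4 * num_parts + 16 * num_points


def record_offsets(record_sizes):
    """Byte offset of each record; each offset depends on all earlier sizes."""
    offsets = []
    position = MAIN_HEADER_BYTES
    for size in record_sizes:
        offsets.append(position)
        position += size
    return offsets
```

Because a polygon's size depends on its part and point counts, the offset of record N is only known once records 1 to N-1 have been sized, which is the difficulty the background describes.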
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a parallel writing algorithm for SHP files. When an SHP file is written in parallel, sufficient space is first allocated in advance, the file is written in blocks, and statistical information is recorded. After writing, the file header information is reconstructed, and the actual content of each block file is then read in a streaming manner and merged.
The technical scheme of the invention is a parallel writing method for vector data SHP files, characterized by comprising the following steps:
step 1, pre-allocating the space of each block, wherein the space length of each block is greater than or equal to 2 × (total file length / number of blocks), ensuring that no block exceeds its range and writes across a boundary;
step 2, writing each file in blocks;
step 3, for each block, recording statistical information and the start and stop positions during block writing;
step 4, creating a new blank file and summarizing the statistical information of all blocks, so that the file header of the merged file can be constructed conveniently;
step 5, reconstructing the standard SHP file header information and writing it as a binary stream;
step 6, merging the actual content of each block file in a streaming manner, and generating a new standard SHP file after merging.
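The steps above can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: it assumes one pre-allocated temporary file per block, a hypothetical 16-byte block header holding the record count and content length, and a caller-supplied `make_header` function standing in for step 5. Threads are used only for brevity.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block header layout: record count, content length in bytes.
BLOCK_HEADER = struct.Struct("<qq")


def write_block(path, records, capacity):
    """Steps 1-3: pre-allocate a block file, write records, store statistics."""
    with open(path, "wb") as f:
        f.truncate(BLOCK_HEADER.size + capacity)   # step 1: reserve space
        f.seek(BLOCK_HEADER.size)
        written = 0
        for rec in records:                        # step 2: write the block
            if written + len(rec) > capacity:
                raise ValueError("block capacity exceeded")
            f.write(rec)
            written += len(rec)
        f.seek(0)                                  # step 3: statistics at the head
        f.write(BLOCK_HEADER.pack(len(records), written))
    return path


def merge_blocks(block_paths, out_path, make_header):
    """Steps 4-6: summarize statistics, write header, stream the real content."""
    stats = []
    for path in block_paths:
        with open(path, "rb") as f:
            stats.append(BLOCK_HEADER.unpack(f.read(BLOCK_HEADER.size)))
    total_records = sum(count for count, _ in stats)
    total_bytes = sum(nbytes for _, nbytes in stats)
    with open(out_path, "wb") as out:              # step 4: new blank file
        out.write(make_header(total_records, total_bytes))   # step 5
        for path, (_, nbytes) in zip(block_paths, stats):    # step 6
            with open(path, "rb") as f:
                f.seek(BLOCK_HEADER.size)
                remaining = nbytes                 # skip unused reserved space
                while remaining:
                    chunk = f.read(min(1 << 20, remaining))
                    if not chunk:
                        raise IOError("block file shorter than recorded length")
                    out.write(chunk)
                    remaining -= len(chunk)
    return total_records, total_bytes


def demo():
    """Write three small blocks concurrently, then merge them."""
    blocks = [[b"aa", b"bbb"], [b"c"], [b"dddd", b"e"]]
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"block{i}.tmp") for i in range(len(blocks))]
        with ThreadPoolExecutor() as pool:
            list(pool.map(write_block, paths, blocks, [64] * len(blocks)))
        out_path = os.path.join(tmp, "merged.bin")
        totals = merge_blocks(paths, out_path,
                              lambda n, b: struct.pack("<qq", n, b))
        with open(out_path, "rb") as f:
            return totals, f.read()
```

Only the bytes counted in each block header are copied, so the unused part of each reserved region never reaches the merged file. For CPU-bound geometry encoding, a process pool might be a better fit than threads; that choice is outside what the source specifies.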
In the above vector data SHP file parallel writing method, in step 1 the space length of each block is less than or equal to 3 × (total file length / number of blocks).
In the above vector data SHP file parallel writing method, in step 2 each CPU core writes 2 blocks in parallel.
In the above vector data SHP file parallel writing method, the statistical information and the start and stop positions of step 3 are written in the header of each block file.
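As a worked example of the sizing rules, the sketch below derives a block count and a per-block capacity from an estimated total file length. The function name `plan_blocks` and its parameters are assumptions for illustration; the 2 to 3 times factor and the 2 blocks per core come from the text.

```python
import math
import os


def plan_blocks(total_bytes, cpu_cores=None, factor=2.0):
    """Return (number of blocks, reserved bytes per block).

    factor is the multiple applied to total_bytes / number of blocks;
    the text requires at least 2 and recommends at most 3.
    """
    if not 2.0 <= factor <= 3.0:
        raise ValueError("factor should lie between 2 and 3")
    cores = cpu_cores or os.cpu_count() or 1
    num_blocks = 2 * cores                      # 2 blocks per CPU core
    capacity = math.ceil(factor * total_bytes / num_blocks)
    return num_blocks, capacity
```

For an estimated 1,000,000-byte file on 4 cores, this gives 8 blocks of 250,000 bytes at factor 2 and 375,000 bytes at factor 3, so the total reserved space is 2 to 3 times the expected file size.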
The beneficial effects of the invention are as follows. First, the formation of file locks is avoided, so the write performance of SHP files is improved through parallel writing. Second, conventional SHP writing writes elements one by one, and the next element can be written only after the previous one is finished; the parallel block writing of the invention writes a batch of elements at a time, which can greatly improve SHP writing efficiency, by 3 to 10 times compared with conventional serial writing.
Drawings
FIG. 1 is a diagram illustrating a structure of an SHP file.
FIG. 2 is a schematic process diagram of the process of the present invention.
FIG. 3 is a schematic flow chart of the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
As shown in FIG. 1 to FIG. 3, the parallel writing method for vector data SHP files of the present invention includes the following steps:
Step 1. Pre-allocate the space of each block. The space length of each block is greater than or equal to 2 × (total file length / number of blocks), which ensures that no block exceeds its range and writes across a boundary. The space length cannot be less than 2 times the total file length divided by the number of blocks, otherwise a block can easily overrun its boundary; it should not be too large either, since that wastes space. In general it should not exceed 3 times the total file length divided by the number of blocks, and 2 to 3 times is recommended. The specific value can be estimated from the record length of each element: when records are long (elements with many attributes or many spatial points), choose a somewhat larger multiple; otherwise choose a smaller one.
Step 2. Write each file in blocks. In general the optimal number of blocks is 2 × the number of CPU cores. Comparison tests show that concurrency is fastest when 1 CPU core writes 2 blocks concurrently (with 1 block per core, CPU utilization is low; with 3 blocks per core, CPU load is too high and the parallel speed drops instead).
Step 3. While writing in blocks, record the statistical information and the start and stop positions for each block. Both are written at the head of each block file so that they can be read conveniently and quickly. The statistical information mainly records the number of records and the length (in bytes) of the actual content.
Step 4. After the preceding steps are finished, the file at this point contains invalid information, so a new blank file is created and the statistical information of all blocks is summarized. The summary mainly records the total number of records and the total length (in bytes) of the actual content, which makes it convenient to construct the file header of the merged file.
Step 5. Reconstruct the standard SHP file header information and write it as a binary stream.
Step 6. Merge the actual content of each block file in a streaming manner; the merge produces a new standard SHP file. Because the merged file is written as binary streams at specified positions and no positions overlap, the writes can proceed in parallel and the formation of file locks is effectively avoided.
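The header reconstruction in step 5 might look like the following sketch, which packs the 100-byte main header defined in the published shapefile specification from the summarized statistics. It assumes that each block also tracked its 2D bounding box, which the description does not state explicitly; the function names are illustrative.

```python
import struct


def merge_bboxes(boxes):
    """Combine per-block (xmin, ymin, xmax, ymax) boxes into one."""
    xmins, ymins, xmaxs, ymaxs = zip(*boxes)
    return min(xmins), min(ymins), max(xmaxs), max(ymaxs)


def build_shp_header(content_bytes, shape_type, bbox):
    """Build the 100-byte SHP main file header.

    content_bytes: total length of all records (record headers included).
    shape_type:    shape type code from the specification, e.g. 1 = Point.
    bbox:          (xmin, ymin, xmax, ymax) of the whole file.
    """
    if content_bytes % 2:
        raise ValueError("SHP lengths are counted in 16-bit words")
    file_length_words = (100 + content_bytes) // 2
    xmin, ymin, xmax, ymax = bbox
    # Big-endian part: file code 9994, five unused ints, file length in words.
    header = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, file_length_words)
    # Little-endian part: version 1000, shape type, bounding box,
    # then Z and M ranges (left at 0.0 for plain 2D data).
    header += struct.pack("<2i8d", 1000, shape_type,
                          xmin, ymin, xmax, ymax, 0.0, 0.0, 0.0, 0.0)
    return header
```

Two Point records (28 bytes each) give `content_bytes = 56` and a file length of 78 words. Note that the specification numbers records from 1 in each record header, so a block writer would also need to know its starting record number; the source does not describe how this is handled.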

Claims (3)

1. A parallel writing method for vector data SHP files, characterized by comprising the following steps:
step 1, pre-allocating the space of each block, wherein the space length of each block is greater than or equal to 2 × (total file length / number of blocks), ensuring that no block exceeds its range and writes across a boundary;
step 2, writing each file in blocks, with each CPU core writing 2 blocks in parallel;
step 3, for each block, recording statistical information and the start and stop positions during block writing, the statistical information recording the number of records and the length of the actual content;
step 4, creating a new blank file and summarizing the statistical information of all blocks, so that the file header of the merged file can be constructed conveniently;
step 5, reconstructing the standard SHP file header information and writing it as a binary stream;
step 6, merging the actual content of each block file in a streaming manner, and generating a new standard SHP file after merging.
2. The parallel writing method for vector data SHP files according to claim 1, wherein in step 1 the space length of each block is less than or equal to 3 × (total file length / number of blocks).
3. The parallel writing method for vector data SHP files according to claim 1, wherein the statistical information and the start and stop positions of step 3 are written in the header of each block file.
CN202011366649.7A 2020-11-30 2020-11-30 Vector data SHP file parallel writing method Active CN112463905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011366649.7A CN112463905B (en) 2020-11-30 2020-11-30 Vector data SHP file parallel writing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011366649.7A CN112463905B (en) 2020-11-30 2020-11-30 Vector data SHP file parallel writing method

Publications (2)

Publication Number Publication Date
CN112463905A (en) 2021-03-09
CN112463905B (en) 2022-06-03

Family

ID=74809359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011366649.7A Active CN112463905B (en) 2020-11-30 2020-11-30 Vector data SHP file parallel writing method

Country Status (1)

Country Link
CN (1) CN112463905B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591709A (en) * 2011-12-20 2012-07-18 南京大学 Shapefile master-slave type parallel writing method based on OGR
KR20130110691A (en) * 2012-03-30 2013-10-10 동아대학교 산학협력단 Automatic production system for address
CN103617626A (en) * 2013-12-16 2014-03-05 武汉狮图空间信息技术有限公司 Central processing unit (CPU) and graphics processing unit (GPU)-based remote-sensing image multi-scale heterogeneous parallel segmentation method
CN103678705A (en) * 2013-12-30 2014-03-26 南京大学 Vector data concurrent conversion method from VCT file to shapefile file
CN108986113A (en) * 2018-07-06 2018-12-11 航天星图科技(北京)有限公司 A kind of block parallel multi-scale division algorithm based on LLTS frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
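Data conversion method between Shapefile and Excel based on AE (基于AE的Shapefile和Excel之间数据转换方法); 刘蕊 et al.; 《计算机工程与设计》 (Computer Engineering and Design); 2007-07-31; pp. 3515-3517 *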

Also Published As

Publication number Publication date
CN112463905A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
WO2020024799A1 (en) Method for aggregation optimization of time series data
CN110019218B (en) Data storage and query method and equipment
WO2020041928A1 (en) Data storage method and system and terminal device
WO2015024474A1 (en) Rapid calculation method for electric power reliability index based on multithread processing of cache data
CN102880615A (en) Data storage method and device
CN103500089A (en) Small file storage system suitable for Mapreduce calculation model
WO2022037015A1 (en) Column-based storage method, apparatus and device based on persistent memory
CN112231276A (en) Method and system for aggregating data in object storage system
CN112463730A (en) Method, system and medium for hierarchical optimization of storage of massive small files
CN113268457A (en) Self-adaptive learning index method and system supporting efficient writing
CN109511008B (en) Method for supporting video and audio file content addition based on object storage
CN107423425A (en) A kind of data quick storage and querying method to K/V forms
CN112463905B (en) Vector data SHP file parallel writing method
CN115470235A (en) Data processing method, device and equipment
CN104102552A (en) Message processing method and device
CN112463880A (en) Block chain data storage method and related device
CN106909623A (en) A kind of data set and date storage method of supporting efficient mass data to analyze and retrieve
CN111611440A (en) Method for rapidly improving OFD signature, signature and verification
CN111370070B (en) Compression processing method for big data gene sequencing file
CN112988866A (en) Method and device for exporting excel file, electronic equipment and storage medium
CN113157680B (en) Data block increment compression and query method suitable for time sequence database
CN112395440A (en) Caching method, efficient image semantic retrieval method and system
CN111079935B (en) Machine learning rapid large-scale sample signature method under spark
CN116414839B (en) SSD-oriented time sequence data storage method and system based on LSM_Tree
CN115993938B (en) Disk formatting method, apparatus, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant