CN116166640A

CN116166640A - Real-time acquisition and management method and system for global navigation satellite observation data

Info

Publication number: CN116166640A
Application number: CN202111411432.8A
Authority: CN
Inventors: 黄功文; 王斌; 孙敏; 王维; 惠哲; 李阳; 赵红; 成夏葳; 崔文俊; 赵康
Original assignee: Beixingshiyun Nanjing Technology Co ltd; Geodetic Data Processing Center Of Ministry Of Natural Resources
Current assignee: Beixingshiyun Nanjing Technology Co ltd; Geodetic Data Processing Center Of Ministry Of Natural Resources
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2023-05-26

Abstract

The invention discloses a method and system for real-time acquisition and management of global navigation satellite observation data. A distributed database is used to store, in a unified manner, both the process data and the result data produced during acquisition of global navigation satellite data, and a standardized data query interface is opened, providing a fast, standard and concise data service mode for independently developed software and for data sharing and exchange; because the process data is retained, the data is also historically traceable. In addition, the invention adopts a distributed message queue framework and a distributed stream data parallel computing framework, thereby meeting the requirements that arise from networking large-scale satellite navigation positioning reference stations (CORS): large-scale high-frequency data acquisition and communication, analysis and computation of massive data, highly concurrent data query requests, and elastic management of the massive navigation satellite observation data accumulated over the long term.

Description

Real-time acquisition and management method and system for global navigation satellite observation data

Technical Field

The invention relates to the field of satellite positioning, in particular to a method and a system for acquiring and managing global navigation satellite observation data in real time.

Background

With the rapid development of multi-constellation, multi-frequency satellite systems, the networked operation of China's BeiDou-3 global navigation satellite system, and the rapid growth of mass-market demand for public location services, users of all kinds increasingly require faster, more accurate and more reliable location services. Satellite navigation positioning reference stations (Continuously Operating Reference Stations, CORS) are the most important component of the spatial data infrastructure and can provide accurate and reliable spatial positioning services to these users. Over the last ten years, the Ministry of Natural Resources, the China Earthquake Administration and the surveying, mapping and geographic information departments or units of each province have established CORS of different grades, accuracies and functions, which provide high-precision positioning support for national economic construction. With the rapid rise of new-generation information technologies and applications such as the mobile internet and the internet of things, the coverage and service fields of high-precision positioning applications keep expanding. Unified collection and unified management of CORS data, together with standardized, rich and flexible data services, are therefore an inevitable trend for improving satellite navigation and location service capability in the era of internet big data.

An important part of CORS management and maintenance is the acquisition, storage and management of satellite observation data. As demand for satellite data grows across industries, current CORS stations differ in construction type, construction standard and application scenario, and most are equipped with multiple sensor devices that observe various types of data (such as GNSS data and meteorological data). At present, CORS data acquisition and management in each province relies mainly on software systems supplied by the sensor equipment vendors, with several software systems often running in parallel, and faces the following problems: 1) Inconsistent data formats: in addition to format differences caused by differing data content, each equipment manufacturer stores the same type of data in its own custom file format; for example, Trimble stores observation data in the T02 format and South Surveying & Mapping in the STH format. 2) Loss of data content: although the Radio Technical Commission for Maritime Services (RTCM) has defined a differential global navigation satellite system service standard to unify the differential data formats of different receivers and thereby ease the exchange and processing of observation data, manufacturers selectively drop, or even modify in undisclosed ways, part of the RTCM message information to suit their own algorithms and software compatibility, so that even after the messages are decoded and stored as standard data, data from different manufacturers may still differ. 3) Complicated usage: data users usually have to go through a complex conversion and extraction process, applying different preprocessing methods depending on the data type and source, and can use the data in applications and computations only after converting it into a standard or internally accepted format, which hinders wider use of satellite data in the future. 4) No support for real-time computation: because data acquisition is a closed black-box process inside the manufacturer's software, which exposes only static result data and does not fully open the acquisition process data, excessive dependence on equipment manufacturers can make some application functions impossible or difficult to implement; for example, precise point positioning (PPP) in practice requires CORS data to be acquired and parsed quickly and correctly, with positioning performed using an undifferenced model once valid observations are obtained. 5) Disorganized management: existing navigation satellite observation data is stored as files in a file system and indexed by file path structure and file name; the directory layouts and file-naming schemes of the multiple CORS data sets are independent of one another, the data is not managed in a database, and there is no unified data catalogue or efficient data query service. The conventional data management capability is therefore ill-suited to long-term CORS management and greatly limits the expansion of satellite data services and their large-scale application.

Disclosure of Invention

The invention addresses two sets of problems. First, the closed data acquisition process leads to heterogeneous data, diverse usage scenarios, vendor-specific storage format standards, inconvenient data use, and complex, inefficient preprocessing. Second, because satellite data is not managed in a database, data management is inefficient and cannot meet the efficient query and access requirements of large-scale applications. To address these problems, the invention provides a method and system for acquiring and managing global navigation satellite observation data in real time, whose main features are as follows: a database is used to store, in a unified manner, the process data and result data produced during acquisition of global navigation satellite data, and a standardized data query interface is opened, replacing the separate encodings and standalone service modes of traditional business software and providing a fast, standard and concise data service mode for independently developed software and for data sharing and exchange. Because the process data is retained, the data is historically traceable, more flexible data applications can be supported, and the data application process is simplified.

In addition, considering the development trend of the national satellite navigation positioning reference stations, the invention uses a distributed big data framework to improve the acquisition and management method and system, in order to meet the requirements that arise from networking large-scale reference stations: large-scale high-frequency data acquisition and communication, parsing and computation of massive satellite data, highly concurrent navigation satellite data query requests, and elastic management of the massive observation data accumulated over the long term.

The specific technical scheme of the invention comprises the following steps:

Scheme I: the invention discloses a real-time acquisition and management method for global navigation satellite observation data, which mainly comprises the following steps: receiving satellite observation data streams based on the RTCM3.2 standard that are transmitted simultaneously and continuously over TCP/IP from N CORS stations; parsing the observation data streams into a plurality of independent pieces of original observation data, each of which records all fields of exactly one message type; establishing a streaming task to perform content correction on each piece of original observation data in parallel to obtain first observation data; filtering the first observation data with a data verification method in the streaming task to obtain second observation data; establishing a cache area for caching the second observation data, wherein the cache area is divided into M cache partitions, second observation data from the same CORS station is stored in the same cache partition, and a data consumption interface is provided for each cache partition; sending a data acquisition request to each data consumption interface, and packaging the acquired second observation data according to preset rules to establish a plurality of parallel computing units; starting a parallel computing task to parse the second observation data in the parallel computing units to obtain the text fields (message fields) recorded in the data area and the identification area of the second observation data; writing the text fields and the binary original text of the data area and the identification area of the second observation data into a distributed database for storage management, wherein the text fields and the binary original text constitute process data, and providing a first query interface for the process data; periodically reading text fields from the distributed database, generating a static observation file, writing the file into the distributed database for storage management, and providing a second query interface for the file; wherein the first query interface and the second query interface are defined based on the HTTP RESTful specification, and N and M are natural numbers greater than 1.

As a preferable scheme, the distributed database organizes and manages data by station, message type and time, specifically as follows: the distributed database has a plurality of site sets (station), one per CORS station; each site set contains a plurality of message type sets (type), and each message type set stores all process data of the same message type collected at different times; each site set also contains an observation file set, which is divided into a files set and a chunks set; the files set stores metadata describing the static observation files; the chunks set stores the actual content of the static observation files as binary data.

As a preferred aspect, the data structure of the process data includes: the message type typeId, the reference station number stationId, the time of acquisition of the observed data, the binary original text of the data area and the identification area in the second observed data, and the message field after the analysis of the data area and the identification area in the second observed data.
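As a non-limiting illustration, one process data record with the five listed members could be modelled as follows. Only typeId and stationId are names taken from the description; the attribute names time, raw and fields, the sample values, and the helper to_document are assumptions made for this sketch.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ProcessRecord:
    """One process data record; names other than typeId/stationId are hypothetical."""
    typeId: int        # message type number
    stationId: int     # reference station number
    time: datetime     # acquisition time of the observation data
    raw: bytes         # binary original text of the data area and identification area
    fields: dict = field(default_factory=dict)  # parsed message fields


def to_document(rec: ProcessRecord) -> dict:
    """Convert a record into a plain dict suitable for a document database."""
    return asdict(rec)


# Hypothetical sample values, for illustration only.
rec = ProcessRecord(1077, 12, datetime(2021, 11, 25, tzinfo=timezone.utc),
                    b"\xd3\x00\x00", {"satellite_count": 9})
doc = to_document(rec)
```

Keeping the binary original text next to the parsed fields is what allows a record to be re-parsed or traced later without going back to the station.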

As a preferred solution, the process data in each message type set (type) is stored in the order of its acquisition time.

As a preferred solution, the first query interface is defined as:

http://host[:port]/path?{stationId=value&typeId=value&startTime=value&endTime=value&interval=value};

wherein host[:port] is the address and port of the server hosting the query service, path is the path of the service, and { } encloses the input parameters of the interface; the input parameters include the reference station number stationId, the message type number typeId, the data sampling interval interval, and the start time startTime and end time endTime of the query time range.
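As a non-limiting illustration, a client could assemble a request URL for the first query interface as sketched below. The host name, service path, ISO 8601 time format and the use of seconds for interval are assumptions; the description fixes only the parameter names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_process_query(host, path, station_id, type_id,
                        start_time, end_time, interval, port=None):
    """Build a first-query-interface URL from its five input parameters."""
    netloc = f"{host}:{port}" if port is not None else host
    query = urlencode({
        "stationId": station_id,
        "typeId": type_id,
        "startTime": start_time,  # assumed ISO 8601 text
        "endTime": end_time,
        "interval": interval,     # assumed to be in seconds
    })
    return f"http://{netloc}/{path.lstrip('/')}?{query}"


# Hypothetical host and path, for illustration only.
url = build_process_query("cors.example.org", "process", 12, 1077,
                          "2021-11-25T00:00:00Z", "2021-11-25T01:00:00Z",
                          30, port=8080)
params = parse_qs(urlsplit(url).query)
```

urlencode percent-encodes the colons in the time values, so the query string stays valid regardless of the time format chosen.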

As a preferred scheme, each record of the files set may be a file or a folder; the data structure of the files set comprises: the unique identifier _id of the file or folder, the identifier _parentid of its parent folder, the file or folder name, the start time dataStartTime of the observation data recorded in the file, the accumulated data duration range of the file, the data sampling interval interval, the total file size length, and the file block size chunkSize; if the record is a folder, the values of length, chunkSize, dataStartTime, range and interval are all null; when storing a file, the distributed database divides a static observation file whose size exceeds chunkSize into a plurality of data blocks and stores them in the chunks set; each file in the files set corresponds to at least one data block; the data structure of the chunks set comprises: the unique identifier (id) of the file to which the data block belongs, the position (n) of the data block within the file, and the serialized binary data of the data block.
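As a non-limiting illustration, the division of a static observation file into data blocks can be sketched as follows. The key names files_id and data are assumptions borrowed from the similar files/chunks layout of MongoDB GridFS; the description itself names only the file identifier, the position n and the binary data.

```python
def split_into_chunks(file_id, content: bytes, chunk_size: int):
    """Return chunk records (owning file id, position n, binary data) for one file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; every file keeps at least one data block, even if empty.
    count = max(1, -(-len(content) // chunk_size))
    return [
        {"files_id": file_id, "n": i,
         "data": content[i * chunk_size:(i + 1) * chunk_size]}
        for i in range(count)
    ]


def reassemble(chunks) -> bytes:
    """Rebuild the file content by concatenating blocks in position order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


content = bytes(range(256)) * 10          # 2560 bytes of sample data
chunks = split_into_chunks("file-1", content, chunk_size=1024)
```

Here a 2560-byte file with a 1024-byte chunkSize yields three blocks, the last one shorter than the others.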

As a preferred solution, a virtual folder hierarchy is created by associating the unique identifier _id of each file or folder with the identifier _parentid of its parent folder.

As a preferred solution, the second query interface is defined as:

http://host[:port]/path?{stationId=value&dataStartTime=value&range=value&interval=value};

wherein host[:port] is the address and port of the server hosting the query service, path is the path of the service, and { } encloses the input parameters of the interface; the input parameters include the reference station number stationId, the data sampling interval interval, the accumulated data duration range recorded by the file, and the data start time dataStartTime recorded by the file.

As a preferable scheme, the static observation file is generated as follows: the text fields of the required message types, for the required station and time range, are read from the distributed database through the first query interface; the data is sampled at the sampling interval; and the result data is returned and used to generate the target static observation file. Preferably, the static observation file is a RINEX file.
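As a non-limiting illustration, the sampling step that precedes file generation can be sketched as below. Representing times as integer seconds, anchoring the sampling grid at the start time, and treating the range as half-open are all assumptions of this sketch; writing the actual RINEX file is not shown.

```python
def sample_records(records, start, end, interval):
    """Return records in [start, end) that fall on the sampling grid.

    Times are integer seconds; the grid is anchored at `start` (an assumption).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [r for r in records
            if start <= r["time"] < end and (r["time"] - start) % interval == 0]


# Hypothetical 1 Hz records for one station and one message type.
records = [{"time": t, "fields": {"epoch": t}} for t in range(100, 130)]
sampled = sample_records(records, start=100, end=130, interval=10)
```

With 30 one-second records and a 10-second interval, only the records at 100, 110 and 120 seconds are kept for the static file.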

As a preferred solution, parsing the observation data stream into a plurality of independent pieces of original observation data specifically comprises: for the observation data stream of each station, reading the binary bytes sequentially; searching for the fixed preamble according to the observation data transmission format described by the RTCM3.2 standard; taking the preamble as the head and computing the total length as the sum of the byte lengths of the preamble, the reserved field, the data area length field, the data area and the check area; and delimiting each piece of data by the preamble and the total length, thereby splitting the observation data stream into a plurality of pieces of original observation data.
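As a non-limiting illustration, the splitting step can be sketched as follows, based on the RTCM 3.x transport frame: one fixed leading byte 0xD3, 6 reserved bits, a 10-bit data length, the data area, and a 3-byte check area. The function name split_frames and the policy of discarding bytes that precede the first leading byte are assumptions of this sketch.

```python
PREAMBLE = 0xD3  # fixed leading byte of an RTCM 3.x frame


def split_frames(buffer: bytes):
    """Split a byte buffer into complete frames; return (frames, leftover)."""
    frames = []
    pos = 0
    while True:
        start = buffer.find(bytes([PREAMBLE]), pos)
        if start < 0:
            return frames, b""              # no frame head left in the buffer
        if len(buffer) - start < 3:
            return frames, buffer[start:]   # header incomplete, wait for more bytes
        # Byte 1 holds 6 reserved bits and the top 2 bits of the 10-bit length.
        length = ((buffer[start + 1] & 0x03) << 8) | buffer[start + 2]
        total = 3 + length + 3              # header + data area + check area
        if len(buffer) - start < total:
            return frames, buffer[start:]   # frame incomplete, wait for more bytes
        frames.append(buffer[start:start + total])
        pos = start + total


frame1 = bytes([0xD3, 0x00, 0x02, 0xAA, 0xBB, 0x00, 0x00, 0x00])
frame2 = bytes([0xD3, 0x00, 0x01, 0xCC, 0x01, 0x02, 0x03])
frames, leftover = split_frames(b"\x00\x01" + frame1 + frame2 + b"\xd3\x00")
```

The leftover bytes would be prepended to the next read from the same station. Because 0xD3 can also occur inside a data area, a production splitter would normally confirm each candidate frame with the check area before accepting it.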

As a preferable scheme, content correction is performed by appending a custom identification area to the end of each piece of original observation data, specifically: the end position of the original observation data is located according to the RTCM3.2 standard, and an identification area with a custom number of bits is appended after that position; the identification content comprises the station number ID of the station from which the original observation data originated, and the time and sequence number at which the original observation data was received.

As a preferred embodiment, the data check method is a CRC check method.
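As a non-limiting illustration, RTCM 3.x frames carry a 24-bit CRC (CRC-24Q, generator polynomial 0x1864CFB) computed over everything that precedes the check area, so the filtering step could be sketched as below. The function names are assumptions, and the sketch assumes the check is applied to the frame bytes as received, before any custom identification area is considered.

```python
CRC24Q_POLY = 0x1864CFB  # generator polynomial of CRC-24Q


def crc24q(data: bytes) -> int:
    """Bitwise CRC-24Q with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24Q_POLY
    return crc & 0xFFFFFF


def frame_is_valid(frame: bytes) -> bool:
    """Compare the CRC of header + data area with the trailing 3-byte check area."""
    if len(frame) < 6:
        return False
    return crc24q(frame[:-3]) == int.from_bytes(frame[-3:], "big")


body = bytes([0xD3, 0x00, 0x02, 0xAA, 0xBB])
good = body + crc24q(body).to_bytes(3, "big")
bad = bytes([good[0], good[1], good[2], good[3] ^ 0x01]) + good[4:]
```

Frames that fail the comparison are dropped, so only data that survived transmission intact reaches the cache partitions.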

As a preferable scheme, the sequence number x of the station from which the second observation data originates is taken, a modulo operation is applied to it according to formula (1) to obtain the cache partition number f(x), and the observation data is then stored in the corresponding cache partition Part_f(x):

f(x) = x % M    (1)

When the number of stations N <= M, each cache partition stores second observation data from a single CORS station; otherwise, at least one cache partition stores second observation data from two or more CORS stations.
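As a non-limiting illustration, formula (1) can be applied as follows. The sketch assumes that station sequence numbers are consecutive integers, which is what makes the N <= M case map each station to its own partition; the function names are hypothetical.

```python
def partition_for(station_seq: int, m: int) -> int:
    """Formula (1): cache partition number f(x) = x % M."""
    return station_seq % m


def assign(station_seqs, m):
    """Group station sequence numbers by the cache partition they map to."""
    parts = {i: [] for i in range(m)}
    for seq in station_seqs:
        parts[partition_for(seq, m)].append(seq)
    return parts


few = assign(range(3), m=4)    # N = 3 <= M = 4
many = assign(range(6), m=4)   # N = 6 >  M = 4
```

Because the mapping depends only on the station sequence number, all data from one station always lands in the same partition, which preserves per-station ordering.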

As a preferred solution, packaging the acquired second observation data according to preset rules to establish a plurality of parallel computing units specifically comprises: setting a batch time interval ΔT1 for each data consumption interface and, in units of ΔT1, packaging all second observation data received from that data consumption interface within each ΔT1 into one parallel computing unit; setting a window time interval ΔT2, where ΔT2 is an integer multiple of ΔT1, so that K parallel computing units are formed per interface within each ΔT2, where K = ΔT2/ΔT1; packaging the M × K parallel computing units from the M data consumption interfaces into one batch parsing task and processing the M × K parallel computing units in parallel to parse the second observation data; and, after the current batch parsing task has run for a time T, sliding to the next window, acquiring new second observation data and establishing a new batch parsing task containing M × K parallel computing units; T is the execution period, whose length is the average time needed to finish parsing one batch of parallel computing units.
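As a non-limiting illustration, the grouping of second observation data into M × K parallel computing units for one window can be sketched as below. Messages are modelled as (interface index, receive time in seconds, payload) tuples, which is an assumption of this sketch; a stream computing framework would normally perform this grouping itself.

```python
def build_batch(messages, m, dt1, dt2, window_start):
    """Group (interface, timestamp, payload) tuples into M*K parallel computing units.

    dt1 is the batch interval and dt2 the window interval (both integer seconds);
    each unit is keyed by (interface index, batch index within the window).
    """
    if dt1 <= 0 or dt2 % dt1 != 0:
        raise ValueError("dt2 must be a positive integer multiple of dt1")
    k = dt2 // dt1  # K = dT2 / dT1 units per interface and window
    units = {(p, j): [] for p in range(m) for j in range(k)}
    for interface, ts, payload in messages:
        offset = ts - window_start
        if 0 <= offset < dt2:  # ignore data outside the current window
            units[(interface, offset // dt1)].append(payload)
    return units


messages = [(0, 0, "a"), (0, 1, "b"), (1, 2, "c"), (1, 3, "next-window")]
units = build_batch(messages, m=2, dt1=1, dt2=3, window_start=0)
```

With M = 2 interfaces, ΔT1 = 1 s and ΔT2 = 3 s, the batch contains 2 × 3 = 6 units, and the message received at t = 3 s is left for the next window.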

As a preferred solution, starting a parallel computing task to parse the second observation data in the parallel computing units specifically comprises: defining a parsing function Parse.getData(key) for each message type; generating a kv (key, value) pair for each piece of second observation data in a parallel computing unit, where the key is the message type and the value is the second observation data; and, when the parallel computing task is executed, calling the corresponding parsing function according to the key and parsing all kv pairs in the parallel computing unit, the parsing result being the text fields that the data area and the identification area of the second observation data record according to the RTCM3.2 standard.
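As a non-limiting illustration, dispatching on the message type can be sketched as below. In RTCM 3.x the first 12 bits of the data area hold the message number, which serves as the key. The two parser functions are placeholders that only report the frame size; they are hypothetical stand-ins, not real decoders, and the table name PARSERS is an assumption.

```python
def message_type(frame: bytes) -> int:
    """Read the 12-bit message number at the start of the data area."""
    return (frame[3] << 4) | (frame[4] >> 4)


def parse_1005(value: bytes) -> dict:
    return {"type": 1005, "bytes": len(value)}  # placeholder, not a real decoder


def parse_1077(value: bytes) -> dict:
    return {"type": 1077, "bytes": len(value)}  # placeholder, not a real decoder


PARSERS = {1005: parse_1005, 1077: parse_1077}  # key -> parsing function


def parse_unit(unit):
    """Build (key, value) pairs for one unit and call the parser chosen by key."""
    kv_pairs = [(message_type(frame), frame) for frame in unit]
    return [PARSERS[key](value) for key, value in kv_pairs if key in PARSERS]


unit = [
    bytes([0xD3, 0x00, 0x02, 0x3E, 0xD0, 0, 0, 0]),  # message number 1005
    bytes([0xD3, 0x00, 0x02, 0x43, 0x50, 0, 0, 0]),  # message number 1077
    bytes([0xD3, 0x00, 0x02, 0xFF, 0xF0, 0, 0, 0]),  # no parser registered
]
results = parse_unit(unit)
```

Supporting a new message type then only requires registering one more function in the table; frames without a registered parser are skipped in this sketch.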

Scheme II: the invention also provides a real-time acquisition and management system for global navigation satellite observation data, comprising a data acquisition center, a data analysis center and a distributed database, wherein the data acquisition center is designed based on a distributed message queue technology and the data analysis center is designed based on a distributed stream data computing framework; wherein:

the data acquisition center is constructed to comprise a stream analysis module, a content correction module, a data filtering module and a data cache area; the stream analysis module is used for receiving satellite observation data streams based on the RTCM3.2 standard that are transmitted simultaneously and continuously over TCP/IP from N CORS stations, and for parsing the observation data streams into a plurality of independent pieces of original observation data, each of which records all fields of exactly one message type; the content correction module is used for establishing a streaming task to perform content correction on each piece of original observation data in parallel to obtain first observation data; the data filtering module is used for filtering the first observation data with a data verification method in the streaming task to obtain second observation data; the data cache area is divided into M cache partitions, second observation data from the same CORS station is stored in the same cache partition, and each cache partition is provided with a data consumption interface; N and M are natural numbers greater than 1;

The data analysis center is constructed to comprise a parallel computing unit building module, a data content analysis module and a real-time writing module: the parallel computing unit establishing module is used for sending a data obtaining request to each data consumption interface, and packaging the obtained second observation data according to preset rules to establish a plurality of parallel computing units; the data content analysis module is used for starting a parallel computing task to analyze and compute second observation data in the parallel computing unit, so as to obtain text fields recorded in a data area and an identification area in the second observation data; the real-time writing module is used for writing the text field and the binary original text of the data area and the identification area in the second observation data into a distributed database for storage management, and the text field and the binary original text form process data;

the distributed database is provided with a plurality of site sets (station), one per CORS station; each site set contains a plurality of message type sets (type), and each message type set stores all process data of the same message type collected at different times; each site set also contains an observation file set, which is divided into a chunks set and a files set, wherein the files set stores metadata describing the static observation files and the chunks set stores the actual content of the static observation files as binary data;

The distributed database also has a first query interface for process data queries and a second query interface for static observation file queries, the first and second query interfaces being defined based on the HTTP RESTful specification.

Preferably, the distributed message queue technology is Kafka, RabbitMQ or AMQP.

Preferably, the distributed stream data computing framework is Spark Streaming, Storm or Flink.

Preferably, the distributed database is MongoDB, Cassandra or CouchDB.

Compared with the prior art, the invention has the following beneficial effects:

(1) In the prior art, the final data products offered externally after acquisition are mostly RINEX files or files in a proprietary format. In addition to RINEX files, the invention also acquires and records the preprocessed binary original text and the parsed text fields of the satellite observation data, so the range of data products is richer and can flexibly serve different application requirements for both real-time and static observation data.

(2) The invention manages all data by adopting a database mode, has better management efficiency than the file management mode of the prior system, can support mass data accumulated after a large-scale station network operates for a long time, can provide a complete data directory for a data manager, ensures the flexibility of data expansion and provides a foundation for large-scale station network management.

(3) In the prior art, a file system access path such as an FTP protocol is opened for downloading static observation data, and the invention can provide efficient and networked data sharing exchange through a network query interface of an open HTTP RESTful standard, thereby greatly accelerating the batch acquisition capability of data, greatly facilitating the development and utilization of navigation positioning satellite data, and providing a favorable technical foundation for the Internet application service of the navigation positioning satellite data.

(4) The invention acquires satellite data independently of vendor software and records process data such as the preprocessed binary original text and the parsed text fields. The original observation data can be forwarded in real time through the query interface, which avoids complex data extraction when the observation data is used in different application scenarios, improves the efficiency of data use, and enables computations that require real-time satellite observation data. In addition, problem data can be tracked and traced so that data lost during conversion can be restored from the database, breaking with the practice of each traditional business software package using its own encoding.

(5) The invention adopts current mainstream distributed big data frameworks, comprising a distributed message queue framework that supports large-scale data communication, a distributed stream data computing framework that supports real-time computation over massive data, and a distributed database that supports reads and writes from the distributed computing framework and the storage of rapidly growing observation data. This ensures the processing efficiency and stability of the system, allows reference stations (that is, data source nodes) to be added by scaling out, and meets the requirements of flexible expansion and large-scale application of current reference station networks.

Drawings

FIG. 1 is a schematic diagram of a real-time acquisition and management system architecture for global navigation satellite observation data;

FIG. 2 is a schematic diagram of steps of a method for acquiring and managing global navigation satellite observation data in real time;

FIG. 3 is a schematic diagram of a Kafka-based distributed data acquisition method;

FIG. 4 is a diagram of a Spark-based distributed data parsing method architecture;

FIG. 5 is a partial content of the RINEX file;

FIG. 6 is a distributed database logic storage structure.

Detailed Description

The invention provides a method and system for real-time acquisition and management of global navigation satellite observation data, in which: satellite observation data streams are acquired from CORS reference stations (CORS stations or reference stations for short); the observation data streams are parsed and split into original observation data; the original observation data undergoes content correction and data filtering; the content of the corrected and filtered observation data is parsed; the parsed data fields (namely the text fields) are stored directly in a database; the data fields are periodically read from the database and used to generate RINEX static observation files, or other custom static observation files, according to defined rules; and the static observation files are stored in the database for management.

The technical scheme described by the invention is further explained below by combining specific embodiments and drawings.

With reference to fig. 1, embodiment 1 discloses a real-time acquisition and management system for global navigation satellite observation data, which mainly comprises a distributed data acquisition center, a distributed data analysis center and a distributed database. The data acquisition center is designed based on a distributed message queue technology, such as Kafka, RabbitMQ or AMQP, and mainly comprises a stream analysis module, a content correction module, a data filtering module, a data processing module and a data cache area. The data analysis center is designed based on a distributed stream data computing framework, such as Spark Streaming, Storm or Flink, and mainly comprises a parallel computing unit, a data content analysis module and a real-time writing module. The distributed database may be, for example, MongoDB, Cassandra or CouchDB; it is mainly used for storing the filtered observation data and its parsing results (i.e. the text fields), and also stores the static observation files generated by subsequent database operations. Fig. 1 shows the main components of the system together with the data processing content and processing flow of each component, described in detail as follows:

the data acquisition center supports continuous reception of observation data streams from a large number of CORS reference stations and establishes a series of parallel tasks to preprocess the received satellite observation data streams. Each reference station continuously transmits a satellite observation data stream based on the RTCM3.2 standard, i.e. a series of observation data joined end to end, to the data acquisition center over TCP/IP. Preprocessing mainly comprises the following. First, the stream analysis module splits the satellite observation data stream into independent pieces of original observation data. Next, because the station number of the CORS reference station is sometimes missing from the data content when satellite observation data is received, the content correction module corrects each piece of original observation data so that all subsequently processed observation data carries a valid station number and acquisition time, which are key information required for data management. The corrected observation data (referred to as "first observation data") is then filtered in the data filtering module by a data correctness check, to ensure that the data contains no content errors caused by lost or corrupted bits during network transmission. The data cache area has a plurality of cache partitions; once valid observation data (referred to as "second observation data") has passed verification, it is distributed to the cache partitions of the data acquisition center for queuing, and the data analysis center fetches and processes data from these cache partitions whenever it has computing resources available.

The data analysis center parses the preprocessed observation data coming from the data acquisition center, namely the second observation data. The data analysis center requests the preprocessed observation data from the cache area of the data acquisition center, packages the second observation data according to defined rules to establish a plurality of parallel computing units, and starts a parallel computing task in the data content analysis module to parse the content of the observation data in all parallel computing units, that is, the content of the data area and the identification area of the second observation data. The parsing result is the set of text fields that the observation data records according to the RTCM3.2 standard. The parsing result and the parsed second observation data are collectively called "process data"; the process data is written into the database in real time, and a query interface for the process data is opened.

In the normal operation process of the system, the message fields can be queried from the distributed database at regular time to generate a static observation file, the file is written back to the database for storage management, and a query interface of static observation file data is opened. The query interfaces for the data may be defined based on the HTTP RESTful specification.

Referring to fig. 2 to 6, embodiment 2 discloses a method for collecting and managing global navigation satellite observation data in real time, which can be based on the system for collecting and managing global navigation satellite observation data in real time described in embodiment 1, and mainly comprises the following steps:

Data receiving: The CORS reference stations are defined as the production nodes of observation data, forming the set S = {S_1, S_2, ..., S_N}, where N is the number of stations. All stations simultaneously and uninterruptedly transmit satellite observation data streams to the data acquisition center, which receives them. A satellite observation data stream is typically an unbounded sequence of binary bytes containing many pieces of observation data joined end to end with no delimiters between them.

Stream analysis: The data acquisition center first performs stream analysis, i.e. splits the observation data stream into a plurality of independent pieces of original observation data. For the observation data stream of each station, binary bytes are read sequentially and the fixed preamble is searched for according to the observation data transmission format described by the RTCM3.2 standard. Taking the preamble as the head, the total length is calculated as the sum of the lengths of the preamble, the reserved field, the data area length field, the data area and the check area. The preamble and the total length define the data boundary, and the observation data stream is split accordingly into pieces of original observation data.
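The splitting step can be illustrated with a minimal Python sketch. It assumes the frame layout of Table 1 (8-bit preamble, 6 reserved bits, 10-bit data area length, data area, 24-bit check area); the function name `split_frames` is hypothetical and not taken from the patent. The sketch does not verify the check area, so a stray preamble byte inside noise could still start a false frame; that case is left to the later filtering step.

```python
from typing import Iterator

PREAMBLE = 0xD3  # fixed 8-bit preamble 11010011 (Table 1)


def split_frames(stream: bytes) -> Iterator[bytes]:
    """Yield each complete transport frame found in a byte buffer."""
    i = 0
    n = len(stream)
    while i + 3 <= n:
        if stream[i] != PREAMBLE:
            i += 1  # resynchronise: skip bytes until a preamble is seen
            continue
        # The two bytes after the preamble hold 6 reserved bits
        # followed by the 10-bit data area length (in bytes).
        length = ((stream[i + 1] & 0x03) << 8) | stream[i + 2]
        total = 3 + length + 3  # header + data area + check area
        if i + total > n:
            break  # incomplete frame: wait for more bytes to arrive
        yield stream[i:i + total]
        i += total
```

In a streaming setting the unconsumed tail of the buffer would be kept and prepended to the next chunk of received bytes.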

Content correction: After splitting, the data acquisition center establishes streaming tasks for parallel processing according to the number of pieces of observation data to be processed at the same time (usually more than N), and performs content correction on each piece. With reference to Table 1, the last bit of the observation data (i.e. the last bit of the check area) is located according to the RTCM3.2 standard, and a custom 32-bit identification area is appended after it to obtain the first observation data, as shown in Table 2. The identification area contains the station number ID of the source station and the time and sequence number of data reception; the time is recorded to the second.

Table 1 RTCM3.2 standard data transmission frame structure

Name	Number of bits	Unit	Range	Description
Preamble	8	-	-	Fixed preamble 11010011
Reserved field	6	-	-	Reserved, set to 000000
Data area length	10	byte	0~1023	
Data area	-	-		Total length is given by the data area length field
Check area	24	1	-	CRC24Q check

Table 2 modified observed data transmission frame structure

Data filtering: After content correction, the observation data is filtered within the streaming task using a CRC check to judge its correctness. CRC is a channel coding technique that generates a short, fixed-length check code from a network data packet and is mainly used to detect errors that may occur during data transmission. Starting from the first bit of the preamble and ending at the last bit of the data area, a check bit sequence is generated using the CRC24Q check formula. The calculated check sequence is compared with the check code in the check area of the current observation data. If they are consistent, the check passes; if not, the check fails, the observation data is judged to be incomplete and impossible to parse correctly, and it is discarded.
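As a rough illustration of this filtering step, the following Python sketch computes a CRC-24Q value bit by bit (generator polynomial 0x1864CFB, zero initial value) over the preamble, reserved field, length field and data area, and compares it with the check area. The function names are hypothetical. The check area is located through the length field rather than by taking the last three bytes, on the assumption that an identification area may already have been appended after it.

```python
CRC24Q_POLY = 0x1864CFB  # CRC-24Q generator polynomial


def crc24q(data: bytes) -> int:
    """Bitwise CRC-24Q with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24Q_POLY
    return crc & 0xFFFFFF


def frame_is_valid(record: bytes) -> bool:
    """Check one record: frame bytes, optionally followed by extra bytes."""
    if len(record) < 6 or record[0] != 0xD3:
        return False
    length = ((record[1] & 0x03) << 8) | record[2]
    end = 3 + length  # first byte of the check area
    if len(record) < end + 3:
        return False
    expected = int.from_bytes(record[end:end + 3], "big")
    return crc24q(record[:end]) == expected
```

A table-driven CRC would be faster for high-rate streams; the bitwise form is used here only because it is short and easy to check against the definition.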

Data caching: The filtered observation data is stored in the cache area, and the data analysis center requests and processes data from the cache area when it has idle computing resources. The cache area is divided into M cache partitions forming the set Part = {Part_1, Part_2, ..., Part_M}, which act as storage nodes that receive and store the filtered observation data from the N reference stations. During caching, the second observation data should be distributed across the cache partitions as evenly as possible to keep computing resources balanced. Since all reference stations transmit data at the same frequency, the ideal arrangement is to create as many cache partitions as there are reference stations (i.e. M = N) and to send the data of each station to its own cache partition. However, when the number of stations is large, so many cache partitions would reduce the actual running efficiency of the distributed message queue, and the cache partitions cannot always correspond to the stations one to one. The following method may therefore be adopted to assign cache partitions:

The serial number x of the station from which the observation data originates is taken, a modulo operation is performed on it by formula (1) to obtain the cache partition serial number f(x), and the observation data is then stored in the corresponding cache partition Part_f(x).

f(x) = x % M    (1)

Where M is the optimal number of partitions for the message queue in actual operation (typically an empirical value determined experimentally for a given computing resource environment). When the number of stations N <= M, each cache partition stores the observation data of one reference station; otherwise, one cache partition may contain the observation data of a plurality of stations.
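Formula (1) can be sketched in a few lines of Python. The function name `cache_partition` is hypothetical, and the result is a zero-based index in the range 0 to M-1, which is an assumption about how the partitions Part_1 to Part_M are numbered in practice.

```python
def cache_partition(station_seq: int, num_partitions: int) -> int:
    """Return f(x) = x % M for station serial number x and M partitions."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return station_seq % num_partitions


# With M = 4, stations 0..9 are spread over partitions 0..3, and every
# record of a given station always lands in the same partition.
assignment = {x: cache_partition(x, 4) for x in range(10)}
```

Because the mapping depends only on the station serial number, data from one station stays ordered within a single partition, while consecutive serial numbers are spread evenly across partitions.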

The above steps are all completed in the data acquisition center, which is based on distributed message queue technology. Here the data acquisition center adopts the Kafka stream data processing platform shown in fig. 3, but in practical application it is not limited to this platform and may, for example, adopt RabbitMQ or AMQP. The main operations of the data analysis center are described next. As shown in fig. 4, the data analysis center is illustrated with a design based on Spark Streaming, a distributed stream data computing framework; in practical application it is not limited to this architecture and may, for example, use Storm or Flink.

Establishing parallel computing units: The data analysis center actively acquires (i.e. consumes) observation data from the data acquisition center. Each cache partition exposes a data consumption interface (consumption interface for short). The data analysis center sends data acquisition requests to the M consumption interfaces simultaneously and, after obtaining data, packages it according to the following rules to generate a set of RDDs, the basic computing units of a parallel computing task (i.e. data units that can be operated on). For each consumption interface, a batch time interval ΔT1 is set, and all observation data arriving from that interface within each ΔT1 is packaged into one RDD. A window time interval ΔT2 is also set; ΔT2 must be a multiple of ΔT1, so that K RDDs are formed per interface in each window of length ΔT2, where K = ΔT2/ΔT1. Spark packages the M×K RDDs from the M consumption interfaces into one batch analysis task, computes on them in parallel, and parses the content of the data area of the observation data. After the current batch analysis task has run for a certain time T (also called the execution period), the window slides to the next window time, new observation data is obtained, and a new batch analysis task containing M×K RDDs is established. T is set approximately equal to the average time needed to complete the parsing of one batch of RDDs.
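The packaging rule can be illustrated with a plain Python stand-in that does not use the Spark API. It assumes each record carries its partition index and a receive time in seconds, and that ΔT1 and ΔT2 are whole seconds; the names `Record` and `build_units` are hypothetical. One window of length ΔT2 yields M×K units, keyed by (partition, batch index).

```python
from typing import Dict, Iterable, List, Tuple

# (partition index, receive time in seconds, payload) - assumed layout
Record = Tuple[int, float, bytes]


def build_units(records: Iterable[Record], window_start: float,
                dt1: int, dt2: int, m: int) -> Dict[Tuple[int, int], List[bytes]]:
    """Group one window of records into M x K units, K = dt2 / dt1."""
    if dt1 <= 0 or dt2 % dt1 != 0:
        raise ValueError("dt2 must be a positive multiple of dt1")
    k = dt2 // dt1
    units: Dict[Tuple[int, int], List[bytes]] = {
        (p, j): [] for p in range(m) for j in range(k)
    }
    for part, t, payload in records:
        offset = t - window_start
        # Keep only records inside the window and from a known partition.
        if 0 <= offset < dt2 and 0 <= part < m:
            units[(part, int(offset // dt1))].append(payload)
    return units
```

In a real deployment the streaming framework forms these batches itself; the sketch only shows how the counts M, K and the two intervals relate.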

Data content parsing: The observation data contained in the M×K RDDs of the current batch is parsed in parallel. According to the RTCM3.2 standard, the data area of a piece of observation data records the field values of one of several message types (as shown in Table 3), and each piece of observation data records only the fields of a single message type. For each piece of observation data in an RDD, a kv (key, value) pair is generated with the message type recorded in the observation data as the key and the data area and identification area of the observation data as the value. Since each message type contains different fields (as shown in Table 4), different message types must be parsed differently, so a parsing function Parse.getData(key) is defined for each message type.

Table 3 Message types in the RTCM3.2 standard (partial list)

Message type	Message name	Number of bytes ^a	Description
1~100	Test messages	-	-
1001	GPS RTK L1 observations	8.00+7.25×Ns
1002	Extended GPS RTK L1 observations	8.00+9.25×Ns
1003	GPS RTK L1&L2 observations	8.00+12.625×Ns
1004	Extended GPS RTK L1&L2 observations	8.00+15.625×Ns
1005	RTK reference station ARP	19
1006	RTK reference station ARP with antenna height	21
1007	Antenna description	5-36
1008	Antenna description and serial number	6-68
1009	GLONASS RTK L1 observations	7.625+8×Ns
1010	Extended GLONASS RTK L1 observations	7.625+9.875×Ns
1011	GLONASS RTK L1 and L2 observations	7.625+13.375×Ns
1012	Extended GLONASS RTK L1 and L2 observations	7.625+16.25×Ns
1013	System parameters	8.75+3.625×Nm	Nm = number of message types broadcast
1014	Network auxiliary station data	14.625
1015	GPS ionospheric correction single differences	9.5+3.5×Ns
1016	GPS geometric correction single differences	9.5+4.5×Ns
1017	GPS combined geometric and ionospheric correction single differences	9.5+6.625×Ns

Table 4 Field contents of message type 1013 (system parameters message) in the RTCM3.2 standard

Data field name	Data field number	Data type	Number of bits	Description
Message type	DF002	uint12	12	1013
Reference station ID	DF003	uint12	12
MJD days	DF051	uint16	16
UTC time of day	DF052	uint17	17
Number of subsequent messages (Nm)	DF053	uint5	5
GPS-UTC leap seconds	DF054	uint8	8
Message ID #1	DF055	uint12	12
Message #1 synchronization mark	DF056	bit(1)	1
Message #1 transmission interval	DF057	uint16	16
Message ID #2	DF055	uint12	12
Message #2 synchronization mark	DF056	bit(1)	1
Message #2 transmission interval	DF057	uint16	16
......	......	......	......
Message ID #Nm	DF055	uint12	12
Message #Nm synchronization mark	DF056	bit(1)	1
Message #Nm transmission interval	DF057	uint16	16
Total	-	-	70+29×Nm

When the parallel computing task is executed, Spark calls the corresponding parsing function based on the key (i.e. the message type) and parses all kv pairs in the RDD. The result is the set of message fields recorded in the data area and the identification area of the second observation data (as shown in Table 5), where the fields in the data area are recorded according to the RTCM3.2 standard.
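A minimal Python sketch of key-based dispatch is given here for message type 1013, using the bit widths listed in Table 4. The class `BitReader`, the registry `PARSERS` and the output field names are all hypothetical; only the field order and widths come from Table 4. The input is assumed to be the data area alone, starting at the 12-bit message type.

```python
from typing import Callable, Dict, Tuple


class BitReader:
    """Read big-endian unsigned bit fields from a byte string."""

    def __init__(self, data: bytes) -> None:
        self._value = int.from_bytes(data, "big")
        self._remaining = len(data) * 8

    def read(self, nbits: int) -> int:
        if nbits > self._remaining:
            raise ValueError("data area too short")
        self._remaining -= nbits
        return (self._value >> self._remaining) & ((1 << nbits) - 1)


def parse_1013(data_area: bytes) -> dict:
    """Parse the fields of Table 4 in order (70 + 29*Nm bits)."""
    r = BitReader(data_area)
    fields = {
        "typeId": r.read(12),           # DF002
        "stationId": r.read(12),        # DF003
        "mjd": r.read(16),              # DF051
        "utcSecondsOfDay": r.read(17),  # DF052
        "nm": r.read(5),                # DF053
        "leapSeconds": r.read(8),       # DF054
    }
    fields["announcements"] = [
        {"messageId": r.read(12), "sync": r.read(1), "interval": r.read(16)}
        for _ in range(fields["nm"])
    ]
    return fields


# Registry keyed by message type; only 1013 is sketched here.
PARSERS: Dict[int, Callable[[bytes], dict]] = {1013: parse_1013}


def parse(data_area: bytes) -> Tuple[int, dict]:
    """Use the message type as the key and call the matching parser."""
    key = BitReader(data_area).read(12)
    if key not in PARSERS:
        raise KeyError(f"no parser registered for message type {key}")
    return key, PARSERS[key](data_area)
```

Adding support for another message type would mean registering one more function in `PARSERS`, which mirrors the per-type parsing functions described in the text.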

TABLE 5 analysis results of observation data of telegram type 1013

After the data analysis center finishes processing, the process data is written into the distributed database for storage in real time, and a standard process data query interface is opened. Similarly, the system also reads the text field from the database at regular time to generate an observation file, writes the generated observation file back to the database for storage management, and opens a query interface of standard observation file data.

The following describes how the distributed database stores each category of data:

Process data storage (message fields and second observation data): The result of the data content parsing is the set of message fields recorded in the second observation data, including attributes such as the message type, the reference station ID, and the observation data acquisition time recorded in the appended identification area. The second observation data in the process data refers specifically to the binary original text of its data area and identification area, which is stored in the database as binary. To improve query efficiency, the data can be modeled with a tree structure (i.e. nested sets) in the distributed database MongoDB: the database comprises a plurality of station sets (stations), one per CORS station; each station set divides its records into a plurality of message type sets (types); and all process data is stored as records in the message type sets in the order in which it was collected.

In each stored record, typeId is the message type, stationId is the reference station number, time is the observation data acquisition time, binary is the binary original text of the second observation data, and other attributes such as attribute1 and attribute2 are the remaining fields parsed from the observation data.
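One possible shape of such a record is sketched here as a plain Python dictionary. The attribute names typeId, stationId, time and binary follow the description; the function name and the time format (`YYYYMMDDTHHMMSS`, borrowed from the query examples elsewhere in this document) are assumptions, and no database driver is involved.

```python
from datetime import datetime
from typing import Any, Dict


def make_process_record(type_id: int, station_id: int, acquired_at: datetime,
                        binary: bytes, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build one process data record from parsed fields and raw bytes."""
    record: Dict[str, Any] = {
        "typeId": type_id,
        "stationId": station_id,
        "time": acquired_at.strftime("%Y%m%dT%H%M%S"),  # assumed format
        "binary": binary,
    }
    # Remaining parsed fields become extra attributes; the four core
    # attributes above are never overwritten.
    record.update({k: v for k, v in fields.items() if k not in record})
    return record
```

A record built this way would be stored under the set for its station and message type, so that queries by station, type and time range touch only one nested set.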

As shown in fig. 5, taking the observation data including the telegram type 1013 as an example, the analysis result is stored in the database as follows:


After the data is stored, query interfaces for the message field data and the corresponding binary original text can be defined based on the HTTP RESTful specification, so that a user can quickly obtain observation data by sending an HTTP request. The query interface is defined as follows: http://host[:port]/path?{stationId=value&typeId=value&startTime=value&endTime=value&interval=value}

Where host[:port] is the server address and port of the query service, path is the service path, and { } encloses the input parameters of the interface, as follows:

An example of the interface is as follows:

http://www.shxbdcors.com/apps/dataserver/getFieldData?stationId=0&typeId=1013,1006,1033&date_startTime=20201217T032400&date_endTime=20201218T032359&interval=15
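A request URL of this form could be assembled as in the following Python sketch. The function name `build_field_query` is hypothetical; the parameter names follow the interface definition (startTime, endTime) rather than the date_-prefixed names that appear in the example URL, and the base URL is supplied by the caller.

```python
from typing import Iterable
from urllib.parse import urlencode


def build_field_query(base_url: str, station_id: int, type_ids: Iterable[int],
                      start_time: str, end_time: str, interval: int) -> str:
    """Compose a message field query URL from its input parameters."""
    params = {
        "stationId": station_id,
        "typeId": ",".join(str(t) for t in type_ids),
        "startTime": start_time,   # e.g. "20201217T032400"
        "endTime": end_time,
        "interval": interval,      # sampling interval in seconds
    }
    # safe="," keeps the comma-separated type list readable in the URL.
    return f"{base_url}?{urlencode(params, safe=',')}"
```

Building the query string with `urlencode` avoids hand-concatenation mistakes such as a missing `?` or `&` between parameters.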

RINEX file storage: RINEX is a commonly used file format for storing satellite observation data, and many post-processing computations on satellite data take RINEX files as their data source. To fit existing usage habits, the observation data message fields stored directly in the database need to be converted into RINEX files and provided to data users. A RINEX file is typically generated from the field values of multiple message types accumulated from the same reference station over a certain period at a certain sampling interval; for example, observation data acquired every 30 seconds is taken to generate the RINEX file. Fig. 5 shows part of the content of a RINEX file generated from observation data of message types 1006, 1008, 1013, 1033 and 1230. The RINEX file may hold message fields accumulated over 24 hours at 30-second intervals, or over 8 hours at 1-second intervals, as needed. A timed task is set according to the required accumulation duration and sampling interval, and field values are queried from the database through the message field query interface. For example, to generate a RINEX file with an accumulation duration of 24 hours and a sampling interval of 30 seconds from observation data of the five message types 1006, 1008, 1013, 1033 and 1230, each time 24 hours of observation data has been collected, the data fields of those message types are queried at 30-second intervals within the 24 hours. The corresponding message field query is as follows:

https://www.shxbdcors.com/apps/getFieldData?stationId=0&typeId=1006,1008,1013,1033,1230&startTime=20201217T032400&endTime=20201218T032359&interval=30

The RINEX file is generated from the field data returned by the query interface and is then written to the distributed database for persistent storage, which facilitates RINEX file management and quick query.

The distributed database MongoDB uses two sets to store RINEX files: one for the file data and one for the metadata describing the files. Specifically, the files set stores the metadata of each RINEX file, for example the file name, the source station, and the total duration of the recorded data; the chunks set stores the file content in binary form. First, the station sets described under process data storage are extended: each station set additionally nests a files set and a chunks set for storing the RINEX files that contain that station's observation data. In addition, the files set of MongoDB is extended so that each record in it can represent either a RINEX file or a folder. Whether it is a file or a folder, each record has a unique ID and the ID of its parent folder; associating the two establishes a virtual folder system. Placing RINEX files of the same kind (e.g. files with the same accumulation duration and sampling interval) in the same folder through this virtual folder system improves query efficiency.
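The virtual folder system can be illustrated by resolving a record's path from its chain of parent IDs. This Python sketch works on an in-memory dictionary rather than a database; the key `_pantid` is the parent-folder identifier as it is spelled in claim 6, while the function name, the sample IDs and the sample names are hypothetical.

```python
from typing import Dict, List, Optional


def folder_path(records: Dict[str, dict], record_id: str) -> str:
    """Walk parent links upward and return the '/'-joined path."""
    parts: List[str] = []
    seen = set()
    current: Optional[str] = record_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in parent links")
        seen.add(current)
        rec = records[current]
        parts.append(rec["name"])
        current = rec.get("_pantid")  # None marks a top-level folder
    return "/".join(reversed(parts))


# Hypothetical sample: one folder for 24 h / 30 s files, one file in it.
sample = {
    "d1": {"name": "24h_30s", "_pantid": None},
    "f1": {"name": "station0.rnx", "_pantid": "d1"},
}
```

Grouping files with the same accumulation duration and sampling interval under one folder record means a query for that kind of file only needs to look at the children of a single parent ID.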

The data structure of the files set mainly includes the unique ID of the file or folder (_id), the ID of its parent folder (_pantid), the file or folder name (name), the start time of the observation data recorded in the file (dataStartTime), the data accumulation duration (range), the data sampling interval (interval), the total file size (length), and the file chunk size (chunkSize). Note that MongoDB splits a file into blocks when storing its content: if the file's data size is greater than chunkSize, the file is divided into a plurality of data blocks that are stored in the chunks set.

If the current record is a folder rather than a file, the values of length, chunkSize, dataStartTime, range and interval are null.

The data structure of the chunks set mainly includes the unique ID of the RINEX file to which the data block belongs (id), the position of the data block within the RINEX file (n), and the serialized binary data of the data block (data).
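The blocking behaviour can be sketched in Python as follows. The keys id, n and data follow the description of the chunks set; the function names are hypothetical, and the sketch operates on in-memory bytes rather than on a database. An empty file is given a single empty block, on the assumption that every file corresponds to at least one data block.

```python
from typing import Dict, List


def split_into_chunks(file_id: str, content: bytes, chunk_size: int) -> List[Dict]:
    """Split file content into numbered blocks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offsets = range(0, len(content), chunk_size) if content else [0]
    return [
        {"id": file_id, "n": n, "data": content[o:o + chunk_size]}
        for n, o in enumerate(offsets)
    ]


def join_chunks(chunks: List[Dict]) -> bytes:
    """Reassemble the file by ordering blocks on their position n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Storing the position n with each block lets the file be rebuilt correctly even if the blocks are returned in a different order.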

After the RINEX files are stored, a RINEX file query interface based on the HTTP RESTful specification can be defined so that a user can quickly query RINEX files by sending an HTTP request. The query interface is defined as: http://host[:port]/path?{stationId=value&dataStartTime=value&range=value&interval=value}

Where host[:port] is the server address and port of the query service, path is the service path, and { } encloses the input parameters of the interface, as follows:

An example of the interface is as follows:

https://www.shxbdcors.com/apps/dataserver/getRINEX?stationId=0&range=24&interval=30&dataStartTime=20201217T032400

Finally, it should be noted that although embodiments of the invention have been described above with reference to the drawings, the invention is not limited to the above embodiments and fields of application, which are illustrative and instructive rather than limiting. Those skilled in the art, having the benefit of this disclosure, may devise numerous other forms without departing from the scope of the invention as claimed.

Claims

1. The real-time acquisition and management method for the global navigation satellite observation data is characterized by comprising the following steps of:

receiving satellite observation data streams based on RTCM3.2 standard which are simultaneously and uninterruptedly transmitted according to TCP/IP protocol from N CORS stations;

analyzing the observation data stream into a plurality of independent original observation data, wherein each original observation data only records all fields of one message type;

establishing a streaming task to perform content correction on each piece of original observation data in a parallel processing mode to obtain first observation data;

filtering the first observation data by adopting a data verification method in the streaming task to obtain second observation data;

establishing a cache area for caching the second observation data, wherein the cache area is divided into M cache partitions, the second observation data of the same CORS station are stored in the same cache partition, and a data consumption interface is provided for each cache partition;

sending a data acquisition request to each data consumption interface, and packaging the acquired second observation data according to preset rules to establish a plurality of parallel computing units;

starting a parallel computing task to analyze and compute second observation data in the parallel computing unit to obtain text fields recorded in a data area and an identification area in the second observation data;

writing the text field and the binary original text of the data area and the identification area in the second observation data into a distributed database for storage management, wherein the text field and the binary original text form process data, and a first query interface of the process data is provided;

reading text fields from a distributed database at regular time, generating a static observation file, writing the file into the distributed database for storage management, and providing a second query interface of the file;

wherein the first query interface and the second query interface are defined based on the HTTP RESTful specification, and N and M are natural numbers greater than 1.

2. The method for real-time collection and management according to claim 1, wherein the distributed database organizes and manages data by site, message type and time, specifically as follows:

the distributed database has a plurality of station sets (stations) distinguished by CORS station;

each station set comprises a plurality of message type sets (types), and the same message type set stores all process data of the same message type collected at different times;

each station set also comprises an observation file set, which is divided into a files set and a chunks set; the files set is used for storing metadata describing static observation files; the chunks set stores the specific content of the static observation files in the form of binary data.

3. The method of claim 2, wherein the process data in the message type set type is stored in a time order in which it is collected.

4. The real-time acquisition and management method according to claim 2, wherein the data structure of the process data includes: the message type typeId, the reference station number stationId, the time of acquisition of the observed data, the binary original text of the data area and the identification area in the second observed data, and the message field after the analysis of the data area and the identification area in the second observed data.

5. The method of real-time acquisition and management according to claim 4, wherein the first query interface is defined as: http://host[:port]/path?{stationId=value&typeId=value&startTime=value&endTime=value&interval=value}.

6. The method for real-time acquisition and management as defined in claim 2, wherein,

each record of the files set can be a file or a folder; the data structure of the files set comprises: the unique identifier _id of the file or folder, the identifier _pantid of its parent folder, the file name or folder name, the start time dataStartTime of the observation data recorded in the file, the file data accumulation duration range, the data sampling interval, the total file size length and the file chunk size chunkSize; when the distributed database stores a file, a static observation file whose data size is larger than chunkSize is divided into a plurality of data blocks that are stored in the chunks set; each file in the files set corresponds to at least one data block; if the record is a folder, the values of length, chunkSize, dataStartTime, range and interval are all null;

The data structure of the chunks set includes: the unique identification (id) of the file to which the data block belongs, the position (n) of the data block in the file, and the binary data after serialization of the data block.

7. The method of claim 6, wherein a virtual folder hierarchy is established by associating the unique identifier _id of each file or folder with the identifier _pantid of its parent folder.

8. The method of real-time acquisition and management according to claim 6, wherein the second query interface is defined as: http://host[:port]/path?{stationId=value&dataStartTime=value&range=value&interval=value}.

9. The method for real-time collection and management according to claim 1, wherein the static observation file is generated by: reading, through the first query interface, the message fields of the required message types for the required station within the required time range from the distributed database, sampling the data at the sampling interval, returning the result data, and generating the target static observation file.

10. The method for real-time collection and management according to claim 1, wherein the static observation file is a RINEX file.

11. The method for real-time acquisition and management according to claim 1, wherein analyzing the observation data stream into a plurality of independent original observation data specifically comprises: for the observation data stream of each station, reading binary bytes sequentially and searching for the fixed preamble according to the observation data transmission format described by the RTCM3.2 standard; taking the preamble as the head, calculating the total length as the sum of the lengths of the preamble, the reserved field, the data area length field, the data area and the check area; and defining the data boundary by the preamble and the total length, thereby splitting the observation data stream into a plurality of original observation data.

12. The method for real-time collection and management according to claim 1, wherein adding a custom identification area at the end of each piece of original observed data for content correction, specifically comprises:

locating the end position of the original observation data according to the RTCM3.2 standard, and appending an identification area with a custom number of bits after the end position, wherein the identification area contains the station number ID of the source station of the original observation data and the time and sequence number at which the original observation data was received.

13. The method of claim 1, wherein the data check method is a CRC check method.

14. The method of claim 1, wherein the serial number x of the station from which the second observation data originates is taken, a modulo operation is performed on the station serial number by formula (1) to obtain the cache partition serial number f(x), and the observation data is then stored in the corresponding cache partition Part_f(x):

f(x) = x % M    (1)

15. The method for real-time collection and management according to claim 1, wherein the step of packaging the acquired second observation data according to a predetermined rule to create a plurality of parallel computing units comprises:

setting a batch processing time interval ΔT1 for each data consumption interface, and packaging all second observation data from the current data consumption interface within each ΔT1 to form a parallel computing unit; setting a window time interval ΔT2, wherein ΔT2 is a multiple of ΔT1, and K parallel computing units are formed in each time interval of length ΔT2, wherein K = ΔT2/ΔT1;

packaging the M×K parallel computing units from the M data consumption interfaces into a batch analysis task, performing parallel computing on the M×K parallel computing units, and executing the parsing of the second observation data;

after executing the current batch analysis task for a certain time T, sliding to the next window time, acquiring new second observation data and establishing a new batch analysis task containing M×K parallel computing units; T is the execution period, and its length is the average time for completing the parsing of one batch of parallel computing units.

16. The method for real-time collection and management according to claim 1, wherein starting a parallel computing task to perform analytical computation on the second observation data in the parallel computing unit, specifically comprises:

defining a parsing function Parse.getData(key) for each message type, and generating a kv (key, value) pair for each piece of second observation data in the parallel computing unit, wherein the key is the message type and the value is the second observation data; when the parallel computing task is executed, calling the corresponding parsing function based on the key and parsing all kv pairs in the parallel computing unit, the parsing result being the message fields recorded, according to the RTCM3.2 standard, in the data area and the identification area of the second observation data.

17. The global navigation satellite observation data real-time acquisition and management system is characterized by comprising a data acquisition center, a data analysis center and a distributed database, wherein the data acquisition center is designed based on a distributed message queue technology, and the data analysis center is designed based on a distributed stream data computing framework; wherein:

the data acquisition center is constructed to comprise a stream analysis module, a content correction module, a data filtering module and a data cache area; the stream analysis module is used for receiving satellite observation data streams based on the RTCM3.2 standard which are simultaneously and uninterruptedly transmitted according to the TCP/IP protocol from N CORS stations, and analyzing the observation data streams into a plurality of independent original observation data, each original observation data recording only all fields of one message type; the content correction module is used for establishing a streaming task to perform content correction on each piece of original observation data in a parallel processing mode to obtain first observation data; the data filtering module is used for filtering the first observation data by adopting a data verification method in the streaming task to obtain second observation data; the data cache area is divided into M cache partitions, second observation data of the same CORS station are stored in the same cache partition, and each cache partition is provided with a data consumption interface; N and M are natural numbers greater than 1;

the data analysis center is constructed to comprise a parallel computing unit establishing module, a data content analysis module and a real-time writing module; the parallel computing unit establishing module is used for sending a data acquisition request to each data consumption interface, and packaging the acquired second observation data according to preset rules to establish a plurality of parallel computing units; the data content analysis module is used for starting a parallel computing task to parse the second observation data in the parallel computing units, so as to obtain the message fields recorded in the data area and the identification area of the second observation data; the real-time writing module is used for writing the message fields and the binary original text of the data area and the identification area of the second observation data into the distributed database for storage management, the message fields and the binary original text forming process data.

18. The global navigation satellite observation data real-time acquisition and management system according to claim 17, wherein the distributed message queue technology is Kafka, RabbitMQ or AMQP; the distributed stream data computing framework is Spark Streaming, Storm or Flink; and the distributed database is MongoDB, Cassandra or CouchDB.