CN116910016A - AIS data processing method - Google Patents

AIS data processing method

Info

Publication number
CN116910016A
CN116910016A (application CN202311182087.4A)
Authority
CN
China
Prior art keywords
ais
disk
data processing
processing method
hdfs
Prior art date
Legal status (an assumption, not a legal conclusion): Granted
Application number
CN202311182087.4A
Other languages
Chinese (zh)
Other versions
CN116910016B (en
Inventor
赵凤龙
云泽雨
张建东
胡青
马融
李洋
王振江
李建平
尹凡
莫培培
Current Assignee
Tianjin Communication Center Navigation Guarantee Center Of North China Sea Mot
Original Assignee
Tianjin Communication Center Navigation Guarantee Center Of North China Sea Mot
Priority date
Filing date
Publication date
Application filed by Tianjin Communication Center Navigation Guarantee Center Of North China Sea Mot
Priority to CN202311182087.4A
Publication of CN116910016A
Application granted
Publication of CN116910016B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/119Details of migration of file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an AIS data processing method comprising the following steps: step one, optimizing the HDFS disks — the disks used by the Kafka service are migrated to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput, and the usage mode of the HDFS disks is modified. ASM message traffic is simulated by a data-injection program that continuously writes data into Kafka; the shore-based management system consumes the Kafka data, processes the AIS messages, and stores them in an HBase database. After testing, the number of ASM messages processed by the server per second during injection was checked, confirming that the system's average AIS message processing capacity reaches 200,000 messages per second.

Description

AIS data processing method
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an AIS data processing method.
Background
The processing bottleneck of the current cluster is chiefly an acute shortage of disks; the CPUs, memory, and network all have headroom. Adding disks increases cluster capacity and raises the cluster's read/write throughput per unit time, and the improvement scales roughly linearly with the number of disks. From the storage figures it can therefore be deduced that, under the current hardware conditions, the tasks of "sustaining 200,000 messages per second over long periods" and "querying 30,000 records per second directly from Solr" cannot be completed (30,000 records occupy about 70 MB of disk space, which also stresses the client's network and disk-write capability).
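A back-of-envelope check of the figures quoted above, as a sketch: the ~1 KB-per-message size comes from the description's AIS example, and the 70 MB / 30,000-record figure from this paragraph.

```python
# Back-of-envelope check of the workload figures quoted above.
# Assumptions: ~1 KB per AIS message (as stated in the description),
# 30,000 query records occupying ~70 MB.

def avg_record_kb(total_mb: float, records: int) -> float:
    """Average record size in KB."""
    return total_mb * 1024 / records

def write_load_mb_per_s(msgs_per_s: int, msg_kb: float = 1.0) -> float:
    """Raw ingest load in MB/s before HDFS replication."""
    return msgs_per_s * msg_kb / 1024

print(round(avg_record_kb(70, 30_000), 2))     # 2.39 KB per Solr record
print(round(write_load_mb_per_s(200_000), 1))  # 195.3 MB/s sustained ingest
```

At roughly 195 MB/s of sustained raw writes (before replication), disk count rather than CPU or memory is plausibly the limiting resource, which is the premise of the optimizations that follow.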
Disclosure of Invention
The invention aims to overcome the existing defects by providing an AIS data processing method that solves the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions: an AIS data processing method comprising:
step one, optimizing an HDFS disk:
migrating the disks used by the Kafka service to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput; and modifying how the HDFS disks are used;
step two, HBase parameter optimization:
optimizing and adjusting the HBase parameters for the big-data workload;
step three, region Server adjustment:
using the three newly added hyperconverged servers as Region Server nodes;
the change: three virtual machines are created on the three hyperconverged servers and added to the cluster; the numbers of DataNodes and Region Servers are increased, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed was 140,000 messages/second;
after the change: the AIS write speed dropped compared with before the change;
the change: a 10-gigabit switch is added to build a 10-gigabit fiber network;
before the change: the AIS write speed was below 140,000 messages/second;
after the change: the AIS write speed reaches 200,000 messages/second, and Solr query performance improves;
step four, kafka magnetic disk and partition adjustment:
NFS servers are set up on VM05 and VM06, and their spare SSD space is exported to VM01-VM04;
the four mechanical hard disks previously used by the Kafka cluster for message-queue persistence are released and handed over to HDFS, increasing the cluster's data-processing capacity;
step five, Solr cluster optimization;
the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves.
Preferably, step one further includes: the change: removing the logical volumes over the mechanical hard disks on VM01-VM04 and replacing them with JBOD mode, so that disk reads/writes and redundancy are managed directly by HDFS.
Preferably, step four further includes: the change: building NFS servers on VM05 and VM06, creating a shared directory, mounting the VM05/VM06 shared directory on VM01-VM04, and changing Kafka's write folder to the shared directory.
Preferably, Redis is used as the cache database, MySQL as the relational database, Hadoop as the big-data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data-retrieval engine, and Kafka as the message queue.
Preferably, the method further comprises: step six, optimizing and adjusting the test environment at least once.
Preferably, step six includes: using the stress-testing tool JMeter in the test system and the management system to test execution efficiency and response time with multiple users online simultaneously.
Preferably, step six further includes: simulating the base station's AIS, ASM, and VDE data-concurrency capability and concurrent message-execution capability through the parallel-service test function of the VDES shore-based test system.
Preferably, the testing uses one or more of the open-source components MySQL, Redis, Kafka, and Hadoop.
An apparatus, comprising:
a memory storing computer program instructions;
a processor, wherein the computer program instructions, when executed by the processor, implement the AIS data processing method as claimed in any one of the preceding claims.
A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the AIS data processing method of any of the preceding claims.
Compared with the prior art, the invention provides an AIS data processing method with the following beneficial effects:
1. After extensive optimization, the overall performance of the current cluster reaches 200,000 messages/second for processing and writing. For queries, the data are divided into two categories, "hot data" and "cold data": the most recent 1,000,000 records are placed in a Redis cache, which achieves a query rate of 30,000 records per second. ASM message traffic is simulated by a data-injection program that continuously writes into Kafka; the shore-based management system consumes the Kafka data, processes the AIS messages, and stores them in the HBase database. After testing, the number of ASM messages processed by the server per second during injection was checked, confirming that the system's average AIS message processing capacity reaches 200,000 messages/second.
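The hot/cold split described above can be sketched as follows. This is illustrative only: the key name, the redis-py client, and the exact cache layout are assumptions, not details from the patent; only the capacity (the latest 1,000,000 records) and the Redis/HBase division come from the text.

```python
HOT_CAPACITY = 1_000_000  # latest records kept in the Redis cache (from the text)

def route_query(record_age_rank: int, hot_capacity: int = HOT_CAPACITY) -> str:
    """Route a query: rank 0 = newest record. Hot data -> Redis, cold -> HBase."""
    return "redis" if record_age_rank < hot_capacity else "hbase"

def cache_message(r, msg: bytes, hot_capacity: int = HOT_CAPACITY) -> None:
    """Push a message into a capped Redis list (r is a redis.Redis client).

    LPUSH + LTRIM keeps only the newest `hot_capacity` entries; the key
    name "ais:hot" is an assumption for illustration.
    """
    pipe = r.pipeline()
    pipe.lpush("ais:hot", msg)
    pipe.ltrim("ais:hot", 0, hot_capacity - 1)
    pipe.execute()

print(route_query(500))        # redis
print(route_query(2_000_000))  # hbase
```

Routing by recency keeps the 30,000-records/second query load on Redis while HBase serves only the long tail.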
2. The system of the invention uses Redis as the cache database, MySQL as the relational database, Hadoop as the big-data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data-retrieval engine, and Kafka as the message queue.
3. Through repeated optimization and adjustment of the test environment, the number of AIS, ASM, and VDE messages processed per second and the query efficiency are finally kept within the expected range; stress tests with JMeter in the test system and the management system show that execution efficiency and response time with multiple users online simultaneously meet the requirements; the AIS, ASM, and VDE data-concurrency capability and concurrent message-execution capability, tested by simulating base stations with the parallel-service test function of the VDES shore-based test system, meet the requirements; the tests use the open-source components MySQL, Redis, Kafka, and Hadoop, satisfy the engineering construction requirements, and can be used for actual system construction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention; they do not limit the invention. In the drawings:
FIG. 1 is a diagram of a performance testing system according to the present invention.
Detailed Description
The following is a clear and complete description of the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, the present invention provides a technical solution: an AIS data processing method comprising the steps of:
step one, optimizing an HDFS disk;
migrating the disks used by the Kafka service to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput; and modifying how the HDFS disks are used;
step two, optimizing HBase parameters;
the HBase parameters are optimized and adjusted for the big-data workload so that the cluster can sustain high-load operation for longer and recovers from an unhealthy state faster;
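The description does not name the specific HBase parameters that were tuned. As an illustration only, the snippet below renders an hbase-site.xml fragment from a handful of settings commonly adjusted for sustained high write load; the particular names and values are assumptions, not the patent's configuration.

```python
# Illustrative only: the patent does not list the tuned parameters.
# These are commonly adjusted HBase settings for sustained high write load.
HBASE_TUNING = {
    "hbase.regionserver.handler.count": "100",         # more RPC handler threads
    "hbase.hregion.memstore.flush.size": "268435456",  # 256 MB memstore flush
    "hbase.regionserver.global.memstore.size": "0.4",  # heap share for memstores
    "hfile.block.cache.size": "0.3",                   # read block-cache share
}

def to_hbase_site(props: dict) -> str:
    """Render properties as an hbase-site.xml fragment."""
    body = "\n".join(
        f"  <property>\n    <name>{k}</name>\n    <value>{v}</value>\n  </property>"
        for k, v in props.items()
    )
    return f"<configuration>\n{body}\n</configuration>"

fragment = to_hbase_site(HBASE_TUNING)
print("hbase.regionserver.handler.count" in fragment)  # True
```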
step three, region Server adjustment;
the three newly added hyperconverged servers are used as Region Server nodes, increasing the processing capacity of the Region Server cluster and thus the number of messages processed per unit time;
splitting the same physical servers into more virtual machines does not help, because the total CPU, memory, disk, and network resources are fixed; greatly improving overall cluster performance therefore requires adding physical servers;
the change: three virtual machines (RS01, RS02, and RS03) are created on the three hyperconverged servers and added to the cluster; the numbers of DataNodes and Region Servers are increased, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed was 140,000 messages/second;
after the change: the AIS write speed dropped compared with before the change;
the reason: although the three newly added hyperconverged virtual machines perform well, they have only gigabit networking; when they are used as big-data nodes, their network bandwidth becomes the bottleneck of the entire big-data cluster, and even after the three virtual machines are added, overall cluster performance drops sharply compared with before;
to realize the performance of the three newly added virtual machines, the cluster network must be upgraded;
the change: a 10-gigabit switch is added to build a 10-gigabit fiber network;
before the change: the AIS write speed was below 140,000 messages/second;
after the change: the AIS write speed reaches 200,000 messages/second, and Solr query performance improves;
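The network bottleneck can be seen with simple arithmetic. A sketch, assuming ~1 KB per message (the description's AIS figure) and two HDFS copies per message; protocol overhead is ignored:

```python
def link_mb_per_s(mbit: int) -> float:
    """Payload ceiling of a link in MB/s (ignoring protocol overhead)."""
    return mbit / 8

def replicated_load_mb_per_s(msgs_per_s: int, msg_kb: float, replicas: int) -> float:
    """Network load when each message is written to `replicas` HDFS copies."""
    return msgs_per_s * msg_kb / 1024 * replicas

gigabit = link_mb_per_s(1_000)                    # 125.0 MB/s
ten_gig = link_mb_per_s(10_000)                   # 1250.0 MB/s
load = replicated_load_mb_per_s(140_000, 1.0, 2)  # ~273.4 MB/s
print(load > gigabit)  # True  -> gigabit links saturate at 140,000 msg/s
print(load < ten_gig)  # True  -> 10-gigabit removes the bottleneck
```

Even the pre-upgrade write rate of 140,000 messages/second already exceeds a gigabit link's ~125 MB/s ceiling once replication traffic is counted, which is why adding gigabit-only nodes made throughput worse.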
the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves;
step four, kafka magnetic disk and partition adjustment;
NFS servers are set up on VM05 and VM06, and their spare SSD space is exported to VM01-VM04; Kafka's storage directory is migrated from the previous four mechanical hard disks to the mapped SSDs, which both raises Kafka's write and read speeds and frees the four mechanical hard disks for exclusive use by HDFS, improving HDFS data throughput per unit time;
the four mechanical hard disks the Kafka cluster had used for message-queue persistence are released and handed over to HDFS, increasing the cluster's data-processing capacity;
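The data-injection program that feeds Kafka in the tests is not specified in the patent. A minimal sketch of such a producer, assuming the kafka-python client and placeholder broker/topic names (the fabricated MMSI values are illustrative), might look like:

```python
import json
import time

def inject_asm_messages(bootstrap: str, topic: str, rate: int, seconds: int) -> int:
    """Continuously push simulated ASM messages into Kafka at roughly `rate` msg/s.

    Requires kafka-python and a live broker; broker address and topic name
    are placeholders, not values from the patent.
    """
    from kafka import KafkaProducer  # deferred import: only useful with a broker
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda m: json.dumps(m).encode("utf-8"),
    )
    sent = 0
    for _ in range(seconds):
        start = time.time()
        for i in range(rate):
            producer.send(topic, {"mmsi": 413000000 + i, "ts": time.time()})
            sent += 1
        time.sleep(max(0.0, 1.0 - (time.time() - start)))  # pace to ~rate msg/s
    producer.flush()
    return sent

def expected_total(rate: int, seconds: int) -> int:
    """Messages produced by a full run at a steady rate."""
    return rate * seconds

print(expected_total(200_000, 5))  # 1000000
```

The shore-based management system would consume the same topic, process the AIS messages, and write them to HBase, with the per-second consumed count giving the throughput figure quoted in the tests.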
step five, solr cluster optimization;
and the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves.
In the present invention, preferably, in step one: the change: the logical volumes over the mechanical hard disks on VM01-VM04 are removed and replaced with JBOD mode, so that disk reads/writes and redundancy are managed directly by HDFS.
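In JBOD mode, each mechanical disk is mounted individually and listed in HDFS's standard `dfs.datanode.data.dir` property, so the DataNode stripes blocks across the disks itself. A sketch; the mount points below are assumptions, not paths from the patent:

```python
# Mount points are assumptions for illustration; one directory per JBOD disk.
MOUNTS = ["/data/disk1", "/data/disk2", "/data/disk3", "/data/disk4"]

def datanode_dir_property(mounts: list) -> str:
    """hdfs-site.xml property listing one data directory per JBOD disk."""
    dirs = ",".join(mounts)
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{dirs}</value>\n"
        "</property>"
    )

print(datanode_dir_property(MOUNTS))
```

Because HDFS already replicates blocks across nodes, removing the logical-volume layer avoids paying for redundancy twice and lets each disk take writes independently.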
In the present invention, preferably, before the change in step one: with Solr shut down and data injected in advance, the AIS write speed exceeds 200,000 messages/second; with Solr running and injection and processing performed simultaneously, the AIS write speed is below 100,000 messages/second. After the change: with Solr running and injection and processing performed simultaneously, the AIS write speed reaches 140,000 messages/second.
In the present invention, preferably, in step four: the change: NFS servers are built on VM05 and VM06, a shared directory is created, the VM05/VM06 shared directory is mounted on VM01-VM04, and Kafka's write folder is changed to the shared directory.
In the present invention, preferably, in step four: before the change: the AIS write speed is approximately 200,000 messages/second; after the change: the AIS write speed exceeds 210,000 messages/second.
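Pointing Kafka at the NFS-mounted SSD space comes down to mounting the exports and editing the broker's standard `log.dirs` setting in server.properties. A sketch; the hostnames, export path, and mount point are assumptions, not values from the patent:

```python
NFS_SERVERS = ["vm05", "vm06"]     # hostnames are assumptions
EXPORT = "/export/ssd"             # export path is an assumption
MOUNT_POINT = "/mnt/ssd-nfs"       # mount point is an assumption

def mount_commands(servers, export, mount_point):
    """Shell commands (as strings) to mount the shared directories on VM01-VM04."""
    return [f"mount -t nfs {s}:{export} {mount_point}/{s}" for s in servers]

def kafka_log_dirs(servers, mount_point):
    """server.properties line pointing Kafka's write folder at the NFS shares."""
    dirs = ",".join(f"{mount_point}/{s}/kafka-logs" for s in servers)
    return f"log.dirs={dirs}"

for cmd in mount_commands(NFS_SERVERS, EXPORT, MOUNT_POINT):
    print(cmd)
print(kafka_log_dirs(NFS_SERVERS, MOUNT_POINT))
```

After the brokers restart with the new `log.dirs`, the four mechanical disks can be handed to HDFS as described above.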
In the invention, preferably, the system uses Redis as the cache database, MySQL as the relational database, Hadoop as the big-data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data-retrieval engine, and Kafka as the message queue.
In the present invention, preferably, the method further comprises:
and step six, optimizing and adjusting the test environment at least once.
In actual operation, repeated optimization and adjustment of the test environment finally ensure that the number of AIS, ASM, and VDE messages processed per second and the query efficiency are within the expected range.
In the invention, preferably, the stress-testing tool JMeter is used in the test system and the management system to verify that execution efficiency and response time with multiple users online simultaneously meet the requirements.
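JMeter itself is driven by test plans rather than code; purely as an illustration of what such a multi-user test measures, here is a minimal Python analogue that fires requests from simulated concurrent users against a stubbed handler and summarizes response times (the handler and its 10 ms latency are stand-ins, not the patent's system):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> float:
    """Stub for one management-system request; returns its response time (s)."""
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for real request work
    return time.perf_counter() - start

def load_test(users: int, requests_per_user: int) -> dict:
    """Fire requests from `users` simulated concurrent users and summarize."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        times = list(pool.map(lambda _: handle_request(),
                              range(users * requests_per_user)))
    return {
        "requests": len(times),
        "avg_s": sum(times) / len(times),
        "max_s": max(times),
    }

stats = load_test(users=20, requests_per_user=5)
print(stats["requests"])  # 100
```

A real JMeter plan would additionally ramp users up gradually and assert on percentile latencies rather than the mean.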
In the invention, preferably, the AIS, ASM, and VDE data-concurrency capability and concurrent message-execution capability, tested by simulating base stations through the parallel-service test function of the VDES shore-based test system, meet the requirements.
In the invention, the tests preferably use the open-source components MySQL, Redis, Kafka, and Hadoop, which satisfy the engineering construction requirements and can be used for actual system construction.
In one embodiment of the present invention, an AIS data processing method includes the steps of:
step one, optimizing an HDFS disk;
migrating the disks used by the Kafka service to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput; and modifying how the HDFS disks are used;
the change: the logical volumes over the mechanical hard disks on VM01-VM04 are removed and replaced with JBOD mode, so that disk reads/writes and redundancy are managed directly by HDFS;
before the change: with Solr shut down and data injected in advance, the AIS write speed exceeds 200,000 messages/second; with Solr running and injection and processing performed simultaneously, the AIS write speed is below 100,000 messages/second;
after the change: with Solr running and injection and processing performed simultaneously, the AIS write speed reaches 140,000 messages/second;
step two, optimizing HBase parameters;
the HBase parameters are optimized and adjusted for the big-data workload so that the cluster can sustain high-load operation for longer and recovers from an unhealthy state faster;
step three, region Server adjustment;
the three newly added hyperconverged servers are used as Region Server nodes, increasing the processing capacity of the Region Server cluster and thus the number of messages processed per unit time;
splitting the same physical servers into more virtual machines does not help, because the total CPU, memory, disk, and network resources are fixed; greatly improving overall cluster performance therefore requires adding physical servers;
the change: three virtual machines (RS01, RS02, and RS03) are created on the three hyperconverged servers and added to the cluster; the numbers of DataNodes and Region Servers are increased, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed was 140,000 messages/second;
after the change: the AIS write speed dropped compared with before the change;
the reason: although the three newly added hyperconverged virtual machines perform well, they have only gigabit networking; when they are used as big-data nodes, their network bandwidth becomes the bottleneck of the entire big-data cluster, and even after the three virtual machines are added, overall cluster performance drops sharply compared with before;
to realize the performance of the three newly added virtual machines, the cluster network must be upgraded;
the change: a 10-gigabit switch is added to build a 10-gigabit fiber network;
before the change: the AIS write speed was below 140,000 messages/second;
after the change: the AIS write speed is approximately 200,000 messages/second, and Solr query performance improves;
the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves;
step four, kafka magnetic disk and partition adjustment;
NFS servers are set up on VM05 and VM06, and their spare SSD space is exported to VM01-VM04; Kafka's storage directory is migrated from the previous four mechanical hard disks to the mapped SSDs, which both raises Kafka's write and read speeds and frees the four mechanical hard disks for exclusive use by HDFS, improving HDFS data throughput per unit time;
the four mechanical hard disks the Kafka cluster had used for message-queue persistence are released and handed over to HDFS, increasing the cluster's data-processing capacity;
the change: NFS servers are built on VM05 and VM06, a shared directory is created, the VM05/VM06 shared directory is mounted on VM01-VM04, and Kafka's write folder is changed to the shared directory;
before the change: the AIS write speed is approximately 200,000 messages/second;
after the change: the AIS write speed exceeds 210,000 messages/second;
step five, solr cluster optimization;
and the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves.
And step six, optimizing and adjusting the test environment at least once.
According to actual tests, the CPU load of the RS01 server always stays below 20; since the server has 32 CPUs, the average utilization per CPU is below 60%, so the CPU is not a bottleneck. In terms of memory, the server has 144 GB in total, of which 93 GB is used, leaving a large margin. In terms of network, the server's total network bandwidth is about 7,000 Mbit/s (obtained by actual testing), and the actual peak network transfer is no more than 200 MB per second, i.e. 1,600 Mbit/s, also a large margin. In terms of disks, the aggregate disk IOPS peak exceeds 400, whereas a single 10,000 RPM SAS disk has a theoretical maximum of 100 IOPS and a single 7,200 RPM SAS disk only 68; considering that the server uses a disk array (a negative factor for big-data storage that reduces disk-write performance and further lowers effective per-disk IOPS), an aggregate of more than 400 IOPS already constitutes a disk-write bottleneck.
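The margins above can be checked directly from the quoted figures (a sketch; the load ceiling of 20 is treated as the worst case, giving an upper bound of about 62.5% per CPU):

```python
def cpu_utilization(load_avg: float, cpus: int) -> float:
    """Approximate average per-CPU utilization from the load average."""
    return load_avg / cpus

def disks_needed(peak_iops: int, iops_per_disk: int) -> int:
    """Independent disks needed to absorb the observed IOPS peak."""
    return -(-peak_iops // iops_per_disk)  # ceiling division

print(round(cpu_utilization(20, 32), 3))  # 0.625 -> CPU has headroom
print(disks_needed(400, 100))             # 4 disks at 10,000 RPM
print(disks_needed(400, 68))              # 6 disks at 7,200 RPM
```

The 400+ IOPS peak would need four to six disks serving writes independently, which is exactly what the JBOD change in step one provides.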
In terms of storage space, the number of disks in the current cluster falls far short of the 200,000 messages/second processing and storage requirement. Taking AIS messages as an example, each message written to the message queue is about 1 KB, so 200,000 messages amount to 200 MB. Writing continuously for one hour at 200,000 messages/second requires 200 MB × 3,600 = 703 GB of disk space. To ensure data reliability, at least two copies of each piece of data must be stored in the HDFS file system, so the data volume actually generated per hour is 1.4 TB. The cluster's total storage is about 29 TB, enough for roughly 20 hours of writing. Viewed from another angle, if the disk space could sustain writes at 200,000 messages/second, write performance would not be a problem at all.
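The storage arithmetic above can be reproduced step by step (following the text's own approximation of 200,000 × 1 KB messages as 200 MB/s):

```python
def hourly_volume_gb(mb_per_s: float, replicas: int = 1) -> float:
    """Disk space consumed per hour at a given MB/s, in GB (1 GB = 1024 MB)."""
    return mb_per_s * 3600 / 1024 * replicas

raw = hourly_volume_gb(200)                     # 703.125 GB/h, single copy
replicated = hourly_volume_gb(200, replicas=2)  # 1406.25 GB ~= 1.4 TB/h
hours = 29 * 1024 / replicated                  # time to fill 29 TB
print(round(raw))    # 703
print(round(hours))  # 21
```

The result, about 21 hours of headroom on 29 TB, is consistent with the roughly 20 hours stated above.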
The hardware configuration of the invention is as follows:
1. The tester configuration is shown in the following table:
2. Physical server configuration
There are five physical servers in total, divided into two configurations: the first comprises server 1 and server 2, and the second comprises servers 3, 4, and 5. Virtualized operating systems are installed on each, with the specific configurations as follows:
Configuration one (server 1, server 2) is as follows:
Configuration two (server 3, server 4, server 5) is as follows:
The virtual machines are divided as shown in the following tables:
the embodiment of the invention also provides equipment, which comprises a processor, a memory and a computer program stored in the memory and capable of running on the processor, wherein the computer program realizes the processes of the AIS data processing method embodiment when being executed by the processor, and can achieve the same technical effects, and the repetition is avoided, so that the description is omitted.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements each process of the above embodiment of the AIS data processing method, and can achieve the same technical effects, so that repetition is avoided, and no further description is provided herein. The computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An AIS data processing method, comprising:
step one, optimizing an HDFS disk:
migrating the disks used by the Kafka service to SSDs, so that all mechanical hard disks are available to the HDFS service, improving data throughput; and modifying the usage mode of the HDFS disks;
step two, HBase parameter optimization:
tuning the HBase parameters for the big-data workload;
step three, region Server adjustment:
using the three newly added hyper-converged servers as Region Server nodes;
the content of the change is: three virtual machines are created on the three hyper-converged servers and added to the cluster; the number of DataNodes is expanded, the number of Region Servers is expanded, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed is 140,000 messages/second;
after the change: the AIS write speed is lower than before the change;
the content of the change is: adding a tera-megaswitch to build a tera-megafiber network;
before the change: the AIS write speed is below 140,000 messages/second;
after the change: the AIS write speed reaches 200,000 messages/second, and Solr query performance is improved;
step four, Kafka disk and partition adjustment:
NFS servers are set up on VM05 and VM06, and the spare SSD disk space is mapped to VM01-VM04;
the 4 mechanical hard disks that the Kafka cluster had used for message-queue persistence are released, and those disk resources are provided to HDFS, increasing the cluster's data processing capacity;
step five, solr cluster optimization.
2. The AIS data processing method according to claim 1, wherein step one further comprises: the content of the change is: the logical volumes of the mechanical hard disks on VM01-VM04 are removed and replaced with JBOD mode, so that disk read/write and redundancy are managed directly by HDFS.
3. The AIS data processing method according to claim 1, wherein step four further comprises: the content of the change is: NFS servers are built on VM05 and VM06, a shared directory is created, the shared directories on VM05 and VM06 are mounted on VM01-VM04, and the Kafka write folder is changed to the shared directory.
4. The AIS data processing method according to claim 1, wherein: Redis is used as the cache database, MySQL as the relational database, Hadoop as the big data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data retrieval engine, and Kafka as the message queue.
5. The AIS data processing method of claim 1, further comprising: step six, optimizing and adjusting the test environment at least once.
6. The AIS data processing method of claim 5, wherein step six comprises: using the stress-testing tool JMeter to test the execution efficiency and response time of the test system and the management system when multiple users are online simultaneously.
7. The AIS data processing method of claim 6, wherein step six further comprises: simulating the AIS, ASM and VDE data concurrency capability and the concurrent message execution capability of the base station through the parallel service test function of the VDES shore-based test system.
8. The AIS data processing method of claim 7, wherein: the testing uses one or more of the open-source databases MySQL, Redis, Kafka and Hadoop.
9. An apparatus, comprising:
a memory storing computer program instructions;
a processor which, when executing the computer program instructions, implements the AIS data processing method of any one of claims 1 to 8.
10. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the AIS data processing method according to any one of claims 1 to 8.
CN202311182087.4A 2023-09-14 2023-09-14 AIS data processing method Active CN116910016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311182087.4A CN116910016B (en) 2023-09-14 2023-09-14 AIS data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311182087.4A CN116910016B (en) 2023-09-14 2023-09-14 AIS data processing method

Publications (2)

Publication Number Publication Date
CN116910016A true CN116910016A (en) 2023-10-20
CN116910016B CN116910016B (en) 2024-06-11

Family

ID=88367350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311182087.4A Active CN116910016B (en) 2023-09-14 2023-09-14 AIS data processing method

Country Status (1)

Country Link
CN (1) CN116910016B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104583930A (en) * 2014-08-15 2015-04-29 华为技术有限公司 Method of data migration, controller and data migration apparatus
US20150121371A1 (en) * 2013-10-25 2015-04-30 Vmware, Inc. Multi-tenant distributed computing and database
CN104944240A (en) * 2015-05-19 2015-09-30 重庆大学 Elevator equipment state monitoring system based on large data technology
US20160321308A1 (en) * 2015-05-01 2016-11-03 Ebay Inc. Constructing a data adaptor in an enterprise server data ingestion environment
CN106709003A (en) * 2016-12-23 2017-05-24 长沙理工大学 Hadoop-based mass log data processing method
CN107491448A (en) * 2016-06-12 2017-12-19 ***通信集团四川有限公司 A kind of HBase resource adjusting methods and device
CN107644050A (en) * 2016-12-22 2018-01-30 北京锐安科技有限公司 A kind of querying method and device of the Hbase based on solr
CN109525593A (en) * 2018-12-20 2019-03-26 中科曙光国际信息产业有限公司 A kind of pair of hadoop big data platform concentrates security management and control system and method
CN110515726A (en) * 2019-08-14 2019-11-29 苏州浪潮智能科技有限公司 A kind of database loads equalization methods and device
CN110865989A (en) * 2019-11-22 2020-03-06 浪潮电子信息产业股份有限公司 Business processing method for large-scale computing cluster
CN111177271A (en) * 2019-12-31 2020-05-19 奇安信科技集团股份有限公司 Data storage method and device for persistence of kafka data to hdfs, and computer equipment
US20200341855A1 (en) * 2019-04-28 2020-10-29 Synamedia Object store specialized backup and point-in-time recovery architecture
CN111966656A (en) * 2020-07-17 2020-11-20 苏州浪潮智能科技有限公司 Method, system, terminal and storage medium for simulating high-load scene of storage file
CN112433845A (en) * 2020-10-29 2021-03-02 苏州浪潮智能科技有限公司 HBase service management method, device, equipment and readable medium
CN112799597A (en) * 2021-02-08 2021-05-14 东北大学 Hierarchical storage fault-tolerant method for stream data processing
CN115934251A (en) * 2022-12-06 2023-04-07 浪潮云信息技术股份公司 Method and system for realizing high availability of cloud native NFS
CN116383167A (en) * 2022-12-27 2023-07-04 爱信诺征信有限公司 Method for solving insufficient disk space based on object storage

Also Published As

Publication number Publication date
CN116910016B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
Lakshman et al. Cassandra: a decentralized structured storage system
CN104735110B (en) Metadata management method and system
Han et al. A novel solution of distributed memory nosql database for cloud computing
CN107180113B (en) Big data retrieval platform
Kaiser et al. Design of an exact data deduplication cluster
Fu et al. Performance optimization for managing massive numbers of small files in distributed file systems
CN109918450B (en) Distributed parallel database based on analysis type scene and storage method
WO2011064742A1 (en) Super-records
CN106066890A (en) A kind of distributed high-performance data storehouse integrated machine system
Wang et al. {MAPX}: Controlled data migration in the expansion of decentralized {Object-Based} storage systems
US20200192805A1 (en) Adaptive Cache Commit Delay for Write Aggregation
Otoo et al. Disk cache replacement algorithm for storage resource managers in data grids
CN111813332A (en) High-performance, high-expansion and high-safety intelligent distributed storage system
CN113032356B (en) Cabin distributed file storage system and implementation method
CN109716280B (en) Flexible memory rank storage arrangement
Qi et al. Big data management in digital forensics
CN116910016B (en) AIS data processing method
CN102867029B (en) A kind of method managing distributive catalogue of document system and distributed file system
Rodriguez et al. Unifying the data center caching layer: Feasible? profitable?
Ren et al. Towards realistic benchmarking for cloud file systems: Early experiences
Xu et al. Traffic-aware erasure-coded archival schemes for in-memory stores
Xu et al. TEA: A traffic-efficient erasure-coded archival scheme for in-memory stores
Liu et al. SSD as a Cloud Cache? Carefully Design about It
CN113886472A (en) Data access system, access method, computer equipment and storage medium
Shvidkiy et al. Caching methods analysis for improving distributed storage systems performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Jiandong

Inventor after: Mo Peipei

Inventor after: Zhao Fenglong

Inventor after: Yun Zeyu

Inventor after: Hu Qing

Inventor after: Ma Rong

Inventor after: Li Yang

Inventor after: Wang Zhenjiang

Inventor after: Li Jianping

Inventor after: Yin Fan

Inventor before: Zhao Fenglong

Inventor before: Mo Peipei

Inventor before: Yun Zeyu

Inventor before: Zhang Jiandong

Inventor before: Hu Qing

Inventor before: Ma Rong

Inventor before: Li Yang

Inventor before: Wang Zhenjiang

Inventor before: Li Jianping

Inventor before: Yin Fan

GR01 Patent grant