CN116910016A - AIS data processing method - Google Patents

AIS data processing method

Info

Publication number
CN116910016A
CN116910016A (application CN202311182087.4A)
Authority
CN
China
Prior art keywords
ais
disk
data processing
processing method
hdfs
Prior art date
Legal status (an assumption, not a legal conclusion): Granted
Application number
CN202311182087.4A
Other languages
Chinese (zh)
Other versions
CN116910016B (en
Inventor
赵凤龙
云泽雨
张建东
胡青
马融
李洋
王振江
李建平
尹凡
莫培培
Current Assignee
Tianjin Communication Center Navigation Guarantee Center Of North China Sea Mot
Original Assignee
Tianjin Communication Center Navigation Guarantee Center Of North China Sea Mot
Priority date
Filing date
Publication date
Application filed by Tianjin Communication Center Navigation Guarantee Center Of North China Sea Mot
Priority to CN202311182087.4A
Publication of CN116910016A
Application granted
Publication of CN116910016B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/119Details of migration of file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an AIS data processing method comprising the following steps: step one, optimizing the HDFS disks — the disks used by the Kafka service are migrated to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput, and the usage mode of the HDFS disks is modified. ASM message traffic is simulated by a data-injection program that continuously writes data into Kafka; the shore-based management system consumes the Kafka data, processes the AIS messages, and stores them in an HBase database. After testing, the number of ASM messages processed by the server per second during injection was checked, confirming that the system's average AIS message processing capacity reaches 200,000 messages per second.

Description

AIS data processing method
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an AIS data processing method.
Background
The processing bottleneck of the current cluster is chiefly an acute shortage of disks; the CPUs, memory, and network all have headroom. Adding disks increases cluster capacity and raises the cluster's read/write throughput per unit time, and the improvement scales roughly linearly with the number of disks. From the storage figures it can therefore be deduced that, under the current hardware conditions, the tasks of "sustaining 200,000 messages per second over long periods" and "querying 30,000 records per second directly from Solr" cannot be completed (30,000 records occupy about 70 MB of disk space, which also stresses the client's network and disk-write capability).
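A back-of-envelope check of the figures quoted above, as a sketch: the ~1 KB-per-message size comes from the description's AIS example, and the 70 MB / 30,000-record figure from this paragraph.

```python
# Back-of-envelope check of the workload figures quoted above.
# Assumptions: ~1 KB per AIS message (as stated in the description),
# 30,000 query records occupying ~70 MB.

def avg_record_kb(total_mb: float, records: int) -> float:
    """Average record size in KB."""
    return total_mb * 1024 / records

def write_load_mb_per_s(msgs_per_s: int, msg_kb: float = 1.0) -> float:
    """Raw ingest load in MB/s before HDFS replication."""
    return msgs_per_s * msg_kb / 1024

print(round(avg_record_kb(70, 30_000), 2))     # 2.39 KB per Solr record
print(round(write_load_mb_per_s(200_000), 1))  # 195.3 MB/s sustained ingest
```

At roughly 195 MB/s of sustained raw writes (before replication), disk count rather than CPU or memory is plausibly the limiting resource, which is the premise of the optimizations that follow.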
Disclosure of Invention
The invention aims to overcome the existing defects by providing an AIS data processing method that solves the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions: an AIS data processing method comprising:
step one, optimizing an HDFS disk:
migrating the disks used by the Kafka service to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput; and modifying how the HDFS disks are used;
step two, HBase parameter optimization:
optimizing and adjusting the HBase parameters for the big-data workload;
step three, region Server adjustment:
using the three newly added hyperconverged servers as Region Server nodes;
the change: three virtual machines are created on the three hyperconverged servers and added to the cluster; the numbers of DataNodes and Region Servers are increased, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed was 140,000 messages/second;
after the change: the AIS write speed dropped compared with before the change;
the change: a 10-gigabit switch is added to build a 10-gigabit fiber network;
before the change: the AIS write speed was below 140,000 messages/second;
after the change: the AIS write speed reaches 200,000 messages/second, and Solr query performance improves;
step four, kafka magnetic disk and partition adjustment:
NFS servers are set up on VM05 and VM06, and their spare SSD space is exported to VM01-VM04;
the four mechanical hard disks previously used by the Kafka cluster for message-queue persistence are released and handed over to HDFS, increasing the cluster's data-processing capacity;
step five, Solr cluster optimization;
the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves.
Preferably, step one further includes: the change: removing the logical volumes over the mechanical hard disks on VM01-VM04 and replacing them with JBOD mode, so that disk reads/writes and redundancy are managed directly by HDFS.
Preferably, step four further includes: the change: building NFS servers on VM05 and VM06, creating a shared directory, mounting the VM05/VM06 shared directory on VM01-VM04, and changing Kafka's write folder to the shared directory.
Preferably, Redis is used as the cache database, MySQL as the relational database, Hadoop as the big-data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data-retrieval engine, and Kafka as the message queue.
Preferably, the method further comprises: step six, optimizing and adjusting the test environment at least once.
Preferably, step six includes: using the stress-testing tool JMeter in the test system and the management system to test execution efficiency and response time with multiple users online simultaneously.
Preferably, step six further includes: simulating the base station's AIS, ASM, and VDE data-concurrency capability and concurrent message-execution capability through the parallel-service test function of the VDES shore-based test system.
Preferably, the testing uses one or more of the open-source components MySQL, Redis, Kafka, and Hadoop.
An apparatus, comprising:
a memory storing computer program instructions;
a processor, wherein the computer program instructions, when executed by the processor, implement the AIS data processing method as claimed in any one of the preceding claims.
A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the AIS data processing method of any of the preceding claims.
Compared with the prior art, the invention provides an AIS data processing method with the following beneficial effects:
1. After extensive optimization, the overall performance of the current cluster reaches 200,000 messages/second for processing and writing. For queries, the data are divided into two categories, "hot data" and "cold data": the most recent 1,000,000 records are placed in a Redis cache, which achieves a query rate of 30,000 records per second. ASM message traffic is simulated by a data-injection program that continuously writes into Kafka; the shore-based management system consumes the Kafka data, processes the AIS messages, and stores them in the HBase database. After testing, the number of ASM messages processed by the server per second during injection was checked, confirming that the system's average AIS message processing capacity reaches 200,000 messages/second.
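The hot/cold split described above can be sketched as follows. This is illustrative only: the key name, the redis-py client, and the exact cache layout are assumptions, not details from the patent; only the capacity (the latest 1,000,000 records) and the Redis/HBase division come from the text.

```python
HOT_CAPACITY = 1_000_000  # latest records kept in the Redis cache (from the text)

def route_query(record_age_rank: int, hot_capacity: int = HOT_CAPACITY) -> str:
    """Route a query: rank 0 = newest record. Hot data -> Redis, cold -> HBase."""
    return "redis" if record_age_rank < hot_capacity else "hbase"

def cache_message(r, msg: bytes, hot_capacity: int = HOT_CAPACITY) -> None:
    """Push a message into a capped Redis list (r is a redis.Redis client).

    LPUSH + LTRIM keeps only the newest `hot_capacity` entries; the key
    name "ais:hot" is an assumption for illustration.
    """
    pipe = r.pipeline()
    pipe.lpush("ais:hot", msg)
    pipe.ltrim("ais:hot", 0, hot_capacity - 1)
    pipe.execute()

print(route_query(500))        # redis
print(route_query(2_000_000))  # hbase
```

Routing by recency keeps the 30,000-records/second query load on Redis while HBase serves only the long tail.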
2. The system of the invention uses Redis as the cache database, MySQL as the relational database, Hadoop as the big-data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data-retrieval engine, and Kafka as the message queue.
3. Through repeated optimization and adjustment of the test environment, the number of AIS, ASM, and VDE messages processed per second and the query efficiency are finally kept within the expected range; stress tests with JMeter in the test system and the management system show that execution efficiency and response time with multiple users online simultaneously meet the requirements; the AIS, ASM, and VDE data-concurrency capability and concurrent message-execution capability, tested by simulating base stations with the parallel-service test function of the VDES shore-based test system, meet the requirements; the tests use the open-source components MySQL, Redis, Kafka, and Hadoop, satisfy the engineering construction requirements, and can be used for actual system construction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention; they do not limit the invention. In the drawings:
FIG. 1 is a diagram of a performance testing system according to the present invention.
Detailed Description
The following is a clear and complete description of the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, the present invention provides a technical solution: an AIS data processing method comprising the steps of:
step one, optimizing an HDFS disk;
migrating the disks used by the Kafka service to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput; and modifying how the HDFS disks are used;
step two, optimizing HBase parameters;
the HBase parameters are optimized and adjusted for the big-data workload so that the cluster can sustain high-load operation for longer and recovers from an unhealthy state faster;
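The description does not name the specific HBase parameters that were tuned. As an illustration only, the snippet below renders an hbase-site.xml fragment from a handful of settings commonly adjusted for sustained high write load; the particular names and values are assumptions, not the patent's configuration.

```python
# Illustrative only: the patent does not list the tuned parameters.
# These are commonly adjusted HBase settings for sustained high write load.
HBASE_TUNING = {
    "hbase.regionserver.handler.count": "100",         # more RPC handler threads
    "hbase.hregion.memstore.flush.size": "268435456",  # 256 MB memstore flush
    "hbase.regionserver.global.memstore.size": "0.4",  # heap share for memstores
    "hfile.block.cache.size": "0.3",                   # read block-cache share
}

def to_hbase_site(props: dict) -> str:
    """Render properties as an hbase-site.xml fragment."""
    body = "\n".join(
        f"  <property>\n    <name>{k}</name>\n    <value>{v}</value>\n  </property>"
        for k, v in props.items()
    )
    return f"<configuration>\n{body}\n</configuration>"

fragment = to_hbase_site(HBASE_TUNING)
print("hbase.regionserver.handler.count" in fragment)  # True
```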
step three, region Server adjustment;
the three newly added hyperconverged servers are used as Region Server nodes, increasing the processing capacity of the Region Server cluster and thus the number of messages processed per unit time;
splitting the same physical servers into more virtual machines does not help, because the total CPU, memory, disk, and network resources are fixed; greatly improving overall cluster performance therefore requires adding physical servers;
the change: three virtual machines (RS01, RS02, and RS03) are created on the three hyperconverged servers and added to the cluster; the numbers of DataNodes and Region Servers are increased, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed was 140,000 messages/second;
after the change: the AIS write speed dropped compared with before the change;
the reason: although the three newly added hyperconverged virtual machines perform well, they have only gigabit networking; when they are used as big-data nodes, their network bandwidth becomes the bottleneck of the entire big-data cluster, and even after the three virtual machines are added, overall cluster performance drops sharply compared with before;
to realize the performance of the three newly added virtual machines, the cluster network must be upgraded;
the change: a 10-gigabit switch is added to build a 10-gigabit fiber network;
before the change: the AIS write speed was below 140,000 messages/second;
after the change: the AIS write speed reaches 200,000 messages/second, and Solr query performance improves;
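The network bottleneck can be seen with simple arithmetic. A sketch, assuming ~1 KB per message (the description's AIS figure) and two HDFS copies per message; protocol overhead is ignored:

```python
def link_mb_per_s(mbit: int) -> float:
    """Payload ceiling of a link in MB/s (ignoring protocol overhead)."""
    return mbit / 8

def replicated_load_mb_per_s(msgs_per_s: int, msg_kb: float, replicas: int) -> float:
    """Network load when each message is written to `replicas` HDFS copies."""
    return msgs_per_s * msg_kb / 1024 * replicas

gigabit = link_mb_per_s(1_000)                    # 125.0 MB/s
ten_gig = link_mb_per_s(10_000)                   # 1250.0 MB/s
load = replicated_load_mb_per_s(140_000, 1.0, 2)  # ~273.4 MB/s
print(load > gigabit)  # True  -> gigabit links saturate at 140,000 msg/s
print(load < ten_gig)  # True  -> 10-gigabit removes the bottleneck
```

Even the pre-upgrade write rate of 140,000 messages/second already exceeds a gigabit link's ~125 MB/s ceiling once replication traffic is counted, which is why adding gigabit-only nodes made throughput worse.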
the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves;
step four, kafka magnetic disk and partition adjustment;
NFS servers are set up on VM05 and VM06, and their spare SSD space is exported to VM01-VM04; Kafka's storage directory is migrated from the previous four mechanical hard disks to the mapped SSDs, which both raises Kafka's write and read speeds and frees the four mechanical hard disks for exclusive use by HDFS, improving HDFS data throughput per unit time;
the four mechanical hard disks the Kafka cluster had used for message-queue persistence are released and handed over to HDFS, increasing the cluster's data-processing capacity;
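The data-injection program that feeds Kafka in the tests is not specified in the patent. A minimal sketch of such a producer, assuming the kafka-python client and placeholder broker/topic names (the fabricated MMSI values are illustrative), might look like:

```python
import json
import time

def inject_asm_messages(bootstrap: str, topic: str, rate: int, seconds: int) -> int:
    """Continuously push simulated ASM messages into Kafka at roughly `rate` msg/s.

    Requires kafka-python and a live broker; broker address and topic name
    are placeholders, not values from the patent.
    """
    from kafka import KafkaProducer  # deferred import: only useful with a broker
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda m: json.dumps(m).encode("utf-8"),
    )
    sent = 0
    for _ in range(seconds):
        start = time.time()
        for i in range(rate):
            producer.send(topic, {"mmsi": 413000000 + i, "ts": time.time()})
            sent += 1
        time.sleep(max(0.0, 1.0 - (time.time() - start)))  # pace to ~rate msg/s
    producer.flush()
    return sent

def expected_total(rate: int, seconds: int) -> int:
    """Messages produced by a full run at a steady rate."""
    return rate * seconds

print(expected_total(200_000, 5))  # 1000000
```

The shore-based management system would consume the same topic, process the AIS messages, and write them to HBase, with the per-second consumed count giving the throughput figure quoted in the tests.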
step five, solr cluster optimization;
and the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves.
In the present invention, preferably, in step one: the change: the logical volumes over the mechanical hard disks on VM01-VM04 are removed and replaced with JBOD mode, so that disk reads/writes and redundancy are managed directly by HDFS.
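In JBOD mode, each mechanical disk is mounted individually and listed in HDFS's standard `dfs.datanode.data.dir` property, so the DataNode stripes blocks across the disks itself. A sketch; the mount points below are assumptions, not paths from the patent:

```python
# Mount points are assumptions for illustration; one directory per JBOD disk.
MOUNTS = ["/data/disk1", "/data/disk2", "/data/disk3", "/data/disk4"]

def datanode_dir_property(mounts: list) -> str:
    """hdfs-site.xml property listing one data directory per JBOD disk."""
    dirs = ",".join(mounts)
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{dirs}</value>\n"
        "</property>"
    )

print(datanode_dir_property(MOUNTS))
```

Because HDFS already replicates blocks across nodes, removing the logical-volume layer avoids paying for redundancy twice and lets each disk take writes independently.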
In the present invention, preferably, before the change in step one: with Solr shut down and data injected in advance, the AIS write speed exceeds 200,000 messages/second; with Solr running and injection and processing performed simultaneously, the AIS write speed is below 100,000 messages/second. After the change: with Solr running and injection and processing performed simultaneously, the AIS write speed reaches 140,000 messages/second.
In the present invention, preferably, in step four: the change: NFS servers are built on VM05 and VM06, a shared directory is created, the VM05/VM06 shared directory is mounted on VM01-VM04, and Kafka's write folder is changed to the shared directory.
In the present invention, preferably, in step four: before the change: the AIS write speed is approximately 200,000 messages/second; after the change: the AIS write speed exceeds 210,000 messages/second.
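Pointing Kafka at the NFS-mounted SSD space comes down to mounting the exports and editing the broker's standard `log.dirs` setting in server.properties. A sketch; the hostnames, export path, and mount point are assumptions, not values from the patent:

```python
NFS_SERVERS = ["vm05", "vm06"]     # hostnames are assumptions
EXPORT = "/export/ssd"             # export path is an assumption
MOUNT_POINT = "/mnt/ssd-nfs"       # mount point is an assumption

def mount_commands(servers, export, mount_point):
    """Shell commands (as strings) to mount the shared directories on VM01-VM04."""
    return [f"mount -t nfs {s}:{export} {mount_point}/{s}" for s in servers]

def kafka_log_dirs(servers, mount_point):
    """server.properties line pointing Kafka's write folder at the NFS shares."""
    dirs = ",".join(f"{mount_point}/{s}/kafka-logs" for s in servers)
    return f"log.dirs={dirs}"

for cmd in mount_commands(NFS_SERVERS, EXPORT, MOUNT_POINT):
    print(cmd)
print(kafka_log_dirs(NFS_SERVERS, MOUNT_POINT))
```

After the brokers restart with the new `log.dirs`, the four mechanical disks can be handed to HDFS as described above.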
In the invention, preferably, the system uses Redis as the cache database, MySQL as the relational database, Hadoop as the big-data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data-retrieval engine, and Kafka as the message queue.
In the present invention, preferably, the method further comprises:
and step six, optimizing and adjusting the test environment at least once.
In actual operation, repeated optimization and adjustment of the test environment finally ensure that the number of AIS, ASM, and VDE messages processed per second and the query efficiency are within the expected range.
In the invention, preferably, the stress-testing tool JMeter is used in the test system and the management system to verify that execution efficiency and response time with multiple users online simultaneously meet the requirements.
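JMeter itself is driven by test plans rather than code; purely as an illustration of what such a multi-user test measures, here is a minimal Python analogue that fires requests from simulated concurrent users against a stubbed handler and summarizes response times (the handler and its 10 ms latency are stand-ins, not the patent's system):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> float:
    """Stub for one management-system request; returns its response time (s)."""
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for real request work
    return time.perf_counter() - start

def load_test(users: int, requests_per_user: int) -> dict:
    """Fire requests from `users` simulated concurrent users and summarize."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        times = list(pool.map(lambda _: handle_request(),
                              range(users * requests_per_user)))
    return {
        "requests": len(times),
        "avg_s": sum(times) / len(times),
        "max_s": max(times),
    }

stats = load_test(users=20, requests_per_user=5)
print(stats["requests"])  # 100
```

A real JMeter plan would additionally ramp users up gradually and assert on percentile latencies rather than the mean.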
In the invention, preferably, the AIS, ASM, and VDE data-concurrency capability and concurrent message-execution capability, tested by simulating base stations through the parallel-service test function of the VDES shore-based test system, meet the requirements.
In the invention, the tests preferably use the open-source components MySQL, Redis, Kafka, and Hadoop, which satisfy the engineering construction requirements and can be used for actual system construction.
In one embodiment of the present invention, an AIS data processing method includes the steps of:
step one, optimizing an HDFS disk;
migrating the disks used by the Kafka service to SSDs so that all mechanical hard disks are dedicated to the HDFS service, improving data throughput; and modifying how the HDFS disks are used;
the change: the logical volumes over the mechanical hard disks on VM01-VM04 are removed and replaced with JBOD mode, so that disk reads/writes and redundancy are managed directly by HDFS;
before the change: with Solr shut down and data injected in advance, the AIS write speed exceeds 200,000 messages/second; with Solr running and injection and processing performed simultaneously, the AIS write speed is below 100,000 messages/second;
after the change: with Solr running and injection and processing performed simultaneously, the AIS write speed reaches 140,000 messages/second;
step two, optimizing HBase parameters;
the HBase parameters are optimized and adjusted for the big-data workload so that the cluster can sustain high-load operation for longer and recovers from an unhealthy state faster;
step three, region Server adjustment;
the three newly added hyperconverged servers are used as Region Server nodes, increasing the processing capacity of the Region Server cluster and thus the number of messages processed per unit time;
splitting the same physical servers into more virtual machines does not help, because the total CPU, memory, disk, and network resources are fixed; greatly improving overall cluster performance therefore requires adding physical servers;
the change: three virtual machines (RS01, RS02, and RS03) are created on the three hyperconverged servers and added to the cluster; the numbers of DataNodes and Region Servers are increased, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed was 140,000 messages/second;
after the change: the AIS write speed dropped compared with before the change;
the reason: although the three newly added hyperconverged virtual machines perform well, they have only gigabit networking; when they are used as big-data nodes, their network bandwidth becomes the bottleneck of the entire big-data cluster, and even after the three virtual machines are added, overall cluster performance drops sharply compared with before;
to realize the performance of the three newly added virtual machines, the cluster network must be upgraded;
the change: a 10-gigabit switch is added to build a 10-gigabit fiber network;
before the change: the AIS write speed was below 140,000 messages/second;
after the change: the AIS write speed is approximately 200,000 messages/second, and Solr query performance improves;
the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves;
step four, kafka magnetic disk and partition adjustment;
NFS servers are set up on VM05 and VM06, and their spare SSD space is exported to VM01-VM04; Kafka's storage directory is migrated from the previous four mechanical hard disks to the mapped SSDs, which both raises Kafka's write and read speeds and frees the four mechanical hard disks for exclusive use by HDFS, improving HDFS data throughput per unit time;
the four mechanical hard disks the Kafka cluster had used for message-queue persistence are released and handed over to HDFS, increasing the cluster's data-processing capacity;
the change: NFS servers are built on VM05 and VM06, a shared directory is created, the VM05/VM06 shared directory is mounted on VM01-VM04, and Kafka's write folder is changed to the shared directory;
before the change: the AIS write speed is approximately 200,000 messages/second;
after the change: the AIS write speed exceeds 210,000 messages/second;
step five, solr cluster optimization;
and the Solr service is migrated to the three newly added hyperconverged servers, so that the cluster's original computing resources are concentrated on the HDFS and HBase services and the query performance of the Solr service improves.
And step six, optimizing and adjusting the test environment at least once.
According to actual tests, the CPU load of the RS01 server always stays below 20; since the server has 32 CPUs, the average utilization per CPU is below 60%, so the CPU is not a bottleneck. In terms of memory, the server has 144 GB in total, of which 93 GB is used, leaving a large margin. In terms of network, the server's total network bandwidth is about 7,000 Mbit/s (obtained by actual testing), and the actual peak network transfer is no more than 200 MB per second, i.e. 1,600 Mbit/s, also a large margin. In terms of disks, the aggregate disk IOPS peak exceeds 400, whereas a single 10,000 RPM SAS disk has a theoretical maximum of 100 IOPS and a single 7,200 RPM SAS disk only 68; considering that the server uses a disk array (a negative factor for big-data storage that reduces disk-write performance and further lowers effective per-disk IOPS), an aggregate of more than 400 IOPS already constitutes a disk-write bottleneck.
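The margins above can be checked directly from the quoted figures (a sketch; the load ceiling of 20 is treated as the worst case, giving an upper bound of about 62.5% per CPU):

```python
def cpu_utilization(load_avg: float, cpus: int) -> float:
    """Approximate average per-CPU utilization from the load average."""
    return load_avg / cpus

def disks_needed(peak_iops: int, iops_per_disk: int) -> int:
    """Independent disks needed to absorb the observed IOPS peak."""
    return -(-peak_iops // iops_per_disk)  # ceiling division

print(round(cpu_utilization(20, 32), 3))  # 0.625 -> CPU has headroom
print(disks_needed(400, 100))             # 4 disks at 10,000 RPM
print(disks_needed(400, 68))              # 6 disks at 7,200 RPM
```

The 400+ IOPS peak would need four to six disks serving writes independently, which is exactly what the JBOD change in step one provides.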
In terms of storage space, the number of disks in the current cluster falls far short of the 200,000 messages/second processing and storage requirement. Taking AIS messages as an example, each message written to the message queue is about 1 KB, so 200,000 messages amount to 200 MB. Writing continuously for one hour at 200,000 messages/second requires 200 MB × 3,600 = 703 GB of disk space. To ensure data reliability, at least two copies of each piece of data must be stored in the HDFS file system, so the data volume actually generated per hour is 1.4 TB. The cluster's total storage is about 29 TB, enough for roughly 20 hours of writing. Viewed from another angle, if the disk space could sustain writes at 200,000 messages/second, write performance would not be a problem at all.
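The storage arithmetic above can be reproduced step by step (following the text's own approximation of 200,000 × 1 KB messages as 200 MB/s):

```python
def hourly_volume_gb(mb_per_s: float, replicas: int = 1) -> float:
    """Disk space consumed per hour at a given MB/s, in GB (1 GB = 1024 MB)."""
    return mb_per_s * 3600 / 1024 * replicas

raw = hourly_volume_gb(200)                     # 703.125 GB/h, single copy
replicated = hourly_volume_gb(200, replicas=2)  # 1406.25 GB ~= 1.4 TB/h
hours = 29 * 1024 / replicated                  # time to fill 29 TB
print(round(raw))    # 703
print(round(hours))  # 21
```

The result, about 21 hours of headroom on 29 TB, is consistent with the roughly 20 hours stated above.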
The hardware configuration of the invention is as follows:
1. The tester configuration is shown in the following table:
2. Physical server configuration
There are five physical servers in total, divided into two configurations: the first comprises server 1 and server 2, and the second comprises servers 3, 4, and 5. Virtualized operating systems are installed on each, with the specific configurations as follows:
Configuration one (server 1, server 2) is as follows:
Configuration two (server 3, server 4, server 5) is as follows:
The virtual machines are divided as shown in the following tables:
the embodiment of the invention also provides equipment, which comprises a processor, a memory and a computer program stored in the memory and capable of running on the processor, wherein the computer program realizes the processes of the AIS data processing method embodiment when being executed by the processor, and can achieve the same technical effects, and the repetition is avoided, so that the description is omitted.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements each process of the above embodiment of the AIS data processing method, and can achieve the same technical effects, so that repetition is avoided, and no further description is provided herein. The computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An AIS data processing method, comprising:
step one, optimizing an HDFS disk:
migrating the disks used by the Kafka service to SSDs, so that all mechanical hard disks are available to the HDFS service, improving data throughput; and modifying the usage mode of the HDFS disks;
step two, HBase parameter optimization:
tuning the HBase parameters for the big-data workload;
step three, region Server adjustment:
using the three newly added hyper-converged servers as Region Server nodes;
the content of the change is: three virtual machines are created on the three hyper-converged servers and added to the cluster; the number of DataNodes is expanded, the number of Region Servers is expanded, and Solr is migrated to the newly added virtual machines;
before the change: the AIS write speed is 140,000 messages/second;
after the change: the AIS write speed is lower than before the change;
the content of the change is: adding a tera-megaswitch to build a tera-megafiber network;
before the change: the AIS write speed is below 140,000 messages/second;
after the change: the AIS write speed reaches 200,000 messages/second, and Solr query performance is improved;
step four, Kafka disk and partition adjustment:
NFS servers are set up on VM05 and VM06, and the spare SSD disk space is mapped to VM01-VM04;
the 4 mechanical hard disks that the Kafka cluster had used for message-queue persistence are released, and those disk resources are provided to HDFS, increasing the cluster's data processing capacity;
step five, solr cluster optimization.
2. The AIS data processing method according to claim 1, wherein step one further comprises: the content of the change is: the logical volumes of the mechanical hard disks on VM01-VM04 are removed and replaced with JBOD mode, so that disk read/write and redundancy are managed directly by HDFS.
3. The AIS data processing method according to claim 1, wherein step four further comprises: the content of the change is: NFS servers are built on VM05 and VM06, a shared directory is created, the shared directories on VM05 and VM06 are mounted on VM01-VM04, and the Kafka write folder is changed to the shared directory.
4. The AIS data processing method according to claim 1, wherein: Redis is used as the cache database, MySQL as the relational database, Hadoop as the big data engine, HDFS as the distributed file system, HBase as the non-relational distributed database, Solr as the data retrieval engine, and Kafka as the message queue.
5. The AIS data processing method of claim 1, further comprising: step six, optimizing and adjusting the test environment at least once.
6. The AIS data processing method of claim 5, wherein step six comprises: using the stress-testing tool JMeter to test the execution efficiency and response time of the test system and the management system when multiple users are online simultaneously.
7. The AIS data processing method of claim 6, wherein step six further comprises: simulating the AIS, ASM and VDE data concurrency capability and the concurrent message execution capability of the base station through the parallel service test function of the VDES shore-based test system.
8. The AIS data processing method of claim 7, wherein: the testing uses one or more of the open-source databases MySQL, Redis, Kafka and Hadoop.
9. An apparatus, comprising:
a memory storing computer program instructions;
a processor which, when executing the computer program instructions, implements the AIS data processing method of any one of claims 1 to 8.
10. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the AIS data processing method according to any one of claims 1 to 8.
CN202311182087.4A 2023-09-14 2023-09-14 AIS data processing method Active CN116910016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311182087.4A CN116910016B (en) 2023-09-14 2023-09-14 AIS data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311182087.4A CN116910016B (en) 2023-09-14 2023-09-14 AIS data processing method

Publications (2)

Publication Number Publication Date
CN116910016A true CN116910016A (en) 2023-10-20
CN116910016B CN116910016B (en) 2024-06-11

Family

ID=88367350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311182087.4A Active CN116910016B (en) 2023-09-14 2023-09-14 AIS data processing method

Country Status (1)

Country Link
CN (1) CN116910016B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104583930A (en) * 2014-08-15 2015-04-29 华为技术有限公司 Method of data migration, controller and data migration apparatus
US20150121371A1 (en) * 2013-10-25 2015-04-30 Vmware, Inc. Multi-tenant distributed computing and database
CN104944240A (en) * 2015-05-19 2015-09-30 重庆大学 Elevator equipment state monitoring system based on large data technology
US20160321308A1 (en) * 2015-05-01 2016-11-03 Ebay Inc. Constructing a data adaptor in an enterprise server data ingestion environment
CN106709003A (en) * 2016-12-23 2017-05-24 长沙理工大学 Hadoop-based mass log data processing method
CN107491448A (en) * 2016-06-12 2017-12-19 ***通信集团四川有限公司 A kind of HBase resource adjusting methods and device
CN107644050A (en) * 2016-12-22 2018-01-30 北京锐安科技有限公司 A kind of querying method and device of the Hbase based on solr
CN109525593A (en) * 2018-12-20 2019-03-26 中科曙光国际信息产业有限公司 A kind of pair of hadoop big data platform concentrates security management and control system and method
CN110515726A (en) * 2019-08-14 2019-11-29 苏州浪潮智能科技有限公司 A kind of database loads equalization methods and device
CN110865989A (en) * 2019-11-22 2020-03-06 浪潮电子信息产业股份有限公司 Business processing method for large-scale computing cluster
CN111177271A (en) * 2019-12-31 2020-05-19 奇安信科技集团股份有限公司 Data storage method and device for persistence of kafka data to hdfs, and computer equipment
US20200341855A1 (en) * 2019-04-28 2020-10-29 Synamedia Object store specialized backup and point-in-time recovery architecture
CN111966656A (en) * 2020-07-17 2020-11-20 苏州浪潮智能科技有限公司 Method, system, terminal and storage medium for simulating high-load scene of storage file
CN112433845A (en) * 2020-10-29 2021-03-02 苏州浪潮智能科技有限公司 HBase service management method, device, equipment and readable medium
CN112799597A (en) * 2021-02-08 2021-05-14 东北大学 Hierarchical storage fault-tolerant method for stream data processing
CN115934251A (en) * 2022-12-06 2023-04-07 浪潮云信息技术股份公司 Method and system for realizing high availability of cloud native NFS
CN116383167A (en) * 2022-12-27 2023-07-04 爱信诺征信有限公司 Method for solving insufficient disk space based on object storage

Also Published As

Publication number Publication date
CN116910016B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
Lakshman et al. Cassandra: a decentralized structured storage system
CN104735110B (en) Metadata management method and system
Han et al. A novel solution of distributed memory nosql database for cloud computing
CN107180113B (en) Big data retrieval platform
Kaiser et al. Design of an exact data deduplication cluster
Fu et al. Performance optimization for managing massive numbers of small files in distributed file systems
CN109918450B (en) Distributed parallel database based on analysis type scene and storage method
WO2011064742A1 (en) Super-records
CN106066890A (en) A kind of distributed high-performance data storehouse integrated machine system
Wang et al. {MAPX}: Controlled data migration in the expansion of decentralized {Object-Based} storage systems
US20200192805A1 (en) Adaptive Cache Commit Delay for Write Aggregation
Otoo et al. Disk cache replacement algorithm for storage resource managers in data grids
CN111813332A (en) High-performance, high-expansion and high-safety intelligent distributed storage system
CN113032356B (en) Cabin distributed file storage system and implementation method
CN109716280B (en) Flexible memory rank storage arrangement
Qi et al. Big data management in digital forensics
CN116910016B (en) AIS data processing method
CN102867029B (en) A kind of method managing distributive catalogue of document system and distributed file system
Rodriguez et al. Unifying the data center caching layer: Feasible? profitable?
Ren et al. Towards realistic benchmarking for cloud file systems: Early experiences
Xu et al. Traffic-aware erasure-coded archival schemes for in-memory stores
Xu et al. TEA: A traffic-efficient erasure-coded archival scheme for in-memory stores
Liu et al. SSD as a Cloud Cache? Carefully Design about It
CN113886472A (en) Data access system, access method, computer equipment and storage medium
Shvidkiy et al. Caching methods analysis for improving distributed storage systems performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Jiandong

Inventor after: Mo Peipei

Inventor after: Zhao Fenglong

Inventor after: Yun Zeyu

Inventor after: Hu Qing

Inventor after: Ma Rong

Inventor after: Li Yang

Inventor after: Wang Zhenjiang

Inventor after: Li Jianping

Inventor after: Yin Fan

Inventor before: Zhao Fenglong

Inventor before: Mo Peipei

Inventor before: Yun Zeyu

Inventor before: Zhang Jiandong

Inventor before: Hu Qing

Inventor before: Ma Rong

Inventor before: Li Yang

Inventor before: Wang Zhenjiang

Inventor before: Li Jianping

Inventor before: Yin Fan

GR01 Patent grant