CN115640201A - System performance testing method for artificial intelligence server - Google Patents


Info

Publication number
CN115640201A
CN115640201A
Authority
CN
China
Prior art keywords
time
server
test
processing
reasoning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211329264.2A
Other languages
Chinese (zh)
Other versions
CN115640201B (en
Inventor
董建
徐洋
杨雨泽
鲍薇
张琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronics Standardization Institute
Original Assignee
China Electronics Standardization Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electronics Standardization Institute filed Critical China Electronics Standardization Institute
Priority to CN202211329264.2A priority Critical patent/CN115640201B/en
Publication of CN115640201A publication Critical patent/CN115640201A/en
Application granted granted Critical
Publication of CN115640201B publication Critical patent/CN115640201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of server performance testing, and in particular to a system performance testing method for an artificial intelligence server. The test software consists of a Tester program and a Stubs program, which together form the LoadGenerator load-generator module used in the method. The load generator connects to the dataset interface and the test bottom-layer interface of the system under test and applies load distribution strategies including continuous or single arrival, fixed-period arrival, Poisson-distribution arrival, peak arrival, and offline arrival.

Description

System performance testing method for artificial intelligence server
Technical Field
The invention relates to the technical field of server performance testing, in particular to a system performance testing method for an artificial intelligence server.
Background
Like a traditional server, an artificial intelligence server provides data and information services when people search, chat, and browse web pages. To meet the requirements of the market and of different application scenarios, an artificial intelligence server can support up to N processors, giving it strong parallel computing capability, on the order of dozens of personal computers, together with cluster-level strength and strong scalability. It can provide accurate solutions according to the real needs of an enterprise, helping the enterprise realize the value of its data and resources and raise its standing and image.
Artificial intelligence servers are used mainly in fields such as speech recognition, image processing, video imaging, and semantic segmentation, and have achieved particularly notable results in data-center computing. The data computation an artificial intelligence server provides is many-sided, covering intelligent services such as archive analysis, market segmentation, and type classification; through targeted analysis it gives an enterprise an accurate development direction, enabling more focused development and improvement.
Artificial intelligence servers are characterized by heavy computation, a wide range of operators, and high energy demand. They have been adopted by many enterprises, replace part of the human workforce, and are applied in industries such as finance, education, manufacturing, and transportation. In future development they will be deployed even more widely across industries, and their advantages will help enterprises cut costs, save energy, and improve efficiency, truly breaking through bottlenecks, bringing convenience to people, and delivering higher-quality technological benefits.
Server performance is generally affected by a combination of hardware, network, application, configuration, and database factors, any of which can degrade it. In terms of middleware load-balancing strategy, many systems currently cluster their services; even with load balancing in place, if the load is not balanced well enough, some services are prone to fail or hang under heavy bursts of traffic. Because of the learning and recognition capabilities of an artificial intelligence server, the computational load it generates exceeds that of a traditional server, and its service scenarios are diverse, so how to accurately test the load of an artificial intelligence server is a problem that must be faced.
Therefore, to solve the above problems, the present application provides a system performance testing method for an artificial intelligence server that can apply different distribution strategies to the load according to different settings and, combined with a comprehensive system of test indexes, meet the testing requirements of different scenarios.
Disclosure of Invention
The invention aims to fill a gap in the prior art by providing a system performance testing method for an artificial intelligence server that can apply different distribution strategies to the load according to different settings, so as to meet the testing requirements of different scenarios.
To achieve this aim, the invention provides a system performance testing method for an artificial intelligence server, comprising test software and test indexes. The test software consists of a Tester program and a Stubs program. The Tester is the program run by the tester; it controls the test process, maintains test data information, and receives the test data sent by the Stubs program. Stubs are programs running on the test vendor's equipment; they execute the actual test programs and interface with the Tester. The Stubs program comprises a Stubs general layer and a Stubs vendor adaptation layer. The Stubs general layer handles flow control and data management; its code is provided by the testing organization, is compiled into a binary program in the C++ language, runs on the test vendor's equipment, serves as the startup entry point, and is responsible for calling the vendor adaptation code, monitoring the processing flow, communicating with the Tester, and integrating and forwarding the test result data. The Stubs vendor adaptation layer is provided by the vendor; it contains the service code that implements the specific test inference or training, interfaces with the Stubs general layer through added interfaces and scripts, and adds an instrumentation ("dotting") function to collect information parameters. The Tester program and the Stubs program together constitute the LoadGenerator load-generator module used in the test method.
The test indexes comprise time, power consumption, actual throughput, energy efficiency, elasticity, load-bearing capacity, and the maximum number of video-analysis channels;
the time indexes include:
total inference delay: the end-to-end total delay of multiple consecutive inferences;
end-to-end inference delay: the difference between the time the tester sends a sample and the time the tester receives the result;
sample sending delay: the difference between the time the tester sends a sample and the time the system under test receives it;
result transfer delay: the difference between the time the system under test sends the result and the time the tester receives it;
task dispatch delay: the difference between the time the system under test receives a sample and the time processing begins;
preprocessing delay: the difference between the start and end times of the system under test's preprocessing of a sample;
inference delay: the difference between the start and end times of the system under test's inference on a sample;
post-processing delay: the difference between the start and end times of the system under test's post-processing of a sample;
sample processing delay: the difference between the start and end times of the system under test's processing of a sample; the processing delay is the combination of the preprocessing, inference, and post-processing times;
dispatch-and-process delay: the difference between the time the system under test fully receives a sample and the time processing ends;
processing timeout: the maximum time interval allowed between the tester sending a sample and receiving the corresponding result;
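Most of the delay metrics above are simple differences between instrumented ("dotting") timestamps. The following sketch derives them from a per-sample timestamp record; the field and metric names are illustrative assumptions, not identifiers from the patented software.

```python
from dataclasses import dataclass

@dataclass
class SampleTimestamps:
    """Instrumentation points collected for one sample (all in seconds)."""
    tester_send: float       # tester sends the sample
    sut_receive: float       # system under test fully receives the sample
    pre_start: float         # preprocessing begins (processing start)
    pre_end: float
    infer_start: float       # inference begins
    infer_end: float
    post_start: float        # post-processing begins
    post_end: float          # processing ends
    sut_send_result: float   # system under test sends the result
    tester_receive: float    # tester receives the result

def delay_metrics(t: SampleTimestamps) -> dict:
    """Compute the per-sample delay metrics defined in the test index."""
    return {
        "end_to_end": t.tester_receive - t.tester_send,
        "sample_sending": t.sut_receive - t.tester_send,
        "result_transfer": t.tester_receive - t.sut_send_result,
        "task_dispatch": t.pre_start - t.sut_receive,
        "preprocessing": t.pre_end - t.pre_start,
        "inference": t.infer_end - t.infer_start,
        "postprocessing": t.post_end - t.post_start,
        # processing = preprocessing + inference + post-processing span
        "sample_processing": t.post_end - t.pre_start,
        "dispatch_and_process": t.post_end - t.sut_receive,
    }
```

The processing-timeout check then reduces to comparing `end_to_end` against the configured threshold.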
the power consumption indexes include:
AI server single-machine inference average power: the average power of a single AI server over the whole course of an inference run;
AI server data-preprocessing average power: the average power of a single AI server during the data-preprocessing stage of an inference run;
AI server inference peak power: the maximum instantaneous power across the components of a single AI server under full-load pressure over the whole course of an inference run;
AI server cluster inference average power: the average power of an AI server cluster over the whole course of an inference run;
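The average-power and peak-power figures above can be derived from a sampled power trace. A minimal sketch, assuming power readings taken at known timestamps (the function name and sampling format are illustrative, not from the patent):

```python
def power_stats(samples):
    """Derive average and peak power from a sampled power trace.

    samples: list of (timestamp_seconds, watts) pairs, sorted by time,
    with at least two entries spanning the whole inference run.
    """
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_j += (p0 + p1) / 2.0 * (t1 - t0)  # trapezoidal rule
    duration_s = samples[-1][0] - samples[0][0]
    return {
        "average_w": energy_j / duration_s,    # whole-run average power
        "peak_w": max(p for _, p in samples),  # maximum instantaneous power
        "energy_j": energy_j,                  # total energy consumed
    }
```

Per-stage averages (e.g. data preprocessing) follow by passing only the samples that fall inside that stage's time window.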
the actual throughput represents the effective computing capability of the artificial intelligence server system for a specific inference workload; improving effective computing capability achieves the same effect as expanding the hardware system. For vision tests the unit is images/s, and for natural language processing tests the unit is sentences/s, including:
AI server system inference actual throughput: the number of samples the AI server system completely processes per unit time for a specific task load;
AI server system inference effective computing power: the weighted geometric mean, over a given task set S, of the ratio of the system's actual throughput to the per-task baseline throughput;
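Since the effective computing power is defined as a weighted geometric mean of per-task throughput ratios, it is convenient to compute it in log space. A sketch (function and task names are illustrative assumptions):

```python
import math

def effective_compute(results, weights=None):
    """Weighted geometric mean of actual/baseline throughput ratios.

    results: {task_name: (actual_throughput, baseline_throughput)}
    weights: optional {task_name: weight}; equal weights by default.
    """
    if weights is None:
        weights = {task: 1.0 for task in results}
    total = sum(weights[task] for task in results)
    # geometric mean computed in log space for numerical stability
    log_sum = sum(weights[task] * math.log(actual / baseline)
                  for task, (actual, baseline) in results.items())
    return math.exp(log_sum / total)
```

A score of 1.0 means the system matches the baseline throughput on average across the task set S.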
the energy efficiency indexes include:
vision task energy-efficiency ratio: image frames processed per second per watt;
natural-language task energy-efficiency ratio: words processed per second per watt;
speech task energy-efficiency ratio: sentences processed per second per watt;
industrial task energy-efficiency ratio: computed from the vision and natural-language task energy-efficiency ratios;
the efficiency is the ratio of the AI server system's inference result to the cost of completing the inference task, in kilowatt-hours per second, including:
AI server inference efficiency: the ratio of the AI server's actual inference accuracy to its inference energy consumption;
the elasticity unit is percent per megabyte and includes the inference elasticity of the AI server system;
the load-bearing capacity unit is megabytes per second and includes the AI server or cluster inference load-bearing capacity:
the actual throughput of the AI server system under test when operating above the concurrency pressure threshold;
the maximum number of video-analysis channels is measured in channels and includes the AI server video-analysis maximum channel count: the maximum number of video streams the AI server system under test can bear while analyzing them under a given response-timeout threshold.
The load generator module applies a load distribution strategy to the dataset interface and the test bottom-layer interface connected to the system under test, specifically including:
S1, continuous or single arrival:
job i arrives immediately after job i-1 completes; job i is not sent while job i-1 is unfinished and the timeout control threshold has not been reached;
S2, fixed-period arrival:
jobs arrive at a fixed period, one job at a time;
S3, Poisson-distribution arrival:
jobs arrive according to a Poisson distribution;
S4, peak arrival:
on top of the Poisson-distribution arrival mode there are j short periods, each containing a large burst of jobs, lasting a certain duration and maintaining a certain concurrency level;
S5, offline arrival:
all jobs arrive at once.
The LoadGenerator module is developed in C++ and provides Python, C++, and C interfaces for outer-layer applications to call, supporting multiple application programs.
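The timer-driven arrival strategies above can be sketched as schedule generators; S1 is completion-driven and S4 layers bursts on top of S3, so this sketch covers only S2, S3, and S5. All function and parameter names are illustrative assumptions, not the patent's C++ API:

```python
import random

def arrival_times(mode, n, *, period=1.0, rate=1.0, seed=None):
    """Return the send times (seconds from start) for n jobs.

    mode: "fixed_period" (S2), "poisson" (S3), or "offline" (S5).
    rate: mean arrivals per second for the Poisson mode.
    """
    rng = random.Random(seed)
    if mode == "fixed_period":   # S2: one job every `period` seconds
        return [i * period for i in range(n)]
    if mode == "poisson":        # S3: exponential inter-arrival gaps
        t, out = 0.0, []
        for _ in range(n):
            t += rng.expovariate(rate)
            out.append(t)
        return out
    if mode == "offline":        # S5: the whole workload arrives at once
        return [0.0] * n
    raise ValueError(f"unknown arrival mode: {mode}")
```

Peak arrival (S4) could then be modeled by splicing j dense bursts into a Poisson schedule, and continuous arrival (S1) by sending each job from the completion callback of the previous one.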
The load distribution strategy is divided into a synchronous mode and an asynchronous mode according to the job arrival mode.
The synchronous mode includes the continuous arrival mode, in which samples are sent serially to the system under test for processing; that is, the distribution process and the processing process are the same.
The asynchronous mode includes the fixed-period and Poisson-distribution arrival modes, in which distribution has fixed timing requirements: the fixed-period mode must distribute a sample within each fixed time period regardless of whether earlier samples have finished processing, so in the asynchronous mode the distribution thread and the processing thread are not the same.
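The contrast between the two modes can be sketched with a queue and a worker thread: in synchronous mode, distribution and processing share one thread; in asynchronous mode, a timer thread keeps distributing while a separate worker processes. This is an illustrative sketch, not the patent's C++ implementation, and all names are assumptions:

```python
import queue
import threading
import time

def sync_dispatch(samples, process):
    """Synchronous (continuous-arrival) mode: the next sample is sent
    only after the previous one has been processed, in the same thread."""
    return [process(s) for s in samples]

def async_dispatch(samples, process, period=0.01):
    """Asynchronous (fixed-period) mode: this thread enqueues one sample
    per period regardless of processing progress; a worker thread drains
    the queue, so distribution and processing threads differ."""
    q, results = queue.Queue(), []
    def worker():
        for _ in samples:
            results.append(process(q.get()))
    w = threading.Thread(target=worker)
    w.start()
    for s in samples:        # distribution thread (the caller)
        q.put(s)
        time.sleep(period)   # fixed distribution period
    w.join()
    return results
```

If processing is slower than the distribution period, the queue in `async_dispatch` grows, which is exactly the back-pressure the load-bearing-capacity index measures.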
Compared with the prior art, the method remedies the shortcomings of existing performance testing methods for artificial intelligence servers: through different settings it applies different distribution strategies to the load, so as to meet the testing requirements of different scenarios.
Drawings
FIG. 1 is a flow diagram of the LoadGenerator load generator framework of the present invention.
FIG. 2 is a schematic diagram of a LoadGenerator module according to the present invention.
FIG. 3 is a flow chart of the synchronization mode of the present invention.
Fig. 4 is a schematic diagram of a registration page according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a configuration page according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a test result page according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of page indicators of test results according to an embodiment of the present invention.
Detailed Description
The invention will now be further described with reference to the accompanying drawings.
Referring to figs. 1 to 7, a system performance testing method for an artificial intelligence server includes test software composed of a Tester program and a Stubs program. The Tester is the program run by the tester; it controls the test process, maintains test data information, and receives the test data sent by the Stubs program. Stubs are programs running on the test vendor's equipment; they execute the actual test programs and interface with the Tester. The Stubs program comprises a Stubs general layer and a Stubs vendor adaptation layer. The Stubs general layer handles flow control and data management; its code is provided by the testing organization, is compiled into a binary program in the C++ language, runs on the test vendor's equipment, serves as the startup entry point, and is responsible for calling the vendor adaptation code, monitoring the processing flow, communicating with the Tester, and integrating and forwarding the test result data. The Stubs vendor adaptation layer is provided by the vendor; it contains the service code that implements the specific test inference or training, interfaces with the Stubs general layer through added interfaces and scripts, and adds an instrumentation ("dotting") function to collect information parameters. The Tester program and the Stubs program together constitute the LoadGenerator load-generator module used in the test method.
The load generator module applies a load distribution strategy to the dataset interface and the test bottom-layer interface connected to the system under test, specifically including:
S1, continuous or single arrival:
job i arrives immediately after job i-1 completes; job i is not sent while job i-1 is unfinished and the timeout control threshold has not been reached;
S2, fixed-period arrival:
jobs arrive at a fixed period, one job at a time;
S3, Poisson-distribution arrival:
jobs arrive according to a Poisson distribution;
S4, peak arrival:
on top of the Poisson-distribution arrival mode there are j short periods, each containing a large burst of jobs, lasting a certain duration and maintaining a certain concurrency level;
S5, offline arrival:
all jobs arrive at once.
The LoadGenerator module is developed in C++ and provides Python, C++, and C interfaces for outer-layer applications to call, supporting multiple application programs.
The load distribution strategy is divided into a synchronous mode and an asynchronous mode according to the job arrival mode.
The synchronous mode includes the continuous arrival mode, in which samples are sent serially to the system under test for processing; that is, the distribution process and the processing process are the same.
The asynchronous mode includes the fixed-period and Poisson-distribution arrival modes, in which distribution has fixed timing requirements: the fixed-period mode must distribute a sample within each fixed time period regardless of whether earlier samples have finished processing, so in the asynchronous mode the distribution thread and the processing thread are not the same.
1. Install the software:
The software contains two parts: the Tester program and the Stubs program. The Tester program is the server side and the Stubs program is the client side. Currently the software supports only the Linux operating system.
2. The Tester server-side application:
As shown in fig. 4, clicking the register button generates a test ID and a configuration file for user registration.
As shown in fig. 5, the user runs the server test with the configuration file; after the test completes, the user's test result is displayed on the Tester server side.
As shown in figs. 6-7, clicking a user test ID opens the detailed results page for that test.
3. The Stubs client application:
Decompress the Stubs application. ais-bench-stubs is the entry program, responsible for flow control, communication, data management, and so on. code is the service-code directory, read-only at run time; vendor code must be placed in this directory, and ais-bench-stubs monitors it, recording any modification and reporting it to the Tester server. log is the log-file directory, into which the logs produced by the vendor service code must be written. result is the result-file directory, containing the model and intermediate files; vendor service code must store results in this directory, and ais-bench-stubs automatically packages and uploads it to the Tester side after the test finishes. work is a temporary-storage directory.
Obtain the configuration file: place the config.json file generated in the steps above into the code folder under the Stubs client.
Obtain the dataset: download the ImageNet2012 dataset and convert it to TF format.
Modify TRAIN_DATA_PATH in config_imagenet2012.sh under the config folder, then run the Stubs application.
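The description states that ais-bench-stubs monitors the code directory and reports any modification to the Tester server. A minimal hash-based directory monitor of that kind might look like the following sketch; the function names are assumptions, not part of the actual tool:

```python
import hashlib
import os

def snapshot(directory):
    """Map each file's relative path to the SHA-256 of its contents."""
    state = {}
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            state[os.path.relpath(path, directory)] = digest
    return state

def modified_files(before, after):
    """Files added, removed, or changed between two snapshots."""
    added_or_removed = set(before) ^ set(after)
    changed = {p for p in before.keys() & after.keys()
               if before[p] != after[p]}
    return sorted(added_or_removed | changed)
```

A monitor loop would take a snapshot at startup, re-snapshot periodically or at test end, and report any non-empty `modified_files` result.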
The above are only preferred embodiments of the present invention, intended solely to help in understanding the method and core idea of the present application; the scope of the invention is not limited to these embodiments, and all technical solutions embodying the idea of the invention fall within its scope. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the invention are also considered within its scope.
Overall, the invention remedies the shortcomings of existing performance testing methods for artificial intelligence servers: through different settings it applies different distribution strategies to the load, so as to meet the testing requirements of different scenarios.

Claims (8)

1. A system performance testing method for an artificial intelligence server, characterized by comprising test software, composed of a Tester program and a Stubs program, and test indexes, wherein the Tester is the program run by the tester and is responsible for controlling the test process, maintaining test data information, and receiving the test data sent by the Stubs program; the Stubs are programs running on the test vendor's equipment and are responsible for executing the actual test programs and interfacing with the Tester; the Stubs program comprises a Stubs general layer and a Stubs vendor adaptation layer, wherein the Stubs general layer is used for flow control and data management, and its code is provided by the testing organization, is compiled into a binary program in the C++ language, runs on the test vendor's equipment, serves as the startup entry point, and is responsible for calling the vendor adaptation code, monitoring the processing flow, communicating with the Tester, and integrating and forwarding the test result data; the Stubs vendor adaptation layer is provided by the vendor, contains the service code implementing the specific test inference or training, interfaces with the Stubs general layer through added interfaces and scripts, and adds an instrumentation ("dotting") function to obtain information parameters; the Tester program and the Stubs program constitute the LoadGenerator load-generator module used in the test method.
2. The system performance testing method for an artificial intelligence server according to claim 1, wherein the test indexes comprise time, power consumption, actual throughput, energy efficiency, elasticity, load-bearing capacity, and the maximum number of video-analysis channels;
the time indexes include:
total inference delay: the end-to-end total delay of multiple consecutive inferences;
end-to-end inference delay: the difference between the time the tester sends a sample and the time the tester receives the result;
sample sending delay: the difference between the time the tester sends a sample and the time the system under test receives it;
result transfer delay: the difference between the time the system under test sends the result and the time the tester receives it;
task dispatch delay: the difference between the time the system under test receives a sample and the time processing begins;
preprocessing delay: the difference between the start and end times of the system under test's preprocessing of a sample; inference delay: the difference between the start and end times of the system under test's inference on a sample;
post-processing delay: the difference between the start and end times of the system under test's post-processing of a sample;
sample processing delay: the difference between the start and end times of the system under test's processing of a sample; the processing delay is the combination of the preprocessing, inference, and post-processing times;
dispatch-and-process delay: the difference between the time the system under test fully receives a sample and the time processing ends;
processing timeout: the maximum time interval allowed between the tester sending a sample and receiving the corresponding result;
the power consumption indexes include:
AI server single-machine inference average power: the average power of a single AI server over the whole course of an inference run;
AI server data-preprocessing average power: the average power of a single AI server during the data-preprocessing stage of an inference run;
AI server inference peak power: the maximum instantaneous power across the components of a single AI server under full-load pressure over the whole course of an inference run;
AI server cluster inference average power: the average power of an AI server cluster over the whole course of an inference run;
the actual throughput represents the effective computing capability of the artificial intelligence server system for a specific inference workload, and improving effective computing capability achieves the same effect as expanding the hardware system; for vision tests the unit is images/s, and for natural language processing tests the unit is sentences/s, including:
AI server system inference actual throughput: the number of samples the AI server system completely processes per unit time for a specific task load;
AI server system inference effective computing power: the weighted geometric mean, over a given task set S, of the ratio of the system's actual throughput to the per-task baseline throughput;
the energy efficiency indexes include:
vision task energy-efficiency ratio: image frames processed per second per watt;
natural-language task energy-efficiency ratio: words processed per second per watt;
speech task energy-efficiency ratio: sentences processed per second per watt;
industrial task energy-efficiency ratio: computed from the vision and natural-language task energy-efficiency ratios;
the efficiency is the ratio of the AI server system's inference result to the cost of completing the inference task, in kilowatt-hours per second, including:
AI server inference efficiency: the ratio of the AI server's actual inference accuracy to its inference energy consumption;
the elasticity unit is percent per megabyte and includes the inference elasticity of the AI server system;
the load-bearing capacity unit is megabytes per second and includes the AI server or cluster inference load-bearing capacity:
the actual throughput of the AI server system under test when operating above the concurrency pressure threshold.
3. The method according to claim 2, wherein the maximum number of video-analysis channels is measured in channels and includes the AI server video-analysis maximum channel count: the maximum number of video streams the AI server system under test can bear while analyzing them under a given response-timeout threshold.
4. The method according to claim 1, wherein the load generator module applies a load distribution strategy to the dataset interface and the test bottom-layer interface of the system under test, specifically including:
S1, continuous or single arrival:
job i arrives immediately after job i-1 completes; job i is not sent while job i-1 is unfinished and the timeout control threshold has not been reached;
S2, fixed-period arrival:
jobs arrive at a fixed period, one job at a time;
S3, Poisson-distribution arrival:
jobs arrive according to a Poisson distribution;
S4, peak arrival:
on top of the Poisson-distribution arrival mode there are j short periods, each containing a large burst of jobs, lasting a certain duration and maintaining a certain concurrency level;
S5, offline arrival:
all jobs arrive at once.
5. The system performance testing method for an artificial intelligence server according to claim 2, wherein the LoadGenerator module is developed in C++ and provides Python, C++, and C interfaces for external applications to call, supporting multiple application programs.
6. The method according to claim 4, wherein the load distribution strategy is divided into a synchronous mode and an asynchronous mode according to the job arrival mode.
7. The method according to claim 6, wherein the synchronous mode includes the continuous arrival mode, in which samples are sent serially to the system under test for processing; that is, the distribution process and the processing process are the same.
8. The method according to claim 6, wherein the asynchronous mode includes the fixed-period and Poisson-distribution arrival modes, in which distribution has fixed timing requirements: the fixed-period mode must distribute a sample within each fixed time period regardless of whether earlier samples have finished processing, so in the asynchronous mode the distribution thread and the processing thread are not the same.
CN202211329264.2A 2022-10-27 2022-10-27 System performance test method for artificial intelligent server Active CN115640201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211329264.2A CN115640201B (en) 2022-10-27 2022-10-27 System performance test method for artificial intelligent server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211329264.2A CN115640201B (en) 2022-10-27 2022-10-27 System performance test method for artificial intelligent server

Publications (2)

Publication Number Publication Date
CN115640201A true CN115640201A (en) 2023-01-24
CN115640201B CN115640201B (en) 2023-12-08

Family

ID=84946866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211329264.2A Active CN115640201B (en) 2022-10-27 2022-10-27 System performance test method for artificial intelligent server

Country Status (1)

Country Link
CN (1) CN115640201B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101026503A (en) * 2006-02-24 2007-08-29 国际商业机器公司 Unit detection method and apparatus in Web service business procedure
CN105279064A (en) * 2015-09-11 2016-01-27 浪潮电子信息产业股份有限公司 Exchange Server pressure testing method based on Windows platform
CN111949493A (en) * 2020-09-16 2020-11-17 苏州浪潮智能科技有限公司 Inference application-based power consumption testing method and device for edge AI server

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101026503A (en) * 2006-02-24 2007-08-29 国际商业机器公司 Unit detection method and apparatus in Web service business procedure
CN105279064A (en) * 2015-09-11 2016-01-27 浪潮电子信息产业股份有限公司 Exchange Server pressure testing method based on Windows platform
CN111949493A (en) * 2020-09-16 2020-11-17 苏州浪潮智能科技有限公司 Inference application-based power consumption testing method and device for edge AI server

Also Published As

Publication number Publication date
CN115640201B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN109492774B (en) Deep learning-based cloud resource scheduling method
Huang et al. A lightweight collaborative deep neural network for the mobile web in edge cloud
CN109327509A (en) A kind of distributive type Computational frame of the lower coupling of master/slave framework
CN109933306A (en) Mix Computational frame generation, data processing method, device and mixing Computational frame
CN111625331B (en) Task scheduling method, device, platform, server and storage medium
CN111400011B (en) Real-time task scheduling method, system, equipment and readable storage medium
CN112579273B (en) Task scheduling method and device and computer readable storage medium
CN103581336B (en) Service flow scheduling method and system based on cloud computing platform
CN113094116B (en) Deep learning application cloud configuration recommendation method and system based on load characteristic analysis
WO2021204013A1 (en) Intelligent dispatching method, apparatus and device, and storage medium
CN111258735A (en) Deep learning task scheduling method supporting QoS (quality of service) perception of user
CN115543577B (en) Covariate-based Kubernetes resource scheduling optimization method, storage medium and device
CN111932099A (en) Marketing business management system and marketing business management method
Qian et al. A workflow-aided Internet of things paradigm with intelligent edge computing
Zhang et al. A serverless cloud-fog platform for dnn-based video analytics with incremental learning
CN108804601A (en) Power grid operation monitors the active analysis method of big data and device
CN111461283A (en) Automatic iteration operation and maintenance method, system, equipment and storage medium of AI model
CN109561346A (en) A kind of distributed analytic method and system of video
CN113094235B (en) Tail delay abnormal cloud auditing system and method
CN116757650B (en) Project management and resource scheduling method based on machine learning
CN115640201A (en) System performance testing method for artificial intelligence server
CN117009072A (en) Heterogeneous task migration system under computing power network scene
CN117311973A (en) Computing device scheduling method and device, nonvolatile storage medium and electronic device
CN116661978A (en) Distributed flow processing method and device and distributed business flow engine
CN116777568A (en) Financial market transaction advanced intelligent dialogue ordering method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant