CN114040094A

CN114040094A - Method and equipment for adjusting preset position based on pan-tilt camera

Info

Publication number: CN114040094A
Application number: CN202111240092.7A
Authority: CN
Inventors: 王雯雯; 冯远宏; 王江涛
Original assignee: Hisense TransTech Co Ltd
Current assignee: Hisense TransTech Co Ltd
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2022-02-11
Anticipated expiration: 2041-10-25
Also published as: CN114040094B

Abstract

The invention discloses a preset position adjusting method and device based on a pan-tilt camera. In the adjusting device, a processor is configured to: identify the types of objects in a target picture to be detected; if the target picture to be detected includes a non-traffic type object, determine that the scene of the target picture to be detected is a non-traffic scene; or, if the target picture to be detected includes a traffic type object and does not include a non-traffic type object, determine that the scene of the target picture to be detected is a non-traffic scene when a first traffic flow direction obtained from the traffic type objects is inconsistent with a second traffic flow direction determined from a reference picture; and, if the scene type of the target picture to be detected is determined to be a non-traffic scene, send the pan-tilt camera an adjusting instruction that includes the physical position information of the target preset position, so that the pan-tilt camera moves to the target preset position according to that information. A pan-tilt camera that has drifted away from the target preset position is thereby automatically returned to it in a timely manner.

Description

Method and equipment for adjusting preset position based on pan-tilt camera

Technical Field

The invention relates to the technical field of video monitoring, in particular to a preset position adjusting method and device based on a pan-tilt camera.

Background

With the large-scale construction of urban video monitoring systems, video-based event detection and analysis are increasingly widely applied. Video analysis and detection depend on access to front-end video monitoring equipment to acquire real-time video streams, so that abnormal events in the video pictures can be detected and analyzed.

In related applications, the video detection area is usually fixed, and the common electronic police and checkpoint cameras among the front-end video monitoring equipment support this basic mode of operation. Video monitoring cameras with a preset position function, however, generally have a pan-tilt head and a zoom lens, so their video pictures are not fixed. Once the video picture changes because the pan-tilt or dome camera has shifted (for unknown reasons or through manual movement), the video detection area changes and the original visual perception analysis algorithm can no longer detect and analyze normally. The preset position then has to be entered and adjusted manually and the video detection area recalibrated. Such manual adjustment is time-consuming, labor-intensive, and slow to respond.

Disclosure of Invention

The invention provides a preset position adjusting method and device based on a pan-tilt camera, which can detect in a timely manner that the pan-tilt camera has shifted and automatically return a pan-tilt camera that has deviated from a target preset position to that position, so that video detection can be performed accurately on the video picture of the target preset position.

According to a first aspect of the exemplary embodiments, there is provided a method for adjusting a preset position based on a pan-tilt camera, the method comprising:

identifying the type of an object to be detected in a target picture to be detected; the target picture to be detected is obtained by processing a picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene;

if the target picture to be detected comprises the non-traffic type object, determining that the scene of the target picture to be detected is a non-traffic scene; or

if the target picture to be detected comprises an object of a traffic type and does not comprise an object of a non-traffic type, determining that the scene of the target picture to be detected is a non-traffic scene when a first traffic flow direction obtained based on the object of the traffic type is inconsistent with a second traffic flow direction determined based on the reference picture;

and if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction of physical position information including a target preset position to the pan-tilt camera so that the pan-tilt camera adjusts the position of the pan-tilt camera to the target preset position according to the physical position information.

In the embodiments of the application, the reference picture is obtained by decoding the target video stream shot by the pan-tilt camera at the target preset position, and the scene corresponding to the reference picture is a traffic scene, so the reference picture can serve as a baseline for comparison. In addition, the picture to be detected is processed according to the position of the target detection area in the reference picture relative to the reference picture, which ensures during picture preprocessing that the target picture to be detected and the reference picture cover the same detection area. The scene of the target picture to be detected is then determined by identifying the traffic type objects and non-traffic type objects in it. Because the reference picture at the target preset position corresponds to a traffic scene, determining that the scene type of the target picture to be detected is a non-traffic scene indicates that the pan-tilt camera has deviated from the target preset position. An adjusting instruction including the physical position information of the target preset position is then sent to the pan-tilt camera, so that the pan-tilt camera moves to the target preset position according to that information. In this way the offset of the pan-tilt camera is detected in a timely manner and the camera is automatically returned to the target preset position. Since the reference picture obtained at the target preset position meets the requirements of video analysis and detection, the video pictures shot by the pan-tilt camera after adjustment meet those requirements as well.

In some exemplary embodiments, the method further comprises:

if the target picture to be detected comprises an object of a traffic type and does not comprise an object of a non-traffic type, determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the object of the traffic type is consistent with the traffic flow direction determined based on the reference picture;

and sending the target picture to be detected to a video analysis terminal so that the video analysis terminal can analyze the target picture to be detected by applying a preset video analysis algorithm.

In the above embodiment, the target picture to be detected includes a traffic type object and does not include a non-traffic type object. This alone does not rule out that the pan-tilt camera has shifted to another position that happens to capture only traffic type objects. It is therefore necessary to confirm that the traffic flow direction obtained from the traffic type objects is consistent with the traffic flow direction determined from the reference picture. Only then can it be determined with greater confidence that the pan-tilt camera is still at the target preset position and has not shifted. The target picture to be detected, or the target video to be detected, is then considered usable for video analysis and is sent to the video analysis terminal for analysis.

In some exemplary embodiments, the target picture to be detected is obtained by:

and cutting the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture to obtain the target picture to be detected.

In the embodiment, the picture to be detected is cut according to the position of the target detection area in the reference picture relative to the reference picture, so that the consistency of the target picture to be detected and the reference picture on the detection area is ensured in the picture preprocessing process.

In some exemplary embodiments, the first traffic flow direction is determined by:

identifying first pixel coordinates of each traffic type object in the target picture to be detected;

determining the direction of the first traffic flow according to the first pixel coordinate of each object of the traffic type;

determining a second traffic flow direction by:

identifying second pixel coordinates of each traffic type object in the reference picture;

and determining the direction of the second traffic flow according to the second pixel coordinate of each object of the traffic type.

According to the embodiment, the direction of the traffic flow is determined by identifying the pixel coordinates of the traffic type object in the target picture to be detected, so that the accuracy of the determined direction of the traffic flow is ensured.

In some exemplary embodiments, the first traffic flow direction and the second traffic flow direction are determined to be non-coincident by:

determining a first angle of the first traffic flow direction in the target picture to be detected and a second angle of the second traffic flow direction in the reference picture;

and if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.

In the embodiment, whether the two traffic flow directions are consistent or not is determined by determining the angle difference between the angles of the traffic flow directions in the corresponding pictures, so that the accuracy of determining the scene type when the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object is further ensured.

In some exemplary embodiments, the scene type of the target picture to be detected is obtained by inputting the target picture to be detected into a pre-trained neural network model.

In some exemplary embodiments, the backbone network of the neural network model is the lightweight network MobileNet v2. MobileNet v2 uses depthwise separable convolutions, each consisting of a depthwise convolution followed by a 1 x 1 pointwise convolution. The convolution kernel size is 3 x 3, the basic building block is a residual bottleneck built from depthwise separable convolutions, and the architecture comprises an initial full convolution layer with 32 convolution kernels followed by 19 residual bottleneck layers. ReLU6 is used as the nonlinear activation function.
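To illustrate why a depthwise separable convolution is considered lightweight, the following minimal sketch compares the weight counts of a standard convolution and a depthwise separable one, and shows the ReLU6 activation. The channel widths used in the usage note (32 in, 64 out) are illustrative assumptions and are not taken from the application.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def relu6(x: float) -> float:
    """ReLU6 activation: clamp the input to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)
```

With a 3 x 3 kernel, 32 input channels, and 64 output channels (assumed values), the standard convolution has 18432 weights while the depthwise separable version has 2336, roughly an eightfold reduction.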

According to a second aspect of the exemplary embodiments, there is provided a pan-tilt-camera-based preset position adjusting apparatus, the apparatus comprising a processor and a video processing unit, wherein:

the video processing unit is configured to:

acquiring a video stream to be detected, shot by the pan-tilt camera at a position to be detected, and decoding the video stream to be detected to obtain a picture to be detected;

the processor is configured to:

identifying the type of an object to be detected in a target picture to be detected; the target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene;

In some exemplary embodiments, the processor is further configured to:

In some exemplary embodiments, the processor is further configured to obtain the target picture to be detected by:

In some exemplary embodiments, the processor is further configured to determine the first traffic flow direction by:

the processor is further configured to determine a second traffic flow direction by:

In some exemplary embodiments, the processor is further configured to determine that the first traffic flow direction and the second traffic flow direction are not coincident by:

According to a third aspect of the exemplary embodiments, there is provided a pan-tilt-camera-based preset position adjusting apparatus, the apparatus including:

the object identification module is used for identifying the type of an object to be detected in the target picture to be detected; the target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene;

the first scene determining module is used for determining that the scene of the target picture to be detected is a non-traffic scene when the target picture to be detected comprises a non-traffic type object;

the second scene determining module is used for determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture;

and the preset position adjusting module is used for sending an adjusting instruction including the physical position information of the target preset position to the pan-tilt camera when the scene type of the target picture to be detected is determined to be a non-traffic scene, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.

In some exemplary embodiments, the apparatus further comprises a third scene determination module and a sending module:

the third scene determination module is used for determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the traffic type object is consistent with the traffic flow direction determined based on the reference picture;

the sending module is used for sending the target picture to be detected to a video analysis terminal so that the video analysis terminal can analyze the target picture to be detected by applying a preset video analysis algorithm.

In some exemplary embodiments, the image processing module is further configured to obtain a picture to be detected of the target by:

In some exemplary embodiments, the system further comprises a traffic flow direction determination module for determining the first traffic flow direction by:

the traffic flow direction determination module is further configured to determine a second traffic flow direction by:

In some exemplary embodiments, the method further comprises determining that the first traffic flow direction and the second traffic flow direction are not consistent by:

According to a fourth aspect of the exemplary embodiments, there is provided a computer storage medium having stored therein computer program instructions, which when run on a computer, cause the computer to execute the pan-tilt-camera-based preset bit adjustment method according to the first aspect.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is an application scene diagram of a preset position adjusting method based on a pan-tilt camera according to an embodiment of the present invention;

Fig. 2 is a flowchart of a preset position adjusting method based on a pan-tilt camera according to an embodiment of the present invention;

Fig. 3 is a flowchart of a preset position adjusting method based on a pan-tilt camera according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a video frame captured by a calibrated pan-tilt camera according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a preset position adjusting device based on a pan-tilt camera according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a preset position adjusting device based on a pan-tilt camera according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments herein, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, "a plurality" means two or more in the description of the embodiments of the present application.

In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

In related applications, the video detection area is usually fixed, and common electronic police and checkpoint cameras support this basic mode of operation. Video monitoring cameras with a preset position function, however, generally have a pan-tilt head and a zoom lens, so their video pictures are not fixed. Once the pan-tilt or dome camera shifts, or the video picture is adjusted manually, the video detection area changes and the original visual perception analysis algorithm can no longer detect and analyze normally. The preset position then has to be entered and adjusted manually and the video detection area recalibrated. Such manual adjustment is time-consuming, labor-intensive, and slow to respond.

Therefore, the application provides a preset position adjusting method based on a pan-tilt camera. In the method, the type of an object to be detected in a target picture to be detected is identified; the target picture to be detected is obtained by processing a picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene. If the target picture to be detected includes a non-traffic type object, the scene of the target picture to be detected is determined to be a non-traffic scene; or, if the target picture to be detected includes a traffic type object and does not include a non-traffic type object, the scene is determined to be a non-traffic scene when the first traffic flow direction obtained from the traffic type objects is inconsistent with the second traffic flow direction determined from the reference picture. If the scene type of the target picture to be detected is determined to be a non-traffic scene, an adjusting instruction including the physical position information of the target preset position is sent to the pan-tilt camera, so that the pan-tilt camera moves to the target preset position according to that information. In this way the offset of the pan-tilt camera is detected in a timely manner and the camera is automatically returned to the target preset position. Since the reference picture obtained at the target preset position meets the requirements of video analysis and detection, the video pictures shot by the pan-tilt camera after adjustment meet those requirements as well.

Having introduced the design concept of the embodiments of the present application, the following briefly describes application scenarios to which the technical solutions of the embodiments can be applied. The application scenarios described below are only used to illustrate the embodiments of the present application and are not limiting. In specific implementations, the technical solutions provided by the embodiments of the application can be applied flexibly according to actual needs.

Fig. 1 shows an application scenario of a preset position adjusting method based on a pan-tilt camera. As can be seen from Fig. 1, the video detection area originally used for video analysis and detection is a video picture of the intersection. After the pan-tilt camera shifts or is manually adjusted, the video picture changes and mostly captures areas such as green belts. Video detection and analysis can therefore no longer be performed with the video detection algorithm configured for the original video detection area.

To further illustrate the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the accompanying drawings and the detailed description. Although the embodiments of the present application provide method steps as shown in the following embodiments or figures, more or fewer steps may be included in the method based on conventional or non-inventive efforts. In steps where no necessary causal relationship exists logically, the order of execution of the steps is not limited to that provided by the embodiments of the present application.

The technical solution provided by the embodiments of the present application is described below with reference to the application scenario shown in Fig. 1 and the preset position adjusting method based on a pan-tilt camera shown in Fig. 2.

S201, identifying the type of an object to be detected in a target picture to be detected.

The target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene.

S202, if the target picture to be detected includes a non-traffic type object, determining that the scene of the target picture to be detected is a non-traffic scene.

S203, if the target picture to be detected includes a traffic type object and does not include a non-traffic type object, determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture.

S204, if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction including the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
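The decision logic of S201 to S204 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the label names, the `Scene` enum, and the function names are hypothetical, and the direction comparison is assumed to have been computed elsewhere and passed in as a boolean. The application does not say what happens when no object of either type is found, so the sketch returns `None` in that case.

```python
from enum import Enum
from typing import Iterable, Optional


class Scene(Enum):
    TRAFFIC = "traffic"
    NON_TRAFFIC = "non-traffic"


# Hypothetical label sets, based on the examples given in the description.
TRAFFIC_LABELS = {"vehicle", "signal_light"}
NON_TRAFFIC_LABELS = {"tree", "green_belt", "building"}


def classify_scene(labels: Iterable[str], directions_consistent: bool) -> Optional[Scene]:
    """Apply the S202/S203 rules to the object labels found in the target picture."""
    found = set(labels)
    if found & NON_TRAFFIC_LABELS:
        # S202: any non-traffic type object makes the scene non-traffic.
        return Scene.NON_TRAFFIC
    if found & TRAFFIC_LABELS:
        # S203: only traffic type objects, so the flow directions decide.
        return Scene.TRAFFIC if directions_consistent else Scene.NON_TRAFFIC
    # Assumption: this case is not specified in the source.
    return None


def needs_adjustment(scene: Optional[Scene]) -> bool:
    """S204: an adjusting instruction is sent only for a non-traffic scene."""
    return scene is Scene.NON_TRAFFIC
```

Checking the non-traffic labels first mirrors the text: a non-traffic type object decides the outcome regardless of whether traffic type objects are also present.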

Referring to S201, the target preset position is set in advance. For example, the pan-tilt camera has 100 preset positions when it leaves the factory. For the current intersection, the video picture shot with the parameter settings of the 58th preset position can be used for video analysis and detection, and the relevant parameters of the corresponding video detection algorithm are configured for it. The target preset position is then the 58th preset position, and its physical position information can be obtained from the factory parameters of the pan-tilt camera.

Next, with the pan-tilt camera set to the target preset position, the video shot by the pan-tilt camera over a period of time is acquired as the target video stream, and the target video stream is decoded into a plurality of pictures. Because these pictures were shot at the target preset position, region labeling is performed on them: the regions containing the traffic flow, pedestrian flow, and the like at the intersection are labeled as target detection areas, which serve as the video detection area. A labeled picture is called a reference picture, and because it is labeled according to this principle, the scene corresponding to the reference picture is a traffic scene. In contrast to traffic scenes there are non-traffic scenes; one example is a picture that contains too many green belts, buildings, or the like.

The target picture to be detected is obtained by processing the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture. For example, if the target detection area is at the lower right corner of the reference picture, the lower right corner of the picture to be detected is cropped out to obtain the target picture to be detected. The lower right corner is only an example; in practical applications, the position of the target detection area relative to the reference picture can be specified with exact pixel coordinates.
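As a rough illustration of how the physical position information of a preset position might be stored and packed into an adjusting instruction, consider the sketch below. The application does not define the fields of the physical position information or the instruction format; the pan, tilt, and zoom fields, the numeric values, the JSON layout, and the `goto_position` command name are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PresetPosition:
    # Assumed fields; the source only speaks of "physical position information".
    pan_deg: float
    tilt_deg: float
    zoom: float


# Hypothetical factory parameters for the 58th preset position (made-up values).
PRESETS = {58: PresetPosition(pan_deg=135.0, tilt_deg=-20.0, zoom=2.0)}


def build_adjust_instruction(preset_id: int) -> str:
    """Build an adjusting instruction carrying the preset's physical position."""
    position = PRESETS[preset_id]
    return json.dumps({
        "command": "goto_position",  # hypothetical command name
        "preset_id": preset_id,
        "position": asdict(position),
    })
```

A real deployment would use whatever control protocol the camera supports; the point here is only that the instruction carries the stored position of the target preset rather than a relative correction.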
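The cropping step can be sketched as follows, assuming the target detection area is a rectangle given in pixel coordinates of the reference picture and that the picture to be detected may have a different resolution. The picture is modeled as a plain list of pixel rows to keep the example dependency-free; the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]                 # row-major grid of pixel values
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def relative_region(region_px: Box, ref_size: Tuple[int, int]) -> Box:
    """Express a pixel box in the reference picture as fractions of its size."""
    left, top, right, bottom = region_px
    width, height = ref_size
    return (left / width, top / height, right / width, bottom / height)


def crop_by_relative_region(image: Image, rel: Box) -> Image:
    """Crop the same relative region out of the picture to be detected."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = rel
    x0, y0 = round(left * width), round(top * height)
    x1, y1 = round(right * width), round(bottom * height)
    return [row[x0:x1] for row in image[y0:y1]]
```

Storing the region as fractions is a design choice of this sketch: it keeps the crop aligned with the reference picture even if the two pictures differ in resolution.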

Specifically, after the target picture to be detected is obtained, the type of the object to be detected in the target picture to be detected is identified, wherein the type of the object to be detected may include a traffic type object and a non-traffic type object. Illustratively, the traffic type objects are, for example, vehicles, signal lights, etc., and the non-traffic type objects are, for example, trees, greenbelts, buildings, etc.

Referring to S202, the scene type of the target picture to be detected is determined by analyzing the types of the objects it contains.

In the first case, if the target picture to be detected includes the non-traffic type object, the scene of the target picture to be detected is determined to be a non-traffic scene.

In this case, for example, if the target picture to be detected includes at least one of a tree, a green belt, or a building, the scene of the target picture to be detected can be determined to be a non-traffic scene. Whether or not the target picture to be detected also contains traffic type objects, the presence of any non-traffic type object is enough to determine directly that the scene is a non-traffic scene.

In the second case, the target picture to be detected includes a traffic type object and does not include a non-traffic type object, and when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture, the scene of the target picture to be detected is determined to be a non-traffic scene.

In this case, for example, the target picture to be detected includes a car but no tree. The scene cannot then be directly determined to be a traffic scene, and a further check is required: when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture, the scene of the target picture to be detected is determined to be a non-traffic scene.

This situation arises, for example, when the pan-tilt camera has shifted or been manually adjusted but still happens to be shooting a road, for instance at a crossroads. At the target preset position the camera monitors vehicles coming from and going to the north, whereas after the shift or adjustment it monitors vehicles coming from or going to the east. The target picture to be detected then contains no non-traffic type object, yet the pan-tilt camera has in fact shifted. If, for example, the second traffic flow direction determined from the vehicles in the reference picture is north and the first traffic flow direction determined from the vehicles in the target picture to be detected is east, the two directions are determined to be inconsistent, and the scene of the target picture to be detected is determined to be a non-traffic scene.

In one particular example, the first traffic flow direction is determined by:

identifying first pixel coordinates of each traffic type object in the target picture to be detected; and determining the first traffic flow direction from the first pixel coordinates of each traffic type object.

Specifically, the traffic type objects are, for example, vehicles. Taking 15 vehicles in the target picture to be detected as an example, the first pixel coordinates of the 15 vehicles in the target picture to be detected are determined, and the first traffic flow direction is then determined from the variation trend of the abscissas and the variation trend of the ordinates across the 15 sets of pixel coordinates.

Determining a second traffic flow direction by:

identifying second pixel coordinates of each traffic type object in the reference picture; and determining the second traffic flow direction from the second pixel coordinates of each traffic type object.

Specifically, the traffic type objects are, for example, vehicles. Taking 20 vehicles in the reference picture as an example, the second pixel coordinates of the 20 vehicles in the reference picture are determined, and the second traffic flow direction is then determined from the variation trend of the abscissas and the variation trend of the ordinates across the 20 sets of pixel coordinates.

In practical applications, the pixel coordinates of a vehicle can be refined into the pixel coordinates of its head position and the pixel coordinates of its tail position, so that the first traffic flow direction and the second traffic flow direction can be determined more accurately.
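The head and tail refinement described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `flow_direction_deg`, the ((tail), (head)) input layout, and the averaging of unit heading vectors are hypothetical choices and are not specified by the application. Angles are measured from the positive x-axis of the picture, with the y-axis pointing downward as is usual for pixel coordinates.

```python
import math


def flow_direction_deg(vehicles):
    """Return the mean heading of the vehicles in degrees, in [0, 360).

    Each vehicle is ((tail_x, tail_y), (head_x, head_y)) in pixel coordinates.
    Unit heading vectors are averaged so that long vehicles do not dominate.
    Returns None when no dominant direction exists (assumed convention).
    """
    sum_x = 0.0
    sum_y = 0.0
    for (tail_x, tail_y), (head_x, head_y) in vehicles:
        dx, dy = head_x - tail_x, head_y - tail_y
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # degenerate detection: head and tail coincide
        sum_x += dx / length
        sum_y += dy / length
    if math.hypot(sum_x, sum_y) < 1e-9:
        return None  # no vehicles, or headings cancel out
    return math.degrees(math.atan2(sum_y, sum_x)) % 360.0


# Two vehicles driving toward the right edge of the picture.
rightward = [((0, 0), (10, 0)), ((5, 5), (25, 5))]
print(flow_direction_deg(rightward))
```

The same function would be applied once to the target picture to be detected and once to the reference picture, giving the two directions that are compared afterwards.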

After the first traffic flow direction and the second traffic flow direction are determined, whether the first traffic flow direction and the second traffic flow direction coincide may be determined by:

determining a first angle of a first traffic flow direction in a target picture to be detected and a second angle of a second traffic flow direction in a reference picture; and if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.

If the target picture to be detected and the reference picture are rectangular, the first angle of the first traffic flow direction in the target picture to be detected is the angle relative to a long edge of the target picture to be detected, and the second angle of the second traffic flow direction in the reference picture is the angle relative to a long edge of the reference picture. When the angle difference between the first angle and the second angle is greater than the preset angle threshold, the first traffic flow direction is determined to be inconsistent with the second traffic flow direction, indicating that the pan-tilt camera has shifted. In a specific example, the preset angle threshold is 5°. If the angle difference between the first angle and the second angle is less than or equal to the preset angle threshold, the first traffic flow direction is determined to be consistent with the second traffic flow direction, indicating that the pan-tilt camera is still at the target preset position and has not shifted.
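The threshold comparison above can be illustrated with a short sketch. The function name `directions_consistent` is hypothetical, and the wrap-around handling (treating 358° and 2° as 4° apart) is an added assumption; the application only states that the angle difference is compared with a preset threshold such as 5°.

```python
def directions_consistent(first_angle_deg, second_angle_deg, threshold_deg=5.0):
    """Return True when the two flow directions differ by at most the threshold.

    Angles are in degrees. The difference is folded into [0, 180] so that
    directions on either side of 0 degrees compare correctly (assumption).
    """
    diff = abs(first_angle_deg - second_angle_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff <= threshold_deg


# A 90 degree difference (for example north versus east) exceeds the threshold.
print(directions_consistent(0.0, 90.0))
# A 3 degree difference is within the threshold.
print(directions_consistent(12.0, 15.0))
```

A result of False corresponds to the inconsistent case, in which the pan-tilt camera is considered to have shifted.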

When the above method determines that the scene type of the target picture to be detected is a non-traffic scene, the pan-tilt camera is not currently at the target preset position. An adjusting instruction is then sent to the pan-tilt camera; the adjusting instruction includes the physical position information of the target preset position, so that the pan-tilt camera can adjust its position to the target preset position according to the physical position information. The pan-tilt camera can thus monitor at the target preset position, and the acquired video stream can be used for video analysis and detection.

It should be noted that the preset positions in the embodiments of the present application all refer to the pan-tilt head. Rotation of the camera relative to the pan-tilt head is unrelated to the technical problem solved by the present application and is outside the scope considered in the embodiments of the present application.

When judging the scene type of the target picture to be detected, there is also the case in which the target picture to be detected includes a traffic type object and does not include a non-traffic type object, and the traffic flow direction obtained from the traffic type objects is consistent with the traffic flow direction determined from the reference picture. The scene of the target picture to be detected is then determined to be a traffic scene, which indicates that the pan-tilt camera is still monitoring at the target preset position, so the acquired video stream and the corresponding target pictures to be detected can be used for video analysis. The target picture to be detected is then sent to the video analysis terminal, which analyzes it by applying a preset video analysis algorithm. Illustratively, the video analysis terminal is a video analysis device of a traffic management department. The preset video analysis algorithm may be any video analysis algorithm in the prior art and is not described here.

In practical applications, the scene type of the target picture to be detected is usually obtained by inputting the target picture to be detected into a pre-trained neural network model. A neural network model with a set structure is first trained until it performs well, and the trained model is then used to identify the scene of the target picture to be detected.

Specifically, a convolutional neural network extracts the image features of the detection area, and a softmax classifier then predicts the scene classification result, which is one of two classes: traffic scene or non-traffic scene. If the detection area is a traffic scene, the scene comparison result is defined as consistent; if the detection area is a non-traffic scene, the scene comparison result is defined as inconsistent.
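The final softmax step can be sketched in a few lines. This is an illustrative sketch only: the names `softmax`, `classify_scene`, and `SCENE_LABELS` are hypothetical, and the two logits are assumed to come from the last layer of the network, with index 1 standing for the traffic scene and index 0 for the non-traffic scene, in line with the labels 1 and 0 used for training samples in this description.

```python
import math

# Index 0 = non-traffic scene, index 1 = traffic scene (assumed ordering).
SCENE_LABELS = ("non-traffic", "traffic")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_scene(logits):
    """Return (label, probability) for the most likely scene class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SCENE_LABELS[best], probs[best]


label, prob = classify_scene([0.0, 2.0])
print(label, round(prob, 3))
```

A "traffic" result corresponds to a consistent scene comparison result, and a "non-traffic" result to an inconsistent one.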

Regarding algorithm selection, the algorithm is designed on the basis of deep-learning scene classification techniques, and the lightweight network MobileNet v2 is selected as the backbone network. The design principle and the network structure of the selected model are described below.

The selected neural network model with the set structure has the following structure and structural parameters: the backbone network is the lightweight network MobileNet v2; MobileNet v2 comprises depthwise separable convolutions, each consisting of a depthwise convolution and a 1 × 1 pointwise convolution; the convolution kernel size of MobileNet v2 is 3; the basic building block is a depthwise separable convolution with a residual bottleneck; the architecture comprises an initial full convolution layer with 32 convolution kernels followed by 19 residual bottleneck layers; and ReLU6 is used as the nonlinear activation function.

In detail, MobileNet v2 is an improvement on v1. The core idea of MobileNet v1 is the depthwise separable convolution, which splits a standard convolution into two separate layers: the first, called the depthwise convolution, applies a single lightweight convolution kernel to each input channel; the second is a 1 × 1 convolution, called the pointwise convolution, which computes linear combinations of the input channels to construct new features. MobileNet v2 uses a convolution kernel size of k = 3, for which the depthwise separable convolution requires 8 to 9 times less computation than a standard convolution at only a slight loss of accuracy. The basic building block of MobileNet v2 is a depthwise separable convolution with a residual bottleneck. The architecture contains an initial full convolution layer with 32 convolution kernels, followed by 19 residual bottleneck layers, and ReLU6 is used as the nonlinear activation function because it is more robust in low-precision computation. The network structure of MobileNet v2 is shown in Table 1, where t is the expansion factor applied to the input channels, c is the number of output channels, n is the number of times the module is repeated, and s is the stride of the first repetition of the module. For the 19-class classification requirement of the project, the convolution layer after the last average pooling layer of the original model is modified, and two fully connected layers are added (the fc-1000 and fc-19 rows of Table 1).
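The 8 to 9 times figure can be checked with a short calculation. The sketch below counts multiply-add operations for a standard convolution and for a depthwise convolution followed by a 1 × 1 pointwise convolution; the function names and the example feature map size are assumptions chosen for illustration. The ratio works out to k²·c_out / (k² + c_out), which approaches k² = 9 as the number of output channels grows.

```python
def standard_conv_cost(height, width, c_in, c_out, k):
    """Multiply-adds of a standard k x k convolution."""
    return height * width * c_in * c_out * k * k


def separable_conv_cost(height, width, c_in, c_out, k):
    """Multiply-adds of a depthwise k x k plus a 1 x 1 pointwise convolution."""
    depthwise = height * width * c_in * k * k
    pointwise = height * width * c_in * c_out
    return depthwise + pointwise


def reduction_factor(c_out, k=3, height=14, width=14, c_in=64):
    """How many times cheaper the separable form is than the standard one."""
    standard = standard_conv_cost(height, width, c_in, c_out, k)
    separable = separable_conv_cost(height, width, c_in, c_out, k)
    return standard / separable


for channels in (96, 320, 1280):
    print(channels, round(reduction_factor(channels), 2))
```

The factor does not depend on the spatial size or on the number of input channels, since both cancel in the ratio.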

Illustratively, the training process for the model is as follows. A large number of positive samples and negative samples are selected. The positive samples are positive pictures obtained by decoding a video stream captured by the pan-tilt camera at the target preset position; the negative samples are negative pictures obtained by decoding video streams captured by the pan-tilt camera at preset positions other than the target preset position. The positive pictures and the negative pictures together form the training samples. Each training sample is labeled to obtain its class label: for example, positive samples are assigned label 1 and negative samples label 0, where label 1 represents a traffic scene and label 0 represents a non-traffic scene. The training samples and their pre-labeled class labels are input into an initial MobileNet v2 model, which performs fusion processing on the image features of the training samples to obtain, for each training sample, a predicted classification result representing the probability that the sample belongs to each preset class. A classification loss value is determined from the predicted classification result and the pre-labeled class label of each training sample, and the parameters of the initial MobileNet v2 model are adjusted according to the classification loss value until the loss value falls within a preset range, yielding the trained MobileNet v2 model.
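The loss computation and stopping criterion in this training process can be illustrated as follows. The application does not name the loss function; cross-entropy is assumed here because it is the usual choice for a softmax classifier. The function names and the threshold value 0.05 are likewise hypothetical, and the model update step itself is omitted.

```python
import math


def classification_loss(predicted_probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted class probabilities and labels.

    predicted_probs: one probability list per training sample.
    labels: integer class label per sample (1 = traffic, 0 = non-traffic).
    """
    total = 0.0
    for probs, label in zip(predicted_probs, labels):
        total -= math.log(max(probs[label], eps))  # eps guards against log(0)
    return total / len(labels)


def loss_within_preset_range(loss, max_loss=0.05):
    """Stopping criterion: training ends once the loss is small enough."""
    return loss <= max_loss


# One positive and one negative sample, both predicted with 90% confidence.
loss = classification_loss([[0.1, 0.9], [0.9, 0.1]], [1, 0])
print(round(loss, 4), loss_within_preset_range(loss))
```

In an actual training loop the model parameters would be updated after each loss evaluation, and the loop would end once `loss_within_preset_range` returns True.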

In a specific example, Table 1 shows the parameters of a MobileNet v2 network. With this set of parameters, a well-trained MobileNet v2 model can be obtained.

Table 1 Parameters of the MobileNet v2 network

Input	Operator	t	c	n	s
224²×3	conv2d	-	32	1	2
112²×32	bottleneck	1	16	1	1
112²×16	bottleneck	6	24	2	2
56²×24	bottleneck	6	32	3	2
28²×32	bottleneck	6	64	4	2
14²×64	bottleneck	6	96	3	1
14²×96	bottleneck	6	160	3	2
7²×160	bottleneck	6	320	1	1
7²×320	conv2d 1×1	-	1280	1	1
7²×1280	avgpool 7×7	-	-	1	-
1280	fc-1000	-	1000	-	-
1000	fc-19	-	19	-	-
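As a cross-check of Table 1, the sketch below propagates the feature map size through the convolutional rows, reading each Input entry as side length squared times channels (224 × 224 × 3 for the first row). The row tuples are copied from the table; the function name `trace_shapes` and the assumption that a stride of 2 exactly halves the side length are illustrative.

```python
# (operator, t, c, n, s) for the rows of Table 1 up to the 1 x 1 convolution.
ROWS = [
    ("conv2d", None, 32, 1, 2),
    ("bottleneck", 1, 16, 1, 1),
    ("bottleneck", 6, 24, 2, 2),
    ("bottleneck", 6, 32, 3, 2),
    ("bottleneck", 6, 64, 4, 2),
    ("bottleneck", 6, 96, 3, 1),
    ("bottleneck", 6, 160, 3, 2),
    ("bottleneck", 6, 320, 1, 1),
    ("conv2d 1x1", None, 1280, 1, 1),
]


def trace_shapes(side=224, channels=3):
    """Return the (operator, input side, input channels) seen by each row."""
    shapes = []
    for operator, _t, c, _n, s in ROWS:
        shapes.append((operator, side, channels))
        side //= s    # only the first repetition of a module uses stride s
        channels = c  # every repetition outputs c channels
    shapes.append(("avgpool 7x7", side, channels))
    return shapes


for operator, side, channels in trace_shapes():
    print(f"{side}x{side}x{channels}\t{operator}")
```

The traced input sizes reproduce the Input column of Table 1, ending with a 7 × 7 × 1280 feature map at the average pooling layer.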

To make the technical solution of the present application clearer, the preset position adjusting method based on the pan-tilt camera is described below as a complete flow with reference to fig. 3.

S301, cutting the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture to obtain the target picture to be detected.

S302, identifying the type of the object to be detected in the target picture to be detected.

The target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in the reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at the target preset position, and the scene corresponding to the reference picture is a traffic scene.

S303, if the target picture to be detected comprises the non-traffic type object, determining that the scene of the target picture to be detected is a non-traffic scene.

S304, if the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object, identifying the first pixel coordinates of all traffic type objects in the target picture to be detected; determining the first traffic flow direction according to the first pixel coordinates of each traffic type object; identifying the second pixel coordinates of all traffic type objects in the reference picture; and determining the second traffic flow direction according to the second pixel coordinates of each traffic type object.

S305, determining a first angle of a first traffic flow direction in a target picture to be detected and a second angle of a second traffic flow direction in a reference picture; and if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.

S306, when the first traffic flow direction obtained based on the object of the traffic type is inconsistent with the second traffic flow direction determined based on the reference picture, determining that the scene of the target picture to be detected is a non-traffic scene.

S307, if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction of physical position information including a target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts the position of the pan-tilt camera to the target preset position according to the physical position information.

S308, if the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object, determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the traffic type object is consistent with the traffic flow direction determined based on the reference picture.

S309, sending the target picture to be detected to a video analysis terminal so that the video analysis terminal can analyze the target picture to be detected by applying a preset video analysis algorithm.
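The branching in steps S303 to S309 can be summarized in a short sketch. All names here (`Detection`, `decide`, and the returned action strings) are hypothetical, and the case of a picture containing no recognized objects is not covered by the steps above, so the sketch returns a placeholder for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """Recognition results for one target picture to be detected."""
    has_traffic_objects: bool
    has_non_traffic_objects: bool
    first_angle_deg: Optional[float] = None   # flow angle, target picture
    second_angle_deg: Optional[float] = None  # flow angle, reference picture


def decide(detection, threshold_deg=5.0):
    """Map the recognition results to an action, following S303 to S309."""
    if detection.has_non_traffic_objects:
        return "adjust_to_target_preset"      # S303, then S307
    if (detection.has_traffic_objects
            and detection.first_angle_deg is not None
            and detection.second_angle_deg is not None):
        diff = abs(detection.first_angle_deg - detection.second_angle_deg)
        if diff > threshold_deg:
            return "adjust_to_target_preset"  # S305, S306, then S307
        return "send_to_video_analysis"       # S308, then S309
    return "undetermined"  # not covered by the described steps (assumption)


print(decide(Detection(True, False, first_angle_deg=88.0, second_angle_deg=2.0)))
print(decide(Detection(True, False, first_angle_deg=3.0, second_angle_deg=2.0)))
```

Separating the decision from the recognition results keeps the flow easy to test without a camera or a trained model.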

In summary, the embodiment of the application can determine whether the preset position of the pan/tilt camera changes only by analyzing the scene, and then adjust the preset position of the pan/tilt camera to the original target preset position for setting the target detection area, thereby ensuring the accuracy and effectiveness of the subsequent video analysis and detection process.

It should be noted that the numbering of the above steps does not imply their order of execution; the overall adjusting process based on the preset position of the pan-tilt camera is implemented with reference to fig. 3. In a specific example, fig. 4 shows a schematic view of a video frame taken by the pan-tilt camera after calibration; the calibrated picture is consistent with the reference picture and can therefore be used for video analysis and detection of the intersection or road section.

As shown in fig. 5, based on the same inventive concept, an embodiment of the present invention provides a preset position adjusting device based on a pan-tilt camera, including: an object recognition module 51, a first scene determination module 52, a second scene determination module 53 and a preset bit adjustment module 54.

The object identification module 51 is configured to identify a type of an object to be detected in a target picture to be detected; the target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by a pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;

the first scene determining module 52 is configured to determine that a scene of the target picture to be detected is a non-traffic scene when the target picture to be detected includes a non-traffic type object;

a second scene determining module 53, configured to determine that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture, and the target picture to be detected includes the traffic type object and does not include the non-traffic type object;

and the preset position adjusting module 54 is configured to send an adjusting instruction including physical position information of the target preset position to the pan-tilt camera when it is determined that the scene type of the target picture to be detected is a non-traffic scene, so that the pan-tilt camera adjusts the position of the pan-tilt camera to the target preset position according to the physical position information.

The sending module is specifically configured to send the target picture to be detected to a video analysis terminal, so that the video analysis terminal can analyze the target picture to be detected by applying a preset video analysis algorithm.

determining the direction of a first traffic flow according to the first pixel coordinates of the objects of each traffic type;

and determining the direction of the second traffic flow according to the second pixel coordinate of the object of each traffic type.

determining a first angle of a first traffic flow direction in a target picture to be detected and a second angle of a second traffic flow direction in a reference picture;

In some exemplary embodiments, the backbone network of the neural network model is the lightweight network MobileNet v2; MobileNet v2 comprises depthwise separable convolutions, each consisting of a depthwise convolution and a 1 × 1 pointwise convolution; the convolution kernel size of MobileNet v2 is 3; the basic building block is a depthwise separable convolution with a residual bottleneck; the architecture comprises an initial full convolution layer with 32 convolution kernels followed by 19 residual bottleneck layers; and ReLU6 is used as the nonlinear activation function.

Since this device corresponds to the method in the embodiment of the present invention and solves the problem on a similar principle, the implementation of the device may refer to the implementation of the method, and repeated details are omitted.

As shown in fig. 6, based on the same inventive concept, an embodiment of the present invention provides a preset position adjusting apparatus based on a pan-tilt camera, including: a processor 601 and a video processing unit 602.

The video processing unit 602 is configured to:

acquiring a video stream to be detected, which is obtained by shooting at a position to be detected by a pan-tilt camera, and decoding the video stream to be detected to obtain a picture to be detected;

the processor 601 is configured to:

identifying the type of an object to be detected in a target picture to be detected; the target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by a pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;

If the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object, determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture;

and if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction of physical position information including the target preset position to the pan-tilt camera so that the pan-tilt camera adjusts the position of the pan-tilt camera to the target preset position according to the physical position information.

In some exemplary embodiments, the processor 601 is further configured to:

if the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object, determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the traffic type object is consistent with the traffic flow direction determined based on the reference picture;

and sending the target picture to be detected to a video analysis terminal so that the video analysis terminal can analyze the target picture to be detected by applying a preset video analysis algorithm.

In some exemplary embodiments, the processor 601 is further configured to obtain the target picture to be detected by:

In some exemplary embodiments, the processor 601 is further configured to determine the first traffic flow direction by:

In some exemplary embodiments, processor 601 is further configured to determine that the first traffic flow direction and the second traffic flow direction are not coincident by:

The embodiment of the invention also provides a computer storage medium in which computer program instructions are stored; when the instructions run on a computer, the computer is caused to execute the steps of the above preset position adjusting method based on the pan-tilt camera.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A preset position adjusting device based on a pan-tilt camera, characterized by comprising a video processing unit and a processor:

the video processing unit is configured to:

the processor is configured to:

2. The device of claim 1, wherein the processor is further configured to:

3. The device of claim 1, wherein the processor is further configured to obtain the target picture to be detected by:

4. The device of claim 1, wherein the processor is further configured to determine the first traffic flow direction by:

5. The device of claim 4, wherein the processor is further configured to determine that the first traffic flow direction and the second traffic flow direction are inconsistent by:

6. The device according to any one of claims 1 to 5, wherein the scene type of the target picture to be detected is obtained by inputting the target picture to be detected into a pre-trained neural network model.

7. The device of claim 6, wherein the backbone network of the neural network model is the lightweight network MobileNet v2; MobileNet v2 comprises depthwise separable convolutions, each consisting of a depthwise convolution and a 1 × 1 pointwise convolution; the convolution kernel size of MobileNet v2 is 3; the basic building block is a depthwise separable convolution with a residual bottleneck; the architecture comprises an initial full convolution layer with 32 convolution kernels followed by 19 residual bottleneck layers; and ReLU6 is used as the nonlinear activation function.

8. A preset position adjusting method based on a pan-tilt camera is characterized by comprising the following steps:

9. The method of claim 8, further comprising:

10. The method according to claim 8, wherein the picture to be detected of the target is obtained by: