CN113343870B - Identification enabling method based on Android system mobile equipment - Google Patents

Identification enabling method based on Android system mobile equipment

Info

Publication number
CN113343870B
CN113343870B CN202110672602.1A CN202110672602A
Authority
CN
China
Prior art keywords
image
point
key
points
scale
Prior art date
Legal status
Active
Application number
CN202110672602.1A
Other languages
Chinese (zh)
Other versions
CN113343870A (en)
Inventor
吴伟
张嵘
陈磊
孙嘉鹏
Current Assignee
Nanjing Institute Of Jindun Public Security Technology Co ltd
Original Assignee
Nanjing Institute Of Jindun Public Security Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Institute Of Jindun Public Security Technology Co ltd filed Critical Nanjing Institute Of Jindun Public Security Technology Co ltd
Priority to CN202110672602.1A priority Critical patent/CN113343870B/en
Publication of CN113343870A publication Critical patent/CN113343870A/en
Application granted granted Critical
Publication of CN113343870B publication Critical patent/CN113343870B/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/70 Media network packetisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Image Analysis (AREA)

Abstract

This identification enabling method based on an Android system mobile device builds on conventional integrated wearable image recognition equipment: through communication protocol parsing and image recognition, an Android system host is used as the carrier, and the computing power of the Android device gives video image acquisition devices that have no image processing capability of their own an image recognition function, decoupling the video image acquisition device from the image computing unit. Compared with conventional integrated wearable image recognition equipment, when a device is damaged while police or security personnel are on patrol duty, only the damaged unit needs to be replaced, which reduces the cost of use and improves the universality of the equipment. In addition, the local comparison library and the cloud comparison library can be configured to suit the different patrol modes of police or security personnel, further improving the image recognition speed and the number of images recognized for different tasks.

Description

Identification enabling method based on Android system mobile equipment
Technical Field
The invention relates to the technical field of mobile device communication and image recognition, and in particular to an identification enabling method based on an Android system mobile device; it further relates to a method for enabling an Android mobile device to communicate with multiple types of video acquisition devices and endowing the Android mobile device with image recognition capability.
Background
To address the problem that police or security personnel on patrol duty cannot identify the facial features of many persons of interest from personal memory alone, conventional integrated wearable image recognition equipment has emerged. However, such equipment is highly integrated, difficult to maintain once damaged, and costly to develop through iterative upgrades.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an identification enabling method based on an Android system mobile device, which comprises the following steps:
s1, a video image acquisition device of a wearable device acquires images, formats the acquired images and/or videos through a communication protocol, and sends the formatted images and/or videos to an Android mobile device;
further, the communication protocol comprises a start frame, a length frame, a command frame, a data field, a check bit and an end frame;
the start frame is 0x11 (hexadecimal 11), and the end frame is 0x38 (hexadecimal 38);
the length frame is 26;
the data field comprises the number of data types, a length field for each type, the corresponding parsing for each type, and the raw data;
the communication protocol is provided by the manufacturer of each wearable device's video image acquisition unit; it is prior art and is not described further here.
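As an illustration of the frame structure described above, the following is a minimal parsing sketch in Java. The widths of the length, command, and check fields, as well as the checksum rule, are not specified in this text and are assumed here purely for illustration:

```java
import java.nio.ByteBuffer;

// Minimal sketch of parsing one vendor frame: start 0x11, length frame,
// command frame, data field, check bit, end 0x38. Field widths and the
// XOR checksum rule are assumptions made only for this illustration.
final class VendorFrameParser {
    static final byte START = 0x11;
    static final byte END = 0x38;

    static byte[] parseDataField(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        if (buf.get() != START) throw new IllegalArgumentException("bad start frame");
        int length = buf.get() & 0xFF;        // length frame (26 in the text above)
        int command = buf.get() & 0xFF;       // command frame (meaning vendor-defined)
        byte[] dataField = new byte[length];  // data field: type count, per-type length,
        buf.get(dataField);                   // per-type parsing info, and raw data
        byte check = buf.get();               // check bit (assumed: XOR over the data field)
        if (buf.get() != END) throw new IllegalArgumentException("bad end frame");
        byte xor = 0;
        for (byte b : dataField) xor ^= b;
        if (xor != check) throw new IllegalArgumentException("checksum mismatch, command=" + command);
        return dataField;
    }
}
```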
S2, the Android mobile equipment performs data coding on the formatted image and/or video to obtain a data stream;
further, the data stream includes video and/or pictures;
S21, if the data stream is a video data stream, the Android mobile device buffers the video data stream according to the timestamps, obtains the RTSP stream of the video data, and decodes it to obtain a decoded video stream;
the format of the recorded file can be MP4, AVI, or other formats;
further, the RTSP stream of the video data is decoded using an H.264 video decoder;
S22, if the data stream is a picture data stream, the Android mobile device renders the picture data stream as RGB pictures to obtain decoded pictures;
further, the pictures can be in PNG, JPG, or other formats;
s3, the Android mobile equipment applies an image recognition technology to generate a characteristic value of the image;
S31, converting the decoded video stream and/or the decoded picture into picture frames;
S311, if the data is a decoded video stream, the decoded video stream is segmented into picture frames;
S312, if the data is a decoded picture, the decoded picture is used directly as a picture frame without further processing;
s4, acquiring a characteristic value from the picture frame by applying an image recognition technology;
further, the step of obtaining the feature value from the picture frame by the image recognition technology includes:
s41, converting the picture frame obtained in the step S3 into a gray image;
S42, establishing a scale space, namely a Gaussian difference (DoG) pyramid; a Gaussian pyramid is first built from the gray image, where the Gaussian blur coefficient calculation formula is as follows:
σ0 is the scale of the reference layer, i.e. the initial scale of the image;
o is the index of the octave (group);
r is the index of the layer within each group;
S is the number of layers per group of the scale space used for finding extreme points, with a default value of 3.
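(The formula image is not reproduced in this text. The standard SIFT scale relation consistent with the symbols above, given here as a reconstruction rather than this document's own rendering, is: σ(o, r) = σ0 × 2^(o + r/S).)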
According to the 3σ principle, the Gaussian blur coefficient formula is applied and an N×N template is used to layer the gray image, operating at each pixel point, where N = 6σ+1, rounded up to the nearest odd number. A separable Gaussian convolution is used: the image is convolved once along the X direction with a 1×N template and once more along the Y direction with an N×1 template (again N = 6σ+1, rounded up to the nearest odd number), which reduces the severe loss of image edge information caused by direct two-dimensional convolution, thereby obtaining the layered gray image data;
In the layered gray image data, the Gaussian difference pyramid is generated by subtracting adjacent layers within each group of the Gaussian pyramid (the lower layer is subtracted from the layer above it) using the Gaussian difference pyramid formula;
the gaussian difference pyramid formula is as follows:
L(x,y,σ)=G(x,y,σ)*I(x,y)
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
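(For reference: G(x, y, σ) above denotes the two-dimensional Gaussian kernel and I(x, y) the input gray image; the standard definition of the kernel, not spelled out in this text, is G(x, y, σ) = (1 / (2πσ²)) × e^(-(x² + y²) / (2σ²)).)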
S43, detecting spatial extreme points (i.e. key points) using the Gaussian difference pyramid obtained in step S42;
extreme points are detected in the scale space and are then accurately localized and screened; a memory storage of default size is created for this purpose;
the scale space refers to the multidimensional space of the Gaussian difference pyramid;
the key points consist of local extreme points of the DoG space, and a preliminary search for key points is completed by comparing adjacent layers of the DoG images within the same group.
To find the spatial extreme points of the DoG function, each pixel is compared with all its neighbors to see whether it is larger or smaller than its neighbors in both the image domain and the scale domain. The detection point in the middle layer is compared with 26 points: its 8 neighbors at the same scale and the 9×2 points at the corresponding positions in the adjacent scales above and below, so that spatial extreme points are detected in both scale space and two-dimensional image space.
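The 26-neighbor test described above can be sketched as follows; this is an illustrative Java sketch, and the array layout dog[s][y][x] holding one octave of the DoG pyramid is an assumption rather than the data structure actually used:

```java
// Sketch of the 26-neighbour extremum test: a sample is a candidate key point
// only if it is strictly larger (or strictly smaller) than its 8 neighbours in
// the same layer and the 9 neighbours in each of the two adjacent layers.
static boolean isLocalExtremum(float[][][] dog, int s, int y, int x) {
    float v = dog[s][y][x];
    boolean isMax = true, isMin = true;
    for (int ds = -1; ds <= 1; ds++) {
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                if (ds == 0 && dy == 0 && dx == 0) continue; // skip the point itself
                float n = dog[s + ds][y + dy][x + dx];
                if (v <= n) isMax = false;
                if (v >= n) isMin = false;
            }
        }
    }
    return isMax || isMin; // extremum over all 26 neighbours
}
```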
Then, precisely determining the position and the scale of the key point by fitting a three-dimensional quadratic function to obtain the precise position of the characteristic point;
While the accurate position of the feature point is obtained, a Taylor expansion (interpolation function) of the DoG function in scale space is used to compute a fitted offset and scale (σ); the accurate position is then the original position plus the fitted offset, together with the fitted scale (σ).
The purpose of computing the fitted offset and scale (σ) from the Taylor expansion (interpolation function) of the DoG function in scale space is to remove key points with low contrast and unstable edge response points, thereby enhancing matching stability and noise resistance.
To improve the stability and noise immunity of the key points, curve interpolation is applied to the DoG function in scale space.
The fitted offset and scale (σ) obtained from this Taylor expansion (interpolation function) thus improve the stability and noise resistance of the key points.
The Taylor expansion is:
wherein the partial derivative, the second partial derivative, and the second mixed partial derivative of f are:
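(The formula images for the expansion and its derivative terms are not reproduced in this text. The standard SIFT formulation, given here as a reconstruction using conventional notation rather than this document's own rendering, is:)
f(x) ≈ f + (∂f/∂x)^T · x + (1/2) · x^T · (∂²f/∂x²) · x, with x = (x, y, σ)^T;
the fitted offset is x̂ = -(∂²f/∂x²)^(-1) · (∂f/∂x), and the value at the extremum is f(x̂) = f + (1/2) · (∂f/∂x)^T · x̂.
In practice the first, second, and mixed second partial derivatives of f are approximated by finite differences of neighboring samples in the DoG pyramid, e.g. ∂f/∂x ≈ (f(x+1) - f(x-1)) / 2 and ∂²f/∂x² ≈ f(x+1) + f(x-1) - 2f(x).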
When the offset in any dimension (i.e. x, y, or σ) is greater than 0.5, it means the interpolation center has shifted to a neighboring point; the position of the current key point is changed and interpolation is iterated at the new position until convergence. If the set number of iterations is exceeded or the point moves outside the image boundary, the point should be deleted. In addition, points whose extreme values are too small are easily disturbed by noise and become unstable, so extreme points smaller than a certain empirical value are also deleted.
S44, adding the fitted offset and scale (σ) to the original position, then performing key point direction allocation and feature point direction assignment;
in order to make the descriptor have rotation invariance, a reference direction is allocated to each key point by utilizing the local feature of the gray image, so that each feature point has three pieces of information: position, scale, orientation.
The invention uses the image gradient method to determine a stable direction of the local structure; after the gradient computation for the key points is completed, a histogram is used to collect statistics of the gradients and directions of the pixels in the neighborhood. The gradient histogram divides the 0-360 degree direction range into 36 bins of 10 degrees each; the peak direction of the histogram represents the main direction of the key point, the peak of the direction histogram represents the direction of the neighborhood gradients at the feature point, and the maximum value in the histogram is taken as the main direction of the key point. To enhance the robustness of matching, only directions whose peak value is greater than 80% of the main direction's peak are retained as auxiliary directions of the key point. The key point is duplicated into several key points and the direction values are assigned to the copies respectively; interpolation fitting is applied to the discrete gradient direction histogram to obtain more accurate direction angle values. To prevent a gradient direction angle from changing abruptly due to noise interference, the gradient direction histogram is smoothed. The invention refers to the smoothing formula used by OpenCV, which is:
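(The formula image is not reproduced in this text. The smoothing used in OpenCV's SIFT implementation, given here as a reconstruction for reference rather than this document's own rendering, is: h(i) = (H(i-2) + H(i+2)) / 16 + 4 × (H(i-1) + H(i+1)) / 16 + 6 × H(i) / 16, with indices taken cyclically modulo 36.)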
where i ∈ [0, 35], and H and h represent the histogram before and after smoothing, respectively. Since the angle is cyclic, i.e. 0° = 360°, if an index j in H(j) falls outside the range (0, …, 35), its corresponding value in the 0° to 360° range is found by circular indexing, e.g. H(-1) = H(35). At this point the key points of the image have been detected, and each key point carries three pieces of information: position, scale, and direction. A feature region can thus be determined.
S45, constructing feature point descriptors and performing key point matching based on the feature regions obtained in step S44. Each key point carries three pieces of information: position, scale, and orientation.
A descriptor is established for each key point so that it does not change with variations such as illumination change or viewing angle change; the descriptors should also be highly distinctive in order to increase the probability that feature points are matched correctly. The region around the key point is divided into d×d (d = 4 is suggested) sub-regions; each sub-region serves as a seed point, and each seed point has 8 directions. In actual computation, trilinear interpolation is used, and the required image window side length is 3×3×σ_oct×(d+1). Taking the rotation factor into account, the radius of the image area required for the actual calculation is:
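(The formula image is not reproduced in this text. The expression commonly used in SIFT implementations for this radius, given here as a reconstruction rather than this document's own rendering, is radius = 3 × σ_oct × √2 × (d + 1) / 2, rounded to the nearest integer.)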
After the actual radius is obtained, the sampling points in the neighborhood are assigned to the corresponding sub-regions, the gradient values within the sub-regions are distributed over 8 directions, and their weights are calculated. The coordinates (x″, y″) of a sampling point within the sub-region are linearly interpolated to compute its contribution to each seed point: the contribution factors for two adjacent rows are dr and 1-dr, those for two adjacent columns are dc and 1-dc, and those for two adjacent directions are do and 1-do. The final accumulated gradient magnitude in each direction is:
weight = w × dr^k × (1-dr)^(1-k) × dc^m × (1-dc)^(1-m) × do^n × (1-do)^(1-n)
where each of k, m, n is either 0 (the sampling point lies outside the corresponding adjacent subinterval of the interval being interpolated) or 1 (it lies within it). The 4×4×8 = 128 gradient values accumulated above form the feature vector of the key point.
After the feature vector is formed, it is normalized in order to remove the influence of illumination change: an overall drift of the image gray values is removed because the gradient at each point is obtained from differences between neighboring pixels, so the effect of illumination change is eliminated. The resulting descriptor vector is H = (h1, h2, …, h128), the normalized feature vector is L = (l1, l2, …, l128), and this descriptor vector is the feature value;
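(For reference, the normalization referred to above is, in the standard SIFT formulation and given here as a reconstruction: li = hi / sqrt(h1² + h2² + … + h128²), for i = 1, 2, …, 128.)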
S5, the Android mobile equipment compares the characteristic values with the characteristic values in the local comparison library, if the comparison is successful, a key target is detected, and a picture frame corresponding to the successfully compared characteristic values is used as early warning reminding information;
or the Android mobile equipment compares the characteristic value with the characteristic value in the cloud comparison library, if the comparison is successful, a key target is detected, and a picture frame corresponding to the successfully compared characteristic value is used as early warning reminding information;
Specifically, a key point descriptor set is established for the template image (an image pre-stored in the local comparison library or cloud comparison library) and for the real-time image (the image observed by the video image acquisition device), respectively; target identification is completed by comparing the 128-dimensional key point descriptors of the two point sets, and the picture frame whose 128-dimensional key point descriptors are successfully matched is a key target, so that key target detection is realized and the key target picture frame is obtained.
The template image comprises images pre-stored in the local comparison library or the cloud comparison library, and the real-time image comprises the image observed by the video image acquisition device;
Comparison library: this comprises the local comparison library and the cloud comparison library; only the 128-dimensional key point descriptor feature vectors of key targets are stored, and no actual visual pictures are stored.
S6, the Android mobile device displays the early warning reminder information so as to alert the officer.
The key target picture frame is displayed as early warning reminder information through the Android mobile device so as to alert the officer; the reminder mode can be vibration, sound, images, and the like.
Beneficial effects: according to this identification enabling method based on an Android system mobile device, building on conventional integrated wearable image recognition equipment, the Android system host is used as the carrier through communication protocol parsing and image recognition; on top of the functions of conventional integrated wearable image recognition equipment, the computing power of the Android device gives an image recognition function to video image acquisition devices that have no image processing capability of their own, decoupling the video image acquisition device from the image computing unit. Compared with conventional integrated wearable image recognition equipment, when a device is damaged while police or security personnel are on patrol duty, only the damaged unit needs to be replaced, which reduces the cost of use and improves the universality of the equipment. In addition, the local comparison library and the cloud comparison library are configured to suit the different patrol modes of police or security personnel, further improving the image recognition speed and the number of images recognized for different tasks.
Drawings
FIG. 1 is a technical framework diagram;
FIG. 2 is a schematic diagram of the RTSP custom protocol structure;
FIG. 3 is a diagram of an RTSP custom protocol framework;
fig. 4 is an RTSP custom protocol table.
The specific embodiment is as follows:
example 1: an identification enabling method based on Android system mobile equipment comprises the following steps:
s1, a video image acquisition device of a wearable device acquires images, formats the acquired images and/or videos through a communication protocol, and sends the formatted images and/or videos to an Android mobile device;
further, the communication protocol comprises a start frame, a length frame, a command frame, a data field, a check bit and an end frame;
the start frame is 0x11 (hexadecimal 11), and the end frame is 0x38 (hexadecimal 38);
the length frame is 26;
the data field comprises the number of data types, a length field for each type, the corresponding parsing for each type, and the raw data;
the communication protocol is provided by the manufacturer of each wearable device's video image acquisition unit; it is prior art and is not described further here.
S2, the Android mobile equipment performs data coding on the formatted image and/or video to obtain a data stream;
Further, the data stream includes video and/or pictures;
S21, if the data stream is a video data stream, the Android mobile device buffers the video data stream according to the timestamps, obtains the RTSP stream of the video data, and decodes it to obtain a decoded video stream;
the format of the recorded file can be MP4, AVI, or other formats;
further, the RTSP stream of the video data is decoded using an H.264 video decoder;
S22, if the data stream is a picture data stream, the Android mobile device renders the picture data stream as RGB pictures to obtain decoded pictures;
further, the pictures can be in PNG, JPG, or other formats;
s3, the Android mobile equipment applies an image recognition technology to generate a characteristic value of the image;
S31, converting the decoded video stream and/or the decoded picture into picture frames;
S311, if the data is a decoded video stream, the decoded video stream is segmented into picture frames;
S312, if the data is a decoded picture, the decoded picture is used directly as a picture frame without further processing;
s4, acquiring a characteristic value from the picture frame by applying an image recognition technology;
Further, the step of obtaining the feature value from the picture frame by the image recognition technology includes:
s41, converting the picture frame obtained in the step S3 into a gray image;
S42, establishing a scale space, namely a Gaussian difference (DoG) pyramid; a Gaussian pyramid is first built from the gray image, where the Gaussian blur coefficient calculation formula is as follows:
σ0 is the scale of the reference layer, i.e. the initial scale of the image;
o is the index of the octave (group);
r is the index of the layer within each group;
S is the number of layers per group of the scale space used for finding extreme points, with a default value of 3.
According to the 3σ principle, the Gaussian blur coefficient formula is applied and an N×N template is used to layer the gray image, operating at each pixel point, where N = 6σ+1, rounded up to the nearest odd number. A separable Gaussian convolution is used: the image is convolved once along the X direction with a 1×N template and once more along the Y direction with an N×1 template (again N = 6σ+1, rounded up to the nearest odd number), which reduces the severe loss of image edge information caused by direct two-dimensional convolution, thereby obtaining the layered gray image data;
In the layered gray image data, the Gaussian difference pyramid is generated by subtracting adjacent layers within each group of the Gaussian pyramid (the lower layer is subtracted from the layer above it) using the Gaussian difference pyramid formula;
The gaussian difference pyramid formula is as follows:
L(x,y,σ)=G(x,y,σ)*I(x,y)
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
S43, detecting spatial extreme points (i.e. key points) using the Gaussian difference pyramid obtained in step S42;
extreme points are detected in the scale space and are then accurately localized and screened; a memory storage of default size is created for this purpose;
the scale space refers to the multidimensional space of the Gaussian difference pyramid;
the key points consist of local extreme points of the DoG space, and a preliminary search for key points is completed by comparing adjacent layers of the DoG images within the same group.
To find the spatial extreme points of the DoG function, each pixel is compared with all its neighbors to see whether it is larger or smaller than its neighbors in both the image domain and the scale domain. The detection point in the middle layer is compared with 26 points: its 8 neighbors at the same scale and the 9×2 points at the corresponding positions in the adjacent scales above and below, so that spatial extreme points are detected in both scale space and two-dimensional image space.
Then, precisely determining the position and the scale of the key point by fitting a three-dimensional quadratic function to obtain the precise position of the characteristic point;
While the accurate position of the feature point is obtained, a Taylor expansion (interpolation function) of the DoG function in scale space is used to compute a fitted offset and scale (σ); the accurate position is then the original position plus the fitted offset, together with the fitted scale (σ).
The purpose of computing the fitted offset and scale (σ) from the Taylor expansion (interpolation function) of the DoG function in scale space is to remove key points with low contrast and unstable edge response points, thereby enhancing matching stability and noise resistance.
To improve the stability and noise immunity of the key points, curve interpolation is applied to the DoG function in scale space.
The fitted offset and scale (σ) obtained from this Taylor expansion (interpolation function) thus improve the stability and noise resistance of the key points.
The Taylor expansion is:
wherein the partial derivative, the second partial derivative, and the second mixed partial derivative of f are:
When the offset in any dimension (i.e. x, y, or σ) is greater than 0.5, it means the interpolation center has shifted to a neighboring point; the position of the current key point is changed and interpolation is iterated at the new position until convergence. If the set number of iterations is exceeded or the point moves outside the image boundary, the point should be deleted. In addition, points whose extreme values are too small are easily disturbed by noise and become unstable, so extreme points smaller than a certain empirical value are also deleted.
S44, adding the fitted offset and scale (σ) to the original position, then performing key point direction allocation and feature point direction assignment;
in order to make the descriptor have rotation invariance, a reference direction is allocated to each key point by utilizing the local feature of the gray image, so that each feature point has three pieces of information: position, scale, orientation.
The invention uses the image gradient method to determine a stable direction of the local structure; after the gradient computation for the key points is completed, a histogram is used to collect statistics of the gradients and directions of the pixels in the neighborhood. The gradient histogram divides the 0-360 degree direction range into 36 bins of 10 degrees each; the peak direction of the histogram represents the main direction of the key point, the peak of the direction histogram represents the direction of the neighborhood gradients at the feature point, and the maximum value in the histogram is taken as the main direction of the key point. To enhance the robustness of matching, only directions whose peak value is greater than 80% of the main direction's peak are retained as auxiliary directions of the key point. The key point is duplicated into several key points and the direction values are assigned to the copies respectively; interpolation fitting is applied to the discrete gradient direction histogram to obtain more accurate direction angle values. To prevent a gradient direction angle from changing abruptly due to noise interference, the gradient direction histogram is smoothed. The invention refers to the smoothing formula used by OpenCV, which is:
where i ∈ [0, 35], and H and h represent the histogram before and after smoothing, respectively. Since the angle is cyclic, i.e. 0° = 360°, if an index j in H(j) falls outside the range (0, …, 35), its corresponding value in the 0° to 360° range is found by circular indexing, e.g. H(-1) = H(35). At this point the key points of the image have been detected, and each key point carries three pieces of information: position, scale, and direction. A feature region can thus be determined.
S45, constructing feature point descriptors and performing key point matching based on the feature regions obtained in step S44. Each key point carries three pieces of information: position, scale, and orientation.
A descriptor is established for each key point so that it does not change with variations such as illumination change or viewing angle change; the descriptors should also be highly distinctive in order to increase the probability that feature points are matched correctly. The region around the key point is divided into d×d (d = 4 is suggested) sub-regions; each sub-region serves as a seed point, and each seed point has 8 directions. In actual computation, trilinear interpolation is used, and the required image window side length is 3×3×σ_oct×(d+1). Taking the rotation factor into account, the radius of the image area required for the actual calculation is:
After the actual radius is obtained, the sampling points in the neighborhood are assigned to the corresponding sub-regions, the gradient values within the sub-regions are distributed over 8 directions, and their weights are calculated. The coordinates (x″, y″) of a sampling point within the sub-region are linearly interpolated to compute its contribution to each seed point: the contribution factors for two adjacent rows are dr and 1-dr, those for two adjacent columns are dc and 1-dc, and those for two adjacent directions are do and 1-do. The final accumulated gradient magnitude in each direction is:
weight = w × dr^k × (1-dr)^(1-k) × dc^m × (1-dc)^(1-m) × do^n × (1-do)^(1-n)
where each of k, m, n is either 0 (the sampling point lies outside the corresponding adjacent subinterval of the interval being interpolated) or 1 (it lies within it). The 4×4×8 = 128 gradient values accumulated above form the feature vector of the key point.
After the feature vector is formed, it is normalized in order to remove the influence of illumination change: an overall drift of the image gray values is removed because the gradient at each point is obtained from differences between neighboring pixels, so the effect of illumination change is eliminated. The resulting descriptor vector is H = (h1, h2, …, h128), the normalized feature vector is L = (l1, l2, …, l128), and this descriptor vector is the feature value;
S5, the Android mobile equipment compares the characteristic values with the characteristic values in the local comparison library, if the comparison is successful, a key target is detected, and a picture frame corresponding to the successfully compared characteristic values is used as early warning reminding information;
or the Android mobile equipment compares the characteristic value with the characteristic value in the cloud comparison library, if the comparison is successful, a key target is detected, and a picture frame corresponding to the successfully compared characteristic value is used as early warning reminding information;
Specifically, a key point descriptor set is established for the template image (an image pre-stored in the local comparison library or cloud comparison library) and for the real-time image (the image observed by the video image acquisition device), respectively; target identification is completed by comparing the 128-dimensional key point descriptors of the two point sets, and the picture frame whose 128-dimensional key point descriptors are successfully matched is a key target, so that key target detection is realized and the key target picture frame is obtained.
S6, the Android mobile device displays the early warning reminder information so as to alert the officer.
The key target picture frame is displayed as early warning reminder information through the Android mobile device so as to alert the officer; the reminder mode can be vibration, sound, images, and the like.
Beneficial effects: according to this identification enabling method based on an Android system mobile device, building on conventional integrated wearable image recognition equipment, the Android system host is used as the carrier through communication protocol parsing and image recognition; on top of the functions of conventional integrated wearable image recognition equipment, the computing power of the Android device gives an image recognition function to video image acquisition devices that have no image processing capability of their own, decoupling the video image acquisition device from the image computing unit. Compared with conventional integrated wearable image recognition equipment, when a device is damaged while police or security personnel are on patrol duty, only the damaged unit needs to be replaced, which reduces the cost of use and improves the universality of the equipment.
Example 2: the working process of the identification enabling method based on an Android system mobile device is as follows. The multi-device communication protocol is developed on top of the RTSP protocol, and the RTSP parsing code is based on the Live555 and FFmpeg libraries. Data is carried through a custom transmission protocol (figure 2): the data changes while the protocol and its parsing remain unchanged, so one set of protocols is compatible with all protocols and one set of parsing is compatible with all parsing. In this way, data from the RTSP clients of the various video acquisition devices can be received and then parsed.
The multi-device communication protocol of the invention adds a field for the variable types of the data field, a field for the variable number of data items, a field for the length of each type in the data field, and a field for the corresponding parsing of each type in the data field; the protocol rules are shown in figure 4. The RTSP parsing code is based on the Live555 and FFmpeg libraries; each has its suitable application scenarios and its own compatibility strengths and weaknesses, and the system is compatible with third-party RTSP services to the greatest extent, such as various network cameras, local cameras, and RTSP servers written by other companies.
The video transmission decoding of the invention is developed on the FFmpeg decoder, as shown in figures 2 and 3. The system decodes in the C++ layer to ensure efficient memory use, decoding and restoring H.264, AAC, and similar data to raw data; queue management is used to buffer the decoded data and order it by timestamp. The decoded data is then called back to the upper layer for rendering, and drawing is performed in the rendering thread via an EGL surface.
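A minimal sketch of the timestamp-ordered buffering of decoded frames described above is given below; the DecodedFrame type and its field names are illustrative assumptions, not an API defined by this text:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Buffers decoded frames and hands them out in timestamp order.
final class FrameBuffer {
    static final class DecodedFrame {
        final long ptsUs;     // presentation timestamp in microseconds
        final byte[] pixels;  // raw decoded data (e.g. YUV) from the decoder layer
        DecodedFrame(long ptsUs, byte[] pixels) { this.ptsUs = ptsUs; this.pixels = pixels; }
    }

    private final PriorityQueue<DecodedFrame> queue =
            new PriorityQueue<>(Comparator.comparingLong((DecodedFrame f) -> f.ptsUs));

    synchronized void push(DecodedFrame f) { queue.offer(f); } // from the decoder callback
    synchronized DecodedFrame pop() { return queue.poll(); }   // consumed by the render thread
}
```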
The pictures to be identified by the image recognition algorithm are obtained frame by frame after FFmpeg decoding and are then converted into gray images.
The image recognition algorithm of the invention uses the specific formulas and algorithm below to extract the key point descriptor feature vectors of the video stream frame by frame;
The operation steps are as follows:
step one, establishing a scale space, namely establishing a Gaussian difference (DoG) pyramid, and establishing the Gaussian pyramid by using a gray level image, wherein the Gaussian blur coefficient calculation formula is as follows:
σ0 is the scale of the reference layer, i.e. the initial scale of the image; o is the index of the octave (group); r is the index of the layer within each group; S is the number of layers per group of the scale space used for finding extreme points, with a default value of 3. According to the 3σ principle, an N×N template is used to operate at each pixel point of the image, where N = 6σ+1, rounded up to the nearest odd number. A separable Gaussian convolution is used, i.e. the image is convolved once in the X direction with a 1×N template and then once more in the Y direction with an N×1 template (again N = 6σ+1, rounded up to the nearest odd number), which reduces the severe loss of image edge information caused by direct convolution. In the resulting data, adjacent layers within each group of the Gaussian pyramid are subtracted (the lower layer is subtracted from the layer above it) to generate the Gaussian difference pyramid, the formula being:
L(x,y,σ)=G(x,y,σ)*I(x,y)
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
Step two, detecting the spatial extreme points (i.e. key points): extreme points are detected in the scale space and are then accurately localized and screened; a memory storage of default size is created for this purpose. The key points consist of local extreme points of the DoG space, and a preliminary search for key points is completed by comparing adjacent layers of the DoG images within the same group. To find the extreme points of the DoG function, each pixel is compared with all its neighbors to see whether it is larger or smaller than its neighbors in both the image domain and the scale domain. The detection point in the middle layer is compared with 26 points: its 8 neighbors at the same scale and the 9×2 points at the corresponding positions in the adjacent scales above and below, which ensures that extreme points are detected in both scale space and two-dimensional image space. The position and scale of the key points are then precisely determined by fitting a three-dimensional quadratic function, while key points with low contrast and unstable edge response points are removed, so as to enhance matching stability and noise resistance. To improve the stability of the key points, curve interpolation is applied to the scale-space DoG function. The Taylor expansion (interpolation function) of the DoG function in scale space is:
Wherein the partial derivative, the second partial derivative, and the second mixed partial derivative of f are:
When the offset in any dimension (i.e. x, y, or σ) is greater than 0.5, it means the interpolation center has shifted to a neighboring point; the position of the current key point is changed and interpolation is iterated at the new position until convergence. If the set number of iterations is exceeded or the point moves outside the image boundary, the point should be deleted. In addition, points whose extreme values are too small are easily disturbed by noise and become unstable, so extreme points smaller than a certain empirical value are also deleted. Meanwhile, the accurate position of the feature point is obtained in this process, namely the original position plus the fitted offset, together with the fitted scale (σ).
Step three, key point direction allocation and feature point direction assignment: in order to give the descriptor rotation invariance, a reference direction is allocated to each key point using the local features of the image, so that each feature point carries three pieces of information: position, scale, and orientation. The invention uses the image gradient method to determine a stable direction of the local structure. After the gradient computation for the key points is completed, a histogram is used to collect statistics of the gradients and directions of the pixels in the neighborhood. The gradient histogram divides the 0-360 degree direction range into 36 bins of 10 degrees each; the peak direction of the histogram represents the main direction of the key point, the peak of the direction histogram represents the direction of the neighborhood gradients at the feature point, and the maximum value in the histogram is taken as the main direction of the key point. To enhance the robustness of matching, only directions whose peak value is greater than 80% of the main direction's peak are retained as auxiliary directions of the key point. The key point is duplicated into several key points and the direction values are assigned to the copies respectively; interpolation fitting is applied to the discrete gradient direction histogram to obtain more accurate direction angle values. To prevent a gradient direction angle from changing abruptly due to noise interference, the gradient direction histogram also needs to be smoothed. The invention refers to the smoothing formula used by OpenCV, which is:
where i ∈ [0, 35], and H and h represent the histogram before and after smoothing, respectively. Since the angle is cyclic, i.e. 0° = 360°, if an index j in H(j) falls outside the range (0, …, 35), its corresponding value in the 0° to 360° range is found by circular indexing, e.g. H(-1) = H(35). At this point the key points of the image have been detected, and each key point carries three pieces of information: position, scale, and direction. A feature region can thus be determined.
Step four, feature point descriptors and key point matching. Through the above steps, each key point carries three pieces of information: position, scale, and orientation. A descriptor is then created for each key point that does not change with variations such as illumination change or viewing angle change; the descriptors should also be highly distinctive in order to increase the probability that feature points are matched correctly. The region around the key point is divided into d×d (d = 4 is suggested) sub-regions; each sub-region serves as a seed point, and each seed point has 8 directions. In actual computation, trilinear interpolation is needed, and the required image window side length is 3×3×σ_oct×(d+1). Taking the rotation factor into account, the radius of the image area required for the actual calculation is:
After the actual radius is obtained, the sampling points in the neighborhood are assigned to the corresponding sub-regions, the gradient values within the sub-regions are distributed over 8 directions, and their weights are calculated. The coordinates (x″, y″) of a sampling point within the sub-region are linearly interpolated to compute its contribution to each seed point: the contribution factors for two adjacent rows are dr and 1-dr, those for two adjacent columns are dc and 1-dc, and those for two adjacent directions are do and 1-do. The final accumulated gradient magnitude in each direction is:
weight = w × dr^k × (1-dr)^(1-k) × dc^m × (1-dc)^(1-m) × do^n × (1-do)^(1-n)
where each of k, m, n is either 0 (the sampling point lies outside the corresponding adjacent subinterval of the interval being interpolated) or 1 (it lies within it). The 4×4×8 = 128 gradient values accumulated above form the feature vector of the key point. After the feature vector is formed, it is normalized in order to remove the influence of illumination change: an overall drift of the image gray values is removed because the gradient at each point is obtained from differences between neighboring pixels, so the effect of illumination change is eliminated. The resulting descriptor vector is H = (h1, h2, …, h128) and the normalized feature vector is L = (l1, l2, …, l128). A key point descriptor set is established for the template image (an image pre-stored in the local comparison library or cloud comparison library) and for the real-time image (the image observed by the video image acquisition device), respectively, and target identification is completed by comparing the 128-dimensional key point descriptors of the two point sets, thereby realizing key target detection.
The image recognition comparison libraries are a local comparison library and a cloud comparison library; only the 128-dimensional key point descriptor feature vectors of key targets are stored in the comparison libraries, and no actual visual pictures are stored. When police or security personnel carry out a task of capturing a specific target, the local comparison library is configured on the Android mobile device and the system quickly gives a comparison result in stand-alone mode; when patrol or access-control tasks are carried out, the system is configured to use the cloud comparison library: the feature values are computed locally and then sent to the cloud, where they are compared many-to-many against a large list of persons of interest. In this way, diverse image recognition functions are given to video image acquisition devices that have no image processing capability of their own.
According to the method, key target detection is carried out with the image recognition algorithm. When a key target passes the video acquisition device, the video acquisition device transmits the video stream to the Android system mobile device, which receives it and completes the conversion into feature values; the computed feature values are matched in real time against the target feature values in the comparison library to obtain a comprehensive similarity across multiple dimensions. When the similarity reaches the preset threshold of 80%, the current key target is considered to be present in the comparison library: the Android system mobile device emits an alert tone and starts vibrating to give an early warning, an early warning record is displayed in real time on the recognition interface of the Android system mobile device, and the key target is displayed by outlining and capturing it in the current scene together with the corresponding target information from the comparison library.
According to this identification enabling method based on an Android system mobile device, through parsing of a unified communication protocol over a general network and an image recognition algorithm, the Android system host is used as the carrier, and on top of the functions of conventional integrated wearable image recognition equipment, the computing power of the Android device gives an image recognition function to video image acquisition devices that have no image processing capability of their own, decoupling the video image acquisition device from the image computing unit. Compared with conventional integrated wearable image recognition equipment, when a device is damaged while police or security personnel are on patrol duty, only the damaged unit needs to be replaced, which reduces the cost of use and improves the universality of the equipment. A local comparison library and a cloud comparison library are configured to suit the different patrol modes of police or security personnel, further improving the image recognition speed and the number of images recognized for different tasks.
Example 3: the embodiment is described with reference to a specific scenario, and the method for enabling identification of mobile equipment based on an Android system according to the present invention may refer to fig. 1, and includes the following steps:
1. video image acquisition device communication
Firstly, after a video acquisition device is started, it automatically enables an open wireless hotspot (AP). The APP preloaded on the Android system mobile device selects the corresponding device type on its front page in preparation for connecting to a specific target acquisition device, then jumps from the front page to a device-scanning interface and scans the list of all Wi-Fi networks. The SSIDs of supported video acquisition devices are filtered out using a device whitelist issued by the cloud, and the SSIDs of the video acquisition devices available for connection are shown on the scan-result interface of the Android system mobile device. The user then taps to select the SSID of the video acquisition device to be connected, so that the two devices are in the same network domain; the stream is pulled from a pre-embedded RTSP address using the custom communication protocol, and finally the picture from the video acquisition device is delivered to the preview interface of the Android system mobile device, completing the communication process with the video acquisition device.
2. Video stream frame-by-frame calculation of picture feature values
The video image preview parameter of the video acquisition device is ImageFormat.NV21; NV21 is one of the YUV420 formats. A byte[] stream is obtained through the interface, and the intercepted frame image data is in YUV format. The YUV value of each pixel is restored from the code stream according to the sampling format, and the RGB value of each pixel is derived through the YUV-to-RGB conversion formula; a Bitmap of the required image frame is thus created, and the picture is then passed to the algorithm to find key points, so that the key point features can be calculated.
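The conversion of an NV21 preview frame into a Bitmap described above can be sketched with standard Android APIs as follows; this text does not state which conversion path is actually used, so the YuvImage/JPEG route shown here is only an illustration:

```java
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.ImageFormat;
import android.graphics.Rect;
import android.graphics.YuvImage;
import java.io.ByteArrayOutputStream;

// Turns one NV21 preview frame (byte[] from the camera callback) into a Bitmap.
class Nv21Converter {
    static Bitmap nv21ToBitmap(byte[] nv21, int width, int height) {
        YuvImage yuv = new YuvImage(nv21, ImageFormat.NV21, width, height, null);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Compress the YUV frame to JPEG, then decode the JPEG into an RGB Bitmap.
        yuv.compressToJpeg(new Rect(0, 0, width, height), 90, out);
        byte[] jpeg = out.toByteArray();
        return BitmapFactory.decodeByteArray(jpeg, 0, jpeg.length);
    }
}
```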
The key point searching algorithm comprises the following steps:
firstly, establishing a scale space, namely establishing a Gaussian difference (DoG) pyramid, and establishing the Gaussian pyramid by using a gray image, wherein Gaussian blur coefficients are established;
detecting a space extreme point (namely a key point), detecting the extreme point in a scale space, and accurately positioning and screening to create a memory storage with a default size;
thirdly, distributing key point directions and assigning characteristic point directions;
and step four, feature point descriptors and key points are matched.
3. Local comparison library and cloud comparison library establishment
After the second step, the key point feature values of the picture frames are obtained; the following two scenarios apply:
1. A local comparison library of selected special key target specimens is established, so that precise individual targets can be identified in an offline environment without a network. A target comparison library archive to be loaded is first selected, either copied from a computer or obtained over the network, and saved to a local directory; the archive is decompressed inside the program and contains an Excel table and a picture folder. The Excel table contains the key target information, including identity card number/license plate number, name/frame number, gender/brand, labels, notes, and the like. The feature values extracted from the pictures are stored in an identification database table, the target information is loaded in turn into a personnel information database table for maintenance, and the identification database table and the personnel information database table are associated through the unique id of the identity card number/license plate number;
2. A cloud comparison library of the full set of key targets is established over the network: the full key target comparison library is stored on a cloud server in a networked environment, the data maintenance method is consistent with the offline mode, the pictures are stored in an identification database table after their feature values are extracted, and the target information is loaded in turn into a personnel information database table for maintenance.
4. Identification and early warning of key targets
When a key target passes the video acquisition device, the video acquisition device transmits the video stream to the Android system mobile device, which receives it and completes the conversion into feature values; the computed feature values are matched in real time against the target feature values in the comparison library to obtain a comprehensive similarity across multiple dimensions. When the similarity reaches the preset threshold of 80%, the target is considered to be present in the comparison library: the Android system mobile device emits an alert tone and starts vibrating to give an early warning, an early warning record is displayed in real time on the recognition interface of the Android system mobile device, and the key target is displayed by capturing and outlining it in the current scene together with the corresponding target information from the comparison library.
According to this identification enabling method based on an Android system mobile device, with parsing of a unified communication protocol over a general network and an image recognition algorithm, the Android system host is used as the carrier, and the computing power of the Android device gives an image recognition function to video image acquisition devices that have no image processing capability of their own on top of the functions of conventional integrated wearable image recognition equipment, decoupling the video image acquisition device from the image computing unit. Compared with conventional integrated wearable image recognition equipment, when a device is damaged while police or security personnel are on patrol duty, only the damaged unit needs to be replaced, which reduces the cost of use and improves the universality of the equipment. A local comparison library and a cloud comparison library are configured to suit the different patrol modes of police or security personnel, further improving the image recognition speed and the number of images recognized for different tasks.

Claims (4)

1. An identification enabling method based on Android system mobile equipment is characterized by comprising the following steps:
s1, a video image acquisition device of a wearable device acquires images, formats the acquired images and/or videos through a communication protocol, and sends the formatted images and/or videos to an Android mobile device;
s2, the Android mobile equipment performs data coding on the formatted image and/or video to obtain a data stream;
s3, the Android mobile device applies image recognition technology to convert the image into picture frames;
s4, acquiring a characteristic value from the picture frame by applying an image recognition technology;
s5, the Android mobile equipment compares the characteristic values with the characteristic values in the local comparison library, if the comparison is successful, a key target is detected, and a picture frame corresponding to the successfully compared characteristic values is used as early warning reminding information;
or the Android mobile equipment compares the characteristic value with the characteristic value in the cloud comparison library, if the comparison is successful, a key target is detected, and a picture frame corresponding to the successfully compared characteristic value is used as early warning reminding information;
a key point descriptor set is established for the template diagram and for the real-time diagram respectively, identification of the target is completed by comparing the 128-dimensional key point descriptors in the two point sets, and the picture frame corresponding to the 128-dimensional key point descriptors that are successfully compared is a key target, so that the key target is detected and the key target picture frame is obtained;
the template diagram comprises a diagram pre-stored in the local comparison library or the cloud comparison library, and the real-time diagram comprises an observation diagram from the video image acquisition equipment;
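A minimal sketch of step S5 follows: 128-dimensional key point descriptor sets are built for the template diagram and the real-time diagram and compared. The minimum-match count, the ratio-test constant and the homography step used to outline the matched target are illustrative assumptions, not values taken from the claims.

```python
import cv2
import numpy as np

def detect_key_target(template_gray, live_gray, min_matches=10):
    """Compare 128-dim descriptor sets of the template diagram and the real-time diagram."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template_gray, None)
    kp_l, des_l = sift.detectAndCompute(live_gray, None)
    if des_t is None or des_l is None:
        return None
    pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_t, des_l, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < min_matches:
        return None                                  # comparison failed: not a key target
    # locate the target in the real-time diagram so it can be outlined on the interface
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_l[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if homography is None:
        return None
    h, w = template_gray.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, homography)  # corners of the matched key target
```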
s6, the Android mobile equipment displays early warning reminding information so as to remind the police;
displaying the key target picture frame as early warning reminding information through Android mobile equipment, so as to remind the police, wherein the reminding mode can be vibration, sound, images and the like;
the step S4 of obtaining the feature value from the picture frame by applying the image recognition technique includes:
s41, converting the picture frame obtained in the step S3 into a gray image;
s42, establishing a scale space, namely establishing a Gaussian difference pyramid DoG from the gray image, wherein the Gaussian blur coefficient calculation formula is as follows:
σ(o, r) = σ0 · 2^(o + r/S)
wherein σ0 is the scale of the reference layer, i.e. the initial scale of the image;
o is the index of the octave (group) in the pyramid;
r is the index of the layer within each group;
S is the number of layers per group used for searching extreme points in the scale space, and the default value is 3;
according to the 3σ principle and the Gaussian blur coefficient calculation formula, the gray image is layered by operating an N×N template at each pixel point of the gray image, where N = 6σ + 1 rounded up to the nearest odd number; a separable Gaussian convolution is used, that is, the image is first convolved once along the X direction with a 1×N template and then convolved once along the Y direction with an N×1 template, with N = 6σ + 1 rounded up to the nearest odd number, which reduces the severe loss of image edge information caused by direct convolution and yields layered gray image data;
In the layered gray image data, adjacent layers within each group of the Gaussian pyramid are subtracted using the Gaussian difference pyramid formula to generate the Gaussian difference pyramid;
the Gaussian difference pyramid formula is as follows:
D(x, y, σ) = [G(x, y, kσ) − G(x, y, σ)] * I(x, y) = L(x, y, kσ) − L(x, y, σ), where k = 2^(1/S)
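A short sketch of step S42 under the formula above. The number of octaves, σ0 = 1.6 and S = 3 are the common SIFT defaults, assumed here purely for illustration.

```python
import cv2
import numpy as np

def build_dog_pyramid(gray, n_octaves=4, S=3, sigma0=1.6):
    """Build a Gaussian pyramid with separable blurs, then subtract adjacent layers per octave."""
    base = gray.astype(np.float32)
    dog_pyramid = []
    for o in range(n_octaves):
        octave = []
        for r in range(S + 3):                       # S+3 blurred layers per octave
            sigma = sigma0 * (2.0 ** (r / S))        # the 2^o factor is handled by downsampling below
            N = int(np.ceil(6 * sigma + 1)) | 1      # 3-sigma rule, rounded up to an odd size
            octave.append(cv2.GaussianBlur(base, (N, N), sigma))
        dog_octave = [octave[i + 1] - octave[i] for i in range(len(octave) - 1)]
        dog_pyramid.append(dog_octave)
        base = cv2.resize(octave[S], (base.shape[1] // 2, base.shape[0] // 2),
                          interpolation=cv2.INTER_NEAREST)   # next octave starts at twice the scale
    return dog_pyramid
```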
s43, detecting spatial extreme points by means of the Gaussian difference pyramid obtained in step S42, the spatial extreme points being the key points;
the extreme points in the scale space are detected and then accurately located and screened, and a memory storage of default size is created;
the scale space refers to a multidimensional space of a Gaussian differential pyramid DoG;
the key points consist of local extreme points of the Gaussian difference pyramid DoG space, and a preliminary search for the key points is completed by comparing adjacent layers of images of the Gaussian difference pyramid DoG within the same group;
in order to find the spatial extreme points of the Gaussian difference pyramid DoG function, each pixel point is compared with all of its adjacent points to determine whether it is larger or smaller than the adjacent points in its image domain and scale domain; the detection point in the middle layer is compared with 26 points, namely its 8 adjacent points at the same scale and the 9×2 points at the corresponding positions of the adjacent scales above and below, so that extreme points are detected both in the scale space and in the two-dimensional image space;
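The 26-neighbour extremum test can be sketched as follows; the early contrast threshold is an assumed convenience, not part of the claim.

```python
import numpy as np

def is_extremum(dog_octave, layer, y, x, threshold=0.03):
    """True if the DoG sample is a maximum or minimum over its 26 neighbours."""
    value = dog_octave[layer][y, x]
    if abs(value) <= threshold:                 # drop weak responses early (assumed threshold)
        return False
    cube = np.stack([dog_octave[layer - 1][y - 1:y + 2, x - 1:x + 2],
                     dog_octave[layer][y - 1:y + 2, x - 1:x + 2],
                     dog_octave[layer + 1][y - 1:y + 2, x - 1:x + 2]])
    return value >= cube.max() if value > 0 else value <= cube.min()
```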
Then, the position and scale of each key point are precisely determined by fitting a three-dimensional quadratic function, giving the accurate position of the feature point;
while the accurate position of the feature point is obtained, a Taylor expansion of the Gaussian difference pyramid DoG function in the scale space is used for the calculation; the Taylor expansion yields a fitting offset and the scale σ, which are added to the accurate position of the feature point to give the original position together with the fitting offset and the scale σ;
the fitting offset and the scale σ obtained from this Taylor expansion are also used to remove key points with low contrast and unstable edge response points, so as to enhance matching stability and improve noise resistance;
to further improve the stability and noise resistance of the key points, curve interpolation is applied to the Gaussian difference pyramid DoG function in the scale space, again by calculating the fitting offset and the scale σ from its Taylor expansion;
the taylor expansion is:
f(x) ≈ f + (∂f/∂x)^T x + (1/2) x^T (∂²f/∂x²) x, where x = (x, y, σ)^T is the offset from the sample point and the extremum offset is x̂ = −(∂²f/∂x²)⁻¹ (∂f/∂x);
wherein the partial derivative, the second partial derivative and the second mixed partial derivative of f are approximated by finite differences of adjacent sample points, for example ∂f/∂x ≈ [f(i+1, j) − f(i−1, j)]/2, ∂²f/∂x² ≈ f(i+1, j) + f(i−1, j) − 2f(i, j), and ∂²f/∂x∂y ≈ [f(i+1, j+1) − f(i+1, j−1) − f(i−1, j+1) + f(i−1, j−1)]/4;
when the offset in any dimension (x, y or σ) is greater than 0.5, the interpolation centre has shifted to a neighbouring point, so the position of the current key point is changed and the interpolation is repeated at the new position until convergence; if the set number of iterations is exceeded or the point moves outside the image boundary, the point is deleted; in addition, points with too small an extreme value are easily disturbed by noise and become unstable, so extreme points smaller than a certain empirical value are also deleted;
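A sketch of the refinement just described, using finite differences of the DoG samples for the derivatives; the iteration limit and contrast threshold are assumed empirical values.

```python
import numpy as np

def refine_keypoint(dog_octave, layer, y, x, max_iter=5, contrast_thr=0.03):
    """Fit a 3D quadratic to the DoG samples and iterate while any offset exceeds 0.5."""
    d = dog_octave
    for _ in range(max_iter):
        v = d[layer][y, x]
        grad = 0.5 * np.array([d[layer][y, x + 1] - d[layer][y, x - 1],
                               d[layer][y + 1, x] - d[layer][y - 1, x],
                               d[layer + 1][y, x] - d[layer - 1][y, x]])
        dxx = d[layer][y, x + 1] + d[layer][y, x - 1] - 2 * v
        dyy = d[layer][y + 1, x] + d[layer][y - 1, x] - 2 * v
        dss = d[layer + 1][y, x] + d[layer - 1][y, x] - 2 * v
        dxy = 0.25 * (d[layer][y + 1, x + 1] - d[layer][y + 1, x - 1]
                      - d[layer][y - 1, x + 1] + d[layer][y - 1, x - 1])
        dxs = 0.25 * (d[layer + 1][y, x + 1] - d[layer + 1][y, x - 1]
                      - d[layer - 1][y, x + 1] + d[layer - 1][y, x - 1])
        dys = 0.25 * (d[layer + 1][y + 1, x] - d[layer + 1][y - 1, x]
                      - d[layer - 1][y + 1, x] + d[layer - 1][y - 1, x])
        hessian = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
        try:
            offset = -np.linalg.solve(hessian, grad)      # fitting offset in (x, y, sigma)
        except np.linalg.LinAlgError:
            return None
        if np.all(np.abs(offset) < 0.5):                  # interpolation centre has converged
            contrast = v + 0.5 * grad.dot(offset)
            return None if abs(contrast) < contrast_thr else (x + offset[0], y + offset[1], layer + offset[2])
        # offset > 0.5 in some dimension: move to the neighbouring sample and re-fit
        x, y, layer = int(round(x + offset[0])), int(round(y + offset[1])), int(round(layer + offset[2]))
        if not (1 <= layer < len(d) - 1 and
                1 <= y < d[0].shape[0] - 1 and
                1 <= x < d[0].shape[1] - 1):              # pushed outside the image or octave: delete
            return None
    return None                                           # too many iterations: delete the point
```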
s44, adding the fitted offset and the scale sigma to the original position, and carrying out key point direction distribution and characteristic point direction assignment;
in order to make the descriptor have rotation invariance, a reference direction is allocated to each key point by utilizing the local feature of the gray image, so that each feature point has three pieces of information: position, scale, orientation;
the stable direction of the local structure is obtained using an image gradient method; after the gradient at the key point has been computed, the gradients and directions of the pixels in its neighbourhood are collected into a histogram; the gradient histogram divides the direction range of 0-360 degrees into 36 bins of 10 degrees each; the peak of the direction histogram represents the direction of the neighbourhood gradients at the feature point, and the maximum of the histogram is taken as the main direction of the key point; in order to enhance the robustness of matching, only directions whose peak value is at least 80% of the main-direction peak are retained as auxiliary directions of the key point; the key point is duplicated into several key points and the copies are assigned these direction values respectively; interpolation fitting is applied to the discrete gradient direction histogram to obtain more accurate direction angle values; to prevent a gradient direction angle from changing abruptly owing to noise interference, the gradient direction histogram is smoothed using the OpenCV smoothing formula, which is as follows:
H(i) = [h(i−2) + h(i+2)]/16 + 4·[h(i−1) + h(i+1)]/16 + 6·h(i)/16
wherein i ∈ [0, 35], and h and H represent the histogram before and after smoothing respectively; since the angle is cyclic, i.e. 0° = 360°, if an index j in h(j) falls outside the range (0, …, 35), its value is found by circular indexing over 0°-360°, for example h(−1) = h(35); at this point the key points of the image have been detected, and each key point has three pieces of information: position, scale and direction; the feature region can thereby be determined;
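A compact sketch of the orientation assignment in S44, including the circular smoothing formula above; the neighbourhood radius and the unweighted magnitude accumulation are simplifying assumptions.

```python
import numpy as np

def keypoint_orientations(gray, x, y, radius=8):
    """Return the main direction and auxiliary directions (>= 80% of the peak) for a key point."""
    patch = gray[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(np.float32)
    gy, gx = np.gradient(patch)                       # image gradients in the neighbourhood
    magnitude = np.hypot(gx, gy)
    angle = (np.degrees(np.arctan2(gy, gx)) + 360.0) % 360.0
    h = np.zeros(36)
    np.add.at(h, (angle // 10).astype(int) % 36, magnitude)   # 36 bins of 10 degrees each

    # circular smoothing: H(i) = [h(i-2)+h(i+2)]/16 + 4[h(i-1)+h(i+1)]/16 + 6h(i)/16
    H = (np.roll(h, 2) + np.roll(h, -2)) / 16.0 \
        + 4.0 * (np.roll(h, 1) + np.roll(h, -1)) / 16.0 \
        + 6.0 * h / 16.0

    peak = H.max()
    return [i * 10.0 for i in range(36) if H[i] >= 0.8 * peak]   # main + auxiliary directions
```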
s45, establishing feature point descriptors and performing key point matching according to the feature regions obtained in step S44; three pieces of information are available for each key point: position, scale and direction;
a descriptor is established for each key point so that the descriptor does not change under various changes, including illumination changes and viewing angle changes; the descriptor should also have high uniqueness so as to improve the probability of correct matching of the feature points; the area near the key point is divided into d×d sub-regions, each sub-region serves as a seed point, and each seed point has 8 directions; in the actual calculation trilinear interpolation is adopted, and the required image window side length is 3σ_oct × (d + 1); taking the rotation factor into account, the radius of the image area required for the actual calculation is:
radius = 3σ_oct × √2 × (d + 1) / 2
After the actual radius is obtained, the sampling points in the neighbourhood are assigned to the corresponding sub-regions, the gradient values within the sub-regions are distributed over the 8 directions, and their weights are calculated; the sub-region coordinates (x'', y'') of a sampling point are linearly interpolated to calculate its contribution to each seed point: its contribution factors to the two adjacent rows are dr and 1 − dr, similarly its contribution factors to the two adjacent columns are dc and 1 − dc, and its contribution factors to the two adjacent directions are do and 1 − do; the final accumulated gradient magnitude in each direction is:
weight = w × dr^k × (1 − dr)^(1−k) × dc^m × (1 − dc)^(1−m) × do^n × (1 − do)^(1−n)
wherein k, m and n take the value 0 when the sampling point falls outside the range of the four adjacent sub-intervals of the interval to be interpolated, or 1 when the sampling point falls within one of the four adjacent sub-intervals; the 4×4×8 = 128 gradient values counted above form the feature vector of the key point;
after the feature vector is formed, the influence of illumination change is removed by normalising the feature vector; because the gradients at each point of the image are obtained by subtracting neighbourhood pixels, an overall drift of the image gray values has no effect, so the influence of illumination change can be removed; the obtained descriptor vector is H = (h1, h2, …, h128), and the normalised feature vector is L = (l1, l2, …, l128) with li = hi / sqrt(h1² + h2² + … + h128²); this normalised descriptor vector is the feature value.
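The final normalisation step can be written as the following short sketch, where H is the raw 128-dimensional vector accumulated above and L is the feature value.

```python
import numpy as np

def normalize_descriptor(H):
    """li = hi / sqrt(h1^2 + ... + h128^2); the result L is the feature value."""
    H = np.asarray(H, dtype=np.float32)       # raw descriptor H = (h1, h2, ..., h128)
    norm = np.linalg.norm(H)
    return H / norm if norm > 0 else H        # normalised feature vector L = (l1, l2, ..., l128)
```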
2. The Android system mobile device-based identification enabling method according to claim 1, wherein the method comprises the following steps: in step S1, the communication protocol includes a start frame, a length frame, a command frame, a data field, a check bit, and an end frame;
The start frame is 11 in hexadecimal (0x11), and the end frame is 38 in hexadecimal (0x38);
the length frame is 26;
the data field comprises the number of data types, the length field of each type, the parsing corresponding to each type, and the raw data.
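A sketch of a parser for the claim-2 frame format. The claim fixes only the start byte 0x11, the end byte 0x38 and the field order, so the one-byte widths assumed for the length, command and check fields and the XOR checksum are illustrative assumptions only.

```python
START, END = 0x11, 0x38

def xor_checksum(payload: bytes) -> int:
    """Assumed check-bit rule: XOR of the bytes between start frame and check bit."""
    value = 0
    for b in payload:
        value ^= b
    return value

def parse_frame(frame: bytes):
    """Split one frame into command and data field; raises on malformed frames."""
    if frame[0] != START or frame[-1] != END:
        raise ValueError("bad start/end frame")
    length = frame[1]                       # length frame (assumed 1 byte)
    command = frame[2]                      # command frame (assumed 1 byte)
    data = frame[3:3 + length]              # data field: type count, per-type lengths, parsing, raw data
    check = frame[3 + length]               # check bit (assumed single byte)
    if check != xor_checksum(frame[1:3 + length]):
        raise ValueError("checksum mismatch")
    return command, data
```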
3. The Android system mobile device-based identification enabling method according to claim 1, wherein the method comprises the following steps: in step S2, the data stream includes video and/or pictures;
s21, if the data stream is a video data stream, the Android mobile device buffers the video data stream according to the time stamp, obtains the RTSP stream of the video data and decodes it to obtain a decoded video stream;
the recorded file is in MP4 or AVI format;
further, in the decoding step the RTSP stream of the video data is decoded using an H264 video decoder;
s22, if the data stream is a picture data stream, the Android mobile device draws RGB pictures of the picture data stream to obtain a decoded picture.
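A minimal sketch of claim 3 using OpenCV's FFmpeg backend as a stand-in for the H264 decoder on the Android device; the RTSP URL is a placeholder.

```python
import cv2
import numpy as np

def decode_rtsp(url="rtsp://camera.example/stream"):
    """Yield decoded RGB frames from an H.264 RTSP video data stream."""
    cap = cv2.VideoCapture(url)             # FFmpeg backend handles RTSP/H.264 decoding
    while cap.isOpened():
        ok, frame_bgr = cap.read()
        if not ok:
            break
        yield cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    cap.release()

def decode_picture(picture_bytes: bytes):
    """Decode a picture data stream (e.g. JPEG bytes) into an RGB picture."""
    buf = np.frombuffer(picture_bytes, dtype=np.uint8)
    bgr = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
```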
4. The Android system mobile device-based identification enabling method according to claim 1, wherein the method comprises the following steps: step S3 comprises the sub-steps of,
s31, converting the decoded video stream and/or the decoded picture into a picture frame;
S311, if the data is a decoded video stream, the decoded video stream is segmented into picture frames;
s312, if the data is a decoded picture, the decoded picture is used directly as a picture frame without further processing.
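The two branches of step S3 (S311/S312) can be sketched as one helper; the frame sampling step is an assumption, not part of the claim.

```python
import cv2

def to_picture_frames(source, step=5):
    """S311: split a decoded video stream into frames; S312: pass a decoded picture through."""
    if isinstance(source, str):                       # path/URL of a decoded video stream
        cap = cv2.VideoCapture(source)
        frames, i = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if i % step == 0:                         # keep every `step`-th decoded frame
                frames.append(frame)
            i += 1
        cap.release()
        return frames
    return [source]                                   # an already-decoded picture is one frame
```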
CN202110672602.1A 2021-06-17 2021-06-17 Identification enabling method based on Android system mobile equipment Active CN113343870B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110672602.1A CN113343870B (en) 2021-06-17 2021-06-17 Identification enabling method based on Android system mobile equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110672602.1A CN113343870B (en) 2021-06-17 2021-06-17 Identification enabling method based on Android system mobile equipment

Publications (2)

Publication Number Publication Date
CN113343870A CN113343870A (en) 2021-09-03
CN113343870B true CN113343870B (en) 2024-02-23

Family

ID=77476011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110672602.1A Active CN113343870B (en) 2021-06-17 2021-06-17 Identification enabling method based on Android system mobile equipment

Country Status (1)

Country Link
CN (1) CN113343870B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564092A (en) * 2018-04-12 2018-09-21 内蒙古工业大学 Sunflower disease recognition method based on SIFT feature extraction algorithm
CN109859401A (en) * 2019-01-31 2019-06-07 杭州凡咖网络科技有限公司 A kind of self-service coffee machine pattern recognition device and its operating method
CN109902628A (en) * 2019-02-28 2019-06-18 广州大学 A kind of seat Management System of Library of view-based access control model Internet of Things

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8004576B2 (en) * 2008-10-31 2011-08-23 Digimarc Corporation Histogram methods and systems for object recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564092A (en) * 2018-04-12 2018-09-21 内蒙古工业大学 Sunflower disease recognition method based on SIFT feature extraction algorithm
CN109859401A (en) * 2019-01-31 2019-06-07 杭州凡咖网络科技有限公司 A kind of self-service coffee machine pattern recognition device and its operating method
CN109902628A (en) * 2019-02-28 2019-06-18 广州大学 A kind of seat Management System of Library of view-based access control model Internet of Things

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic registration algorithm for SAR images based on SIFT-Delaunay coding; 尹奎英; 张雄; 李成; 乔寅骐; Modern Radar (现代雷达); 2015-04-15 (Issue 04); full text *
Research on real-time image recognition algorithms for mobile devices and *** applications; 张瑛; Journal of Natural Science of Harbin Normal University (哈尔滨师范大学自然科学学报); 2017-10-15 (Issue 05); full text *

Also Published As

Publication number Publication date
CN113343870A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN107330439B (en) Method for determining posture of object in image, client and server
Huang et al. Copy-move forgery detection for image forensics using the superpixel segmentation and the Helmert transformation
Wu et al. A new technique for multi-oriented scene text line detection and tracking in video
CN102007499B (en) Detecting facial expressions in digital images
US7228006B2 (en) Method and system for detecting a geometrically transformed copy of an image
CN112686812B (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CA1235514A (en) Video recognition system
CN110853033B (en) Video detection method and device based on inter-frame similarity
JP5261501B2 (en) Permanent visual scene and object recognition
CN108734185B (en) Image verification method and device
Dubská et al. Real-time precise detection of regular grids and matrix codes
CN107358189B (en) Object detection method in indoor environment based on multi-view target extraction
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN109784379B (en) Updating method and device of textile picture feature library
CN113298871B (en) Map generation method, positioning method, system thereof, and computer-readable storage medium
CN113343870B (en) Identification enabling method based on Android system mobile equipment
CN109657083B (en) Method and device for establishing textile picture feature library
KR102096784B1 (en) Positioning system and the method thereof using similarity-analysis of image
CN115601791B (en) Unsupervised pedestrian re-identification method based on multi-former and outlier sample re-distribution
CN112016609B (en) Image clustering method, device, equipment and computer storage medium
Aktar et al. Performance analysis of vehicle detection based on spatial saliency and local image features in H. 265 (HEVC) 4K video for developing a relationship between iou and subjective evaluation value
US20220245394A1 (en) Methods and Systems for Generating Composite Image Descriptors
Amato et al. Technologies for visual localization and augmented reality in smart cities
CN113628251A (en) Smart hotel terminal monitoring method
CN112396551A (en) Watermark embedding method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant