CN113220114A - Embedded non-contact elevator key interaction method integrating face recognition - Google Patents

Embedded non-contact elevator key interaction method integrating face recognition

Info

Publication number
CN113220114A
CN113220114A (application CN202110086981.6A)
Authority
CN
China
Prior art keywords
elevator
image
coordinate system
horizontal
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110086981.6A
Other languages
Chinese (zh)
Other versions
CN113220114B (en)
Inventor
谢巍
许练濠
卢永辉
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110086981.6A priority Critical patent/CN113220114B/en
Publication of CN113220114A publication Critical patent/CN113220114A/en
Application granted granted Critical
Publication of CN113220114B publication Critical patent/CN113220114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/12 - Edge-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/753 - Transform-based matching, e.g. Hough transform
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 50/00 - Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Processing (AREA)
  • Indicating And Signalling Devices For Elevators (AREA)

Abstract

The invention discloses an embeddable non-contact elevator key interaction method integrating face recognition. Edge detection is first performed on the shooting area of the original image with a Laplace filter operator to obtain an edge image, and the edge image is filtered with line filter operators in the horizontal and vertical directions; a Hough line detection algorithm is then applied to the horizontally filtered image and the vertically filtered image respectively, so as to locate the region of the elevator key panel and solve a homography transformation matrix; the finger of the elevator user is detected and located with an improved YOLOv3 algorithm, the floor key the finger points to is obtained through the homography transformation matrix, and the face information of the resident is acquired at the same time for double verification. The invention can accurately identify the elevator key selected by the elevator user, enables non-contact use of the elevator, and ensures the safety of residents through double verification of the floor and the resident's face information.

Description

Embedded non-contact elevator key interaction method integrating face recognition
Technical Field
The invention relates to the technical field of computer vision and human-computer interaction, in particular to an embeddable non-contact elevator key interaction method integrating face recognition.
Background
Elevators are now widely used in urban high-rise buildings and have become an indispensable means of transport for people living and working at height. Elevator buttons are generally contact-based: people must touch the buttons to select the target floor and to open or close the elevator door, and different people press the same buttons every day. As a result, elevator buttons carry various bacteria and viruses, easily cause cross infection, and increase the probability of transmission.
With the development of science and technology, human-computer interaction has become diversified: people are no longer satisfied with simply being presented a virtual scene and have begun to explore ways of interacting with the virtual world, so more and more novel human-computer interaction technologies have emerged. Human-computer interaction techniques fall into several categories: traditional interaction with keyboard and mouse as input; interaction based on touch-screen devices, such as smartphones and tablet computers; and non-contact interaction based on machine vision and image processing, such as virtual keyboards and gesture interaction systems.
Hiroki Goto et al. studied a camera-projection interaction system based on a frame-difference method and hand skin-color extraction: the hand is first separated from the scene based on the clustering characteristics of hand skin color in the HSV and YCbCr spaces, and fingertip positions are then detected on the separated foreground image with a template-matching method, realizing projection interaction between a user and a computer or home television. Fitriani et al. proposed a human-computer interaction system based on a deformable projection surface, which projects a virtual scene onto the surface of an easily deformed object, detects the deformation produced when a user touches the projection screen, and analyzes the interaction information through an image processing algorithm and a deformation model of the object.
However, these solutions based on machine vision techniques and image processing algorithms share a common drawback: they cannot cope with diverse projection scenes. For example, in an interaction system based on hand skin color, when the projected scene is similar to the skin color of the hand, the performance of the hand-foreground separation algorithm degrades sharply. An interaction system based on a deformable surface can run stably in the projection scene it was designed for, but in a changing projection scene the deformation detection of the projected image becomes inaccurate; different schemes must be designed for different scenes, so the development cost of such systems is high.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provides an embeddable non-contact elevator key interaction method integrating face recognition.
It is a second object of the invention to provide a computing device.
A third object of the present invention is to provide an elevator.
The first purpose of the invention is realized by the following technical scheme: an embeddable non-contact elevator key interaction method integrating face recognition comprises the following steps:
s1, obtaining an original image shot by a camera in the elevator car, and carrying out edge detection on a shot area of the original image through a Laplace filter operator to obtain an edge image;
s2, filtering the edge image by using linear filtering operators in the horizontal direction and the vertical direction to enhance the linear edges in the horizontal direction and the vertical direction, and reserving the edges of the elevator key panel area while eliminating noise;
s3, respectively carrying out linear detection on the image subjected to horizontal direction filtering and the image subjected to vertical direction filtering by adopting a Hough linear detection algorithm so as to position the region of the elevator key panel;
s4, solving the mapping relation under the view angle transformation by using the homography transformation matrix;
s5, detecting and positioning the fingers of the elevator users in the original images by using an improved YOLOv3 algorithm, and obtaining floor keys to which the fingers point according to the homography transformation matrix;
s6, acquiring the face information of the resident of the floor pointed by the finger, performing double verification on whether the elevator user is a resident and whether the floor pointed by the finger is the floor where the elevator user lives, wherein the floor key is selected under the condition that the double verification is passed, and finally controlling the elevator car to run to the floor.
Preferably, the camera is arranged above the elevator key panel and shoots the elevator key panel downwards;
in step S1, the process of edge detection of the camera shooting region by the Laplace filter operator is as follows:
s11, carrying out graying processing on the original image to obtain a grayscale image;
S12, based on the principle of not missing any elevator key panel boundary, a second-order-gradient Laplace filter operator is used to detect edges of the grayscale image; the Laplace filter operator computes the edge gradient with a second-order difference, as follows:
consider a one-dimensional sequence { f(1), f(2), …, f(x-1), f(x), f(x+1), … }; the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence equals the convolution of the sequence with the one-dimensional kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:
K_L = [ 0  1  0 ; 1  -4  1 ; 0  1  0 ]
Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and the convolution kernel K_L is replaced with:
K_L = [ 1  1  1 ; 1  -8  1 ; 1  1  1 ]
The second-order difference information of the grayscale image is obtained by convolving this kernel with the grayscale image, namely:
G = K_L * I_gray
The larger the convolution kernel, the more pronounced the detected edges;
the points whose convolution result is 0 are taken as edge points, and the edge image is the set of points with obvious gray-level change in the grayscale image.
Preferably, the process of step S2 is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:
K_horizontal = [ 1  1  …  1 ]  (1×n)
K_vertical = K_horizontal^T  (n×1)
where T denotes vector transposition and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal linear edges and K_vertical is sensitive to vertical linear edges;
S22, convolve the Laplace edge image I_Laplace with the two operators respectively to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
Preferably, the process of step S3 is as follows:
s31, considering that after the edge image is filtered in the horizontal direction and the vertical direction, the non-horizontal or vertical linear edge of the edge image can be inhibited, firstly, a threshold is used for segmenting and removing the non-horizontal linear edge and the non-vertical linear edge;
s32, respectively carrying out linear detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough linear detection algorithm, and finally obtaining four elevator key panel boundary straight lines;
S33, solve the pairwise intersections of the four elevator key panel boundary lines to obtain the four vertex coordinates of the elevator key panel region in the original image, namely the upper-left, lower-left, lower-right and upper-right vertices (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
Furthermore, a homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Let X-Y-Z be the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y the pixel-plane coordinate system, and x'-y' the elevator key panel plane coordinate system; the homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and that point:
l = { s·(x, y, 1) | s ∈ ℝ }
the straight line intersects the x'-y' coordinate plane at a point (x', y'), and the process from point (x, y) to point (x', y') is called the homography transformation;
the process of solving the mapping relation under the view transformation with the homography transformation matrix is as follows:
S41, let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system, and describe the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system with a homography transformation matrix H:
H = [ h1  h2  h3 ; h4  h5  h6 ; h7  h8  h9 ]
[ X, Y, Z ]^T = H · [ x, y, 1 ]^T
in the formula, h1~h9 are the 9 transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
The H matrix has 9 transformation parameters but actually only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system and the mapping is unaffected by coordinate scaling; when the H matrix is multiplied by a scaling factor k:
k·H = [ k·h1  k·h2  k·h3 ; k·h4  k·h5  k·h6 ; k·h7  k·h8  k·h9 ]
k·H and H represent the same mapping relationship, so H has only 8 degrees of freedom;
S42, solve H; one method is to set h9 to 1, and the equations to be solved are:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + 1),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + 1)
another method is to add a constraint to the homography matrix H so that its modulus is 1, as follows:
h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equations to be solved are then:
x'·(h7·x + h8·y + h9) = h1·x + h2·y + h3,  y'·(h7·x + h8·y + h9) = h4·x + h5·y + h6
S43, for the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define their target coordinate points in the elevator key panel scene coordinate system:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
these target coordinates are respectively substituted into the equation to be solved in step S42, and the H matrix is solved simultaneously.
Preferably, the improved YOLOv3 algorithm comprises improving the loss function of the YOLOv3 target detection algorithm and reducing the feature extraction part of the YOLOv3 network by adopting an adaptive pruning algorithm.
Further, the loss function of the YOLOv3 network is designed as follows:
Loss = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ‖r_ij − r̂_ij‖²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · (C_ij − Ĉ_ij)²
     + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · (C_ij − Ĉ_ij)²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · Σ_{c∈classes} (p_ij(c) − p̂_ij(c))²
where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S×S grids; B denotes the number of boxes contained in one grid; 1_ij^obj indicates whether the jth box of the ith grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the jth predicted box of the ith grid, and r̂_ij denotes the x, y, w, h of the jth ground-truth box of the ith grid;
the second and third terms are the confidence loss, where 1_ij^noobj indicates whether the jth box of the ith grid contains no object, taking the value 1 if it does not and 0 if it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss contributed by grid boxes without objects; C_ij denotes the predicted confidence of the jth box of the ith grid, and Ĉ_ij denotes the ground-truth confidence of the jth box of the ith grid;
the fourth term is the category loss, where classes denotes the number of categories; p_ij(c) denotes the predicted probability that the jth box of the ith grid belongs to a class-c object, and p̂_ij(c) denotes the true probability that the jth box of the ith grid belongs to a class-c object;
the improvement of the loss function is specifically as follows:
(1) FocalLoss is introduced into the third term, namely the confidence loss, to improve the model's ability to learn difficult samples. FocalLoss is an improvement on the cross entropy and has the following functional form:
FL(y, ŷ) = −ŷ · (1 − y)^α · log(y) − (1 − ŷ) · y^α · log(1 − y)
where y and ŷ respectively denote the predicted and true probability values, i.e. p_ij(c) and p̂_ij(c), and α is the FocalLoss hyperparameter;
the improved confidence loss function is as follows:
Loss_conf = Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)
(2) An adaptive scaling factor is added to the first term, the coordinate loss, as follows:
ρ_box = 2 − ŵ_ij · ĥ_ij
where ŵ_ij and ĥ_ij denote the width and height of the ground-truth bounding box; ρ_box ranges in (1, 2), and the smaller the ground-truth box, the larger its value;
the improved coordinate loss is as follows:
Loss_coord = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ρ_box · ‖r_ij − r̂_ij‖²
Furthermore, the YOLOv3 network uses darknet-53 as its feature extraction backbone; to address the complexity redundancy of darknet-53, a network pruning algorithm from the structure-based pruning methods is applied to prune the network at the channel level, reducing the number of feature channels of the network:
First, a BN layer is added after each convolutional layer. When BN is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
Ĉ_ik = γ_ik · (C_ik − μ_ik) / √σ_ik + β_ik
where Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik respectively denote the mean and variance of the channel feature C_ik, obtained from statistics over the historical training data;
γ_ik acts as a scaling factor; the network uses this scaling factor as the weight of the feature channel, and the scaling factors are sparsified with the Lasso algorithm:
loss_new = loss_old + λ · Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|
where loss_new is the final loss function, loss_old is the improved loss function described above, λ is the sparsity coefficient, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;
finally, all γ parameters are sorted in descending order, and a given proportion of the smallest γ_ik, together with their corresponding feature channels and BN channels, are deleted.
The second purpose of the invention is realized by the following technical scheme: the computer device comprises a processor and a memory for storing a program executable by the processor, and when the processor executes the program stored by the memory, the embedded non-contact elevator key interaction method fusing face recognition and achieving the first purpose of the invention is realized.
The third purpose of the invention is realized by the following technical scheme: the elevator realizes the identification of floor keys and the operation control of a lift car through the embedded non-contact elevator key interaction method integrating the face identification.
Compared with the prior art, the invention has the following advantages and effects:
(1) the method comprises the steps of firstly positioning the region of the elevator key panel in the image through edge detection, filtering and linear detection operations, solving a homography transformation matrix, then detecting the fingers of the elevator user in the image by utilizing a deep learning technology, and obtaining the floor keys selected by the fingers of the elevator user according to the solved homography transformation matrix. The invention avoids the interference of environmental factors on target detection, improves the accuracy of the selected floor key identification, and also ensures that the method can be applied to changeable environments and has more diversity in interactive scenes.
(2) The method can be applied to realize the non-contact elevator key during the epidemic situation, and avoids the cross infection caused by the multiple touch of the elevator key by multiple people.
(3) The invention identifies the floor key selected by the elevator user through computer vision and adds face recognition to form a double verification, ensuring that the people entering and leaving the target floor are residents or are led by a resident, which greatly improves the interactivity of the elevator and the safety of the residents.
(4) The YOLOv3 algorithm has advantages in speed, and on the basis, the training speed of the YOLOv3 network can be further improved by improving the learning capacity of the YOLOv3 network on difficult samples and improving the loss of small objects; by reducing the number of the characteristic channels of the YOLOv3 network, the calculation complexity can be further reduced, the target detection efficiency is greatly improved, and the real-time detection is facilitated.
(5) In the invention, because the extracted edge image is a merged image containing both horizontal and vertical edges, further filtering it with the horizontal line filter operator and the vertical line filter operator splits it into a horizontal filtered image and a vertical filtered image before line detection. This avoids the redundant detection that would occur after merging the edges of the horizontal and vertical channels, and effectively reduces the complexity of the line detection algorithm.
Drawings
Fig. 1 is a flow chart of an embeddable non-contact elevator key interaction method of the present invention incorporating face recognition.
Fig. 2 is a schematic diagram of hough line detection algorithm in cartesian coordinates.
Fig. 3 is a schematic diagram of hough line detection algorithm of polar coordinate system.
Fig. 4 is a schematic diagram of a homography transform.
Fig. 5 is a schematic pruning diagram of network slimming.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The embodiment discloses an embeddable non-contact elevator key interaction method integrating face recognition, which can be applied to an elevator, and the elevator realizes the recognition of floor keys and the operation control of a car through the method. As shown in fig. 1, the method comprises the steps of:
and S1, acquiring an original image shot by a camera in the elevator car, wherein the camera is arranged above the elevator key panel and shoots the elevator key panel downwards at a certain angle.
Then, edge detection is carried out on the shooting area of the original image through a Laplace filter operator, so that an edge image is obtained:
s11, carrying out graying processing on the original image to obtain a grayscale image;
S12, because edges are the set of points in an image where the brightness changes sharply, and the gradient reflects how fast the values change, a second-order-gradient Laplace filter operator with a large-scale convolution kernel is used to detect edges of the grayscale image, based on the principle of not missing any elevator key panel boundary. The Laplace filter operator computes the edge gradient with a second-order difference, as follows:
consider a one-dimensional sequence { f(1), f(2), …, f(x-1), f(x), f(x+1), … }; the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence equals the convolution of the sequence with the one-dimensional kernel [+1, -2, +1]; here a one-dimensional sequence can be understood as a row or column of pixel values in the horizontal or vertical direction. Generalizing this conclusion to the two-dimensional grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:
K_L = [ 0  1  0 ; 1  -4  1 ; 0  1  0 ]
Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and the convolution kernel K_L is replaced with:
K_L = [ 1  1  1 ; 1  -8  1 ; 1  1  1 ]
The second-order difference information of the grayscale image is obtained by convolving this kernel with the image, namely:
G = K_L * I_gray
The convolution kernel K_L is the Laplace filter operator; the larger the convolution kernel, the more pronounced the detected edges.
The points whose convolution result is 0 are taken as edge points, and the edge image is the set of points with obvious gray-level change in the grayscale image. The extracted edge image is a merged image containing both horizontal and vertical edges.
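As an illustrative sketch of step S1 (assuming an OpenCV/NumPy implementation; the function name and the sign-change test used to read "points whose convolution result is 0" are assumptions, not the patent's reference code):

```python
import cv2
import numpy as np

def laplace_edge_image(original_bgr):
    """Sketch of step S1: graying + 8-connected Laplace filtering + zero test."""
    gray = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # 8-connected Laplace kernel K_L (diagonal directions included), as above.
    k_l = np.array([[1,  1, 1],
                    [1, -8, 1],
                    [1,  1, 1]], dtype=np.float32)

    # Second-order difference image G = K_L * I_gray.
    g = cv2.filter2D(gray, cv2.CV_32F, k_l)

    # One simple reading of "points whose convolution result is 0":
    # mark pixels where the Laplace response changes sign against a neighbour.
    s = np.sign(g)
    zc = np.zeros(g.shape, dtype=bool)
    zc[:, :-1] |= (s[:, :-1] * s[:, 1:]) < 0
    zc[:-1, :] |= (s[:-1, :] * s[1:, :]) < 0
    return zc.astype(np.uint8) * 255
```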
And S2, filtering the edge image by utilizing the horizontal direction and vertical direction straight line filtering operators.
Because the edge image obtained with the large-scale Laplace convolution kernel contains many noise points, and the key to locating the elevator key panel region is locating its four boundary straight lines, which appear horizontal or vertical in the image, line filter operators in the horizontal and vertical directions are used to enhance the horizontal and vertical straight-line edges, preserving the edges of the elevator key panel region while eliminating noise. The filtering process is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:
K_horizontal = [ 1  1  …  1 ]  (1×n)
K_vertical = K_horizontal^T  (n×1)
where T denotes vector transposition and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal linear edges, K_vertical is sensitive to vertical linear edges, and the two operators effectively eliminate isolated-point noise. In general, the larger n is, the stricter the requirement on line length and the better the suppression of non-linear noise; but if n is too large, the sensitivity to the line angle also increases and slightly tilted lines may be filtered out. Since the boundary of the projection region in the captured image is generally not strictly horizontal or vertical, n cannot be set too large and must be chosen according to the actual situation.
S22, convolve the Laplace edge image I_Laplace with the two operators respectively to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
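A minimal sketch of step S2 follows, assuming all-ones 1×n and n×1 kernels and n = 15 (both assumptions; n must be tuned as discussed above):

```python
import cv2
import numpy as np

def directional_line_filter(edge_image, n=15):
    """Sketch of step S2: enhance horizontal / vertical straight-line edges.

    Larger n demands longer straight runs but becomes more sensitive to a
    slight tilt of the panel borders, so n is a tunable assumption.
    """
    k_horizontal = np.ones((1, n), dtype=np.float32)  # sensitive to horizontal lines
    k_vertical = k_horizontal.T                       # sensitive to vertical lines

    img = edge_image.astype(np.float32)
    i_horizontal = cv2.filter2D(img, cv2.CV_32F, k_horizontal)
    i_vertical = cv2.filter2D(img, cv2.CV_32F, k_vertical)
    return i_horizontal, i_vertical
```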
S3, respectively carrying out linear detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough linear detection algorithm so as to position the region of the elevator key panel:
s31, considering that after the edge image is filtered in the horizontal direction and the vertical direction, the non-horizontal or vertical linear edge of the edge image can be inhibited, firstly, a threshold is used for segmenting and removing the non-horizontal linear edge and the non-vertical linear edge;
and S32, respectively carrying out linear detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough linear detection algorithm, and finally obtaining four elevator key panel boundary straight lines.
Because the edge image extracted in step S1 is a merged image including horizontal edges and vertical edges, the edge image is further filtered by using the horizontal line filter operator and the vertical filter operator in step S2 to be divided into a filtered image including only horizontal direction and a filtered image including only vertical direction, and then the line detection is performed in step S3, so that redundant detection after merging the edges of the horizontal channel and the vertical channel can be avoided, and the complexity of the line detection algorithm is effectively reduced.
The hough line detection algorithm maps each point on the cartesian coordinate system to a straight line in the hough space by using the principle of point-line duality between the cartesian coordinate system and the hough space, so that an intersection point of a plurality of straight lines in the hough space corresponds to a straight line passing through a plurality of points in the cartesian coordinate system.
Specifically, a straight line in the Cartesian coordinate system is written y = kx + b, where (x, y) denotes a coordinate point, k the slope of the line and b its intercept. The line is rewritten as b = y − xk; taking k as the horizontal coordinate and b as the vertical coordinate of the Hough space, b = y − xk is a straight line in the Hough space with slope −x and intercept y. Several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in the Hough space, and their common intersection point (k, b) gives the slope and intercept of that straight line in the Cartesian coordinate system; a schematic diagram is shown in fig. 2.
Since the slope of a vertical line in the image cannot be computed, the Hough transform is usually carried out in polar form. Specifically, a straight line is expressed by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e. the distance from the origin to the line, and θ is the polar angle, i.e. the angle between the x-axis and the line segment that passes through the origin and is perpendicular to the line. Taking θ as the horizontal coordinate and ρ as the vertical coordinate of the Hough space, several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line correspond to several curves in the Hough space, and their common intersection point (θ, ρ) gives the polar angle and polar distance of that straight line; a schematic diagram is shown in fig. 3.
S33, solve the pairwise intersections of the four elevator key panel boundary lines to obtain the four vertex coordinates of the elevator key panel region in the original image, namely the upper-left, lower-left, lower-right and upper-right vertices (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
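An illustrative sketch of step S3, assuming OpenCV's polar-form Hough transform; the threshold values are assumptions, and the sketch simply assumes the two strongest lines in each filtered image are the panel borders:

```python
import cv2
import numpy as np

def locate_panel_corners(i_horizontal, i_vertical, bin_thresh=200, hough_thresh=150):
    """Sketch of step S3: threshold segmentation, Hough line detection and
    pairwise intersection of the four panel boundary lines."""
    bh = (np.clip(i_horizontal, 0, 255) > bin_thresh).astype(np.uint8) * 255
    bv = (np.clip(i_vertical, 0, 255) > bin_thresh).astype(np.uint8) * 255

    # Polar-form Hough transform: rho = x*cos(theta) + y*sin(theta).
    h_lines = cv2.HoughLines(bh, 1, np.pi / 180, hough_thresh)[:2]
    v_lines = cv2.HoughLines(bv, 1, np.pi / 180, hough_thresh)[:2]

    def intersect(l1, l2):
        (r1, t1), (r2, t2) = l1[0], l2[0]
        a = np.array([[np.cos(t1), np.sin(t1)],
                      [np.cos(t2), np.sin(t2)]])
        return np.linalg.solve(a, np.array([r1, r2]))  # (x, y) intersection

    # Four pairwise intersections = the four panel vertices.
    return [intersect(h, v) for h in h_lines for v in v_lines]
```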
S4, solving the mapping relation under the view angle transformation by using the homography transformation matrix:
S41, a homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Let X-Y-Z be the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y the pixel-plane coordinate system, and x'-y' the elevator key panel plane coordinate system. The homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and that point:
l = { s·(x, y, 1) | s ∈ ℝ }
The straight line intersects the x'-y' coordinate plane at a point (x', y'), and the process from point (x, y) to point (x', y') is called the homography transformation.
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described with a homography transformation matrix H:
H = [ h1  h2  h3 ; h4  h5  h6 ; h7  h8  h9 ]
[ X, Y, Z ]^T = H · [ x, y, 1 ]^T
in the formula, h1~h9 are the 9 transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
The H matrix has 9 transformation parameters but actually only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system and the mapping is unaffected by coordinate scaling; when the H matrix is multiplied by a scaling factor k:
k·H = [ k·h1  k·h2  k·h3 ; k·h4  k·h5  k·h6 ; k·h7  k·h8  k·h9 ]
k·H and H represent the same mapping relationship, so H has only 8 degrees of freedom;
S42, solve H; one method is to set h9 to 1, and the equations to be solved are:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + 1),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + 1)
another method is to add a constraint to the homography matrix H so that its modulus is 1, as follows:
h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equations to be solved are then:
x'·(h7·x + h8·y + h9) = h1·x + h2·y + h3,  y'·(h7·x + h8·y + h9) = h4·x + h5·y + h6
S43, for the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define their target coordinate points in the elevator key panel scene coordinate system:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
These target coordinates are substituted into the equations to be solved in step S42; since the four vertex coordinates in the pixel coordinate system have already been obtained, the resulting simultaneous equations can be solved for the H matrix.
S5, detecting and positioning the fingers of the elevator user in the original image by using an improved YOLOv3 algorithm, mapping and converting the position coordinates of the fingers through a homography transformation matrix to obtain corresponding position coordinates in an elevator key panel, and determining which floor key the position coordinates are located at, namely determining the floor key the fingers point to.
The input of the network is an original image collected by a camera in an elevator car, the output is the position coordinate (x, y, w, h) and the confidence of the finger of an elevator user in the original image, and the original image with the known position coordinate, confidence (1 or 0) and classification probability (namely the probability of the finger) of the finger of the elevator user is used as training data during training. The loss function of the network is designed before the network training.
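The mapping part of step S5 can be sketched as follows (the key-center lookup table, the distance threshold and the function name are hypothetical placeholders, not part of the patent's reference implementation):

```python
import cv2
import numpy as np

def finger_to_floor_key(finger_xy, h_matrix, key_centers, max_dist=30.0):
    """Sketch of step S5 (mapping part): project the detected finger position
    into the panel plane with H and pick the nearest floor key.

    key_centers: {floor_label: (x', y')} in the panel coordinate system,
    a hypothetical per-elevator lookup table; max_dist is an assumed radius.
    """
    pt = np.array([[finger_xy]], dtype=np.float32)          # shape (1, 1, 2)
    mapped = cv2.perspectiveTransform(pt, h_matrix)[0, 0]    # (x', y')

    best, best_d = None, max_dist
    for floor, (cx, cy) in key_centers.items():
        d = float(np.hypot(mapped[0] - cx, mapped[1] - cy))
        if d < best_d:
            best, best_d = floor, d
    return best  # None when the finger is not close enough to any key
```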
Here, the improved YOLOv3 algorithm includes improving its loss function on the basis of the YOLOv3 target detection algorithm (i.e., YOLOv3 network), and reducing the feature extraction part of the YOLOv3 network by using an adaptive pruning algorithm.
Specifically, for the YOLOv3 network, the loss function is designed as follows:
Loss = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ‖r_ij − r̂_ij‖²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · (C_ij − Ĉ_ij)²
     + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · (C_ij − Ĉ_ij)²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · Σ_{c∈classes} (p_ij(c) − p̂_ij(c))²
where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S×S grids; B denotes the number of boxes contained in one grid; 1_ij^obj indicates whether the jth box of the ith grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the jth predicted box of the ith grid, and r̂_ij denotes the x, y, w, h of the jth ground-truth box of the ith grid;
the second and third terms are the confidence loss, where 1_ij^noobj indicates whether the jth box of the ith grid contains no object, taking the value 1 if it does not and 0 if it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss contributed by grid boxes without objects; C_ij denotes the predicted confidence of the jth box of the ith grid, and Ĉ_ij denotes the ground-truth confidence of the jth box of the ith grid;
the fourth term is the category loss, where classes denotes the number of categories; p_ij(c) denotes the predicted probability that the jth box of the ith grid belongs to a class-c object, and p̂_ij(c) denotes the true probability that the jth box of the ith grid belongs to a class-c object.
The above YOLOv3 loss uses a positive/negative sample balance factor λ_noobj to reduce the confidence loss contributed by the many grids that are not responsible for predicting a target. This alleviates, to some extent, the imbalance between positive and negative samples (positive samples are the targets the network must detect; negative samples are the background other than the targets), but it does not address the training problem of difficult samples. This embodiment therefore introduces FocalLoss into the third term of the loss function, i.e. the confidence loss, to improve the model's ability to learn difficult samples.
FocalLoss is an improvement on the cross entropy, with the following functional form:
FL(y, ŷ) = −ŷ · (1 − y)^α · log(y) − (1 − ŷ) · y^α · log(1 − y)
where y and ŷ respectively denote the predicted and true probability values, i.e. p_ij(c) and p̂_ij(c); α is the FocalLoss hyperparameter and generally takes a value in [0, 5].
FocalLoss assigns the weights (1−y)^α and y^α to positive and negative samples respectively. Taking negative samples as an example: when a sample is easy to learn, y is close to 0 and the weight y^α is small; when a sample is difficult to learn, y is close to 0.5 and the weight y^α is large. Hard-to-classify samples therefore receive a higher weight than easy ones, which improves the model's ability to learn difficult samples.
The improved confidence loss function is as follows:
Loss_conf = Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)
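A minimal NumPy sketch of the FocalLoss weighting described above (the functional form is the reconstruction given above and α = 2.0 is an assumed value within [0, 5]):

```python
import numpy as np

def focal_loss(y_pred, y_true, alpha=2.0, eps=1e-7):
    """Sketch of FocalLoss: y_pred is the predicted confidence/probability,
    y_true is 1 for positive samples and 0 for negative samples. Hard samples
    (predictions far from their label) receive larger weights."""
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    y_true = np.asarray(y_true, dtype=np.float64)
    pos = -y_true * (1.0 - y_pred) ** alpha * np.log(y_pred)
    neg = -(1.0 - y_true) * y_pred ** alpha * np.log(1.0 - y_pred)
    return pos + neg
```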
in addition, in an elevator application scene, the finger of an elevator user only occupies a small area in the image, that is, the frame of a small object in the data set occupies a large proportion, so that the training speed of the network can be accelerated by increasing the loss of the small object, and therefore, the embodiment further increases an adaptive scaling factor for the first term, that is, the coordinate loss, and the scaling factor is as follows:
ρ_box = 2 − ŵ_ij · ĥ_ij
where ŵ_ij and ĥ_ij denote the width and height of the ground-truth bounding box; ρ_box ranges in (1, 2), and the smaller the ground-truth box, the larger its value, so the loss proportion of small objects is increased.
The improved coordinate loss is as follows:
Loss_coord = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ρ_box · ‖r_ij − r̂_ij‖²
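A sketch of the scaled coordinate loss for one responsible box follows; the normalization of w and h to [0, 1] and the value λ_coord = 5.0 are assumptions made for this example:

```python
import numpy as np

def scaled_coord_loss(pred_box, true_box, lambda_coord=5.0):
    """Sketch of the coordinate loss with the adaptive scaling factor rho_box.

    pred_box / true_box: (x, y, w, h) with w, h assumed normalized to [0, 1].
    Smaller ground-truth boxes give rho_box closer to 2, so small objects such
    as a fingertip contribute a larger coordinate loss.
    """
    pred_box = np.asarray(pred_box, dtype=np.float64)
    true_box = np.asarray(true_box, dtype=np.float64)
    rho_box = 2.0 - true_box[2] * true_box[3]
    return lambda_coord * rho_box * np.sum((pred_box - true_box) ** 2)
```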
in the convolutional neural network structure, one convolutional channel represents a certain feature of an image, and a model carries out prediction by integrating feature information of all channels, so that the more complex network structure can extract more features. The YOLOv3 network adopts the darknet-53 as the feature extraction main body, the structure has 53 convolutional layers, the number of the convolutional layer channels is doubled every time the structure is sampled, the total number of the channels reaches 17856, the target required to be detected by the elevator is a finger, and the structure of the darknet-53 has enough complexity to extract arrow features and has a large amount of redundancy by analyzing from a visual angle, so that the structure or the size of the network needs to be reduced.
The current pruning technology of the convolutional neural network can be divided into the following categories: a Weight Quantization (Weight Quantization) based method, such as HashNet, which groups Weight variables by hash, wherein the variables in the same group share the same Weight value, and although the method effectively reduces the parameter size of the model, the forward calculation speed of the network cannot be increased; the method based on the weight sparsification is characterized in that the weight variables in the network are subjected to sparse training, and then a large number of weight variables close to 0 in the network can be deleted, but the method can accelerate the forward calculation process only under special hardware; the method based on structure pruning reduces the structure of the network in a self-adaptive manner through training data, thereby effectively reducing the size of model parameters and improving the operation speed.
Therefore, for the complexity redundancy of darknet-53, this embodiment uses a network pruning algorithm from the structure-based pruning methods to prune the network at the channel level and reduce the number of feature channels of the network.
In order to perform channel-level pruning on a network by using a Lasso algorithm, the network pruning method comprises the following steps:
First, a BN layer is added after each convolutional layer. When BN is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
Ĉ_ik = γ_ik · (C_ik − μ_ik) / √σ_ik + β_ik
where Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik respectively denote the mean and variance of the channel feature C_ik, obtained from statistics over the historical training data;
γ_ik acts as a scaling factor; the network uses this scaling factor as the weight of the feature channel, and the scaling factors are sparsified with the Lasso algorithm:
loss_new = loss_old + λ · Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|
where loss_new is the final loss function, loss_old is the improved loss function described above, λ is the sparsity coefficient, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;
finally, all γ parameters are sorted in descending order, and a given proportion of the smallest γ_ik (those ranked at the back), together with their corresponding feature channels and BN channels, are deleted. A pruning schematic of network slimming is shown in fig. 5.
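A generic network-slimming sketch of the two ingredients above is given below in PyTorch-style Python (the sparsity coefficient, pruning ratio and function names are assumptions, and the actual removal of the marked channels from the YOLOv3 graph is not shown):

```python
import torch
import torch.nn as nn

def bn_sparsity_penalty(model, lam=1e-4):
    """Sketch of the Lasso term added to the loss: lam * sum(|gamma_ik|) over
    all BN scaling factors (lam is an assumed sparsity coefficient)."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def smallest_gamma_channels(model, prune_ratio=0.5):
    """Sketch of channel selection: sort all BN gammas and mark the smallest
    prune_ratio fraction; the marked feature channels and their BN channels
    would then be removed from the network."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.sort(gammas)[0][int(gammas.numel() * prune_ratio)]
    return {name: (m.weight.detach().abs() < threshold)
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
```

During sparse training the penalty would simply be added to the improved loss, e.g. loss_new = loss_old + bn_sparsity_penalty(model).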
S6, face information of the resident on the floor where the finger points is obtained, and the face information of the resident can be registered in advance in an elevator background system;
Double verification is then performed on whether the elevator user is a resident and whether the floor the finger points to is the floor on which that user lives; the floor key is selected only when both verifications pass, and the elevator car is finally controlled to run to that floor. This ensures that the elevator user is a resident of that floor, and greatly improves the interactivity of the elevator and the safety of the residents.
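The dual-verification logic of step S6 can be sketched as follows (the face-matching function, similarity threshold and resident database structure are hypothetical placeholders for the background-system registration described above):

```python
def dual_verification(face_embedding, pointed_floor, resident_db, match_fn, threshold=0.6):
    """Sketch of step S6. resident_db maps resident_id -> (registered_embedding,
    home_floor); match_fn returns a similarity score between two face embeddings."""
    for resident_id, (registered_embedding, home_floor) in resident_db.items():
        if match_fn(face_embedding, registered_embedding) >= threshold:
            if home_floor == pointed_floor:
                return True, resident_id    # both checks pass: the key is selected
            return False, resident_id       # recognised resident, wrong floor
    return False, None                      # not a registered resident
```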
The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, micro-controllers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flows, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks. For example, the hardware is a computing device comprising a processor and a memory for storing a program executable by the processor, which when executing the program stored by the memory, implements the embeddable non-contact elevator key interaction method described above.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. An embeddable non-contact elevator key interaction method integrating face recognition is characterized by comprising the following steps:
s1, obtaining an original image shot by a camera in the elevator car, and carrying out edge detection on a shot area of the original image through a Laplace filter operator to obtain an edge image;
s2, filtering the edge image by using linear filtering operators in the horizontal direction and the vertical direction to enhance the linear edges in the horizontal direction and the vertical direction, and reserving the edges of the elevator key panel area while eliminating noise;
s3, respectively carrying out linear detection on the image subjected to horizontal direction filtering and the image subjected to vertical direction filtering by adopting a Hough linear detection algorithm so as to position the region of the elevator key panel;
s4, solving the mapping relation under the view angle transformation by using the homography transformation matrix;
s5, detecting and positioning the fingers of the elevator users in the original images by using an improved YOLOv3 algorithm, and obtaining floor keys to which the fingers point according to the homography transformation matrix;
s6, acquiring the face information of the resident of the floor pointed by the finger, performing double verification on whether the elevator user is the resident and whether the floor pointed by the finger is the floor where the elevator user lives, wherein the floor key is selected under the condition that the double verification is passed, and finally controlling the elevator car to run to the floor.
2. The method for interacting the embeddable non-contact elevator keys fusing human face recognition as claimed in claim 1, wherein the camera is installed above the elevator key panel and shoots the elevator key panel downward;
in step S1, the process of edge detection of the camera shooting area by the Laplace filter operator is as follows:
s11, carrying out graying processing on the original image to obtain a grayscale image;
s12, based on the principle of no-leakage elevator key panel boundary, adopting a second-order gradient Laplace filter operator to detect the edge of the gray image, wherein the Laplace filter operator specifically calculates the edge gradient by using second-order difference, and the process is as follows:
consider a one-dimensional sequence { f(1), f(2), …, f(x-1), f(x), f(x+1), … }; the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence equals the convolution of the sequence with the one-dimensional kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:
K_L = [ 0  1  0 ; 1  -4  1 ; 0  1  0 ]
Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and the convolution kernel K_L is replaced with:
K_L = [ 1  1  1 ; 1  -8  1 ; 1  1  1 ]
The second-order difference information of the grayscale image is obtained by convolving this kernel with the grayscale image, namely:
G = K_L * I_gray
The larger the convolution kernel, the more pronounced the detected edges;
the points whose convolution result is 0 are taken as edge points, and the edge image is the set of points with obvious gray-level change in the grayscale image.
3. The method for interacting keys of an embeddable non-contact elevator fusing human face recognition as claimed in claim 1, wherein the process of step S2 is as follows:
s21, defining the size as 1 x n pairs of horizontal straight line filter operators KhorizontalAnd a vertical line filter operator K of size n × 1vertical
Figure FDA0002911098430000023
Figure FDA0002911098430000024
In the formula, T represents the transposition of a vector pair, and n represents the size of a filter operator; khorizontalSensitive to horizontal linear edges, KverticalSensitive to vertical linear edges;
S22, convolving the Laplace edge image I_Laplace with the two operators respectively to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
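An informal sketch of step S2; since the exact operator values are given only in the patent drawings, a length-n all-ones averaging kernel is assumed here as a stand-in:

import cv2
import numpy as np

def directional_filter(I_laplace, n=15):
    # Assumed stand-in kernels: a 1 x n row kernel responds strongly along horizontal
    # lines, and its n x 1 transpose along vertical lines.
    K_horizontal = np.ones((1, n), dtype=np.float32) / n
    K_vertical = K_horizontal.T                       # K_vertical = K_horizontal^T
    I_horizontal = cv2.filter2D(I_laplace, cv2.CV_32F, K_horizontal)
    I_vertical = cv2.filter2D(I_laplace, cv2.CV_32F, K_vertical)
    return I_horizontal, I_vertical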
4. The method for interacting keys of an embeddable non-contact elevator fusing human face recognition as claimed in claim 1, wherein the process of step S3 is as follows:
S31, considering that filtering the edge image in the horizontal and vertical directions suppresses its edges that are neither horizontal nor vertical straight lines, first using a threshold to segment out and remove the non-horizontal and non-vertical straight-line edges;
S32, performing straight-line detection on the threshold-segmented horizontal-direction filtered image and vertical-direction filtered image respectively by using the Hough line detection algorithm, finally obtaining the four boundary straight lines of the elevator key panel;
S33, solving the pairwise intersection points of the four boundary straight lines of the elevator key panel to obtain the coordinates of the four vertices, namely the upper-left, lower-left, lower-right and upper-right vertices (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt), of the elevator key panel area in the original image.
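A hedged sketch of steps S31-S33 using OpenCV's standard Hough transform; the threshold and vote counts are assumptions, and keeping only the two strongest lines per direction is a simplification:

import cv2
import numpy as np

def line_intersection(l1, l2):
    # Lines come from cv2.HoughLines as (rho, theta); solve
    # [cos t1  sin t1; cos t2  sin t2] [x; y] = [rho1; rho2] for the intersection.
    A = np.array([[np.cos(l1[1]), np.sin(l1[1])],
                  [np.cos(l2[1]), np.sin(l2[1])]])
    return np.linalg.solve(A, np.array([l1[0], l2[0]]))

def panel_corners(I_horizontal, I_vertical, thresh=50, votes=120):
    # S31: threshold away responses that do not belong to strong straight-line edges
    _, bw_h = cv2.threshold(I_horizontal.astype(np.uint8), thresh, 255, cv2.THRESH_BINARY)
    _, bw_v = cv2.threshold(I_vertical.astype(np.uint8), thresh, 255, cv2.THRESH_BINARY)
    # S32: Hough straight-line detection on each filtered image, two lines per direction
    lines_h = cv2.HoughLines(bw_h, 1, np.pi / 180, votes)[:2, 0, :]
    lines_v = cv2.HoughLines(bw_v, 1, np.pi / 180, votes)[:2, 0, :]
    # S33: pairwise intersections of the boundary lines give the four panel vertices
    return [line_intersection(lh, lv) for lh in lines_h for lv in lines_v]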
5. The method for interacting the keys of the embeddable non-contact elevator integrating the face recognition as claimed in claim 4, wherein the homography transformation reflects the process of mapping from one two-dimensional plane into a three-dimensional space and then from the three-dimensional space onto another two-dimensional plane; X-Y-Z is a three-dimensional space coordinate system, which can be understood as a world coordinate system, x-y is the pixel plane coordinate system, and x'-y' is the elevator key panel plane coordinate system; the homography transformation can be described as: a point (x, y) in the x-y coordinate system corresponds, in the X-Y-Z coordinate system, to a straight line passing through the origin and that point
[equation image: the corresponding straight line through the origin in the X-Y-Z coordinate system]
this straight line intersects the x'-y' coordinate plane at the point (x', y'), and the process from the point (x, y) to the point (x', y') is called the homography transformation;
the process of solving the mapping relation under the view transformation by using the homography transformation matrix is as follows:
S41, setting the x'-y' plane perpendicular to the Z axis of the X-Y-Z space coordinate system and intersecting the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system, and describing the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system by the homography transformation matrix H:
[X, Y, Z]^T = H · [x, y, 1]^T
H = [ h1  h2  h3 ]
    [ h4  h5  h6 ]
    [ h7  h8  h9 ]
in the formula, h1~h9 are the 9 transformation parameters of the homography matrix;
further, the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is obtained as:
x' = X / Z = (h1·x + h2·y + h3) / (h7·x + h8·y + h9)
y' = Y / Z = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system, in which scaling the coordinates does not change the point they represent; when the H matrix is multiplied by a scaling factor k:
k·H = [ k·h1  k·h2  k·h3 ]
      [ k·h4  k·h5  k·h6 ]
      [ k·h7  k·h8  k·h9 ]
k·H and H represent the same mapping relation, so H has only 8 degrees of freedom;
S42, solving for H, wherein one method is to set h9 to 1, so that the equations to be solved become:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + 1)
y' = (h4·x + h5·y + h6) / (h7·x + h8·y + 1)
another approach is to add a unit-norm constraint to the homography matrix H, as follows:
h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equation to be solved is then:
[equation image: the mapping equations for h1~h9 combined with the unit-norm constraint above]
S43, defining, for the four vertices of the elevator key panel obtained in the pixel coordinate system in step S3, target coordinate points in the scene coordinate system of the elevator key panel:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
these coordinate pairs are respectively substituted into the equations to be solved of step S42, and the H matrix is obtained by solving the resulting simultaneous equations.
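As an illustrative cross-check of S41-S43 (not the claimed procedure itself), the four vertex correspondences can be fed to OpenCV, which solves the same 8-degree-of-freedom system with h9 normalised to 1; the numeric coordinates below are made-up examples:

import cv2
import numpy as np

# Pixel-space vertices from step S3, in the order lt, lb, rb, rt (example values)
src = np.float32([[102, 88], [110, 402], [415, 395], [408, 80]])
# Target coordinates in the elevator key panel plane coordinate system x'-y' (assumed layout)
dst = np.float32([[0, 0], [0, 300], [200, 300], [200, 0]])

H = cv2.getPerspectiveTransform(src, dst)            # 3x3 homography matrix

def map_point(H, x, y):
    # Apply the claimed mapping x-y -> x'-y': divide the first two rows by the third
    X, Y, Z = H @ np.array([x, y, 1.0])
    return X / Z, Y / Z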
6. The method of claim 1, wherein the improved YOLOv3 algorithm comprises improving its loss function based on YOLOv3 target detection algorithm, and using an adaptive pruning algorithm to reduce the feature extraction part of YOLOv3 network.
7. The method for interacting embedded non-contact elevator keys fusing face recognition according to claim 6, characterized in that the loss function of the YOLOv3 network is designed as follows:
[equation image: the complete loss function, consisting of a coordinate-error term, two confidence terms and a category term summed over the S × S grids and B frames]
wherein the first term is the coordinate error loss; λ_coord is the coordinate loss function coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of frames contained in one grid;
1_ij^obj indicates whether the jth frame of the ith grid contains an object, taking the value 1 when an object is contained and 0 otherwise; x and y denote the center coordinates of a frame; w and h denote the width and height of a frame; r_ij denotes the (x, y, w, h) of the jth predicted frame of the ith grid;
r̂_ij denotes the (x, y, w, h) of the jth real frame of the ith grid;
the second term and the third term are the confidence loss;
1_ij^noobj indicates whether the jth frame of the ith grid contains no object, taking the value 1 when no object is contained and 0 otherwise; λ_noobj balances the loss weights of grids with and without objects, with the goal of reducing the confidence loss of frames in grids that contain no object; C_ij denotes the predicted confidence of the jth frame of the ith grid;
Ĉ_ij denotes the true confidence of the jth frame of the ith grid;
the fourth term is the category loss; classes denotes the number of categories; p_ij(c) denotes the predicted probability that the jth frame of the ith grid belongs to an object of class c;
p̂_ij(c) denotes the true probability that the jth frame of the ith grid belongs to an object of class c;
the improvement of the loss function is specifically as follows:
(1) introducing Focal Loss into the third term, i.e. the confidence loss, to improve the learning ability of the model on difficult samples, wherein Focal Loss is an improvement based on the cross entropy and has the following functional form:
[equation image: the Focal Loss function]
in the formula, y and ŷ respectively denote the predicted and true probability values, i.e. p_ij(c) and p̂_ij(c); α is a Focal Loss hyperparameter;
the improved confidence loss function is as follows:
[equation image: the confidence loss terms with Focal Loss substituted for the cross entropy]
(2) adding an adaptive scaling factor to the first term, i.e. the coordinate loss, as follows:
[equation image: definition of the adaptive scaling factor ρ_box]
in the formula, ŵ and ĥ denote the width and height of the real bounding box; ρ_box lies in the range 1 to 2, and the smaller the real frame, the larger its value;
the improved coordinate loss is as follows:
[equation image: the coordinate loss with the adaptive scaling factor ρ_box applied]
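A small PyTorch-style sketch of the two modifications, for illustration only; the exact functional forms are in the patent drawings, so the γ exponent and the (2 − ŵ·ĥ) form of ρ_box are assumptions based on the standard Focal Loss and the stated 1-2 range:

import torch

def focal_confidence_loss(c_pred, c_true, alpha=0.25, gamma=2.0):
    # Assumed binary Focal Loss replacing the cross-entropy confidence term:
    # easy examples are down-weighted so that difficult samples dominate.
    eps = 1e-7
    c_pred = c_pred.clamp(eps, 1 - eps)
    pos = -alpha * (1 - c_pred) ** gamma * c_true * torch.log(c_pred)
    neg = -(1 - alpha) * c_pred ** gamma * (1 - c_true) * torch.log(1 - c_pred)
    return (pos + neg).sum()

def scaled_coord_loss(box_pred, box_true):
    # Assumed adaptive factor: with normalised true width/height, rho_box = 2 - w*h
    # lies in (1, 2] and grows as the real frame shrinks.
    rho_box = 2.0 - box_true[..., 2] * box_true[..., 3]
    sq_err = ((box_pred - box_true) ** 2).sum(dim=-1)
    return (rho_box * sq_err).sum()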
8. The method for interacting the keys of the embeddable non-contact elevator fusing the face recognition as claimed in claim 6, wherein the YOLOv3 network adopts Darknet-53 as the feature extraction backbone, and, aiming at the complexity redundancy of Darknet-53, the network is pruned at the channel level by a network pruning algorithm based on structured pruning, so as to reduce the number of feature channels of the network:
first, a BN layer is added after each convolutional layer; when the BN operation is used in a convolutional neural network, each input feature channel is allocated its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
[equation image: the BN layer output Ĉ_ik as a function of C_ik, μ_ik, σ_ik, γ_ik and β_ik]
in the formula, Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik respectively denote the mean and variance of the channel feature C_ik, obtained from historical training data statistics;
γ_ik corresponds to a scaling factor; the network uses this scaling factor as the weight of the feature channel, and the scaling factors are sparsified by the Lasso algorithm:
[equation image: loss_new formed by adding an L1 (Lasso) penalty on all scaling factors γ_ik, over all Layers and Channels, to loss_old]
in the formula, loss_new is the final loss function, loss_old is the improved loss function, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;
finally, all γ parameters are sorted from large to small, and the feature channels and BN channels corresponding to the lowest-ranked γ_ik are then deleted according to a set pruning proportion.
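A hedged PyTorch sketch of the channel selection just described: collect every BN scaling factor γ, sort them, and mark the smallest fraction for removal; the pruning ratio and the use of nn.BatchNorm2d modules are assumptions, and the actual removal of the marked channels is omitted:

import torch
import torch.nn as nn

def bn_channel_prune_masks(model, prune_ratio=0.5):
    # Gather the absolute gamma (BN weight) values of every BN layer in the network
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    # Sort from large to small and locate the cut-off value for the given proportion
    sorted_g, _ = torch.sort(gammas, descending=True)
    n_keep = max(int(len(sorted_g) * (1 - prune_ratio)), 1)
    threshold = sorted_g[n_keep - 1]
    # Per-layer boolean masks: channels whose gamma falls below the cut-off are pruned
    return {name: m.weight.data.abs() >= threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

During training, the Lasso-style sparsification of the γ factors can be reproduced by adding an L1 penalty on these same BN weights to the loss.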
9. A computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the method of face recognition fused embeddable non-contact elevator key interaction of any of claims 1 to 8.
10. An elevator, characterized in that the elevator realizes the identification of floor keys and the operation control of a car through the embedded non-contact elevator key interaction method integrating the face identification according to any one of claims 1 to 8.
CN202110086981.6A 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method Active CN113220114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110086981.6A CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Publications (2)

Publication Number Publication Date
CN113220114A true CN113220114A (en) 2021-08-06
CN113220114B CN113220114B (en) 2023-06-20

Family

ID=77084468




Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080244468A1 (en) * 2006-07-13 2008-10-02 Nishihara H Keith Gesture Recognition Interface System with Vertical Display
EP1887317A1 (en) * 2006-08-04 2008-02-13 Fasep 2000 S.r.l. Method and device for non-contact measurement of the alignment of motor vehicle wheels
CN102701033A (en) * 2012-05-08 2012-10-03 华南理工大学 Elevator key and method based on image recognition technology
US20130310951A1 (en) * 2012-05-21 2013-11-21 Ftsi, Llc Automation and motion control system
US20140267042A1 (en) * 2013-03-13 2014-09-18 Jeremy Burr Gesture pre-processing of video stream using skintone detection
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
JP2019177973A (en) * 2018-03-30 2019-10-17 三菱電機株式会社 Input apparatus and input method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658088A (en) * 2021-08-27 2021-11-16 诺华视创电影科技(江苏)有限公司 Face synthesis method and device based on multiple discriminators
CN113658088B (en) * 2021-08-27 2022-12-02 诺华视创电影科技(江苏)有限公司 Face synthesis method and device based on multiple discriminators
TWI836406B (en) * 2022-04-20 2024-03-21 邁啟科技股份有限公司 Method for the non-contact triggering of buttons
CN115969144A (en) * 2023-01-09 2023-04-18 东莞市智睿智能科技有限公司 Sole glue spraying track generation method, system, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant