CN113220114A - Embedded non-contact elevator key interaction method integrating face recognition - Google Patents

Embedded non-contact elevator key interaction method integrating face recognition

Info

Publication number
CN113220114A
CN113220114A (application CN202110086981.6A)
Authority
CN
China
Prior art keywords
elevator
image
coordinate system
horizontal
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110086981.6A
Other languages
Chinese (zh)
Other versions
CN113220114B (en)
Inventor
谢巍
许练濠
卢永辉
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110086981.6A priority Critical patent/CN113220114B/en
Publication of CN113220114A publication Critical patent/CN113220114A/en
Application granted granted Critical
Publication of CN113220114B publication Critical patent/CN113220114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/12 - Edge-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/753 - Transform-based matching, e.g. Hough transform
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 50/00 - Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Processing (AREA)
  • Indicating And Signalling Devices For Elevators (AREA)

Abstract

The invention discloses an embeddable non-contact elevator key interaction method integrating face recognition. Edge detection is first performed on the shooting area of the original image with a Laplace filter operator to obtain an edge image, and the edge image is filtered with line filter operators in the horizontal and vertical directions; a Hough line detection algorithm is then applied to the horizontally filtered image and the vertically filtered image respectively, so as to locate the region of the elevator key panel and solve a homography transformation matrix; the finger of the elevator user is detected and located with an improved YOLOv3 algorithm, the floor key the finger points to is obtained through the homography transformation matrix, and the face information of the resident is acquired at the same time for double verification. The invention can accurately identify the elevator key selected by the elevator user, enables non-contact use of the elevator, and ensures the safety of residents through double verification of the floor and the resident's face information.

Description

Embedded non-contact elevator key interaction method integrating face recognition
Technical Field
The invention relates to the technical field of computer vision and human-computer interaction, in particular to an embeddable non-contact elevator key interaction method integrating face recognition.
Background
Elevators are now widely used in urban high-rise buildings and have become an indispensable means of transport for people living and working at height. Elevator buttons are generally contact-based: people must touch the buttons to select the target floor and to open or close the elevator door, and different people press the same buttons every day. As a result, elevator buttons carry various bacteria and viruses, easily cause cross infection, and increase the probability of transmission.
With the development of science and technology, human-computer interaction has become diversified: people are no longer satisfied with simply being presented a virtual scene and have begun to explore ways of interacting with the virtual world, so more and more novel human-computer interaction technologies have emerged. Human-computer interaction techniques fall into several categories: traditional interaction with keyboard and mouse as input; interaction based on touch-screen devices, such as smartphones and tablet computers; and non-contact interaction based on machine vision and image processing, such as virtual keyboards and gesture interaction systems.
Hiroki Goto et al. studied a camera-projection interaction system based on a frame-difference method and hand skin-color extraction: the hand is first separated from the scene based on the clustering characteristics of hand skin color in the HSV and YCbCr spaces, and fingertip positions are then detected on the separated foreground image with a template-matching method, realizing projection interaction between a user and a computer or home television. Fitriani et al. proposed a human-computer interaction system based on a deformable projection surface, which projects a virtual scene onto the surface of an easily deformed object, detects the deformation produced when a user touches the projection screen, and analyzes the interaction information through an image processing algorithm and a deformation model of the object.
However, these solutions based on machine vision techniques and image processing algorithms share a common drawback: they cannot cope with diverse projection scenes. For example, in an interaction system based on hand skin color, when the projected scene is similar to the skin color of the hand, the performance of the hand-foreground separation algorithm degrades sharply. An interaction system based on a deformable surface can run stably in the projection scene it was designed for, but in a changing projection scene the deformation detection of the projected image becomes inaccurate; different schemes must be designed for different scenes, so the development cost of such systems is high.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provides an embeddable non-contact elevator key interaction method integrating face recognition.
It is a second object of the invention to provide a computing device.
A third object of the present invention is to provide an elevator.
The first purpose of the invention is realized by the following technical scheme: an embeddable non-contact elevator key interaction method integrating face recognition comprises the following steps:
s1, obtaining an original image shot by a camera in the elevator car, and carrying out edge detection on a shot area of the original image through a Laplace filter operator to obtain an edge image;
s2, filtering the edge image by using linear filtering operators in the horizontal direction and the vertical direction to enhance the linear edges in the horizontal direction and the vertical direction, and reserving the edges of the elevator key panel area while eliminating noise;
s3, respectively carrying out linear detection on the image subjected to horizontal direction filtering and the image subjected to vertical direction filtering by adopting a Hough linear detection algorithm so as to position the region of the elevator key panel;
s4, solving the mapping relation under the view angle transformation by using the homography transformation matrix;
s5, detecting and positioning the fingers of the elevator users in the original images by using an improved YOLOv3 algorithm, and obtaining floor keys to which the fingers point according to the homography transformation matrix;
s6, acquiring the face information of the resident of the floor pointed by the finger, performing double verification on whether the elevator user is a resident and whether the floor pointed by the finger is the floor where the elevator user lives, wherein the floor key is selected under the condition that the double verification is passed, and finally controlling the elevator car to run to the floor.
Preferably, the camera is arranged above the elevator key panel and shoots the elevator key panel downwards;
in step S1, the process of edge detection of the camera shooting region by the Laplace filter operator is as follows:
s11, carrying out graying processing on the original image to obtain a grayscale image;
S12, based on the principle of not missing any elevator key panel boundary, a second-order-gradient Laplace filter operator is used to detect edges of the grayscale image; the Laplace filter operator computes the edge gradient with a second-order difference, as follows:
consider a one-dimensional sequence { f(1), f(2), …, f(x-1), f(x), f(x+1), … }; the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence equals the convolution of the sequence with the one-dimensional kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:
K_L = [ 0  1  0 ; 1  -4  1 ; 0  1  0 ]
Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and the convolution kernel K_L is replaced with:
K_L = [ 1  1  1 ; 1  -8  1 ; 1  1  1 ]
The second-order difference information of the grayscale image is obtained by convolving this kernel with the grayscale image, namely:
G = K_L * I_gray
The larger the convolution kernel, the more pronounced the detected edges;
the points whose convolution result is 0 are taken as edge points, and the edge image is the set of points with obvious gray-level change in the grayscale image.
Preferably, the process of step S2 is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:
K_horizontal = [ 1  1  …  1 ]  (1×n)
K_vertical = K_horizontal^T  (n×1)
where T denotes vector transposition and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal linear edges and K_vertical is sensitive to vertical linear edges;
S22, convolve the Laplace edge image I_Laplace with the two operators respectively to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
Preferably, the process of step S3 is as follows:
s31, considering that after the edge image is filtered in the horizontal direction and the vertical direction, the non-horizontal or vertical linear edge of the edge image can be inhibited, firstly, a threshold is used for segmenting and removing the non-horizontal linear edge and the non-vertical linear edge;
s32, respectively carrying out linear detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough linear detection algorithm, and finally obtaining four elevator key panel boundary straight lines;
S33, solve the pairwise intersections of the four elevator key panel boundary lines to obtain the four vertex coordinates of the elevator key panel region in the original image, namely the upper-left, lower-left, lower-right and upper-right vertices (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
Furthermore, a homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Let X-Y-Z be the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y the pixel-plane coordinate system, and x'-y' the elevator key panel plane coordinate system; the homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and that point:
l = { s·(x, y, 1) | s ∈ ℝ }
the straight line intersects the x'-y' coordinate plane at a point (x', y'), and the process from point (x, y) to point (x', y') is called the homography transformation;
the process of solving the mapping relation under the view transformation with the homography transformation matrix is as follows:
S41, let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system, and describe the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system with a homography transformation matrix H:
H = [ h1  h2  h3 ; h4  h5  h6 ; h7  h8  h9 ]
[ X, Y, Z ]^T = H · [ x, y, 1 ]^T
in the formula, h1~h9 are the 9 transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
The H matrix has 9 transformation parameters but actually only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system and the mapping is unaffected by coordinate scaling; when the H matrix is multiplied by a scaling factor k:
k·H = [ k·h1  k·h2  k·h3 ; k·h4  k·h5  k·h6 ; k·h7  k·h8  k·h9 ]
k·H and H represent the same mapping relationship, so H has only 8 degrees of freedom;
S42, solve H; one method is to set h9 to 1, and the equations to be solved are:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + 1),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + 1)
another method is to add a constraint to the homography matrix H so that its modulus is 1, as follows:
h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equations to be solved are then:
x'·(h7·x + h8·y + h9) = h1·x + h2·y + h3,  y'·(h7·x + h8·y + h9) = h4·x + h5·y + h6
S43, for the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define their target coordinate points in the elevator key panel scene coordinate system:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
these target coordinates are respectively substituted into the equation to be solved in step S42, and the H matrix is solved simultaneously.
Preferably, the improved YOLOv3 algorithm comprises improving the loss function of the YOLOv3 target detection algorithm and reducing the feature extraction part of the YOLOv3 network by adopting an adaptive pruning algorithm.
Further, the loss function of the YOLOv3 network is designed as follows:
Loss = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ‖r_ij − r̂_ij‖²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · (C_ij − Ĉ_ij)²
     + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · (C_ij − Ĉ_ij)²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · Σ_{c∈classes} (p_ij(c) − p̂_ij(c))²
where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S×S grids; B denotes the number of boxes contained in one grid; 1_ij^obj indicates whether the jth box of the ith grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the jth predicted box of the ith grid, and r̂_ij denotes the x, y, w, h of the jth ground-truth box of the ith grid;
the second and third terms are the confidence loss, where 1_ij^noobj indicates whether the jth box of the ith grid contains no object, taking the value 1 if it does not and 0 if it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss contributed by grid boxes without objects; C_ij denotes the predicted confidence of the jth box of the ith grid, and Ĉ_ij denotes the ground-truth confidence of the jth box of the ith grid;
the fourth term is the category loss, where classes denotes the number of categories; p_ij(c) denotes the predicted probability that the jth box of the ith grid belongs to a class-c object, and p̂_ij(c) denotes the true probability that the jth box of the ith grid belongs to a class-c object;
the improvement of the loss function is specifically as follows:
(1) FocalLoss is introduced into the third term, namely the confidence loss, to improve the model's ability to learn difficult samples. FocalLoss is an improvement on the cross entropy and has the following functional form:
FL(y, ŷ) = −ŷ · (1 − y)^α · log(y) − (1 − ŷ) · y^α · log(1 − y)
where y and ŷ respectively denote the predicted and true probability values, i.e. p_ij(c) and p̂_ij(c), and α is the FocalLoss hyperparameter;
the improved confidence loss function is as follows:
Loss_conf = Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)
(2) An adaptive scaling factor is added to the first term, the coordinate loss, as follows:
ρ_box = 2 − ŵ_ij · ĥ_ij
where ŵ_ij and ĥ_ij denote the width and height of the ground-truth bounding box; ρ_box ranges in (1, 2), and the smaller the ground-truth box, the larger its value;
the improved coordinate loss is as follows:
Loss_coord = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ρ_box · ‖r_ij − r̂_ij‖²
Furthermore, the YOLOv3 network uses darknet-53 as its feature extraction backbone; to address the complexity redundancy of darknet-53, a network pruning algorithm from the structure-based pruning methods is applied to prune the network at the channel level, reducing the number of feature channels of the network:
First, a BN layer is added after each convolutional layer. When BN is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
Ĉ_ik = γ_ik · (C_ik − μ_ik) / √σ_ik + β_ik
where Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik respectively denote the mean and variance of the channel feature C_ik, obtained from statistics over the historical training data;
γ_ik acts as a scaling factor; the network uses this scaling factor as the weight of the feature channel, and the scaling factors are sparsified with the Lasso algorithm:
loss_new = loss_old + λ · Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|
where loss_new is the final loss function, loss_old is the improved loss function described above, λ is the sparsity coefficient, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;
finally, all γ parameters are sorted in descending order, and a given proportion of the smallest γ_ik, together with their corresponding feature channels and BN channels, are deleted.
The second purpose of the invention is realized by the following technical scheme: the computer device comprises a processor and a memory for storing a program executable by the processor, and when the processor executes the program stored by the memory, the embedded non-contact elevator key interaction method fusing face recognition and achieving the first purpose of the invention is realized.
The third purpose of the invention is realized by the following technical scheme: the elevator realizes the identification of floor keys and the operation control of a lift car through the embedded non-contact elevator key interaction method integrating the face identification.
Compared with the prior art, the invention has the following advantages and effects:
(1) the method comprises the steps of firstly positioning the region of the elevator key panel in the image through edge detection, filtering and linear detection operations, solving a homography transformation matrix, then detecting the fingers of the elevator user in the image by utilizing a deep learning technology, and obtaining the floor keys selected by the fingers of the elevator user according to the solved homography transformation matrix. The invention avoids the interference of environmental factors on target detection, improves the accuracy of the selected floor key identification, and also ensures that the method can be applied to changeable environments and has more diversity in interactive scenes.
(2) The method can be applied to realize the non-contact elevator key during the epidemic situation, and avoids the cross infection caused by the multiple touch of the elevator key by multiple people.
(3) The invention identifies the floor key selected by the elevator user through computer vision and adds face recognition to form a double verification, ensuring that the people entering and leaving the target floor are residents or are led by a resident, which greatly improves the interactivity of the elevator and the safety of the residents.
(4) The YOLOv3 algorithm has advantages in speed, and on the basis, the training speed of the YOLOv3 network can be further improved by improving the learning capacity of the YOLOv3 network on difficult samples and improving the loss of small objects; by reducing the number of the characteristic channels of the YOLOv3 network, the calculation complexity can be further reduced, the target detection efficiency is greatly improved, and the real-time detection is facilitated.
(5) In the invention, because the extracted edge image is a merged image containing both horizontal and vertical edges, further filtering it with the horizontal line filter operator and the vertical line filter operator splits it into a horizontal filtered image and a vertical filtered image before line detection. This avoids the redundant detection that would occur after merging the edges of the horizontal and vertical channels, and effectively reduces the complexity of the line detection algorithm.
Drawings
Fig. 1 is a flow chart of an embeddable non-contact elevator key interaction method of the present invention incorporating face recognition.
Fig. 2 is a schematic diagram of hough line detection algorithm in cartesian coordinates.
Fig. 3 is a schematic diagram of hough line detection algorithm of polar coordinate system.
Fig. 4 is a schematic diagram of a homography transform.
Fig. 5 is a schematic pruning diagram of network slimming.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The embodiment discloses an embeddable non-contact elevator key interaction method integrating face recognition, which can be applied to an elevator, and the elevator realizes the recognition of floor keys and the operation control of a car through the method. As shown in fig. 1, the method comprises the steps of:
and S1, acquiring an original image shot by a camera in the elevator car, wherein the camera is arranged above the elevator key panel and shoots the elevator key panel downwards at a certain angle.
Then, edge detection is carried out on the shooting area of the original image through a Laplace filter operator, so that an edge image is obtained:
s11, carrying out graying processing on the original image to obtain a grayscale image;
S12, because edges are the set of points in an image where the brightness changes sharply, and the gradient reflects how fast the values change, a second-order-gradient Laplace filter operator with a large-scale convolution kernel is used to detect edges of the grayscale image, based on the principle of not missing any elevator key panel boundary. The Laplace filter operator computes the edge gradient with a second-order difference, as follows:
consider a one-dimensional sequence { f(1), f(2), …, f(x-1), f(x), f(x+1), … }; the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence equals the convolution of the sequence with the one-dimensional kernel [+1, -2, +1]; here a one-dimensional sequence can be understood as a row or column of pixel values in the horizontal or vertical direction. Generalizing this conclusion to the two-dimensional grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:
K_L = [ 0  1  0 ; 1  -4  1 ; 0  1  0 ]
Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and the convolution kernel K_L is replaced with:
K_L = [ 1  1  1 ; 1  -8  1 ; 1  1  1 ]
The second-order difference information of the grayscale image is obtained by convolving this kernel with the image, namely:
G = K_L * I_gray
The convolution kernel K_L is the Laplace filter operator; the larger the convolution kernel, the more pronounced the detected edges.
The points whose convolution result is 0 are taken as edge points, and the edge image is the set of points with obvious gray-level change in the grayscale image. The extracted edge image is a merged image containing both horizontal and vertical edges.
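As an illustrative sketch of step S1 (assuming an OpenCV/NumPy implementation; the function name and the sign-change test used to read "points whose convolution result is 0" are assumptions, not the patent's reference code):

```python
import cv2
import numpy as np

def laplace_edge_image(original_bgr):
    """Sketch of step S1: graying + 8-connected Laplace filtering + zero test."""
    gray = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # 8-connected Laplace kernel K_L (diagonal directions included), as above.
    k_l = np.array([[1,  1, 1],
                    [1, -8, 1],
                    [1,  1, 1]], dtype=np.float32)

    # Second-order difference image G = K_L * I_gray.
    g = cv2.filter2D(gray, cv2.CV_32F, k_l)

    # One simple reading of "points whose convolution result is 0":
    # mark pixels where the Laplace response changes sign against a neighbour.
    s = np.sign(g)
    zc = np.zeros(g.shape, dtype=bool)
    zc[:, :-1] |= (s[:, :-1] * s[:, 1:]) < 0
    zc[:-1, :] |= (s[:-1, :] * s[1:, :]) < 0
    return zc.astype(np.uint8) * 255
```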
And S2, filtering the edge image by utilizing the horizontal direction and vertical direction straight line filtering operators.
Because the edge image obtained with the large-scale Laplace convolution kernel contains many noise points, and the key to locating the elevator key panel region is locating its four boundary straight lines, which appear horizontal or vertical in the image, line filter operators in the horizontal and vertical directions are used to enhance the horizontal and vertical straight-line edges, preserving the edges of the elevator key panel region while eliminating noise. The filtering process is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:
K_horizontal = [ 1  1  …  1 ]  (1×n)
K_vertical = K_horizontal^T  (n×1)
where T denotes vector transposition and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal linear edges, K_vertical is sensitive to vertical linear edges, and the two operators effectively eliminate isolated-point noise. In general, the larger n is, the stricter the requirement on line length and the better the suppression of non-linear noise; but if n is too large, the sensitivity to the line angle also increases and slightly tilted lines may be filtered out. Since the boundary of the projection region in the captured image is generally not strictly horizontal or vertical, n cannot be set too large and must be chosen according to the actual situation.
S22, convolve the Laplace edge image I_Laplace with the two operators respectively to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
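A minimal sketch of step S2 follows, assuming all-ones 1×n and n×1 kernels and n = 15 (both assumptions; n must be tuned as discussed above):

```python
import cv2
import numpy as np

def directional_line_filter(edge_image, n=15):
    """Sketch of step S2: enhance horizontal / vertical straight-line edges.

    Larger n demands longer straight runs but becomes more sensitive to a
    slight tilt of the panel borders, so n is a tunable assumption.
    """
    k_horizontal = np.ones((1, n), dtype=np.float32)  # sensitive to horizontal lines
    k_vertical = k_horizontal.T                       # sensitive to vertical lines

    img = edge_image.astype(np.float32)
    i_horizontal = cv2.filter2D(img, cv2.CV_32F, k_horizontal)
    i_vertical = cv2.filter2D(img, cv2.CV_32F, k_vertical)
    return i_horizontal, i_vertical
```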
S3, respectively carrying out linear detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough linear detection algorithm so as to position the region of the elevator key panel:
s31, considering that after the edge image is filtered in the horizontal direction and the vertical direction, the non-horizontal or vertical linear edge of the edge image can be inhibited, firstly, a threshold is used for segmenting and removing the non-horizontal linear edge and the non-vertical linear edge;
and S32, respectively carrying out linear detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough linear detection algorithm, and finally obtaining four elevator key panel boundary straight lines.
Because the edge image extracted in step S1 is a merged image including horizontal edges and vertical edges, the edge image is further filtered by using the horizontal line filter operator and the vertical filter operator in step S2 to be divided into a filtered image including only horizontal direction and a filtered image including only vertical direction, and then the line detection is performed in step S3, so that redundant detection after merging the edges of the horizontal channel and the vertical channel can be avoided, and the complexity of the line detection algorithm is effectively reduced.
The hough line detection algorithm maps each point on the cartesian coordinate system to a straight line in the hough space by using the principle of point-line duality between the cartesian coordinate system and the hough space, so that an intersection point of a plurality of straight lines in the hough space corresponds to a straight line passing through a plurality of points in the cartesian coordinate system.
Specifically, a straight line in the Cartesian coordinate system is written y = kx + b, where (x, y) denotes a coordinate point, k the slope of the line and b its intercept. The line is rewritten as b = y − xk; taking k as the horizontal coordinate and b as the vertical coordinate of the Hough space, b = y − xk is a straight line in the Hough space with slope −x and intercept y. Several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in the Hough space, and their common intersection point (k, b) gives the slope and intercept of that straight line in the Cartesian coordinate system; a schematic diagram is shown in fig. 2.
Since the slope of a vertical line in the image cannot be computed, the Hough transform is usually carried out in polar form. Specifically, a straight line is expressed by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e. the distance from the origin to the line, and θ is the polar angle, i.e. the angle between the x-axis and the line segment that passes through the origin and is perpendicular to the line. Taking θ as the horizontal coordinate and ρ as the vertical coordinate of the Hough space, several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line correspond to several curves in the Hough space, and their common intersection point (θ, ρ) gives the polar angle and polar distance of that straight line; a schematic diagram is shown in fig. 3.
S33, solve the pairwise intersections of the four elevator key panel boundary lines to obtain the four vertex coordinates of the elevator key panel region in the original image, namely the upper-left, lower-left, lower-right and upper-right vertices (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
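An illustrative sketch of step S3, assuming OpenCV's polar-form Hough transform; the threshold values are assumptions, and the sketch simply assumes the two strongest lines in each filtered image are the panel borders:

```python
import cv2
import numpy as np

def locate_panel_corners(i_horizontal, i_vertical, bin_thresh=200, hough_thresh=150):
    """Sketch of step S3: threshold segmentation, Hough line detection and
    pairwise intersection of the four panel boundary lines."""
    bh = (np.clip(i_horizontal, 0, 255) > bin_thresh).astype(np.uint8) * 255
    bv = (np.clip(i_vertical, 0, 255) > bin_thresh).astype(np.uint8) * 255

    # Polar-form Hough transform: rho = x*cos(theta) + y*sin(theta).
    h_lines = cv2.HoughLines(bh, 1, np.pi / 180, hough_thresh)[:2]
    v_lines = cv2.HoughLines(bv, 1, np.pi / 180, hough_thresh)[:2]

    def intersect(l1, l2):
        (r1, t1), (r2, t2) = l1[0], l2[0]
        a = np.array([[np.cos(t1), np.sin(t1)],
                      [np.cos(t2), np.sin(t2)]])
        return np.linalg.solve(a, np.array([r1, r2]))  # (x, y) intersection

    # Four pairwise intersections = the four panel vertices.
    return [intersect(h, v) for h in h_lines for v in v_lines]
```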
S4, solving the mapping relation under the view angle transformation by using the homography transformation matrix:
S41, a homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Let X-Y-Z be the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y the pixel-plane coordinate system, and x'-y' the elevator key panel plane coordinate system. The homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and that point:
l = { s·(x, y, 1) | s ∈ ℝ }
The straight line intersects the x'-y' coordinate plane at a point (x', y'), and the process from point (x, y) to point (x', y') is called the homography transformation.
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described with a homography transformation matrix H:
H = [ h1  h2  h3 ; h4  h5  h6 ; h7  h8  h9 ]
[ X, Y, Z ]^T = H · [ x, y, 1 ]^T
in the formula, h1~h9 are the 9 transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
The H matrix has 9 transformation parameters but actually only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system and the mapping is unaffected by coordinate scaling; when the H matrix is multiplied by a scaling factor k:
k·H = [ k·h1  k·h2  k·h3 ; k·h4  k·h5  k·h6 ; k·h7  k·h8  k·h9 ]
k·H and H represent the same mapping relationship, so H has only 8 degrees of freedom;
S42, solve H; one method is to set h9 to 1, and the equations to be solved are:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + 1),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + 1)
another method is to add a constraint to the homography matrix H so that its modulus is 1, as follows:
h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equations to be solved are then:
x'·(h7·x + h8·y + h9) = h1·x + h2·y + h3,  y'·(h7·x + h8·y + h9) = h4·x + h5·y + h6
S43, for the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define their target coordinate points in the elevator key panel scene coordinate system:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
These target coordinates are substituted into the equations to be solved in step S42; since the four vertex coordinates in the pixel coordinate system have already been obtained, the resulting simultaneous equations can be solved for the H matrix.
S5, detecting and positioning the fingers of the elevator user in the original image by using an improved YOLOv3 algorithm, mapping and converting the position coordinates of the fingers through a homography transformation matrix to obtain corresponding position coordinates in an elevator key panel, and determining which floor key the position coordinates are located at, namely determining the floor key the fingers point to.
The input of the network is an original image collected by a camera in an elevator car, the output is the position coordinate (x, y, w, h) and the confidence of the finger of an elevator user in the original image, and the original image with the known position coordinate, confidence (1 or 0) and classification probability (namely the probability of the finger) of the finger of the elevator user is used as training data during training. The loss function of the network is designed before the network training.
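The mapping part of step S5 can be sketched as follows (the key-center lookup table, the distance threshold and the function name are hypothetical placeholders, not part of the patent's reference implementation):

```python
import cv2
import numpy as np

def finger_to_floor_key(finger_xy, h_matrix, key_centers, max_dist=30.0):
    """Sketch of step S5 (mapping part): project the detected finger position
    into the panel plane with H and pick the nearest floor key.

    key_centers: {floor_label: (x', y')} in the panel coordinate system,
    a hypothetical per-elevator lookup table; max_dist is an assumed radius.
    """
    pt = np.array([[finger_xy]], dtype=np.float32)          # shape (1, 1, 2)
    mapped = cv2.perspectiveTransform(pt, h_matrix)[0, 0]    # (x', y')

    best, best_d = None, max_dist
    for floor, (cx, cy) in key_centers.items():
        d = float(np.hypot(mapped[0] - cx, mapped[1] - cy))
        if d < best_d:
            best, best_d = floor, d
    return best  # None when the finger is not close enough to any key
```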
Here, the improved YOLOv3 algorithm includes improving its loss function on the basis of the YOLOv3 target detection algorithm (i.e., YOLOv3 network), and reducing the feature extraction part of the YOLOv3 network by using an adaptive pruning algorithm.
Specifically, for the YOLOv3 network, the loss function is designed as follows:
Loss = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ‖r_ij − r̂_ij‖²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · (C_ij − Ĉ_ij)²
     + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · (C_ij − Ĉ_ij)²
     + Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · Σ_{c∈classes} (p_ij(c) − p̂_ij(c))²
where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S×S grids; B denotes the number of boxes contained in one grid; 1_ij^obj indicates whether the jth box of the ith grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the jth predicted box of the ith grid, and r̂_ij denotes the x, y, w, h of the jth ground-truth box of the ith grid;
the second and third terms are the confidence loss, where 1_ij^noobj indicates whether the jth box of the ith grid contains no object, taking the value 1 if it does not and 0 if it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss contributed by grid boxes without objects; C_ij denotes the predicted confidence of the jth box of the ith grid, and Ĉ_ij denotes the ground-truth confidence of the jth box of the ith grid;
the fourth term is the category loss, where classes denotes the number of categories; p_ij(c) denotes the predicted probability that the jth box of the ith grid belongs to a class-c object, and p̂_ij(c) denotes the true probability that the jth box of the ith grid belongs to a class-c object.
The above YOLOv3 loss uses a positive/negative sample balance factor λ_noobj to reduce the confidence loss contributed by the many grids that are not responsible for predicting a target. This alleviates, to some extent, the imbalance between positive and negative samples (positive samples are the targets the network must detect; negative samples are the background other than the targets), but it does not address the training problem of difficult samples. This embodiment therefore introduces FocalLoss into the third term of the loss function, i.e. the confidence loss, to improve the model's ability to learn difficult samples.
FocalLoss is an improvement on the cross entropy, with the following functional form:
FL(y, ŷ) = −ŷ · (1 − y)^α · log(y) − (1 − ŷ) · y^α · log(1 − y)
where y and ŷ respectively denote the predicted and true probability values, i.e. p_ij(c) and p̂_ij(c); α is the FocalLoss hyperparameter and generally takes a value in [0, 5].
FocalLoss assigns the weights (1−y)^α and y^α to positive and negative samples respectively. Taking negative samples as an example: when a sample is easy to learn, y is close to 0 and the weight y^α is small; when a sample is difficult to learn, y is close to 0.5 and the weight y^α is large. Hard-to-classify samples therefore receive a higher weight than easy ones, which improves the model's ability to learn difficult samples.
The improved confidence loss function is as follows:
Loss_conf = Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)
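A minimal NumPy sketch of the FocalLoss weighting described above (the functional form is the reconstruction given above and α = 2.0 is an assumed value within [0, 5]):

```python
import numpy as np

def focal_loss(y_pred, y_true, alpha=2.0, eps=1e-7):
    """Sketch of FocalLoss: y_pred is the predicted confidence/probability,
    y_true is 1 for positive samples and 0 for negative samples. Hard samples
    (predictions far from their label) receive larger weights."""
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    y_true = np.asarray(y_true, dtype=np.float64)
    pos = -y_true * (1.0 - y_pred) ** alpha * np.log(y_pred)
    neg = -(1.0 - y_true) * y_pred ** alpha * np.log(1.0 - y_pred)
    return pos + neg
```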
in addition, in an elevator application scene, the finger of an elevator user only occupies a small area in the image, that is, the frame of a small object in the data set occupies a large proportion, so that the training speed of the network can be accelerated by increasing the loss of the small object, and therefore, the embodiment further increases an adaptive scaling factor for the first term, that is, the coordinate loss, and the scaling factor is as follows:
ρ_box = 2 − ŵ_ij · ĥ_ij
where ŵ_ij and ĥ_ij denote the width and height of the ground-truth bounding box; ρ_box ranges in (1, 2), and the smaller the ground-truth box, the larger its value, so the loss proportion of small objects is increased.
The improved coordinate loss is as follows:
Loss_coord = λ_coord · Σ_{i=1}^{S×S} Σ_{j=1}^{B} 1_ij^obj · ρ_box · ‖r_ij − r̂_ij‖²
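A sketch of the scaled coordinate loss for one responsible box follows; the normalization of w and h to [0, 1] and the value λ_coord = 5.0 are assumptions made for this example:

```python
import numpy as np

def scaled_coord_loss(pred_box, true_box, lambda_coord=5.0):
    """Sketch of the coordinate loss with the adaptive scaling factor rho_box.

    pred_box / true_box: (x, y, w, h) with w, h assumed normalized to [0, 1].
    Smaller ground-truth boxes give rho_box closer to 2, so small objects such
    as a fingertip contribute a larger coordinate loss.
    """
    pred_box = np.asarray(pred_box, dtype=np.float64)
    true_box = np.asarray(true_box, dtype=np.float64)
    rho_box = 2.0 - true_box[2] * true_box[3]
    return lambda_coord * rho_box * np.sum((pred_box - true_box) ** 2)
```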
in the convolutional neural network structure, one convolutional channel represents a certain feature of an image, and a model carries out prediction by integrating feature information of all channels, so that the more complex network structure can extract more features. The YOLOv3 network adopts the darknet-53 as the feature extraction main body, the structure has 53 convolutional layers, the number of the convolutional layer channels is doubled every time the structure is sampled, the total number of the channels reaches 17856, the target required to be detected by the elevator is a finger, and the structure of the darknet-53 has enough complexity to extract arrow features and has a large amount of redundancy by analyzing from a visual angle, so that the structure or the size of the network needs to be reduced.
The current pruning technology of the convolutional neural network can be divided into the following categories: a Weight Quantization (Weight Quantization) based method, such as HashNet, which groups Weight variables by hash, wherein the variables in the same group share the same Weight value, and although the method effectively reduces the parameter size of the model, the forward calculation speed of the network cannot be increased; the method based on the weight sparsification is characterized in that the weight variables in the network are subjected to sparse training, and then a large number of weight variables close to 0 in the network can be deleted, but the method can accelerate the forward calculation process only under special hardware; the method based on structure pruning reduces the structure of the network in a self-adaptive manner through training data, thereby effectively reducing the size of model parameters and improving the operation speed.
Therefore, for the complexity redundancy of darknet-53, this embodiment uses a network pruning algorithm from the structure-based pruning methods to prune the network at the channel level and reduce the number of feature channels of the network.
In order to perform channel-level pruning on a network by using a Lasso algorithm, the network pruning method comprises the following steps:
First, a BN layer is added after each convolutional layer. When BN is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
Ĉ_ik = γ_ik · (C_ik − μ_ik) / √σ_ik + β_ik
where Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik respectively denote the mean and variance of the channel feature C_ik, obtained from statistics over the historical training data;
γ_ik acts as a scaling factor; the network uses this scaling factor as the weight of the feature channel, and the scaling factors are sparsified with the Lasso algorithm:
loss_new = loss_old + λ · Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|
where loss_new is the final loss function, loss_old is the improved loss function described above, λ is the sparsity coefficient, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;
finally, all γ parameters are sorted in descending order, and a given proportion of the smallest γ_ik (those ranked at the back), together with their corresponding feature channels and BN channels, are deleted. A pruning schematic of network slimming is shown in fig. 5.
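A generic network-slimming sketch of the two ingredients above is given below in PyTorch-style Python (the sparsity coefficient, pruning ratio and function names are assumptions, and the actual removal of the marked channels from the YOLOv3 graph is not shown):

```python
import torch
import torch.nn as nn

def bn_sparsity_penalty(model, lam=1e-4):
    """Sketch of the Lasso term added to the loss: lam * sum(|gamma_ik|) over
    all BN scaling factors (lam is an assumed sparsity coefficient)."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def smallest_gamma_channels(model, prune_ratio=0.5):
    """Sketch of channel selection: sort all BN gammas and mark the smallest
    prune_ratio fraction; the marked feature channels and their BN channels
    would then be removed from the network."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.sort(gammas)[0][int(gammas.numel() * prune_ratio)]
    return {name: (m.weight.detach().abs() < threshold)
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
```

During sparse training the penalty would simply be added to the improved loss, e.g. loss_new = loss_old + bn_sparsity_penalty(model).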
S6, face information of the resident on the floor where the finger points is obtained, and the face information of the resident can be registered in advance in an elevator background system;
Double verification is then performed on whether the elevator user is a resident and whether the floor the finger points to is the floor on which that user lives; the floor key is selected only when both verifications pass, and the elevator car is finally controlled to run to that floor. This ensures that the elevator user is a resident of that floor, and greatly improves the interactivity of the elevator and the safety of the residents.
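The dual-verification logic of step S6 can be sketched as follows (the face-matching function, similarity threshold and resident database structure are hypothetical placeholders for the background-system registration described above):

```python
def dual_verification(face_embedding, pointed_floor, resident_db, match_fn, threshold=0.6):
    """Sketch of step S6. resident_db maps resident_id -> (registered_embedding,
    home_floor); match_fn returns a similarity score between two face embeddings."""
    for resident_id, (registered_embedding, home_floor) in resident_db.items():
        if match_fn(face_embedding, registered_embedding) >= threshold:
            if home_floor == pointed_floor:
                return True, resident_id    # both checks pass: the key is selected
            return False, resident_id       # recognised resident, wrong floor
    return False, None                      # not a registered resident
```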
The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, micro-controllers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flows, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks. For example, the hardware is a computing device comprising a processor and a memory for storing a program executable by the processor, which when executing the program stored by the memory, implements the embeddable non-contact elevator key interaction method described above.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. An embeddable non-contact elevator key interaction method integrating face recognition is characterized by comprising the following steps:
s1, obtaining an original image shot by a camera in the elevator car, and carrying out edge detection on a shot area of the original image through a Laplace filter operator to obtain an edge image;
s2, filtering the edge image by using linear filtering operators in the horizontal direction and the vertical direction to enhance the linear edges in the horizontal direction and the vertical direction, and reserving the edges of the elevator key panel area while eliminating noise;
s3, respectively carrying out linear detection on the image subjected to horizontal direction filtering and the image subjected to vertical direction filtering by adopting a Hough linear detection algorithm so as to position the region of the elevator key panel;
s4, solving the mapping relation under the view angle transformation by using the homography transformation matrix;
s5, detecting and positioning the fingers of the elevator users in the original images by using an improved YOLOv3 algorithm, and obtaining floor keys to which the fingers point according to the homography transformation matrix;
s6, acquiring the face information of the resident of the floor pointed by the finger, performing double verification on whether the elevator user is the resident and whether the floor pointed by the finger is the floor where the elevator user lives, wherein the floor key is selected under the condition that the double verification is passed, and finally controlling the elevator car to run to the floor.
2. The method for interacting the embeddable non-contact elevator keys fusing human face recognition as claimed in claim 1, wherein the camera is installed above the elevator key panel and shoots the elevator key panel downward;
in step S1, the process of edge detection of the camera shooting area by the Laplace filter operator is as follows:
s11, carrying out graying processing on the original image to obtain a grayscale image;
s12, based on the principle of no-leakage elevator key panel boundary, adopting a second-order gradient Laplace filter operator to detect the edge of the gray image, wherein the Laplace filter operator specifically calculates the edge gradient by using second-order difference, and the process is as follows:
consider a one-dimensional sequence { f(1), f(2), …, f(x-1), f(x), f(x+1), … }; the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence equals the convolution of the sequence with the one-dimensional kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:
K_L = [ 0  1  0 ; 1  -4  1 ; 0  1  0 ]
Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and the convolution kernel K_L is replaced with:
K_L = [ 1  1  1 ; 1  -8  1 ; 1  1  1 ]
The second-order difference information of the grayscale image is obtained by convolving this kernel with the grayscale image, namely:
G = K_L * I_gray
The larger the convolution kernel, the more pronounced the detected edges;
the points whose convolution result is 0 are taken as edge points, and the edge image is the set of points with obvious gray-level change in the grayscale image.
3. The method for interacting keys of an embeddable non-contact elevator fusing human face recognition as claimed in claim 1, wherein the process of step S2 is as follows:
s21, defining the size as 1 x n pairs of horizontal straight line filter operators KhorizontalAnd a vertical line filter operator K of size n × 1vertical
Figure FDA0002911098430000023
Figure FDA0002911098430000024
In the formula, T represents the transposition of a vector pair, and n represents the size of a filter operator; khorizontalSensitive to horizontal linear edges, KverticalSensitive to vertical linear edges;
S22, convolving the Laplace edge image I_Laplace with the two operators respectively to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
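An informal sketch of step S2; since the exact operator values are given only in the patent drawings, a length-n all-ones averaging kernel is assumed here as a stand-in:

import cv2
import numpy as np

def directional_filter(I_laplace, n=15):
    # Assumed stand-in kernels: a 1 x n row kernel responds strongly along horizontal
    # lines, and its n x 1 transpose along vertical lines.
    K_horizontal = np.ones((1, n), dtype=np.float32) / n
    K_vertical = K_horizontal.T                       # K_vertical = K_horizontal^T
    I_horizontal = cv2.filter2D(I_laplace, cv2.CV_32F, K_horizontal)
    I_vertical = cv2.filter2D(I_laplace, cv2.CV_32F, K_vertical)
    return I_horizontal, I_vertical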
4. The method for interacting keys of an embeddable non-contact elevator fusing human face recognition as claimed in claim 1, wherein the process of step S3 is as follows:
S31, considering that filtering the edge image in the horizontal and vertical directions suppresses its edges that are neither horizontal nor vertical straight lines, first using a threshold to segment out and remove the non-horizontal and non-vertical straight-line edges;
S32, performing straight-line detection on the threshold-segmented horizontal-direction filtered image and vertical-direction filtered image respectively by using the Hough line detection algorithm, finally obtaining the four boundary straight lines of the elevator key panel;
S33, solving the pairwise intersection points of the four boundary straight lines of the elevator key panel to obtain the coordinates of the four vertices, namely the upper-left, lower-left, lower-right and upper-right vertices (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt), of the elevator key panel area in the original image.
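A hedged sketch of steps S31-S33 using OpenCV's standard Hough transform; the threshold and vote counts are assumptions, and keeping only the two strongest lines per direction is a simplification:

import cv2
import numpy as np

def line_intersection(l1, l2):
    # Lines come from cv2.HoughLines as (rho, theta); solve
    # [cos t1  sin t1; cos t2  sin t2] [x; y] = [rho1; rho2] for the intersection.
    A = np.array([[np.cos(l1[1]), np.sin(l1[1])],
                  [np.cos(l2[1]), np.sin(l2[1])]])
    return np.linalg.solve(A, np.array([l1[0], l2[0]]))

def panel_corners(I_horizontal, I_vertical, thresh=50, votes=120):
    # S31: threshold away responses that do not belong to strong straight-line edges
    _, bw_h = cv2.threshold(I_horizontal.astype(np.uint8), thresh, 255, cv2.THRESH_BINARY)
    _, bw_v = cv2.threshold(I_vertical.astype(np.uint8), thresh, 255, cv2.THRESH_BINARY)
    # S32: Hough straight-line detection on each filtered image, two lines per direction
    lines_h = cv2.HoughLines(bw_h, 1, np.pi / 180, votes)[:2, 0, :]
    lines_v = cv2.HoughLines(bw_v, 1, np.pi / 180, votes)[:2, 0, :]
    # S33: pairwise intersections of the boundary lines give the four panel vertices
    return [line_intersection(lh, lv) for lh in lines_h for lv in lines_v]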
5. The method for interacting the keys of the embeddable non-contact elevator integrating the face recognition as claimed in claim 4, wherein the homography transformation reflects the process of mapping from one two-dimensional plane into a three-dimensional space and then from the three-dimensional space onto another two-dimensional plane; X-Y-Z is a three-dimensional space coordinate system, which can be understood as a world coordinate system, x-y is the pixel plane coordinate system, and x'-y' is the elevator key panel plane coordinate system; the homography transformation can be described as: a point (x, y) in the x-y coordinate system corresponds, in the X-Y-Z coordinate system, to a straight line passing through the origin and that point
[equation image: the corresponding straight line through the origin in the X-Y-Z coordinate system]
this straight line intersects the x'-y' coordinate plane at the point (x', y'), and the process from the point (x, y) to the point (x', y') is called the homography transformation;
the process of solving the mapping relation under the view transformation by using the homography transformation matrix is as follows:
S41, setting the x'-y' plane perpendicular to the Z axis of the X-Y-Z space coordinate system and intersecting the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system, and describing the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system by the homography transformation matrix H:
[X, Y, Z]^T = H · [x, y, 1]^T
H = [ h1  h2  h3 ]
    [ h4  h5  h6 ]
    [ h7  h8  h9 ]
in the formula, h1~h9 are the 9 transformation parameters of the homography matrix;
further, the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is obtained as:
x' = X / Z = (h1·x + h2·y + h3) / (h7·x + h8·y + h9)
y' = Y / Z = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system, in which scaling the coordinates does not change the point they represent; when the H matrix is multiplied by a scaling factor k:
k·H = [ k·h1  k·h2  k·h3 ]
      [ k·h4  k·h5  k·h6 ]
      [ k·h7  k·h8  k·h9 ]
k·H and H represent the same mapping relation, so H has only 8 degrees of freedom;
S42, solving for H, wherein one method is to set h9 to 1, so that the equations to be solved become:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + 1)
y' = (h4·x + h5·y + h6) / (h7·x + h8·y + 1)
another approach is to add a unit-norm constraint to the homography matrix H, as follows:
h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equation to be solved is then:
[equation image: the mapping equations for h1~h9 combined with the unit-norm constraint above]
S43, defining, for the four vertices of the elevator key panel obtained in the pixel coordinate system in step S3, target coordinate points in the scene coordinate system of the elevator key panel:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
these coordinate pairs are respectively substituted into the equations to be solved of step S42, and the H matrix is obtained by solving the resulting simultaneous equations.
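As an illustrative cross-check of S41-S43 (not the claimed procedure itself), the four vertex correspondences can be fed to OpenCV, which solves the same 8-degree-of-freedom system with h9 normalised to 1; the numeric coordinates below are made-up examples:

import cv2
import numpy as np

# Pixel-space vertices from step S3, in the order lt, lb, rb, rt (example values)
src = np.float32([[102, 88], [110, 402], [415, 395], [408, 80]])
# Target coordinates in the elevator key panel plane coordinate system x'-y' (assumed layout)
dst = np.float32([[0, 0], [0, 300], [200, 300], [200, 0]])

H = cv2.getPerspectiveTransform(src, dst)            # 3x3 homography matrix

def map_point(H, x, y):
    # Apply the claimed mapping x-y -> x'-y': divide the first two rows by the third
    X, Y, Z = H @ np.array([x, y, 1.0])
    return X / Z, Y / Z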
6. The method of claim 1, wherein the improved YOLOv3 algorithm comprises improving its loss function based on YOLOv3 target detection algorithm, and using an adaptive pruning algorithm to reduce the feature extraction part of YOLOv3 network.
7. The method for interacting embedded non-contact elevator keys fusing face recognition according to claim 6, characterized in that the loss function of the YOLOv3 network is designed as follows:
[equation image: the complete loss function, consisting of a coordinate-error term, two confidence terms and a category term summed over the S × S grids and B frames]
wherein the first term is the coordinate error loss; λ_coord is the coordinate loss function coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of frames contained in one grid;
1_ij^obj indicates whether the jth frame of the ith grid contains an object, taking the value 1 when an object is contained and 0 otherwise; x and y denote the center coordinates of a frame; w and h denote the width and height of a frame; r_ij denotes the (x, y, w, h) of the jth predicted frame of the ith grid;
r̂_ij denotes the (x, y, w, h) of the jth real frame of the ith grid;
the second term and the third term are the confidence loss;
1_ij^noobj indicates whether the jth frame of the ith grid contains no object, taking the value 1 when no object is contained and 0 otherwise; λ_noobj balances the loss weights of grids with and without objects, with the goal of reducing the confidence loss of frames in grids that contain no object; C_ij denotes the predicted confidence of the jth frame of the ith grid;
Ĉ_ij denotes the true confidence of the jth frame of the ith grid;
the fourth term is the category loss; classes denotes the number of categories; p_ij(c) denotes the predicted probability that the jth frame of the ith grid belongs to an object of class c;
p̂_ij(c) denotes the true probability that the jth frame of the ith grid belongs to an object of class c;
the improvement of the loss function is specifically as follows:
(1) introducing Focal Loss into the third term, i.e. the confidence loss, to improve the learning ability of the model on difficult samples, wherein Focal Loss is an improvement based on the cross entropy and has the following functional form:
[equation image: the Focal Loss function]
in the formula, y and ŷ respectively denote the predicted and true probability values, i.e. p_ij(c) and p̂_ij(c); α is a Focal Loss hyperparameter;
the improved confidence loss function is as follows:
[equation image: the confidence loss terms with Focal Loss substituted for the cross entropy]
(2) adding an adaptive scaling factor to the first term, i.e. the coordinate loss, as follows:
[equation image: definition of the adaptive scaling factor ρ_box]
in the formula, ŵ and ĥ denote the width and height of the real bounding box; ρ_box lies in the range 1 to 2, and the smaller the real frame, the larger its value;
the improved coordinate loss is as follows:
[equation image: the coordinate loss with the adaptive scaling factor ρ_box applied]
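A small PyTorch-style sketch of the two modifications, for illustration only; the exact functional forms are in the patent drawings, so the γ exponent and the (2 − ŵ·ĥ) form of ρ_box are assumptions based on the standard Focal Loss and the stated 1-2 range:

import torch

def focal_confidence_loss(c_pred, c_true, alpha=0.25, gamma=2.0):
    # Assumed binary Focal Loss replacing the cross-entropy confidence term:
    # easy examples are down-weighted so that difficult samples dominate.
    eps = 1e-7
    c_pred = c_pred.clamp(eps, 1 - eps)
    pos = -alpha * (1 - c_pred) ** gamma * c_true * torch.log(c_pred)
    neg = -(1 - alpha) * c_pred ** gamma * (1 - c_true) * torch.log(1 - c_pred)
    return (pos + neg).sum()

def scaled_coord_loss(box_pred, box_true):
    # Assumed adaptive factor: with normalised true width/height, rho_box = 2 - w*h
    # lies in (1, 2] and grows as the real frame shrinks.
    rho_box = 2.0 - box_true[..., 2] * box_true[..., 3]
    sq_err = ((box_pred - box_true) ** 2).sum(dim=-1)
    return (rho_box * sq_err).sum()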
8. The method for interacting the keys of the embeddable non-contact elevator fusing the face recognition as claimed in claim 6, wherein the YOLOv3 network adopts Darknet-53 as the feature extraction backbone, and, aiming at the complexity redundancy of Darknet-53, the network is pruned at the channel level by a network pruning algorithm based on structured pruning, so as to reduce the number of feature channels of the network:
first, a BN layer is added after each convolutional layer; when the BN operation is used in a convolutional neural network, each input feature channel is allocated its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
[equation image: the BN layer output Ĉ_ik as a function of C_ik, μ_ik, σ_ik, γ_ik and β_ik]
in the formula, Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik respectively denote the mean and variance of the channel feature C_ik, obtained from historical training data statistics;
γ_ik corresponds to a scaling factor; the network uses this scaling factor as the weight of the feature channel, and the scaling factors are sparsified by the Lasso algorithm:
[equation image: loss_new formed by adding an L1 (Lasso) penalty on all scaling factors γ_ik, over all Layers and Channels, to loss_old]
in the formula, loss_new is the final loss function, loss_old is the improved loss function, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;
finally, all γ parameters are sorted from large to small, and the feature channels and BN channels corresponding to the lowest-ranked γ_ik are then deleted according to a set pruning proportion.
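A hedged PyTorch sketch of the channel selection just described: collect every BN scaling factor γ, sort them, and mark the smallest fraction for removal; the pruning ratio and the use of nn.BatchNorm2d modules are assumptions, and the actual removal of the marked channels is omitted:

import torch
import torch.nn as nn

def bn_channel_prune_masks(model, prune_ratio=0.5):
    # Gather the absolute gamma (BN weight) values of every BN layer in the network
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    # Sort from large to small and locate the cut-off value for the given proportion
    sorted_g, _ = torch.sort(gammas, descending=True)
    n_keep = max(int(len(sorted_g) * (1 - prune_ratio)), 1)
    threshold = sorted_g[n_keep - 1]
    # Per-layer boolean masks: channels whose gamma falls below the cut-off are pruned
    return {name: m.weight.data.abs() >= threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

During training, the Lasso-style sparsification of the γ factors can be reproduced by adding an L1 penalty on these same BN weights to the loss.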
9. A computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the method of face recognition fused embeddable non-contact elevator key interaction of any of claims 1 to 8.
10. An elevator, characterized in that the elevator realizes the identification of floor keys and the operation control of a car through the embedded non-contact elevator key interaction method integrating the face identification according to any one of claims 1 to 8.
CN202110086981.6A 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method Active CN113220114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110086981.6A CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Publications (2)

Publication Number Publication Date
CN113220114A true CN113220114A (en) 2021-08-06
CN113220114B CN113220114B (en) 2023-06-20

Family

ID=77084468




Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080244468A1 (en) * 2006-07-13 2008-10-02 Nishihara H Keith Gesture Recognition Interface System with Vertical Display
EP1887317A1 (en) * 2006-08-04 2008-02-13 Fasep 2000 S.r.l. Method and device for non-contact measurement of the alignment of motor vehicle wheels
CN102701033A (en) * 2012-05-08 2012-10-03 华南理工大学 Elevator key and method based on image recognition technology
US20130310951A1 (en) * 2012-05-21 2013-11-21 Ftsi, Llc Automation and motion control system
US20140267042A1 (en) * 2013-03-13 2014-09-18 Jeremy Burr Gesture pre-processing of video stream using skintone detection
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
JP2019177973A (en) * 2018-03-30 2019-10-17 三菱電機株式会社 Input apparatus and input method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658088A (en) * 2021-08-27 2021-11-16 诺华视创电影科技(江苏)有限公司 Face synthesis method and device based on multiple discriminators
CN113658088B (en) * 2021-08-27 2022-12-02 诺华视创电影科技(江苏)有限公司 Face synthesis method and device based on multiple discriminators
TWI836406B (en) * 2022-04-20 2024-03-21 邁啟科技股份有限公司 Method for the non-contact triggering of buttons
CN115969144A (en) * 2023-01-09 2023-04-18 东莞市智睿智能科技有限公司 Sole glue spraying track generation method, system, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant