CN104834922B

CN104834922B - Gesture identification method based on hybrid neural networks

Info

Publication number: CN104834922B
Application number: CN201510280013.3A
Authority: CN
Inventors: 纪禄平; 尹力; 周龙; 王强; 卢鑫; 黄青君; 杨洁
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2015-05-27
Filing date: 2015-05-27
Publication date: 2017-11-21
Anticipated expiration: 2035-05-27
Also published as: CN104834922A

Abstract

The invention discloses a gesture recognition method based on hybrid neural networks. For both the gesture image to be recognized and the gesture image training samples, noise points are first detected with a pulse coupled neural network and then processed with a composite denoising algorithm. Edge points in the gesture image are then extracted with a cellular neural network, and connected regions are obtained from the extracted edge points. Curvature-based fingertip detection is applied to each connected region to obtain candidate fingertip points, interference from the face is excluded to obtain the gesture area, and the gesture area is then segmented according to the shape features of the gesture. Fourier descriptors that retain phase information are computed from the contour points of the segmented gesture area, and the first several Fourier descriptors are selected as the gesture feature. A BP neural network is trained on the gesture features of the training samples, and the gesture feature of the image to be recognized is input to the BP neural network for recognition. By using several kinds of neural networks, the invention improves the accuracy of gesture recognition.

Description

Gesture identification method based on hybrid neural networks

Technical field

The invention belongs to the technical field of gesture recognition and, more specifically, relates to a gesture recognition method based on hybrid neural networks.

Background technology

With the rapid advance of computer technology, human-computer interaction technology has become increasingly common in daily life. Human-computer interaction (HCI) technology refers to the interaction between a user and a computer carried out through some mode of operation. Its development has broadly passed through a purely manual operation stage, a command-language control stage, and a user interface stage. With the continued development of technologies such as artificial intelligence in recent years, the development of human-computer interaction technology has gradually drawn attention.

As the application fields of computers keep expanding, existing modes of human-computer interaction can no longer meet people's higher-level everyday requirements, and a simpler, friendlier mode of interaction is urgently needed. The ultimate goal of human-computer interaction is natural communication between people and machines. In daily life, most information between people is conveyed through body language or facial expressions, and only a small part through natural language, which indicates that body language has a greater advantage in expressing human emotion or intention. Because the hand plays a particularly important role in body language, interaction based on gesture behavior, that is, the gesture behavior recognition system or gesture recognition system, has attracted wide attention.

In general, a gesture recognition system consists of the following parts: gesture preprocessing, gesture segmentation, gesture modeling, gesture feature extraction, and gesture recognition. Gesture preprocessing is mainly the denoising of the gesture image. Common denoising algorithms include mean filtering, median filtering, spatial low-pass filtering, frequency-domain low-pass filtering, and pulse coupled neural networks. When several kinds of noise are present at once, however, none of the current algorithms achieves a good denoising effect, so designing a good denoising algorithm is essential for the later recognition stages. For gesture segmentation, the commonly used methods are segmentation based on skin color information, segmentation based on motion information, and segmentation based on edge information. Segmentation based on skin color information is easily disturbed by background information, and segmentation based on edge information alone does not achieve a good result either, so designing an effective segmentation algorithm is also essential. For gesture feature extraction, the most widely used approach is based on Fourier descriptors, but because of the rotation invariance of that approach, the features change little when the gesture is rotated; designing a Fourier descriptor that is not rotation invariant is therefore also essential. For gesture recognition, common methods include template matching, support vector machines, neural networks, and hidden Markov models, so choosing a good recognition method is equally important for a gesture recognition system.

A neural network method simulates the neurons of the human brain with simple processing units and connects these units into a network in some way in order to imitate the brain. Neural network methods typically offer the following advantages: parallel computation, distributed storage, robustness, nonlinear processing, and good adaptivity and fault tolerance. They can therefore be applied in many scenarios, such as gesture recognition, image segmentation, and noise processing.

At present, neural network methods are increasingly applied in the field of gesture behavior recognition. However, their application there is still limited to the recognition stage itself; they are seldom applied to the other stages of gesture behavior recognition.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art and provide a gesture recognition method based on hybrid neural networks that improves the denoising of gesture images with a pulse coupled neural network, performs gesture segmentation with a cellular neural network, uses Fourier descriptors that are sensitive to rotation as the gesture feature, and performs gesture recognition with a BP neural network, thereby improving the accuracy of gesture recognition.

To achieve the above object, the gesture recognition method based on hybrid neural networks of the invention comprises the following steps:

S1: Extract the features of the gesture image to be recognized and of the gesture image training samples. The specific steps include:

S1.1: Establish a pulse coupled neural network model of the gesture grayscale image. Take the gray value of each pixel of the current gesture grayscale image as the input of the corresponding neuron in the pulse coupled neural network, and use the firing characteristic of the network to detect the pixels of the gesture image: if the output state of a pixel is the firing state, set the corresponding element of the detection result matrix to 1, otherwise set it to 0. Traverse each element of the detection result matrix. If an element has value 1, take it as the center of a noise reduction window, whose size is set according to the actual situation, and count the values of the other elements in the window, excluding the center element. If the number of elements with value 0 exceeds a preset threshold, the center point is a noise point; otherwise it is not;

Calculate two noise estimates, H(i, j) and V(i, j), for each noise point as follows:

$$H(i,j) = |a(i,j) - b(i,j)|$$

where a(i, j) is the gray value at pixel (i, j) of the image and b(i, j) is the gray value output for that pixel by median filtering;

$$V(i,j) = \frac{|a(i,j) - m_1(i,j)| + |a(i,j) - m_2(i,j)|}{2}$$

where m₁(i, j) and m₂(i, j) are the gray values of the two points in the neighborhood of pixel (i, j) whose gray values are closest to a(i, j);

If H(i, j) ≥ T₁ and V(i, j) ≥ T₂, where T₁ and T₂ are preset thresholds, process the noise point with median filtering; otherwise process it with mean filtering;

S1.2: Perform histogram equalization on the gesture grayscale image denoised in step S1.1;

S1.3: Establish a cellular neural network model of the gesture grayscale image. Take the gray value of each pixel (i, j) of the equalized gesture grayscale image as the input u_ij of the corresponding cell in the model, and iterate according to the state equation until the whole cellular neural network converges, obtaining the output y_ij(t) of each cell. Traverse the output value of the cell corresponding to each pixel: when the output value of a pixel lies in the range [0, 1], the pixel is not an edge pixel if the sum of the pixel values of the other pixels in its neighborhood exceeds a preset threshold, and is an edge pixel otherwise; when the output value lies in the range [-1, 0), the pixel is not an edge pixel;

S1.4: Obtain the connected regions from the edge pixels found in step S1.3, extract the contour of each connected region, and perform fingertip detection on each connected region separately. The fingertip detection method is:

Traverse each contour pixel of the connected region. Take the pixel as the reference point, with coordinates denoted p(p_x, p_y, 0), and preset a distance constant L. Along the contour direction, take the L-th point before p, p₁(p_1x, p_1y, 0), and the L-th point after p, p₂(p_2x, p_2y, 0). Calculate the cosine cos α of the angle between the vectors $\overrightarrow{pp_1}$ and $\overrightarrow{pp_2}$. If cos α is greater than a preset curvature threshold T, the point is judged to be a candidate fingertip point; otherwise it is not;

Determine the sign of the vector product $\overrightarrow{pp_1} \times \overrightarrow{pp_2}$ at a fingertip position from the traversal direction: when the overall contour of the gesture area is traversed clockwise, the sign of the vector product should be negative, otherwise positive. For each candidate fingertip point, calculate the vector product $\overrightarrow{pp_1} \times \overrightarrow{pp_2}$; if its sign matches the sign expected for a fingertip position, keep the candidate fingertip point, otherwise discard it;

For all candidate fingertip points detected in the connected region, judge whether the difference between the y-coordinate of the candidate with the largest y-coordinate and that of the candidate with the smallest y-coordinate exceeds half the face height. If it does, the connected region is not a gesture area; otherwise it is a candidate gesture area. Then judge whether the number of candidate fingertip points in each candidate gesture area exceeds a preset quantity threshold. If it does, the connected region is a gesture area; otherwise it is not;

Find the principal direction of the gesture area and, along the principal direction, segment the gesture area according to a gesture length-to-width ratio of 2 to obtain the segmented gesture area;

S1.5: For the gesture area obtained after the segmentation in step S1.4, represent the contour point coordinates of the gesture area as complex numbers and form a discrete sequence from all contour point coordinates. Denoting the number of contour points by n, apply the Fourier transform to the discrete sequence to obtain n Fourier coefficients z(k), k = 0, 1, ..., n-1, and calculate the Fourier descriptors:

where k' = 1, 2, ..., n-1, and the angle appearing in the descriptor is the angle between the principal direction of the gesture area and the x-axis.

Select the first Q Fourier descriptors to form the feature vector;

S2: Input the feature vectors of the training gesture images into a BP neural network as training samples, with the corresponding gesture image categories as the outputs of the BP neural network, and train the BP neural network;

S3: Input the feature vector of the gesture image to be recognized into the BP neural network trained in step S2, and output the recognized gesture image category.
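The first half of step S1.5, going from contour points to Fourier coefficients, can be sketched as follows. This is a minimal illustration, assuming NumPy; the helper name `contour_fourier_coefficients` is hypothetical. It only forms the complex sequence, applies the discrete Fourier transform, and keeps the first Q coefficients after z(0). The phase-retaining normalization that turns these coefficients into the descriptors of the invention is not reproduced here.

```python
import numpy as np

def contour_fourier_coefficients(contour, q):
    """Return z(1)..z(q) for a closed contour given as (x, y) points.

    Hypothetical helper: the normalization that produces the final
    descriptors is deliberately left out.
    """
    pts = np.asarray(contour, dtype=np.float64)
    s = pts[:, 0] + 1j * pts[:, 1]   # complex form x + iy of each contour point
    z = np.fft.fft(s)                # n Fourier coefficients z(0), ..., z(n-1)
    return z[1:q + 1]                # skip z(0); indices start at 1 as for k'

# Four points on the unit circle form a single pure harmonic.
square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
coeffs = contour_fourier_coefficients(square, 2)
```

Because the sample contour is one full turn around the origin, all of its energy falls into z(1), which makes the example easy to check by hand.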

In the gesture recognition method based on hybrid neural networks of the invention, for both the gesture image to be recognized and the gesture image training samples, a pulse coupled neural network is first used to detect and distinguish noise points from edge points, and a composite denoising algorithm then processes the noise points. Edge points in the gesture image are then extracted with a cellular neural network, and connected regions are obtained from the extracted edge points. Curvature-based fingertip detection is applied to each connected region to obtain candidate fingertip points, interference from the face is then excluded to obtain the gesture area, and the gesture area is segmented according to the shape features of the hand. Fourier descriptors that retain phase information are obtained from the contour points of the segmented gesture area, and the first several Fourier descriptors are selected as the gesture feature. A BP neural network is trained on the gesture features of the training samples, and the gesture feature of the image to be recognized is input to the BP neural network for recognition.

The invention has the following advantages:

(1) A pulse coupled neural network distinguishes noise points from edge points and, combined with the composite denoising algorithm, denoises the gesture image, which improves the denoising effect;

(2) Gesture segmentation combines coarse segmentation by the cellular neural network with fine segmentation based on gesture shape features, which improves segmentation accuracy;

(3) The gesture feature uses Fourier descriptors that retain phase information, which improves the recognition rate.

Brief description of the drawings

Fig. 1 is a flow chart of the gesture recognition method based on hybrid neural networks of the invention;

Fig. 2 is a flow chart of gesture image feature extraction in the invention;

Fig. 3 is a flow chart of fine gesture segmentation based on gesture shape features;

Fig. 4 is a schematic diagram of fingertip detection in the invention;

Fig. 5 is an example of coarse gesture segmentation;

Fig. 6 is an example of fine gesture segmentation.

Detailed description of the embodiments

Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.

Embodiment

Fig. 1 is a flow chart of the gesture recognition method based on hybrid neural networks of the invention. As shown in Fig. 1, the method comprises the following steps:

S101: Extract the features of the sample to be recognized and of the training samples:

Feature extraction is first performed on the gesture image to be recognized and on the gesture image training samples. Fig. 2 is a flow chart of gesture image feature extraction in the invention. As shown in Fig. 2, gesture image feature extraction comprises the following steps:

S201: Gesture image denoising preprocessing:

The invention denoises the gesture grayscale image with an algorithm that combines a pulse coupled neural network (PCNN) with a composite denoising algorithm. The pulse coupled neural network is first used to detect and distinguish noise points from edge points in the gesture image, and the composite denoising algorithm is then applied according to the type of each noise point, so that several kinds of noise are removed while edge information is retained.

Each neuron of a pulse coupled neural network consists of three parts: a receptive part, a modulation part, and a pulse generator. The pulse coupled neural network is a common method for image denoising preprocessing, used mainly to remove salt-and-pepper noise. When used for image denoising, it can be understood as a locally connected, single-layer, two-dimensional network in which neurons correspond one-to-one to the pixels of the grayscale image to be processed and adjacent neurons are connected to each other. During denoising, the gray value of each pixel of the image can be understood as the feeding input of the corresponding neuron, while the output of each neuron serves only as input to adjacent neurons. Each neuron has only two output states, firing and not firing, denoted 1 and 0 respectively. Because a noise pixel differs greatly from its surrounding pixels, the firing characteristic of the pulse coupled neural network, combined with the characteristics of the noise itself, can be used to identify noise points. The specific method is as follows:

Establish a pulse coupled neural network model of the gesture grayscale image. Take the gray value of each pixel of the current gesture grayscale image as the input of the corresponding neuron, and use the firing characteristic of the network to detect the pixels of the whole image: if the output state of a pixel is the firing state, set the corresponding element of the detection result matrix to 1, otherwise set it to 0, so the detection result matrix has the same size as the image to be processed. Set the size of the noise reduction window, which is 3 × 3 in this embodiment. Traverse each element of the detection result matrix. If an element has value 1, that is, the firing state, take it as the center of the noise reduction window and count the values of the other elements in the window, excluding the center element (that is, the detection results of the pixels other than the one corresponding to the center point). If the number of elements with value 0 (the non-firing state) exceeds a preset threshold, the center point is a noise point; otherwise it is not. In this way noise points are distinguished from edge points. The quantity threshold is usually half the number of elements in the noise reduction window.
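The window-counting rule above can be sketched as follows. This is a minimal illustration, assuming NumPy and assuming the detection result matrix (here `fired`) has already been produced by the pulse coupled neural network, whose parameters the text does not specify. The function name `find_noise_points` and the replicated-border handling are assumptions of this sketch.

```python
import numpy as np

def find_noise_points(fired, window=3, count_threshold=None):
    """Flag firing pixels whose window neighbours mostly did not fire.

    fired: 2-D array of 0/1 values (the detection result matrix).
    window: odd side length of the noise reduction window (3 in the embodiment).
    count_threshold: a pixel is noise when more than this many neighbours are 0;
        defaults to half the number of elements in the window.
    """
    fired = np.asarray(fired, dtype=np.uint8)
    if count_threshold is None:
        count_threshold = (window * window) // 2
    r = window // 2
    # Assumption: border pixels are handled by replicating the edge values.
    padded = np.pad(fired, r, mode="edge")
    noise = np.zeros(fired.shape, dtype=bool)
    rows, cols = fired.shape
    for i in range(rows):
        for j in range(cols):
            if fired[i, j] != 1:
                continue
            block = padded[i:i + window, j:j + window]
            # Neighbours that fired, excluding the centre element (which is 1).
            fired_neighbours = int(block.sum()) - 1
            zero_neighbours = window * window - 1 - fired_neighbours
            noise[i, j] = zero_neighbours > count_threshold
    return noise

# An isolated firing pixel is noise; the boundary of a firing region is not.
isolated = np.zeros((5, 5), dtype=int)
isolated[2, 2] = 1
half_plane = np.zeros((5, 5), dtype=int)
half_plane[:, 2:] = 1
```

With the default threshold of 4 for a 3 × 3 window, the isolated pixel has 8 non-firing neighbours and is flagged, while pixels on the straight boundary of the firing half-plane have only 3 and are kept as edge points.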

After the noise points have been identified, the composite denoising algorithm is applied to them. Its main steps are:

Let a(i, j) be the gray value at pixel (i, j) of the image, with 1 ≤ i ≤ M and 1 ≤ j ≤ N, where M is the number of pixels in each row of the gesture grayscale image (the number of columns) and N is the number of pixels in each column (the number of rows), and let b(i, j) be the gray value output for that pixel by median filtering. To handle Gaussian noise, the difference between the pixel value of the noise point and the median output gray value is used as a noise estimate, as shown in formula (1):

$$H(i,j) = |a(i,j) - b(i,j)| \tag{1}$$

Because noise comes in different types, the estimate above alone cannot distinguish between several kinds of noise. A second noise estimate V(i, j) is therefore introduced: the average of the gradients between the pixel value a(i, j) at pixel (i, j) and two similar points m₁(i, j) and m₂(i, j), as shown in formula (2):

$$V(i,j) = \frac{|a(i,j) - m_1(i,j)| + |a(i,j) - m_2(i,j)|}{2} \tag{2}$$

where m₁(i, j) and m₂(i, j) are the gray values of the two points in the neighborhood of pixel (i, j) whose gray values are closest to a(i, j).

Thresholds T₁ and T₂ are set, and the different kinds of noise are handled according to the relation between the two noise estimates and the thresholds. Specifically:

If H(i, j) ≥ T₁ and V(i, j) ≥ T₂, the noise point is judged to be salt-and-pepper noise or impulse noise and is processed with median filtering, that is, its gray value is replaced by the median filter output. If H(i, j) < T₁, or H(i, j) ≥ T₁ and V(i, j) < T₂, the noise is judged to be Gaussian noise and the noise point is processed with mean filtering, that is, its gray value is replaced by the mean filter output.

In the above algorithm, the choice of thresholds T₁ and T₂ is critical to the quality of the composite denoising result. A commonly used threshold selection method is the mean absolute deviation (MAD) algorithm. According to this algorithm, T₁ = 3.5 δ_ij, where δ_ij is the mean absolute deviation of all pixels in the denoising window of pixel (i, j). The threshold T₂ is chosen mainly with regard to textures that may appear in the gesture image; based on the MAD algorithm and experimental experience, T₂ is usually an integer between 6 and 10.
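The composite denoising rule can be sketched as follows. This is a minimal illustration, assuming NumPy, a 3 × 3 window, replicated borders, and V(i, j) taken as the mean absolute difference between a(i, j) and its two closest neighbours, as the description suggests. The function name `composite_denoise` and the default T₂ = 8 are assumptions of this sketch.

```python
import numpy as np

def composite_denoise(img, noise_mask, t2=8, window=3):
    """Replace flagged noise pixels by a median or a mean filter output."""
    img = np.asarray(img, dtype=np.float64)
    out = img.copy()
    r = window // 2
    padded = np.pad(img, r, mode="edge")   # assumption: replicate the borders
    centre = (window * window) // 2
    for i, j in zip(*np.nonzero(noise_mask)):
        block = padded[i:i + window, j:j + window].ravel()
        a = img[i, j]
        neighbours = np.delete(block, centre)
        b = np.median(block)               # median filter output b(i, j)
        h = abs(a - b)                     # H(i, j), formula (1)
        # m1, m2: the two neighbour gray values closest to a(i, j)
        m1, m2 = neighbours[np.argsort(np.abs(neighbours - a))[:2]]
        v = (abs(a - m1) + abs(a - m2)) / 2.0   # V(i, j), assumed form
        # T1 = 3.5 * mean absolute deviation of the window
        t1 = 3.5 * np.mean(np.abs(block - block.mean()))
        if h >= t1 and v >= t2:
            out[i, j] = b                  # salt-and-pepper or impulse noise
        else:
            out[i, j] = block.mean()       # treated as Gaussian noise
    return out

mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
impulse = np.full((5, 5), 100.0)
impulse[2, 2] = 255.0                      # strong outlier: median branch
mild = np.full((5, 5), 100.0)
mild[2, 2] = 104.0                         # small deviation: mean branch
```

In the first sample the outlier exceeds both thresholds and is replaced by the window median; in the second, V(i, j) stays below T₂, so the pixel is replaced by the window mean.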

S202: Histogram equalization:

Histogram equalization adjusts the contrast of an image using its histogram, so that the gray-level histogram of the original image changes from being concentrated in a few gray ranges to being uniformly distributed over the whole range. The invention applies histogram equalization to the gesture grayscale image denoised in step S201 in order to widen the difference in gray value between the foreground and background of the gesture image. Histogram equalization is a commonly used image contrast enhancement method, and its specific steps are not repeated here.
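Since the text leaves the steps out, one common formulation of histogram equalization for 8-bit images is sketched below. This is a minimal illustration, assuming NumPy; the function name `equalize_histogram` is hypothetical, and the cumulative-distribution mapping shown is a standard textbook variant, not necessarily the one used by the inventors.

```python
import numpy as np

def equalize_histogram(gray):
    """Map gray levels through the normalized cumulative histogram."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(cdf)[0][0]]   # CDF value of the darkest level present
    total = gray.size
    if total == cdf_min:                   # constant image: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]                       # apply the lookup table per pixel

equalized = equalize_histogram([[10, 20], [30, 40]])
```

The four gray levels in the sample, originally packed into the range 10 to 40, are spread evenly over 0 to 255.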

S203: Coarse gesture segmentation based on the cellular neural network:

As in the pulse coupled neural network, the neurons (cells) of the cellular neural network correspond one-to-one to the pixels of the gesture grayscale image. Denote the cell in row i and column j by C(i, j) (corresponding to pixel (i, j) of the gesture grayscale image). Cell C(i, j) consists of four parts: the input variable u_ij, the state variable x_ij, the output variable y_ij, and the threshold I. The cells of a cellular neural network are locally interconnected: cell C(i, j) is connected only to the cells in its neighborhood N_r(i, j) and has no direct connection to other cells. The neighborhood N_r(i, j) of cell C(i, j) can be defined as:

$$N_r(i,j) = \{\, C(k,l) \mid \max(|k-i|,\ |l-j|) \le r \,\} \tag{3}$$

where r is a positive integer, 1 ≤ i, k ≤ M, 1 ≤ j, l ≤ N, M is the number of pixels in each row of the gesture grayscale image, and N is the number of pixels in each column. That is, the neighborhood of cell C(i, j) is the area covered by the square of side length 2r + 1 centered on C(i, j).

The main formulas of the cellular neural network are:

State equation:

Output equation:

where 1 ≤ i, k ≤ M and 1 ≤ j, l ≤ N; t denotes the iteration number; A(k, l) denotes the feedback weight of cell C(k, l) in the neighborhood N_r(i, j) of cell C(i, j); and B(k, l) denotes the control weight of cell C(k, l) in the neighborhood N_r(i, j) of cell C(i, j), that is, the elements of template B other than the center element. The values of (k, l) are determined by the definition of the neighborhood N_r(i, j).

The feedback template A and the control template B are both (2r + 1) × (2r + 1) matrices, and I denotes the threshold template of the cellular neural network. Together, the values of A, B and I determine the relationship between the input u_ij, the output y_ij, and the state x_ij of the cellular neural network. Correctly designing the values of the feedback template A, the control template B, and the threshold I is therefore critical for a cellular neural network model.

The template design method used by the invention combines an algebra-based approach with the template design experience of earlier work. Templates A, B and I generally take the following form:

I = -d (8)

where a, b, c and d are positive constants.

Establish a cellular neural network model of the gesture grayscale image. Take the gray value of each pixel (i, j) of the equalized gesture grayscale image as the input u_ij of the corresponding cell, and iterate according to the state equation until the whole cellular neural network converges, at which point each cell has an output y_ij(t). According to the output equation, the output value y_ij(t) of the cellular neural network lies between -1 and 1: y_ij(t) = 1 represents pure black, and y_ij(t) = -1 represents pure white.
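The iteration to convergence can be sketched as follows. This is a minimal illustration, assuming NumPy and assuming the widely used Chua-Yang form of the cellular neural network (state derivative -x plus the A-weighted outputs, the B-weighted inputs, and I, with a piecewise-linear output), integrated with explicit Euler steps. The templates in the example are a commonly cited edge-detection choice and are hypothetical here; they are not the inventors' values of a, b, c and d. Inputs are assumed to be already scaled to [-1, 1] with +1 for black.

```python
import numpy as np

def cnn_output(x):
    # Piecewise-linear output, bounded to [-1, 1].
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def neighbourhood_sum(field, template):
    # Weighted sum over the (2r+1) x (2r+1) neighbourhood of every cell.
    r = template.shape[0] // 2
    padded = np.pad(field, r, mode="edge")   # assumption: replicated border cells
    rows, cols = field.shape
    total = np.zeros_like(field, dtype=np.float64)
    for dk in range(template.shape[0]):
        for dl in range(template.shape[1]):
            total += template[dk, dl] * padded[dk:dk + rows, dl:dl + cols]
    return total

def run_cnn(u, A, B, I, step=0.1, max_iter=500, tol=1e-4):
    u = np.asarray(u, dtype=np.float64)
    x = np.zeros_like(u)                     # assumption: zero initial state
    bu = neighbourhood_sum(u, B) + I         # input term is constant over time
    for _ in range(max_iter):
        dx = -x + neighbourhood_sum(cnn_output(x), A) + bu
        x = x + step * dx
        if np.max(np.abs(dx)) < tol:         # converged
            break
    return cnn_output(x)

# Hypothetical templates; I is negative, in line with I = -d.
A = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]])
B = np.array([[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]])
u = -np.ones((5, 5))
u[2, 2] = 1.0                                # one black pixel on white
y = run_cnn(u, A, B, I=-0.5)
```

With these assumed templates the single black pixel settles at +1 and every white pixel at -1, which matches the black/white reading of the output described above.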

The basic principle for judging whether a pixel is an edge point is as follows. When a pixel value is pure black, that is, +1, and the sum of the pixel values in its neighborhood exceeds a set threshold, the pixel is not an edge pixel and its value tends toward pure white; conversely, if the sum of the pixel values in its neighborhood is below the threshold, the pixel is an edge pixel and its value tends toward pure black. When the pixel value is pure white, that is, -1, it tends toward pure white regardless of the values of the pixels in its neighborhood.

Based on this principle, the invention judges whether a pixel is an edge point as follows. Traverse the output value of the cell corresponding to each pixel in the cellular neural network. When the output value of a pixel lies in the range [0, 1], the pixel is not an edge pixel if the sum of the pixel values of the other pixels in its neighborhood exceeds a preset threshold, and is an edge pixel otherwise. When the output value lies in the range [-1, 0), the pixel is not an edge pixel. The threshold on the sum of neighborhood pixel values is set according to the actual situation.
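The edge decision rule can be sketched as follows. This is a minimal illustration, assuming NumPy, a neighborhood radius r = 1, and replicated borders; the function name `edge_mask` and the threshold value used in the example are assumptions of this sketch.

```python
import numpy as np

def edge_mask(y, sum_threshold, r=1):
    """Mark edge pixels from converged cell outputs y in [-1, 1]."""
    y = np.asarray(y, dtype=np.float64)
    size = 2 * r + 1
    padded = np.pad(y, r, mode="edge")       # assumption: replicate the borders
    rows, cols = y.shape
    window_sum = np.zeros_like(y)
    for dk in range(size):
        for dl in range(size):
            window_sum += padded[dk:dk + rows, dl:dl + cols]
    neighbour_sum = window_sum - y           # exclude the pixel itself
    dark = y >= 0.0                          # output in [0, 1]
    # Dark pixels surrounded mostly by dark pixels are interior, not edge.
    return dark & (neighbour_sum <= sum_threshold)

# A 3 x 3 black block on a white background.
block = -np.ones((5, 5))
block[1:4, 1:4] = 1.0
edges = edge_mask(block, sum_threshold=6)
```

In the sample, the centre of the block has a neighbour sum of 8 and is rejected, while the eight pixels on its rim have smaller sums and are kept as edge pixels.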

S204: Fine gesture segmentation based on gesture shape features:

Fig. 3 is a flow chart of fine gesture segmentation based on gesture shape features. As shown in Fig. 3, fine gesture segmentation in the invention comprises the following steps:

S301: Extract connected regions and contours:

Connected regions are computed from the edge pixels obtained with the cellular neural network, which removes interference from other background information and keeps only the person's hand and face regions. In this embodiment, connected regions are found with the two-pass algorithm. The contour of each connected region is then extracted; this embodiment uses a search-and-label method, which works as follows. The image obtained after connected region extraction is scanned systematically. When a point belonging to a connected region is encountered, it is taken as the starting point, the edge of the region is tracked from it, and the pixels on the edge are marked. When the tracked contour is completely closed, scanning resumes from the previous position until a new pixel is found. Other methods of extracting connected regions and contours may also be chosen as needed.
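A two-pass connected region labelling of the kind named above can be sketched as follows. This is a minimal illustration, assuming NumPy and 4-connectivity (the text does not state the connectivity used); the function name `two_pass_label` is hypothetical.

```python
import numpy as np

def two_pass_label(mask):
    """Label 4-connected foreground regions of a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    parent = [0]                          # union-find forest; 0 is background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    rows, cols = mask.shape
    # First pass: assign provisional labels and record equivalences.
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j]:
                continue
            up = int(labels[i - 1, j]) if i > 0 else 0
            left = int(labels[i, j - 1]) if j > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))
                labels[i, j] = len(parent) - 1
            elif up and left:
                ru, rl = find(up), find(left)
                labels[i, j] = min(ru, rl)
                parent[max(ru, rl)] = min(ru, rl)
            else:
                labels[i, j] = up or left
    # Second pass: replace provisional labels by compact final labels.
    final = {}
    for i in range(rows):
        for j in range(cols):
            if labels[i, j]:
                root = find(int(labels[i, j]))
                labels[i, j] = final.setdefault(root, len(final) + 1)
    return labels, len(final)

# A U-shaped region: its two arms receive different provisional labels
# that the second pass merges into one.
u_labels, u_count = two_pass_label([[1, 0, 1], [1, 0, 1], [1, 1, 1]])
```

The U shape is the classic case that needs the second pass, because the two arms only turn out to be connected when the bottom row is reached.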

S302: Perform fingertip detection on each connected region:

Fingertip detection is performed on each connected region obtained, in order to determine whether it is a gesture area. In general, the fingers are spread apart during gesture recognition, so fingertips can be detected by computing curvature. Fig. 4 is a schematic diagram of fingertip detection in the invention. As shown in Fig. 4, the fingertip detection method is:

Traverse each contour pixel of the connected region. Take the pixel as the reference point, with coordinates denoted p(p_x, p_y, 0), where (p_x, p_y) is the two-dimensional coordinate of the reference point in the gesture image, and preset a distance constant L. Along the contour direction, take the L-th point before p, p₁(p_1x, p_1y, 0), so that p and p₁ define a straight line; then take the L-th point after p, p₂(p_2x, p_2y, 0), so that p and p₂ also define a straight line. The two lines form an angle, denoted α. The cosine of the angle between the vectors $\overrightarrow{pp_1}$ and $\overrightarrow{pp_2}$ is taken as the curvature, that is, the curvature is calculated as:

$$\cos\alpha = \frac{\overrightarrow{pp_1} \cdot \overrightarrow{pp_2}}{|\overrightarrow{pp_1}|\,|\overrightarrow{pp_2}|}$$

If cos α is greater than the preset curvature threshold T, the point is judged to be a candidate fingertip point. The threshold T is set according to the distance constant L: the larger L is, the larger T is. L should be neither too small nor too large and is generally set to between a quarter and a half of the average finger length.

Interference from the grooves between fingers can be removed using the sign of the vector product of $\overrightarrow{pp_1}$ and $\overrightarrow{pp_2}$. As Fig. 4 shows, the sign of $\overrightarrow{pp_1} \times \overrightarrow{pp_2}$ when p lies at a fingertip differs from its sign when p lies in a groove, so the position of p can be judged from the sign of the vector product. It is for this purpose that the coordinates of p, p₁ and p₂ are expressed in three-dimensional rectangular coordinates. The sign of the vector product at a fingertip depends on the traversal direction. When the overall contour of the gesture area is traversed clockwise, by the right-hand rule the vector product at a fingertip points into the image plane and is negative; when the contour is traversed counterclockwise (the traversal direction shown in Fig. 4), the vector product at a fingertip points out of the image plane and is positive. The sign expected at a fingertip is thus used to remove interference from the grooves: the sign of $\overrightarrow{pp_1} \times \overrightarrow{pp_2}$ is evaluated for each candidate fingertip point, and the candidate is kept if the sign matches the sign expected for a fingertip position and discarded otherwise.
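The curvature test and the sign test can be sketched together as follows. This is a minimal illustration, assuming NumPy and contour points given as (x, y) image coordinates with y pointing down. The caller states the traversal direction, and the expected sign follows the convention in the text (negative for clockwise, positive for counterclockwise). The function name `candidate_fingertips` and the toy contour are assumptions of this sketch.

```python
import numpy as np

def candidate_fingertips(contour, L, T, clockwise=False):
    """Return indices of contour points that pass both fingertip tests.

    contour: (n, 2) sequence of (x, y) points ordered along a closed contour.
    L: distance constant (number of contour points to step before/after p).
    T: curvature threshold on cos(alpha).
    """
    pts = np.asarray(contour, dtype=np.float64)
    n = len(pts)
    expected_sign = -1.0 if clockwise else 1.0
    tips = []
    for idx in range(n):
        p = pts[idx]
        v1 = pts[(idx - L) % n] - p    # vector from p to p1 (L-th point before)
        v2 = pts[(idx + L) % n] - p    # vector from p to p2 (L-th point after)
        norm = np.linalg.norm(v1) * np.linalg.norm(v2)
        if norm == 0.0:
            continue
        cos_alpha = float(np.dot(v1, v2)) / norm
        if cos_alpha <= T:
            continue                   # not sharp enough
        # z-component of the 3-D vector product; x and y components are zero.
        cross_z = v1[0] * v2[1] - v1[1] * v2[0]
        if np.sign(cross_z) == expected_sign:
            tips.append(idx)           # sharp and convex: candidate fingertip
    return tips

# Toy contour traversed counterclockwise on screen: two tips and one groove.
hand = [(2, 0), (0, 10), (8, 10), (6, 0), (4, 6)]
tips_ccw = candidate_fingertips(hand, L=1, T=0.7)
tips_cw = candidate_fingertips(hand[::-1], L=1, T=0.7, clockwise=True)
```

In the toy contour the groove at (4, 6) is as sharp as the tips (cos α = 0.8) and passes the curvature test, but its vector product has the opposite sign, so only the two tips are returned in either traversal direction.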

S303: Determine the gesture area:

After the fingertip points have been detected, they must still be screened to remove interference from parts of the face whose curvature exceeds the threshold because of the viewing angle, and so determine the gesture area. The invention uses a two-stage test:

First, for each connected region, determine whether the difference in y-coordinate between the detected candidate fingertip point with the largest y-coordinate and the one with the smallest y-coordinate exceeds half the face height. If it does, the connected region is not a gesture area; otherwise it is kept as a candidate gesture area. The value of half the face height was obtained experimentally: it removes interference from the face while fully retaining the correct fingertip points.

Second, for each candidate gesture area, determine whether the number of candidate fingertip points exceeds a preset number threshold. If it does, the connected region is a gesture area; otherwise it is not. The number of fingertip points found in a real gesture area depends on the curvature threshold T, so in practice the number threshold can be obtained from statistics over the experimental results for several gesture training samples.
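The two-stage test can be sketched as below. The function name `is_gesture_area` and its parameters are hypothetical; the face height and the number threshold are assumed to be available from earlier processing and from statistics on training samples respectively.

```python
def is_gesture_area(candidates, face_height, min_count):
    """Two-stage test on one connected region.

    candidates  : list of (x, y) candidate fingertip points in the region
    face_height : height of the detected face, in pixels (assumed known)
    min_count   : preset threshold on the number of candidate points
    """
    if not candidates:
        return False
    ys = [y for _, y in candidates]
    # Stage 1: reject regions whose candidates spread too far vertically.
    if max(ys) - min(ys) > face_height / 2:
        return False
    # Stage 2: require more candidates than the preset threshold.
    return len(candidates) > min_count

print(is_gesture_area([(0, 10), (5, 12), (9, 11)], face_height=40, min_count=2))
```

In this example the vertical spread is 2 pixels, well under half the face height, and three candidates exceed the threshold of 2, so the region is accepted.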

S304: Segment the gesture area:

The operations above remove interference from other connected regions such as the face and yield the gesture area. However, the gesture area may contain not only the palm but also parts such as the wrist. In general, the useful information of a gesture is concentrated in the palm and fingers, and the information in the wrist and similar parts can largely be ignored. Therefore, to make later feature extraction and tracking efficient and effective, the gesture area must be segmented so that only the fingers and palm are retained.

Based on the shape of the human hand, the invention segments the gesture on the assumption that the ratio of the gesture's length to its width is approximately 2. Before segmentation, the principal direction of the gesture area must be known. In this embodiment it is obtained as follows: compute the centroid of the gesture area, compute the vector from the centroid to each fingertip point, and average these vectors; the direction of the average vector is the principal direction of the gesture area. The gesture is then segmented according to this principal direction. The segmentation method of this embodiment is: obtain the bounding rectangle of the gesture area aligned with the principal direction, taking the sides parallel to the principal direction as the length and the sides perpendicular to it as the width; select the width side on which the fingertip points lie; starting from that side, cut along the long side a rectangle whose length is 2 times the width. The part of the gesture area inside this rectangle is the segmented gesture area, which retains only the fingers and palm.
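The principal-direction computation and the 2:1 cut can be sketched as follows. The names `principal_direction` and `segment_hand` are hypothetical, and the region is assumed to be given as a list of pixel coordinates. Rather than constructing the rotated rectangle explicitly, the sketch projects every pixel onto the principal direction and keeps those within 2 widths of the fingertip-side edge, which is an equivalent formulation chosen for brevity.

```python
import math

def principal_direction(region, fingertips):
    """Centroid of the region and mean centroid-to-fingertip vector."""
    n = len(region)
    cx = sum(x for x, _ in region) / n
    cy = sum(y for _, y in region) / n
    mx = sum(x - cx for x, _ in fingertips) / len(fingertips)
    my = sum(y - cy for _, y in fingertips) / len(fingertips)
    return (cx, cy), (mx, my)

def segment_hand(region, fingertips):
    """Keep the pixels within 2 x width of the fingertip-side edge."""
    _, (mx, my) = principal_direction(region, fingertips)
    norm = math.hypot(mx, my)
    if norm == 0:
        return list(region)            # direction undefined, keep everything
    ux, uy = mx / norm, my / norm      # unit vector along the principal direction
    vx, vy = -uy, ux                   # unit vector perpendicular to it
    along = [x * ux + y * uy for x, y in region]
    across = [x * vx + y * vy for x, y in region]
    width = max(across) - min(across)  # extent perpendicular to the direction
    top = max(along)                   # fingertip-side edge of the rectangle
    return [p for p, a in zip(region, along) if a >= top - 2 * width]

# A 3-pixel-wide, 10-pixel-long strip with fingertips at the y = 9 end.
region = [(x, y) for x in range(3) for y in range(10)]
kept = segment_hand(region, fingertips=[(0, 9), (2, 9)])
print(len(kept), min(y for _, y in kept))  # 15 5
```

In the example the width measured between pixel centres is 2, so the cut keeps the rows from y = 5 to y = 9 and drops the "wrist" end of the strip.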

S205: Extract gesture features using a Fourier descriptor that retains phase information:

For the gesture area obtained by the segmentation in step S204, the invention designs a Fourier descriptor that retains phase information to extract the gesture features. This removes the rotation invariance of the conventional Fourier descriptor, so that rotated gestures can be distinguished.

The discrete Fourier coefficients z(k) can be expressed as:

where p(i) denotes the i-th element of the discrete sequence, n denotes the number of elements in the sequence, e denotes the natural constant, and j is the imaginary unit. Since the object to be transformed is the gesture contour, the discrete sequence p(i) in the invention consists of the coordinates of the contour pixels of the gesture area segmented in step S104, expressed in complex form.

Inverse Fourier transform can be expressed as：

According to the basic property of the Fourier transform, z(k) = z*(n-k), where z* denotes the complex conjugate of z, the high-frequency part of the Fourier transform z from K+1 to n-K-1 is removed; the range of K is [0, n/2]. An inverse Fourier transform is then applied to z with the high-frequency part removed. This yields a curve that approximates the original curve but is smoother, and it is called the K-th order approximation of the original curve. The subset of Fourier coefficients retained in this way, that is, the coefficients z(k) outside the removed range K+1 to n-K-1, is the Fourier descriptor used to extract gesture features.
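The removal of the high-frequency coefficients and the resulting smoothed contour can be sketched as follows. This is an illustration under stated assumptions: the forward transform is taken as the unnormalized sum of p(i)·e^(-j2πik/n) and the inverse carries the 1/n factor, which is one common convention and may differ from the one used in the original formulas. The function names are hypothetical.

```python
import cmath

def dft(seq):
    """Discrete Fourier transform (assumed convention: no 1/n factor)."""
    n = len(seq)
    return [sum(seq[i] * cmath.exp(-2j * cmath.pi * i * k / n)
                for i in range(n)) for k in range(n)]

def idft(coeffs):
    """Inverse transform matching dft() above (carries the 1/n factor)."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * i * k / n)
                for k in range(n)) / n for i in range(n)]

def smooth_contour(contour, K):
    """K-th order approximation: zero coefficients K+1 .. n-K-1, then invert."""
    p = [complex(x, y) for x, y in contour]   # contour points in complex form
    z = dft(p)
    n = len(z)
    kept = [z[k] if (k <= K or k >= n - K) else 0 for k in range(n)]
    return idft(kept)

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
for point in smooth_contour(square, K=1):
    print(round(point.real, 3), round(point.imag, 3))
```

With K = 0 only z(0) survives and every reconstructed point collapses to the centroid of the contour; with K = n/2 nothing is removed and the original contour is recovered.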

The Fourier descriptor depends on the scale and orientation of the shape and on the starting point of the curve. Therefore, to ensure that the recognition algorithm is invariant to rotation, translation and scale, the Fourier descriptor must be normalized. From the basic properties of the Fourier transform it can be shown that, when a contour is represented by its Fourier coefficients, the coefficient magnitude ||z(k)||, 0 ≤ k ≤ n-1, is invariant to rotation and independent of the starting point, and is also invariant to translation except for z(0). Because z(0) is not translation invariant, the range of k is set to [1, n-1]. To make the Fourier descriptor scale invariant, the magnitude ||z(k)|| of each coefficient other than z(0) is divided by ||z(1)||. The normalized Fourier descriptor S[k'] can then be expressed as:

$$S[k'] = \frac{\|z(k')\|}{\|z(1)\|}$$

where 1 ≤ k' ≤ n-1 and ||·|| denotes the modulus operator.
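The normalization can be checked numerically with a short sketch. The function name `normalized_descriptor` is hypothetical, the transform uses the same assumed unnormalized convention as a standard discrete Fourier transform, and the parameter `Q` stands for the number of leading descriptors kept. The example compares a contour with a copy that has been scaled by 3 and shifted; both should give the same magnitudes.

```python
import cmath

def normalized_descriptor(contour, Q):
    """Magnitudes ||z(k')|| / ||z(1)|| for k' = 1 .. Q (at most n - 1)."""
    p = [complex(x, y) for x, y in contour]
    n = len(p)
    z = [sum(p[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
         for k in range(n)]
    scale = abs(z[1])                 # assumed non-zero for a real contour
    return [abs(z[k]) / scale for k in range(1, min(Q, n - 1) + 1)]

shape = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 2)]
moved = [(3 * x + 10, 3 * y - 7) for x, y in shape]   # scaled and translated

a = normalized_descriptor(shape, Q=4)
b = normalized_descriptor(moved, Q=4)
print(all(abs(u - v) < 1e-9 for u, v in zip(a, b)))  # True
```

The first element is always 1 by construction, so in practice the discriminating information lies in the remaining elements.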

A detailed description of the normalized Fourier descriptor can be found in: Song Ruihua, "Gesture recognition algorithm based on Fourier descriptors" [D], Xidian University, 2008.

To remove the rotation invariance of the conventional Fourier descriptor, the invention retains the phase information after rotation. The normalized form of the improved Fourier descriptor can be expressed as:

Here the angle term is the angle between the principal direction of the gesture area and the x-axis, and j is the imaginary unit. The Fourier descriptor above retains the phase information of the gesture rotation, so the descriptor is not rotation invariant. The invention therefore uses the coefficients other than the zeroth one as the features of the gesture area. These features are invariant to translation and scale and independent of the starting point of the gesture contour curve, yet they vary with rotation, so the feature vector can distinguish rotated gestures. Because the number of contour points differs between gesture areas, in practice only the first Q Fourier descriptors are selected to form the feature vector; the value of Q can be chosen according to the actual situation.

S102: Train the BP neural network on the training samples:

The feature vectors of the training sample gesture images are fed to the BP neural network as training inputs, with the corresponding gesture image classes as the outputs of the BP neural network, and the BP neural network is trained. The BP neural network is a commonly used neural network; its specific structure, parameters and training method are not described again here.

S103: Perform gesture recognition on the sample to be identified:

The feature vector of the gesture image to be identified is input to the BP neural network trained in step S102, which outputs the recognized gesture image class.

To illustrate the technical effect of the invention, it was verified experimentally. The selected gesture training samples are divided into four classes: gesture facing up, gesture facing down, gesture facing left and gesture facing right, with 80 training samples per class. Test samples were selected from the same four classes of images, 40 per class. For ease of presentation, only the upward-facing samples are used here to illustrate the procedure. Each sample picture is 256 × 256 pixels with 256 gray levels.

The upward-facing samples are first denoised. When a pulse-coupled neural network is used for image denoising, its neurons correspond one-to-one to the pixels of the image to be processed; since the sample pictures are 256 × 256, the number of neurons of the pulse-coupled neural network is set to 65536. The parameters of the pulse-coupled neural network model used in this embodiment are: number of neuron iterations τ=10, neuron linking strength β=3, dynamic threshold parameter θ_ij=1, amplification coefficient of the threshold output V_θ=20, and attenuation coefficient of the threshold function a_θ=0.2. The firing characteristic of the pulse-coupled neural network is then used for detection, the noise points are determined from the detection result, and the composite denoising algorithm is applied according to the type of each noise point. The parameters of the composite denoising algorithm are set to T₁=3.5 δ_ij and T₂=8, where S_k denotes the noise window; the noise window has the same size as the detection window of the pulse-coupled neural network, namely 3 × 3.

After histogram equalization of the denoised gesture image, the cellular neural network is used to detect the edges of the gesture in the image, which gives a coarse segmentation of the gesture image. In this embodiment, the neighborhood of each cell in the cellular neural network is 3 × 3, and the templates used are:

Fig. 5 shows an example of coarse gesture segmentation.

The gesture image is then finely segmented using the gesture shape features. The constant L is 80 and the curvature threshold T is 0.5. Fig. 6 shows an example of fine gesture segmentation. It can be seen that fine segmentation eliminates the influence of regions such as the face and yields an accurate gesture area.

The contour point coordinates of the finely segmented gesture area are then assembled into a discrete sequence, a Fourier transform is applied to obtain the Fourier coefficients, and these are normalized according to formula (13). The first 200 normalized Fourier descriptors are selected to form the gesture feature vector.

The BP neural network is trained with the gesture feature vectors of the training samples. The number of input-layer nodes of the BP neural network is determined by the gesture feature vector and the number of output-layer nodes by the number of gesture classes. The network used in the invention has 200 input-layer nodes, 10 hidden-layer nodes and 4 output-layer nodes. The output can be represented by the binary codes 0001, 0010, 0100 and 1000, where 0001 denotes a gesture facing up, 0010 a gesture facing down, 0100 a gesture facing left and 1000 a gesture facing right. The class of a gesture is determined from the network output.
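Mapping the four output-layer values to one of the binary codes can be sketched as below. The text does not state how real-valued outputs are converted to a code, so the winner-take-all rule used here is an assumption, as are the function name `decode_output` and the convention that the first output corresponds to the leftmost bit.

```python
# Codes and classes as listed in the text.
LABELS = {"0001": "up", "0010": "down", "0100": "left", "1000": "right"}

def decode_output(activations):
    """Turn 4 output-layer values into a one-hot code and a class name.

    Assumption: the largest activation wins, and activations[0] is the
    leftmost bit of the code.
    """
    winner = max(range(len(activations)), key=lambda i: activations[i])
    code = "".join("1" if i == winner else "0"
                   for i in range(len(activations)))
    return code, LABELS[code]

print(decode_output([0.1, 0.05, 0.2, 0.9]))  # ('0001', 'up')
```

A thresholding rule that rejects ambiguous outputs would be an equally plausible design; winner-take-all is used here only because it always yields one of the four listed codes.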

To verify the denoising quality of the new algorithm designed in the invention, which combines a pulse-coupled neural network with the composite denoising algorithm, it was compared with the composite denoising algorithm alone and with median filtering. The main comparison index is the peak signal-to-noise ratio (PSNR). Table 1 compares the PSNR of the denoising algorithm of the invention with that of the comparison algorithms.

Table 1

As can be seen from Table 1, at the same noise density the PSNR of the denoising method proposed in the invention is clearly higher than that of median filtering and of the composite denoising algorithm alone. The denoising algorithm designed in the invention, which combines a pulse-coupled neural network with the composite denoising algorithm, therefore has a good denoising effect.
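For reference, the PSNR index used in the comparison can be computed as in the sketch below. It uses the standard definition 10·log10(peak² / MSE); the peak value of 255 is an assumption that matches the 256 gray levels of the sample images, and the function name `psnr` and the list-of-rows image format are chosen only for illustration.

```python
import math

def psnr(reference, processed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of gray values; peak=255 assumes 8-bit images.
    """
    total = 0
    count = 0
    for row_r, row_p in zip(reference, processed):
        for r, p in zip(row_r, row_p):
            total += (r - p) ** 2
            count += 1
    mse = total / count                 # mean squared error
    if mse == 0:
        return math.inf                 # identical images
    return 10 * math.log10(peak ** 2 / mse)

clean = [[10, 20], [30, 40]]
denoised = [[11, 19], [31, 39]]         # every pixel off by 1, so MSE = 1
print(round(psnr(clean, denoised), 2))  # 48.13
```

A higher value means the processed image is closer to the reference, which is why a higher PSNR at equal noise density indicates better denoising.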

In addition, the recognition performance was compared with that obtained using the conventional Fourier descriptor; the comparison index is the recognition rate of the gesture samples. Table 2 gives the gesture sample recognition results for the conventional Fourier descriptor. Table 3 gives the gesture sample recognition results for the Fourier descriptor of the invention.

Table 2

Table 3

Comparing the results in Table 2 and Table 3 shows that the conventional Fourier descriptor cannot reliably recognize gestures with a large rotation: its recognition rate is only about 71%, so the method performs poorly in scenarios where rotated gestures carry different meanings. The improved Fourier descriptor of the invention tolerates a certain angle of gesture rotation. Although gestures are treated as two different images when the rotation angle is too large, the experiments show that the invention still reaches a recognition rate of about 91%, which is a good gesture recognition result.

Illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, but it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, variations that fall within the spirit and scope of the invention as defined and determined by the appended claims are obvious, and all inventions and creations that make use of the inventive concept are protected.

Claims

1. A gesture recognition method based on hybrid neural networks, characterized by comprising the following steps:

S1.1: Establish a pulse-coupled neural network model of the gesture grayscale image, take the gray value of each pixel of the current gesture grayscale image as the input of the corresponding neuron in the pulse-coupled neural network, and use the firing characteristic of the pulse-coupled neural network to examine the pixels of the gesture image; if the output state of a pixel is the firing state, set the element corresponding to that pixel in the detection result matrix to 1, and otherwise set it to 0; traverse each element of the detection result matrix; if the element value is 1, take that element as the center of a noise reduction window whose size is set according to the actual situation, and examine the values of the elements in the window other than the center element; if the number of elements whose value is 0 exceeds a preset threshold, the center point is a noise point, and in all other cases it is not a noise point;

H(i, j) = |a(i, j) - b(i, j)|

where a(i, j) is the gray value at pixel (i, j) of the image, and b(i, j) is the median output gray value obtained after median filtering of that pixel;

where m₁(i, j) and m₂(i, j) denote the gray values of the two points in the neighborhood of pixel (i, j) whose gray values are closest to a(i, j);

if H(i, j) ≥ T₁ and V(i, j) ≥ T₂, where T₁ and T₂ are preset thresholds, the noise point is processed with median filtering; otherwise the noise point is processed with mean filtering;

S1.3: Establish a cellular neural network of the gesture grayscale image, take the gray value of each pixel (i, j) of the equalized gesture grayscale image as the input u_ij of the corresponding cell in the cellular neural network, and iterate according to the state transition equation until the whole cellular neural network converges, obtaining the output y_ij(t) of each cell; traverse the output value of the cell corresponding to each pixel in the cellular neural network; when the output value of a pixel lies in the range [0, 1], the pixel is not an edge pixel if the sum of the pixel values of the other pixels in its neighborhood exceeds a preset threshold, and is otherwise an edge pixel; when the output value lies in the range [-1, 0), the pixel is not an edge pixel;

S1.4: Obtain the connected regions from the edge pixels obtained in step S1.3, extract the contour of each connected region, and perform fingertip detection on each connected region separately, the fingertip detection method being:

traverse each contour pixel of the connected region; taking the pixel as the reference point, with coordinates denoted p(p_x, p_y, 0), and with a preset distance constant L, take along the contour the L-th point before p, p₁(p_1x, p_1y, 0), and the L-th point after p, p₂(p_2x, p_2y, 0); compute the cosine cos α of the angle between the two vectors joining p to p₁ and p to p₂; if cos α exceeds the preset curvature threshold T, the point is judged to be a candidate fingertip point, and otherwise it is not;

determine the sign of the vector product at a fingertip position according to the traversal direction: if the overall contour of the gesture area is traversed clockwise the sign of the vector product is negative, and otherwise it is positive; compute the vector product of the two vectors for each candidate fingertip point; if the sign of this vector product is the same as the sign for a fingertip position, the point is retained as a candidate fingertip point, and otherwise it is discarded;

among all candidate fingertip points detected in the connected region, determine whether the difference in y-coordinate between the candidate fingertip point with the largest y-coordinate and the one with the smallest y-coordinate exceeds half the face height; if it does, the connected region is not a gesture area, and otherwise it is a candidate gesture area; further determine, for each candidate gesture area, whether the number of candidate fingertip points exceeds a preset number threshold; if it does, the connected region is a gesture area, and otherwise it is not;

obtain the principal direction of the gesture area, and segment the gesture area along the principal direction according to a gesture length-to-width ratio of 2, obtaining the segmented gesture area;

S1.5: For the gesture area obtained after the segmentation in step S1.4, express the contour point coordinates of the gesture area in complex form and assemble all contour point coordinates into a discrete sequence; denoting the number of contour points by n, apply a Fourier transform to the discrete sequence to obtain n Fourier coefficients z(k), k = 0, 1, ..., n-1, and compute the Fourier descriptor

where k' = 1, 2, ..., n-1, and the angle term denotes the angle between the principal direction of the gesture area and the x-axis;

S2: Feed the feature vectors of the training sample gesture images to the BP neural network as training inputs, with the corresponding gesture image classes as the outputs of the BP neural network, and train the BP neural network;

S3: Input the feature vector of the gesture image to be identified to the BP neural network trained in step S2, and output the recognized gesture image class.

2. The gesture recognition method according to claim 1, characterized in that, in step S1.1, the threshold T₁ = 3.5 δ_ij, where δ_ij denotes the mean absolute deviation of all pixels in the denoising window of pixel (i, j), and the threshold T₂ is an integer from 6 to 10.

3. The gesture recognition method according to claim 1, characterized in that, in step S1.3, the feedback template A of the cellular neural network is:

$$A(k,l)=\begin{cases} a, & k=i,\ l=j \\ 0, & k\neq i,\ l\neq j \end{cases},$$

the control template B is:

$$B(k,l)=\begin{cases} b, & k=i,\ l=j \\ -c, & k\neq i,\ l\neq j \end{cases},$$

and the threshold is I = -d,

where (k, l) is a point in the neighborhood N_r(i, j) of the cellular neural network that is centered on cell C(i, j) and has side length 2r+1, and a, b, c and d are positive constants.

4. The gesture recognition method according to claim 1, characterized in that, in step S1.4, the principal direction of the gesture area is obtained as follows: compute the centroid of the gesture area, compute the vector from the centroid to each fingertip point, and average these vectors; the direction of the average vector is the principal direction of the gesture area.