CN111796675A - Gesture recognition control method of head-mounted device and storage medium

Info

Publication number
CN111796675A
CN111796675A (application CN202010443152.4A)
Authority
CN
China
Prior art keywords
time length, RGB values, light, cameras
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010443152.4A
Other languages
Chinese (zh)
Other versions
CN111796675B (en)
Inventor
刘德建
陈丛亮
郭玉湖
陈宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian TQ Digital Co Ltd
Original Assignee
Fujian TQ Digital Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian TQ Digital Co Ltd filed Critical Fujian TQ Digital Co Ltd
Priority to CN202010443152.4A priority Critical patent/CN111796675B/en
Publication of CN111796675A publication Critical patent/CN111796675A/en
Application granted granted Critical
Publication of CN111796675B publication Critical patent/CN111796675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B20/00Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B20/40Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a gesture recognition control method and a storage medium for a head-mounted device provided with two cameras. The method comprises the following steps: within a preset first time length, acquiring an unused RGB value according to the RGB values of the pixels of each frame of image, and sending the unused RGB value to a light-emitting device; within a preset second time length, the two cameras respectively identifying, in their lens frames, the coordinate position of the light output by the light-emitting device according to the received RGB value; timing the first time length and the second time length alternately and seamlessly; and within a third time length, triggering a corresponding preset control instruction according to the change in distance between the coordinate positions corresponding to the two cameras, wherein the third time length is greater than the sum of the first time length and the second time length. The invention improves the accuracy and efficiency of camera-based gesture recognition, supports the recognition and control of specific gestures, improves the convenience of operation, enriches the control modes and functions, and greatly improves the usability of the product.

Description

Gesture recognition control method of head-mounted device and storage medium
Technical Field
The invention relates to the field of gesture recognition, and in particular to a gesture recognition control method and a storage medium for a head-mounted device.
Background
A head-mounted device in the prior art is worn on the head, so it is difficult to operate it the way a mobile phone screen is touched. Although some head-mounted devices support gesture control, they generally suffer from complex computation, a low recognition rate and low operation sensitivity, and the operation mode is not convenient enough.
Therefore, it is necessary to provide a gesture recognition control method and a storage medium for a head-mounted device that overcome the above problems at the same time.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a gesture recognition control method and a storage medium for a head-mounted device that improve the recognition accuracy, recognition efficiency and operation convenience of gesture control.
In order to solve the technical problems, the invention adopts the technical scheme that:
a gesture recognition control method of head-mounted equipment is provided, wherein two cameras are arranged on the head-mounted equipment, and the method comprises the following steps:
within a preset first time length, acquiring unused RGB values according to the RGB values of pixels of each frame of image, and sending the unused RGB values to light-emitting equipment;
within a preset second time length, the two cameras respectively identify the coordinate positions of light output by the light-emitting equipment according to the received RGB values in a lens picture;
starting timing in turn seamlessly between the first time length and the second time length;
and triggering a corresponding preset control instruction according to the distance change between the coordinate positions corresponding to the two cameras within a third time length, wherein the third time length is greater than the sum of the first time length and the second time length.
The invention provides another technical scheme as follows:
a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, is capable of carrying out the steps involved in a method for gesture recognition control of a head-mounted device as described above.
The invention has the beneficial effects that: the invention controls the light-emitting equipment to output the light of the unused RGB value by acquiring the unused RGB value; and respectively identifying the moving tracks of the light corresponding to the respective lens pictures according to the two cameras to determine the distance change between the two cameras, thereby determining the corresponding control instruction. On one hand, the conventional mode that the operation track of the real hand of the user can be recognized only by performing complex calculation on all image pixels is changed into a recognition mode that the gesture of the user can be quickly obtained only by simply analyzing pixels with specific RGB values, so that the calculation complexity of recognition can be greatly reduced, and the recognition efficiency is improved; and the accuracy of identification can be ensured at the same time. On the other hand, the difference of the identification coordinates of the two cameras is fully utilized, so that the specific gestures can be identified, and the specified control instruction, such as clicking operation, is triggered, the problem that some gesture identification operations are not well realized is solved, and the control operation is simplified; thereby improving the operation convenience and the product usability.
Drawings
Fig. 1 is a schematic flowchart of a gesture recognition control method for a head-mounted device according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the gesture recognition control method for a head-mounted device according to the first embodiment of the present invention;
Fig. 3 is a schematic diagram of the coordinate calculation according to the first and second embodiments of the present invention.
Detailed Description
In order to explain the technical contents, objects and effects of the present invention in detail, the following description is given with reference to the accompanying drawings and in combination with the embodiments.
The key idea of the invention is as follows: the user's gesture can be confirmed quickly by simply analyzing pixels of a specific RGB value; meanwhile, specific gestures can be recognized by exploiting the difference between the coordinates recognized by the two cameras, triggering designated control instructions, so that control operations are more convenient and easier to realize.
Referring to Fig. 1, the present invention provides a gesture recognition control method for a head-mounted device provided with two cameras, the method comprising:
within a preset first time length, acquiring an unused RGB value according to the RGB values of the pixels of each frame of image, and sending the unused RGB value to a light-emitting device;
within a preset second time length, the two cameras respectively identifying, in their lens frames, the coordinate position of the light output by the light-emitting device according to the received RGB value;
timing the first time length and the second time length alternately and seamlessly;
and within a third time length, triggering a corresponding preset control instruction according to the change in distance between the coordinate positions corresponding to the two cameras, wherein the third time length is greater than the sum of the first time length and the second time length.
Further, triggering the corresponding preset control instruction according to the change in distance between the coordinate positions corresponding to the two cameras specifically comprises:
presetting a first control instruction corresponding to a distance increase and a second control instruction corresponding to a distance decrease;
setting the coordinate position corresponding to one camera as the reference coordinate position;
if the distance between the coordinate position corresponding to the other camera and the reference coordinate position increases within the third time length, triggering the first control instruction;
and if the distance between the coordinate position corresponding to the other camera and the reference coordinate position decreases within the third time length, triggering the second control instruction.
According to the above description, the coordinate recognized by one camera is taken as the reference coordinate, and the corresponding control instruction is determined from the change in distance of the coordinate recognized by the other camera relative to the reference coordinate; this requires little computation and gives high recognition efficiency.
Further, the first control instruction corresponds to enlarging the image or to a lift; the second control instruction corresponds to a click.
According to the above description, the control instructions triggered by specific gestures can be flexibly customized, which simplifies the control operation while improving operation convenience.
Further, acquiring an unused RGB value according to the RGB values of the pixels of each frame of image specifically comprises:
presetting two or more groups, each corresponding to a different RGB value range;
acquiring the RGB value of each pixel of each frame of image;
assigning each pixel to the corresponding group according to its RGB value;
counting the number of pixels in each group to obtain the group with the fewest pixels;
and determining the RGB values within the RGB value range corresponding to the group with the fewest pixels to be the unused RGB values.
Dividing the groups' RGB value ranges according to color values helps lock onto one or a few groups when analyzing which pixel colors are unused in the images captured during the first time length, instead of scattering them across many groups, which improves the accuracy and efficiency of the subsequent analysis.
Further, if the unused RGB values correspond to two or more groups, sending them to the light-emitting device specifically comprises:
respectively calculating the RGB difference values between each of the two or more groups corresponding to the unused RGB values and the other groups;
obtaining the RGB value range corresponding to the group with the largest difference from the other groups;
and sending that RGB value range to the light-emitting device.
As can be seen from the above description, if the unused RGB values are dispersed over two or more groups, the group differing most from the other groups is selected, and its RGB value range is used as the standard for the light output by the light-emitting device; this further improves how distinguishable the light-emitting device's output is in the pictures captured by the head-mounted device, improving recognition accuracy once more.
Further, the RGB values of the light output by the light emitting device are randomly chosen from the range of received RGB values.
As can be seen from the above description, the light-emitting device can freely select a value from the given range, which improves compatibility with the light-emitting device while ensuring that it outputs light with an RGB value meeting the requirement.
Further, the different RGB value ranges are RGB value ranges corresponding to respective colors.
As can be seen from the above description, grouping directly by the color values corresponding to each color improves the usefulness and intuitiveness of the pixel grouping result.
Further, the two cameras respectively identifying, in their lens frames, the coordinate position of the light output by the light-emitting device according to the received RGB values specifically comprises:
controlling the light-emitting device to emit light corresponding to the received RGB values;
respectively searching, in the current frame images of the two cameras, for the pixels corresponding to the RGB values sent to the light-emitting device, and acquiring the coordinate positions of those pixels;
and determining the change in distance between the coordinate positions of the two cameras within the second time length according to the coordinate positions in each frame image acquired by the two cameras within the second time length.
As can be seen from the above description, by locating pixels of a specific RGB value in each image and linking them in time sequence, the control gesture the user makes with the light-emitting device can be obtained.
The invention provides another technical scheme as follows:
the program, when executed by a processor, is capable of implementing the steps of a method for gesture recognition control of a head-mounted device comprising:
within a preset first time length, acquiring unused RGB values according to the RGB values of pixels of each frame of image, and sending the unused RGB values to light-emitting equipment;
within a preset second time length, the two cameras respectively identify the coordinate positions of light output by the light-emitting equipment according to the received RGB values in a lens picture;
starting timing in turn seamlessly between the first time length and the second time length;
and triggering a corresponding preset control instruction according to the distance change between the coordinate positions corresponding to the two cameras within a third time length, wherein the third time length is greater than the sum of the first time length and the second time length.
Further, triggering the corresponding preset control instruction according to the change in distance between the coordinate positions corresponding to the two cameras specifically comprises:
presetting a first control instruction corresponding to a distance increase and a second control instruction corresponding to a distance decrease;
setting the coordinate position corresponding to one camera as the reference coordinate position;
if the distance between the coordinate position corresponding to the other camera and the reference coordinate position increases within the third time length, triggering the first control instruction;
and if the distance between the coordinate position corresponding to the other camera and the reference coordinate position decreases within the third time length, triggering the second control instruction.
Further, the first control instruction corresponds to enlarging the image or to a lift; the second control instruction corresponds to a click.
Further, acquiring an unused RGB value according to the RGB values of the pixels of each frame of image specifically comprises:
presetting two or more groups, each corresponding to a different RGB value range;
acquiring the RGB value of each pixel of each frame of image;
assigning each pixel to the corresponding group according to its RGB value;
counting the number of pixels in each group to obtain the group with the fewest pixels;
and determining the RGB values within the RGB value range corresponding to the group with the fewest pixels to be the unused RGB values.
Further, if the unused RGB values correspond to two or more groups, sending them to the light-emitting device specifically comprises:
respectively calculating the RGB difference values between each of the two or more groups corresponding to the unused RGB values and the other groups;
obtaining the RGB value range corresponding to the group with the largest difference from the other groups;
and sending that RGB value range to the light-emitting device.
Further, the RGB values of the light output by the light emitting device are randomly chosen from the range of received RGB values.
Further, the different RGB value ranges are RGB value ranges corresponding to respective colors.
Further, the two cameras respectively identifying, in their lens frames, the coordinate position of the light output by the light-emitting device according to the received RGB values specifically comprises:
controlling the light-emitting device to emit light corresponding to the received RGB values;
respectively searching, in the current frame images of the two cameras, for the pixels corresponding to the RGB values sent to the light-emitting device, and acquiring the coordinate positions of those pixels;
and determining the change in distance between the coordinate positions of the two cameras within the second time length according to the coordinate positions in each frame image acquired by the two cameras within the second time length.
From the above description, those skilled in the art will understand that all or part of the processes in the above technical solutions can be implemented by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above methods. After being executed by a processor, the program likewise achieves the beneficial effects of the corresponding methods.
The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Example one
Referring to Fig. 2 and Fig. 3, this embodiment provides a gesture recognition control method for a head-mounted device that can significantly improve the efficiency and accuracy of gesture recognition. The head-mounted device is provided with two cameras (mounted a certain distance apart), so convenient control by specific gestures is supported, the control functions are enriched, and the convenience and accuracy of gesture control are improved. The light-emitting device may be any device capable of emitting light of a specified RGB value, such as a ring or bracelet fitted with LEDs.
The method comprises the following steps:
S1: presetting two or more groups respectively corresponding to different RGB value ranges;
that is, one group corresponds to one RGB value range, preferably the value range of one color. For example, the pixels may be divided into 9 groups: red, orange, yellow, green, cyan, blue, purple, black and white. The "red" group may then correspond to the rectangular space of RGB values ranging from (200, 50, 50) to (255, 0, 0), i.e., a cuboid whose sides measure 55, 50 and 50.
Of course, the grouping may also be finer, for example one group for each small RGB value interval:
Red: (200, 50, 50) to (255, 0, 0);
Orange: (200, 100, 50) to (255, 50, 0);
Yellow: (200, 150, 50) to (255, 100, 0);
Green: (0, 255, 0) to (50, 200, 50);
Cyan: (0, 255, 50) to (50, 200, 100);
Blue: (0, 0, 255) to (50, 50, 200);
Purple: (50, 0, 255) to (100, 50, 200);
Preferably, the red/orange/yellow/green/cyan/blue/purple partition may also be defined in the LAB color space using polar coordinates and then converted into RGB ranges.
S2: presetting a first time length, a second time length and a third time length;
preferably, the first time length and the second time length are equal, e.g., 100 ms each; the third time length is greater than the sum of the first and second time lengths and corresponds to one valid recognition period for recognizing a multi-finger gesture, so it must contain at least one full gesture recognition cycle.
S3: within the first time length, the head-mounted device acquires an unused RGB value according to the RGB values of the pixels of each frame of image and sends it to the light-emitting device.
Specifically, the step includes:
S31: acquiring the RGB value of each pixel of each frame of image, i.e., the RGB value of every pixel in each frame captured by the cameras within the first time length.
S32: assigning each pixel to the corresponding group according to its RGB value. That is, each pixel acquired in step S31 is classified, by its RGB value, into the group whose RGB value range contains it.
S33: counting the number of pixels in each group and obtaining the group with the fewest pixels. If only one such group is obtained, it can be considered the group differing most from the others.
In another specific example, step S33 yields two or more groups with the fewest pixels; the distinguishability of the light output by the light-emitting device can then be improved by further computing the one group differing most from all the other groups, as in step S34 below.
For example, if statistics show that the "red", "orange" and "yellow" groups each contain 0 (or close to 0) pixels, the RGB value ranges corresponding to these three groups are the unused RGB values.
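As a minimal sketch of steps S31-S33 (assuming Python with NumPy and HxWx3 RGB frames; the group table mirrors the example ranges above, normalized so each channel runs low to high; none of these details are prescribed by the method itself):

```python
import numpy as np

# The example groups above, normalized so each channel runs low -> high.
GROUPS = {
    "red":    ((200, 0, 0),   (255, 50, 50)),
    "orange": ((200, 50, 0),  (255, 100, 50)),
    "yellow": ((200, 100, 0), (255, 150, 50)),
    "green":  ((0, 200, 0),   (50, 255, 50)),
    "cyan":   ((0, 200, 50),  (50, 255, 100)),
    "blue":   ((0, 0, 200),   (50, 50, 255)),
    "purple": ((50, 0, 200),  (100, 50, 255)),
}

def count_pixels_per_group(frame: np.ndarray) -> dict:
    """Count how many pixels of an HxWx3 RGB frame fall into each group."""
    counts = {}
    for name, (lo, hi) in GROUPS.items():
        inside = np.all((frame >= np.array(lo)) & (frame <= np.array(hi)), axis=-1)
        counts[name] = int(inside.sum())
    return counts

def unused_groups(frames) -> list:
    """S31-S33: accumulate counts over every frame captured within the
    first time length and return the group(s) with the fewest pixels."""
    total = dict.fromkeys(GROUPS, 0)
    for frame in frames:
        for name, n in count_pixels_per_group(frame).items():
            total[name] += n
    fewest = min(total.values())
    return [name for name, n in total.items() if n == fewest]
```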
According to another embodiment, the most distinct group can be determined as follows:
S34: respectively calculating the RGB difference values between each of the two or more groups corresponding to the unused RGB values acquired in S33 and all the other groups.
Taking the above 9 groups (red, orange, yellow, green, cyan, blue, purple, black, white) as an example, suppose step S33 determines that the "red", "orange" and "yellow" groups contain the fewest (and equally few) pixels, so the unused RGB values correspond to these three groups. To further improve the distinguishability of the light emitted by the light-emitting device, the difference between each candidate group and every other group that does contain pixels (the 6 groups green, cyan, blue, purple, black and white) is computed. The difference between group i and group j is the Euclidean distance between their RGB values:
dij = √((Ri − Rj)² + (Gi − Gj)² + (Bi − Bj)²)
Numbering red, orange, yellow, green, cyan, blue and purple as 1 to 7, the difference dij is computed between each of the three candidate groups (1 to 3) and each of the color groups green, cyan, blue and purple (4 to 7). The group with the maximum difference is the one achieving
max(min(d14, d15, d16, d17), min(d24, d25, d26, d27), min(d34, d35, d36, d37)),
i.e., the candidate whose smallest difference from the other colors is largest.
If the maximum is achieved by d14, the "red" group is selected.
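A sketch of the S34 max-min selection, reusing the GROUPS table from the sketch above; each group is represented here by the center of its cuboid, which is an assumption, since the method does not fix the representative value:

```python
import math

def center(lo, hi):
    """Representative RGB value of a group: the center of its cuboid."""
    return tuple((a + b) / 2 for a, b in zip(lo, hi))

def rgb_distance(p, q) -> float:
    """Euclidean difference dij between two RGB triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def most_distinct(candidates, others, groups) -> str:
    """Pick the unused candidate group whose smallest difference from any
    of the other groups is largest, i.e. the max over min(dij)."""
    reps = {name: center(lo, hi) for name, (lo, hi) in groups.items()}
    return max(candidates,
               key=lambda c: min(rgb_distance(reps[c], reps[o]) for o in others))

# e.g. most_distinct(["red", "orange", "yellow"],
#                    ["green", "cyan", "blue", "purple"], GROUPS)
```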
S35: if the group corresponding to the unused RGB value is only one group, the RGB values within the range of RGB values corresponding to the group can be directly sent to the light emitting device.
If the group with the fewest pixels corresponds to two or more groups, either the corresponding RGB value ranges can all be sent directly to the light-emitting device, which then selects one RGB value for output, or the most distinguishable group can first be screened out from the two or more groups and only its RGB value range sent to the light-emitting device.
That is, regardless of how many groups the unused RGB values correspond to, it is preferable to send the RGB value range with the largest difference to the light-emitting device; of course, sending all the unused RGB values to the light-emitting device is also possible.
It should be noted that the head-mounted device sends the RGB value range over Bluetooth to the light-emitting device connected to it.
S4: and in the second time period, the two cameras of the head-mounted equipment respectively identify the coordinate positions of the light output by the light-emitting equipment according to the received RGB value range in the respective camera pictures.
The method specifically comprises the following steps:
S41: controlling the light-emitting device to emit the corresponding light according to the received RGB value range within the second time length.
Preferably, if the unused RGB values correspond to more than two color values (i.e., the RGB value ranges of two groups), the light-emitting device may randomly select an RGB value from among them for output.
In a specific example, the group with the maximum difference is found to be the "red" group, so only one group of corresponding RGB values is obtained. If the "red" group corresponds to the cuboid of RGB values from (200, 50, 50) to (255, 0, 0), i.e., a cuboid whose sides measure 55, 50 and 50, the light-emitting device may randomly output a color between (200, 50, 50) and (255, 0, 0).
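Picking a random RGB value inside the received range can be sketched as below; the per-channel bounds are normalized first because, as in the example, a range may be written from high to low:

```python
import random

def random_color_in_range(lo, hi):
    """Random RGB value inside the cuboid spanned by the two corners."""
    return tuple(random.randint(min(a, b), max(a, b)) for a, b in zip(lo, hi))

# e.g. random_color_in_range((200, 50, 50), (255, 0, 0)) might yield (213, 37, 8)
```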
S42: the head-mounted device respectively obtains images shot by the two cameras at present (the cameras still shoot in real time within the second duration), respectively searches pixel points of RGB values correspondingly sent to the light-emitting device in the current frame image, and obtains coordinate positions of the pixel points. Assuming that the first camera acquires a first coordinate position; the second camera acquires a second coordinate position.
S43: and determining the distance change of the coordinate positions of the two cameras in the second time length according to the coordinate positions of the two cameras in the second time length, which are determined in the last step by each frame image.
That is, the head-mounted device only needs to identify, in the images captured by the two cameras in real time, the pixels corresponding to the RGB values sent to the light-emitting device and locate their coordinate positions in the image (the first and second coordinate positions); then, following the time sequence, the change in distance between the successive first and second coordinate positions is determined, yielding the gesture the user made with the light-emitting device within the second time length, from which the corresponding control instruction is determined.
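Step S42 reduces to a per-frame color-range search. A minimal sketch, again assuming NumPy and HxWx3 RGB frames; taking the center of mass of the matching pixels as the coordinate position is one reasonable choice, not something the method mandates:

```python
import numpy as np

def led_center(frame: np.ndarray, lo, hi):
    """(x, y) center of the pixels matching the LED color range in one RGB
    frame, or None when the LED is off or out of view."""
    lo, hi = np.array(lo), np.array(hi)
    lo, hi = np.minimum(lo, hi), np.maximum(lo, hi)  # normalize the cuboid
    ys, xs = np.where(np.all((frame >= lo) & (frame <= hi), axis=-1))
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

# Per S43: run this on each camera's frames over the second time length and
# compare the two resulting coordinate sequences frame by frame.
```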
S5: and starting timing in turn seamlessly between the first time length and the second time length.
Correspondingly, S3 and S4 are executed alternately, starting with the first time length. Specifically, within the first time length the light-emitting device stops outputting any light; it outputs the corresponding light only upon receiving the RGB value range sent by the head-mounted device.
That is, within the first time length, the head-mounted device computes and transmits the result to the light-emitting device; within the second time length, the light-emitting device outputs the corresponding light; then the timing of the first time length starts again, the light-emitting device stops outputting light, and the head-mounted device repeats the computation and transmission.
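The alternation of S3 and S4 is essentially a two-phase loop. A rough sketch under assumed helpers; capture_frame, send_to_led, track_led and compute_unused_rgb_range are placeholders for the steps above, not APIs defined by the method:

```python
import time

T1 = 0.100  # first time length: LED dark, frames analyzed (seconds)
T2 = 0.100  # second time length: LED lit, position tracked (seconds)

def control_loop(capture_frame, send_to_led, track_led, compute_unused_rgb_range):
    """Time the two windows back to back, starting with the first."""
    while True:
        frames, t0 = [], time.monotonic()
        while time.monotonic() - t0 < T1:        # first time length (S3)
            frames.append(capture_frame())       # LED outputs nothing here
        send_to_led(compute_unused_rgb_range(frames))
        t0 = time.monotonic()
        while time.monotonic() - t0 < T2:        # second time length (S4)
            track_led()                          # LED lit: locate it per frame
```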
Throughout the process, the user uses the light-emitting device in place of a real hand, a mouse or another control device to make gestures, and the head-mounted device obtains the user's control gesture by identifying the screen position corresponding to the light emitted by the light-emitting device.
In a specific example, based on the above, various specific gestures and their corresponding operations can be preset in the head-mounted device; after a gesture is recognized, the operation corresponding to it is executed directly. For example, a preset "swipe left" gesture may correspond to "go back to the previous page", and a preset "check mark" gesture to "close the current interface".
In particular, more complex gesture manipulations, such as simulating mouse clicks, can also be provided.
The concrete implementation is as follows:
S6: the head-mounted device calculates the screen position point corresponding to the light according to the coordinate positions, in the lenses of the two cameras, of the light currently output by the light-emitting device.
Specifically, the coordinates are converted proportionally according to the resolutions of the camera and the screen. If the camera's shooting range is the coordinate range (0, 0) to (x1, y1) and the screen's coordinate range is (0, 0) to (x2, y2), then the camera coordinate position (x3, y3) converts to the screen coordinate (x4, y4) = (x3 × x2/x1, y3 × y2/y1).
For example, if the camera's shooting range is (0, 0) to (1920, 1080) and the screen's coordinate range is (0, 0) to (960, 540), the camera coordinate (2, 4) converts to the screen coordinate (x4, y4) = (2 × 960/1920, 4 × 540/1080) = (1, 2); similarly, the position (2, 6) in camera 2 corresponds to the screen position (1, 3).
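The proportional conversion as a small helper, with the example's resolutions as assumed defaults:

```python
def camera_to_screen(x3, y3, cam=(1920, 1080), screen=(960, 540)):
    """(x4, y4) = (x3 * x2/x1, y3 * y2/y1): scale a camera coordinate to a
    screen coordinate by the ratio of the two resolutions."""
    (x1, y1), (x2, y2) = cam, screen
    return x3 * x2 / x1, y3 * y2 / y1

assert camera_to_screen(2, 4) == (1.0, 2.0)   # the example above
assert camera_to_screen(2, 6) == (1.0, 3.0)
```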
S7: and receiving a click signal sent by the light-emitting equipment, and executing click operation corresponding to the current position point.
That is, a "click" or "touch" button is preset on the light-emitting device; while performing a gesture, the user can trigger this button to send a click signal to the head-mounted device (via Bluetooth, infrared, or the like). Upon receiving the click signal, the head-mounted device immediately determines the screen position currently corresponding to the light and triggers the function at that position, which is equivalent to performing a mouse click at that position point.
Through the above, a user wearing the light-emitting device can perform gesture control of the head-mounted device, which markedly improves the gesture recognition speed of head-mounted devices, especially those equipped with only a single camera, while ensuring recognition accuracy at the same time.
S8: and in the third time length, triggering a corresponding preset control instruction according to the distance change between the respective corresponding coordinate positions of the two cameras.
Preferably, the third time length consists of the first and second time lengths timed alternately N times, where N ≥ 2.
In this step, the corresponding control instruction is obtained from the change in distance between the two cameras' coordinates within the second time lengths obtained in step S4, enabling convenient control of the screen.
Specifically, the step includes:
S81: presetting a first control instruction corresponding to a distance increase and a second control instruction corresponding to a distance decrease;
for example, the first control instruction corresponds to enlarging the image or to a lift, and the second control instruction corresponds to a click. That is, the image is enlarged when the distance becomes larger, and a click is executed when the distance becomes smaller. To improve the accuracy of the click position, the decrease in distance must result from horizontal movement whose displacement does not exceed a preset threshold; if the threshold is exceeded, the motion is treated as a simple move.
S82: setting the coordinate position corresponding to one of the cameras as the reference coordinate position, i.e., one camera always serves as the reference camera, for example the right-eye camera;
S83: acquiring the time-ordered coordinate position sets corresponding to the two cameras according to step S4;
S84: according to the two coordinate position sets, if the distance between the coordinate position corresponding to the other camera and the reference coordinate position increases within the third time length, triggering the first control instruction, i.e., enlarging the screen;
and if the distance between the coordinate position corresponding to the other camera and the reference coordinate position decreases within the third time length, triggering the second control instruction, i.e., executing a click operation.
Referring to Fig. 3, O1 is the left-eye camera position and O2 the right-eye camera position; b is the distance between the two cameras' center points; the two coordinate systems are (x1, y1, z1) and (x2, y2, z2). P is the position of the LED lamp; the two cameras photograph it at different positions, P1(x1, y1) and P2(x2, y2), so the distance to point P can be determined from the positions P1 and P2 in the two cameras.
In the figure, the mouse pointer hovers over the (x1, y1) coordinates because camera O1 is taken as the reference.
This specific embodiment maps gestures to touch operations so that designated control operations can be executed, making gesture control easier to realize and operation more convenient.
Example two
This embodiment provides a specific application scenario corresponding to the first embodiment:
1. The RGB values of the pixels of each frame of the camera image are obtained. First, groups are preset manually, each with its own RGB range; for example, 9 groups of red, orange, yellow, green, cyan, blue, purple, black and white (a finer division, with one group per small RGB value interval, is also possible). Then the number of pixels in each group is counted from the images captured by the cameras. If, for example, the pixel counts of the red, orange and yellow groups are found to be 0 or close to 0, these three groups are the unused RGB values.
2. The most distinguishable color is output by two or more LED rings worn on different fingers and switched on and off at a fixed period, such as 100 milliseconds. The group with the maximum difference, found among the red/orange/yellow candidates as above, is the red group. If the red RGB range is (200, 0, 0) to (255, 0, 0), the LED ring randomly outputs a color between (200, 0, 0) and (255, 0, 0).
3. Working at the same frequency, the two cameras each determine from the frames obtained during the LED-off period that the RGB range with the maximum difference is red, (200, 0, 0) to (255, 0, 0); the LED then outputs a color in that range, and the LED position is obtained while the LED is on. In other words, while the LED is off, the color it should display is calculated; after the LED is turned on, the two cameras each search for the pixels within the LED color range. For example, with a camera shooting range of (0, 0) to (1920, 1080), the center of the LED-colored pixels identified by the left camera is (400, 400).
4. The head-mounted display is likewise 1920x1080 pixels and shows the mouse pointer hovering over the (100, 200) coordinates.
5. Pressing the button on the LED device is equivalent to clicking a mouse.
6. Two cameras a certain distance apart each capture the position of the LED light in their own image by the above method, and the change in distance between the LED device's coordinate positions in the two cameras is captured: a growing distance represents a lift (enlarge), and a shrinking distance represents a press.
Specifically, the step includes:
6.1 The two cameras acquire the touch position according to step 4. Because the two cameras shoot from different angles, the coordinates of one camera are taken as the default reference, e.g., the left camera; the right camera then serves only as an auxiliary for judging press and lift.
6.2 The left-camera and right-camera coordinates are acquired; suppose they change from (400, 400) and (420, 400) to (400, 400) and (410, 400). The y coordinates are unchanged, i.e., the movement is horizontal, and the distance decreases (the closer the LED is to the cameras, the larger the coordinate difference between the two views; the farther away, the smaller), which indicates a press; a larger distance indicates a lift.
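The press/lift decision of 6.1-6.2 can be sketched as a comparison of the horizontal disparity between two tracking samples; the displacement threshold here is an assumed value:

```python
H_THRESHOLD = 30  # max lateral drift (pixels) still treated as press/lift

def classify(prev_left, prev_right, cur_left, cur_right):
    """Classify motion between two samples, left camera as the reference."""
    d_prev = abs(prev_right[0] - prev_left[0])  # horizontal disparity before
    d_cur = abs(cur_right[0] - cur_left[0])     # ... and after
    if abs(cur_left[0] - prev_left[0]) > H_THRESHOLD:
        return "move"    # too much lateral motion: treat as a plain move
    if d_cur < d_prev:
        return "press"   # disparity shrank: the LED moved farther away
    if d_cur > d_prev:
        return "lift"    # disparity grew: the LED moved closer
    return "hold"

# classify((400, 400), (420, 400), (400, 400), (410, 400)) -> "press"
```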
Example three
Corresponding to the first and second embodiments, this embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor of a head-mounted device, the program implements the steps of the gesture recognition control method of a head-mounted device described above. The detailed steps are not repeated here; refer to the descriptions of the first and second embodiments.
In summary, the gesture recognition control method and storage medium for a head-mounted device provided by the invention improve the accuracy and efficiency of camera-based gesture recognition, support the recognition and control of specific gestures, improve operation convenience, enrich the control modes and functions, and greatly improve product usability. Furthermore, the required companion device (the light-emitting device) is simple in structure, light and portable, so the scheme is also highly practical and easy to implement.
The above description presents only embodiments of the present invention and does not limit the patent scope of the present invention; all equivalent changes made using the contents of the specification and drawings, whether applied directly or indirectly in related technical fields, are likewise included within the patent scope of the present invention.

Claims (9)

1. A gesture recognition control method of a head-mounted device, the head-mounted device being provided with two cameras, characterized by comprising the following steps:
within a preset first time length, acquiring an unused RGB value according to the RGB values of the pixels of each frame of image, and sending the unused RGB value to a light-emitting device;
within a preset second time length, the two cameras respectively identifying, in their lens frames, the coordinate position of the light output by the light-emitting device according to the received RGB value;
timing the first time length and the second time length alternately and seamlessly;
and within a third time length, triggering a corresponding preset control instruction according to the change in distance between the coordinate positions corresponding to the two cameras, wherein the third time length is greater than the sum of the first time length and the second time length.
2. The gesture recognition control method of a head-mounted device according to claim 1, wherein triggering the corresponding preset control instruction according to the change in distance between the coordinate positions corresponding to the two cameras specifically comprises:
presetting a first control instruction corresponding to a distance increase and a second control instruction corresponding to a distance decrease;
setting the coordinate position corresponding to one camera as the reference coordinate position;
if the distance between the coordinate position corresponding to the other camera and the reference coordinate position increases within the third time length, triggering the first control instruction;
and if the distance between the coordinate position corresponding to the other camera and the reference coordinate position decreases within the third time length, triggering the second control instruction.
3. The gesture recognition control method of a head-mounted device according to claim 2, wherein the first control instruction corresponds to enlarging the image or to a lift, and the second control instruction corresponds to a click.
4. The gesture recognition control method of a head-mounted device according to claim 1, wherein acquiring an unused RGB value according to the RGB values of the pixels of each frame of image specifically comprises:
presetting two or more groups, each corresponding to a different RGB value range;
acquiring the RGB value of each pixel of each frame of image;
assigning each pixel to the corresponding group according to its RGB value;
counting the number of pixels in each group to obtain the group with the fewest pixels;
and determining the RGB values within the RGB value range corresponding to the group with the fewest pixels to be the unused RGB values.
5. The gesture recognition control method of a head-mounted device according to claim 4, wherein, if the unused RGB values correspond to two or more groups, sending them to the light-emitting device specifically comprises:
respectively calculating the RGB difference values between each of the two or more groups corresponding to the unused RGB values and the other groups;
obtaining the RGB value range corresponding to the group with the largest difference from the other groups;
and sending that RGB value range to the light-emitting device.
6. The gesture recognition control method of a head-mounted device according to claim 1, wherein the RGB value of the light output by the light-emitting device is randomly selected from the received RGB value range.
7. The gesture recognition control method of a head-mounted device according to claim 6, wherein the different RGB value ranges are the RGB value ranges corresponding to respective colors.
8. The gesture recognition control method of a head-mounted device according to claim 1, wherein the two cameras respectively identifying, in their lens frames, the coordinate position of the light output by the light-emitting device according to the received RGB values specifically comprises:
controlling the light-emitting device to emit light corresponding to the received RGB values;
respectively searching, in the current frame images of the two cameras, for the pixels corresponding to the RGB values sent to the light-emitting device, and acquiring the coordinate positions of those pixels;
and determining the change in distance between the coordinate positions of the two cameras within the second time length according to the coordinate positions in each frame image acquired by the two cameras within the second time length.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the gesture recognition control method of a head-mounted device according to any one of claims 1 to 8.
CN202010443152.4A 2020-05-22 2020-05-22 Gesture recognition control method of head-mounted equipment and storage medium Active CN111796675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010443152.4A CN111796675B (en) 2020-05-22 2020-05-22 Gesture recognition control method of head-mounted equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111796675A (en) 2020-10-20
CN111796675B (en) 2023-04-28

Family

ID=72805936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010443152.4A Active CN111796675B (en) 2020-05-22 2020-05-22 Gesture recognition control method of head-mounted equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111796675B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103409A (en) * 2011-01-20 2011-06-22 桂林理工大学 Man-machine interaction method and device based on motion trail identification
TW201124878A (en) * 2010-01-13 2011-07-16 Chao-Lieh Chen Device for operation and control of motion modes of electrical equipment
JP2011181360A (en) * 2010-03-02 2011-09-15 Takahiro Kido Ornament with built-in led light-emitting device
CN104134070A (en) * 2013-05-03 2014-11-05 仁宝电脑工业股份有限公司 Interactive object tracking system and interactive object tracking method thereof
CN104199550A (en) * 2014-08-29 2014-12-10 福州瑞芯微电子有限公司 Man-machine interactive type virtual touch device, system and method
US20160092726A1 (en) * 2014-09-30 2016-03-31 Xerox Corporation Using gestures to train hand detection in ego-centric video
US20160117860A1 (en) * 2014-10-24 2016-04-28 Usens, Inc. System and method for immersive and interactive multimedia generation
CN105898289A (en) * 2016-06-29 2016-08-24 北京小米移动软件有限公司 Intelligent wearing equipment and control method of intelligent wearing equipment
CN106293081A (en) * 2016-08-03 2017-01-04 北京奇虎科技有限公司 Wearable device
US20170086712A1 (en) * 2014-03-20 2017-03-30 Telecom Italia S.P.A. System and Method for Motion Capture
CN107340871A (en) * 2017-07-25 2017-11-10 深识全球创新科技(北京)有限公司 The devices and methods therefor and purposes of integrated gesture identification and ultrasonic wave touch feedback

Also Published As

Publication number Publication date
CN111796675B (en) 2023-04-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant