CN111627039A - Interaction system and interaction method based on image recognition - Google Patents


Info

Publication number
CN111627039A
CN111627039A
Authority
CN
China
Prior art keywords
image
information
target area
operation body
teaching aid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010387205.5A
Other languages
Chinese (zh)
Inventor
张量
唐崧
崔玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dog Intelligent Robot Technology Co ltd
Original Assignee
Beijing Dog Intelligent Robot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dog Intelligent Robot Technology Co ltd
Priority to CN202010387205.5A
Publication of CN111627039A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38Registration of image sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interaction system and an interaction method based on image recognition. The system forms an image picture in a target area through an image module; an information acquisition module acquires the arrangement information of preset teaching aids placed by the user in the image picture, or the motion trajectory of an operation body; and a result matching module matches the arrangement information of the preset teaching aids, or the motion trajectory of the operation body, against the expected result corresponding to the image picture, controlling the image module to output first result information when they match and second result information when they do not. The interaction system provides an interaction mode based on an operation body and, at the same time, a brand-new way of realizing interaction among people, machines and objects (the preset teaching aids). It facilitates scene-based and personalized learning, enriches the interaction modes of the interactive system, and promotes the user's interest in the interactive content.

Description

Interaction system and interaction method based on image recognition
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to an interaction system and an interaction method based on image recognition.
Background
Human-computer interaction (HCI), also called human-machine interaction (HMI), is the study of the interaction between a system and its users.
Human-computer interaction technology is increasingly applied in fields such as entertainment and distance education. Taking distance education as an example, a user accesses a learning interface through a client and feeds back learning results through an input device, thereby realizing remote learning and assessment of the results.
Early human-computer interaction systems offered only a single mode: the user entered information through input devices such as a keyboard, and the system output information, or feedback on the input, through output devices. With the continuous development of new technologies, new input modes such as voice and gestures have appeared, enriching the interaction modes of human-computer interaction systems to a certain extent. However, for human-computer interaction systems applied to distance education, how to further enrich the interaction modes so as to enhance the user's interest in the learning content has become one of the research directions for technicians in the field.
Disclosure of Invention
To solve the above technical problem, the present application provides an interaction system and an interaction method based on image recognition, so as to enrich the interaction modes of the interactive system and promote the user's interest in the interactive content.
In order to achieve the technical purpose, the embodiment of the application provides the following technical scheme:
an interactive system based on image recognition, comprising:
the image module is used for forming an image picture in the target area;
the information acquisition module is used for acquiring the arrangement information of a preset teaching aid or the motion trail of an operation body in the image picture;
and the result matching module is used for matching the arrangement information of the preset teaching aid or the motion track of the operation body with the expected result corresponding to the image picture, controlling the image module to output first result information when the matching is consistent, and controlling the image module to output second result information when the matching is inconsistent.
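As a minimal, hypothetical sketch (not part of the claimed implementation), the control flow of the result matching module described above can be expressed as follows; the function name and the string result values are illustrative assumptions:

```python
def match_result(observed, expected):
    """Result matching step: compare what the information acquisition
    module observed (a teaching-aid arrangement or an operation-body
    trajectory) against the expected result for the current picture."""
    # First result information on a match, second result information otherwise.
    return "first" if observed == expected else "second"
```

In a full system, "first" would trigger success feedback and advance to the next picture, while "second" would keep the current picture unchanged.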
Optionally, the information obtaining module includes:
the image acquisition unit is used for acquiring image information of the target area, wherein the image information comprises the image picture and a preset teaching aid arranged in the target area or a track image sequence, the track image sequence comprises a plurality of target area images, and each target area image comprises the image picture and position information of the operation body in the target area;
and the information identification unit is used for classifying or identifying the image information so as to acquire the arrangement information of a preset teaching aid in the image picture or the motion trail of the operation body.
Optionally, the information identifying unit includes:
the teaching aid identification model is used for classifying image information comprising the image picture and preset teaching aids arranged in the target area so as to obtain arrangement information of the preset teaching aids;
and the track identification model is used for extracting information of the track image sequence to acquire the position information of the operation body in each target area graph in the target area, and combining the position information of the operation body in all the target area graphs in the target area to acquire the motion track of the operation body.
Optionally, the teaching aid identification model is a deep learning network model which is trained by training samples in advance;
the training process of the deep learning network model comprises the following steps:
acquiring a training sample, wherein the training sample is an image which is marked with arrangement information of the preset teaching aid in a target area in advance;
and training a deep learning network model to be trained by using the training sample to obtain the deep learning network model.
Optionally, the result matching module includes:
the teaching aid matching unit is used for matching the arrangement information of the preset teaching aid with the expected result corresponding to the image picture, controlling the image module to output first result information when they match, and controlling the image module to output second result information when they do not match;
and the track matching unit is used for matching the motion track of the operation body with the expected result corresponding to the image picture, controlling the image module to output first result information when they match, and controlling the image module to output second result information when they do not match.
Optionally, the result matching module further includes:
and the result prompting unit is used for controlling the image module to output result prompt information corresponding to the image picture when the arrangement information of the preset teaching aid does not match the expected result corresponding to the image picture, or when the motion trail of the operation body does not match the expected result corresponding to the image picture.
An interaction method based on image recognition comprises the following steps:
forming an image picture in the target area;
acquiring arrangement information of a preset teaching aid or a motion track of an operation body in the image picture;
and matching the arrangement information of the preset teaching aid or the motion trail of the operation body with an expected result corresponding to the image picture, controlling the image module to output first result information when the matching is consistent, and controlling the image module to output second result information when the matching is inconsistent.
Optionally, acquiring the arrangement information of the preset teaching aid or the motion trail of the operation body in the image picture includes:
acquiring image information of the target area, wherein the image information comprises the image picture and a preset teaching aid arranged in the target area or comprises a track image sequence, the track image sequence comprises a plurality of target area images, and each target area image comprises the image picture and position information of the operation body in the target area;
and classifying or identifying the image information to acquire the arrangement information of the preset teaching aid or the motion track of the operation body in the image picture.
Optionally, classifying or identifying the image information to obtain the arrangement information of the preset teaching aid or the motion track of the operation body in the image picture includes:
classifying image information including the image picture and preset teaching aids arranged in the target area by using a teaching aid identification model to acquire arrangement information of the preset teaching aids;
and extracting information of the track image sequence by using a track recognition model to obtain the position information of the operation body in each target area graph in the target area, and combining the position information of the operation body in all the target area graphs in the target area to obtain the motion track of the operation body.
Optionally, the method further includes: when the arrangement information of the preset teaching aid does not match the expected result corresponding to the image picture, or when the motion trail of the operation body does not match the expected result corresponding to the image picture, controlling the image module to output result prompt information corresponding to the image picture.
According to the above technical solutions, the embodiments of the present application provide an interaction system and an interaction method based on image recognition. The interaction system forms an image picture in a target area through an image module; an information acquisition module acquires the arrangement information of the preset teaching aids placed by the user in the image picture, or the motion trajectory of an operation body; finally, a result matching module matches the arrangement information of the preset teaching aids, or the motion trajectory of the operation body, against the expected result corresponding to the image picture. When they match, the image module outputs first result information; when they do not match, the image module outputs second result information. The interaction system provides an interaction mode based on an operation body and, at the same time, a brand-new way of realizing interaction among people, machines and objects (the preset teaching aids). It facilitates scene-based and personalized learning, enriches the interaction modes of the interactive system, and promotes the user's interest in the interactive content.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic structural diagram of an interactive system based on image recognition according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a preset teaching aid provided in one embodiment of the present application;
FIG. 3 is a schematic view of a scene in which a user interacts with the image recognition-based interaction system;
FIG. 4 is a schematic view of another scene in which a user interacts with the image recognition-based interaction system;
fig. 5 is a schematic structural diagram of an interactive system based on image recognition according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of an interactive system based on image recognition according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an interactive system based on image recognition according to still another embodiment of the present application;
fig. 8 is a schematic structural diagram of an interactive system based on image recognition according to an alternative embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides an interactive system based on image recognition, as shown in fig. 1, including:
an image module 100 for forming an image frame in a target area;
the information acquisition module 200 is configured to acquire arrangement information of a preset teaching aid or a motion track of an operation body in the image;
and the result matching module 300 is used for matching the arrangement information of the preset teaching aid or the motion trail of the operation body with the expected result corresponding to the image picture, controlling the image module 100 to output first result information when the matching is consistent, and controlling the image module 100 to output second result information when the matching is inconsistent.
Optionally, the image module 100 may be a display screen or an image display device such as a projector, and when the image module 100 is a display screen, the target area is an area where a screen of the display screen is located. When the image module 100 is a projector, the target area may be a desktop, a paper surface, a cloth surface, or a curtain matched with the projector.
Preferably, the image module 100 is a projector; that is, the image module 100 forms the image picture in the target area by projection. In this case, the user may use a desktop or paper surface in a suitable position, or a curtain matched with the projector, as the target area. Forming the image picture by projection is novel for users, especially young users, and helps increase their interest in using the image recognition-based interactive system.
The functions of the information acquisition module 200 may be implemented by hardware devices including an image acquisition device (e.g., a camera, etc.) and a device having data processing capabilities (e.g., a microprocessor, etc.).
Similarly, the functions of the result matching module 300 may be implemented by a device having data processing capabilities.
Referring to fig. 2, the preset teaching aid may refer to a real object having a shape of chinese pinyin, english alphabet, arabic numeral, figure, symbol, or the like.
Referring to fig. 3 and 4, fig. 3 and 4 are schematic views of scenes in which a user interacts with the image recognition-based interaction system. In fig. 3, the image module 100 projects onto a target area on a desktop to form the image picture, and the user places preset teaching aids in the target area in a certain order. The information acquisition module 200 obtains the arrangement information of the preset teaching aids in the image picture (such as the arrangement order and positions), and the result matching module 300 matches this arrangement information against the expected result corresponding to the image picture currently projected by the image module 100. For example, when the expected result corresponding to the image picture is "apple", if the user arranges five preset teaching aids, namely "a", "p", "p", "l" and "e", and their order and positions correctly combine to form the word "apple" (each letter is placed at the correct angle and the letters are in the correct order), the matching is determined to be successful, the image module 100 is controlled to output first result information (for example, "success", "OK", "correct", etc.), and the image module 100 may form the next picture. When the order and positions of the five preset teaching aids cannot be correctly combined to form the word "apple", the matching is determined to have failed, the image module 100 is controlled to output second result information (for example, "fail", "error", "NO", etc.), and the current image picture is kept unchanged.
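The letter-arrangement check in the example above can be sketched as a simple left-to-right comparison. The representation of arrangement information as (letter, horizontal coordinate) pairs is an illustrative assumption, not the claimed recognizer:

```python
def arrangement_matches(placements, expected_word):
    """placements: list of (letter, x_coordinate) tuples recognized
    in the image picture. The arrangement is correct when the
    letters, read left to right, spell the expected word."""
    ordered = sorted(placements, key=lambda p: p[1])
    return "".join(letter for letter, _ in ordered) == expected_word

# The letters a, p, p, l, e placed out of left-to-right order on the desk:
placements = [("p", 2.0), ("a", 1.0), ("p", 3.1), ("e", 5.2), ("l", 4.0)]
```

A real system would also check each letter's placement angle, as the description notes; that check is omitted here for brevity.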
In fig. 4, the image module 100 projects onto a target area on a desktop to form the image picture, and the user writes in the target area with an operation body such as a finger or a pen. The information acquisition module 200 acquires the motion trajectory of the operation body in the target area. When the motion trajectory is consistent with the expected result corresponding to the image picture, the matching is determined to be successful, the image module 100 is controlled to output first result information (for example, "success", "OK", "correct", etc.), and the image module 100 may form the next picture; when the motion trajectory is inconsistent with the expected result corresponding to the image picture, the matching is determined to have failed, the image module 100 is controlled to output second result information (for example, "fail", "error", "NO", etc.), and the current image picture is kept unchanged.
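One crude way to decide whether the acquired motion trajectory is "consistent" with the expected stroke is a point-wise distance check; this is a hypothetical sketch only — a practical system would first resample and normalise both strokes, and the tolerance value here is an assumption:

```python
import math

def trajectory_close(track, template, tol=0.5):
    """Compare a drawn stroke against a template stroke of equal
    length; True when the mean point-wise distance is within tol."""
    if len(track) != len(template):
        return False
    mean_dist = sum(math.dist(p, q) for p, q in zip(track, template)) / len(track)
    return mean_dist <= tol
```

More robust alternatives include dynamic time warping, which tolerates strokes drawn at different speeds.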
The specific configuration and function of the information acquisition module 200 and the result matching module 300 are described below.
Optionally, referring to fig. 5, the information obtaining module 200 includes:
an image acquisition unit 210, configured to acquire image information of the target area, where the image information includes the image picture together with preset teaching aids arranged in the target area, or includes a track image sequence; the track image sequence includes a plurality of target area images, and each target area image includes the image picture and the position information of the operation body in the target area;
and an information identification unit 220, configured to classify or identify the image information to obtain the arrangement information of the preset teaching aids or the motion track of the operation body in the image picture.
As mentioned above, the function of the image acquisition unit 210 can be implemented by a camera or another device combining a photosensitive element and a lens.
Optionally, as shown in fig. 6, the information identifying unit 220 includes:
a teaching aid identification model 221, configured to classify image information including the image picture and the preset teaching aids arranged in the target area, to obtain the arrangement information of the preset teaching aids;
and the track identification model 222 is configured to extract information of the track image sequence to obtain position information of the operation body in each target area graph in the target area, and combine the position information of the operation body in all the target area graphs in the target area to obtain a motion track of the operation body.
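The combination step performed by the track identification model 222 — concatenating the per-frame positions of the operation body into one motion trajectory — can be sketched as follows. The convention of None for frames where the operation body is not detected is an illustrative assumption:

```python
def build_trajectory(frame_positions):
    """frame_positions: per-frame (x, y) position of the operation
    body in the target area, or None when it is not detected.
    Keeping the detections in frame order yields the motion
    trajectory."""
    return [pos for pos in frame_positions if pos is not None]
```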
The teaching aid identification model 221 is a deep learning network model which is trained by training samples in advance;
the training process of the deep learning network model comprises the following steps:
acquiring a training sample, wherein the training sample is an image which is marked with arrangement information of the preset teaching aid in a target area in advance;
and training a deep learning network model to be trained by using the training sample to obtain the deep learning network model.
The training samples are preferably captured with the image acquisition unit 210. During capture, the preset teaching aids are placed in the target area according to the required arrangement information and then photographed, while the placement positions, the teaching aids themselves and the ambient light are varied continuously, so as to obtain images of the same arrangement information under many different environments. This simulates the user's usage scenarios as far as possible, so as to improve the recognition accuracy of the trained deep learning network model.
The classes and positions of the teaching aids in the captured images are then annotated.
Finally, the deep learning network model to be trained, an SSD (Single Shot MultiBox Detector), is trained with the annotated training samples, ultimately obtaining the SSD that serves as the teaching aid recognizer. Optionally, the training process may be performed under the TensorFlow framework.
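The annotation step described above — marking the class and position of every teaching aid in each captured image — might produce records like the following. The field names, file path, class labels and bounding-box convention are all illustrative assumptions, not the patent's actual format:

```python
def make_annotation(image_path, objects):
    """objects: list of (class_name, (x_min, y_min, x_max, y_max))
    bounding boxes, one per preset teaching aid in the image."""
    return {
        "image": image_path,
        "objects": [{"class": cls, "bbox": list(bbox)}
                    for cls, bbox in objects],
    }

# One hypothetical desk image containing the letters "a" and "p":
sample = make_annotation(
    "shots/arrangement_apple_01.jpg",
    [("letter_a", (10, 20, 40, 60)), ("letter_p", (45, 20, 75, 60))],
)
```

Records of this shape are easy to convert into whatever format the chosen detector's training pipeline expects.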
After the final deep learning network model is obtained, different runtime frameworks are used to convert the model for different client operating environments, so that it is adapted to each environment. For example, the Qualcomm runtime framework SNPE (Snapdragon Neural Processing Engine) may optionally be used to convert the trained recognizer into one suitable for running under that framework; after the conversion is completed, the model is imported directly into the client.
Optionally, referring to fig. 7, the result matching module 300 includes:
a teaching aid matching unit 310, configured to match the arrangement information of the preset teaching aid with the expected result corresponding to the image picture, control the image module 100 to output first result information when they match, and control the image module 100 to output second result information when they do not match;
and a track matching unit 320, configured to match the motion trajectory of the operation body with the expected result corresponding to the image picture, control the image module 100 to output first result information when they match, and control the image module 100 to output second result information when they do not match.
Optionally, as shown in fig. 8, the result matching module 300 further includes:
and a result prompting unit 330, configured to control the image module 100 to output result prompt information corresponding to the image picture when the arrangement information of the preset teaching aid does not match the expected result corresponding to the image picture, or when the motion trajectory of the operation body does not match the expected result corresponding to the image picture.
In this embodiment, when the user fails to give correct feedback with the preset teaching aids or the operation body (that is, when the arrangement information of the preset teaching aids, or the motion trajectory of the operation body, does not match the expected result corresponding to the image picture), the result prompting unit 330 controls the image module 100 to output result prompt information corresponding to the image picture, so as to show the user where the mistake is or prompt the correct placement or writing.
The image recognition-based interaction method provided by the embodiment of the present application is described below, and the image recognition-based interaction method described below may be referred to in correspondence with the image recognition-based interaction system described above.
Correspondingly, the embodiment of the application provides an interaction method based on image recognition, which comprises the following steps:
forming an image picture in the target area;
acquiring arrangement information of a preset teaching aid or a motion track of an operation body in the image picture;
and matching the arrangement information of the preset teaching aid or the motion trail of the operation body with an expected result corresponding to the image picture, controlling the image module to output first result information when the matching is consistent, and controlling the image module to output second result information when the matching is inconsistent.
Optionally, acquiring the arrangement information of the preset teaching aid or the motion trail of the operation body in the image picture includes:
acquiring image information of the target area, wherein the image information comprises the image picture and a preset teaching aid arranged in the target area or comprises a track image sequence, the track image sequence comprises a plurality of target area images, and each target area image comprises the image picture and position information of the operation body in the target area;
and classifying or identifying the image information to acquire the arrangement information of the preset teaching aid or the motion track of the operation body in the image picture.
Optionally, classifying or identifying the image information to obtain the arrangement information of the preset teaching aid or the motion track of the operation body in the image picture includes:
classifying image information including the image picture and preset teaching aids arranged in the target area by using a teaching aid identification model to acquire arrangement information of the preset teaching aids;
and extracting information of the track image sequence by using a track recognition model to obtain the position information of the operation body in each target area graph in the target area, and combining the position information of the operation body in all the target area graphs in the target area to obtain the motion track of the operation body.
Optionally, the method further includes: when the arrangement information of the preset teaching aid fails to match the expected result corresponding to the image picture, or when the motion track of the operation body fails to match the expected result corresponding to the image picture, controlling the image module to output result prompt information corresponding to the image picture.
To sum up, the embodiments of the present application provide an interaction system and an interaction method based on image recognition. The interaction system forms an image picture in a target area through an image module, acquires, through an information acquisition module, the arrangement information of the preset teaching aids placed by the user or the motion track of an operation body in the image picture, and finally matches, through a result matching module, the arrangement information of the preset teaching aids or the motion track of the operation body with the expected result corresponding to the image picture, controlling the image module to output first result information when the matching is consistent and second result information when it is not. The interaction system thus provides an interaction mode based on an operation body and, at the same time, a new way of realizing interaction among people, machines, and objects (the preset teaching aids), which facilitates scenario-based and personalized learning, enriches the interaction modes of the system, and helps stimulate users' interest in the interactive content.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An interactive system based on image recognition, comprising:
the image module is used for forming an image picture in the target area;
the information acquisition module is used for acquiring the arrangement information of a preset teaching aid or the motion track of an operation body in the image picture;
and the result matching module is used for matching the arrangement information of the preset teaching aid or the motion track of the operation body with the expected result corresponding to the image picture, controlling the image module to output first result information when the matching is consistent, and controlling the image module to output second result information when the matching is inconsistent.
2. The image recognition-based interactive system according to claim 1, wherein the information acquisition module comprises:
the image acquisition unit is used for acquiring image information of the target area, wherein the image information comprises the image picture and a preset teaching aid arranged in the target area or a track image sequence, the track image sequence comprises a plurality of target area images, and each target area image comprises the image picture and position information of the operation body in the target area;
and the information identification unit is used for classifying or identifying the image information so as to acquire the arrangement information of a preset teaching aid in the image picture or the motion track of the operation body.
3. The image recognition-based interaction system according to claim 2, wherein the information recognition unit comprises:
the teaching aid identification model is used for classifying image information comprising the image picture and preset teaching aids arranged in the target area so as to obtain arrangement information of the preset teaching aids;
and the track identification model is used for extracting information from the track image sequence to acquire the position information of the operation body in each target area image, and combining the position information of the operation body across all the target area images to acquire the motion track of the operation body.
4. The interactive system based on image recognition of claim 3, wherein the teaching aid recognition model is a deep learning network model trained in advance with training samples;
the training process of the deep learning network model comprises the following steps:
acquiring a training sample, wherein the training sample is an image pre-labelled with the arrangement information of the preset teaching aid in the target area;
and training a deep learning network model to be trained by using the training sample to obtain the deep learning network model.
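The recited training process can be illustrated with a generic supervised-learning loop (a toy sketch only: the patent specifies a deep learning network, for which this single linear unit is a stand-in, and the samples are invented):

```python
def predict(w, b, x):
    """Classify a feature vector with the learned linear unit."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_model(samples, epochs=50, lr=0.1):
    """Generic supervised training mirroring the two recited steps:
    acquire pre-labelled samples, then fit the model to them.
    Each sample is (features, label) with label in {0, 1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = y - predict(w, b, x)        # perceptron-style update
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy pre-labelled samples: feature vectors tagged with an arrangement class.
samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1), ([0.1, 1.0], 0)]
w, b = train_model(samples)
```

A production system would replace the linear unit with a convolutional network trained on labelled images of teaching-aid arrangements.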
5. The image recognition-based interaction system of claim 1, wherein the result matching module comprises:
the teaching aid matching unit is used for matching the arrangement information of the preset teaching aid with an expected result corresponding to the image picture, controlling the image module to output first result information when the arrangement information of the preset teaching aid matches the expected result corresponding to the image picture, and controlling the image module to output second result information when the arrangement information of the preset teaching aid does not match the expected result corresponding to the image picture;
and the track matching unit is used for matching the motion track of the operation body with the expected result corresponding to the image picture, controlling the image module to output first result information when the motion track of the operation body is consistent with the expected result corresponding to the image picture, and controlling the image module to output second result information when the motion track of the operation body is inconsistent with the expected result corresponding to the image picture.
6. The image recognition-based interaction system of claim 5, wherein the result matching module further comprises:
and the result prompting unit is used for controlling the image module to output result prompt information corresponding to the image picture when the arrangement information of the preset teaching aid fails to match the expected result corresponding to the image picture, or when the motion track of the operation body fails to match the expected result corresponding to the image picture.
7. An interaction method based on image recognition is characterized by comprising the following steps:
forming an image picture in the target area;
acquiring arrangement information of a preset teaching aid or a motion track of an operation body in the image picture;
and matching the arrangement information of the preset teaching aid or the motion track of the operation body with an expected result corresponding to the image picture, controlling the image module to output first result information when the matching is consistent, and controlling the image module to output second result information when the matching is inconsistent.
8. The method according to claim 7, wherein the acquiring of the arrangement information of the preset teaching aids or the motion track of the operation body in the image picture comprises:
acquiring image information of the target area, wherein the image information comprises the image picture and a preset teaching aid arranged in the target area or comprises a track image sequence, the track image sequence comprises a plurality of target area images, and each target area image comprises the image picture and position information of the operation body in the target area;
and classifying or identifying the image information to acquire arrangement information of a preset teaching aid in the image picture or a motion track of an operation body.
9. The method according to claim 8, wherein the classifying or identifying the image information to obtain arrangement information of a preset teaching aid or a motion track of an operation body in the image picture comprises:
classifying image information including the image picture and preset teaching aids arranged in the target area by using a teaching aid identification model to acquire arrangement information of the preset teaching aids;
and extracting information from the track image sequence by using a track recognition model to obtain the position information of the operation body in each target area image, and combining the position information of the operation body across all the target area images to obtain the motion track of the operation body.
10. The method of claim 7, further comprising: when the arrangement information of the preset teaching aid fails to match the expected result corresponding to the image picture, or when the motion track of the operation body fails to match the expected result corresponding to the image picture, controlling the image module to output result prompt information corresponding to the image picture.
CN202010387205.5A 2020-05-09 2020-05-09 Interaction system and interaction method based on image recognition Pending CN111627039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010387205.5A CN111627039A (en) 2020-05-09 2020-05-09 Interaction system and interaction method based on image recognition


Publications (1)

Publication Number Publication Date
CN111627039A true CN111627039A (en) 2020-09-04

Family

ID=72272796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387205.5A Pending CN111627039A (en) 2020-05-09 2020-05-09 Interaction system and interaction method based on image recognition

Country Status (1)

Country Link
CN (1) CN111627039A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150431A (en) * 2020-09-21 2020-12-29 京东数字科技控股股份有限公司 UI visual walkthrough method and device, storage medium and electronic device
CN115802014A (en) * 2021-09-09 2023-03-14 卡西欧计算机株式会社 Recording medium, setting simulation method, and setting simulation apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013075466A1 (en) * 2011-11-23 2013-05-30 中兴通讯股份有限公司 Character input method, device and terminal based on image sensing module
WO2017190614A1 (en) * 2016-05-06 2017-11-09 深圳市国华识别科技开发有限公司 Intelligent terminal based man-machine interaction method and system
US20180101237A1 (en) * 2016-01-04 2018-04-12 Boe Technology Group Co., Ltd. System, method, and apparatus for man-machine interaction
CN108255352A (en) * 2017-12-29 2018-07-06 安徽慧视金瞳科技有限公司 Multiple point touching realization method and system in a kind of projection interactive system
CN110059653A (en) * 2019-04-24 2019-07-26 上海商汤智能科技有限公司 A kind of method of data capture and device, electronic equipment, storage medium
CN110703913A (en) * 2019-09-27 2020-01-17 腾讯科技(深圳)有限公司 Object interaction method and device, storage medium and electronic device


Similar Documents

Publication Publication Date Title
US10438080B2 (en) Handwriting recognition method and apparatus
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
JP5706340B2 (en) Method for controlling media by face detection and hot spot movement
CN109034397A (en) Model training method, device, computer equipment and storage medium
Bencherif et al. Arabic sign language recognition system using 2D hands and body skeleton data
CN112230772A (en) Virtual-actual fused teaching aid automatic generation method
CN111885414B (en) Data processing method, device and equipment and readable storage medium
KR102167760B1 (en) Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model
Tripathy et al. Voice for the mute
Oliveira et al. Automatic sign language translation to improve communication
CN111627039A (en) Interaction system and interaction method based on image recognition
CN109922352A (en) A kind of data processing method, device, electronic equipment and readable storage medium storing program for executing
CN112528768A (en) Action processing method and device in video, electronic equipment and storage medium
US11216648B2 (en) Method and device for facial image recognition
CN109739353A (en) A kind of virtual reality interactive system identified based on gesture, voice, Eye-controlling focus
von Agris et al. Signum database: Video corpus for signer-independent continuous sign language recognition
CN106708266A (en) AR action correction projection method and system based on binocular gesture recognition
KR102482841B1 (en) Artificial intelligence mirroring play bag
CN113657173B (en) Data processing method and device for data processing
JP7153052B2 (en) Online Picture Book Content Acquisition Method, Apparatus, and Smart Screen Device
CN106101824B (en) Information processing method, electronic equipment and server
CN209514619U (en) Assist fill in a form equipment and system
Stefanov et al. A real-time gesture recognition system for isolated Swedish Sign Language signs
KR20120062168A (en) Apparatus and method for recogniting sub-trajectory
CN114450730A (en) Information processing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination