WO2021077382A1 - Method and apparatus for determining learning state, and intelligent robot - Google Patents


Info

Publication number
WO2021077382A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
state
expression
frame image
learning state
Prior art date
Application number
PCT/CN2019/113169
Other languages
French (fr)
Chinese (zh)
Inventor
黄巍伟
郑小刚
王国栋
Original Assignee
中新智擎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中新智擎科技有限公司
Priority to CN201980002118.9A (patent CN110945522B)
Priority to PCT/CN2019/113169
Publication of WO2021077382A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance

Definitions

  • The embodiments of the present application relate to the field of electronic information technology, and in particular to a method, an apparatus, and an intelligent robot for determining a learning state.
  • In order to improve teaching quality, classroom teaching is usually evaluated. The quality of classroom teaching is assessed mainly along two dimensions: students' mastery of classroom knowledge and students' learning state in class.
  • In the process of implementing the present invention, the inventor found that, at present, students' learning state in class is mainly assessed through manual observation or camera monitoring.
  • The feedback data obtained in this way takes the form: within the first 5 minutes of class, student A listens attentively; from the 5th to the 10th minute of class, student A is distracted; and so on. Such a method cannot reliably judge students' learning state in class, and confusion and misjudgment may occur.
  • The technical problem mainly solved by the embodiments of the present invention is to provide a method, an apparatus, and an intelligent robot for determining the learning state, which can improve the accuracy of judging the user's learning state.
  • To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a method for determining the learning state, including: obtaining frame images from the user's class video; identifying the user's expression in the frame image; and, in combination with the expression, recognizing the learning state of the user in the frame image.
  • In some embodiments, the expressions include happy, confused, exhausted, and neutral, and the learning state includes a focused state and a distracted state.
  • The step of recognizing the learning state of the user in the frame image in combination with the expression specifically includes: judging whether the expression is exhausted; if not, obtaining a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression; comparing the frame image with the focus reference picture to obtain a first matching degree; judging whether the first matching degree is greater than or equal to a first preset threshold; if so, marking the frame image as a focused-state image; if not, comparing the frame image with the distraction reference picture to obtain a second matching degree; judging whether the second matching degree is greater than or equal to a second preset threshold; and if so, marking the frame image as a distracted-state image.
  • The step of recognizing the learning state of the user in the frame image in combination with the expression further includes: if the expression is exhausted, detecting the user's heart rate; judging whether the heart rate is greater than or equal to a third preset threshold; if so, marking the frame image as a focused-state image; and if not, marking the frame image as a distracted-state image.
  • In some embodiments, the step of recognizing the learning state of the user in the frame image in combination with the expression specifically includes: extracting geometric features of each facial organ from the frame image; and determining, according to the geometric features of each facial organ in combination with a preset classification algorithm model, whether the user's learning state is a focused state or a distracted state.
  • In some embodiments, after the step of recognizing the learning state of the user in the frame image in combination with the expression, the method further includes: determining the user's focus time according to the user's learning state.
  • The step of determining the user's focus time according to the user's learning state specifically includes: obtaining the recording times of the focused-state images; and computing statistics over the recording times of the focused-state images to obtain the user's focus time.
  • To solve the above technical problem, in a second aspect, an embodiment of the present invention provides a learning state determination apparatus, including:
  • an acquisition module, configured to acquire frame images from the user's class video;
  • a first recognition module, configured to recognize the user's expression in the frame image; and
  • a second recognition module, configured to recognize, in combination with the expression, the learning state of the user in the frame image.
  • In some embodiments, the apparatus further includes: a determination module, configured to determine the user's focus time according to the user's learning state.
  • To solve the above technical problem, in a third aspect, an embodiment of the present invention provides an intelligent robot, including:
  • an image acquisition module, configured to collect the class video of the user during class;
  • at least one processor connected to the image acquisition module; and
  • a memory communicatively connected with the at least one processor; wherein
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described in the first aspect above.
  • To solve the above technical problem, in a fourth aspect, embodiments of the present invention provide a computer program product containing program code. When the computer program product runs on an electronic device, the electronic device is caused to execute the method described in the first aspect above.
  • Beneficial effects of the embodiments of the present invention: in contrast with the prior art, the embodiments of the present invention provide a method, an apparatus, and an intelligent robot for determining the learning state. The method obtains frame images from the user's class video, recognizes the user's expression in the frame images, and recognizes the user's learning state in the frame images in combination with the expression. Because the same learning state presents differently under different expressions, performing expression recognition first and then judging the user's learning state in class in combination with the expression enables accurate recognition of the user's learning state in class, avoids the confusion and misjudgment that expressions cause in judging the learning state, and improves the accuracy of judging the learning state in class.
  • FIG. 1 is a schematic diagram of an application environment of an embodiment of a method for determining a learning state according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a method for determining a learning state according to an embodiment of the present invention;
  • FIG. 3 is a sub-flowchart of step 130 in the method shown in FIG. 2;
  • FIG. 4 is another sub-flowchart of step 130 in the method shown in FIG. 2;
  • FIG. 5 is a flowchart of a method for determining a learning state according to another embodiment of the present invention;
  • FIG. 6 is a sub-flowchart of step 140 in the method shown in FIG. 5;
  • FIG. 7 is a schematic structural diagram of a learning state determination apparatus provided by an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of the hardware structure of an intelligent robot that executes the above method for determining a learning state, provided by an embodiment of the present invention.
  • Please refer to FIG. 1, which is a schematic diagram of an application environment of an embodiment of the method for determining a learning state of the present invention. The system includes a server 10 and a camera 20.
  • The server 10 and the camera 20 are communicatively connected. The communication connection may be wired, for example an optical fiber cable, or wireless, for example a WiFi connection, a Bluetooth connection, a 4G wireless communication connection, a 5G wireless communication connection, and so on.
  • the camera 20 is a device capable of recording video, for example, a mobile phone with a shooting function, a video recorder, or a camera.
  • The server 10 is a device that can run according to a program and process massive amounts of data automatically and at high speed; it is usually composed of a hardware system and a software system, for example a computer, a smart phone, and so on.
  • The server 10 may be a local device directly connected to the camera 20, or it may be a cloud device, for example a cloud server, a cloud host, a cloud service platform, or a cloud computing platform. The cloud device is connected to the camera 20 through a network, and the two communicate through a predetermined communication protocol. In some embodiments, the communication protocol may be TCP/IP, NETBEUI, IPX/SPX, or the like.
  • It is understandable that the server 10 and the camera 20 can also be integrated together as an integrated device, or the camera 20 and the server 10 can be integrated on an intelligent robot as components of the intelligent robot.
  • The intelligent robot or camera can be installed in a classroom or in any learning place where the user is located; for example, for Internet education, it can be installed in the user's home or another learning place. The intelligent robot or camera collects the user's class video, and the user's learning state during class is determined based on the class video.
  • In some specific application scenarios, such as the now-popular Internet education, users can learn real-time live courses from teachers of various subjects at home through a computer; in this case, the camera may be the camera configured at the front of the computer.
  • In this way of teaching, the teacher and the user are not face to face, so the teacher cannot obtain feedback on the user's learning state and cannot judge the user's learning state well and accurately.
  • The embodiment of the present invention provides a method for determining the learning state applied to the above application environment. The method can be executed by the above server 10. Referring to FIG. 2, the method includes:
  • Step 110: Obtain frame images from the user's class video.
  • The class video is a collection of images of the user attending class, and it contains several frontal face images of the user.
  • The class video can be collected by a camera 20 installed in the classroom or in another learning place of the user, where the camera 20 can fully capture the user's facial image information.
  • For example, the camera 20 can be set at the edge of the blackboard with its field of view facing the classroom, so that the user's class video during class can be collected; in Internet education, the camera is placed above the computer or is the computer's built-in camera, and the user's facial image information during class can be collected.
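  • By way of illustration only, a minimal Python sketch of step 110 is given below. The file path, the sampling interval, and the fallback frame rate are illustrative assumptions, not details from the application.

```python
import cv2  # OpenCV, used here for video decoding


def sample_frames(video_path, interval_s=1.0):
    """Yield (timestamp_s, frame) pairs sampled from a class video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(1, int(fps * interval_s))     # keep one frame per sampling interval
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame
        idx += 1
    cap.release()
```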
  • Step 120: Identify the user's expression in the frame image.
  • During class, the user will present different expressions depending on the lesson content or the influence of surrounding students, and under different expressions the user's face presents different content; conversely, the user's expression can be determined from the content presented by the user's face. Recognizing the user's expression in each frame image of the class video specifically includes the following steps: 1. face image extraction; 2. expression feature extraction; 3. expression classification.
  • The face image can be extracted from the frame image using an existing image extraction algorithm.
  • Expression feature extraction can be based on the geometric feature method, extracting expression features according to the shape and position of the facial organs.
  • Expression classification can be based on a random forest algorithm, an expression feature dimensionality-reduction method, an SVM multi-class model, or a neural network algorithm, which classifies the extracted expression features and thereby determines the user's expression.
  • In some embodiments, in order to improve the accuracy of expression feature extraction, before the expression features are extracted, the size and gray level of the face image can also be normalized to improve the face image quality and eliminate noise.
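  • The three steps above could be wired together as in the following sketch. The Haar face detector, the 64x64 normalization, and the use of raw pixels as a stand-in for geometric expression features are illustrative assumptions; the SVM stands in for any of the classifiers mentioned above and is assumed to be trained with integer labels indexing EXPRESSIONS.

```python
import cv2
from sklearn.svm import SVC  # stands in for the SVM multi-class model above

EXPRESSIONS = ["happy", "confused", "exhausted", "neutral"]
DETECTOR = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def extract_face(frame):
    """Step 1 (face image extraction) plus the normalization described above."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = DETECTOR.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], (64, 64))  # size normalization
    return cv2.equalizeHist(face)                        # gray-level normalization


def classify_expression(face, clf: SVC) -> str:
    """Steps 2-3: feature extraction and classification (trained clf assumed)."""
    features = face.flatten() / 255.0  # stand-in for geometric expression features
    return EXPRESSIONS[int(clf.predict([features])[0])]
```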
  • Step 130: Recognize the learning state of the user in the frame image in combination with the expression.
  • The learning state includes a distracted state and a focused state, which reflect the user's state during class.
  • However, the same learning state presents differently under different expressions. Therefore, recognizing the expression first and then recognizing the user's learning state in combination with the expression can improve the accuracy of recognition.
  • The user's face will show different expressions during class, such as happy, confused, sad, or neutral. When the user shows any of these expressions, the underlying learning state can still differ. For example, expression recognition may determine that the user's expression is happy, but happy can be subdivided into two different learning states: "happy because of understanding the knowledge points" and "happy because of planning a weekend trip." Therefore, the judgment of the user's learning state in class is inevitably affected by the user's expression. Combining the expression with the frame image of the user in class for the judgment can effectively avoid the influence of the user's expression on the judgment of the learning state and improve the accuracy of judging the user's learning state in class.
  • In the embodiment of the present invention, the class video of the user during class is collected, and the user's learning state is recognized in combination with facial expressions. The same learning state of a user presents differently under different expressions; therefore, performing expression recognition first and then judging the learning state can improve the accuracy of the judgment of the learning state.
  • In some embodiments, the expressions include happy, confused, exhausted, and neutral, and the learning state includes a focused state and a distracted state. When the expression is happy, confused, or neutral, the facial features are distinct, and the user's learning state can be accurately identified through image comparison.
  • In some embodiments, referring to FIG. 3, step 130 specifically includes:
  • Step 131: Determine whether the expression is exhausted; if not, go to step 132; if so, go to step 139.
  • Step 132: Obtain a pre-stored focus reference picture and a distraction reference picture corresponding to the user and the expression.
  • The focus reference picture is a picture of the user in a focused state under the expression, and the distraction reference picture is a picture of the user in a distracted state under the expression. The focus reference picture and the distraction reference picture can be obtained by manually selecting and collecting frame images from class videos.
  • Step 133: Compare the frame image with the focus reference picture to obtain a first matching degree.
  • Step 134: Determine whether the first matching degree is greater than or equal to a first preset threshold; if so, perform step 135; otherwise, perform step 136.
  • Step 135: Mark the frame image as a focused-state image.
  • If the first matching degree is greater than or equal to the first preset threshold, the user's facial image is highly similar to the focus reference picture, and the user can be considered to be in a focused state at this time.
  • The specific value of the first preset threshold may be determined through multiple experiments, and the first preset threshold may be set to different values for different users.
  • Step 136: Compare the frame image with the distraction reference picture to obtain a second matching degree.
  • Step 137: Determine whether the second matching degree is greater than or equal to a second preset threshold; if so, perform step 138.
  • If the second matching degree is greater than or equal to the second preset threshold, the facial image is highly similar to the distraction reference picture, and it can be determined that the user is in a distracted state.
  • The specific value of the second preset threshold may also be determined through multiple experiments, and the second preset threshold may be set to different values for different users.
  • Step 138: Mark the frame image as a distracted-state image.
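  • A compact sketch of the comparison flow of steps 131 to 138 follows. The application does not fix a similarity metric for the "matching degree," so normalized template correlation is used here as an assumed stand-in, and the thresholds are illustrative.

```python
import cv2


def match_degree(face, reference):
    """Similarity in [0, 1] between two grayscale face images (assumed metric)."""
    ref = cv2.resize(reference, (face.shape[1], face.shape[0]))
    score = cv2.matchTemplate(face, ref, cv2.TM_CCOEFF_NORMED)[0][0]
    return max(0.0, float(score))


def label_frame(face, focus_ref, distract_ref, t1=0.8, t2=0.8):
    """Steps 133-138; t1 and t2 play the roles of the first and second
    preset thresholds and would be tuned per user in practice."""
    if match_degree(face, focus_ref) >= t1:       # first matching degree
        return "focused"
    if match_degree(face, distract_ref) >= t2:    # second matching degree
        return "distracted"
    return "undetermined"  # the application leaves this case open
```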
  • When the expression is exhausted, the user's learning state is determined by detecting the heart rate, which specifically includes:
  • Step 139: Detect the user's heart rate.
  • In some embodiments, the user's heart rate can be detected by an image-based heart rate method. Specifically, the face detector provided by OpenCV is used to detect the user's face region in the facial image and record the location of the region; the face-region image is then separated into the three RGB channels, and the average gray value within the region is calculated for each channel separately, yielding three time-varying R, G, and B signals. Finally, independent component analysis is performed on the R, G, and B signals to obtain the user's heart rate.
  • Step 1310: Determine whether the heart rate is greater than or equal to the third preset threshold; if so, go to step 1311; otherwise, go to step 1312.
  • Step 1311: Mark the frame image as a focused-state image.
  • Step 1312: Mark the frame image as a distracted-state image.
  • In this way, when the expression is exhausted, the user's heart rate is detected to determine whether the user's learning state is a focused state or a distracted state.
  • When the expression is exhausted, the differences between facial features are not obvious, and the focus reference picture and the distraction reference picture under the exhausted expression do not differ significantly. If the frame image were compared with the focus reference picture and the distraction reference picture, the first matching degree would be close to the second matching degree; or, although the learning state represented by the frame image is distraction, because the distinguishing features between the two reference pictures are not significant, the first matching degree might exceed the first preset threshold during comparison, so the learning state represented by the frame image would be wrongly judged as a focused state. In such cases, the accuracy of judging the user's learning state in class would be reduced. Therefore, when the expression is exhausted, detecting the user's heart rate is adopted to improve the accuracy of judging the user's learning state in class.
  • If the user's heart rate is greater than or equal to the third preset threshold, it indicates that the user's brain activity is high at this time, and it can be judged that the user's learning state is a focused state; otherwise, the user's learning state at this time is a distracted state.
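  • The image-based heart rate method described above could look like the following sketch, assuming a sequence of color face-region crops (e.g., from the OpenCV face detector) and the video frame rate. The pulse frequency band and the component-selection heuristic are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA


def estimate_heart_rate(face_frames, fps):
    """Mean R, G, B traces over the face region -> ICA -> dominant pulse
    frequency, returned in beats per minute."""
    # One average value per color channel per frame: three time-varying signals
    traces = np.array([f.reshape(-1, 3).mean(axis=0) for f in face_frames])
    sources = FastICA(n_components=3, random_state=0).fit_transform(traces)
    freqs = np.fft.rfftfreq(len(sources), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)  # plausible pulse range: 42-240 bpm
    best_bpm, best_power = 0.0, -1.0
    for s in sources.T:  # pick the component with the strongest in-band peak
        power = np.abs(np.fft.rfft(s - s.mean()))
        i = int(np.argmax(np.where(band, power, 0.0)))
        if power[i] > best_power:
            best_power, best_bpm = power[i], freqs[i] * 60.0
    return best_bpm
```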
  • In other embodiments, referring to FIG. 4, step 130 specifically includes:
  • Step 131a: Extract geometric features of each facial organ from the frame image.
  • Geometric features include the shapes, sizes, and distances that characterize the facial organs, and they can be extracted from the facial image using existing image extraction algorithms.
  • In some embodiments, the Face++ function library may be used to extract the geometric features of the face image.
  • Step 132a: Determine whether the user's learning state is a focused state or a distracted state according to the geometric features of each facial organ in combination with a preset classification algorithm model.
  • The preset classification algorithm model can call an existing classification algorithm, such as a logistic regression algorithm, a random forest algorithm, an expression feature dimensionality-reduction method, an SVM multi-class model, or a neural network algorithm.
  • The same learning state presents differently under different expressions. Therefore, the expression is recognized first, and a classification algorithm model is then established separately for each expression; each classification algorithm model can adapt to its corresponding expression to the greatest extent, so the accuracy of recognition can be improved.
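  • As an illustration of step 131a, geometric features of the facial organs could be derived from 2D landmarks, such as those returned by a landmark detector like Face++ or dlib. The landmark names and the particular distances below are hypothetical, chosen only to show the idea.

```python
import numpy as np


def organ_geometry(landmarks: dict) -> np.ndarray:
    """Toy geometric features from 2D facial landmarks; each value is an
    np.array([x, y]) and the dictionary keys are hypothetical."""
    eye_open   = np.linalg.norm(landmarks["left_eye_top"] - landmarks["left_eye_bottom"])
    mouth_open = np.linalg.norm(landmarks["mouth_top"] - landmarks["mouth_bottom"])
    brow_raise = np.linalg.norm(landmarks["left_brow"] - landmarks["left_eye_top"])
    face_width = np.linalg.norm(landmarks["jaw_left"] - landmarks["jaw_right"])
    # Normalizing by face width makes the distances scale-invariant
    return np.array([eye_open, mouth_open, brow_raise]) / max(face_width, 1e-6)
```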
  • The preset classification algorithm model is established in advance, and the process of establishing it specifically includes:
  • Step (1): Obtain the geometric features and label data of the face training sample set under each expression of the user.
  • The face training sample set is a set of face images, usually historical data with known results selected through manual inspection.
  • The label data characterizes the learning state of each face training sample; it is digitized and represented by 1 and 0, where 1 indicates that the user is in a focused state and 0 indicates that the user is in a distracted state.
  • Step (2): Use the geometric features and label data of the face training sample set to train the initial classification algorithm model to obtain feature coefficients, and substitute the feature coefficients into the corresponding initial classification algorithm model to obtain the preset classification algorithm model.
  • Initially, the feature coefficients in the initial classification algorithm model are undetermined; they are learned from the face training sample set of each corresponding expression, which can effectively fit the geometric features of the corresponding face training sample set, so that the learning state under each expression can be judged accurately.
  • In some embodiments, step (2) specifically includes:
  • Step 1: Separate the geometric features of the face training sample set under each expression of the user into five feature blocks: a mouth geometric feature block, an eye geometric feature block, an eyebrow geometric feature block, a face contour geometric feature block, and a line-of-sight geometric feature block.
  • If the raw geometric features are used directly, the feature dimensionality is high, the number of corresponding feature weight coefficients is large, and the computation is heavy and inaccurate, which is not conducive to later modeling and calculation.
  • Whether the user is focused is judged mainly from the user's mouth, eyes, eyebrows, face contour, and direction of sight. For example, if the user's eyebrows are slightly raised, the eyes are widened so that the distance between the upper and lower eyelids increases, the mouth is naturally closed, the gaze is directed straight ahead, and the face contour appears fuller, the user is in a focused state. Therefore, dividing the geometric features of the face into the mouth, eye, eyebrow, face contour, and line-of-sight geometric feature blocks can improve modeling efficiency and the recognition accuracy of the model.
  • Step 2: Use the five feature blocks and label data of the face training sample set to train the initial logistic regression model to obtain five feature-block coefficients, and substitute the five feature-block coefficients into the initial logistic regression model to obtain the preset logistic regression model.
  • Logistic regression is a kind of generalized linear regression, to which the sigmoid function is added to perform a nonlinear mapping; the sigmoid maps continuous values into the interval (0, 1).
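  • For reference, the sigmoid used by logistic regression is the standard one (a textbook fact, not specific to this application):

$$\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1), \qquad P(\text{focused} \mid x) = \sigma\!\left(w^{\top}x + b\right).$$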
  • Determining the logistic regression model means that, in machine learning, the logistic regression model is selected as the binary classification model for modeling.
  • After the label data and the five feature blocks under each expression are digitized and normalized, they are in the data format required for model learning. The initial logistic regression model corresponding to each expression is then trained to obtain the five feature-block coefficients under that expression, and the five feature-block coefficients under each expression are substituted into the initial logistic regression model corresponding to that expression to obtain the preset logistic regression model under each expression.
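  • A sketch of this per-expression training is given below, assuming scikit-learn and that each row of X concatenates the five digitized feature blocks; the normalization step and solver settings are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_preset_models(samples_by_expression):
    """One logistic regression per expression. `samples_by_expression` maps an
    expression name to (X, y): rows of X concatenate the five feature blocks
    (mouth, eyes, eyebrows, face contour, line of sight); y uses 1 = focused,
    0 = distracted."""
    models = {}
    for expression, (X, y) in samples_by_expression.items():
        # Normalization puts the digitized feature blocks on a common scale
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        models[expression] = model.fit(np.asarray(X), np.asarray(y))
    return models
```

  • In this sketch, the fitted weights of each model play the role of the five feature-block coefficients substituted into the preset logistic regression model for that expression.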
  • Under different expressions, the same facial feature reflects the user's learning state to different degrees. For example, when happy, the mouth is open and turned up; when sad, the mouth is closed and turned down; while in the preset focused state, the mouth is naturally closed. If the same algorithm model were used to calculate the learning state under the two different expressions of happy and sad, for example with the same mouth feature weight, misjudgment would occur; for instance, a user with a happy expression would more easily be recognized as being in a distracted state. However, some users may be happy because they understand the knowledge points, while others may be happy for reasons unrelated to the class. Using different recognition models for users with happy expressions and users with sad expressions can therefore improve accuracy.
  • Referring to FIG. 5, in some embodiments, the method further includes, after step 130:
  • Step 140: Determine the user's focus time according to the user's learning state.
  • Focus time is the time during which the user is in a focused state in class. After the user's focus time is determined, the course time and course length can be matched to the user based on the focus time, so that personalized education that teaches students according to their aptitude and state can be achieved.
  • Teaching can also be divided into classes based on the focus times of multiple users, for example: gather all users with the same focus time into one class, and determine the length of each lesson according to the users' focus time, so as to ensure that the users of each class have the highest concentration during class and to improve the overall teaching quality.
  • Referring to FIG. 6, in some embodiments, step 140 specifically includes:
  • Step 141: Obtain the recording time of each focused-state image.
  • The recording time is the time at which the image was recorded.
  • Step 142: Compute statistics over the recording times of the focused-state images to obtain the user's focus time.
  • A focused-state image is an image in which the user is in a focused state at the corresponding recording time. When the recording times of focused-state images with a consecutive relationship are aggregated, it means that the user was in a focused state throughout that time period, so that time period is the user's focus time. Here, the consecutive relationship means that the focused-state images are consecutive frames in the class video.
  • In this way, the specific time period during which the user is in a focused state, as well as the length of time the user is in a focused state, can be accurately determined, and the course time and course length can be matched to the user on this basis.
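  • Steps 141 and 142 could be realized as below, assuming each frame carries a recording timestamp and a focused/distracted label; `max_gap_s`, which decides when frames still count as consecutive, is an assumed parameter.

```python
def focus_periods(frame_times, frame_labels, max_gap_s=1.0):
    """Group consecutively focused frames into (start, end) periods and
    return the periods together with the total focus time in seconds."""
    periods, start, prev = [], None, None
    for t, label in zip(frame_times, frame_labels):
        if label == "focused":
            if start is None:
                start = t                      # a new focus period begins
            elif t - prev > max_gap_s:         # gap too large: close the period
                periods.append((start, prev))
                start = t
            prev = t
        elif start is not None:                # the focus period ends
            periods.append((start, prev))
            start = None
    if start is not None:
        periods.append((start, prev))
    total = sum(end - begin for begin, end in periods)
    return periods, total
```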
  • For example, some students have a single focus duration of 30 minutes, and their focus time range is from 8 a.m. to 11 a.m.; these users are grouped into one class, which holds 40-minute lessons between 8 a.m. and 11 a.m. with 10-minute breaks in between. Other users have a single focus duration of 40 minutes, and their focus time range is from 10 a.m. to 12 noon; these users are grouped into another class, which holds 50-minute lessons between 10 a.m. and 12 noon with 10-minute breaks in between.
  • The embodiment of the present invention also provides a learning state determination apparatus. Please refer to FIG. 7, which shows the structure of a learning state determination apparatus provided in an embodiment of the present application. The learning state determination apparatus 200 includes: an acquisition module 210, a first recognition module 220, and a second recognition module 230.
  • The acquisition module 210 is configured to obtain frame images from the user's class video.
  • The first recognition module 220 is configured to recognize the user's expression in the frame image.
  • The second recognition module 230 is configured to recognize the learning state of the user in the frame image in combination with the expression.
  • The learning state determination apparatus 200 provided by the embodiment of the present invention can judge the user's learning state more accurately.
  • In some embodiments, the learning state determination apparatus 200 further includes a determination module 240, and the determination module 240 is configured to determine the user's focus time according to the user's learning state.
  • In some embodiments, the expressions include happy, confused, exhausted, and neutral, and the learning state includes a focused state and a distracted state. The second recognition module 230 is also configured to: determine whether the expression is exhausted; if not, obtain the pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression; compare the frame image with the focus reference picture to obtain a first matching degree; determine whether the first matching degree is greater than or equal to a first preset threshold; if so, mark the frame image as a focused-state image; if not, compare the frame image with the distraction reference picture to obtain a second matching degree; determine whether the second matching degree is greater than or equal to a second preset threshold; and if so, mark the frame image as a distracted-state image.
  • In some embodiments, the second recognition module 230 is further configured to: detect the user's heart rate when the expression is exhausted; determine whether the heart rate is greater than or equal to a third preset threshold; if so, mark the frame image as a focused-state image; and if it is less than the third preset threshold, mark the frame image as a distracted-state image.
  • In some embodiments, the second recognition module 230 further includes an extraction unit (not shown in the figure) and a recognition unit (not shown in the figure). The extraction unit is configured to extract geometric features of each facial organ from the frame image. The recognition unit is configured to determine whether the user's learning state is a focused state or a distracted state according to the geometric features of each facial organ in combination with a preset classification algorithm model.
  • In some embodiments, the determination module 240 further includes a first obtaining unit (not shown in the figure) and a statistics unit (not shown in the figure). The first obtaining unit is configured to obtain the recording times of the focused-state images. The statistics unit is configured to compute statistics over the recording times of the focused-state images to obtain the user's focus time.
  • The learning state determination apparatus 200 obtains frame images from the user's class video through the acquisition module 210, the first recognition module 220 recognizes the user's expression in the frame image, and the second recognition module 230 then recognizes the learning state of the user in the frame image in combination with the expression. Because the same learning state of the user presents differently under different expressions, performing facial expression recognition first and then judging the learning state can improve the accuracy of learning state recognition, thereby ensuring the accuracy of the user's focus time detection.
  • Referring to FIG. 8, the embodiment of the present invention also provides an intelligent robot 300. The intelligent robot 300 includes: an image acquisition module 310, configured to collect class videos of the user during class; at least one processor 320 connected to the image acquisition module 310; and a memory 330 communicatively connected with the at least one processor 320. In FIG. 8, one processor 320 is taken as an example.
  • The memory 330 stores instructions executable by the at least one processor 320, and the instructions are executed by the at least one processor 320 so that the at least one processor 320 can execute the method for determining the learning state shown in FIGS. 2 to 6 above. The processor 320 and the memory 330 may be connected through a bus or in other ways; in FIG. 8, connection through a bus is taken as an example.
  • The memory 330 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules of the learning state determination method in the embodiments of the present application, for example, the modules shown in FIG. 7.
  • The processor 320 executes various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 330, thereby realizing the learning state determination method of the foregoing method embodiments.
  • The memory 330 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the learning state determination apparatus, and the like.
  • The memory 330 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 330 may optionally include memory remotely located with respect to the processor 320, and such remote memory may be connected to the face recognition device through a network. Examples of the aforementioned network include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • The one or more modules are stored in the memory 330, and when executed by the one or more processors 320, they perform the learning state determination method in any of the foregoing method embodiments, for example, executing the method steps of FIG. 2 to FIG. 6 described above and realizing the functions of the modules in FIG. 7.
  • The embodiment of the present application also provides a computer program product containing program code. When the computer program product runs on an electronic device, the electronic device is caused to execute the learning state determination method in any of the foregoing method embodiments, for example, executing the method steps of FIG. 2 to FIG. 6 described above and realizing the functions of the modules in FIG. 7.
  • The embodiments of the present invention provide a method, an apparatus, and an intelligent robot for determining the learning state. The method obtains frame images from the user's class video, recognizes the user's expression in the frame images, and recognizes the user's learning state in the frame images in combination with the expression. Because the same learning state presents differently under different expressions, performing expression recognition first and then judging the user's learning state in class in combination with the expression enables accurate recognition of the user's learning state in class, avoids the confusion and misjudgment that expressions cause in judging the learning state, and improves the accuracy of judging the learning state in class.
  • In Internet education, users can learn real-time live courses from teachers of various subjects at home through a computer. In such teaching, teachers and users are not face to face, which makes it difficult for the teacher to judge the students' learning state well.
  • With the method provided by the embodiments of the present invention, the user's in-class expression can be recognized through the computer's camera, and the user's learning state can be judged in combination with the expression. Summary information about the user's learning state is then obtained, for example, the distribution of the time periods during which the user is in a focused state, and the duration for which the user is in a focused state in the corresponding course.
  • This method can not only improve the accuracy of judging the user's learning state, but also enables teachers to improve their teaching methods based on the feedback data, improving users' learning efficiency.
  • The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • Each implementation can be realized by means of software plus a general hardware platform, and of course it can also be realized by hardware. A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium, and when executed, it may include the procedures of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for determining a learning state, and an intelligent robot. The method comprises: acquiring a frame image from a class video of a user (110); recognizing an expression of the user in the frame image (120); and recognizing a learning state of the user in the frame image in combination with the expression (130). By means of first recognizing an expression and then determining the learning state of the user in a class in combination with the expression, the method realizes accurate recognition of the learning state of the user in the class, thereby avoiding confusion and misjudgment of the learning state caused by the expression, and improving the recognition accuracy.

Description

Method, Apparatus and Intelligent Robot for Determining a Learning State

Technical Field

The embodiments of the present application relate to the field of electronic information technology, and in particular to a method, an apparatus, and an intelligent robot for determining a learning state.

Background Art

Education is a purposeful, organized, planned, and systematic social activity for imparting knowledge, technical norms, and the like, and it is an important means by which people acquire knowledge and master skills. Within education, classroom teaching is the most basic and most important form of teaching.

At present, in order to improve the quality of classroom teaching, classroom teaching is usually evaluated. The quality of classroom teaching is assessed mainly along two dimensions: students' mastery of classroom knowledge and students' learning state in class.

In the process of implementing the present invention, the inventor found that, at present, students' learning state in class is mainly assessed through manual observation or camera monitoring. The feedback data obtained in this way takes the form: within the first 5 minutes of class, student A listens attentively; from the 5th to the 10th minute of class, student A is distracted; and so on. Such a method cannot reliably judge students' learning state in class, and confusion and misjudgment may occur.
Summary of the Invention

The technical problem mainly solved by the embodiments of the present invention is to provide a method, an apparatus, and an intelligent robot for determining the learning state, which can improve the accuracy of judging the user's learning state.

The purpose of the embodiments of the present invention is achieved through the following technical solutions:

To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a method for determining the learning state, including:

obtaining frame images from the user's class video;

identifying the user's expression in the frame image;

recognizing, in combination with the expression, the learning state of the user in the frame image.
In some embodiments, the expressions include happy, confused, exhausted, and neutral, and the learning state includes a focused state and a distracted state;

the step of recognizing the learning state of the user in the frame image in combination with the expression specifically includes:

judging whether the expression is exhausted;

if not, obtaining a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression;

comparing the frame image with the focus reference picture to obtain a first matching degree;

judging whether the first matching degree is greater than or equal to a first preset threshold;

if so, marking the frame image as a focused-state image;

if not, comparing the frame image with the distraction reference picture to obtain a second matching degree;

judging whether the second matching degree is greater than or equal to a second preset threshold;

if so, marking the frame image as a distracted-state image.
In some embodiments, the step of recognizing the learning state of the user in the frame image in combination with the expression further includes:

if the expression is exhausted, detecting the user's heart rate;

judging whether the heart rate is greater than or equal to a third preset threshold;

if so, marking the frame image as a focused-state image;

if not, marking the frame image as a distracted-state image.
In some embodiments, the step of recognizing the learning state of the user in the frame image in combination with the expression specifically includes:

extracting geometric features of each facial organ from the frame image;

determining, according to the geometric features of each facial organ in combination with a preset classification algorithm model, whether the user's learning state is a focused state or a distracted state.

In some embodiments, after the step of recognizing the learning state of the user in the frame image in combination with the expression, the method further includes: determining the user's focus time according to the user's learning state.

In some embodiments, the step of determining the user's focus time according to the user's learning state specifically includes:

obtaining the recording times of the focused-state images;

computing statistics over the recording times of the focused-state images to obtain the user's focus time.
To solve the above technical problem, in a second aspect, an embodiment of the present invention provides a learning state determination apparatus, including:

an acquisition module, configured to acquire frame images from the user's class video;

a first recognition module, configured to recognize the user's expression in the frame image;

a second recognition module, configured to recognize, in combination with the expression, the learning state of the user in the frame image.

In some embodiments, the apparatus further includes: a determination module, configured to determine the user's focus time according to the user's learning state.

To solve the above technical problem, in a third aspect, an embodiment of the present invention provides an intelligent robot, including:

an image acquisition module, configured to collect the class video of the user during class;

at least one processor connected to the image acquisition module; and

a memory communicatively connected with the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described in the first aspect above.

To solve the above technical problem, in a fourth aspect, embodiments of the present invention provide a computer program product containing program code; when the computer program product runs on an electronic device, the electronic device is caused to execute the method described in the first aspect above.

Beneficial effects of the embodiments of the present invention: in contrast with the prior art, the embodiments of the present invention provide a method, an apparatus, and an intelligent robot for determining the learning state. The method obtains frame images from the user's class video, recognizes the user's expression in the frame images, and recognizes the user's learning state in the frame images in combination with the expression. Because the same learning state presents differently under different expressions, performing expression recognition first and then judging the user's learning state in class in combination with the expression enables accurate recognition of the user's learning state in class, avoids the confusion and misjudgment that expressions cause in judging the learning state, and improves the accuracy of judging the learning state in class.
Description of the Drawings

One or more embodiments are exemplified by the figures in the corresponding drawings. These exemplary descriptions do not constitute a limitation on the embodiments; elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.

FIG. 1 is a schematic diagram of an application environment of an embodiment of a method for determining a learning state according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for determining a learning state according to an embodiment of the present invention;

FIG. 3 is a sub-flowchart of step 130 in the method shown in FIG. 2;

FIG. 4 is another sub-flowchart of step 130 in the method shown in FIG. 2;

FIG. 5 is a flowchart of a method for determining a learning state according to another embodiment of the present invention;

FIG. 6 is a sub-flowchart of step 140 in the method shown in FIG. 5;

FIG. 7 is a schematic structural diagram of a learning state determination apparatus provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of the hardware structure of an intelligent robot that executes the above method for determining a learning state, provided by an embodiment of the present invention.
Detailed Description

The present invention is described in detail below in conjunction with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be pointed out that, for those of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of the present invention; these all fall within the protection scope of the present invention.

In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit it.

It should be noted that, if there is no conflict, the features in the embodiments of the present invention can be combined with each other, all within the protection scope of the present application. In addition, although functional modules are divided in the schematic diagrams of the apparatus and a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed with a module division different from that in the apparatus, or in an order different from that in the flowcharts. Moreover, the words "first" and "second" used herein do not limit data or execution order, but only distinguish identical or similar items whose functions and effects are basically the same.

Unless otherwise defined, all technical and scientific terms used in this specification have the same meanings as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification of the present invention are only for the purpose of describing specific embodiments and are not used to limit the present invention. The term "and/or" used in this specification includes any and all combinations of one or more of the associated listed items.
Please refer to FIG. 1, which is a schematic diagram of an application environment of an embodiment of the method for determining a learning state of the present invention. The system includes a server 10 and a camera 20.

The server 10 and the camera 20 are communicatively connected. The communication connection may be wired, for example an optical fiber cable, or wireless, for example a WiFi connection, a Bluetooth connection, a 4G wireless communication connection, a 5G wireless communication connection, and so on.

The camera 20 is a device capable of recording video, for example a mobile phone with a shooting function, a video recorder, or a camera.

The server 10 is a device that can run according to a program and process massive amounts of data automatically and at high speed; it is usually composed of a hardware system and a software system, for example a computer, a smart phone, and so on. The server 10 may be a local device directly connected to the camera 20, or it may be a cloud device, for example a cloud server, a cloud host, a cloud service platform, or a cloud computing platform. The cloud device is connected to the camera 20 through a network, and the two communicate through a predetermined communication protocol; in some embodiments, the communication protocol may be TCP/IP, NETBEUI, IPX/SPX, or the like.

It is understandable that the server 10 and the camera 20 can also be integrated together as an integrated device, or the camera 20 and the server 10 can be integrated on an intelligent robot as components of the intelligent robot. The intelligent robot or camera can be installed in a classroom or in any learning place where the user is located; for example, for Internet education, it can be installed in the user's home or another learning place. The intelligent robot or camera collects the user's class video, and the user's learning state during class is determined based on the class video.

In some specific application scenarios, such as the now-popular Internet education, users can learn real-time live courses from teachers of various subjects at home through a computer; in this case, the camera may be the camera configured at the front of the computer. In this way of teaching, the teacher and the user are not face to face, so the teacher cannot obtain feedback on the user's learning state and cannot judge the user's learning state well and accurately.
本发明实施例提供了一种应用于上述应用环境的学习状态判断的方法,该方法可被上述服务器10执行,请参阅图2,该方法包括:The embodiment of the present invention provides a method for judging the learning state applied to the above-mentioned application environment. The method can be executed by the above-mentioned server 10. Please refer to FIG. 2. The method includes:
步骤110:从用户的上课视频中获取帧图像。Step 110: Obtain frame images from the user's lesson video.
上课视频是指用户在听课时的图像集,其包含用户的若干个正脸图像。而上课视频可由设置于教室内或用户其他学习场所内的摄像机20采集到的,摄像机20可完整的获取用户的脸部图像信息,例如:将摄像机20设置于黑板边上,并且将摄像头的取景方向正对教室,则可采集到用户在上课时的上课视频;例如在互联网教育中,摄像头置于电脑上方或者摄像头为电脑的内置摄像头,可采集到用户在上课时的脸部图像信息。The class video refers to the image collection of the user during the class, which contains several frontal images of the user. The class video can be collected by the camera 20 installed in the classroom or other learning places of the user. The camera 20 can fully obtain the user's facial image information. For example, the camera 20 is set on the edge of the blackboard, and the camera is set to view the view. Directly facing the classroom, you can collect the user's class video during class; for example, in Internet education, the camera is placed above the computer or the camera is a built-in camera of the computer, and the user's facial image information during class can be collected.
Step 120: Recognize the user's expression in the frame image.
During class, the user shows different expressions depending on the lesson content or the influence of nearby classmates, and under different expressions the content presented by the user's face differs; conversely, the user's expression can be determined from what the user's face presents. Recognizing the user's expression in each frame image of the class video specifically includes the following steps: 1. face image extraction; 2. expression feature extraction; 3. expression classification. The face image can be extracted from the frame image with an existing image extraction algorithm. Expression feature extraction can be based on the geometric feature method, which extracts expression features from the shape and position of the facial organs. Expression classification can be based on a random forest algorithm, an expression-feature dimensionality reduction method, an SVM multi-class model, or a neural network algorithm, classifying the extracted expression features to determine the user's expression.
In some embodiments, to improve the accuracy of expression feature extraction, the size and gray level of the face image can also be normalized before the features are extracted, which improves face image quality and removes noise. A minimal sketch of this pipeline is given below.
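The following sketch illustrates the three steps plus the optional normalization, using OpenCV's Haar-cascade face detector. For brevity, the expression features are simplified to normalized pixel values rather than true geometric features, and `expression_clf` is assumed to be any classifier (SVM, random forest, etc.) already trained elsewhere; none of these names are prescribed by the present embodiments.

```python
# A minimal sketch, not the claimed method itself: face extraction,
# normalization, feature extraction, and expression classification.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face(frame):
    """Step 1: locate and crop the largest face in the frame image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return gray[y:y + h, x:x + w]

def normalize_face(face, size=(64, 64)):
    """Normalize size and gray level, as suggested above."""
    return cv2.equalizeHist(cv2.resize(face, size))

def classify_expression(frame, expression_clf):
    """Steps 2-3: extract features and map them to an expression label."""
    face = extract_face(frame)
    if face is None:
        return None
    features = normalize_face(face).flatten().astype(np.float32) / 255.0
    return expression_clf.predict([features])[0]  # e.g. "happy", "tired"
```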
Step 130: Recognize the learning state of the user in the frame image in combination with the expression.
The learning state includes a distracted state and a focused state, and it reflects the user's state during class. However, the same learning state manifests differently under different expressions; therefore, recognizing the expression first and then recognizing the user's learning state in combination with the expression can improve recognition accuracy.
The user's face shows different expressions during class, such as happy, confused, sad, and neutral. Under any of these expressions the learning state can still differ. For example, expression recognition may determine that the user's current expression is happy, yet "happy" can be subdivided into two different learning states: "happy because a knowledge point was understood" and "happy while planning a weekend trip". Judging the user's learning state in class is therefore inevitably affected by the user's expression. Combining the expression with the frame image of the user in class for the judgment can effectively avoid the influence of the user's expression on the judgment of the learning state and improve the accuracy of judging the user's learning state in class.
In the embodiment of the present invention, the user's class video is collected, and the user's learning state is then recognized in combination with the expression. The same learning state manifests differently under different expressions; therefore, performing expression recognition first and then judging the learning state can improve the accuracy of the judgment.
Specifically, in some embodiments, the expressions include happy, confused, tired, and neutral, and the learning state includes a focused state and a distracted state. When the expression is happy, confused, or neutral, the distinguishing facial features are obvious, and the user's learning state can be accurately recognized by image comparison. Referring to FIG. 3, step 130 specifically includes:
Step 131: Determine whether the expression is tired; if not, perform step 132; if so, perform step 139.
Step 132: Obtain the pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression.
The focus reference picture is a picture of the user in the focused state under the expression, and the distraction reference picture is a picture of the user in the distracted state under the expression. Both can be collected by manually screening the frame images of the class video.
It is worth noting that for the same user, the focus reference picture and the distraction reference picture differ from one expression to another. Naturally, because different users look different, their focus reference pictures and distraction reference pictures also differ between users.
Step 133: Compare the frame image with the focus reference picture to obtain a first matching degree.
Step 134: Determine whether the first matching degree is greater than or equal to a first preset threshold; if so, perform step 135; otherwise, perform step 136.
Step 135: Mark the frame image as a focused-state image.
When the first matching degree is greater than or equal to the first preset threshold, the user's facial image is highly similar to the focus reference picture, and the user can be considered to be in a focused state at that moment.
It should be noted that the specific value of the first preset threshold can be determined through repeated experiments, and the first preset threshold can be set to different values for different users.
Step 136: Compare the frame image with the distraction reference picture to obtain a second matching degree.
Step 137: Determine whether the second matching degree is greater than or equal to a second preset threshold; if so, perform step 138.
When the second matching degree is greater than or equal to the second preset threshold, the facial image is highly similar to the distraction reference picture, and it can be determined that the user is in a distracted state.
The specific value of the second preset threshold can likewise be determined through repeated experiments, and the second preset threshold can also be set to different values for different users.
Step 138: Mark the frame image as a distracted-state image. A sketch of this comparison flow is given below.
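The embodiments above do not fix a particular matching metric; as one possibility, the sketch below scores similarity by grayscale histogram correlation and applies the two thresholds of steps 134 and 137. The threshold values and the fallback label are illustrative assumptions.

```python
# A sketch of steps 132-138 under an assumed histogram-correlation metric.
import cv2

def match_degree(face_img, reference_img):
    """Similarity of two grayscale face crops via histogram correlation."""
    h1 = cv2.calcHist([face_img], [0], None, [64], [0, 256])
    h2 = cv2.calcHist([reference_img], [0], None, [64], [0, 256])
    cv2.normalize(h1, h1)
    cv2.normalize(h2, h2)
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)

def label_frame(face_img, focus_ref, distract_ref, t1=0.8, t2=0.8):
    if match_degree(face_img, focus_ref) >= t1:       # steps 133-135
        return "focused"
    if match_degree(face_img, distract_ref) >= t2:    # steps 136-138
        return "distracted"
    return "undetermined"  # the embodiments leave this branch open
```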
In some embodiments, when the expression is tired, the user's learning state is determined by detecting the heart rate, which specifically includes:
Step 139: Detect the user's heart rate.
The user's heart rate can be detected by an image-based heart-rate method. Specifically, the face detector provided by OpenCV is used to detect the face region of the user in the frame image and record the region's location; the face-region image is then separated into the three RGB channels, and the mean gray value within the region is calculated for each channel, yielding three R, G, and B signals that vary over time. Finally, independent component analysis is performed on the R, G, and B signals to obtain the user's heart rate, as sketched below.
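A rough sketch of this image-based method follows: per-frame channel means over the face region, independent component analysis of the three traces, and a spectral peak read off the strongest component. The window length, frequency band, and component-selection rule are assumptions, not details taken from the present embodiments.

```python
# A minimal sketch, assuming a buffer of face-region crops spanning
# several seconds of video at a known frame rate `fps`.
import numpy as np
from sklearn.decomposition import FastICA

def estimate_heart_rate(face_rois, fps):
    """face_rois: list of BGR face-region images from consecutive frames."""
    # One (B, G, R) mean triple per frame -> array of shape (n_frames, 3).
    traces = np.array([roi.reshape(-1, 3).mean(axis=0) for roi in face_rois])
    traces -= traces.mean(axis=0)                   # crude detrending
    sources = FastICA(n_components=3).fit_transform(traces)
    freqs = np.fft.rfftfreq(len(sources), d=1.0 / fps)
    band = (freqs >= 0.75) & (freqs <= 3.0)         # 45-180 beats per minute
    best_bpm, best_power = 0.0, 0.0
    for k in range(3):                              # pick the strongest peak
        spectrum = np.abs(np.fft.rfft(sources[:, k])) ** 2
        i = int(np.argmax(spectrum[band]))
        if spectrum[band][i] > best_power:
            best_power, best_bpm = spectrum[band][i], freqs[band][i] * 60.0
    return best_bpm
```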
Step 1310: Determine whether the heart rate is greater than or equal to a third preset threshold; if so, perform step 1311; otherwise, perform step 1312.
Step 1311: Mark the frame image as a focused-state image.
Step 1312: Mark the frame image as a distracted-state image.
In the embodiment of the present invention, whether the expression is tired is determined first; if it is, the user's heart rate is detected to judge whether the user's learning state is the focused state or the distracted state. When a person's expression is tired, the facial features differ only slightly, so the distinguishing features between the focus reference picture and the distraction reference picture under the tired expression are not significant. When the frame image is compared with these two reference pictures, the first matching degree and the second matching degree may turn out to be close; or the learning state represented by the frame image may actually be the distracted state, yet, because the two reference pictures are hard to tell apart, the first matching degree exceeds the first preset threshold during comparison, the learning state represented by the frame image is judged to be the focused state, and a misjudgment occurs. This would reduce the accuracy of judging the user's learning state in class. Therefore, when the expression is tired, the method of detecting the user's heart rate is adopted to improve the accuracy of judging the user's learning state in class. When the user's heart rate is greater than or equal to the third preset threshold, the user's brain activity is relatively high, and the user's learning state at that moment can be judged to be the focused state; otherwise, it is the distracted state.
In other embodiments, because the content presented by the user's face differs across learning states, the learning state can also be determined by recognizing the user's facial features. Referring to FIG. 4, step 130 specifically includes:
Step 131a: Extract the geometric features of each facial organ from the frame image.
The geometric features characterize the shape, size, and relative distances of the facial organs, and they can be extracted from the face image with an existing image extraction algorithm. In some embodiments, the Face++ function library can be used to extract the geometric features of the face image.
Step 132a: Determine, according to the geometric features of each facial organ and in combination with a preset classification algorithm model, whether the user's learning state is the focused state or the distracted state.
The preset classification algorithm model can invoke an existing classification algorithm, such as a logistic regression algorithm, a random forest algorithm, an expression-feature dimensionality reduction method, an SVM multi-class model, or a neural network algorithm.
The same learning state manifests differently under different expressions. Therefore, the expression is recognized first, and a classification algorithm model is then built for each expression; each model fits its corresponding expression as closely as possible, which improves recognition accuracy. The sketch after this paragraph illustrates the per-expression selection.
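As an illustration of this per-expression design, the hypothetical sketch below selects a classifier by the expression label from step 120 and applies it to the geometric features from step 131a; `models_by_expression` stands for the per-expression models whose training is described next.

```python
# Hypothetical sketch: one trained classifier per expression. Labels
# follow the convention below: 1 means focused, 0 means distracted.
def judge_state(expression, geometric_features, models_by_expression):
    clf = models_by_expression[expression]  # e.g. keys "happy", "sad", ...
    return "focused" if clf.predict([geometric_features])[0] == 1 else "distracted"
```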
Specifically, the preset classification algorithm model is established in advance, and the establishment process specifically includes:
Step (1): Obtain the geometric features and label data of the face training sample set under each expression of the user.
The face training sample set is a set of face images, usually historical data with known outcomes selected through manual review. The label data characterize the learning state of each face training sample and are numericalized as 1 and 0, where 1 indicates that the user is in the focused state and 0 indicates that the user is in the distracted state.
Step (2): Use the geometric features and label data of the face training sample set to train the initial classification algorithm model, obtain the feature coefficients, and substitute the feature coefficients into the corresponding initial classification algorithm model to obtain the preset classification algorithm model.
The specific values of the feature coefficients in the initial classification algorithm model are initially unknown; they are learned from the face training sample set of each corresponding expression and can effectively fit the geometric features of that sample set, so that the learning state under each expression can be judged accurately.
Specifically, the above step (2) specifically includes:
Step ①: Divide the geometric features of the face training sample set under each expression of the user into five feature blocks: a mouth geometric feature block, an eye geometric feature block, an eyebrow geometric feature block, a face-contour geometric feature block, and a line-of-sight geometric feature block.
Raw geometric features have a high dimensionality and correspondingly many feature weight coefficients, which makes computation heavy and inaccurate and hinders later modeling and calculation. We know that when manually judging whether a user is focused or distracted, the judgment rests mainly on the user's mouth, eyes, eyebrows, contour, and gaze direction. For example, if the user's eyebrows are slightly raised, the eyes are wide open with a larger gap between the upper and lower eyelids, the mouth is naturally closed, the gaze is fixed ahead, and the face contour appears larger, then the user is in a focused state. Therefore, dividing the facial geometric features into mouth, eye, eyebrow, face-contour, and line-of-sight feature blocks improves modeling efficiency and model recognition accuracy.
Step ②: Use the five feature blocks and label data of the face training sample set to train the initial logistic regression model, obtain five feature-block coefficients, and substitute the five feature-block coefficients into the initial logistic regression model to obtain the preset logistic regression model.
Logistic regression is a generalized linear regression: a sigmoid function is added on top of linear regression for nonlinear mapping, so that continuous values can be mapped onto 0 and 1. Determining the logistic regression model means that, in machine learning, the logistic regression model is selected as the binary classification model, as the formula below makes explicit.
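Written out, the model is the standard logistic regression; here x_1, ..., x_5 denote scores derived from the five feature blocks and w_1, ..., w_5 the feature-block coefficients of step ②. Treating each block as a single scalar is a simplification for illustration.

```latex
p(\text{focused} \mid x) = \sigma\Big(b + \sum_{j=1}^{5} w_j x_j\Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```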
After the label data and the five feature blocks under each expression are numericalized and normalized, they take the data format required for model learning. The initial logistic regression model corresponding to each expression is then trained to obtain the five feature-block coefficients under that expression, and substituting those coefficients into the initial logistic regression model corresponding to the expression yields the preset logistic regression model under each expression. A minimal training sketch follows.
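The sketch below assumes the preprocessing just described has already produced, for each expression, a matrix X with one row of concatenated feature blocks per sample and a label vector y of 1 (focused) / 0 (distracted); scikit-learn is used as a stand-in for the initial logistic regression model.

```python
# A minimal sketch, assuming numericalized and normalized inputs. One
# logistic regression is fitted per expression so that the mouth, eye,
# eyebrow, contour and gaze blocks can carry different coefficients
# under each expression.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_models(samples_by_expression):
    """samples_by_expression: {"happy": (X, y), "sad": (X, y), ...}."""
    models = {}
    for expression, (X, y) in samples_by_expression.items():
        models[expression] = make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
    return models
```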
Under different expressions, the same facial-organ feature reflects the user's learning state to different degrees. For example, when happy, the mouth opens and turns up; when sad, it closes and turns down; while for learning-state recognition, the mouth is presumed to be naturally closed when the user is focused. If the same algorithm model were used to compute the learning state for the two different expressions happy and sad, with, for example, identical mouth-feature weights, misjudgments would result: a user with a happy expression would more easily be recognized as distracted, yet some users may be happy because they have understood a knowledge point, while others may be happy because their mind has wandered. Using different recognition models for users under happy expressions and users under sad expressions can improve accuracy.
Therefore, the user's expressions are classified first, and a separate logistic regression model is then determined under each expression category, so that for different expressions there are logistic regression models that are individually adapted and can fit effectively, improving recognition accuracy.
As shown in FIG. 5, in some embodiments, the method further includes, after step 130:
Step 140: Determine the user's focus time according to the user's learning state.
The focus time is the time during which the user is in the focused state in class. After the user's focus time is determined, class times and class lengths can be matched to it, so that the class time and class length suit the user, enabling personalized education and achieving the effect of teaching according to both aptitude and state.
Further, when there are multiple users, classes can also be grouped based on the focus times of the multiple users. For example, users who share the same focus period are gathered into one class, and the lesson length is determined by the users' focus duration, ensuring that the users of each class have the highest concentration during the lesson and improving overall teaching quality.
In some embodiments, as shown in FIG. 6, step 140 specifically includes:
Step 141: Obtain the recording times of the focused-state images.
The recording time is the time at which the image was captured.
Step 142: Aggregate the recording times of the focused-state images to obtain the user's focus time.
A focused-state image is an image in which the user is in the focused state at the corresponding recording time. When the recording times of focused-state images that are consecutively related are aggregated, the result indicates that the user remained focused throughout that period, and that period is the user's focus time. "Consecutively related" means that the focused-state images are consecutive frames in the class video.
Further, after the user's focus time is determined, the specific periods during which the user is focused, as well as the duration of the user's focused state, can be accurately determined, and class times and class lengths can be matched to the user on that basis; a sketch of this aggregation is given below. For example, if some students stay focused for 30 minutes at a time and their focus periods fall between 8 a.m. and 11 a.m., these users are grouped into one class that is taught between 8 a.m. and 11 a.m. in single 40-minute sessions with 10-minute breaks in between. If other users stay focused for 40 minutes at a time and their focus periods fall between 10 a.m. and 12 noon, these users are grouped into one class that is taught between 10 a.m. and 12 noon in single 50-minute sessions with 10-minute breaks in between.
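One plausible reading of steps 141-142, with the gap tolerance as an assumption, is to group the frames marked focused into consecutive runs by their recording timestamps and sum the run lengths:

```python
# A sketch: group "focused" frames into consecutive runs and report the
# resulting intervals; total focus time is the sum of their lengths.
def focus_intervals(frames, max_gap=1.0):
    """frames: list of (timestamp_in_seconds, state) sorted by time;
    max_gap is the largest spacing still counted as consecutive."""
    intervals, start, prev = [], None, None
    for t, state in frames:
        if state == "focused":
            if start is None:
                start = t
            elif t - prev > max_gap:          # run broken by a gap
                intervals.append((start, prev))
                start = t
            prev = t
        elif start is not None:               # run ended by a distracted frame
            intervals.append((start, prev))
            start = None
    if start is not None:
        intervals.append((start, prev))
    return intervals  # total focus time = sum(end - begin for each interval)
```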
An embodiment of the present invention further provides a learning state judging apparatus. Referring to FIG. 7, which shows the structure of a learning state judging apparatus provided in an embodiment of the present application, the learning state judging apparatus 200 includes an acquisition module 210, a first recognition module 220, and a second recognition module 230.
The acquisition module 210 is configured to obtain frame images from the user's class video. The first recognition module 220 is configured to recognize the user's expression in the frame image. The second recognition module 230 is configured to recognize the learning state of the user in the frame image in combination with the expression. The learning state judging apparatus 200 provided by the embodiment of the present invention can judge the user's learning state more accurately.
In some embodiments, referring to FIG. 7, the learning state judging apparatus 200 further includes a determining module 240 configured to determine the user's focus time according to the user's learning state.
In some embodiments, the expressions include happy, confused, tired, and neutral, and the learning state includes a focused state and a distracted state. The second recognition module 230 is further configured to: determine whether the expression is tired; if not, obtain the pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression; compare the frame image with the focus reference picture to obtain a first matching degree; determine whether the first matching degree is greater than or equal to a first preset threshold; if so, mark the frame image as a focused-state image; if it is less than the first preset threshold, compare the frame image with the distraction reference picture to obtain a second matching degree; determine whether the second matching degree is greater than or equal to a second preset threshold; and if so, mark the frame image as a distracted-state image.
In some embodiments, the second recognition module 230 is further configured to: detect the user's heart rate when the expression is tired; determine whether the heart rate is greater than or equal to a third preset threshold; if so, mark the frame image as a focused-state image; and if it is less than the third preset threshold, mark the frame image as a distracted-state image.
In some embodiments, the second recognition module 230 further includes an extraction unit (not shown) and a recognition unit (not shown). The extraction unit extracts the geometric features of each facial organ from the frame image. The recognition unit is configured to determine, according to the geometric features of each facial organ and in combination with a preset classification algorithm model, whether the user's learning state is the focused state or the distracted state.
In some embodiments, the determining module 240 further includes a first acquisition unit (not shown) and a statistics unit (not shown). The first acquisition unit is configured to obtain the recording times of the focused-state images. The statistics unit is configured to aggregate the recording times of the focused-state images to obtain the user's focus time.
In the embodiment of the present invention, the learning state judging apparatus 200 obtains frame images from the user's class video through the acquisition module 210, the first recognition module 220 recognizes the user's expression in the frame image, and the second recognition module 230 then recognizes the learning state of the user in the frame image in combination with the expression. The same learning state manifests differently under different expressions; performing expression recognition first and then judging the learning state can improve the accuracy of learning-state recognition and thereby ensure the accuracy of detecting the user's focus time.
An embodiment of the present invention further provides an intelligent robot 300. Referring to FIG. 8, the intelligent robot 300 includes: an image acquisition module 310, configured to collect the user's class video; at least one processor 320, connected to the image acquisition module 310; and a memory 330 communicatively connected to the at least one processor 320. FIG. 8 takes one processor 320 as an example.
The memory 330 stores instructions executable by the at least one processor 320, and the instructions are executed by the at least one processor 320 to enable the at least one processor 320 to perform the method for judging the learning state described above with reference to FIGS. 2 to 6. The processor 320 and the memory 330 may be connected by a bus or in other ways; FIG. 8 takes a bus connection as an example.
As a non-volatile computer-readable storage medium, the memory 330 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules of the method for judging the learning state in the embodiments of the present application, for example the modules shown in FIG. 7. By running the non-volatile software programs, instructions, and modules stored in the memory 330, the processor 320 executes the various functional applications and data processing of the server, that is, implements the method for judging the learning state of the above method embodiments.
The memory 330 may include a program storage area and a data storage area, where the program storage area can store the operating system and the application required by at least one function, and the data storage area can store data created according to the use of the learning state judging apparatus, and the like. In addition, the memory 330 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 330 may optionally include memories located remotely relative to the processor 320, and these remote memories may be connected to the learning state judging apparatus through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 330 and, when executed by the one or more processors 320, perform the method for judging the learning state in any of the above method embodiments, for example performing the method steps of FIGS. 2 to 6 described above and implementing the functions of the modules in FIG. 7.
The above product can perform the method provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described exhaustively in this embodiment, refer to the method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer program product containing program code which, when the computer program product is run on an electronic device, causes the electronic device to perform the method for judging the learning state in any of the above method embodiments, for example performing the method steps of FIGS. 2 to 6 described above and implementing the functions of the modules in FIG. 7.
Beneficial effects of the embodiments of the present invention: unlike the prior art, the embodiments of the present invention provide a method and apparatus for judging the learning state, and an intelligent robot. The method obtains frame images from the user's class video, recognizes the user's expression in the frame image, and, in combination with the expression, recognizes the learning state of the user in the frame image. Because the learning state presents differently under different expressions, performing expression recognition first and then judging the user's learning state in class in combination with the expression achieves accurate recognition of the user's learning state in class, avoids the confusion and misjudgment that expressions cause in learning-state judgment, and improves the accuracy of judging the learning state in class.
In some specific application scenarios, such as the now-popular Internet education, users can attend real-time live courses given by teachers of various subjects at home through a computer. In this mode of teaching, however, the teacher and the user are not face to face, and this prevents the teacher from judging the students' learning state well. Here, by applying the method for judging the learning state of the present invention, the user's in-class expression can be recognized through the computer's camera, and the user's learning state can be judged in combination with the expression. The teacher then receives summary information on the user's learning state, for example the distribution of the periods during which the user is in the focused state and the duration for which the user remains focused in the corresponding course. This method not only improves the accuracy of judging the user's learning state; through the feedback data, the teacher can also better tailor the teaching method to the user and improve the user's learning efficiency.
It should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
From the description of the above implementations, a person of ordinary skill in the art can clearly understand that each implementation can be realized by means of software plus a general-purpose hardware platform, and of course also by hardware. A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Under the idea of the present invention, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the present invention as described above, which, for brevity, are not provided in detail. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of the technical features thereof may be equivalently replaced, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for judging the learning state, characterized in that it comprises:
    obtaining frame images from the user's class video;
    recognizing the user's expression in the frame image;
    recognizing, in combination with the expression, the learning state of the user in the frame image.
2. The method according to claim 1, characterized in that the expressions comprise happy, confused, tired, and neutral, the learning state comprises a focused state and a distracted state, and the step of recognizing, in combination with the expression, the learning state of the user in the frame image specifically comprises:
    determining whether the expression is tired;
    if not, obtaining a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression;
    comparing the frame image with the focus reference picture to obtain a first matching degree;
    determining whether the first matching degree is greater than or equal to a first preset threshold;
    if it is greater than or equal to the first preset threshold, marking the frame image as a focused-state image;
    if it is less than the first preset threshold, comparing the frame image with the distraction reference picture to obtain a second matching degree;
    determining whether the second matching degree is greater than or equal to a second preset threshold;
    if it is greater than or equal to the second preset threshold, marking the frame image as a distracted-state image.
3. The method according to claim 2, characterized in that the step of recognizing, in combination with the expression, the learning state of the user in the frame image further comprises:
    if so, detecting the user's heart rate;
    determining whether the heart rate is greater than or equal to a third preset threshold;
    if it is greater than or equal to the third preset threshold, marking the frame image as a focused-state image;
    if it is less than the third preset threshold, marking the frame image as a distracted-state image.
4. The method according to claim 1, characterized in that the step of recognizing, in combination with the expression, the learning state of the user in the frame image specifically comprises:
    extracting the geometric features of each facial organ from the frame image;
    determining, according to the geometric features of each facial organ and in combination with a preset classification algorithm model, whether the user's learning state is the focused state or the distracted state.
5. The method according to any one of claims 2 to 4, characterized in that:
    after the step of recognizing, in combination with the expression, the learning state of the user in the frame image, the method further comprises: determining the user's focus time according to the user's learning state.
6. The method according to claim 5, characterized in that the step of determining the user's focus time according to the user's learning state specifically comprises:
    obtaining the recording times of the focused-state images;
    aggregating the recording times of the focused-state images to obtain the user's focus time.
7. A learning state judging apparatus, characterized in that it comprises:
    an acquisition module, configured to obtain frame images from the user's class video;
    a first recognition module, configured to recognize the user's expression in the frame image;
    a second recognition module, configured to recognize, in combination with the expression, the learning state of the user in the frame image.
8. The learning state judging apparatus according to claim 7, characterized in that it further comprises:
    a determining module, configured to determine the user's focus time according to the user's learning state.
9. An intelligent robot, characterized in that it comprises:
    an image acquisition module, configured to collect the user's class video;
    at least one processor, connected to the image acquisition module; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
10. A computer program product containing program code, characterized in that, when the computer program product is run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 6.