WO2021077382A1 - 一种学习状态的判断方法、装置及智能机器人 - Google Patents

一种学习状态的判断方法、装置及智能机器人 (Method and device for judging a learning state, and intelligent robot)

Info

Publication number
WO2021077382A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
state
expression
frame image
learning state
Prior art date
Application number
PCT/CN2019/113169
Other languages
English (en)
French (fr)
Inventor
黄巍伟
郑小刚
王国栋
Original Assignee
中新智擎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中新智擎科技有限公司 filed Critical 中新智擎科技有限公司
Priority to PCT/CN2019/113169 priority Critical patent/WO2021077382A1/zh
Priority to CN201980002118.9A priority patent/CN110945522B/zh
Publication of WO2021077382A1 publication Critical patent/WO2021077382A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance

Definitions

  • the embodiments of the present application relate to the field of electronic information technology, and in particular to a method, device and intelligent robot for judging the learning state.
  • at present, in order to improve the quality of classroom teaching, classroom teaching is usually evaluated, and the evaluation is mainly carried out along two dimensions: the students' mastery of classroom knowledge and the students' learning state in class.
  • in the course of implementing the present invention, the inventor found that the learning state of students in class is currently assessed mainly through manual observation or camera monitoring.
  • the feedback data obtained in this way is limited to observations such as: during the first 5 minutes of class, student A listens attentively; from the 5th to the 10th minute of class, student A's mind wanders; and so on. Such an approach cannot reliably judge the learning state of students in class, and confusion and misjudgment may occur.
  • the technical problem mainly solved by the embodiments of the present invention is to provide a method, device and intelligent robot for judging the learning state, which can improve the accuracy of judging the learning state of the user.
  • in order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a method for judging the learning state, including:
  • obtaining a frame image from a video of the user's class; recognizing the user's expression in the frame image; and recognizing, in combination with the expression, the learning state of the user in the frame image.
  • the facial expressions include happy, confused, exhausted, and neutral, and the learning state includes a focused state and a distracted state;
  • the step of recognizing the learning state of the user in the frame image in combination with the expression specifically includes: judging whether the expression is exhausted; if not, obtaining a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression; comparing the frame image with the focus reference picture to obtain a first matching degree; judging whether the first matching degree is greater than or equal to a first preset threshold; if it is, marking the frame image as a focused state image; if it is less than the first preset threshold, comparing the frame image with the distraction reference picture to obtain a second matching degree; judging whether the second matching degree is greater than or equal to a second preset threshold; and if it is,
  • the frame image is marked as a distracted state image.
  • the step of recognizing the learning state of the user in the frame image in combination with the expression further includes: if the expression is exhausted, detecting the user's heart rate; judging whether the heart rate is greater than or equal to a third preset threshold; if it is, marking the frame image as a focused state image; and if it is less than the third preset threshold,
  • the frame image is marked as a distracted state image.
  • the step of recognizing the learning state of the user in the frame image in combination with the expression specifically includes: extracting geometric features of each facial organ from the frame image; and determining, according to the geometric features of
  • each of the facial organs in combination with a preset classification algorithm model, whether the user's learning state is a focused state or a distracted state.
  • after the step of recognizing the learning state of the user in the frame image in combination with the expression, the method further includes: determining the concentration time of the user according to the learning state of the user.
  • the step of determining the concentration time of the user according to the learning state of the user specifically includes: obtaining the recording times of the focused state images; and aggregating the recording times of the focused state images to obtain the user's concentration time.
  • in a second aspect, an embodiment of the present invention provides a learning state judging device, including:
  • an obtaining module, used to obtain frame images from the user's class video;
  • a first recognition module, used to recognize the user's expression in the frame image;
  • a second recognition module, configured to recognize, in combination with the expression, the learning state of the user in the frame image.
  • in some embodiments, the device further includes: a determining module, configured to determine the concentration time of the user according to the learning state of the user.
  • in a third aspect, an embodiment of the present invention provides an intelligent robot, including:
  • an image acquisition module, used to collect a video of the user during class;
  • At least one processor connected to the image acquisition module; and,
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method described in the first aspect above.
  • in a fourth aspect, embodiments of the present invention provide a computer program product containing program code; when the computer program product runs on an electronic device, the electronic device is caused to execute the method described in the first aspect above.
  • beneficial effects of the embodiments of the present invention: distinguished from the prior art, the embodiments of the present invention provide a method, device, and intelligent robot for judging the learning state.
  • the method obtains a frame image from a video of the user's class, recognizes the user's expression in the frame image, and then recognizes, in combination with the expression, the learning state of the user in the frame image. Because a user's learning state presents differently under different expressions, first performing expression recognition and then judging the user's learning state in class in combination with the expression makes it possible to recognize the user's learning state in class accurately, avoids the confusion and misjudgment that expressions can cause when judging the learning state, and improves the accuracy of judging the learning state in class.
  • FIG. 1 is a schematic diagram of an application environment of an embodiment of a method for judging a learning state according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for judging learning status according to an embodiment of the present invention
  • FIG. 3 is a sub-flow chart of step 130 in the method shown in FIG. 2;
  • FIG. 4 is another sub-flow chart of step 130 in the method shown in FIG. 2;
  • FIG. 5 is a flowchart of a method for judging learning status according to another embodiment of the present invention.
  • FIG. 6 is a sub-flow chart of step 140 in the method shown in FIG. 5;
  • FIG. 7 is a schematic structural diagram of a learning state judgment device provided by an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of the hardware structure of an intelligent robot that executes the above-mentioned method for determining a learning state provided by an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of an application environment of an embodiment of the method for determining a learning state of the present invention.
  • the system includes a server 10 and a camera 20.
  • the server 10 and the camera 20 are in a communication connection.
  • the communication connection may be a wired connection, such as an optical fiber cable, or a wireless communication connection, such as a WiFi connection, a Bluetooth connection, a 4G wireless communication connection, a 5G wireless communication connection, and so on.
  • the camera 20 is a device capable of recording video, for example, a mobile phone with a shooting function, a video recorder, or a camera.
  • the server 10 is a device that can run in accordance with a program and process massive amounts of data automatically and at a high speed. It is usually composed of a hardware system and a software system, such as a computer, a smart phone, and so on.
  • the server 10 may be a local device that is directly connected to the camera 20; it may also be a cloud device, such as a cloud server, a cloud host, a cloud service platform, a cloud computing platform, etc.
  • the cloud device is connected to the camera 20 through a network, And the two communicate through a predetermined communication protocol.
  • the communication protocol may be TCP/IP, NETBEUI, and IPX/SPX.
  • it can be understood that the server 10 and the camera 20 can also be integrated together as an integrated device, or the camera 20 and the server 10 can be integrated into an intelligent robot as components of the intelligent robot.
  • Smart robots or cameras can be installed in classrooms or in any learning places where users are located. For example, for Internet education, smart robots or cameras can be installed in users' homes or other learning places. Intelligent robots or cameras collect the user's class video, and based on the class video, determine the user's learning status during class.
  • the camera may be a camera configured on the front end of the computer.
  • in this mode of teaching, the teacher and the user are not face to face, so the teacher cannot obtain feedback on the user's learning state and cannot judge the user's learning state well or accurately.
  • the embodiment of the present invention provides a method for judging the learning state applied to the above-mentioned application environment.
  • the method can be executed by the above-mentioned server 10. Please refer to FIG. 2.
  • the method includes:
  • Step 110 Obtain frame images from the user's lesson video.
  • the class video refers to the image collection of the user during the class, which contains several frontal images of the user.
  • the class video can be collected by the camera 20 installed in the classroom or other learning places of the user.
  • the camera 20 can fully obtain the user's facial image information.
  • for example, the camera 20 may be set at the edge of the blackboard with its field of view facing the classroom, so that a video of the user during class can be collected; in Internet education, for example, the camera is placed above the computer or is the computer's built-in camera, so that the user's facial image information during class can be collected.
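  • As an illustration only, a minimal Python sketch of this frame-acquisition step (step 110) is given below; it assumes the class video can be read with OpenCV, and the sampling interval is an illustrative choice rather than a value from the text.

```python
# Sample frame images (with their recording timestamps) from a class video.
import cv2

def sample_frames(video_path, every_n_seconds=1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0          # fall back if FPS is unknown
    step = max(1, int(round(fps * every_n_seconds)))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append((index / fps, frame))      # (recording time in seconds, image)
        index += 1
    cap.release()
    return frames
```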
  • Step 120 Identify the user's expression in the frame image.
  • during class, the user will present different expressions depending on the content of the class or the influence of surrounding students, and under different expressions the content presented by the user's face will differ; conversely, the user's expression can be determined from the content presented by the user's face. Recognizing the user's expression in each frame image of the class video specifically includes three steps: 1. face image extraction, 2. expression feature extraction, and 3. expression classification.
  • the face image can be extracted from the frame image using an existing image extraction algorithm.
  • expression feature extraction can be based on a geometric feature method, extracting expression features according to the shape and position of the facial organs.
  • expression classification can be based on a random forest algorithm, an expression feature dimensionality reduction method, an SVM multi-classification model, or a neural network algorithm, which classifies the extracted expression features and thereby determines the user's expression.
  • in some embodiments, to improve the precision of expression feature extraction, the size and gray level of the face image can also be normalized before feature extraction, improving face image quality and reducing noise.
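  • A minimal sketch of this three-stage pipeline (face extraction, feature extraction, classification) follows; it assumes OpenCV and a scikit-learn-style classifier, uses a Haar-cascade face detector, and uses a placeholder feature extractor standing in for whatever geometric-feature method and classifier an implementation actually uses.

```python
# Illustrative expression-recognition pipeline: extract face, normalize, featurize, classify.
import cv2
import numpy as np

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

EXPRESSIONS = ["happy", "confused", "exhausted", "neutral"]

def extract_face(frame):
    """Return a normalized 64x64 grayscale face crop, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])    # keep the largest detected face
    face = cv2.resize(gray[y:y + h, x:x + w], (64, 64))   # size normalization
    return cv2.equalizeHist(face)                         # gray-level normalization

def geometric_features(face):
    """Placeholder: a real system would measure shapes/positions of facial organs
    (e.g. via facial landmarks); here the pixel vector stands in for those features."""
    return face.astype(np.float32).ravel() / 255.0

def recognize_expression(frame, classifier):
    """classifier: any fitted model mapping a feature vector to one of EXPRESSIONS."""
    face = extract_face(frame)
    if face is None:
        return None
    return classifier.predict([geometric_features(face)])[0]
```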
  • Step 130 Identify the learning state of the user in the frame image in combination with the expression.
  • the learning state includes a state of distraction and a state of concentration, which are used to reflect the state of the user during class.
  • the same learning state presents differently under different expressions. Therefore, first recognizing the expression and then recognizing the user's learning state in combination with the expression can improve recognition accuracy.
  • the user will show different facial expressions during class, such as happy, confused, sad, neutral, etc.
  • when the user shows any of these expressions, the learning state behind it can differ. For example, if expression recognition determines that the user's expression at that moment is happy, "happy" may correspond to two different learning states: "happy because a knowledge point was understood" and "happy while planning a weekend trip". Therefore, when judging the user's learning state in class, the influence of the user's expression is unavoidable. Combining the expression with the frame image of the user in class for the judgment can effectively avoid the influence of the user's expression on the learning-state judgment and improve the accuracy of judging the user's learning state in class.
  • the class video of the user during class is collected, and the learning state of the user is recognized in combination with facial expressions.
  • the performance of the same learning state of a user under different expressions is different. Therefore, first performing expression recognition and then judging the learning state can improve the accuracy of the judgment of the learning state.
  • the expressions include happy, confused, exhausted, and neutral, and the learning state includes a focused state and a distracted state.
  • when the expression is happy, confused, or neutral, the distinguishing facial features are obvious, and the user's learning state can be accurately identified through image comparison.
  • step 130 specifically includes:
  • Step 131 Determine whether the expression is exhausted; if not, go to step 132; if so, go to step 139.
  • Step 132 Obtain a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression.
  • the focus reference picture is a picture of the user in a focused state under that expression,
  • and the distraction reference picture is a picture of the user in a distracted state under that expression.
  • the focus reference picture and the distraction reference picture can be obtained by manually screening and collecting frame images from the class video. It is worth noting that for the same user, the reference pictures differ between expressions; likewise, because different users look different, their reference pictures also differ.
  • Step 133 Compare the frame image with the focus reference picture to obtain a first matching degree.
  • Step 134 Determine whether the first matching degree is greater than or equal to a first preset threshold, if greater than or equal to the first preset threshold, perform step 135, otherwise, perform step 136;
  • Step 135 Mark the frame image as a focus state image.
  • when the first matching degree is greater than or equal to the first preset threshold, it means that the user's facial image at that moment is highly similar to the focus reference picture, and the user can be considered to be in a focused state.
  • the specific value of the first preset threshold may be determined through multiple experiments, and the first preset threshold may be set to different values according to different users.
  • Step 136 Otherwise, compare the frame image with the distraction reference picture to obtain a second matching degree.
  • Step 137 Determine whether the second matching degree is greater than or equal to a second preset threshold; if it is, perform step 138.
  • when the second matching degree is greater than or equal to the second preset threshold, it means that the facial image is highly similar to the distraction reference picture, and it can be determined that the user is in a distracted state.
  • the specific value of the second preset threshold may also be determined through multiple experiments, and it may likewise be set to different values for different users.
  • Step 138 Mark the frame image as a distracted state image.
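  • A minimal sketch of the comparison flow of steps 131 to 138 follows; it assumes the frame image and the two reference pictures have already been reduced to equal-size normalized face crops (e.g. by extract_face above), and normalized cross-correlation plus the example thresholds are illustrative stand-ins for whatever matching measure and per-user thresholds an implementation actually uses.

```python
# Label one frame image as focused/distracted by comparing it against reference pictures.
import numpy as np

def matching_degree(img_a, img_b):
    """Normalized cross-correlation in [-1, 1]; higher means more similar."""
    a = img_a.astype(np.float32).ravel(); a -= a.mean()
    b = img_b.astype(np.float32).ravel(); b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def label_frame(face, focus_ref, distract_ref, t1=0.85, t2=0.85):
    """Return 'focused', 'distracted', or None (undetermined) for one frame image."""
    if matching_degree(face, focus_ref) >= t1:        # steps 133-135
        return "focused"
    if matching_degree(face, distract_ref) >= t2:     # steps 136-138
        return "distracted"
    return None
```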
  • in some embodiments, when the expression is exhausted, the user's learning state is determined by detecting the heart rate, which specifically includes:
  • Step 139 Detect the heart rate of the user
  • the heart rate of the user can be detected by an image-based heart rate method.
  • specifically, a face detector provided by OpenCV is used to detect the user's face region in the facial image and record its location; the face region image is then separated into the three RGB channels, and the average gray value within the region is calculated for each channel, yielding three time-varying R, G, and B signals. Finally, independent component analysis is performed on the R, G, and B signals to obtain the user's heart rate.
  • Step 1310 Determine whether the heart rate is greater than or equal to the third preset threshold, if it is greater than or equal to the third preset threshold, go to step 1311, otherwise, go to step 1312;
  • Step 1311 Mark the frame image as a focused state image.
  • Step 1312 Mark the frame image as a distracted state image.
  • in the embodiment of the present invention, whether the expression is exhausted is judged first; if it is, the user's heart rate is detected to determine whether the user's learning state is a focused state or a distracted state.
  • when a person's expression is exhausted, the differences between facial features are not obvious, so the distinguishing features between the focus reference picture and the distraction reference picture under the exhausted expression are not significant. When the frame image is compared with the focus reference picture and the distraction reference picture,
  • the first matching degree may come out close to the second matching degree; or the learning state represented by the frame image may actually be a distracted state, yet because the focus reference picture and
  • the distraction reference picture are not sufficiently distinct, the first matching degree exceeds the first preset threshold during image comparison, the learning state represented by the frame image is judged to be a focused state, and a wrong judgment occurs. In such cases the accuracy of judging the user's learning state in class would be reduced. Therefore, when the expression is exhausted, the user's heart rate is detected instead, improving the accuracy of judging the user's learning state in class.
  • when the user's heart rate is greater than or equal to the third preset threshold, it indicates that the user's brain activity is relatively high at that moment, and the user's learning state can be judged to be a focused state; otherwise, the user's learning state at that moment is a distracted state.
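  • A minimal sketch of the image-based heart-rate estimate described above is given below (per-frame RGB means over the face region, independent component analysis, then the dominant frequency in a plausible pulse band). It assumes per-frame face crops and the video frame rate are available, enough consecutive frames have been collected, and the pulse-band limits are illustrative assumptions.

```python
# Estimate heart rate (beats per minute) from RGB channel means of consecutive face crops.
import numpy as np
from sklearn.decomposition import FastICA

def estimate_heart_rate(face_regions, fps):
    """face_regions: list of HxWx3 BGR face crops, one per consecutive frame."""
    # One (B, G, R) mean triple per frame -> an (n_frames, 3) signal matrix.
    signals = np.array([region.reshape(-1, 3).mean(axis=0) for region in face_regions])
    signals -= signals.mean(axis=0)                      # remove per-channel DC offset

    # Separate the three channel signals into independent components.
    sources = FastICA(n_components=3, random_state=0).fit_transform(signals)

    freqs = np.fft.rfftfreq(len(sources), d=1.0 / fps)
    band = (freqs >= 0.75) & (freqs <= 3.0)              # 45-180 beats per minute
    best_bpm, best_power = 0.0, 0.0
    for comp in sources.T:
        power = np.abs(np.fft.rfft(comp)) ** 2
        idx = np.argmax(power[band])
        if power[band][idx] > best_power:                # keep the strongest spectral peak
            best_power = power[band][idx]
            best_bpm = 60.0 * freqs[band][idx]
    return best_bpm
```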
  • in other embodiments, because the content presented by the user's face differs between learning states, the user's learning state can also be determined by recognizing facial features; referring to FIG. 4, step 130 then specifically includes:
  • Step 131a Extract geometric features of each facial organ from the frame image.
  • Geometric features include the shape, size and distance used to characterize facial organs, which can be extracted from facial images using existing image extraction algorithms.
  • the Face++ function library may be used to extract the geometric features of the face image.
  • Step 132a Determine whether the learning state of the user is in a focused state or a distracted state according to the geometric characteristics of each facial organ in combination with a preset classification algorithm model.
  • the preset classification algorithm model can call existing classification algorithms, such as logistic regression algorithm, random forest algorithm, expression feature dimensionality reduction method, SVM multi-classification model or neural network algorithm.
  • the same learning state presents differently under different expressions. Therefore, the expression is recognized first, and a separate classification algorithm model is established for each expression; since each classification algorithm model can adapt to its corresponding expression to the greatest extent, recognition accuracy can be improved.
  • the preset classification algorithm model is established in advance, and the process of establishing the preset classification algorithm model specifically includes:
  • Step (1) Obtain the geometric features and label data of the face training sample set under each expression of the user;
  • the face training sample set is a set of face images, usually historical data of known results selected by manual investigation.
  • the label data characterizes the learning state of each face training sample; it is digitized and represented by 1 and 0, where 1 indicates that the user is in a focused state and 0 indicates that the user is in a distracted state.
  • Step (2) Use the geometric features and label data of the face training sample set to learn and train the initial classification algorithm model to obtain feature coefficients, and substitute the feature coefficients into the corresponding initial classification algorithm model to obtain the preset classification algorithm model.
  • the specific values of the feature coefficients in the initial classification algorithm model are initially unknown; they are learned from the face training sample set of the corresponding expression, so that the model effectively fits the geometric features of that training sample set and can accurately judge the learning state under each expression.
  • step (2) specifically includes:
  • Step 1 For each expression of the user, separate the geometric features of the face training sample set into five feature blocks: a mouth geometric feature block, an eye geometric feature block, an eyebrow geometric feature block, a face contour geometric feature block, and a line-of-sight geometric feature block;
  • when the geometric feature dimensionality is high, there are many corresponding feature weight coefficients, and the computation is heavy and less accurate, which is not conducive to later modeling and calculation.
  • when manually judging whether a user is focused or distracted, the judgment is based mainly on the user's mouth, eyes, eyebrows, face contour, and direction of sight. For example, if the user's eyebrows are slightly raised, the eyes are widened, the distance between the upper and lower eyelids increases, the mouth is naturally closed, the line of sight gazes ahead, and the face contour appears larger, the user is in a focused state. Therefore, dividing the facial geometric features into mouth, eye, eyebrow, face contour, and line-of-sight geometric feature blocks can improve modeling efficiency and model recognition accuracy.
  • Step 2 Use the five feature blocks and label data of the face training sample set to train the initial logistic regression model to obtain five feature block coefficients, and substitute the five feature block coefficients into the initial logistic regression model to obtain the preset logistic regression model.
  • Logistic regression is a kind of generalized linear regression.
  • on top of linear regression, a sigmoid function is added to perform a nonlinear mapping that maps continuous values into the range between 0 and 1.
  • determining the logistic regression model means that, in machine learning terms, the logistic regression model is selected as the binary classification model used for modeling.
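  • As a gloss (standard notation, not a formula quoted from the text), the sigmoid and the resulting per-frame probability of the focused state over the five feature blocks can be written as:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
P(\text{focused} \mid x) = \sigma\Big(b + \sum_{k=1}^{5} w_k^{\top} x_k\Big)
```

  • where x_k denotes the k-th geometric feature block and w_k the corresponding feature block coefficients.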
  • after the label data and the five feature blocks under each expression are digitized and normalized, they are in the data format required for model training. The initial logistic regression model corresponding to each expression is then trained to obtain the five feature block coefficients under that expression, and those coefficients are substituted into the corresponding initial logistic regression model to obtain the preset logistic regression model under each expression.
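  • A minimal sketch of this per-expression training step is given below, using scikit-learn's logistic regression; the block names, feature layout, and use of a standard scaler are illustrative assumptions rather than details taken from the text.

```python
# Train one logistic regression learning-state model per expression from five feature blocks.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURE_BLOCKS = ["mouth", "eyes", "eyebrows", "face_contour", "line_of_sight"]

def train_expression_model(samples, labels):
    """samples: list of dicts mapping each block name to a 1-D feature array;
    labels: 1/0 learning-state labels (1 = focused, 0 = distracted)."""
    X = np.array([np.concatenate([s[b] for b in FEATURE_BLOCKS]) for s in samples])
    y = np.asarray(labels)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return model.fit(X, y)

def train_all_models(training_sets):
    """training_sets: dict expression -> (samples, labels); one fitted model per expression."""
    return {expr: train_expression_model(s, y) for expr, (s, y) in training_sets.items()}
```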
  • under different expressions, the same facial organ feature reflects the user's learning state to different degrees. For example, when happy, the mouth is open and turned up; when sad, the mouth is closed and turned down; while for learning-state recognition the mouth is expected to be naturally closed when the user is focused.
  • if, for the two different expressions of happy and sad,
  • the same algorithm model were used to compute the learning state, for example with the same mouth-feature weight, misjudgments would occur: a user with a happy expression would more easily be recognized as distracted, even though some users may be happy because they understood a knowledge point while others may be happy because their mind is wandering. Using different recognition models for users with happy expressions and users with sad expressions therefore improves accuracy. Accordingly, users are first classified by expression, and a separate logistic regression model is determined for each expression category, so that each expression has its own well-fitted model and recognition accuracy is improved.
  • the method further includes after step 130:
  • Step 140 Determine the concentration time of the user according to the learning state of the user.
  • concentration time refers to the time during which the user is in a focused state in class. After the user's concentration time is determined, course times and course lengths can be matched to the user on that basis, achieving personalized education that teaches students in accordance with their aptitude and their state.
  • further, when there are multiple users, classes can also be organized based on the concentration times of the multiple users, for example by gathering users with the same concentration period into one class and determining the lesson length according to the users' concentration duration, so as to ensure that the users of each class have the highest concentration during the lesson and to improve overall teaching quality.
  • step 140 specifically includes:
  • Step 141 Obtain the recording times of the focused state images.
  • the recording time is the time at which the image was recorded.
  • Step 142 Aggregate the recording times of the focused state images to obtain the user's concentration time.
  • a focused state image is an image in which the user is in a focused state at the corresponding recording time. After the recording times of focused state images that form a continuous run are aggregated, they indicate that the user was in a focused state throughout that time period, and that time period is the user's concentration time.
  • the continuous relationship means that the focused state images are consecutive frames in the class video.
  • further, after the user's concentration time is determined, the specific time periods during which the user is focused, as well as how long the user stays focused, can be determined accurately, and course times and course lengths can be matched to the user on that basis.
  • for example, some students have a single concentration span of 30 minutes and their concentration period falls between 8 a.m. and 11 a.m.; these users are grouped into one class taught between 8 a.m. and 11 a.m. in single 40-minute lessons with 10-minute breaks in between.
  • other users have a single concentration span of 40 minutes and their concentration period falls between 10 a.m. and 12 noon; these users are grouped into another class taught between 10 a.m. and 12 noon in single 50-minute lessons with 10-minute breaks.
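  • A minimal sketch of steps 141 and 142 above is given below: frames labeled as focused that are consecutive in the video are merged into intervals whose recording times give the user's concentration periods and total concentration time. The data layout (per-frame labels and timestamps) is an illustrative assumption.

```python
# Aggregate consecutive focused frames into concentration intervals and a total duration.
def focus_intervals(frame_labels, timestamps):
    """frame_labels[i] is the label of frame i; timestamps[i] its recording time in seconds."""
    intervals, start = [], None
    for i, label in enumerate(frame_labels):
        if label == "focused" and start is None:
            start = timestamps[i]                      # a focused run begins
        elif label != "focused" and start is not None:
            intervals.append((start, timestamps[i - 1]))  # the run just ended
            start = None
    if start is not None:                              # run extends to the last frame
        intervals.append((start, timestamps[-1]))
    total = sum(end - begin for begin, end in intervals)
    return intervals, total
```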
  • the embodiment of the present invention also provides a learning state judging device. Please refer to FIG. 7, which shows the structure of a learning state judging device provided in an embodiment of the present application.
  • the learning state judging device 200 includes an obtaining module 210, a first recognition module 220, and a second recognition module 230.
  • the obtaining module 210 is used to obtain frame images from the user's class video.
  • the first recognition module 220 is used to recognize the user's expression in the frame image.
  • the second recognition module 230 is configured to recognize the learning state of the user in the frame image in combination with the expression.
  • the learning state judging device 200 provided by the embodiment of the present invention can more accurately judge the learning state of the user.
  • the learning state judging apparatus 200 further includes a determining module 240, and the determining module 240 is configured to determine the concentration time of the user according to the learning state of the user.
  • in some embodiments, the expression includes happy, confused, exhausted, and neutral,
  • and the learning state includes a focused state and a distracted state;
  • the second recognition module 230 is also used to judge whether the expression is exhausted; if not, to obtain a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression; to compare the frame image with the focus reference picture to obtain a first matching degree; to judge whether the first matching degree is greater than or equal to a first preset threshold; if it is, to mark the frame image as a focused state image; if it is less than the first preset threshold, to compare the frame image with the distraction reference picture to obtain a second matching degree; to judge whether the second matching degree is greater than or equal to a second preset threshold; and if it is, to mark the frame image as a distracted state image.
  • in some embodiments, the second recognition module 230 is further configured to detect the user's heart rate when the expression is exhausted; to judge whether the heart rate is greater than or equal to a third preset threshold; if it is, to mark the frame image as a focused state image; and if it is less than the third preset threshold, to mark the frame image as a distracted state image.
  • the second identification module 230 further includes an extraction unit (not shown in the figure) and an identification unit (not shown in the figure).
  • the extraction unit extracts geometric features of each facial organ from the frame image.
  • the recognition unit is used to determine whether the learning state of the user is in a focused state or a distracted state according to the geometric characteristics of each facial organ in combination with a preset classification algorithm model.
  • the determining module 240 further includes a first obtaining unit (not shown in the figure) and a statistics unit (not shown in the figure).
  • the first obtaining unit is configured to obtain the recording times of the focused state images.
  • the statistics unit is configured to aggregate the recording times of the focused state images to obtain the user's concentration time.
  • the learning state judging device 200 obtains frame images from the user’s class video through the obtaining module 210, the first recognition module 220 recognizes the user’s expression in the frame image, and then the second recognition module 230 recognizes the learning state of the user in the frame image in combination with the expression.
  • the performance of the same learning state of the user under different expressions is different. Performing facial expression recognition first, and then judging the learning state, can improve the accuracy of learning state recognition, thereby ensuring the accuracy of the user's focus time detection.
  • the embodiment of the present invention also provides an intelligent robot 300.
  • the intelligent robot 300 includes: an image acquisition module 310, which is used to collect class videos of the user during class; at least one processor 320 is connected to the The image acquisition module 310 is connected; and, the memory 330 is communicatively connected with the at least one processor 320.
  • in FIG. 8, one processor 320 is taken as an example.
  • the memory 330 stores instructions executable by the at least one processor 320, and the instructions are executed by the at least one processor 320 so that the at least one processor 320 can execute the learning state judging method shown in FIGS. 2 to 6 above.
  • the processor 320 and the memory 330 may be connected through a bus or in other ways. In FIG. 8, the connection through a bus is taken as an example.
  • as a non-volatile computer-readable storage medium, the memory 330 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules of the learning state judging method in the embodiments of the present application, for example the modules shown in FIG. 7.
  • the processor 320 executes various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 330, that is, realizes the method for judging the learning state of the foregoing method embodiment.
  • the memory 330 may include a storage program area and a storage data area.
  • the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the learning state judgment device and the like.
  • the memory 330 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the memory 330 may optionally include a memory 330 remotely provided with respect to the processor 320, and these remote memories 330 may be connected to a face recognition device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the one or more modules are stored in the memory 330 and, when executed by the one or more processors 320, perform the learning state judging method in any of the foregoing method embodiments, for example executing the method steps of FIGS. 2 to 6 described above and realizing the functions of the modules in FIG. 7.
  • the embodiment of the present application also provides a computer program product containing program code; when the computer program product runs on an electronic device, the electronic device is caused to execute the learning state judging method in any of the foregoing method embodiments, for example executing the method steps of FIGS. 2 to 6 described above and realizing the functions of the modules in FIG. 7.
  • beneficial effects of the embodiments of the present invention: distinguished from the prior art, the embodiments of the present invention provide a method, device, and intelligent robot for judging the learning state.
  • the method obtains a frame image from a video of the user's class, recognizes the user's expression in the frame image, and then recognizes, in combination with the expression, the learning state of the user in the frame image. Because a user's learning state presents differently under different expressions, first performing expression recognition and then judging the user's learning state in class in combination with the expression makes it possible to recognize the user's learning state in class accurately, avoids the confusion and misjudgment that expressions can cause when judging the learning state, and improves the accuracy of judging the learning state in class.
  • in some specific application scenarios, such as the now-popular Internet education, users can take real-time live courses from teachers of various subjects at home through a computer.
  • in this mode of teaching, however, the teacher and the user are not face to face,
  • and this makes it hard for the teacher to judge the students' learning state well.
  • in this case, using the learning state judging method of the present invention, the user's in-class expression can be recognized through the computer's camera, and the user's learning state can be judged in combination with the expression.
  • the teacher then obtains summary information about the user's learning state, for example the distribution of time periods during which the user is focused and how long the user stays focused in the corresponding course.
  • this approach not only improves the accuracy of judging the user's learning state; through the feedback data, the teacher can also better adapt the teaching method to the user and improve the user's learning efficiency.
  • the device embodiments described above are only illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each implementation manner can be implemented by means of software plus a general hardware platform, and of course, it can also be implemented by hardware.
  • a person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by a computer program instructing relevant hardware.
  • the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

A learning state judgment method and device, and an intelligent robot. The method obtains a frame image from a video of a user's class (110), recognizes the user's expression in the frame image (120), and recognizes, in combination with the expression, the learning state of the user in the frame image (130). By first performing expression recognition and then judging the user's learning state in class in combination with the expression, the method accurately recognizes the user's learning state in class, avoids the confusion and misjudgment that expressions can cause regarding the learning state, and improves recognition accuracy.

Description

一种学习状态的判断方法、装置及智能机器人 技术领域
本申请实施例涉及电子信息技术领域,尤其涉及一种学习状态的判断方法、装置及智能机器人。
背景技术
教育就是一种有目的、有组织、有计划、***地传授知识和技术规范等的社会活动,其是人们获得知识,掌握技能的重要手段。而在教育中课堂教学是其最基础和最重要的教学形式。
目前,为了提高课堂教学的教学质量,通常会对课堂教学进行评估,而评估课堂教学的教学质量主要是通过两维度进行,分别为:课堂知识的掌握情况以及学生上课的学习状态。
发明人在实现本发明的过程中,发现:目前,学生上课的学习状态主要通过人工观察或摄像头监控等方式,这样的方式得到的反馈数据都是:在课堂开始的5分钟内,学生A专注听课,课堂的第5分钟至第10分钟,学生A走神等等。这种方式不能很好的判断学生上课的学习状态,有可能出现混淆判断和错误判断的情况。
发明内容
本发明实施例主要解决的技术问题是提供一种学习状态的判断方法、装置及智能机器人,能够提高判断用户学习状态的准确性。
本发明实施例的目的是通过如下技术方案实现的:
为解决上述技术问题,第一方面,本发明实施例中提供给了一种学习状态的判断方法,包括:
从用户的上课视频中获取帧图像;
识别所述用户在所述帧图像中的表情;
结合所述表情,识别所述帧图像中所述用户的学习状态。
在一些实施例中,所述表情包括高兴、疑惑、疲惫和中性,所述学习状态包括专注状态、走神状态;
所述结合所述表情,识别所述帧图像中所述用户的学习状态的步骤,具体包括:
判断所述表情是否为疲惫;
若否,则获取预先存储的所述用户与所述表情对应的专注基准图片和走神基准图片;
将所述帧图像与所述专注基准图片进行比对,得到第一匹配度;
判断所述第一匹配度是否大于或者等于第一预设阈值;
若大于或者等于所述第一预设阈值,则将所述帧图像标记为专注状态图像;
若小于所述第一预设阈值,则将所述帧图像与所述走神基准图片进行比对,得到第二相似度;
判断所述第二相似度是否大于或者等于第二预设阈值;
若大于或者等于所述第二预设阈值,则将所述帧图像标记为走神状态图像。
在一些实施例中,所述结合所述表情,识别所述帧图像中所述用户的学习状态的步骤,进一步包括:
若是,则检测所述用户的心率;
判断所述心率是否大于或者等于第三预设阈值;
若大于或者等于所述第三预设阈值,则将所述帧图像标记为专注状态图像;
若小于所述第三预设阈值,则将所述帧图像标记为走神状态图像。
在一些实施例中,所述结合所述表情,识别所述帧图像中所述用户的学习状态的步骤,具体包括:
从所述帧图像提取各脸部器官的几何特征;
根据各所述脸部器官的几何特征,并且结合预设分类算法模型,确 定所述用户的学习状态是处于专注状态还是走神状态。
在一些实施例中,所述方法在步骤结合所述表情,识别所述帧图像中所述用户的学习状态之后还包括:根据所述用户的学习状态,确定所述用户的专注时间。
在一些实施例中,所述根据所述用户的学习状态,确定所述用户的专注时间步骤,具体包括:
获取所述专注状态图像的录取时间;
将所述专注状态图像的录取时间进行统计,得到所述用户专注时间。
为解决上述技术问题,第二方面,本发明实施例中提供了一种学习状态判断装置,包括:
获取模块,用于从用户的上课视频中获取帧图像;
第一识别模块,用于识别所述用户在所述帧图像中的表情;
第二识别模块,用于结合所述表情,识别所述帧图像中所述用户的学习状态。
在一些实施例中,还包括:确定模块,用于根据所述用户的学习状态,确定所述用户的专注时间。
为解决上述技术问题,第三方面,本发明实施例提供了一种智能机器人,包括:
图像采集模块,用于采集用户在上课时的上课视频;
至少一个处理器,与所述图像采集模块连接;以及,
与所述至少一个处理器通信连接的存储器;其中,
所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如上第 一方面所述的方法。
为解决上述技术问题,第四方面,本发明实施例提供了一种包含程序代码的计算机程序产品,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如上第一方面所述的方法。
本发明实施例的有益效果:区别于现有技术的情况,本发明实施例中提供了一种学习状态的判断方法、装置及智能机器人,该方法通过从用户的上课视频中获取帧图像,识别所述用户在所述帧图像中的表情,结合所述表情,识别所述帧图像中所述用户的学习状态。由于用户在不同表情下,学习状态的呈现方式不一样,通过先进行表情识别,再结合表情来判断用户的在上课时的学习状态,实现对用户在上课时的学习状态进行准确识别,避免了表情对学习状态造成的混淆误判,提高了判断上课学习状态的准确性。
附图说明
一个或多个实施例通过与之对应的附图中的图片进行示例性说明,这些示例性说明并不构成对实施例的限定,附图中具有相同参考数字标号的元件表示为类似的元件,除非有特别申明,附图中的图不构成比例限制。
图1是本发明实施例的学习状态的判断方法的实施例的应用环境的示意图;
图2是本发明实施例提供的一种学习状态判断的方法的流程图;
图3是图2所示方法中步骤130的一子流程图;
图4是图2所示方法中步骤130的另一子流程图;
图5是本发明另一实施例提供的一种学习状态判断的方法的流程图;
图6是图5所示方法中步骤140的一子流程图;
图7是本发明实施例提供的一种学习状态判断装置的结构示意图;
图8是本发明实施例提供的执行上述学习状态的判断方法的智能机器人的硬件结构示意图。
具体实施方式
下面结合具体实施例对本发明进行详细说明。以下实施例将有助于本领域的技术人员进一步理解本发明,但不以任何形式限制本发明。应当指出的是,对本领域的普通技术人员来说,在不脱离本发明构思的前提下,还可以做出若干变形和改进。这些都属于本发明的保护范围。
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。
需要说明的是,如果不冲突,本发明实施例中的各个特征可以相互结合,均在本申请的保护范围之内。另外,虽然在装置示意图中进行了功能模块划分,在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于装置中的模块划分,或流程图中的顺序执行所示出或描述的步骤。此外,本文所采用的“第一”、“第二”等字样并不对数据和执行次序进行限定,仅是对功能和作用基本相同的相同项或相似项进行区分。
除非另有定义,本说明书所使用的所有的技术和科学术语与属于本发明的技术领域的技术人员通常理解的含义相同。本说明书中在本发明的说明书中所使用的术语只是为了描述具体的实施方式的目的,不是用于限制本发明。本说明书所使用的术语“和/或”包括一个或多个相关的所列项目的任意的和所有的组合。
请参见图1,为应用于本发明的学习状态判断的方法的实施例的应用环境的示意图,该***包括:服务器10和摄像机20。
所述服务器10和所述摄像机20通信连接,所述通信连接可以是有线连接,例如:光纤电缆,也可以是无线通信连接,例如:WIFI连接、蓝牙连接、4G无线通信连接,5G无线通信连接等等。
所述摄像机20为能够录制视频的装置,例如:具有拍摄功能的手 机、录像机或摄像头等。
所述服务器10为是能够按照程序运行,自动、高速处理海量数据的设备,其通常是由硬件***和软件***所组成,例如:计算机、智能手机等等。服务器10可以是本地设备,其直接与所述摄像机20连接;也可以是云设备,例如:云服务器、云主机、云服务平台、云计算平台等,云设备通过网络与所述摄像机20连接,并且两者通过预定的通信协议通信连接,在一些实施例,该通信协议可以是TCP/IP、NETBEUI和IPX/SPX等协议。
可以理解的是:所述服务器10和摄像机20也可以集成在一起,作为一体式的设备,又或者,摄像机20和服务器10集成于智能机器人上,作为智能机器人的部件。智能机器人或摄像机可设置于教室内或用户所处的任何学习场所内,例如互联网教育,智能机器人或摄像机可设置在用户的家里或其他学习场所里。由智能机器人或摄像机采集用户的上课视频,并且基于上课视频来判断用户在上课时的学习状态。
在一些具体的应用场景中,如现在流行的互联网教育,用户在家通过电脑即可学习到各学科老师的实时直播课程,此时,所述摄像机可为电脑前端配置的摄像头。这种方式的授课,老师和用户不是面对面的状态,使老师不能获得用户学习状态的反馈,不能很好且准确的对用户的学习状态进行判断。
本发明实施例提供了一种应用于上述应用环境的学习状态判断的方法,该方法可被上述服务器10执行,请参阅图2,该方法包括:
步骤110:从用户的上课视频中获取帧图像。
上课视频是指用户在听课时的图像集,其包含用户的若干个正脸图像。而上课视频可由设置于教室内或用户其他学习场所内的摄像机20采集到的,摄像机20可完整的获取用户的脸部图像信息,例如:将摄像机20设置于黑板边上,并且将摄像头的取景方向正对教室,则可采集到用户在上课时的上课视频;例如在互联网教育中,摄像头置于电脑上方或者摄像头为电脑的内置摄像头,可采集到用户在上课时的脸部图像信息。
步骤120:识别所述用户在所述帧图像中的表情。
用户在上课时会随上课内容或者周边同学影响呈现不同表情,而在不同表情下,用户的脸部所呈现的内容会不一样的,反而言之,可以根据用户的人脸所呈现的内容来确定用户的表情。识别所述用户在所述上课视频中各帧图像的表情具体包括以下步骤:1.人脸图像的提取,2.表情特征提取,3.表情归类。人脸图像的提取可以根据现有的图像提取算法从图像中提取出来。表情特征提取可基于几何特征法,依据面部器官的形状和位置来提取表情特征。表情归类是可基于随机森林算法、表情特征降维法、SVM多分类模型或者神经网络算法,将提到的表情特征进行表情归类,进而确定用户的表情。
在一些实施例中,为了提高表情特征的提取精度,在表情特征的提取之前,还可以对人脸图像的大小和灰度进行归一化处理,改善人脸图像质量,消除噪声。
步骤130:结合所述表情,识别所述帧图像中所述用户的学习状态。
学习状态包括走神状态和专注状态,其用于反映用户在上课时的状态。但是,相同的学习状态,在不同的表情下,其表现形式是不一样的,因此,先识别表情,再结合表情识别用户的学习状态,可以提高识别的准确性。
用户在上课时面部会表现出不同的表情,例如高兴、疑惑、难过、中性等。用户表现上述任一种表情时,其学习状态也会有区别,例如,根据表情识别判断出所述用户此时的表情为高兴,而高兴又可以细分为“因理解了知识点而高兴”和“在规划周末出游行程而高兴”两种不同的学习状态。所以在判断用户的上课学习状态时,难免受到用户表情带来的影响。将所述表情和所述用户的上课帧图像相结合进行判断,可有效地避免用户表情对学习状态判断的影响,提高用户上课学习状态判断的准确性。
在本发明实施例中,通过采集用户上课时的上课视频,再结合表情识别用户的学习状态。用户相同的学习状态在不同表情下其表现方式是不一样的,因此,先进行表情识别,再判断学习状态,可以提高对学习 状态判断的准确性。
具体的,在一些实施例中,所述表情包括高兴、疑惑、疲惫和中性,所述学习状态包括专注状态、走神状态。当表情为高兴、疑惑和中性时,脸部区别特征明显,通过图像对比的方式可以准确的识别用户的学习状态。请参阅图3,步骤130具体包括:
步骤131:判断所述表情是否为疲惫,若否,则执行步骤132;若是,则执行步骤139。
步骤132:获取预先存储的所述用户与所述表情对应的专注基准图片和走神基准图片。
专注基准图片是指用户在所述表情下处于专注状态的图片,走神基准图片是指用户在所述表情下处于走神状态的图片,专注基准图片和走神基准图片可通过对所述上课视频的各帧图像进行人工筛选采集。
值得说明的是:同一个用户,当其表情不相同时,其专注基准图片和走神基准图片是不相同的。当然,不同用户之间,由于其外貌不一样,其专注基准图片和走神基准图片也是不相同。
步骤133:将所述帧图像与所述专注基准图片进行比对,得到第一匹配度。
步骤134:判断所述第一匹配度是否大于或者等于第一预设阈值,若大于或者等于所述第一预设阈值,则执行步骤135,否则,执行步骤136;
步骤135:将所述帧图像标记为专注状态图像。
当第一相似度大于或者等于第一预设阈值时,则说明用户此时的脸部图像与专注基准图片高度相似,可认为用户此时处于专注状态。
需要说明的是:第一预设阈值具体数值可以通过多次实验确定,并且第一预设阈值可以根据不同用户设置不同的数值。
步骤136:则将所述帧图像与所述走神基准图片进行比对,得到第二匹配度。
步骤137:判断所述第二匹配度是否大于或者等于第二预设阈值, 若大于或者等于所述第二预设阈值,则执行步骤138。
当第二相似度大于或者等于第二预设阈值时,则说明脸部图像与走神基准图片高度相似,可确定用户处于走神状态。
对于第二预设阈值的具体数值,也可以通过多次实验确定,并且第一预设阈值可以根据不同用户设置不同的数值。
步骤138:则将所述帧图像标记为走神状态图像。
在一些实施实施例中,当表情为疲惫时,通过检测心率来确定用户的学习状态,具体包括:
步骤139:检测所述用户的心率;
对于用户的心率,可以通过图像心率法进行检测,具体地,使用OpenCV提供的人脸检测器,对所述用户在所述脸部图像进行人脸区域检测并记录区域位置,然后将人脸区域图像分离为RGB三通道,分别计算区域内灰度均值,可得到随时间变化的三个R、G、B信号,最后对R、G、B信号进行独立成分分析,得到用户在所述的心率。
步骤1310:判断所述心率是否大于或者等于第三预设阈值,若大于或者等于所述第三预设阈值,执行步骤1311,否则,执行步骤1312;
步骤1311:则将所述帧图像标记为专注状态图像。
步骤1312:则将所述帧图像标记为走神状态图像。
在本发明实施例中,通过判断所述表情是否为疲惫,若是,则检测用户的心率以判断所述用户的学习状态为专注状态还是走神状态。当人的表情为疲惫时,因脸部特征区别不明显,导致疲惫表情下的所述专注基准图片和所述走神基准图片的区别特征不显著,将所述帧图像与所述专注基准图片和走神基准图片进行比对时,可能会产生所述第一匹配度和所述第二匹配度接近的情况;或所述帧图片表示的学习状态为走神状态,却因所述专注基准图片和所述走神基准图片的区别特征不显著,导致在进行图像比对时,所述第一匹配度大于所述第一预设阈值,所述帧图片表示的学习状态被判断为专注状态,出现错误判断。这种情况下将会降低用户上课学习状态判断的准确性。因此,当所述表情为疲惫时, 采用检测用户心率的方法,以提高用户上课学习状态判断的准确性。当所述用户的心率大于所述第三预设阈值时,可说明此时用户的大脑活跃度较高,即可判断用户此时的学习状态为专注状态,反之,用户此时的学习状态为走神状态。
在另一些实施例中,由于用户处于不同学习状态时,其脸部所表现的内容是不同的,因此,也可以通过识别用户的脸部特征,并且根据脸部特征来确定用户的学习状态,请参阅图4,所述步骤130具体包括:
步骤131a:从所述帧图像提取各脸部器官的几何特征。
几何特征包括用于表征面部器官的形状、大小和距离,其可采用现有的图像提取算法从脸部图像中提取出来。在一些实施例中,可采用Face++函数库提取脸部图像的几何特征。
步骤132a:根据各所述脸部器官的几何特征,并且结合预设分类算法模型,确定所述用户的学习状态是处于专注状态还是走神状态。
预设分类算法模型可调用现有的分类算法,例如逻辑回归算法、随机森林算法、表情特征降维法、SVM多分类模型或者神经网络算法等。
相同的学习状态,在不同的表情下,其表现形式是不一样的,因此,先识别表情,再分别建立各表情下的分类算法模型,各分类算法模型与各对应的表情能最大程度适配,从而,可以提高识别的准确性。
具体的,预设分类算法模型是预先建立,预设分类算法模型的建立过程具体包括:
步骤(1):获取用户各表情下的脸部训练样本集的几何特征和标签数据;
脸部训练样本集为人脸图像集合,通常是由人工调查选取出的已知结果的历史数据。标签数据用于表征各脸部训练样本的表情,其进行数值化,用1和0表示,1表示用户处于专注状态,0表示用户处于走神状态。
步骤(2):使用所述脸部训练样本集的几何特征和标签数据对初始分类算法模型进行学习训练,得到特征系数,将特征系数代入对应的初始分类算法模型,得到所述预设分类算法模型。
初始分类算法模型中各特征系数的具体数值是不清楚的,是通过各对应表情的脸部训练样本集进行学习得到,能有效拟合相对应的脸部训练样本集的几何特征,从而能够准确地判断各表情下的学习状态。
具体的,上述步骤(2)具体包括:
步骤①:分别将所述用户各表情下的脸部训练样本集的几何特征分为五个特征块,所述五个特征块包括嘴部几何特征块、眼部几何特征块、眉毛几何特征块、人脸轮廓几何特征块和视线几何特征块;
几何特征维度较高,对应的特征权重系数较多,计算量大且不准确,不利于后期建模和计算。而我们知道,在人工识别用户是处于专注状态还是走神状态时,主要是依据用户的嘴部、眼部、眉毛、轮廓和视线方向进行判断,例如,若用户眉毛可能轻微上扬,眼睛睁大,上下眼帘间距变大,嘴部自然合拢,视线注视前方,人脸轮廓变大,则用户处于专注状态。因此,将面部的几何特征分为嘴部几何特征块、眼部几何特征块、眉毛几何特征块、人脸轮廓几何特征块和视线几何特征块,可以提高建模效率和模型识别正确率。
步骤②:使用所述脸部训练样本集的五个特征块和标签数据对初始逻辑回归模型进行学习训练,得到五个特征块系数,将五个特征块系数代入初始逻辑回归模型,得到所述预设逻辑回归模型。
逻辑回归是一种广义线性回归,在线性回归的基础上加入Sigmoid函数进行非线性映射,可以将连续值映射到0和1上。确定逻辑回归模型,即在机器学习中,选择逻辑回归模型作为二分类模型进行建模。
将所述各表情下的标签数据和所述五个特征块,进行数值化和归一化处理后,即可变成模型学习所需要的数据格式,然后用对应所述表情下的初始逻辑回归模型进行学习训练,得到各表情下的五个特征块系数,将各表情下的五个特征块系数分别代入对应所述表情的初始逻辑回归模型,得到各表情下的预设逻辑回归模型。
用户处于不同表情下,同一个面部器官特征对用户的学习状态反映程度是不一样的。例如,开心时,嘴部张开上扬,难过时,嘴部合拢向下,而在进行学习状态识别时,预设专注时嘴部自然合拢,对于开心和 难过这两种不同的表情,若采用同一算法模型对学习状态进行计算,例如嘴部特征权重一样,从而会造成误判。例如,将开心表情下的用户更容易识别为走神状态,然而,有的用户可能因理解了知识点而开心,有的用户可能因开小差而开心。在针对开心表情下的用户和难过表情下的用户,用采用不同的识别模型对学习状态进行识别,能提高准确率。
因此,先将用户进行表情分类,然后在各表情类别下,确定各自的逻辑回归模型,从而针对不同的表情,有各自相适应的、能有效拟合的逻辑回归模型,提高了识别准确性。
如图5所示,在一些实施例中,所述方法在步骤130之后还包括:
步骤140:根据所述用户的学习状态,确定所述用户的专注时间。
专注时间是指用户在上课时处于专注状态的时间。在确定所述用户的专注时间之后,可以基于用户的专注时间进行匹配课程的时间和课程长度,从而使得课程时间和课程长度与用户匹配,实现个性化定制教育,达到因材施教和因状态施教的效果。
进一步的,当用户的数量为多个时,也可基于多个用户的专注时间进行分班教学,例如:将具有相同专注时段的用户集全到一个班上,根据用户的专注时长确定课时长度,从而保证每堂课的用户在上课期间,具有最高的专注力,提高整体教学质量。
在一些实施例中,如图6所示,步骤140具体包括:
步骤141:获取所述专注状态图像的录取时间。
录取时间是指图像在录摄时的时间。
步骤142:将所述专注状态图像的录取时间进行统计,得到所述用户专注时间。
专注状态图像是指用户在对应的录取时间内处于专注状态的图像,则将存在连续关系的专注状态图像的录取时间进行统计后,代表用户在此时间段内一直处于专注状态,则此时间段为用户专注时间。连续关系是指所述专注状态图像在所述上课视频中为连续帧的关系。
进一步地,在确定所述用户的专注时间之后,可以准确地确定用户 处于专注状态的具体时间段,以及,用户处于专注状态的时长,并可以以此为基础为用户匹配课程的时间和课程长度。例如,某些学生的单次专注时长为30分钟,且他们的专注时间范围为早上8点到11点,则将这些用户分为一个班,并且采用早上8点到11点单次授课40分钟,中间休息10分钟的方式进行授课。另一些用户的单次专注时长为40分钟,且他们的专注时间范围为早上10点到12点,则将这些用户分为一个班,采用早上10点到12点单次授课50分钟,中间休息10分钟的方式进行授课。
本发明实施例还提供了一种学习状态判断装置,请参阅图7,其示出了本申请实施例提供的一种学习状态判断装置的结构,该学习状态判断装置200包括:获取模块210、第一识别模块220和第二识别模块230。
获取模块210用于从用户的上课视频中获取帧图像。第一识别模块220用于识别所述用户在所述帧图像中的表情。第二识别模块230用于结合所述表情,识别所述帧图像中所述用户的学习状态。本发明实施例提供的学习状态判断装置200能够更准确地判断用户学习状态。
在一些实施例中,请参阅图7,所述学习状态判断装置200还包括确定模块240,确定模块240用于根据所述用户的学习状态,确定所述用户的专注时间。
在一些实施例中,所述表情包括高兴、疑惑、疲惫和中性,所述学习状态包括专注状态和走神状态,所述第二识别模块230还用于判断所述表情是否为疲惫;若否,则获取预先存储的所述用户与所述表情对应的专注基准图片和走神基准图片;将所述帧图像与所述专注基准图片进行比对,得到第一匹配度;判断所述第一匹配度是否大于或者等于第一预设阈值;若大于或者等于所述第一预设阈值,则将所述帧图像标记为专注状态图像;若小于所述第一预设阈值,则将所述帧图像与所述走神基准图片进行比对,得到第二匹配度;判断所述第二匹配度是否大于或者等于第二预设阈值;若大于或者等于所述第二预设阈值,则将所述帧图像标记为走神状态图像。
在一些实施例中,所述第二识别模块230还用于当所述表情为疲惫时,则检测所述用户的心率;判断所述心率是否大于或者等于第三预设阈值;若大于或者等于所述第三预设阈值,则将所述帧图像标记为专注状态图像;若小于所述第三预设阈值,则将所述帧图像标记为走神状态图像。
在一些实施例中,所述第二识别模块230还包括提取单元(图未示)和识别单元(图未示)。所述提取单元从所述帧图像提取各脸部器官的几何特征。所述识别单元用于根据各所述脸部器官的几何特征,并且结合预设分类算法模型,确定所述用户的学习状态是处于专注状态还是走神状态。
在一些实施例中,所述确定模块240还包括第一获取单元(图未视)和统计单元(图未视)。所述第一获取单元,用于获取所述专注状态图像的录取时间。所述统计单元,用于将所述专注状态图像的录取时间进行统计,得到所述用户专注时间。
在本发明实施例中,该学习状态判断装置200通过获取模块210从用户的上课视频中获取帧图像,第一识别模块220识别所述用户在所述帧图像中的表情,然后第二识别模块230结合所述表情,识别所述帧图像中所述用户的学习状态。用户相同的学习状态在不同表情下其表现方式是不一样的,先进行表情识别,再判断学习状态,可以提高对学习状态识别的准确性,进而保证用户的专注时间检测的准确性。
本发明实施例还提供了一种智能机器人300,请参阅图8,所述智能机器人300包括:图像采集模块310,用于采集用户在上课时的上课视频;至少一个处理器320,与所述图像采集模块310连接;以及,与所述至少一个处理器320通信连接的存储器330,图8中以一个处理器320为例。
所述存储器330存储有可被所述至少一个处理器320执行的指令,所述指令被所述至少一个处理器320执行,以使所述至少一个处理器320能够执行上述图2至图6所述的学习状态判断的方法。所述处理器320 和所述存储器330可以通过总线或者其他方式连接,图8中以通过总线连接为例。
存储器330作为一种非易失性计算机可读存储介质,可用于存储非易失性软件程序、非易失性计算机可执行程序以及模块,如本申请实施例中的学习状态判断的方法的程序指令/模块,例如,附图7所示的各个模块。处理器320通过运行存储在存储器330中的非易失性软件程序、指令以及模块,从而执行服务器的各种功能应用以及数据处理,即实现上述方法实施例学习状态的判断方法。
存储器330可以包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需要的应用程序;存储数据区可存储根据学习状态判断装置的使用所创建的数据等。此外,存储器330可以包括高速随机存取存储器330,还可以包括非易失性存储器330,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实施例中,存储器330可选包括相对于处理器320远程设置的存储器330,这些远程存储器330可以通过网络连接至人脸识别装置。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
所述一个或者多个模块存储在所述存储器330中,当被所述一个或者多个处理器320执行时,执行上述任意方法实施例中的学习状态的判断方法,例如,执行以上描述的图2至图6的方法步骤,实现图7中的各模块的功能。
上述产品可执行本申请实施例所提供的方法,具备执行方法相应的功能模块和有益效果。未在本实施例中详尽描述的技术细节,可参见本申请实施例所提供的方法。
本申请实施例还提供了一种包含程序代码的计算机程序产品,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行上述任一方法实施例中学习状态判断的方法,例如,执行以上描述的图2至图6的方法步骤,实现图7中各模块的功能。
本发明实施例的有益效果:区别于现有技术的情况,本发明实施例中提供了一种学习状态的判断方法、装置及智能机器人,该方法通过从用户的上课视频中获取帧图像,识别所述用户在所述帧图像中的表情,结合所述表情,识别所述帧图像中所述用户的学习状态。由于用户在不同表情下,学习状态的呈现方式不一样,通过先进行表情识别,再结合表情来判断用户的在上课时的学习状态,实现对用户在上课时的学习状态进行准确识别,避免了表情对学习状态造成的混淆误判,提高了判断上课学习状态的准确性。
在一些具体的应用场景中,如现在流行的互联网教育,用户在家通过电脑即可学习到各学科老师的实时直播课程,然而这种方式的授课,老师和用户不是面对面的状态,这种状态使老师不能很好的判断学生的学习状态,此时,运用本发明所述的判断学习状态的方法,通过电脑的摄像头即可识别用户的上课表情,结合表情即可判断出用户的学习状态,老师将得到用户学习状态的汇总信息,例如,用户处于专注状态的时段分布范围,以及相应课程用户处于专注状态的持续时间。这种方法,不但能提高用户学习状态判断的准确性,老师也能通过反馈数据更好的针对用户进行教学方式的改进,提高用户的学习效率。
需要说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
通过以上的实施方式的描述,本领域普通技术人员可以清楚地了解到各实施方式可借助软件加通用硬件平台的方式来实现,当然也可以通过硬件。本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述 各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;在本发明的思路下,以上实施例或者不同实施例中的技术特征之间也可以进行组合,步骤可以以任意顺序实现,并存在如上所述的本发明的不同方面的许多其它变化,为了简明,它们没有在细节中提供;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (10)

  1. A learning state judgment method, characterized by comprising:
    obtaining a frame image from a video of a user's class;
    recognizing the user's expression in the frame image; and
    recognizing, in combination with the expression, the learning state of the user in the frame image.
  2. The method according to claim 1, characterized in that the expression includes happy, confused, exhausted, and neutral, the learning state includes a focused state and a distracted state, and the step of recognizing, in combination with the expression, the learning state of the user in the frame image specifically comprises:
    judging whether the expression is exhausted;
    if not, obtaining a pre-stored focus reference picture and distraction reference picture corresponding to the user and the expression;
    comparing the frame image with the focus reference picture to obtain a first matching degree;
    judging whether the first matching degree is greater than or equal to a first preset threshold;
    if it is greater than or equal to the first preset threshold, marking the frame image as a focused state image;
    if it is less than the first preset threshold, comparing the frame image with the distraction reference picture to obtain a second matching degree;
    judging whether the second matching degree is greater than or equal to a second preset threshold; and
    if it is greater than or equal to the second preset threshold, marking the frame image as a distracted state image.
  3. The method according to claim 2, characterized in that the step of recognizing, in combination with the expression, the learning state of the user in the frame image further comprises:
    if the expression is exhausted, detecting the user's heart rate;
    judging whether the heart rate is greater than or equal to a third preset threshold;
    if it is greater than or equal to the third preset threshold, marking the frame image as a focused state image; and
    if it is less than the third preset threshold, marking the frame image as a distracted state image.
  4. The method according to claim 1, characterized in that the step of recognizing, in combination with the expression, the learning state of the user in the frame image specifically comprises:
    extracting geometric features of each facial organ from the frame image; and
    determining, according to the geometric features of each facial organ and in combination with a preset classification algorithm model, whether the user's learning state is a focused state or a distracted state.
  5. The method according to any one of claims 2 to 4, characterized in that, after the step of recognizing, in combination with the expression, the learning state of the user in the frame image, the method further comprises: determining the user's concentration time according to the user's learning state.
  6. The method according to claim 5, characterized in that the step of determining the user's concentration time according to the user's learning state specifically comprises:
    obtaining the recording times of the focused state images; and
    aggregating the recording times of the focused state images to obtain the user's concentration time.
  7. A learning state judgment device, characterized by comprising:
    an obtaining module, used to obtain frame images from a video of a user's class;
    a first recognition module, used to recognize the user's expression in the frame image; and
    a second recognition module, used to recognize, in combination with the expression, the learning state of the user in the frame image.
  8. The learning state judgment device according to claim 7, characterized by further comprising:
    a determining module, used to determine the user's concentration time according to the user's learning state.
  9. An intelligent robot, characterized by comprising:
    an image acquisition module, used to collect a video of a user during class;
    at least one processor connected to the image acquisition module; and
    a memory communicatively connected with the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method according to any one of claims 1 to 6.
  10. A computer program product containing program code, characterized in that, when the computer program product runs on an electronic device, the electronic device is caused to execute the method according to any one of claims 1 to 6.
PCT/CN2019/113169 2019-10-25 2019-10-25 一种学习状态的判断方法、装置及智能机器人 WO2021077382A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/113169 WO2021077382A1 (zh) 2019-10-25 2019-10-25 一种学习状态的判断方法、装置及智能机器人
CN201980002118.9A CN110945522B (zh) 2019-10-25 2019-10-25 一种学习状态的判断方法、装置及智能机器人

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/113169 WO2021077382A1 (zh) 2019-10-25 2019-10-25 一种学习状态的判断方法、装置及智能机器人

Publications (1)

Publication Number Publication Date
WO2021077382A1 true WO2021077382A1 (zh) 2021-04-29

Family

ID=69913078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/113169 WO2021077382A1 (zh) 2019-10-25 2019-10-25 一种学习状态的判断方法、装置及智能机器人

Country Status (2)

Country Link
CN (1) CN110945522B (zh)
WO (1) WO2021077382A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657146A (zh) * 2021-06-30 2021-11-16 北京惠朗时代科技有限公司 一种基于单幅图像的学生非专注学习低耗识别方法及装置
CN114049669A (zh) * 2021-11-15 2022-02-15 海信集团控股股份有限公司 确定学习效果的方法及装置
CN114971975A (zh) * 2022-07-31 2022-08-30 北京英华在线科技有限公司 在线教育平台用的学习异常提醒方法及***
CN115937961A (zh) * 2023-03-02 2023-04-07 济南丽阳神州智能科技有限公司 一种线上学习识别方法及设备
CN116843521A (zh) * 2023-06-09 2023-10-03 中安华邦(北京)安全生产技术研究院股份有限公司 一种基于大数据的培训档案管理***及方法
CN116844206A (zh) * 2023-06-29 2023-10-03 深圳卓创智能科技有限公司 学生电脑的监控方法、装置、设备及存储介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723288B (zh) * 2020-06-08 2021-06-29 上海松鼠课堂人工智能科技有限公司 智适应学习检测***及方法
CN112215102A (zh) * 2020-09-27 2021-01-12 漳州爱果冻信息科技有限公司 学习状态的处理方法、装置以及书桌
CN112235465A (zh) * 2020-10-27 2021-01-15 四川金沐志德科技有限公司 一种基于智能终端的学习任务及理财管理***
CN112818761A (zh) * 2021-01-15 2021-05-18 深圳信息职业技术学院 一种基于人工智能的在线教育人机交互方法与***
CN113709552A (zh) * 2021-08-31 2021-11-26 维沃移动通信有限公司 视频生成方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160027046A1 (en) * 2014-07-24 2016-01-28 Samsung Electronics Co., Ltd. Method and device for playing advertisements based on relationship information between viewers
CN106599881A (zh) * 2016-12-30 2017-04-26 首都师范大学 学生状态的确定方法、装置及***
CN107292271A (zh) * 2017-06-23 2017-10-24 北京易真学思教育科技有限公司 学习监控方法、装置及电子设备
CN109815795A (zh) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 基于人脸监测的课堂学生状态分析方法及装置
CN110091335A (zh) * 2019-04-16 2019-08-06 威比网络科技(上海)有限公司 学伴机器人的控制方法、***、设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9666088B2 (en) * 2013-08-07 2017-05-30 Xerox Corporation Video-based teacher assistance
US9767349B1 (en) * 2016-05-09 2017-09-19 Xerox Corporation Learning emotional states using personalized calibration tasks
CN108304936B (zh) * 2017-07-12 2021-11-16 腾讯科技(深圳)有限公司 机器学习模型训练方法和装置、表情图像分类方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160027046A1 (en) * 2014-07-24 2016-01-28 Samsung Electronics Co., Ltd. Method and device for playing advertisements based on relationship information between viewers
CN106599881A (zh) * 2016-12-30 2017-04-26 首都师范大学 学生状态的确定方法、装置及***
CN107292271A (zh) * 2017-06-23 2017-10-24 北京易真学思教育科技有限公司 学习监控方法、装置及电子设备
CN109815795A (zh) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 基于人脸监测的课堂学生状态分析方法及装置
CN110091335A (zh) * 2019-04-16 2019-08-06 威比网络科技(上海)有限公司 学伴机器人的控制方法、***、设备及存储介质

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657146A (zh) * 2021-06-30 2021-11-16 北京惠朗时代科技有限公司 一种基于单幅图像的学生非专注学习低耗识别方法及装置
CN113657146B (zh) * 2021-06-30 2024-02-06 北京惠朗时代科技有限公司 一种基于单幅图像的学生非专注学习低耗识别方法及装置
CN114049669A (zh) * 2021-11-15 2022-02-15 海信集团控股股份有限公司 确定学习效果的方法及装置
CN114971975A (zh) * 2022-07-31 2022-08-30 北京英华在线科技有限公司 在线教育平台用的学习异常提醒方法及***
CN114971975B (zh) * 2022-07-31 2022-11-01 北京英华在线科技有限公司 在线教育平台用的学习异常提醒方法及***
CN115937961A (zh) * 2023-03-02 2023-04-07 济南丽阳神州智能科技有限公司 一种线上学习识别方法及设备
CN116843521A (zh) * 2023-06-09 2023-10-03 中安华邦(北京)安全生产技术研究院股份有限公司 一种基于大数据的培训档案管理***及方法
CN116843521B (zh) * 2023-06-09 2024-01-26 中安华邦(北京)安全生产技术研究院股份有限公司 一种基于大数据的培训档案管理***及方法
CN116844206A (zh) * 2023-06-29 2023-10-03 深圳卓创智能科技有限公司 学生电脑的监控方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN110945522B (zh) 2023-09-12
CN110945522A (zh) 2020-03-31

Similar Documents

Publication Publication Date Title
WO2021077382A1 (zh) 一种学习状态的判断方法、装置及智能机器人
CN109522815B (zh) 一种专注度评估方法、装置及电子设备
CN111709409B (zh) 人脸活体检测方法、装置、设备及介质
CN110991381B (zh) 一种基于行为和语音智能识别的实时课堂学生状态分析与指示提醒***和方法
CN110889672B (zh) 一种基于深度学习的学生打卡及上课状态的检测***
WO2020010785A1 (zh) 一种课堂教学认知负荷测量***
CN105516280B (zh) 一种多模态学习过程状态信息压缩记录方法
CN111291613B (zh) 一种课堂表现评价方法及***
CN109063587B (zh) 数据处理方法、存储介质和电子设备
CN106599881A (zh) 学生状态的确定方法、装置及***
CN109740446A (zh) 课堂学生行为分析方法及装置
CN108898115B (zh) 数据处理方法、存储介质和电子设备
WO2021047185A1 (zh) 基于人脸识别的监测方法、装置、存储介质及计算机设备
CN111027486A (zh) 一种中小学课堂教学效果大数据辅助分析评价***及其方法
CN111008971B (zh) 一种合影图像的美学质量评价方法及实时拍摄指导***
CN111008542A (zh) 对象专注度分析方法、装置、电子终端及存储介质
CN115205764B (zh) 基于机器视觉的在线学习专注度监测方法、***及介质
CN111950486A (zh) 基于云计算的教学视频处理方法
CN115937928A (zh) 基于多视觉特征融合的学习状态监测方法及***
CN115546692A (zh) 一种远程教育数据采集分析方法、设备及计算机存储介质
Villegas-Ch et al. Identification of emotions from facial gestures in a teaching environment with the use of machine learning techniques
CN110598607B (zh) 非接触式与接触式协同的实时情绪智能监测***
Yi et al. Real time learning evaluation based on gaze tracking
CN115937793A (zh) 基于图像处理的学生行为异常检测方法
Huang et al. Research on learning state based on students’ attitude and emotion in class learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19950125

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.09.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19950125

Country of ref document: EP

Kind code of ref document: A1