WO2023113182A1 - Apparatus and method for measuring attention level of child for non-face-to-face learning, by using ensemble technique combining motion estimation and c3d - Google Patents

Apparatus and method for measuring attention level of child for non-face-to-face learning, by using ensemble technique combining motion estimation and c3d

Info

Publication number
WO2023113182A1
WO2023113182A1 · PCT/KR2022/015538 · KR2022015538W
Authority
WO
WIPO (PCT)
Prior art keywords
user
classification
concentration
face
classification unit
Prior art date
Application number
PCT/KR2022/015538
Other languages
French (fr)
Korean (ko)
Inventor
박윤하
김대수
김종일
윤창섭
Original Assignee
주식회사 우경정보기술
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 우경정보기술 filed Critical 주식회사 우경정보기술
Publication of WO2023113182A1 publication Critical patent/WO2023113182A1/en

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/168 Evaluating attention deficit, hyperactivity
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/0002 Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network
    • A61B5/0015 Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network characterised by features of the telemetry system
    • A61B5/0022 Monitoring a patient using a global network, e.g. telephone networks, internet
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1121 Determining geometric values, e.g. centre of rotation or angular range of movement
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1126 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb using a particular sensing technique
    • A61B5/1128 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb using a particular sensing technique using image analysis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B2503/00 Evaluating a particular growth phase or type of persons or animals
    • A61B2503/06 Children, e.g. for attention deficit diagnosis

Definitions

  • The present invention relates to an apparatus and method for measuring a child's concentration on non-face-to-face learning.
  • The present invention provides a measuring device and method for accurately measuring, classifying, and predicting a user's, and in particular a child's, concentration on non-face-to-face learning using an ensemble technique in which a plurality of machine learning models are combined.
  • The measuring device of the present invention may include an acquisition unit that obtains image information in which a user performing non-face-to-face learning is photographed, and a first classification unit that classifies the user's concentration on the non-face-to-face learning through analysis of the user's motion included in the image information.
  • The measuring method of the present invention may include an acquisition step of obtaining image information in which a user performing non-face-to-face learning is photographed; a second classification step of extracting, using a machine-learned motion estimation model, the amount of change of the user's joints included in the frames constituting the image information; and a first classification step of classifying the user's concentration on the non-face-to-face learning according to an analysis of the user's motion between the frames using a machine-learned classification model, wherein the last layer of the motion estimation model may be coupled to an initial layer of the classification model.
  • The present invention proposes a deep learning ensemble methodology that operates on C3D (Convolution 3D), a type of time-series analysis model, combined with a motion estimation model, in order to build a non-face-to-face education environment in which the concentration of users, especially children, can be measured.
  • FIG. 1 is a schematic diagram showing a measuring device of the present invention.
  • FIG. 2 is a schematic diagram showing a concentration measurer of a comparative example.
  • FIG. 3 is a schematic diagram showing the operation of the measuring device.
  • FIG. 4 is a chart showing the experimental results.
  • FIG. 5 is a flowchart showing the measurement method of the present invention.
  • FIG. 6 is a diagram illustrating a computing device according to an embodiment of the present invention.
  • 'and/or' includes a combination of a plurality of listed items or any item among a plurality of listed items.
  • 'A or B' may include 'A', 'B', or 'both A and B'.
  • The above comparative examples are possible only under the premise that the user's face always appears in the image captured by the camera 90.
  • According to the comparative examples, when the user looks at a book instead of the display on which the non-face-to-face learning content is output, or takes notes as the lesson progresses, an error may occur in which the user is determined to have low concentration.
  • When the user is a child, who cannot stare at the camera 90 or the display for more than about 5 minutes, there is a limit to applying such methods.
  • To address this, the measuring device 100 and measuring method of the present invention may use boosting ensemble learning, in which two different classifiers learn sequentially and a final prediction of the degree of concentration is thereby determined.
  • The prediction of the degree of concentration is binary: concentrating or not concentrating. Accordingly, the prediction of the degree of concentration may also be referred to as classification, determination, or judgment of the degree of concentration.
  • For example, the first classification unit 110 may extract the user's motion vector values between the frames constituting the captured image using C3D (Convolution 3D), which can recognize both spatial and temporal characteristics.
  • the second classification unit 120 may extract a joint motion vector value of a user appearing in each frame by using a motion estimation model.
  • According to the measuring device 100, the general features of the image are extracted through C3D, and prediction is performed once more through motion estimation in order to analyze the movement of specific joints more precisely.
  • When an ensemble technique combining a motion estimation model with the C3D model is applied in this way, a specific object, for example a specific user or a specific joint, can be analyzed in isolation, making an accurate measurement of the concentration of users, especially children, on non-face-to-face learning possible (a hedged sketch of this two-stage pipeline follows below).
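  • As an illustration only, and not code from the publication, the two-stage ensemble described above could be organized roughly as sketched below; the class names, tensor shapes, pose backbone, and fusion point are assumptions made for this sketch.

```python
# Illustrative sketch of the described ensemble: a pose (motion estimation) branch
# supplies per-frame joint-change features that are fed, together with the raw clip,
# into a C3D-style classifier that outputs a binary concentration label.
# All names and shapes are assumptions; this is not the patent's implementation.
import torch
import torch.nn as nn

class PoseBranch(nn.Module):            # role of the "second classification unit"
    def __init__(self, num_joints=5):   # e.g. shoulder, elbow, pelvis, knee, neck
        super().__init__()
        self.backbone = nn.Sequential(   # stand-in for a real pose-estimation network
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_joints * 2),   # (x, y) per joint
        )

    def forward(self, clip):             # clip: (B, C, T, H, W)
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        joints = self.backbone(frames).reshape(b, t, -1)
        return joints[:, 1:] - joints[:, :-1]     # per-frame joint change amounts

class C3DBranch(nn.Module):             # role of the "first classification unit"
    def __init__(self, joint_feat_dim=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64 + joint_feat_dim, 2)   # focused / unfocused

    def forward(self, clip, joint_change):
        feat = self.conv(clip)
        extra = joint_change.mean(dim=1)       # summarize joint deltas over time
        return self.head(torch.cat([feat, extra], dim=1))

pose, c3d = PoseBranch(), C3DBranch()
clip = torch.randn(2, 3, 16, 112, 112)         # small dummy batch of clips
logits = c3d(clip, pose(clip))                 # binary concentration prediction
```

  • In this sketch the pose branch plays the role of the second classification unit and the C3D branch the role of the first, with the joint change amounts entering the C3D head as the additional parameter described further below.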
  • FIG. 2 is a schematic diagram showing a concentration measurer 10 according to a comparative example.
  • The measuring device 10 of the comparative example checks whether the eyes are open or closed using only the feature values corresponding to the eyes among the facial feature points extracted by deep-learning-based landmark detection.
  • The measuring device 10 checks the degree of head bowing using a YOLO-based crown-of-head (parietal) detection model.
  • The overall system first recognizes whether the eyes are open or closed (S1); if no eyes are detected for more than 20 seconds (S2), crown-of-head recognition is attempted (S3). If an eye is detected even once while the timer is running, the flow returns to the frame input stage. If neither object (the eyes nor the top of the head) is recognized, an event classifying the degree of concentration as an unconcentrated state may occur (S4). A sketch of this control flow is given below.
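  • Purely as an illustration of the comparative example's control flow (S1 to S4 above), a hedged sketch follows; the detector functions are hypothetical placeholders, not APIs from the publication.

```python
# Sketch of the comparative example's flow: eye-state check (S1), a 20-second timer
# while no eyes are detected (S2), crown-of-head detection (S3), and an
# "unconcentrated" event when both objects are missed (S4).
# detect_eyes() and detect_crown_of_head() are hypothetical callables.
import time

EYE_TIMEOUT_SEC = 20

def classify_frame_stream(frames, detect_eyes, detect_crown_of_head):
    timer_start = None
    for frame in frames:
        if detect_eyes(frame):                    # S1: eyes visible, reset the timer
            timer_start = None
            continue
        if timer_start is None:
            timer_start = time.monotonic()
        if time.monotonic() - timer_start < EYE_TIMEOUT_SEC:   # S2: keep waiting
            continue
        if not detect_crown_of_head(frame):       # S3: try the crown-of-head detector
            yield "unconcentrated"                # S4: neither object recognized
        timer_start = None                        # return to the frame input stage
```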
  • Although the measuring device 10 of the comparative example uses two detection models to increase the accuracy of concentration measurement, measurement is possible only when a human face appears in the image. It is therefore difficult to apply the comparative example to children, who rarely stay still during learning time. Even for an adult with excellent concentration, a state in which the head is lowered for note-taking, which is an extension of learning, may be erroneously judged as an unconcentrated state. As a result, the comparative example measures the user's concentration on non-face-to-face learning inaccurately.
  • FIG. 1 is a schematic diagram showing a measuring device 100 of the present invention.
  • FIG. 3 is a schematic diagram showing the operation of the measuring device 100.
  • The measuring device 100 of the present invention may operate differently from the comparative example, in which the degree of concentration is determined according to whether a specific part, such as the crown of the user's head or the eyes, is included in the image in which the user is captured.
  • The measuring device 100 of the present invention may analyze the user's motions and movements regardless of whether a specific body part is visible, and may measure, classify, predict, and discriminate the user's concentration on non-face-to-face learning according to the analysis result.
  • Classifying the degree of concentration on learning by observing the user's motion is similar to how a teacher directly judges a student's concentration. This means that the user's concentration on non-face-to-face learning can be measured accurately when the measuring device 100 of the present invention is used.
  • The measuring device 100 of the present invention may generate a machine-learned classification model using a deep learning model capable of accurately capturing the user's motions and movements included in the image information, for example Convolution 3D (C3D).
  • the measurement device 100 may classify the user's concentration using a classification model generated through a separate learning unit or the like.
  • The measuring device 100 shown in FIG. 1 may include an acquisition unit 190, a first classification unit 110, and a second classification unit 120.
  • the acquisition unit 190 may acquire image information in which a user performing non-face-to-face learning is photographed.
  • the acquisition unit 190 may include a camera 90 that photographs the user.
  • Alternatively, the acquisition unit 190 may include a communication module that communicates with the camera 90 and receives image information from the camera 90 over a wired or wireless connection.
  • the first classification unit 110 may analyze a user's motion included in the image information.
  • the first classification unit 110 may classify, predict, measure, and determine the user's concentration for non-face-to-face learning through the user's motion analysis.
  • the second classifier 120 may analyze a second element of the user that is distinguished from the first element analyzed by the first classifier 110 .
  • Boosting ensemble learning may be applied in which the first classifier 110 and the second classifier 120, which are different from each other, sequentially learn to determine the final prediction result of the degree of concentration.
  • When image information is input to either the first classification unit 110 or the second classification unit 120 to which boosting ensemble learning is applied, a processing result of the image information is output from that classification unit, and the processing result is input to the other classification unit.
  • The processing result of the remaining classification unit may then correspond to the classification result of the degree of concentration.
  • the second classifier 120 may output a second feature included in the image information.
  • the second feature may be input to the first classification unit 110 together with image information.
  • the first classification unit 110 may analyze the user's motion using the image information and the second feature, and classify the user's concentration through the motion analysis.
  • The first classification unit 110 may extract a first vector value representing the user's movement between the frames constituting the image information by using Convolution 3D. For example, when the user shakes a fist left and right, the movement of the entire body, or the movement of the fist as an extremity, may be expressed as a first vector value by the first classification unit 110.
  • The second classification unit 120 may extract a second vector value representing the movement of the user's joints within a frame by using the motion estimation model. For example, when the user shakes a fist left and right, the rotation of the forearm about the elbow, or the movement of the wrist as a joint, may be expressed as a second vector value by the second classification unit 120 (a small sketch of computing such per-joint change values follows below).
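  • A minimal sketch of how such per-joint change amounts could be computed from per-frame keypoints is given below; the keypoint layout and joint list are assumptions, not taken from the publication.

```python
# Hedged sketch: frame-to-frame change of selected joints from pose keypoints.
# Each frame's pose is assumed to be a dict {joint_name: (x, y)}; the publication
# does not specify the keypoint format, so this layout is illustrative only.
import numpy as np

TRACKED_JOINTS = ["shoulder", "elbow", "pelvis", "knee", "neck"]   # high-motion joints

def joint_change_amounts(poses):
    """poses: list of {joint: (x, y)} dicts, one per frame -> (T-1, J) array."""
    deltas = []
    for prev, curr in zip(poses, poses[1:]):
        deltas.append([
            float(np.linalg.norm(np.subtract(curr[j], prev[j])))
            for j in TRACKED_JOINTS
        ])
    return np.asarray(deltas)   # per-frame displacement of each tracked joint

# Example: two synthetic frames with a small shoulder movement.
frame0 = {j: (0.0, 0.0) for j in TRACKED_JOINTS}
frame1 = {**frame0, "shoulder": (3.0, 4.0)}
print(joint_change_amounts([frame0, frame1]))   # shoulder delta = 5.0, others 0.0
```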
  • the first classification unit 110 may predict the degree of concentration according to the first vector value.
  • the second classification unit 120 may additionally predict the degree of concentration according to the second vector value.
  • a first concentration predicted by the first classification unit 110 and a second concentration predicted by the second classification unit 120 may be defined.
  • the first classification unit 110 may predict the first concentration using the first vector value and the second concentration.
  • a machine-learned classification model may be installed in the first classification unit 110 to output a classification result of a degree of concentration when image information is input.
  • the classification model may classify the degree of concentration through the analysis of the user's motion included in the image information.
  • the classification model may be deep-learned using Convolution 3D.
  • When C3D, which can recognize both spatial and temporal features, is used, the user's motion can be extracted accurately.
  • The performance of a model created using C3D tends to be proportional to its size.
  • In order to go beyond determining whether a specific body part appears, as in the comparative example, and analyze the user's entire motion, the required model size can increase rapidly, which can lead to various problems.
  • For example, C3D is a model that uses layers combining the spatial and temporal axes and can use more parameters than a basic CNN. Therefore, overfitting is likely to occur and learning is slow. Model compression may be applied to address this problem (the parameter-count effect is illustrated below).
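  • To make the parameter-count point concrete, the short comparison below (an illustration, not the patent's network) contrasts a 2D convolution with a 3D convolution that has the same spatial kernel and channel widths; the extra temporal axis multiplies the weight count.

```python
# A 3x3 2D convolution versus a 3x3x3 3D convolution with the same channel widths:
# the added temporal axis multiplies the weight count, which is why C3D-style models
# grow quickly and overfit more easily than plain CNNs.
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

conv2d = nn.Conv2d(64, 128, kernel_size=3, padding=1)
conv3d = nn.Conv3d(64, 128, kernel_size=3, padding=1)

print(n_params(conv2d))   # 64*128*3*3   + 128 = 73,856
print(n_params(conv3d))   # 64*128*3*3*3 + 128 = 221,312
```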
  • Model compression may refer to a technique of reducing the size (amount of data, etc.) of a deep learning model while maintaining its performance. For example, a quantization technique may be applied.
  • Similarly, weight regularization techniques that increase learning speed while preventing overfitting may be applied to the classification model.
  • For example, the weight regularization techniques may include quantization and dropout.
  • Quantization may refer to a method of using fewer bits by truncating a floating point value.
  • Dropout may mean removing some of the nodes in use, in other words, excluding (dropping out) a portion of the training data, with the proportion set by a dropout rate, and then training.
  • In particular, when dropout is applied and the dropped-out data are set differently for each batch within a mini-batch, an ensemble effect that prevents some features from overfitting can be obtained.
  • When inferring a result, the entire input data is used, but a dropout rate may still be applied.
  • For example, the operation probability of each node may be calculated by the first classification unit 110 by multiplying the weight of the node by 0.6, used as the dropout rate or probability value.
  • The first classification unit 110 may remove a node when its operation probability value is 0.1 or less.
  • Regarding quantization, the first classification unit 110 may reduce the model by applying INT8 to convert 32-bit weight parameters into 8-bit values.
  • Dropout and quantization applied to the first classification model or the C3D step of FIG. 3 can convert a complex model into a simple model to prevent overfitting and improve learning speed.
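  • The description above gives concrete values for these steps (a dropout rate of 0.6, removal of nodes whose operation probability is 0.1 or less, and INT8 weight quantization); a hedged sketch of how such steps could look is given below. The pruning rule and the symmetric quantization scheme are interpretations of the text, not the publication's exact procedure.

```python
# Hedged sketch of the regularization/compression steps described above:
# (1) dropout during training, (2) pruning units with a low "operation probability",
# (3) quantizing 32-bit float weights to INT8. The operation-probability rule
# (weight scaled by the 0.6 dropout rate, pruned at <= 0.1) is an interpretation.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(64, 64)
with torch.no_grad():
    layer.weight.normal_(0.0, 0.21)     # toy weights so only some rows fall under the threshold
dropout = nn.Dropout(p=0.6)             # dropout rate taken from the description

def prune_low_probability_units(linear, rate=0.6, threshold=0.1):
    # "Operation probability" is read here as rate * mean(|weight|) per output unit.
    with torch.no_grad():
        op_prob = rate * linear.weight.abs().mean(dim=1)
        linear.weight[op_prob <= threshold] = 0.0     # drop the corresponding nodes
    return linear

def quantize_int8(weight_fp32):
    # Symmetric per-tensor quantization: float32 weights -> int8 values plus a scale.
    scale = weight_fp32.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((weight_fp32 / scale).round(), -127, 127).to(torch.int8)
    return q, scale

x = dropout(torch.randn(8, 64))                      # dropout applied during training
layer = prune_low_probability_units(layer)
q_weight, scale = quantize_int8(layer.weight.detach())
approx_weight = q_weight.float() * scale             # dequantized copy for comparison
```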
  • However, since the feature information is reduced by these steps, accuracy may suffer; to compensate, the second classification unit 120 may provide specific information within the frames constituting the image information to the first classification unit 110.
  • For example, the second classification unit 120 may calculate or extract the amount of change of the user's joints, based on the user's keypoint values included in a given frame, through the motion estimation model.
  • The joint change amount may target at least one of the shoulders, elbows, pelvis, knees, and neck, which move a great deal.
  • the joint change amount can be used as an additional parameter value when learning a corresponding frame in C3D.
  • In other words, when the classification model learns or classifies a specific frame, the first classification unit 110 may input the joint change amount in addition to the image information.
  • the classification model may be deep-learned using convolution 3D to analyze a user's motion between frames constituting image information.
  • the second classification unit 120 may be loaded with a motion estimation model that extracts the amount of change in a user's joints included in a specific frame.
  • the last layer of the motion estimation model may be combined with the initial layer of Convolution 3D.
  • a boosting ensemble model in which the last layer of motion estimation is combined with the initial layer of C3D and sequentially learned may be provided.
  • The weights of the motion estimation model supplement the reduced classification model's diminished amount of information, enabling fast and accurate learning; after learning is completed, fast and accurate classification and prediction are possible. One way this layer coupling could look is sketched below.
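  • One way to read the layer coupling described above is that the motion estimation network's final feature maps (per-frame joint heatmaps) are stacked as extra input channels of the C3D model's first 3D convolution; the sketch below shows that reading as an assumption, not as the publication's exact architecture.

```python
# Hedged sketch of coupling the motion-estimation model's last layer to the first
# layer of C3D: per-frame joint heatmaps (the pose net's final output) are
# concatenated with the RGB frames as extra input channels.
# The heatmap count and tensor shapes are assumptions for illustration.
import torch
import torch.nn as nn

NUM_JOINTS = 5        # shoulder, elbow, pelvis, knee, neck

pose_last_layer = nn.Conv2d(32, NUM_JOINTS, kernel_size=1)     # -> joint heatmaps
c3d_first_layer = nn.Conv3d(3 + NUM_JOINTS, 64, kernel_size=3, padding=1)

def coupled_input(clip, pose_features):
    """clip: (B, 3, T, H, W); pose_features: (B*T, 32, H, W) from the pose backbone."""
    b, _, t, h, w = clip.shape
    heatmaps = pose_last_layer(pose_features)                   # (B*T, J, H, W)
    heatmaps = heatmaps.reshape(b, t, NUM_JOINTS, h, w).permute(0, 2, 1, 3, 4)
    return c3d_first_layer(torch.cat([clip, heatmaps], dim=1))  # (B, 64, T, H, W)

clip = torch.randn(2, 3, 8, 56, 56)
pose_features = torch.randn(2 * 8, 32, 56, 56)
out = coupled_input(clip, pose_features)    # torch.Size([2, 64, 8, 56, 56])
```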
  • The dataset used in the experiment consisted of 10 children's online education videos collected by the Gyeongbuk ICT Convergence Industry Promotion Association. From each original video, the webcam feeds of the individual participants were separated into new videos. Since 5 children participated in each class, there are 50 videos in total. The labels were divided into two categories, focused and unfocused videos, for the experiment.
  • To evaluate the accuracy of concentration measurement, an observer (teacher) watches the non-face-to-face class video and records the start time (St) and end time (Et) of each unfocused interval, and the deep learning model likewise records the start time (Sm) and end time (Em) of each unfocused interval. From these measured times, the concentration index (FI) is calculated using Equation 1 (see the hedged illustration below).
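  • Equation 1 itself is reproduced in the publication only as an image and is not recoverable from this text. Purely as an illustration of how an index could be formed from the four measured times (St, Et, Sm, Em), the sketch below uses a temporal intersection-over-union of the observer-marked and model-marked unfocused intervals; this is an assumption, not the patent's Equation 1.

```python
# NOT the patent's Equation 1 (which appears only as an image in the source).
# An illustrative temporal IoU between an observer-marked unfocused interval
# (St, Et) and the model-marked interval (Sm, Em), expressed as a percentage.
def illustrative_focus_index(st, et, sm, em):
    inter = max(0.0, min(et, em) - max(st, sm))
    union = max(et, em) - min(st, sm)
    return 100.0 * inter / union if union > 0 else 0.0

print(illustrative_focus_index(st=10.0, et=40.0, sm=12.0, em=45.0))   # 80.0
```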
  • Comparing the three children who participated in the class, the first child showed a 47% difference in concentration index, the second child 3%, and the last child 38%.
  • In the second child's video, the face is sometimes visible during the unfocused intervals, so the difference between the YOLO-based model of the comparative example and the proposed model is small; in the other two videos the face is often not visible, so the difference is pronounced.
  • With the above measuring device 100, a boosting ensemble model combining C3D-based motion estimation can be proposed for analyzing children's concentration during non-face-to-face learning.
  • Its concentration measurement index showed on average 42% higher performance than the YOLO-based learning method of the comparative example; if datasets from various classes are added in the future so that information on many more behaviors is learned, the performance can be even higher.
  • FIG. 5 is a flowchart showing the measurement method of the present invention.
  • the measuring method of FIG. 5 may be performed by the measuring device 100 shown in FIG. 1 .
  • The measurement method may include an acquisition step (S510), a second classification step (S520), and a first classification step (S530).
  • image information of a user who is performing non-face-to-face learning may be acquired.
  • the acquisition step (S510) may be performed by the acquisition unit 190.
  • the amount of change in the user's joints included in the frame constituting the image information may be extracted using the machine-learned motion estimation model.
  • the second classification step (S520) may be performed by the second classification unit 120.
  • the degree of concentration of the user for non-face-to-face learning may be classified according to the analysis of the user's movement between frames using the machine-learned classification model.
  • the first classification step (S530) may be performed by the first classification unit 110.
  • the last layer of the motion estimation model may be combined with the initial layer of the classification model.
  • The same information as the image information input to the second classification unit 120 may be input to the first classification unit 110.
  • The joint change amount extracted from the motion estimation model may be additionally input to the first classification unit 110.
  • Quantization and dropout may be applied to reduce the size of the classification model and increase learning speed, sacrificing only as much accuracy as is compensated for by the additionally input joint change amount.
  • FIG. 6 is a diagram illustrating a computing device according to an embodiment of the present invention.
  • the computing device TN100 of FIG. 6 may be a device described in this specification (eg, the measuring device 100, etc.).
  • the computing device TN100 may include at least one processor TN110, a transceiver TN120, and a memory TN130.
  • the computing device TN100 may further include a storage device TN140, an input interface device TN150, and an output interface device TN160. Elements included in the computing device TN100 may communicate with each other by being connected by a bus TN170.
  • the processor TN110 may execute program commands stored in at least one of the memory TN130 and the storage device TN140.
  • the processor TN110 may mean a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which methods according to embodiments of the present invention are performed.
  • Processor TN110 may be configured to implement procedures, functions, methods, and the like described in relation to embodiments of the present invention.
  • the processor TN110 may control each component of the computing device TN100.
  • Each of the memory TN130 and the storage device TN140 may store various information related to the operation of the processor TN110.
  • Each of the memory TN130 and the storage device TN140 may include at least one of a volatile storage medium and a non-volatile storage medium.
  • the memory TN130 may include at least one of read only memory (ROM) and random access memory (RAM).
  • the transmitting/receiving device TN120 may transmit or receive a wired signal or a wireless signal.
  • the transmitting/receiving device TN120 may perform communication by being connected to a network.
  • The embodiments of the present invention are not implemented only through the devices and/or methods described above; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which such a program is recorded, and such an implementation can easily be achieved by those skilled in the art from the description of the embodiments above.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Surgery (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Dentistry (AREA)
  • Psychiatry (AREA)
  • Developmental Disabilities (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Child & Adolescent Psychology (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Educational Administration (AREA)
  • Hospice & Palliative Care (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Computer Networks & Wireless Communication (AREA)

Abstract

A measuring device is provided. The measuring device may comprise: an acquisition unit for acquiring image information in which a user conducting non-face-to-face learning is filmed; and a first classification unit for classifying the user's attention level for the non-face-to-face learning by analyzing the user's motion included in the image information.

Description

Apparatus and method for measuring a child's concentration on non-face-to-face learning using an ensemble technique combining motion estimation and C3D
The present invention relates to an apparatus and method for measuring a child's concentration on non-face-to-face learning.
Since the emergence of COVID-19, many countries have decided to suspend in-person classes to contain the virus, which has significantly affected the learning of millions of children and adolescents. In response, ministries of education in many countries encourage or mandate the implementation of online learning at all school levels.
As schools began to apply online learning to their curricula, time and space constraints disappeared; learners can manage their own learning cycles using applications such as Zoom, Google Meet, and Microsoft Teams, which provide virtual educational environments based on digital technology, and diverse learning content can now be organized.
However, the problems of the traditional classroom still occur in the online environment. For example, the learning materials and lessons provided to students often exceed one hour, while in an unsupervised, non-face-to-face environment a person's concentration generally does not exceed one hour. For children in particular, the attention span by age is about 5 to 7 minutes at age 2, 9 to 10 minutes at age 3, 12 to 15 minutes at age 4, 15 to 20 minutes at age 5, and about 30 minutes for the lower elementary grades. In addition, because online classes are conducted over the Internet, students are easily exposed to games, chatting, YouTube, and the like. In this situation, there is a limit to how well a single teacher can supervise many students.
The present invention provides a measuring device and method for accurately measuring, classifying, and predicting a user's, and in particular a child's, concentration on non-face-to-face learning using an ensemble technique in which a plurality of machine learning models are combined.
The measuring device of the present invention may include an acquisition unit that obtains image information in which a user performing non-face-to-face learning is photographed, and a first classification unit that classifies the user's concentration on the non-face-to-face learning through analysis of the user's motion included in the image information.
The measuring method of the present invention may include an acquisition step of obtaining image information in which a user performing non-face-to-face learning is photographed; a second classification step of extracting, using a machine-learned motion estimation model, the amount of change of the user's joints included in the frames constituting the image information; and a first classification step of classifying the user's concentration on the non-face-to-face learning according to an analysis of the user's motion between the frames using a machine-learned classification model, wherein the last layer of the motion estimation model may be coupled to an initial layer of the classification model.
The present invention proposes a deep learning ensemble methodology that operates on C3D (Convolution 3D), a type of time-series analysis model, combined with a motion estimation model, in order to build a non-face-to-face education environment in which the concentration of users, especially children, can be measured.
FIG. 1 is a schematic diagram showing the measuring device of the present invention.
FIG. 2 is a schematic diagram showing a concentration measurer of a comparative example.
FIG. 3 is a schematic diagram showing the operation of the measuring device.
FIG. 4 is a chart showing the experimental results.
FIG. 5 is a flowchart showing the measuring method of the present invention.
FIG. 6 is a diagram illustrating a computing device according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and like reference numerals designate like parts throughout the specification.
In this specification, redundant descriptions of the same components are omitted.
In this specification, when a component is referred to as being 'connected' or 'coupled' to another component, it may be directly connected or coupled to that other component, but it should be understood that another component may be present between them. In contrast, when a component is referred to as being 'directly connected' or 'directly coupled' to another component, it should be understood that no other component is present between them.
The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention.
In this specification, a singular expression may include a plural expression unless the context clearly indicates otherwise.
In this specification, terms such as 'include' or 'have' are intended only to designate that the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification exist, and should be understood not to preclude in advance the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
In this specification, the term 'and/or' includes any combination of a plurality of listed items or any one of the listed items. In this specification, 'A or B' may include 'A', 'B', or 'both A and B'.
In this specification, detailed descriptions of well-known functions and configurations that could obscure the subject matter of the present invention are omitted.
Comparative examples exist that measure a user's concentration on non-face-to-face learning through machine learning.
As one example, a comparative example is possible in which whether the user's eyes are open or closed is determined using deep-learning-based facial landmark detection.
Alternatively, a comparative example is possible in which the top of the user's head is recognized using YOLO (You Only Look Once).
The above comparative examples are possible only under the premise that the user's face always appears in the image captured by the camera 90. According to the comparative examples, when the user looks at a book instead of the display on which the non-face-to-face learning content is output, or takes notes as the lesson progresses, an error may occur in which the user is determined to have low concentration. When the user is a child, who cannot stare at the camera 90 or the display for more than about 5 minutes, there is a limit to applying such methods.
To solve this problem, the measuring device 100 and measuring method of the present invention may use boosting ensemble learning, in which two different classifiers learn sequentially and a final prediction of the degree of concentration is thereby determined. The prediction of the degree of concentration is binary: concentrating or not concentrating. Accordingly, the prediction of the degree of concentration may also be referred to as classification, determination, or judgment of the degree of concentration.
For example, the first classification unit 110 may extract the user's motion vector values between the frames constituting the captured image using C3D (Convolution 3D), which can recognize both spatial and temporal characteristics.
The second classification unit 120 may extract the joint motion vector values of the user appearing in each frame using a motion estimation model.
According to the measuring device 100 of the present invention, the general features of the image are extracted through C3D, and prediction is performed once more through motion estimation in order to analyze the movement of specific joints more precisely. When an ensemble technique combining a motion estimation model with the C3D model is applied in this way, a specific object, for example a specific user or a specific joint, can be analyzed in isolation, making an accurate measurement of the concentration of users, especially children, on non-face-to-face learning possible.
FIG. 2 is a schematic diagram showing the concentration measurer 10 of a comparative example.
The measurer 10 of the comparative example checks whether the eyes are open or closed using only the feature values corresponding to the eyes among the facial feature points extracted by deep-learning-based landmark detection.
The measurer 10 checks the degree of head bowing using a YOLO-based crown-of-head detection model.
As shown in FIG. 2, the overall system first recognizes whether the eyes are open or closed (S1); if no eyes are detected for more than 20 seconds (S2), crown-of-head recognition is attempted (S3). If an eye is detected even once while the timer is running, the flow returns to the frame input stage. If neither object (the eyes nor the top of the head) is recognized, an event classifying the degree of concentration as an unconcentrated state may occur (S4).
Although the measurer 10 of the comparative example uses two detection models to increase the accuracy of concentration measurement, measurement is possible only when a human face appears in the image. It is therefore difficult to apply the comparative example to children, who rarely stay still during learning time. Even for an adult with excellent concentration, a state in which the head is lowered for note-taking, which is an extension of learning, may be erroneously judged as an unconcentrated state. As a result, the comparative example measures the user's concentration on non-face-to-face learning inaccurately.
FIG. 1 is a schematic diagram showing the measuring device 100 of the present invention. FIG. 3 is a schematic diagram showing the operation of the measuring device 100.
The measuring device 100 of the present invention may operate differently from the comparative example, in which the degree of concentration is determined according to whether a specific part, such as the crown of the user's head or the eyes, is included in the image in which the user is captured.
For example, the measuring device 100 of the present invention may analyze the user's motions and movements regardless of whether a specific body part is visible, and may measure, classify, predict, and discriminate the user's concentration on non-face-to-face learning according to the analysis result. Classifying the degree of concentration on learning by observing the user's motion is similar to how a teacher directly judges a student's concentration. This means that the user's concentration on non-face-to-face learning can be measured accurately when the measuring device 100 of the present invention is used.
The measuring device 100 of the present invention may generate a machine-learned classification model using a deep learning model capable of accurately capturing the user's motions and movements included in the image information, for example Convolution 3D (C3D). Alternatively, the measuring device 100 may classify the user's concentration using a classification model generated by a separate learning unit or the like.
The measuring device 100 shown in FIG. 1 may include an acquisition unit 190, a first classification unit 110, and a second classification unit 120.
The acquisition unit 190 may acquire image information in which a user performing non-face-to-face learning is photographed.
The acquisition unit 190 may include a camera 90 that photographs the user. Alternatively, the acquisition unit 190 may include a communication module that communicates with the camera 90 and receives image information from the camera 90 over a wired or wireless connection.
The first classification unit 110 may analyze the user's motion included in the image information. The first classification unit 110 may classify, predict, measure, and determine the user's concentration on non-face-to-face learning through the motion analysis.
The second classification unit 120 may analyze a second element of the user that is distinct from the first element analyzed by the first classification unit 110.
Boosting ensemble learning may be applied, in which the first classification unit 110 and the second classification unit 120, which differ from each other, learn sequentially to determine the final prediction of the degree of concentration. When image information is input to either the first classification unit 110 or the second classification unit 120 to which boosting ensemble learning is applied, a processing result of the image information is output from that classification unit, and the processing result is input to the other classification unit. The processing result of the remaining classification unit may then correspond to the classification result of the degree of concentration. For example, when image information is input to the second classification unit 120, the second classification unit 120 may output a second feature included in the image information. The second feature may be input to the first classification unit 110 together with the image information. The first classification unit 110 may analyze the user's motion using the image information and the second feature, and classify the user's concentration through the motion analysis.
The first classification unit 110 may extract a first vector value representing the user's movement between the frames constituting the image information by using Convolution 3D. For example, when the user shakes a fist left and right, the movement of the entire body, or the movement of the fist as an extremity, may be expressed as a first vector value by the first classification unit 110.
The second classification unit 120 may extract a second vector value representing the movement of the user's joints within a frame by using the motion estimation model. For example, when the user shakes a fist left and right, the rotation of the forearm about the elbow, or the movement of the wrist as a joint, may be expressed as a second vector value by the second classification unit 120.
The first classification unit 110 may predict the degree of concentration according to the first vector value. The second classification unit 120 may additionally predict the degree of concentration according to the second vector value.
A first concentration predicted by the first classification unit 110 and a second concentration predicted by the second classification unit 120 may be defined. In this case, the first classification unit 110 may predict the first concentration using the first vector value and the second concentration.
The first classification unit 110 may be loaded with a machine-learned classification model that outputs a classification result of the degree of concentration when image information is input.
The classification model may classify the degree of concentration through analysis of the user's motion included in the image information. The classification model may be deep-learned using Convolution 3D.
When C3D, which can recognize both spatial and temporal features, is used, the user's motion can be extracted accurately. The performance of a model created using C3D tends to be proportional to its size. In order to go beyond determining whether a specific body part appears, as in the comparative example, and analyze the user's entire motion, the required model size can increase rapidly, which can lead to various problems. For example, C3D is a model that uses layers combining the spatial and temporal axes and can use more parameters than a basic CNN. Therefore, overfitting is likely to occur and learning is slow. Model compression may be applied to address this problem.
Model compression may refer to a technique of reducing the size (amount of data, etc.) of a deep learning model while maintaining its performance. For example, a quantization technique may be applied.
Similarly to model compression, weight regularization techniques that increase learning speed while preventing overfitting may be applied to the classification model. For example, the weight regularization techniques may include quantization and dropout.
Quantization may refer to a method of using fewer bits by truncating floating-point values.
Dropout may mean removing some of the nodes in use, in other words, excluding (dropping out) a portion of the training data, with the proportion set by a dropout rate, and then training. In particular, when dropout is applied and the dropped-out data are set differently for each batch within a mini-batch, an ensemble effect that prevents some features from overfitting can be obtained. When inferring a result, the entire input data is used, but a dropout rate may still be applied.
For example, the operation probability of each node may be calculated by the first classification unit 110 by multiplying the weight of the node by 0.6, used as the dropout rate or probability value. The first classification unit 110 may remove a node when its operation probability value is 0.1 or less. Regarding quantization, the first classification unit 110 may reduce the model by applying INT8 to convert 32-bit weight parameters into 8-bit values.
Dropout and quantization applied to the first classification model, or to the C3D stage of FIG. 3, can convert a complex model into a simple one, preventing overfitting and improving learning speed.
하지만, 특징점 정보가 축소되어 정확도가 떨어질 우려가 있다. 이를 보완하기 위해 제2 분류부(120)는 영상 정보를 구성하는 프레임 내의 특정 정보를 제1 분류부(110)에 제공할 수 있다However, there is a possibility that the feature point information is reduced and the accuracy is lowered. To compensate for this, the second classifier 120 may provide specific information within a frame constituting the image information to the first classifier 110.
일 예로, 제2 분류부(120)는 동작 추정 모델을 통하여, 특정 프레임에 포함된 사용자의 키포인트값을 기반으로 사용자의 관절의 변화량을 산출하거나 추출할 수 있다. 관절 변화량은 움직임이 많은 어깨, 팔꿈치, 골반, 무릎 부위, 목 중 적어도 하나를 대상으로 할 수 있다. 관절 변화량은 C3D에서 해당 프레임을 학습할 때 추가 매개변수 값으로 사용될 수 있다. 다시 말해, 제1 분류부(110)는 분류 모델이 특정 프레임을 학습하거나 분류할 때, 분류 모델에 영상 정보를 입력하는 동시에 관절의 변화량을 추가로 입력할 수 있다.For example, the second classification unit 120 may calculate or extract the amount of change of the user's joint based on the user's keypoint value included in the specific frame through the motion estimation model. The joint change amount may target at least one of a shoulder, an elbow, a pelvis, a knee, and a neck that have a lot of movement. The joint change amount can be used as an additional parameter value when learning a corresponding frame in C3D. In other words, when the classification model learns or classifies a specific frame, the first classifier 110 may additionally input the change amount of the joint while inputting image information to the classification model.
예를 들어, 분류 모델은 영상 정보를 구성하는 프레임 간 사용자의 동작을 분석하도록 Convolution 3D를 이용하여 딥러닝된 것일 수 있다. 제2 분류부(120)에는 특정 프레임에 포함된 사용자의 관절 변화량을 추출하는 동작 추정 모델이 탑재될 수 있다. 이때, 도 3에 도시된 바와 같이, 동작 추정 모델의 마지막 레이어가 Convolution 3D의 초기 레이어에 결합될 수 있다.For example, the classification model may be deep-learned using convolution 3D to analyze a user's motion between frames constituting image information. The second classification unit 120 may be loaded with a motion estimation model that extracts the amount of change in a user's joints included in a specific frame. In this case, as shown in FIG. 3 , the last layer of the motion estimation model may be combined with the initial layer of Convolution 3D.
이상의 측정 장치(100)에 따르면, 동작 추정의 마지막 레이어가 C3D의 초기 레이어에 결합하여 순차적으로 학습하는 부스팅 앙상블 모델이 제공될 수 있다. 축소된 분류 모델의 부족한 정보량에 대하여 동작 추정 모델의 가중치가 추가되어 빠르고 정확한 학습이 가능하고, 학습이 완료된 후에는 빠르고 정확한 정확도의 분류, 예측이 가능하다.According to the measurement device 100 described above, a boosting ensemble model in which the last layer of motion estimation is combined with the initial layer of C3D and sequentially learned may be provided. For the insufficient information amount of the reduced classification model, weights of the motion estimation model are added to enable fast and accurate learning, and after learning is completed, fast and accurate classification and prediction are possible.
FIG. 4 is a chart showing the experimental results.
The dataset used in the experiment consisted of ten children's online education videos collected by the Gyeongbuk ICT Convergence Industry Promotion Association. From each original recording, the webcam feeds of the individual participants were separated out and saved as new videos. Since five children participate in each class, there are 50 videos in total. For the experiment, each video was labeled as one of two classes: concentrating or not concentrating.
To evaluate the accuracy of the concentration measurement, an observer (teacher) reviews the non-face-to-face class video and records the start time (St) and end time (Et) of each unfocused interval, and the deep learning model records the start time (Sm) and end time (Em) of each unfocused interval it detects. From these measured times, the concentration index (FI) is calculated using Equation 1.
[Equation 1 — the concentration index (FI) computed from the measured times St, Et, Sm, and Em; the original equation appears only as an image (PCTKR2022015538-appb-img-000001) and is not reproduced here.]
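Because Equation 1 survives only as an image, its exact definition is not recoverable from this text. The sketch below assumes, purely for illustration, that FI measures the temporal agreement (intersection over union) between the observer-annotated unfocused interval (St, Et) and the model-detected interval (Sm, Em); the actual formula in the original may differ.

```python
def concentration_index(st, et, sm, em):
    """Hypothetical FI: overlap ratio (IoU) of the observer interval
    [st, et] and the model interval [sm, em]. Assumed definition only."""
    inter = max(0.0, min(et, em) - max(st, sm))
    union = max(et, em) - min(st, sm)
    return inter / union if union > 0 else 1.0

# Toy usage: observer marks 10 s-40 s as unfocused, model detects 12 s-38 s.
print(concentration_index(10, 40, 12, 38))   # ~0.87
```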
FIG. 4 compares the concentration index of the measurement device 100 of the present invention with that of the comparative example.
A comparison across three children who participated in the class showed concentration-index differences of 47% for the first child, 3% for the second child, and 38% for the last child. In the second child's video the face is sometimes visible even during the unfocused intervals, so the gap between the YOLO-based model of the comparative example and the proposed model is small; in the other two videos the face is frequently not visible, and the difference is pronounced.
With the measurement device 100 described above, a boosting ensemble model that combines C3D with motion estimation can be proposed for analyzing children's concentration during non-face-to-face learning. As a technique that ensemble-learns behavioral information together with information along the time axis, its concentration measurement index performed on average 42% better than the YOLO-based learning technique of the comparative example; if datasets from a wider variety of classes are added in the future so that more behaviors are learned, performance can improve further.
FIG. 5 is a flowchart showing the measurement method of the present invention.
The measurement method of FIG. 5 may be performed by the measurement device 100 shown in FIG. 1.
The measurement method performed by the measurement device 100 may include an acquisition step (S510), a second classification step (S520), and a first classification step (S530).
In the acquisition step (S510), image information capturing a user who is performing non-face-to-face learning may be acquired. The acquisition step (S510) may be performed by the acquisition unit 190.
In the second classification step (S520), the amount of change of the user's joints contained in the frames constituting the image information may be extracted using the machine-learned motion estimation model. The second classification step (S520) may be performed by the second classification unit 120.
In the first classification step (S530), the user's degree of concentration on the non-face-to-face learning may be classified according to an analysis of the user's movement between frames using the machine-learned classification model. The first classification step (S530) may be performed by the first classification unit 110.
At this time, the last layer of the motion estimation model may be coupled to the initial layer of the classification model. Basically, the same image information that is input to the second classification unit 120 may be input to the first classification unit 110. In addition, the joint change amount extracted by the motion estimation model may be additionally input to the first classification unit 110. Because the additionally input joint change amount compensates for the lost accuracy, quantization and dropout, which shrink the classification model and speed up training at the cost of some accuracy, can be applied. As a result, according to the measurement device 100 and measurement method of the present invention, a practical way to apply C3D to predicting a user's concentration through video analysis can be provided.
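The steps S510 to S530 can be sketched end to end as follows. This reuses the illustrative `BoostedC3D` model from the sketch earlier in this description; the stub functions, tensor shapes, and the mapping of class 1 to "concentrating" are assumptions, not details given by the patent.

```python
import torch

def acquire_clip(num_frames=8, h=32, w=32):
    """S510: stand-in for grabbing a webcam clip of the learner."""
    return torch.randn(1, 3, num_frames, h, w)

def estimate_joint_change(clip, num_joints=8):
    """S520: stand-in for the motion estimation branch; returns per-frame
    joint change amounts (random values here, for illustration only)."""
    num_frames = clip.shape[2]
    return torch.rand(1, num_frames, num_joints)

def classify_concentration(model, clip, joint_change):
    """S530: the C3D-based classifier consumes the clip plus the joint
    change amounts and returns a class index (1 assumed = concentrating)."""
    with torch.no_grad():
        logits = model(clip, joint_change)
    return int(logits.argmax(dim=1))

model = BoostedC3D()                      # illustrative model defined above
clip = acquire_clip()                     # S510
jc = estimate_joint_change(clip)          # S520
label = classify_concentration(model, clip, jc)   # S530
print("concentrating" if label == 1 else "not concentrating")
```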
FIG. 6 is a diagram illustrating a computing device according to an embodiment of the present invention. The computing device TN100 of FIG. 6 may be a device described in this specification (for example, the measurement device 100).
In the embodiment of FIG. 6, the computing device TN100 may include at least one processor TN110, a transceiver TN120, and a memory TN130. In addition, the computing device TN100 may further include a storage device TN140, an input interface device TN150, an output interface device TN160, and the like. The components included in the computing device TN100 may be connected by a bus TN170 and communicate with one another.
The processor TN110 may execute program commands stored in at least one of the memory TN130 and the storage device TN140. The processor TN110 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed. The processor TN110 may be configured to implement the procedures, functions, and methods described in relation to the embodiments of the present invention. The processor TN110 may control each component of the computing device TN100.
Each of the memory TN130 and the storage device TN140 may store various information related to the operation of the processor TN110. Each of the memory TN130 and the storage device TN140 may consist of at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory TN130 may consist of at least one of read-only memory (ROM) and random access memory (RAM).
The transceiver TN120 may transmit or receive wired or wireless signals. The transceiver TN120 may be connected to a network to perform communication.
Meanwhile, the embodiments of the present invention are not implemented only through the apparatus and/or method described so far; they may also be implemented through a program that realizes the functions corresponding to the configuration of the embodiments of the present invention, or through a recording medium on which that program is recorded, and such an implementation can easily be carried out from the description of the embodiments above by a person of ordinary skill in the art to which the present invention pertains.
Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto, and various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims (10)

  1. A measurement device comprising:
    an acquisition unit that acquires image information in which a user performing non-face-to-face learning is captured; and
    a first classification unit that classifies the user's degree of concentration on the non-face-to-face learning by analyzing the user's motion contained in the image information.
  2. The measurement device according to claim 1,
    wherein a second classification unit is provided that analyzes a second factor of the user distinct from the first factor analyzed by the first classification unit, and
    boosting ensemble learning is applied in which the mutually different first classification unit and second classification unit learn sequentially to determine the final prediction result for the degree of concentration.
  3. The measurement device according to claim 1,
    wherein the first classification unit extracts, using Convolution 3D, a first vector value representing the user's movement between the frames constituting the image information,
    a second classification unit is provided that extracts, using a motion estimation model, a second vector value representing the movement of the user's joints within a frame,
    the first classification unit predicts the degree of concentration according to the first vector value, and
    the second classification unit additionally predicts the degree of concentration according to the second vector value.
  4. The measurement device according to claim 3,
    wherein, when a first concentration predicted by the first classification unit and a second concentration predicted by the second classification unit are defined,
    the first classification unit predicts the first concentration using the first vector value and the second concentration.
  5. The measurement device according to claim 1,
    wherein the first classification unit carries a classification model machine-learned to output a classification result for the degree of concentration when the image information is input, and
    the classification model classifies the degree of concentration by analyzing the user's motion contained in the image information.
  6. The measurement device according to claim 5,
    wherein the classification model is deep-learned using Convolution 3D.
  7. The measurement device according to claim 5,
    wherein a weight normalization technique that increases the training speed while preventing overfitting is applied to the classification model.
  8. The measurement device according to claim 5,
    wherein dropout and quantization are applied to the classification model,
    a second classification unit is provided that provides the first classification unit with specific information within the frames constituting the image information,
    the second classification unit extracts the amount of change of the user's joints contained in a specific frame, and
    the first classification unit additionally inputs the joint change amount to the classification model when the classification model learns or classifies the specific frame.
  9. The measurement device according to claim 5,
    wherein the classification model is deep-learned using Convolution 3D so as to analyze the user's motion between the frames constituting the image information,
    a second classification unit is provided that carries a motion estimation model extracting the amount of change of the user's joints contained in a specific frame, and
    the last layer of the motion estimation model is coupled to the initial layer of the Convolution 3D.
  10. A measurement method performed by a measurement device, the method comprising:
    an acquisition step of acquiring image information in which a user performing non-face-to-face learning is captured;
    a second classification step of extracting, using a machine-learned motion estimation model, the amount of change of the user's joints contained in the frames constituting the image information; and
    a first classification step of classifying, using a machine-learned classification model, the user's degree of concentration on the non-face-to-face learning according to an analysis of the user's movement between the frames,
    wherein the last layer of the motion estimation model is coupled to the initial layer of the classification model.
PCT/KR2022/015538 2021-12-13 2022-10-13 Apparatus and method for measuring attention level of child for non-face-to-face learning, by using ensemble technique combining motion estimation and c3d WO2023113182A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210177857A KR20230089322A (en) 2021-12-13 2021-12-13 Apparatus and method for measuring children's concentration on non-face-to-face learning with ensemble technique combined with motion estimation and Convolution 3D
KR10-2021-0177857 2021-12-13

Publications (1)

Publication Number Publication Date
WO2023113182A1 true WO2023113182A1 (en) 2023-06-22

Family

ID=86772899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/015538 WO2023113182A1 (en) 2021-12-13 2022-10-13 Apparatus and method for measuring attention level of child for non-face-to-face learning, by using ensemble technique combining motion estimation and c3d

Country Status (2)

Country Link
KR (1) KR20230089322A (en)
WO (1) WO2023113182A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6898502B1 (en) * 2020-07-29 2021-07-07 株式会社オプティム Programs, methods and information processing equipment
KR102293234B1 (en) * 2020-09-24 2021-08-25 월드버텍 주식회사 Mutimedia education system using Artificial Intelligence and Method for supporting learning
KR20210128054A (en) * 2020-04-16 2021-10-26 클라우드라인주식회사 Online lecture system to secure enhanced class concentration
KR102330159B1 (en) * 2020-12-09 2021-11-23 주식회사 아이즈솔 Evaluation system and method of online class attention using class attitude pattern analysis
KR102336574B1 (en) * 2020-12-04 2021-12-07 (주)매트리오즈 Learning Instruction Method Using Video Images of Non-face-to-face Learners, and Management Server Used Therein

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102550964B1 (en) 2016-03-23 2023-07-05 한국전자통신연구원 Apparatus and Method for Measuring Concentrativeness using Personalization Model

Also Published As

Publication number Publication date
KR20230089322A (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Sahoo et al. Sign language recognition: State of the art
US6128397A (en) Method for finding all frontal faces in arbitrarily complex visual scenes
WO2020182121A1 (en) Expression recognition method and related device
Bhavana et al. Hand sign recognition using CNN
WO2020140723A1 (en) Method, apparatus and device for detecting dynamic facial expression, and storage medium
Fu et al. University classroom attendance based on deep learning
WO2017164478A1 (en) Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics
WO2018212584A2 (en) Method and apparatus for classifying class, to which sentence belongs, using deep neural network
Bendarkar et al. Web based recognition and translation of American sign language with CNN and RNN
CN110413551B (en) Information processing apparatus, method and device
CN110457523A (en) The choosing method of cover picture, the training method of model, device and medium
Punsara et al. IoT based sign language recognition system
Agarwal et al. Face recognition based smart and robust attendance monitoring using deep CNN
Zhu et al. Unsupervised voice-face representation learning by cross-modal prototype contrast
WO2024054079A1 (en) Artificial intelligence mirroring play bag
WO2023113182A1 (en) Apparatus and method for measuring attention level of child for non-face-to-face learning, by using ensemble technique combining motion estimation and c3d
Zhao et al. Automated assessment system for neonatal endotracheal intubation using dilated convolutional neural network
Rozaliev et al. Detailed analysis of postures and gestures for the identification of human emotional reactions
Sarmah et al. Facial identification expression-based attendance monitoring and emotion detection—A deep CNN approach
WO2022131793A1 (en) Method and apparatus for recognizing handwriting inputs in multiple-user environment
WO2022191366A1 (en) Electronic device and method of controlling same
Rai et al. Gesture recognition system
Pradeep et al. Advancement of sign language recognition through technology using python and OpenCV
Riaz et al. Surface EMG Real-Time Chinese Language Recognition Using Artificial Neural Networks
Alleema et al. Recognition of American sign language using modified deep residual CNN with modified canny edge segmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22907653

Country of ref document: EP

Kind code of ref document: A1