WO2023243398A1 - Unusual behavior discrimination method, unusual behavior discrimination program, and unusual behavior discrimination device - Google Patents


Info

Publication number
WO2023243398A1
WO2023243398A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior
exceptional
exceptional behavior
feature
behavioral
Prior art date
Application number
PCT/JP2023/020081
Other languages
French (fr)
Japanese (ja)
Inventor
文彬 佐藤
大気 関井
遼 八馬
Original Assignee
Konica Minolta, Inc. (コニカミノルタ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Konica Minolta, Inc.
Publication of WO2023243398A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/20 — Analysis of motion
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/70 — Arrangements using pattern recognition or machine learning
    • G06V 10/82 — Arrangements using neural networks

Definitions

  • The present disclosure relates to an exceptional behavior determination method, an exceptional behavior determination program, and an exceptional behavior determination device, and particularly to a technique for reducing the learning cost of exceptional behavior determination processing that uses a machine learning model.
  • When the feature extractor (encoder) 1201 sequentially receives T pieces of skeleton information corresponding to the first through T-th frames, it extracts feature quantities from them.
  • The reconstructor (decoder) 1202 uses the features extracted by the feature extractor 1201 to sequentially generate T pieces of reconstruction information for reconstructing the T pieces of skeleton information corresponding to the first through T-th frames.
  • A multilayer perceptron (MLP) 1203 uses the T pieces of reconstruction information generated by the reconstructor 1202 to reconstruct the T pieces of skeleton information corresponding to the first through T-th frames.
  • The predictor 1204 uses the features extracted by the feature extractor 1201 to sequentially generate P pieces of prediction information for predicting the P pieces of skeleton information corresponding to the (T+1)-th through (T+P)-th frames.
  • The multilayer perceptron 1205 uses the P pieces of prediction information generated by the predictor 1204 to predict the P pieces of skeleton information corresponding to the (T+1)-th through (T+P)-th frames.
  • An anomaly score is calculated by a loss function 1216 from the reconstruction error of the T pieces of reconstructed skeleton information and the prediction error of the P pieces of predicted skeleton information.
  • The MPED-RNN (Message-Passing Encoder-Decoder Recurrent Neural Network) 1200, which is composed of the feature extractor 1201, the reconstructor 1202, the predictor 1204, and the multilayer perceptrons 1203 and 1205, is machine-learned so as to reduce the anomaly score for video footage of people behaving normally.
  • As a result, the anomaly score also remains low for people behaving normally in video footage other than the footage used for machine learning. Therefore, a person with a high anomaly score can be determined to be exhibiting exceptional behavior that differs from the normal behavior machine-learned in advance.
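As a concrete illustration of how such an anomaly score can combine the two error terms, the following sketch computes a weighted sum of mean-squared reconstruction and prediction errors. The weights, array shapes, and the use of plain MSE are assumptions for illustration; the publication's actual loss function 1216 is not reproduced here.

```python
import numpy as np

def anomaly_score(skeletons, reconstructed, future_truth, predicted,
                  w_rec=0.5, w_pred=0.5):
    """Combine reconstruction and prediction errors into one anomaly score.

    skeletons:     (T, J, 2) observed joint coordinates for frames 1..T
    reconstructed: (T, J, 2) reconstruction of those frames
    future_truth:  (P, J, 2) observed joints for frames T+1..T+P
    predicted:     (P, J, 2) the network's prediction of those frames
    """
    rec_err = np.mean((skeletons - reconstructed) ** 2)   # reconstruction error
    pred_err = np.mean((future_truth - predicted) ** 2)   # prediction error
    return w_rec * rec_err + w_pred * pred_err
```

A perfectly reconstructed and predicted (i.e., normal) trajectory yields a score of zero, while either error term raises the score.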
  • When determining exceptional behavior from video footage shot by a surveillance camera, if the MPED-RNN 1200 is machine-learned using video footage actually shot at the location where the surveillance camera is installed (hereinafter referred to as "on-site footage") as training data, the normal behaviors performed at that shooting site are more likely to be covered without excess or deficiency, so it is thought that the range of behaviors determined to be normal can be appropriately narrowed.
  • However, the MPED-RNN 1200 needs to extract features from the skeleton information of each person and then reconstruct and predict skeleton information from the extracted features, so there is a problem that the processing load is high not only during machine learning but also during exceptional behavior determination, and the determination process takes time.
  • The present disclosure has been made in view of the above problems, and its purpose is to provide an exceptional behavior determination method, an exceptional behavior determination program, and an exceptional behavior determination device that can reduce both the learning cost and the determination-processing cost of a machine learning model for determining exceptional behavior.
  • An exceptional behavior determination method according to the present disclosure is a method for determining exceptional behavior of an image object shown in a video image, and includes: a feature point extraction step of extracting feature points of the image object for each frame of the video image; a behavioral feature extraction step of extracting behavioral features from the feature point extraction results using a machine learning model; and a behavior determination step of determining the behavior related to the extracted behavioral features to be an exceptional behavior when its statistical frequency of occurrence at the shooting site of the video image is lower than a standard. The machine learning model has been machine-learned using feature point extraction results obtained for each frame of learning video footage, and the learning video footage does not include information regarding whether the behavior of the image objects shown in it is an exceptional behavior.
  • The learning video footage may include at least video footage shot at a location different from the shooting location of the video image subject to determination.
  • the feature points may be joint points of the image object.
  • the feature points may be vertices of a rectangle surrounding the image object.
  • The extraction result of a feature point may include the coordinate values of the feature point in the frame from which it was extracted, and the order of that frame within the video image.
  • The information regarding a feature point may include at least one of: a detection score indicating how plausibly the feature point was detected, a label indicating the type of the image object, an attribute indicating the type of the feature point, and an attribute representing the appearance of the image object.
  • feature points may be extracted from one or more frames included in the video image.
  • feature points may be extracted from one or more frames included in the video image by neural calculation using a machine learning model.
  • the machine learning model that performs the neural calculation may include at least one of a convolutional neural network and a self-attention mechanism.
  • the behavioral feature amount may be extracted using a machine learning model that receives information regarding the feature points as input.
  • the machine learning model may be a permutation invariant deep neural network.
  • the deep neural network may be PointNet.
  • Behavioral features may also be extracted from preceding video footage shot at the shooting site before the video footage subject to determination, and the extracted behavioral features may be used to adjust the range of behavioral features determined to be exceptional behavior.
  • The behavior determination step may calculate statistics of a statistical distribution of behavioral features from the behavioral features extracted from the preceding video footage, and use the calculated statistics to determine the range of behavioral features to be determined as exceptional behavior.
  • The statistical distribution of the behavioral features may be a Gaussian distribution or a Gaussian mixture distribution.
  • The method may also include an exceptional behavior classification step of determining which of the known specific behaviors that the image object can take the exceptional behavior is similar to.
  • In the exceptional behavior classification step, statistics of the statistical distribution of behavioral features related to a specific behavior may be calculated from video footage showing an image object taking that specific behavior, and the calculated statistics may be used to determine the range of behavioral features of exceptional behaviors similar to the specific behavior.
  • The degree of similarity between the specific behavior and the exceptional behavior may be calculated using a machine learning model that has been machine-learned using video footage showing an image object taking the specific behavior.
  • An exceptional behavior determination program according to the present disclosure is a program that causes a computer to determine exceptional behavior of an image object shown in a video image, and causes the computer to execute: a feature point extraction step of extracting feature points of the image object for each frame of the video image; a behavioral feature extraction step of extracting behavioral features from the feature point extraction results using a machine learning model; and a behavior determination step of determining the behavior related to the extracted behavioral features to be an exceptional behavior when its statistical frequency of occurrence at the shooting site of the video image is lower than a standard. The machine learning model has been machine-learned using the extraction results of feature points of image objects obtained for each frame of learning video footage, and the learning video footage does not include information regarding whether the behavior of the image objects shown in it is an exceptional behavior.
  • An exceptional behavior determination device according to the present disclosure is a device that determines exceptional behavior of an image object shown in a video image, and includes: feature point extraction means for extracting feature points of the image object for each frame of the video image; behavioral feature extraction means for extracting behavioral features from the feature point extraction results using a machine learning model; and behavior determination means for determining the behavior related to the extracted behavioral features to be an exceptional behavior when its statistical frequency of occurrence at the shooting site of the video image is lower than a standard. The machine learning model has been machine-learned using the extraction results of feature points obtained for each frame of learning video footage, and the learning video footage does not include information regarding whether the behavior of the image objects shown in it is an exceptional behavior.
  • With the above configuration, learning video footage that does not include information regarding whether the behavior of the image objects is an exceptional behavior can be used for machine learning of the machine learning model.
  • the processing load during the exceptional behavior determination process can be reduced and the processing speed can be improved.
  • FIG. 1 is a diagram showing the main system configuration of a video surveillance system 1 according to a first embodiment of the present disclosure.
  • FIG. 2 is a block diagram showing the main configuration of an exceptional behavior determination device 100 according to the first embodiment of the present disclosure.
  • FIG. 3 is a block diagram showing the main configuration of an exceptional behavior determination program 3 according to the first embodiment of the present disclosure.
  • FIG. 4 is a diagram schematically illustrating skeleton information extracted by a skeleton information extractor 301 of the exceptional behavior determination program 3.
  • FIG. 5 is a diagram illustrating the statistical distribution of behavioral features extracted from video footage shot by a surveillance camera 110 using the machine-learned skeleton information extractor 301 and a behavioral feature extractor 302.
  • FIG. 6 is a block diagram showing the main configuration of a specific exceptional behavior classification program 8 according to a second embodiment of the present disclosure.
  • FIG. 7 is a diagram illustrating the statistical distribution of behavioral features of specific behaviors extracted from video footage included in a behavior recognition dataset using the machine-learned skeleton information extractor 301 and behavioral feature extractor 302.
  • FIG. 8 is a flowchart illustrating a process of estimating the mean value and covariance matrix of the statistical distribution of behavioral features of a specific behavior extracted from video footage included in the behavior recognition dataset using the machine-learned skeleton information extractor 301 and behavioral feature extractor 302.
  • the video surveillance system 1 includes an exceptional behavior determination device 100 and a surveillance camera 110 connected by a cable 120.
  • A plurality of surveillance cameras 110 may be connected to the exceptional behavior determination device 100.
  • the method of connecting the exceptional behavior determination device 100 and the surveillance camera 110 is not limited to the cable 120, but may also be a wireless connection.
  • The surveillance camera 110 is installed facing a predetermined shooting direction and continuously shoots the location to be monitored.
  • the video image taken by the surveillance camera 110 consists of a plurality of frame images.
  • the video image taken by the surveillance camera 110 is transmitted to the exceptional behavior determination device 100 as digital moving image data.
  • the video image taken by the surveillance camera 110 may be read out from the surveillance camera 110 by the exceptional behavior determination device 100 at any time.
  • The exceptional behavior determination device 100 sequentially executes exceptional behavior determination processing using the video images continuously received from the surveillance camera 110 or read out from it as needed.
  • A person 130 may appear in the video image shot by the surveillance camera 110.
  • In the present embodiment, an example will be described in which exceptional behavior of a person shown in a video image is determined, but it goes without saying that the present disclosure is not limited to this; exceptional behaviors of image objects other than people may also be determined.
  • Image objects other than people include, for example, vehicles in road traffic video footage, production machines in factory video footage, and livestock in ranch video footage. For example, accidents can be detected from the exceptional behavior of vehicles in road traffic.
  • the exceptional behavior determination device 100 is a so-called computer, and includes an exceptional behavior determination device main body 101, a display section 102, and an input section 103.
  • the display unit 102 is a device for the exceptional behavior determination device 100 to present information to the user.
  • For example, a liquid crystal display (LCD) can be used as the display unit 102.
  • the display unit 102 may include multiple display screens.
  • the display unit 102 can be used to display a video image taken by the surveillance camera 110, or to display a determination result of an exceptional behavior using the video image.
  • the input unit 103 is a device for the exceptional behavior determination device 100 to receive instructions from the user.
  • The input unit 103 is, for example, a keyboard and a pointing device.
  • As the pointing device, a mouse may be used, or a device other than a mouse, such as a trackball.
  • a touch pad may be used as the input unit 103.
  • A touch pad may be attached so as to cover the display screen of the display unit 102, in which case the display unit 102 and the input unit 103 constitute a touch panel.
  • the exceptional behavior determination device body 101 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a NIC (Network Interface Card) 204, and a storage unit 205. These are connected to each other using an internal bus 211 so that they can communicate with each other.
  • When the CPU 201 is reset by turning on the power to the exceptional behavior determination device main body 101, it reads and starts the boot program from the ROM 202, uses the RAM 203 as a working storage area, and executes software such as the operating system (OS) and the exceptional behavior determination program read from the storage unit 205.
  • The ROM 202 and the RAM 203 are semiconductor memories.
  • the ROM 202 is a nonvolatile semiconductor memory.
  • the NIC 204 executes processing for communicating with other devices via a communication network such as a LAN (Local Area Network) or the Internet.
  • This communication may be wired communication or wireless communication.
  • Communication may also be performed by directly connecting another device, such as a USB (Universal Serial Bus) device, to the exceptional behavior determination device 100.
  • The storage unit 205 is a large-capacity storage device; it may use not only a storage device built into the exceptional behavior determination device 100, such as a hard disk drive (HDD), but also an external storage device such as cloud storage.
  • The storage unit 205 also stores the video footage shot by the surveillance camera 110, the parameters and learning data of the neural network that is the machine learning model used in the exceptional behavior determination program described later, and the determination results of the exceptional behavior determination program.
  • The CPU 201 accesses the display unit 102, the input unit 103, and the surveillance camera 110 via the internal bus 211. Further, the video footage shot by the surveillance camera 110 is stored in the storage unit 205 via the cable 120 and the internal bus 211.
  • the exceptional behavior determination program uses a machine learning model, particularly a neural network, to determine exceptional behavior shown in a video image.
  • the exceptional behavior discrimination program 3 includes a skeleton information extractor 301, a behavioral feature extractor 302, and an exceptional behavior discriminator 303.
  • Skeleton information extractor 301: When the skeleton information extractor 301 receives a video image as input, it performs neural calculation using a machine learning model to extract, for each frame constituting the video image, information regarding each joint point that constitutes the skeleton of an image object such as a person.
  • The information regarding a joint point consists of the coordinates of the joint point, a detection score, a label representing the type of the image object, an attribute representing the type of the joint point, and an attribute representing the appearance of the image object.
  • the coordinates of a joint point are the XY coordinate values of the joint point within a frame and the frame number of the frame.
  • The detection score of a joint point is a probability indicating how plausibly the joint point was detected: it is close to 1 if it is certain that the joint point is located at the detected coordinates, and becomes small if it is not very certain.
  • The label representing the type of the image object is information representing the class to which the image object including the joint point belongs. For example, when the label value is "0" the image object may be a "person", when "1" a "shovel", and when "2" a "racquet".
  • a vector amount having these as components may be used as a label of the image object.
  • The attribute of a joint point is the class to which the joint point belongs. For example, a joint point attribute of "0" can represent an "elbow", "1" a "wrist", and "2" a "shoulder".
  • a vector quantity having these as components may be used as an attribute of a joint point.
  • the attribute representing the appearance of an image object is information representing the class to which the appearance of the image object including the joint point belongs.
  • An attribute representing the appearance of an image object may also be a vector quantity.
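Collected together, the pieces of joint point information described above amount to one record per detected joint point. A hypothetical sketch of such a record follows; field names, types, and example values are illustrative and not taken from the publication:

```python
from dataclasses import dataclass

@dataclass
class JointPoint:
    x: float            # X coordinate of the joint point within the frame
    y: float            # Y coordinate of the joint point within the frame
    frame_no: int       # order of the frame within the video image
    score: float        # detection score in [0, 1]; near 1 means confident
    object_label: int   # type of image object, e.g. 0 = "person"
    joint_attr: int     # type of joint point, e.g. 0 = "elbow"

# One frame's skeleton information is then a list of such records.
elbow = JointPoint(x=120.5, y=88.0, frame_no=1, score=0.97,
                   object_label=0, joint_attr=0)
```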
  • OpenPose can be used as the skeleton information extractor 301.
  • OpenPose is software developed by Carnegie Mellon University's CTTEC (Center for Technology Transfer and Enterprise Creation) that can detect the joint points of multiple people, and the connections between those joint points, in real time (see Patent Document 2).
  • the skeleton information extractor 301 preferably includes at least one of a convolutional neural network (CNN) and a self-attention mechanism.
  • the information regarding the joint points only needs to include at least the coordinates of the joint points.
  • The joint point detection score, the label representing the type of the image object, the joint point attribute, and the attribute representing the appearance of the image object need not all be used as information regarding the joint point; alternatively, at least one or more of them may be used.
  • The skeleton information extractor 301 outputs the skeleton information extracted from the video image to the behavioral feature extractor 302.
  • FIG. 4 is a diagram schematically showing the XY coordinates of joint points of the skeletal information extracted by the skeletal information extractor 301 for a person walking indoors.
  • the joint coordinates of a person walking indoors are extracted for each frame from frame #1 to frame #4 to constitute skeletal information.
  • the wall surface serving as the background of the walking person is not included in the skeleton information.
  • an information extractor that extracts information other than skeletal information from the video image may be used.
  • information other than skeletal information may be used.
  • Instead of the skeleton information, information regarding the vertices of a rectangular area surrounding the image object may be extracted for each frame.
  • As the coordinates of a vertex, the XY coordinate values of the vertex within the frame and the frame number are used.
  • As the detection score of a vertex, a probability indicating that the vertex has been plausibly detected is used.
  • As the label of the image object, information representing the class to which the image object surrounded by the rectangular area having the vertex belongs is used.
  • the attribute of a vertex represents the position of the vertex in the rectangular area that includes the vertex. For example, when the attribute of a vertex is "0", the vertex is the "top left” vertex of the rectangular area, and when the attribute is "1", it is the "top right” vertex.
  • YOLO is a deep neural network (DNN) that extracts a bounding box surrounding an image object included in an image and a predicted class probability to which the image object belongs (see Non-Patent Document 3).
  • A DNN refers to a multilayer neural network, particularly one with four or more layers. Because DNNs are expected to extract features with high accuracy, they are being applied not only to image recognition processing but also to a variety of other fields.
  • the behavioral feature quantity is a vector quantity. Each component of the vector corresponds to a behavior label of an image object such as a person related to the behavior feature amount.
  • a behavioral feature is a fixed-length vector in the sense that the number of components is fixed.
  • Instead of the skeleton information, the behavioral feature extractor 302 may also extract behavioral features using the information regarding the vertices of the rectangular areas.
  • the behavior feature is a fixed-length vector in which each component corresponds to the behavior label of the image object.
  • a deep neural network can be used for the behavioral feature extractor 302.
  • In the present embodiment, PointNet is used as the DNN that extracts behavioral features from the skeleton information (see Non-Patent Document 4).
  • PointNet is a permutation-invariant neural network: even if the order of the pieces of joint point information in the skeleton information received from the skeleton information extractor 301 is changed, the same behavioral features are extracted.
  • Because PointNet does not include processing that combines feature values across joint points, even if some joint points are not detected or are detected at incorrect positions, incorrect feature values have little influence as they propagate to subsequent layers. Therefore, even if errors occur in joint point detection when an image object takes an exceptional behavior, behavioral features with little variation can be extracted, and it is possible to determine from the statistical distribution of the frequency of occurrence of the behavioral features whether the behavior of the image object is an exceptional behavior.
  • the behavioral feature extractor 302 may extract behavioral features using a neural network other than PointNet. Furthermore, the behavioral feature extractor 302 may extract behavioral features using a neural network other than Permutation Invariant. In either case, the purpose of the present disclosure can be achieved by determining the exceptional behavior using the exceptional behavior discriminator 303 as described below.
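The permutation invariance discussed above comes from applying one shared transform to every joint point and then aggregating with a symmetric function such as max pooling. The following toy sketch demonstrates this property with random weights and made-up dimensions; it is not the actual PointNet architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 16))   # shared per-point transform
W2 = rng.standard_normal((16, 8))   # maps pooled vector to a fixed-length feature

def extract_feature(points):
    """points: (N, 4) array, one row per joint point (e.g. x, y, score, attr)."""
    h = np.maximum(points @ W1, 0.0)   # same transform applied to every point
    pooled = h.max(axis=0)             # symmetric aggregation: max pooling
    return pooled @ W2                 # fixed-length behavioral feature

pts = rng.standard_normal((17, 4))
f1 = extract_feature(pts)
f2 = extract_feature(pts[::-1])        # same points, reversed order
assert np.allclose(f1, f2)             # point order does not change the feature
```

Because the per-point transform is shared and max pooling ignores ordering, shuffling the rows of `pts` always yields the same feature vector.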
  • In the present embodiment, the video footage included in a behavior recognition dataset is used as the learning video footage, and machine learning of the behavioral feature extractor 302 is performed using the skeleton information extracted from that footage by the skeleton information extractor 301.
  • the behavior recognition data set usually does not include video footage shot at the location where the surveillance camera 110 is installed, and includes at least video footage other than the video footage shot at the location where the surveillance camera 110 is installed. That is, the action recognition data set includes video footage taken at a location different from where the surveillance camera 110 is installed.
  • There is no guarantee that the statistical distribution of the frequency of occurrence of the behavioral features extracted from the video footage included in the behavior recognition dataset matches the statistical distribution of the behavioral features extracted only from video footage shot at the location where the surveillance camera 110 is installed.
  • Moreover, the video footage included in the behavior recognition dataset does not contain information about whether the behavior of the image objects shown in it would correspond to exceptional behavior in video footage shot at the location where the surveillance camera 110 is installed.
  • the video footage taken at the location where the surveillance camera 110 is installed is not used for machine learning by the behavioral feature extractor 302.
  • the learning cost can be significantly reduced.
  • the behavior feature extractor 302 outputs the behavior feature extracted from the skeleton information to the exceptional behavior discriminator 303.
  • (1-3-3) Exceptional behavior discriminator 303: The exceptional behavior discriminator 303 determines, from the behavioral feature received from the behavioral feature extractor 302, whether the behavior of the image object related to that feature is an exceptional behavior.
  • Specifically, a statistical distribution of behavioral features extracted from video footage shot at the location where the surveillance camera 110 is installed (hereinafter referred to as the "on-site statistical distribution") is obtained in advance, and the Mahalanobis distance D1 of a behavioral feature from the center of this distribution is compared with a threshold Dth1.
  • When the Mahalanobis distance D1 is larger than the threshold Dth1, it means that the frequency with which the behavioral feature is extracted from video footage shot by the surveillance camera 110 at its installation location is at or below the frequency of occurrence corresponding to the threshold Dth1.
  • In this case, the behavior related to the behavioral feature is determined to be an exceptional behavior, because its statistical frequency of occurrence at the shooting site is at or below the frequency corresponding to the threshold Dth1, that is, the behavior occurs only rarely.
  • the Mahalanobis distance D1 is less than or equal to the threshold value Dth1, it is determined that the behavior is not an exceptional behavior because the behavior related to the behavior feature has a high statistical frequency of occurrence at the shooting site of the video image.
  • In advance, the skeleton information extractor 301 extracts skeleton information from video footage shot by the surveillance camera at its installation location, and the behavioral feature extractor 302 extracts behavioral features from the obtained skeleton information.
  • The Mahalanobis distance D1 of a behavioral feature x from the center of the on-site statistical distribution is given by D1(x) = √((x − μ)^t Σ^(−1) (x − μ)), where t denotes vector transposition.
  • μ1, ..., μn are the mean values of the respective components of the behavioral features in the on-site statistical distribution, and the mean μ is the vector having these as components.
  • x1, ..., xn are the respective components of the behavioral feature x extracted by the behavioral feature extractor 302.
  • Σ is the covariance matrix of the on-site statistical distribution.
  • The covariance matrix of the sample values is the maximum likelihood estimate of the covariance matrix of the on-site statistical distribution, and this maximum likelihood estimate is used as the covariance matrix Σ.
  • the mean value ⁇ and the covariance matrix ⁇ are statistics related to the statistical distribution of the field.
  • With a simple distance measure, the result of determining exceptional behavior would be strongly influenced by the components of the behavioral feature that have a large variance in the statistical distribution of the site.
  • In the Mahalanobis distance, the deviation from the average is multiplied by the inverse of the covariance matrix of the statistical distribution of the site, so whether a behavior is an exceptional behavior can be determined without being unduly influenced by such components.
  • For example, in the case where the behavioral feature is a two-dimensional vector, when the Mahalanobis distance D1 from the center of the statistical distribution of the site is larger than the threshold Dth1, the behavior related to the behavioral feature is determined to be an exceptional behavior.
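As an illustration only (not part of the disclosure), the decision rule above can be sketched in Python; the two-dimensional feature, the placeholder statistics, and the threshold value are assumptions chosen to show how components with different variances affect D1:

```python
import numpy as np

def mahalanobis_d1(x, mu, cov):
    """Mahalanobis distance from the behavioral feature x to the
    center mu of the site's statistical distribution."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def is_exceptional(x, mu, cov, d_th1):
    """Exceptional if D1 exceeds the threshold Dth1."""
    return mahalanobis_d1(x, mu, cov) > d_th1

# Two-dimensional behavioral feature, as in the example above.
mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],    # large variance along the 1st component
                [0.0, 0.25]])  # small variance along the 2nd component
d_th1 = 3.0

# Same Euclidean distance from the center, but different D1:
print(is_exceptional(np.array([2.0, 0.0]), mu, cov, d_th1))  # False: D1 = 1.0
print(is_exceptional(np.array([0.0, 2.0]), mu, cov, d_th1))  # True:  D1 = 4.0
```

Because D1 weights each component by its inverse variance, a deviation along a low-variance (rarely varying) component counts for more than the same deviation along a high-variance one.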
  • (1-4) Machine learning of the skeletal information extractor 301 and the behavioral feature extractor 302: As described above, the skeletal information extractor 301 and the behavioral feature extractor 302 are both neural networks, and machine learning must be performed to determine the necessary parameters before they are used for exceptional behavior determination.
  • In the present embodiment, an existing behavior recognition dataset is used for this machine learning instead of actual video footage captured by the surveillance camera 110 (hereinafter referred to as "on-site footage").
  • General open data can be used as this behavior recognition dataset.
  • Therefore, the machine learning of the skeletal information extractor 301 and the behavioral feature extractor 302 can be completed at the time of developing the exceptional behavior discrimination device 100. Furthermore, the parameters obtained through this machine learning can be used regardless of where the surveillance camera is installed.
  • Note that the exceptional behavior discriminator 303 does not need machine learning in the first place.
  • (1-4-1) Machine learning of the skeletal information extractor 301: Since the skeletal information extractor 301 extracts information about joint points for each frame of the video footage, its machine learning uses a behavior recognition dataset of still images. During machine learning, a still image for learning is first input, and the skeletal information extractor 301 is made to output skeletal information.
  • Next, an error function (also referred to as a loss function) is used to calculate the error between the correct skeletal information included in the behavior recognition dataset and the skeletal information output by the skeletal information extractor 301.
  • Then, the parameters of the skeletal information extractor 301 are corrected using backpropagation.
  • Next, a still image for learning is input to the skeletal information extractor 301 whose parameters have been corrected, skeletal information is output, and the error is calculated.
  • This still image may be the same still image as the previously used still image, or may be a different still image.
  • Then, the parameters of the skeletal information extractor 301 are further corrected using backpropagation in the same manner as above, and the above processing is repeated. When the obtained error becomes less than or equal to a predetermined threshold, machine learning is terminated.
  • (1-4-2) Machine learning of the behavioral feature extractor 302: The machine learning of the behavioral feature extractor 302 is performed after the machine learning of the skeletal information extractor 301 is completed.
  • For the machine learning of the behavioral feature extractor 302, the video footage of the behavior recognition dataset is input frame by frame to the skeletal information extractor 301, and the obtained skeletal information is used as machine learning data. In this machine learning, a behavior teacher label is prepared for each video footage.
  • First, the skeletal information is input to the behavioral feature extractor 302 to extract behavioral features. A score for each behavior label is then calculated from the extracted behavioral features. The calculated score for each behavior label is compared with the behavior teacher label prepared in advance to obtain an error. An error function such as the cross-entropy error is used to calculate this error.
  • Next, the parameters of the behavioral feature extractor 302 are corrected using backpropagation, the skeletal information is input to the behavioral feature extractor 302 with the corrected parameters, and the error is calculated again.
  • This skeletal information may be the same skeletal information as the previously used skeletal information, or may be different skeletal information.
  • Then, the parameters of the behavioral feature extractor 302 are further corrected using backpropagation in the same way as above, and the above process is repeated. When the obtained error becomes less than or equal to a predetermined threshold, machine learning is terminated.
  • In this way, the behavioral feature extractor 302 undergoes machine learning as a classification problem.
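The training loop described in (1-4-2) — scores per behavior label, cross-entropy error against the teacher label, parameter correction by backpropagation, repeated — can be sketched with a minimal softmax classifier standing in for the behavioral feature extractor 302; the synthetic "skeletal information" samples and labels are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "skeletal information" samples and behavior teacher labels.
n, d, n_labels = 60, 6, 3
X = rng.normal(size=(n, d))
labels = rng.integers(0, n_labels, size=n)
X[np.arange(n), labels] += 3.0         # make the classes separable

W = np.zeros((d, n_labels))            # parameters to be learned

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for step in range(300):
    p = softmax(X @ W)                  # score for each behavior label
    # cross-entropy error against the teacher labels
    ce = -np.mean(np.log(p[np.arange(n), labels]))
    onehot = np.eye(n_labels)[labels]
    W -= 0.5 * X.T @ (p - onehot) / n   # backpropagation step

accuracy = np.mean((X @ W).argmax(axis=1) == labels)
print(f"cross-entropy: {ce:.3f}, training accuracy: {accuracy:.2f}")
```

In the actual device the classification head is discarded after training; only the intermediate behavioral features are used by the exceptional behavior discriminator 303.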
  • (1-5) Identification of the statistical distribution of behavioral features used in the exceptional behavior discriminator 303: As described above, the exceptional behavior discriminator 303 uses the statistical distribution of behavioral features at the shooting site of the surveillance camera 110 (the statistical distribution of the site) to determine exceptional behavior.
  • To identify this statistical distribution, the skeletal information extractor 301 and the behavioral feature extractor 302, which have undergone machine learning, are used.
  • First, the detection frequency of behavioral features is initialized to 0 (S601). If the number of values that the behavioral feature can take is small, the detection frequency may be counted for each possible value. If there are many possible values, the detection frequency may be counted for each range of values that the behavioral feature can take.
  • Next, the video footage captured by the surveillance camera 110 at the shooting site is input frame by frame to the skeletal information extractor 301 to extract skeletal information (S602), and the obtained skeletal information is input to the behavioral feature extractor 302 to extract behavioral features (S603).
  • Then, the detection frequency of the behavioral feature obtained in this way, or the detection frequency of the range including the behavioral feature, is updated (S604). That is, the count value is incremented by one.
  • Note that this video footage is a preceding video footage in the sense that it is input to the exceptional behavior determination program 3 before the video footage that is input to the exceptional behavior determination program 3 for determining exceptional behavior.
  • After that, if there is video footage taken by the surveillance camera 110 at the shooting site that has not yet been used for counting the detection frequency of behavioral features (S605: YES), the process returns to step S602 and the same processing as above is performed using the unused video footage.
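The counting procedure (S601–S605) amounts to maintaining a histogram of detection frequencies over ranges of behavioral feature values; a minimal sketch, with one-dimensional placeholder features and assumed bin edges:

```python
import numpy as np

rng = np.random.default_rng(4)

# S601: detection frequency initialized to 0, one counter per range
# (bin) of behavioral feature values; the bin edges are placeholders.
edges = np.linspace(-3.0, 3.0, 7)          # 6 ranges per dimension
counts = np.zeros(len(edges) - 1, dtype=int)

# S602-S605: each behavioral feature extracted from the preceding
# footage increments the counter of the range that contains it (S604).
for x in rng.normal(size=200):             # 1-D placeholder features
    i = np.clip(np.searchsorted(edges, x) - 1, 0, len(counts) - 1)
    counts[i] += 1

print(counts, counts.sum())
```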
  • When all the video footage has been used (S605: NO), the average value and covariance matrix of the detection frequencies (sample values) of the behavioral features are calculated using the fact that the detection frequencies of the behavioral features follow a multivariate Gaussian distribution, and these are taken as the estimates of the average value μ and the covariance matrix Σ (S606), after which the process ends.
  • Using the statistics thus obtained, the Mahalanobis distance D1 can be calculated as described above.
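The estimation in step S606 (sample mean and covariance as maximum likelihood estimates under a multivariate Gaussian assumption) amounts to the following; the samples here are synthetic placeholders standing in for behavioral features extracted from the preceding footage:

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder behavioral features, drawn from a known Gaussian so the
# estimates can be checked against the ground truth.
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.6], [0.6, 0.5]])
samples = rng.multivariate_normal(true_mu, true_cov, size=5000)

# S606: maximum likelihood estimates of the mean and covariance.
mu_hat = samples.mean(axis=0)
centered = samples - mu_hat
cov_hat = centered.T @ centered / len(samples)   # MLE divides by N, not N-1

print("estimated mean:", mu_hat)
print("estimated covariance:\n", cov_hat)
```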
  • Using the statistics obtained above, the exceptional behavior determination program 3 determines exceptional behavior as follows.
  • First, the exceptional behavior discrimination device 100 inputs the video footage captured by the surveillance camera 110 to the skeletal information extractor 301, as shown in FIG. 7 (S701).
  • In this case, it is not necessary to input all the frames constituting the video footage to the skeletal information extractor 301; they may be input at intervals of a predetermined number of frames. Further, the number of frames input to the skeletal information extractor 301 at one time is determined according to the skeletal information input to the behavioral feature extractor 302.
  • If the video footage input to the skeletal information extractor 301 does not include an image object that has been machine learned in advance, such as a person (S702: NO), the process returns to step S701 and the above processing is repeated.
  • Otherwise (S702: YES), the skeletal information extracted by the skeletal information extractor 301 is input to the behavioral feature extractor 302, and behavioral features are extracted (S703).
  • Next, the exceptional behavior discriminator 303 calculates the Mahalanobis distance D1 between the behavioral feature extracted by the behavioral feature extractor 302 and the center of the statistical distribution of the site as described above (S704), and compares the obtained Mahalanobis distance with the threshold Dth1.
  • If the Mahalanobis distance D1 is larger than the threshold Dth1 (S705: YES), it is determined that the video footage shows an exceptional behavior (S706). On the other hand, if the Mahalanobis distance D1 is less than or equal to the threshold Dth1 (S705: NO), it is determined that the video footage shows normal behavior (S707).
  • After steps S706 and S707, the process returns to step S701 and the above processing is repeated.
  • In this way, the exceptional behavior discriminator 303 only has to calculate the Mahalanobis distance D1 and compare it with the threshold Dth1, so compared to the prior-art case where exceptional behavior is determined from the reconstruction error and prediction error of skeletal information, the processing load on the exceptional behavior determination device 100 is lighter and the processing speed is faster.
  • The second embodiment of the present disclosure, in addition to determining exceptional behavior as in the first embodiment, determines whether the behavior determined to be exceptional is similar to any of known specific behaviors.
  • As shown in FIG. 8, the specific exceptional behavior classification program 8 according to the present embodiment has the same configuration as the exceptional behavior discrimination program 3 according to the first embodiment, except that a specific exceptional behavior classifier 804 is added after the skeletal information extractor 801, the behavioral feature extractor 802, and the exceptional behavior discriminator 803.
  • The specific exceptional behavior classifier 804 calculates the degree of similarity between the exceptional behavior and each known specific behavior using the behavioral features that the exceptional behavior discriminator 803 has determined to represent exceptional behavior, and outputs the classification result of the exceptional behavior based on this degree of similarity.
  • Specifically, the specific exceptional behavior classifier 804 calculates in advance the average value M and the covariance matrix Σ of the statistical distribution (multivariate Gaussian distribution) of behavioral features for each specific behavior, and calculates the Mahalanobis distance D2 between the behavioral feature of the exceptional behavior and the average value M.
  • The average value M and the covariance matrix Σ are statistics related to the statistical distribution of behavioral features for each specific behavior.
  • To obtain these statistics, the skeletal information extractor 301 extracts skeletal information from video footage shot by the surveillance camera 110 at its installation location, the behavioral features extracted from the obtained skeletal information by the behavioral feature extractor 302 are used as sample values, and the average value and covariance matrix of the observed values are calculated using these sample values.
  • When the behavioral features follow a multivariate Gaussian distribution, the average value and covariance matrix calculated from the sample values are the maximum likelihood estimates of the average value M and the covariance matrix Σ of the multivariate Gaussian distribution.
  • Hereinafter, the maximum likelihood estimates of the average value M and the covariance matrix Σ will simply be referred to as the average value M and the covariance matrix Σ, respectively.
  • The average value M corresponds to the behavioral feature of the specific behavior.
  • The Mahalanobis distance D2 is given by D2 = √((x − M)ᵗ Σ⁻¹ (x − M)), where t is a symbol representing transposition of a vector.
  • M1, . . . , Mn are the average values of each component of the behavioral feature in the statistical distribution of behavioral features of the specific behavior, and the average value M is a vector having these as components.
  • x1, . . . , xn are the components of the behavioral feature x of the exceptional behavior.
  • Σ is the covariance matrix regarding the statistical distribution of behavioral features of the specific behavior.
  • Note that the average value M and the covariance matrix Σ are statistics regarding the statistical distribution of behavioral features of the specific behavior.
  • The obtained Mahalanobis distance D2 serves as the degree of similarity between the exceptional behavior and the specific behavior. That is, the smaller the Mahalanobis distance D2, the higher the degree of similarity, while the larger the Mahalanobis distance D2, the lower the degree of similarity.
  • If the Mahalanobis distance D2 is less than or equal to the predetermined threshold Dth2, it is determined that the exceptional behavior is similar to the specific behavior. On the other hand, if the Mahalanobis distance D2 is larger than the predetermined threshold Dth2, it is determined that the exceptional behavior is not similar to the specific behavior.
  • (2-2) Identification of the statistical distribution of behavioral features used in the specific exceptional behavior classifier 804: The specific exceptional behavior classifier 804 classifies exceptional behavior using the statistical distribution of behavioral features of behaviors determined to be specific behaviors in the behavior recognition dataset.
  • For this identification, the skeletal information extractor 301 and the behavioral feature extractor 302, which have undergone machine learning, are used.
  • First, a video footage of a specific behavior in the behavior recognition dataset is input frame by frame to the skeletal information extractor 301 to extract skeletal information (S1002), and the obtained skeletal information is input to the behavioral feature extractor 302 to extract behavioral features (S1003).
  • Then, the detection frequency of the behavioral feature obtained in this way is updated (S1004). That is, the count value is incremented by one. Note that, as with the statistical distribution of the site in the first embodiment, the detection frequency of the range including the behavioral feature may be updated instead of the detection frequency of the behavioral feature itself.
  • After that, if there is video footage that has not yet been used for counting, the process returns to step S1002 and processing similar to the above is performed using the unused video footage.
  • When all the video footage has been used, the average value M and the covariance matrix Σ are estimated from the detection frequencies of the behavioral features using the fact that the detection frequencies follow a multivariate Gaussian distribution (S1006), and the process ends.
  • Using these statistics, the Mahalanobis distance D2 can be calculated as described above.
  • The exceptional behavior discrimination device 100 classifies which specific behavior the exceptional behavior is similar to, as shown in FIG. 11. Note that steps S1101 to S1107 in FIG. 11 are the same processes as steps S701 to S707 in FIG. 7, so their description will be omitted.
  • Next, the processes from steps S1109 to S1112 are executed for each specific behavior (S1108, S1113). That is, the Mahalanobis distance D2 between the center of the statistical distribution of the specific behavior and the behavioral feature of the exceptional behavior is calculated (S1109), and the Mahalanobis distance D2 is compared with the threshold Dth2.
  • If the Mahalanobis distance D2 is less than or equal to the threshold Dth2 (S1110: YES), it is determined that the exceptional behavior is similar to the specific behavior (S1111). On the other hand, if the Mahalanobis distance D2 is larger than the threshold Dth2 (S1110: NO), it is determined that the exceptional behavior is not similar to the specific behavior (S1112).
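The classification loop (S1108–S1113) can be sketched as follows; the per-behavior statistics, behavior names, and threshold Dth2 are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def mahalanobis(x, mean, cov):
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def classify_exceptional(x, specific_stats, d_th2):
    """Return the known specific behaviors that the exceptional
    feature x resembles (Mahalanobis distance D2 <= Dth2)."""
    similar = []
    for name, (mean_m, cov) in specific_stats.items():   # S1108..S1113
        d2 = mahalanobis(x, mean_m, cov)                 # S1109
        if d2 <= d_th2:                                  # S1110 -> S1111
            similar.append(name)
    return similar

# Placeholder statistics (M, Sigma) for two known specific behaviors.
stats = {
    "falling": (np.array([5.0, 0.0]), np.eye(2)),
    "running": (np.array([0.0, 5.0]), np.eye(2)),
}
x = np.array([4.5, 0.5])      # behavioral feature judged exceptional
print(classify_exceptional(x, stats, d_th2=2.0))  # ['falling']
```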
  • When the statistical distribution of behavioral features follows a multivariate Gaussian mixture distribution, the statistics (average value μ and covariance matrix Σ) may be obtained for each component multivariate Gaussian distribution.
  • In this case, if the Mahalanobis distance D1 from the average value of each of these multivariate Gaussian distributions to the behavioral feature targeted for exceptional behavior determination is greater than the threshold Dth1, the behavior may be determined to be an exceptional behavior.
  • Conversely, if the Mahalanobis distance D1 for any one of the multivariate Gaussian distributions is less than or equal to the threshold Dth1, it is determined that the behavior is normal (not an exceptional behavior).
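Under this mixture-distribution variant, a behavior is exceptional only if D1 exceeds Dth1 for every component Gaussian; a sketch with placeholder component statistics:

```python
import numpy as np

def mahalanobis(x, mu, cov):
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def is_exceptional_mixture(x, components, d_th1):
    """Exceptional only if D1 > Dth1 for every component Gaussian;
    normal if D1 <= Dth1 for any one of them."""
    return all(mahalanobis(x, mu, cov) > d_th1 for mu, cov in components)

# Two placeholder modes of normal behavior at the site.
components = [(np.array([0.0, 0.0]), np.eye(2)),
              (np.array([10.0, 10.0]), np.eye(2))]

print(is_exceptional_mixture(np.array([9.5, 10.5]), components, 3.0))  # False
print(is_exceptional_mixture(np.array([5.0, 5.0]), components, 3.0))   # True
```

A feature lying between the two modes is flagged even though it is "close" to the overall mean, which is the point of modeling each mode separately.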
  • In the second embodiment, the specific exceptional behavior classifier 804 of the specific exceptional behavior classification program 8 uses statistics regarding the statistical distribution of behavioral features for each specific behavior to calculate the degree of similarity of exceptional behaviors.
  • Instead, a machine learning model that uses video footage showing an image object performing a specific behavior as training video footage, and a label identifying the specific behavior performed by the image object as a teacher signal, may be used to determine whether the exceptional behavior is similar to a specific behavior.
  • The present disclosure has been described above using the examples of an exceptional behavior discrimination device that discriminates exceptional behavior and an exceptional behavior discrimination program that causes the exceptional behavior discrimination device to discriminate exceptional behavior.
  • However, the present disclosure is not limited to these; it may also be an exceptional behavior determination method that is executed by the exceptional behavior determination device when determining exceptional behavior.
  • Instead of the Mahalanobis distance, the Euclidean distance may be used.
  • The behavioral feature extractor 302 may be Permutation Equivariant.
  • For example, behavioral features may be extracted for each person using PointNet, or other Permutation Equivariant methods may be used.
  • In this way, skeletal information and behavioral features can be associated for each person according to the order of the skeletal information extracted for each person by the skeletal information extractor 301. This is therefore effective when it is desired not only to determine whether an exceptional behavior has been performed, but also to identify the person who performed it.
  • Alternatively, the presence or absence of exceptional behavior may be determined from the distribution of joint points in a three-dimensional space consisting of the XY coordinates within frames and the frame numbers, without distinguishing skeletal information for each person. In this case, information regarding all the joint points is input to PointNet to extract behavioral features.
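The property relied on here — a PointNet-style network applies a shared per-point transform followed by a symmetric pooling such as max, so the extracted feature does not depend on the order of the joint points — can be illustrated minimally; the single random per-point layer is a placeholder, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Joint points as (x, y, frame_number) in the three-dimensional space
# described above; the row order is arbitrary.
points = rng.normal(size=(30, 3))

# Shared per-point transform (one random layer here) + max pooling:
# the symmetric pooling makes the feature permutation invariant.
W = rng.normal(size=(3, 16))

def pointnet_feature(pts):
    return np.maximum(pts @ W, 0.0).max(axis=0)   # ReLU, then max-pool

f1 = pointnet_feature(points)
f2 = pointnet_feature(points[rng.permutation(len(points))])
print(np.allclose(f1, f2))  # True: the order of joint points is irrelevant
```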
  • In addition, behavior recognition datasets other than those described above may be used.
  • For example, by using a behavior recognition dataset that records vehicle operating conditions, it is possible to identify unusual driving situations such as driving under the influence of alcohol. It is desirable to use an appropriate behavior recognition dataset depending on what kind of behavior of the desired image object is to be determined as an exceptional behavior.
  • Not only may the storage unit 205 be cloud storage, but the exceptional behavior determination device 100 itself may also be a cloud server.
  • In this way, the performance of the exceptional behavior determination device 100 can be scaled according to the number of surveillance cameras 110 and their resolution and frame rate.
  • The exceptional behavior determination method, exceptional behavior determination program, and exceptional behavior determination device according to the present disclosure are useful as a technique for reducing the learning cost for performing exceptional behavior determination processing using a machine learning model.


Abstract

Provided are an unusual behavior discrimination method, an unusual behavior discrimination program, and an unusual behavior discrimination device which can reduce the discrimination processing cost and the learning cost of a machine learning model for discriminating unusual behavior. An unusual behavior discrimination program 3 comprises a skeletal information extractor 301, a behavior feature amount extractor 302, and an unusual behavior discriminator 303. The skeletal information extractor 301 extracts skeletal information from a video image. The behavior feature amount extractor 302 extracts a behavior feature amount from the skeletal information. The skeletal information extractor 301 and the behavior feature amount extractor 302 are machine learning models, and machine learning is performed using a general behavior recognition data set. Subsequently, a behavior feature amount is extracted from a video image captured at the same imaging site as a video image to be discriminated for unusual behavior, and a statistical amount of said feature amount is derived in advance. At the time of unusual behavior discrimination, the unusual behavior discriminator 303 discriminates unusual behavior by comparing the behavior feature amount to be discriminated and the statistical amount.

Description

Exceptional behavior discrimination method, exceptional behavior discrimination program, and exceptional behavior discrimination device
 The present disclosure relates to an exceptional behavior determination method, an exceptional behavior determination program, and an exceptional behavior determination device, and particularly relates to a technique for reducing the learning cost for performing exceptional behavior determination processing using a machine learning model.
 In recent years, technology that uses video footage to detect behavior that occurs exceptionally at the filming site (hereinafter referred to as "exceptional behavior") has been in the spotlight because it is required in many applications, including surveillance cameras.
 As such a technique, one is known in which video footage of normal behavior at the filming site is machine learned as learning data, and the machine learning model is used to detect exceptional behavior as an outlier.
 For example, in the conventional technique described in Non-Patent Document 1, as shown in FIG. 12, a skeletal information extractor 1211 is first used to extract information about a person's skeleton from the plurality of frames constituting the video footage (T frames, from the 1st to the Tth).
 When the feature extractor (Encoders) 1201 sequentially receives the T pieces of skeletal information corresponding to the 1st to Tth frames, it extracts features.
 The reconstructors (Reconstructing Decoders) 1202 use the features extracted by the feature extractor 1201 to sequentially generate T pieces of reconstruction information for reconstructing the T pieces of skeletal information corresponding to the 1st to Tth frames.
 A multilayer perceptron (MLP: Multi-Layer Perceptron) 1203 uses the T pieces of reconstruction information generated by the reconstructor 1202 to reconstruct the T pieces of skeletal information corresponding to each of the 1st to Tth frames.
 Meanwhile, the predictors (Predicting Decoders) 1204 use the features extracted by the feature extractor 1201 to sequentially generate P pieces of prediction information for respectively predicting the P pieces of skeletal information corresponding to each of the T+1th to T+Pth frames.
 The multilayer perceptron 1205 uses the P pieces of prediction information generated by the predictor 1204 to predict the P pieces of skeletal information corresponding to each of the T+1th to T+Pth frames.
 Thereafter, an anomaly score is calculated using loss functions 1216 from the reconstruction error of the T reconstructed pieces of skeletal information and the prediction error of the P predicted pieces of skeletal information.
 The MPED-RNN (Message-Passing Encoder-Decoder Neural Network) 1200, composed of the feature extractor 1201, the reconstructor 1202, the predictor 1204, and the multilayer perceptrons 1203 and 1205, is a so-called neural network, and is machine learned so that the anomaly score becomes small, using video footage of people performing normal behavior.
 In this way, due to the generalization ability of the neural network, the anomaly score remains small for people behaving normally, even in video footage other than that used for machine learning. Therefore, a person with a large anomaly score can be determined to be performing an exceptional behavior different from the normal behavior machine learned in advance.
 However, machine learning only so that the anomaly score becomes small for people behaving normally does not guarantee that the anomaly score becomes large for everyone performing exceptional behavior.
 For this reason, the possibility that the anomaly score is small even though a person is performing exceptional behavior cannot be excluded, and there is a risk that such a person will be erroneously determined to be behaving normally.
 In order to avoid such erroneous determinations, it is necessary to narrow down the range of behaviors that are determined to be normal behaviors.
 For example, when determining exceptional behavior from video footage shot by a surveillance camera, if video footage actually shot at the location where the surveillance camera is installed (hereinafter referred to as "on-site footage") is used as training data for the machine learning of the MPED-RNN 1200, it becomes easier to cover the normal behavior performed at the shooting site without excess or deficiency, so it is considered that the range of behavior determined to be normal can be appropriately minimized.
 However, if this measure is adopted, machine learning of the MPED-RNN 1200 must be performed for each location where a surveillance camera is installed, so the learning costs, such as power, computation time, human resources for development, and in some cases cloud server usage fees, become enormous.
 In addition, as described above, the MPED-RNN 1200 needs to extract features from skeletal information for each person and to reconstruct and predict skeletal information from the extracted features, so there is also the problem that the processing load for determining exceptional behavior is high not only during machine learning but also during use, and the determination processing takes time.
 The present disclosure has been made in view of the above problems, and aims to provide an exceptional behavior determination method, an exceptional behavior determination program, and an exceptional behavior determination device that can reduce the learning cost and determination processing cost of a machine learning model for determining exceptional behavior.
 上記目的を達成するため、本開示の一形態に係る例外行動判別方法は、ビデオ映像に映った画像オブジェクトの例外行動判別方法であって、ビデオ映像のフレーム毎に、画像オブジェクトの特徴点を抽出する特徴点抽出ステップと、機械学習モデルを用いて、特徴点の抽出結果から行動特徴量を抽出する行動特徴量抽出ステップと、抽出した行動特徴量に係る行動について、例外行動の判別対象としたビデオ映像の撮影現場における統計的な発生頻度が基準よりも低ければ、例外行動と判別する行動判別ステップと、を含み、前記機械学習モデルは学習用ビデオ映像のフレーム毎に抽出された、画像オブジェクトの特徴点の抽出結果を用いて、機械学習済みであり、前記学習用ビデオ映像は、当該学習用ビデオ映像に映っている画像オブジェクトの行動が例外行動か否かに関する情報を含んでいないことを特徴とする。 In order to achieve the above object, an exceptional behavior determination method according to an embodiment of the present disclosure is a method for determining exceptional behavior of an image object shown in a video image, and the feature points of the image object are extracted for each frame of the video image. The feature point extraction step uses a machine learning model to extract behavioral features from the extraction results of feature points, and the behavior related to the extracted behavioral features is determined as an exceptional behavior. a step of determining the behavior as an exceptional behavior if the statistical frequency of occurrence at the shooting site of the video footage is lower than a standard; machine learning has been completed using the extraction results of the feature points, and the learning video footage does not include information regarding whether the behavior of the image object shown in the learning video footage is an exceptional behavior. Features.
 この場合において、学習用ビデオ映像は、少なくとも、当該ビデオ映像の撮影場所とは異なる場所で撮影されたビデオ映像を含んでもよい。 In this case, the learning video footage may include at least a video footage shot at a location different from the shooting location of the video footage.
 また、前記特徴点は、前記画像オブジェクトの関節点であってもよい。 Furthermore, the feature points may be joint points of the image object.
 また、前記特徴点は、前記画像オブジェクトを囲む矩形の頂点であってもよい。 Furthermore, the feature points may be vertices of a rectangle surrounding the image object.
 また、前記特徴点の抽出結果は、当該特徴点を抽出したフレームにおける当該特徴点の座標値と、当該特徴点を抽出したフレームの当該動画像における順序と、を含んでもよい。 Furthermore, the extraction result of the feature point may include the coordinate values of the feature point in the frame from which the feature point was extracted, and the order of the frame from which the feature point was extracted in the video image.
 また、前記特徴点に関する情報は、当該特徴点が尤もらしく検出されていることを表す検出スコアと、当該画像オブジェクトの種類を表すラベルと、当該特徴点の種類を表す属性と、当該画像オブジェクトの外観を表す属性と、の少なくとも一つ以上を含んでもよい。 Furthermore, the information regarding the feature point may include at least one of: a detection score indicating how plausibly the feature point has been detected, a label representing the type of the image object, an attribute representing the type of the feature point, and an attribute representing the appearance of the image object.
 また、前記特徴点抽出ステップにおいては、前記ビデオ映像に含まれる一つ以上のフレームから特徴点を抽出してもよい。 Furthermore, in the feature point extraction step, feature points may be extracted from one or more frames included in the video image.
 また、前記特徴点抽出ステップにおいては、前記ビデオ映像に含まれる一つ以上のフレームから、機械学習モデルを用いたニューロ演算によって、特徴点を抽出してもよい。 Furthermore, in the feature point extraction step, feature points may be extracted from one or more frames included in the video image by neural calculation using a machine learning model.
 また、前記ニューロ演算を行う機械学習モデルは、畳み込みニューラルネットワークとセルフアテンション機構との少なくとも一方を含んでもよい。 Furthermore, the machine learning model that performs the neural calculation may include at least one of a convolutional neural network and a self-attention mechanism.
 また、前記行動特徴量抽出ステップは、前記特徴点に関する情報を入力とする機械学習モデルを用いて、前記行動特徴量を抽出してもよい。 Furthermore, in the behavioral feature amount extraction step, the behavioral feature amount may be extracted using a machine learning model that receives information regarding the feature points as input.
 また、前記機械学習モデルは、Permutation Invariantな深層ニューラルネットワークであってもよい。 Furthermore, the machine learning model may be a permutation invariant deep neural network.
 この場合において、前記深層ニューラルネットワークはPointNetであってもよい。 In this case, the deep neural network may be PointNet.
 また、前記行動判別ステップは、前記行動特徴量を抽出したビデオ映像の撮影以前に当該撮影現場で撮影した先行ビデオ映像から行動特徴量を抽出し、抽出された行動特徴量を用いて例外行動と判別する行動特徴量の範囲を調整してもよい。 Furthermore, in the behavior determination step, behavioral features may be extracted from a preceding video image shot at the same site before the video image from which the behavioral features were extracted, and the range of behavioral features determined to be exceptional behavior may be adjusted using the extracted behavioral features.
 また、前記行動判別ステップは、前記先行ビデオ映像から抽出した行動特徴量から行動特徴量の統計的分布の統計量を算出し、算出した統計量を用いて例外行動と判別する行動特徴量の範囲を決定してもよい。 Furthermore, in the behavior determination step, statistics of a statistical distribution of behavioral features may be calculated from the behavioral features extracted from the preceding video image, and the calculated statistics may be used to determine the range of behavioral features determined to be exceptional behavior.
 この場合において、前記行動特徴量の統計的分布は、ガウス分布または混合ガウス分布であってもよい。 In this case, the statistical distribution of the behavioral feature may be a Gaussian distribution or a mixed Gaussian distribution.
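As an illustrative sketch only, and not the patented implementation, the statistics of a Gaussian distribution fitted to behavioral features from preceding footage at the same site can define the range of features treated as exceptional. The feature dimension, sample data, and threshold below are assumptions:

```python
import numpy as np

def fit_gaussian(features):
    # features: (N, D) behavioral feature vectors from preceding video at this site
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mean, cov

def is_exceptional(x, mean, cov, threshold=3.0):
    # Mahalanobis distance as a proxy for statistical rarity:
    # a large distance means a low occurrence frequency under the fitted Gaussian
    diff = x - mean
    d2 = diff @ np.linalg.inv(cov) @ diff
    return bool(np.sqrt(d2) > threshold)

rng = np.random.default_rng(0)
frequent = rng.normal(0.0, 1.0, size=(500, 4))     # features of frequent behaviors
mean, cov = fit_gaussian(frequent)
print(is_exceptional(np.zeros(4), mean, cov))      # typical feature: False
print(is_exceptional(np.full(4, 8.0), mean, cov))  # rare feature: True
```

A mixture-of-Gaussians fit could replace `fit_gaussian` when the site's normal behaviors form several clusters.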
 また、前記例外行動が、画像オブジェクトがとり得る既知の特定行動のうち、いずれの特定行動に類似するかを判定する例外行動分類ステップを含んでもよい。 The method may further include an exceptional behavior classification step of determining which of the known specific behaviors that the image object can take the exceptional behavior resembles.
 この場合において、前記例外行動分類ステップは、特定行動をとる画像オブジェクトを映したビデオ映像から特定行動に係る行動特徴量の統計的分布の統計量を算出し、算出した統計量を用いて当該特定行動に類似する例外行動の行動特徴量の範囲を決定してもよい。 In this case, the exceptional behavior classification step may calculate statistics of a statistical distribution of behavioral features related to a specific behavior from video images showing image objects taking the specific behavior, and may use the calculated statistics to determine the range of behavioral features of exceptional behaviors similar to that specific behavior.
 或いは、前記例外行動分類ステップは、特定行動をとる画像オブジェクトを映したビデオ映像を用いて機械学習した機械学習モデルを用いて、特定行動と例外行動との類似度を算出してもよい。 Alternatively, in the exceptional behavior classification step, the degree of similarity between the specific behavior and the exceptional behavior may be calculated using a machine learning model that is machine learned using a video image showing an image object taking a specific behavior.
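As a hedged sketch of this classification idea, assuming per-behavior Gaussian statistics computed from an action recognition dataset (behavior names and data below are invented for illustration), an exceptional behavior's feature can be assigned to the most similar known specific behavior by its distance to each behavior's fitted distribution:

```python
import numpy as np

def fit_stats(features):
    # statistics of one specific behavior's feature distribution
    return features.mean(axis=0), np.cov(features, rowvar=False)

def most_similar_behavior(x, stats):
    # stats: {behavior_name: (mean, cov)}; pick the nearest distribution
    best, best_d = None, np.inf
    for name, (mean, cov) in stats.items():
        diff = x - mean
        d = float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))  # Mahalanobis distance
        if d < best_d:
            best, best_d = name, d
    return best

rng = np.random.default_rng(1)
stats = {
    "fall": fit_stats(rng.normal(5.0, 1.0, size=(200, 3))),
    "run":  fit_stats(rng.normal(-5.0, 1.0, size=(200, 3))),
}
print(most_similar_behavior(np.full(3, 4.5), stats))  # "fall"
```

The alternative in the text, a machine-learned similarity model, would replace the distance computation with a learned score.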
 本開示の一形態に係る例外行動判別プログラムは、ビデオ映像に映った画像オブジェクトの例外行動をコンピューターに判別させる例外行動判別プログラムであって、ビデオ映像のフレーム毎に、画像オブジェクトの特徴点を抽出する特徴点抽出ステップと、機械学習モデルを用いて、特徴点の抽出結果から行動特徴量を抽出する行動特徴量抽出ステップと、抽出した行動特徴量に係る行動について、例外行動の判別対象としたビデオ映像の撮影現場において統計的に発生頻度が基準よりも低ければ、例外行動と判別する行動判別ステップと、をコンピューターに実行させ、前記機械学習モデルは、学習用ビデオ映像のフレーム毎に抽出された、画像オブジェクトの特徴点の抽出結果を用いて、機械学習済みであり、前記学習用ビデオ映像は、当該学習用ビデオ映像に映っている画像オブジェクトの行動が例外行動か否かに関する情報を含んでいないことを特徴とする。 An exceptional behavior determination program according to one embodiment of the present disclosure causes a computer to determine exceptional behavior of an image object shown in a video image, and causes the computer to execute: a feature point extraction step of extracting feature points of the image object for each frame of the video image; a behavioral feature extraction step of extracting behavioral features from the feature point extraction results using a machine learning model; and a behavior determination step of determining a behavior related to the extracted behavioral features to be an exceptional behavior if its statistical frequency of occurrence at the shooting site of the video image subject to exceptional behavior determination is lower than a reference. The machine learning model has been machine-learned using feature point extraction results of image objects extracted for each frame of training video images, and the training video images do not include information as to whether the behavior of the image objects shown therein is an exceptional behavior.
 本開示の一形態に係る例外行動判別装置は、ビデオ映像に映った画像オブジェクトの例外行動を判別する例外行動判別装置であって、ビデオ映像のフレーム毎に、画像オブジェクトの特徴点を抽出する特徴点抽出手段と、機械学習モデルを用いて、特徴点の抽出結果から行動特徴量を抽出する行動特徴量抽出手段と、抽出した行動特徴量に係る行動について、例外行動の判別対象としたビデオ映像の撮影現場において統計的に発生頻度が基準よりも低ければ、例外行動と判別する行動判別手段と、を備え、前記機械学習モデルは、学習用ビデオ映像のフレーム毎に抽出された、画像オブジェクトの特徴点の抽出結果を用いて、機械学習済みであり、前記学習用ビデオ映像は、当該学習用ビデオ映像に映っている画像オブジェクトの行動が例外行動か否かに関する情報を含んでいないことを特徴とする。 An exceptional behavior determination device according to one embodiment of the present disclosure determines exceptional behavior of an image object shown in a video image, and includes: feature point extraction means for extracting feature points of the image object for each frame of the video image; behavioral feature extraction means for extracting behavioral features from the feature point extraction results using a machine learning model; and behavior determination means for determining a behavior related to the extracted behavioral features to be an exceptional behavior if its statistical frequency of occurrence at the shooting site of the video image subject to exceptional behavior determination is lower than a reference. The machine learning model has been machine-learned using feature point extraction results of image objects extracted for each frame of training video images, and the training video images do not include information as to whether the behavior of the image objects shown therein is an exceptional behavior.
 このようにすれば、画像オブジェクトの行動が例外行動か否かに関する情報を含んでいない学習用ビデオ映像を、機械学習モデルの機械学習に用いればよく、例外行動の判別対象となるビデオ映像の撮影現場ごとに、当該撮影現場において撮影されたビデオ映像を機械学習に用いる必要が無い。したがって、機械学習モデルの学習コストを低減することができる。 In this way, the training video footage that does not include information regarding whether or not the behavior of the image object is an exceptional behavior can be used for machine learning of the machine learning model. There is no need to use video footage shot at each filming location for machine learning. Therefore, the learning cost of the machine learning model can be reduced.
 また、行動特徴量に係る行動の発生頻度から例外行動か否かを判別するので、例外行動の判別処理時における処理負荷を軽減することができ、処理速度を向上させることができる。 Furthermore, since it is determined whether or not the behavior is an exceptional behavior based on the frequency of occurrence of the behavior related to the behavior feature amount, the processing load during the exceptional behavior determination process can be reduced and the processing speed can be improved.
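The three claimed steps can be sketched end to end as follows; every function body is an illustrative stand-in (random keypoints, a toy feature map, a per-site z-score rule), not the patented model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: feature point extraction (stand-in: random joint points per frame)
def extract_feature_points(frame):
    return rng.normal(size=(17, 2))  # 17 illustrative keypoints with XY coordinates

# Step 2: behavioral feature extraction (stand-in for the machine learning model,
# which is trained on footage carrying no exceptional/normal labels)
def extract_behavior_feature(frames):
    pts = np.concatenate([extract_feature_points(f) for f in frames], axis=0)
    return np.array([pts.mean(), pts.std()])  # fixed-length feature vector

# Step 3: frequency-based discrimination against statistics of this site
def judge_exceptional(feature, site_mean, site_std, k=3.0):
    return bool(np.any(np.abs(feature - site_mean) > k * site_std))

# site statistics assumed to come from preceding footage at the same site
site_mean = np.array([0.0, 1.0])
site_std = np.array([0.1, 0.1])

feature = extract_behavior_feature([None] * 8)  # 8 frames of dummy video
print(judge_exceptional(feature, site_mean, site_std))  # typical clip: False
```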
FIG. 1 is a diagram showing the main system configuration of a video surveillance system 1 according to the first embodiment of the present disclosure.
FIG. 2 is a block diagram showing the main configuration of an exceptional behavior determination device 100 according to the first embodiment of the present disclosure.
FIG. 3 is a block diagram showing the main configuration of an exceptional behavior determination program 3 according to the first embodiment of the present disclosure.
FIG. 4 is a diagram schematically illustrating the skeletal information extracted by a skeletal information extractor 301 of the exceptional behavior determination program 3.
FIG. 5 is a diagram illustrating the statistical distribution of behavioral features extracted from video images captured by a surveillance camera 110, using the machine-learned skeletal information extractor 301 and behavioral feature extractor 302.
FIG. 6 is a flowchart illustrating a process of estimating the mean and covariance matrix of the statistical distribution of behavioral features extracted from video images captured by the surveillance camera 110, using the machine-learned skeletal information extractor 301 and behavioral feature extractor 302.
FIG. 7 is a flowchart illustrating a process of determining whether the behavior of an image object such as a person shown in video images captured by the surveillance camera 110 is an exceptional behavior.
FIG. 8 is a block diagram showing the main configuration of a specific exceptional behavior classification program 8 according to a second embodiment of the present disclosure.
FIG. 9 is a diagram illustrating the statistical distribution of behavioral features of specific behaviors extracted from video images included in an action recognition dataset, using the machine-learned skeletal information extractor 301 and behavioral feature extractor 302.
FIG. 10 is a flowchart illustrating a process of estimating the mean and covariance matrix of the statistical distribution of behavioral features of specific behaviors extracted from video images included in the action recognition dataset, using the machine-learned skeletal information extractor 301 and behavioral feature extractor 302.
FIG. 11 is a flowchart illustrating a process of determining, when the behavior of an image object such as a person shown in video images captured by the surveillance camera 110 is an exceptional behavior, which specific behavior the exceptional behavior resembles, based on the statistical distribution of each specific behavior.
FIG. 12 is a block diagram showing the main configuration of MPED-RNN 1200, an exceptional behavior determination program according to the prior art.
 以下、本開示に係る例外行動判別方法、例外行動判別プログラムおよび例外行動判別装置の実施の形態について、ビデオ監視システムを例にとって、図面を参照しながら説明する。 DESCRIPTION OF EMBODIMENTS Hereinafter, embodiments of the exceptional behavior determination method, exceptional behavior determination program, and exceptional behavior determination device according to the present disclosure will be described with reference to the drawings, taking a video surveillance system as an example.
[1]第1の実施の形態 [1] First Embodiment
(1-1)ビデオ監視システムのシステム構成 (1-1) System Configuration of the Video Surveillance System
 まず、本開示の第1の実施の形態に係るビデオ監視システムのシステム構成について説明する。 First, the system configuration of the video surveillance system according to the first embodiment of the present disclosure will be described.
 図1に示すように、ビデオ監視システム1は、例外行動判別装置100と、監視カメラ110とをケーブル120で接続したものである。例外行動判別装置100に接続する監視カメラ110の台数は複数であってもよい。また、例外行動判別装置100と監視カメラ110との接続方法がケーブル120に限定されないのは言うまでもなく、無線接続であってもよい。 As shown in FIG. 1, the video surveillance system 1 includes an exceptional behavior determination device 100 and a surveillance camera 110 connected by a cable 120. The number of surveillance cameras 110 connected to the exceptional behavior determination device 100 may be plural. Furthermore, it goes without saying that the method of connecting the exceptional behavior determination device 100 and the surveillance camera 110 is not limited to the cable 120, but may also be a wireless connection.
 監視カメラ110は、設置場所で所定の撮影方向へ向けて設置された状態で、監視対象箇所を継続的に撮影し続ける。監視カメラ110が撮影するビデオ映像は、複数のフレーム画像からなっている。監視カメラ110が撮影したビデオ映像は、デジタル動画像データとして、例外行動判別装置100へ送信される。 The surveillance camera 110 is installed at the installation location facing a predetermined photographing direction, and continues to photograph the monitoring target location. The video image taken by the surveillance camera 110 consists of a plurality of frame images. The video image taken by the surveillance camera 110 is transmitted to the exceptional behavior determination device 100 as digital moving image data.
 監視カメラ110が撮影したビデオ映像は、例外行動判別装置100が随時、監視カメラ110から読み出してもよい。例外行動判別装置100は、カメラ110から継続的に受信したり、随時、読み出したりしたビデオ映像を用いて、例外行動の判別処理を順次、実行する。 The video image taken by the surveillance camera 110 may be read out from the surveillance camera 110 by the exceptional behavior determination device 100 at any time. The exceptional behavior discrimination device 100 sequentially executes exceptional behavior discrimination processing using video images continuously received from the camera 110 or read out from time to time.
 監視カメラ110が撮影するビデオ映像には人物130が映り込むことがある。本実施の形態においては、ビデオ映像に映り込んでいる人物の例外行動を判別する場合を例にとって説明するが、本開示がこれに限定されないのはいうまでもなく、人物以外の画像オブジェクトの例外行動を判別してもよい。 A person 130 may appear in the video images captured by the surveillance camera 110. In this embodiment, a case of determining the exceptional behavior of a person appearing in the video images will be described as an example; however, it goes without saying that the present disclosure is not limited to this, and the exceptional behavior of image objects other than people may also be determined.
 人物以外の画像オブジェクトとは、例えば、道路交通のビデオ映像における車両や、工場のビデオ映像における生産機械、牧場のビデオ映像における家畜などが挙げられる。道路交通における車両の例外行動からは事故などを検出し得る。 Image objects other than people include, for example, vehicles in road traffic video images, production machines in factory video images, livestock in ranch video images, and the like. Accidents can be detected from the unusual behavior of vehicles in road traffic.
 工場における生産機械の例外行動からは生産機械の誤動作や不具合などを検出し得る。また、牧場における家畜の例外行動からは捕食動物や不審者の侵入や、疫病の蔓延など家畜の健康状態を検出し得る。 Malfunctions and defects in production machines can be detected from the exceptional behavior of production machines in factories. In addition, the intrusion of predators or suspicious persons, as well as the health condition of livestock such as the spread of epidemics, can be detected from the exceptional behavior of livestock on farms.
(1-2)例外行動判別装置100の構成 (1-2) Configuration of the Exceptional Behavior Determination Device 100
 次に、本実施の形態に係る例外行動判別装置100の構成について説明する。 Next, the configuration of the exceptional behavior determination device 100 according to the present embodiment will be described.
 例外行動判別装置100は、いわゆるコンピューターであって、例外行動判別装置本体101、表示部102および入力部103を備えている。 The exceptional behavior determination device 100 is a so-called computer, and includes an exceptional behavior determination device main body 101, a display section 102, and an input section 103.
 表示部102は、例外行動判別装置100がユーザーに情報を提示するための装置である。表示部102としては、例えば、液晶ディスプレイ(LCD: Liquid Crystal Display)を用いることができる。 The display unit 102 is a device for the exceptional behavior determination device 100 to present information to the user. As the display unit 102, for example, a liquid crystal display (LCD: Liquid Crystal Display) can be used.
 表示部102は複数の表示画面を備えていてもよい。表示部102は、監視カメラ110が撮影したビデオ映像を表示したり、当該ビデオ映像を用いた例外行動の判別結果を表示したりするのに用いることができる。 The display unit 102 may include multiple display screens. The display unit 102 can be used to display a video image taken by the surveillance camera 110, or to display a determination result of an exceptional behavior using the video image.
 入力部103は、例外行動判別装置100がユーザーの指示入力を受け付けるための装置である。入力部103は、キーボードおよびポインティングデバイスである。ポインティングデバイスとしては、マウスを用いてもよいし、トラックボール等、マウス以外の装置であってもよい。 The input unit 103 is a device for the exceptional behavior determination device 100 to receive instructions from the user. The input unit 103 is a keyboard and a pointing device. As the pointing device, a mouse may be used, or a device other than a mouse, such as a trackball, may be used.
 また、入力部103としてタッチパッドを用いてもよい。この場合には、表示部102の表示画面を覆うように、タッチパッドが取着されることによって、表示部102および入力部103がタッチパネルを構成する。 Additionally, a touch pad may be used as the input unit 103. In this case, a touch pad is attached to cover the display screen of the display unit 102, so that the display unit 102 and the input unit 103 constitute a touch panel.
 例外行動判別装置本体101は、図2に示すように、CPU(Central Processing Unit)201、ROM(Read Only Memory)202、RAM(Random Access Memory)203、NIC(Network Interface Card)204および記憶部205を備えており、これらは、内部バス211を用いて相互に通信可能に接続されている。 As shown in FIG. 2, the exceptional behavior determination device body 101 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a NIC (Network Interface Card) 204, and a storage unit 205. These are connected to each other using an internal bus 211 so that they can communicate with each other.
 CPU201は、例外行動判別装置本体101に電源が投入される等することによってリセットされると、ROM202からブートプログラムを読み出して起動し、RAM203を作業用記憶領域として、記憶部205から読み出したOS(Operating System)や例外行動判別プログラム等のソフトウェアを実行する。 When the CPU 201 is reset, for example by powering on the exceptional behavior determination device main body 101, it reads and starts a boot program from the ROM 202, and then, using the RAM 203 as a working storage area, executes software read from the storage unit 205, such as the OS (Operating System) and the exceptional behavior determination program.
 ROM202およびRAM203は半導体メモリである。特に、ROM202は不揮発性の半導体メモリである。 ROM202 and RAM203 are semiconductor memories. In particular, the ROM 202 is a nonvolatile semiconductor memory.
 NIC204は、LAN(Local Area Network)やインターネットといった通信ネットワークを経由して他の装置と通信するための処理を実行する。この通信は、有線通信であってもよいし、無線通信であってもよい。また、USB(Universal Serial Bus)機器のように、他の装置を例外行動判別装置100に直接接続して行う通信であってもよい。 The NIC 204 executes processing for communicating with other devices via a communication network such as a LAN (Local Area Network) or the Internet. This communication may be wired communication or wireless communication. Alternatively, communication may be performed by directly connecting another device to the exceptional behavior determination device 100, such as a USB (Universal Serial Bus) device.
 記憶部205は、大容量記憶装置であって、ハードディスク(HDD: Hard Disk Drive)のように例外行動判別装置100に内蔵された記憶装置だけでなく、クラウドストレージのような外部記憶装置を併用してもよい。 The storage unit 205 is a large-capacity storage device; not only a storage device built into the exceptional behavior determination device 100, such as a hard disk drive (HDD), but also an external storage device such as cloud storage may be used in combination.
 記憶部205は、上記のようなソフトウェアを記憶する他、監視カメラ110が撮影したビデオ映像や、後述のように、例外行動判別プログラムに用いられる機械学習モデルであるニューラルネットワークのパラメーターや学習データ、例外行動判別プログラムによる判別結果なども記憶される。 In addition to storing the software described above, the storage unit 205 also stores video footage captured by the surveillance camera 110, parameters and learning data of a neural network, which is a machine learning model used in the exceptional behavior discrimination program, as will be described later. The results of determination by the exceptional behavior determination program are also stored.
 CPU201は、内部バス211を経由して、表示部102、入力部103および監視カメラ110にアクセスする。また、監視カメラ110が撮影したビデオ映像は、ケーブル120および内部バス211を経由して、記憶部205に記憶される。 The CPU 201 accesses the display unit 102, the input unit 103, and the surveillance camera 110 via the internal bus 211. In addition, the video images captured by the surveillance camera 110 are stored in the storage unit 205 via the cable 120 and the internal bus 211.
(1-3)例外行動判別プログラム (1-3) Exceptional Behavior Determination Program
 次に、例外行動判別装置100が実行する例外行動判別プログラムについて説明する。本開示に係る例外行動判別プログラムは、機械学習モデル、特にニューラルネットワークを用いて、ビデオ映像に映った例外行動を判別する。 Next, the exceptional behavior determination program executed by the exceptional behavior determination device 100 will be described. The exceptional behavior determination program according to the present disclosure uses a machine learning model, particularly a neural network, to determine exceptional behavior shown in video images.
 図3に示すように、例外行動判別プログラム3は、骨格情報抽出器301、行動特徴量抽出器302および例外行動判別器303を備えている。 As shown in FIG. 3, the exceptional behavior determination program 3 includes a skeletal information extractor 301, a behavioral feature extractor 302, and an exceptional behavior discriminator 303.
(1-3-1)骨格情報抽出器301 (1-3-1) Skeletal Information Extractor 301
 骨格情報抽出器301は、ビデオ映像の入力を受け付けると、機械学習モデルを用いたニューロ演算によって、当該ビデオ映像を構成するフレーム毎に人物などの画像オブジェクトの骨格を構成する関節点ごとに当該関節点に関する情報を抽出する。 Upon receiving an input video image, the skeletal information extractor 301 extracts, by neural computation using a machine learning model, information on each joint point constituting the skeleton of an image object such as a person, for each frame constituting the video image.
 関節点に関する情報は、関節点の座標、検出スコア、画像オブジェクトの種類を表すラベル、関節点の種類を表す属性および画像オブジェクトの外観を表す属性である。 The information regarding the joint points is the coordinates of the joint points, the detection score, the label representing the type of the image object, the attribute representing the type of the joint point, and the attribute representing the appearance of the image object.
 関節点の座標とは、フレーム内における当該関節点のXY座標値、および当該フレームのフレーム番号である。 The coordinates of a joint point are the XY coordinate values of the joint point within a frame and the frame number of the frame.
 関節点の検出スコアとは、当該関節点が尤もらしく検知されていることを表す確率である。検知した座標に関節点が位置することが確実である場合には確率が1に近くなり、あまり確実でない場合には確率が小さくなる。 The detection score of a joint point is a probability indicating that the joint point is likely to be detected. If it is certain that the joint point is located at the detected coordinates, the probability is close to 1, and if it is not very certain, the probability becomes small.
 画像オブジェクトの種類を表すラベルとは、当該関節点を含む画像オブジェクトが属するクラスを表す情報である。例えば、ラベル値が「0」である場合には当該画像オブジェクトは「人物」を映したものであり、「1」である場合には「シャベル」を映したもの、「2」である場合には「ラケット」を映したもの等とすることができる。 The label representing the type of image object is information representing the class to which the image object including the joint point belongs. For example, when the label value is "0", the image object shows a "person"; when it is "1", a "shovel"; and when it is "2", a "racket", and so on.
 なお、以上について、複数選択することができるように、これらを成分とするベクトル量を画像オブジェクトのラベルにしてもよい。 Incidentally, in order to be able to select a plurality of the above items, a vector amount having these as components may be used as a label of the image object.
 関節点の属性とは、当該関節点が属するクラスである。例えば、関節点の属性が「0」である場合には当該関節点は「肘」を表し、「1」である場合には「手首」を表し、「2」である場合には「肩」を表す等とすることができる。 The attribute of a joint point is the class to which the joint point belongs. For example, if the joint point attribute is "0", the joint point represents "elbow", if it is "1", it represents "wrist", and if it is "2", it represents "shoulder". It can be expressed as, for example.
 関節点の属性については、複数選択することはないが、これらを成分とするベクトル量を関節点の属性としてもよい。 Although there is no need to select multiple attributes of a joint point, a vector quantity having these as components may be used as an attribute of a joint point.
 画像オブジェクトの外観を表す属性とは、当該関節点を含む画像オブジェクトの外観が属するクラスを表す情報である。画像オブジェクトの外観を表す属性もまたベクトル量であってもよい。 The attribute representing the appearance of an image object is information representing the class to which the appearance of the image object including the joint point belongs. An attribute representing the appearance of an image object may also be a vector quantity.
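The joint point information described above might be represented, for example, as follows. The field names, class codes, and example values are illustrative assumptions, not the patent's data format:

```python
from dataclasses import dataclass

@dataclass
class JointPoint:
    x: float            # X coordinate within the frame
    y: float            # Y coordinate within the frame
    frame: int          # frame number (order of the frame in the video)
    score: float        # detection score: probability the joint is plausibly detected
    object_label: int   # class of the image object (e.g. 0 = person, 1 = shovel, 2 = racket)
    joint_attr: int     # class of the joint point (e.g. 0 = elbow, 1 = wrist, 2 = shoulder)
    appearance: int     # class describing the appearance of the image object

# skeletal information for one frame = the set of joint points detected in it
frame1 = [
    JointPoint(120.0,  80.0, 1, 0.97, 0, 2, 0),  # shoulder
    JointPoint(135.0, 110.0, 1, 0.91, 0, 0, 0),  # elbow
    JointPoint(150.0, 140.0, 1, 0.88, 0, 1, 0),  # wrist
]
print(len(frame1))  # 3 joint points detected in frame #1
```

The label and attribute fields could equally be vectors, as the text notes, when multiple classes may apply.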
 骨格情報抽出器301としては、例えば、OpenPoseを用いることができる。OpenPoseは、米カーネギーメロン大学CTTEC(Center for Technology Transfer and Enterprise Creation)で開発されたソフトウェアで、複数の人物の関節点と、関節点どうしの接続関係と、をリアルタイムで検出することができる(非特許文献2を参照)。 As the skeletal information extractor 301, for example, OpenPose can be used. OpenPose is software developed at the Carnegie Mellon University CTTEC (Center for Technology Transfer and Enterprise Creation) that can detect the joint points of multiple people and the connection relationships between joint points in real time (see Non-Patent Document 2).
 OpenPoseでは、関節点として、人物の鼻、心臓および左右の肩、肘、手首、腰、膝、足首、目および耳の位置に相当する座標が抽出される。また、OpenPose以外の機械学習モデルを用いて関節点に関する情報を抽出してもよいことは言うまでもない。 In OpenPose, coordinates corresponding to the positions of the person's nose, heart, left and right shoulders, elbows, wrists, hips, knees, ankles, eyes, and ears are extracted as joint points. Furthermore, it goes without saying that information regarding joint points may be extracted using a machine learning model other than OpenPose.
 骨格情報抽出器301は、畳み込みニューラルネットワーク(CNN: Convolutional Neural Network)とセルフアテンション機構との少なくとも一方を含むのが望ましい。 The skeleton information extractor 301 preferably includes at least one of a convolutional neural network (CNN) and a self-attention mechanism.
 また、関節点に関する情報は、少なくとも関節点の座標を含んでいればよい。関節点の検出スコア、画像オブジェクトの種類を表すラベル、関節点の属性および画像オブジェクトの外観を表す属性については、関節点に関する情報として、用いなくてもよいし、少なくとも1つ以上を用いてもよい。 Furthermore, the information regarding a joint point need only include at least the coordinates of the joint point. The detection score of the joint point, the label representing the type of the image object, the attribute of the joint point, and the attribute representing the appearance of the image object need not be used as information regarding the joint point, or at least one or more of them may be used.
 骨格情報抽出器301は、ビデオ映像から抽出した骨格情報を、行動特徴量抽出器へ出力する。 The skeleton information extractor 301 outputs the skeleton information extracted from the video image to the behavioral feature extractor.
 図4は、室内を歩行する人物について、骨格情報抽出器301が抽出した骨格情報のうち、関節点のXY座標を模式的に示した図である。図4に示すように、室内を歩行する人物の関節座標が、フレーム#1からフレーム#4までの各フレームについてそれぞれ抽出され、骨格情報を構成する。なお、言うまでもなく、歩行する人物の背景となる壁面は骨格情報には含まれない。 FIG. 4 is a diagram schematically showing the XY coordinates of joint points of the skeletal information extracted by the skeletal information extractor 301 for a person walking indoors. As shown in FIG. 4, the joint coordinates of a person walking indoors are extracted for each frame from frame #1 to frame #4 to constitute skeletal information. Needless to say, the wall surface serving as the background of the walking person is not included in the skeleton information.
 なお、本開示がこれに限定されないのは言うまでもなく、骨格情報抽出器301に代えて、ビデオ映像から骨格情報以外の情報を抽出する情報抽出器を用いてもよい。例えば、骨格情報に代えて、画像オブジェクトを囲む矩形領域の頂点に関する情報をフレーム毎に抽出してもよい。 Note that it goes without saying that the present disclosure is not limited to this, and instead of the skeletal information extractor 301, an information extractor that extracts information other than skeletal information from the video image may be used. For example, instead of the skeleton information, information regarding the vertices of a rectangular area surrounding the image object may be extracted for each frame.
 この場合には、頂点の座標として、フレーム内における当該頂点のXY座標値、および当該フレーム番号を用いる。頂点の検出スコアとして、当該頂点が尤もらしく検出されていることを表す確率を用いる。また、画像オブジェクトのラベルとして、当該頂点を有する矩形領域に囲まれた画像オブジェクトが属するクラスを表す情報を用いる。 In this case, the XY coordinate values of the vertex within the frame and the frame number are used as the coordinates of the vertex. As the detection score of a vertex, a probability indicating that the vertex is likely to be detected is used. Further, as the label of the image object, information representing the class to which the image object surrounded by the rectangular area having the vertex belongs is used.
 頂点の属性は、当該頂点を有する矩形領域における当該頂点の位置を表す。例えば、頂点の属性が「0」である場合には当該頂点は矩形領域の「左上」の頂点であり、「1」である場合には「右上」の頂点である等とすることができる。 The attribute of a vertex represents the position of the vertex in the rectangular area that includes the vertex. For example, when the attribute of a vertex is "0", the vertex is the "top left" vertex of the rectangular area, and when the attribute is "1", it is the "top right" vertex.
 頂点に関する情報を抽出する場合には、例えば、YOLOを用いることができる。YOLOは、画像に含まれている画像オブジェクトを囲むバウンディングボックスと、当該画像オブジェクトが属するクラス予測確率とを抽出する深層ニューラルネットワーク(DNN: Deep Neural Network)である(非特許文献3を参照)。 When extracting information regarding vertices, for example, YOLO can be used. YOLO is a deep neural network (DNN) that extracts a bounding box surrounding an image object included in an image and a predicted class probability to which the image object belongs (see Non-Patent Document 3).
 DNNは、多層ニューラルネットワークのうち、特に4層以上のものをいう。DNNを用いれば、高い精度で行動特徴量を抽出することができると期待されるため、画像認識処理に留まることなく、様々な分野で応用されている。 A DNN refers to a multilayer neural network, especially one with four or more layers. DNNs are expected to be able to extract behavioral features with high accuracy, so they are being applied not only to image recognition processing but also to a variety of fields.
 また、YOLO以外の機械学習モデルを用いて、矩形領域の頂点に関する情報を抽出してもよいことは言うまでもない。 It goes without saying that information regarding the vertices of a rectangular area may be extracted using a machine learning model other than YOLO.
(1-3-2)行動特徴量抽出器302 (1-3-2) Behavioral Feature Extractor 302
 行動特徴量抽出器302は、骨格情報抽出器301から骨格情報を受け付けると、行動特徴量を抽出する。行動特徴量はベクトル量である。当該ベクトルの各成分は、それぞれ当該行動特徴量に係る人物などの画像オブジェクトの行動ラベルに対応している。行動特徴量は、成分の数が固定されているという意味において、固定長のベクトルである。 Upon receiving skeletal information from the skeletal information extractor 301, the behavioral feature extractor 302 extracts a behavioral feature. The behavioral feature is a vector quantity, each component of which corresponds to a behavior label of an image object such as a person. The behavioral feature is a fixed-length vector in the sense that the number of its components is fixed.
 なお、骨格情報に代えて、画像オブジェクトを囲む矩形領域の頂点に関する情報をフレーム毎に抽出する場合についても、行動特徴量抽出器302を用いて、当該矩形領域の頂点に関する情報を用いて、行動特徴量を抽出してもよい。この場合にも、行動特徴量は、各成分が画像オブジェクトの行動ラベルに対応している固定長のベクトル量である。 Note that, also in the case where information regarding the vertices of a rectangular area surrounding an image object is extracted for each frame instead of skeletal information, the behavioral feature extractor 302 may be used to extract behavioral features from the information regarding the vertices of the rectangular area. In this case as well, the behavioral feature is a fixed-length vector in which each component corresponds to a behavior label of the image object.
 行動特徴量抽出器302には深層ニューラルネットワークを用いることができる。 A deep neural network can be used for the behavioral feature extractor 302.
 本実施の形態においては、DNNとして、PointNetを用いて、骨格情報から行動特徴量を抽出する(非特許文献4を参照)。PointNetは、PermutationInvariantなニューラルネットワークである。すなわち、骨格情報抽出器301から受け付けた骨格情報において関節点に関する情報の順序が入れ替わっても、同じ行動特徴量を抽出することができる。 In this embodiment, PointNet is used as the DNN to extract behavioral features from skeletal information (see Non-Patent Document 4). PointNet is a Permutation Invariant neural network. That is, even if the order of information regarding joint points in the skeleton information received from the skeleton information extractor 301 is changed, the same behavioral feature amount can be extracted.
 すなわち、関節点に関する情報の順序を置き換えても、PointNetを用いて抽出される行動特徴量は不変である。このため、骨格情報抽出器301が抽出した骨格情報を何らかの基準に従って並べ替えたりしなくても、適切に行動特徴量を抽出することができる。 That is, even if the order of the information regarding the joint points is replaced, the behavioral features extracted using PointNet remain unchanged. Therefore, behavioral features can be appropriately extracted without sorting the skeletal information extracted by the skeletal information extractor 301 according to some criteria.
 PointNetは、関節点毎の特徴量を結合する処理を含まないため、一部の関節点が検出されない場合や、誤った位置に検出された場合に、誤った特徴量が下層に伝搬する影響が少ない。したがって、画像オブジェクトが例外行動をとったときに関節点検出に誤りが発生しても変動の少ない行動特徴量を抽出できるので、行動特徴量の発生頻度の統計的分布から当該画像オブジェクトの行動が例外行動か否かを判別することができる。 Because PointNet does not include processing that combines feature values across joint points, when some joint points are not detected or are detected at erroneous positions, the erroneous feature values have little influence as they propagate to lower layers. Therefore, even if errors occur in joint point detection when an image object takes an exceptional behavior, behavioral features with little variation can be extracted, so whether the behavior of the image object is an exceptional behavior can be determined from the statistical distribution of the frequency of occurrence of behavioral features.
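The permutation invariance described above can be illustrated with a minimal PointNet-style computation: a shared per-point MLP followed by symmetric max pooling. The weights, dimensions, and input encoding below are arbitrary assumptions, not the trained extractor:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 16))   # shared per-point weights (toy input: x, y, frame, score, attr)
W2 = rng.normal(size=(16, 32))

def point_features(points):
    # points: (N, 5) array, one row per joint point, in arbitrary order
    h = np.maximum(points @ W1, 0.0)  # shared MLP applied to each point independently
    h = np.maximum(h @ W2, 0.0)
    return h.max(axis=0)              # symmetric max pooling -> fixed-length vector

pts = rng.normal(size=(7, 5))
shuffled = pts[rng.permutation(7)]
print(np.allclose(point_features(pts), point_features(shuffled)))  # True
print(point_features(pts).shape)  # (32,) regardless of the number of points
```

Because the pooling is a symmetric function, reordering the joint points cannot change the output, which is the Permutation Invariant property the text relies on.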
It goes without saying that the behavioral feature extractor 302 may extract behavioral features using a neural network other than PointNet. The behavioral feature extractor 302 may also use a neural network that is not permutation invariant. In either case, the purpose of the present disclosure can be achieved by determining exceptional behavior with the exceptional behavior discriminator 303 described below.
In this embodiment, the video footage included in a behavior recognition dataset is used as learning video footage, and machine learning of the behavioral feature extractor 302 is performed using the skeletal information that the skeletal information extractor 301 extracts from that learning video.
The behavior recognition dataset usually does not include video footage shot at the location where the surveillance camera 110 is installed, and at least includes video footage other than footage shot at that location. That is, the behavior recognition dataset includes video footage taken at places different from where the surveillance camera 110 is installed.
Therefore, there is no guarantee that the statistical distribution of the frequency of occurrence of behavioral features extracted from the video footage in the behavior recognition dataset matches the statistical distribution of the frequency of occurrence of behavioral features extracted only from footage shot at the installation location of the surveillance camera 110.
In this sense, the video footage in the behavior recognition dataset contains no information about whether the behavior of an image object shown in it corresponds to an exceptional behavior in the video footage shot at the installation location of the surveillance camera 110.
Thus, in this embodiment, video footage shot at the installation location of the surveillance camera 110 is not used for the machine learning of the behavioral feature extractor 302. Whereas the conventional technology described above requires machine learning for each installation location of the surveillance camera 110, the learning cost can therefore be reduced significantly.
The behavioral feature extractor 302 outputs the behavioral features extracted from the skeletal information to the exceptional behavior discriminator 303.
(1-3-3) Exceptional behavior discriminator 303
The exceptional behavior discriminator 303 determines, from the behavioral features received from the behavioral feature extractor 302, whether the behavior of the image object associated with those features is an exceptional behavior.
Specifically, the statistical distribution of behavioral features extracted using video footage shot at the installation location of the surveillance camera 110 (hereinafter, the "on-site statistical distribution") is obtained in advance, and the Mahalanobis distance D1 from the center of the on-site statistical distribution is compared with a threshold Dth1.
If the comparison shows that the Mahalanobis distance D1 is larger than the threshold Dth1, the frequency with which the behavioral feature in question is extracted from video footage shot by the surveillance camera 110 at its installation location is at or below the frequency of occurrence corresponding to the threshold Dth1.
In other words, at the shooting site of the video footage, the statistical frequency of occurrence of the behavior associated with the behavioral feature is at or below the frequency corresponding to the threshold Dth1; since the behavior occurs infrequently, it is determined to be an exceptional behavior.
If the Mahalanobis distance D1 is at or below the threshold Dth1, the behavior associated with the behavioral feature occurs frequently at the shooting site of the video footage, so it is determined not to be an exceptional behavior.
That is, the skeletal information extractor 301 is made to extract skeletal information from video footage shot by the surveillance camera at its installation location, and the behavioral features that the behavioral feature extractor 302 extracts from the obtained skeletal information are used as sample values to estimate the mean μ of the on-site statistical distribution. When the on-site behavioral features follow a multivariate Gaussian distribution, the mean of the sample values is the maximum likelihood estimate of the mean of the on-site statistical distribution.
When this maximum likelihood estimate is taken as the center of the on-site statistical distribution, the Mahalanobis distance D1 from that center to a behavioral feature x = (x1, …, xn) is
D1 = √( (x − μ)^t Σ^(−1) (x − μ) ).
Here, "t" denotes the transpose of a vector. μ1, …, μn are the per-component mean values of the behavioral features in the on-site statistical distribution, and the mean μ is the vector having these as its components. x1, …, xn are the components of the behavioral feature x extracted by the behavioral feature extractor 302.
Σ is the covariance matrix of the on-site statistical distribution. When the on-site behavioral features follow a multivariate Gaussian distribution, the covariance matrix of the sample values is the maximum likelihood estimate of the covariance matrix of the on-site statistical distribution. In this embodiment, this maximum likelihood estimate is used as the covariance matrix Σ.
Thus, the mean μ and the covariance matrix Σ are statistics of the on-site statistical distribution.
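Under the multivariate-Gaussian assumption above, the statistics μ and Σ can be estimated from on-site sample values and the distance D1 computed as in the following NumPy sketch. The feature dimension, the random stand-in samples and the threshold Dth1 are hypothetical choices for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample values: behavioral features extracted from on-site footage
# (random stand-ins here; 4-dimensional features assumed for illustration).
samples = rng.normal(size=(500, 4))
mu = samples.mean(axis=0)                         # maximum likelihood estimate of the mean
Sigma = np.cov(samples, rowvar=False, bias=True)  # maximum likelihood estimate of Σ

def mahalanobis(x, mu, Sigma):
    """D1 = sqrt((x - mu)^t Σ^{-1} (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))

x = rng.normal(size=4)                            # a newly extracted behavioral feature
Dth1 = 3.0                                        # hypothetical threshold
is_exceptional = mahalanobis(x, mu, Sigma) > Dth1
```

Note that `bias=True` gives the maximum likelihood covariance (division by N), matching the estimation described in the text, rather than the unbiased estimate (division by N − 1).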
If the Euclidean distance from the center of the on-site statistical distribution were used, the determination of exceptional behavior would be easily swayed by those components of the behavioral features that have large variance in the on-site statistical distribution. With the Mahalanobis distance, on the other hand, the inverse of the covariance matrix of the on-site statistical distribution is applied, so exceptional behavior can be determined regardless of differences in the per-component variances of the behavioral features in the on-site statistical distribution.
As illustrated in FIG. 5, when the behavioral feature is a two-dimensional vector, the behavior associated with the feature is determined to be exceptional when the Mahalanobis distance D1 from the center of the on-site statistical distribution, specified by the mean μ1 of the first component and the mean μ2 of the second component of the behavioral features, is larger than the threshold Dth1.
(1-4) Machine learning of the skeletal information extractor 301 and the behavioral feature extractor 302
As described above, the skeletal information extractor 301 and the behavioral feature extractor 302 are both neural networks, and machine learning must be performed to determine the necessary parameters before they are used for determining exceptional behavior.
In this embodiment, this machine learning uses an existing behavior recognition dataset rather than actual video footage shot by the surveillance camera 110 (hereinafter, "on-site footage"). General open data can be used as this behavior recognition dataset.
This is because, if common everyday behaviors are covered by the open data, behaviors that are frequently captured by the surveillance camera 110 are likely to be covered as well. In this way, machine learning of the skeletal information extractor 301 can be performed without using on-site footage.
Therefore, machine learning of the skeletal information extractor 301 and the behavioral feature extractor 302 can be completed when the exceptional behavior discrimination device 100 is developed. Moreover, the parameters obtained through this machine learning can be used regardless of where the surveillance camera is installed.
Therefore, unlike the conventional technology described above, there is no need to perform machine learning at each site where a surveillance camera is installed.
Moreover, as described later, the exceptional behavior discriminator 303 requires no machine learning in the first place.
For these reasons, the learning cost can be reduced significantly compared with the conventional technology, and the more surveillance-camera installation sites there are, the greater the benefit.
(1-4-1) Machine learning of the skeletal information extractor 301
Because the skeletal information extractor 301 extracts joint-point information frame by frame from video footage, a still-image behavior recognition dataset is used for its machine learning. During machine learning, a still image for learning is first input, and the skeletal information extractor 301 is made to output skeletal information.
Next, an error function (also called a loss function) is used to calculate the error between the correct skeletal information included in the behavior recognition dataset and the skeletal information output by the skeletal information extractor 301. The parameters of the skeletal information extractor 301 are corrected using backpropagation so that this error becomes smaller.
Further, a still image for learning is input to the skeletal information extractor 301 with the corrected parameters, skeletal information is output, and the error is calculated. This still image may be the same still image as the one used before, or it may be a different one.
If the obtained error is larger than a predetermined threshold, the parameters of the skeletal information extractor 301 are corrected further using backpropagation in the same manner as above, and the above processing is repeated. If the obtained error is at or below the predetermined threshold, the machine learning ends.
(1-4-2) Machine learning of the behavioral feature extractor 302
Machine learning of the behavioral feature extractor 302 is performed after the machine learning of the skeletal information extractor 301 is completed.
For the machine learning of the behavioral feature extractor 302, video footage from the behavior recognition dataset is input frame by frame to the skeletal information extractor 301, and the skeletal information thus obtained is used as machine learning data. In this machine learning, a behavior teacher label is prepared for each video.
During machine learning, skeletal information is input to the behavioral feature extractor 302 to extract behavioral features. A score is then calculated for each behavior label from the extracted behavioral features. The calculated score for each behavior label is compared with the behavior teacher label prepared in advance to obtain an error. An error function such as the cross-entropy error is used to obtain this error.
Next, the parameters of the behavioral feature extractor 302 are corrected using backpropagation so that the obtained error becomes smaller, then skeletal information is input to the behavioral feature extractor 302 with the corrected parameters and the error is obtained. This skeletal information may be the same as the previously used skeletal information, or it may be different.
If the obtained error is larger than the predetermined threshold, the parameters of the behavioral feature extractor 302 are corrected further using backpropagation in the same manner as above, and the above processing is repeated. If the obtained error is at or below the predetermined threshold, the machine learning ends.
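The training procedure just described can be sketched as the following loop. A single softmax layer stands in for the behavioral feature extractor 302, random vectors stand in for the skeletal information and the behavior teacher labels, and an improvement-based stopping rule replaces the error threshold; all of these are illustrative assumptions, not the actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

n_dims, n_actions = 34, 5                   # e.g. 17 joints x (x, y); 5 behavior labels
X = rng.standard_normal((200, n_dims))      # stand-in skeletal information
y = rng.integers(0, n_actions, size=200)    # stand-in behavior teacher labels
W = np.zeros((n_dims, n_actions))           # parameters of the stand-in extractor

def cross_entropy(W, X, y):
    """Cross-entropy error between per-label scores and teacher labels."""
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y]).mean(), p

lr, prev_loss = 0.1, np.inf
for step in range(10000):                   # repeat: scores, error, backpropagation
    loss, p = cross_entropy(W, X, y)
    if prev_loss - loss < 1e-5:             # stop once the error no longer improves
        break
    prev_loss = loss
    grad = X.T @ (p - np.eye(n_actions)[y]) / len(y)  # gradient of the error w.r.t. W
    W -= lr * grad                          # parameter correction (gradient step)

final_loss, _ = cross_entropy(W, X, y)
```

In the actual system the gradient step would be computed by backpropagation through the full DNN; the single-layer case above is the smallest instance of the same error-minimization loop.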
In this way, the behavioral feature extractor 302 undergoes machine learning as a classification problem.
(1-5) Identification of the statistical distribution of behavioral features used by the exceptional behavior discriminator 303
As described above, the exceptional behavior discriminator 303 determines exceptional behavior using the statistical distribution of behavioral features at the shooting site of the surveillance camera 110 (the on-site statistical distribution).
When identifying the statistical distribution of behavioral features, the skeletal information extractor 301 and the behavioral feature extractor 302 whose machine learning has been completed are used.
First, as shown in FIG. 6, the detection frequencies of the behavioral features are initialized to 0 (S601). When the behavioral features can take only a small number of distinct values, the detection frequency may be counted for each possible value; when they can take many values, the frequency may be counted for each range of possible values.
Next, the video footage shot by the surveillance camera 110 at the shooting site is input frame by frame to the skeletal information extractor 301 to extract skeletal information (S602), and the obtained skeletal information is input to the behavioral feature extractor 302 to extract behavioral features (S603). The detection frequency of the behavioral feature thus obtained, or the detection frequency of the range containing it, is updated (S604); that is, the count is incremented by one.
 なお、当該ビデオ映像は、例外行動を判別するために例外行動判別プログラム3に入力されるビデオ映像よりも先に、例外行動判別プログラム3に入力されるという意味において、先行するビデオ映像である。 Note that the video image is a preceding video image in the sense that it is input to the exceptional behavior determination program 3 before the video image input to the exceptional behavior determination program 3 for determining exceptional behavior.
Thereafter, if there is video footage shot by the surveillance camera 110 at the shooting site that has not yet been used for counting the detection frequencies of the behavioral features (S605: YES), the process advances to step S602 and the same processing as above is performed using the unused video footage.
When there is no unused video footage left (S605: NO), using the fact that the behavioral features follow a multivariate Gaussian distribution, the mean and covariance matrix of the behavioral features (sample values) are calculated and taken as the estimates of the mean μ and the covariance matrix Σ (S606), and the processing ends.
Using the mean μ and the covariance matrix Σ obtained in this way, the Mahalanobis distance D1 can be calculated as described above.
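Steps S601 to S606 can be summarized as the following sketch. Here `extract_behavioral_feature` is a hypothetical helper wrapping the trained extractors 301 and 302, and for simplicity the sample values are accumulated directly rather than as a binned frequency table:

```python
import numpy as np

def estimate_site_statistics(frames, extract_behavioral_feature):
    """Accumulate behavioral features (S602-S604) and estimate mu and Sigma (S606)."""
    samples = np.asarray([extract_behavioral_feature(f) for f in frames])
    mu = samples.mean(axis=0)                         # maximum likelihood mean
    Sigma = np.cov(samples, rowvar=False, bias=True)  # maximum likelihood covariance
    return mu, Sigma

# Illustration with an identity "extractor" over random 3-D stand-in frames.
rng = np.random.default_rng(2)
mu, Sigma = estimate_site_statistics(rng.normal(size=(100, 3)), lambda f: f)
```

The statistics returned here are exactly the quantities the discriminator 303 consumes when computing D1.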
Note that, after video footage has been input to the exceptional behavior discrimination program 3 and the processing for determining exceptional behavior has been executed, video footage shot by the surveillance camera 110 at the shooting site may be input to the exceptional behavior discrimination program 3 to update the statistical distribution of the behavioral features and recalculate the statistics.
(1-6) Exceptional behavior determination processing
With the configuration described above, the exceptional behavior discrimination program 3 determines exceptional behavior as follows.
That is, by executing the exceptional behavior discrimination program 3, the exceptional behavior discrimination device 100 inputs the video footage shot by the surveillance camera 110 to the skeletal information extractor 301, as shown in FIG. 7 (S701).
Note that it is not necessary to input every frame of the video footage to the skeletal information extractor 301; frames may be input at intervals of a predetermined number of frames. The number of frames input to the skeletal information extractor 301 at one time is determined according to the skeletal information to be input to the behavioral feature extractor 302.
If the video footage input to the skeletal information extractor 301 does not show an image object learned in advance by machine learning, such as a person (S702: NO), the process advances to step S701 and the above processing is repeated.
If a predetermined image object is shown in the video footage input to the skeletal information extractor 301 (S702: YES), the skeletal information extracted by the skeletal information extractor 301 is input to the behavioral feature extractor 302 to extract behavioral features (S703).
Next, the exceptional behavior discriminator 303 calculates, as described above, the Mahalanobis distance D1 between the behavioral feature extracted by the behavioral feature extractor 302 and the center of the on-site statistical distribution (S704), and compares the obtained Mahalanobis distance with the threshold Dth1.
If the Mahalanobis distance D1 is larger than the threshold Dth1 (S705: YES), it is determined that the video footage shows an exceptional behavior (S706). If the Mahalanobis distance D1 is at or below the threshold Dth1 (S705: NO), it is determined that the video footage shows normal behavior (S707).
After steps S706 and S707, the process advances to step S701 and the above processing is repeated.
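Steps S701 to S707 can be sketched as the following loop. `extract_skeleton` and `extract_feature` are hypothetical helpers wrapping the trained extractors; a return value of `None` stands for the case where no learned image object is shown (S702: NO):

```python
import numpy as np

def discriminate(frames, extract_skeleton, extract_feature, mu, Sigma_inv, Dth1):
    results = []
    for frame in frames:                              # S701: frames from the camera
        skeleton = extract_skeleton(frame)
        if skeleton is None:                          # S702: no learned image object
            continue
        x = extract_feature(skeleton)                 # S703: behavioral feature
        d = x - mu
        D1 = np.sqrt(d @ Sigma_inv @ d)               # S704: Mahalanobis distance
        results.append("exceptional" if D1 > Dth1 else "normal")  # S705-S707
    return results

# Illustration with identity helpers and unit covariance.
frames = [np.array([0.1, 0.1]), np.array([5.0, 5.0])]
labels = discriminate(frames, lambda f: f, lambda s: s,
                      np.zeros(2), np.eye(2), Dth1=1.0)
# labels -> ["normal", "exceptional"]
```

Precomputing the inverse covariance `Sigma_inv` once, as here, avoids re-inverting Σ on every frame, consistent with the light processing load noted below.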
In this way, the exceptional behavior discriminator 303 only calculates the Mahalanobis distance D1 and compares it with the threshold Dth1; compared with determining exceptional behavior from the reconstruction error and prediction error of skeletal information as in the conventional technology, the processing load of the exceptional behavior discrimination device 100 is lighter and the processing speed is faster.
[2] Second Embodiment
The second embodiment of the present disclosure, in addition to determining exceptional behavior as in the first embodiment, judges whether a behavior determined to be exceptional is similar to any known specific behavior.
The following description focuses mainly on the differences from the first embodiment.
(2-1) Specific exceptional behavior classification program
As shown in FIG. 8, the specific exceptional behavior classification program 8 according to this embodiment adds a specific exceptional behavior classifier 804 after a skeletal information extractor 801, a behavioral feature extractor 802 and an exceptional behavior discriminator 803 that have the same configurations as those of the exceptional behavior discrimination program 3 according to the first embodiment.
The specific exceptional behavior classifier 804 uses the behavioral features that the exceptional behavior discriminator 803 has determined to represent exceptional behavior to calculate the similarity between the exceptional behavior and each known specific behavior, and outputs a classification result for the exceptional behavior based on this similarity.
To this end, the specific exceptional behavior classifier 804 obtains in advance, for each specific behavior, the mean M and the covariance matrix Σ of the statistical distribution (multivariate Gaussian distribution) of its behavioral features, and calculates the Mahalanobis distance D2 between the behavioral feature of the exceptional behavior and the mean M.
The mean M and the covariance matrix Σ are statistics of the statistical distribution of the behavioral features of each specific behavior. In this embodiment, the skeletal information extractor 301 is made to extract skeletal information from video footage shot with the surveillance camera 110 at its installation location, the behavioral features that the behavioral feature extractor 302 extracts from the obtained skeletal information are used as sample values, and the mean and covariance matrix of the observed values are calculated using these sample values.
The mean and covariance matrix calculated from the sample values are the maximum likelihood estimates of the mean M and the covariance matrix Σ of the multivariate Gaussian distribution. In the following, the maximum likelihood estimates of the mean M and the covariance matrix Σ are simply referred to as the mean M and the covariance matrix Σ, respectively.
As illustrated in FIG. 9, the mean M and the covariance matrix Σ can be calculated for the statistical distribution of the behavioral features of image objects performing a specific behavior in the behavior recognition dataset. The mean M corresponds to the behavioral feature of the specific behavior.
The Mahalanobis distance D2 between the mean M = (M1, …, Mn) of the statistical distribution of the behavioral features of a specific behavior and the behavioral feature x = (x1, …, xn) of an exceptional behavior is
D2 = √( (x − M)^t Σ^(−1) (x − M) ).
Here, "t" denotes the transpose of a vector. M1, …, Mn are the per-component mean values of the behavioral features in the statistical distribution of the specific behavior, and the mean M is the vector having these as its components. x1, …, xn are the components of the behavioral feature x of the exceptional behavior. Σ is the covariance matrix of the statistical distribution of the behavioral features of the specific behavior. The mean M and the covariance matrix Σ are statistics of this statistical distribution.
The obtained Mahalanobis distance D2 serves as the similarity between the exceptional behavior and the specific behavior: the smaller the Mahalanobis distance D2, the higher the similarity, and the larger the Mahalanobis distance D2, the lower the similarity.
In this embodiment, when the Mahalanobis distance D2 is at or below a predetermined threshold Dth2, the exceptional behavior is judged to be similar to the specific behavior; when the Mahalanobis distance D2 is larger than the predetermined threshold Dth2, the exceptional behavior is judged not to be similar to the specific behavior.
(2-2) Identification of the statistical distribution of behavioral features used by the specific exceptional behavior classifier 804
The specific exceptional behavior classifier 804 classifies exceptional behavior using the statistical distributions of the behavioral features of the behaviors judged to be specific behaviors in the behavior recognition dataset.
When identifying these statistical distributions of behavioral features, the skeletal information extractor 301 and the behavioral feature extractor 302 whose machine learning has been completed are used.
As shown in FIG. 10, after the detection frequencies of the behavioral features are initialized to 0 (S1001), video footage of a specific behavior in the behavior recognition dataset is input frame by frame to the skeletal information extractor 301 to extract skeletal information (S1002), and the obtained skeletal information is input to the behavioral feature extractor 302 to extract behavioral features (S1003).
The detection frequency of the behavioral feature thus obtained is updated (S1004); that is, the count is incremented by one. As with the on-site statistical distribution in the first embodiment, the detection frequency of the range containing the behavioral feature may be updated instead of the detection frequency of the feature itself.
Thereafter, if the behavior recognition dataset contains video footage of the specific behavior that has not yet been used for counting the detection frequencies of the behavioral features (S1005: YES), the process advances to step S1002 and the same processing as above is performed using the unused video footage.
When there is no unused video footage left (S1005: NO), using the fact that the behavioral features follow a multivariate Gaussian distribution, the mean M and the covariance matrix Σ are estimated from the detected behavioral features (S1006), and the processing ends.
 このようにして得られた平均値Mおよび共分散行列Σを用いれば、上述のように、マハラノビス距離D2を算出することができる。 By using the mean value M and covariance matrix Σ obtained in this way, the Mahalanobis distance D2 can be calculated as described above.
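The estimation of steps S1001 to S1006 and the subsequent distance calculation can be illustrated with a minimal numpy sketch; the function and variable names below are illustrative assumptions and do not appear in the disclosure:

```python
import numpy as np

def fit_gaussian(features):
    """Estimate the mean vector M and covariance matrix Sigma from the
    behavioral features collected for one specific behavior (cf. step S1006)."""
    X = np.asarray(features, dtype=float)   # shape: (num_samples, feature_dim)
    return X.mean(axis=0), np.cov(X, rowvar=False)

def mahalanobis(x, M, Sigma):
    """Mahalanobis distance between a behavioral feature vector x and the
    centre M of the estimated distribution."""
    d = np.asarray(x, dtype=float) - M
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))

# Synthetic behavioral features clustered around the origin.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
M, Sigma = fit_gaussian(X)

d_near = mahalanobis([0.1, -0.1], M, Sigma)  # close to the distribution centre
d_far = mahalanobis([5.0, 5.0], M, Sigma)    # far from it -> large distance
```

With features collected per specific behavior, `fit_gaussian` plays the role of step S1006, and `mahalanobis` yields the distance D2 that is compared against the threshold Dth2.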
 なお、上記の処理は、行動認識データセットのビデオ映像に映っている特定行動ごとに行う。 Note that the above processing is performed for each specific behavior shown in the video images of the behavior recognition dataset.
(2-3)例外行動の判別処理 (2-3) Exceptional Behavior Discrimination Process
 以上のような構成を備えることによって、特定例外行動分類プログラム8は、次のようにして、例外行動を分類する。 With the above configuration, the specific exceptional behavior classification program 8 classifies exceptional behavior as follows.
 すなわち、例外行動判別装置100は、特定例外行動分類プログラム8を実行することによって、図11に示すように、例外行動がどの特定行動に類似するか分類する。なお、図11におけるステップS1101からS1107については、図7におけるステップS701からS707までと同様の処理であるので説明を省く。 That is, by executing the specific exceptional behavior classification program 8, the exceptional behavior discrimination device 100 classifies, as shown in FIG. 11, which specific behavior the exceptional behavior is similar to. Note that steps S1101 to S1107 in FIG. 11 are the same processes as steps S701 to S707 in FIG. 7, so a description thereof is omitted.
 ビデオ映像に例外行動が映っていると判別した場合には(S1106)、特定行動ごとにステップS1109からS1112までの処理を実行する(S1108、S1113)。すなわち、当該特定行動の統計的分布の中心と、例外行動の行動特徴量と、のマハラノビス距離D2を求めて(S1109)、当該マハラノビス距離D2と閾値Dth2とを比較する。 If it is determined that an exceptional behavior is shown in the video image (S1106), the processes from steps S1109 to S1112 are executed for each specific behavior (S1108, S1113). That is, the Mahalanobis distance D2 between the center of the statistical distribution of the specific behavior and the behavior feature of the exceptional behavior is calculated (S1109), and the Mahalanobis distance D2 is compared with the threshold Dth2.
 マハラノビス距離D2が閾値Dth2以下である場合には(S1110:YES)、当該例外行動は当該特定行動に類似する、と判断する(S1111)。一方、マハラノビス距離D2が閾値Dth2よりも大きい場合には(S1110:NO)、当該例外行動は当該特定行動に類似しない、と判断する(S1112)。 If the Mahalanobis distance D2 is less than or equal to the threshold Dth2 (S1110: YES), it is determined that the exceptional behavior is similar to the specific behavior (S1111). On the other hand, if the Mahalanobis distance D2 is larger than the threshold Dth2 (S1110: NO), it is determined that the exceptional behavior is not similar to the specific behavior (S1112).
 このようにすれば、監視カメラ110で撮影したビデオ映像に映っている例外行動がどの特定行動に類似する例外行動であるか分類することができる。 In this way, it is possible to classify which specific behavior the exceptional behavior shown in the video image captured by the surveillance camera 110 is similar to.
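The classification loop of steps S1108 to S1113 can be sketched as follows; the behavior names and the (M, Σ) values are hypothetical placeholders:

```python
import numpy as np

def classify_exceptional(x, distributions, dth2):
    """Sketch of steps S1108-S1113: compare the exceptional behavior's
    feature vector x against each specific behavior's (M, Sigma) and
    collect every behavior whose Mahalanobis distance D2 <= Dth2."""
    similar = []
    for name, (M, Sigma) in distributions.items():
        d = np.asarray(x, dtype=float) - M
        d2 = float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))  # S1109
        if d2 <= dth2:                 # S1110: YES -> similar (S1111)
            similar.append(name)
    return similar                     # behaviors the exception resembles

# Hypothetical distributions for two specific behaviors.
dists = {
    "running":  (np.array([0.0, 0.0]),   np.eye(2)),
    "fighting": (np.array([10.0, 10.0]), np.eye(2)),
}
result = classify_exceptional([0.5, 0.5], dists, dth2=3.0)  # -> ["running"]
```

The returned list may contain zero, one, or several specific behaviors, matching the per-behavior comparison described above.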
 また、例外行動の分類を行うために、特定行動ごとに行動認識データセットを用いて推定した行動特徴量の統計的分布を用いるので、ニューラルネットワークを用いて例外行動を分類する場合と比較して、機械学習が不要になるため学習コストがかからない。また、例外行動を分類するための処理負荷を低くすることができ、処理速度を速くすることができる。 In addition, because exceptional behavior is classified using the statistical distribution of behavioral features estimated for each specific behavior from a behavior recognition dataset, no machine learning is required and thus no learning cost is incurred, compared with classifying exceptional behavior using a neural network. Furthermore, the processing load for classifying exceptional behavior can be reduced and the processing speed increased.
[3]変形例 [3] Modifications
 以上、本開示を実施の形態に基づいて説明してきたが、本開示が上述の実施の形態に限定されないのは勿論であり、以下のような変形例を実施することができる。 Although the present disclosure has been described above based on the embodiments, the present disclosure is of course not limited to the above-described embodiments, and modifications such as the following can be implemented.
(3-1)上記実施の形態においては、行動特徴量の統計的分布が多変量ガウス分布に従う場合を例にとって説明したが、本開示がこれに限定されないのは言うまでもなく、行動特徴量の統計的分布は、多変量ガウス分布に代えて、多変量混合ガウス分布に従うとしてもよい。 (3-1) In the above embodiment, the case where the statistical distribution of behavioral features follows a multivariate Gaussian distribution was described as an example, but it goes without saying that the present disclosure is not limited to this, and the statistical distribution of behavioral features may follow a multivariate Gaussian mixture distribution instead of a multivariate Gaussian distribution.
 行動特徴量の統計的分布が多変量混合ガウス分布に従う場合には、当該多変量混合ガウス分布において、重ね合わされている複数の多変量ガウス分布について個別に統計量(平均値μおよび共分散行列Σ)を求めて、これらの多変量ガウス分布のいずれについてもその平均値から例外行動判別の対象となる行動特徴量までのマハラノビス距離D1が閾値Dth1よりも大きい場合に、例外行動であると判別してもよい。 When the statistical distribution of the behavioral features follows a multivariate Gaussian mixture distribution, the statistics (mean value μ and covariance matrix Σ) may be obtained individually for each of the multiple superimposed multivariate Gaussian distributions in the mixture, and the behavior may be determined to be exceptional when, for every one of these multivariate Gaussian distributions, the Mahalanobis distance D1 from its mean value to the behavioral feature subject to exceptional behavior determination is greater than the threshold Dth1.
 一方、いずれかの多変量ガウス分布についてマハラノビス距離D1が閾値Dth1以下である場合には、正常行動である(例外行動でない)と判別することになる。 On the other hand, if the Mahalanobis distance D1 for any multivariate Gaussian distribution is less than or equal to the threshold Dth1, it is determined that the behavior is normal (not exceptional behavior).
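This mixture-based decision rule can be sketched as follows; the component parameters are hypothetical, and only the all-components-exceed-threshold logic is taken from the text above:

```python
import numpy as np

def is_exceptional_gmm(x, components, dth1):
    """Mixture-model variant of modification (3-1): the behavior is judged
    exceptional only when the Mahalanobis distance D1 exceeds the threshold
    Dth1 for EVERY superimposed Gaussian component; if the distance to any
    component is within the threshold, the behavior is normal."""
    x = np.asarray(x, dtype=float)
    for mu, Sigma in components:
        d = x - mu
        d1 = float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))
        if d1 <= dth1:
            return False   # close to one mode of normal behavior
    return True            # far from all modes -> exceptional behavior

# Two hypothetical modes of normal behavior along one axis.
comps = [(np.array([0.0, 0.0]), np.eye(2)),
         (np.array([8.0, 0.0]), np.eye(2))]
```

A point midway between the two modes (for example (4, 0) with Dth1 = 3) is judged exceptional even though it lies between two normal modes, which a single Gaussian fitted to both modes could miss.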
 このようにすれば、監視カメラ110の設置先に応じて、更に柔軟に例外行動を判別することができる。 In this way, exceptional behavior can be determined even more flexibly depending on where the surveillance camera 110 is installed.
(3-2)上記実施の形態においては、特定例外行動分類プログラム8の特定行動分類器804が、特定行動ごとの行動特徴量の統計的分布に関する統計量を用いて、例外行動の類似度を判別する場合を例にとって説明したが、本開示がこれに限定されないのは言うまでもなく、これに代えて次のようにしてもよい。 (3-2) In the above embodiment, the case where the specific behavior classifier 804 of the specific exceptional behavior classification program 8 determines the similarity of exceptional behavior using statistics regarding the statistical distribution of behavioral features for each specific behavior was described as an example, but it goes without saying that the present disclosure is not limited to this, and the following may be used instead.
 例えば、特定行動をとる画像オブジェクトを映したビデオ映像を学習用ビデオ映像とし、当該画像オブジェクトがとる特定行動を識別するラベルを教師信号として、機械学習した機械学習モデルを用いて、例外行動がどの特定行動に類似するかを判別してもよい。 For example, a machine learning model that has been machine-learned using video images showing an image object taking a specific behavior as training video images, and labels identifying the specific behavior taken by the image object as teacher signals, may be used to determine which specific behavior the exceptional behavior is similar to.
 特定行動毎の行動特徴量の統計的分布に関する統計量を推定した場合と同様に、当該機械学習に際しても、学習用ビデオ映像として、一般的な行動認識データセットのビデオ映像を用いることができる。したがって、当該機械学習に要する学習コストを低く抑えることができる。 As in the case of estimating statistics regarding the statistical distribution of behavioral features for each specific behavior, video images from a general behavior recognition dataset can be used as training video images for this machine learning. Therefore, the learning cost required for this machine learning can be kept low.
(3-3)上記実施の形態においては、本開示が、例外行動を判別する例外行動判別装置、および例外行動判別装置に例外行動を判別させる例外行動判別プログラムである場合を例にとって説明したが、本開示がこれに限定されないのは言うまでもなく、これらの他に、例外行動を判別する際に例外行動判別装置が実行する例外行動判別方法であるとしてもよい。 (3-3) In the above embodiments, the present disclosure was described taking as examples an exceptional behavior discrimination device that discriminates exceptional behavior and an exceptional behavior discrimination program that causes the exceptional behavior discrimination device to discriminate exceptional behavior; however, it goes without saying that the present disclosure is not limited to these, and it may also be an exceptional behavior discrimination method executed by the exceptional behavior discrimination device when discriminating exceptional behavior.
 本開示を、このような例外行動判別方法とした場合であっても、例外行動判別装置や例外行動判別プログラムと同様に、本開示の効果を得ることができるのは言うまでもない。 Even when the present disclosure takes the form of such an exceptional behavior discrimination method, it goes without saying that the effects of the present disclosure can be obtained, just as with the exceptional behavior discrimination device and the exceptional behavior discrimination program.
(3-4)上記実施の形態においては、例外行動の判別および分類を行うためにマハラノビス距離を用いる場合を例にとって説明したが、本開示がこれに限定されないのは言うまでもなく、マハラノビス距離に代えて他の距離を用いてもよい。 (3-4) In the above embodiment, the case where the Mahalanobis distance is used to discriminate and classify exceptional behavior was described as an example, but it goes without saying that the present disclosure is not limited to this, and another distance may be used instead of the Mahalanobis distance.
 例えば、行動特徴量を構成する成分のうち、分散が小さい成分よりも分散が大きい成分を重視する場合には、ユークリッド距離を用いてもよい。 For example, when placing more importance on a component with a large variance than a component with a small variance among the components making up the behavioral feature amount, the Euclidean distance may be used.
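The difference in how the two distances weight variance can be checked numerically. In the sketch below (with a hypothetical diagonal covariance), equal deviations along a high-variance and a low-variance component give equal Euclidean distances but very different Mahalanobis distances:

```python
import numpy as np

# Hypothetical diagonal covariance: component 0 varies strongly (variance 16),
# component 1 hardly varies (variance 0.25); distribution centred at the origin.
Sigma = np.diag([16.0, 0.25])
M = np.zeros(2)

def mahalanobis(x):
    d = np.asarray(x, dtype=float) - M
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))

x_high = np.array([4.0, 0.0])  # deviation along the high-variance component
x_low = np.array([0.0, 4.0])   # equal deviation along the low-variance component

eu_high, eu_low = np.linalg.norm(x_high - M), np.linalg.norm(x_low - M)
ma_high, ma_low = mahalanobis(x_high), mahalanobis(x_low)
# Euclidean treats both deviations alike (4.0 each); Mahalanobis discounts the
# high-variance direction (4/4 = 1.0) and amplifies the low-variance one (4/0.5 = 8.0).
```

Using the Euclidean distance therefore gives the high-variance component relatively more influence on the decision than the Mahalanobis distance does.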
 また、更にユークリッド距離以外の距離を用いてもよい。 Furthermore, distances other than the Euclidean distance may also be used.
(3-5)上記実施の形態においては、特に言及しなかったが、行動特徴量抽出器302が人物ごとに行動特徴量を抽出する場合には、人物の順序が置き換わると、この置き換わりに応じて、人物ごとに抽出した行動特徴量の順序が置き換わってもよい。 (3-5) Although not specifically mentioned in the above embodiment, when the behavioral feature extractor 302 extracts behavioral features for each person, the order of the behavioral features extracted for each person may be permuted in accordance with any permutation of the order of the persons.
 この意味において、行動特徴量抽出器302は、Permutation Equivariantであってもよい。例えば、人物ごとに、PointNetを用いて行動特徴量を抽出してもよいし、他のPermutation Equivariantな手法を用いてもよい。 In this sense, the behavioral feature extractor 302 may be Permutation Equivariant. For example, behavioral features may be extracted for each person using PointNet, or another Permutation Equivariant method may be used.
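Permutation equivariance in this sense can be illustrated with a toy shared-map extractor (the weights are random stand-ins, not a trained PointNet): applying the same map independently to every person's skeleton vector guarantees that permuting the persons permutes the outputs in exactly the same way:

```python
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(8, 4)), rng.normal(size=8)  # shared, per-person weights

def per_person_features(skeletons):
    """Permutation-equivariant toy extractor: the same shared linear map plus
    ReLU is applied to each person's skeleton vector independently, so
    reordering the persons reorders the outputs identically (PointNet-style
    shared MLP; the weights here are random placeholders)."""
    X = np.asarray(skeletons, dtype=float)   # (num_persons, skeleton_dim)
    return np.maximum(X @ W.T + b, 0.0)      # (num_persons, feature_dim)

people = rng.normal(size=(3, 4))             # 3 persons, 4-dim skeleton vectors
out = per_person_features(people)

perm = [2, 0, 1]                             # swap the order of the persons
out_perm = per_person_features(people[perm]) # outputs permute the same way
```

Because the map contains no interaction across the person axis, `out_perm` equals `out` reordered by the same permutation, which is exactly the property that lets the extracted features stay associated with the right person.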
 このようにすれば、骨格情報抽出器301が人物ごとに抽出した骨格情報の順序に応じて、人物ごとに骨格情報と行動特徴量とを対応付けることができる。したがって、単に、例外行動がなされたか否かの判別だけではなく、例外行動を行った人物まで判別したい場合には、有効である。 In this way, skeletal information and behavior feature amounts can be associated for each person according to the order of the skeletal information extracted for each person by the skeletal information extractor 301. Therefore, it is effective when it is desired not only to determine whether or not an exceptional behavior has been performed, but also to determine the person who has performed the exceptional behavior.
 なお、人物ごとに行動特徴量を抽出する場合には、ビデオ映像を構成するフレーム間で人物ごとに骨格情報を対応付ける必要がある。人物ごとに骨格情報を対応付ける処理は、骨格情報抽出器301が行ってもよいし、行動特徴量抽出器302が行ってもよい。 Note that when extracting behavioral features for each person, it is necessary to associate skeletal information for each person between frames that make up the video image. The process of associating skeletal information with each person may be performed by the skeletal information extractor 301 or the behavior feature extractor 302.
 また、シーン単位で例外行動の有無を判別する場合には、骨格情報を人物ごとに区別せずに、フレームのXY座標およびフレーム番号からなる3次元空間における関節点の分布から例外行動の有無を判別してもよい。この場合には、すべての関節点に関する情報をPointNetに入力して、行動特徴量を抽出する。 In addition, when determining the presence or absence of exceptional behavior on a scene-by-scene basis, the presence or absence of exceptional behavior may be determined from the distribution of joint points in a three-dimensional space consisting of the XY coordinates of the frames and the frame numbers, without distinguishing the skeletal information for each person. In this case, information on all joint points is input to PointNet to extract the behavioral features.
(3-6)上記実施の形態においては特に言及しなかったが、行動認識データセットとしては、例えば、人物の例外行動を判別する場合には、学校法人千葉工業大学人工知能・ソフトウェア技術研究センター(ステアラボ)、国立研究開発法人産業技術総合研究所(産総研)および国立研究開発法人新エネルギー・産業技術総合開発機構(NEDO: New Energy and Industrial Technology Development Organization)が構築した動画キャプションデータセット「STAIR Actions キャプションデータセット」を用いることができる。この行動認識データセットには、日常生活シーンを中心として人物の動作が収録されている。 (3-6) Although not specifically mentioned in the above embodiment, when determining exceptional behavior of a person, for example, the video caption dataset "STAIR Actions Caption Dataset", constructed by the Artificial Intelligence and Software Technology Research Center (STAIR Lab) of Chiba Institute of Technology, the National Institute of Advanced Industrial Science and Technology (AIST), and the New Energy and Industrial Technology Development Organization (NEDO), can be used as the behavior recognition dataset. This behavior recognition dataset mainly contains human movements in daily life scenes.
 また、上記以外の行動認識データセットを用いてもよいことは言うまでもない。例えば、車両の運行状況を収録した行動認識データセットを用いれば、酒気帯び運転などの例外運行を判別することができる。所望の画像オブジェクトのどのような行動を例外行動として判別したいかに応じて、適切な行動認識データセットを用いるのが望ましい。 Furthermore, it goes without saying that behavior recognition datasets other than the above may be used. For example, by using a behavior recognition dataset that records vehicle operating conditions, exceptional driving such as driving under the influence of alcohol can be discriminated. It is desirable to use an appropriate behavior recognition dataset depending on what kind of behavior of the desired image object is to be discriminated as exceptional behavior.
(3-7)上記実施の形態においては、例外行動判別装置100と監視カメラ110とが別体である場合を例にとって説明したが、本開示がこれに限定されないのは言うまでもなく、これに代えて、例外行動判別装置100を監視カメラ110に組み込んでもよい。 (3-7) In the above embodiment, the case where the exceptional behavior discrimination device 100 and the surveillance camera 110 are separate units was described as an example, but it goes without saying that the present disclosure is not limited to this; instead, the exceptional behavior discrimination device 100 may be incorporated into the surveillance camera 110.
 また、上記実施の形態においては、記憶部205がクラウドストレージであってよいと述べたが、記憶部205だけでなく、例外行動判別装置100そのものがクラウドサーバーであってもよい。 Furthermore, in the above embodiment, it has been stated that the storage unit 205 may be a cloud storage, but not only the storage unit 205 but also the exceptional behavior determination device 100 itself may be a cloud server.
 このようにすれば、記憶部205としてスケーラブルなストレージを利用することができるだけでなく、監視カメラ110の台数や、監視カメラ110の解像度やフレームレートに応じて例外行動判別装置100の性能もまたスケーラブルにすることができる。 In this way, not only can scalable storage be used as the storage unit 205, but the performance of the exceptional behavior discrimination device 100 can also be scaled according to the number of surveillance cameras 110 and their resolution and frame rate.
 このような場合においては、本開示を適用することによって、クラウドシステムの負荷や費用を、従来技術を用いる場合よりも削減することができる。 In such a case, by applying the present disclosure, the load and cost of the cloud system can be reduced compared with the case of using conventional techniques.
(3-8)上記実施例及び上記変形例をそれぞれ組み合わせるとしてもよい。 (3-8) The above embodiment and the above modifications may be combined with each other.
 本開示に係る例外行動判別方法、例外行動判別プログラムおよび例外行動判別装置は、機械学習モデルによって例外行動の判別処理を行うための学習コストを削減する技術として有用である。 The exceptional behavior determination method, exceptional behavior determination program, and exceptional behavior determination device according to the present disclosure are useful as a technique for reducing the learning cost for performing exceptional behavior determination processing using a machine learning model.
1………………………………ビデオ監視システム
3………………………………例外行動判別プログラム
8………………………………特定例外行動分類プログラム
100…………………………例外行動判別装置
110…………………………監視カメラ
301、801、1211…骨格情報抽出器
302、802………………行動特徴量抽出器
303、803………………例外行動判別部
804…………………………特定行動分類器
1200………………………MPED-RNN
1201………………………特徴量抽出器
1202………………………再構成器
1203、1205…………多層パーセプトロン
1204………………………予測器
1………………………………Video surveillance system
3………………………………Exceptional behavior discrimination program
8………………………………Specific exceptional behavior classification program
100…………………………Exceptional behavior discrimination device
110…………………………Surveillance camera
301, 801, 1211…Skeletal information extractor
302, 802………………Behavioral feature extractor
303, 803………………Exceptional behavior discrimination unit
804…………………………Specific behavior classifier
1200………………………MPED-RNN
1201………………………Feature extractor
1202………………………Reconstructor
1203, 1205…………Multilayer perceptron
1204………………………Predictor

Claims (20)

  1.  ビデオ映像に映った画像オブジェクトの例外行動判別方法であって、
     ビデオ映像のフレーム毎に、画像オブジェクトの特徴点を抽出する特徴点抽出ステップと、
     機械学習モデルを用いて、特徴点の抽出結果から行動特徴量を抽出する行動特徴量抽出ステップと、
     抽出した行動特徴量に係る行動について、例外行動の判別対象としたビデオ映像の撮影現場における統計的な発生頻度が基準よりも低ければ、例外行動と判別する行動判別ステップと、を含み、
     前記機械学習モデルは、学習用ビデオ映像のフレーム毎に抽出された、画像オブジェクトの特徴点の抽出結果を用いて、機械学習済みであり、
     前記学習用ビデオ映像は、当該学習用ビデオ映像に映っている画像オブジェクトの行動が例外行動か否かに関する情報を含んでいない
    ことを特徴とする例外行動判別方法。
    A method for determining exceptional behavior of an image object shown in a video image, the method comprising:
    a feature point extraction step of extracting feature points of the image object for each frame of the video image;
    a behavioral feature extraction step of extracting behavioral features from the feature point extraction results using a machine learning model;
    a behavior determination step of determining the behavior related to the extracted behavioral feature amount to be an exceptional behavior if the statistical occurrence frequency at the filming site of the video footage targeted for determination of the exceptional behavior is lower than a standard;
    The machine learning model has been machine learned using the extraction results of feature points of image objects extracted for each frame of the learning video image,
    The method for determining exceptional behavior is characterized in that the learning video image does not include information regarding whether or not the behavior of the image object shown in the learning video image is an exceptional behavior.
  2.  学習用ビデオ映像は、少なくとも、当該ビデオ映像の撮影場所とは異なる場所で撮影されたビデオ映像を含む
    ことを特徴とする請求項1に記載の例外行動判別方法。
    2. The method for determining exceptional behavior according to claim 1, wherein the learning video footage includes at least a video footage shot at a location different from the location where the video footage was shot.
  3.  前記特徴点は、前記画像オブジェクトの関節点である
    ことを特徴とする請求項1に記載の例外行動判別方法。
    3.  The method for determining exceptional behavior according to claim 1, wherein the feature points are joint points of the image object.
  4.  前記特徴点は、前記画像オブジェクトを囲む矩形の頂点である
    ことを特徴とする請求項1に記載の例外行動判別方法。
    4.  The method for determining exceptional behavior according to claim 1, wherein the feature points are vertices of a rectangle surrounding the image object.
  5.  前記特徴点の抽出結果は、
     当該特徴点を抽出したフレームにおける当該特徴点の座標値と、
     当該特徴点を抽出したフレームの当該動画像における順序と、を含む
    ことを特徴とする請求項1に記載の例外行動判別方法。
    5.  The method for determining exceptional behavior according to claim 1, wherein the extraction result of the feature point includes:
    coordinate values of the feature point in the frame from which the feature point was extracted; and
    the order, in the moving image, of the frame from which the feature point was extracted.
  6.  前記特徴点に関する情報は、
     当該特徴点が尤もらしく検出されていることを表す検出スコアと、
     当該画像オブジェクトの種類を表すラベルと、
     当該特徴点の種類を表す属性と、
     当該画像オブジェクトの外観を表す属性と、の少なくとも一つ以上を含む
    ことを特徴とする請求項1に記載の例外行動判別方法。
    6.  The method for determining exceptional behavior according to claim 1, wherein the information regarding the feature point includes at least one of:
    a detection score representing how plausibly the feature point has been detected;
    a label representing the type of the image object;
    an attribute representing the type of the feature point; and
    an attribute representing the appearance of the image object.
  7.  前記特徴点抽出ステップにおいては、前記ビデオ映像に含まれる一つ以上のフレームから特徴点を抽出する
    ことを特徴とする請求項1に記載の例外行動判別方法。
    7.  The method for determining exceptional behavior according to claim 1, wherein in the feature point extraction step, feature points are extracted from one or more frames included in the video image.
  8.  前記特徴点抽出ステップにおいては、前記ビデオ映像に含まれる一つ以上のフレームから、機械学習モデルを用いたニューロ演算によって、特徴点を抽出する
    ことを特徴とする請求項7に記載の例外行動判別方法。
    8.  The method for determining exceptional behavior according to claim 7, wherein in the feature point extraction step, feature points are extracted from the one or more frames included in the video image by a neural calculation using a machine learning model.
  9.  前記ニューロ演算を行う機械学習モデルは、畳み込みニューラルネットワークとセルフアテンション機構との少なくとも一方を含む
    ことを特徴とする請求項8に記載の例外行動判別方法。
    9. The method for determining exceptional behavior according to claim 8, wherein the machine learning model that performs the neural calculation includes at least one of a convolutional neural network and a self-attention mechanism.
  10.  前記行動特徴量抽出ステップは、前記特徴点に関する情報を入力とする機械学習モデルを用いて、前記行動特徴量を抽出する
    ことを特徴とする請求項1に記載の例外行動判別方法。
    10.  The method for determining exceptional behavior according to claim 1, wherein the behavioral feature extraction step extracts the behavioral feature using a machine learning model that receives information regarding the feature points as input.
  11.  前記機械学習モデルは、Permutation Invariantな深層ニューラルネットワークである
    ことを特徴とする請求項10に記載の例外行動判別方法。
    11. The method for determining exceptional behavior according to claim 10, wherein the machine learning model is a permutation invariant deep neural network.
  12.  前記深層ニューラルネットワークはPointNetである
    ことを特徴とする請求項11に記載の例外行動判別方法。
    12. The method for determining exceptional behavior according to claim 11, wherein the deep neural network is PointNet.
  13.  前記行動判別ステップは、
     前記行動特徴量を抽出したビデオ映像の撮影以前に当該撮影現場で撮影した先行ビデオ映像から行動特徴量を抽出し、
     抽出された行動特徴量を用いて例外行動と判別する行動特徴量の範囲を調整する
    ことを特徴とする請求項1に記載の例外行動判別方法。
    13.  The method for determining exceptional behavior according to claim 1, wherein the behavior determination step includes:
    extracting behavioral features from preceding video footage shot at the filming site before the video footage from which the behavioral feature was extracted; and
    adjusting, using the extracted behavioral features, the range of behavioral features to be determined as exceptional behavior.
  14.  前記行動判別ステップは、
     前記先行ビデオ映像から抽出した行動特徴量から行動特徴量の統計的分布の統計量を算出し、
     算出した統計量を用いて例外行動と判別する行動特徴量の範囲を決定する
    ことを特徴とする請求項13に記載の例外行動判別方法。
    14.  The method for determining exceptional behavior according to claim 13, wherein the behavior determination step includes:
    calculating statistics of the statistical distribution of behavioral features from the behavioral features extracted from the preceding video footage; and
    determining, using the calculated statistics, the range of behavioral features to be determined as exceptional behavior.
  15.  前記行動特徴量の統計的分布は、ガウス分布または混合ガウス分布である
    ことを特徴とする請求項14に記載の例外行動判別方法。
    15. The method for determining exceptional behavior according to claim 14, wherein the statistical distribution of the behavior feature is a Gaussian distribution or a mixed Gaussian distribution.
  16.  前記例外行動が、画像オブジェクトがとり得る既知の特定行動のうち、いずれの特定行動に類似するかを判定する例外行動分類ステップを含む
    ことを特徴とする請求項1に記載の例外行動判別方法。
    16.  The method for determining exceptional behavior according to claim 1, further comprising an exceptional behavior classification step of determining which specific behavior, among known specific behaviors that the image object can take, the exceptional behavior is similar to.
  17.  前記例外行動分類ステップは、
     特定行動をとる画像オブジェクトを映したビデオ映像から特定行動に係る行動特徴量の統計的分布の統計量を算出し、
     算出した統計量を用いて当該に類似する例外行動の行動特徴量の範囲を決定する
    ことを特徴とする請求項16に記載の例外行動判別方法。
    17.  The method for determining exceptional behavior according to claim 16, wherein the exceptional behavior classification step includes:
    calculating statistics of the statistical distribution of behavioral features related to a specific behavior from video images showing an image object taking the specific behavior; and
    determining, using the calculated statistics, the range of behavioral features of exceptional behaviors similar to the specific behavior.
  18.  前記例外行動分類ステップは、
     特定行動をとる画像オブジェクトを映したビデオ映像を用いて機械学習した機械学習モデルを用いて、特定行動と例外行動との類似度を算出する
    ことを特徴とする請求項16に記載の例外行動判別方法。
    18.  The method for determining exceptional behavior according to claim 16, wherein the exceptional behavior classification step calculates a degree of similarity between the specific behavior and the exceptional behavior using a machine learning model machine-learned using video images showing an image object taking the specific behavior.
  19.  ビデオ映像に映った画像オブジェクトの例外行動をコンピューターに判別させる例外行動判別プログラムであって、
     ビデオ映像のフレーム毎に、画像オブジェクトの特徴点を抽出する特徴点抽出ステップと、
     機械学習モデルを用いて、特徴点の抽出結果から行動特徴量を抽出する行動特徴量抽出ステップと、
     抽出した行動特徴量に係る行動について、例外行動の判別対象としたビデオ映像の撮影現場において統計的に発生頻度が基準よりも低ければ、例外行動と判別する行動判別ステップと、をコンピューターに実行させ、
     前記機械学習モデルは、学習用ビデオ映像のフレーム毎に抽出された、画像オブジェクトの特徴点の抽出結果を用いて、機械学習済みであり、
     前記学習用ビデオ映像は、当該学習用ビデオ映像に映っている画像オブジェクトの行動が例外行動か否かに関する情報を含んでいない
    ことを特徴とする例外行動判別プログラム。
    An exceptional behavior discrimination program that causes a computer to discriminate exceptional behavior of an image object shown in a video image,
    a feature point extraction step of extracting feature points of the image object for each frame of the video image;
    a behavioral feature extraction step of extracting behavioral features from the feature point extraction results using a machine learning model;
    a behavior determination step of determining the behavior related to the extracted behavioral feature to be an exceptional behavior if its frequency of occurrence at the filming site of the video footage targeted for exceptional behavior determination is statistically lower than a standard, the program causing the computer to execute these steps,
    The machine learning model has been machine learned using the extraction results of feature points of image objects extracted for each frame of the learning video image,
    The learning video image does not include information regarding whether the behavior of the image object shown in the learning video image is an exceptional behavior or not.
  20.  ビデオ映像に映った画像オブジェクトの例外行動を判別する例外行動判別装置であって、
     ビデオ映像のフレーム毎に、画像オブジェクトの特徴点を抽出する特徴点抽出手段と、
     機械学習モデルを用いて、特徴点の抽出結果から行動特徴量を抽出する行動特徴量抽出手段と、
     抽出した行動特徴量に係る行動について、例外行動の判別対象としたビデオ映像の撮影現場において統計的に発生頻度が基準よりも低ければ、例外行動と判別する行動判別手段と、を備え、
     前記機械学習モデルは、学習用ビデオ映像のフレーム毎に抽出された、画像オブジェクトの特徴点の抽出結果を用いて、機械学習済みであり、
     前記学習用ビデオ映像は、当該学習用ビデオ映像に映っている画像オブジェクトの行動が例外行動か否かに関する情報を含んでいない
    ことを特徴とする例外行動判別装置。
    An exceptional behavior discrimination device for determining exceptional behavior of an image object shown in a video image,
    Feature point extraction means for extracting feature points of an image object for each frame of a video image;
    a behavioral feature extracting means for extracting a behavioral feature from the feature point extraction results using a machine learning model;
    Behavior determination means for determining the behavior related to the extracted behavioral feature amount as an exceptional behavior if the frequency of occurrence is statistically lower than the standard at the filming site of the video footage targeted for determination of the exceptional behavior;
    The machine learning model has been machine learned using the extraction results of feature points of image objects extracted for each frame of the learning video image,
    An exceptional behavior determination device characterized in that the learning video image does not include information regarding whether or not the behavior of the image object shown in the learning video image is an exceptional behavior.
PCT/JP2023/020081 2022-06-13 2023-05-30 Unusual behavior discrimination method, unusual behavior discrimination program, and unusual behavior discrimination device WO2023243398A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-095012 2022-06-13
JP2022095012 2022-06-13

Publications (1)

Publication Number Publication Date
WO2023243398A1 true WO2023243398A1 (en) 2023-12-21

Family

ID=89190980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/020081 WO2023243398A1 (en) 2022-06-13 2023-05-30 Unusual behavior discrimination method, unusual behavior discrimination program, and unusual behavior discrimination device

Country Status (1)

Country Link
WO (1) WO2023243398A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011055204A (en) * 2009-09-01 2011-03-17 National Institute Of Advanced Industrial Science & Technology Compression method and compression apparatus of moving picture
JP2011130203A (en) * 2009-12-17 2011-06-30 Canon Inc Video information processing method and apparatus therefor
JP2019144830A (en) * 2018-02-20 2019-08-29 Kddi株式会社 Program, device, and method for recognizing actions of persons using a plurality of recognition engines
CN113705534A (en) * 2021-09-17 2021-11-26 平安医疗健康管理股份有限公司 Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium based on deep vision

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011055204A (en) * 2009-09-01 2011-03-17 National Institute Of Advanced Industrial Science & Technology Compression method and compression apparatus of moving picture
JP2011130203A (en) * 2009-12-17 2011-06-30 Canon Inc Video information processing method and apparatus therefor
JP2019144830A (en) * 2018-02-20 2019-08-29 Kddi株式会社 Program, device, and method for recognizing actions of persons using a plurality of recognition engines
CN113705534A (en) * 2021-09-17 2021-11-26 平安医疗健康管理股份有限公司 Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium based on deep vision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GIBO, TATSUYA ET AL.: "Anomaly Detection Focused on Expansive Multi-Person Behavior Patterns", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, vol. J96-D, no. 11, 2013, pages 2765 - 2775 *
LI MENG; CHEN TAO; DU HAO: "Human Behavior Recognition Using Range-Velocity-Time Points", IEEE ACCESS, IEEE, USA, vol. 8, 21 February 2020 (2020-02-21), USA , pages 37914 - 37925, XP011776047, DOI: 10.1109/ACCESS.2020.2975676 *
TATSUYA GIBO, ERI KUZUMOTO, SHIGEKI AOKI, TAKAO MIYAMOTO, MICHIFUMI YOSHIOKA: "Detection of Abnormality Based on Comprehensive Behavioral Patterns", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, DENSHI JOUHOU TSUUSHIN GAKKAI, JOUHOU SHISUTEMU SOSAIETI, JP, vol. J96-D, no. 11, 1 January 2013 (2013-01-01), JP , pages 2765 - 2775, XP009551204, ISSN: 1880-4535 *

Similar Documents

Publication Publication Date Title
Chen et al. Attention-based context aggregation network for monocular depth estimation
Shami et al. People counting in dense crowd images using sparse head detections
Liu et al. Counting objects by blockwise classification
WO2021073311A1 (en) Image recognition method and apparatus, computer-readable storage medium and chip
Kumaran et al. Recognition of human actions using CNN-GWO: a novel modeling of CNN for enhancement of classification performance
Parashar et al. Deep learning pipelines for recognition of gait biometrics with covariates: a comprehensive review
Zhao et al. Robust unsupervised motion pattern inference from video and applications
Sabokrou et al. Fast and accurate detection and localization of abnormal behavior in crowded scenes
Luo et al. Traffic analytics with low-frame-rate videos
Berlin et al. Spiking neural network based on joint entropy of optical flow features for human action recognition
Dharejo et al. FuzzyAct: A fuzzy-based framework for temporal activity recognition in IoT applications using RNN and 3D-DWT
Hu et al. Video anomaly detection based on 3D convolutional auto-encoder
Mu et al. Abnormal human behavior detection in videos: A review
Lalit et al. Crowd abnormality detection in video sequences using supervised convolutional neural network
Magdy et al. Violence 4D: Violence detection in surveillance using 4D convolutional neural networks
Gorodnichev et al. Research and Development of a System for Determining Abnormal Human Behavior by Video Image Based on Deepstream Technology
Alsaggaf et al. A smart surveillance system for uncooperative gait recognition using cycle consistent generative adversarial networks (CCGANs)
Mademlis et al. Exploiting stereoscopic disparity for augmenting human activity recognition performance
Bux Vision-based human action recognition using machine learning techniques
WO2023243398A1 (en) Unusual behavior discrimination method, unusual behavior discrimination program, and unusual behavior discrimination device
Saif et al. Aggressive action estimation: a comprehensive review on neural network based human segmentation and action recognition
Itano et al. Human actions recognition in video scenes from multiple camera viewpoints
Singh et al. Conditional autoregressive-tunicate swarm algorithm based generative adversarial network for violent crowd behavior recognition
WO2024014199A1 (en) Image identification method, image identification program, and image identification device
Vaquette et al. Robust information fusion in the DOHT paradigm for real-time action detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23823692

Country of ref document: EP

Kind code of ref document: A1