WO2021238185A1 - Object detection method and apparatus, electronic device, storage medium and program - Google Patents


Info

Publication number
WO2021238185A1
WO2021238185A1 (application PCT/CN2020/137919; CN2020137919W)
Authority
WO
WIPO (PCT)
Prior art keywords
detected
image
cabin
feature
feature map
Prior art date
Application number
PCT/CN2020/137919
Other languages
French (fr)
Chinese (zh)
Inventor
张澳
杜天元
王飞
钱晨
Original Assignee
深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Priority to JP2021558015A (patent JP7224489B2)
Priority to KR1020217034510A (patent KR20210149088A)
Publication of WO2021238185A1

Classifications

    • G06V 20/59 — Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06F 18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06T 7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/761 — Image or video pattern matching; proximity, similarity or dissimilarity measures
    • G06V 10/764 — Image or video recognition or understanding using classification, e.g. of video objects
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30268 — Subject of image: vehicle interior
    • G06V 2201/07 — Target detection
    • Y02T 10/40 — Engine management systems (climate change mitigation technologies related to transportation)

Definitions

  • The present disclosure relates to the field of tracking technology, and in particular, but not exclusively, to object detection methods, apparatuses, electronic devices, storage media, and computer programs.
  • the embodiments of the present disclosure provide at least one object detection solution.
  • the embodiments of the present disclosure provide an object detection method, including:
  • The sending of a prompt message in response to the object to be detected being present in the cabin image for longer than a preset duration includes: when the person who has left the cabin is a passenger and the object to be detected has been present in the cabin image for longer than a first preset duration, issuing a first prompt message used to remind the passenger that an item has been left behind; and when the person who has left the cabin is the driver and the object to be detected has been present in the cabin image for longer than a second preset duration, issuing a second prompt message used to remind the driver that an item has been left behind.
  • The person who has left the cabin is the driver and/or a passenger.
  • The object detection method further includes: determining the attribution of the object to be detected, where the attribution of the object to be detected is the driver and/or a passenger.
  • The performing of target detection on the cabin image includes: performing feature extraction on the cabin image to obtain a first feature map corresponding to each of multiple channels; and, for each channel, fusing the feature information of the first feature map corresponding to that channel with the first feature maps corresponding to the other channels to obtain a fused second feature map, which includes: determining the weight matrix corresponding to the multiple first feature maps to be fused, and performing a weighted summation of the feature information of the multiple first feature maps based on the weight matrix, to obtain second feature maps containing the fused feature information.
  • The detecting of the object to be detected in the cabin image based on the fused second feature maps includes: determining a set number of candidate areas, each containing a set number of feature points; determining, based on the feature data of the feature points contained in each candidate area, a confidence level for that candidate area, where the confidence level characterizes the credibility that the candidate area contains the object to be detected; and, based on the confidence levels and the overlap between different candidate areas, screening out from the set number of candidate areas the detection area corresponding to the object to be detected, where the detection area identifies the position of the object to be detected in the cabin image.
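  • The confidence-and-overlap screening described above is analogous to standard non-maximum suppression. A minimal sketch follows; all function names and threshold values are illustrative assumptions, not taken from the disclosure:

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) axis-aligned boxes; returns intersection-over-union
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def screen_detections(candidates, conf_thresh=0.5, iou_thresh=0.45):
    """candidates: list of (box, confidence) pairs.
    Keeps high-confidence candidate areas, then greedily suppresses
    candidates that overlap an already-kept detection area too much."""
    boxes = sorted((c for c in candidates if c[1] >= conf_thresh),
                   key=lambda c: c[1], reverse=True)
    kept = []
    for box, conf in boxes:
        if all(iou(box, k) < iou_thresh for (k, _) in kept):
            kept.append((box, conf))
    return kept
```

For example, of two heavily overlapping candidates, only the higher-confidence one survives as the detection area.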
  • The acquiring of the cabin image to be detected includes: extracting cabin images to be detected from the cabin video stream at intervals.
  • The performing of target detection on the cabin image further includes: taking each cabin image in the cabin video stream as an image to be tracked, and, for each non-first frame among the images to be tracked, determining the predicted position information of the object to be detected in that frame based on the position information of the object to be detected in the previous frame.
  • The target detection on the cabin image is performed by a neural network, and the neural network is trained using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain it.
  • the embodiment of the present disclosure also provides an object detection device, which includes:
  • The image acquisition module is configured to acquire a cabin image to be detected; the image detection module is configured to perform target detection on the cabin image when the number of people in the cabin decreases, and to determine whether there is an object to be detected in the cabin image; and the prompt module is configured to send a prompt message in response to the object to be detected being present in the cabin image for longer than a preset duration.
  • The prompt module being configured to send a prompt message in response to the object to be detected being present in the cabin image for longer than a preset duration includes: when the person who has left the cabin is a passenger and the object to be detected has been present in the cabin image for longer than a first preset duration, issuing a first prompt message used to remind the passenger that an item has been left behind; and when the person who has left the cabin is the driver and the object to be detected has been present in the cabin image for longer than a second preset duration, issuing a second prompt message used to remind the driver that an item has been left behind.
  • The person who has left the cabin is the driver and/or a passenger.
  • Before the prompt module sends the prompt message, the image detection module is further configured to determine the attribution of the object to be detected, where the attribution of the object to be detected is the driver and/or a passenger.
  • The image detection module being configured to perform target detection on the cabin image includes: detecting the object to be detected in the cabin image.
  • The image detection module being configured, for each channel, to fuse the feature information of the first feature map corresponding to that channel with the first feature maps corresponding to the other channels to obtain the fused second feature map includes: determining the weight matrix corresponding to the multiple first feature maps to be fused, and performing a weighted summation of the feature information of the multiple first feature maps based on the weight matrix, to obtain second feature maps containing the fused feature information.
  • The image detection module being configured to detect the object to be detected in the cabin image based on the fused second feature maps includes: determining a set number of candidate areas; determining, based on the feature data of the feature points contained in each candidate area, a confidence level for that candidate area, where the confidence level characterizes the credibility that the candidate area contains the object to be detected; and, based on the confidence levels and the overlap between different candidate areas, screening out from the set number of candidate areas the detection area corresponding to the object to be detected, where the detection area identifies the position of the object to be detected in the cabin image.
  • The image acquisition module being configured to acquire the cabin image to be detected includes: extracting cabin images to be detected from the cabin video stream at intervals.
  • The image detection module being configured to perform target detection on the cabin image further includes: taking each cabin image in the cabin video stream as an image to be tracked, and, for each non-first frame among the images to be tracked, determining the predicted position information of the object to be detected in that frame based on the position information of the object to be detected in the previous frame and on the frame itself; if the non-first frame is a cabin image in which the object to be detected is detected, using the detected position information as the position information of the object to be detected in that frame; otherwise, using the determined predicted position information as the position information of the object to be detected in that frame.
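  • The detect-else-predict fallback described above can be sketched as follows; the constant-velocity predictor and all names are illustrative assumptions, not taken from the disclosure:

```python
def predict_position(prev_pos, prev_velocity):
    # simple constant-velocity prediction of the object's position
    return tuple(p + v for p, v in zip(prev_pos, prev_velocity))

def track(frames_detections, first_pos):
    """frames_detections: for each non-first frame, the detected position
    of the object, or None when detection failed in that frame.
    Returns the position used for the object in every frame."""
    positions = [first_pos]
    velocity = (0.0, 0.0)
    for det in frames_detections:
        if det is not None:
            pos = det                                   # detection succeeded: use it
        else:
            pos = predict_position(positions[-1], velocity)  # fall back to prediction
        velocity = tuple(c - p for c, p in zip(pos, positions[-1]))
        positions.append(pos)
    return positions
```

A real tracker would typically use a Kalman filter or learned motion model; the constant-velocity step is only the simplest stand-in for "determining the predicted position from the previous frame".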
  • The target detection on the cabin image is performed by a neural network, and the neural network is trained using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain it.
  • the embodiments of the present disclosure also provide an electronic device, which includes a processor, a memory, and a bus.
  • the memory stores machine-readable instructions executable by the processor.
  • The processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, any one of the above object detection methods is performed.
  • the embodiment of the present disclosure also provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is run by a processor, any one of the aforementioned object detection methods is executed.
  • The embodiments of the present disclosure also provide a computer program, which includes computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes instructions for implementing any one of the above object detection methods.
  • A method for detecting left-behind objects in a cabin scene is provided. The acquired cabin image is subjected to target detection, so that it can be determined whether there is an object to be detected in the cabin image. The object to be detected may be an item left behind by a person in the cabin; when such an item remains in the vehicle, a corresponding prompt can be issued, thereby reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.
  • FIG. 1 is a schematic flowchart of an object detection method provided by an embodiment of the disclosure
  • FIG. 2 is a flowchart of a method for detecting an object to be detected according to an embodiment of the disclosure
  • FIG. 3 is a flowchart of a method for determining the detection area of the object to be detected in the image in the cabin to be detected according to an embodiment of the disclosure
  • FIG. 4 is a flowchart of a method for tracking an object to be detected according to an embodiment of the disclosure
  • FIG. 5 is a flowchart of a method for training a target detection network in a neural network provided by an embodiment of the disclosure
  • FIG. 6 is a flowchart of another method for training a target tracking network in a neural network provided by an embodiment of the disclosure
  • FIG. 7 is a schematic structural diagram of an object detection device provided by an embodiment of the disclosure.
  • FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the disclosure.
  • the embodiments of the present disclosure provide a method for detecting leftover items in a cabin scene.
  • Target detection can be performed on the acquired cabin image to be detected, so that it can be determined whether there is an object to be detected in the cabin image. The object to be detected may be an item left behind by a person in the cabin; when a person leaves items behind, a corresponding prompt can be given, thereby reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.
  • the equipment includes, for example, a terminal device or a server or other processing equipment.
  • the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, and the like.
  • the object detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a flowchart of an object detection method according to an embodiment of the present disclosure, which includes the following S101 to S103:
  • S101 Acquire an image in the cabin to be detected.
  • The cabin can be a cabin of a passenger transportation vehicle, such as a taxi cabin, a train cabin, or an airplane cabin.
  • The cabin image to be detected can be captured by an image acquisition device installed at a fixed position in the cabin.
  • S102 When the number of people in the cabin is reduced, target detection is performed on the image in the cabin to be detected, and it is determined whether there is an object to be detected in the image in the cabin to be detected.
  • Object detection is performed on the cabin image, for example to detect whether items have been left in the cabin by the person who has left it.
  • the target detection of the image in the cabin to be detected may be used to detect preset items that are easily lost by passengers or drivers, such as mobile phones, wallets, handbags, suitcases and the like.
  • the embodiments of the present disclosure provide a method for detecting leftover items in a cabin scene.
  • Target detection can be performed on the acquired cabin image to determine whether there is an object to be detected in the cabin image.
  • the object to be detected may be an item lost by a person in the cabin.
  • Corresponding prompts can then be made, to reduce the probability of items being lost in the riding environment and improve the safety of items in the riding environment.
  • Sending the prompt message in response to the object to be detected being present in the cabin image for longer than the preset duration may include:
  • When the person who has left the cabin is a passenger and the object to be detected has been present in the cabin image for longer than the first preset duration, a first prompt message is issued to remind the passenger that an item has been left behind; when the person who has left the cabin is the driver and the object to be detected has been present in the cabin image for longer than the second preset duration, a second prompt message is issued to remind the driver that an item has been left behind.
  • The first preset duration and the second preset duration may be the same or different. Considering that the driver may leave the cabin only briefly, the second preset duration may be greater than the first preset duration.
  • Both the first prompt message and the second prompt message may be broadcast by voice, where the first prompt message is used to prompt the passenger and the second prompt message is used to prompt the driver.
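  • The two-threshold prompting policy can be sketched as follows; the concrete durations and message strings are illustrative assumptions (the disclosure only notes that the second preset duration may exceed the first, since the driver may step out briefly):

```python
FIRST_PRESET = 30    # seconds: passenger has left (illustrative value)
SECOND_PRESET = 120  # seconds: driver has left; longer, to tolerate brief absences

def prompt_for(left_role, object_present_duration):
    """left_role: 'passenger' or 'driver' (who has left the cabin).
    object_present_duration: seconds the detected object has remained in view.
    Returns the prompt message to issue, or None if no prompt is due yet."""
    if left_role == "passenger" and object_present_duration > FIRST_PRESET:
        return "first prompt: passenger item left behind"
    if left_role == "driver" and object_present_duration > SECOND_PRESET:
        return "second prompt: driver item left behind"
    return None
```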
  • When the person who has left the cabin is the driver and/or a passenger, after it is determined that there is an object to be detected in the cabin image and before the prompt message is issued, the object detection method further includes: determining the attribution of the object to be detected, where the attribution of the object to be detected is the driver and/or a passenger.
  • The position of each person in the cabin and the position of each object to be detected in the cabin can be determined, so that an association between the object to be detected and a position, and an association between the position and a person in the cabin, can be established. The person to whom the object to be detected belongs can then be determined according to the position of the object to be detected in the cabin, and a corresponding prompt message is issued according to this attribution.
  • In this way, the attribution of the object to be detected can be determined based on its position in the cabin, which facilitates issuing targeted prompts later.
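  • One possible realization of the position-based attribution above maps the detected object's box centre to predefined seat regions and looks up the seat's occupant; the seat layout, coordinates, and all names here are hypothetical, not taken from the disclosure:

```python
# hypothetical seat regions in image coordinates: name -> (x1, y1, x2, y2)
SEAT_REGIONS = {
    "driver": (0, 0, 320, 360),
    "front_passenger": (320, 0, 640, 360),
    "rear": (0, 360, 640, 720),
}

def attribute_object(box, seat_occupants):
    """box: detection area (x1, y1, x2, y2) of the left-behind object;
    seat_occupants: seat name -> person id for people who were in the cabin.
    Returns the presumed owner, or None if the box falls outside all regions."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    for seat, (x1, y1, x2, y2) in SEAT_REGIONS.items():
        if x1 <= cx < x2 and y1 <= cy < y2:
            return seat_occupants.get(seat)
    return None
```

This reflects the association chain described above: object position → seat position → person.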
  • S201 Perform feature extraction on the cabin image to be detected, to obtain a first feature map corresponding to each of multiple channels, where the first feature map corresponding to each channel is a feature map in which the features of the object to be detected under the image feature category corresponding to that channel have been enhanced.
  • The feature extraction of the cabin image can be performed through a pre-trained feature extraction network, to obtain the first feature maps corresponding to multiple preset channels. Each channel can be understood as corresponding to one image feature category of the cabin image. For example, after feature extraction of the cabin image, first feature maps corresponding to three channels may be obtained: the first channel may correspond to one feature category of the cabin image, the second channel may correspond to its color features, and the third channel may correspond to its size features, so that feature maps of the cabin image under each image feature category are obtained.
  • When feature extraction is performed on the cabin image to obtain the first feature maps, the feature information representing the object to be detected in the first feature map corresponding to each channel is distinguished from the feature information representing the cabin background. The feature information representing the object to be detected can be enhanced and the feature information representing the cabin background weakened; alternatively, only the feature information of the object to be detected can be enhanced, or only the feature information representing the cabin background weakened, so that in each resulting first feature map the strength of the feature information representing the object to be detected is greater than the strength of the feature information representing the cabin background.
  • S202 For each channel, perform feature information fusion on the first feature map corresponding to the channel and the first feature maps respectively corresponding to other channels to obtain a fused second feature map.
  • Since each channel tends to represent the feature information of the cabin image under that channel's image feature category, in order to obtain feature maps with more complete feature information, the first feature map corresponding to each channel is fused with the first feature maps corresponding to the other channels, so that a second feature map containing multiple image feature categories can be obtained.
  • The feature information in the first feature map corresponding to each channel can be represented by the feature data in that first feature map. Feature information fusion refers to fusing the feature data in each first feature map to obtain the fused second feature map.
  • S203 Detect the object to be detected in the image of the cabin to be detected based on the second feature map after the fusion.
  • The process of detecting the object to be detected in the cabin image may be based on the target detection network in a pre-trained neural network; that is, the fused second feature maps are input to the target detection network, which completes the detection of the object to be detected in the cabin image.
  • Detecting the object to be detected in the cabin image may refer to detecting whether the object to be detected is present in the cabin image and, when it is determined to be present, determining its position information in the cabin image.
  • Because the first feature map obtained by feature extraction is a feature map in which the features of the object to be detected under the channel's image feature category have been enhanced, the feature information of the object to be detected contained in each first feature map is enhanced relative to the feature information of everything else, so the object to be detected can be clearly distinguished from the background area of the cabin image through this feature information. Then, for each channel, the first feature map corresponding to that channel is fused with the first feature maps corresponding to the other channels, yielding second feature maps with more comprehensive feature information of the object to be detected; completing the detection based on these second feature maps therefore allows the object to be detected in the cabin image to be detected accurately.
  • After feature extraction of the cabin image, first feature maps with a total size of h*w*c are obtained, where c represents the number of first feature maps, i.e. the number of channels; h*w represents the size of each first feature map; and each first feature map contains feature data corresponding to h*w feature points.
  • The total size of the fused second feature maps is also h*w*c; that is, each channel corresponds to one second feature map, and each second feature map has size h*w. The feature data corresponding to any feature point in a second feature map is obtained by fusing the feature data at the same position in the first feature maps corresponding to each channel.
  • The weight matrix here contains the weight vectors corresponding to the c channels, and the weight values in the weight vector corresponding to each channel represent the weights of the feature data of each first feature map when determining the second feature map corresponding to that channel.
  • When c is equal to 3, after feature extraction of the cabin image, first feature maps corresponding to 3 channels are obtained, i.e. 3 first feature maps. Each first feature map contains feature data corresponding to h*w feature points; these h*w feature data can constitute an h*w-dimensional feature vector, each element of which is the feature data of one feature point in the first feature map.
  • For each channel, the weight vector corresponding to that channel can be used to perform a weighted summation on the feature data of the first feature maps, to obtain the feature data of the second feature map corresponding to that channel.
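  • The per-channel weighted summation can be sketched with NumPy. In this sketch (names and shapes are illustrative assumptions), `first_maps` has shape (c, h, w), and row i of the c×c `weights` matrix holds the weights applied to each first feature map when forming the second feature map of channel i:

```python
import numpy as np

def fuse_feature_maps(first_maps, weights):
    """first_maps: (c, h, w) array of first feature maps.
    weights: (c, c) weight matrix; weights[i, j] is the weight of the
    j-th first feature map when forming the second feature map of channel i.
    Returns the (c, h, w) fused second feature maps:
    out[i] = sum_j weights[i, j] * first_maps[j]."""
    return np.einsum("ij,jhw->ihw", weights, first_maps)
```

With the identity matrix as weights, each second feature map equals the corresponding first feature map; non-trivial rows mix feature information across channels, as described above.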
  • In the embodiments of the present disclosure, by enriching the feature information of the object to be detected and increasing the degree of discrimination between the object to be detected and the background area of the cabin image, it becomes convenient later to accurately determine, based on feature information that is richer and more distinguishable from the background, whether there is an object to be detected in the cabin image and what its position information is.
  • Assume that c equals 3 and each first feature map contains h*w feature data. The feature matrix formed by the feature vectors corresponding to the first feature maps is as follows:
  • (a_1, a_2, …, a_{h*w})^T represents the feature vector of the first feature map corresponding to the first channel, where a_1 is the feature data of the first feature point, a_2 the feature data of the second feature point, and a_{h*w} the feature data of the (h*w)-th feature point in that first feature map.
  • (b_1, b_2, …, b_{h*w})^T represents the feature vector of the first feature map corresponding to the second channel, where b_1, b_2, and b_{h*w} are the feature data of the first, second, and (h*w)-th feature points of that first feature map, respectively.
  • (d_1, d_2, …, d_{h*w})^T represents the feature vector of the first feature map corresponding to the third channel, where d_1, d_2, and d_{h*w} are the feature data of the first, second, and (h*w)-th feature points of that first feature map, respectively.
  • (m_1, m_2, m_3)^T represents the weight vector used when determining the second feature map corresponding to the first channel: m_1 is the weight of the feature data of the first feature map corresponding to the first channel, m_2 the weight of the feature data of the first feature map corresponding to the second channel, and m_3 the weight of the feature data of the first feature map corresponding to the third channel.
  • (k_1, k_2, k_3)^T represents the weight vector used when determining the second feature map corresponding to the second channel: k_1, k_2, and k_3 are the weights of the feature data of the first feature maps corresponding to the first, second, and third channels, respectively.
  • (l_1, l_2, l_3)^T represents the weight vector used when determining the second feature map corresponding to the third channel: l_1, l_2, and l_3 are the weights of the feature data of the first feature maps corresponding to the first, second, and third channels, respectively.
  • The second feature map corresponding to the first channel can be determined according to the following formula (1):
  • T_1 = (a_1 a_2 … a_{h*w})^T · m_1 + (b_1 b_2 … b_{h*w})^T · m_2 + (d_1 d_2 … d_{h*w})^T · m_3    (1)
  • Accordingly, the feature data of the first feature point in the second feature map corresponding to the first channel is a_1·m_1 + b_1·m_2 + d_1·m_3; the feature data of the second feature point is a_2·m_1 + b_2·m_2 + d_2·m_3; and the feature data of the (h*w)-th feature point is a_{h*w}·m_1 + b_{h*w}·m_2 + d_{h*w}·m_3.
  • the second feature map corresponding to the second channel and the second feature map corresponding to the third channel can be determined in the same manner.
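As a rough sketch (not the patented implementation), formula (1) above amounts to flattening each channel's first feature map into a vector and mixing the channels with a weight matrix whose rows are the (m_1 m_2 m_3), (k_1 k_2 k_3) and (l_1 l_2 l_3) weight vectors. The shapes and weight values below are illustrative assumptions:

```python
import numpy as np

def fuse_feature_maps(first_maps, weights):
    """Fuse per-channel first feature maps into second feature maps.

    first_maps: array of shape (C, H, W), one first feature map per channel.
    weights: array of shape (C, C); weights[i][j] is the weight applied to
             channel j's first feature map when building channel i's second
             feature map (the m, k, l vectors in the text).
    Returns an array of shape (C, H, W) of fused second feature maps.
    """
    C, H, W = first_maps.shape
    flat = first_maps.reshape(C, H * W)   # each row is (a_1 ... a_{h*w}) etc.
    fused = weights @ flat                # formula (1), applied per channel
    return fused.reshape(C, H, W)

# Toy example: 3 channels of 2x2 feature maps, illustrative weights
maps = np.arange(12, dtype=float).reshape(3, 2, 2)
W = np.array([[0.5, 0.3, 0.2],   # (m_1 m_2 m_3)
              [0.2, 0.6, 0.2],   # (k_1 k_2 k_3)
              [0.1, 0.3, 0.6]])  # (l_1 l_2 l_3)
second = fuse_feature_maps(maps, W)
```

Each output feature point is the weighted sum of the corresponding feature points across the three first feature maps, matching the per-point expressions above.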
  • By determining the weight matrix corresponding to the first feature maps, the fused second feature map corresponding to each channel is obtained, so that each second feature map fuses the features under the image feature categories of multiple channels.
  • If the cabin image to be detected contains the object to be detected, the fused second feature map can therefore contain richer feature information of the object to be detected; and because the features of the object to be detected have already been enhanced in the first feature maps, the feature information of the object to be detected in the fused second feature map is also more distinguishable from the feature information of the background area. This makes it easier, at a later stage, to accurately determine whether an object to be detected exists in the cabin image to be detected, and its position information, based on feature information that is richer and more distinguishable from the background area.
  • the object to be detected in the image of the cabin to be detected can be detected according to the fused second feature map.
  • the following steps S301 to S303 may be included:
  • S301 Determine a set number of candidate regions based on the fused second feature map, and each candidate region contains a set number of feature points.
  • the candidate area here refers to the area that may contain the object to be detected.
  • The set number of candidate areas and the set number of feature points contained in each candidate area can be determined by the candidate area extraction network in the pre-trained neural network.
  • The set number of candidate regions is determined in advance based on the test accuracy of the target detection network. For example, during network training, the number of candidate regions is continuously adjusted for the fused second sample feature maps corresponding to a large number of sample images to be detected; then, during testing, the trained target detection network is evaluated, and the set number of candidate regions is determined from the test accuracy achieved with different numbers of candidate regions.
  • The set number of feature points contained in each candidate area can be determined in advance by jointly considering the test speed and test accuracy of the target detection network. For example, during network training, the number of candidate areas is first kept unchanged while the number of feature points contained in each candidate area is continuously adjusted; then, during testing, the target detection network is evaluated, and the set number of feature points per candidate area is determined by weighing test speed against test accuracy.
  • Each feature point contained in a candidate area corresponds to feature data. From this feature data, the credibility that the candidate area contains the object to be detected can be determined. For example, the confidence level corresponding to each candidate area can be determined by the target detection network in the pre-trained neural network: the feature data in the candidate area is input into the target detection network, which outputs the confidence level corresponding to that candidate area.
  • When the detection area corresponding to the object to be detected is selected from the set number of candidate areas based on the confidence level corresponding to each candidate area and the overlap area between different candidate areas, a set number of target candidate areas ranked highest by confidence can first be screened out; the detection area corresponding to the object to be detected can then be determined based on a preset confidence threshold and the overlap area between different candidate areas.
  • A target candidate area whose confidence is higher than the confidence threshold is more likely to be the detection area corresponding to the object to be detected. Considering that candidate areas may overlap one another, if the overlap area between overlapping candidate areas is greater than a set area threshold, the objects contained in those overlapping candidate areas are likely to be the same object to be detected.
  • On this basis, the detection area corresponding to the object to be detected is further selected from the target candidate areas.
  • Specifically, among the target candidate areas, those with confidence higher than the confidence threshold are retained, and among overlapping target candidate areas only the one with the highest confidence is retained; the result is the detection area corresponding to the object to be detected.
  • The set number used above when screening the top-confidence target candidate areas from the candidate areas can likewise be determined in advance by jointly considering the test speed and test accuracy of the target detection network. For example, during network training the number of target candidate areas is continuously adjusted, and then during testing the target detection network is evaluated, with test speed and test accuracy weighed together to determine this set number.
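The screening procedure described above resembles confidence-thresholded non-maximum suppression. The sketch below is a hedged illustration under assumed specifics not fixed by the source: boxes as (x1, y1, x2, y2), intersection-over-union as the overlap measure, and arbitrary threshold values.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def screen_candidates(boxes, scores, top_n=100, conf_thresh=0.5, overlap_thresh=0.5):
    # Step 1: keep the top_n candidate areas ranked by confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)[:top_n]
    kept = []
    for i in order:
        # Step 2: drop target candidates below the confidence threshold.
        if scores[i] < conf_thresh:
            continue
        # Step 3: among heavily overlapping candidates (likely the same
        # object), keep only the highest-confidence one.
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

dets = screen_candidates(
    boxes=[(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)],
    scores=[0.9, 0.8, 0.7])
```

Here the second box overlaps the first heavily and is suppressed, while the distant third box survives as a separate detection area.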
  • If the confidence level corresponding to every candidate area is less than the set threshold, this can indicate that there is no object to be detected in the cabin image to be detected; this situation is not described in detail in the embodiments of the present disclosure.
  • In the above manner, the detection area of the object to be detected in the cabin image is obtained, that is, the position of the object to be detected in the cabin image. Because the candidate areas are determined from the fused second feature map, in which the feature information of the object to be detected is richer and more distinguishable from the feature information of the background area, the candidate areas representing possible positions of the object to be detected, and the confidence of each candidate area, can be obtained accurately.
  • On this basis, it is further proposed to screen the possible positions of the object to be detected by considering the overlap areas of the candidate areas, so that whether an object to be detected exists in the cabin image, and its position information, can be accurately obtained.
  • The object detection method proposed in the embodiments of the present disclosure needs, in many application scenarios, to continuously acquire cabin images to be detected and perform detection on them, for example when detecting left-behind objects in transportation scenarios.
  • For this purpose, an image acquisition component can be installed in the car; for example, a camera can be installed in the car and oriented to face the set position to be photographed. After the video stream in the cabin to be detected is acquired, the cabin images to be detected are extracted from it at intervals.
  • The video stream in the cabin to be detected may be a video stream captured by the image acquisition component at a set position in the car, and each second of captured video may contain multiple consecutive frames of cabin images. Given the short interval between two adjacent frames, the similarity between adjacent cabin images is relatively high.
  • Therefore, frames can be extracted from the cabin video stream at intervals to obtain the above-mentioned cabin images to be detected. For example, if the cabin video stream acquired over a certain period contains 1000 frames of images, extracting every other frame yields 500 cabin images to be detected, and detecting these 500 images accomplishes the purpose of detecting items left behind in the cabin.
  • In this manner, the cabin images that need to be detected are obtained from the cabin video stream by interval extraction, which can improve detection efficiency.
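The interval extraction described above (e.g. 1000 frames reduced to 500 by taking every other frame) can be sketched as follows; the frame representation is a placeholder, not the patent's data format:

```python
def extract_at_interval(frames, step=2):
    """Take every `step`-th frame from a cabin video stream, so that only
    a subset of the highly similar consecutive frames needs detection."""
    return frames[::step]

# 1000 frames extracted every other frame -> 500 cabin images to detect
frames = list(range(1000))
to_detect = extract_at_interval(frames, step=2)
```

The step size trades detection latency against compute: a larger step skips more near-duplicate frames.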
  • In addition, the position information of the object to be detected can also be tracked in each frame of the cabin image. As shown in Fig. 4, the method further includes the following S401 to S404:
  • S401: Use each cabin image in the cabin video stream to be detected as an image to be tracked, and for each non-first frame of the images to be tracked, determine the predicted position information of the object to be detected in that non-first frame based on the position information of the object to be detected in the previous frame and on the non-first frame itself.
  • Here, object detection is performed on the interval-extracted cabin images, and the position information of the object to be detected in each of those images is determined. For example, target detection is performed on the odd-numbered frames, such as the first, third, and fifth cabin images.
  • When the object to be detected is tracked in the second cabin image, its predicted position information in the second cabin image can be determined based on its position information in the first cabin image together with the second cabin image itself.
  • When tracking the object to be detected, a target tracking network in the pre-trained neural network can be used. For example, for the first and second frames of images to be tracked, the detection area of the object to be detected in the first frame (which has corresponding coordinate information), the feature data of the feature points contained in that detection area, and the second frame of the image to be tracked are input into the target tracking network. Based on the coordinate information of the detection area in the first frame, the network searches, within the local area of the second frame corresponding to that coordinate information, for a detection area whose feature-data similarity to the feature points of the original detection area exceeds a threshold.
  • If such an area is found, the second frame contains the object detected in the first frame, and the position information of that object in the second frame is obtained, completing the tracking of the object to be detected. If no such area is found, the second frame does not contain the object detected in the first frame, and it can be determined that the object has moved.
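A minimal sketch of the local-area search described above, assuming a grid of candidate areas, cosine similarity as the feature-similarity measure, and illustrative thresholds (the source does not fix these choices):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def track_in_local_area(prev_cell, prev_feat, next_feats,
                        search_radius=1, sim_thresh=0.8):
    """prev_cell: (row, col) grid location of the detection area in the
    previous frame; prev_feat: its feature vector; next_feats: mapping
    (row, col) -> feature vector for candidate areas in the next frame.
    Only cells within `search_radius` of prev_cell are searched (the
    "local area"). Returns the best match whose similarity exceeds
    sim_thresh, or None, meaning the object has moved elsewhere."""
    r0, c0 = prev_cell
    best, best_sim = None, sim_thresh
    for (r, c), feat in next_feats.items():
        if abs(r - r0) <= search_radius and abs(c - c0) <= search_radius:
            sim = cosine(prev_feat, feat)
            if sim > best_sim:
                best, best_sim = (r, c), sim
    return best

found = track_in_local_area((2, 2), [1.0, 0.0, 1.0],
                            {(2, 3): [1.0, 0.1, 1.0],
                             (9, 9): [1.0, 0.0, 1.0]})
```

Restricting the search to the local area around the previous detection is what keeps tracking cheaper than re-running detection over the whole frame.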
  • S402: Determine whether the non-first frame of the images to be tracked is a cabin image to be detected on which object detection was performed.
  • The purpose of determining whether a non-first frame to be tracked is a cabin image on which detection was performed is to decide whether the detected position information of the object should replace its predicted position information in that frame, so that the position of the object in the next frame can be tracked based on the corrected position information.
  • If detection was performed, the detected position information is used as the position information of the object to be detected in that non-first frame; that is, the predicted position information of the object in that frame is corrected. Subsequent tracking based on this corrected position information is therefore more accurate.
  • If detection was not performed, the position of the object in the next frame continues to be tracked based on its predicted position information in the current non-first frame. In this way, the position of the object in the cabin can be estimated at every moment, improving tracking efficiency.
  • In summary, for each non-first frame, the predicted position information of the object is determined based on its position information in the previous frame, and during the tracking process the predicted position information can be adjusted using detected position information whenever it is available. This improves both the efficiency and the accuracy of tracking the object to be detected.
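The predict-then-correct flow of S401 to S404 can be sketched as follows, with a toy one-dimensional position and a stand-in prediction function (both illustrative assumptions); detection results, when present for a frame, override the prediction:

```python
def track_sequence(frames, detections, predict):
    """frames: ordered list of frame ids. detections: dict frame_id ->
    detected position for the frames on which object detection was run
    (e.g. the interval-extracted frames). predict: function
    (prev_position, frame_id) -> predicted position.
    Returns the position used for each frame: the detected position when
    available (correcting the prediction), otherwise the prediction."""
    positions = {frames[0]: detections[frames[0]]}
    for prev, cur in zip(frames, frames[1:]):
        predicted = predict(positions[prev], cur)
        # S402-S404: if detection ran on this frame, its result replaces
        # the predicted position; otherwise the prediction is kept.
        positions[cur] = detections.get(cur, predicted)
    return positions

# Toy 1-D example: detection runs on odd frames only; the stand-in
# predictor assumes the position drifts by +1 per frame.
pos = track_sequence(
    frames=[1, 2, 3, 4],
    detections={1: 10, 3: 13},
    predict=lambda prev, frame: prev + 1)
```

At frame 3 the detection (13) corrects the drifting prediction (12), and frame 4's prediction then starts from the corrected value.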
  • The target detection of the cabin image to be detected proposed in the embodiments of the present disclosure is performed by a neural network, where the neural network is obtained through training using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain the sample object to be detected.
  • the network for target detection in the neural network can be obtained by training in the following manner, as shown in FIG. 5, which specifically includes S501 to S505:
  • S501 Acquire a sample image in the cabin to be detected.
  • The cabin sample images to be detected here include cabin sample images that contain the sample object to be detected, which can be recorded as positive sample images, and cabin sample images that do not contain the sample object to be detected, which can be recorded as negative sample images.
  • In the cabin sample images, left-behind objects may appear as color blocks of various shapes; for example, mobile phones and suitcases can be represented by rectangular color blocks, and a water cup by a cylindrical color block. In order to enable the neural network to better distinguish the objects to be detected from the in-car background, such as seats and windows, random color patches representing non-detection items can be added to the cabin sample images.
  • During training, the network then continuously learns to distinguish the real objects to be detected from the non-real random color patches and the in-car background, so as to obtain a neural network with higher accuracy.
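A hedged sketch of adding random color patches that represent non-detection items to a cabin sample image, as described above; the nested-list image representation, patch sizes and seeded randomness are illustrative assumptions, not the patented augmentation:

```python
import random

def add_distractor_patches(image, num_patches=3, rng=None):
    """image: H x W x 3 nested list of RGB pixels. Pastes a few random
    rectangular color blocks that are NOT labelled as objects to detect,
    so the network must learn to tell real left-behind items from
    arbitrary colored regions."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    h, w = len(image), len(image[0])
    for _ in range(num_patches):
        ph = rng.randint(1, max(1, h // 4))   # patch height
        pw = rng.randint(1, max(1, w // 4))   # patch width
        y = rng.randint(0, h - ph)
        x = rng.randint(0, w - pw)
        color = [rng.randint(1, 255) for _ in range(3)]  # non-black patch
        for r in range(y, y + ph):
            for c in range(x, x + pw):
                image[r][c] = list(color)
    return image

img = [[[0, 0, 0] for _ in range(16)] for _ in range(16)]
aug = add_distractor_patches(img, num_patches=2)
```

The patches are deliberately left unlabelled: they act as hard negatives against which the network learns the background/object distinction.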
  • S502: Perform feature extraction on the cabin sample image to be detected to obtain a first sample feature map corresponding to each of multiple channels, where the first sample feature map corresponding to each channel is a sample feature map in which the features of the sample object to be detected under the image feature category corresponding to that channel are enhanced.
  • Here, the process of performing feature extraction on the cabin sample image to obtain the first sample feature map for each of the multiple channels is similar to the process, described above, of performing feature extraction on the cabin image to be detected to obtain the first feature map for each channel, and will not be repeated here.
  • S503 For each channel, perform feature information fusion on the first sample feature map corresponding to the channel and the first sample feature maps corresponding to other channels to obtain a fused second sample feature map.
  • Here, the process of obtaining the fused second sample feature map is similar to the process, described above, of obtaining the fused second feature map based on the first feature maps, and will not be repeated here.
  • S504 Predict the sample object to be detected in the sample image in the cabin to be detected based on the fused second sample feature map.
  • Here, the process of predicting the sample object to be detected in the cabin sample image based on the fused second sample feature map is similar to the above-mentioned process of detecting the object to be detected in the cabin image based on the fused second feature map, and will not be repeated here.
  • S505: After the position information of the sample object in the cabin sample image to be detected is predicted, the prediction loss value of that position information is determined using the cabin sample images that contain the sample object and the cabin sample images that do not, and the network parameter values in the neural network are adjusted through the loss value.
  • For example, when the loss value is less than a set threshold, the training can be stopped, thereby obtaining the trained neural network.
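The training procedure above (adjust network parameters through the loss value, stop when the loss falls below a set threshold) can be sketched generically; the toy one-parameter model and learning rate below are illustrative assumptions, not the patented network:

```python
def train_until_converged(samples, loss_fn, step_fn, params,
                          loss_thresh=1e-3, max_iters=1000):
    """Generic sketch of the loop: compute the prediction loss over the
    sample images, adjust the parameters, and stop once the loss falls
    below the set threshold (or the iteration budget runs out)."""
    for _ in range(max_iters):
        if loss_fn(params, samples) < loss_thresh:
            break
        params = step_fn(params, samples)
    return params

# Toy stand-in: fit a scalar "position predictor" y = w * x by
# gradient descent on mean squared error.
samples = [(1.0, 2.0), (2.0, 4.0)]
loss = lambda w, s: sum((w * x - y) ** 2 for x, y in s) / len(s)
step = lambda w, s: w - 0.1 * sum(2 * x * (w * x - y) for x, y in s) / len(s)
w = train_until_converged(samples, loss, step, params=0.0)
```

The loop structure is the same whatever the model: only `loss_fn` and `step_fn` change when the parameters are a full neural network.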
  • In addition, the embodiments of the present disclosure also include a process of training the target tracking network in the neural network. The target tracking network is obtained through training using sample images to be tracked that contain the sample object to be detected and sample images to be tracked that do not contain it.
  • The sample object to be detected here may refer to the sample object that needs to be tracked; for example, it can be a passenger object in various in-car scenes.
  • the target tracking network in the neural network can be obtained by training in the following manner, as shown in FIG. 6, which specifically includes S601 to S603:
  • S601: Acquire a sample image to be tracked and the sample object information corresponding to the sample object to be detected.
  • The sample image to be tracked here may refer to a sample image in which the sample object to be detected needs to be tracked, and may include positive sample images that contain the sample object and negative sample images that do not.
  • During training, the detection area image of the sample object and the sample image to be tracked can be input into the neural network at the same time, where the detection area image contains the image content corresponding to the sample object to be detected.
  • The sample object information may include the detection area of the object to be detected and the feature data of the feature points contained in that detection area.
  • S602: Based on the sample object information and the sample image to be tracked, track the position of the sample object in the sample image to be tracked, and predict the position information of the sample object in the sample image to be tracked.
  • Considering that the positions of the same object in sample images continuously acquired of the same area are similar, a local area in the sample image to be tracked that is close to the detection area in the sample object information can first be determined; the sample object can then be detected within that local area based on the feature data, so as to predict the position information of the sample object in the sample image to be tracked.
  • S603: After the position information of the sample object in the sample image to be tracked is predicted, the loss value of that position information is determined using the sample images to be tracked that contain the sample object and those that do not. Over multiple rounds of training, the network parameter values in the neural network are adjusted through the loss value; for example, when the loss value is less than a set threshold, the training can be stopped, yielding the target tracking network of the neural network.
  • In the above manner, by acquiring the sample image to be tracked and the sample object information corresponding to the sample object, the local area in which to search is determined from the detection area in the sample object information, so that the position of the sample object in the sample image to be tracked can be determined quickly. The predicted position information, together with the positive and negative sample images to be tracked, is then used to adjust the network parameter values, yielding a neural network with higher accuracy, on the basis of which the object to be detected can be tracked accurately.
  • It should be noted that the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
  • the embodiment of the present disclosure also provides an object detection device corresponding to the object detection method. Since the principle of the device in the embodiment of the present disclosure to solve the problem is similar to the above-mentioned object detection method of the embodiment of the present disclosure, the implementation of the device You can refer to the implementation of the method, and the repetition will not be repeated here.
  • the object detection device 700 includes: an image acquisition module 701, an image detection module 702, and a prompt module 703.
  • the image acquisition module 701 is configured to acquire an image in the cabin to be detected
  • the image detection module 702 is configured to perform target detection on the image in the cabin to be detected when the number of people in the cabin is reduced, and determine whether there is an object to be detected in the image in the cabin to be detected;
  • the prompting module 703 is configured to send a prompt message in response to a state in which the object to be detected exists in the image of the cabin to be detected for a duration that exceeds a preset period of time.
  • In a possible implementation, the prompt module 703, configured to send the prompt information in response to the state in which the object to be detected exists in the cabin image lasting longer than the preset duration, is configured to: in the case that the reduced occupant in the cabin is a passenger, send first prompt information in response to the duration of the state exceeding a first preset duration, the first prompt information being used to remind the passenger that an item is left behind; and, in the case that the reduced occupant in the cabin is the driver, send second prompt information in response to the duration of the state exceeding a second preset duration, the second prompt information being used to remind the driver that an item is left behind.
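A minimal sketch of the prompting logic described above; the preset durations and returned strings are illustrative assumptions, not values from the source:

```python
def prompt_for_left_item(presence_duration, who_left,
                         first_preset=30.0, second_preset=10.0):
    """presence_duration: seconds the object to be detected has remained
    in the cabin image since the occupant count dropped. who_left:
    'passenger' or 'driver'. Returns the prompt to emit, or None if the
    corresponding preset duration has not yet been exceeded."""
    if who_left == 'passenger' and presence_duration > first_preset:
        return 'first prompt: passenger item left behind'
    if who_left == 'driver' and presence_duration > second_preset:
        return 'second prompt: driver item left behind'
    return None

msg = prompt_for_left_item(45.0, 'passenger')
```

Separate presets per occupant type allow, for instance, a shorter grace period for the driver, who is expected to leave the cabin last.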
  • In a possible implementation, the reduced occupant in the cabin is the driver and/or a passenger.
  • In a possible implementation, before the prompt module issues the prompt information, the image detection module 702 is further configured to:
  • determine the attribution of the object to be detected, where the attribution of the object to be detected is the driver and/or a passenger.
  • the image detection module 702, configured to perform target detection on the image in the cabin to be detected includes:
  • the object to be detected in the image in the cabin to be detected is detected.
  • the image detection module 702 is configured to perform, for each channel, a first feature map corresponding to the channel and first feature maps corresponding to other channels to perform feature information fusion to obtain The second feature map after fusion, including:
  • the image detection module 702 is configured to detect the object to be detected in the image in the cabin to be detected based on the fused second feature map, including:
  • based on the feature data of the feature points contained in each candidate area, determine the confidence level corresponding to that candidate area, where the confidence level corresponding to each candidate area is used to characterize the credibility that the candidate area contains the object to be detected;
  • the detection area corresponding to the object to be detected is screened out from the set number of candidate areas; the detection area is used to identify the to-be-detected object The position of the detection object in the image in the cabin to be detected.
  • the image acquisition module 701 configured to acquire the image in the cabin to be detected includes:
  • the in-cabin images to be detected are extracted at intervals.
  • the image detection module 702 configured to perform target detection on the image in the cabin to be detected, further includes:
  • use each cabin image in the cabin video stream to be detected as an image to be tracked, and for each non-first frame of the images to be tracked, determine the predicted position information of the object to be detected in that non-first frame based on the position information of the object to be detected in the previous frame and on the non-first frame itself;
  • in the case that the non-first frame of the images to be tracked is a cabin image to be detected on which the object to be detected was detected, use the detected position information as the position information of the object to be detected in that non-first frame;
  • otherwise, use the determined predicted position information as the position information of the object to be detected in that non-first frame.
  • the object detection device further includes a neural network training module 704, and the neural network training module 704 is used to:
  • a neural network for target detection on images in the cabin to be detected is trained using sample images in the cabin containing the sample objects to be detected and sample images in the cabin that do not contain the sample objects to be detected.
  • an embodiment of the present disclosure also provides an electronic device 800.
  • a schematic structural diagram of the electronic device 800 provided by the embodiment of the present disclosure includes:
  • When the electronic device runs, the processor 81 and the memory 82 communicate through the bus 83, so that the processor 81 executes any object detection method in the above method embodiments.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, and when the computer program is run by a processor, any one of the object detection methods in the foregoing method embodiments is executed.
  • the storage medium may be a volatile or nonvolatile computer readable storage medium.
  • the computer program product of the object detection method provided by the embodiment of the present disclosure includes a computer-readable storage medium storing program code.
  • the instructions included in the program code can be used to execute any of the object detection methods in the foregoing method embodiments. Refer to the foregoing method embodiment, which will not be repeated here.
  • the embodiments of the present disclosure also provide a computer program, which, when executed by a processor, implements any one of the methods in the foregoing embodiments.
  • the computer program product can be specifically implemented by hardware, software, or a combination thereof.
  • the computer program product is specifically embodied as a computer storage medium.
  • the computer program product is specifically embodied as a software product, such as a software development kit (SDK).
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the function is implemented in the form of a software function unit and sold or used as an independent product, it can be stored in a non-volatile computer readable storage medium executable by a processor.
  • the technical solution of the present disclosure essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program code .
  • Embodiments of the present disclosure provide an object detection method, apparatus, electronic device, storage medium, and computer program. The object detection method includes: acquiring an image in a cabin to be detected; when the number of persons in the cabin decreases, performing target detection on the image in the cabin to be detected to determine whether an object to be detected exists in the image; and issuing prompt information in response to the duration of the state in which the object to be detected exists in the image exceeding a preset duration. In this way, when an item left behind by a person in the cabin is detected, a corresponding prompt can be given, thereby reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

Provided are an object detection method and apparatus, an electronic device, and a storage medium. The object detection method comprises: acquiring an in-cabin image to be detected; when the number of persons in the cabin decreases, performing target detection on the in-cabin image to determine whether an object to be detected is present in the in-cabin image; and sending prompt information in response to the duration of the state in which the object to be detected is present in the in-cabin image exceeding a preset duration.

Description

Object detection method and apparatus, electronic device, storage medium and program
Cross-reference to related applications
This application is based on the Chinese patent application No. 202010477936.9, filed on May 29, 2020, and claims priority to that Chinese patent application, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of tracking technology, and relates to, but is not limited to, an object detection method and apparatus, an electronic device, a storage medium, and a computer program.
Background
With the development of the Internet of Vehicles, public transportation provides travel convenience for more and more people. In a riding environment, passengers or drivers usually carry personal belongings, so incidents of passengers losing personal items frequently occur. How to effectively prevent items from being left behind in the riding environment and improve the safety of items in the riding environment is a problem to be solved urgently.
Summary of the invention
The embodiments of the present disclosure provide at least one object detection solution.
The embodiments of the present disclosure provide an object detection method, including:
acquiring an image in a cabin to be detected; when the number of persons in the cabin decreases, performing target detection on the image in the cabin to be detected to determine whether an object to be detected exists in the image in the cabin to be detected; and issuing prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a preset duration.
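For illustration only (the disclosure itself does not prescribe any code), the acquire-detect-prompt flow above can be sketched as a small state machine. The class name, method signature, and the 10-second default threshold are assumptions, not part of the disclosed embodiments:

```python
class LeftItemMonitor:
    """Sketch of the flow: after the number of persons in the cabin drops,
    prompt once an object to be detected has stayed in the analyzed frames
    for longer than a preset duration."""

    def __init__(self, preset_duration=10.0):
        self.preset_duration = preset_duration  # seconds; assumed value
        self.prev_count = None
        self.watching = False          # becomes True once a person has left
        self.object_seen_since = None  # time the object was first seen

    def update(self, now, person_count, object_detected):
        """Feed one analyzed frame; return True when a prompt should be issued."""
        if self.prev_count is not None and person_count < self.prev_count:
            self.watching = True       # the number of persons in the cabin decreased
        self.prev_count = person_count
        if not self.watching:
            return False
        if not object_detected:
            self.object_seen_since = None
            return False
        if self.object_seen_since is None:
            self.object_seen_since = now
            return False
        return now - self.object_seen_since > self.preset_duration
```

In practice `person_count` and `object_detected` would come from the target detection described below, run on each acquired in-cabin image.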
In some embodiments of the present disclosure, the issuing prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a preset duration includes:
in the case that the person who has left the cabin is a passenger, issuing first prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a first preset duration, the first prompt information being used to remind the passenger that an item has been left behind; and in the case that the person who has left the cabin is the driver, issuing second prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a second preset duration, the second prompt information being used to remind the driver that an item has been left behind.
In some embodiments of the present disclosure, the person who has left the cabin is the driver and/or a passenger; after it is determined that the object to be detected exists in the image in the cabin to be detected and before the prompt information is issued, the object detection method further includes:
determining the owner of the object to be detected according to the position of the object to be detected in the cabin, where the owner of the object to be detected is the driver and/or a passenger.
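Purely as an illustration of position-based attribution (the seat coordinates below are invented, not taken from the disclosure), the owner of a detected object can be resolved by checking which seat region of the cabin contains the centre of its bounding box:

```python
# Hypothetical seat layout in normalized image coordinates (x0, y0, x1, y1).
SEAT_REGIONS = {
    "driver":    (0.0, 0.0, 0.5, 0.5),
    "passenger": (0.5, 0.0, 1.0, 1.0),
}

def owner_of(box):
    """Return the presumed owner of a detected object from its bounding box
    (x0, y0, x1, y1) by locating the seat region that contains the box centre."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    for owner, (x0, y0, x1, y1) in SEAT_REGIONS.items():
        if x0 <= cx < x1 and y0 <= cy < y1:
            return owner
    return "unknown"
```

A real cabin would need one region per seat, calibrated against the fixed camera position mentioned in the description.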
In some embodiments of the present disclosure, the performing target detection on the image in the cabin to be detected includes:
performing feature extraction on the image in the cabin to be detected to obtain a first feature map corresponding to each of multiple channels, where the first feature map corresponding to each channel is a feature map obtained by enhancing the features of the object to be detected under the image feature category corresponding to that channel; for each channel, fusing the feature information of the first feature map corresponding to that channel with the first feature maps corresponding to the other channels to obtain a fused second feature map; and detecting, based on the fused second feature map, the object to be detected in the image in the cabin to be detected.
In some embodiments of the present disclosure, the fusing, for each channel, the feature information of the first feature map corresponding to that channel with the first feature maps corresponding to the other channels to obtain a fused second feature map includes:
determining, for the multiple first feature maps to be fused, a weight matrix corresponding to the multiple first feature maps; and performing weighted summation on the feature information of the multiple first feature maps based on the weight matrix to obtain a second feature map containing the fused feature information.
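One way to read the weighted-summation step, sketched with NumPy (the [C, H, W] tensor layout and the [C, C] weight-matrix shape are assumptions): the second feature map of channel i is the sum over all channels j of weights[i, j] times the first feature map of channel j.

```python
import numpy as np

def fuse_feature_maps(first_maps, weights):
    """Weighted summation of per-channel first feature maps.

    first_maps: array of shape [C, H, W], one first feature map per channel.
    weights:    array of shape [C, C]; row i holds the fusion weights that
                channel i assigns to every channel j.
    Returns the fused second feature maps, shape [C, H, W].
    """
    first_maps = np.asarray(first_maps, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # einsum performs the weighted sum across the channel axis j
    return np.einsum("ij,jhw->ihw", weights, first_maps)
```

In a trained network the weight matrix would itself be predicted from the feature maps (e.g. by an attention branch) rather than fixed.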
In some embodiments of the present disclosure, the detecting, based on the fused second feature map, the object to be detected in the image in the cabin to be detected includes:
determining a set number of candidate regions based on the fused second feature map, each candidate region containing a set number of feature points; determining, based on the feature data of the feature points contained in each candidate region, a confidence corresponding to that candidate region, where the confidence corresponding to each candidate region characterizes the credibility that the candidate region contains the object to be detected; and screening out, based on the confidence corresponding to each candidate region and the overlapping areas between different candidate regions, a detection region corresponding to the object to be detected from the set number of candidate regions, where the detection region is used to identify the position of the object to be detected in the image in the cabin to be detected.
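The screening step corresponds to confidence filtering followed by standard non-maximum suppression over overlapping candidates. The sketch below is one common realization; the 0.5 thresholds are assumed values, not taken from the disclosure:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def select_detection_regions(candidates, score_thresh=0.5, iou_thresh=0.5):
    """Screen detection regions from (box, confidence) candidate pairs:
    drop low-confidence candidates, then suppress any candidate whose
    overlap with an already-kept, higher-confidence candidate is too large."""
    kept = []
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if score < score_thresh:
            break  # remaining candidates all have lower confidence
        if all(iou(box, kept_box) < iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```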
In some embodiments of the present disclosure, the acquiring the image in the cabin to be detected includes:
acquiring a video stream in the cabin to be detected; and extracting, at intervals, the images in the cabin to be detected from the consecutive frames of in-cabin images contained in the video stream in the cabin to be detected.
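Interval extraction amounts to keeping every N-th frame of the stream; a minimal sketch (the interval of 5 is an assumed value, and the stream is modeled as any frame iterable):

```python
def sample_frames(video_stream, interval=5):
    """From the consecutive in-cabin frames of the video stream, yield every
    `interval`-th frame as an image to be detected."""
    for index, frame in enumerate(video_stream):
        if index % interval == 0:
            yield frame
```

Sampling frames at intervals keeps the per-frame detection cost bounded while the tracking step below fills in the positions for the frames that are skipped.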
In some embodiments of the present disclosure, the performing target detection on the image in the cabin to be detected further includes:
taking each in-cabin image in the video stream in the cabin to be detected as an image to be tracked; for each non-first-frame image to be tracked, determining predicted position information of the object to be detected in the non-first-frame image to be tracked based on the position information of the object to be detected in the previous image to be tracked and the non-first-frame image to be tracked itself; determining whether the non-first-frame image to be tracked is an in-cabin image in which the object to be detected has been detected; when it is determined that the non-first-frame image to be tracked is an in-cabin image in which the object to be detected has been detected, taking the detected position information as the position information of the object to be detected in the non-first-frame image to be tracked; and when it is determined that the non-first-frame image to be tracked is not an in-cabin image in which the object to be detected has been detected, taking the determined predicted position information as the position information of the object to be detected in the non-first-frame image to be tracked.
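The detection-or-prediction rule for non-first frames can be sketched as follows; `predict` stands for a hypothetical motion model (e.g. a constant-velocity or Kalman-filter step), which the disclosure does not specify:

```python
def track_over_stream(detections, predict):
    """Resolve the object position in every frame of a stream.

    detections: per-frame detector output; detections[i] is the detected
                position in frame i, or None for frames where the detector
                was not run or found nothing.
    predict:    callable taking the previous position and returning the
                predicted position for the current frame.
    For each non-first frame, the detected position wins when available;
    otherwise the prediction from the previous frame is used.
    """
    positions = [detections[0]]  # the first frame must come from detection
    for detected in detections[1:]:
        predicted = predict(positions[-1])
        positions.append(detected if detected is not None else predicted)
    return positions
```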
In some embodiments of the present disclosure, the target detection on the image in the cabin to be detected is performed by a neural network;
the neural network is trained using in-cabin sample images that contain a sample object to be detected and in-cabin sample images that do not contain the sample object to be detected.
The embodiments of the present disclosure also provide an object detection apparatus, including:
an image acquisition module configured to acquire an image in a cabin to be detected; an image detection module configured to, when the number of persons in the cabin decreases, perform target detection on the image in the cabin to be detected to determine whether an object to be detected exists in the image in the cabin to be detected; and a prompt module configured to issue prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a preset duration.
In some embodiments of the present disclosure, the prompt module being configured to issue prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a preset duration includes:
in the case that the person who has left the cabin is a passenger, issuing first prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a first preset duration, the first prompt information being used to remind the passenger that an item has been left behind; and in the case that the person who has left the cabin is the driver, issuing second prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a second preset duration, the second prompt information being used to remind the driver that an item has been left behind.
In some embodiments of the present disclosure, the person who has left the cabin is the driver and/or a passenger; after the image detection module determines that the object to be detected exists in the image in the cabin to be detected and before the prompt module issues the prompt information, the image detection module is further configured to:
determine the owner of the object to be detected according to the position of the object to be detected in the cabin, where the owner of the object to be detected is the driver and/or a passenger.
In some embodiments of the present disclosure, the image detection module being configured to perform target detection on the image in the cabin to be detected includes:
performing feature extraction on the image in the cabin to be detected to obtain a first feature map corresponding to each of multiple channels, where the first feature map corresponding to each channel is a feature map obtained by enhancing the features of the object to be detected under the image feature category corresponding to that channel;
for each channel, fusing the feature information of the first feature map corresponding to that channel with the first feature maps corresponding to the other channels to obtain a fused second feature map;
detecting, based on the fused second feature map, the object to be detected in the image in the cabin to be detected.
In some embodiments of the present disclosure, the image detection module being configured to, for each channel, fuse the feature information of the first feature map corresponding to that channel with the first feature maps corresponding to the other channels to obtain a fused second feature map includes:
determining, for the multiple first feature maps to be fused, a weight matrix corresponding to the multiple first feature maps;
performing weighted summation on the feature information of the multiple first feature maps based on the weight matrix to obtain the second feature map containing the fused feature information.
In some embodiments of the present disclosure, the image detection module being configured to detect, based on the fused second feature map, the object to be detected in the image in the cabin to be detected includes:
determining a set number of candidate regions based on the fused second feature map, each candidate region containing a set number of feature points;
determining, based on the feature data of the feature points contained in each candidate region, a confidence corresponding to that candidate region, where the confidence corresponding to each candidate region characterizes the credibility that the candidate region contains the object to be detected;
screening out, based on the confidence corresponding to each candidate region and the overlapping areas between different candidate regions, a detection region corresponding to the object to be detected from the set number of candidate regions, where the detection region is used to identify the position of the object to be detected in the image in the cabin to be detected.
In some embodiments of the present disclosure, the image acquisition module being configured to acquire the image in the cabin to be detected includes:
acquiring a video stream in the cabin to be detected;
extracting, at intervals, the images in the cabin to be detected from the consecutive frames of in-cabin images contained in the video stream in the cabin to be detected.
In some embodiments of the present disclosure, the image detection module being configured to perform target detection on the image in the cabin to be detected further includes:
taking each in-cabin image in the video stream in the cabin to be detected as an image to be tracked, and for each non-first-frame image to be tracked, determining predicted position information of the object to be detected in the non-first-frame image to be tracked based on the position information of the object to be detected in the previous image to be tracked and the non-first-frame image to be tracked itself;
determining whether the non-first-frame image to be tracked is an in-cabin image in which the object to be detected has been detected;
when it is determined that the non-first-frame image to be tracked is an in-cabin image in which the object to be detected has been detected, taking the detected position information as the position information of the object to be detected in the non-first-frame image to be tracked;
when it is determined that the non-first-frame image to be tracked is not an in-cabin image in which the object to be detected has been detected, taking the determined predicted position information as the position information of the object to be detected in the non-first-frame image to be tracked.
In some embodiments of the present disclosure, the target detection on the image in the cabin to be detected is performed by a neural network;
the neural network is trained using in-cabin sample images that contain a sample object to be detected and in-cabin sample images that do not contain the sample object to be detected.
The embodiments of the present disclosure also provide an electronic device, including a processor, a memory, and a bus; the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, any one of the above object detection methods is performed.
The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, any one of the above object detection methods is performed.
The embodiments of the present disclosure also provide a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing any one of the above object detection methods.
In the embodiments of the present disclosure, a way of detecting left-behind items in a cabin scene is provided. An image in the cabin to be detected is acquired, so that when the number of persons in the cabin decreases, target detection can be performed on the acquired image to determine whether an object to be detected exists in the image. Exemplarily, the object to be detected may be an item left behind by a person in the cabin, so that a corresponding prompt can be given when such an item is detected, thereby reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.
In order to make the above objectives, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and are used together with the specification to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show certain embodiments of the present disclosure and therefore should not be regarded as limiting its scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic flowchart of an object detection method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for detecting an object to be detected provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for determining the detection region of an object to be detected in an image in a cabin to be detected provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for tracking an object to be detected provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for training a target detection network in a neural network provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of another method for training a target tracking network in a neural network provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an object detection apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed description
In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
In some public scenes, items are frequently lost. For example, in a riding environment, passengers often lose personal items; generally, after losing an item, a passenger returns to look for it only after remembering it, a process that is time-consuming and cumbersome. How to effectively prevent items from being left behind in the riding environment and improve the safety of items in the riding environment is the problem to be solved by the embodiments of the present disclosure.
Based on the above research, the embodiments of the present disclosure provide a way of detecting left-behind items in a cabin scene. An image in the cabin to be detected is acquired, so that when the number of persons in the cabin decreases, target detection can be performed on the acquired image to determine whether an object to be detected exists in the image. Exemplarily, the object to be detected may be an item left behind by a person in the cabin, so that a corresponding prompt can be given when such an item is detected, thereby reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.
To facilitate an understanding of this embodiment, an object detection method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the object detection method provided by the embodiments of the present disclosure is generally a computer device with certain computing capability, for example, a terminal device, a server, or another processing device; the terminal device may be user equipment (UE), a mobile device, a user terminal, or the like. In some possible implementations, the object detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 1, which is a flowchart of an object detection method provided by an embodiment of the present disclosure, the method includes the following S101 to S103:
S101: Acquire an image in a cabin to be detected.
The cabin may be the cabin of a public transportation vehicle such as a taxi, a train, or an airplane; the image in the cabin to be detected may be captured by an image acquisition device installed at a fixed position in the cabin.
S102: When the number of persons in the cabin decreases, perform target detection on the image in the cabin to be detected, and determine whether an object to be detected exists in the image in the cabin to be detected.
Exemplarily, whether persons have entered or left the cabin can be monitored according to the acquired images in the cabin to be detected; when it is detected that the number of persons in the cabin has decreased, target detection can be performed on the acquired images, for example, to detect whether items left behind by the departed person still exist in the cabin.
Exemplarily, the target detection performed on the image in the cabin to be detected may be used to detect preset items that are easily lost by passengers or the driver, such as mobile phones, wallets, handbags, and suitcases.
S103: Issue prompt information in response to the duration of the state in which the object to be detected exists in the image in the cabin to be detected exceeding a preset duration.
Exemplarily, when the number of persons in the cabin decreases, that is, when a person is detected leaving the cabin, if it is detected that items left behind by the departing person still exist in the cabin, a prompt can be given to remind the departing person.
The embodiments of the present disclosure provide a way of detecting left-behind items in a cabin scene. An image in the cabin to be detected is acquired, so that when the number of persons in the cabin decreases, target detection can be performed on the acquired image to determine whether an object to be detected exists in the image. Exemplarily, the object to be detected may be an item left behind by a person in the cabin, so that a corresponding prompt can be given when such an item is detected, thereby reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.
针对上述S103，在响应于待检测的车舱内图像中存在待检测对象的状态的持续时长超过预设时长，发出提示信息时，可以包括：Regarding the foregoing S103, issuing the prompt information in response to the duration of the state in which the object to be detected exists in the in-cabin image to be detected exceeding the preset duration may include:
在减少的车舱内人员为乘客的情况下，响应于待检测的车舱内图像中存在待检测对象的状态的持续时长超过第一预设时长，发出第一提示信息，第一提示信息用于提示乘客物品遗留；In the case that the departed occupant is a passenger, in response to the duration of the state in which the object to be detected exists in the in-cabin image to be detected exceeding a first preset duration, issuing first prompt information, where the first prompt information is used to remind the passenger of a left-behind item;
在减少的车舱内人员为驾驶员的情况下，响应于待检测的车舱内图像中存在待检测对象的状态的持续时长超过第二预设时长，发出第二提示信息，第二提示信息用于提示驾驶员物品遗留。In the case that the departed occupant is the driver, in response to the duration of the state in which the object to be detected exists in the in-cabin image to be detected exceeding a second preset duration, issuing second prompt information, where the second prompt information is used to remind the driver of a left-behind item.
示例性地，这里的第一预设时长和第二预设时长可以相同，也可以不相同，考虑到驾驶员可能仅仅是短暂出车舱，这里的第二预设时长可以大于第一预设时长。Exemplarily, the first preset duration and the second preset duration may be the same or different. Considering that the driver may leave the cabin only briefly, the second preset duration may be greater than the first preset duration.
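The dual-threshold prompt logic above can be sketched as follows. This is a hypothetical illustration: the role names, function name, and threshold values are assumptions, not values given in the disclosure.

```python
# Hypothetical sketch of the dual-threshold reminder logic. The threshold
# values below are illustrative assumptions, not values from the disclosure.
FIRST_PRESET_SECONDS = 10    # passenger case: remind quickly
SECOND_PRESET_SECONDS = 60   # driver may step out only briefly, so wait longer

def reminder_for(role, item_present_seconds):
    """Return which prompt to issue, or None if no prompt is due yet."""
    if role == "passenger" and item_present_seconds > FIRST_PRESET_SECONDS:
        return "first_prompt"    # e.g. a voice broadcast reminding the passenger
    if role == "driver" and item_present_seconds > SECOND_PRESET_SECONDS:
        return "second_prompt"   # e.g. a voice broadcast reminding the driver
    return None
```

Using a longer threshold for the driver avoids false alarms when the driver merely steps out for a moment.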
示例性地，第一提示信息与第二提示信息均可以为语言播报，其中第一提示信息用于提示乘客或者驾驶员，第二提示信息用于提示驾驶员。Exemplarily, both the first prompt information and the second prompt information may be voice broadcasts, where the first prompt information is used to prompt the passenger or the driver, and the second prompt information is used to prompt the driver.
可以看出,本公开实施例中,针对不同类型的车舱内人员,在检测到车舱内存在遗留物品时,分别进行提示,从而提高乘车安全。It can be seen that, in the embodiments of the present disclosure, for different types of passengers in the cabin, when it is detected that there are items left in the cabin, prompts are separately provided, so as to improve riding safety.
在本公开的一些实施例中，减少的车舱内人员为驾驶员和/或乘客，在确定待检测的车舱内图像中存在待检测对象后，在发出提示信息之前，对象检测方法还包括：In some embodiments of the present disclosure, the departed occupant is the driver and/or a passenger. After it is determined that an object to be detected exists in the in-cabin image to be detected, and before the prompt information is issued, the object detection method further includes:
根据待检测对象在车舱中位置，确定待检测对象的归属人员；其中，待检测对象的归属人员为驾驶员和/或乘客。Determining, according to the position of the object to be detected in the cabin, the person to whom the object belongs, where that person is the driver and/or a passenger.
根据获取到的待检测的车舱内图像，可以确定出每个车舱内人员在车舱内的位置，以及该车舱内人员对应的待检测物品，这样可以建立待检测对象与位置的关联关系，以及位置与车舱内人员之间的关联关系，然后进一步可以根据待检测对象在车舱中位置，确定该待检测对象的归属人员。在确定出待检测对象的归属人员之后根据归属人员发出相应的提示信息。According to the acquired in-cabin image to be detected, the position of each occupant in the cabin and the item to be detected corresponding to each occupant can be determined. In this way, an association between the object to be detected and its position, and an association between the position and the occupant, can be established; the person to whom the object belongs can then be determined according to the object's position in the cabin, and after that person is determined, the corresponding prompt information can be issued accordingly.
可以看出，本公开实施例中，可以基于待检测对象在车舱中位置，确定出待检测对象的归属人员，从而便于后续进行分类提示。It can be seen that, in the embodiments of the present disclosure, the person to whom the object to be detected belongs can be determined based on the object's position in the cabin, which facilitates subsequent differentiated prompts.
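The position-to-owner association described above can be sketched as a lookup from the object's location to a seat region. The seat layout, coordinates, and names below are purely illustrative assumptions; the disclosure does not specify a concrete mapping.

```python
# Hypothetical sketch: associate a detected object with an occupant by its
# position in the cabin. Seat regions and coordinates are assumptions.
SEAT_REGIONS = {
    "driver": (0, 0, 50, 50),             # (x1, y1, x2, y2) in image coordinates
    "front_passenger": (50, 0, 100, 50),
    "rear_passenger": (0, 50, 100, 100),
}

def owner_of(object_box):
    """Return the occupant whose seat region contains the object's center."""
    cx = (object_box[0] + object_box[2]) / 2
    cy = (object_box[1] + object_box[3]) / 2
    for person, (x1, y1, x2, y2) in SEAT_REGIONS.items():
        if x1 <= cx < x2 and y1 <= cy < y2:
            return person
    return None
```

The returned owner can then be used to choose which prompt (passenger or driver) to issue.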
在一种实施方式中,针对上述S102,在对待检测的车舱内图像进行目标检测时,参见图2所示,包括以下步骤S201~S203,其中:In an embodiment, for the above S102, when performing target detection on the image in the cabin to be detected, referring to FIG. 2, the following steps S201 to S203 are included, wherein:
S201，对待检测的车舱内图像进行特征提取，得到与多个通道中每个通道对应的第一特征图；其中每个通道对应的第一特征图，为将待检测对象在该通道对应的图像特征类别下的特征进行增强处理后的特征图。S201: Perform feature extraction on the in-cabin image to be detected to obtain a first feature map corresponding to each of multiple channels, where the first feature map corresponding to each channel is a feature map in which the features of the object to be detected under the image feature category corresponding to that channel have been enhanced.
对待检测的车舱内图像进行特征提取，可以是通过提前训练好的特征提取网络进行特征提取，得到多个预设通道对应的第一特征图，这里每个通道可以理解为对应待检测的车舱内图像的一种图像特征类别，比如对待检测的车舱内图像进行特征提取后，可以得到三个通道分别对应的第一特征图，其中第一个通道可以对应待检测的车舱内图像的纹理特征、第二个通道可以对应待检测的车舱内图像的颜色特征、第三个通道可以对应待检测的车舱内图像的尺寸特征，这样即可以得到待检测的车舱内图像在各个图像特征类别下的特性图。Feature extraction on the in-cabin image to be detected may be performed by a feature extraction network trained in advance, yielding first feature maps corresponding to multiple preset channels. Here, each channel can be understood as corresponding to one image feature category of the in-cabin image. For example, after feature extraction, first feature maps corresponding to three channels may be obtained, where the first channel may correspond to texture features of the image, the second channel to color features, and the third channel to size features, so that feature maps of the in-cabin image under each image feature category are obtained.
为了明显地区分待检测对象和车舱内背景，在对待检测的车舱内图像进行特征提取，得到第一特征图的过程中，会对每个通道对应的第一特征图中代表待检测对象的特征信息和代表车舱内背景的特征信息进行区分处理，比如可以对代表待检测对象的特征信息进行增强处理，对代表车舱内背景的特征信息进行弱化处理，或者，可以仅对代表待检测对象的特征信息进行增强处理，或者，可以仅对代表车舱内背景的特征信息进行弱化处理，从而使得得到的每个第一特征图中表示待检测对象的特征信息的强度大于表示车舱内背景的特征信息的强度。In order to clearly distinguish the object to be detected from the cabin background, during the feature extraction that yields the first feature maps, the feature information representing the object to be detected and the feature information representing the cabin background in each channel's first feature map are processed differently. For example, the object feature information may be enhanced while the background feature information is weakened; alternatively, only the object feature information may be enhanced, or only the background feature information may be weakened. As a result, in each obtained first feature map, the strength of the feature information representing the object to be detected is greater than that of the feature information representing the cabin background.
S202,针对每个通道,将该通道对应的第一特征图与其它通道分别对应的第一特征图进行特征信息融合,得到融合后的第二特征图。S202: For each channel, perform feature information fusion on the first feature map corresponding to the channel and the first feature maps respectively corresponding to other channels to obtain a fused second feature map.
因为每个通道趋向表示待检测的车舱内图像在该通道对应的图像特征类别下的特征信息，为了得到特征信息更加完善的特征图，这里针对每个通道，将该通道对应的第一特征图与其它通道分别对应的第一特征图进行特征信息融合，即可以得到包含多种图像特征类别的第二特征图。Because each channel tends to represent the feature information of the in-cabin image under that channel's image feature category, in order to obtain a feature map with more complete feature information, the first feature map of each channel is fused with the first feature maps of the other channels, yielding a second feature map that incorporates multiple image feature categories.
这里每个通道对应的第一特征图中的特征信息可以通过该通道对应的第一特征图中的特征数据进行表示，特征信息融合是指通过融合每个第一特征图中的特征数据，得到融合后的第二特征图。Here, the feature information in each channel's first feature map can be represented by the feature data in that map; feature information fusion means fusing the feature data of all the first feature maps to obtain the fused second feature map.
具体如何基于第一特征图进行特征信息融合,得到第二特征图的详细过程将在后文以具体实施例的形式进行详细阐述。The detailed process of how to perform feature information fusion based on the first feature map and obtain the second feature map will be described in detail in the form of specific embodiments later.
S203,基于融合后的第二特征图,检测待检测的车舱内图像中的待检测对象。S203: Detect the object to be detected in the image of the cabin to be detected based on the second feature map after the fusion.
这里，基于融合后的第二特征图，检测待检测的车舱内图像中的待检测对象的过程，可以是基于预先训练的神经网络中的目标检测网络，对待检测的车舱内图像中的待检测对象进行检测，即将融合后的第二特征图输入该预先训练的神经网络中的目标检测网络，即可以完成对待检测的车舱内图像中的待检测对象进行检测。Here, the process of detecting the object to be detected in the in-cabin image based on the fused second feature maps may rely on the target detection network in a pre-trained neural network: the fused second feature maps are input into that target detection network, which completes the detection of the object to be detected in the in-cabin image.
这里检测待检测的车舱内图像中的待检测对象可以是指检测待检测的车舱内图像中是否存在待检测对象，且在确定待检测的车舱内图像中存在待检测对象的情况下，确定待检测对象在该待检测的车舱内图像中的位置信息。Here, detecting the object to be detected in the in-cabin image may mean detecting whether the object to be detected exists in the image and, if it does, determining the position information of the object within the image.
本公开实施例中，通过特征提取得到的第一特征图为针对待检测对象在该通道对应的图像特征类别下的特征进行增强处理后的特征图，即每张第一特征图中包含的待检测对象的特征信息较非待检测对象的特征信息进行了增强处理，这样能够通过特征信息明显区分待检测对象和待检测的车舱内图像中的背景区域；然后针对每个通道，将该通道对应的第一特征图与其它通道分别对应的第一特征图进行特征信息融合，从而得到特征信息更加全面的待检测对象，然后基于这样的第二特征图完成针对待检测的车舱内图像中待检测对象的检测，即能够准确地检测待检测的车舱内图像中的待检测对象。In the embodiments of the present disclosure, each first feature map obtained by feature extraction is a feature map in which the features of the object to be detected under the channel's image feature category have been enhanced; that is, in each first feature map, the feature information of the object to be detected is enhanced relative to that of other content, so that the object can be clearly distinguished from the background region of the in-cabin image through its feature information. Then, for each channel, the first feature map of that channel is fused with the first feature maps of the other channels, yielding more comprehensive feature information of the object to be detected. Performing detection on such second feature maps therefore enables accurate detection of the object to be detected in the in-cabin image.
针对上述S202，在针对每个通道，将该通道对应的第一特征图与其它通道分别对应的第一特征图进行特征信息融合，得到融合后的第二特征图时，可以包括：For the above S202, fusing, for each channel, the first feature map of that channel with the first feature maps of the other channels to obtain the fused second feature map may include:
(1)针对进行特征信息融合的多个第一特征图,确定多个第一特征图所对应的权重矩阵;(1) For multiple first feature maps for feature information fusion, determine the weight matrix corresponding to the multiple first feature maps;
(2)基于权重矩阵,对多个第一特征图的特征信息进行加权求和,得到包含各个融合特征信息的第二特征图。(2) Based on the weight matrix, perform a weighted summation on the feature information of the multiple first feature maps to obtain a second feature map containing each fused feature information.
在一些实施例中，对待检测的车舱内图像进行特征提取后，得到大小为h*w*c的第一特征图，其中c表示第一特征图的个数，即待检测的车舱内图像进行特征提取后得到的通道个数，每个通道对应一个第一特征图，h*w表示每个第一特征图的尺寸，且每个第一特征图包含h*w个特征点对应的特征数据。In some embodiments, after feature extraction is performed on the in-cabin image to be detected, first feature maps of size h*w*c are obtained, where c represents the number of first feature maps, i.e., the number of channels obtained by feature extraction; each channel corresponds to one first feature map, h*w represents the size of each first feature map, and each first feature map contains feature data corresponding to h*w feature points.
这里通过对多个第一特征图进行特征信息融合，得到的融合后的第二特征图的大小同样为h*w*c，即同样每个通道对应一个第二特征图，每个第二特征图的尺寸为h*w，第二特征图中任一特征点对应的特征数据是通过每个通道对应的第一特征图中与第二特征图中该任一特征点相同位置的特征点对应的特征数据融合得到的，具体融合方式如下：By fusing the feature information of the multiple first feature maps, the fused second feature maps obtained are likewise of size h*w*c: each channel corresponds to one second feature map of size h*w, and the feature data of any feature point in a second feature map is obtained by fusing the feature data of the feature points at the same position in the first feature maps of all channels. The specific fusion method is as follows:
这里的权重矩阵包含c个通道分别对应的权重向量，每个通道对应的权重向量中的权重值表示每个第一特征图中的特征数据在确定该通道对应的第二特征图时的权重值。The weight matrix here contains weight vectors corresponding to the c channels; the weight values in a channel's weight vector represent the weights of the feature data of each first feature map when determining the second feature map of that channel.
比如，c等于3，即表示对待检测的车舱内图像进行特征提取后，得到3个通道对应的第一特征图，即得3个第一特征图，每个第一特征图中包含h*w个特征点对应的特征数据，这h*w个特征数据即可以构成h*w维度的特征向量，特征向量中的各个特征数据，即对应第一特征图中各个特征点的特征数据。For example, c equal to 3 means that after feature extraction of the in-cabin image, first feature maps corresponding to 3 channels are obtained, i.e., 3 first feature maps. Each first feature map contains feature data corresponding to h*w feature points; these h*w feature data constitute a feature vector of dimension h*w, whose elements are the feature data of the feature points of that first feature map.
这样，在确定每个通道对应的第一特征图的特征向量以及该第一特征图在构成该通道的第二特征图时对应的权重值后，即可以根据该通道对应的权重矩阵，对各个通道对应的第一特征图中的特征数据进行加权求和，得到该通道对应的第二特征图中的特征数据。In this way, after determining the feature vector of each channel's first feature map and the weight of each first feature map when constituting a channel's second feature map, the feature data of the first feature maps of all channels can be weighted and summed according to that channel's weight vector to obtain the feature data of the second feature map of that channel.
可以看出，本公开实施例中，通过丰富待检测对象包含的特征信息，以及增加车舱内图像中待检测对象和背景区域的区分度，从而便于后期基于更加丰富且与背景区域区分度较大的特征信息，准确地确定该待检测的车舱内图像中，是否存在待检测对象，以及待检测对象的位置信息。It can be seen that, in the embodiments of the present disclosure, by enriching the feature information of the object to be detected and increasing the distinction between the object and the background region in the in-cabin image, it becomes easier to later determine accurately, based on richer feature information that is better distinguished from the background, whether an object to be detected exists in the in-cabin image and, if so, its position information.
下面以一个实施例进行解释,如何通过对每个通道对应的第一特征图进行融合得到融合后的第二特征图:The following uses an embodiment to explain how to obtain a fused second feature map by fusing the first feature map corresponding to each channel:
对待检测的车舱内图像进行特征提取,得到3个通道对应的第一特征图,每个第一特征图的尺寸为h*w,即每个第一特征图包含h*w个特征数据,假设每个第一特征图对应的特征向量构成的特征矩阵为:Perform feature extraction on the images in the cabin to be detected to obtain the first feature maps corresponding to the 3 channels. The size of each first feature map is h*w, that is, each first feature map contains h*w feature data. Assume that the feature matrix formed by the feature vector corresponding to each first feature map is:
$$\begin{pmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ \vdots & \vdots & \vdots \\ a_{h*w} & b_{h*w} & d_{h*w} \end{pmatrix}$$
其中，(a_1 a_2 … a_{h*w})^T 可以用于表示第1个通道对应的第一特征图的特征向量；a_1 表示第1个通道对应的第一特征图中第1个特征点的特征数据，a_2 表示第1个通道对应的第一特征图中第2个特征点的特征数据；a_{h*w} 表示第1个通道对应的第一特征图中第h*w个特征点的特征数据；Here, (a_1 a_2 … a_{h*w})^T can represent the feature vector of the first feature map corresponding to the first channel; a_1, a_2, and a_{h*w} represent the feature data of the 1st, 2nd, and (h*w)-th feature points in that map, respectively;
(b_1 b_2 … b_{h*w})^T 可以用于表示第2个通道对应的第一特征图的特征向量；b_1 表示第2个通道对应的第一特征图中第1个特征点的特征数据；b_2 表示第2个通道对应的第一特征图中第2个特征点的特征数据；b_{h*w} 表示第2个通道对应的第一特征图中第h*w个特征点的特征数据；(b_1 b_2 … b_{h*w})^T can represent the feature vector of the first feature map corresponding to the second channel; b_1, b_2, and b_{h*w} represent the feature data of the 1st, 2nd, and (h*w)-th feature points in that map, respectively;
(d_1 d_2 … d_{h*w})^T 可以用于表示第3个通道对应的第一特征图的特征向量；d_1 表示第3个通道对应的第一特征图中第1个特征点的特征数据，d_2 表示第3个通道对应的第一特征图中第2个特征点的特征数据，d_{h*w} 表示第3个通道对应的第一特征图中第h*w个特征点的特征数据。(d_1 d_2 … d_{h*w})^T can represent the feature vector of the first feature map corresponding to the third channel; d_1, d_2, and d_{h*w} represent the feature data of the 1st, 2nd, and (h*w)-th feature points in that map, respectively.
假设3个第一特征图所对应的权重矩阵为：Assume that the weight matrix corresponding to the three first feature maps is:
$$\begin{pmatrix} m_1 & k_1 & l_1 \\ m_2 & k_2 & l_2 \\ m_3 & k_3 & l_3 \end{pmatrix}$$
其中，(m_1 m_2 m_3)^T 表示确定第1个通道对应的第二特征图时，不同第一特征图各自对应的权重向量；m_1、m_2、m_3 分别表示第1个、第2个、第3个通道对应的第一特征图中各个特征数据在确定第1个通道对应的第二特征图时的权重值。Here, (m_1 m_2 m_3)^T represents the weight vector of the first feature maps when determining the second feature map of the first channel: m_1, m_2, and m_3 are the weights of the feature data of the first feature maps of the first, second, and third channels, respectively.
其中，(k_1 k_2 k_3)^T 表示确定第2个通道对应的第二特征图时，不同第一特征图各自对应的权重向量；k_1、k_2、k_3 分别表示第1个、第2个、第3个通道对应的第一特征图中各个特征数据在确定第2个通道对应的第二特征图时的权重值。Here, (k_1 k_2 k_3)^T represents the weight vector of the first feature maps when determining the second feature map of the second channel: k_1, k_2, and k_3 are the weights of the feature data of the first feature maps of the first, second, and third channels, respectively.
其中，(l_1 l_2 l_3)^T 表示确定第3个通道对应的第二特征图时，不同第一特征图各自对应的权重向量；l_1、l_2、l_3 分别表示第1个、第2个、第3个通道对应的第一特征图中各个特征数据在确定第3个通道对应的第二特征图时的权重值。Here, (l_1 l_2 l_3)^T represents the weight vector of the first feature maps when determining the second feature map of the third channel: l_1, l_2, and l_3 are the weights of the feature data of the first feature maps of the first, second, and third channels, respectively.
在一些实施例中,在基于权重矩阵,对多个第一特征图的特征信息进行加权求和时,确定第1个通道对应的第二特征图时,可以按照以下公式(1)确定:In some embodiments, when the feature information of multiple first feature maps is weighted and summed based on the weight matrix, the second feature map corresponding to the first channel can be determined according to the following formula (1):
T_1 = (a_1 a_2 … a_{h*w})^T · m_1 + (b_1 b_2 … b_{h*w})^T · m_2 + (d_1 d_2 … d_{h*w})^T · m_3    (1)
其中，第1个通道对应的第二特征图中第1个特征点的特征数据即为 a_1·m_1 + b_1·m_2 + d_1·m_3；第1个通道对应的第二特征图中第2个特征点的特征数据即为 a_2·m_1 + b_2·m_2 + d_2·m_3；第1个通道对应的第二特征图中第h*w个特征点的特征数据即为 a_{h*w}·m_1 + b_{h*w}·m_2 + d_{h*w}·m_3。Here, the feature data of the first feature point in the second feature map of the first channel is a_1·m_1 + b_1·m_2 + d_1·m_3; that of the second feature point is a_2·m_1 + b_2·m_2 + d_2·m_3; and that of the (h*w)-th feature point is a_{h*w}·m_1 + b_{h*w}·m_2 + d_{h*w}·m_3.
同理,可以按照相同的方式确定第2个通道对应的第二特征图以及第3个通道对应的第二特征图。Similarly, the second feature map corresponding to the second channel and the second feature map corresponding to the third channel can be determined in the same manner.
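The worked example above amounts to a single matrix product. A minimal sketch, assuming the c first feature maps are flattened into the columns of an (h*w)×c matrix and column j of a c×c weight matrix holds the weight vector for channel j (the function name and shapes are illustrative, not from the disclosure):

```python
import numpy as np

# Sketch of the channel fusion of formula (1): the second feature map of
# channel j is sum_i first[:, i] * weights[i, j], i.e. a matrix product.
def fuse_channels(first, weights):
    """first: (h*w) x c matrix of first feature maps; weights: c x c matrix."""
    return first @ weights

h, w, c = 2, 2, 3
first = np.arange(h * w * c, dtype=float).reshape(h * w, c)  # columns a, b, d
weights = np.eye(c)            # identity weights: fused maps equal the originals
second = fuse_channels(first, weights)
assert second.shape == (h * w, c)
assert np.allclose(second, first)
```

With a non-identity weight matrix, each fused channel mixes information from all three first feature maps, as in formula (1).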
上述确定融合后的第二特征图的方式，通过确定第一特征图所对应的权重矩阵，得到包含各个通道对应的第二特征图，这样每个第二特征图均是通过多个通道对应的图像特征类别下的特征进行融合得到的，若待检测的车舱内图像中包含待检测对象，则融合后的第二特征图中能够包含待检测对象更加丰富的特征信息，又因为第一特征图中针对检测对象的特征进行了增强处理，则基于第一特征图得到的融合后的第二特征图中待检测对象的特征信息和背景区域的特征信息之间的区分度也较大，从而便于后期基于更加丰富且与背景区域区分度较大的特征信息，准确地确定该待检测的车舱内图像中，是否存在待检测对象，以及待检测对象的位置信息。In the above manner of determining the fused second feature maps, a second feature map is obtained for each channel by determining the weight matrix corresponding to the first feature maps, and each second feature map fuses features under the image feature categories of multiple channels. If the in-cabin image contains an object to be detected, the fused second feature maps can therefore contain richer feature information of that object; and because the object features were enhanced in the first feature maps, the distinction between the object's feature information and the background's feature information in the fused second feature maps is also larger. This facilitates later determining accurately, based on richer and better-distinguished feature information, whether an object to be detected exists in the in-cabin image, and its position information.
在得到融合后的第二特征图后，就可以根据融合后的第二特征图来检测待检测的车舱内图像中的待检测对象，在一些实施例中，在基于融合后的第二特征图，检测待检测的车舱内图像中的待检测对象时，如图3所示，可以包括以下步骤S301～S303：After the fused second feature maps are obtained, the object to be detected in the in-cabin image can be detected according to them. In some embodiments, as shown in FIG. 3, detecting the object to be detected in the in-cabin image based on the fused second feature maps may include the following steps S301 to S303:
S301,基于融合后的第二特征图,确定设定个数的候选区域,每个候选区域包含设定个数的特征点。S301: Determine a set number of candidate regions based on the fused second feature map, and each candidate region contains a set number of feature points.
这里候选区域是指可能包含待检测对象的区域，这里候选区域的个数和每个候选区域中包含的特征点的设定个数可以为预先训练的神经网络中的候选区域提取网络来确定的。Here, a candidate region refers to a region that may contain the object to be detected; the number of candidate regions and the set number of feature points contained in each candidate region can be determined by the candidate region extraction network in the pre-trained neural network.
在一些实施例中，候选区域的设定个数基于目标检测网络的测试精度考虑，比如在网络训练过程中，针对大量待检测样本图像分别对应的融合后的第二样本特征图，不断调整候选区域的个数，然后在测试过程中，对训练的目标检测网络进行测试，通过不同候选区域对应的测试精度，确定候选区域的设定个数。In some embodiments, the set number of candidate regions is based on the test accuracy of the target detection network. For example, during network training, the number of candidate regions is continuously adjusted for the fused second sample feature maps corresponding to a large number of sample images to be detected; then, during testing, the trained target detection network is tested, and the set number of candidate regions is determined according to the test accuracy obtained with different numbers of candidate regions.
这里每个候选区域包含的设定个数，可以基于目标检测网络的测试速度和测试精度综合考虑来提前确定，比如在网络训练过程中，首先保持候选区域的个数不变，不断调整每个候选区域包含的特征点的个数，然后在测试过程中，对目标检测网络进行测试，综合考虑测试速度以及测试精度，确定每个候选区域包含的特征点的设定个数。The set number of feature points contained in each candidate region can be determined in advance by jointly considering the test speed and test accuracy of the target detection network. For example, during network training, the number of candidate regions is first kept unchanged while the number of feature points per candidate region is continuously adjusted; then, during testing, the target detection network is tested, and the set number of feature points per candidate region is determined by jointly considering test speed and test accuracy.
S302,基于每个候选区域包含的特征点的特征数据,确定该候选区域对应的置信度;每个候选区域对应的置信度用于表征该候选区域中包含待检测对象的可信程度。S302, based on the feature data of the feature points contained in each candidate area, determine the confidence level corresponding to the candidate area; the confidence level corresponding to each candidate area is used to characterize the credibility that the candidate area contains the object to be detected.
每个候选区域中包含的特征点均对应有特征数据，根据这些特征数据，可以确定该候选区域包含待检测对象的可信程度，示例性地，针对每个候选区域对应的置信度，可以通过预先训练的神经网络中的目标检测网络来确定，即将该候选区域中的特征数据输入预先训练的神经网络中的目标检测网络，即可以得到该候选区域对应的置信度。The feature points contained in each candidate region all have corresponding feature data, from which the credibility that the candidate region contains the object to be detected can be determined. Exemplarily, the confidence of each candidate region can be determined by the target detection network in the pre-trained neural network: the feature data of the candidate region is input into that network, which outputs the confidence of the region.
S303，基于每个候选区域对应的置信度以及不同候选区域之间的重叠区域，从设定个数的候选区域中筛选出待检测对象对应的检测区域；检测区域用于标识待检测对象在待检测的车舱内图像中的位置。S303: Based on the confidence corresponding to each candidate region and the overlapping regions between different candidate regions, filter out the detection region corresponding to the object to be detected from the set number of candidate regions, where the detection region is used to identify the position of the object to be detected in the in-cabin image.
在一些实施例中，在基于每个候选区域对应的置信度以及不同候选区域之间的重叠区域，从设定个数的候选区域中筛选出待检测对象对应的检测区域时，这里可以先从设定个数的候选区域中筛选出置信度排序前设定个数的目标候选区域，然后可以基于预先设置置信度阈值以及不同候选区域之间的重叠区域，再确定待检测对象对应的检测区域。In some embodiments, when filtering out the detection region corresponding to the object to be detected based on the confidence of each candidate region and the overlapping regions between candidate regions, a set number of target candidate regions ranked highest by confidence may first be selected from the candidate regions; the detection region corresponding to the object to be detected is then determined based on a preset confidence threshold and the overlapping regions between the candidate regions.
比如，认为对应置信度高于该置信度阈值的目标候选区域为待检测对象对应的检测区域的概率较大，且综合考虑候选区域之间存在重叠的候选区域的情况下，若发生重叠的候选区域的重叠面积大于设定面积阈值，可以说明发生重叠的候选区域包含的待检测对象可能为同一个待检测对象，基于该考虑，进一步在目标候选区域中选择出待检测对象对应的检测区域，比如，可以在目标候选区域中保留置信度高于置信度阈值的目标候选区域，且在发生重叠区域的目标候选区域中保留置信度最高的目标候选区域，即得到待检测对象对应的检测区域。For example, a target candidate region whose confidence is higher than the confidence threshold is considered more likely to be the detection region of the object to be detected. Moreover, when candidate regions overlap, if their overlap area is greater than a set area threshold, the objects contained in those overlapping regions are likely the same object. Based on this, the detection region is further selected from the target candidate regions; for example, the target candidate regions with confidence above the confidence threshold are retained, and among overlapping target candidate regions only the one with the highest confidence is retained, yielding the detection region corresponding to the object to be detected.
以上在执行从设定个数的候选区域中筛选出置信度排序前设定个数的目标候选区域的过程中，可以根据目标检测网络来确定的，具体可以基于进行目标检测网络的测试速度和测试精度综合考虑来提前确定，比如在网络训练过程中，不断调整目标候选区域的个数，然后在测试过程中，对目标检测网络进行测试，综合考虑测试速度以及测试精度，确定这里目标候选区域的设定个数。The set number of target candidate regions selected by confidence ranking can be determined according to the target detection network, specifically in advance by jointly considering its test speed and test accuracy. For example, during network training, the number of target candidate regions is continuously adjusted; then, during testing, the target detection network is tested, and the set number of target candidate regions is determined by jointly considering test speed and test accuracy.
当然,若这里的每个候选区域对应的置信度均小于设定阈值,则可以说明该待检测的车舱内图像中不存在待检测对象,该情况本公开实施例不做详细阐述。Of course, if the confidence level corresponding to each candidate area here is less than the set threshold, it can indicate that there is no object to be detected in the image of the cabin to be detected, and this situation is not described in detail in the embodiment of the present disclosure.
按照上述S301~S303可以获取到待检测的车舱内图像中包含待检测对象的检测区域，即得到待检测对象在待检测的车舱内图像中的位置，这里通过的融合后的第二特征图来确定候选区域，因为融合后的第二特征图包含的待检测对象的特征信息与背景区域的特征信息的区分度较大，且包含的待检测对象的特征信息更加丰富，从而基于该融合后的第二特征图，能够准确得到表示待检测区域中待检测对象位置的候选区域以及每个候选区域的置信度，另外这里提出通过考虑候选区域的重叠区域对待检测对象存在的可能位置信息进一步筛选，即能够准确得到该待检测的车舱内图像中是否存在待检测对象以及待检测对象的位置信息。According to the above S301 to S303, the detection region containing the object to be detected in the in-cabin image can be obtained, i.e., the position of the object in the image. Because the candidate regions are determined from the fused second feature maps, in which the feature information of the object to be detected is well distinguished from that of the background region and is richer, the candidate regions representing possible positions of the object and the confidence of each candidate region can be obtained accurately. In addition, by further filtering the possible positions of the object in consideration of the overlapping regions between candidate regions, whether the object to be detected exists in the in-cabin image, and its position information, can be obtained accurately.
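The filtering in S303 can be sketched as follows: rank candidates by confidence, keep the top set number, drop those below a confidence threshold, and among heavily overlapping boxes keep only the most confident one. This is a minimal sketch in the spirit of the description; the function names, box format, and threshold values are assumptions, not parameters given in the disclosure.

```python
# Hypothetical sketch of confidence + overlap filtering (S303).
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def filter_detections(candidates, top_k=100, conf_thresh=0.5, iou_thresh=0.5):
    """candidates: list of (box, confidence). Returns the kept detection boxes."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    kept = []
    for box, conf in ranked:
        if conf < conf_thresh:
            continue  # below the confidence threshold: unlikely to be the object
        # keep only if it does not heavily overlap an already-kept, more
        # confident box (the overlap likely covers the same object)
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

Overlapping boxes above the overlap threshold are treated as covering the same object, so only the highest-confidence one survives.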
In many application scenarios, the object detection method proposed in the embodiments of the present disclosure needs to continuously acquire cabin images to be detected and perform detection on them, for example when detecting items left behind in a transportation scenario. In that case, an image acquisition component may be arranged inside the vehicle, such as a camera mounted in the vehicle and aimed at a set position, and the cabin image to be detected may then be acquired according to the following steps:
(1) Acquire the in-cabin video stream to be detected.
(2) From the consecutive frames of cabin images contained in the in-cabin video stream to be detected, extract the cabin images to be detected at intervals.
Exemplarily, when detecting items left behind in a transportation scenario, the in-cabin video stream to be detected may be a video stream captured by the image acquisition component aimed at the set position inside the vehicle, and the stream captured each second may contain multiple consecutive frames of cabin images. Considering that the interval between two adjacent frames is short, adjacent cabin images are highly similar. To improve detection efficiency, it is therefore proposed to extract frames at intervals from the consecutive cabin images to obtain the above-mentioned cabin images to be detected. For example, if the in-cabin video stream obtained in a certain period contains 1000 frames, extracting one frame out of every two yields 500 cabin images to be detected; detecting these 500 frames accomplishes the purpose of detecting items left behind in the cabin.
Here, the cabin images to be detected are extracted at intervals from the in-cabin video stream to be detected, which reduces the number of images that need to be detected and thereby improves detection efficiency.
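The interval extraction in steps (1) and (2) can be sketched as a simple generator. This is an illustrative sketch: the generator name and the frame iterator are assumptions, and only the stride of 2 is fixed by the 1000-to-500 example above.

```python
def sample_at_intervals(frame_iter, stride=2):
    """Yield every `stride`-th frame from an in-cabin frame iterator, so a
    1000-frame stream with stride 2 yields 500 cabin images to be detected."""
    for idx, frame in enumerate(frame_iter):
        if idx % stride == 0:
            yield frame
```

In practice `frame_iter` would be the decoded frames of the camera stream; any stride can be chosen to trade detection latency against compute.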
In another implementation, for S102 above, when performing target detection on the cabin image to be detected, the position information of the object to be detected in each frame of cabin image may also be tracked. As shown in FIG. 4, the following S401 to S404 are further included:
S401: Take each cabin image in the in-cabin video stream to be detected as an image to be tracked, and for each non-first-frame image to be tracked, determine the predicted position information of the object to be detected in that image based on the position information of the object to be detected in the previous image to be tracked and on the non-first-frame image itself.
When tracking the object to be detected, tracking may start from the second cabin image in the in-cabin video stream to be detected and proceed frame by frame. The position information of the object to be detected in the first cabin image may be determined by the target detection method described above. For example, object detection is performed in the above manner on the cabin images extracted at intervals, and the position information of the object to be detected in each of these images is determined respectively. Exemplarily, target detection is performed on the odd-numbered frames, such as the 1st, 3rd and 5th cabin images; when tracking the object to be detected in the 2nd cabin image, the predicted position information of the object in the 2nd cabin image can be determined based on its position information in the 1st cabin image and on the 2nd cabin image itself.
Specifically, the tracking may be performed by the target tracking network in a pre-trained neural network. For example, for the 1st and 2nd images to be tracked, the detection region of the object to be detected in the 1st image and the feature data of the feature points contained in that region are obtained, the detection region having corresponding coordinate information. The detection region, the feature data of the feature points it contains and the 2nd image to be tracked are then input into the target tracking network. Based on the coordinate information corresponding to the detection region in the 1st image, the network searches, within the local area of the 2nd image corresponding to that coordinate information, for a detection region whose feature-data similarity with the feature points of the original detection region exceeds a threshold. If such a region exists, it can be determined that the 2nd image to be tracked contains the object to be detected from the 1st image, and the position information of that object in the 2nd image is obtained, completing the tracking of the object to be detected.
Of course, if no detection region whose feature-data similarity with the feature points of the original detection region exceeds the threshold exists within the local area of the 2nd image corresponding to the coordinate information, it indicates that the 2nd image to be tracked does not contain the object to be detected from the 1st image at that location, and it can be determined that the object to be detected has moved.
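The similarity search within the local area, as just described, might be sketched as below. This is a hedged illustration: the patent does not specify the similarity measure, so cosine similarity over feature vectors is assumed here, and the function and parameter names are hypothetical.

```python
import numpy as np

def match_in_local_region(query_feat, candidate_feats, sim_thresh=0.8):
    """Search candidate detection regions (feature vectors extracted from
    the local area of the next frame) for one whose similarity with the
    query region's feature data exceeds the threshold. Returns the index
    of the best match, or None when no candidate passes the threshold,
    i.e. the object is taken to have moved out of the local area."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = [cos(query_feat, c) for c in candidate_feats]
    best = int(np.argmax(sims))
    return best if sims[best] >= sim_thresh else None
```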
S402: Determine whether the non-first-frame image to be tracked is a cabin image to be detected in which the object to be detected has been detected.
After the predicted position information of the object to be detected contained in the non-first-frame image to be tracked is obtained, the position information of the object in the next image to be tracked can be predicted based on its position information in the non-first-frame image.
Before that, it may first be determined whether the non-first-frame image to be tracked is a cabin image to be detected in which the object to be detected has been detected, so as to decide whether to correct the predicted position information of the object in the non-first-frame image based on the detected position information, and then track the position of the object in the next image to be tracked based on the corrected position information.
S403: When it is determined that the non-first-frame image to be tracked is a cabin image to be detected in which the object to be detected has been detected, take the detected position information as the position information of the object to be detected in that image. When it is determined that the non-first-frame image to be tracked is not such a cabin image, take the determined predicted position information as the position information of the object to be detected in that image.
When the non-first-frame image to be tracked is a cabin image in which the object has been detected, taking the detected position information as the object's position information in that image completes the correction of the predicted position information of the object in that image, and subsequent tracking of the object based on this position information can be more accurate.
If the non-first-frame image to be tracked is not a cabin image in which the object has been detected, tracking of the object's position in the next image to be tracked can continue based on the predicted position information of the object in the non-first-frame image. In this way, the position of the object to be detected in the cabin can be estimated at every moment, thereby improving tracking efficiency.
In the embodiments of the present disclosure, the non-first-frame image to be tracked can be tracked based on the position information of the object to be detected in the previous image to be tracked, determining the predicted position information of the object in the non-first-frame image; during tracking, the predicted position information can also be adjusted based on the detected position information. In this way, the efficiency and accuracy of tracking the object to be detected can be improved.
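The predict-then-correct logic of S401 to S403 can be sketched as a per-frame loop. This is an illustrative sketch under assumptions: detection output is modeled as `None` on frames where the detector was not run, and `predict` stands in for the (unspecified) position prediction of the tracking network.

```python
def track_positions(frame_detections, predict):
    """Maintain the object's position across frames: use the detector's
    output when this frame was a detected frame (correction), otherwise
    fall back to the position predicted from the previous frame.

    frame_detections: list where entry i is the detected position (x, y)
        for frame i, or None when detection was not run on that frame.
    predict: function mapping the previous position to a predicted one.
    """
    positions = []
    prev = None
    for det in frame_detections:
        if det is not None:      # detected frame: correct the track
            pos = det
        else:                    # undetected frame: use the prediction
            pos = predict(prev)
        positions.append(pos)
        prev = pos
    return positions
```

With the interval extraction above, odd-numbered frames carry detections and even-numbered frames are filled in by prediction, so a position estimate exists at every moment.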
In one implementation, the target detection on the cabin image to be detected proposed in the embodiments of the present disclosure is performed by a neural network, which is trained using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain the sample object to be detected.
Exemplarily, the target detection network in the neural network may be trained in the following manner, as shown in FIG. 5, specifically including S501 to S505:
S501: Acquire cabin sample images to be detected.
The cabin sample images to be detected here include cabin sample images containing the sample object to be detected, which may be recorded as positive sample images, and cabin sample images not containing the sample object to be detected, which may be recorded as negative sample images.
Considering that, when detecting items in an in-vehicle scene, left-behind items may appear in the cabin sample images as various color patches (for example, mobile phones and suitcases may be represented by rectangular color patches, and water cups by cylindrical color patches), some random color patches of non-target items may be added to the cabin sample images to represent objects that are not to be detected, so that the neural network can better distinguish which regions are objects to be detected and which are the in-vehicle background, such as seats and windows. By training the neural network to continuously distinguish real objects to be detected from non-real random color patches and the in-vehicle background, a neural network of higher accuracy is obtained.
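The random-color-patch augmentation described above might be sketched as follows; the rectangular patch shape, the size range, and the function name are illustrative assumptions, since the source does not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def add_random_patch(image, max_size=40):
    """Paste one random rectangular color patch (a stand-in for a
    non-target item) onto a cabin sample image of shape (H, W, 3), uint8.
    Returns a new image; the input is left unmodified."""
    h, w = image.shape[:2]
    ph = int(rng.integers(5, max_size))   # patch height
    pw = int(rng.integers(5, max_size))   # patch width
    y = int(rng.integers(0, h - ph))
    x = int(rng.integers(0, w - pw))
    color = rng.integers(0, 256, size=3, dtype=np.uint8)
    out = image.copy()
    out[y:y + ph, x:x + pw] = color
    return out
```

Such augmented negatives teach the network that a plausible-looking color blob is not automatically a left-behind item.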
S502: Perform feature extraction on the cabin sample images to be detected to obtain a first sample feature map corresponding to each of multiple channels, where the first sample feature map corresponding to each channel is a sample feature map in which the features of the sample object to be detected under the image feature category corresponding to that channel have been enhanced.
The process of performing feature extraction on the sample images to be detected to obtain the first sample feature map corresponding to each of the multiple channels is similar to the process, described above, of performing feature extraction on the cabin image to be detected to obtain the first feature map corresponding to each channel, and is not repeated here.
S503: For each channel, fuse the feature information of the first sample feature map corresponding to that channel with the first sample feature maps respectively corresponding to the other channels to obtain a fused second sample feature map.
The process of obtaining the fused second sample feature map based on the first sample feature maps is similar to the process, described above, of obtaining the fused second feature map based on the first feature maps, and is not repeated here.
S504: Based on the fused second sample feature map, predict the sample object to be detected in the cabin sample image to be detected.
The process of predicting the sample object to be detected in the cabin sample image based on the fused second sample feature map is similar to the process, described above, of detecting the object to be detected in the cabin image to be detected based on the fused second feature map, and is not repeated here.
S505: Based on the predicted sample object to be detected in the cabin sample image, the cabin sample images to be detected containing the sample to be detected and the cabin sample images to be detected not containing the sample to be detected, adjust the network parameter values of the neural network.
Here, a loss value for the predicted position information of the sample object to be detected in the cabin sample image is determined from the predicted position information, the cabin sample images containing the sample to be detected and the cabin sample images not containing the sample to be detected, and the network parameter values of the neural network are adjusted via this loss value. After multiple rounds of training, for example when the loss value becomes less than a set threshold, training can be stopped, thereby obtaining the trained neural network.
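The loss-threshold stopping rule of S505 can be sketched generically. The sketch below abstracts the network into a parameter value, a loss function and an update step; the toy quadratic loss and hand-written gradient step are illustrative assumptions, not the patent's actual detection loss.

```python
def train_until_converged(params, step, loss_fn, loss_thresh=1e-3, max_iters=10000):
    """Generic training skeleton matching S505: compute the loss from the
    predictions against positive/negative samples, adjust the parameters,
    and stop once the loss falls below the set threshold."""
    for _ in range(max_iters):
        loss = loss_fn(params)
        if loss < loss_thresh:
            break
        params = step(params, loss)
    return params

# Toy demonstration: fit a single parameter to minimize (p - 3)^2 with a
# hand-written gradient step (a stand-in for backpropagation on the network).
fitted = train_until_converged(
    params=0.0,
    step=lambda p, loss: p - 0.2 * (p - 3.0),
    loss_fn=lambda p: (p - 3.0) ** 2,
)
```

In the actual method, `loss_fn` would compare predicted positions against the labeled positive and negative cabin sample images, and `step` would be an optimizer update over the network's parameters.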
In addition, for the tracking process of the images to be detected mentioned above, the embodiments of the present disclosure further include a process of training the target tracking network in the neural network, which may be trained using the sample object to be detected, sample images to be tracked that contain the sample object to be detected and sample images to be tracked that do not contain the sample object to be detected.
The sample object to be detected here may refer to a sample object that needs to be tracked; for example, when detecting items in an in-vehicle scene, the sample object to be detected may be passenger items in various in-vehicle scenes.
Exemplarily, the target tracking network in the neural network may be trained in the following manner, as shown in FIG. 6, specifically including S601 to S603:
S601: Acquire sample images to be tracked and the sample object information corresponding to the sample object to be detected.
The sample images to be tracked here may refer to sample images in which the sample object to be detected needs to be tracked, and may include positive sample images containing the sample object to be detected and negative sample images not containing the sample object to be detected.
When training the target tracking network in the neural network, the detection-region image of the sample object to be detected and the sample image to be tracked may be input into the neural network simultaneously. The detection-region image contains the sample object information corresponding to the sample object to be detected, which may include the detection region of the object to be detected and the feature data of the feature points contained in that detection region.
Of course, likewise for item detection in the cabin scene, some random color patches of non-target objects may be added to the sample images to be tracked to represent objects that are not to be detected, so that the neural network can better distinguish which regions are objects to be detected and which are the in-vehicle background, such as seats and windows. By training the neural network to continuously distinguish real objects to be detected from non-real random color patches and the in-vehicle background, an accurate neural network for target tracking is obtained.
S602: Based on the sample object information to be detected and the sample image to be tracked, track the position of the sample object to be detected in the sample image and predict the position information of the sample object in the sample image.
Specifically, when tracking the sample object to be detected in sample images acquired continuously for the same area, the local area of the sample object in the sample image to be tracked may first be determined based on the detection region corresponding to the sample object in the sample object information, the local area being close to that detection region, so that the sample object can be located within the local area based on the feature data, thereby predicting the position information of the sample object in the sample image to be tracked.
S603: Based on the predicted position information of the sample object to be detected in the sample image to be tracked, the sample images to be tracked containing the sample object and the sample images to be tracked not containing the sample object, adjust the network parameter values of the neural network.
Here, a loss value for the position information of the sample object to be detected in the sample image to be tracked may be determined from the predicted position information, the sample images to be tracked containing the sample object and those not containing it. After multiple rounds of training, the network parameter values of the neural network are adjusted via this loss value; for example, training can be stopped when the loss value becomes less than a set threshold, thereby obtaining the target tracking network of the neural network.
In the training process of the target tracking network of the neural network provided by the embodiments of the present disclosure, the sample images to be tracked and the sample object information corresponding to the sample object to be detected are acquired, and the position of the sample object in the sample images to be tracked is tracked, so that the position of the sample object in the sample images can be determined quickly. Then, the network parameter values of the neural network are adjusted based on the predicted position information of the sample object in the sample images to be tracked, the sample images containing the sample object and those not containing it, thereby obtaining a neural network of higher accuracy, based on which the object to be detected can be tracked accurately.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same technical concept, the embodiments of the present disclosure further provide an object detection apparatus corresponding to the object detection method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above object detection method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Referring to FIG. 7, which is a schematic diagram of an object detection apparatus 700 provided by an embodiment of the present disclosure, the object detection apparatus 700 includes an image acquisition module 701, an image detection module 702 and a prompt module 703.
The image acquisition module 701 is configured to acquire a cabin image to be detected.
The image detection module 702 is configured to, in the case that the number of persons in the cabin decreases, perform target detection on the cabin image to be detected and determine whether an object to be detected exists in the cabin image to be detected.
The prompt module 703 is configured to issue prompt information in response to the duration of the state in which the object to be detected exists in the cabin image to be detected exceeding a preset duration.
In some embodiments of the present disclosure, the prompt module 703 being configured to issue prompt information in response to the duration of the state in which the object to be detected exists in the cabin image to be detected exceeding the preset duration includes:
in the case that the person leaving the cabin is a passenger, issuing first prompt information in response to the duration of the state in which the object to be detected exists in the cabin image to be detected exceeding a first preset duration, the first prompt information being used to remind the passenger of a left-behind item; and in the case that the person leaving the cabin is the driver, issuing second prompt information in response to the duration of that state exceeding a second preset duration, the second prompt information being used to remind the driver of a left-behind item.
In some embodiments of the present disclosure, the person leaving the cabin is the driver and/or a passenger; after the image detection module 702 determines that the object to be detected exists in the cabin image to be detected, and before the prompt module issues the prompt information, the image detection module 702 is further configured to:
determine the owner of the object to be detected according to the position of the object to be detected in the cabin, where the owner of the object to be detected is the driver and/or a passenger.
In some embodiments of the present disclosure, the image detection module 702 being configured to perform target detection on the cabin image to be detected includes:
performing feature extraction on the cabin image to be detected to obtain a first feature map corresponding to each of multiple channels, where the first feature map corresponding to each channel is a feature map in which the features of the object to be detected under the image feature category corresponding to that channel have been enhanced;
for each channel, fusing the feature information of the first feature map corresponding to that channel with the first feature maps respectively corresponding to the other channels to obtain a fused second feature map; and
detecting the object to be detected in the cabin image to be detected based on the fused second feature map.
In some embodiments of the present disclosure, the image detection module 702 being configured to, for each channel, fuse the feature information of the first feature map corresponding to that channel with the first feature maps respectively corresponding to the other channels to obtain the fused second feature map includes:
for the multiple first feature maps undergoing feature information fusion, determining the weight matrix corresponding to the multiple first feature maps; and
based on the weight matrix, performing a weighted summation on the feature information of the multiple first feature maps to obtain the second feature map containing the fused feature information.
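The weight-matrix fusion just described can be sketched numerically. This is an illustrative sketch only: in the actual network the weight matrix is learned, whereas here it is passed in as a fixed array, and the (C, H, W) layout and function name are assumptions.

```python
import numpy as np

def fuse_feature_maps(first_maps, weights):
    """Weighted summation of per-channel first feature maps into fused
    second feature maps.

    first_maps: array of shape (C, H, W), one first feature map per channel.
    weights:    array of shape (C, C); row i holds the weights applied to
                every channel when producing channel i's fused map (learned
                in the actual network, fixed here for illustration).
    """
    C = first_maps.shape[0]
    fused = np.empty_like(first_maps)
    for i in range(C):
        # Second feature map of channel i: weighted sum over all channels.
        fused[i] = np.tensordot(weights[i], first_maps, axes=(0, 0))
    return fused
```

With the identity matrix as `weights` the maps pass through unchanged; off-diagonal weights mix each channel's map with the other channels' feature information.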
In some embodiments of the present disclosure, the image detection module 702 being configured to detect the object to be detected in the cabin image to be detected based on the fused second feature map includes:
determining a set number of candidate regions based on the fused second feature map, each candidate region containing a set number of feature points;
determining, based on the feature data of the feature points contained in each candidate region, the confidence corresponding to that candidate region, where the confidence corresponding to each candidate region is used to characterize the degree of credibility that the candidate region contains the object to be detected; and
screening out, from the set number of candidate regions, the detection region corresponding to the object to be detected based on the confidence corresponding to each candidate region and the overlapping areas between different candidate regions, where the detection region is used to identify the position of the object to be detected in the cabin image to be detected.
In some embodiments of the present disclosure, the image acquisition module 701, being configured to acquire the cabin image to be detected, includes:
acquiring a cabin video stream to be detected;
extracting, at intervals, the cabin images to be detected from the consecutive frames of cabin images contained in the cabin video stream to be detected.
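Extracting frames at intervals can be sketched as a simple sampling generator; the interval value and the function name `sample_at_intervals` are illustrative assumptions, since the disclosure does not fix a specific sampling scheme.

```python
def sample_at_intervals(stream, interval):
    """Yield one frame out of every `interval` consecutive frames of a
    cabin video stream, so that detection runs on a subset of frames
    rather than on every frame."""
    for idx, frame in enumerate(stream):
        if idx % interval == 0:
            yield frame
```

Sampling one frame in three, for example, cuts detector load by two thirds while the tracking step (described below for non-first frames) bridges the skipped frames.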
In some embodiments of the present disclosure, the image detection module 702, being configured to perform target detection on the cabin image to be detected, further includes:
taking each cabin image in the cabin video stream to be detected as an image to be tracked, and for each image to be tracked other than the first frame, determining the predicted position information of the object to be detected in that image based on the position information of the object to be detected in the preceding image to be tracked and on that image itself;
determining whether that image to be tracked is a cabin image to be detected in which the object to be detected has been detected;
when it is, taking the detected position information as the position information of the object to be detected in that image;
when it is not, taking the determined predicted position information as the position information of the object to be detected in that image.
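The detection-or-prediction fallback above can be sketched as follows. This is a simplified model under stated assumptions: positions are plain tuples, `detections` holds the detector output per frame (`None` where the detector did not run or missed), and `predict` stands in for whatever motion model derives a position from the previous frame; none of these names come from the disclosure.

```python
def track_over_stream(detections, predict):
    """Return one position per frame of the stream.

    detections: per-frame detector output; detections[0] must be a
    position, later entries may be None (object not detected).
    predict: callable estimating the current position from the
    previous frame's position.
    """
    positions = [detections[0]]
    for det in detections[1:]:
        if det is not None:
            # detector found the object: use the detected position
            positions.append(det)
        else:
            # detector missed: fall back to the predicted position
            positions.append(predict(positions[-1]))
    return positions
```

This keeps a continuous trajectory even when detection only runs on sampled frames, matching the interval-extraction scheme described earlier.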
In some embodiments of the present disclosure, the object detection apparatus further includes a neural network training module 704, and the neural network training module 704 is configured to:
train the neural network that performs target detection on the cabin image to be detected, the neural network being trained using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain the sample object to be detected.
Corresponding to the object detection method in FIG. 1, an embodiment of the present disclosure further provides an electronic device 800. As shown in FIG. 8, a schematic structural diagram of the electronic device 800 provided by the embodiment of the present disclosure, the electronic device 800 includes:
a processor 81, a memory 82, and a bus 83. The memory 82 is configured to store execution instructions and includes an internal memory 821 and an external memory 822. The internal memory 821 temporarily stores operation data of the processor 81 as well as data exchanged with the external memory 822, such as a hard disk; the processor 81 exchanges data with the external memory 822 through the internal memory 821. When the electronic device 800 runs, the processor 81 and the memory 82 communicate via the bus 83, causing the processor 81 to execute any one of the object detection methods in the foregoing method embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, any one of the object detection methods in the foregoing method embodiments is executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
A computer program product of the object detection method provided by an embodiment of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute any one of the object detection methods in the foregoing method embodiments. For details, reference may be made to the foregoing method embodiments, which are not repeated here.
An embodiment of the present disclosure further provides a computer program that, when executed by a processor, implements any one of the methods in the foregoing embodiments. The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
Embodiments of the present disclosure provide an object detection method and apparatus, an electronic device, a storage medium, and a computer program. The object detection method includes: acquiring a cabin image to be detected; when the number of people in the cabin has decreased, performing target detection on the cabin image to be detected to determine whether an object to be detected is present in the cabin image; and issuing prompt information in response to the state in which the object to be detected is present in the cabin image lasting longer than a preset duration. In this way, when an item left behind by a person in the cabin is detected, a corresponding prompt can be given, reducing the probability of items being lost in the riding environment and improving the safety of items in the riding environment.
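The duration-gated prompting logic summarized above (with the role-dependent thresholds of the dependent claims) can be sketched as follows. The threshold values, the role strings, and the function name `check_left_item` are illustrative assumptions; the disclosure only specifies that the first and second preset durations exist, not their values.

```python
def check_left_item(first_seen_ts, now_ts, departed_role,
                    passenger_threshold=30.0, driver_threshold=60.0):
    """Decide whether to issue a left-item prompt.

    first_seen_ts / now_ts: timestamps (seconds) of when the object was
    first detected after `departed_role` left the cabin, and now.
    departed_role: 'passenger' or 'driver' (the person who left).
    Returns a prompt string once the object has persisted longer than
    the role's preset duration, otherwise None.
    """
    elapsed = now_ts - first_seen_ts
    threshold = (passenger_threshold if departed_role == 'passenger'
                 else driver_threshold)
    if elapsed > threshold:
        return 'prompt {}: item left behind'.format(departed_role)
    return None
```

Using a persistence window rather than prompting on the first detection filters out transient false positives, e.g. an item briefly occluded or picked up again within the preset duration.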

Claims (21)

  1. An object detection method, comprising:
    acquiring a cabin image to be detected;
    when the number of people in the cabin has decreased, performing target detection on the cabin image to be detected to determine whether an object to be detected is present in the cabin image to be detected;
    issuing prompt information in response to the state in which the object to be detected is present in the cabin image to be detected lasting longer than a preset duration.
  2. The object detection method according to claim 1, wherein issuing the prompt information in response to the state in which the object to be detected is present in the cabin image to be detected lasting longer than the preset duration comprises:
    when the person who has left the cabin is a passenger, issuing first prompt information in response to the state lasting longer than a first preset duration, the first prompt information being used to remind the passenger of an item left behind;
    when the person who has left the cabin is a driver, issuing second prompt information in response to the state lasting longer than a second preset duration, the second prompt information being used to remind the driver of an item left behind.
  3. The object detection method according to claim 1 or 2, wherein the person who has left the cabin is a driver and/or a passenger, and after it is determined that the object to be detected is present in the cabin image to be detected and before the prompt information is issued, the object detection method further comprises:
    determining, according to the position of the object to be detected in the cabin, the person to whom the object to be detected belongs, wherein the person to whom the object to be detected belongs is the driver and/or a passenger.
  4. The object detection method according to any one of claims 1 to 3, wherein performing target detection on the cabin image to be detected comprises:
    performing feature extraction on the cabin image to be detected to obtain a first feature map corresponding to each of multiple channels, the first feature map corresponding to each channel being a feature map obtained by enhancing the features of the object to be detected under the image feature category corresponding to that channel;
    for each of the channels, performing feature information fusion between the first feature map corresponding to that channel and the first feature maps corresponding to the other channels to obtain a fused second feature map;
    detecting, based on the fused second feature map, the object to be detected in the cabin image to be detected.
  5. The object detection method according to claim 4, wherein, for each of the channels, performing feature information fusion between the first feature map corresponding to that channel and the first feature maps corresponding to the other channels to obtain the fused second feature map comprises:
    determining a weight matrix corresponding to the multiple first feature maps on which feature information fusion is performed;
    performing, based on the weight matrix, a weighted summation over the feature information of the multiple first feature maps to obtain the second feature map containing the fused feature information.
  6. The object detection method according to claim 4, wherein detecting, based on the fused second feature map, the object to be detected in the cabin image to be detected comprises:
    determining a set number of candidate regions based on the fused second feature map, each candidate region containing a set number of feature points;
    determining, based on the feature data of the feature points contained in each candidate region, a confidence corresponding to that candidate region, the confidence being used to characterize how likely the candidate region is to contain the object to be detected;
    screening out, based on the confidence corresponding to each candidate region and the overlap between different candidate regions, the detection region corresponding to the object to be detected from the set number of candidate regions, the detection region being used to identify the position of the object to be detected in the cabin image to be detected.
  7. The object detection method according to claim 1, wherein acquiring the cabin image to be detected comprises:
    acquiring a cabin video stream to be detected;
    extracting, at intervals, the cabin images to be detected from the consecutive frames of cabin images contained in the cabin video stream to be detected.
  8. The object detection method according to claim 7, wherein performing target detection on the cabin image to be detected further comprises:
    taking each cabin image in the cabin video stream to be detected as an image to be tracked, and for each image to be tracked other than the first frame, determining the predicted position information of the object to be detected in that image based on the position information of the object to be detected in the preceding image to be tracked and on that image itself;
    determining whether that image to be tracked is a cabin image to be detected in which the object to be detected has been detected;
    when it is, taking the detected position information as the position information of the object to be detected in that image;
    when it is not, taking the determined predicted position information as the position information of the object to be detected in that image.
  9. The object detection method according to any one of claims 1 to 8, wherein the target detection on the cabin image to be detected is performed by a neural network;
    the neural network is trained using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain the sample object to be detected.
  10. An object detection apparatus, comprising:
    an image acquisition module configured to acquire a cabin image to be detected;
    an image detection module configured to, when the number of people in the cabin has decreased, perform target detection on the cabin image to be detected to determine whether an object to be detected is present in the cabin image to be detected;
    a prompt module configured to issue prompt information in response to the state in which the object to be detected is present in the cabin image to be detected lasting longer than a preset duration.
  11. The object detection apparatus according to claim 10, wherein the prompt module being configured to issue the prompt information in response to the state in which the object to be detected is present in the cabin image to be detected lasting longer than the preset duration comprises:
    when the person who has left the cabin is a passenger, issuing first prompt information in response to the state lasting longer than a first preset duration, the first prompt information being used to remind the passenger of an item left behind;
    when the person who has left the cabin is a driver, issuing second prompt information in response to the state lasting longer than a second preset duration, the second prompt information being used to remind the driver of an item left behind.
  12. The object detection apparatus according to claim 10 or 11, wherein the person who has left the cabin is a driver and/or a passenger, and after the image detection module determines that the object to be detected is present in the cabin image to be detected and before the prompt module issues the prompt information, the image detection module is further configured to:
    determine, according to the position of the object to be detected in the cabin, the person to whom the object to be detected belongs, wherein the person to whom the object to be detected belongs is the driver and/or a passenger.
  13. The object detection apparatus according to any one of claims 10 to 12, wherein the image detection module being configured to perform target detection on the cabin image to be detected comprises:
    performing feature extraction on the cabin image to be detected to obtain a first feature map corresponding to each of multiple channels, the first feature map corresponding to each channel being a feature map obtained by enhancing the features of the object to be detected under the image feature category corresponding to that channel;
    for each of the channels, performing feature information fusion between the first feature map corresponding to that channel and the first feature maps corresponding to the other channels to obtain a fused second feature map;
    detecting, based on the fused second feature map, the object to be detected in the cabin image to be detected.
  14. The object detection apparatus according to claim 13, wherein the image detection module being configured to perform, for each of the channels, feature information fusion between the first feature map corresponding to that channel and the first feature maps corresponding to the other channels to obtain the fused second feature map comprises:
    determining a weight matrix corresponding to the multiple first feature maps on which feature information fusion is performed;
    performing, based on the weight matrix, a weighted summation over the feature information of the multiple first feature maps to obtain the second feature map containing the fused feature information.
  15. The object detection apparatus according to claim 13, wherein the image detection module being configured to detect, based on the fused second feature map, the object to be detected in the cabin image to be detected comprises:
    determining a set number of candidate regions based on the fused second feature map, each candidate region containing a set number of feature points;
    determining, based on the feature data of the feature points contained in each candidate region, a confidence corresponding to that candidate region, the confidence being used to characterize how likely the candidate region is to contain the object to be detected;
    screening out, based on the confidence corresponding to each candidate region and the overlap between different candidate regions, the detection region corresponding to the object to be detected from the set number of candidate regions, the detection region being used to identify the position of the object to be detected in the cabin image to be detected.
  16. The object detection apparatus according to claim 10, wherein the image acquisition module being configured to acquire the cabin image to be detected comprises:
    acquiring a cabin video stream to be detected;
    extracting, at intervals, the cabin images to be detected from the consecutive frames of cabin images contained in the cabin video stream to be detected.
  17. The object detection apparatus according to claim 16, wherein the image detection module being configured to perform target detection on the cabin image to be detected further comprises:
    taking each cabin image in the cabin video stream to be detected as an image to be tracked, and for each image to be tracked other than the first frame, determining the predicted position information of the object to be detected in that image based on the position information of the object to be detected in the preceding image to be tracked and on that image itself;
    determining whether that image to be tracked is a cabin image to be detected in which the object to be detected has been detected;
    when it is, taking the detected position information as the position information of the object to be detected in that image;
    when it is not, taking the determined predicted position information as the position information of the object to be detected in that image.
  18. The object detection apparatus according to any one of claims 10 to 17, wherein the target detection on the cabin image to be detected is performed by a neural network;
    the neural network is trained using cabin sample images that contain the sample object to be detected and cabin sample images that do not contain the sample object to be detected.
  19. An electronic device, comprising a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate via the bus, and when the machine-readable instructions are executed by the processor, the object detection method according to any one of claims 1 to 9 is executed.
  20. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is run by a processor, the object detection method according to any one of claims 1 to 9 is executed.
  21. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the object detection method according to any one of claims 1 to 9.
PCT/CN2020/137919 2020-05-29 2020-12-21 Object detection method and apparatus, electronic device, storage medium and program WO2021238185A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021558015A JP7224489B2 (en) 2020-05-29 2020-12-21 Target detection method, device, electronic device, storage medium and program
KR1020217034510A KR20210149088A (en) 2020-05-29 2020-12-21 Object detection method, apparatus, electronic device, storage medium and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010477936.9 2020-05-29
CN202010477936.9A CN111652114B (en) 2020-05-29 2020-05-29 Object detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021238185A1 true WO2021238185A1 (en) 2021-12-02

Family

ID=72352686

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137919 WO2021238185A1 (en) 2020-05-29 2020-12-21 Object detection method and apparatus, electronic device, storage medium and program

Country Status (4)

Country Link
JP (1) JP7224489B2 (en)
KR (1) KR20210149088A (en)
CN (1) CN111652114B (en)
WO (1) WO2021238185A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036482A (en) * 2023-08-22 2023-11-10 北京智芯微电子科技有限公司 Target object positioning method, device, shooting equipment, chip, equipment and medium
CN117152890A (en) * 2023-03-22 2023-12-01 宁德祺朗科技有限公司 Designated area monitoring method, system and terminal

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626222A (en) * 2020-05-28 2020-09-04 深圳市商汤科技有限公司 Pet detection method, device, equipment and storage medium
CN111652114B (en) * 2020-05-29 2023-08-25 深圳市商汤科技有限公司 Object detection method and device, electronic equipment and storage medium
CN112818743B (en) * 2020-12-29 2022-09-23 腾讯科技(深圳)有限公司 Image recognition method and device, electronic equipment and computer storage medium
CN113313090A (en) * 2021-07-28 2021-08-27 四川九通智路科技有限公司 Abandoned person detection and tracking method for abandoned suspicious luggage
WO2023039781A1 (en) * 2021-09-16 2023-03-23 华北电力大学扬中智能电气研究中心 Method for detecting abandoned object, apparatus, electronic device, and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106560836A (en) * 2015-10-02 2017-04-12 LG Electronics Inc. Apparatus, Method And Mobile Terminal For Providing Object Loss Prevention Service In Vehicle
CN107585096A (en) * 2016-07-08 2018-01-16 Audi AG Anti-forgetting reminder system and method, and vehicle
CN108734056A (en) * 2017-04-18 2018-11-02 Shenzhen Futaihong Precision Industry Co., Ltd. Vehicle environment detection device and detection method
CN110659600A (en) * 2019-09-19 2020-01-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Object detection method, device and equipment
CN111652114A (en) * 2020-05-29 2020-09-11 深圳市商汤科技有限公司 Object detection method and device, electronic equipment and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
CN105894700A (en) * 2015-11-02 2016-08-24 Leauto Intelligent Technology (Beijing) Co., Ltd. Image-based in-vehicle moving object remote observing and warning device and method
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu USA LLC Systems and methods for end-to-end object detection
US10303961B1 (en) * 2017-04-13 2019-05-28 Zoox, Inc. Object detection and passenger notification
US11106927B2 (en) * 2017-12-27 2021-08-31 Direct Current Capital LLC Method for monitoring an interior state of an autonomous vehicle
US10628667B2 (en) 2018-01-11 2020-04-21 Futurewei Technologies, Inc. Activity recognition method using videotubes
CN109345510A (en) * 2018-09-07 2019-02-15 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method, device, equipment, storage medium and vehicle
JP7208480B2 (en) * 2018-10-12 2023-01-19 富士通株式会社 Learning program, detection program, learning device, detection device, learning method and detection method
CN110070566B (en) * 2019-04-29 2021-07-30 Wuhan Ruizhi Vision Technology Co., Ltd. Information detection method and device, computer equipment and readable storage medium
CN110610123A (en) * 2019-07-09 2019-12-24 Beijing University of Posts and Telecommunications Multi-target vehicle detection method and device, electronic equipment and storage medium
CN110807385B (en) * 2019-10-24 2024-01-12 Tencent Technology (Shenzhen) Co., Ltd. Target detection method, target detection device, electronic equipment and storage medium
CN111144404B (en) * 2019-12-06 2023-08-11 恒大恒驰新能源汽车科技(广东)有限公司 Method, apparatus, system, computer device and storage medium for detecting legacy object

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN106560836A (en) * 2015-10-02 2017-04-12 LG Electronics Inc. Apparatus, Method And Mobile Terminal For Providing Object Loss Prevention Service In Vehicle
CN107585096A (en) * 2016-07-08 2018-01-16 Audi AG Anti-forgetting reminder system and method, and vehicle
CN108734056A (en) * 2017-04-18 2018-11-02 Shenzhen Futaihong Precision Industry Co., Ltd. Vehicle environment detection device and detection method
CN110659600A (en) * 2019-09-19 2020-01-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Object detection method, device and equipment
CN111652114A (en) * 2020-05-29 2020-09-11 深圳市商汤科技有限公司 Object detection method and device, electronic equipment and storage medium

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN117152890A (en) * 2023-03-22 2023-12-01 宁德祺朗科技有限公司 Designated area monitoring method, system and terminal
CN117152890B (en) * 2023-03-22 2024-03-08 宁德祺朗科技有限公司 Designated area monitoring method, system and terminal
CN117036482A (en) * 2023-08-22 2023-11-10 北京智芯微电子科技有限公司 Target object positioning method, device, shooting equipment, chip, equipment and medium

Also Published As

Publication number Publication date
KR20210149088A (en) 2021-12-08
JP2022538201A (en) 2022-09-01
CN111652114B (en) 2023-08-25
JP7224489B2 (en) 2023-02-17
CN111652114A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
WO2021238185A1 (en) Object detection method and apparatus, electronic device, storage medium and program
US10769459B2 (en) Method and system for monitoring driving behaviors
CN102073841B (en) Poor video detection method and device
US20170068863A1 (en) Occupancy detection using computer vision
CN111439170B (en) Child state detection method and device, electronic equipment and storage medium
Zhang et al. Visual recognition of driver hand-held cell phone use based on hidden CRF
CN107862340A (en) A kind of model recognizing method and device
CN111325141B (en) Interactive relationship identification method, device, equipment and storage medium
CN102867188A (en) Method for detecting seat state in meeting place based on cascade structure
CN104077776B (en) A kind of visual background extracting method based on color space adaptive updates
US11023714B2 (en) Suspiciousness degree estimation model generation device
CN104182769A (en) Number plate detection method and system
JP2019106193A (en) Information processing device, information processing program and information processing method
Papakis et al. Convolutional neural network-based in-vehicle occupant detection and classification method using second strategic highway research program cabin images
CN112634558A (en) System and method for preventing removal of an item from a vehicle by an improper party
Mathew et al. Detecting new stable objects in surveillance video
JP6472504B1 (en) Information processing apparatus, information processing program, and information processing method
WO2024001617A1 (en) Method and apparatus for identifying behavior of playing with mobile phone
CN108985197B (en) Automatic detection method for taxi driver smoking behavior based on multi-algorithm fusion
CN112381068B (en) Method and system for detecting 'playing mobile phone' of person
CN115690883A (en) Method for obtaining target training sample set and related device
EP4089629A1 (en) Human detection device, human detection method, and recording medium
KR20200027078A (en) Method and apparatus for detecting object independently of size using convolutional neural network
CN116152790B (en) Safety belt detection method and device
CN113361340B (en) Feature prompting method, device and computer storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021558015

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217034510

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20937357

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.03.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20937357

Country of ref document: EP

Kind code of ref document: A1