CN116977698A - Target detection method, device, equipment, storage medium and program product - Google Patents


Info

Publication number: CN116977698A
Application number: CN202310456122.0A (filed by Tencent Technology Shenzhen Co Ltd)
Authority: CN (China)
Legal status: Pending
Prior art keywords: control parameter, candidate region, determining, prediction, candidate
Other languages: Chinese (zh)
Inventors: 马焕, 吴秉哲, 张长青
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd

Classifications

    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06N 3/0464: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/80: Image or video recognition or understanding using pattern recognition or machine learning; processing features in feature spaces; fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning; using neural networks

    (All codes fall under G, PHYSICS; G06, COMPUTING; CALCULATING OR COUNTING.)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method, apparatus, device, storage medium, and program product, which can be applied to target detection scenarios such as smart healthcare, autonomous driving, and driverless vehicles. The method comprises the following steps: determining at least one candidate region for a target object in an input image, and determining at least one distribution control parameter corresponding to the at least one candidate region, where a distribution control parameter is a parameter that controls the probability distribution obeyed by the region parameters of a candidate region; analyzing the reliability of each candidate region based on the distribution control parameter corresponding to that candidate region to obtain analysis information corresponding to each candidate region, the analysis information providing the influence of different factors on the reliability of the candidate region; and determining a corresponding detection result for the target object based on the at least one candidate region and the at least one piece of analysis information. According to the application, the detection accuracy of target detection can be improved.

Description

Target detection method, device, equipment, storage medium and program product
Technical Field
The present application relates to artificial intelligence technology, and in particular, to a target detection method, apparatus, device, storage medium, and program product.
Background
Target detection is an important branch of artificial intelligence technology and is widely applied in fields such as smart healthcare, autonomous driving, and driverless vehicles. With the development of deep learning, target detection is mostly realized with deep learning models. However, when a deep learning model is used for target detection, there is no analysis of the source of reliability of the prediction boxes; if the prediction box at some scale is wrong, the detection result of target detection is very likely to be affected. The related art therefore suffers from low detection accuracy in target detection.
Disclosure of Invention
The embodiments of the application provide a target detection method, apparatus, device, computer-readable storage medium, and computer program product, which can improve the detection accuracy of target detection.
The technical solutions of the embodiments of the application are realized as follows:
An embodiment of the application provides a target detection method, which comprises the following steps:
determining at least one candidate region for a target object in an input image, and determining at least one distribution control parameter corresponding to the at least one candidate region, wherein a distribution control parameter is a parameter for controlling the probability distribution obeyed by the region parameters of a candidate region;
analyzing the reliability of each candidate region based on the distribution control parameter corresponding to that candidate region to obtain analysis information corresponding to each candidate region, the analysis information being used to provide the influence of different factors on the reliability of the candidate region;
and determining a corresponding detection result for the target object based on the at least one candidate region and the at least one piece of analysis information.
An embodiment of the present application provides a target detection apparatus, which comprises:
an information determining module, configured to determine at least one candidate region for a target object in an input image and to determine at least one distribution control parameter corresponding to the at least one candidate region, wherein a distribution control parameter is a parameter for controlling the probability distribution obeyed by the region parameters of a candidate region;
a reliability analysis module, configured to analyze the reliability of each candidate region based on the distribution control parameter corresponding to that candidate region to obtain analysis information corresponding to each candidate region, the analysis information being used to provide the influence of different factors on the reliability of the candidate region;
and a result determining module, configured to determine a corresponding detection result for the target object based on the at least one candidate region and the at least one piece of analysis information.
In some embodiments of the present application, the information determining module is further configured to predict, at each of M scales, the N initial prediction regions where the target object is located in the input image and the N initial control parameters corresponding to the N initial prediction regions, wherein M and N are positive integers and N ≥ M; and to determine at least one candidate region and at least one distribution control parameter corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters.
In some embodiments of the application, the M scales include at least two scales; the information determining module is further configured to extract, from the N initial prediction regions, a prediction region to be fused for each scale; to fuse the at least two prediction regions to be fused into a scale fusion region jointly corresponding to the at least two scales and determine the scale fusion region as the at least one candidate region; and to fuse the at least two initial control parameters corresponding to the at least two prediction regions to be fused into a fusion control parameter corresponding to the scale fusion region and determine the fusion control parameter as the at least one distribution control parameter.
In some embodiments of the present application, the information determining module is further configured to screen out reliable prediction areas under each scale from N initial prediction areas according to the observed evidence quantity corresponding to each initial prediction area; and determining the reliable prediction area under each scale as at least one candidate area, and determining an initial control parameter corresponding to the reliable prediction area in N initial control parameters as at least one distribution control parameter.
In some embodiments of the application, the at least one candidate region comprises a plurality of candidate regions, and the at least one piece of analysis information comprises a plurality of pieces of analysis information; the result determining module is further configured to determine, according to each piece of analysis information, a corresponding degree of reliability for each candidate region, and to determine the candidate region with the highest degree of reliability among the candidate regions as the detection result corresponding to the target object.
In some embodiments of the application, the probability distribution includes: a normal inverse gamma distribution defined by a mean variable and a variance variable of the region parameter; the distribution control parameters include: a central control parameter, a variance control parameter, a first variable control parameter, and a second variable control parameter; the central control parameter is used for controlling the distribution center of the normal inverse gamma distribution, the variance control parameter is used for controlling the variance of the normal inverse gamma distribution, the first variable control parameter is used for controlling the concentration degree of variance variables in the normal inverse gamma distribution, and the second variable control parameter is used for controlling the concentration degree of mean variables of the normal inverse gamma distribution.
In some embodiments of the present application, the reliability analysis module is further configured to determine a random impact factor for each of the candidate regions based on the variance control parameter and the first variable control parameter; determining a cognitive influence factor for each of the candidate regions based on the variance control parameter, the first variable control parameter, and the second variable control parameter; and analyzing the reliability of each candidate region according to at least one of the random influence factor and the cognitive influence factor to obtain the analysis information corresponding to each candidate region.
In some embodiments of the present application, the reliability analysis module is further configured to perform a difference calculation for the first variable control parameter and a preset factor to obtain a variable difference; and determining a ratio between the variance control parameter and the variable difference as the random influence factor of each candidate region.
In some embodiments of the present application, the reliability analysis module is further configured to perform a difference calculation on the first variable control parameter and a preset factor to obtain a variable difference; to determine the product of the variable difference and the second variable control parameter as a variable product; and to determine the ratio of the variance control parameter to the variable product as the cognitive influence factor of each candidate region.
In some embodiments of the application, the object detection apparatus further comprises: a model training module, configured to predict, through an initial detection model, at each of M scales, a training prediction region where a training target object is located in a training image and a training control parameter corresponding to the training prediction region; to determine a loss value for each scale based on the training control parameter, the training prediction region, and the labeled region of the training target object at that scale; and to update the parameters of the initial detection model using the loss value of each scale until a model-training end condition is reached, thereby obtaining the information prediction model.
In some embodiments of the present application, the model training module is further configured to perform maximum likelihood estimation for each scale using the training control parameter, the training prediction region, and the labeled region of the training target object at that scale, and to determine the maximum likelihood estimate as the loss value of that scale.
An embodiment of the present application provides an electronic device, including:
a memory for storing computer executable instructions;
and the processor is used for realizing the target detection method provided by the embodiment of the application when executing the computer executable instructions stored in the memory.
The embodiment of the application provides a computer readable storage medium, which stores computer executable instructions for realizing the target detection method provided by the embodiment of the application when being executed by a processor.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions which, when executed by a processor, implement the object detection method provided by the embodiments of the present application.
The embodiments of the application have the following beneficial effects: in addition to determining candidate regions for the target object, the electronic device determines the distribution control parameters of the probability distributions obeyed by the region parameters of the candidate regions; it performs modeling analysis of the reliability of each candidate region through these distribution control parameters, obtaining analysis information that delineates the influence of different factors on the candidate region; finally, the analysis information guides the process of determining the final detection result from the candidate regions. This reduces the possibility that a wrongly predicted candidate region affects the final detection result, thereby improving the detection accuracy of target detection.
Drawings
FIG. 1 is a schematic diagram of an architecture of an object detection system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a structure of the server in FIG. 1 according to an embodiment of the present application;
FIG. 3 is a first flowchart of a target detection method according to an embodiment of the present application;
FIG. 4 is a second flowchart of a target detection method according to an embodiment of the present application;
FIG. 5 is a third flowchart of a target detection method according to an embodiment of the present application;
FIG. 6 is a fourth flowchart of a target detection method according to an embodiment of the present application;
FIG. 7 is a fifth flowchart of a target detection method according to an embodiment of the present application;
fig. 8 is a schematic diagram of a training process of an information prediction model according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", and the like are merely used to distinguish between similar objects and do not represent a particular ordering of the objects; it is understood that "first", "second", and the like may be interchanged, where permitted, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before describing the embodiments of the present application in further detail, the terms involved in the embodiments of the present application are explained; the following explanations apply to these terms as used herein.
1) Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
2) Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to recognize and measure targets, and performs further graphics processing, so that the processed result becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data.
3) Object Detection (Object Detection) refers to finding objects of interest in an image and determining the class and location of these objects; object detection is one of the core problems of computer vision technology.
4) Probability distribution refers to a law for expressing the probability of a random variable taking a value.
5) A control parameter is a parameter that controls the form and curve of a probability distribution; a specific probability distribution can be obtained by setting the control parameters. For example, the expectation and the variance are the two control parameters of a normal distribution: by adjusting the expectation and the variance, a particular normal distribution is obtained (see the sketch after this list).
6) Random uncertainty refers to the effect of noise present in an input image on prediction accuracy, and may also be referred to as data uncertainty.
7) The cognitive uncertainty refers to the influence on prediction accuracy caused by insufficient performance of the deep learning model, and may also be referred to as model uncertainty.
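As a concrete note on term 5), the following minimal sketch (using SciPy; the names and values are illustrative only, not part of the patent) shows how fixing the control parameters of a normal distribution pins down one specific distribution:

```python
from scipy.stats import norm

# The expectation (loc) and the standard deviation (scale, i.e. the square
# root of the variance) are the control parameters of a normal distribution:
# fixing them yields one specific distribution curve.
dist_a = norm(loc=0.0, scale=1.0)   # N(0, 1)
dist_b = norm(loc=2.0, scale=0.5)   # N(2, 0.25)
print(dist_a.pdf(0.0))  # density of N(0, 1) at 0
print(dist_b.pdf(2.0))  # density of N(2, 0.25) at its center
```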
With the research and advancement of artificial intelligence technology, artificial intelligence has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, driverless vehicles, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
Target detection is an important branch of artificial intelligence technology and is widely applied in fields such as smart healthcare, autonomous driving, and driverless vehicles. With the development of deep learning, target detection is mostly realized with deep learning models, where a deep learning model may include a series of convolution layers and prediction units. Such a model may be obtained as follows. Given an input image x and its labeling information y, the features of the input image x are extracted through the convolution layers; this process can be expressed as g = f(x; θ), where g denotes the convolution-layer features of the input image x. Based on the extracted features, the prediction unit predicts whether a target center exists in the image area it is responsible for. If no target center exists in the area, the classification score cl is constrained to approach 0, with constraint loss L_cl = (cl − 0)². If a target center exists, the constraint loss of the classification score cl is L_cl = (cl − 1)², and the Euclidean distance between the predicted target-center position pc and the actual target-center position c (measured from a given corner of the input image, such as the upper-left or upper-right corner) is computed, giving the loss L_c = (pc − c)². After target-center prediction is completed, the width pw and height ph of the target object are predicted and compared against the ground-truth box, giving the loss L_wh = (pw − w)² + (ph − h)². Finally, the total loss L = L_cl + L_c + L_wh is used to optimize the parameters θ of the deep learning model. In the forward reasoning process, i.e., when performing target detection on an unlabeled input image, target-center prediction and target width and height prediction are performed on the unlabeled input image to obtain the prediction result.
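As a minimal sketch of the loss structure just described (hypothetical scalar inputs; a sketch, not the patent's implementation):

```python
def detection_loss(cl, pc, pw, ph, has_center, c, w, h):
    """Single-scale loss L = L_cl + L_c + L_wh described above.

    cl: predicted classification score; pc: predicted target-center position;
    pw, ph: predicted width and height; has_center: whether a target center
    lies in the image area; c, w, h: ground-truth center, width, and height.
    """
    if not has_center:
        return (cl - 0.0) ** 2                # constrain the score toward 0
    l_cl = (cl - 1.0) ** 2                    # constrain the score toward 1
    l_c = (pc - c) ** 2                       # target-center position loss
    l_wh = (pw - w) ** 2 + (ph - h) ** 2      # box width/height loss
    return l_cl + l_c + l_wh
```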
However, in actual target detection tasks the sizes of target objects are not uniform, i.e., target objects appear at different scales, and a deep learning model obtained by the above method is then not stable. In the related art, a multi-scale collaborative prediction method is therefore introduced into the deep learning model, which is then obtained as follows. Given an input image x and its labeling information y, the input image x is divided into 3 different scales according to preset scale conditions, and features are extracted with the convolution layers; this process can be expressed as g_k = f(x; θ), k = 1, 2, 3, where g_k denotes the convolution-layer features of the k-th scale of the input image x. The prediction unit predicts the target center in the image area of each scale; if no target center is present, the classification score is still constrained to approach 0, with constraint loss L_cl = (cl − 0)². If a target center exists, the prediction unit predicts the bounding box at each scale, computing the loss L_c for the target-center position and the loss L_wh for the box. Finally, the total loss L = L_cl + L_c + L_wh of each scale is used to optimize the parameters θ of the deep learning model. In the forward reasoning process, the deep learning model constructs a coincident prediction box by combining the scales and feeds back a final prediction box as the final detection result.
As is apparent from the above description, although the related art can perform single-scale or multi-scale target detection with a deep learning model, there is no modeling analysis of the uncertainty (e.g., cognitive uncertainty and random uncertainty) of the prediction box at each scale, and hence no analysis of the source of its reliability; if the prediction box at some scale is wrong, the detection result of target detection is very likely to be affected. The related art therefore suffers from low detection accuracy in target detection.
The embodiment of the application provides a target detection method, a device, equipment, a computer readable storage medium and a computer program product, which can improve the detection accuracy of target detection. The following describes exemplary applications of the electronic device provided by the embodiments of the present application, where the electronic device provided by the embodiments of the present application may be implemented as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), a vehicle-mounted terminal, or any other type of terminal, and may also be implemented as a server. In the following, an exemplary application when the electronic device is implemented as a server will be described.
Referring to fig. 1, fig. 1 is a schematic diagram of an architecture of an object detection system according to an embodiment of the present application. To enable support for an object detection application, in the object detection system 100, terminals (terminal 400-1 and terminal 400-2 are illustratively shown) are connected to the server 200 via a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both. In the object detection system 100, a database 500 is also provided for providing data support to the server 200. The database 500 may be independent of the server 200 or may be provided in the server 200. Fig. 1 shows a case where the database 500 is independent of the server 200.
The terminal 400-1 and the terminal 400-2 are respectively used for calling the image acquisition device, shooting images of the scene where the image acquisition device is located, and transmitting the shot images as input images to the server 200 through the network 300.
The server 200 is configured to determine at least one candidate region in the input image for the target object, and determine at least one distribution control parameter corresponding to the at least one candidate region, where the distribution control parameter is a parameter for controlling a probability distribution to which the region parameter of the candidate region is subjected; analyzing the reliability of each candidate region based on the distribution control parameter corresponding to each candidate region to obtain analysis information corresponding to each candidate region; and determining a corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information, and issuing the detection result to the terminal 400-1 and the terminal 400-2.
The terminals 400-1 and 400-2 display the input image and the detection result of the target object in the graphic interfaces 410-1 and 410-2, respectively, for the user to view.
The embodiment of the application can be realized by means of Cloud Technology (Cloud Technology), wherein the Cloud Technology refers to a hosting Technology for integrating serial resources such as hardware, software, network and the like in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud computing is a general term for network technology, information technology, integration technology, management platform technology, application technology, and the like based on the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: the background services of a technical network system require a large amount of computing and storage resources, which need to be realized through cloud computing.
The server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms. The terminals 400-1 and 400-2 may be smart phones, tablet computers, notebook computers, desktop computers, smart speakers, smart watches, smart home appliances, car terminals, etc., but are not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of the server (an implementation of an electronic device) in fig. 1 according to an embodiment of the present application, and the server 200 shown in fig. 2 includes: at least one processor 210, a memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by bus system 240. It is understood that the bus system 240 is used to enable connected communications between these components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 240 in fig. 2.
The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual displays, that enable presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 250 optionally includes one or more storage devices physically located remote from processor 210.
Memory 250 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM, Read-Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 250 described in embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 251 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
network communication module 252 for reaching other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 include: bluetooth, wireless compatibility authentication (WiFi), and universal serial bus (USB, universal Serial Bus), etc.;
A presentation module 253 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the object detection device provided in the embodiments of the present application may be implemented in software, and fig. 2 shows the object detection device 255 stored in the memory 250, which may be software in the form of a program, a plug-in, or the like, including the following software modules: information determination module 2551, reliability resolution module 2552, result determination module 2553, and model training module 2554 are logical, and thus may be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be described hereinafter.
In other embodiments, the object detection device provided by the embodiments of the present application may be implemented in hardware. By way of example, it may be a processor in the form of a hardware decoding processor programmed to perform the object detection method provided by the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may employ one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.
In some embodiments, the terminal or the server (both possible implementations of the electronic device) may implement the target detection method provided by the embodiments of the present application by running a computer program. For example, the computer program may be a native program or a software module in an operating system; it may be a native (Native) application (APP), i.e., a program that must be installed in the operating system to run, such as a personnel behavior detection APP or a smart doorbell APP; it may be an applet, i.e., a program that only needs to be downloaded into a browser environment to run; or it may be an applet that can be embedded in any APP. In general, the computer program described above may be any form of application, module, or plug-in.
The method and apparatus of the present application can be applied to target detection scenarios such as smart healthcare, autonomous driving, and driverless vehicles. In the following, an exemplary application and implementation of the electronic device provided by the embodiment of the present application will be described.
Referring to fig. 3, fig. 3 is a first flowchart of a target detection method according to an embodiment of the present application; the steps illustrated in fig. 3 will be described below.
S101, determining at least one candidate region for a target object in an input image, and determining at least one distribution control parameter corresponding to the at least one candidate region.
The embodiment of the application is implemented in a scenario of performing target detection on an input image, i.e., determining the region where a target object is located in the input image. The input image may be received by the electronic device from another terminal, or obtained from the electronic device's own storage space or from a database. After obtaining the input image, the electronic device first determines the regions in which the target object may appear; each such region is a candidate for the detection result of the target object, so the electronic device obtains at least one candidate region. Meanwhile, the electronic device determines the distribution control parameter corresponding to each candidate region, thereby obtaining at least one distribution control parameter; that is, the number of distribution control parameters is the same as the number of candidate regions.
The distribution control parameter is a parameter for controlling a probability distribution to which the region parameter of the candidate region is subjected. The region parameters of the candidate region may include a center position of the candidate region, or may include frame information such as a width and a height of the candidate region, or may include coordinates of vertices of the candidate region, etc., which are not limited herein.
The probability distribution to which the regional parameters are subjected is used to determine the probability of occurrence of the regional parameters. The probability distribution in the embodiment of the present application may be a normal distribution, a gamma distribution, or a normal inverse gamma distribution, which is not limited herein.
In the embodiment of the present application, the distribution control parameter is a generic term for all parameters capable of controlling the form of the probability distribution, and is not limited to a single parameter. For example, the expectation and the variance are collectively referred to as the distribution control parameter of a normal distribution. In other words, in an embodiment of the present application, the distribution control parameter may include one or more parameters.
In the embodiment of the application, the target object may be an object such as an automobile or a pedestrian, in which case the input image may be an image of a road, a square, or a similar place; the target object may also be a lesion, an organ, or the like in the medical field, in which case the input image may be a medical image.
Referring to fig. 4, fig. 4 is a second flowchart of the target detection method according to the embodiment of the present application. In some embodiments of the present application, S101 in fig. 3, that is, the process of determining at least one candidate region in the input image for the target object and determining at least one distribution control parameter corresponding to the at least one candidate region, may be implemented by S1011-S1012 as follows:
S1011, predicting N initial prediction areas where the target object is located in the input image and N initial control parameters corresponding to the N initial prediction areas under M scales.
At each of the M scales, the electronic device predicts the region where the target object is located in the input image and the control parameters of the probability distribution obeyed by the parameters of that region, obtaining at least one initial prediction region and at least one initial control parameter. When region prediction has been completed for all M scales, the electronic device obtains N initial prediction regions and N initial control parameters, where M and N are positive integers and N ≥ M. That is, the number of initial prediction regions is greater than or equal to the number of scales at which region prediction is required.
In some embodiments of the present application, an electronic device may read in an input image through an information prediction model capable of performing region prediction and parameter prediction simultaneously, so as to perform region prediction for a target object under each scale through the information prediction model, obtain N initial prediction regions, and synchronously predict control parameters of probability distribution obeyed by the regional parameters of each initial prediction region, so as to obtain initial control parameters corresponding to each initial prediction region.
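As an illustrative sketch of such an information prediction model head (a hypothetical PyTorch module; the patent does not specify an architecture, and the four normal inverse gamma control parameters assumed here are the ones introduced in the embodiments below):

```python
import torch
from torch import nn
import torch.nn.functional as F

class EvidentialBoxHead(nn.Module):
    """Sketch of a head that predicts, per spatial location, one region
    parameter (its point estimate gamma) together with the remaining
    normal inverse gamma control parameters (nu, alpha, beta). A real
    detector would repeat this for every region parameter (center,
    width, height, ...)."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 4, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        gamma, raw_nu, raw_alpha, raw_beta = self.conv(feats).unbind(dim=1)
        nu = F.softplus(raw_nu)               # nu > 0
        alpha = F.softplus(raw_alpha) + 1.0   # alpha > 1, so beta / (alpha - 1) exists
        beta = F.softplus(raw_beta)           # beta > 0
        return gamma, nu, alpha, beta
```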
In other embodiments of the present application, the electronic device may instead read in the input image through a parameter prediction model capable only of parameter prediction, so as to obtain, at each scale, the control parameters of the probability distribution obeyed by the region parameters predicted for the target object; it then determines a corresponding image area according to the control parameters (for example, by table look-up, or by directly extracting from the control parameters the parameter that controls the center position of the probability distribution, this parameter likewise controlling the target center and the position of the bounding box) and uses that image area as an initial prediction region. In this way, the electronic device obtains the N initial prediction regions and the N initial control parameters.
S1012, determining at least one candidate area and at least one distribution control parameter corresponding to the at least one candidate area based on the N initial prediction areas and the N initial control parameters.
After obtaining the N initial prediction regions and the N initial control parameters, the electronic device combines the initial prediction regions and the initial control parameters to determine at least one candidate region for the target object, together with at least one distribution control parameter corresponding one-to-one with the at least one candidate region.
Referring to fig. 5, fig. 5 is a third flowchart of the target detection method according to an embodiment of the present application. In some embodiments of the application, the M scales include at least two scales; thus, S1012 in fig. 4, that is, the process of determining at least one candidate region and at least one distribution control parameter corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters, may be implemented through S1012a-S1012c as follows:
S1012a, extracting the prediction areas to be fused from the N initial prediction areas according to each scale.
For each of the at least two scales, the electronic device extracts a region from the N initial prediction regions; the extracted region is to be fused with the regions extracted at the other scales, so that a prediction region to be fused is obtained for each scale.
It should be noted that, when each scale contains more than one initial prediction region, the electronic device may screen the prediction regions to be fused in only one round per scale, so that only one round of region fusion is performed for each scale; in that case, the electronic device randomly selects one prediction region to be fused for each scale from the initial prediction regions at that scale. The electronic device may also screen prediction regions to be fused over multiple rounds for each scale, thereby performing multiple rounds of region fusion; in that case, in each round the electronic device selects, for each scale, a prediction region to be fused from the initial prediction regions that have not yet been selected at that scale.
S1012b, fusing at least two prediction areas to be fused to obtain a scale fusion area which corresponds to at least two scales together, and determining the scale fusion area as at least one candidate area.
The electronic device may directly perform weighted fusion of the at least two prediction regions to be fused according to set weights, obtaining a scale fusion region. Alternatively, the electronic device may select, from the initial control parameters corresponding to each prediction region to be fused, the parameter that characterizes the observed evidence quantity of that region (the observed evidence quantity of a prediction region to be fused is the observed evidence quantity of its initial prediction region), and then use the normalized values of these parameters as weights for the weighted fusion of the prediction regions to be fused of the at least two scales, obtaining the scale fusion region. After one or more rounds of region fusion, the electronic device determines all the obtained scale fusion regions as candidate regions, thereby obtaining at least one candidate region.
It can be understood that the region prediction results of the target object in a plurality of scales are integrated in the scale fusion region, so that adverse effects of various factors on the candidate region during region prediction can be reduced, and the candidate region is more accurate.
S1012c, fusing at least two initial control parameters corresponding to at least two prediction areas to be fused to obtain a fused control parameter corresponding to the scale fusion area, and determining the fused control parameter as at least one distribution control parameter.
For the prediction region to be fused at each scale, the electronic device extracts the corresponding initial control parameter from all the initial control parameters, obtaining at least two initial control parameters corresponding to the at least two prediction regions to be fused; it then fuses these control parameters to obtain a fusion control parameter, which corresponds to the scale fusion region, and directly uses the fusion control parameters obtained in this way as the at least one distribution control parameter corresponding to the at least one candidate region.
It should be noted that, when the initial control parameters each include a plurality of different parameters, the electronic device may complete parameter fusion by summing each parameter correspondingly across the different scales, or by summing some parameters correspondingly while adding a tail term to the others (for example, adding a preset value, or a value determined from other parameters).
The electronic equipment finishes the process of determining the candidate areas and the distribution control parameters thereof based on multi-scale fusion.
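A minimal sketch of this multi-scale fusion, assuming the evidence-related weight is taken from the ν control parameter and that control parameters are fused by corresponding summation (both choices are assumptions for illustration; the patent leaves them open):

```python
import numpy as np

def fuse_scales(boxes, params):
    """Fuse per-scale predictions into one scale fusion region.

    boxes: one (cx, cy, w, h) array-like per scale (the prediction regions
    to be fused); params: one dict per scale with keys 'gamma', 'nu',
    'alpha', 'beta' (the initial control parameters).
    """
    evidence = np.array([p["nu"] for p in params], dtype=float)  # assumed evidence term
    weights = evidence / evidence.sum()        # normalized evidence weights
    fused_box = sum(wt * np.asarray(b, dtype=float) for wt, b in zip(weights, boxes))
    # Fuse control parameters by corresponding summation (one of the options
    # described above; a tail term could be added here instead).
    fused_params = {k: sum(p[k] for p in params) for k in ("gamma", "nu", "alpha", "beta")}
    return fused_box, fused_params
```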
Referring to fig. 6, fig. 6 is a fourth flowchart of the target detection method according to an embodiment of the present application. In some embodiments of the present application, S1012 in fig. 4, that is, the process of determining at least one candidate region and at least one distribution control parameter corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters, may also be implemented through S1012d-S1012e as follows:
S1012d, screening and obtaining reliable prediction areas under each scale from the N initial prediction areas according to the observation evidence quantity corresponding to each initial prediction area.
The electronic device may determine a corresponding observed evidence quantity for each initial prediction region, and then either compare each observed evidence quantity with an evidence threshold and keep the initial prediction regions whose observed evidence quantity exceeds the threshold, or sort the observed evidence quantities of all initial prediction regions and keep the initial prediction regions whose observed evidence quantities are at the head of the sequence (e.g., the first 3 or the first 1), obtaining the reliable prediction regions. The electronic device performs this process on the initial prediction regions at each scale, thereby obtaining the reliable prediction regions at each scale.
The observed evidence quantity corresponding to each initial prediction region is the amount of information obtained from the features of the input image when that initial prediction region was determined. Among the initial control parameters there are parameters associated with the observed evidence quantity, from which the electronic device can determine the observed evidence quantity of each initial prediction region (for example, by table look-up, or by directly using the parameter as the observed evidence quantity).
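A sketch of this screening step, assuming the ν control parameter serves as the observed evidence quantity (an assumption; the patent only says some initial control parameter is associated with it):

```python
def screen_reliable(regions, params, threshold=None, top_k=1):
    """Keep regions whose observed evidence exceeds a threshold, or the
    top-k regions by evidence when no threshold is given."""
    evidence = [p["nu"] for p in params]  # assumed evidence-related parameter
    if threshold is not None:
        keep = [i for i, e in enumerate(evidence) if e > threshold]
    else:
        keep = sorted(range(len(evidence)),
                      key=lambda i: evidence[i], reverse=True)[:top_k]
    return [regions[i] for i in keep], [params[i] for i in keep]
```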
S1012e, determining the reliable prediction area under each scale as at least one candidate area, and determining an initial control parameter corresponding to the reliable prediction area in N initial control parameters as at least one distribution control parameter.
The electronic device determines the reliably predicted regions screened out for each scale as candidate regions, and thus, at least one candidate region can be obtained. Meanwhile, the electronic equipment screens initial control parameters corresponding to the reliable prediction areas from N initial control parameters, and determines the initial control parameters obtained by screening as distribution control parameters corresponding to each candidate area. In this way, the electronic device obtains at least one distributed control parameter.
It can be understood that the observed evidence quantity reflects, to some extent, whether an initial prediction region is reliable, and prediction regions of low reliability are not worth further reliability analysis. Therefore, in the embodiment of the application, the initial prediction regions are first screened according to the observed evidence quantity, which eliminates unreliable initial prediction regions and reduces the number of candidate regions whose reliability needs to be analyzed later.
The electronic device completes the process of determining candidate areas and corresponding distributed control parameters based on area screening.
In other embodiments of the present application, S101 in fig. 3, i.e., the process of determining at least one candidate region for the target object in the input image and determining at least one distribution control parameter corresponding to the at least one candidate region, may also be implemented as follows: performing region prediction for the target object in the input image with a region detection model to obtain at least one candidate region, computing the similarity between each candidate region and preset regions, and using the control parameter of the region parameters of the preset region with the greatest similarity as the distribution control parameter of that candidate region.
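A sketch of this alternative, assuming intersection-over-union (IoU) as the similarity measure (the patent does not fix one) and boxes given as corner coordinates:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def assign_control_params(candidate, preset_regions, preset_params):
    """Copy the control parameter of the most similar preset region."""
    best = max(range(len(preset_regions)),
               key=lambda i: iou(candidate, preset_regions[i]))
    return preset_params[best]
```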
S102, analyzing the reliability of each candidate region based on the distribution control parameters corresponding to each candidate region, and obtaining analysis information corresponding to each candidate region.
After obtaining the candidate regions and their corresponding distribution control parameters, the electronic device can analyze the reliability of each candidate region using the distribution control parameter corresponding to that candidate region. Reliability here, i.e., the uncertainty of the candidate region, accounts for the degree to which the candidate region can be trusted. The analysis information obtained by the electronic device can explain the source of the uncertainty of the candidate region, namely whether the uncertainty is influenced by noise in the input image or by the performance of the deep learning model that predicted the candidate region, together with the corresponding degree of influence. Thus, the analysis information is used to provide the influence of different factors on the reliability of the candidate region.
When probability distributions are different, the distribution control parameters corresponding to the candidate regions are also different, and thus, the manner of determining the analysis information corresponding to the candidate regions according to the distribution control parameters is also different.
In some embodiments of the application, the probability distribution includes: a normal inverse gamma distribution defined by a mean variable and a variance variable of the region parameters, in which case the distribution control parameters include: a central control parameter, a variance control parameter, a first variable control parameter, and a second variable control parameter.
The central control parameter is used for controlling the distribution center of the normal inverse gamma distribution, the variance control parameter is used for controlling the variance of the normal inverse gamma distribution, the first variable control parameter is used for controlling the concentration degree of variance variables in the normal inverse gamma distribution, and the second variable control parameter is used for controlling the concentration degree of mean variables of the normal inverse gamma distribution.
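For reference, a standard parameterization of this distribution from the evidential deep learning literature is shown below; mapping the patent's terms onto it (central control parameter γ, second variable control parameter ν, first variable control parameter α, variance control parameter β) is an assumption, with μ the mean variable and σ² the variance variable:

```latex
\mathrm{NIG}(\mu,\sigma^{2}\mid\gamma,\nu,\alpha,\beta)
  = \mathcal{N}\!\left(\mu \,\middle|\, \gamma,\tfrac{\sigma^{2}}{\nu}\right)
    \cdot \operatorname{Inv-Gamma}\!\left(\sigma^{2}\mid\alpha,\beta\right)
  = \frac{\beta^{\alpha}\sqrt{\nu}}{\Gamma(\alpha)\sqrt{2\pi\sigma^{2}}}
    \left(\sigma^{2}\right)^{-(\alpha+1)}
    \exp\!\left(-\frac{2\beta+\nu(\mu-\gamma)^{2}}{2\sigma^{2}}\right)
```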
Based on this, referring to fig. 7, fig. 7 is a fifth flowchart of the target detection method according to an embodiment of the present application. In some embodiments of the present application, S102 in fig. 3, that is, the process of analyzing the reliability of each candidate region based on the distribution control parameter corresponding to each candidate region to obtain the analysis information corresponding to each candidate region, may be implemented through S1021-S1023 as follows:
S1021, determining a random influence factor of each candidate region based on the variance control parameter and the first variable control parameter.
The electronic device extracts the variance control parameter and the first variable control parameter from the distribution control parameters, estimates the random uncertainty of each candidate region based on these two parameters, and determines the obtained factor as the random influence factor. The random influence factor is thus used to characterize the random uncertainty of the candidate region (random uncertainty describes the inherent noise in the input image, which causes unavoidable errors and cannot be attenuated by adding input data), i.e., the degree to which noise in the input image influenced the candidate region at prediction time.
In some embodiments, the specific process of determining the random influence factor of each candidate region based on the variance control parameter and the first variable control parameter may be implemented as follows: performing a difference calculation on the first variable control parameter and a preset factor to obtain a variable difference; and determining the ratio between the variance control parameter and the variable difference as the random influence factor of each candidate region.
The value of the preset factor may be set according to the actual situation, for example, to 1 or to 3, which is not limited herein. If the preset factor is set to 1, the above calculation is actually the calculation of the expectation of the variance variable of the normal inverse gamma distribution, and the resulting random influence factor is therefore essentially the expectation of the variance variable of the normal inverse gamma distribution.
By way of example, the embodiment of the present application provides a formula for determining the random influence factor; see formula (1):

E[σ²] = β / (α − 1)    (1)

where β is the variance control parameter, α is the first variable control parameter, 1 is the preset factor, and E[σ²] is the random influence factor, that is, the expectation of the variance variable.
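For illustration, a minimal Python sketch of formula (1) follows; the function name and the plain-float interface are assumptions of this sketch, not part of the patent.

```python
def random_influence_factor(beta: float, alpha: float, preset: float = 1.0) -> float:
    """Formula (1): ratio of the variance control parameter to the variable
    difference; with preset = 1 this equals E[sigma^2] = beta / (alpha - 1)."""
    variable_difference = alpha - preset
    return beta / variable_difference
```

For example, `random_influence_factor(beta=1.5, alpha=3.0)` returns 0.75, the estimated random uncertainty for that candidate region under these illustrative parameter values.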
In other embodiments, the specific process of determining the random influence factor of each candidate region based on the variance control parameter and the first variable control parameter may be implemented as follows: looking up a random factor table according to the variance control parameter and the first variable control parameter (the table records the value of the random factor for different combinations of the two parameters), and determining the factor found as the random influence factor of each candidate region.
S1022, determining the cognitive influence factor of each candidate region based on the variance control parameter, the first variable control parameter and the second variable control parameter.
The electronic device extracts the variance control parameter, the first variable control parameter and the second variable control parameter from the distribution control parameters, estimates the cognitive uncertainty of each candidate region by combining the three parameters, and determines the obtained cognitive uncertainty as the cognitive influence factor of each candidate region. (Cognitive uncertainty is caused by poor model performance, which may result from inadequate model training, insufficient training data and similar reasons; it is unrelated to any single input image and can be alleviated by targeted adjustment.)
In some embodiments, determining the cognitive influence factor of each candidate region based on the variance control parameter, the first variable control parameter and the second variable control parameter may be accomplished as follows: performing a difference calculation on the first variable control parameter and a preset factor to obtain a variable difference; determining the product of the variable difference and the second variable control parameter as a variable product; and determining the ratio of the variance control parameter to the variable product as the cognitive influence factor of each candidate region.
The value of the preset factor may be set according to the actual situation, which is not limited in the embodiment of the present application. If the preset factor is set to 1, the above calculation is actually the calculation of the variance of the mean variable of the normal inverse gamma distribution, and the obtained cognitive influence factor is therefore essentially the variance of the mean variable of the normal inverse gamma distribution.
For example, the embodiment of the present application provides a calculation formula for the cognitive influence factor; see formula (2):

Var[μ] = β / (γ(α − 1))    (2)

where β is the variance control parameter, α is the first variable control parameter, γ is the second variable control parameter, 1 is the preset factor, and Var[μ] is the cognitive influence factor, that is, the variance of the mean variable.
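Correspondingly, a minimal Python sketch of formula (2); again the function name and float interface are illustrative assumptions.

```python
def cognitive_influence_factor(beta: float, alpha: float, gamma: float,
                               preset: float = 1.0) -> float:
    """Formula (2): ratio of the variance control parameter to the variable
    product; with preset = 1 this equals Var[mu] = beta / (gamma * (alpha - 1))."""
    variable_product = (alpha - preset) * gamma
    return beta / variable_product
```

Note that the two factors differ only by the extra division by γ, which is why larger evidence about the mean (larger γ) lowers the cognitive uncertainty but leaves the random uncertainty unchanged.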
In other embodiments, determining the cognitive influence factor of each candidate region based on the variance control parameter, the first variable control parameter and the second variable control parameter may also be accomplished as follows: screening a plurality of preset factors to obtain a plurality of initial influence factors corresponding to the variance control parameter; and screening the plurality of initial influence factors again based on the average value of the first variable control parameter and the second variable control parameter to obtain the cognitive influence factor. It should be noted that, in the embodiment of the present application, different parameter values correspond to different initial influence factors.
S1023, analyzing the reliability of each candidate region according to at least one of the random influence factors and the cognitive influence factors to obtain analysis information corresponding to each candidate region.
After the electronic device obtains the random influence factor and the cognitive influence factor, it can directly determine either one of them as the analysis information of the candidate region, or determine a weighted fusion of the two as the analysis information of the candidate region, thereby completing the reliability analysis of each candidate region.
In other embodiments of the present application, the probability distribution includes a normal distribution, and the distribution control parameters include an expectation and a variance. In this case, S102 in fig. 3, that is, the process of analyzing the reliability of each candidate region based on the distribution control parameter corresponding to each candidate region to obtain the analysis information corresponding to each candidate region, may be implemented as follows: querying an information table according to the expectation and the variance (the information table stores different pieces of analysis information), and determining the analysis information hit by the expectation and the variance as the analysis information corresponding to each candidate region.
The electronic device thus completes the reliability analysis process for each candidate region.
S103, determining a corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information.
After obtaining the analysis information corresponding to each candidate region, the electronic device screens the at least one candidate region, or fuses the at least one candidate region, according to the influence of different factors on the reliability of the candidate region provided by the analysis information, and thereby determines the final detection result for the target object from the input image, namely the region where the target object is actually located.
In some embodiments of the application, the at least one candidate region includes a plurality of candidate regions, and the at least one piece of analysis information includes a plurality of pieces of analysis information. The specific process of S103 in fig. 3, that is, determining the corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information, may be implemented as follows: determining a corresponding reliability degree for each candidate region according to each piece of analysis information; and determining the candidate region with the greatest reliability degree among the plurality of candidate regions as the detection result corresponding to the target object.
The analysis information may include at least one of a cognitive influence factor (a factor describing cognitive uncertainty) and a random influence factor (a factor describing random uncertainty) of the candidate region. When the cognitive influence factor and the random influence factor exist as numerical values, the electronic device can take the reciprocal of either factor as the reliability degree, or perform a weighted summation of the two factors and take the reciprocal of the summation result as the reliability degree. When the factors exist in graded form, the electronic device can look up a table recording the correspondence between influence factors and reliability degrees, and take the reliability degree hit by at least one of the cognitive influence factor and the random influence factor as the reliability degree of each candidate region.
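The numeric case can be sketched as follows; the equal weights, function names, and list-based interface are assumptions made for illustration, not the patent's concrete implementation.

```python
def reliability_degree(random_factor: float, cognitive_factor: float,
                       w_random: float = 0.5, w_cognitive: float = 0.5) -> float:
    # Reciprocal of the weighted sum: smaller uncertainty -> larger reliability.
    return 1.0 / (w_random * random_factor + w_cognitive * cognitive_factor)

def select_most_reliable(candidate_regions, analysis_infos):
    # analysis_infos: one (random_factor, cognitive_factor) pair per region.
    degrees = [reliability_degree(r, c) for r, c in analysis_infos]
    return candidate_regions[degrees.index(max(degrees))]
```

The candidate region whose analysis information yields the largest reliability degree is returned as the detection result, matching the selection rule described above.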
In other embodiments of the present application, the specific process of S103 in fig. 3, that is, determining the corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information, may also be implemented as follows: the electronic device matches the plurality of pieces of analysis information against preset information (matching can be realized through similarity), and determines the candidate region whose analysis information matches the preset information as the final detection result for the target object. The electronic device thus completes the process of determining the final detection result for the target object.
It can be understood that, in the related art, the prediction frame at each scale lacks modeling and analysis of uncertainty (such as cognitive uncertainty and random uncertainty), so the source of the prediction frame's reliability cannot be analyzed, which ultimately lowers the accuracy of the target detection result. In the embodiment of the application, in addition to determining a candidate region for the target object, the electronic device determines the distribution control parameters of the probability distribution obeyed by the region parameters of the candidate region, models and analyzes the reliability of the candidate region through these parameters to obtain analysis information that specifies the influence of different factors on the candidate region, and finally uses the analysis information to guide the determination of the final detection result based on the candidate region. This reduces the possibility that a wrongly predicted candidate region affects the final detection result and improves the detection accuracy of target detection.
In some embodiments of the present application, the prediction, at M scales, of the N initial prediction regions where the target object is located in the input image and of the N initial control parameters corresponding to the N initial prediction regions is implemented by an information prediction model. The information prediction model is obtained by training on labeled training images.
Fig. 8 is a schematic diagram of a training process of an information prediction model according to an embodiment of the present application. Referring to fig. 8, the information prediction model is obtained by:
S201, predicting, through an initial detection model, the training prediction region where the training target object is located in the training image and the training control parameters corresponding to the training prediction region at each of the M scales.
In the embodiment of the application, the initial detection model can be an untrained neural network model, or a neural network model pretrained with unlabeled data.
S202, determining a loss value of each scale based on the training control parameters, the training prediction area and the labeling area of the training target object under each scale.
The training prediction region where the training target object is located in the training image essentially coincides with the central control parameter in the training control parameters, so the electronic device can directly take the central control parameter in the training control parameters as the training prediction region, thereby completing the prediction of the training prediction region.
In some embodiments, determining the loss value for each scale based on the training control parameters, the training prediction region, and the labeled region of the training target object at each scale may be accomplished by: and carrying out maximum likelihood estimation on each scale through training control parameters, training a prediction area and a labeling area of the training target object under each scale, and determining a maximum likelihood estimation value as a loss value of each scale.
By way of example, the embodiment of the present application provides a calculation formula for the loss value (corresponding to the case where the probability distribution is a normal inverse gamma distribution); see formula (3):

L = (1/2)·log(π/γ) − α·log Ω + (α + 1/2)·log(γ(y − δ)² + Ω) + log(Γ(α)/Γ(α + 1/2)),  where Ω = 2β(1 + γ)    (3)

where γ is the second variable control parameter in the training control parameters, α is the first variable control parameter in the training control parameters, β is the variance control parameter in the training control parameters, y is the labeling region, δ is the central control parameter in the training control parameters, Γ(·) denotes the gamma function, and L is the loss value.
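A minimal sketch of formula (3) for a single labeled coordinate, using only the standard library; the function name and scalar interface are assumptions of this sketch.

```python
import math

def nig_loss(y: float, delta: float, gamma: float, alpha: float, beta: float) -> float:
    """Formula (3): negative log-likelihood of the normal inverse gamma
    evidential model for one labeled coordinate y."""
    omega = 2.0 * beta * (1.0 + gamma)
    return (0.5 * math.log(math.pi / gamma)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(gamma * (y - delta) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))
```

Here `math.lgamma` supplies log Γ(·) directly, which avoids the overflow that evaluating Γ(α) itself would cause for large α.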
In other embodiments, determining the loss value for each scale based on the training control parameters, the training prediction region, and the labeled region of the training target object at each scale may also be accomplished by: and calculating the region difference between the training prediction region and the labeling region of the training target object, and adjusting the region difference through training control parameters to obtain the loss value of each scale.
S203, updating the parameters of the initial detection model using the loss value at each scale until the model training end condition is reached, to obtain the information prediction model.
In the embodiment of the present application, the model training end condition may be that the loss value reaches a loss threshold, or that the number of iterations during training reaches a count threshold.
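S201–S203 can be sketched as a PyTorch-style training loop; `model`, `loader`, the per-scale four-parameter head, and `step_limit` are assumptions made for illustration rather than the patent's concrete network.

```python
import torch

def nig_loss_t(y, delta, gamma, alpha, beta):
    # Tensor version of formula (3), so gradients can flow to the model.
    omega = 2.0 * beta * (1.0 + gamma)
    return (0.5 * torch.log(torch.pi / gamma)
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log(gamma * (y - delta) ** 2 + omega)
            + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5)).mean()

def train(model, loader, optimizer, scales, step_limit=10_000):
    for step, (image, labeled_box) in enumerate(loader):
        loss = 0.0
        for s in scales:  # S202: one loss value per scale
            delta, gamma, alpha, beta = model(image, scale=s)  # S201: training control parameters
            loss = loss + nig_loss_t(labeled_box, delta, gamma, alpha, beta)
        optimizer.zero_grad()
        loss.backward()   # S203: update the initial detection model
        optimizer.step()
        if step + 1 >= step_limit:  # iteration-count end condition
            break
```

The central control parameter δ doubles as the training prediction region, so no separate box regression output is needed in this sketch.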
The electronic device thus completes the training of the information prediction model, so that the initial prediction regions and the initial control parameters can subsequently be predicted from the input image directly by the information prediction model.
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
This embodiment of the application is implemented in the scenario of target recognition on medical images.
In the embodiment of the application, in addition to outputting the center position and frame size of the target (called the initial prediction region), the deep learning model (called the information prediction model) outputs the four parameters (δ, γ, α, β) of the normal inverse gamma distribution obeyed by the center position and frame size; that is, the center position and frame size obey a normal inverse gamma distribution controlled by the parameters (δ, γ, α, β). Uncertainty is then estimated from these four parameters, and the estimated uncertainty guides the subsequent frame screening process.
It should be noted that the normal inverse gamma distribution in the embodiment of the present application is defined by the mean variable and the variance variable: the mean variable obeys a normal distribution, i.e., μ ~ N(δ, σ²γ⁻¹), and the variance variable obeys an inverse gamma distribution, i.e., σ² ~ Γ⁻¹(α, β).
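To make the hierarchy concrete, the following sketch draws one sample from this generative model with NumPy/SciPy; the parameter values are arbitrary illustrations, not values from the patent.

```python
import numpy as np
from scipy.stats import invgamma

delta, gamma, alpha, beta = 0.0, 2.0, 3.0, 1.5  # illustrative values only
rng = np.random.default_rng(0)

sigma2 = invgamma(a=alpha, scale=beta).rvs(random_state=rng)  # sigma^2 ~ Gamma^-1(alpha, beta)
mu = rng.normal(delta, np.sqrt(sigma2 / gamma))               # mu ~ N(delta, sigma^2 / gamma)
y = rng.normal(mu, np.sqrt(sigma2))                           # a box coordinate drawn around mu
```

Under this reading, γ and α play the role of virtual observation counts for the mean and the variance respectively, which is what the evidence-based screening below relies on.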
In training the deep learning model, the electronic device fits the loss value through maximum likelihood estimation at each scale, so the loss value can be as shown in formula (3).
In the embodiment of the application, uncertainty is estimated using the concept of virtual observations in Bayesian learning. Thus, the random uncertainty in the embodiment of the application may be defined as the expectation of the variance variable, calculated by formula (1), and the cognitive uncertainty may be defined as the variance of the mean variable, calculated by formula (2).
The embodiment of the application provides two kinds of uncertainty-based multi-scale cooperative target detection: frame screening based on uncertainty and frame fusion based on uncertainty.
Frame screening based on uncertainty prunes frames with higher uncertainty to obtain reliable frames with lower uncertainty (called reliable prediction regions), and then screens the target detection frame of the corresponding scale (called the detection result) according to the uncertainty of the reliable frames. In more detail, the electronic device first calculates the amount of observation evidence from α and γ, for example using 2α + γ as the amount of observation evidence, and compares it with a threshold τ: when 2α + γ < τ, the current frame at the current scale is judged unreliable and discarded, and the remaining reliable frames (called reliable prediction regions) are used as candidate frames. The uncertainties of the candidate frames (called candidate regions) of all scales, i.e., the uncertainties obtained by the estimation method above, are then compared; the scale result with the smaller uncertainty is selected, overlapping frames are removed, and the frame with the smallest uncertainty is retained as the final detection result.
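A simplified sketch of this screening step follows; the tuple layout is an assumption of the sketch, and the overlap-removal step is omitted for brevity.

```python
def screen_then_select(boxes_by_scale, tau):
    """boxes_by_scale: per scale, a list of (box, gamma, alpha, uncertainty)
    tuples, where uncertainty comes from formulas (1) and (2)."""
    reliable = [(box, u)
                for scale_boxes in boxes_by_scale
                for (box, g, a, u) in scale_boxes
                if 2 * a + g >= tau]                  # keep boxes with enough observation evidence
    # Among the surviving reliable boxes, the one with the smallest
    # uncertainty is kept as the detection result (overlap removal omitted).
    return min(reliable, key=lambda item: item[1])[0] if reliable else None
```

Raising τ discards more low-evidence boxes before the cross-scale comparison, trading recall for reliability.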
Frame fusion based on uncertainty fuses the frames of multiple scales into one frame containing multi-scale information. During fusion, the amount of observation evidence serves as the balancing weight, because if the amount of observation evidence at a certain scale is large, its uncertainty is smaller and it should occupy a larger proportion in the fusion; otherwise, its proportion needs to be reduced. The specific fusion process can be realized by formula (4):

δ_fused = (Σᵢ γᵢ δᵢ) / (Σᵢ γᵢ)    (4)
where the frame at the i-th scale (called the initial prediction frame) is written as δᵢ, since the frame coincides with the parameter δ of the normal inverse gamma distribution. The amount of observation evidence could be expressed through γᵢ and αᵢ (because the amount of observation evidence is proportional to α and γ); however, using αᵢ would prevent a closed-form solution (the mean δᵢ is obtained from the γᵢ virtual observations, while αᵢ represents the virtual observations used to obtain the variance, so choosing only γᵢ as the weight still guarantees the accuracy of the fusion). Therefore, only γᵢ is used as the weight in the embodiment of the application.
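A minimal sketch of formula (4) under the same assumptions (per-scale box coordinates and γ values given as plain floats):

```python
def fuse_boxes(deltas, gammas):
    """Formula (4): evidence-weighted average of the per-scale boxes,
    with gamma_i as the fusion weight."""
    total = sum(gammas)
    return sum(g * d for g, d in zip(gammas, deltas)) / total
```

For example, `fuse_boxes([10.0, 12.0], [3.0, 1.0])` returns 10.5: the scale with three times the evidence pulls the fused coordinate three times as hard.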
Meanwhile, to preserve the subsequent uncertainty-estimation capability, the electronic device also needs to fuse the remaining parameters of the normal inverse gamma distribution in closed form, so that the fused frame still obeys a normal inverse gamma distribution. Table 1 provides the fusion rules for the remaining parameters γ, α and β.
TABLE 1
Finally, the electronic device performs uncertainty estimation on the fused frames (called scale fusion regions) based on the fused parameters and removes overlapping frames based on uncertainty, i.e., the frame with the smallest uncertainty is retained as the final detection result.
Thus, the electronic equipment can complete the target detection process of the medical image.
It will be appreciated that the embodiments of the present application involve user information, such as data relating to input images and medical images. When the embodiments of the present application are applied to a specific product or technology, user approval or consent is required, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
Continuing with the description below of an exemplary architecture of the object detection device 255 implemented as a software module provided by embodiments of the present application, in some embodiments, as shown in fig. 2, the software modules stored in the object detection device 255 of the memory 250 may include:
an information determining module 2551, configured to determine at least one candidate region in an input image for a target object, and determine at least one distribution control parameter corresponding to at least one candidate region, where the distribution control parameter is a parameter for controlling a probability distribution to which a region parameter of the candidate region is subjected;
The reliability analysis module 2552 is configured to perform reliability analysis on each candidate region based on the distribution control parameter corresponding to each candidate region, so as to obtain analysis information corresponding to each candidate region; the analysis information is used for providing influences of different factors on the reliability of the candidate area;
the result determining module 2553 is configured to determine a corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information.
In some embodiments of the present application, the information determining module 2551 is further configured to predict, under M scales, N initial prediction areas where the target object is located in the input image, and N initial control parameters corresponding to the N initial prediction areas; wherein M and N are positive integers, and N is more than or equal to M; and determining at least one candidate region and at least one distribution control parameter corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters.
In some embodiments of the application, the M scales include at least two scales; the information determining module 2551 is further configured to extract, from the N initial prediction areas, a prediction area to be fused for each scale; fuse at least two prediction areas to be fused to obtain a scale fusion area jointly corresponding to at least two scales, and determine the scale fusion area as at least one candidate area; and fuse at least two initial control parameters corresponding to the at least two prediction areas to be fused to obtain fusion control parameters corresponding to the scale fusion area, and determine the fusion control parameters as at least one distribution control parameter.
In some embodiments of the present application, the information determining module 2551 is further configured to screen out reliable prediction areas under each scale from N initial prediction areas according to the observed evidence amount corresponding to each initial prediction area; and determining the reliable prediction area under each scale as at least one candidate area, and determining an initial control parameter corresponding to the reliable prediction area in N initial control parameters as at least one distribution control parameter.
In some embodiments of the application, the at least one candidate region includes a plurality of candidate regions, and the at least one piece of analysis information includes a plurality of pieces of analysis information; the result determining module 2553 is further configured to determine a corresponding reliability degree for each candidate region according to each piece of analysis information; and determine the candidate region with the greatest reliability degree among the plurality of candidate regions as the detection result corresponding to the target object.
In some embodiments of the application, the probability distribution includes: a normal inverse gamma distribution defined by a mean variable and a variance variable of the region parameter; the distribution control parameters include: a central control parameter, a variance control parameter, a first variable control parameter, and a second variable control parameter; the central control parameter is used for controlling the distribution center of the normal inverse gamma distribution, the variance control parameter is used for controlling the variance of the normal inverse gamma distribution, the first variable control parameter is used for controlling the concentration degree of variance variables in the normal inverse gamma distribution, and the second variable control parameter is used for controlling the concentration degree of mean variables of the normal inverse gamma distribution.
In some embodiments of the present application, the reliability analysis module 2552 is further configured to determine a random influence factor for each of the candidate regions based on the variance control parameter and the first variable control parameter; determine a cognitive influence factor for each of the candidate regions based on the variance control parameter, the first variable control parameter, and the second variable control parameter; and analyze the reliability of each candidate region according to at least one of the random influence factor and the cognitive influence factor to obtain the analysis information corresponding to each candidate region.
In some embodiments of the present application, the reliability analysis module 2552 is further configured to perform a difference calculation for the first variable control parameter and a preset factor to obtain a variable difference; and determining a ratio between the variance control parameter and the variable difference as the random influence factor of each candidate region.
In some embodiments of the present application, the reliability analysis module 2552 is further configured to perform a difference calculation for the first variable control parameter and a preset factor to obtain a variable difference; determine the product of the variable difference and the second variable control parameter as a variable product; and determine the ratio of the variance control parameter to the variable product as the cognitive influence factor of each candidate region.
In some embodiments of the present application, the object detection device 255 further includes: model training module 2554, configured to predict, by using an initial detection model, a training prediction area where a training target object is located in a training image and training control parameters corresponding to the training prediction area under each of M scales; determining a loss value of each scale based on the training control parameter, the training prediction area and the labeling area of the training target object under each scale; and updating the parameters of the initial detection model by using the loss value of each scale until the model training ending condition is reached, so as to obtain the information prediction model.
In some embodiments of the present application, the model training module 2554 is further configured to perform a maximum likelihood estimation for each scale by using the training control parameter, the training prediction area, and the labeling area of the training target object at each scale, and determine a maximum likelihood estimation value as the loss value of each scale.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer-executable instructions from the computer-readable storage medium, and the processor executes the computer-executable instructions, so that the electronic device executes the target detection method according to the embodiment of the application.
Embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, cause the processor to perform an object detection method provided by embodiments of the present application, for example, an object detection method as shown in fig. 3.
In some embodiments, the computer-readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any device including one or any combination of the above memories.
In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, or code, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, computer-executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, such as in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, computer-executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or, alternatively, on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiment of the application, the electronic device determines, in addition to the candidate region for the target object, the distribution control parameters of the probability distribution obeyed by the region parameters of the candidate region, and models and analyzes the reliability of the candidate region through these parameters to obtain analysis information that specifies the influence of different factors on the candidate region. The analysis information then guides the process of determining the final detection result based on the candidate region, which reduces the possibility that a wrongly predicted candidate region affects the final detection result and improves the detection accuracy of target detection.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A method of target detection, the method comprising:
determining at least one candidate region in an input image aiming at a target object, and determining at least one distribution control parameter corresponding to at least one candidate region, wherein the distribution control parameter is a parameter for controlling probability distribution obeyed by the region parameter of the candidate region;
analyzing the reliability of each candidate region based on the distribution control parameter corresponding to each candidate region to obtain analysis information corresponding to each candidate region; the analysis information is used for providing influences of different factors on the reliability of the candidate area;
and determining a corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information.
2. The method of claim 1, wherein determining at least one candidate region in the input image for the target object and determining at least one distribution control parameter corresponding to at least one of the candidate regions comprises:
under M scales, predicting N initial prediction areas where the target object is located in the input image and N initial control parameters corresponding to the N initial prediction areas; wherein M and N are positive integers, and N is more than or equal to M;
And determining at least one candidate region and at least one distribution control parameter corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters.
3. The method of claim 2, wherein the M scales comprise: at least two scales; the determining at least one candidate region and at least one distribution control parameter corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters includes:
extracting to-be-fused prediction areas from N initial prediction areas respectively aiming at each scale;
fusing at least two prediction areas to be fused to obtain a scale fusion area which corresponds to at least two scales together, and determining the scale fusion area as at least one candidate area;
fusing at least two initial control parameters corresponding to at least two prediction areas to be fused to obtain fusion control parameters corresponding to the scale fusion areas, and determining the fusion control parameters as at least one distribution control parameter.
4. The method of claim 2, wherein the determining at least one of the candidate regions and at least one of the distributed control parameters corresponding to the at least one candidate region based on the N initial prediction regions and the N initial control parameters comprises:
screening and obtaining reliable prediction areas under each scale from N initial prediction areas according to the observation evidence quantity corresponding to each initial prediction area;
and determining the reliable prediction area under each scale as at least one candidate area, and determining an initial control parameter corresponding to the reliable prediction area in N initial control parameters as at least one distribution control parameter.
5. The method of any one of claims 1 to 4, wherein the at least one candidate region comprises a plurality of candidate regions, and the at least one piece of analysis information comprises a plurality of pieces of analysis information;
the determining a corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information includes:
determining a corresponding reliability degree for each candidate region according to each piece of analysis information;
And determining the candidate region with the largest reliability degree in the candidate regions as the detection result corresponding to the target object.
6. The method according to any one of claims 1 to 4, wherein the probability distribution comprises: a normal inverse gamma distribution defined by a mean variable and a variance variable of the region parameter;
the distribution control parameters include: a central control parameter, a variance control parameter, a first variable control parameter, and a second variable control parameter;
the central control parameter is used for controlling the distribution center of the normal inverse gamma distribution, the variance control parameter is used for controlling the variance of the normal inverse gamma distribution, the first variable control parameter is used for controlling the concentration degree of variance variables in the normal inverse gamma distribution, and the second variable control parameter is used for controlling the concentration degree of mean variables of the normal inverse gamma distribution.
7. The method of claim 6, wherein the analyzing the reliability of each candidate region based on the distribution control parameter corresponding to each candidate region to obtain the analysis information corresponding to each candidate region includes:
Determining a random influence factor for each of the candidate regions based on the variance control parameter and the first variable control parameter;
determining a cognitive influence factor for each of the candidate regions based on the variance control parameter, the first variable control parameter, and the second variable control parameter;
and analyzing the reliability of each candidate region according to at least one of the random influence factor and the cognitive influence factor to obtain the analysis information corresponding to each candidate region.
8. The method of claim 7, wherein the determining a random influence factor for each of the candidate regions based on the variance control parameter and the first variable control parameter comprises:
performing difference calculation on the first variable control parameter and a preset factor to obtain a variable difference;
and determining a ratio between the variance control parameter and the variable difference as the random influence factor of each candidate region.
9. The method of claim 7, wherein the determining the cognitive influence factor for each of the candidate regions based on the variance control parameter, the first variable control parameter, and the second variable control parameter comprises:
Performing difference calculation on the first variable control parameter and a preset factor to obtain a variable difference;
determining the product of the variable difference value and the second variable control parameter as a variable product;
and determining the ratio of the variance control parameter to the variable product as the cognitive influence factor of each candidate region.
10. The method according to claim 2, wherein N initial prediction areas where the target object is located in the input image and N initial control parameters corresponding to the N initial prediction areas are predicted and obtained under M scales, and the N initial control parameters are implemented by an information prediction model; the information prediction model is obtained through the following steps:
predicting a training prediction area where a training target object is located in a training image and training control parameters corresponding to the training prediction area under each of M scales by an initial detection model;
determining a loss value of each scale based on the training control parameter, the training prediction area and the labeling area of the training target object under each scale;
and updating the parameters of the initial detection model by using the loss value of each scale until the model training ending condition is reached, so as to obtain the information prediction model.
11. The method of claim 10, wherein the determining a loss value for each scale based on the training control parameters, the training prediction region, and the labeled region of the training target object at each scale comprises:
and carrying out maximum likelihood estimation on each scale through the training control parameters, the training prediction area and the labeling area of the training target object under each scale, and determining a maximum likelihood estimation value as the loss value of each scale.
12. An object detection device, the device comprising:
the information determining module is used for determining at least one candidate region in the input image aiming at the target object and determining at least one distribution control parameter corresponding to the at least one candidate region, wherein the distribution control parameter refers to a parameter for controlling probability distribution obeyed by the region parameter of the candidate region;
the reliability analysis module is used for analyzing the reliability of each candidate region based on the distribution control parameter corresponding to each candidate region to obtain analysis information corresponding to each candidate region; the analysis information is used for providing influences of different factors on the reliability of the candidate area;
And the result determining module is used for determining a corresponding detection result for the target object based on at least one candidate region and at least one piece of analysis information.
13. An electronic device, the electronic device comprising:
a memory for storing computer executable instructions;
a processor for implementing the object detection method of any one of claims 1 to 11 when executing computer-executable instructions stored in the memory.
14. A computer readable storage medium storing computer executable instructions which when executed by a processor implement the object detection method of any one of claims 1 to 11.
15. A computer program product comprising a computer program or computer-executable instructions which, when executed by a processor, implements the object detection method of any one of claims 1 to 11.
CN202310456122.0A 2023-04-18 2023-04-18 Target detection method, device, equipment, storage medium and program product Pending CN116977698A (en)

Publications (1)

Publication Number Publication Date
CN116977698A 2023-10-31



Legal Events

Date Code Title Description
PB01 Publication