WO2019227294A1 - Image processing method, related device, and computer storage medium - Google Patents

Image processing method, related device, and computer storage medium

Info

Publication number
WO2019227294A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
intensity
feature
pixel
Prior art date
Application number
PCT/CN2018/088758
Other languages
English (en)
French (fr)
Inventor
冯柏岚
姚春凤
黄凯奇
张彰
陈晓棠
黄厚景
李党伟
Original Assignee
Huawei Technologies Co., Ltd.
Institute of Automation, Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. and Institute of Automation, Chinese Academy of Sciences
Priority to PCT/CN2018/088758
Publication of WO2019227294A1
Priority to US17/039,544 (US11836619B2)

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/04 - Inference or reasoning models
    • G06N 5/046 - Forward inferencing; Production systems

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to an image processing method, a related device, and a computer storage medium.
  • pedestrian re-identification tasks are mainly affected by occlusion, viewpoint changes, and the high similarity of features such as clothing and body shape across different pedestrians, resulting in low accuracy when traditional models are used for pedestrian recognition.
  • because the training data used by traditional models is limited, the generalization performance of the model is not high, and the accuracy of pedestrian recognition using the model is not high.
  • the existing expansion methods of training data mainly include randomly flipping images, randomly cropping multiple image regions, randomly disturbing pixel values of the images, and the like.
  • the existing data expansion methods expand the training data only superficially, and therefore do little to improve the accuracy of the model.
  • the embodiment of the invention discloses an image processing method, related equipment and a computer storage medium, which can solve the problem that the accuracy of the model is not high due to the limitation of the training data in the prior art.
  • an embodiment of the present invention provides an image processing method.
  • the method includes:
  • a region to be blocked in the training image is blocked using a preset window to obtain a new image; wherein the region to be blocked includes pixels to be blocked, and the new image is used to update the image recognition model.
  • the terminal device determines an area to be blocked in the training image according to the feature intensity image, and then blocks that area using a preset window to obtain a new image.
  • the area to be blocked includes one or more pixels to be blocked.
  • the preset window is custom set by the user side or the system side, and the attribute characteristics such as the size and shape of the preset window are not limited.
  • the preset window may be a rectangular frame, a diamond, a fan (sector) shape, or the like.
  • using a preset window to occlude a region to be blocked in the training image to obtain a new image includes: determining mapped pixels according to the intensity values of the pixels in the feature intensity image, where a mapped pixel is a pixel in the feature intensity image whose intensity value satisfies a preset condition; and using a preset window to occlude the pixels to be blocked to obtain a new image, where a pixel to be blocked is the pixel in the training image corresponding to a mapped pixel.
  • the preset condition may be set by the user or the system. For example, when a larger pixel value indicates that the pixel is more important for image recognition,
  • the preset condition may be that the intensity value is greater than or equal to a first preset intensity; that is, pixels whose intensity values are greater than or equal to the first preset intensity are selected as the mapped pixels.
  • conversely, when a smaller pixel value indicates that the pixel is more important for image recognition, the preset condition may be that the intensity value is less than or equal to a second preset intensity; that is, pixels whose intensity values are less than or equal to the second preset intensity are selected as the mapped pixels.
  • the mapped pixels are obtained using a polynomial sampling algorithm.
  • the terminal device may determine the mapped pixel point from the characteristic intensity image according to a polynomial sampling algorithm and an intensity value of each pixel point in the characteristic intensity image.
  • the number of mapped pixels may be multiple, and the pixels to be occluded include any one or more pixels in the training image corresponding to the mapped pixels. That is, the pixels to be occluded may correspond one-to-one to the mapped pixels, or the correspondence may not be one-to-one.
  • the acquiring the feature intensity image corresponding to the training image includes:
  • the recognition scores are used to reflect how important the region occluded by the sliding window in each occluded image is for recognizing the training image;
  • a feature intensity image corresponding to the training image is determined.
  • the image interpolation algorithm includes, but is not limited to, any one of the following: a bilinear interpolation algorithm, a Lanczos interpolation algorithm, a cubic convolution interpolation algorithm, a nearest-neighbor interpolation algorithm, a piecewise linear interpolation algorithm, and other algorithms used for image interpolation.
  • determining the feature intensity image corresponding to the training image according to the image interpolation algorithm and the respective recognition scores of the m occlusion images includes:
  • the intensity value of each pixel in the characteristic intensity image is determined, thereby obtaining the characteristic intensity image.
  • determining the feature intensity image corresponding to the training image according to the image interpolation algorithm and the respective recognition scores of the m occlusion images includes:
  • the recognition score of the training image is obtained by inputting the training image into the image recognition model.
  • the acquiring the feature intensity image corresponding to the training image includes:
  • a feature intensity image corresponding to the training image is determined.
  • the acquiring the feature intensity image corresponding to the training image includes:
  • the resolution of the feature image is less than the resolution of the training image
  • a feature intensity image corresponding to the training image is obtained.
  • performing feature extraction on the training image to obtain a corresponding feature image includes:
  • the resolution of the down-sampling image is the same as the resolution of the feature image
  • the pixels to be blocked include at least two pixels, including a first pixel and a second pixel, and the distance between the first pixel and the second pixel is greater than or equal to a preset first distance,
  • the occlusion of the pixel to be occluded by using a preset window to obtain a new image includes any one of the following:
  • a preset first window is used to occlude the first pixel point in the training image
  • a preset second window is used to occlude the second pixel point in the training image to obtain a new image.
  • the training image is an image from the new images last used to update the image recognition model.
  • the number of the training images is multiple
  • the acquiring the feature intensity image corresponding to the training image includes:
  • the occlusion of a region to be occluded in the training image according to the feature intensity image to obtain a new image includes:
  • the method further includes:
  • the training image and the new image have the same label information, and the label information is used to indicate an object included in the image, or a category to which the object belongs.
  • an embodiment of the present invention provides another image processing method, where the method includes:
  • the area to be blocked is determined according to a feature intensity image corresponding to the training image, and the new image is used to update an image recognition model.
  • the area to be blocked includes pixels to be blocked.
  • before the area to be blocked in the training image is occluded using a preset window to obtain a new image, the method further includes: obtaining a feature intensity image corresponding to the training image, where the intensity value of a pixel in the feature intensity image indicates how important that pixel is for recognizing the training image, and the resolution of the training image is the same as the resolution of the feature intensity image.
  • an embodiment of the present invention provides another image processing method (model training method), where the method includes:
  • the training image is an image from a new image last used to update the image recognition model.
  • an embodiment of the present invention provides another image processing method (model use method), and the method includes:
  • the image recognition model is obtained by training using multiple new images, and any new image among the multiple new images is obtained by using a preset window to occlude a region to be blocked in the training image.
  • before the image to be processed is input into the image recognition model, the method further includes: obtaining the image recognition model.
  • an embodiment of the present invention provides a terminal device, where the terminal device includes a functional unit for executing the method according to any one of the first to fourth aspects.
  • an embodiment of the present invention provides another terminal device, including a memory and a processor coupled to the memory; the memory is used to store instructions, and the processor is used to execute the instructions; When the processor executes the instructions, the method described in any one of the first to fourth aspects is performed.
  • the terminal device further includes a display coupled to the processor, and the display is configured to display an image (such as a training image and a feature intensity image) under the control of the processor.
  • the terminal device further includes a communication interface that communicates with the processor, and the communication interface is used to communicate with other devices (such as a server) under the control of the processor.
  • a computer-readable storage medium stores program code.
  • the program code includes instructions for performing the method described in any one of the first to fourth aspects.
  • the problem that the accuracy of the model is not high due to the limitation of the training data in the prior art can be solved, thereby improving the accuracy of the model.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention.
  • FIG. 2A and FIG. 2B are schematic diagrams of several types of occlusion images provided by embodiments of the present invention.
  • FIG. 3 is a schematic flowchart of a method for obtaining a characteristic intensity image according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of image occlusion provided by an embodiment of the present invention.
  • FIG. 5 and FIG. 6 are schematic flowcharts of two other methods for obtaining a feature intensity image according to embodiments of the present invention.
  • FIG. 7A and FIG. 7B are schematic structural diagrams of two terminal devices according to embodiments of the present invention.
  • FIG. 1 shows an image processing method according to an embodiment of the present invention.
  • the method shown in FIG. 1 includes the following implementation steps:
  • Step S102: The terminal device obtains a feature intensity image corresponding to the training image, where the intensity value of a pixel in the feature intensity image indicates how important that pixel is for recognizing the training image, and the resolution of the training image is the same as the resolution of the feature intensity image;
  • the feature intensity image is an image whose pixel values are intensity values measuring how much each object (or point) in a scene contributes to recognizing that scene. That is, the pixel value of a pixel in the feature intensity image is an intensity value, which reflects/indicates how important the pixel is for recognizing the original image (here, the training image) corresponding to the feature intensity image.
  • how the intensity value of a pixel reflects the importance of the pixel for recognizing the training image may be custom-set on the user side or the system side. For example, the greater the intensity value of a pixel, the more important the pixel is for recognizing the training image; or, conversely, the greater the intensity value of a pixel, the less important the pixel is for recognizing the training image, and so on.
  • Step S104 The terminal device uses a preset window to occlude the area to be occluded in the training image according to the feature intensity image to obtain a new image; wherein the area to be occluded includes pixels to be occluded,
  • the new image is used to train and update the image recognition model.
  • the terminal device may determine a region to be blocked in the training image according to the feature intensity image. Then, the area to be blocked is occluded by using a preset window to obtain a new image. The details are described below.
  • the preset window is a user-defined window or a system-defined window.
  • the size and shape of the preset window are not limited in this application.
  • the preset window may be a rectangular window, a triangular window, a fan window, a diamond window, or the like.
  • the terminal device may obtain the feature intensity image corresponding to the training image based on a preset feature-intensity-image acquisition method, or
  • the terminal device may directly obtain the training image and the feature intensity image corresponding to the training image from another device (such as a server) over a network.
  • the feature intensity acquisition method may be a method custom-set on the user side or the system side, and may include, but is not limited to: obtaining the feature intensity image corresponding to a training image based on sliding-window occlusion, or based on gradients, or
  • obtaining the feature intensity image corresponding to the training image based on a class activation mapping (CAM) algorithm, or other implementations for obtaining feature intensity images.
  • the terminal device may determine the area to be blocked in the training image according to the intensity value of each pixel in the characteristic intensity image.
  • the area to be blocked includes one or more pixels to be blocked. Further, the to-be-occluded area in the training image is occluded by using a preset window to obtain a new image. The new image is used for training and updating the image recognition model.
  • the terminal device may determine a pixel point whose intensity value meets a preset condition from the characteristic intensity image as a mapping pixel point according to a preset rule. Further, a pixel point corresponding to the mapped pixel point is determined in the training image as the pixel point to be blocked. An area composed of a plurality of pixels to be blocked may be referred to as an area to be blocked.
  • the preset rule is a rule custom-set on the user side or the system side, and the preset rule is associated with the preset condition. For example, when a greater intensity value indicates that the pixel is more important for image recognition,
  • a pixel with an intensity value greater than or equal to a first preset intensity may be selected from the feature intensity image as the mapped pixel.
  • conversely, when a smaller intensity value indicates that the pixel is more important for image recognition, a pixel with an intensity value less than or equal to a second preset intensity may be selected from the feature intensity image as the mapped pixel.
  • the first preset intensity and the second preset intensity may be intensity thresholds custom-set by the user side or the system side, and they may be the same or different, which is not limited in this application.
  • the number of the mapped pixels is not limited in this application, and may be one or more.
  • the number of the pixels to be blocked may also be one or more.
  • the to-be-occluded pixels and the mapped pixels may correspond one-to-one, or the correspondence may not be one-to-one. That is, a pixel to be blocked may be the pixel in the training image corresponding to a mapped pixel, or may be any one or more of the pixels in the training image corresponding to the mapped pixels.
  • the terminal device may further adopt a setting algorithm and combine the intensity value of each pixel point in the characteristic intensity image to obtain the mapped pixel point from the characteristic intensity image.
  • the setting algorithm is custom set by the user side or the system side, and is used to obtain a mapping pixel point that satisfies a preset condition from the feature intensity image.
  • for example, suppose the feature intensity image is composed of n pixels.
  • the intensity value of the i-th pixel is represented by Q_i, where i is a positive integer less than or equal to n.
  • the terminal device may first normalize the intensity values of the n pixels using formula (1) to obtain the intensity value R_i of each pixel expressed as a probability.
  • the probability value also indicates the probability, or priority, with which the pixel is selected as a mapped pixel.
  • the terminal device may select, according to the intensity value R_i of each pixel, a pixel j that satisfies the preset condition from the n pixels as the mapped pixel.
  • specifically, the terminal device may calculate R_i by using formula (2).
  • the terminal device may perform multiple samplings to obtain multiple mapped pixels according to the sampling principle of the polynomial algorithm described above.
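  • Formulas (1) and (2) are not reproduced in this extract. The sketch below shows one plausible reading, assuming formula (1) is a sum-normalization of the intensity values into probabilities and the "polynomial" sampling is multinomial; the normalization rule and sample count are assumptions, not the patent's exact formulas.

```python
import numpy as np

def sample_mapped_pixels(intensity, num_samples=5):
    # Q_i: raw intensity values of the n pixels, flattened.
    q = intensity.astype(np.float64).ravel()
    # Assumed formula (1): R_i = Q_i / sum_k Q_k, one probability per pixel.
    r = q / q.sum()
    # Multinomial sampling: pixels with larger R_i are more likely
    # to be chosen as mapped pixels.
    idx = np.random.choice(q.size, size=num_samples, replace=False, p=r)
    return np.stack(np.unravel_index(idx, intensity.shape), axis=1)

# Usage: draw 5 mapped pixel coordinates from a 224x224 intensity map.
coords = sample_mapped_pixels(np.random.rand(224, 224))  # shape (5, 2)
```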
  • the terminal device may use a preset window to block a region to be blocked in the training image to obtain a new image.
  • the resolution (or size) of the area to be blocked is greater than or equal to a preset resolution (or size).
  • the terminal device may slide the 64 * 64 rectangular frame over the area to be blocked in the training image to obtain multiple new images.
  • in the illustration, limited by the size of the area to be occluded, two new images are obtained. This is only an example and does not constitute a limitation.
  • the area to be blocked includes pixels to be blocked, and the terminal device may use a preset window to block the pixels to be blocked in the training image to obtain a new image.
  • for example, a terminal device may take the pixel A to be blocked as the center and occlude the training image with a 64 * 64 rectangular frame to obtain a new image.
  • the pixel value of a pixel in an image area blocked by the preset window may be represented or replaced by a preset pixel value, for example, a grayscale pixel value, 0 or 255, and the like. Pixel values of pixels in the training image that are not blocked by the preset window do not change. Accordingly, a new image can be formed / obtained.
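  • As a concrete illustration of this replacement step, here is a minimal sketch; the 64 * 64 window, gray fill value 128, and array layout are assumptions.

```python
import numpy as np

def occlude(image, center, window=(64, 64), fill=128):
    # Copy so pixels outside the window keep their original values.
    out = image.copy()
    h, w = window
    r0 = max(center[0] - h // 2, 0)
    c0 = max(center[1] - w // 2, 0)
    # Pixels inside the preset window are replaced by the preset value.
    out[r0:r0 + h, c0:c0 + w] = fill
    return out

# Usage: occlude a 64x64 region centered on pixel A = (100, 120).
training_img = np.zeros((224, 224, 3), dtype=np.uint8)
new_img = occlude(training_img, center=(100, 120))
```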
  • the number of pixels to be blocked is not limited in this application, and may be one or more.
  • the terminal device may use a preset window to block the pixels to be blocked to obtain a new image.
  • the terminal device may use a preset window to block part of the pixels to be blocked, and the remaining pixels are not blocked to obtain a new image.
  • the terminal device may use a preset window to uniformly block the pixels to be blocked to obtain a new image.
  • the terminal device may use multiple preset windows to block the plurality of pixels to be blocked, respectively, to obtain a new image.
  • take at least two pixels, including a first pixel and a second pixel, among the plurality of pixels to be blocked as an example. If the first pixel and the second pixel are close to each other, for example, the distance between them is less than or equal to a preset distance (for example, 5 cm), a single preset window may be used to occlude the first pixel and the second pixel simultaneously to obtain a new image.
  • the terminal device may use a preset window to block any one of the first pixel point and the second pixel point to obtain a new image.
  • the terminal device may use two preset windows to block the first pixel point and the second pixel point respectively to obtain a new image.
  • a first pixel point may be blocked using a preset first window
  • a second pixel point may be blocked using a preset second window, thereby obtaining a new image.
  • the preset first window and the preset second window may be the same or different, which is not limited in this application.
  • a feature intensity image corresponding to a training image is obtained based on a sliding window.
  • referring to FIG. 3, a schematic flowchart of a method for obtaining a feature intensity image based on sliding-window occlusion is provided. The method shown in FIG. 3 includes the following implementation steps:
  • Step S202 The terminal device occludes the training image by using a sliding window to obtain m occluded images, where m is a positive integer.
  • the sliding window may be a user-defined window or a system-defined window, and its size, shape, and other attribute characteristics are not limited.
  • Step S204 The terminal device inputs the m occluded images into the image recognition model, and obtains respective recognition scores of the m occluded images.
  • Step S206 The terminal device determines a feature intensity image corresponding to the training image according to an image interpolation algorithm and a respective recognition score of the m occlusion images.
  • the terminal device may randomly block the training image m times using a sliding window to obtain m occluded images.
  • the terminal device may use a sliding window to traverse and occlude the training image to obtain m occluded images.
  • the sliding window may be moved across the training image along a set movement path.
  • the set movement path can be customized by the user or the system, for example, moving from left to right and from top to bottom with a fixed step (1 pixel, etc.).
  • the fixed step length of the sliding window moving in the horizontal direction and the vertical direction may be different or the same, which is not limited in this application.
  • the fixed step size can also be set larger, for example, 10 pixels.
  • FIG. 4 is a schematic diagram of traversing and occluding the training image using a sliding window.
  • a rectangular frame is used as a sliding window, and m occlusion images can be obtained after traversing the training image in the order from left to right and from top to bottom.
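  • The sketch below illustrates this traversal; the window size, stride, and gray fill value are illustrative assumptions consistent with the examples elsewhere in this description.

```python
import numpy as np

def sliding_occlusions(image, window=(64, 64), stride=10, fill=128):
    # Slide the window left-to-right, top-to-bottom; each position
    # yields one occluded image, giving m images in total.
    h, w = image.shape[:2]
    wh, ww = window
    occluded = []
    for r in range(0, h - wh + 1, stride):
        for c in range(0, w - ww + 1, stride):
            img = image.copy()
            img[r:r + wh, c:c + ww] = fill
            occluded.append(img)
    return occluded

# For a 224x224 image, a 64x64 window, and stride 10: m = 17 * 17 = 289
# occluded images, each of which is then scored by the recognition model.
```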
  • the terminal device may input the m occluded images into a trained image recognition model to obtain respective recognition scores of the m occluded images.
  • the image recognition model may specifically be a previously trained image recognition model or a first trained initial image recognition model. The training (or iterative training) of the image recognition model will be described in detail later in this application.
  • the terminal device may also input the training image into the image recognition model, and obtain the recognition score of the training image for subsequent obtaining the feature intensity image corresponding to the training image.
  • the terminal device may determine the feature intensity image of the training image according to the image interpolation algorithm and the respective recognition scores of the m occlusion images.
  • the terminal device may use the recognition score of each of the m occlusion images as the recognition score of the m occlusion regions in the training image.
  • the occlusion area is an area occluded by the sliding window in the occlusion image.
  • the recognition score is used to reflect the importance of the occlusion region to the recognition of the training image.
  • m occlusion images can be obtained, that is, m occlusion regions.
  • the respective recognition scores of the m occluded images obtained subsequently are, equivalently, the respective recognition scores of the m occlusion regions.
  • the greater the recognition score of an occlusion region, the less important that region is for recognizing the training image.
  • the smaller the recognition score of an occlusion region, the more important that region is for recognizing the training image.
  • the terminal device may determine the intensity values of m pixel points in the characteristic intensity image correspondingly according to the recognition scores of the m occlusion regions.
  • the terminal device may consider the occlusion region as a pixel.
  • the center point of the occlusion region is regarded as one pixel point.
  • the recognition score of the occlusion region is directly used as the intensity value of the pixel; or the recognition score of the occlusion region is preprocessed, and the processing result is used as the intensity value of the pixel.
  • the pre-processing is customized on the user side or the system side, for example, normalization processing, preset scaling processing, and the like, which are not described and limited in this application.
  • the terminal device may determine the intensity values of m pixels in the characteristic intensity image corresponding to the training image according to the recognition scores of m occlusion regions in the training image.
  • the terminal device may obtain an intensity value of each pixel point in the characteristic intensity image according to an image interpolation algorithm and intensity values of m pixels in the characteristic intensity image, thereby obtaining the characteristic intensity image.
  • the terminal device may use an image interpolation algorithm and intensity values of m pixels in the characteristic intensity image to perform image interpolation to obtain the intensity value of each pixel constituting the characteristic intensity image, thereby obtaining the The characteristic intensity image is described.
  • the image interpolation algorithm can be pre-defined by the user side or the system side, for example, a bilinear interpolation algorithm, a Langezos interpolation algorithm, a cubic convolution interpolation algorithm, a nearest neighbor interpolation algorithm, and piecewise linear interpolation Algorithms and other algorithms for image interpolation.
  • this application will not go into details.
  • the terminal device may arrange the recognition scores (ie, the m recognition scores) of the m occlusion regions into a two-dimensional matrix. Since the size of the two-dimensional matrix is smaller than the resolution of the training image, an image interpolation algorithm needs to be used to interpolate the data in the two-dimensional matrix to obtain a new matrix with the same resolution as the training image.
  • the new matrix is a feature intensity image corresponding to the training image.
  • the smaller the intensity value of a pixel in the feature intensity image obtained this way, the more important that pixel is for recognizing the training image.
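  • A sketch of this upscaling step follows, assuming bilinear interpolation via OpenCV; any of the interpolation algorithms listed above would serve.

```python
import numpy as np
import cv2

# One recognition score per occlusion position, arranged as a 2D grid
# (17x17 for the sliding-window example above).
scores = np.random.rand(17, 17).astype(np.float32)

# Bilinearly interpolate the score grid up to the training-image
# resolution; the result has one intensity value per pixel.
intensity_img = cv2.resize(scores, (224, 224), interpolation=cv2.INTER_LINEAR)
```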
  • the terminal device may determine the intensity values of m pixels in the initial intensity image according to the respective recognition scores of the m occlusion images. Then, the image interpolation algorithm and the intensity values of m pixels in the initial intensity image are used to perform image interpolation to obtain the intensity value of each pixel in the initial intensity image, thereby obtaining the initial intensity image.
  • the initial intensity image refer to related descriptions in the foregoing embodiments, and details are not described herein again.
  • the terminal device may also obtain the intensity value of each pixel point in the characteristic intensity image according to the intensity value of each pixel point in the initial intensity image, thereby obtaining the characteristic intensity image.
  • the terminal device may obtain the intensity values of the m pixels in the initial intensity image according to the recognition scores of the m occlusion images.
  • an image interpolation algorithm and intensity values of m pixels in the initial intensity image are used to perform image interpolation to obtain the initial intensity image correspondingly.
  • the larger the intensity value (i.e., the recognition score) of a pixel in the initial intensity image, the less important that pixel is for recognizing the training image. That is, the intensity value of a pixel is inversely related to the degree of importance it reflects.
  • the terminal device may also process the intensity value of the pixel point in the initial intensity image to obtain the intensity value of the pixel point in the characteristic intensity image.
  • the characteristic intensity image is obtained.
  • the intensity value of the pixel point in the characteristic intensity image is proportional to the importance degree reflected by the intensity value. That is, the greater the intensity value of a pixel point in the characteristic intensity image, the greater the importance of the pixel point for identifying the training image. Conversely, the smaller the intensity value of a pixel in the feature intensity image, the smaller the importance of the pixel for identifying the training image.
  • the terminal device may determine the target pixel point with the largest intensity value in the initial intensity image according to the intensity value of each pixel point in the initial intensity image. Then, the intensity value of each pixel point in the initial intensity image is subtracted from the intensity value of the target pixel point to obtain the intensity value of each pixel point in the characteristic intensity image, thereby obtaining the characteristic intensity image.
  • the terminal device may use the recognition score of the training image and the intensity value of each pixel point in the initial intensity image to determine the intensity value of each pixel point in the characteristic intensity image, thereby obtaining the characteristic intensity image.
  • for example, formula (3) can be used to obtain the intensity value of each pixel in the feature intensity image, where:
  • p_0 is the recognition score of the training image;
  • p_i is the intensity value (recognition score) of the i-th pixel in the initial intensity image;
  • i is a positive integer less than or equal to N;
  • N is the total number of pixels in the initial intensity image.
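  • Formula (3) is likewise not reproduced here. Given the symbols above and the inversion described earlier (a lower score means higher importance), one plausible form, stated as an assumption rather than the patent's verbatim formula, is that each pixel's intensity is the drop in recognition score caused by the corresponding occlusion:

```latex
S_i = p_0 - p_i, \qquad i = 1, 2, \dots, N
```

  • here S_i would denote the intensity value of the i-th pixel in the feature intensity image: occluding an important region pulls p_i below p_0 and yields a large S_i.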
  • FIG. 5 is a schematic flowchart of a method for obtaining a feature intensity image based on a gradient according to an embodiment of the present invention.
  • the method shown in FIG. 5 includes the following implementation steps:
  • Step S302 The terminal device inputs the training image into the image recognition model, and performs forward calculation and reverse calculation on the training image to obtain a corresponding gradient data block.
  • the size of the gradient data block is the same as the resolution of the training image.
  • Step S304 The terminal device determines a feature intensity image corresponding to the training image according to the gradient data block.
  • the terminal device may input the training image into an image recognition model, and perform a forward pass on the training image to obtain a recognition score corresponding to the training image. Then, the obtained recognition score is back-propagated to obtain a corresponding gradient data block.
  • the gradient data block is represented by a matrix block of C * H * W, where C is the number of channels, and H and W are the height and width of the training image, respectively.
  • in step S304, the terminal device performs an operation process defined by a set rule on the gradient data block to obtain the feature intensity image corresponding to the training image.
  • the set rules are algorithms custom-defined by the user or the system, such as weighted summation or averaging of the gradient data block along the channel dimension; the new matrix/data block obtained through this processing represents the feature intensity image.
  • for example, the training image is an RGB image, which can be represented by a 3 * H * W data block.
  • the terminal device may input a training image into an image recognition model, and forward-feed the training image to obtain a recognition score corresponding to the training image.
  • the recognition score of the training image is then back-propagated to obtain a 3 * H * W gradient data block.
  • 3 is the number of channels
  • 3 * H * W is understood to be composed of three two-dimensional matrices of H * W.
  • the terminal device may average the gradient data blocks along the channel dimension to obtain a new matrix of H * W.
  • the new matrix is a feature intensity image corresponding to the training image.
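  • A minimal PyTorch sketch of this gradient-based procedure follows; the stand-in model, the 224 * 224 input size, and taking the maximum output as the recognition score are assumptions consistent with the description, not the patent's specified setup.

```python
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()  # stand-in image recognition model
img = torch.rand(1, 3, 224, 224, requires_grad=True)  # RGB training image

score = model(img).max()  # forward pass: recognition score
score.backward()          # backward pass: gradient w.r.t. the input

# The gradient data block has shape 3*H*W; averaging along the channel
# dimension yields an H*W feature intensity image.
intensity_img = img.grad.squeeze(0).mean(dim=0)  # shape (224, 224)
```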
  • FIG. 6 is a schematic flowchart of a method for obtaining a feature intensity image based on a CAM according to an embodiment of the present invention. The method shown in FIG. 6 includes the following implementation steps:
  • Step S402 The terminal device inputs the training image into the image recognition model, and performs feature extraction on the training image to obtain a feature image.
  • the resolution of the feature image is smaller than the resolution of the training image.
  • Step S404 The terminal device obtains a feature intensity image corresponding to the training image according to an image interpolation algorithm and the feature image.
  • in step S402, the terminal device inputs the training image into the image recognition model, and a network layer in the model (for example, a convolutional layer, a pooling layer, or an activation layer) may be used to perform feature extraction on the training image to obtain the corresponding feature image.
  • the terminal device may use the set network layer in the image recognition model to down-sample the training image to obtain a corresponding down-sampled image.
  • the set network layer may be a network layer that is set by the system to implement an image downsampling function, such as a convolution layer, a pooling layer, and the like.
  • the number of the set network layers can be set according to actual requirements, for example, it can be one or more, which is not limited in this application.
  • the model includes 5 convolutional layers.
  • the five convolutional layers in ResNet-50 can be used to sequentially convolve (that is, down-sample) the training image, and the image output by the last convolutional layer is taken as the down-sampled image.
  • the terminal device may further process the down-sampled image according to the weight of the fully connected layer in the image recognition model to obtain the feature image.
  • the resolution of the training image is 224 * 224.
  • the training image is input into ResNet-50 to obtain the down-sampled image output from the fifth convolution layer.
  • the down-sampled image is obtained after the training image is down-sampled 32 times, and the resolution of the down-sampled image is 1/32 of the training image, that is, 7 * 7.
  • the down-sampled image may be represented by a data block of 2048 * 7 * 7, where 2048 represents the number of channels in ResNet-50. It is understandable that with different image recognition models, the number of channels set in the models may also be different, which is not described in detail in this application.
  • the terminal device may determine the weight of the fully connected layer to be used in the image recognition model according to the label information of the training image, which may be represented by a vector or a matrix.
  • the label information is used to indicate a target classification to which an object included in the training image belongs. All the weights in the fully connected layer can be represented by a 2048 * W data block, where W is the total number of classifications that the model supports for identifying objects.
  • the terminal device may select, from the 2048 * W data block and according to the target classification to which the training image belongs, the column of weight data corresponding to the target classification, that is, the weights of the fully connected layer to be used.
  • the weighted summation of the down-sampled image is performed by using the weight of the selected fully connected layer, so as to obtain a 7 * 7 two-dimensional matrix (or a new matrix).
  • the two-dimensional matrix represents the feature image.
  • the terminal device may also perform image interpolation on the feature image using an image interpolation algorithm to obtain the feature intensity image corresponding to the training image.
  • regarding the image interpolation algorithm, this application does not describe it in detail here.
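  • The following sketch reproduces the CAM computation with the numbers used above; the random arrays stand in for the real feature block and fully connected weights, and W = 1000 classes plus target class 42 are arbitrary assumptions.

```python
import numpy as np
import cv2

# 2048*7*7 feature block output by the last convolutional stage.
features = np.random.rand(2048, 7, 7).astype(np.float32)
# 2048*W fully-connected weight block; W is the number of classes.
fc_weights = np.random.rand(2048, 1000).astype(np.float32)

# Select the weight column of the target classification given by the
# label information, then take the weighted sum over channels.
w = fc_weights[:, 42]                            # target class 42 (arbitrary)
feature_img = np.tensordot(w, features, axes=1)  # 7x7 feature image

# Interpolate up to the 224x224 training-image resolution.
intensity_img = cv2.resize(feature_img, (224, 224),
                           interpolation=cv2.INTER_LINEAR)
```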
  • the new image and the training image have the same label information
  • the label information is used to indicate an object included in the image, or a category to which the object included in the image belongs.
  • the label information may be identification information used to characterize / differentiate the pedestrian, such as a pedestrian's name, ID number, and the like.
  • the terminal device may also obtain multiple new images. For how to obtain the new images, refer to related descriptions in the foregoing embodiments, and details are not described herein again. Further, the terminal device may use the plurality of new images to train and update an image recognition model.
  • the terminal device can obtain a training sample set, where the training sample set may include multiple training images and multiple new images corresponding to the multiple training images, one of which is a training image Can correspond to one or more new images. Then, the image recognition model to be trained is obtained, and related parameters such as the learning rate and the number of iterations can be set when the model is trained. Further, the terminal device may use the images in the training sample set to train and update the image recognition model. Regarding how to train and update the image recognition model, this application does not elaborate here.
  • the training image may be an image in a new image used for training / updating the image recognition model last time. That is, during the iterative training process, the training sample set used for each training may be a new image generated by occluding all or part of the images in the training sample set used for training the image recognition model last time. Optionally, all or part of the images in the previous training sample set can also be combined.
  • for example, training is first performed using training image A and training image B to obtain an initial image recognition model.
  • the training image A can be occluded to obtain new images C and D.
  • similarly, by occluding the training image B, new images E and F can be obtained.
  • the terminal device may use the training images A and B and the new images C, D, E, and F to train and update the image recognition model.
  • the terminal device may use the six images of AF as the training images required in the second iterative training process.
  • the six images are occluded to obtain 6 new images.
  • the image recognition model obtained in the first iteration can be trained and updated again.
  • the number of images involved in this example is merely an example and does not constitute a limitation. In the actual training process of the model, the number of training images required far exceeds the number of examples. How to train the image recognition model is not described in detail in this application.
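  • A schematic sketch of this expand-and-retrain loop is shown below; the feature_intensity helper is a placeholder that any of the three methods above could implement, and the retraining step is elided.

```python
import numpy as np

def feature_intensity(img):
    # Placeholder: sliding-window, gradient, or CAM method would go here.
    return np.random.rand(*img.shape[:2])

def occlude_most_important(img, intensity, window=(64, 64), fill=128):
    # Occlude a window centered on the highest-intensity pixel.
    r, c = np.unravel_index(np.argmax(intensity), intensity.shape)
    h, w = window
    out = img.copy()
    out[max(r - h // 2, 0):r + h // 2, max(c - w // 2, 0):c + w // 2] = fill
    return out

def expand_training_set(train_set, rounds=2):
    # Each round occludes every current image to generate new images
    # (which inherit the original labels) and adds them to the set;
    # the model would be retrained on the enlarged set after each round.
    for _ in range(rounds):
        new_imgs = [occlude_most_important(img, feature_intensity(img))
                    for img in train_set]
        train_set = train_set + new_imgs
        # retrain(model, train_set)  # retraining step omitted in this sketch
    return train_set

images = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(2)]  # A, B
expanded = expand_training_set(images)  # 2 -> 4 -> 8 images over two rounds
```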
  • the image recognition model is used for recognizing images and may include, but is not limited to, recurrent neural networks, recursive neural networks, deep neural networks, convolutional neural networks, deep generative models, deep belief networks, generative adversarial networks, or other models for recognizing images.
  • the terminal device may input the image to be processed into a trained image recognition model to obtain a recognition result corresponding to the image to be processed.
  • the recognition results corresponding to the images to be processed may be different.
  • the image to be processed includes an object to be identified, and the recognition result may include an identification classification corresponding to the object and the identification score.
  • the recognition result may be used to indicate whether the image to be processed is a preset image, and so on.
  • the identification score involved in this application may be data (or probability) after normalization processing.
  • for example, a softmax function is included in the image recognition model to implement data normalization, which is not described in detail in this application.
  • the following describes two application scenarios to which this application may be applicable.
  • in the first scenario, trajectory tracking of a target object is performed in massive video (image) data. Specifically, using an image that includes the target object, feature comparison is performed against the massive video (images) to find the target object, and the motion trajectory of the target object is then obtained.
  • once the target object is found, an alarm prompt may be provided immediately, thereby improving the efficiency of image processing and saving time.
  • in the second scenario, the identity of the target object in the video (image) is recognized.
  • the target object can be located and similarly compared based on re-identification technology to achieve identity verification.
  • biometric recognition technologies such as attribute recognition (such as a person's body shape, wear, etc.) and gait recognition (such as a person's walking posture) may also be used to perform identity verification and identification on the target object.
  • the terminal device includes a hardware structure and / or a software module corresponding to each function.
  • the embodiments of the present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the technical solutions of the embodiments of the present invention.
  • the functional units of the terminal device may be divided according to the foregoing method example.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit. It should be noted that the division of the units in the embodiment of the present invention is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 7A illustrates a possible structural diagram of a terminal device involved in the foregoing embodiment.
  • the terminal device 700 includes a processing unit 702 and a communication unit 703.
  • the processing unit 702 is configured to control and manage the actions of the terminal device 700.
  • the processing unit 702 is configured to support the terminal device 700 to perform steps S102-S104 in FIG. 1, steps S202-S206 in FIG. 3, steps S302-S304 in FIG. 5, steps S402-S404 in FIG. 6, and / or use For performing other steps of the techniques described herein.
  • the communication unit 703 is configured to support communication between the terminal device 700 and other devices.
  • the communication unit 703 is configured to support the terminal device 700 to obtain an image (such as a training image, an image to be processed, or a feature intensity image) from a network device, and / or For performing other steps of the techniques described herein.
  • the terminal device 700 may further include a storage unit 701 for storing program code and data of the terminal device 700.
  • the processing unit 702 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that realizes computing functions, for example, a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication unit 703 may be a communication interface, a transceiver, a transceiver circuit, and the like.
  • the communication interface is a collective term and may include one or more interfaces, such as an interface between a network device and other devices.
  • the storage unit 701 may be a memory.
  • the terminal device 700 may further include a display unit (not shown).
  • the display unit may be used for previewing or displaying an image, for example, using the display unit to display a training image, an image to be processed, or a feature intensity image.
  • the display unit may be a display or a player, which is not limited in this application.
  • the terminal device involved in this embodiment of the present invention may be the terminal device shown in FIG. 7B.
  • the terminal device 710 includes: a processor 712, a communication interface 713, and a memory 711.
  • the terminal device 710 may further include a bus 714.
  • the communication interface 713, the processor 712, and the memory 711 may be connected to each other through a bus 714.
  • the bus 714 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus 714 may be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only a thick line is used in FIG. 7B, but it does not mean that there is only one bus or one type of bus.
  • for the terminal device shown in FIG. 7A or FIG. 7B, reference may also be made to the corresponding descriptions of the foregoing method embodiments, and details are not described herein again.
  • the steps of the method or algorithm described in connection with the disclosure of the embodiments of the present invention may be implemented in a hardware manner, or may be implemented in a manner that a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules.
  • Software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may reside in an ASIC.
  • the ASIC can reside in a network device.
  • the processor and the storage medium may also exist as discrete components in the terminal device.
  • the program can be stored in a computer-readable storage medium.
  • the foregoing storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disc.


Abstract

Embodiments of the present invention disclose an image processing method, a related device, and a computer storage medium. The method includes: obtaining a feature intensity image corresponding to a training image, where the intensity value of a pixel in the feature intensity image indicates how important that pixel is for recognizing the training image, and the resolution of the training image is the same as the resolution of the feature intensity image; and occluding, according to the feature intensity image, a region to be occluded in the training image using a preset window to obtain a new image, where the region to be occluded includes pixels to be occluded, and the new image is used to update an image recognition model. The embodiments of the present invention can solve problems in the prior art, such as low model accuracy and poor generalization performance, caused by the limitations of the training data.

Description

Image processing method, related device, and computer storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, a related device, and a computer storage medium.
Background
With the growing public concern for social security and the wide application of urban surveillance networks, large numbers of surveillance cameras are deployed in public places for video (image) monitoring. Pedestrian re-identification is a basic task in video surveillance, which aims to identify whether pedestrian images captured by different cameras belong to the same pedestrian.
At present, pedestrian re-identification tasks are mainly affected by factors such as occlusion, viewpoint changes, and the high similarity of features such as clothing and body shape across different pedestrians, resulting in low accuracy when traditional models are used for pedestrian recognition. Specifically, because the training data used by traditional models is limited, the generalization performance of the models is not high, and the accuracy of pedestrian recognition using such models is not high.
To solve the above problems, existing methods of expanding training data mainly include randomly flipping images, randomly cropping multiple image regions, randomly perturbing the pixel values of images, and the like. In practice, however, it has been found that the existing data expansion methods expand the training data only superficially and cannot effectively improve the accuracy of the model.
Summary of the Invention
Embodiments of the present invention disclose an image processing method, a related device, and a computer storage medium, which can solve the problem in the prior art that the accuracy of a model is not high due to the limitations of the training data.
According to a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
obtaining a feature intensity image corresponding to a training image, where the intensity value of a pixel in the feature intensity image indicates how important that pixel is for recognizing the training image, and the resolution of the training image is the same as the resolution of the feature intensity image; and
occluding, according to the feature intensity image, a region to be occluded in the training image using a preset window to obtain a new image, where the region to be occluded includes pixels to be occluded, and the new image is used to update an image recognition model.
Specifically, the terminal device determines the region to be occluded in the training image according to the feature intensity image, and then occludes the region using a preset window to obtain a new image. The region to be occluded includes one or more pixels to be occluded. The preset window is custom-set on the user side or the system side, and attributes of the preset window, such as its size and shape, are not limited. For example, the preset window may be a rectangular frame, a diamond, a fan (sector) shape, or the like.
In some possible embodiments, occluding, according to the feature intensity image, the region to be occluded in the training image using a preset window to obtain a new image includes: determining mapped pixels according to the intensity values of pixels in the feature intensity image, where a mapped pixel is a pixel in the feature intensity image whose intensity value satisfies a preset condition; and occluding the pixels to be occluded using a preset window to obtain a new image, where a pixel to be occluded is the pixel in the training image corresponding to a mapped pixel.
The preset condition may be custom-set on the user side or the system side. For example, when a larger pixel value indicates that the pixel is more important for image recognition, the preset condition may be that the intensity value is greater than or equal to a first preset intensity, that is, pixels whose intensity values are greater than or equal to the first preset intensity are selected as the mapped pixels. Conversely, when a smaller pixel value indicates that the pixel is more important for image recognition, the preset condition may be that the intensity value is less than or equal to a second preset intensity, that is, pixels whose intensity values are less than or equal to the second preset intensity are selected as the mapped pixels.
In some possible embodiments, the mapped pixels are obtained using a polynomial (multinomial) sampling algorithm. Specifically, the terminal device may determine the mapped pixels from the feature intensity image according to the polynomial sampling algorithm and the intensity value of each pixel in the feature intensity image.
In some possible embodiments, there are multiple mapped pixels, and the pixels to be occluded include any one or more of the pixels in the training image corresponding to the mapped pixels. That is, the pixels to be occluded may correspond one-to-one to the mapped pixels, or the correspondence may not be one-to-one.
In some possible embodiments, obtaining the feature intensity image corresponding to the training image includes:
occluding the training image using a sliding window to obtain m occluded images, where m is a positive integer;
inputting the m occluded images into the image recognition model to obtain respective recognition scores of the m occluded images, where a recognition score reflects how important the region occluded by the sliding window in an occluded image is for recognizing the training image; and
determining the feature intensity image corresponding to the training image according to an image interpolation algorithm and the respective recognition scores of the m occluded images.
The image interpolation algorithm includes, but is not limited to, any one of the following: a bilinear interpolation algorithm, a Lanczos interpolation algorithm, a cubic convolution interpolation algorithm, a nearest-neighbor interpolation algorithm, a piecewise linear interpolation algorithm, and other algorithms used for image interpolation.
In some possible embodiments, determining the feature intensity image corresponding to the training image according to the image interpolation algorithm and the respective recognition scores of the m occluded images includes:
determining the intensity values of m pixels in the feature intensity image according to the respective recognition scores of the m occluded images; and
determining the intensity value of each pixel in the feature intensity image according to the image interpolation algorithm and the intensity values of the m pixels in the feature intensity image, thereby obtaining the feature intensity image.
In some possible embodiments, determining the feature intensity image corresponding to the training image according to the image interpolation algorithm and the respective recognition scores of the m occluded images includes:
determining the intensity values of m pixels in an initial intensity image according to the respective recognition scores of the m occluded images;
determining the intensity value of each pixel in the initial intensity image according to the image interpolation algorithm and the intensity values of the m pixels in the initial intensity image; and
determining the intensity value of each pixel in the feature intensity image according to the recognition score of the training image and the intensity value of each pixel in the initial intensity image, thereby obtaining the feature intensity image, where the recognition score of the training image is obtained by inputting the training image into the image recognition model.
In some possible embodiments, obtaining the feature intensity image corresponding to the training image includes:
inputting the training image into the image recognition model, and performing a forward computation and a backward computation on the training image to obtain a corresponding gradient data block, where the size of the gradient data block is the same as the resolution of the training image; and
determining the feature intensity image corresponding to the training image according to the gradient data block.
In some possible embodiments, obtaining the feature intensity image corresponding to the training image includes:
inputting the training image into the image recognition model, and performing feature extraction on the training image to obtain a corresponding feature image, where the resolution of the feature image is less than the resolution of the training image; and
obtaining the feature intensity image corresponding to the training image according to an image interpolation algorithm and the feature image.
In some possible embodiments, performing feature extraction on the training image to obtain a corresponding feature image includes:
down-sampling the training image to obtain a corresponding down-sampled image, where the resolution of the down-sampled image is the same as the resolution of the feature image; and
processing the down-sampled image according to weights of a fully connected layer in the image recognition model to obtain the feature image.
In some possible embodiments, the to-be-occluded pixels include at least two pixels, among them a first pixel and a second pixel, where the distance between the first pixel and the second pixel is greater than or equal to a preset first distance, and

the occluding the to-be-occluded pixels by using the preset window to obtain the new image includes any one of the following:

occluding the first pixel in the training image by using the preset window, to obtain the new image;

occluding the second pixel in the training image by using the preset window, to obtain the new image; or

occluding the first pixel in the training image by using a preset first window and occluding the second pixel in the training image by using a preset second window, to obtain the new image.

In some possible embodiments, the training image is one of the new images that were last used to update the image recognition model.
In some possible embodiments, there are multiple training images,

the obtaining a feature intensity image corresponding to a training image includes:

obtaining the feature intensity images respectively corresponding to the multiple training images;

the occluding, according to the feature intensity image, a to-be-occluded region in the training image by using a preset window, to obtain a new image includes:

occluding, according to the feature intensity images respectively corresponding to the multiple training images, the respective to-be-occluded regions in the multiple training images by using preset windows, to obtain multiple new images; and

the method further includes:

training and updating the image recognition model according to the multiple new images.

In some possible embodiments, the training image and the new image have the same label information, where the label information indicates an object included in the image, or the class to which the object belongs.
According to a second aspect, an embodiment of the present invention provides another image processing method, where the method includes:

occluding a to-be-occluded region in a training image by using a preset window, to obtain a new image,

where the to-be-occluded region is determined according to a feature intensity image corresponding to the training image, and the new image is used to update an image recognition model.

In some possible embodiments, the to-be-occluded region includes to-be-occluded pixels.

In some possible embodiments, before the occluding a to-be-occluded region in a training image by using a preset window, to obtain a new image, the method further includes: obtaining the feature intensity image corresponding to the training image, where the intensity value of a pixel in the feature intensity image indicates the importance of the pixel for recognizing the training image, and the resolution of the training image is the same as the resolution of the feature intensity image.

For content not shown or not described in this embodiment of the present invention, refer to the descriptions in the method embodiment of the first aspect; details are not repeated here.
According to a third aspect, an embodiment of the present invention provides another image processing method (a model training method), where the method includes:

obtaining multiple new images, where any one of the multiple new images is obtained by occluding a to-be-occluded region in a training image by using a preset window; and

training and updating an image recognition model according to the multiple new images.

In some possible embodiments, the training image is one of the new images that were last used to update the image recognition model.

For content not shown or not described in this embodiment of the present invention, refer to the descriptions in the method embodiment of the first aspect; details are not repeated here.
According to a fourth aspect, an embodiment of the present invention provides another image processing method (a model usage method), where the method includes:

inputting a to-be-processed image into an image recognition model to obtain a recognition result corresponding to the to-be-processed image,

where the image recognition model is trained by using multiple new images, and any one of the multiple new images is obtained by occluding a to-be-occluded region in a training image by using a preset window.

In some possible embodiments, before the inputting a to-be-processed image into an image recognition model, the method further includes: obtaining the image recognition model.

For content not shown or not described in this embodiment of the present invention, refer to the descriptions in the method embodiment of the first aspect; details are not repeated here.
According to a fifth aspect, an embodiment of the present invention provides a terminal device, where the terminal device includes functional units configured to perform the method according to any one of the first aspect to the fourth aspect.

According to a sixth aspect, an embodiment of the present invention provides another terminal device, including a memory and a processor coupled to the memory, where the memory is configured to store instructions and the processor is configured to execute the instructions; when executing the instructions, the processor performs the method described in any one of the first aspect to the fourth aspect.

In some embodiments, the terminal device further includes a display coupled to the processor, where the display is configured to display images (for example, training images and feature intensity images) under the control of the processor.

In some embodiments, the terminal device further includes a communication interface that communicates with the processor, where the communication interface is configured to communicate with other devices (such as a server) under the control of the processor.

According to a seventh aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores program code, and the program code includes instructions for performing the method described in any one of the first aspect to the fourth aspect.

By implementing the embodiments of the present invention, the prior-art problem that the limitations of training data lead to low model accuracy can be resolved, thereby improving model accuracy.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art.

FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention.

FIG. 2A and FIG. 2B are schematic diagrams of several occluded images according to embodiments of the present invention.

FIG. 3 is a schematic flowchart of a method for obtaining a feature intensity image according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of image occlusion according to an embodiment of the present invention.

FIG. 5 and FIG. 6 are schematic flowcharts of two other methods for obtaining a feature intensity image according to embodiments of the present invention.

FIG. 7A and FIG. 7B are schematic structural diagrams of two terminal devices according to embodiments of the present invention.
Detailed Description

The following describes the technical solutions in the embodiments of the present invention in detail with reference to the accompanying drawings.

In the course of making this application, the applicant found that, to expand the training samples used for model training, new training samples are usually obtained by randomly flipping images, randomly cropping multiple image regions, randomly perturbing the pixel values of pixels in images, and the like. In practice, however, the new training samples obtained by these methods do little to improve the accuracy and generalization performance of the model.

To address this, this application proposes an image processing method and a terminal device to which the method applies. Refer to FIG. 1, which shows an image processing method according to an embodiment of the present invention. The method shown in FIG. 1 includes the following steps.
Step S102: A terminal device obtains a feature intensity image corresponding to a training image, where the intensity value of a pixel in the feature intensity image indicates the importance of the pixel for recognizing the training image, and the resolution of the training image is the same as the resolution of the feature intensity image.

The feature intensity image is an image whose pixel values are the intensities (intensity values) with which the objects (or points) in a scene contribute to recognizing that scene. That is, the pixel value of a pixel in the feature intensity image is an intensity value that reflects/indicates how important the pixel is for recognizing the original image corresponding to the feature intensity image (here, the training image).

How the intensity value of a pixel reflects the pixel's importance for recognizing the training image may be customized on the user side or the system side. For example, a larger intensity value may mean that the pixel is more important for recognizing the training image; alternatively, a larger intensity value may mean that the pixel is less important for recognizing the training image, and so on.

Step S104: The terminal device occludes, according to the feature intensity image, a to-be-occluded region in the training image by using a preset window, to obtain a new image, where the to-be-occluded region includes to-be-occluded pixels, and the new image is used to train and update an image recognition model.

The terminal device may determine the to-be-occluded region in the training image according to the feature intensity image, and then occlude the to-be-occluded region by using the preset window to obtain the new image. This is described in detail below.

The preset window is a window customized on the user side or the system side. Attributes of the preset window such as its size and shape are not limited in this application; for example, the preset window may be a rectangular window, a triangular window, a sector window, or a rhombic window.
The following describes some specific embodiments and optional embodiments of this application.

In step S102, the terminal device may obtain the feature intensity image corresponding to the training image in multiple ways. For example, the terminal device may obtain the feature intensity image corresponding to the training image based on a preset feature-intensity-image acquisition method; alternatively, the terminal device may directly obtain, over a network from another device (such as a server), the training image and the feature intensity image corresponding to the training image. The feature-intensity-image acquisition method may be customized on the user side or the system side, and may include but is not limited to: obtaining the feature intensity image corresponding to the training image based on sliding-window occlusion, based on gradients, based on a class activation mapping (CAM) algorithm, or by another implementation for obtaining a feature intensity image. How to obtain the feature intensity image corresponding to the training image based on sliding-window occlusion, based on gradients, and based on CAM is described in detail below in this application and is not elaborated here.
In step S104, the terminal device may determine the to-be-occluded region in the training image according to the intensity value of each pixel in the feature intensity image. The to-be-occluded region includes one or more to-be-occluded pixels. The terminal device then occludes the to-be-occluded region in the training image by using the preset window to obtain the new image, where the new image is used to train and update the image recognition model.

Specifically, the terminal device may determine, from the feature intensity image according to a preset rule, the pixels whose intensity values satisfy a preset condition as mapped pixels, and then determine the pixels in the training image that correspond to the mapped pixels as the to-be-occluded pixels. A region formed by multiple to-be-occluded pixels may be referred to as a to-be-occluded region. The preset rule is customized on the user side or the system side and is associated with the preset condition. For example, when a larger intensity value means that a pixel is more important for image recognition, pixels whose intensity values are greater than or equal to a first preset intensity may be selected from the feature intensity image as the mapped pixels. Conversely, when a smaller intensity value means that a pixel is more important for image recognition, pixels whose intensity values are less than or equal to a second preset intensity may be selected from the feature intensity image as the mapped pixels (a selection of this kind is sketched in the code below).

The first preset intensity and the second preset intensity may be intensity thresholds customized on the user side or the system side; they may be the same or different, which is not limited in this application.

The number of mapped pixels is not limited in this application and may be one or more. Correspondingly, the number of to-be-occluded pixels may also be one or more, and the to-be-occluded pixels may or may not be in one-to-one correspondence with the mapped pixels. That is, the to-be-occluded pixels may be all of the pixels in the training image that correspond to the mapped pixels, or any one or more of them.
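As a concrete illustration of the threshold-based preset rule described above, the following Python sketch selects mapped-pixel coordinates from an intensity map. The function name select_mapped_pixels and the boolean switch are illustrative assumptions, not terms from the patent.

```python
import numpy as np

def select_mapped_pixels(intensity_map: np.ndarray, threshold: float,
                         higher_is_important: bool = True) -> np.ndarray:
    """Return (row, col) coordinates of pixels whose intensity satisfies the
    preset condition: >= threshold when larger values mean more important,
    <= threshold when smaller values mean more important."""
    if higher_is_important:
        mask = intensity_map >= threshold
    else:
        mask = intensity_map <= threshold
    return np.argwhere(mask)  # shape (k, 2): one (row, col) pair per mapped pixel
```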
In some embodiments, the terminal device may also use a set algorithm, combined with the intensity value of each pixel in the feature intensity image, to obtain the mapped pixels from the feature intensity image. The set algorithm is customized on the user side or the system side and is used to obtain, from the feature intensity image, mapped pixels that satisfy the preset condition.
The following takes a multinomial sampling algorithm as the set algorithm and explains in detail how the mapped pixels are determined from the feature intensity image. In this example, the feature intensity image consists of n pixels. The intensity value of a pixel is denoted Q_i, where i is a positive integer less than or equal to n, and a larger Q_i means the pixel is more important for image recognition.

In a specific implementation, the terminal device may first normalize the intensity values of the n pixels by using the following formula (1), to obtain intensity values R_i expressed as probabilities. To some extent, the probability value also represents the probability, or the priority, with which the pixel is selected as a mapped pixel:

$$R_i = \frac{Q_i}{\sum_{k=1}^{n} Q_k} \qquad (1)$$
Next, the terminal device may select, according to the intensity values R_i of the pixels, a pixel j that satisfies the preset condition from the n pixels as a mapped pixel. For example, the terminal device may accumulate the R_i by using the following formula (2):

$$s_j = \sum_{i=1}^{j} R_i \qquad (2)$$

During multinomial sampling, a random number r uniformly distributed on [0, 1] is drawn. If r ≤ s_1, then j = 1 is returned, that is, the first pixel is sampled as the mapped pixel. If s_{j-1} < r ≤ s_j, then j is returned, that is, the j-th pixel is sampled as the mapped pixel.

It can be understood that, when the terminal device needs multiple mapped pixels, it may perform sampling multiple times according to the above multinomial sampling principle to obtain the multiple mapped pixels.
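A minimal Python sketch of this multinomial sampling procedure, implementing formulas (1) and (2); the helper name sample_mapped_pixel is an illustrative assumption:

```python
import numpy as np

def sample_mapped_pixel(intensity_map: np.ndarray, rng=None) -> tuple:
    """Sample one mapped pixel via multinomial sampling: normalize the
    intensities Q_i into probabilities R_i (formula (1)), build the
    cumulative sums s_j (formula (2)), draw r ~ U(0, 1), and return the
    first index j with s_{j-1} < r <= s_j."""
    rng = np.random.default_rng() if rng is None else rng
    q = intensity_map.ravel().astype(np.float64)
    r_i = q / q.sum()               # formula (1)
    s = np.cumsum(r_i)              # formula (2)
    r = rng.uniform(0.0, 1.0)
    j = int(np.searchsorted(s, r))  # first j with s[j] >= r
    j = min(j, q.size - 1)          # guard against floating-point round-off
    return np.unravel_index(j, intensity_map.shape)

# Calling sample_mapped_pixel repeatedly yields multiple mapped pixels.
```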
In some embodiments, the terminal device may occlude the to-be-occluded region in the training image by using the preset window to obtain the new image. Optionally, the resolution (or size) of the to-be-occluded region is greater than or equal to a preset resolution (or size).

For example, as shown in FIG. 2A, taking a 64*64 rectangular box as the preset window, the terminal device may slide the 64*64 rectangular box over the to-be-occluded region in the training image to obtain multiple new images. In the figure, limited by the size of the to-be-occluded region, two new images are obtained; this is merely an example and not a limitation.
In some embodiments, the to-be-occluded region includes to-be-occluded pixels, and the terminal device may occlude the to-be-occluded pixels in the training image by using the preset window to obtain the new image.

For example, as shown in FIG. 2B, taking a 64*64 rectangular box as the preset window and a single to-be-occluded pixel A as an example, the terminal device may occlude the training image with the 64*64 rectangular box centered on the to-be-occluded pixel A to obtain a new image. In practical applications, the pixel values of the pixels in the image region occluded by the preset window may be represented or replaced by a preset pixel value, for example a gray value, 0, or 255; the pixel values of the pixels in the training image that are not occluded by the preset window remain unchanged. A new image is thereby formed/obtained.
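The occlusion step itself can be sketched as follows in Python; occlude_at and its defaults (a 64-pixel window and fill value 0) are illustrative assumptions consistent with the example above:

```python
import numpy as np

def occlude_at(image: np.ndarray, center: tuple, window: int = 64,
               fill_value: int = 0) -> np.ndarray:
    """Return a copy of `image` with a window x window square centered on
    `center` set to `fill_value` (e.g. 0 or 255). The window is clipped at
    the image borders; all other pixels keep their original values."""
    out = image.copy()
    cy, cx = center
    half = window // 2
    y0, y1 = max(cy - half, 0), min(cy + half, image.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, image.shape[1])
    out[y0:y1, x0:x1, ...] = fill_value  # works for H x W and H x W x C images
    return out
```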
In some embodiments, the number of to-be-occluded pixels is not limited in this application and may be one or more. When there are multiple to-be-occluded pixels, the terminal device may occlude the multiple to-be-occluded pixels by using the preset window to obtain the new image. Alternatively, the terminal device may occlude only some of the multiple to-be-occluded pixels by using the preset window, leaving the remaining pixels unoccluded, to obtain the new image.

In a specific implementation, when any two of the multiple to-be-occluded pixels are close to each other, the terminal device may occlude the multiple to-be-occluded pixels together with a single preset window to obtain the new image. When at least two of the multiple to-be-occluded pixels are far apart, the terminal device may occlude the multiple to-be-occluded pixels with multiple preset windows respectively to obtain the new image.

For example, suppose the multiple to-be-occluded pixels include at least two pixels, a first pixel and a second pixel. If the first pixel and the second pixel are close to each other, for example their distance is less than or equal to a preset distance (such as 5 centimeters), one preset window may occlude the first pixel and the second pixel simultaneously to obtain the new image.

If the first pixel and the second pixel are far apart, that is, the distance between them is greater than or equal to the preset distance, the terminal device may occlude either one of the first pixel and the second pixel by using the preset window to obtain the new image. Alternatively, the terminal device may occlude the first pixel and the second pixel with two preset windows respectively to obtain the new image; for example, a preset first window may occlude the first pixel while a preset second window occludes the second pixel, thereby obtaining the new image. The preset first window and the preset second window may be the same or different, which is not limited in this application.
The following describes three specific implementations of obtaining, in S102, the feature intensity image corresponding to the training image.

In a first implementation, the feature intensity image corresponding to the training image is obtained based on a sliding window. FIG. 3 shows a schematic flowchart of a method for obtaining a feature intensity image based on sliding-window occlusion. The method shown in FIG. 3 includes the following steps.

Step S202: The terminal device occludes the training image by using a sliding window to obtain m occluded images, where m is a positive integer.

The sliding window may be a window customized on the user side or the system side; attributes such as its size and shape are not limited.

Step S204: The terminal device inputs the m occluded images into the image recognition model to obtain a recognition score of each of the m occluded images.

Step S206: The terminal device determines, according to an image interpolation algorithm and the recognition scores of the m occluded images, the feature intensity image corresponding to the training image.

In step S202, the terminal device may occlude the training image m times at random positions by using the sliding window to obtain the m occluded images. Alternatively, the terminal device may traverse and occlude the training image with the sliding window to obtain the m occluded images.

Specifically, the sliding window may move across the training image along a set path. The set path may be customized on the user side or the system side, for example moving with a fixed stride (such as 1 pixel) from left to right and from top to bottom. The fixed strides of the sliding window in the horizontal and vertical directions may be the same or different, which is not limited in this application. To improve efficiency, a larger fixed stride, for example 10 pixels, may be used.

For example, FIG. 4 shows a schematic diagram of traversing and occluding a training image with a sliding window. As shown in FIG. 4, using a rectangular box as the sliding window and traversing the training image from left to right and from top to bottom yields m occluded images; a code sketch of this traversal follows.
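One way to realize the traversal in Python (a sketch; the 64-pixel window and 10-pixel stride are example values consistent with the text above):

```python
import numpy as np

def sliding_occlusions(image: np.ndarray, window: int = 64, stride: int = 10,
                       fill_value: int = 0):
    """Traverse the image left to right, top to bottom with a square sliding
    window, yielding (top, left, occluded_image) for each of the m positions."""
    h, w = image.shape[:2]
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            occluded = image.copy()
            occluded[top:top + window, left:left + window, ...] = fill_value
            yield top, left, occluded
```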
In step S204, the terminal device may input the m occluded images into a trained image recognition model to obtain the recognition score of each of the m occluded images. The image recognition model may be the image recognition model trained in the previous round, or the initial image recognition model obtained from the first round of training. The training (or iterative training) of the image recognition model is described in detail later in this application.

Optionally, the terminal device may also input the training image itself into the image recognition model to obtain the recognition score of the training image, for later use in obtaining the feature intensity image corresponding to the training image.
In step S206, the terminal device may determine the feature intensity image of the training image according to the image interpolation algorithm and the recognition scores of the m occluded images. There are several implementations.

In some embodiments, the terminal device may take the recognition scores of the m occluded images as the recognition scores of m occluded regions in the training image. An occluded region is the region in an occluded image that is covered by the sliding window. The recognition score reflects the importance of the occluded region for recognizing the training image.

As shown in FIG. 4, occluding the training image at different positions with the sliding window yields m occluded images, that is, m occluded regions. Correspondingly, the recognition scores subsequently obtained for the m occluded images are the recognition scores of the m occluded regions. A larger recognition score for an occluded region means the region is less important for recognizing the training image; conversely, a smaller recognition score means the region is more important for recognizing the training image.

Further, the terminal device may determine, according to the recognition scores of the m occluded regions, the intensity values of m pixels in the feature intensity image.

Specifically, the terminal device may treat an occluded region as a single pixel, or treat the center point of the occluded region as a pixel. Correspondingly, the recognition score of the occluded region is used directly as the intensity value of that pixel; or the recognition score of the occluded region is preprocessed and the result is used as the intensity value of that pixel. The preprocessing is customized on the user side or the system side, for example normalization or scaling by a preset ratio, and is not elaborated or limited in this application.

Likewise, the terminal device may determine, according to the recognition scores of the m occluded regions in the training image, the intensity values of m pixels in the feature intensity image corresponding to the training image.

Further, the terminal device may obtain, according to the image interpolation algorithm and the intensity values of the m pixels in the feature intensity image, the intensity value of each pixel in the feature intensity image, thereby obtaining the feature intensity image.

Specifically, the terminal device may perform image interpolation by using the image interpolation algorithm and the intensity values of the m pixels in the feature intensity image, to obtain the intensity value of every pixel constituting the feature intensity image, thereby obtaining the feature intensity image. The image interpolation algorithm may be customized in advance on the user side or the system side, for example bilinear interpolation, Lanczos interpolation, cubic convolution interpolation, nearest-neighbor interpolation, piecewise linear interpolation, or another algorithm used for image interpolation. How image interpolation is performed with an image interpolation algorithm to obtain the intensity value of each pixel in an image is not elaborated in this application.

Taking the case where the recognition scores are used directly as pixel intensity values as an example, in actual processing the terminal device may arrange the recognition scores of the m occluded regions (that is, m recognition scores) into a two-dimensional matrix. Since the size of this matrix is smaller than the resolution of the training image, the data in the matrix is interpolated by using the image interpolation algorithm to obtain a new matrix with the same size as the resolution of the training image. The new matrix represents the feature intensity image corresponding to the training image; a code sketch of this upsampling step follows.
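As an illustration, the interpolation step can be written with OpenCV's bilinear resize, one of the interpolation algorithms named above; cv2 is used here only as a convenient stand-in, not as the patent's prescribed library:

```python
import cv2
import numpy as np

def scores_to_intensity_image(scores_2d: np.ndarray,
                              out_h: int, out_w: int) -> np.ndarray:
    """Upsample the 2-D grid of m occlusion scores to the training-image
    resolution (out_h x out_w) with bilinear interpolation, producing one
    intensity value per pixel of the feature intensity image."""
    return cv2.resize(scores_2d.astype(np.float32), (out_w, out_h),
                      interpolation=cv2.INTER_LINEAR)
```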
It should be noted that, in this embodiment, a larger intensity value of a pixel in the feature intensity image means the pixel is less important for recognizing the training image; correspondingly, a smaller intensity value means the pixel is more important for recognizing the training image.
In some other embodiments, the terminal device may determine the intensity values of m pixels in an initial intensity image according to the recognition scores of the m occluded images, and then perform image interpolation by using the image interpolation algorithm and the intensity values of the m pixels in the initial intensity image, to obtain the intensity value of each pixel in the initial intensity image, thereby obtaining the initial intensity image. For the specific implementation of obtaining the initial intensity image, refer to the related descriptions in the foregoing embodiments; details are not repeated here.

Further, the terminal device may obtain, according to the intensity value of each pixel in the initial intensity image, the intensity value of each pixel in the feature intensity image, thereby obtaining the feature intensity image.

Specifically, taking the case where the recognition scores are the pixel intensity values as an example, the terminal device may obtain the intensity values of m pixels in the initial intensity image according to the recognition scores of the m occluded images, and then perform image interpolation by using the image interpolation algorithm and the intensity values of the m pixels in the initial intensity image (that is, the m recognition scores), to obtain the initial intensity image. In this initial intensity image, a larger intensity value (recognition score) of a pixel means the pixel is less important for recognizing the training image; that is, the intensity value of a pixel is inversely proportional to the importance it reflects. For this reason, the terminal device may further process the intensity values of the pixels in the initial intensity image to obtain the intensity values of the pixels in the feature intensity image, thereby obtaining the feature intensity image, in which the intensity value of a pixel is directly proportional to the importance it reflects. That is, in the feature intensity image, a larger intensity value means the pixel is more important for recognizing the training image, and a smaller intensity value means the pixel is less important.

For example, the terminal device may determine, according to the intensity value of each pixel in the initial intensity image, the target pixel with the largest intensity value in the initial intensity image, and then subtract the intensity value of each pixel in the initial intensity image from the intensity value of the target pixel, to obtain the intensity value of each pixel in the feature intensity image, thereby obtaining the feature intensity image.
As another example, the terminal device may determine the intensity value of each pixel in the feature intensity image by using the recognition score of the training image and the intensity value of each pixel in the initial intensity image, thereby obtaining the feature intensity image. In a specific implementation, the intensity value of each pixel in the feature intensity image may be obtained by using the following formula (3):

$$t_i = p_0 - p_i, \quad i = 1, \ldots, N \qquad (3)$$

where t_i is the intensity value of the i-th pixel in the feature intensity image, p_0 is the recognition score of the training image, p_i is the intensity value (recognition score) of the i-th pixel in the initial intensity image, i is a positive integer less than or equal to N, and N is the total number of pixels in the initial intensity image. Intuitively, the more the recognition score drops when a region is occluded, the more important that region is, and the larger the resulting intensity value.
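In code, formula (3) as reconstructed above is a single vectorized subtraction (a numpy sketch; the name feature_intensity is illustrative):

```python
import numpy as np

def feature_intensity(p0: float, initial_intensity: np.ndarray) -> np.ndarray:
    """Formula (3): t_i = p0 - p_i. A small p_i (large score drop under
    occlusion) maps to a large, hence important, intensity value t_i."""
    return p0 - initial_intensity
```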
In a second implementation, the feature intensity image corresponding to the training image is obtained based on gradients. Refer to FIG. 5, which is a schematic flowchart of a method for obtaining a feature intensity image based on gradients according to an embodiment of the present invention. The method shown in FIG. 5 includes the following steps.

Step S302: The terminal device inputs the training image into the image recognition model and performs a forward operation and a backward operation on the training image to obtain a corresponding gradient data block, where the size of the gradient data block is the same as the resolution of the training image.

Step S304: The terminal device determines, according to the gradient data block, the feature intensity image corresponding to the training image.

In step S302, the terminal device may input the training image into the image recognition model and forward-propagate the training image to obtain the recognition score corresponding to the training image. The obtained recognition score is then back-propagated to obtain the corresponding gradient data block. Typically, the gradient data block is represented by a C*H*W matrix block, where C is the number of channels and H and W are usually the height and width of the training image.

Finally, in S304, the terminal device performs an operation defined by a set rule on the gradient data block to obtain the feature intensity image corresponding to the training image. The set rule is an operation rule customized on the user side or the system side, for example a weighted sum, or averaging the gradient data block along the channel dimension; the resulting new matrix/data block represents the feature intensity image.

For example, suppose the training image is an RGB image, which can be represented by a 3*H*W data block. The terminal device may input the training image into the image recognition model and forward-propagate it to obtain the recognition score corresponding to the training image. The recognition score of the training image is then propagated backward to obtain a 3*H*W gradient data block, where 3 is the number of channels and 3*H*W can be understood as three H*W two-dimensional matrices. Further, the terminal device may average the gradient data block along the channel dimension to obtain an H*W new matrix. The new matrix represents the feature intensity image corresponding to the training image; a code sketch of this computation follows.
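A PyTorch sketch of the forward/backward computation, assuming a differentiable classifier; the names gradient_intensity and class_idx are illustrative assumptions:

```python
import torch

def gradient_intensity(model: torch.nn.Module, image: torch.Tensor,
                       class_idx: int) -> torch.Tensor:
    """Forward-propagate to get a recognition score, back-propagate it to
    get a C x H x W gradient data block shaped like the input, then average
    along the channel dimension to obtain an H x W intensity map."""
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)  # 1 x C x H x W input
    score = model(x)[0, class_idx]               # recognition score (forward)
    score.backward()                             # backward operation
    grad = x.grad[0]                             # C x H x W gradient block
    return grad.mean(dim=0)                      # H x W feature intensity map
```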
In a third implementation, the feature intensity image corresponding to the training image is obtained based on CAM. Refer to FIG. 6, which is a schematic flowchart of a method for obtaining a feature intensity image based on CAM according to an embodiment of the present invention. The method shown in FIG. 6 includes the following steps.

Step S402: The terminal device inputs the training image into the image recognition model and performs feature extraction on the training image to obtain a feature image, where the resolution of the feature image is lower than the resolution of the training image.

Step S404: The terminal device obtains, according to an image interpolation algorithm and the feature image, the feature intensity image corresponding to the training image.

In step S402, the terminal device inputs the training image into the image recognition model and may perform feature extraction on the training image by using the network layers inside the model (for example convolutional layers, pooling layers, and activation layers) to obtain the corresponding feature image. A specific implementation of S402 is described below as an example.

In some embodiments, the terminal device may downsample the training image by using set network layers in the image recognition model to obtain the corresponding downsampled image.

A set network layer may be a network layer customized on the system side to implement image downsampling, for example a convolutional layer or a pooling layer. The number of set network layers may be set according to actual requirements, for example one or more, which is not limited in this application.

For example, take the neural network ResNet-50 as the image recognition model; this model includes five convolutional stages. Correspondingly, after inputting the training image into ResNet-50, the terminal device may convolve (that is, downsample) the training image successively through the five convolutional stages of ResNet-50 and take the image output by the last convolutional stage as the downsampled image.

Further, the terminal device may process the downsampled image according to the weights of the fully connected layer in the image recognition model to obtain the feature image.

Take object recognition (that is, classifying the objects included in an image) as the image recognition task, and suppose the resolution of the training image is 224*224. The training image is input into ResNet-50, and the downsampled image output by the fifth convolutional stage is obtained. This downsampled image results from downsampling the training image by a factor of 32, so its resolution is 1/32 that of the training image, namely 7*7. During data processing, the downsampled image may be represented by a 2048*7*7 data block, where 2048 is the number of channels in ResNet-50. It can be understood that different image recognition models may have different numbers of channels, which is not elaborated in this application.

Further, the terminal device may determine, according to the label information of the training image, the weights of the fully connected layer to be used, which may be represented by a vector or a matrix. The label information indicates the target class to which the object included in the training image belongs. All the weights in the fully connected layer can be represented by a 2048*W data block, where W is the total number of object classes the model supports. According to the target class of the training image, the terminal device may select, from the 2048*W data block, the column of weight data 2048*1 for that target class, which is the weights of the fully connected layer to be used. The selected fully connected layer weights are then used to compute a weighted sum over the downsampled image, yielding a 7*7 two-dimensional matrix (a new matrix). This two-dimensional matrix represents the feature image.

In step S404, since the resolution of the feature image is lower than the resolution of the training image, the terminal device may further perform image interpolation on the feature image by using an image interpolation algorithm to obtain the feature intensity image corresponding to the training image. How the feature intensity image is obtained with an image interpolation algorithm is not elaborated here; a combined sketch of the weighted sum and the upsampling follows.
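A PyTorch sketch of steps S402 and S404 combined; note that PyTorch stores fully connected weights as (num_classes, channels), the transpose of the 2048*W layout described above, and the names used here are illustrative:

```python
import torch
import torch.nn.functional as F

def cam_intensity(features: torch.Tensor, fc_weight: torch.Tensor,
                  class_idx: int, out_size: tuple) -> torch.Tensor:
    """CAM: weight the last conv features (C x h x w, e.g. 2048 x 7 x 7)
    by the fully connected weights of the target class, sum over channels
    to obtain the h x w feature image (S402), then bilinearly upsample it
    to the training-image resolution (S404)."""
    w = fc_weight[class_idx]                        # (C,) weights of target class
    cam = (w[:, None, None] * features).sum(dim=0)  # h x w weighted sum
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode='bilinear', align_corners=False)
    return cam[0, 0]                                # H x W feature intensity image
```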
The following describes some optional embodiments of this application.

In some embodiments, the new image and the training image have the same label information, where the label information indicates the object included in the image or the class to which that object belongs. Taking a pedestrian as the object, the label information may be identification information used to characterize/distinguish the pedestrian, for example the pedestrian's name or ID number.

In some embodiments, the terminal device may further obtain multiple new images; for how the new images are obtained, refer to the related descriptions in the foregoing embodiments, which are not repeated here. Further, the terminal device may train and update the image recognition model by using the multiple new images.

Specifically, before training the image recognition model, the terminal device may obtain a training sample set, which may include multiple training images and the multiple new images corresponding to them, where one training image may correspond to one or more new images. The terminal device then obtains the image recognition model to be trained, and may also set the relevant parameters used in model training, for example the learning rate and the number of iterations. Further, the terminal device may train and update the image recognition model by using the images in the training sample set. How the image recognition model is trained and updated is not elaborated here.

Optionally, during iterative training, the training image may be one of the new images last used to train/update the image recognition model. That is, during iterative training, the training sample set used in each round may consist of the new images produced by occluding all or some of the images in the training sample set used to train the image recognition model in the previous round; optionally, all or some of the images from the previous round's training sample set may also be included.

For example, take two rounds of iterative training, and suppose an initial image recognition model is first trained with training images A and B. Following the method for obtaining new images described above, training image A may be occluded to obtain new images C and D, and training image B may be occluded to obtain new images E and F. In the first round of iterative training, the terminal device may train and update the image recognition model with training images A and B and new images C, D, E, and F. In the second round, the terminal device may take all six images A to F as the training images for that round and, likewise based on the method for obtaining new images described above, occlude each of the six images to obtain six new images. With the six new images and the original six training images (twelve in total), the image recognition model obtained in the first round can be trained and updated again. The numbers of images in this example are merely illustrative and not limiting; in actual model training, far more training images are used. How the image recognition model is trained is not elaborated in this application. A sketch of one such augmentation round follows.
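The iterative scheme can be summarized with the following Python sketch; build_intensity, occlude_from_intensity, and train are hypothetical placeholders for the routines described in this document, not real APIs:

```python
def augmentation_round(model, images, labels):
    """One round of iterative training: derive a feature intensity image for
    every training image, occlude it to synthesize new images that inherit
    the same label, then train on the union of old and new samples."""
    new_images, new_labels = [], []
    for img, lab in zip(images, labels):
        intensity = build_intensity(model, img)      # any of the three methods above
        for aug in occlude_from_intensity(img, intensity):
            new_images.append(aug)                   # occluded copies
            new_labels.append(lab)                   # label information is shared
    images, labels = images + new_images, labels + new_labels
    model = train(model, images, labels)             # train and update the model
    return model, images, labels
```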
In some embodiments, the image recognition model is used to recognize images, and may include but is not limited to recurrent neural networks, recursive neural networks, deep neural networks, convolutional neural networks, deep generative models, deep belief networks, generative adversarial networks, or other models used for image recognition.

In some embodiments, the terminal device may input a to-be-processed image into the trained image recognition model to obtain a recognition result corresponding to the to-be-processed image.

Specifically, the recognition result corresponding to the to-be-processed image may differ across application scenarios. For example, in an object classification scenario, the to-be-processed image includes an object to be recognized, and the recognition result may include the recognized class of the object and the recognition score. As another example, in a scenario of determining whether an image is a preset image, the recognition result may indicate whether the to-be-processed image is the preset image, and so on.

In some embodiments, the recognition scores involved in this application may be normalized data (or probabilities). Specifically, a softmax function is designed in the image recognition model to normalize the data, which is not elaborated here; a minimal sketch follows.
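A minimal, numerically stable softmax of the kind used to normalize the recognition scores (a numpy sketch):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Normalize raw recognition scores into probabilities that sum to 1."""
    z = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```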
In some embodiments, two application scenarios to which this application may apply are described below.

In one scenario, the trajectory of a target object is tracked across massive volumes of video (images). Specifically, an image that includes the target object is used for feature comparison against the massive video (images) to find the target object in the video, and the motion trajectory of the target object is then obtained. Optionally, an alert may be raised immediately once the target object is found, which improves the efficiency of image processing and saves time.

In another scenario, the identity of a target object in video (images) is recognized. In particular scenarios, for example when a person faces away from the camera, is seen from the side, or the face is blurred, re-identification technology can be used to locate the target object and compare similarities to verify identity. Optionally, biometric recognition technologies such as attribute recognition (for example a person's build and clothing) and gait recognition (the posture of a person walking) may also be used to verify and recognize the identity of the target object.
By implementing the embodiments of the present invention, prior-art problems such as low model accuracy or poor generalization caused by the limitations of training data can be resolved.

The foregoing mainly describes the solutions provided in the embodiments of the present invention from the perspective of the terminal device implementing model training. It can be understood that, to implement the foregoing functions, the terminal device includes corresponding hardware structures and/or software modules for performing the functions. With reference to the units and algorithm steps of the examples described in the embodiments disclosed in the present invention, the embodiments of the present invention can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the technical solutions of the embodiments of the present invention.

In the embodiments of the present invention, the terminal device may be divided into functional units according to the foregoing method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present invention is illustrative and is merely a logical function division; there may be other division manners in actual implementation.

When integrated units are used, FIG. 7A shows a possible schematic structural diagram of the terminal device involved in the foregoing embodiments. The terminal device 700 includes a processing unit 702 and a communication unit 703. The processing unit 702 is configured to control and manage the actions of the terminal device 700. For example, the processing unit 702 is configured to support the terminal device 700 in performing steps S102-S104 in FIG. 1, steps S202-S206 in FIG. 3, steps S302-S304 in FIG. 5, and steps S402-S404 in FIG. 6, and/or other steps of the technologies described herein. The communication unit 703 is configured to support communication between the terminal device 700 and other devices; for example, the communication unit 703 is configured to support the terminal device 700 in obtaining images (such as training images, to-be-processed images, or feature intensity images) from a network device, and/or to perform other steps of the technologies described herein. Optionally, the terminal device 700 may further include a storage unit 701 configured to store the program code and data of the terminal device 700.

The processing unit 702 may be a processor or controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 703 may be a communication interface, a transceiver, a transceiver circuit, or the like, where "communication interface" is a collective term that may include one or more interfaces, for example an interface between a network device and another device. The storage unit 701 may be a memory.

Optionally, the terminal device 700 may further include a display unit (not shown). The display unit may be configured to preview or display images, for example to display training images, to-be-processed images, or feature intensity images. In practice, the display unit may be a display, a player, or the like, which is not limited in this application.
When the processing unit 702 is a processor, the communication unit 703 is a communication interface, and the storage unit 701 is a memory, the terminal device involved in the embodiments of the present invention may be the terminal device shown in FIG. 7B.

As shown in FIG. 7B, the terminal device 710 includes a processor 712, a communication interface 713, and a memory 711. Optionally, the terminal device 710 may further include a bus 714. The communication interface 713, the processor 712, and the memory 711 may be connected to one another through the bus 714. The bus 714 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 714 may be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 7B, but this does not mean there is only one bus or one type of bus.

For the specific implementation of the terminal device shown in FIG. 7A or FIG. 7B, reference may also be made to the corresponding descriptions of the foregoing method embodiments; details are not repeated here.

The steps of the methods or algorithms described in connection with the disclosure of the embodiments of the present invention may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from and write information to the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in a network device. Of course, the processor and the storage medium may also exist as discrete components in a terminal device.

A person of ordinary skill in the art may understand that all or some of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (28)

  1. An image processing method, wherein the method comprises:
    obtaining a feature intensity image corresponding to a training image, wherein an intensity value of a pixel in the feature intensity image indicates the importance of the pixel for recognizing the training image, and a resolution of the training image is the same as a resolution of the feature intensity image; and
    occluding, according to the feature intensity image, a to-be-occluded region in the training image by using a preset window, to obtain a new image, wherein the to-be-occluded region comprises to-be-occluded pixels, and the new image is used to update an image recognition model.
  2. The method according to claim 1, wherein the occluding, according to the feature intensity image, a to-be-occluded region in the training image by using a preset window, to obtain a new image comprises:
    determining mapped pixels according to intensity values of pixels in the feature intensity image, wherein the mapped pixels are pixels in the feature intensity image whose intensity values satisfy a preset condition; and
    occluding the to-be-occluded pixels by using the preset window to obtain the new image, wherein the to-be-occluded pixels are pixels in the training image that correspond to the mapped pixels.
  3. The method according to claim 2, wherein the mapped pixels are obtained by using a multinomial sampling algorithm.
  4. The method according to any one of claims 1 to 3, wherein the obtaining a feature intensity image corresponding to a training image comprises:
    occluding the training image by using a sliding window to obtain m occluded images, wherein m is a positive integer;
    inputting the m occluded images into the image recognition model to obtain a recognition score of each of the m occluded images, wherein the recognition score reflects the importance, for recognizing the training image, of the region occluded by the sliding window in the occluded image; and
    determining, according to an image interpolation algorithm and the recognition scores of the m occluded images, the feature intensity image corresponding to the training image.
  5. The method according to claim 4, wherein the determining, according to an image interpolation algorithm and the recognition scores of the m occluded images, the feature intensity image corresponding to the training image comprises:
    determining intensity values of m pixels in the feature intensity image according to the recognition scores of the m occluded images; and
    determining an intensity value of each pixel in the feature intensity image according to the image interpolation algorithm and the intensity values of the m pixels in the feature intensity image, thereby obtaining the feature intensity image.
  6. The method according to claim 4, wherein the determining, according to an image interpolation algorithm and the recognition scores of the m occluded images, the feature intensity image corresponding to the training image comprises:
    determining intensity values of m pixels in an initial intensity image according to the recognition scores of the m occluded images;
    determining an intensity value of each pixel in the initial intensity image according to the image interpolation algorithm and the intensity values of the m pixels in the initial intensity image; and
    determining an intensity value of each pixel in the feature intensity image according to a recognition score of the training image and the intensity value of each pixel in the initial intensity image, thereby obtaining the feature intensity image, wherein the recognition score of the training image is obtained by inputting the training image into the image recognition model.
  7. The method according to any one of claims 1 to 4, wherein the obtaining a feature intensity image corresponding to a training image comprises:
    inputting the training image into the image recognition model, and performing a forward operation and a backward operation on the training image to obtain a corresponding gradient data block, wherein a size of the gradient data block is the same as the resolution of the training image; and
    determining, according to the gradient data block, the feature intensity image corresponding to the training image.
  8. The method according to any one of claims 1 to 4, wherein the obtaining a feature intensity image corresponding to a training image comprises:
    inputting the training image into the image recognition model, and performing feature extraction on the training image to obtain a corresponding feature image, wherein a resolution of the feature image is lower than the resolution of the training image; and
    obtaining, according to an image interpolation algorithm and the feature image, the feature intensity image corresponding to the training image.
  9. The method according to claim 8, wherein the performing feature extraction on the training image to obtain a corresponding feature image comprises:
    downsampling the training image to obtain a corresponding downsampled image, wherein a resolution of the downsampled image is the same as the resolution of the feature image; and
    processing the downsampled image according to weights of a fully connected layer in the image recognition model to obtain the feature image.
  10. The method according to any one of claims 2 to 9, wherein the to-be-occluded pixels comprise at least two pixels, including a first pixel and a second pixel, and a distance between the first pixel and the second pixel is greater than or equal to a preset first distance, and
    the occluding the to-be-occluded pixels by using the preset window to obtain the new image comprises any one of the following:
    occluding the first pixel in the training image by using the preset window, to obtain the new image;
    occluding the second pixel in the training image by using the preset window, to obtain the new image; or
    occluding the first pixel in the training image by using a preset first window, and occluding the second pixel in the training image by using a preset second window, to obtain the new image.
  11. The method according to any one of claims 1 to 10, wherein the training image is one of the new images last used to update the image recognition model.
  12. The method according to any one of claims 1 to 11, wherein there are multiple training images,
    the obtaining a feature intensity image corresponding to a training image comprises:
    obtaining feature intensity images respectively corresponding to the multiple training images;
    the occluding, according to the feature intensity image, a to-be-occluded region in the training image by using a preset window, to obtain a new image comprises:
    occluding, according to the feature intensity images respectively corresponding to the multiple training images, respective to-be-occluded regions in the multiple training images by using preset windows, to obtain multiple new images; and
    the method further comprises:
    training and updating the image recognition model according to the multiple new images.
  13. The method according to any one of claims 1 to 12, wherein the training image and the new image have the same label information, and the label information indicates an object included in an image, or a class to which the object belongs.
  14. A terminal device, comprising a processing unit, wherein
    the processing unit is configured to obtain a feature intensity image corresponding to a training image, wherein an intensity value of a pixel in the feature intensity image indicates the importance of the pixel for recognizing the training image, and a resolution of the training image is the same as a resolution of the feature intensity image; and
    the processing unit is further configured to occlude, according to the feature intensity image, a to-be-occluded region in the training image by using a preset window, to obtain a new image, wherein the to-be-occluded region comprises to-be-occluded pixels, and the new image is used to update an image recognition model.
  15. The terminal device according to claim 14, wherein
    the processing unit is configured to determine mapped pixels according to intensity values of pixels in the feature intensity image, wherein the mapped pixels are pixels in the feature intensity image whose intensity values satisfy a preset condition; and
    the processing unit is further configured to occlude the to-be-occluded pixels by using the preset window to obtain the new image, wherein the to-be-occluded pixels are pixels in the training image that correspond to the mapped pixels.
  16. The terminal device according to claim 15, wherein the mapped pixels are obtained by using a multinomial sampling algorithm.
  17. The terminal device according to any one of claims 14 to 16, wherein
    the processing unit is configured to occlude the training image by using a sliding window to obtain m occluded images, wherein m is a positive integer;
    the processing unit is further configured to input the m occluded images into the image recognition model to obtain a recognition score of each of the m occluded images, wherein the recognition score reflects the importance, for recognizing the training image, of the region occluded by the sliding window in the occluded image; and
    the processing unit is further configured to determine, according to an image interpolation algorithm and the recognition scores of the m occluded images, the feature intensity image corresponding to the training image.
  18. The terminal device according to claim 17, wherein
    the processing unit is specifically configured to determine intensity values of m pixels in the feature intensity image according to the recognition scores of the m occluded images; and
    the processing unit is further specifically configured to determine an intensity value of each pixel in the feature intensity image according to the image interpolation algorithm and the intensity values of the m pixels in the feature intensity image, thereby obtaining the feature intensity image.
  19. The terminal device according to claim 17, wherein
    the processing unit is specifically configured to determine intensity values of m pixels in an initial intensity image according to the recognition scores of the m occluded images;
    the processing unit is further specifically configured to determine an intensity value of each pixel in the initial intensity image according to the image interpolation algorithm and the intensity values of the m pixels in the initial intensity image; and
    the processing unit is further specifically configured to determine an intensity value of each pixel in the feature intensity image according to a recognition score of the training image and the intensity value of each pixel in the initial intensity image, thereby obtaining the feature intensity image, wherein the recognition score of the training image is obtained by inputting the training image into the image recognition model.
  20. The terminal device according to any one of claims 14 to 16, wherein
    the processing unit is configured to input the training image into the image recognition model and perform a forward operation and a backward operation on the training image to obtain a corresponding gradient data block, wherein a size of the gradient data block is the same as the resolution of the training image; and
    the processing unit is further configured to determine, according to the gradient data block, the feature intensity image corresponding to the training image.
  21. The terminal device according to any one of claims 14 to 16, wherein
    the processing unit is configured to input the training image into the image recognition model and perform feature extraction on the training image to obtain a corresponding feature image, wherein a resolution of the feature image is lower than the resolution of the training image; and
    the processing unit is further configured to obtain, according to an image interpolation algorithm and the feature image, the feature intensity image corresponding to the training image.
  22. The terminal device according to claim 21, wherein
    the processing unit is specifically configured to downsample the training image to obtain a corresponding downsampled image, wherein a resolution of the downsampled image is the same as the resolution of the feature image; and
    the processing unit is further specifically configured to process the downsampled image according to weights of a fully connected layer in the image recognition model to obtain the feature image.
  23. The terminal device according to any one of claims 15 to 22, wherein the to-be-occluded pixels comprise at least two pixels, including a first pixel and a second pixel, a distance between the first pixel and the second pixel is greater than or equal to a preset first distance, and the processing unit is configured to perform any one of the following:
    occluding the first pixel in the training image by using the preset window, to obtain the new image;
    occluding the second pixel in the training image by using the preset window, to obtain the new image; or
    occluding the first pixel in the training image by using a preset first window, and occluding the second pixel in the training image by using a preset second window, to obtain the new image.
  24. The terminal device according to any one of claims 14 to 23, wherein there are multiple training images,
    the processing unit is configured to obtain feature intensity images respectively corresponding to the multiple training images;
    the processing unit is further configured to occlude, according to the feature intensity images respectively corresponding to the multiple training images, respective to-be-occluded regions in the multiple training images by using preset windows, to obtain multiple new images; and
    the processing unit is further configured to train and update the image recognition model according to the multiple new images.
  25. The terminal device according to any one of claims 14 to 24, wherein the training image and the new image have the same label information, and the label information indicates an object included in an image, or a class to which the object belongs.
  26. A terminal device, comprising a memory and a processor coupled to the memory, wherein the memory is configured to store instructions, the processor is configured to execute the instructions, and when executing the instructions, the processor performs the method according to any one of claims 1 to 13.
  27. The terminal device according to claim 26, further comprising a display coupled to the processor, wherein the display is configured to display images under the control of the processor.
  28. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 13 is implemented.
PCT/CN2018/088758 2018-05-28 2018-05-28 Image processing method, related device, and computer storage medium WO2019227294A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/088758 WO2019227294A1 (zh) 2018-05-28 2018-05-28 Image processing method, related device, and computer storage medium
US17/039,544 US11836619B2 (en) 2018-05-28 2020-09-30 Image processing method, related device, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/088758 WO2019227294A1 (zh) 2018-05-28 2018-05-28 Image processing method, related device, and computer storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/039,544 Continuation US11836619B2 (en) 2018-05-28 2020-09-30 Image processing method, related device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2019227294A1 true WO2019227294A1 (zh) 2019-12-05

Family

ID=68698560

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/088758 WO2019227294A1 (zh) 2018-05-28 2018-05-28 Image processing method, related device, and computer storage medium

Country Status (2)

Country Link
US (1) US11836619B2 (zh)
WO (1) WO2019227294A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190136893A (ko) * 2018-05-30 2019-12-10 카네기 멜론 유니버시티 Method, apparatus, and computer program for generating robust automated learning systems and for testing trained automated learning systems
CN113012176B (zh) * 2021-03-17 2023-12-15 阿波罗智联(北京)科技有限公司 Sample image processing method and apparatus, electronic device, and storage medium


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996401A (zh) * 2009-08-24 2011-03-30 三星电子株式会社 Target analysis method and device based on an intensity image and a depth image
US20130287251A1 * 2012-02-01 2013-10-31 Honda Elesys Co., Ltd. Image recognition device, image recognition method, and image recognition program
CN104123532A (zh) * 2013-04-28 2014-10-29 浙江大华技术股份有限公司 Method and device for detecting target objects and determining the number of target objects
CN104504365A (zh) * 2014-11-24 2015-04-08 闻泰通讯股份有限公司 Smiling face recognition system and method for video sequences
CN106156691A (zh) * 2015-03-25 2016-11-23 中测高科(北京)测绘工程技术有限责任公司 Method and apparatus for processing images with complex backgrounds
US20170185864A1 * 2015-12-23 2017-06-29 Fotonation Limited Image processing system
CN106339665A (zh) * 2016-08-11 2017-01-18 电子科技大学 Fast face detection method
CN106529448A (zh) * 2016-10-27 2017-03-22 四川长虹电器股份有限公司 Multi-view face detection method using aggregate channel features
CN106980825A (zh) * 2017-03-15 2017-07-25 广东顺德中山大学卡内基梅隆大学国际联合研究院 Face pose classification method based on normalized pixel difference features

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353470A (zh) * 2020-03-13 2020-06-30 北京字节跳动网络技术有限公司 Image processing method and apparatus, readable medium, and electronic device
CN111353470B (zh) * 2020-03-13 2023-08-01 北京字节跳动网络技术有限公司 Image processing method and apparatus, readable medium, and electronic device
CN113449552A (zh) * 2020-03-25 2021-09-28 江苏翼视智能科技有限公司 Person re-identification method based on a block-wise indirectly coupled GAN network
CN112396125A (zh) * 2020-12-01 2021-02-23 中国第一汽车股份有限公司 Classification method, apparatus, device, and storage medium for positioning test scenarios
CN112396125B (zh) * 2020-12-01 2022-11-18 中国第一汽车股份有限公司 Classification method, apparatus, device, and storage medium for positioning test scenarios

Also Published As

Publication number Publication date
US11836619B2 (en) 2023-12-05
US20210027094A1 (en) 2021-01-28

Similar Documents

Publication Publication Date Title
WO2019227294A1 (zh) Image processing method, related device, and computer storage medium
WO2020177651A1 (zh) Image segmentation method and image processing apparatus
US20220108546A1 (en) Object detection method and apparatus, and computer storage medium
WO2021164228A1 (zh) Method and system for selecting an augmentation strategy for image data
CN111914997B (zh) Method for training a neural network, image processing method, and apparatus
CN111627050B (zh) Method and apparatus for training a target tracking model
KR20180065889A (ko) Method and apparatus for detecting a target
CN106326853B (zh) Face tracking method and apparatus
WO2021003936A1 (zh) Image segmentation method, electronic device, and computer-readable storage medium
CN115410030A (zh) Target detection method and apparatus, computer device, and storage medium
CN113807361B (zh) Neural network, target detection method, neural network training method, and related products
WO2020233069A1 (zh) Point cloud data processing method and apparatus, electronic device, and storage medium
CN115631112B (zh) Deep-learning-based building contour correction method and apparatus
CN117218622A (zh) Road condition detection method, electronic device, and storage medium
WO2020244076A1 (zh) Face recognition method and apparatus, electronic device, and storage medium
JP4387889B2 (ja) Template matching apparatus and method
CN111242176A (zh) Method, apparatus, and electronic system for processing computer vision tasks
CN116129386A (zh) Drivable area detection method, system, and computer-readable medium
CN112580435B (zh) Face localization method, face model training and detection method, and apparatus
CN115862112A (zh) Target detection model for evaluating acne treatment efficacy in facial images
WO2022052853A1 (zh) Target tracking method, apparatus, device, and computer-readable storage medium
WO2022017129A1 (zh) Target object detection method and apparatus, electronic device, and storage medium
CN112257686B (zh) Training method, apparatus, and storage medium for a human pose recognition model
CN112632601B (zh) Crowd counting method for subway carriage scenarios
CN112884804A (zh) Moving object tracking method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18920748; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 18920748; Country of ref document: EP; Kind code of ref document: A1)