CN110555877B - Image processing method, device and equipment and readable medium

Info

Publication number
CN110555877B
Authority
CN
China
Prior art keywords: data, image, image data, target, data format
Legal status: Active
Application number: CN201810571964.XA
Other languages: Chinese (zh)
Other versions: CN110555877A (en)
Inventors: 徐跃书, 肖飞, 范蒙, 俞海
Current Assignee: Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee: Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810571964.XA (CN110555877B)
Priority to PCT/CN2019/089249 (WO2019228450A1)
Publication of CN110555877A
Application granted
Publication of CN110555877B

Classifications

    • G06T7/70 Image analysis - Determining position or orientation of objects or cameras
    • G06T7/90 Image analysis - Determination of colour characteristics
    • G06T2207/10024 Image acquisition modality - Color image
    • G06T2207/20081 Special algorithmic details - Training; Learning
    • G06T2207/20084 Special algorithmic details - Artificial neural networks [ANN]


Abstract

The invention provides an image processing method, apparatus, device and readable medium. The image processing method includes: acquiring, from acquired first image data in a first data format, position information of a specified target in the first image data; intercepting target data corresponding to the position information from the first image data; and converting the data format of the target data from the first data format into a second data format, where the second data format is suitable for display and/or transmission of the target data. The image quality of the detected target can thereby be improved.

Description

Image processing method, device and equipment and readable medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a readable medium.
Background
The main purpose of target detection technology is to detect and locate a specific target in a single picture or video frame. Target detection technology has already been widely applied in many fields, for example: character detection for cargo handling in logistics, detection of vehicles violating regulations in road traffic, and passenger-flow detection and statistics in shopping malls and stations.
Current target detection algorithms mainly operate on low-bit-width images processed by an ISP; after a target of interest is detected, the corresponding target image is extracted from the image for display or subsequent recognition. In such detection systems, the quality of the finally obtained small target images generally varies widely: some images are of good quality, but in many cases problems such as blurring, insufficient brightness and insufficient contrast may exist.
Patent application CN104463103A, published by the Chinese Patent Office, proposes an image processing method and device in which, when the detected target is a character, the character in the target image is sharpened. The main flow of that scheme is as follows: first, a target of interest is detected in the image; the detected target is classified with a preset classifier; and when the classification result is a character, the character is sharpened.
Owing to design limitations and the accumulated losses of each processing module, existing ISP processing algorithms ultimately lose some of the original information of the image. In the technical scheme of that patent application, the subsequent character processing operates on an image in a data format already processed by the ISP algorithm, so the information may be severely lost and cannot be recovered in subsequent processing. Moreover, the method only processes characters, which are generally only a small part of the objects people care about; when other objects of interest, such as faces, vehicles or buildings, are detected, no subsequent processing is performed to improve the quality of these key objects. Overall, the current scheme is limited and cannot comprehensively improve the image quality of the detected target.
Disclosure of Invention
In view of the above, the present invention provides an image processing method, an image processing apparatus, an image processing device, and a readable medium, which can improve the image quality of a detection target.
A first aspect of the present invention provides an image processing method, including:
acquiring, from acquired first image data in a first data format, position information of a specified target in the first image data;
intercepting target data corresponding to the position information from the first image data;
and converting the data format of the target data from the first data format into a second data format, wherein the second data format is suitable for display and/or transmission of the target data.
According to an embodiment of the present invention, the acquiring, from the acquired first image data in the first data format, the position information of the specified target in the first image data includes:
converting the first image data into second image data capable of carrying out target detection;
and detecting the position information of the designated target in the second image data, and determining the detected position information as the position information of the designated target in the first image data.
According to an embodiment of the present invention, the detecting of the position information of the designated object in the second image data, and the determining of the detected position information as the position information of the designated object in the first image data includes:
inputting the second image data to a trained first neural network; the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing down-sampling, a full-link layer for performing feature synthesis, and a bounding box regression layer for performing coordinate transformation;
determining a result output by the first neural network as position information of the specified target in the first image data.
According to an embodiment of the present invention, the converting the first image data into second image data capable of object detection includes:
and converting the first image data into second image data capable of carrying out target detection by adopting at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement and bit width compression.
According to an embodiment of the present invention, the acquiring, from the acquired first image data in the first data format, the position information of the specified target in the first image data includes:
inputting the first image data to a trained second neural network; the second neural network converts the first image data into second image data which can be subjected to target detection and detects position information of a specified target in the second image data at least through a graying layer for performing graying processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a full connection layer for performing feature synthesis and a frame regression layer for performing coordinate transformation;
determining a result output by the second neural network as position information of the specified target in the first image data.
According to an embodiment of the present invention, the converting the data format of the target data from the first data format to the second data format includes:
inputting the target data to a trained third neural network; the third neural network effects conversion of the data format of the target data from the first data format to a second data format by at least the convolutional layer for performing convolution.
According to an embodiment of the present invention, the converting the data format of the target data from the first data format to the second data format includes:
performing ISP processing on the target data; wherein the ISP processing is used for converting the data format of the target data from the first data format to a second data format, and the ISP processing at least comprises color interpolation.
According to one embodiment of the invention, the ISP processing further comprises at least one of: white balance correction, curve mapping.
A second aspect of the present invention provides an image processing apparatus comprising:
the first processing module is used for acquiring position information of a specified target in first image data from the acquired first image data in a first data format;
the second processing module is used for intercepting target data corresponding to the position information from the first image data;
and the third processing module is used for converting the data format of the target data from the first data format into a second data format, and the second data format is suitable for display and/or transmission of the target data.
According to one embodiment of the invention, the first processing module comprises a first processing unit and a second processing unit;
the first processing unit is used for converting the first image data into second image data capable of carrying out target detection;
the second processing unit is configured to detect position information of the designated object in the second image data, and determine the detected position information as position information of the designated object in the first image data.
According to an embodiment of the present invention, the second processing unit is specifically configured to:
inputting the second image data into a trained first neural network, and determining a result output by the first neural network as position information of the specified target in the first image data; the first neural network enables location and output of location information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a full-link layer for performing feature synthesis, and a bounding box regression layer for performing coordinate transformation.
According to an embodiment of the present invention, the first processing unit is specifically configured to:
and converting the first image data into second image data capable of carrying out target detection by adopting at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement and bit width compression.
According to one embodiment of the invention, the first processing module comprises a third processing unit;
the third processing unit is used for inputting the first image data to a trained second neural network; the second neural network converts the first image data into second image data which can be subjected to target detection and detects the position information of a specified target in the second image data at least through a graying layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a full connection layer for performing feature synthesis and a frame regression layer for performing coordinate transformation; determining a result output by the second neural network as position information of the designated target in the first image data.
According to one embodiment of the invention, the third processing module comprises a fourth processing unit;
the fourth processing unit is used for inputting the target data to a trained third neural network; the third neural network effects conversion of the data format of the target data from the first data format to a second data format by at least the convolutional layer for performing convolution.
According to one embodiment of the invention, the third processing module comprises a fifth processing unit;
the fifth processing unit is configured to perform ISP processing on the target data; wherein the ISP processing is configured to convert the data format of the target data from the first data format to a second data format, including at least color interpolation.
A third aspect of the invention provides an electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the image processing method as in any one of the preceding embodiments.
A fourth aspect of the present invention provides a machine-readable storage medium on which a program is stored, the program, when executed by a processor, implementing the image processing method as set forth in any one of the preceding embodiments.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the embodiment of the invention detects the designated target using the acquired first image data in the first data format to acquire the position information of the designated target, then intercepts, from the first image data in the first data format, the target data corresponding to the position information, and converts the format of the target data into a data format suitable for display and/or transmission. Because the target data is intercepted directly from the first image data, the image quality of the detected target can be improved compared with the conventional approach of post-processing the image after detection.
Drawings
FIG. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment of the invention;
FIG. 2 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present invention;
FIG. 3 is a block diagram of one embodiment of a first processing module provided in the present invention;
FIG. 4 is a flowchart illustrating an embodiment of converting first image data into second image data according to the present invention;
FIG. 5 is a schematic diagram of one embodiment of color interpolation provided by the present invention;
FIG. 6 is a block diagram of an embodiment of a first neural network provided by the present invention;
FIG. 7 is a block diagram of another embodiment of a first neural network provided by the present invention;
FIG. 8 is a block diagram of another embodiment of a first processing module provided in the present invention;
FIG. 9 is a block diagram of a second neural network according to an embodiment of the present invention;
FIG. 10 is a block diagram of another embodiment of a second neural network provided by the present invention;
FIG. 11 is a diagram illustrating an embodiment of performing graying according to the present invention;
fig. 12 is a block diagram of an embodiment of an image processing apparatus according to the present invention;
FIG. 13 is a block diagram of a third neural network according to an embodiment of the present invention;
fig. 14 is a block diagram of another embodiment of an image processing apparatus according to the present invention;
FIG. 15 is a block diagram illustrating an embodiment of ISP processing to convert a data format of target data from a first data format to a second data format in accordance with the present invention;
fig. 16 is a block diagram of an electronic device according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In order to make the description of the present invention clearer and more concise, some technical terms in the present invention are explained below:
ISP (Image Signal Processor): mainly used for processing the image signals acquired by the image sensor of a front-end imaging device; its main functions include dead pixel correction, black level correction, white balance correction, color interpolation, gamma correction, color correction, sharpening, denoising and the like, and one or more of these functions can be selected according to the actual application.
Deep learning: the concept of deep learning derives from research on artificial neural networks; it is a method that uses deeper neural networks to simulate the way the human brain analyses and learns, and to establish corresponding data representations.
Neural Network: a network technology abstracted by simulating the information processing process of the brain, consisting mainly of neurons. Its artificial neurons respond to part of the surrounding units within their coverage; it performs very well on large-scale image processing and may include a Convolutional Layer, a Pooling Layer, and so on.
The following describes the image processing method according to the embodiment of the present invention more specifically, but not limited thereto.
In one embodiment, referring to fig. 1, an image processing method of an embodiment of the present invention is shown, which may include the steps of:
S1: acquiring, from acquired first image data in a first data format, position information of a specified target in the first image data;
S2: intercepting target data corresponding to the position information from the first image data;
S3: converting the data format of the target data from the first data format into a second data format, where the second data format is suitable for display and/or transmission of the target data.
In the embodiment of the present invention, the image processing method may be applied to an image device, and the image device may be a device having an imaging function, such as a camera, or a device capable of performing image post-processing, and the like, and is not limited in particular. The first image data in the first data format may be image data acquired by the device itself, or image data acquired by other devices acquired by the device from other devices, and is not limited in particular.
The image data format collected by the image device is a first data format. The first data format is a raw image format acquired by an imaging device and may contain data in one or more spectral bands, which may include, for example, spectral sampling signals in the wavelength range of 380nm to 780nm and/or spectral sampling signals in the wavelength range of 780nm to 2500 nm. Generally speaking, there can be certain difficulties in using the images of the first data format directly for display or transmission.
In step S1, position information of a specified target in first image data in a first data format is acquired from the acquired first image data.
The first image data contains a specified target, i.e., the object on which ISP processing is to be performed in order to improve its image quality. When the specified target is detected and located in the first image data, the first image data itself is processed, so after detection the processed data is no longer the original image as acquired.
The position information of the specified target in the first image data may include: coordinates of a feature point of the specified target in the first image data together with the size of the target image area; or the coordinates of the start point and end point of the specified target image area; and so on. This is not specifically limited, as long as the position of the specified target in the first image data can be located.
Then, step S2 is executed to intercept the target data corresponding to the position information from the first image data.
The first image data in step S2 is the first image data in the acquired first data format, that is, the original image when the device acquires the first image data, and is not the image data obtained by processing the first image data to acquire the position information of the target object, and there is no problem of losing image information. That is, the first image data utilized in step S1 and step S2 may be the same data source, may be the same first image data, or may be different first image data captured in the same scene, for example, may be two frames of image data before and after, as long as the designated target does not move or otherwise change in the two frames of image data. Of course, it is preferable that the same first image data is used in step S1 and step S2, and the first image data can be stored in the image device and retrieved when needed.
Since the position information is detected and acquired from the first image data, the image area corresponding to the position information in the first image data is the designated target. And positioning the first image data to the area pointed by the position information to perform image interception to obtain target data corresponding to the designated target. Since the target data is cut out from the first image data, the data format thereof is still the first data format, which is the same as the data format of the first image data.
Step S3 is then executed to convert the data format of the target data from the first data format to a second data format, wherein the second data format is suitable for display and/or transmission of the target data.
In step S3, the target data in the first data format is subjected to image processing, and the data format of the target data is converted into a second data format, where the second data format is a data format suitable for display and/or transmission of the target data, and of course, both the first data format and the second data format are image formats. Of course, the image processing process may not only perform data format conversion, but also include other image processing to improve the image quality of the target data.
The embodiment of the invention utilizes the first image data of the first data format acquired to detect the designated target to acquire the position information of the designated target, then utilizes the first image data of the first data format to intercept the target data corresponding to the position information, and the target data is intercepted from the first image data, so that the image format or quality is not changed.
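Purely as an illustration of how steps S1 to S3 fit together, the following Python/NumPy sketch (not part of the patent) runs the three steps on an assumed single-channel raw frame; the placeholder detector and converter are hypothetical stand-ins for the neural networks and ISP processing described later.

```python
import numpy as np

def detect_target_position(raw: np.ndarray):
    # Placeholder for S1: a real system would run a detector (for example the
    # first or second neural network described below); here a fixed box is returned.
    h, w = raw.shape
    return w // 4, 3 * w // 4, h // 4, 3 * h // 4   # x1, x2, y1, y2

def convert_to_display_format(target_raw: np.ndarray, m: float = 16.0):
    # Placeholder for S3: compress the high-bit-width raw crop to 8 bit.
    # A real implementation would also demosaic / white-balance the crop.
    return np.clip(target_raw / m, 0, 255).astype(np.uint8)

def process_frame(raw_bayer: np.ndarray):
    x1, x2, y1, y2 = detect_target_position(raw_bayer)   # S1: locate the target
    target_raw = raw_bayer[y1:y2, x1:x2].copy()          # S2: crop the raw data
    return convert_to_display_format(target_raw)         # S3: convert only the crop
```

The key point the sketch illustrates is that only the small crop taken from the raw frame is converted, so no full-frame ISP loss is baked into the target image.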
Step S1 is a step of acquiring location information, where location information is obtained by detecting an interested specified object and locating the specified object after detecting the specified object, where types of the specified object are not limited, such as characters, people, vehicles, license plates, buildings, and the like, and shapes and sizes of the specified object are also not limited. The method can be implemented by converting the input first image data in the first data format into common data capable of performing target detection, and then performing target detection, or by directly performing target detection on the first image data in the first data format and outputting target position information, and the specific implementation manner is not limited.
In one embodiment, the above method flow can be executed by the image processing apparatus 100, as shown in fig. 2, the image processing apparatus 100 mainly includes 3 modules: a first processing module 101, a second processing module 102 and a third processing module 103. The first processing module 101 is used for executing the above step S1, the second processing module 102 is used for executing the above step S2, and the third processing module 103 is used for executing the above step S3.
As shown in fig. 2, the first processing module 101 takes first image data in a first data format obtained by an image capturing device as input, detects a target or an object of interest therein, and outputs position information of the detected target; the second processing module 102 combines the position information of the target of interest output by the first processing module 101 and the originally input first image data in the first data format as input, and acquires target data in the first data format corresponding to the target of interest from the first image data in the original first data format; the third processing module 103 takes the target data in the first data format corresponding to the target output by the second processing module 102 as input, and performs adaptive ISP processing to obtain target data in the second data format with higher quality.
In one embodiment, as shown in fig. 3, the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012, and step S101 may be performed by the first processing unit 1011, and step S102 may be performed by the second processing unit 1012, so as to implement step S1 described above. The step S1 specifically includes the following steps:
S101: converting the first image data into second image data capable of carrying out target detection;
S102: position information of a specified object is detected in the second image data, and the detected position information is determined as the position information of the specified object in the first image data.
Since the designated object needs to be detected and the first image data cannot be used to directly detect the designated object, in step S101, the first image data is first converted into second image data that can be used for object detection, so that the second image data can be used to detect the designated object. The specific conversion method is not limited as long as the first image data can be converted into the second image data capable of detecting the target.
The second image data may not be in the first data format any more due to conversion, and if the second image data is used for post-processing to detect the extraction target, the image quality cannot be guaranteed. Thus, in the present embodiment, the second image data is used not to extract the designated object but to detect the position information of the designated object.
After step S101, step S102 is executed: the position information of the designated object is detected in the second image data, i.e., the designated object in the second image data is recognized and located so as to determine its position information in the second image data. In general, the positional relationship of the designated object does not change between the first image data and the second image data. Scaling or translation of the designated object between the first image data and the second image data is not excluded, but such scaling and translation can be determined during the processing, so that the position information of the designated object in the first image data can be obtained from its position information in the second image data; the detected position information is thus determined as the position information of the designated object in the first image data.
Further, the manner of converting the first image data into the second image data that can be subject to the object detection may include performing color interpolation processing on at least the first image data. On the basis, at least one of the following treatments can be carried out: black level correction, white balance correction, contrast enhancement, and bit width compression, although not specifically limited thereto.
In one possible implementation, the first processing unit 1011 may implement step S101 described above by performing steps S1011 to S1015. Referring to fig. 4, steps S1011 to S1015 specifically include:
S1011: correcting a black level;
S1012: correcting white balance;
S1013: color interpolation;
S1014: contrast enhancement;
S1015: compressing the bit width.
It is to be understood that the manner of converting the first image data into the second image data is not limited to the above steps S1011 to S1015, and the processing order is not limited, and for example, the first image data may be converted into the second image data by only performing the color interpolation processing as long as the obtained second image data can perform the object detection.
In step S1011, assuming that the first image data in the first data format is imgR, the black level correction removes the influence of the black level from the first image data in the first data format and outputs imgR_blc:

imgR_blc = imgR - V_blc

where V_blc is the black level value.
In step S1012, the white balance correction removes the color cast of the image caused by ambient illumination so as to restore the original color information of the image. It can be implemented with two coefficients, R_gain and B_gain, which control the adjustment of the corresponding R1 and B1 components:

R1' = R1 * R_gain

B1' = B1 * B_gain

where R1 and B1 are the red and blue channel color components of the image data after the black level correction processing, and R1' and B1' are the red and blue channel color components of the output image of the white balance correction module; the output image is denoted imgR_wb.
In step S1013, the data on which color interpolation is performed is the data after the white balance correction processing. The color interpolation may be implemented with a nearest-neighbour interpolation method, expanding the single-channel first image data in the first data format into multi-channel data. For first image data in the first data format in Bayer format, the nearest color pixel is used directly to fill in the missing color components at each pixel point, so that every pixel point contains the three RGB color components. A specific interpolation process is shown in fig. 5: for example, R11 is filled into the three adjacent pixel positions that lack a red component (which adjacent positions are filled can be configured), and the other color components are handled in the same way, which is not repeated here. The interpolated image is denoted imgC.
In step S1014, the data on which contrast enhancement is performed is the color-interpolated data. Contrast enhancement enhances the contrast of the interpolated image; linear mapping may be performed using a Gamma curve. Assuming the mapping function of the Gamma curve is f(), the mapped image is denoted imgC_gm:

imgC_gm(i, j) = f(imgC(i, j))

where (i, j) are the coordinates of a pixel point.
In step S1015, the data on which bit width compression is performed is the contrast-enhanced data: the high-bit-width data imgC_gm obtained after contrast enhancement is compressed to the bit width corresponding to the second data format. For example, linear compression can be used directly, and the compressed image is denoted imgC_lb:

imgC_lb(i, j) = imgC_gm(i, j) / M

where M is the compression ratio from the first data format to the second data format.
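The following is a minimal NumPy sketch of steps S1011 to S1015 above. The RGGB Bayer layout, the 12-bit input and all parameter values are assumptions for illustration and are not values given in the embodiment.

```python
import numpy as np

def to_detection_image(imgR: np.ndarray, v_blc=64.0, r_gain=1.8, b_gain=1.5,
                       gamma=1 / 2.2, m=16.0):
    """Sketch of S1011-S1015 on an assumed 12-bit RGGB Bayer raw image."""
    # S1011: black level correction
    img = np.clip(imgR.astype(np.float32) - v_blc, 0, None)
    # S1012: white balance correction on the R and B samples of the mosaic
    img[0::2, 0::2] *= r_gain                     # R positions (assumed RGGB)
    img[1::2, 1::2] *= b_gain                     # B positions
    # S1013: nearest-neighbour color interpolation to three channels
    h, w = img.shape
    rgb = np.empty((h, w, 3), np.float32)
    rgb[..., 0] = np.repeat(np.repeat(img[0::2, 0::2], 2, 0), 2, 1)[:h, :w]  # R
    rgb[..., 1] = np.repeat(np.repeat(img[0::2, 1::2], 2, 0), 2, 1)[:h, :w]  # G
    rgb[..., 2] = np.repeat(np.repeat(img[1::2, 1::2], 2, 0), 2, 1)[:h, :w]  # B
    # S1014: contrast enhancement with a Gamma curve f()
    max_val = 4095.0 - v_blc
    rgb = max_val * (rgb / max_val) ** gamma
    # S1015: linear bit-width compression by ratio M to the low-bit-width format
    return np.clip(rgb / m, 0, 255).astype(np.uint8)
```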
In one possible implementation, the second processing unit 1012 may implement step S102 described above by performing steps S1021 to S1022.
S1021: inputting the second image data to a trained first neural network; the first neural network is used for realizing positioning at least through a convolution layer, a pooling layer, a full-connection layer and a frame regression layer;
s1022: determining a result output by the first neural network as position information of the specified target in the first image data.
In step S1021, the first neural network is a trained network, and the second image data is input into the first neural network, so that the designated target can be positioned in the second image data, and the position information of the designated target is obtained accordingly.
The first neural network may be integrated in the second processing unit 1012 as a part of the first processing module 101, or may be disposed outside the first processing module 101, and may be scheduled by the second processing unit 1012.
Referring to fig. 6, the first neural network 200 may include at least one convolutional layer 201 for performing convolution, at least one pooling layer 202 for performing downsampling, at least one fully-connected layer 203 for performing feature synthesis, and at least one bounding box regression layer 204 for performing coordinate transformation.
As an example of the first neural network, referring to fig. 7, the first neural network 200 may include a convolutional layer 205, a convolutional layer 206, a pooling layer 207 …, a convolutional layer 208, a pooling layer 209, a fully-connected layer 210, and a bounding box regression layer 211, which are connected in sequence, input the second image data, and output position information as position information of a specified target in the first image data. The functions performed by each layer of the first neural network have been described above, and each layer may be adaptively changed, for example, the convolution kernels of different convolutional layers may be different, and thus, the description thereof is omitted here. It is to be understood that the first neural network shown in fig. 7 is only one example, and is not particularly limited thereto, for example, convolutional layers, and/or pooling layers, and/or other layers may be reduced or added.
The specific functions of the layers in the first neural network are described below, but should not be limited thereto.
The convolutional layer (Conv) performs convolution operations and may also have an activation function, e.g. ReLU, applied to the convolution result, so the operation of a convolutional layer can be expressed as:

YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)

where YC_i(I) is the output of the i-th convolutional layer, YC_{i-1}(I) is the input of the i-th convolutional layer, * denotes the convolution operation, W_i and B_i are the weight and bias coefficients of the i-th convolutional layer, and g() is the activation function; when the activation function is ReLU, g(x) = max(0, x).
The pooling layer (Pool) is a special down-sampling layer: the feature map obtained by convolution is reduced, with a reduction window of size N × N. When max pooling is used, the maximum value over each N × N window is taken as the value of the corresponding point of the new image:

YP_j(I) = maxpool(YP_{j-1}(I))

where YP_{j-1}(I) is the input of the j-th pooling layer and YP_j(I) is the output of the j-th pooling layer.
The fully-connected layer (FC) can be regarded as a convolutional layer with a 1 × 1 filtering window; each node of the fully-connected layer is connected to all nodes of the previous layer and is used to integrate the extracted features. Its implementation is similar to convolutional filtering and may, for example, be expressed as:

YF_k(I) = g( Σ_{i=1..R} Σ_{j=1..C} ( W_ij * F_kI(i, j) + B_ij ) )

where F_kI(I) is the input of the k-th fully-connected layer, YF_k(I) is the output of the k-th fully-connected layer, R and C are the width and height of F_kI(I), W_ij and B_ij are the connection weight coefficients and bias coefficients of the fully-connected layer, g() is the activation function, and I = (i, j).
The frame regression layer (BBR) is used to find a mapping such that the window P output by the fully-connected layer is mapped to a window G' that is closer to the real window G. The regression is generally performed by transforming the coordinates of the window P, including, for example, a translation transformation and/or a scaling transformation. Let the coordinates of the window P output by the fully-connected layer be (x1, x2, y1, y2), and the coordinates after the window transformation be (x3, x4, y3, y4).

For a translation transformation with translation (Δx, Δy), the coordinate relationship before and after translation is:

x3 = x1 + Δx
x4 = x2 + Δx
y3 = y1 + Δy
y4 = y2 + Δy

For a scaling transformation with scale factors dx and dy in the X and Y directions, the coordinate relationship before and after the transformation is:

x4 - x3 = (x2 - x1) * dx
y4 - y3 = (y2 - y1) * dy
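A structure such as the one in fig. 7 could be sketched as follows in PyTorch. The channel counts, the 64×64 input size and the single-box output are assumptions; the sketch only illustrates the convolution, pooling, fully-connected and bounding box regression stages named above, not the network actually trained in the embodiment.

```python
import torch
import torch.nn as nn

class FirstNet(nn.Module):
    """Illustrative first neural network: conv/pool feature extraction, a
    fully connected layer for feature synthesis, and a bounding box
    regression head outputting (x1, x2, y1, y2)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # convolutional layers
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling (down-sampling)
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(64 * 16 * 16, 256), nn.ReLU())
        self.bbox_reg = nn.Linear(256, 4)    # frame (bounding box) regression layer

    def forward(self, second_image):          # second_image: (N, 3, 64, 64)
        f = self.features(second_image)
        return self.bbox_reg(self.fc(f))      # position information of the target

# Usage sketch:
# net = FirstNet(); pos = net(torch.randn(1, 3, 64, 64))
```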
in step S1022, the position information of the designated target in the first image data is determined according to the output result of the first neural network, and the output result of the first neural network may be directly used as the position information of the designated target in the first image data, or the output result may be converted by using the position change relationship of the designated target between the first image data and the second image data to obtain the position information of the designated target in the first image data.
For the training of the first neural network, the training model of the first neural network may be trained by acquiring the second image data sample and the corresponding position information sample as a training sample set, taking the second image data sample as an input, and taking the corresponding position information sample as an output. With regard to the acquisition of the second image data sample and the corresponding position information sample, the second image data sample may be processed in an image processing manner in which the detection target can be identified to obtain the corresponding position information sample.
In another embodiment, referring to fig. 8, the first processing module 101 includes a third processing unit 1013, and the steps S111 and S112 may be executed by the third processing unit 1013 to implement the step S1. Step S111 and step S112 are specifically:
S111: inputting the first image data to a trained second neural network; the second neural network at least converts the first image data into second image data capable of carrying out target detection and detects the position information of a specified target in the second image data through a graying layer, a convolutional layer, a pooling layer, a full-link layer and a frame regression layer;
S112: determining a result output by the second neural network as position information of the specified target in the first image data.
The second neural network may be integrated in the third processing unit 1013 as a part of the first processing module 101, or may be disposed outside the first processing module 101 and may be scheduled by the third processing unit 1013.
Referring to fig. 9, the second neural network 300 includes at least one graying layer 301 for performing graying processing, one convolutional layer 302 for performing convolution, one pooling layer 303 for performing downsampling, one fully connected layer 304 for performing feature synthesis, and one bounding box regression layer 305 for performing coordinate transformation. The conversion of the first image data into second image data capable of object detection and the detection of the position information of the specified object in the second image data are realized entirely by the second neural network, without other ISP processing. Of course, depending on requirements, certain additional processing may be performed on the basis of the second neural network processing; this is not limited.
As an example of the second neural network, referring to fig. 10, the second neural network 300 may include a graying layer 306, a convolutional layer 307, a convolutional layer 308, a pooling layer 309, …, a convolutional layer 310, a pooling layer 311, a fully-connected layer 312, and a bounding box regression layer 313. The first image data is directly input to the second neural network, processed by each layer of the second neural network in turn, and the output position information is taken as the position information of the specified target in the first image data. The functions performed by each layer have been described above, and each layer may be adapted as needed, for example, the convolution kernels of different convolutional layers may differ, so the details are not repeated here. It is to be appreciated that the second neural network 300 shown in fig. 10 is merely an example and is not specifically limited thereto; for example, convolutional layers, and/or pooling layers, and/or other layers may be removed or added.
The graying layer in the second neural network converts the multi-channel first data format information into single-channel gray information, which can be obtained by weighting the components representing different colors around the current pixel point. Referring to fig. 11, the differently colored components R, G and B are converted into single-channel gray information Y by the graying layer; for example, for Y22 the calculation formula is:

Y22 = (B22 + (G12 + G32 + G21 + G23)/4 + (R11 + R13 + R31 + R33)/4) / 3
Other components may be similar, and are not described in detail herein.
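As a small illustration, the Y22 computation above can be written as follows in NumPy (border handling omitted); the indexing assumes the Bayer neighbourhood of fig. 11 with a blue sample at the centre.

```python
import numpy as np

def gray_at_blue(img: np.ndarray, i: int, j: int) -> float:
    """Graying of a blue-position pixel, following the Y22 example above:
    average the centre B, the mean of the 4 neighbouring G samples and the
    mean of the 4 diagonal R samples."""
    b = img[i, j]
    g = (img[i - 1, j] + img[i + 1, j] + img[i, j - 1] + img[i, j + 1]) / 4.0
    r = (img[i - 1, j - 1] + img[i - 1, j + 1] +
         img[i + 1, j - 1] + img[i + 1, j + 1]) / 4.0
    return (b + g + r) / 3.0
```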
The convolution layer, pooling layer, full-link layer and frame regression layer in the second neural network may perform the same function as the corresponding layer in the first neural network, and each layer may have adaptive variation, for example, convolution kernels of different convolution layers may be different, and thus, the details are not repeated herein.
For the training of the second neural network, the training model of the second neural network may be trained by acquiring the first image data sample and the corresponding position information sample as a training sample set, taking the first image data sample as an input, and taking the corresponding position information sample as an output. For obtaining the first image data sample and the corresponding position information sample, the first image data sample may be first subjected to image processing for making the target detectable, and then the target may be detected by an image processing method for identifying the detected target, so as to obtain the corresponding position information sample.
In step S2, data may be cut from the position information of the designated object in the first image data in the first data format obtained in step S1 to the corresponding position in the first image data in the originally input first data format, and the cut data is used as the object data in the first data format of the corresponding object.
In one embodiment, assuming that the position information of the designated target in the first image data obtained in step S1 is [ x1, x2, y1, y2], where x1 and y1 are start position information, and x2 and y2 are end position information, when the first image data of the first data format corresponding to the whole image is represented by imgR, the target data imgT of the first data format of the designated target is:
imgT=imgR(x1:x2,y1:y2)。
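In array notation, step S2 is a plain slice of the raw frame. The helper below is only a sketch and keeps the crop in the first data format; whether x indexes rows or columns depends on the coordinate convention, and here the slice mirrors imgT = imgR(x1:x2, y1:y2) literally.

```python
import numpy as np

def intercept_target(imgR: np.ndarray, pos):
    # pos = [x1, x2, y1, y2] as obtained in step S1; the crop stays in the
    # first (raw) data format, so no image information is lost.
    x1, x2, y1, y2 = pos
    return imgR[x1:x2, y1:y2].copy()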
in step S3, the target data of the first data format corresponding to the designated target obtained in step S2 is processed to convert the target data of the designated target from the first data format to the second data format. Step S3 is actually image processing for a small target, and may be realized by ISP processing realized by a non-neural network or may be realized by a neural network.
In one embodiment, as shown in fig. 12, the third processing module 103 includes a fourth processing unit 1031, and the following steps may be executed by the fourth processing unit 1031 to implement the above step S3.
Inputting the target data to a trained third neural network; the third neural network implements conversion of the data format of the target data from the first data format to a second data format by at least a convolutional layer.
The third neural network may be integrated in the fourth processing unit 1031 as a part of the third processing module 103, or may be disposed outside the third processing module 103, and may be scheduled by the fourth processing unit 1031.
The third neural network may include at least one convolutional layer implementation for performing convolution to convert a data format of the target data from the first data format to a second data format. Of course, the layer structure of the third neural network is not limited thereto, and may further include at least one ReLu layer for performing activation, or may further include other layers, for example. The number of layers is not limited.
The image processing is realized based on the third neural network, and error propagation possibly caused by the fact that the traditional image processing is processed in each processing step respectively is reduced.
The operations performed by the layers of the third neural network are described in detail below, but should not be limited thereto.
For the convolutional layers of the third neural network, assuming the input of a convolutional layer is FC_i and its output is FC_{i+1}, then:

FC_{i+1} = g(w_ik * FC_i + b_ik)

where w_ik and b_ik are the parameters of the k-th convolution in the current convolutional layer, and g(x) is a linear weighting function, i.e., the convolution outputs of each convolutional layer are linearly weighted. Of course, the convolutional layers of the third neural network and those of the first neural network all perform convolution operations, so their functions are similar; the related description may also refer to the description of the convolutional layers of the first neural network.
For the ReLU layers of the third neural network, assuming the input of a ReLU layer is FR_i and its output is FR_{i+1}, then:

FR_{i+1} = max(FR_i, 0)

i.e., the larger of 0 and FR_i is selected.
As an embodiment of the third neural network, referring to fig. 13, the third neural network 400 may include a convolutional layer 401, a convolutional layer 402, a ReLu layer 403, a convolutional layer 404, and a convolutional layer 405 connected in sequence, and the input is target data in a first data format and the output is target data in a second data format. The functions performed by each layer of the third neural network have been described above, and each layer may be adaptively changed, for example, the convolution kernels of different convolutional layers may be different, and thus, the description thereof is omitted here. It is to be understood that the third neural network shown in fig. 13 is only an example, and is not particularly limited thereto, and for example, convolutional layers, and/or pooling layers, and/or other layers may be reduced or added.
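A layer stack like the one in fig. 13 might be sketched as follows in PyTorch. The channel counts, kernel sizes and the single-channel-raw-to-RGB mapping are assumptions used only to illustrate the conv/conv/ReLU/conv/conv structure, not the trained network of the embodiment.

```python
import torch
import torch.nn as nn

class ThirdNet(nn.Module):
    """Illustrative third neural network (conv, conv, ReLU, conv, conv):
    maps a single-channel raw (first data format) target patch to a
    3-channel patch in the second data format."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1),    # convolutional layer 401
            nn.Conv2d(32, 32, 3, padding=1),   # convolutional layer 402
            nn.ReLU(),                         # ReLU layer 403
            nn.Conv2d(32, 32, 3, padding=1),   # convolutional layer 404
            nn.Conv2d(32, 3, 3, padding=1),    # convolutional layer 405 -> RGB
        )

    def forward(self, target_raw):             # (N, 1, H, W), first data format
        return self.body(target_raw)           # (N, 3, H, W), second data format
```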
For the training of the third neural network, in order to optimize the deep neural network in advance, a large number of target data samples in the first data format and the corresponding ideal target data samples in the second data format may be collected to form training samples. The network parameters of the third neural network are then trained continuously until, when target data in the first data format is input, target data in the ideal second data format can be output; the network parameters at that point are output for actual testing and use of the third neural network.
The training procedure for training the third neural network may include the following steps:
S311: Collect training samples: first data format information corresponding to the target of interest and the corresponding ideal second data format information are collected. Suppose that n training sample pairs {(x1, y1), (x2, y2), …, (xn, yn)} have been obtained, where xi represents the input first data format information and yi represents the corresponding ideal second data format information.
S312: Design the structure of the third neural network; the network structure used for training and the network structure used for testing are the same;
S313: Initialize the training parameters; initialize the network parameters of the structure of the third neural network, for which random value initialization, fixed value initialization and the like can be adopted; set training-related parameters such as the learning rate and the number of iterations;
S314: Forward propagation; based on the current network parameters, perform forward propagation of the training sample xi through the third neural network to obtain the network output F(xi), and calculate the loss function Loss:

Loss = (F(xi) - yi)^2

S315: Backward propagation: adjust the network parameters of the third neural network using back propagation;
S316: Repeated iteration: repeat steps S314 and S315 until the network converges, and output the network parameters at that moment.
Of course, the training process of the third neural network is not limited to this, and other training manners are also possible, as long as the trained third neural network can achieve that the target data in the first data format is input and the corresponding target data in the second data format is obtained.
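For illustration only, steps S313 to S316 could look like the following PyTorch loop. The optimizer choice, learning rate and iteration scheme are assumptions; the loss matches Loss = (F(xi) - yi)^2 above.

```python
import torch
import torch.nn as nn

def train_third_net(net: nn.Module, pairs, lr=1e-4, iters=10000):
    """Sketch of S313-S316: `pairs` is the sample set of S311, a list of
    (x_i, y_i) tensor pairs where x_i is a raw-format patch (N, 1, H, W)
    and y_i the ideal second-format patch (N, 3, H, W)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)   # S313: training parameters
    loss_fn = nn.MSELoss()                            # Loss = (F(x_i) - y_i)^2
    for step in range(iters):                         # S316: repeated iteration
        x_i, y_i = pairs[step % len(pairs)]
        out = net(x_i)                                # S314: forward propagation
        loss = loss_fn(out, y_i)
        opt.zero_grad()
        loss.backward()                               # S315: backward propagation
        opt.step()
    return net                                        # converged network parameters
```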
In another embodiment, as shown in fig. 14, the third processing module 103 includes a fifth processing unit 1032, and the fifth processing unit 1032 may perform ISP processing on the target data, where the data format of the target data is converted from the first data format to the second data format, where the ISP processing at least includes color interpolation, so as to implement step S3 described above.
Further, the ISP processing further comprises at least one of: white balance correction and curve mapping can further improve the image quality.
The parameters in the ISP processing are calculated only by using the target data in the first data format, so that the accuracy of the processing parameters can be improved, and the image quality after the target data is processed is improved.
As an embodiment of performing ISP processing on the target data, referring to fig. 15, the ISP processing may sequentially include the following steps:
S301: white balance correction (with the target data in the first data format as input);
S302: color interpolation;
S303: curve mapping (outputting the target data in the second data format).
It is to be understood that the ISP processing for converting the data format of the target data from the first data format to the second data format is not limited thereto, and may be, for example, only color interpolation, or may include other ISP processing methods.
The ISP processing, such as white balance correction, color interpolation, and curve mapping, will be described in more detail below, but should not be limited thereto.
The white balance correction removes the color cast of the image caused by ambient light so as to restore the original color information of the image, and generally uses two coefficients, R_gain and B_gain, to control the adjustment of the corresponding R and B components:

R2' = R2 * R_gain

B2' = B2 * B_gain

where R2 and B2 are the red and blue channel color components of the input image of the white balance correction, and R2' and B2' are the red and blue channel color components of the output image of the white balance correction. Compared with white balance correction of the full picture, here R_gain and B_gain only need to be computed from the R, G and B channel color components of the target of interest.

To calculate R_gain and B_gain, the mean values R_avg, G_avg and B_avg of the R, G and B channel color components are counted first; then:
R_gain = G_avg / R_avg

B_gain = G_avg / B_avg
the color interpolation means that the target data of the first data format after white balance correction is expanded into a multi-channel data format with each channel representing a color component from a single-channel format; the method can be realized by adopting a nearest neighbor interpolation method, and target data in a single-channel first data format is expanded into target data in multiple channels. For example, for image data in the first data format in the Bayer format, the nearest color pixel may be directly used to fill up a pixel point with missing corresponding color, so that each pixel point contains three color components of RGB.
The curve mapping refers to adjusting the brightness and contrast of the image data according to the visual characteristics of the human eye, commonly by mapping with Gamma curves of different parameters. Assuming the mapping function of the Gamma curve is g, the image before mapping is denoted img and the mapped image is denoted img_gm, then:

img_gm(i, j) = g(img(i, j))
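A minimal sketch of this target-only ISP (steps S301 to S303) is given below in NumPy. The RGGB layout, the gray-world style gains and the Gamma parameter are assumptions for illustration; the point is that the gains are computed from the target patch alone.

```python
import numpy as np

def target_isp(imgT: np.ndarray, gamma=1 / 2.2, max_val=4095.0):
    """Sketch of S301-S303 for an assumed RGGB Bayer target patch."""
    img = imgT.astype(np.float32)
    # S301: white balance from the target's own channel statistics
    r_avg = img[0::2, 0::2].mean()
    g_avg = (img[0::2, 1::2].mean() + img[1::2, 0::2].mean()) / 2.0
    b_avg = img[1::2, 1::2].mean()
    img[0::2, 0::2] *= g_avg / max(r_avg, 1e-6)
    img[1::2, 1::2] *= g_avg / max(b_avg, 1e-6)
    # S302: nearest-neighbour color interpolation to three channels
    h, w = img.shape
    rgb = np.empty((h, w, 3), np.float32)
    rgb[..., 0] = np.repeat(np.repeat(img[0::2, 0::2], 2, 0), 2, 1)[:h, :w]
    rgb[..., 1] = np.repeat(np.repeat(img[0::2, 1::2], 2, 0), 2, 1)[:h, :w]
    rgb[..., 2] = np.repeat(np.repeat(img[1::2, 1::2], 2, 0), 2, 1)[:h, :w]
    # S303: Gamma curve mapping img_gm(i, j) = g(img(i, j)) and 8-bit output
    rgb = 255.0 * np.clip(rgb / max_val, 0, 1) ** gamma
    return rgb.astype(np.uint8)
```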
the following describes the image processing apparatus according to an embodiment of the present invention, but the present invention is not limited thereto.
In one embodiment, referring to fig. 2, an image processing apparatus 100 may include:
the first processing module 101 is configured to acquire position information of a specified target in first image data in a first data format from the acquired first image data;
the second processing module 102 is configured to intercept target data corresponding to the position information from the first image data;
a third processing module 103, configured to convert the data format of the target data from the first data format to a second data format, where the second data format is suitable for display and/or transmission of the target data.
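As a rough, non-authoritative sketch of how the three modules could cooperate in software, the class below wires a detector (first processing module), a crop of the detected region (second processing module) and a format converter (third processing module) together; all names and the stub detector/converter are hypothetical.

```python
import numpy as np

class ImageProcessor:
    def __init__(self, detector, converter):
        self.detector = detector      # first processing module: locates the specified target
        self.converter = converter    # third processing module: first -> second data format

    def process(self, first_image_data):
        x0, y0, x1, y1 = self.detector(first_image_data)        # position information
        target_data = first_image_data[y0:y1, x0:x1]             # second module: intercept
        return self.converter(target_data)                       # third module: convert

proc = ImageProcessor(detector=lambda img: (0, 0, 32, 32),       # stub locator
                      converter=lambda patch: patch.astype(np.uint8))
out = proc.process(np.random.randint(0, 4096, (64, 64)))
```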
In the embodiment of the present invention, the image processing apparatus 100 may be applied to an image device, which may be a device having an imaging function, such as a camera, or a device capable of performing image post-processing; this is not specifically limited. The first image data in the first data format may be image data captured by the device itself, or image data captured by another device and then obtained from that device; this is likewise not specifically limited.
In one embodiment, referring to fig. 3, the first processing module 101 comprises a first processing unit 1011 and a second processing unit 1012;
the first processing unit 1011 is configured to convert the first image data into second image data that can be used for target detection;
the second processing unit 1012 is configured to detect position information of the designated object in the second image data, and determine the detected position information as position information of the designated object in the first image data.
In an embodiment, the second processing unit 1012 is specifically configured to:
inputting the second image data into a trained first neural network, and determining the result output by the first neural network as the position information of the specified target in the first image data; the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a full-link layer for performing feature synthesis, and a bounding box regression layer for performing coordinate transformation.
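A hedged PyTorch sketch of a network built from the four layer types named above (convolution, pooling, fully connected feature synthesis, bounding box regression) is shown below; the layer sizes, the three-channel 64x64 input and the class name are assumptions for illustration and do not reproduce the trained first neural network.

```python
import torch
import torch.nn as nn

class BBoxRegressionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),    # convolution layer
            nn.MaxPool2d(2),                              # pooling layer (downsampling)
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(32 * 16 * 16, 128), nn.ReLU())  # feature synthesis
        self.bbox = nn.Linear(128, 4)                     # bounding box regression layer

    def forward(self, x):                                 # x: (N, 3, 64, 64) second image data
        return self.bbox(self.fc(self.features(x)))      # (N, 4): e.g. x, y, w, h

boxes = BBoxRegressionNet()(torch.rand(1, 3, 64, 64))
```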
In an embodiment, the first processing unit 1011 is specifically configured to:
converting the first image data into second image data on which target detection can be performed, by adopting at least one of the following image processing modes: black level correction, white balance correction, color interpolation, contrast enhancement, and bit-width compression.
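The following sketch illustrates three of the listed processing modes (black level correction, contrast enhancement and bit-width compression) being used to turn 12-bit single-channel first image data into 8-bit second image data suitable for detection; the black level, bit depths and function name are illustrative assumptions.

```python
import numpy as np

def to_detectable(raw12, black_level=64, out_bits=8):
    x = np.clip(raw12.astype(np.float32) - black_level, 0, None)   # black level correction
    x = x / max(float(x.max()), 1.0)                               # simple contrast stretch
    return (x * (2 ** out_bits - 1)).astype(np.uint8)              # bit-width compression

second_image_data = to_detectable(np.random.randint(0, 4096, (480, 640)))
```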
In one embodiment, referring to fig. 8, the first processing module 101 comprises a third processing unit 1013;
the third processing unit 1013 is configured to input the first image data to a trained second neural network, where the second neural network converts the first image data into second image data on which target detection can be performed and detects the position information of a specified target in the second image data at least through a graying layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a full connection layer for performing feature synthesis, and a bounding box regression layer for performing coordinate transformation; and to determine the result output by the second neural network as the position information of the specified target in the first image data.
In one embodiment, referring to fig. 12, the third processing module 103 comprises a fourth processing unit 1031;
the fourth processing unit 1031 is configured to input the target data to a trained third neural network; the third neural network effects conversion of the data format of the target data from the first data format to the second data format by at least a convolutional layer for performing convolution.
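A minimal PyTorch sketch of a purely convolutional network that maps single-channel target data in the first data format to three-channel data in a displayable second data format is given below; the depth, channel counts and variable names are assumptions, not the architecture of the trained third neural network.

```python
import torch
import torch.nn as nn

raw_to_rgb = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),          # last convolution emits the R, G, B planes
)
rgb_patch = raw_to_rgb(torch.rand(1, 1, 64, 64))   # (1, 3, 64, 64)
```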
In one embodiment, referring to fig. 14, the third processing module 103 includes a fifth processing unit 1032;
the fifth processing unit 1032 is configured to perform ISP processing on the target data, wherein the ISP processing is used to convert the data format of the target data from the first data format to the second data format and includes at least color interpolation.
The implementation of the functions and actions of each unit in the above apparatus is described in detail in the implementation of the corresponding steps of the above method, and is not repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units.
The invention also provides an electronic device, which comprises a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the image processing method as described in any of the preceding embodiments.
The embodiment of the image processing apparatus can be applied to an electronic device. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into memory for execution. In terms of hardware, fig. 16 is a hardware structure diagram of an electronic device in which the image processing apparatus 100 is located according to an exemplary embodiment of the present invention. Besides the processor 510, the memory 530, the interface 520, and the non-volatile memory 540 shown in fig. 16, the electronic device in which the apparatus 100 is located may also include other hardware according to its actual function, which is not described again.
The present invention also provides a machine-readable storage medium on which a program is stored, which, when executed by a processor, causes an image apparatus to implement the image processing method as described in any one of the preceding embodiments.
The present invention may take the form of a computer program product embodied on one or more storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having program code embodied therein. Machine-readable storage media include both permanent and non-permanent, removable and non-removable media, and the storage of information may be accomplished by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (17)

1. An image processing method, comprising:
acquiring position information of a specified target in first image data in a first acquired data format; the first data format is a data format that is not displayable as an image;
intercepting target data corresponding to the position information from the first image data;
converting the data format of the target data from the first data format to a second data format, the second data format being a data format displayable as an image, the second data format being suitable for display and/or transmission of the target data.
2. The image processing method according to claim 1, wherein the acquiring of the position information of the specified target in the first image data from the acquired first image data in the first data format comprises:
converting the first image data into second image data capable of carrying out target detection;
and detecting the position information of the designated target in the second image data, and determining the detected position information as the position information of the designated target in the first image data.
3. The image processing method according to claim 2, wherein the detecting of the position information of the specified object in the second image data, and the determining of the detected position information as the position information of the specified object in the first image data comprises:
inputting the second image data to a trained first neural network; the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing down-sampling, a full-link layer for performing feature synthesis, and a bounding box regression layer for performing coordinate transformation;
determining a result output by the first neural network as position information of the specified target in the first image data.
4. The image processing method according to claim 2, wherein said converting the first image data into second image data that can be subject to object detection comprises:
and converting the first image data into second image data capable of carrying out target detection by adopting at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement and bit width compression.
5. The image processing method according to claim 1, wherein the acquiring of the position information of the specified target in the first image data from the acquired first image data in the first data format comprises:
inputting the first image data to a trained second neural network; the second neural network converts the first image data into second image data which can be subjected to target detection and detects position information of a specified target in the second image data at least through a graying layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a full-link layer for performing feature synthesis and a bounding box regression layer for performing coordinate transformation;
determining a result output by the second neural network as position information of the specified target in the first image data.
6. The image processing method of any of claims 1 to 5, wherein the converting the data format of the target data from the first data format to a second data format comprises:
inputting the target data to a trained third neural network; the third neural network effects conversion of the data format of the target data from the first data format to a second data format by at least the convolutional layer for performing convolution.
7. The image processing method of any of claims 1 to 5, wherein the converting the data format of the target data from the first data format to a second data format comprises:
performing ISP processing on the target data; wherein the ISP processing is used for converting the data format of the target data from the first data format to a second data format, and the ISP processing at least comprises color interpolation.
8. The image processing method of claim 7, wherein the ISP processing further comprises at least one of: white balance correction, curve mapping.
9. An image processing apparatus characterized by comprising:
the first processing module is used for acquiring position information of a specified target in first image data from the acquired first image data in a first data format; the first data format is a data format that is not displayable as an image;
the second processing module is used for intercepting target data corresponding to the position information from the first image data;
and the third processing module is used for converting the data format of the target data from the first data format into a second data format, the second data format is a data format which can be displayed as an image, and the second data format is suitable for the display and/or transmission of the target data.
10. The image processing apparatus according to claim 9, wherein the first processing module includes a first processing unit and a second processing unit;
the first processing unit is used for converting the first image data into second image data capable of carrying out target detection;
the second processing unit is configured to detect position information of the designated object in the second image data, and determine the detected position information as position information of the designated object in the first image data.
11. The image processing apparatus according to claim 10, wherein the second processing unit is specifically configured to:
inputting the second image data into a trained first neural network, and determining the result output by the first neural network as the position information of the specified target in the first image data; the first neural network enables location and output of location information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a full-link layer for performing feature synthesis, and a bounding box regression layer for performing coordinate transformation.
12. The image processing apparatus according to claim 10, wherein the first processing unit is specifically configured to:
and converting the first image data into second image data capable of carrying out target detection by adopting at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement and bit width compression.
13. The image processing apparatus of claim 9, wherein the first processing module includes a third processing unit;
the third processing unit is used for inputting the first image data to a trained second neural network; the second neural network converts the first image data into second image data which can be subjected to target detection and detects the position information of a specified target in the second image data at least through a graying layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a full connection layer for performing feature synthesis and a frame regression layer for performing coordinate transformation; determining a result output by the second neural network as position information of the specified target in the first image data.
14. The image processing apparatus according to claim 9, wherein the third processing module includes a fourth processing unit;
the fourth processing unit is used for inputting the target data to a trained third neural network; the third neural network effects conversion of the data format of the target data from the first data format to a second data format by at least the convolutional layer for performing convolution.
15. The image processing apparatus of claim 9, wherein the third processing module includes a fifth processing unit;
the fifth processing unit is configured to perform ISP processing on the target data; wherein the ISP processing is configured to convert the data format of the target data from the first data format to a second data format, including at least color interpolation.
16. An electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the image processing method of any one of claims 1 to 8.
17. A machine-readable storage medium, having stored thereon a program which, when executed by a processor, implements the image processing method according to any one of claims 1 to 8.
CN201810571964.XA 2018-05-31 2018-05-31 Image processing method, device and equipment and readable medium Active CN110555877B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810571964.XA CN110555877B (en) 2018-05-31 2018-05-31 Image processing method, device and equipment and readable medium
PCT/CN2019/089249 WO2019228450A1 (en) 2018-05-31 2019-05-30 Image processing method, device, and equipment, and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810571964.XA CN110555877B (en) 2018-05-31 2018-05-31 Image processing method, device and equipment and readable medium

Publications (2)

Publication Number Publication Date
CN110555877A CN110555877A (en) 2019-12-10
CN110555877B true CN110555877B (en) 2022-05-31

Family

ID=68698712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810571964.XA Active CN110555877B (en) 2018-05-31 2018-05-31 Image processing method, device and equipment and readable medium

Country Status (2)

Country Link
CN (1) CN110555877B (en)
WO (1) WO2019228450A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111110272B (en) * 2019-12-31 2022-12-23 深圳开立生物医疗科技股份有限公司 Ultrasonic image measurement information display method, device and equipment and readable storage medium
RU2764395C1 (en) 2020-11-23 2022-01-17 Самсунг Электроникс Ко., Лтд. Method and apparatus for joint debayering and image noise elimination using a neural network
CN113077516B (en) * 2021-04-28 2024-02-23 深圳市人工智能与机器人研究院 Pose determining method and related equipment
CN115977496B (en) * 2023-02-24 2024-07-19 重庆长安汽车股份有限公司 Vehicle door control method, system, equipment and medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873781B (en) * 2014-03-27 2017-03-29 成都动力视讯科技股份有限公司 A kind of wide dynamic camera implementation method and device
US9852492B2 (en) * 2015-09-18 2017-12-26 Yahoo Holdings, Inc. Face detection
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN106529446A (en) * 2016-10-27 2017-03-22 桂林电子科技大学 Vehicle type identification method and system based on multi-block deep convolutional neural network
CN107301383B (en) * 2017-06-07 2020-11-24 华南理工大学 Road traffic sign identification method based on Fast R-CNN
CN107895378A (en) * 2017-10-12 2018-04-10 西安天和防务技术股份有限公司 Object detection method and device, storage medium, electronic equipment
CN107808139B (en) * 2017-11-01 2021-08-06 电子科技大学 Real-time monitoring threat analysis method and system based on deep learning
CN107886074B (en) * 2017-11-13 2020-05-19 苏州科达科技股份有限公司 Face detection method and face detection system
CN107871126A (en) * 2017-11-22 2018-04-03 西安翔迅科技有限责任公司 Model recognizing method and system based on deep-neural-network
CN108009524B (en) * 2017-12-25 2021-07-09 西北工业大学 Lane line detection method based on full convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Seungwon Lee et al.; "Moving object segmentation using motion orientation histogram in adaptively partitioned blocks for consumer surveillance system"; 2012 IEEE International Conference on Consumer Electronics (ICCE); 2012-03-01; pp. 197-198 *

Also Published As

Publication number Publication date
CN110555877A (en) 2019-12-10
WO2019228450A1 (en) 2019-12-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant