WO2020006739A1 - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
WO2020006739A1
WO2020006739A1 PCT/CN2018/094716 CN2018094716W
Authority
WO
WIPO (PCT)
Prior art keywords
target object
pixel
image
information
current image
Prior art date
Application number
PCT/CN2018/094716
Other languages
English (en)
French (fr)
Inventor
郑萧桢
封旭阳
张李亮
赵丛
Original Assignee
SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority to CN201880037369.6A priority Critical patent/CN110720224B/zh
Priority to PCT/CN2018/094716 priority patent/WO2020006739A1/zh
Publication of WO2020006739A1 publication Critical patent/WO2020006739A1/zh

Classifications

    • H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 19/136: Adaptive coding characterised by incoming video signal characteristics or properties
    • H04N 19/20: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the present application relates to the field of image processing, and in particular, to an image processing method and device.
  • the objects of interest may include people, animals, plants, public facilities, vehicles, landscapes, or scenery
  • the decoder or the observer can be updated to track the change of the object in the video stream, so as to better assist the observer in observing or interacting with the object.
  • This type of method in image processing can generally be referred to as object tracking technology.
  • Existing object tracking technologies usually use image processing, computer vision, and computer analysis and understanding technologies to identify the content of the video stream and identify the objects that require attention.
  • the position and size of an object of interest in each frame of the image are identified by a rectangular area at the encoding end or the decoding end.
  • the decoder performs additional operations based on the rectangular region, which results in poor processing results and low processing efficiency.
  • the present application provides an image processing method and apparatus, so that a decoding device can perform pixel-level processing on a target object more efficiently and accurately.
  • an image processing method including: obtaining code stream data of a current image, where the code stream data includes identification information, and the identification information is used to identify at least one target object in the current image.
  • the identification information includes image area information and pixel information, the image area information includes a position and a size of an image area where the target object is located, and the pixel information includes an attribute of at least one pixel in the image area; decoding the code stream data to obtain the current image and the identification information; and performing pixel-level processing on the current image according to the identification information.
  • an image processing apparatus including: at least one memory for storing computer-executable instructions; and at least one processor, individually or collectively, configured to access the at least one memory and execute the computer-executable instructions to implement the following operations: obtaining code stream data of the current image, the code stream data including identification information, the identification information being used to identify at least one target object in the current image, the identification information including image area information and pixel information, the image area information including a position and a size of an image area where the target object is located, the pixel information including an attribute of at least one pixel in the image area; decoding the code stream data to obtain the current image and the identification information; and performing pixel-level processing on the current image according to the identification information.
  • a computer-readable storage medium on which instructions are stored, and when the instructions run on the computer, the computer is caused to execute the method of the first aspect.
  • the image processing method and device of the present application indicate the position and size of the image area where the target object is located through the image area information, and indicate the attributes of multiple pixels in the image area through the pixel information, thereby identifying the target object with a finer granularity. This enables the decoding device to perform pixel-level processing on the target object more efficiently and accurately.
  • the image processing method and device of the present application enable recognition of the target object to be performed at the encoding end, so that the decoding device only needs to perform subsequent image processing. Therefore, on the one hand, the image processing method of the present application can be implemented on platforms such as mobile phones and tablets; on the other hand, the computing resources of the decoding device can be devoted to more complex image processing, so that the decoding device can present a higher-quality, more detailed image.
  • FIG. 1 is a schematic flowchart of an encoding method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a target object in an image according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a decoding method according to an embodiment provided in the present application.
  • FIG. 4 is a schematic block diagram of an encoding device according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an encoding device according to another embodiment provided in the present application.
  • FIG. 6 is a schematic block diagram of a decoding device according to an embodiment provided in this application.
  • FIG. 7 is a schematic block diagram of a decoding device according to another embodiment provided in this application.
  • FIG. 8 is a schematic flowchart of an image processing method according to an embodiment of the present application.
  • 9A and 9B are schematic diagrams of two images obtained by fusion according to an embodiment of the present application.
  • 10A and 10B are schematic diagrams of adding an indicator halo on a target object according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of changing the brightness of a target object according to an embodiment of the present application.
  • FIG. 12A is an original image of the current image
  • FIG. 12B is an object category segmentation image corresponding to the current image.
  • FIG. 13 is a schematic diagram of images identified by different colors at different parts of an embodiment of the present application.
  • FIG. 14A is an original image of the current image
  • FIG. 14B is a reflection intensity segmented image corresponding to the current image.
  • FIG. 15A is an original image of the current image
  • FIG. 15B is a depth map corresponding to the current image.
  • FIG. 16 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 17 is a schematic block diagram of an image processing apparatus according to another embodiment of the present application.
  • the target object can refer to an object in the image that needs to be focused on, identified, or observed, and can include people, animals, plants, public facilities, vehicles, landscapes, scenery, etc., as well as other types of objects; it can also refer to a specific part of a person, animal, plant, public facility, vehicle, landscape, scenery, or other type of object.
  • the image area may refer to a regular or irregular area where the target object is located. Generally speaking, the position and size of the image area should be such that all parts of the target object fall into the image area, or at least 80% of the area of the target object falls into the image area.
  • the image area can be roughly delimited, so that the decoder can determine the position and size of the target object faster.
  • the sub-image area may be a piece of area in the image area where pixels have the same attribute.
  • an existing method of object tracking technology is to encode the video content at the encoding end, then analyze the video content at the decoding end to find the objects that need to be focused on and identify them; that is, the identification is done at the decoding end.
  • the problem with completing the identification at the decoding end is that video encoding is usually a lossy process, and the information of the video content will suffer loss after encoding.
  • the video content decoded by the decoder has a certain degree of degradation in quality and information.
  • when the decoder analyzes damaged video content and extracts the objects that need attention, the results are usually not satisfactory.
  • analyzing video content and extracting objects at the decoding end will consume a lot of computing resources at the decoding end.
  • decoders are widely used in mobile devices such as mobile phones, which are more sensitive to power consumption. Therefore, the calculation of video content at the decoding end will affect the user experience to a certain extent.
  • the function of analyzing video content is transferred from the decoding end to the encoding end for execution.
  • These technologies identify the extracted object at the encoding end and write the identification information into the video file to ensure that the decoding end can identify the object extracted from the encoding end by parsing the identification information.
  • the advantages of this approach are: 1. Analyzing the original, uncompressed video content at the encoding end can extract the objects that need attention more efficiently and accurately. 2. Since the encoding device usually has stronger computing capabilities, and usually needs to analyze the video content anyway in order to perform some additional operations, transferring the calculation and analysis from the decoding end to the encoding end does not degrade the user experience. These additional operations may be, for example, obstacle avoidance operations performed after the captured video content is analyzed on a drone system.
  • the encoding end may use a common video coding standard, for example, the H.264/Advanced Video Coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, the Audio Video coding Standard (AVS1-P2, AVS2-P2), the VP9 standard, the AOMedia Video 1 (AV1) standard, or the Versatile Video Coding (VVC) standard.
  • FIG. 1 is a schematic flowchart of an encoding method 100 according to an embodiment of the present application.
  • the encoding method 100 is performed by an encoding device.
  • the encoding method 100 includes: S110, encoding processing of a current image to generate code stream data, the code stream data includes identification information, and the identification information is used to identify at least one target object in the current image
  • the identification information includes image area information and pixel information, the image area information includes a position and a size of an image area where the target object is located, and the pixel information includes an attribute of at least one pixel in the image area.
  • the position and size of an image area where a target object is located are indicated by image area information, and attributes of multiple pixels in the image area are indicated by pixel information, thereby identifying the target object with finer granularity, It is beneficial for the decoding end to perform operations on the target object more efficiently and accurately.
  • the encoding method 100 may further include: performing image recognition on the current image, determining the target object, and obtaining the identifier of the target object.
  • image recognition can be based on technologies such as image processing, computer vision, and computer analysis and understanding.
  • the identification information in the embodiment of the present application may also be obtained by other methods, for example, by receiving external input.
  • the form and content of the obtained identification information can be various, which will be described in detail below.
  • the identification information may be located in auxiliary enhancement information or extended data of the current image.
  • the auxiliary enhancement information may be Supplemental Enhancement Information (SEI)
  • the extension data may be ED (Extension Data).
  • SEI and ED can generally be considered as part of the stream data.
  • the decoding device may decode according to the SEI and/or ED, or discard the SEI and/or ED; whether the identification information is decoded need not affect the decoding of the content of the current image. This will also be described in detail below.
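As a minimal, hypothetical sketch of this behaviour (the container format and field names are illustrative, not taken from any codec standard), the SEI/ED-carried identification information can be modelled as optional records travelling beside image payloads, which a decoder may collect or simply discard without affecting image decoding:

```python
# Illustrative only: models SEI/ED-style side information as optional records
# that a decoder can parse or drop; image decoding does not depend on them.

def split_stream(records):
    """Separate image payloads from optional identification records."""
    images, id_info = [], []
    for kind, payload in records:
        if kind == "image":
            images.append(payload)
        elif kind == "sei_id":
            id_info.append(payload)  # a decoder may instead discard these
    return images, id_info

records = [("image", b"frame0"), ("sei_id", {"objects": 1}), ("image", b"frame1")]
images, id_info = split_stream(records)
```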
  • the image area may be a rectangular area.
  • the image area is the smallest rectangular area, or a relatively small rectangular area, that can frame the target object.
  • the image region information may include coordinates of any corner of the rectangular region (for example, coordinates of the upper left corner), height information of the rectangular region, and width information of the rectangular region.
  • the image area information may include the coordinates of the center point of the rectangular area, the height information of the rectangular area, and the width information of the rectangular area.
  • the height information of the rectangular area may be the full height or half height of the rectangular area.
  • the width information of the rectangular area may be the full width or half width of the rectangular area, which is not limited herein.
  • the image region information may include coordinates of the upper left corner of the rectangular region and coordinates of the lower right corner of the rectangular region.
  • the image region information may include coordinates of the upper-right corner of the rectangular region and coordinates of the lower-left corner of the rectangular region, and so on.
  • the specific content of the image area information is not limited in the embodiments of the present application.
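A short sketch (function names are illustrative, not syntax elements of the scheme) showing that the representations above are interchangeable: a rectangle given as a corner plus width and height carries the same information as one given by two opposite corners.

```python
# Convert between two equivalent descriptions of a rectangular image area.

def corner_wh_to_corners(left, top, width, height):
    """(top-left corner, width, height) -> (top-left, bottom-right) corners."""
    return (left, top, left + width - 1, top + height - 1)

def corners_to_corner_wh(left, top, right, bottom):
    """(top-left, bottom-right) corners -> (top-left corner, width, height)."""
    return (left, top, right - left + 1, bottom - top + 1)
```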
  • the image region may have other shapes, such as a circle, a polygon, or a curved edge, and so on.
  • the image area information may include the coordinates of the center of the circle (that is, the coordinates of the center point) and the radius information.
  • the image area information may include the coordinates of the center point and the distance information between the center point and the vertex of the regular hexagon.
  • the image region may include multiple sub-image regions.
  • the sub-image area may be a piece of area in the image area where pixels have the same attribute.
  • one sub-image region may be a region corresponding to a target object
  • another sub-image region may be a region corresponding to a background.
  • a sub-image area may be a region corresponding to one part of the target object
  • another sub-image region may be a region corresponding to another part of the target object
  • another sub-image region may be a region corresponding to the background.
  • the attributes may be measured in pixels, that is, each pixel corresponds to its own attribute.
  • the pixel information includes information about the attributes of each pixel.
  • the attributes may also be measured in pixel blocks. Correspondingly, the pixel information includes information about the attributes of at least one pixel block, and a pixel block includes at least two pixels.
  • a pixel block may be an area with a finer or smaller granularity than an image area.
  • the attribute of a pixel block means that the attributes of all pixel points in the pixel block are attributes of the pixel block.
  • a pixel block may be a regular shaped block, such as a square or rectangular block.
  • a pixel block can also be an irregularly shaped block.
  • a pixel block may include multiple pixels (e.g., 2, 4, 9, or 16 pixels). When the attributes are measured in pixel blocks, the sizes of multiple pixel blocks may be the same or different.
  • the current image can be down-sampled first to obtain the attribute information corresponding to the pixel block.
  • compared with measuring the attributes in pixels, measuring the attributes in pixel blocks reduces the amount of data stored or transmitted by the encoding device.
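A minimal sketch of this down-sampling, assuming square 2x2 pixel blocks and a majority vote to pick each block's attribute (the block size and voting rule are illustrative choices, not specified by the scheme above):

```python
# Down-sample a per-pixel attribute map so one value is stored per pixel
# block instead of per pixel (here: majority vote within each block).

def downsample_attributes(mask, block=2):
    h, w = len(mask), len(mask[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            vals = [mask[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            row.append(1 if sum(vals) * 2 >= len(vals) else 0)
        out.append(row)
    return out
```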
  • the pixel information may also be obtained from other alternative forms or solutions, which are not listed here one by one.
  • the pixel information may include a value assigned to at least one pixel in the image area; wherein pixels in different sub-image areas are assigned the same or different values.
  • the pixel values of different sub-image regions in the same image region may be the same or different.
  • for example, if two disconnected sub-image regions in the image region are both regions other than the target object, the values assigned to their pixels may be the same or different.
  • the pixel values of the sub-image areas in different image areas can be the same or different.
  • the values assigned by the sub-image areas belonging to the target object in different image areas can be the same or different.
  • the values assigned to the sub-image areas outside the target object in different image areas may be the same or different.
  • the pixel information may also be represented by a non-numeric indicator, which is not limited in the embodiment of the present application.
  • the attribute of the at least one pixel may include whether the at least one pixel belongs to the target object.
  • different values are assigned to at least one pixel to indicate whether the at least one pixel belongs to the target object.
  • the first portion of pixels is assigned a first value to indicate that the first portion of pixels does not belong to the target object.
  • the pixel information includes values of pixels that do not belong to the target object.
  • the image region includes one (or more) sub-image regions as target objects; the image region also includes several sub-image regions as backgrounds that do not belong to the target object.
  • the pixel information may include only attributes of pixels that do not belong to the target object, or the pixel information may include only values of pixels that do not belong to the target object. In other words, the pixel information may include only attributes or numerical values of pixels belonging to the several sub-image regions of the background.
  • the second part of the pixels is assigned a second value to indicate that the second part of the pixels belong to the target object.
  • the pixel information includes numerical values of pixels belonging to the target object.
  • the image region includes one (or more) sub-image regions as target objects; the image region also includes several sub-image regions as backgrounds that do not belong to the target object.
  • the pixel information may include only attributes of pixels belonging to the target object, or the pixel information may include only values of pixels belonging to the target object. In other words, the pixel information may include only attributes or numerical values of pixels belonging to one (or more) sub-image regions of the target object.
  • the first part of pixels is assigned a first value to indicate that the first part of pixels does not belong to the target object; the second part of pixels is assigned a second value to indicate that the second part of pixels belongs to the target object.
  • the pixel information includes numerical values of all pixels.
  • the image region includes one (or more) sub-image regions as target objects; the image region also includes several sub-image regions as backgrounds that do not belong to the target object.
  • the pixel information may include both the attributes of pixels belonging to the target object and the attributes of pixels belonging to the background; or in other words, the pixel information may include both the values of pixels belonging to the target object and the values of pixels belonging to the background.
  • the pixel information may include both attributes or values of pixels belonging to one (or more) sub-image areas of the target object; and attributes or values of pixels belonging to several sub-image areas of the background.
  • the pixel information may be represented by a mask.
  • the template value can be identified by the binary values 0 and 1.
  • the template value of the pixels belonging to the target object in the pixel information is 1; the template value of the pixels belonging to the background is 0.
  • the specific content of the identification information of the target object i can be as follows. Those skilled in the art can understand that this content is only schematic and can be obtained from other alternative forms or solutions, which are not listed here one by one.
  • ar_object_top[i], ar_object_left[i], ar_object_width[i], and ar_object_height[i] represent the position and size of the target object i
  • ar_object_top[i] and ar_object_left[i] represent the position of the upper left corner of the target object i
  • ar_object_width[i] and ar_object_height[i] represent the width and height of the target object i.
  • mask[m][n] represents the template value corresponding to the pixel whose coordinates are offset by m and n in the vertical and horizontal directions relative to the upper left corner of the rectangular area. When the pixel belongs to the target object, the value of mask[m][n] is 1; otherwise, when the pixel belongs to the background, the value of mask[m][n] is 0.
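The mask construction can be sketched as follows (the helper name and the object shape are illustrative; the field names mirror the syntax elements above):

```python
# Build the per-pixel template mask[m][n] for one target object, relative to
# the upper-left corner of its rectangular area: 1 = target object, 0 = background.

def build_mask(ar_object_width, ar_object_height, object_pixels):
    """object_pixels: set of (m, n) offsets that belong to the target object."""
    return [[1 if (m, n) in object_pixels else 0
             for n in range(ar_object_width)]
            for m in range(ar_object_height)]

mask = build_mask(4, 3, {(0, 1), (1, 1), (1, 2), (2, 2)})
```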
  • instead of the point-by-point identification method above, the target object may be identified within the target frame indicated by ar_object_top[i], ar_object_left[i], ar_object_width[i], and ar_object_height[i] using a run-length method:
  • the starting position of each line and the length of the target object in that line are identified. The specific method is as follows:
  • mask_pos[i][m] represents the starting position of the i-th object in the m-th row of the target frame
  • mask_len[i][m] represents the length of the i-th object in the m-th row of the target frame.
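A sketch of deriving this per-row run-length form from a binary mask, assuming (as the scheme implies) that the object occupies a single contiguous run in each row; the function name is illustrative:

```python
# Convert a binary mask to per-row (mask_pos, mask_len) run-length form.
# Assumes each row of the object is one contiguous run of 1s.

def mask_to_runs(mask):
    mask_pos, mask_len = [], []
    for row in mask:
        try:
            start = row.index(1)
            length = sum(row)          # contiguous run assumed
        except ValueError:
            start, length = 0, 0       # object absent from this row
        mask_pos.append(start)
        mask_len.append(length)
    return mask_pos, mask_len
```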
  • FIG. 2 is a schematic diagram of a target object in an image 200 according to an embodiment of the present application.
  • the image 200 includes a target object 1 and a target object 2.
  • the image region 1 corresponding to the target object 1 is a rectangular region; the image region 2 corresponding to the target object 2 is also a rectangular region.
  • pixels with a value of 1 belong to the target object 1, and pixels with a value of 0 do not belong to the target object 1.
  • pixels with a value of 1 belong to the target object 2
  • pixels with a value of 0 do not belong to the target object 2.
  • the attribute of the at least one pixel includes a part of the target object to which the at least one pixel belongs.
  • different pixels are assigned different values to indicate that different pixels belong to different parts of the target object.
  • the target object is a person; a first part of the at least one pixel is assigned a third value to indicate that the first part of pixels belongs to the head of the target object; and/or a second part of the at least one pixel is assigned a fourth value to indicate that the second part of pixels belongs to the hand of the target object.
  • the at least one pixel may further include a third partial pixel, which is used to indicate that the third partial pixel does not belong to the target object but belongs to the background.
  • the third portion of pixels is assigned 0 to indicate that the third portion of pixels does not belong to the target object, but belongs to the background;
  • the first portion of pixels is assigned 1 to indicate that the first portion of pixels belongs to the head of the target object;
  • the second part of pixels is assigned 2 to indicate that the second part of pixels belongs to the hand of the target object.
  • the target object is a car; a first part of the at least one pixel is assigned a fifth value to indicate that the first part of pixels belongs to the front of the target object; and/or a second part of the at least one pixel is assigned a sixth value to indicate that the second part of pixels belongs to the rear of the target object.
  • the at least one pixel may further include a third partial pixel, which is used to indicate that the third partial pixel does not belong to the target object but belongs to the background.
  • the third part of pixels is assigned 0 to indicate that the third part of pixels does not belong to the target object but belongs to the background; the first part of pixels is assigned 1 to indicate that the first part of pixels belongs to the front of the target object; the second part of pixels is assigned 2 to indicate that the second part of pixels belongs to the rear of the target object.
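The multi-valued template can be sketched as below (the geometry and the helper function are illustrative; the value assignment 0 = background, 1 = front, 2 = rear follows the example above):

```python
# A multi-valued template: 0 marks background, other values mark different
# parts of the target object (here a car: 1 = front, 2 = rear).

BACKGROUND, FRONT, REAR = 0, 1, 2

def count_parts(mask):
    """Count pixels per part label in a multi-valued mask."""
    counts = {}
    for row in mask:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    return counts

mask = [[0, 1, 1, 0],
        [0, 1, 2, 2],
        [0, 0, 2, 0]]
counts = count_parts(mask)
```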
  • the attribute of the at least one pixel includes a description feature corresponding to the at least one pixel.
  • the descriptive feature may be point cloud data.
  • the description feature corresponding to the at least one pixel may include at least one of the following: the reflection intensity of the point cloud corresponding to the at least one pixel, the infrared intensity corresponding to the at least one pixel, and the depth value corresponding to the at least one pixel.
  • depth is a measure of distance, such as the distance to the lens.
  • the above specifically describes the identification information used by the encoding device to identify the target object.
  • the following will give a specific identification scheme that can effectively identify the target object, can effectively improve the efficiency of the identification, and reduce data storage and transmission.
  • the core idea of this specific identification scheme is to identify one or more target objects that have changed relative to the encoded image by comparing the current image with the encoded image. Among them, the identification object in the current image and the identification object in the encoded image can be compared one by one.
  • the target object may be at least one of the following: an identification object newly added to the current image relative to the encoded image; an identification object whose position or size in the current image changes relative to the encoded image; and an identification object whose pixel information in the image area changes relative to the encoded image.
  • the encoding method 100 may further include at least one of the following steps: determining an object to be identified that is newly added in the current image relative to the encoded image as the target object; determining an object to be identified whose position or size in the current image changes relative to the encoded image as the target object; and determining an object to be identified whose pixel information in the image area changes relative to the encoded image as the target object.
  • the identification information of the stream data further includes a category identification bit for indicating at least one of the following situations: the target object is an identification object newly added to the current image relative to the encoded image; the target object is an identification object whose position in the current image changes relative to the encoded image; the target object is an identification object whose size in the current image changes relative to the encoded image; or the target object is an identification object whose pixel information in the image area changes in the current image relative to the encoded image. The change of the identification object is thus signalled through the category identification bit, which indicates, for example, whether the identification object is newly added, or whether its position, size, or pixel information has changed.
  • the identification object whose position of the current image changes relative to the encoded image may refer to a change in the position of the identification object itself, or a change in the position of the image region where the identification object is located.
  • the identification object whose size of the current image changes relative to the encoded image may refer to a change in the size of the identification object itself, or may refer to a change in the size of an image area where the identification object is located.
• the target object may include a newly added identification object of the current image relative to the encoded image; in this case, both the image area information and the pixel information of the target object should be encoded (marked).
• the image area information may include the absolute value of the position and the absolute value of the size of the image area where the newly added identification object is located.
• the target object may include an identification object whose position in the current image changes relative to the encoded image; then the image area information of the target object (that is, the identification object whose position is changed) includes the absolute value of the position of the image area where the target object is located, or the relative value of the position change. The absolute value of the position refers to the position, in the current image, of the image area where the target object is located; the relative value of the position change refers to the difference between the position of the image area where the target object is located in the encoded image and its position in the current image.
• for such a target object (that is, an identification object whose position changes), the size of the image area where it is located in the current image may change or remain the same compared to the encoded image.
  • the image area information of the target object includes an absolute value of a size of the image area where the target object is located or a relative value of the size change.
• the absolute value of the size refers to the size, in the current image, of the image area where the target object is located; the relative value of the size change refers to the difference between the size of the image area where the target object is located in the encoded image and its size in the current image.
• alternatively, the image area information of the target object includes an identification bit for indicating that the size of the image area where the target object is located remains unchanged relative to the encoded image; in this case, the size of the image area is not encoded in the image area information of the target object in the code stream data.
• for such a target object (that is, an identification object whose position changes), the pixel information of the target object includes the absolute value of an attribute of at least one pixel of the image region where the target object is located, or the relative value of the attribute change of the at least one pixel.
• the absolute value of the attribute refers to the attribute of at least one pixel of the image region where the target object is located in the current image; the attribute of at least one pixel may refer to the absolute values of the attributes of all pixels in the image region, or to the absolute values of the attributes of only those pixels in the image region whose attributes have changed.
• the relative value of the attribute change refers to the difference between the value assigned to a pixel of the image area where the target object is located in the current image and the value assigned to the corresponding pixel of the image area where the target object is located in the encoded image.
  • the relative value may be a difference value corresponding to all pixels in the image area, or may be a difference value corresponding to some pixels in the image area where attributes are changed.
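The relative values described above can be sketched as per-pixel differences, keeping only the pixels whose attribute actually changed (a zero difference is omitted). The helper below is illustrative, not the patent's normative encoding; masks are plain nested lists of attribute values.

```python
# Illustrative sketch: the relative value of an attribute change is the
# per-pixel difference between the current image and the encoded image;
# unchanged pixels (difference 0) are simply not signalled.
def attribute_deltas(prev_mask, curr_mask):
    """Return {(row, col): delta} for pixels whose attribute changed."""
    deltas = {}
    for r, (prev_row, curr_row) in enumerate(zip(prev_mask, curr_mask)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if q != p:
                deltas[(r, c)] = q - p
    return deltas

prev = [[0, 1], [1, 1]]
curr = [[0, 1], [0, 1]]
print(attribute_deltas(prev, curr))  # {(1, 0): -1}
```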
• alternatively, the image area information of the target object includes an identification bit, which is used to indicate that the pixel information of the image area where the target object is located remains unchanged relative to the encoded image; in this case, the pixel information of the target object is not encoded in the bitstream data.
• the target object may include an identification object whose size in the current image changes relative to the encoded image; then the image area information of the target object (that is, the identification object whose size is changed) includes the absolute value of the size of the image area where the target object is located, or the relative value of the size change.
• for such a target object (that is, an identification object whose size changes), the image area information of the target object includes the absolute value of the position of the image area where the target object is located, or the relative value of the position change.
• alternatively, the image area information of the target object includes an identification bit for indicating that the position of the image area where the target object is located remains unchanged relative to the encoded image; in this case, the position of the image region is not encoded in the image region information of the target object in the code stream data.
• for such a target object (that is, an identification object whose size changes), the pixel information of the target object includes the absolute value of an attribute of at least one pixel of the image region where the target object is located, or the relative value of the attribute change of the at least one pixel.
• alternatively, the image area information of the target object includes an identification bit, which is used to indicate that the pixel information of the image area where the target object is located remains unchanged relative to the encoded image; in this case, the pixel information of the target object is not encoded in the bitstream data.
• the target object may include an identification object whose pixel information in the image area changes in the current image relative to the encoded image. Then, the pixel information of the target object (that is, the identification object whose pixel information is changed) in the image area where it is located in the current image includes the absolute value of an attribute of the at least one pixel of the image region where the target object is located, or the relative value of the attribute change.
• for such a target object (that is, an identification object whose pixel information changes), the position of the image area where it is located in the current image may change or remain the same compared to the encoded image.
  • the image area information of the target object includes an absolute value of a position of the image area where the target object is located or a relative value of a position change.
• alternatively, the image area information of the target object includes an identification bit for indicating that the position of the image area where the target object is located remains unchanged relative to the encoded image; in this case, the position of the image region is not encoded in the image region information of the target object in the code stream data.
• for such a target object (that is, an identification object whose pixel information changes), the size of the image area where it is located in the current image may change or remain the same compared to the encoded image.
  • the image area information of the target object includes an absolute value of a size of the image area where the target object is located or a relative value of the size change.
• alternatively, the image area information of the target object includes an identification bit for indicating that the size of the image area where the target object is located remains unchanged relative to the encoded image; in this case, the size of the image area is not encoded in the image area information of the target object in the code stream data.
• the image area information may include an identification bit, which is used to indicate that both the size and the position of the image area where the target object is located remain unchanged compared to the encoded image.
• the identification bit may be a single identification bit indicating that both the size and the position are unchanged; alternatively, it may include two sub-identification bits, indicating respectively that the size is unchanged and that the position is unchanged.
  • Each of the identification information may include both image area information and pixel information.
  • the specific content of the identification information can be as follows.
  • ar_object_mask_present_flag indicates whether mask information of an object needs to be identified in the current image
  • ar_num_objects_minus1 indicates the number of objects that need to be identified in the current image
  • ar_object_idx [i] represents the label of the i-th object in the current image that needs to be identified
• ar_bounding_box_mask_present_flag [ar_object_idx [i]] indicates whether the object labeled ar_object_idx [i] has a mask identifying its shape
• ar_bounding_box_mask_infer_flag [ar_object_idx [i]] indicates, when the object labeled ar_object_idx [i] contains mask information, whether the mask value is inferred from the mask of the object labeled ar_object_idx [i] in the previously encoded image
• ar_new_object_flag [ar_object_idx [i]] indicates whether the object labeled ar_object_idx [i] in the current image is a newly added object
  • mask [m] [n] represents the template value corresponding to pixels with coordinates offset m and n in the vertical and horizontal directions relative to the upper left corner of the rectangular area.
• when the pixel belongs to the target object, the value of mask [m] [n] is 1; otherwise, when the pixel belongs to the background, the value of mask [m] [n] is 0.
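The mask semantics above can be sketched as follows; the predicate `belongs_to_object` is hypothetical, standing in for whatever segmentation result the encoder has, and the helper is illustrative rather than normative bitstream syntax.

```python
# Sketch of deriving mask[m][n]: for each pixel of the rectangular area,
# the template value is 1 when the pixel belongs to the target object and
# 0 when it belongs to the background. m and n are the vertical and
# horizontal offsets relative to the upper left corner of the area.
def build_mask(height, width, belongs_to_object):
    return [[1 if belongs_to_object(m, n) else 0
             for n in range(width)]
            for m in range(height)]

# Toy example: a 4x4 area whose object occupies the top-left 2x2 corner.
mask = build_mask(4, 4, lambda m, n: m < 2 and n < 2)
print(mask[0][0], mask[3][3])  # 1 0
```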
• a point-by-point identification method may be adopted; alternatively, within the target frame identified by ar_object_top [ar_object_idx [i]], ar_object_left [ar_object_idx [i]], ar_object_width [ar_object_idx [i]], and ar_object_height [ar_object_idx [i]], the starting position of the target object in each row and the length of the target object in that row may be identified.
  • the specific method is as follows:
• mask_pos [ar_object_idx [i]] [m] represents the starting position of the m-th row of the ar_object_idx [i]-th object in the target frame
• mask_len [ar_object_idx [i]] [m] represents the length of the m-th row of the ar_object_idx [i]-th object in the target frame.
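The per-row alternative above can be sketched by converting each row of a binary mask into a (start, length) pair; the helper name is illustrative and, for simplicity, it assumes at most one contiguous run of object pixels per row.

```python
# Sketch of the per-row identification: each row m of the target frame
# stores where the object starts (mask_pos) and how long it runs
# (mask_len), instead of one template value per pixel. Assumes at most
# one contiguous run of 1s per row.
def rows_to_pos_len(mask):
    mask_pos, mask_len = [], []
    for row in mask:
        if 1 in row:
            start = row.index(1)
            length = sum(row)          # contiguous run assumed
        else:
            start, length = 0, 0       # no object pixels in this row
        mask_pos.append(start)
        mask_len.append(length)
    return mask_pos, mask_len

pos, length = rows_to_pos_len([[0, 1, 1, 0],
                               [1, 1, 1, 1],
                               [0, 0, 0, 0]])
print(pos, length)  # [1, 0, 0] [2, 4, 0]
```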
  • ar_new_object_flag and the like can be considered as the category identification bits mentioned above.
• ar_object_idx [i] is the label of the target object, which may also be called the indication bit, number, or index of the target object, and is used to indicate which target object it is.
  • the code stream data and / or the identification information may further include an indication bit of an encoded image, which is used to indicate which encoded image is currently referenced.
  • the indication bit may be the number of the encoded image, or the number of frames from the current image in the encoding order.
• alternatively, the code stream data and the identification information may not include the indication bit of the encoded image; instead, the previous frame image, or the image N frames earlier, specified or defaulted by the protocol, is used as the reference encoded image.
  • the encoded image may be determined by the following method.
• the label of one or more target objects in the current image is used as a search condition, and, among the multiple images that have been encoded, the image whose target object is closest to that of the current image is searched for as the encoded image used as a reference.
• alternatively, the encoded image may be determined by the following method: at least one of the position, size, and pixel information of the target object in the current image is used as a search condition, and the encoded image whose value of the at least one parameter is closest is searched for as the reference.
  • the target object may be one or more.
• the search can be based on at least one parameter value of the same target object in the current image; that is, when the same target object is present and its position and/or size and/or pixel information are closest, it is considered that an encoded image has been found for reference.
• alternatively, the search may not be based on the same target object in the current image, but only on at least one of the position, size, and pixel information; that is, regardless of whether the target objects are the same, when the position and/or size and/or pixel information are closest, it is considered that an encoded image has been found for reference.
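The label-based search above can be sketched as follows; the record layout (`label`, `pos`, `size`, `objects`) and the squared-distance metric are hypothetical, chosen only to illustrate picking, among already-encoded images, the one whose stored parameters for the same target object are closest to those in the current image.

```python
# Illustrative reference-image search: among encoded images that contain
# an object with the same label, pick the one whose position and size
# are closest to the target object in the current image.
def closest_reference(current, encoded_images):
    """current and object entries are dicts like
    {'label': 1, 'pos': (x, y), 'size': (w, h)}."""
    def distance(a, b):
        dp = sum((p - q) ** 2 for p, q in zip(a['pos'], b['pos']))
        ds = sum((p - q) ** 2 for p, q in zip(a['size'], b['size']))
        return dp + ds

    candidates = [img for img in encoded_images
                  if any(o['label'] == current['label']
                         for o in img['objects'])]
    if not candidates:
        return None

    def img_distance(img):
        obj = next(o for o in img['objects']
                   if o['label'] == current['label'])
        return distance(current, obj)

    return min(candidates, key=img_distance)

current = {'label': 1, 'pos': (10, 10), 'size': (4, 4)}
imgs = [
    {'id': 0, 'objects': [{'label': 1, 'pos': (0, 0), 'size': (4, 4)}]},
    {'id': 1, 'objects': [{'label': 1, 'pos': (9, 10), 'size': (4, 4)}]},
]
print(closest_reference(current, imgs)['id'])  # 1
```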
• for example, a drone controls the camera device through a gimbal (PTZ), so that a target object, such as a person, remains at the center of the picture or at a specific position on the picture.
  • the center of the image area where the target object is located is always maintained at the center of the picture or a specific position of the picture.
• the image area can be a rectangular area, and the image area information can include the coordinates of the center point of the rectangular area, the height information of the rectangular area, and the width information of the rectangular area.
• in this case, the code stream data may not encode the specific value of the center point coordinates of the image area in the image area information; instead, an identification bit is used to indicate that the value is unchanged.
  • the image area is a rectangular area
  • the image area information includes the coordinates of the center point of the rectangular area, the height information of the rectangular area, and the width information of the rectangular area.
  • the image area information may include an identification bit, which is used to indicate that the coordinates of the center point of the image area where the target object is located remain unchanged compared to the encoded image.
  • the identification information may also be used to identify a removed object of the current image relative to the encoded image. It should be understood that each identification object in the embodiments of the present application may have a unique label or index. In addition, the label or index of the same identified object in different images may be the same. In some possible implementation manners, the identification information includes label information of the removed object or position information of the removed object. In one example, the specific identification scheme of the removed object may be as follows.
• ar_num_cancel_objects represents the number of objects that no longer exist in the current image relative to the encoded image
  • ar_cancel_object_idx [i] represents the labels of the objects that no longer exist.
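The removed-object signalling above can be sketched by diffing the label sets of two images; the dictionary layout is illustrative (the keys merely reuse the syntax element names), not the patent's normative bitstream structure.

```python
# Illustrative sketch of signalling removed objects: the count of labels
# that no longer exist in the current image, plus the labels themselves.
def removed_objects(prev_labels, curr_labels):
    cancelled = sorted(set(prev_labels) - set(curr_labels))
    return {"ar_num_cancel_objects": len(cancelled),
            "ar_cancel_object_idx": cancelled}

print(removed_objects({1, 2, 3}, {1, 3}))
# {'ar_num_cancel_objects': 1, 'ar_cancel_object_idx': [2]}
```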
  • the target objects mentioned above can be people, cars and public facilities.
  • the identification information may further include content information, and the content information is used to indicate the content of the target object.
  • the content information may be label information.
  • the label can use natural language to directly indicate the content of the target object.
  • the natural language can use the Internet Engineering Task Force (IETF) Request For Comments (RFC) 5646 standard, that is, the IETF RFC 5646 standard.
• alternatively, the content information may be a numerical value; that is, a one-dimensional value can be added, and different values can be used to indicate what kind of content the target object is. For example, a content information value of 1 indicates that the content of the target object is a person, and a content information value of 2 indicates that the content of the target object is a car.
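The numeric content information above amounts to a lookup table shared between encoder and decoder; the sketch below shows only the two example values from the text, and the table itself is illustrative.

```python
# Illustrative mapping for numeric content information: the meaning of
# each value is fixed by a table known to both encoder and decoder.
CONTENT_LABELS = {1: "person", 2: "car"}

def content_of(value):
    return CONTENT_LABELS.get(value, "unknown")

print(content_of(1), content_of(2))  # person car
```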
  • the code stream data may further include image content data of the current image.
  • the image content data of the current image includes reference frame data of the current image and residual data between the current image and the reference frame.
  • FIG. 3 is a schematic flowchart of a decoding method 300 according to an embodiment provided in the present application.
  • the decoding method 300 is performed by a decoding device.
• the decoding method 300 includes: S310, obtaining code stream data of a current image, where the code stream data includes identification information, the identification information is used to identify at least one target object in the current image, and the identification information includes image area information and pixel information; the image area information includes the position and size of the image area where the target object is located, and the pixel information includes attributes of at least one pixel in the image area; and S320, decoding at least part of the bitstream data.
  • the position and size of the image area where the target object is located are indicated by the image area information, and the attributes of multiple pixels in the image area are indicated by the pixel information, thereby identifying the target object with finer granularity, It is beneficial for the decoding device to perform operations on the target object more efficiently and accurately.
• the bitstream data of the current image obtained in step S310 may be the same as the bitstream data in the encoding method provided in the present application; for details of the bitstream data in step S310, refer to the explanation of the code stream data in the above encoding method.
  • the attribute of the at least one pixel may include whether the at least one pixel belongs to the target object.
  • the image area may include multiple sub-image areas, and the pixel information may include a value assigned to at least one pixel in the image area; wherein pixels in different sub-image areas are assigned different values .
• at least one pixel may be assigned a different value in the pixel information; in this case, S320, decoding at least part of the bitstream data, may include: determining, according to the pixel information in the bitstream data, whether at least one pixel in the image area belongs to the target object.
• the first part of the at least one pixel may be assigned a first value; then, determining whether at least one pixel in the image area belongs to the target object according to the pixel information in the bitstream data may include: when the first portion of pixels corresponds to the first value in the pixel information in the bitstream data, determining that the first portion of pixels does not belong to the target object. For example, if the first part of pixels corresponds to 0 in the pixel information, the first part of pixels does not belong to the target object.
• the second part of the at least one pixel may be assigned a second value; then, determining whether at least one pixel in the image area belongs to the target object may include: when the second portion of pixels corresponds to the second value in the pixel information in the bitstream data, determining that the second portion of pixels belongs to the target object. For example, if the second part of pixels corresponds to 1 in the pixel information, the second part of pixels belongs to the target object.
  • the attribute of the at least one pixel may include a part of the target object to which the at least one pixel belongs.
• in this case, S320, decoding at least part of the bitstream data, may include: determining, according to the pixel information in the bitstream data, the part of the target object to which at least one pixel in the image area belongs.
• the target object may be a person; the first part of the at least one pixel may be assigned a third value, and determining, according to the pixel information in the code stream data, the part of the target object to which at least one pixel in the image area belongs may include: when the first part of pixels corresponds to the third value in the pixel information in the bitstream data, determining that the first part of pixels belongs to the head of the target object; and/or the second part of the at least one pixel may be assigned a fourth value, and the determination may include: when the second part of pixels corresponds to the fourth value in the pixel information in the code stream data, determining that the second part of pixels belongs to the hand of the target object.
• the target object may be a car; the first part of the at least one pixel may be assigned a fifth value, and determining, according to the pixel information in the code stream data, the part of the target object to which at least one pixel in the image area belongs may include: when the first part of pixels corresponds to the fifth value in the pixel information in the bitstream data, determining that the first part of pixels belongs to the front of the target object; and/or the second part of the at least one pixel may be assigned a sixth value, and the determination may include: when the second part of pixels corresponds to the sixth value in the pixel information in the code stream data, determining that the second part of pixels belongs to the rear of the target object.
  • the attribute of the at least one pixel may include a description feature corresponding to the at least one pixel.
  • the description feature corresponding to at least one pixel may include at least one of the following: the reflection intensity of a point cloud corresponding to the at least one pixel, the infrared intensity corresponding to the at least one pixel, and the depth value corresponding to the at least one pixel .
• the attributes may be measured in units of pixel blocks; the pixel information may include attribute information of at least one pixel block, and a pixel block may include at least two pixels.
  • the code stream data may further include a category identification bit.
• the decoding method 300 may further include: determining, according to the category identification bit, that the target object is at least one of the following: a newly added identification object of the current image relative to the decoded image; an identification object whose position in the current image changes relative to the decoded image; an identification object whose size changes; and an identification object whose pixel information changes.
  • the target object may include a newly added identification object of the current image relative to the decoded image.
  • the image area information may include the absolute value of the position and the size of the image area where the target object is located.
• S320, decoding at least part of the bitstream data, may include: determining, according to the image region information in the bitstream data, the position and size of the image region where the target object (that is, the newly added identification object) is located.
• the target object may include an identification object whose position in the current image changes relative to the decoded image; then the image region information of the target object (that is, the identification object whose position is changed) includes the absolute value of the position of the image region where the target object is located, or the relative value of the position change. The absolute value of the position refers to the position, in the current image, of the image area where the target object is located; the relative value of the position change refers to the difference between the position of the image area where the target object is located in the decoded image and its position in the current image.
• S320, decoding at least part of the bitstream data, may include: determining the position of the image region where the target object is located in the current image according to the position of the image region where the target object is located in the decoded image and the relative value of the position change of the image region.
• specifically, the decoding device may determine the position of the image region where the target object is located in the decoded image; then, based on that position and the difference between the position of the image region where the target object is located in the decoded image and its position in the current image, the decoding device determines the position of the image region where the target object is located in the current image.
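The reconstruction step above can be sketched as adding the signalled difference to the position from the decoded (reference) image; the tuple layout is illustrative.

```python
# Decoder-side sketch: reconstruct the absolute position of the target
# object's image area from the position in the decoded image plus the
# signalled relative value (the per-coordinate difference).
def apply_position_delta(decoded_pos, delta):
    return tuple(p + d for p, d in zip(decoded_pos, delta))

# Area was at (100, 60) in the decoded image; the stream signals (+5, -2).
print(apply_position_delta((100, 60), (5, -2)))  # (105, 58)
```

The same arithmetic applies to the size of the image area when a relative size change is signalled.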
• for such a target object (that is, an identification object whose position changes), the size of the image region where it is located in the current image may change or remain the same compared to the decoded image.
  • the image area information of the target object includes an absolute value of a size of the image area where the target object is located or a relative value of the size change.
• the absolute value of the size refers to the size, in the current image, of the image area where the target object is located; the relative value of the size change refers to the difference between the size of the image area where the target object is located in the decoded image and its size in the current image.
• S320, decoding at least part of the bitstream data, may include: determining the size of the image region where the target object is located in the current image according to the size of the image region where the target object is located in the decoded image and the relative value of the size change of the image region.
• specifically, the decoding device may determine the size of the image area where the target object is located in the decoded image; then, based on that size and the difference between the size of the image area where the target object is located in the decoded image and its size in the current image, the decoding device determines the size of the image area where the target object is located in the current image.
• alternatively, the image area information of the target object includes an identification bit for indicating that the size of the image area where the target object is located remains unchanged relative to the decoded image; in this case, the size of the image area is not encoded in the image area information of the target object in the code stream data.
• S320, decoding at least part of the bitstream data, may further include: determining the size of the image area where the target object is located in the current image based on the size of the image area where the target object is located in the decoded image; that is, the size of the image area where the target object is located in the decoded image is used as the size of the image area where it is located in the current image.
• for such a target object (that is, an identification object whose position changes), the pixel information of the target object includes the absolute value of an attribute of at least one pixel of the image region where the target object is located, or the relative value of the attribute change of the at least one pixel.
  • the absolute value of the attribute refers to the attribute of at least one pixel of the image region where the target object is in the current image; the attribute of at least one pixel may refer to the absolute value of the attributes of all pixels in the image region, or it may be Refers to the absolute value of the attributes of some pixels in the image region whose attributes have changed.
• the relative value of the attribute change refers to the difference between the value assigned to a pixel of the image area where the target object is located in the current image and the value assigned to the corresponding pixel of the image area where the target object is located in the decoded image.
  • the relative value may be a difference value corresponding to all pixels in the image area, or a difference value corresponding to some pixels in the image area where attributes are changed, that is, when the difference value is 0, the difference value may be omitted.
• S320, decoding at least part of the bitstream data, may further include: determining the pixel information of the target object in the current image according to the pixel information of the target object in the decoded image and the relative value of the attribute change of the at least one pixel.
  • the decoding device may determine the attributes of at least one pixel of the image region where the target object is in the decoded image; according to the attributes of at least one pixel of the image region where the target object is in the decoded image, and where the target object is in the decoded image A difference between the attribute of at least one pixel of the image region and the attribute of at least one pixel of the image region where the target object is in the current image determines the attribute of at least one pixel of the image region where the target object is in the current image.
  • the decoding device may consider that the attributes of the remaining pixels have not changed.
• alternatively, the image area information of the target object includes an identification bit, which is used to indicate that the pixel information of the image area where the target object is located remains unchanged relative to the decoded image; in this case, the pixel information of the target object is not encoded in the code stream data.
  • S320 decodes at least part of the bitstream data, and may further include: determining pixel information of the target object in the image region where the current image is located according to the pixel information of the image region where the target object is in the decoded image.
• the target object may include an identification object whose size in the current image changes relative to the decoded image; then the image region information of the target object (that is, the identification object whose size has changed) includes the absolute value of the size of the image region where the target object is located, or the relative value of the size change.
• S320, decoding at least part of the bitstream data, may include: determining the size of the image region where the target object is located in the current image according to the size of the image region where the target object is located in the decoded image and the relative value of the size change of the image region.
• for such a target object (that is, an identification object whose size changes), the image area information of the target object includes the absolute value of the position of the image area where the target object is located, or the relative value of the position change.
• S320, decoding at least part of the bitstream data, may further include: determining the position of the image region where the target object is located in the current image according to the position of the image region where the target object is located in the decoded image and the relative value of the position change of the image region.
• alternatively, the image area information of the target object includes an identification bit for indicating that the position of the image area where the target object is located remains unchanged relative to the decoded image; in this case, the position of the image region is not encoded in the image region information of the target object in the code stream data.
• S320, decoding at least part of the bitstream data, may further include: determining the position of the image region where the target object is located in the current image based on the position of the image region where the target object is located in the decoded image; that is, the position of the image region where the target object is located in the decoded image is used as the position of the image region where it is located in the current image.
• for such a target object (that is, an identification object whose size changes), the pixel information of the target object includes the absolute value of an attribute of at least one pixel of the image region where the target object is located, or the relative value of the attribute change of the at least one pixel.
• when the pixel information includes the relative value of the attribute change of at least one pixel of the image area where the target object is located, S320, decoding at least part of the bitstream data, may further include: determining the pixel information of the target object in the current image according to the pixel information of the target object in the decoded image and the relative value of the attribute change of the at least one pixel.
• alternatively, the image area information of the target object includes an identification bit, which is used to indicate that the pixel information of the image area where the target object is located remains unchanged relative to the decoded image; in this case, the pixel information of the target object is not encoded in the bitstream data.
  • S320 decodes at least part of the bitstream data, and may further include: determining the pixel information of the image region where the target object is located in the current image according to the pixel information of the image region where the target object is located in the decoded image.
  • the target object may include an identification object whose pixel information in the image area of the current image changes relative to the decoded image. Then, the pixel information of the image area where the target object (that is, the identification object whose pixel information changes) is located in the current image includes an absolute value of an attribute of the at least one pixel of the image area or a relative value of the attribute change.
  • S320 decoding at least part of the bitstream data may include: determining the pixel information of the target object in the current image according to the pixel information of the target object in the decoded image and the relative value of the attribute change of the at least one pixel.
  • the image area information of the target object includes an absolute value of a position of the image area where the target object is located or a relative value of a position change.
  • S320 decodes at least part of the bitstream data, and may further include: determining the position of the image region where the target object is located in the current image according to the position of the image region where the target object is located in the decoded image and the relative value of the position change of that image region.
  • the image area information of the target object includes identification bits for indicating that the position of the image area where the target object is located remains unchanged in the decoded image.
  • the position of the image region is not encoded in the image region information of the target object in the code stream data.
  • S320 decodes at least part of the bitstream data, and may further include: determining the position of the image region where the target object is located in the current image based on the position of the image region where the target object is located in the decoded image. That is, the position of the image region where the target object is located in the decoded image is taken as the position of the image region where it is located in the current image.
  • a target object, that is, an identification object whose pixel information changes
  • the size of the image area changes or remains the same.
  • the image area information of the target object includes an absolute value of a size of the image area where the target object is located or a relative value of the size change.
  • the image area information includes the relative value of the size change of the image area where the target object is located
  • S320 decodes at least part of the bitstream data, which may include: determining the size of the image area where the target object is located in the current image according to the size of the image area where the target object is located in the decoded image and the relative value of the size change of the image area.
  • the image area information of the target object includes identification bits for indicating that the size of the image area where the target object is located remains unchanged in the decoded image.
  • the size of the image area is not encoded in the image area information of the target object in the code stream data.
  • S320 decodes at least part of the bitstream data, and may further include: determining the size of the image area where the target object is located in the current image based on the size of the image area where the target object is located in the decoded image.
  • the image area may be a rectangular area.
  • the image region information may include coordinates of a center point of the rectangular region, height information of the rectangular region, and width information of the rectangular region.
  • the code stream data may not include the numerical value of the coordinates of the center point of the image area in the image area information, but may instead use an identification bit to indicate that its content remains unchanged.
  • the image area information may further include an identification bit, which is used to indicate that the coordinates of the center point of the image area where the target object is located remain unchanged.
  • S320 performing decoding processing on at least a part of the bitstream data may include: determining the coordinates of the center point of the image area where the target object is located in the current image according to the coordinates of the center point of the image area where the target object is located in the decoded image.
  • the decoding device can determine the coordinates of the center point of the image area where the identification object is located based on the coordinates of the center point of that image area in the decoded image (the center point remaining unchanged); determine the height information and width information of the image area according to the image area information of the current image; and determine the image area where the identification object is located according to the coordinates of the center point of the image area together with the height information and width information of the image area.
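The reconstruction described above can be sketched as follows; the function and parameter names are illustrative assumptions, not part of the bitstream syntax:

```python
# Hypothetical sketch: reconstruct a rectangular image area from an
# inherited center point and the current height/width information.

def reconstruct_area(center_x, center_y, width, height):
    """Return (left, top, right, bottom) of the rectangle whose center
    point and size are given; integer division aligns to the pixel grid."""
    left = center_x - width // 2
    top = center_y - height // 2
    return (left, top, left + width, top + height)

# The center point is inherited from the decoded image (flagged as
# unchanged), while height/width come from the current image's area info.
prev_center = (64, 48)
area = reconstruct_area(*prev_center, 32, 20)
print(area)  # (48, 38, 80, 58)
```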
  • the identification information may also be used to identify a removed object of the current image relative to the decoded image.
  • the identification information may include label information of the removed object or position information of the removed object.
  • the code stream data may further include image content data of the current image.
  • S320 performing decoding processing on at least a part of the code stream data may include: performing decoding processing on image content data of the current image in the code stream data.
  • the image content data of the current image includes reference frame data of the current image and residual data between the current image and the reference frame.
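As a minimal sketch of this reference-plus-residual structure (assuming 8-bit samples, and ignoring motion compensation and entropy coding, which the document does not detail here), reconstruction might look like:

```python
import numpy as np

# Hedged sketch: recover the current image by adding the residual to the
# reference frame and clipping to the valid 8-bit sample range.

def reconstruct(reference: np.ndarray, residual: np.ndarray) -> np.ndarray:
    # Widen to int16 first so the sum cannot wrap around before clipping.
    return np.clip(reference.astype(np.int16) + residual, 0, 255).astype(np.uint8)

ref = np.array([[100, 200], [50, 250]], dtype=np.uint8)   # reference frame
res = np.array([[10, -20], [-60, 10]], dtype=np.int16)    # residual data
print(reconstruct(ref, res))
```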
  • S320 decoding at least part of the code stream data may include: decoding the identification information in the code stream data to obtain the decoded identification information of the current image.
  • S320 decoding the at least part of the code stream data may include: discarding the identification information and not decoding the identification information.
  • the identification information may further include content information.
  • S320 performing decoding processing on at least a part of the bitstream data may include: determining content of the target object according to content information in the bitstream data.
  • the content information may be label information.
  • the content information may be a numerical value.
  • the image area may be a rectangular area.
  • the image region information may include coordinates of any corner of the rectangular region, height information of the rectangular region, and width information of the rectangular region.
  • the image area information may include the coordinates of the center point of the rectangular area, the height information of the rectangular area, and the width information of the rectangular area.
  • the image region information may include coordinates of the upper left corner of the rectangular region and coordinates of the lower right corner of the rectangular region.
  • the image region information may include coordinates of the upper right corner of the rectangular region and coordinates of the lower left corner of the rectangular region.
  • the identification information may be located in auxiliary enhancement information or extended data of the current image.
  • the pixel information may be represented by a mask.
  • the template value can be identified by the binary values 0 and 1.
  • the template value of the pixels belonging to the target object in the pixel information is 1; the template value of the pixels belonging to the background is 0.
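A tiny illustrative mask under this convention (the shape and values below are made up purely for demonstration):

```python
import numpy as np

# Binary mask ("template") for a 4x6 rectangular image area:
# 1 marks pixels belonging to the target object, 0 marks background.
mask = np.array([
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
], dtype=np.uint8)

# mask[m][n] is the template value of the pixel offset by m rows and
# n columns from the upper-left corner of the rectangular area.
object_pixels = int(mask.sum())  # number of pixels labeled as target
print(object_pixels)  # 12
```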
  • the image area of the target object i is a rectangular area; the image area information of the target object i includes the coordinates of the upper left corner of the rectangular area, the height information of the rectangular area, and the width information of the rectangular area; the pixel information of the target object i is a template (mask)
  • the specific content of the identification information of the target object i can be as follows. Those skilled in the art can understand that this content is only illustrative; other alternative forms or solutions may also be used, which are not enumerated here one by one.
  • ar_object_top [i], ar_object_left [i], ar_object_width [i], and ar_object_height [i] represent the position and size of the target object i
  • ar_object_top [i] and ar_object_left [i] represent the position of the upper left corner of the target object i
  • ar_object_width [i] and ar_object_height [i] represent the width and height of the target object i.
  • mask [m] [n] represents the template value corresponding to pixels with coordinates offset m and n in the vertical and horizontal directions relative to the upper left corner of the rectangular area.
  • a point-by-point identification method may be adopted; alternatively, within the target frame indicated by ar_object_top [i], ar_object_left [i], ar_object_width [i], and ar_object_height [i], the starting position of the target object in each line and the length of the target object in that line may be identified. The specific method is as follows:
  • mask_pos [i] [m] represents the starting position of the i-th object in the m-th row in the target frame
  • mask_len [i] [m] represents the length of the i-th object in the m-th row in the target frame.
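The line-based scheme above can be sketched as follows, assuming the target object occupies at most one contiguous run per row (the helper names are hypothetical; only mask_pos and mask_len correspond to the syntax elements above):

```python
# Hedged sketch of per-line identification: for each row of the target
# frame, store the starting column (mask_pos) and run length (mask_len)
# of the target object instead of one template value per pixel.

def encode_rows(mask):
    pos, length = [], []
    for row in mask:
        try:
            start = row.index(1)                       # first target pixel
            run = len(row) - start - row[::-1].index(1)  # span to last one
        except ValueError:                             # no target pixels
            start, run = 0, 0
        pos.append(start)
        length.append(run)
    return pos, length

def decode_rows(pos, length, width):
    return [[1 if p <= n < p + l else 0 for n in range(width)]
            for p, l in zip(pos, length)]

mask = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
pos, length = encode_rows(mask)
print(pos, length)  # [1, 0, 0] [2, 4, 0]
```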
  • when decoding the related information of the identification object in the current image, reference can be made to the situation of the decoded image.
  • the specific content of the identification information received by the decoding device may be as follows.
  • ar_object_top [ar_object_idx [i]], ar_object_left [ar_object_idx [i]], ar_object_width [ar_object_idx [i]], and ar_object_height [ar_object_idx [i]] indicate the position and size of the object labeled ar_object_idx [i]
  • ar_object_top [ar_object_idx [i]] and ar_object_left [ar_object_idx [i]] indicate the position of the upper left corner of the object labeled ar_object_idx [i]
  • ar_object_width [ar_object_idx [i]] and ar_object_height [ar_object_idx [i]] indicate the width and height of the object labeled ar_object_idx [i].
  • if there is no change, the position, size, and pixel information of the rectangular area corresponding to ar_object_idx [i] are consistent with the position, size, and pixel information of the rectangular area corresponding to the label ar_object_idx [i] in the decoded image; if there is a change, the position, size, and pixel information of the rectangular area corresponding to ar_object_idx [i] are re-decoded.
  • mask [m] [n] represents the template value corresponding to pixels with coordinates offset m and n in the vertical and horizontal directions relative to the upper left corner of the rectangular area.
  • when the value of the decoded mask_value is 1, the value of mask [m] [n] is 1, indicating that the pixel belongs to the object labeled ar_object_idx [i]; when the value of the decoded mask_value is 0, the value of mask [m] [n] is 0, indicating that the pixel belongs to the background.
  • a point-by-point identification method may be adopted; alternatively, within the target frame indicated by ar_object_top [ar_object_idx [i]], ar_object_left [ar_object_idx [i]], ar_object_width [ar_object_idx [i]], and ar_object_height [ar_object_idx [i]], the starting position of the target object in each line and the length of the target object in that line may be identified. The specific method is as follows:
  • mask_pos [ar_object_idx [i]] [m] represents the starting position, in the m-th row of the target frame, of the object labeled ar_object_idx [i]
  • mask_len [ar_object_idx [i]] [m] represents the length, in the m-th row of the target frame, of the object labeled ar_object_idx [i].
  • the bitstream data and / or the identification information may further include an indication bit of a decoded image, which is used to indicate which decoded image is currently referenced.
  • the indication bit may be the number of the decoded image, or the number of frames from the current image in the decoding order.
  • the code stream data and the identification information may not include the indication bit of the decoded image, but may instead use, as the reference decoded image, the previous frame or the previous N-th frame specified or defaulted by the protocol.
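The reference-selection logic might be sketched like this, modeling the indication bit as a backward offset in decoding order (all names here are hypothetical):

```python
from collections import deque

# Recently decoded images, newest last; bounded buffer for illustration.
decoded_images = deque(maxlen=8)

def reference_image(indication=None, default_n=1):
    """Resolve the reference decoded image: use the signaled offset if the
    bitstream carries one, otherwise fall back to the protocol default of
    the previous frame (N = 1)."""
    n = indication if indication is not None else default_n
    return decoded_images[-n]  # n frames back in decoding order

decoded_images.extend(["frame0", "frame1", "frame2"])
print(reference_image())              # frame2 (previous frame by default)
print(reference_image(indication=3))  # frame0 (explicitly signaled)
```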
  • FIG. 4 is a schematic block diagram of an encoding device 400 according to an embodiment of the present application. As shown in FIG. 4, the encoding device 400 includes:
  • At least one memory 410 configured to store computer-executable instructions
  • At least one processor 420 is configured, individually or collectively, to access the at least one memory 410 and execute the computer-executable instructions to perform the following operations:
  • the code stream data includes identification information
  • the identification information is used to identify at least one target object in the current image
  • the identification information includes image area information and pixel information
  • the image region information includes a position and a size of an image region where the target object is located
  • the pixel information includes attributes of at least one pixel in the image region.
  • the encoding device in the embodiment of the present application indicates the position and size of the image area where the target object is located through the image area information, and indicates the attributes of multiple pixels in the image area through the pixel information, thereby identifying the target object at a finer granularity, which is beneficial for the decoding device to perform operations on the target object more efficiently and accurately.
  • the attributes of the at least one pixel include whether the at least one pixel belongs to the target object.
  • the image region includes a plurality of sub-image regions
  • the pixel information includes a value assigned to at least one pixel in the image region; wherein pixels in different sub-image regions are assigned different values.
  • the pixel information assigns different values to the at least one pixel to indicate whether the at least one pixel belongs to the target object.
  • a first partial pixel is assigned a first value to indicate that the first partial pixel does not belong to the target object.
  • a second partial pixel is assigned a second value to indicate that the second partial pixel belongs to the target object.
  • the attributes of the at least one pixel include a part of the target object to which the at least one pixel belongs.
  • different values are assigned to different pixels in the pixel information to indicate that the different pixels belong to different parts of the target object.
  • the target object is a person
  • a first partial pixel of the at least one pixel is assigned a third numerical value to indicate that the first partial pixel belongs to a head of the target object;
  • a second partial pixel of the at least one pixel is given a fourth numerical value, which is used to indicate that the second partial pixel belongs to a hand of the target object.
  • the target object is a car
  • a first partial pixel of the at least one pixel is assigned a fifth value, which is used to indicate that the first partial pixel belongs to a head of the target object;
  • a second part of the at least one pixel is assigned a sixth value for indicating that the second part of the pixel belongs to the rear of the vehicle of the target object.
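One way to read such part-coded pixel values is a simple lookup table; the numeric codes below merely echo the ordinal wording above ("third value", "sixth value", and so on) and are not defined by the document:

```python
# Hypothetical part tables: each pixel value identifies the part of the
# target object the pixel belongs to, with 0 reserved for background.
PERSON_PARTS = {0: "background", 3: "head", 4: "hand"}
CAR_PARTS = {0: "background", 5: "vehicle head", 6: "vehicle rear"}

def part_of(pixel_value, part_table):
    """Map a decoded pixel value to the part name it encodes."""
    return part_table.get(pixel_value, "unknown")

print(part_of(3, PERSON_PARTS))  # head
print(part_of(6, CAR_PARTS))     # vehicle rear
```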
  • the attributes of the at least one pixel include descriptive features corresponding to the at least one pixel.
  • the description feature corresponding to the at least one pixel includes at least one of the following: a reflection intensity of a point cloud corresponding to the at least one pixel, an infrared intensity corresponding to the at least one pixel, and a depth value corresponding to the at least one pixel.
  • the attributes are measured in pixel blocks
  • the pixel information includes information on attributes of at least one pixel block
  • the pixel blocks include at least two pixels.
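A sketch of block-granularity attributes; the 2×2 block size and the any-pixel aggregation rule are assumptions for illustration, since the document does not fix either:

```python
import numpy as np

def block_attributes(mask, block=2):
    """Downsample a per-pixel mask to per-block attributes: a block is
    labeled as target (1) if any of its pixels belongs to the target."""
    h, w = mask.shape
    return mask.reshape(h // block, block, w // block, block).max(axis=(1, 3))

mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
], dtype=np.uint8)
print(block_attributes(mask))  # [[0 1] [1 0]]
```

Signaling one attribute per block instead of per pixel trades mask precision for fewer coded values.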
  • the target object is an object that meets at least one of the following conditions:
  • an identification object whose size in the current image changes relative to the encoded image
  • an identification object whose pixel information in the image area of the current image changes relative to the encoded image.
  • the code stream data further includes a category identification bit for indicating at least one of the following situations:
  • the target object is a newly added identification object of the current image relative to the encoded image
  • the target object is an identification object whose position in the current image changes relative to the encoded image
  • the target object is an identification object whose size in the current image changes relative to the encoded image
  • the target object is an identification object whose pixel information in the image area of the current image changes relative to the encoded image.
  • the target object includes a new identification object of the current image relative to the encoded image
  • the image area information includes an absolute value of the position and an absolute value of the size of the image area where the new identification object is located.
  • the target object includes an identification object whose position in the current image is changed relative to the encoded image;
  • the image region information includes an absolute value of a position of an image region where the target object is located or a relative value of a position change.
  • the image area information includes an identification bit, which is used to indicate that the size of the image area where the target object is located remains unchanged compared to the encoded image.
  • the target object includes an identification object in which the current image is changed in size relative to the encoded image
  • the image area information includes an absolute value of a size of an image area where the target object is located or a relative value of a size change.
  • the pixel information includes an identification bit, which is used to indicate that the pixel information of the image region where the target object is located remains unchanged from the encoded image.
  • the target object includes an identification object whose pixel information in the current image is changed relative to the encoded image
  • the pixel information includes an absolute value of the pixel information or a relative value of the pixel information change.
  • the pixel information includes an identification bit, which is used to indicate that the pixel information of the image area where the target object is located is changed compared to the encoded image.
  • the image region information includes an identification bit, which is used to indicate that the size and / or position of the image region where the target object is located remains unchanged from the encoded image.
  • the image area is a rectangular area
  • the image area information includes coordinates of a center point of the rectangular area, height information of the rectangular area, and width information of the rectangular area;
  • the image region information includes an identification bit, which is used to indicate that the coordinates of the center point of the image region where the target object is located remain unchanged compared to the encoded image.
  • the identification information is further used to identify a removed object of the current image relative to the encoded image.
  • the identification information includes label information of the removed object or position information of the removed object.
  • the processor 420 is further configured to:
  • determine, as the target object, an identification object whose pixel information in the image area changes in the current image relative to the encoded image.
  • the identification information further includes content information, and the content information is used to indicate content of the target object.
  • the content information is label information.
  • the content information is a numerical value.
  • the image area is a rectangular area.
  • the image region information includes coordinates of any corner of the rectangular region, height information of the rectangular region, and width information of the rectangular region;
  • the image area information includes coordinates of a center point of the rectangular area, height information of the rectangular area, and width information of the rectangular area;
  • the image area information includes coordinates of an upper left corner of the rectangular area and coordinates of a lower right corner of the rectangular area;
  • the image region information includes coordinates of an upper right corner of the rectangular region and coordinates of a lower left corner of the rectangular region.
  • the processor 420 may be further configured to: place the identification information in auxiliary enhancement information or extended data of the current image.
  • FIG. 5 is a schematic block diagram of an encoding device 500 according to an embodiment of the present application.
  • the encoding device 500 may include an encoding module 510 for performing encoding processing, generating code stream data, and the like.
  • Each module in the encoding device may be used to execute the methods in the embodiments of the present application, and details are not described herein again.
  • FIG. 6 is a schematic block diagram of a decoding device 600 according to an embodiment of the present application. As shown in Figure 6,
  • At least one memory 610 configured to store computer-executable instructions
  • At least one processor 620 is configured to access the at least one memory 610 and execute the computer-executable instructions to perform the following operations:
  • obtain bitstream data of a current image, the bitstream data including identification information, the identification information being used to identify at least one target object in the current image, the identification information including image area information and pixel information;
  • the image region information includes a position and a size of an image region where the target object is located, and the pixel information includes attributes of at least one pixel in the image region;
  • the decoding device provided in the embodiment of the present application indicates the position and size of the image area where the target object is located through the image area information, and indicates the attributes of multiple pixels in the image area through the pixel information, thereby identifying the target object at a finer granularity, which is beneficial for the decoding device to perform operations on the target object more efficiently and accurately.
  • the attributes of the at least one pixel include whether the at least one pixel belongs to the target object.
  • the image region includes a plurality of sub-image regions
  • the pixel information includes a value assigned to at least one pixel in the image region; wherein pixels in different sub-image regions are assigned different values.
  • the pixel information assigns different values to the at least one pixel to indicate whether the at least one pixel belongs to the target object
  • the processor 620 decodes at least a part of the bitstream data, including: determining, according to the pixel information in the code stream data, whether the at least one pixel in the image region belongs to the target object.
  • the determining, by the processor 620 according to the pixel information in the code stream data, whether the at least one pixel in the image region belongs to the target object may include:
  • when the first partial pixel in the pixel information in the bitstream data corresponds to the first value, it is determined that the first partial pixel does not belong to the target object.
  • the determining, by the processor 620 according to the pixel information in the code stream data, whether the at least one pixel in the image region belongs to the target object may include:
  • when the second partial pixel in the pixel information in the bitstream data corresponds to the second value, it is determined that the second partial pixel belongs to the target object.
  • the attributes of the at least one pixel include a part of the target object to which the at least one pixel belongs.
  • different values are assigned to different pixels in the pixel information to indicate that the different pixels belong to different parts of the target object
  • the processing performed by the processor 620 on at least a part of the code stream data includes: determining, according to the pixel information in the code stream data, the part of the target object to which the at least one pixel in the image region belongs.
  • the target object is a person
  • the determining, by the processor 620 according to the pixel information in the bitstream data, of the part of the target object to which the at least one pixel in the image region belongs includes: when the first partial pixel in the pixel information in the bitstream data corresponds to the third value, determining that the first partial pixel belongs to the head of the target object.
  • the determining, by the processor 620 according to the pixel information in the bitstream data, of the part of the target object to which the at least one pixel in the image region belongs includes: when the second partial pixel in the pixel information in the bitstream data corresponds to the fourth value, determining that the second partial pixel belongs to the hand of the target object.
  • the target object is a car
  • the determining, by the processor 620 according to the pixel information in the bitstream data, of the part of the target object to which the at least one pixel in the image region belongs includes: when the first partial pixel in the pixel information in the bitstream data corresponds to the fifth value, determining that the first partial pixel belongs to the head of the vehicle of the target object.
  • when the second partial pixel in the pixel information in the bitstream data corresponds to the sixth value, determining that the second partial pixel belongs to the rear of the vehicle of the target object.
  • the attributes of the at least one pixel include descriptive characteristics corresponding to the at least one pixel.
  • the description feature corresponding to the at least one pixel includes at least one of the following: a reflection intensity of a point cloud corresponding to the at least one pixel, an infrared intensity corresponding to the at least one pixel, and a depth value corresponding to the at least one pixel.
  • the attributes are measured in units of pixel blocks
  • the pixel information includes information of an attribute of at least one pixel block
  • the pixel block includes at least two pixels.
  • the code stream data includes a category identification bit
  • the processor 620 is further configured to: determine, as the target object, an identification object whose pixel information in the image area of the current image changes relative to the decoded image.
  • the target object includes an identification object newly added in the current image relative to the decoded image
  • the image region information includes an absolute value of a position and an absolute value of a size of an image region where the target object is located.
  • the target object includes an identification object that changes a position of the current image relative to a decoded image
  • the image region information includes an absolute value of a position of an image region where the target object is located
  • the image region information includes a relative value of a position change of an image region where the target object is located, and the processor 620 decodes at least a part of the bitstream data, which may include:
  • the position of the image region where the target object is in the current image is determined according to the position of the image region where the target object is in the decoded image and the relative value of the position change of the image region.
  • the image region information includes an identification bit, which is used to indicate that a size of an image region where the target object is located remains unchanged in the decoded image;
  • the processor 620 may perform decoding processing on at least a part of the code stream data, and further includes:
  • Determining the size of the image area in which the target object is located in the current image according to the size of the image area in which the target object is located in the decoded image.
  • the target object includes an identification object whose size of the current image changes relative to the decoded image
  • the image area information includes an absolute value of a size of the image area
  • the image region information includes a relative value of a size change of the image region, and the processor 620 decodes at least a part of the bitstream data, including:
  • the size of the image region in which the target object is located in the current image is determined according to the size of the image region in which the target object is located in the decoded image and the relative value of the size change of the image region.
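Applying such a relative size value is a simple delta update; a hedged sketch with illustrative names:

```python
def apply_size_delta(prev_size, delta):
    """Size of the image area in the current image = size in the decoded
    (reference) image plus the decoded relative change."""
    w, h = prev_size
    dw, dh = delta
    return (w + dw, h + dh)

prev = (32, 20)                        # size in the decoded image
print(apply_size_delta(prev, (4, -2)))  # (36, 18)
```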
  • the pixel information includes an identification bit for indicating that the pixel information of an image area where the target object is located remains unchanged from the decoded image;
  • the processor 620 decodes at least a part of the code stream data, and further includes:
  • the code stream data includes the pixel information
  • the processor 620 decodes at least a part of the code stream data, and further includes:
  • the bitstream data further includes an identification bit for indicating that pixel information of an image area where the target object is located is changed compared to the decoded image.
  • the target object includes an identification object that changes pixel information of the current image relative to the decoded image
  • the pixel information includes an absolute value of an attribute of the at least one pixel
  • the pixel information includes a relative value of an attribute change of the at least one pixel, and the processor 620 decodes at least a part of the code stream data, including:
  • the image area information further includes an identification bit, which is used to indicate that the image area where the target object is located in the current image remains unchanged compared to the decoded image;
  • the processing performed by the processor 620 on at least a part of the code stream data includes:
  • the image area is a rectangular area
  • the image area information includes coordinates of a center point of the rectangular area, height information of the rectangular area, and width information of the rectangular area;
  • the image area information further includes an identification bit, which is used to indicate that the coordinates of the center point of the image area where the target object is located remain unchanged;
  • the processing performed by the processor 620 on at least a part of the code stream data includes:
  • the identification information is further used to identify a removed object of the current image relative to the decoded image.
  • the identification information includes label information of the removed object or position information of the removed object in the decoded image.
  • the processor 620 decodes at least a portion of the code stream data, including: decoding the identification information in the code stream data to obtain the decoded identification information of the current image.
  • the processor 620 decodes at least a portion of the code stream data, including:
  • the identification information is discarded, and the identification information is not decoded.
  • the code stream data further includes image content data of the current image
  • the processing performed by the processor 620 on at least a part of the code stream data includes:
  • Decoding processing is performed on image content data of the current image in the code stream data.
  • the image content data of the current image includes reference frame data of the current image and residual data between the current image and the reference frame.
  • the identification information further includes content information
  • the processing performed by the processor 620 on at least a part of the code stream data includes:
  • the content information is label information.
  • the content information is a numerical value.
  • the image area is a rectangular area.
  • the image area information includes coordinates of any corner of the rectangular area, height information of the rectangular area, and width information of the rectangular area;
  • the image area information includes coordinates of a center point of the rectangular area, height information of the rectangular area, and width information of the rectangular area;
  • the image area information includes coordinates of an upper left corner of the rectangular area and coordinates of a lower right corner of the rectangular area;
  • the image area information includes coordinates of an upper right corner of the rectangular area and coordinates of a lower left corner of the rectangular area.
  • the identification information is located in auxiliary enhancement information or extended data of the current image.
  • FIG. 7 is a schematic block diagram of a decoding device 700 according to an embodiment of the present application.
  • the decoding device 700 may include an obtaining module 710 to obtain code stream data of a current image, and further include a decoding module 720 to perform decoding processing on at least part of the code stream data.
  • Each module in the decoding device may be used to execute the methods in the embodiments of the present application, and details are not described herein again.
  • FIG. 8 is a schematic flowchart of an image processing method 800 according to an embodiment of the present application. As shown in FIG. 8, the method 800 includes the following steps.
  • S810: Obtain code stream data of a current image, where the code stream data includes identification information, and the identification information is used to identify at least one target object in the current image.
  • the identification information includes image area information and pixel information.
  • the image area information includes the position and size of the image area where the target object is located, and the pixel information includes attributes of at least one pixel in the image area.
  • S820: Decode the code stream data to obtain the current image and the identification information.
  • the position and size of the image area where the target object is located are indicated by the image area information, and the attributes of multiple pixels in the image area are indicated by the pixel information, thereby identifying the target object at a finer granularity.
  • the decoding device can perform pixel-level processing on the target object more efficiently and accurately.
  • the image processing method in the embodiments of the present application enables recognition of a target object to be performed at the encoding end, so that the decoding device only needs to perform subsequent image processing. Therefore, on the one hand, the method can be implemented on platforms such as mobile phones and tablet computers; on the other hand, the computing resources of the decoding device can be devoted to more complex image processing, so that the decoding device can present higher-quality, more visually appealing images.
  • S830 performs pixel-level processing on the current image according to the identification information, which may include: changing display content of the current image according to the identification information.
  • S830 performs pixel-level processing on the current image according to the identification information, and may include: performing statistics on data information in the current image according to the identification information.
  • the identification information may include image area information and finer pixel information.
  • when the decoding device performs display processing or statistics on one or more pixels in the current image, referring to the image area information and the finer pixel information can reduce its computation, save computing resources, and shorten processing time.
  • the image processing method according to the embodiment of the present application will be described in more detail in terms of display and statistics respectively.
  • the attribute of the at least one pixel includes whether the at least one pixel belongs to the target object.
  • at least one pixel is assigned a different value to indicate whether the at least one pixel belongs to the target object.
  • a first portion of pixels in the at least one pixel is assigned a first value to indicate that the first portion of pixels does not belong to the target object; and/or a second portion of pixels in the at least one pixel is assigned a second value to indicate that the second portion of pixels belongs to the target object.
  • the method 800 may further include: acquiring a first image.
  • S830: Performing pixel-level processing on the current image according to the identification information may include: performing fusion processing on the current image and the first image based on the identification information to obtain a second image, where the second image includes at least part of the content of the current image and at least part of the content of the first image.
  • parameters of the current image and the second image may be the same, such as the same size, the same number of pixels, and the same resolution.
  • the parameters of the current image and the second image may also be different, which is not limited in the embodiment of the present application.
  • the performing fusion processing on the current image and the first image based on the identification information may include: performing a weighted summation of the current image and the first image based on the identification information, where the weight value of the pixels corresponding to the target object in the current image is different from the weight value of at least some pixels in the current image other than the target object. Giving a larger weight to the pixels of the target object and a smaller weight to at least some pixels other than the target object makes the target object of the current image more prominent than non-target content in the second image obtained after fusion.
  • the following processing may be adopted: weighting and summing the pixels in the current image and the pixels in the first image, where the weight of a pixel belonging to the target object in the current image is greater than the weight of the pixel at the corresponding position in the first image, and the weight of a pixel in the current image that does not belong to the target object is smaller than the weight of the corresponding pixel in the first image.
  • for example, the weight value of the pixels corresponding to the target object is 0.6, and the weight value of the pixels at the corresponding positions in the first image is 0.4; the weight value of at least some pixels other than the target object in the current image is 0.2, and the weight value of the pixels at the corresponding non-target positions in the first image is 0.8.
  • the final effect is that the current image floats semi-transparently over the first image, which serves as the background, and the target object is more prominent in the fused picture, appearing to float over the background shown in the first image.
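The weighted fusion just described can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiments: the images are grayscale nested lists, the mask comes from the pixel information (1 = belongs to the target object), and the helper name `fuse` is hypothetical; the 0.6/0.2 weights follow the example above.

```python
def fuse(current, first, mask, w_obj=0.6, w_bg=0.2):
    """Weighted sum: target-object pixels get weight w_obj (the first image
    gets 1 - w_obj); other pixels get w_bg (the first image gets 1 - w_bg)."""
    fused = []
    for row_c, row_f, row_m in zip(current, first, mask):
        fused_row = []
        for c, f, m in zip(row_c, row_f, row_m):
            w = w_obj if m == 1 else w_bg
            fused_row.append(round(w * c + (1 - w) * f))
        fused.append(fused_row)
    return fused

current = [[200, 200], [50, 50]]   # top row belongs to the target object
first   = [[100, 100], [100, 100]]
mask    = [[1, 1], [0, 0]]
print(fuse(current, first, mask))  # [[160, 160], [90, 90]]
```

With these weights the target-object pixels remain dominated by the current image while the background pixels are dominated by the first image, matching the semi-transparent effect described above.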
  • the performing fusion processing on the current image and the first image based on the identification information may include: determining, based on the image area information and the pixel information, the pixels in the current image that do not belong to the target object; and replacing the pixels in the current image that do not belong to the target object with the corresponding pixels in the first image to obtain a second image.
  • the at least one pixel is assigned a different value.
  • the determining, according to the image area information and the pixel information, of pixels in the current image that do not belong to the target object may include: according to the image area information, determining the pixels outside the image area where the target object is located as pixels that do not belong to the target object; and, when a first portion of pixels in the pixel information corresponds to a first value, determining that the first portion of pixels does not belong to the target object. In this way, the decoding device can simply determine details such as the position, size, and boundary of the target object without performing complicated calculations.
  • replacing pixels that do not belong to the target object in the current image with corresponding pixels in the first image to obtain a second image may include: performing a weighted summation of the pixels in the current image and the pixels in the first image such that the pixels in the current image that do not belong to the target object are replaced by the corresponding pixels in the first image. In this case, part of the first image serves as the background.
  • for example, when the pixels in the current image and the pixels in the first image are weighted and summed, the weight of a pixel belonging to the target object in the current image is 1 and the weight of the pixel at the corresponding position in the first image is 0; the weight of a pixel not belonging to the target object in the current image is 0 and the weight of the pixel at the corresponding position in the first image is 1. If pixels belonging to the target object are assigned the value 1 in the pixel information and pixels not belonging to it are assigned the value 0, the values in the pixel information can directly serve as the pixel weights during fusion.
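When the pixel information uses 1 for target-object pixels and 0 otherwise, it can act directly as the fusion weight, as described above. A minimal sketch (the helper name `replace_background` and the sample values are assumptions for illustration):

```python
def replace_background(current, first, pixel_info):
    """current*w + first*(1-w), with the weight w taken directly from
    pixel_info (1 keeps the current pixel, 0 takes the first image's)."""
    return [
        [c * w + f * (1 - w) for c, f, w in zip(rc, rf, rw)]
        for rc, rf, rw in zip(current, first, pixel_info)
    ]

current    = [[210, 40], [35, 220]]
first      = [[90, 90], [90, 90]]
pixel_info = [[1, 0], [0, 1]]      # 1 = belongs to the target object
print(replace_background(current, first, pixel_info))
# [[210, 90], [90, 220]]
```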
  • FIGS. 9A and 9B are schematic diagrams of two images obtained by fusion according to an embodiment of the present application, showing the target object and the background, respectively.
  • the method may further include: determining a boundary of the target object according to the pixel information, and performing an expansion operation on the target object based on the boundary. The determining, according to the image area information and the pixel information, of pixels in the current image that do not belong to the target object then includes: determining those pixels according to the image area information, the pixel information, and the boundary of the expanded target object.
  • the expansion operation enlarges the region of the target object so that pixels that originally did not belong to the target object become pixels that belong to it. Specifically, the values of the pixels near the original boundary of the target object are changed from 0 to 1 in the pixel information, so that during fusion these pixels are not replaced by pixels of the first image, thereby protecting the target object.
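The expansion operation described above can be illustrated as a simple dilation over the 0/1 pixel information: any 0-pixel with a 4-connected neighbour inside the target object is flipped to 1. This is only a sketch; the embodiments do not specify the structuring element or connectivity.

```python
def dilate(mask):
    """One-step 4-connected binary dilation of a 0/1 nested-list mask."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1:
                        out[y][x] = 1   # boundary pixel now protected
                        break
    return out

mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(dilate(mask))
# [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
```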
  • the pixels of the target object can be extracted for other occasions based on the identification information.
  • the performing pixel-level processing on the current image according to the identification information may include: determining, according to the image area information and the pixel information, the pixels in the current image that belong to the target object, and extracting the pixels belonging to the target object.
  • the shooting angle of the current image is the same as the shooting angle of the first image.
  • the embodiments of the present application can be applied to a single-frame image or video.
  • a scene in which the method is applied to a single-frame image can be as follows: someone has a photo of a person (i.e., a target object) taken at a certain position on the top of Mount Tai at a certain shooting angle, but because the day was cloudy, the sunrise was not captured.
  • there is a first image, which is a picture of the scenery taken at the same position and angle on the top of Mount Tai.
  • the shooting angle of the current image may be carried in the code stream data or more specifically in the identification information.
  • the identification information further includes viewing angle information, which is used to indicate a shooting angle of the target object relative to the shooting device, or a shooting angle of the current image.
  • the method may further include: determining that the shooting angle of the target object is the same as the shooting angle of the first image.
  • the identification information may not include the perspective information.
  • the decoding device determines the shooting angle of the target object relative to the shooting device through other methods or algorithms, which is not limited in the embodiment of the present application.
  • S830, performing pixel-level processing on the current image according to the identification information, may include: determining the target object in the current image based on the image area information and the pixel information, and adding an augmented reality (AR) special effect to the target object.
  • the identification information may further include perspective information
  • the perspective information is used to indicate a shooting angle of the target object relative to the shooting device
  • the attributes of the at least one pixel may further include a depth value corresponding to the at least one pixel.
  • the determining the target object in the current image and adding an AR special effect to the target object according to the image area information and the pixel information includes: determining the target object in the current image according to the image area information and the pixel information; and adding an AR special effect to the target object according to the target object, the shooting angle, and the depth value corresponding to the at least one pixel.
  • the AR special effect may be adding an icon (for example, an arrow, a halo, etc.), a text, a layer, and the like.
  • the shooting angle of the current image and/or the depth value corresponding to at least one pixel may be carried in the code stream data (or, more specifically, in the identification information), or the decoding device may determine the shooting angle of the target object relative to the shooting device through other methods or algorithms. For example, the viewing angle information and/or the depth value corresponding to at least one pixel may be obtained by calculating the attitude of the drone relative to the ground, which is not limited in the embodiments of the present application.
  • the at least one pixel is assigned a different value.
  • a first portion of pixels in the at least one pixel is assigned a first value to indicate that the first portion of pixels does not belong to the target object; and/or a second portion of pixels in the at least one pixel is assigned a second value to indicate that the second portion of pixels belongs to the target object.
  • the determining the boundary of the target object in the current image and adding an augmented reality (AR) effect to the target object according to the image area information and the pixel information may include: when the second portion of pixels in the pixel information corresponds to the second value, determining that the second portion of pixels belongs to the target object; and, based on the boundary of the target object, adding an indicator halo to the target object.
  • FIGS. 10A and 10B are schematic diagrams of adding an indicator halo to a target object according to an embodiment of the present application. As shown in FIG. 10A, when the bright portion of the indicator halo rotates in front of the target object, it blocks the target object. As shown in FIG. 10B, when the bright portion of the indicator halo rotates behind the target object, it is blocked by the target object.
  • the performing pixel-level processing on the current image according to the identification information in S830 may include: determining, based on the image area information and the pixel information, the pixels in the current image that belong to the target object and the pixels of non-target objects that do not belong to the target object; and changing at least one of the brightness, color, and grayscale of the target object, changing at least one of the brightness, color, and grayscale of the non-target objects, or changing the contrast between the target object and the non-target objects.
  • the changing the brightness or color of the target object, or changing the brightness or color of the non-target object, may include: modifying the YUV values, the RGB values, or the gamma (γ) curve of the target object or the non-target object to change its brightness or color.
  • the image is a grayscale image
  • the grayscale of the target object can be changed according to the identification information.
  • when the contrast between the target object and the non-target object needs to be emphasized, the contrast between them can be increased.
  • the at least one pixel in the pixel information may be assigned a different value.
  • a first portion of pixels in the at least one pixel is assigned a first value to indicate that the first portion of pixels does not belong to the target object; and/or a second portion of pixels in the at least one pixel is assigned a second value to indicate that the second portion of pixels belongs to the target object.
  • when the second portion of pixels in the pixel information corresponds to the second value and the first portion of pixels in the pixel information corresponds to the first value, it is determined that the second portion of pixels belongs to the target object and the first portion of pixels does not;
  • the contrast between the first portion of pixels and the second portion of pixels is then increased.
  • the identification information further includes content information, which is used to indicate a content category of the target object.
  • the changing at least one of the brightness, color, and grayscale of the target object may include: when the target object belongs to a first content category, changing the brightness of the target object to a preset first brightness value, changing the color of the target object to a preset first color value, or changing the grayscale of the target object to a preset first grayscale value.
  • FIG. 11 is a schematic diagram of changing the brightness of a target object according to an embodiment of the present application.
  • the current image includes multiple target objects; some of them have the content category person, and the others do not.
  • the brightness of the target objects whose content category is person can be changed to a preset first brightness value. As shown in FIG. 11, the target objects whose content category is person are highlighted, or made to pop out, to facilitate observation by an observer. For example, this can be used when playing back the recording of a smart camera, or when viewing it in real time.
  • the changing at least one of the brightness, color, and grayscale of the target object may include: assigning different brightness values, color values, or grayscale values to target objects of different content categories. That is, target objects of different content categories are identified with different brightness, colors, or grayscales to facilitate observation by an observer.
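Changing brightness by content category can be sketched as a per-pixel lookup. The category labels, the preset brightness values, and the helper name `highlight` are assumptions for illustration, not values from the embodiments.

```python
# Hypothetical presets: each content category maps to a target brightness.
PRESET_BRIGHTNESS = {"person": 255, "car": 180}

def highlight(image, category_map):
    """Replace each labelled pixel's brightness with its category's preset;
    pixels with no category (None) keep their original value."""
    return [
        [PRESET_BRIGHTNESS.get(cat, px) for px, cat in zip(prow, crow)]
        for prow, crow in zip(image, category_map)
    ]

image        = [[120, 120], [120, 120]]
category_map = [["person", None], [None, "car"]]
print(highlight(image, category_map))  # [[255, 120], [120, 180]]
```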
  • S830, performing pixel-level processing on the current image according to the identification information, may include: generating an object category segmentation image based on the current image according to the image area information, the pixel information, and the content category.
  • in the object category segmentation image, target objects of different content categories can be given different colors.
  • FIG. 12A is an original image of the current image
  • FIG. 12B is an object category segmentation image corresponding to the current image.
  • cars are marked with blue
  • buildings are marked with gray
  • ground is marked with purple.
  • people are marked with red
  • street lights are marked with yellow
  • plants are marked with green, etc., which are not shown in FIG. 12B.
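The mapping from content category to color in the segmentation image can be sketched as a simple palette lookup. The palette below reuses two of the colors named above (people red, cars blue); the background color, the index values, and the helper name are assumptions for illustration.

```python
PALETTE = {0: (255, 255, 255),  # background - white (assumed)
           1: (255, 0, 0),      # person     - red
           2: (0, 0, 255)}      # car        - blue

def segmentation_image(category_indices):
    """Map a grid of per-pixel category indices to a grid of RGB colours."""
    return [[PALETTE[i] for i in row] for row in category_indices]

labels = [[0, 1], [2, 0]]
img = segmentation_image(labels)
print(img[0][1])  # (255, 0, 0): a "person" pixel rendered red
```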
  • the content information may be a label or a value.
  • the attributes of the at least one pixel may include a part of the target object to which the at least one pixel belongs.
  • the at least one pixel in the pixel information is assigned different values to indicate that the at least one pixel belongs to a different part of the target object.
  • S830, performing pixel-level processing on the current image according to the identification information, may include: according to the image area information and the pixel information, identifying different parts of the target object with different brightness, colors, or grayscales, or applying different contrast between different parts.
  • take marking different parts of the target object with different colors as an example.
  • FIG. 13 is a schematic diagram of an image in which different parts are identified by different colors according to an embodiment of the present application.
  • the person at the lower left of FIG. 13 forms a target object together with a bag and a bicycle.
  • people are identified by yellow
  • bags are identified by red
  • bicycles are identified by green.
  • target objects of other content types such as cars are marked in blue, and people are marked in yellow.
  • the attribute of the at least one pixel may include a description feature corresponding to the at least one pixel.
  • the description feature corresponding to the at least one pixel may include at least one of the following: a reflection intensity of a point cloud corresponding to the at least one pixel, an infrared intensity corresponding to the at least one pixel, and a depth value corresponding to the at least one pixel.
  • the current image may include a plurality of the target objects.
  • S830, performing pixel-level processing on the current image according to the identification information, may include: according to the reflection intensities, infrared intensities, or depth values of multiple target objects, giving the target objects different brightness, color, or grayscale values. Taking color as an example, S830 may include at least one of the following processing.
  • a reflection intensity segmented image is generated based on the current image according to the image area information, pixel information, and the reflection intensity of a point cloud corresponding to the at least one pixel.
  • each part of the target object can be distinguished, with parts of different reflection intensities identified by different colors; alternatively, whole target objects can be distinguished from one another, for example by averaging the reflection intensities of the parts of each target object (or the reflection intensity itself corresponds to the average reflection intensity of the entire target object), so that each target object is identified by a single color.
  • Target objects with different reflection intensities are assigned different colors.
  • FIG. 14A is an original image of the current image
  • FIG. 14B is a reflection intensity segmented image corresponding to the current image. As shown in FIG. 14B, target objects with different reflection intensities are given different colors, and the portions in FIG. 14B that do not belong to any target object, such as the background portion, are identified by white.
  • a depth map is generated based on the current image according to the image area information, pixel information, and a depth value corresponding to the at least one pixel.
  • each part of the target object can be distinguished, with parts of different depth values identified by different colors; alternatively, whole target objects can be distinguished from one another, for example by averaging the depth values of the parts of each target object (or the depth value itself is the average depth value of the entire target object), so that each target object is identified by a single color.
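The "single color per target object" variant can be sketched by averaging an object's per-pixel depth values and bucketing the average into a color. The depth thresholds, color names, and helper names are assumptions for illustration.

```python
def mean_depth(depths):
    """Average the per-pixel depth values of one target object."""
    return sum(depths) / len(depths)

def depth_colour(avg):
    """Bucket an average depth (metres, assumed unit) into a colour."""
    if avg < 5.0:
        return "red"      # near
    elif avg < 20.0:
        return "yellow"   # mid-range
    return "blue"         # far

obj_a = [2.0, 3.0, 4.0]   # depth values of one object's pixels
obj_b = [30.0, 32.0]
print(depth_colour(mean_depth(obj_a)), depth_colour(mean_depth(obj_b)))
# red blue
```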
  • FIG. 15A is an original image of the current image
  • FIG. 15B is a depth map corresponding to the current image. As shown in FIG. 15B, pixels of different depth values are assigned different colors.
  • an infrared image may be generated based on the current image according to the image area information, the pixel information, and an infrared intensity corresponding to the at least one pixel, and details are not described herein again.
  • S830 performs pixel-level processing on the current image according to the identification information, which may include: performing statistics on data information in the current image according to the identification information.
  • the identification information further includes content information, which is used to indicate a content category of the target object. Some statistics can be made using the content category of the target object included in the content information.
  • S830 performing pixel-level processing on the current image according to the identification information may include: performing statistics on the target object in the current image according to the content category of the target object to obtain a statistical result.
  • the performing statistics on the target object in the current image according to the content category of the target object to obtain a statistical result may include: performing statistics on the target objects whose content category is person in the current image to obtain people-flow results and/or crowd density results.
  • This scenario can be used by the municipal management department for crowd management during peak commutes or holidays, or it can be used for commercial layout purposes, and used to count passenger traffic.
  • the performing statistics on the target object in the current image according to the content category of the target object to obtain a statistical result may include: counting the target objects whose content category is car in the current image to obtain traffic flow results and/or vehicle density results.
  • This scenario can be used by traffic management departments for rush-hour commute or traffic management at public transit stations.
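The category-based statistics above can be sketched as a simple count over the identified target objects. The data layout (a list of objects with a `"category"` field) and the category labels are assumptions for illustration.

```python
from collections import Counter

def count_by_category(target_objects):
    """Count target objects per content category."""
    return Counter(obj["category"] for obj in target_objects)

objects = [{"category": "person"}, {"category": "car"},
           {"category": "person"}, {"category": "person"}]
stats = count_by_category(objects)
print(stats["person"], stats["car"])  # 3 1
```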
  • the pixel-level processing in S830 may be expression recognition or motion recognition.
  • S830, performing pixel-level processing on the current image according to the identification information, may include: when a first portion of pixels in the pixel information corresponds to a third value, determining that the first portion of pixels belongs to the head of the target object, and performing facial expression recognition according to the head of the target object; and/or, when a second portion of pixels in the pixel information corresponds to a fourth value, determining that the second portion of pixels belongs to the hand of the target object, and performing hand motion recognition according to the hand of the target object.
  • the decoding device may send a control instruction to the drone based on the result of the facial expression recognition or the hand motion recognition. For example, when the hands form a "T" shape, the drone hovers or returns home; when the operator nods, the drone speeds up; and so on.
  • the meaning represented by the facial expressions or hand movements may be agreed in advance by the drone and the control terminal, which is not described in the embodiment of the present application.
  • the pixel-level processing in S830 may involve traffic management and the like.
  • S830, performing pixel-level processing on the current image according to the identification information, may include: when a first portion of pixels in the pixel information corresponds to a fifth value, determining that the first portion of pixels belongs to the head (front) of the target object, and determining the driving direction of the target object according to the head of the target object; and/or, when a second portion of pixels in the pixel information corresponds to a sixth value, determining that the second portion of pixels belongs to the rear of the target object, and determining the driving direction of the target object according to the rear of the target object.
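Determining the driving direction from head and rear pixels can be sketched as the vector between their centroids. The (x, y) coordinate convention and the helper names are assumptions for illustration.

```python
def centroid(points):
    """Centroid of a set of (x, y) pixel coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def driving_direction(head_pixels, rear_pixels):
    """Vector pointing from the rear centroid toward the head centroid."""
    (hx, hy), (rx, ry) = centroid(head_pixels), centroid(rear_pixels)
    return (hx - rx, hy - ry)

head = [(10, 5), (12, 5)]  # pixels marked with the fifth value
rear = [(2, 5), (4, 5)]    # pixels marked with the sixth value
print(driving_direction(head, rear))  # (8.0, 0.0): travelling in +x
```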
  • in this way, the method of the present application can quickly find vehicles traveling in the wrong direction and assist the traffic police in dealing with them in time.
  • the description feature corresponding to the at least one pixel includes at least one of the following: a reflection intensity of a point cloud corresponding to the at least one pixel, an infrared intensity corresponding to the at least one pixel, and a depth value corresponding to the at least one pixel.
  • the performing pixel-level processing on the current image according to the identification information may include: performing statistics on the target objects in the current image according to the description feature to obtain a statistical result. Taking the depth value as an example, the number of target objects whose depth value equals a certain value, or lies within a certain range, can be counted for distance-related statistics, which are not described here one by one.
  • the method 800 may further include: generating a heat map according to the statistical results of the target objects; that is, after the statistics are completed, a heat map is generated to display the statistical results.
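The heat-map step can be sketched by accumulating object positions into a coarse 2D grid whose cell counts are later rendered as colors. The grid size, the position format, and the helper name are assumptions for illustration.

```python
def heat_map(positions, width, height, cells_x, cells_y):
    """Bin (x, y) object positions into a cells_y x cells_x count grid."""
    grid = [[0] * cells_x for _ in range(cells_y)]
    for x, y in positions:
        cx = min(int(x * cells_x / width), cells_x - 1)
        cy = min(int(y * cells_y / height), cells_y - 1)
        grid[cy][cx] += 1
    return grid

positions = [(10, 10), (15, 12), (90, 90)]  # e.g. detected people
print(heat_map(positions, width=100, height=100, cells_x=2, cells_y=2))
# [[2, 0], [0, 1]]
```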
  • the attributes may be measured in pixels; the attributes may also be measured in pixel blocks, and the pixel information may include information about attributes of at least one pixel block.
  • a pixel block includes at least two pixels.
  • the identification information may be located in auxiliary enhancement information or extended data of the current image.
  • FIG. 16 is a schematic block diagram of an image processing apparatus 1600 according to an embodiment of the present application. As shown in FIG. 16, the device 1600 includes:
  • At least one memory 1610 for storing computer-executable instructions
  • At least one processor 1620 is used to access the at least one memory 1610 and execute the computer-executable instructions to perform the following operations:
  • Obtain code stream data of a current image, where the code stream data includes identification information, the identification information being used to identify at least one target object in the current image, and the identification information includes image area information and pixel information;
  • the image area information includes a position and a size of an image area where the target object is located, and the pixel information includes attributes of at least one pixel in the image area;
  • Pixel level processing is performed on the current image according to the identification information.
  • the image processing apparatus indicates the position and size of the image area where the target object is located through the image area information, and indicates the attributes of multiple pixels in the image area through the pixel information, thereby identifying the target object at a finer granularity, which allows the image processing apparatus to perform pixel-level processing on the target object more efficiently and accurately.
  • the image processing apparatus 1600 may be a decoding device.
  • the hardware capability requirements of the decoding device are usually higher, and the decoding device is usually a computer or a server.
  • the image processing method in the embodiments of the present application enables the recognition of a target object to be performed at the encoding end, and the decoding device only needs to perform subsequent image processing. Therefore, on the one hand, the image processing method in the embodiments of the present application can be implemented on platforms such as mobile phones and tablet computers; on the other hand, the computing resources of the decoding device can be used for more complex image processing, so that the decoding device can present higher-quality and more refined images.
  • the processor 1620 performs pixel-level processing on the current image according to the identification information, including: changing display content of the current image according to the identification information.
  • the attributes of the at least one pixel include whether the at least one pixel belongs to the target object.
  • the processor 1620 is further configured to obtain a first image; the processor performing pixel-level processing on the current image according to the identification information includes: based on the identification information Fusion processing is performed on the current image and the first image to obtain a second image, where the second image includes at least part of the content of the current image and at least part of the content of the first image.
  • the processor 1620 performing fusion processing on the current image and the first image based on the identification information includes: performing weighted summation on the current image and the first image, wherein the weighting value of pixels corresponding to the target object in the current image is different from the weighting value of at least some pixels in the current image other than the target object.
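The mask-dependent weighted summation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, the weight values, and the convention that mask value 1 marks target pixels are all assumptions.

```python
import numpy as np

def blend_with_mask(current, first, mask, w_target=0.8, w_background=0.2):
    """Weighted sum of two images, giving pixels of the target object
    (mask == 1) a different weight than the remaining pixels.
    `current`, `first`: H x W x C float arrays; `mask`: H x W binary array."""
    # Per-pixel weight for `current`; `first` gets the complementary weight.
    w = np.where(mask[..., None] == 1, w_target, w_background)
    return w * current + (1.0 - w) * first

# Tiny demo: a 2x2 single-channel image pair, one target pixel at (0, 0).
current = np.array([[[100.0], [100.0]], [[100.0], [100.0]]])
first = np.array([[[0.0], [0.0]], [[0.0], [0.0]]])
mask = np.array([[1, 0], [0, 0]])
blended = blend_with_mask(current, first, mask)
```

With these weights the target pixel keeps most of the current image (0.8 × 100 = 80) while background pixels are dominated by the first image.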
  • the processor 1620 performs fusion processing on the current image and the first image based on the identification information, including: determining according to the image area information and the pixel information Pixels that do not belong to the target object in the current image; replace pixels that do not belong to the target object in the current image with corresponding pixels in the first image to obtain a second image.
  • the processor 1620 determining pixels that do not belong to the target object in the current image based on the image region information and the pixel information includes: determining, according to the image region information, pixels outside the image region where the target object is located as pixels in the current image that do not belong to the target object; and when a first portion of pixels in the pixel information corresponds to a first value, determining that the first portion of pixels does not belong to the target object.
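The replacement-based fusion above can be sketched end to end: build a full-image target mask from the bounding box plus the per-region pixel values, then overwrite every background pixel with the corresponding pixel of the first image. The function name, argument layout, and the choice of 0 as the "first value" marking non-target pixels are illustrative assumptions, not taken from the patent.

```python
import numpy as np

FIRST_VALUE = 0  # assumed value marking "does not belong to the target object"

def replace_background(current, first, top, left, region_mask):
    """Replace every pixel that does not belong to the target object with the
    corresponding pixel of `first`. Pixels outside the identified image region,
    and pixels inside it whose mask equals FIRST_VALUE, count as background."""
    h, w = region_mask.shape
    target = np.zeros(current.shape[:2], dtype=bool)   # everything starts as background
    target[top:top + h, left:left + w] = region_mask != FIRST_VALUE
    out = first.copy()
    out[target] = current[target]                      # keep only target pixels from `current`
    return out

current = np.full((4, 4), 9)                 # stand-in for the current image
first = np.zeros((4, 4), dtype=int)          # stand-in for the first image
region_mask = np.array([[1, 0], [1, 1]])     # mask of a 2x2 region at (1, 1)
second = replace_background(current, first, top=1, left=1, region_mask=region_mask)
```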
  • before the processor 1620 determines pixels that do not belong to the target object in the current image according to the image region information and the pixel information, the processor 1620 is further configured to: determine a boundary of the target object based on the pixel information, and perform an expansion operation on the target object based on the boundary; and the processor 1620 determining pixels that do not belong to the target object in the current image based on the image region information and the pixel information includes: determining pixels that do not belong to the target object in the current image according to the image region information, the pixel information, and the boundary of the expanded target object.
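The expansion operation on the target boundary can be illustrated with a simple binary dilation. The 4-neighbourhood rule below is a minimal stand-in; a real pipeline would more likely use `cv2.dilate` or `scipy.ndimage.binary_dilation`, and the patent does not prescribe a particular structuring element.

```python
import numpy as np

def dilate(mask, iterations=1):
    """One-pixel 4-neighbourhood expansion of a binary mask, applied
    `iterations` times. Grows the target region so that boundary pixels
    are treated as part of the target during background replacement."""
    m = mask.astype(bool)
    for _ in range(iterations):
        up = np.roll(m, -1, axis=0); up[-1, :] = False     # shift without wrap-around
        down = np.roll(m, 1, axis=0); down[0, :] = False
        left = np.roll(m, -1, axis=1); left[:, -1] = False
        right = np.roll(m, 1, axis=1); right[:, 0] = False
        m = m | up | down | left | right
    return m.astype(mask.dtype)

mask = np.zeros((5, 5), dtype=int)
mask[2, 2] = 1                 # a single target pixel
expanded = dilate(mask)        # grows into a 5-pixel plus shape
```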
  • the shooting angle of the current image is the same as the shooting angle of the first image.
  • the identification information further includes viewing angle information, where the viewing angle information is used to indicate a shooting angle of the target object relative to the shooting device; before the processor 1620 replaces pixels in the current image that do not belong to the target object with the corresponding pixels of the first image, the processor 1620 is further configured to determine that the shooting angle of the target object is the same as the shooting angle of the first image.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes: determining the target object in the current image according to the image region information and the pixel information, and adding an augmented reality (AR) special effect to the target object.
  • the identification information further includes viewing angle information, the viewing angle information is used to indicate a shooting angle of the target object relative to the shooting device, and the attribute of the at least one pixel further includes a depth value corresponding to the at least one pixel;
  • the processor 1620 determining the target object in the current image according to the image region information and the pixel information, and adding an AR special effect to the target object, includes: determining the target object in the current image according to the image region information and the pixel information; and adding an AR special effect to the target object according to the target object, the shooting angle, and the depth value corresponding to the at least one pixel.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes: determining, according to the image region information and the pixel information, pixels in the current image that belong to the target object, or pixels of a non-target object that do not belong to the target object; and changing at least one of the brightness, color, and grayscale of the target object, or changing at least one of the brightness, color, and grayscale of the non-target object, or changing the contrast between the target object and the non-target object.
  • the processor 1620 changing the brightness or color of the target object, or changing the brightness or color of the non-target object, includes: modifying the YUV value, RGB value, or gamma (γ) curve of the target object or the non-target object to change the brightness or color of the target object, or to change the brightness or color of the non-target object.
  • the identification information further includes content information, which is used to indicate a content category of the target object.
  • the processor 1620 changing at least one of the brightness, color, and grayscale of the target object includes: when the target object is of a first content category, changing the brightness of the target object to a preset first brightness value, changing the color of the target object to a preset first color value, or changing the grayscale of the target object to a preset first grayscale value.
  • when the current image includes a plurality of the target objects, the processor 1620 changing at least one of the brightness, color, and grayscale of the target objects includes: according to the content categories of the target objects, assigning different brightness values, color values, or grayscale values to target objects of different content categories.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes: generating an object-category segmentation image based on the current image according to the image region information, the pixel information, and the content category.
  • the content information is a label or a value.
  • the attributes of the at least one pixel include a part of the target object to which the at least one pixel belongs.
  • the at least one pixel is assigned a different value to indicate that the at least one pixel belongs to a different part of the target object.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes: according to the image region information and the pixel information, marking different parts of the target object with different brightness, colors, or grayscales, or applying different contrast between different parts.
  • the attribute of the at least one pixel includes a description feature corresponding to the at least one pixel.
  • the description feature corresponding to the at least one pixel includes at least one of the following: a reflection intensity of a point cloud corresponding to the at least one pixel, an infrared intensity corresponding to the at least one pixel, and A depth value corresponding to the at least one pixel.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes at least one of the following: generating, based on the current image, a reflection-intensity segmentation image of the point cloud corresponding to the at least one pixel according to the image region information and the pixel information; generating an infrared image based on the current image according to the image region information, the pixel information, and the infrared intensity corresponding to the at least one pixel; and generating a depth map based on the current image according to the image region information, the pixel information, and the depth value corresponding to the at least one pixel.
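Of these three renderings, the depth map can serve as a sketch of the general pattern: per-pixel attribute values carried in the identification information are scaled into a grayscale image. The input layout (a depth value per pixel of the identified region, plus the region mask) and the nearer-is-brighter convention are illustrative assumptions.

```python
import numpy as np

def depth_map_from_attributes(shape, top, left, region_mask, depths, max_depth):
    """Render per-pixel depth values from the identification information as an
    8-bit grayscale depth map (nearer = brighter). Only pixels whose mask
    marks them as the target object are drawn; the rest stay black."""
    out = np.zeros(shape, dtype=np.uint8)
    h, w = region_mask.shape
    # Map depth [0, max_depth] to brightness [255, 0].
    scaled = (255 * (1.0 - depths / max_depth)).astype(np.uint8)
    window = out[top:top + h, left:left + w]   # view into the output image
    window[region_mask == 1] = scaled[region_mask == 1]
    return out

region_mask = np.array([[1, 1], [0, 1]])
depths = np.array([[2.0, 4.0], [4.0, 8.0]])
dm = depth_map_from_attributes((4, 4), 1, 1, region_mask, depths, max_depth=8.0)
```

The same scaffold applies to reflection-intensity or infrared renderings by swapping the attribute array and the scaling rule.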
  • when the current image includes a plurality of the target objects, the processor 1620 performing pixel-level processing on the current image according to the identification information includes: according to the reflection intensities, infrared intensities, or depth values of the target objects, assigning different brightness values, color values, or grayscale values to target objects with different reflection intensities, infrared intensities, or depth values.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes: performing statistics on data information in the current image according to the identification information.
  • the identification information further includes content information, which is used to indicate a content category of the target object.
  • the processor 1620 performing pixel-level processing on the current image according to the identification information includes: performing statistics on the target objects in the current image according to the content categories of the target objects, to obtain statistical results.
  • when the target object is a person, the processor 1620 performing pixel-level processing on the current image according to the identification information includes: when a first portion of pixels in the pixel information corresponds to a third value, determining that the first portion of pixels belongs to the head of the target object, and performing facial expression recognition according to the head of the target object; and/or, when a second portion of pixels in the pixel information corresponds to a fourth value, determining that the second portion of pixels belongs to the hand of the target object, and performing hand motion recognition according to the hand of the target object.
  • the processor 1620 is further configured to send a control instruction to the drone based on a result of facial expression recognition or a result of hand motion recognition.
  • when the target object is a car, the processor 1620 performing pixel-level processing on the current image according to the identification information includes: when a first portion of pixels in the pixel information corresponds to a fifth value, determining that the first portion of pixels belongs to the front of the target object, and determining the driving direction of the target object according to the front of the vehicle; and/or, when a second portion of pixels in the pixel information corresponds to a sixth value, determining that the second portion of pixels belongs to the rear of the target object, and determining the driving direction of the target object according to the rear of the vehicle.
  • the processor 1620 performing statistics on the target objects in the current image according to the content categories of the target objects, to obtain statistical results, includes: performing statistics on target objects whose content category is person, to obtain a people-flow result and/or a people-density result.
  • the processor 1620 performing statistics on the target objects in the current image according to the content categories of the target objects, to obtain statistical results, includes: performing statistics on target objects whose content category is vehicle, to obtain a traffic-flow result and/or a traffic-density result.
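The category-based statistics above amount to grouping the identified objects by their content category and counting each group. A minimal sketch, with made-up sample objects and category labels (the patent does not fix a category vocabulary):

```python
from collections import Counter

# Each identified object carries a content category in its identification
# information; counting per category yields people-flow / traffic-flow style
# statistics for the current image.
objects = [
    {"id": 0, "category": "person"},
    {"id": 1, "category": "car"},
    {"id": 2, "category": "person"},
    {"id": 3, "category": "person"},
]

counts = Counter(obj["category"] for obj in objects)
people_count = counts["person"]    # people-flow / people-density input
vehicle_count = counts["car"]      # traffic-flow / traffic-density input
```

Accumulating these per-frame counts over time would give the flow results; dividing by the covered area would give the density results.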
  • the description feature corresponding to the at least one pixel includes at least one of the following: a reflection intensity of a point cloud corresponding to the at least one pixel, an infrared intensity corresponding to the at least one pixel, and a depth value corresponding to the at least one pixel; and the processor 1620 performing pixel-level processing on the current image according to the identification information includes: performing statistics on the target objects in the current image according to the description feature, to obtain statistical results.
  • the processor 1620 is further configured to generate a heat map for the statistical result of the target object according to the statistical result.
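One simple way to realize the heat-map step is to bin target-object centre points into a coarse grid. This is only a sketch of "generate a heat map from the statistical results"; the cell size, the use of centre points, and the function name are assumptions, and a display pipeline would normally colour-map and upsample the grid.

```python
import numpy as np

def density_heat_map(centers, image_size, cell=4):
    """Bin (row, col) target-object centre points into a cell x cell grid,
    producing a small density map of the statistical results."""
    h, w = image_size
    grid = np.zeros((h // cell, w // cell), dtype=int)
    for (y, x) in centers:
        grid[y // cell, x // cell] += 1
    return grid

# Four object centres in an 8x8 image, two per quadrant along the diagonal.
centers = [(0, 0), (1, 2), (5, 5), (7, 6)]
heat = density_heat_map(centers, image_size=(8, 8), cell=4)
```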
  • the attributes are measured in pixel blocks; in this case, the pixel information includes information on attributes of at least one pixel block, and a pixel block includes at least two pixels.
  • the identification information is located in auxiliary enhancement information or extended data of the current image.
  • FIG. 17 is a schematic block diagram of an image processing apparatus 1700 according to an embodiment of the present application.
  • the image processing device 1700 may include an obtaining module 1710 for obtaining code stream data of the current image; a decoding module 1720 for decoding the code stream data to obtain the current image and the identification information; processing Module 1730 is configured to perform pixel-level processing on the current image according to the identification information.
  • Each module in the image processing apparatus 1700 can be used to execute the image processing method in each embodiment of the present application, and details are not described herein again.
  • the processors mentioned in the embodiments of the present application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM).
  • when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated in the processor.
  • An embodiment of the present application further provides a computer-readable storage medium having instructions stored thereon. When the instructions are run on a computer, the computer is caused to execute the methods of the foregoing method embodiments.
  • An embodiment of the present application further provides a computer program, which causes a computer to execute the methods of the foregoing method embodiments.
  • An embodiment of the present application further provides a computing device, where the computing device includes the computer-readable storage medium described above.
  • the embodiments of the present application can be applied in the field of aircraft, especially in the field of drones.
  • circuits, sub-circuits, and sub-units in the embodiments of the present application is merely schematic. Those of ordinary skill in the art may realize that the circuits, sub-circuits, and sub-units of the examples described in the embodiments disclosed herein can be split or combined again.
  • a computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), semiconductor media (e.g., solid-state drives (SSDs)), and the like.
  • Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application.
  • the appearances of "in one embodiment” or “in an embodiment” appearing throughout the specification are not necessarily referring to the same embodiment.
  • the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • It should be understood that the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B based on A does not mean determining B based solely on A, but also determining B based on A and / or other information.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative; the division of the units is only a logical function division, and in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.


Abstract

An image processing method and apparatus. The method includes: obtaining bitstream data of a current image, the bitstream data including identification information, the identification information being used to identify at least one target object in the current image, the identification information including image region information and pixel information, the image region information including the position and size of the image region where the target object is located, and the pixel information including attributes of at least one pixel in the image region; decoding the bitstream data to obtain the current image and the identification information; and performing pixel-level processing on the current image according to the identification information. The image processing method and apparatus indicate the position and size of the image region where the target object is located through the image region information, and indicate the attributes of multiple pixels in the image region through the pixel information, thereby identifying the target object at a finer granularity, so that a decoding device can perform pixel-level processing on the target object more efficiently and accurately.

Description

Image processing method and apparatus
Copyright notice
The content disclosed in this patent document contains material subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner does not object to the reproduction by anyone of this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
The present application relates to the field of image processing, and in particular to an image processing method and apparatus.
Background
In scenarios such as video surveillance, human-computer interaction, and security patrols, objects of particular interest (including people, animals, plants, public facilities, vehicles, landscapes, scenery, and the like) usually need to be identified so that the decoding end or an observer can better track the changes of the object in the video stream, thereby better assisting the observer in observing or interacting with the object. Such methods in image processing are generally referred to as object tracking techniques.
Existing object tracking techniques usually use technologies such as image processing, computer vision, and computer analysis and understanding to recognize the content of a video stream and mark the objects that require attention. In existing solutions, the position and size of the object of interest in each frame of image is marked at the encoding end or the decoding end through a rectangular region. The decoding end performs additional operations based on the rectangular region, with poor processing results and low processing efficiency.
Summary
The present application provides an image processing method and apparatus, so that a decoding device can perform pixel-level processing on a target object more efficiently and accurately.
In a first aspect, an image processing method is provided, including: obtaining bitstream data of a current image, the bitstream data including identification information, the identification information being used to identify at least one target object in the current image, the identification information including image region information and pixel information, the image region information including the position and size of the image region where the target object is located, and the pixel information including attributes of at least one pixel in the image region; decoding the bitstream data to obtain the current image and the identification information; and performing pixel-level processing on the current image according to the identification information.
In a second aspect, an image processing apparatus is provided, including: at least one memory for storing computer-executable instructions; and at least one processor, individually or jointly configured to access the at least one memory and execute the computer-executable instructions to implement the following operations: obtaining bitstream data of a current image, the bitstream data including identification information, the identification information being used to identify at least one target object in the current image, the identification information including image region information and pixel information, the image region information including the position and size of the image region where the target object is located, and the pixel information including attributes of at least one pixel in the image region; decoding the bitstream data to obtain the current image and the identification information; and performing pixel-level processing on the current image according to the identification information.
In a third aspect, a computer-readable storage medium is provided, on which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the method of the first aspect.
The image processing method and apparatus of the present application indicate the position and size of the image region where the target object is located through the image region information, and indicate the attributes of multiple pixels in the image region through the pixel information, thereby identifying the target object at a finer granularity, so that the decoding device can perform pixel-level processing on the target object more efficiently and accurately.
The image processing method and apparatus of the present application allow the recognition of the target object to be performed at the encoding end, and the decoding device only needs to perform subsequent image processing. Therefore, on the one hand, the image processing method of the present application can be implemented on platforms such as mobile phones and tablet computers; on the other hand, the computing resources of the decoding device can be used for more complex image processing, so that the decoding device can present higher-quality and more refined images.
Brief description of the drawings
FIG. 1 is a schematic flowchart of an encoding method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of target objects in an image according to an embodiment of the present application.
FIG. 3 is a schematic flowchart of a decoding method according to an embodiment of the present application.
FIG. 4 is a schematic diagram of an encoding device according to an embodiment of the present application.
FIG. 5 is a schematic diagram of an encoding device according to another embodiment of the present application.
FIG. 6 is a schematic diagram of a decoding device according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a decoding device according to another embodiment of the present application.
FIG. 8 is a schematic flowchart of an image processing method according to an embodiment of the present application.
FIG. 9A and FIG. 9B are schematic diagrams of two images obtained by fusion according to an embodiment of the present application.
FIG. 10A and FIG. 10B are schematic diagrams of adding an indicator halo on a target object according to an embodiment of the present application.
FIG. 11 is a schematic diagram of changing the brightness of a target object according to an embodiment of the present application.
FIG. 12A is an original current image; FIG. 12B is the object-category segmentation image corresponding to the current image.
FIG. 13 is a schematic diagram of an image in which different parts are marked with different colors according to an embodiment of the present application.
FIG. 14A is an original current image; FIG. 14B is the reflection-intensity segmentation image corresponding to the current image.
FIG. 15A is an original current image; FIG. 15B is the depth map corresponding to the current image.
FIG. 16 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application.
FIG. 17 is a schematic block diagram of an image processing apparatus according to another embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application.
First, the related technologies and concepts involved in the embodiments of the present application are introduced.
A target object may refer to an object in an image that requires particular attention, is to be identified, has been identified, or is to be observed; it may include people, animals, plants, public facilities, vehicles, landscapes, scenery, and the like, as well as other types of objects; it may also include specific parts of people, animals, plants, public facilities, vehicles, landscapes, scenery, or other types of objects.
An image region may refer to a region, regular or irregular in shape, where the target object is located. Generally, the position and size of the image region should be such that all parts of the target object fall within the image region, or such that at least 80% of the target object falls within the image region. The image region can roughly delimit a range, so that the decoding end can determine the position and size of the target object more quickly.
A sub-image region may be a region of the image region in which the pixels have the same attribute.
For a system from the encoding end to the decoding end, one existing object tracking technique encodes the video content at the encoding end; the decoding end analyzes the video content, finds the objects requiring attention, and identifies them — that is, the identification is completed at the decoding end.
The problem with completing the identification at the decoding end is that video encoding is usually a lossy process, and the information in the video content is lost after encoding. The video content obtained by decoding at the decoding end is degraded, in both quality and the amount of information, compared with the video content at the encoding end. When the decoding end analyzes degraded video content and extracts the objects requiring attention, the results are usually unsatisfactory. In addition, analyzing the video content and extracting objects at the decoding end consumes a large amount of the decoding end's computing resources. However, decoding ends are widely used in mobile devices such as mobile phones, and such mobile devices are sensitive to power consumption. Therefore, spending computing power on video content analysis at the decoding end affects the user experience to a certain extent.
In the present application, the function of analyzing the video content is transferred from the decoding end to the encoding end. These techniques identify the extracted objects at the encoding end and write the identification information into the video file, ensuring that the decoding end can recognize the objects extracted at the encoding end by parsing the identification information. The benefits are: 1. Analyzing the original, uncompressed video content at the encoding end allows the objects requiring attention to be extracted more efficiently and accurately. 2. Since the devices at the encoding end usually have stronger computing capabilities, and usually already need to analyze the video content in order to perform some additional operations, transferring the computation and analysis from the decoding end to the encoding end does not degrade the user experience. These additional operations may be, for example, obstacle-avoidance operations performed after analyzing the captured video content on an unmanned aerial vehicle system.
In some implementations, the encoding end may use a general video coding standard, for example, the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, the audio video coding standard (AVS) 1-P2, AVS2-P2, the VP9 standard, the Alliance for Open Media Video (AOMedia Video, AV) 1 standard, or the versatile video coding (VVC) standard, to encode the video content and obtain a video file. Then,
FIG. 1 is a schematic flowchart of an encoding method 100 according to an embodiment of the present application. The encoding method 100 is executed by an encoding device. As shown in FIG. 1, the encoding method 100 includes: S110, performing encoding processing on a current image to generate bitstream data, the bitstream data including identification information, the identification information being used to identify at least one target object in the current image, the identification information including image region information and pixel information, the image region information including the position and size of the image region where the target object is located, and the pixel information including attributes of at least one pixel in the image region.
The encoding method provided by the embodiments of the present application indicates the position and size of the image region where the target object is located through the image region information, and indicates the attributes of multiple pixels in the image region through the pixel information, thereby identifying the target object at a finer granularity, which helps the decoding end perform operations on the target object more efficiently and accurately.
In some possible implementations, before the encoding processing of S110 generates the bitstream data, the encoding method 100 may further include: performing image recognition on the current image, determining the target object, and obtaining the identification information of the target object. The image recognition may be based on technologies such as image processing, computer vision, and computer analysis and understanding. Of course, the identification information in the embodiments of the present application may also be obtained in other ways, for example by receiving external input. The form and content of the obtained identification information may vary, which will be described in detail below.
In some possible implementations, the identification information may be located in supplemental enhancement information or extension data of the current image. For example, the supplemental enhancement information may be SEI (Supplemental Enhancement Information), and the extension data may be ED (Extension Data). SEI and ED can usually be considered part of the bitstream data. When the decoding device receives the SEI and/or ED, it may decode according to the SEI and/or ED, or it may discard the SEI and/or ED; whether the identification information is decoded need not affect the decoding of the content of the current image. This will also be described in detail below.
Optionally, in some embodiments of the present application, the image region may be a rectangular region. In some implementations, the image region is the smallest rectangular region, or a relatively small rectangular region, that can frame the target object. There are various ways to indicate the position and size of the rectangular region with the image region information. For example, the image region information may include the coordinates of any corner of the rectangular region (for example, the coordinates of the upper-left corner), the height information of the rectangular region, and the width information of the rectangular region. As another example, the image region information may include the coordinates of the center point of the rectangular region, the height information of the rectangular region, and the width information of the rectangular region; the height information may be the full height or half height of the rectangular region, and the width information may be the full width or half width of the rectangular region, which is not limited here. As yet another example, the image region information may include the coordinates of the upper-left corner and the lower-right corner of the rectangular region. Of course, the image region information may also include the coordinates of the upper-right corner and the lower-left corner of the rectangular region, and so on. The embodiments of the present application do not limit the specific content of the image region information.
In addition, in other embodiments of the present application, the image region may have other shapes, for example, a circle, a polygon, a curved shape, and so on. When the image region is a circle, the image region information may include the coordinates of the center of the circle (that is, the coordinates of the center point) and radius information. When the image region is a polygon, for example a regular hexagon, the image region information may include the coordinates of the center point and the distance from the center point to a vertex of the regular hexagon. As can be understood by those skilled in the art, the image region and the image region information can also be obtained in other alternative forms or schemes, which are not listed here one by one.
It should be understood that in the embodiments of the present application, the image region may include multiple sub-image regions. A sub-image region may be a region of the image region in which the pixels have the same attribute. For example, one sub-image region may be the region corresponding to the target object, and another sub-image region may be the region corresponding to the background. As another example, one sub-image region may be the region corresponding to one part of the target object, another sub-image region may be the region corresponding to another part of the target object, and yet another sub-image region may be the region corresponding to the background.
In the embodiments of the present application, attributes may be measured in units of pixels, that is, each pixel corresponds to its own attribute; correspondingly, the pixel information includes information on the attribute of each pixel. Attributes may also be measured in units of pixel blocks; correspondingly, the pixel information includes information on the attributes of at least one pixel block, where a pixel block includes at least two pixels.
A pixel block may be a region with a finer or smaller granularity than the image region. The attribute of a pixel block means that the attributes of all pixels in the pixel block are the attribute of that pixel block. A pixel block may be a regularly shaped block, for example a square or rectangular block; it may also be an irregularly shaped block. A pixel block may include multiple pixels (for example, 2, 4, 9, or 16 pixels). When attributes are measured in units of pixel blocks, the sizes of the pixel blocks may be the same or different. The current image may first be downsampled to obtain the attribute information corresponding to the pixel blocks.
Compared with measuring attributes in units of pixels, measuring attributes in units of pixel blocks can reduce the amount of data stored or transmitted by the encoding device. As can be understood by those skilled in the art, the pixel information can also be obtained in other alternative forms or schemes, which are not listed here one by one.
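The downsampling from per-pixel attributes to per-block attributes can be sketched as follows. The aggregation rule (a block counts as target if any of its pixels is target) is an assumption for illustration; the text above only says that a block's attribute applies to all pixels in the block.

```python
import numpy as np

def block_attributes(mask, block=2):
    """Downsample a per-pixel binary mask to per-block attributes: a block is
    marked as target if any of its pixels is target. Mask height and width
    are assumed divisible by `block` for simplicity."""
    h, w = mask.shape
    blocks = mask.reshape(h // block, block, w // block, block)
    return blocks.max(axis=(1, 3))   # "any pixel set" per block

mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
])
attrs = block_attributes(mask)       # 2x2 block attributes for the 4x4 mask
```

Carrying `attrs` (4 values) instead of `mask` (16 values) is the storage/transmission saving the paragraph above refers to, at the cost of a coarser object boundary.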
Optionally, in some embodiments of the present application, the pixel information may include values assigned to at least one pixel in the image region, where pixels in different sub-image regions are assigned the same or different values. It should be understood that the values of pixels in different sub-image regions of the same image region may be the same or different; for example, if an image region includes two disjoint sub-image regions that are both regions other than the target object, the values assigned to the pixels in these two sub-image regions may be the same or different. The values of pixels in sub-image regions of different image regions may also be the same or different; for example, the values assigned to sub-image regions belonging to the target object in different image regions may be the same or different, and the values assigned to sub-image regions other than the target object in different image regions may be the same or different. Of course, the pixel information may also be represented by non-numeric indicators, which is not limited in the embodiments of the present application.
Optionally, in some embodiments of the present application, the attribute of the at least one pixel may include whether the at least one pixel belongs to the target object. For example, in the pixel information, different values are assigned to the at least one pixel to indicate whether the at least one pixel belongs to the target object.
In one possible implementation, among the at least one pixel, a first portion of pixels is assigned a first value to indicate that the first portion of pixels does not belong to the target object. That is, the pixel information includes the values of the pixels that do not belong to the target object. For example, the image region includes one (or more) sub-image regions that are the target object, and several sub-image regions that are the background not belonging to the target object. The pixel information may include only the attributes of the pixels that do not belong to the target object — in other words, only the values of the pixels that do not belong to the target object, that is, only the attributes or values of the pixels of the several sub-image regions belonging to the background.
In another possible implementation, among the at least one pixel, a second portion of pixels is assigned a second value to indicate that the second portion of pixels belongs to the target object. That is, the pixel information includes the values of the pixels that belong to the target object. For example, the image region includes one (or more) sub-image regions that are the target object, and several sub-image regions that are the background not belonging to the target object. The pixel information may include only the attributes of the pixels that belong to the target object — in other words, only the values of the pixels that belong to the target object, that is, only the attributes or values of the pixels of the one (or more) sub-image regions belonging to the target object.
In yet another possible implementation, among the at least one pixel, a first portion of pixels is assigned a first value to indicate that the first portion of pixels does not belong to the target object, and a second portion of pixels is assigned a second value to indicate that the second portion of pixels belongs to the target object. That is, the pixel information includes the values of all pixels. For example, the image region includes one (or more) sub-image regions that are the target object, and several sub-image regions that are the background not belonging to the target object. The pixel information may include both the attributes of the pixels belonging to the target object and the attributes of the pixels belonging to the background — in other words, both the values of the pixels belonging to the target object and the values of the pixels belonging to the background; that is, both the attributes or values of the pixels of the one (or more) sub-image regions belonging to the target object and those of the several sub-image regions belonging to the background.
In an example where attributes are measured in units of pixels, the pixel information may be represented by a mask. The mask values may be identified by the binary values 0 and 1: in the pixel information, the mask value of a pixel belonging to the target object is 1, and the mask value of a pixel belonging to the background is 0. Take the case where the image region of target object i is a rectangular region as an example, where the image region information of target object i includes the coordinates of the upper-left corner of the rectangular region, the height information of the rectangular region, and the width information of the rectangular region, and the pixel information of target object i is represented by a mask; the specific content of the identification information of target object i may be as follows. As can be understood by those skilled in the art, this content is only illustrative and can also be obtained in other alternative forms or schemes, which are not listed here one by one.
Figure PCTCN2018094716-appb-000001
Here, ar_object_top[i], ar_object_left[i], ar_object_width[i], and ar_object_height[i] indicate the position and size of target object i; ar_object_top[i] and ar_object_left[i] indicate the position of the upper-left corner of target object i; ar_object_width[i] and ar_object_height[i] indicate the width and height of target object i. mask[m][n] indicates the mask value of the pixel whose coordinates are offset by m and n in the vertical and horizontal directions relative to the upper-left corner of the rectangular region. When the pixel belongs to the target object, the value of mask[m][n] is 1; otherwise, when the pixel belongs to the background, the value of mask[m][n] is 0.
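An illustrative reading of these semantics: mask[m][n] == 1 marks a target pixel at vertical/horizontal offset (m, n) from the object's upper-left corner (ar_object_top[i], ar_object_left[i]). The helper below, which maps the mask back to absolute image coordinates, is a sketch for clarity, not part of the bitstream syntax.

```python
def target_pixel_coords(ar_object_top, ar_object_left, mask):
    """Return absolute (row, col) image coordinates of the pixels that the
    mask marks as belonging to the target object."""
    coords = []
    for m, row in enumerate(mask):
        for n, value in enumerate(row):
            if value == 1:   # 1 = target object, 0 = background
                coords.append((ar_object_top + m, ar_object_left + n))
    return coords

mask = [[1, 0],
        [1, 1]]
coords = target_pixel_coords(ar_object_top=10, ar_object_left=20, mask=mask)
```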
In addition, the mask may be identified point by point, or it may be identified by indicating, for each row of the target frame identified by ar_object_top[i], ar_object_left[i], ar_object_width[i], and ar_object_height[i], the starting position of the target object in that row and the length of the target object in that row. The specific method is as follows:
Figure PCTCN2018094716-appb-000002
Here, mask_pos[i][m] indicates the starting position of the i-th object in the m-th row of the target frame, and mask_len[i][m] indicates the length of the i-th object in the m-th row of the target frame.
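The row-wise start/length form can be decoded back into a binary mask as sketched below, assuming a single run per row as in the syntax above (the function name and argument layout are illustrative).

```python
def decode_run_length_mask(width, height, mask_pos, mask_len):
    """Rebuild one object's binary mask from its row-wise run-length form:
    in row m, the object starts at column mask_pos[m] and spans
    mask_len[m] pixels."""
    mask = [[0] * width for _ in range(height)]
    for m in range(height):
        for n in range(mask_pos[m], mask_pos[m] + mask_len[m]):
            mask[m][n] = 1
    return mask

# A 4x2 target frame: row 0 runs over columns 1-2, row 1 over columns 0-2.
mask = decode_run_length_mask(width=4, height=2, mask_pos=[1, 0], mask_len=[2, 3])
```

For objects whose rows are mostly contiguous, this form needs only two values per row instead of one value per pixel.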
FIG. 2 is a schematic diagram of target objects in an image 200 according to an embodiment of the present application. As shown in FIG. 2, the image 200 includes target object 1 and target object 2. Image region 1 corresponding to target object 1 is a rectangular region; image region 2 corresponding to target object 2 is also a rectangular region. In image region 1, pixels with the value 1 belong to target object 1, and pixels with the value 0 do not belong to target object 1. In image region 2, pixels with the value 1 belong to target object 2, and pixels with the value 0 do not belong to target object 2.
Optionally, in some embodiments of the present application, the attribute of the at least one pixel includes the part of the target object to which the at least one pixel belongs. For example, in the pixel information, different pixels are assigned different values to indicate that the different pixels belong to different parts of the target object. Of course, some pixels may also be used to indicate that they do not belong to the target object but to the background.
In a specific example, the target object is a person; a first portion of the at least one pixel is assigned a third value to indicate that the first portion of pixels belongs to the head of the target object; and/or a second portion of the at least one pixel is assigned a fourth value to indicate that the second portion of pixels belongs to the hand of the target object. The at least one pixel may further include a third portion of pixels indicating that the third portion of pixels does not belong to the target object but to the background. For example, the third portion of pixels is assigned 0 to indicate that it does not belong to the target object but to the background; the first portion of pixels is assigned 1 to indicate that it belongs to the head of the target object; and the second portion of pixels is assigned 2 to indicate that it belongs to the hand of the target object.
In another specific example, the target object is a car; a first portion of the at least one pixel is assigned a fifth value to indicate that the first portion of pixels belongs to the front of the target object; and/or a second portion of the at least one pixel is assigned a sixth value to indicate that the second portion of pixels belongs to the rear of the target object. The at least one pixel may further include a third portion of pixels indicating that the third portion of pixels does not belong to the target object but to the background. For example, the third portion of pixels is assigned 0 to indicate that it does not belong to the target object but to the background; the first portion of pixels is assigned 1 to indicate that it belongs to the front of the target object; and the second portion of pixels is assigned 2 to indicate that it belongs to the rear of the target object.
Optionally, in some embodiments of the present application, the attribute of the at least one pixel includes a description feature corresponding to the at least one pixel. For example, the description feature may be point cloud data. In a specific example, the description feature corresponding to the at least one pixel may include at least one of the following: the reflection intensity of the point cloud corresponding to the at least one pixel, the infrared intensity corresponding to the at least one pixel, and the depth value corresponding to the at least one pixel. Depth is a measure of distance, for example the distance to the lens.
The identification information used by the encoding device to identify the target object has been described in detail above. A specific identification scheme will be given below that can both effectively identify the target object and effectively improve identification efficiency while reducing data storage and transmission. The core idea of this specific identification scheme is to identify, by comparing the current image with an encoded image, one or more target objects that have changed relative to the encoded image. The identified objects in the current image may be compared with the identified objects in the encoded image one by one.
Optionally, in some embodiments of the present application, the target object may be an object meeting at least one of the following conditions: an identified object newly added in the current image relative to the encoded image; an identified object whose position has changed in the current image relative to the encoded image; an identified object whose size has changed in the current image relative to the encoded image; an identified object whose pixel information in the image region has changed in the current image relative to the encoded image.
Or, from the perspective of execution steps, the encoding method 100 may further include at least one of the following steps: determining, as a target object, an object to be identified that is newly added in the current image relative to the encoded image; determining, as a target object, an object to be identified whose position and/or size has changed in the current image relative to the encoded image; determining, as a target object, an object to be identified whose pixel information in the image region has changed in the current image relative to the encoded image.
From the perspective of the bitstream data, the identification information of the bitstream data further includes a category flag bit used to indicate at least one of the following cases: the target object is an identified object newly added in the current image relative to the encoded image; the target object is an identified object whose position has changed in the current image relative to the encoded image; the target object is an identified object whose size has changed in the current image relative to the encoded image; the target object is an identified object whose pixel information in the image region has changed in the current image relative to the encoded image. The change of the identified object is indicated by the category flag bit. For example, the category flag bit indicates that the identified object is newly added, or that its position/size or pixel information has changed.
It should be understood that, in the embodiments of the present application, an identified object whose position has changed in the current image relative to the encoded image may mean that the position of the identified object itself has changed, or that the position of the image region where the identified object is located has changed. An identified object whose size has changed in the current image relative to the encoded image may mean that the size of the identified object itself has changed, or that the size of the image region where the identified object is located has changed.
In some implementations, the target object includes an identified object newly added in the current image relative to the encoded image, and the image region information includes the absolute value of the position and the absolute value of the size of the image region where the newly added identified object is located. When an identified object is newly added in the current image relative to the encoded image, both the image region information and the pixel information should be indicated. The image region information may include the absolute value of the position and the absolute value of the size of the image region where the newly added identified object is located.
In some implementations, the target object may include an identified object whose position has changed in the current image relative to the encoded image; then the image region information of the target object (that is, the identified object whose position has changed) includes the absolute value of the position of the image region where the target object is located, or the relative value of the position change. The absolute value of the position refers to the position of the image region where the target object is located in the current image; the relative value of the position change refers to the difference between the position of the image region of the target object in the encoded image and the position of its image region in the current image.
In the above implementation in which the current image has, relative to the encoded image, a target object (that is, an identified object whose position has changed), the size of the image region of the target object in the current image may either have changed or remained unchanged compared with the size of its image region in the decoded image.
In the case of a change, optionally, the image region information of the target object includes the absolute value of the size of the image region where the target object is located, or the relative value of the size change. The absolute value of the size refers to the size of the image region where the target object is located in the current image; the relative value of the size change refers to the difference between the size of the region of the target object in the encoded image and the size of its image region in the current image.
In the unchanged case, optionally, the image region information of the target object includes a flag bit indicating that the size of the image region where the target object is located remains unchanged compared with the decoded image. Optionally, the size of the image region is not encoded in the image region information of the target object in the bitstream data.
In the above implementation in which the current image has, relative to the encoded image, a target object (that is, an identified object whose position has changed), the pixel information of the image region of the target object in the current image may either have changed or remained unchanged compared with the pixels of its image region in the decoded image.
In the case of a change, optionally, the pixel information of the target object includes the absolute value of the attribute of at least one pixel of the image region where the target object is located, or the relative value of the attribute change of at least one pixel. The absolute value of the attribute refers to the attribute of at least one pixel of the image region where the target object is located in the current image; the at least one pixel may refer to all pixels of the image region, or only the portion of pixels whose attributes have changed. The relative value of the attribute change refers to the difference between the value assigned to a pixel of the image region of the target object in the current image and the value assigned to the corresponding pixel of its image region in the encoded image. The relative value may be the difference for every pixel of the image region, or the difference for only the portion of pixels whose attributes have changed.
In the unchanged case, optionally, the image region information of the target object includes a flag bit indicating that the pixel information of the image region where the target object is located remains unchanged compared with the decoded image. Optionally, the pixel information of the target object is not encoded in the bitstream data.
在一些实现方式中,目标物体可以包括当前图像相对已编码图像,尺寸发生变化的标识物体,那么该目标物体(也即该尺寸发生变化的标识物体)的图像区域信息包括该目标物体所在的图像区域的尺寸的绝对值或尺寸变化的相对值。
在上述当前图像相对已编码图像存在目标物体(也即尺寸发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的位置相比在该已解码图像中所在图像区域的位置发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的位置的绝对值或位置变化的相对值。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的位置相比在该已解码图像中保持不变。可选的,码流数据中该目标物体的图像区域信息中不对该图像区域的位置进行编码。
在上述当前图像相对已编码图像存在目标物体(也即位置发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的像素信息相比在该已解码图像中所在图像区域的像素发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的像素信息包括该目标物体所在的图像区域的至少一个像素的属性的绝对值或者至少一个像素的属性变化的相对值。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的像素信息相比在该已编码图像中保持不变。可选地,码流数据中不对该目标物体的像素信息进行编码。
在一些实现方式中,目标物体可以包括当前图像相对已编码图像,所在图像区域的像素信息发生变化的标识物体,那么该目标物体(也即该像素信息发生变化的标识物体)在当前图像所在图像区域的像素信息包括该目标物体所在的图像区域的至少一个像素的属性的绝对值或属性的变化的相对值。
在上述当前图像相对已编码图像存在目标物体(也即像素信息发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的位置相比在该已编码图像中所在图像区域的位置发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的位置的绝对值或位置变化的相对值。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的位置相比在该已编码图像中保持不变。可选地,码流数据中该目标物体的图像区域信息中不对该图像区域的位置进行编码。
在上述当前图像相对已编码图像存在目标物体(也即像素信息发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的尺寸相比在该已编码图像中所在图像区域的尺寸发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的尺寸的绝对值或尺寸变化的相对值。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的尺寸相比在该已编码图像中保持不变。可选地,码流数据中该目标物体的图像区域信息中不对该图像区域的尺寸进行编码。
需要补充说明的是,上面所述的几种实现方式之间至少部分实现方式是可以结合的。例如,对于上述目标物体所在的图像区域的位置和尺寸均保持不变的情况,图像区域信息中可包括标识位,用于指示目标物体所在的图像区域的尺寸和位置相比已编码图像保持不变。应理解,该标识位可以是一个标识位,同时指示尺寸和位置均不变;该标识位也可以包括两个子标识位,分别指示尺寸不变和位置不变。
本领域技术人员可理解地,当目标物体所在的图像区域的位置、目标物体所在的图像区域的尺寸和目标物体在图像区域中的像素三个参数中,仅有一个或两个发生变化时,标识信息中均可以既包括图像区域信息又包括像素信息。这种方案中,标识信息的具体内容可以如下。
Figure PCTCN2018094716-appb-000003
Figure PCTCN2018094716-appb-000004
其中,ar_object_mask_present_flag表示当前图像中是否需要标识物体的mask信息;ar_num_objects_minus1表示当前图像中需要标识的物体的数目;ar_object_idx[i]表示当前图像中第i个需要标识的物体的标号;ar_bounding_box_mask_present_flag[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体是否有标识物体形状的mask;ar_bounding_box_mask_infer_flag[ar_object_idx[i]]表示当标号为ar_object_idx[i]的物体含有mask信息时,该mask值是否来自于之前已编码图像的标号为ar_object_idx[i]的物体的mask;ar_new_object_flag[ar_object_idx[i]]表示当前图像中标号为ar_object_idx[i]的物体是否是新出现的物体;ar_object_bounding_box_update_flag[ar_object_idx[i]]表示当前图像和已编码图像中,标号为ar_object_idx[i]的物体在当前图像中的位置和尺寸是否发生了变化;ar_object_top[ar_object_idx[i]]、ar_object_left[ar_object_idx[i]]、ar_object_width[ar_object_idx[i]]和ar_object_height[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体的位置和尺寸,其中ar_object_top[ar_object_idx[i]]和ar_object_left[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体左上角的位置;ar_object_width[ar_object_idx[i]]和ar_object_height[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体的宽度和高度。mask[m][n]表示相对于矩形区域的左上角,坐标在垂直和水平方向偏移m和n的像素对应的模板值。当像素属于目标物体时,mask[m][n]的值为1;否则,即像素属于背景时,mask[m][n]的值为0。
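为直观说明mask[m][n]的含义,下面给出一个构造并查询mask的纯Python草图。其中object_pixels的输入形式(相对矩形区域左上角的偏移集合)为本文示例的假设:

```python
def build_mask(height, width, object_pixels):
    """构造目标框内的逐点mask:属于目标物体的像素取1,属于背景的像素取0。
    object_pixels为(垂直偏移m, 水平偏移n)的集合。"""
    mask = [[0] * width for _ in range(height)]
    for m, n in object_pixels:
        mask[m][n] = 1
    return mask
```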
此外,对于mask的标识可以采用逐点的标识方法,也可以通过表示所述目标物体在所述以ar_object_top[ar_object_idx[i]]、ar_object_left[ar_object_idx[i]]、ar_object_width[ar_object_idx[i]]和ar_object_height[ar_object_idx[i]]标识的目标框内的每一行的起点位置以及所述目标物体在该行的长度来标识。具体方法如下:
Figure PCTCN2018094716-appb-000005
Figure PCTCN2018094716-appb-000006
其中,mask_pos[ar_object_idx[i]][m]表示第ar_object_idx[i]个物体在所述目标框内第m行的起点位置,mask_len[ar_object_idx[i]][m]表示第ar_object_idx[i]个物体在所述目标框内第m行的长度。
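按行起点与长度的标识方法,可用如下草图在逐点mask与(起点,长度)表示之间转换。此处假设目标物体在每一行内的像素是连续的一段(这正是该表示方法适用的前提):

```python
def mask_to_runs(mask):
    """将逐点mask转换为每行的(起点位置, 长度);该行没有目标像素时记为(0, 0)。"""
    runs = []
    for row in mask:
        if 1 in row:
            start = row.index(1)
            length = sum(row)  # 假设该行目标像素连续
            runs.append((start, length))
        else:
            runs.append((0, 0))
    return runs


def runs_to_mask(runs, width):
    """由每行的(起点, 长度)恢复逐点mask。"""
    return [[1 if start <= n < start + length else 0 for n in range(width)]
            for start, length in runs]
```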
应理解,ar_new_object_flag等可以认为是上文中提到的类别标识位。ar_object_idx[i]是目标物体的标号,也可以叫做目标物体的指示位、编号或索引,用于指示是哪个目标物体。
可选地,在本申请的一些实施例中,码流数据和/或标识信息中还可以包括已编码图像的指示位,用于指示当前参考的是哪个已编码图像。该指示位可以是已编码图像的编号,或者是在编码顺序上距离当前图像的帧数。当然,码流数据和标识信息也可以不包括已编码图像的参考位,而是使用协议规定或默认的前一帧图像或者前N帧图像,作为参考的已编码图像。
可选地,在本申请的另一些实施例中,已编码图像可以是通过以下方法确定的。采用当前图像中的一个或多个目标物体的标号作为搜索条件,从已完成编码的多个图像中,搜索出所包括的目标物体最接近当前图像的图像,作为用作参考的已编码图像。
可选地,在本申请的又一些实施例中,已编码图像可以是通过以下方法确定的。采用当前图像中的目标物体所在的图像区域的位置、尺寸和像素信息这三个参数中的至少一个参数值作为搜索条件,从已完成编码的多个图像中,搜索出与该至少一个参数值最接近的图像,作为用作参考的已编码图像。其中,该目标物体可以为一个或多个。搜索可以基于与当前图像中相同的目标物体的至少一个参数值进行,即当是相同目标物体、并且位置和/或尺寸和/或像素信息最接近时,认为搜索到了用作参考的已编码图像。搜索也可以不基于与当前图像中相同的目标物体,而是仅基于位置、尺寸和像素信息中的至少一个数值进行,即不考虑目标物体是否相同,位置和/或尺寸和/或像素信息最接近时,就认为搜索到了用作参考的已编码图像。
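上述按参数值搜索参考已编码图像的过程可示意如下。其中用各参数绝对差之和作为"接近程度"的度量,该度量与数据结构均为本文示例的假设:

```python
def find_reference(encoded_images, target_box):
    """在已完成编码的图像列表中,搜索目标框的位置、尺寸与target_box
    最接近的一幅图像,返回其索引作为参考的已编码图像。"""
    def distance(box):
        return (abs(box["left"] - target_box["left"]) +
                abs(box["top"] - target_box["top"]) +
                abs(box["width"] - target_box["width"]) +
                abs(box["height"] - target_box["height"]))
    return min(range(len(encoded_images)),
               key=lambda i: distance(encoded_images[i]["box"]))
```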
在无人机的一种应用场景中,无人机会通过云台控制摄像设备,使得人物等目标物体一直保持在画面的中心或者画面的某个特定位置。结合本申请实施例的编码方法,即将目标物体所在的图像区域的中心一直保持在画面的中心或者画面的某个特定位置。在该应用场景中,或者其他目标物体会在图像区域的位置在多帧中保持不变的应用场景中,图像区域可以为矩形区域, 图像区域信息可以包括矩形区域的中心点坐标、矩形区域的高度信息和矩形区域的宽度信息。由于这种应用场景中图像区域的位置在多帧中保持不变,只是图像区域的尺寸和/或图像区域中的像素信息发生变化,因此码流数据中可以不对图像区域信息中图像区域的中心点坐标的具体数值进行编码,而是用标识位来指示其数值不变。
对于上述应用场景,图像区域为矩形区域,图像区域信息包括矩形区域的中心点坐标、矩形区域的高度信息和矩形区域的宽度信息。图像区域信息可以包括标识位,用于指示目标物体所在的图像区域的中心点坐标相比所述已编码图像保持不变。
可选地,在本申请的一些实施例中,标识信息还可以用于标识当前图像相对已编码图像的被移除物体。应理解,本申请各实施例的每个标识物体可以具有唯一的标号或索引。并且,相同的标识物体在不同图像中的标号或索引可以相同。在一些可能的实现方式中,标识信息包括被移除物体的标号信息或被移除物体的位置信息。在一个例子中,被移除物体的具体的标识方案可以如下。
Figure PCTCN2018094716-appb-000007
其中,ar_num_cancel_objects表示当前图像相对于已编码图像不再存在的物体的数目;ar_cancel_object_idx[i]表示上述不再存在的物体的标号。
上文中提到目标物体可以是人、车和公共设施等。可选地,在本申请的一些实施例中,标识信息中还可以包括内容信息,内容信息用于指示目标物体的内容。
在一个例子中,内容信息可以为标签(label)信息。label内可以使用自然语言,直接标明目标物体的内容,所述的自然语言可以采用互联网工程任务组(Internet Engineering Task Force,IETF)请求注解(Request For Comments,RFC)5646标准,即IETF RFC 5646标准来表示。在另一个例子中,内容信息可以为数值。即可以增加一维数值,通过不同的数值来指示目标物体为何种内容。例如,内容信息的数值为1表示目标物体的内容为人;内容信息的数值为2表示目标物体的内容为车。
可选地,在本申请的一些实施例中,所述码流数据还可以包括所述当前图像的图像内容数据。
在一种可能的实现方式中,所述当前图像的图像内容数据包括所述当前图像的参考帧数据以及所述当前图像与所述参考帧之间的残差数据。
图3是本申请提供的一个实施例的解码方法300的示意性流程图。解码方法300由解码设备执行。如图3所示,该解码方法300包括:S310,获取当前图像的码流数据,该码流数据中包括标识信息,该标识信息用于标识该当前图像中的至少一个目标物体,该标识信息包括图像区域信息和像素信息,该图像区域信息包括该目标物体所在的图像区域的位置和尺寸,该像素信息包括该图像区域中的至少一个像素的属性;S320,对该码流数据的至少部分进行解码处理。
本申请实施例提供的解码方法,通过图像区域信息指示目标物体所在的图像区域的位置和尺寸,通过像素信息指示图像区域中的多个像素的属性,从而以更细的粒度来标识目标物体,有利于解码设备更高效更准确地对目标物体执行操作。
本实施例提供的解码方法中,步骤S310中所获取到的当前图像的码流数据可以和本申请提供的编码方法中的码流数据相同,对于步骤S310中的码流数据的解释可以参考上述编码方法中对码流数据的解释。
可选地,在本申请的一些实施例中,至少一个像素的属性可以包括至少一个像素是否属于目标物体。
可选地,在本申请的一些实施例中,图像区域可以包括多个子图像区域,像素信息可以包括赋予图像区域中的至少一个像素的数值;其中,不同子图像区域中的像素赋予不同的数值。
可选地,在本申请的一些实施例中,像素信息中,可以为至少一个像素赋予不同的数值,S320,对该码流数据的至少部分进行解码处理,可以包括:根据码流数据中的像素信息,确定图像区域中的至少一个像素是否属于目标物体。
在一种可能的实现方式中,至少一个像素中,第一部分像素可以被赋予第一数值,根据码流数据中的像素信息,确定图像区域中的至少一个像素是否属于目标物体,可以包括:当码流数据中的像素信息中第一部分像素对应第一数值时,确定第一部分像素不属于目标物体。例如,像素信息中第一部分像素对应0,则第一部分像素不属于目标物体。
在另一种可能的实现方式中,至少一个像素中,第二部分像素可以被赋予第二数值,根据码流数据中的像素信息,确定图像区域中的至少一个像素是否属于目标物体,可以包括:当码流数据中的像素信息中第二部分像素对应第二数值时,确定第二部分像素属于目标物体。例如,像素信息中第二部分像素对应1,则第二部分像素属于目标物体。
应理解,与编码方法类似地,上述两种可能的实现方式可以单独实现,也可以相互结合实现,本申请实施例对此不作限定。
可选地,在本申请的一些实施例中,至少一个像素的属性可以包括至少一个像素所属的目标物体的部位。
在一种可能的实现方式中,像素信息中,不同像素可以被赋予不同的数值,S320对该码流数据的至少部分进行解码处理,可以包括:根据码流数据中的像素信息,确定图像区域中的至少一个像素在目标物体所属的部位。
在一个具体的例子中,目标物体可以是人;至少一个像素中的第一部分像素可以被赋予第三数值,根据码流数据中的像素信息,确定图像区域中的至少一个像素在目标物体所属的部位,可以包括:当码流数据中的像素信息中第一部分像素对应第三数值时,确定第一部分像素属于目标物体的头部;和/或,至少一个像素中的第二部分像素可以被赋予第四数值,根据码流数据中的像素信息,确定图像区域中的至少一个像素在目标物体所属的部位,可以包括:当码流数据中的像素信息中第二部分像素对应第四数值时,确定第二部分像素属于目标物体的手部。
在另一个具体的例子中,目标物体可以是车;至少一个像素中的第一部分像素可以被赋予第五数值,根据码流数据中的像素信息,确定图像区域中的至少一个像素在目标物体所属的部位,可以包括:当码流数据中的像素信息中第一部分像素对应第五数值时,确定第一部分像素属于目标物体的车头;和/或,至少一个像素中的第二部分像素可以被赋予第六数值,根据码流数据中的像素信息,确定图像区域中的至少一个像素属于目标物体的部位,可以包括:当码流数据中的像素信息中第二部分像素对应第六数值时,确定第二部分像素属于目标物体的车尾。
在一种可能的实现方式中,至少一个像素的属性可以包括至少一个像素对应的描述特征。例如,至少一个像素对应的描述特征可以包括以下中的至 少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
可选地,在本申请的一些实施例中,属性以像素块为计量单位,像素信息可以包括至少一个像素块的属性的信息,像素块可以包括至少两个像素。
可选地,在本申请的一些实施例中,码流数据中还可以包括类别标识位。该解码方法300还可以包括:根据类别标识位确定目标物体可以为符合以下情况的至少一种的物体:当前图像相对已解码图像新增的标识物体;当前图像相对已解码图像位置发生变化的标识物体;当前图像相对已解码图像尺寸发生变化的标识物体;当前图像相对已解码图像中图像区域中的像素信息发生变化的标识物体。
在一些实现方式中,目标物体可以包括当前图像相对已解码图像新增的标识物体。图像区域信息可以包括目标物体所在的图像区域的位置的绝对值和尺寸的绝对值。在当前图像相对已解码图像新增标识物体时,图像区域信息和像素信息均应标出。S320对该码流数据的至少部分进行解码处理,可以包括:根据码流数据中的图像区域信息,确定目标物体,即新增的标识物体所在的图像区域的位置和尺寸。
在一些实现方式中,目标物体可以包括当前图像相对已解码图像,位置发生变化的标识物体,那么该目标物体(也即该位置发生变化的标识物体)的图像区域信息包括该目标物体所在的图像区域的位置的绝对值或位置变化的相对值。其中,该位置的绝对值,指的是目标物体在当前图像中所在图像区域的位置;该位置变化的相对值,指的是目标物体在已解码图像中所在图像区域的位置,与该目标物体在当前图像中所在图像区域的位置之间的差值。
在图像区域信息中包括该目标物体所在的图像区域的位置变化的相对值时,S320对该码流数据的至少部分进行解码处理,可以包括:根据所述目标物体在所述已解码图像中所在图像区域的位置,以及所述图像区域的位置变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的位置。例如,解码设备可以确定目标物体在已解码图像中所在图像区域的位置;根据目标物体在已解码图像中所在图像区域的位置,以及目标物体在已解码图像中所在图像区域的位置与该目标物体在当前图像中所在图像区域的位置之间的差值,确定当前图像中目标物体所在的图像区域的位置。
在上述当前图像相对已解码图像存在目标物体(也即位置发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的尺寸相比在该已解码图像中所在图像区域的尺寸发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的尺寸的绝对值或尺寸变化的相对值。其中,该尺寸的绝对值,指的是目标物体在当前图像中所在图像区域的尺寸;该尺寸变化的相对值,指的是目标物体在已解码图像中所在图像区域的尺寸,与该目标物体在当前图像中所在图像区域的尺寸之间的差值。
在图像区域信息中包括该目标物体所在的图像区域的尺寸变化的相对值时,S320对该码流数据的至少部分进行解码处理,可以包括:根据所述目标物体在所述已解码图像中所在图像区域的尺寸,以及所述图像区域的尺寸变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的尺寸。例如,解码设备可以确定目标物体在已解码图像中所在图像区域的尺寸;根据目标物体在已解码图像中所在图像区域的尺寸,以及目标物体在已解码图像中所在图像区域的尺寸与该目标物体在当前图像中所在图像区域的尺寸之间的差值,确定当前图像中目标物体所在的图像区域的尺寸。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的尺寸相比在该已解码图像中保持不变。可选的,码流数据中该目标物体的图像区域信息中不对该图像区域的尺寸进行编码。S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中所在图像区域的尺寸,确定所述目标物体在所述当前图像中所在图像区域的尺寸。即将目标物体在已解码图像中所在图像区域的尺寸确定为其在所述当前图像中所在图像区域的尺寸。
在上述当前图像相对已解码图像存在目标物体(也即位置发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的像素信息相比在该已解码图像中所在图像区域的像素发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的像素信息包括该目标物体所在的图像区域的至少一个像素的属性的绝对值或者至少一个像素的属性变化的相对值。其中,该属性的绝对值,指的是目标物体在当前图像中所在图像区域的至少一个像素的属性;至少一个像素的属性可以是指该图像区域中所有像素的属性的绝对值,也可以是指该图像区域中属性发生变化的部分像素的属性的绝对值。该属性变化的相对值,指的是该目标物体在当前图像中所在图像区域的像素被赋予的数值,与该目标物体在已解码图像中所在图像区域的像素被赋予的数值之间的差值。该相对值可以是该图像区域中所有像素分别对应的差值,也可以是该图像区域中属性发生变化的部分像素对应的差值,即当差值为0时,该差值可以省略。
在像素信息包括该目标物体所在的图像区域的至少一个像素的属性变化的相对值时,S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中的像素信息,以及所述至少一个像素的属性变化的相对值,确定所述目标物体在所述当前图像的像素信息。例如,解码设备可以确定目标物体在已解码图像中所在图像区域的至少一个像素的属性;根据目标物体在已解码图像中所在图像区域的至少一个像素的属性,以及目标物体在已解码图像中所在图像区域的至少一个像素的属性与该目标物体在当前图像中所在图像区域的至少一个像素的属性之间的差值,确定当前图像中目标物体所在的图像区域的至少一个像素的属性。
当像素信息中包括目标物体在当前图像中所在图像区域中属性发生变化的部分像素的信息时,解码设备可以认为其余部分像素的属性没有发生变化。
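按属性变化的相对值解码、且省略差值为0的像素,可示意如下。其中以"偏移坐标→属性值"的字典表示像素属性,这一数据结构为本文示例的假设:

```python
def apply_pixel_deltas(prev_attrs, deltas):
    """解码端示意:prev_attrs为目标物体在已解码图像中各像素的属性值,
    deltas只包含属性发生变化的像素的差值;
    未出现在deltas中的像素视为属性保持不变。"""
    cur = dict(prev_attrs)
    for pos, delta in deltas.items():
        cur[pos] = cur.get(pos, 0) + delta
    return cur
```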
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的像素信息相比在该已解码图像中保持不变。可选地,码流数据中不对该目标物体的像素信息进行编码。相应地,S320对该码流数据的至少部分进行解码处理,还可以包括:根据目标物体在已解码图像中所在图像区域的像素信息,确定目标物体在当前图像所在图像区域的像素信息。
在一些实现方式中,目标物体可以包括当前图像相对已解码图像,尺寸发生变化的标识物体,那么该目标物体(也即该尺寸发生变化的标识物体)的图像区域信息包括该目标物体所在的图像区域的尺寸的绝对值或尺寸变化的相对值。在图像区域信息中包括该目标物体所在的图像区域的尺寸变化的相对值时,S320对该码流数据的至少部分进行解码处理,可以包括:根据所述目标物体在所述已解码图像中所在图像区域的尺寸,以及所述图像区域的尺寸变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的尺寸。
在上述当前图像相对已解码图像存在目标物体(也即尺寸发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的位置相比在该已解码图像中所在图像区域的位置发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的位置的绝对值或位置变化的相对值。在图像区域信息中包括该目标物体所在的图像区域的位置变化的相对值时,S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中所在图像区域的位置,以及所述图像区域的位置变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的位置。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的位置相比在该已解码图像中保持不变。可选的,码流数据中该目标物体的图像区域信息中不对该图像区域的位置进行编码。S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中所在图像区域的位置,确定所述目标物体在所述当前图像中所在图像区域的位置。即将目标物体在已解码图像中所在图像区域的位置确定为其在所述当前图像中所在图像区域的位置。
在上述当前图像相对已解码图像存在目标物体(也即尺寸发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的像素信息相比在该已解码图像中所在图像区域的像素发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的像素信息包括该目标物体所在的图像区域的至少一个像素的属性的绝对值或者至少一个像素的属性变化的相对值。在像素信息包括该目标物体所在的图像区域的至少一个像素的属性变化的相对值时,S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中的像素信息,以及所述至少一个像素的属性变化的相对值,确定所述目标物体在所述当前图像的像素信息。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的像素信息相比在该已解码图像中保持不变。可选地,码流数据中不对该目标物体的像素信息进行编码。相应地,S320对该码流数据的至少部分进行解码处理,还可以包括:根据目标物体在已解码图像中所在图像区域的像素信息,确定目标物体在当前图像所在图像区域的像素信息。
在一些实现方式中,目标物体可以包括当前图像相对已解码图像,所在图像区域的像素信息发生变化的标识物体,那么该目标物体(也即该像素信息发生变化的标识物体)在当前图像所在图像区域的像素信息包括该目标物体所在的图像区域的至少一个像素的属性的绝对值或属性的变化的相对值。
在像素信息包括该目标物体所在的图像区域的至少一个像素的属性变化的相对值时,S320对该码流数据的至少部分进行解码处理,可以包括:根据所述目标物体在所述已解码图像中的像素信息,以及所述至少一个像素的属性变化的相对值,确定所述目标物体在所述当前图像的像素信息。在上述当前图像相对已解码图像存在目标物体(也即像素信息发生变化的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的位置相比在该已解码图像中所在图像区域的位置发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的位置的绝对值或位置变化的相对值。在图像区域信息中包括该目标物体所在的图像区域的位置变化的相对值时,S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中所在图像区域的位置,以及所述图像区域的位置变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的位置。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的位置相比在该已解码图像中保持不变。可选的,码流数据中该目标物体的图像区域信息中不对该图像区域的位置进行编码。S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中所在图像区域的位置,确定所述目标物体在所述当前图像中所在图像区域的位置。即将目标物体在已解码图像中所在图像区域的位置确定为其在所述当前图像中所在图像区域的位置。
在上述当前图像相对已解码图像存在目标物体(也即像素信息发生变化 的标识物体)的实现方式中,可能存在该目标物体在当前图像中所在图像区域的尺寸相比在该已解码图像中所在图像区域的尺寸发生变化或保持不变两种情况。
在发生变化的情况中,可选地,该目标物体的图像区域信息包括该目标物体所在的图像区域的尺寸的绝对值或尺寸变化的相对值。在图像区域信息中包括该目标物体所在的图像区域的尺寸变化的相对值时,S320对该码流数据的至少部分进行解码处理,可以包括:根据所述目标物体在所述已解码图像中所在图像区域的尺寸,以及所述图像区域的尺寸变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的尺寸。
在保持不变的情况中,可选地,该目标物体的图像区域信息包括标识位,用于指示该目标物体所在图像区域的尺寸相比在该已解码图像中保持不变。可选的,码流数据中该目标物体的图像区域信息中不对该图像区域的尺寸进行编码。S320对该码流数据的至少部分进行解码处理,还可以包括:根据所述目标物体在所述已解码图像中所在图像区域的尺寸,确定所述目标物体在所述当前图像中所在图像区域的尺寸。
需要补充说明的是,上面所述的几种实现方式之间至少部分实现方式是可以结合的。
在无人机的一种具体的应用场景中,图像区域可以为矩形区域。图像区域信息可以包括矩形区域的中心点坐标、矩形区域的高度信息和矩形区域的宽度信息。在目标物体所在的图像区域的位置保持不变,且尺寸发生变化的情况下,码流数据中可以不包括图像区域信息中图像区域的中心点坐标的数值,而是用标识位来指示其内容不变。所述图像区域信息还可以包括标识位,用于指示所述目标物体所在图像区域的中心点坐标保持不变。S320对所述码流数据的至少部分进行解码处理,可以包括:根据所述目标物体在所述已解码图像中所在图像区域的中心点坐标,确定所述目标物体在所述图像区域的中心点坐标。解码设备可以根据位置保持不变的标识物体在已解码图像中所在图像区域的中心点坐标,确定位置保持不变的标识物体在图像区域的中心点坐标;根据当前图像的图像区域信息,确定图像区域的高度信息和宽度信息;根据图像区域的中心点坐标,以及图像区域的高度信息和宽度信息,确定位置保持不变的标识物体所在的图像区域。
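该场景下解码端由保持不变的中心点坐标与解码得到的宽高恢复矩形区域,可示意为如下草图(整数坐标的取整约定为本文示例的假设):

```python
def region_from_center(center_x, center_y, width, height):
    """由不变的中心点坐标与当前图像的宽高信息,恢复矩形区域的
    左上角与右下角坐标。"""
    left = center_x - width // 2
    top = center_y - height // 2
    return {"left": left, "top": top,
            "right": left + width, "bottom": top + height}
```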
可选地,在本申请的一些实施例中,标识信息还可以用于标识当前图像相对已解码图像的被移除物体。
在一种可能的实现方式中,标识信息可以包括被移除物体的标号信息或被移除物体的位置信息。
可选地,在本申请的一些实施例中,所述码流数据还可以包括所述当前图像的图像内容数据。S320对所述码流数据的至少部分进行解码处理,可以包括:对所述码流数据中的所述当前图像的图像内容数据进行解码处理。
在一种可能的实现方式中,所述当前图像的图像内容数据包括所述当前图像的参考帧数据以及所述当前图像与所述参考帧之间的残差数据。
可选地,在本申请的一些实施例中,S320对所述码流数据的至少部分进行解码处理,可以包括:对码流数据中的标识信息解码,获得当前图像以及解码后的标识信息。
可选地,在本申请的一些实施例中,S320对所述码流数据的至少部分进行解码处理,可以包括:丢弃标识信息,不对标识信息进行解码。
可选地,在本申请的一些实施例中,标识信息中还可以包括内容信息。S320对所述码流数据的至少部分进行解码处理,可以包括:根据码流数据中的内容信息,确定目标物体的内容。
在一种可能的实现方式中,内容信息可以为label信息。
在另一种可能的实现方式中,内容信息可以为数值。
可选地,在本申请的一些实施例中,图像区域可以为矩形区域。
在一种可能的实现方式中,图像区域信息可以包括矩形区域的任意一角的坐标、矩形区域的高度信息和矩形区域的宽度信息。
或者,图像区域信息可以包括矩形区域的中心点坐标、矩形区域的高度信息和矩形区域的宽度信息。
或者,图像区域信息可以包括矩形区域的左上角坐标和矩形区域的右下角坐标。
或者,图像区域信息可以包括矩形区域的右上角坐标和矩形区域的左下角坐标。
可选地,在本申请的一些实施例中,标识信息可以位于当前图像的辅助强化信息或扩展数据中。
在属性以像素为计量单位的一个例子中,像素信息可以以模板(mask)来表示。模板值可以以二进制数值0和1来标识。像素信息中属于目标物体的像素的模板值为1;属于背景的像素的模板值为0。以目标物体i的图像区域为矩形区域;目标物体i的图像区域信息包括该矩形区域的左上角的坐标、该矩形区域的高度信息和该矩形区域的宽度信息;目标物体i的像素信息以模板来表示为例,对于解码设备,目标物体i的标识信息的具体内容可以如下。本领域技术人员可理解地,该内容仅是示意性的,可以由其他可替换的形式或方案得到,此处不再一一列举。
Figure PCTCN2018094716-appb-000008
其中,ar_object_top[i]、ar_object_left[i]、ar_object_width[i]和ar_object_height[i]表示目标物体i的位置和尺寸,ar_object_top[i]和ar_object_left[i]表示目标物体i的左上角的位置;ar_object_width[i]和ar_object_height[i]表示目标物体i的宽度和高度。mask[m][n]表示相对于矩形区域的左上角,坐标在垂直和水平方向偏移m和n的像素对应的模板值。当所述解码得到的mask_value的值为1时,mask[m][n]的值为1,表示像素属于目标物体i;当解码得到的mask_value的值为0时,mask[m][n]的值为0,表示像素属于背景。
此外,对于mask的标识可以采用逐点的标识方法,也可以通过表示所述目标物体在所述以ar_object_top[i]、ar_object_left[i]、ar_object_width[i]和ar_object_height[i]标识的目标框内的每一行的起点位置以及所述目标物体在该行的长度来标识。具体方法如下:
Figure PCTCN2018094716-appb-000009
其中,mask_pos[i][m]表示第i个物体在所述目标框内第m行的起点位置,mask_len[i][m]表示第i个物体在所述目标框内第m行的长度。
对于解码设备,解码当前图像的标识物体的相关信息可以参考已解码图像的情况。解码设备接收的标识信息的具体内容可以如下。
Figure PCTCN2018094716-appb-000010
Figure PCTCN2018094716-appb-000011
其中,ar_object_mask_present_flag表示当前图像中是否需要标识物体的mask信息;ar_num_cancel_objects表示当前图像相对于已解码图像不再存在的物体的数目;ar_cancel_object_idx[i]表示上述不再存在的物体的标号;ar_num_objects_minus1表示当前图像中需要标识的物体的数目;ar_object_idx[i]表示当前图像中第i个需要标识的物体的标号;ar_bounding_box_mask_present_flag[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体是否有标识物体形状的mask;ar_bounding_box_mask_infer_flag[ar_object_idx[i]]表示当标号为ar_object_idx[i]的物体含有mask信息时,该mask值是否来自于之前已解码图像的标号为ar_object_idx[i]的物体的mask;ar_new_object_flag[ar_object_idx[i]]表示当前图像中标号为ar_object_idx[i]的物体是否是新出现的物体;ar_object_bounding_box_update_flag[ar_object_idx[i]]表示当前图像和已解码图像中,标号为ar_object_idx[i]的物体在当前图像中的位置和尺寸是否发生了变化;ar_object_top[ar_object_idx[i]]、ar_object_left[ar_object_idx[i]]、ar_object_width[ar_object_idx[i]]和ar_object_height[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体的位置和尺寸,ar_object_top[ar_object_idx[i]]和ar_object_left[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体左上角的位置;ar_object_width[ar_object_idx[i]]和ar_object_height[ar_object_idx[i]]表示标号为ar_object_idx[i]的物体的宽度和高度。如果未变化,则ar_object_idx[i]对应的矩形区域的位置、尺寸及像素信息与所述已解码图像中标号为ar_object_idx[i]对应的矩形区域的位置、尺寸及像素信息一致;如果发生了变化,则重新解码ar_object_idx[i]对应的矩形区域的位置、尺寸及像素信息。mask[m][n]表示相对于矩形区域的左上角,坐标在垂直和水平方向偏移m和n的像素对应的模板值。当所述解码得到的mask_value的值为1时,mask[m][n]的值为1,表示像素属于标号为ar_object_idx[i]的物体;当解码得到的mask_value的值为0时,mask[m][n]的值为0,表示像素属于背景。
此外,对于mask的标识可以采用逐点的标识方法,也可以通过表示所述目标物体在所述以ar_object_top[ar_object_idx[i]]、ar_object_left[ar_object_idx[i]]、ar_object_width[ar_object_idx[i]]和ar_object_height[ar_object_idx[i]]标识的目标框内的每一行的起点位置以及所述目标物体在该行的长度来标识。具体方法如下:
Figure PCTCN2018094716-appb-000012
其中,mask_pos[ar_object_idx[i]][m]表示第ar_object_idx[i]个物体在所述目标框内第m行的起点位置,mask_len[ar_object_idx[i]][m]表示第ar_object_idx[i]个物体在所述目标框内第m行的长度。
可选地,在本申请的一些实施例中,码流数据和/或标识信息中还可以包括已解码图像的指示位,用于指示当前参考的是哪个已解码图像。该指示位可以是已解码图像的编号,或者是在解码顺序上距离当前图像的帧数。当然,码流数据和标识信息也可以不包括已解码图像的参考位,而是使用协议规定或默认的前一帧图像或者前N帧图像,作为参考的已解码图像。
以上详细说明了本申请实施例的方法,下面详细说明本申请实施例的编码设备和解码设备。
图4是本申请一个实施例的编码设备400的示意性框图。如图4所示,编码设备400包括:
至少一个存储器410,用于存储计算机可执行指令;
至少一个处理器420,单独或共同地用于:访问所述至少一个存储器410,并执行所述计算机可执行指令,以实施以下操作:
对当前图像进行编码处理,生成码流数据,所述码流数据中包括标识信息,所述标识信息用于标识所述当前图像中的至少一个目标物体,所述标识信息包括图像区域信息和像素信息,所述图像区域信息包括所述目标物体所在的图像区域的位置和尺寸,所述像素信息包括所述图像区域中的至少一个像素的属性。
本申请实施例的编码设备,通过图像区域信息指示目标物体所在的图像区域的位置和尺寸,通过像素信息指示图像区域中的多个像素的属性,从而以更细的粒度来标识目标物体,有利于解码设备更高效更准确地对目标物体执行操作。
在一些实施例中,所述至少一个像素的属性包括所述至少一个像素是否属于所述目标物体。
在一些实施例中,所述图像区域包括多个子图像区域,所述像素信息包括赋予所述图像区域中的至少一个像素的数值;其中,不同所述子图像区域中的像素赋予不同的数值。
在一些实施例中,所述像素信息中,为所述至少一个像素赋予不同的数值,用于指示所述至少一个像素是否属于所述目标物体。
在一些实施例中,所述至少一个像素中,第一部分像素赋予第一数值,来指示所述第一部分像素不属于所述目标物体。
在一些实施例中,所述至少一个像素中,第二部分像素赋予第二数值,来指示所述第二部分像素属于所述目标物体。
在一些实施例中,所述至少一个像素的属性包括所述至少一个像素所属的所述目标物体的部位。
在一些实施例中,所述像素信息中,不同像素赋予不同的数值,用于指示所述不同像素属于所述目标物体的不同部位。
在一些实施例中,所述目标物体是人;
所述至少一个像素中的第一部分像素赋予第三数值,用于指示所述第一部分像素属于所述目标物体的头部;
和/或,
所述至少一个像素中的第二部分像素赋予第四数值,用于指示所述第二部分像素属于所述目标物体的手部。
在一些实施例中,所述目标物体是车;
所述至少一个像素中的第一部分像素赋予第五数值,用于指示所述第一部分像素属于所述目标物体的车头;
和/或,
所述至少一个像素中的第二部分像素赋予第六数值,用于指示所述第二部分像素属于所述目标物体的车尾。
在一些实施例中,所述至少一个像素的属性包括所述至少一个像素对应的描述特征。
在一些实施例中,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
在一些实施例中,所述属性以像素块为计量单位,所述像素信息包括至少一个像素块的属性的信息,所述像素块包括至少两个像素。
在一些实施例中,所述目标物体为符合以下情况的至少一种的物体:
所述当前图像相对已编码图像新增的标识物体;
所述当前图像相对已编码图像位置发生变化的标识物体;
所述当前图像相对已编码图像尺寸发生变化的标识物体;
所述当前图像相对已编码图像中所述图像区域中的像素信息发生变化的标识物体。
在一些实施例中,所述码流数据中还包括用于指示以下至少一种情况的类别标识位:
所述目标物体为所述当前图像相对已编码图像新增的标识物体;
所述目标物体为所述当前图像相对已编码图像位置发生变化的标识物体;
所述目标物体为所述当前图像相对已编码图像尺寸发生变化的标识物体;
所述目标物体为所述当前图像相对已编码图像中所述图像区域中的像素信息发生变化的标识物体。
在一些实施例中,所述目标物体包括所述当前图像相对已编码图像新增的标识物体,所述图像区域信息包括所述新增的标识物体所在的图像区域的位置的绝对值和尺寸的绝对值。
在一些实施例中,所述目标物体包括所述当前图像相对已编码图像、位置发生变化的标识物体;
所述图像区域信息包括所述目标物体所在的图像区域的位置的绝对值或位置变化的相对值。
在一些实施例中,所述图像区域信息中包括标识位,用于指示所述目标物体所在图像区域的尺寸相比已编码图像保持不变。
在一些实施例中,所述目标物体包括所述当前图像相对已编码图像、尺寸发生变化的标识物体;
所述图像区域信息包括所述目标物体所在的图像区域的尺寸的绝对值或尺寸变化的相对值。
在一些实施例中,所述像素信息中包括标识位,用于指示所述目标物体所在图像区域的像素信息相比已编码图像保持不变。
在一些实施例中,所述目标物体包括所述当前图像相对已编码图像、像素信息发生变化的标识物体,所述像素信息包括所述像素信息的绝对值或者所述像素信息变化的相对值。
在一些实施例中,所述像素信息中包括标识位,用于指示所述目标物体所在图像区域的像素信息相比已编码图像发生改变。
在一些实施例中,所述图像区域信息中包括标识位,用于指示所述目标物体所在图像区域的尺寸和/或位置相比已编码图像保持不变。
在一些实施例中,所述图像区域为矩形区域,所述图像区域信息包括所述矩形区域的中心点坐标、所述矩形区域的高度信息和所述矩形区域的宽度信息;
所述图像区域信息包括标识位,用于指示所述目标物体所在图像区域的中心点坐标相比所述已编码图像保持不变。
在一些实施例中,所述标识信息还用于标识所述当前图像相对已编码图像的被移除物体。
在一些实施例中,所述标识信息包括所述被移除物体的标号信息或所述被移除物体的位置信息。
在一些实施例中,所述处理器420还用于:
将所述当前图像相对已编码图像,新增的待标识物体确定为所述目标物体;
将所述当前图像相对已编码图像,位置和/或尺寸发生变化的待标识物体确定为所述目标物体;
将所述当前图像相对已编码图像,图像区域中的像素信息发生变化的待标识物体确定为所述目标物体。
在一些实施例中,所述标识信息中还包括内容信息,所述内容信息用于指示所述目标物体的内容。
在一些实施例中,所述内容信息为标签label信息。
在一些实施例中,所述内容信息为数值。
在一些实施例中,所述图像区域为矩形区域。
在一些实施例中,所述图像区域信息包括所述矩形区域的任意一角的坐标、所述矩形区域的高度信息和所述矩形区域的宽度信息;
或者,
所述图像区域信息包括所述矩形区域的中心点坐标、所述矩形区域的高度信息和所述矩形区域的宽度信息;
或者,
所述图像区域信息包括所述矩形区域的左上角坐标和所述矩形区域的右下角坐标;
或者,
所述图像区域信息包括所述矩形区域的右上角坐标和所述矩形区域的左下角坐标。
在一些实施例中,在所述对当前图像进行编码处理,生成码流数据之前,所述处理器420还可以用于:
对所述当前图像进行图像识别,确定所述目标物体,得到所述目标物体的所述标识信息。
在一些实施例中,所述标识信息位于所述当前图像的辅助强化信息或扩展数据中。
应理解,本申请各实施例的编码设备可以基于模块实现。例如,图5是本申请一个实施例的编码设备500的示意性框图。如图5所示,编码设备500可以包括编码模块510,用来进行编码处理,生成码流数据等。编码设备中的各模块可以用于执行本申请各实施例的方法,此处不再赘述。
图6是本申请一个实施例的解码设备600的示意性框图。如图6所示,解码设备600包括:
至少一个存储器610,用于存储计算机可执行指令;
至少一个处理器620,单独或共同地用于:访问所述至少一个存储器610,并执行所述计算机可执行指令,以实施以下操作:
获取当前图像的码流数据,所述码流数据中包括标识信息,所述标识信息用于标识所述当前图像中的至少一个目标物体,所述标识信息包括图像区域信息和像素信息,所述图像区域信息包括所述目标物体所在的图像区域的位置和尺寸,所述像素信息包括所述图像区域中的至少一个像素的属性;
对所述码流数据的至少部分进行解码处理。
本申请实施例提供的解码设备,通过图像区域信息指示目标物体所在的图像区域的位置和尺寸,通过像素信息指示图像区域中的多个像素的属性,从而以更细的粒度来标识目标物体,有利于解码设备更高效更准确地对目标物体执行操作。
在一些实施例中,所述至少一个像素的属性包括所述至少一个像素是否属于所述目标物体。
在一些实施例中,所述图像区域包括多个子图像区域,所述像素信息包括赋予所述图像区域中的至少一个像素的数值;其中,不同所述子图像区域中的像素赋予不同的数值。
在一些实施例中,所述像素信息中,为所述至少一个像素赋予不同的数值,所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素是否属于所述目标物体。
在一些实施例中,所述处理器620根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素是否属于所述目标物体,可以包括:
当所述码流数据中的像素信息中第一部分像素对应第一数值时,确定所述第一部分像素不属于所述目标物体。
在一些实施例中,所述处理器620根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素是否属于所述目标物体,可以包括:
当所述码流数据中的像素信息中第二部分像素对应所述第二数值时,确定所述第二部分像素属于所述目标物体。
在一些实施例中,所述至少一个像素的属性包括所述至少一个像素所属的所述目标物体的部位。
在一些实施例中,所述像素信息中,不同像素赋予不同的数值,
所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素属于所述目标物体的部位。
在一些实施例中,所述目标物体是人;
所述处理器620根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素在所述目标物体所属的部位,包括:
当所述码流数据中的像素信息中第一部分像素对应所述第三数值时,确定所述第一部分像素属于所述目标物体的头部;
和/或,
所述处理器620根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素在所述目标物体所属的部位,包括:
当所述码流数据中的像素信息中第二部分像素对应所述第四数值时,确定所述第二部分像素属于所述目标物体的手部。
在一些实施例中,所述目标物体是车;
所述处理器620根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素在所述目标物体所属的部位,包括:
当所述码流数据中的像素信息中第一部分像素对应所述第五数值时,确定所述第一部分像素属于所述目标物体的车头;
和/或,
所述至少一个像素中的第二部分像素赋予第六数值,所述处理器620根据所述码流数据中的像素信息,确定所述图像区域中的所述至少一个像素属于所述目标物体的部位,包括:
根据所述码流数据中的像素信息中第二部分像素对应所述第六数值,确定所述第二部分像素属于所述目标物体的车尾。
在一些实施例中,所述至少一个像素的属性包括所述至少一个像素对应的描述特征。
在一些实施例中,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
在一些实施例中,所述属性以像素块为计量单位,所述像素信息包括至少一个像素块的属性的信息,所述像素块包括至少两个像素。
在一些实施例中,所述码流数据中包括类别标识位,所述处理器620还用于:
根据所述类别标识位确定所述目标物体为符合以下情况的至少一种的物体:
所述当前图像相对已解码图像新增的标识物体;
所述当前图像相对已解码图像位置发生变化的标识物体;
所述当前图像相对已解码图像尺寸发生变化的标识物体;
所述当前图像相对已解码图像中所述图像区域中的像素信息发生变化的标识物体。
在一些实施例中,所述目标物体包括所述当前图像相对已解码图像新增的标识物体,所述图像区域信息包括所述目标物体所在的图像区域的位置的绝对值和尺寸的绝对值。
在一些实施例中,所述目标物体包括所述当前图像相对已解码图像位置发生变化的标识物体;
所述图像区域信息包括所述目标物体所在的图像区域的位置的绝对值,
或者,
所述图像区域信息包括所述目标物体所在的图像区域的位置变化的相对值,所述处理器620对所述码流数据的至少部分进行解码处理,可以包括:
根据所述目标物体在所述已解码图像中所在图像区域的位置,以及所述图像区域的位置变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的位置。
在一些实施例中,所述图像区域信息中包括标识位,用于指示所述目标物体所在图像区域的尺寸相比在所述已解码图像中保持不变;
所述处理器620可以对所述码流数据的至少部分进行解码处理,还包括:
根据所述目标物体在所述已解码图像中所在图像区域的尺寸,确定所述目标物体在所述当前图像中所在图像区域的尺寸。
在一些实施例中,所述目标物体包括所述当前图像相对已解码图像尺寸发生变化的标识物体;
所述图像区域信息包括所述图像区域的尺寸的绝对值,
或者,
所述图像区域信息包括所述图像区域的尺寸变化的相对值,所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述目标物体在所述已解码图像中所在图像区域的尺寸,以及所述图像区域的尺寸变化的相对值,确定所述目标物体在所述当前图像中所在图像区域的尺寸。
在一些实施例中,所述像素信息包括标识位,用于指示所述目标物体所在图像区域的像素信息相比所述已解码图像保持不变;
所述处理器620对所述码流数据的至少部分进行解码处理,还包括:
根据所述目标物体在所述已解码图像中所在图像区域的像素信息,确定所述目标物体在所述当前图像所在图像区域的像素信息。
在一些实施例中,所述码流数据包括所述像素信息;
所述处理器620对所述码流数据的至少部分进行解码处理,还包括:
解码所述目标物体在所述当前图像所在图像区域的像素信息。
在一些实施例中,所述码流数据还包括标识位,用于指示所述目标物体所在图像区域的像素信息相比所述已解码图像改变。
在一些实施例中,所述目标物体包括所述当前图像相对已解码图像像素信息发生变化的标识物体;
所述像素信息包括所述至少一个像素的属性的绝对值;
或者,
所述像素信息包括所述至少一个像素的属性变化的相对值,所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述目标物体在所述已解码图像中的像素信息,以及所述至少一个像素的属性变化的相对值,确定所述目标物体在所述当前图像的像素信息。
在一些实施例中,所述图像区域信息还包括标识位,用于指示所述目标物体在所述当前图像所在图像区域相比在所述已解码图像中不变;
所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述目标物体在所述已解码图像中的图像区域信息,确定所述目标物体在当前图像中的图像区域信息。
在一些实施例中,所述图像区域为矩形区域,所述图像区域信息包括所述矩形区域的中心点坐标、所述矩形区域的高度信息和所述矩形区域的宽度信息;
所述图像区域信息还包括标识位,用于指示所述目标物体所在图像区域的中心点坐标保持不变;
所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述目标物体在所述已解码图像中所在图像区域的中心点坐标,确定所述目标物体在所述图像区域的中心点坐标。
在一些实施例中,所述标识信息还用于标识所述当前图像相对已解码图像的被移除物体。
在一些实施例中,所述标识信息包括所述被移除物体的标号信息或所述被移除物体在所述已解码图像中的位置信息。
在一些实施例中,所述处理器620对所述码流数据的至少部分进行解码处理,包括:
对所述码流数据中的所述标识信息解码,获得所述当前图像以及解码后的所述标识信息。
在一些实施例中,所述处理器620对所述码流数据的至少部分进行解码处理,包括:
丢弃所述标识信息,不对所述标识信息进行解码。
在一些实施例中,所述码流数据还包括所述当前图像的图像内容数据;
所述处理器620对所述码流数据的至少部分进行解码处理,包括:
对所述码流数据中的所述当前图像的图像内容数据进行解码处理。
在一些实施例中,所述当前图像的图像内容数据包括所述当前图像的参考帧数据以及所述当前图像与所述参考帧之间的残差数据。
在一些实施例中,所述标识信息中还包括内容信息,
所述处理器620对所述码流数据的至少部分进行解码处理,包括:
根据所述码流数据中的所述内容信息,确定所述目标物体的内容。
在一些实施例中,所述内容信息为标签label信息。
在一些实施例中,所述内容信息为数值。
在一些实施例中,所述图像区域为矩形区域。
在一些实施例中,所述图像区域信息包括所述矩形区域的任意一角的坐标、所述矩形区域的高度信息和所述矩形区域的宽度信息;
或者,
所述图像区域信息包括所述矩形区域的中心点坐标、所述矩形区域的高度信息和所述矩形区域的宽度信息;
或者,
所述图像区域信息包括所述矩形区域的左上角坐标和所述矩形区域的右下角坐标;
或者,
所述图像区域信息包括所述矩形区域的右上角坐标和所述矩形区域的左下角坐标。
在一些实施例中,所述标识信息位于所述当前图像的辅助强化信息或扩展数据中。
应理解,本申请各实施例的解码设备可以基于模块实现。例如,图7是本申请一个实施例的解码设备700的示意性框图。如图7所示,解码设备700可以包括获取模块710,用来获取当前图像的码流数据;还包括解码模块720,用来对所述码流数据的至少部分进行解码处理。解码设备中的各模块可以用于执行本申请各实施例的方法,此处不再赘述。
本申请还提供了一种图像处理方法。图8是本申请一个实施例的图像处理方法800的示意性流程图。如图8所示,该方法800包括以下步骤。
S810,获取当前图像的码流数据,该码流数据中包括标识信息,该标识信息用于标识该当前图像中的至少一个目标物体,该标识信息包括图像区域信息和像素信息,该图像区域信息包括该目标物体所在的图像区域的位置和尺寸,该像素信息包括该图像区域中的至少一个像素的属性。
S820,对该码流数据进行解码,得到该当前图像和该标识信息。
S830,根据该标识信息,对该当前图像进行像素级别处理。
本申请实施例的图像处理方法,通过图像区域信息指示目标物体所在的图像区域的位置和尺寸,通过像素信息指示图像区域中的多个像素的属性,从而以更细的粒度来标识目标物体,解码设备可以更高效更准确地对目标物体执行像素级别处理。
应理解,现有的方案中,由于识别目标物体的运算较为复杂,通常对解码设备的硬件要求较高,解码设备通常为电脑或服务器。本申请实施例的图像处理方法使得目标物体的识别可以放在编码端进行,解码设备仅需要进行后续的图像处理即可。因此,一方面,本申请实施例的图像处理方法可以在手机、平板电脑等平台上实现;另一方面,解码设备的计算资源可以用于更复杂的图像处理,使得解码设备能够呈现出更优质更精美的图像。
在本申请一些实施例中,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述标识信息,更改所述当前图像的显示内容。
在本申请另一些实施例中,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述标识信息,对所述当前图像中的数据信息进行统计。
换而言之,标识信息中可以包括图像区域信息以及更精细的像素信息。解码设备在对当前图像中的一个或多个像素进行显示处理或进行统计时,参考该图像区域信息以及更精细的像素信息,可以减少自身的运算量,能够节省计算资源、降低处理所需要的时间。下面将分别就显示和统计两方面,对本申请实施例的图像处理方法进行更详细地说明。
如前文所描述地,所述至少一个像素的属性包括所述至少一个像素是否属于所述目标物体。在一个实施例中,在像素信息中,为至少一个像素赋予不同的数值,用于指示至少一个像素是否属于目标物体。在一个例子中,至少一个像素中第一部分像素被赋予第一数值,来指示第一部分像素不属于目标物体;和/或,至少一个像素中第二部分像素被赋予第二数值,来指示第二部分像素属于目标物体。
可选地,在本申请的一些实施例中,所述方法800还可以包括:获取第一图像。S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:基于所述标识信息对所述当前图像和所述第一图像进行融合处理,得到第二图像,所述第二图像包括所述当前图像的至少部分内容和所述第一图像的至少部分内容。
应理解,当前图像与第二图像的参数可以相同,例如大小相等,像素数量相同,分辨率相同等。当前图像与第二图像的参数也可以不同,本申请实施例对此不作限定。
在一个实施例中,所述基于所述标识信息对所述当前图像和所述第一图像进行融合处理,可以包括:基于所述标识信息对所述当前图像和所述第一图像进行加权求和,其中所述当前图像中所述目标物体对应的像素的加权值与所述当前图像中除所述目标物体以外的至少部分像素的加权值不同。目标物体对应的像素的加权值较大,除所述目标物体以外的至少部分像素的加权值较小,使得融合处理后得到的第二图像中当前图像的目标物体比非目标物体更突出。此外,还可以采取如下处理:将所述当前图像中的像素与所述第一图像中的像素进行加权求和,所述当前图像中属于所述目标物体的像素的权重大于所述第一图像中的相应位置像素的权重,所述当前图像中不属于所述目标物体的像素的权重小于所述第一图像中的相应像素的权重。
例如,目标物体对应的像素的加权值为0.6,第一图像对应目标物体位置的像素的加权值为0.4;当前图像的除目标物体以外的至少部分像素的加权值为0.2,第一图像对应除目标物体以外的位置的至少部分像素的加权值为0.8。最终的图像效果为当前图像呈半透明状浮在作为背景的第一图像上,并且目标物体在融合的图片中更突出。
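上述按mask加权求和的融合处理可用下面的灰度图草图示意,其中权重0.6/0.2沿用上文的示例取值,图像以二维列表表示(该数据结构为本文示例的假设):

```python
def blend(cur_img, first_img, mask, w_obj=0.6, w_bg=0.2):
    """逐像素加权求和:属于目标物体的像素用较大权重w_obj,
    其余像素用较小权重w_bg;第一图像的权重取(1 - 当前图像的权重)。"""
    height, width = len(cur_img), len(cur_img[0])
    out = [[0.0] * width for _ in range(height)]
    for m in range(height):
        for n in range(width):
            w_cur = w_obj if mask[m][n] else w_bg
            out[m][n] = w_cur * cur_img[m][n] + (1 - w_cur) * first_img[m][n]
    return out
```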
在另一个实施例中,所述基于所述标识信息对所述当前图像和所述第一图像进行融合处理,可以包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素;用所述第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素,得到第二图像。
在一个具体的例子中,所述至少一个像素被赋予不同的数值。所述根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,可以包括:根据所述图像区域信息,将所述目标物体所在的图像区域以外的像素确定为所述当前图像中不属于所述目标物体的像素;当所述像素信息中第一部分像素对应第一数值时,确定所述第一部分像素不属于所述目标物体。由此,解码设备不需要进行复杂的运算,就可以简单地将目标物体的位置、尺寸以及边界等细节清楚的确定出来。
在一个具体的例子中,用所述第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素,得到第二图像,可以包括:将所述当前图像中的像素与所述第一图像中的像素进行加权求和,用第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素。其中,第一图像的部分像素作为背景。例如,将所述当前图像中的像素与所述第一图像中的像素进行加权求和,所述当前图像中属于所述目标物体的像素的权重为1,所述第一图像中的相应位置像素的权重为0;所述当前图像中不属于所述目标物体的像素的权重为0,所述第一图像中的相应位置像素的权重为1。如果像素信息中属于所述目标物体的像素被赋予数值1,不属于所述目标物体的像素被赋予数值0,则像素信息中的数值可以直接对应该像素在融合时的权重。图9A和图9B是本申请实施例的融合得到的两张图像的示意图。图中分别示出了目标物体和背景。
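当mask取值0/1并直接作为权重时,上述替换退化为按mask选像素,可示意如下(图像与mask均以二维列表表示,为本文示例的假设):

```python
def replace_background(cur_img, first_img, mask):
    """mask为1处保留当前图像的像素(目标物体),
    mask为0处用第一图像的相应像素替换(背景)。"""
    return [[cur_img[m][n] if mask[m][n] else first_img[m][n]
             for n in range(len(mask[0]))]
            for m in range(len(mask))]
```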
可选地,在所述根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素之前,所述方法还可以包括:根据所述像素信息,确定所述目标物体的边界,基于所述边界对所述目标物体进行膨胀运算;所述根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,包括:根据所述图像区域信息、所述像素信息和膨胀后的所述目标物体的边界,确定所述当前图像中不属于所述目标物体的像素。应理解,本申请实施例中,膨胀运算是将目标物体的邻域扩张,使得原不属于目标物体的像素变为属于目标物体的像素。具体可以是将目标物体的原边界附近的像素在像素信息中的数值由0变为1,使得这些像素在融合时,不被第一图像中的像素替换,从而起到保护目标物体的作用。
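膨胀运算的一个简化草图如下(取4邻域、膨胀一次;实际实现也可以采用形态学库,邻域与迭代次数均为本文示例的假设):

```python
def dilate(mask):
    """对0/1的mask做一次4邻域膨胀:与目标像素相邻、原值为0的像素
    被置为1,从而扩张目标区域、保护目标物体的边界。"""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for m in range(height):
        for n in range(width):
            if mask[m][n] == 0:
                neighbors = [(m - 1, n), (m + 1, n), (m, n - 1), (m, n + 1)]
                if any(0 <= i < height and 0 <= j < width and mask[i][j] == 1
                       for i, j in neighbors):
                    out[m][n] = 1
    return out
```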
在又一个实施例中,可以根据标识信息,可以将目标物体的像素提取出来用于其他场合。相应地,所述根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中属于所述目标物体的像素,提取属于所述目标物体的像素。
可选地,所述当前图像的拍摄角度与所述第一图像的拍摄角度相同。本申请实施例可以应用于单帧图像中,也可以应用于视频中。例如,该方法被应用于单帧图像的场景可以如下。某人拥有一张在泰山山顶某位置、某拍摄角度拍摄的人物(即目标物体)照片。然而,由于拍摄当天为阴天,并未拍摄到日出。另外有一张第一图像,也是在泰山山顶该位置、该拍摄角度拍摄的景物照片。将该人物照片和该景物照片进行融合,或者进行缩小或放大后进行融合,可以得到非常逼真的融合效果,得到该人沐浴初升的太阳的纪念照片。类似地,该方法可以被应用于视频的场景,得到较好的增强现实(Augmented Reality,AR)效果,这将在下文中进行详细说明。
应理解,当前图像的拍摄角度可以携带在码流数据中或更具体地在标识信息中。相应地,标识信息中还包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度,或者指示当前图像的拍摄角度。在所述用第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素之前,所述方法还可以包括:确定所述目标物体的拍摄角度与所述第一图像的拍摄角度相同。当然,标识信息中也可以不包括视角信息。由解码设备通过其他方式或算法等,确定目标物体相对拍摄装置的拍摄角度,本申请实施例对此不作限定。
可选地,在一些实施例中,S830,根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加增强现实AR特效。
在一个实施例中,所述标识信息中还可以包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度,所述至少一个像素的属性还可以包括所述至少一个像素对应的深度值。根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加AR特效,包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体;根据所述目标物体、所述拍摄角度和所述至少一个像素对应的深度值,对所述目标物体添加AR特效。应理解,AR特效可以是增加图标(例如,箭头、光环等)、文字以及图层等。
应理解,当前图像的拍摄角度和/或至少一个像素对应的深度值等可以携带在码流数据中或更具体地在标识信息中,也可以由解码设备通过其他方式或算法等,确定目标物体相对拍摄装置的拍摄角度。例如,视角信息和/或至少一个像素对应的深度值可以通过无人机的对地姿态计算得到,本申请实施例对此不作限定。
在一个具体的例子中,所述像素信息中,所述至少一个像素被赋予不同的数值。例如,至少一个像素中第一部分像素被赋予第一数值,来指示第一部分像素不属于目标物体;和/或,至少一个像素中第二部分像素被赋予第二数值,来指示第二部分像素属于目标物体。所述根据所述图像区域信息和所述像素信息,确定所述当前图像中所述目标物体的边界,对所述目标物体添加增强现实AR特效,可以包括:当所述像素信息中第二部分像素对应第二数值时,确定所述第二部分像素属于所述目标物体;基于所述目标物体的边界,在所述目标物体上添加指示光环。图10A和图10B是本申请实施例的在目标物体上添加指示光环的示意图。如图10A所示,当指示光环的光亮部分旋转至目标物体前方时,指示光环的光亮部分遮挡目标物体。如图10B所示,当指示光环的光亮部分旋转至目标物体后方时,指示光环的光亮部分被目标物体遮挡。
可选地,在一些实施例中,S830所述根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中属于所述目标物体的像素或所述当前图像中不属于所述目标物体的非目标物体的像素;改变所述目标物体的亮度、颜色和灰度中的至少一种,或改变所述非目标物体的亮度、颜色和灰度中的至少一种,或改变所述目标物体和所述非目标物体的对比度。
具体地,所述改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色,可以包括:通过修改所述目标物体或所述非目标物体的YUV值、RGB值或γ曲线,来改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色。当图像为灰度图像时,可以根据标识信息,改变目标物体的灰度。当需要突出目标物体和非目标物体的对比时,可以提高目标物体和非目标物体之间的对比度。
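按mask改变目标物体亮度的处理可示意如下(以灰度值近似亮度分量,提升幅度delta=50为本文示例的假设取值):

```python
def highlight_object(img, mask, delta=50):
    """提高属于目标物体(mask为1)的像素的亮度,并截断到[0, 255];
    其余像素保持不变。"""
    return [[min(255, img[m][n] + delta) if mask[m][n] else img[m][n]
             for n in range(len(img[0]))]
            for m in range(len(img))]
```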
下面以几个具体的例子来说明改变亮度、颜色、灰度或对比度的情况。在这些例子中,所述像素信息中所述至少一个像素可以被赋予不同的数值。例如,至少一个像素中第一部分像素被赋予第一数值,来指示第一部分像素不属于目标物体;和/或,至少一个像素中第二部分像素被赋予第二数值,来指示第二部分像素属于目标物体。
例如,当所述像素信息中第二部分像素对应第二数值时,确定所述第二部分像素属于所述目标物体;提高所述第二部分像素的亮度。
再如,当所述像素信息中第一部分像素对应第一数值时,确定所述第一部分像素不属于所述目标物体;降低所述第一部分像素的亮度。
又如,当所述像素信息中第一部分像素对应第一数值时,确定所述第一部分像素属于所述目标物体;将所述第一部分像素用预设颜色标识。
又如,当所述像素信息中第一部分像素对应第一数值时,确定所述第一部分像素属于所述目标物体;当所述像素信息中第二部分像素对应第二数值时,确定所述第二部分像素不属于所述目标物体;提高所述第一部分像素和所述第二部分像素之间的对比度。
可选地,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。所述改变所述目标物体的亮度、颜色和灰度中的至少一种,可以包括:当所述目标物体为第一内容类别时,改变所述目标物体的亮度为预设的第一亮度值、改变所述目标物体的颜色为预设的第一颜色值、或改变所述目标物体的灰度为预设的第一灰度值。图11是本申请实施例的改变目标物体的亮度的示意图。例如当前图像中包括多个目标物体,该多个目标物体一部分内容类别是人,另一部分内容类别不是人。可以改变内容类别是人的目标物体的亮度为预设的第一亮度值。如图11所示,内容类别是人的目标物体被高亮或者说突出(pop out)显示,以便于观察者观察。例如,该例子可用于智能摄像头的录像结果回放查看,或实时播放等。
可选地,当所述当前图像中包括多个所述目标物体时,所述改变所述目 标物体的亮度、颜色和灰度中的至少一种,可以包括:根据多个所述目标物体的内容类别,对不同内容类别的所述目标物体赋予不同的亮度值、颜色值或灰度值。即,以不同的亮度、颜色或灰度来标识不同内容类别的目标物体,以便于观察者观察。
可选地,在一些实施例中,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述图像区域信息、所述像素信息和所述内容类别,基于所述当前图像生成物体类别分割图像。物体类别分割图像中,不同内容类别的目标物体可以被赋予不同的颜色。图12A是当前图像的原图,图12B是当前图像对应的物体类别分割图像。如图12B所示,例如,车用蓝色标识,建筑物用灰色标识,大地用紫色标识。此外,人用红色标识,路灯用黄色标识,植物用绿色标识等等,图12B中未标出。
应理解,如前文所述,内容信息可以为标签label或数值。
可选地,在一些实施例中,所述至少一个像素的属性可以包括所述至少一个像素所属的所述目标物体的部位。例如,所述像素信息中所述至少一个像素被赋予不同的数值,以表示所述至少一个像素属于所述目标物体的不同部位。
可选地,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述图像区域信息和所述像素信息,对所述目标物体的不同部位用不同的亮度、颜色或灰度标识,或者不同部位之间具有不同的对比度。下面以目标物体的不同部位用不同的颜色标识为例进行说明。图13是本申请一个实施例的不同部位用不同的颜色标识的图像的示意图。例如,图13左下方的人与包、自行车一起构成目标物体。该目标物体中,人用黄色标识、包用红色标识,自行车用绿色标识。此外,在该图像中,其他内容类型的目标物体,例如汽车用蓝色标识,人用黄色标识。
可选地,所述至少一个像素的属性可以包括所述至少一个像素对应的描述特征。所述至少一个像素对应的描述特征可以包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
可选地,在本申请一些实施例中,所述当前图像中可以包括多个所述目标物体。S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据多个所述目标物体的反射强度、红外强度或深度值,对不同反射强度、红外强度或深度值的所述目标物体赋予不同的亮度值、颜色值和灰度值。以颜色为例,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括以下中的至少一种处理。
在一个实施例的处理中,根据所述图像区域信息、像素信息和所述至少一个像素对应的点云的反射强度,基于所述当前图像生成反射强度分割图像。反射强度分割图像中,可以区分目标物体的各个部分,不同反射强度的部分以不同的颜色标识;也可以不区分目标物体的各个部分,例如对目标物体各个部分的反射强度求平均值(或者反射强度本身就是对应整个目标物体的平均反射强度),一个目标物体以一个单一的颜色标识。不同反射强度的目标物体被赋予不同的颜色。图14A是当前图像的原图,图14B是当前图像对应的反射强度分割图像。如图14B所示,不同反射强度的目标物体被赋予不同的颜色,图14B中不属于任何目标物体的部分,例如背景部分以白色标识。
在一个实施例的处理中,根据所述图像区域信息、像素信息和所述至少一个像素对应的深度值,基于所述当前图像生成深度图。深度图中,可以区分目标物体的各个部分,不同深度值的部分以不同的颜色标识;也可以不区分目标物体的各个部分,例如对目标物体各个部分的深度值求平均值(或者深度值本身就是对应整个目标物体的平均深度值),一个目标物体以一个单一的颜色标识。图15A是当前图像的原图,图15B是当前图像对应的深度图。如图15B所示,不同深度值的像素点被赋予不同的颜色。
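深度图着色的一个最简草图如下,把深度值线性映射到[0, 255]的灰度,越近越亮;线性映射与映射方向均为本文示例的假设:

```python
def depth_to_gray(depth, d_min, d_max):
    """将[d_min, d_max]内的深度值线性映射为灰度值:
    深度越小(越近)灰度越大(越亮)。深度范围退化时返回0。"""
    if d_max == d_min:
        return 0
    ratio = (depth - d_min) / (d_max - d_min)
    return int(round(255 * (1 - ratio)))
```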
类似地,根据所述图像区域信息、所述像素信息和所述至少一个像素对应的红外强度,基于所述当前图像可以生成红外图像,此处不再赘述。
可选地,在本申请一些实施例中,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述标识信息,对所述当前图像中的数据信息进行统计。
可选地,在本申请一些实施例中,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。可以利用内容信息中所包括的目标物体的内容类别进行一些统计。
相应地,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果。
例如,所述根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,可以包括:对所述当前图像中内容类别为人的目标物体进行统计,获得人流量结果和/或人流密度结果。该场景可以用于市政管理部门用于上下班高峰期或节假日的人流管理,或者可以用于商业布局目的,用于统计客流量等。
再如,所述根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,可以包括:对所述当前图像中内容类别为车的目标物体进行统计,获得车流量结果和/或车流密度结果。该场景可以用于交通管理部门用于上下班高峰期或公共交通站点的交通管理等。
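基于内容类别的统计可示意如下。其中假设内容信息以数值编码(如上文示例:1表示人,2表示车),目标物体列表的数据结构为本文示例的假设:

```python
def count_by_category(objects):
    """按内容类别统计目标物体的数量;统计结果可进一步换算为
    人流量/人流密度或车流量/车流密度。"""
    counts = {}
    for obj in objects:
        counts[obj["category"]] = counts.get(obj["category"], 0) + 1
    return counts
```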
可选地,在本申请一些实施例中,S830中的像素级别处理可以是表情识别或动作识别等。
例如,当目标物体是人时,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:当所述像素信息中第一部分像素对应第三数值时,确定所述第一部分像素属于所述目标物体的头部;根据所述目标物体的头部,进行人物表情识别;和/或,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:当所述像素信息中第二部分像素对应第四数值时,确定所述第二部分像素属于所述目标物体的手部;根据所述目标物体的手部,进行手部动作识别。
该场景可以用于无人机领域。解码设备可以基于人物表情识别的结果或手部动作识别的结果,向无人机发送控制指令。例如,当手部摆出“T”形状时,无人机悬空或者返航。再如,当操作者点头时,无人机加速飞行等等。人物表情或手部动作所代表的含义可以是无人机与控制端提前约定好的,本申请实施例对此不再赘述。
可选地,在本申请一些实施例中,S830中的像素级别处理可以涉及交通管理等。例如,所述目标物体是车,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:当所述像素信息中第一部分像素对应第五数值时,确定所述第一部分像素属于所述目标物体的车头;根据所述目标物体的车头,确定所述目标物体的行驶方向;和/或,S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:当所述像素信息中第二部分像素对应第六数值时,确定所述第二部分像素属于所述目标物体的车尾;根据所述目标物体的车尾,确定所述目标物体的行驶方向。通过本申请实施例的方法,可以快速找到逆行车辆,辅助交通警察进行及时处理。
可选地,在本申请一些实施例中,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。S830根据所述标识信息,对所述当前图像进行像素级别处理,可以包括:根据所述描述特征,对所述当前图像中的所述目标物体进行统计,得到统计结果。以深度值为例,统计深度值为某个值的目标物体的数量,或者深度值处于某个范围内的目标物体的数量,以用于距离相关的一些统计,此处不进行一一赘述。
可选地,在统计完成后,所述方法800还可以包括:根据所述统计结果,生成针对所述目标物体的所述统计结果的热力图。即在统计完成后,生成热力图,以用于展示统计结果。
应理解,本申请各实施例中,所述属性可以以像素为计量单位;所述属性也可以以像素块为计量单位,所述像素信息可以包括至少一个像素块的属性的信息,所述像素块包括至少两个像素。
应理解,本申请各实施例中,所述标识信息可以位于所述当前图像的辅助强化信息或扩展数据中。
以上详细说明了本申请实施例的图像处理方法,下面详细说明本申请实施例的图像处理装置。
图16是本申请一个实施例的图像处理装置1600的示意性框图。如图16所示,装置1600包括:
至少一个存储器1610,用于存储计算机可执行指令;
至少一个处理器1620,单独或共同地用于:访问所述至少一个存储器1610,并执行所述计算机可执行指令,以实施以下操作:
获取当前图像的码流数据,所述码流数据中包括标识信息,所述标识信息用于标识所述当前图像中的至少一个目标物体,所述标识信息包括图像区域信息和像素信息,所述图像区域信息包括所述目标物体所在的图像区域的位置和尺寸,所述像素信息包括所述图像区域中的至少一个像素的属性;
对所述码流数据进行解码,得到所述当前图像和所述标识信息;
根据所述标识信息,对所述当前图像进行像素级别处理。
本申请实施例的图像处理装置,通过图像区域信息指示目标物体所在的图像区域的位置和尺寸,通过像素信息指示图像区域中的多个像素的属性, 从而以更细的粒度来标识目标物体,使得图像处理装置可以更高效更准确地对目标物体执行像素级别处理。
应理解,该图像处理装置1600可以为解码设备。现有的方案中,由于识别目标物体的运算较为复杂,通常对解码设备的硬件要求较高,解码设备通常为电脑或服务器。本申请实施例的图像处理方法使得目标物体的识别可以放在编码端进行,解码设备仅需要进行后续的图像处理即可。因此,一方面,本申请实施例的图像处理方法可以在手机、平板电脑等平台上实现;另一方面,解码设备的计算资源可以用于更复杂的图像处理,使得解码设备能够呈现出更优质更精美的图像。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述标识信息,更改所述当前图像的显示内容。
可选地,在一些实施例中,所述至少一个像素的属性包括所述至少一个像素是否属于所述目标物体。
可选地,在一些实施例中,所述处理器1620还用于获取第一图像;所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:基于所述标识信息对所述当前图像和所述第一图像进行融合处理,得到第二图像,所述第二图像包括所述当前图像的至少部分内容和所述第一图像的至少部分内容。
可选地，在一些实施例中，所述处理器1620基于所述标识信息对所述当前图像和所述第一图像进行融合处理，包括：基于所述标识信息对所述当前图像和所述第一图像进行加权求和，其中所述当前图像中所述目标物体对应的像素的加权值与所述当前图像中除所述目标物体以外的至少部分像素的加权值不同。
可选地,在一些实施例中,所述处理器1620基于所述标识信息对所述当前图像和所述第一图像进行融合处理,包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素;用所述第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素,得到第二图像。
可选地，在一些实施例中，所述像素信息中，所述至少一个像素被赋予不同的数值，所述处理器1620根据所述图像区域信息和所述像素信息，确定所述当前图像中不属于所述目标物体的像素，包括：根据所述图像区域信息，将所述目标物体所在的图像区域以外的像素确定为所述当前图像中不属于所述目标物体的像素；当所述像素信息中第一部分像素对应第一数值时，确定所述第一部分像素不属于所述目标物体。
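该背景替换过程可以用下面的示意性 Python 片段表达（假设第一数值取 0，表示该像素不属于目标物体；区域外的像素直接取自第一图像）：

```python
# 示意性示例: 假设第一数值取 0, 表示该像素不属于目标物体
FIRST_VALUE = 0

def replace_background(current, first, region, mask):
    """用第一图像的相应像素替换当前图像中不属于目标物体的像素, 得到第二图像"""
    second = [row[:] for row in first]        # 区域外像素全部来自第一图像
    for i in range(region["h"]):
        for j in range(region["w"]):
            if mask[i][j] != FIRST_VALUE:     # 属于目标物体: 保留当前图像的像素
                y, x = region["y"] + i, region["x"] + j
                second[y][x] = current[y][x]
    return second

current = [[1, 1], [1, 1]]                    # 当前图像(含目标物体)
first = [[8, 8], [8, 8]]                      # 作为新背景的第一图像
second = replace_background(current, first,
                            {"x": 0, "y": 0, "w": 2, "h": 2},
                            [[1, 0], [0, 1]])
```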
可选地,在一些实施例中,所述处理器1620根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素之前,所述处理器1620还用于:根据所述像素信息,确定所述目标物体的边界,基于所述边界对所述目标物体进行膨胀运算;所述处理器1620根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,包括:根据所述图像区域信息、所述像素信息和膨胀后的所述目标物体的边界,确定所述当前图像中不属于所述目标物体的像素。
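其中的膨胀运算可以用如下示意性的 4-邻域实现（简化版本，仅作说明；实际实现可采用形态学库或更大的结构元）：

```python
# 示意性示例: 对目标物体掩码做一次 4-邻域膨胀, 以避免替换背景时在物体边缘产生残缺

def dilate(mask):
    """mask 中 1 表示属于目标物体; 返回膨胀一圈后的新掩码"""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 1:
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        out[ni][nj] = 1
    return out

dilated = dilate([[0, 0, 0],
                  [0, 1, 0],
                  [0, 0, 0]])
```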
可选地,在一些实施例中,所述当前图像的拍摄角度与所述第一图像的拍摄角度相同。
可选地,在一些实施例中,所述标识信息中还包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度;在所述处理器1620用第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素之前,所述处理器1620还用于:确定所述目标物体的拍摄角度与所述第一图像的拍摄角度相同。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加增强现实AR特效。
可选地,在一些实施例中,所述标识信息中还包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度,所述至少一个像素的属性还包括所述至少一个像素对应的深度值;所述处理器1620根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加AR特效,包括:根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体;根据所述目标物体、所述拍摄角度和所述至少一个像素对应的深度值,对所述目标物体添加AR特效。
可选地，在一些实施例中，所述处理器1620根据所述标识信息，对所述当前图像进行像素级别处理，包括：根据所述图像区域信息和所述像素信息，确定所述当前图像中属于所述目标物体的像素或所述当前图像中不属于所述目标物体的非目标物体的像素；改变所述目标物体的亮度、颜色和灰度中的至少一种，或改变所述非目标物体的亮度、颜色和灰度中的至少一种，或改变所述目标物体和所述非目标物体的对比度。
可选地,在一些实施例中,所述处理器1620改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色,包括:通过修改所述目标物体或所述非目标物体的YUV值、RGB值或γ曲线,来改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色。
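以修改 Y（亮度）分量为例，下面的示意性片段仅对掩码标记为目标物体的像素提高亮度并截断到 8 位取值范围（delta 为假设参数；修改 RGB 值或 γ 曲线同理）：

```python
# 示意性示例: 通过修改 Y(亮度)分量, 仅提高目标物体像素的亮度

def brighten_target(y_plane, mask, delta):
    """mask 取 1 的像素亮度加 delta, 并截断到 [0, 255]"""
    return [[min(255, y + delta) if m == 1 else y
             for y, m in zip(y_row, m_row)]
            for y_row, m_row in zip(y_plane, mask)]

y_plane = [[100, 200],
           [250, 100]]
mask = [[1, 0],
        [1, 1]]
result = brighten_target(y_plane, mask, 20)
```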
可选地,在一些实施例中,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。
可选地,在一些实施例中,所述处理器1620改变所述目标物体的亮度、颜色和灰度中的至少一种,包括:当所述目标物体为第一内容类别时,改变所述目标物体的亮度为预设的第一亮度值、改变所述目标物体的颜色为预设的第一颜色值、或改变所述目标物体的灰度为预设的第一灰度值。
可选地,在一些实施例中,所述当前图像中包括多个所述目标物体,所述处理器1620改变所述目标物体的亮度、颜色和灰度中的至少一种,包括:根据多个所述目标物体的内容类别,对不同内容类别的所述目标物体赋予不同的亮度值、颜色值或灰度值。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述图像区域信息、所述像素信息和所述内容类别,基于所述当前图像生成物体类别分割图像。
可选地,在一些实施例中,所述内容信息为标签label或数值。
可选地,在一些实施例中,所述至少一个像素的属性包括所述至少一个像素所属的所述目标物体的部位。
可选地,在一些实施例中,所述像素信息中,所述至少一个像素被赋予不同的数值,以表示所述至少一个像素属于所述目标物体的不同部位。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述图像区域信息和所述像素信息,对所述目标物体的不同部位用不同的亮度、颜色或灰度标识,或者不同部位之间具有不同的对比度。
可选地，在一些实施例中，所述至少一个像素的属性包括所述至少一个像素对应的描述特征。
可选地,在一些实施例中,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括以下中的至少一种:根据所述图像区域信息、所述像素信息和所述至少一个像素对应的点云的反射强度,基于所述当前图像生成反射强度分割图像;根据所述图像区域信息、所述像素信息和所述至少一个像素对应的红外强度,基于所述当前图像生成红外图像;根据所述图像区域信息、所述像素信息和所述至少一个像素对应的深度值,基于所述当前图像生成深度图。
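以深度图为例，下面的示意性片段把区域内每个像素的深度值线性映射为灰度（深度越近越亮；max_depth 为假设的归一化参数，反射强度分割图像和红外图像的生成同理）：

```python
# 示意性示例: 由每个像素的深度值生成灰度深度图, 深度越近灰度越亮

def depth_to_gray(depths, max_depth):
    """将深度值线性映射到 0-255 灰度, 超出 max_depth 的按最远处理"""
    return [[255 - min(255, int(d / max_depth * 255)) for d in row]
            for row in depths]

gray = depth_to_gray([[0.0, 5.0],
                      [10.0, 20.0]], max_depth=10.0)
```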
可选地,在一些实施例中,所述当前图像中包括多个所述目标物体,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据多个所述目标物体的反射强度、红外强度或深度值,对不同反射强度、红外强度或深度值的所述目标物体赋予不同的亮度值、颜色值和灰度值。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述标识信息,对所述当前图像中的数据信息进行统计。
可选地,在一些实施例中,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。
可选地,在一些实施例中,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果。
可选地,在一些实施例中,所述目标物体是人,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:当所述像素信息中第一部分像素对应第三数值时,确定所述第一部分像素属于所述目标物体的头部;根据所述目标物体的头部,进行人物表情识别;
和/或,
所述处理器1620根据所述标识信息，对所述当前图像进行像素级别处理，包括：当所述像素信息中第二部分像素对应第四数值时，确定所述第二部分像素属于所述目标物体的手部；根据所述目标物体的手部，进行手部动作识别。
可选地,在一些实施例中,所述处理器1620还用于:基于人物表情识别的结果或手部动作识别的结果,向无人机发送控制指令。
可选地,在一些实施例中,所述目标物体是车,所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:当所述像素信息中第一部分像素对应第五数值时,确定所述第一部分像素属于所述目标物体的车头;根据所述目标物体的车头,确定所述目标物体的行驶方向;
和/或,
所述处理器1620根据所述标识信息，对所述当前图像进行像素级别处理，包括：当所述像素信息中第二部分像素对应第六数值时，确定所述第二部分像素属于所述目标物体的车尾；根据所述目标物体的车尾，确定所述目标物体的行驶方向。
可选地,在一些实施例中,所述处理器1620根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,包括:对所述当前图像中内容类别为人的目标物体进行统计,获得人流量结果和/或人流密度结果。
可选地,在一些实施例中,所述处理器1620根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,包括:对所述当前图像中内容类别为车的目标物体进行统计,获得车流量结果和/或车流密度结果。
可选地,在一些实施例中,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值;所述处理器1620根据所述标识信息,对所述当前图像进行像素级别处理,包括:根据所述描述特征,对所述当前图像中的所述目标物体进行统计,得到统计结果。
可选地,在一些实施例中,所述处理器1620还用于:根据所述统计结果,生成针对所述目标物体的所述统计结果的热力图。
可选地,在一些实施例中,所述属性以像素块为计量单位,所述像素信息包括至少一个像素块的属性的信息,所述像素块包括至少两个像素。
可选地,在一些实施例中,所述标识信息位于所述当前图像的辅助强化信息或扩展数据中。
应理解，本申请各实施例的图像处理装置可以基于模块实现。例如，图17是本申请一个实施例的图像处理装置1700的示意性框图。如图17所示，图像处理装置1700可以包括获取模块1710，用于获取当前图像的码流数据，所述码流数据中包括标识信息；解码模块1720，用于对该码流数据进行解码，得到该当前图像和该标识信息；处理模块1730，用于根据该标识信息，对该当前图像进行像素级别处理等。图像处理装置1700中的各模块可以用于执行本申请各实施例的图像处理方法，此处不再赘述。
应理解,本申请实施例中提及的处理器可以是中央处理单元(central processing unit,CPU),还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)集成在处理器中。
应注意,本文描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本申请实施例还提供一种计算机可读存储介质,其上存储有指令,当指令在计算机上运行时,使得计算机执行上述各方法实施例的方法。
本申请实施例还提供一种计算机程序,该计算机程序使得计算机执行上述各方法实施例的方法。
本申请实施例还提供一种计算设备,该计算设备包括上述计算机可读存储介质。
本申请实施例可以应用在飞行器,尤其是无人机领域。
应理解,本申请各实施例的电路、子电路、子单元的划分只是示意性的。本领域普通技术人员可以意识到,本文中所公开的实施例描述的各示例的电路、子电路和子单元,能够再行拆分或组合。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(digital video disc,DVD))、或者半导体介质(例如,固态硬盘(solid state disk,SSD))等。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。
应理解，在本申请的各种实施例中，上述各过程的序号的大小并不意味着执行顺序的先后，各过程的执行顺序应以其功能和内在逻辑确定，而不应对本申请实施例的实施过程构成任何限定。
应理解,在本申请实施例中,“与A相应的B”表示B与A相关联,根据A可以确定B。但还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (77)

  1. 一种图像处理方法,其特征在于,包括:
    获取当前图像的码流数据,所述码流数据中包括标识信息,所述标识信息用于标识所述当前图像中的至少一个目标物体,所述标识信息包括图像区域信息和像素信息,所述图像区域信息包括所述目标物体所在的图像区域的位置和尺寸,所述像素信息包括所述图像区域中的至少一个像素的属性;
    对所述码流数据进行解码,得到所述当前图像和所述标识信息;
    根据所述标识信息,对所述当前图像进行像素级别处理。
  2. 根据权利要求1所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述标识信息,更改所述当前图像的显示内容。
  3. 根据权利要求1或2所述的方法,其特征在于,所述至少一个像素的属性包括所述至少一个像素是否属于所述目标物体。
  4. 根据权利要求3所述的方法,其特征在于,所述方法还包括:
    获取第一图像;
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    基于所述标识信息对所述当前图像和所述第一图像进行融合处理,得到第二图像,所述第二图像包括所述当前图像的至少部分内容和所述第一图像的至少部分内容。
  5. 根据权利要求4所述的方法,其特征在于,所述基于所述标识信息对所述当前图像和所述第一图像进行融合处理,包括:
    基于所述标识信息对所述当前图像和所述第一图像进行加权求和，其中所述当前图像中所述目标物体对应的像素的加权值与所述当前图像中除所述目标物体以外的至少部分像素的加权值不同。
  6. 根据权利要求4或5所述的方法,其特征在于,所述基于所述标识信息对所述当前图像和所述第一图像进行融合处理,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素;
    用所述第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素,得到第二图像。
  7. 根据权利要求6所述的方法，其特征在于，所述像素信息中，所述至少一个像素被赋予不同的数值，
    所述根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,包括:
    根据所述图像区域信息,将所述目标物体所在的图像区域以外的像素确定为所述当前图像中不属于所述目标物体的像素;
    当所述像素信息中第一部分像素对应第一数值时,确定所述第一部分像素不属于所述目标物体。
  8. 根据权利要求7所述的方法,其特征在于,在所述根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素之前,所述方法还包括:
    根据所述像素信息,确定所述目标物体的边界,基于所述边界对所述目标物体进行膨胀运算;
    所述根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,包括:
    根据所述图像区域信息、所述像素信息和膨胀后的所述目标物体的边界,确定所述当前图像中不属于所述目标物体的像素。
  9. 根据权利要求4至8中任一项所述的方法,其特征在于,所述当前图像的拍摄角度与所述第一图像的拍摄角度相同。
  10. 根据权利要求9所述的方法,其特征在于,所述标识信息中还包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度;
    在所述用第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素之前,所述方法还包括:
    确定所述目标物体的拍摄角度与所述第一图像的拍摄角度相同。
  11. 根据权利要求3所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加增强现实AR特效。
  12. 根据权利要求11所述的方法,其特征在于,所述标识信息中还包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度,所述至少一个像素的属性还包括所述至少一个像素对应的深度值;
    所述根据所述图像区域信息和所述像素信息，确定所述当前图像中的所述目标物体，对所述目标物体添加AR特效，包括：
    根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体;
    根据所述目标物体、所述拍摄角度和所述至少一个像素对应的深度值,对所述目标物体添加AR特效。
  13. 根据权利要求3所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中属于所述目标物体的像素或所述当前图像中不属于所述目标物体的非目标物体的像素;
    改变所述目标物体的亮度、颜色和灰度中的至少一种,或改变所述非目标物体的亮度、颜色和灰度中的至少一种,或改变所述目标物体和所述非目标物体的对比度。
  14. 根据权利要求13所述的方法,其特征在于,
    所述改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色,包括:
    通过修改所述目标物体或所述非目标物体的YUV值、RGB值或γ曲线,来改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色。
  15. 根据权利要求1、3、13或14所述的方法,其特征在于,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。
  16. 根据权利要求15所述的方法,其特征在于,
    所述改变所述目标物体的亮度、颜色和灰度中的至少一种,包括:
    当所述目标物体为第一内容类别时,改变所述目标物体的亮度为预设的第一亮度值、改变所述目标物体的颜色为预设的第一颜色值、或改变所述目标物体的灰度为预设的第一灰度值。
  17. 根据权利要求15所述的方法,其特征在于,所述当前图像中包括多个所述目标物体,
    所述改变所述目标物体的亮度、颜色和灰度中的至少一种,包括:
    根据多个所述目标物体的内容类别,对不同内容类别的所述目标物体赋予不同的亮度值、颜色值或灰度值。
  18. 根据权利要求15所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息、所述像素信息和所述内容类别,基于所述当前图像生成物体类别分割图像。
  19. 根据权利要求15至18中任一项所述的方法,所述内容信息为标签label或数值。
  20. 根据权利要求1或2所述的方法,其特征在于,所述至少一个像素的属性包括所述至少一个像素所属的所述目标物体的部位。
  21. 根据权利要求1、2或20所述的方法,其特征在于,所述像素信息中,所述至少一个像素被赋予不同的数值,以表示所述至少一个像素属于所述目标物体的不同部位。
  22. 根据权利要求1、2、20或21所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息和所述像素信息,对所述目标物体的不同部位用不同的亮度、颜色或灰度标识,或者不同部位之间具有不同的对比度。
  23. 根据权利要求1至3中任一项所述的方法,其特征在于,所述至少一个像素的属性包括所述至少一个像素对应的描述特征。
  24. 根据权利要求23所述的方法,其特征在于,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
  25. 根据权利要求24所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括以下中的至少一种:
    根据所述图像区域信息、所述像素信息和所述至少一个像素对应的点云的反射强度,基于所述当前图像生成反射强度分割图像;
    根据所述图像区域信息、所述像素信息和所述至少一个像素对应的红外强度,基于所述当前图像生成红外图像;
    根据所述图像区域信息、所述像素信息和所述至少一个像素对应的深度值,基于所述当前图像生成深度图。
  26. 根据权利要求24所述的方法,其特征在于,所述当前图像中包括多个所述目标物体,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据多个所述目标物体的反射强度、红外强度或深度值,对不同反射强度、红外强度或深度值的所述目标物体赋予不同的亮度值、颜色值和灰度值。
  27. 根据权利要求1所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述标识信息,对所述当前图像中的数据信息进行统计。
  28. 根据权利要求1或27所述的方法,其特征在于,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。
  29. 根据权利要求28所述的方法,其特征在于,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果。
  30. 根据权利要求1、27或28所述的方法,其特征在于,所述目标物体是人,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第一部分像素对应第三数值时,确定所述第一部分像素属于所述目标物体的头部;
    根据所述目标物体的头部,进行人物表情识别;
    和/或,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第二部分像素对应第四数值时,确定所述第二部分像素属于所述目标物体的手部;
    根据所述目标物体的手部,进行手部动作识别。
  31. 根据权利要求30所述的方法,其特征在于,所述方法还包括:
    基于人物表情识别的结果或手部动作识别的结果,向无人机发送控制指令。
  32. 根据权利要求1、27或28所述的方法,其特征在于,所述目标物体是车,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第一部分像素对应第五数值时,确定所述第一部分像素属于所述目标物体的车头;
    根据所述目标物体的车头,确定所述目标物体的行驶方向;
    和/或,
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第二部分像素对应第六数值时，确定所述第二部分像素属于所述目标物体的车尾；
    根据所述目标物体的车尾,确定所述目标物体的行驶方向。
  33. 根据权利要求29所述的方法,其特征在于,
    所述根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,包括:
    对所述当前图像中内容类别为人的目标物体进行统计,获得人流量结果和/或人流密度结果。
  34. 根据权利要求29所述的方法,其特征在于,
    所述根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,包括:
    对所述当前图像中内容类别为车的目标物体进行统计,获得车流量结果和/或车流密度结果。
  35. 根据权利要求1或27所述的方法,其特征在于,所述至少一个像素对应的描述特征包括以下中的至少一种:所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值;
    所述根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述描述特征,对所述当前图像中的所述目标物体进行统计,得到统计结果。
  36. 根据权利要求27或35所述的方法,其特征在于,所述方法还包括:
    根据所述统计结果,生成针对所述目标物体的所述统计结果的热力图。
  37. 根据权利要求1至36中任一项所述的方法,其特征在于,所述属性以像素块为计量单位,所述像素信息包括至少一个像素块的属性的信息,所述像素块包括至少两个像素。
  38. 根据权利要求1至37中任一项所述的方法,其特征在于,所述标识信息位于所述当前图像的辅助强化信息或扩展数据中。
  39. 一种图像处理装置,其特征在于,包括:
    至少一个存储器,用于存储计算机可执行指令;
    至少一个处理器,单独或共同地用于:访问所述至少一个存储器,并执行所述计算机可执行指令,以实施以下操作:
    获取当前图像的码流数据,所述码流数据中包括标识信息,所述标识信息用于标识所述当前图像中的至少一个目标物体,所述标识信息包括图像区域信息和像素信息,所述图像区域信息包括所述目标物体所在的图像区域的位置和尺寸,所述像素信息包括所述图像区域中的至少一个像素的属性;
    对所述码流数据进行解码,得到所述当前图像和所述标识信息;
    根据所述标识信息,对所述当前图像进行像素级别处理。
  40. 根据权利要求39所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述标识信息,更改所述当前图像的显示内容。
  41. 根据权利要求39或40所述的装置,其特征在于,所述至少一个像素的属性包括所述至少一个像素是否属于所述目标物体。
  42. 根据权利要求41所述的装置,其特征在于,所述处理器还用于获取第一图像;
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    基于所述标识信息对所述当前图像和所述第一图像进行融合处理,得到第二图像,所述第二图像包括所述当前图像的至少部分内容和所述第一图像的至少部分内容。
  43. 根据权利要求42所述的装置,其特征在于,所述处理器基于所述标识信息对所述当前图像和所述第一图像进行融合处理,包括:
    基于所述标识信息对所述当前图像和所述第一图像进行加权求和，其中所述当前图像中所述目标物体对应的像素的加权值与所述当前图像中除所述目标物体以外的至少部分像素的加权值不同。
  44. 根据权利要求42或43所述的装置,其特征在于,所述处理器基于所述标识信息对所述当前图像和所述第一图像进行融合处理,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素;
    用所述第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素，得到第二图像。
  45. 根据权利要求44所述的装置,其特征在于,所述像素信息中,所述至少一个像素被赋予不同的数值,
    所述处理器根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,包括:
    根据所述图像区域信息,将所述目标物体所在的图像区域以外的像素确定为所述当前图像中不属于所述目标物体的像素;
    当所述像素信息中第一部分像素对应第一数值时,确定所述第一部分像素不属于所述目标物体。
  46. 根据权利要求45所述的装置,其特征在于,在所述处理器根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素之前,所述处理器还用于:
    根据所述像素信息,确定所述目标物体的边界,基于所述边界对所述目标物体进行膨胀运算;
    所述处理器根据所述图像区域信息和所述像素信息,确定所述当前图像中不属于所述目标物体的像素,包括:
    根据所述图像区域信息、所述像素信息和膨胀后的所述目标物体的边界,确定所述当前图像中不属于所述目标物体的像素。
  47. 根据权利要求42至46中任一项所述的装置,其特征在于,所述当前图像的拍摄角度与所述第一图像的拍摄角度相同。
  48. 根据权利要求47所述的装置,其特征在于,所述标识信息中还包括视角信息,所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度;
    在所述处理器用第一图像中的相应像素替换所述当前图像中不属于所述目标物体的像素之前,所述处理器还用于:
    确定所述目标物体的拍摄角度与所述第一图像的拍摄角度相同。
  49. 根据权利要求41所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加增强现实AR特效。
  50. 根据权利要求49所述的装置，其特征在于，所述标识信息中还包括视角信息，所述视角信息用于指示所述目标物体相对拍摄装置的拍摄角度，所述至少一个像素的属性还包括所述至少一个像素对应的深度值；
    所述处理器根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体,对所述目标物体添加AR特效,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中的所述目标物体;
    根据所述目标物体、所述拍摄角度和所述至少一个像素对应的深度值,对所述目标物体添加AR特效。
  51. 根据权利要求41所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息和所述像素信息,确定所述当前图像中属于所述目标物体的像素或所述当前图像中不属于所述目标物体的非目标物体的像素;
    改变所述目标物体的亮度、颜色和灰度中的至少一种,或改变所述非目标物体的亮度、颜色和灰度中的至少一种,或改变所述目标物体和所述非目标物体的对比度。
  52. 根据权利要求51所述的装置,其特征在于,
    所述处理器改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色,包括:
    通过修改所述目标物体或所述非目标物体的YUV值、RGB值或γ曲线,来改变所述目标物体的亮度或颜色,或改变所述非目标物体的亮度或颜色。
  53. 根据权利要求39、41、51或52所述的装置,其特征在于,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。
  54. 根据权利要求53所述的装置,其特征在于,
    所述处理器改变所述目标物体的亮度、颜色和灰度中的至少一种,包括:
    当所述目标物体为第一内容类别时,改变所述目标物体的亮度为预设的第一亮度值、改变所述目标物体的颜色为预设的第一颜色值、或改变所述目标物体的灰度为预设的第一灰度值。
  55. 根据权利要求53所述的装置,其特征在于,所述当前图像中包括多个所述目标物体,
    所述处理器改变所述目标物体的亮度、颜色和灰度中的至少一种,包括:
    根据多个所述目标物体的内容类别,对不同内容类别的所述目标物体赋予不同的亮度值、颜色值或灰度值。
  56. 根据权利要求55所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息、所述像素信息和所述内容类别,基于所述当前图像生成物体类别分割图像。
  57. 根据权利要求53至56中任一项所述的装置,所述内容信息为标签label或数值。
  58. 根据权利要求39或40所述的装置,其特征在于,所述至少一个像素的属性包括所述至少一个像素所属的所述目标物体的部位。
  59. 根据权利要求39、40或58所述的装置,其特征在于,所述像素信息中,所述至少一个像素被赋予不同的数值,以表示所述至少一个像素属于所述目标物体的不同部位。
  60. 根据权利要求39、40、58或59所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述图像区域信息和所述像素信息,对所述目标物体的不同部位用不同的亮度、颜色或灰度标识,或者不同部位之间具有不同的对比度。
  61. 根据权利要求39至41中任一项所述的装置,其特征在于,所述至少一个像素的属性包括所述至少一个像素对应的描述特征。
  62. 根据权利要求61所述的装置，其特征在于，所述至少一个像素对应的描述特征包括以下中的至少一种：所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值。
  63. 根据权利要求62所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括以下中的至少一种:
    根据所述图像区域信息、所述像素信息和所述至少一个像素对应的点云的反射强度,基于所述当前图像生成反射强度分割图像;
    根据所述图像区域信息、所述像素信息和所述至少一个像素对应的红外强度，基于所述当前图像生成红外图像；
    根据所述图像区域信息、所述像素信息和所述至少一个像素对应的深度值,基于所述当前图像生成深度图。
  64. 根据权利要求62所述的装置,其特征在于,所述当前图像中包括多个所述目标物体,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据多个所述目标物体的反射强度、红外强度或深度值,对不同反射强度、红外强度或深度值的所述目标物体赋予不同的亮度值、颜色值和灰度值。
  65. 根据权利要求39所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述标识信息,对所述当前图像中的数据信息进行统计。
  66. 根据权利要求39或65所述的装置,其特征在于,所述标识信息中还包括内容信息,用于指示所述目标物体的内容类别。
  67. 根据权利要求66所述的装置,其特征在于,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果。
  68. 根据权利要求39、65或66所述的装置,其特征在于,所述目标物体是人,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第一部分像素对应第三数值时,确定所述第一部分像素属于所述目标物体的头部;
    根据所述目标物体的头部,进行人物表情识别;
    和/或,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第二部分像素对应第四数值时，确定所述第二部分像素属于所述目标物体的手部；
    根据所述目标物体的手部,进行手部动作识别。
  69. 根据权利要求68所述的装置,其特征在于,所述处理器还用于:
    基于人物表情识别的结果或手部动作识别的结果,向无人机发送控制指令。
  70. 根据权利要求39、65或66所述的装置,其特征在于,所述目标物体是车,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第一部分像素对应第五数值时,确定所述第一部分像素属于所述目标物体的车头;
    根据所述目标物体的车头,确定所述目标物体的行驶方向;
    和/或,
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    当所述像素信息中第二部分像素对应第六数值时，确定所述第二部分像素属于所述目标物体的车尾；
    根据所述目标物体的车尾,确定所述目标物体的行驶方向。
  71. 根据权利要求67所述的装置,其特征在于,
    所述处理器根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,包括:
    对所述当前图像中内容类别为人的目标物体进行统计,获得人流量结果和/或人流密度结果。
  72. 根据权利要求67所述的装置,其特征在于,
    所述处理器根据所述目标物体的内容类别,对所述当前图像中的所述目标物体进行统计,得到统计结果,包括:
    对所述当前图像中内容类别为车的目标物体进行统计,获得车流量结果和/或车流密度结果。
  73. 根据权利要求39或65所述的装置，其特征在于，所述至少一个像素对应的描述特征包括以下中的至少一种：所述至少一个像素对应的点云的反射强度、所述至少一个像素对应的红外强度和所述至少一个像素对应的深度值；
    所述处理器根据所述标识信息,对所述当前图像进行像素级别处理,包括:
    根据所述描述特征,对所述当前图像中的所述目标物体进行统计,得到统计结果。
  74. 根据权利要求65或73所述的装置,其特征在于,所述处理器还用于:
    根据所述统计结果,生成针对所述目标物体的所述统计结果的热力图。
  75. 根据权利要求39至74中任一项所述的装置,其特征在于,所述属性以像素块为计量单位,所述像素信息包括至少一个像素块的属性的信息,所述像素块包括至少两个像素。
  76. 根据权利要求39至75中任一项所述的装置,其特征在于,所述标识信息位于所述当前图像的辅助强化信息或扩展数据中。
  77. 一种计算机可读存储介质,其特征在于,其上存储有指令,当指令在计算机上运行时,使得计算机执行权利要求1至38中任一项所述的方法。
PCT/CN2018/094716 2018-07-05 2018-07-05 图像处理方法和装置 WO2020006739A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880037369.6A CN110720224B (zh) 2018-07-05 2018-07-05 图像处理方法和装置
PCT/CN2018/094716 WO2020006739A1 (zh) 2018-07-05 2018-07-05 图像处理方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/094716 WO2020006739A1 (zh) 2018-07-05 2018-07-05 图像处理方法和装置

Publications (1)

Publication Number Publication Date
WO2020006739A1 true WO2020006739A1 (zh) 2020-01-09

Family

ID=69059747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094716 WO2020006739A1 (zh) 2018-07-05 2018-07-05 图像处理方法和装置

Country Status (2)

Country Link
CN (1) CN110720224B (zh)
WO (1) WO2020006739A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220103846A1 (en) * 2020-09-28 2022-03-31 Alibaba Group Holding Limited Supplemental enhancement information message in video coding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110026607A1 (en) * 2008-04-11 2011-02-03 Thomson Licensing System and method for enhancing the visibility of an object in a digital picture
CN102663375A (zh) * 2012-05-08 2012-09-12 合肥工业大学 H.264中基于数字水印技术的主动目标识别方法
CN103813169A (zh) * 2014-02-19 2014-05-21 北京大学 视频编解码器中可伸缩的对象表示方法和装置
CN107889215A (zh) * 2017-12-01 2018-04-06 重庆邮电大学 基于标识管理的多级定位方法及***
CN108200432A (zh) * 2018-02-03 2018-06-22 王灏 一种基于视频压缩域的目标跟踪技术

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104038798B (zh) * 2014-05-09 2017-12-19 青岛海信电器股份有限公司 一种图像处理的方法、设备及***
US20180270208A1 (en) * 2015-10-09 2018-09-20 Sony Corporation Image processing apparatus and image processing method


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668582A (zh) * 2020-12-31 2021-04-16 北京迈格威科技有限公司 图像识别方法、装置、设备和存储介质
CN114333127A (zh) * 2021-12-09 2022-04-12 中建丝路建设投资有限公司 一种基于幸福林带的智慧化服务方法及***
CN114333127B (zh) * 2021-12-09 2023-08-04 中建丝路建设投资有限公司 一种智慧化服务方法及***
CN114445683A (zh) * 2022-01-29 2022-05-06 北京百度网讯科技有限公司 属性识别模型训练、属性识别方法、装置及设备

Also Published As

Publication number Publication date
CN110720224B (zh) 2021-12-17
CN110720224A (zh) 2020-01-21


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18925611

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18925611

Country of ref document: EP

Kind code of ref document: A1