CN111801703A - Hardware and system for bounding box generation for an image processing pipeline - Google Patents

Hardware and system for bounding box generation for an image processing pipeline

Info

Publication number
CN111801703A
CN111801703A (application CN201980016252.4A)
Authority
CN
China
Prior art keywords
bounding box
pixel
image
bounding
connected components
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980016252.4A
Other languages
Chinese (zh)
Inventor
A·F·加里多
J·科鲁兹-阿尔布雷克特
T·J·德罗西耶
S·劳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HRL Laboratories LLC
Original Assignee
HRL Laboratories LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HRL Laboratories LLC filed Critical HRL Laboratories LLC
Publication of CN111801703A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30232 - Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A system for bounding box generation is described. The system operates on an image made up of a plurality of pixels, each having a one-bit value. Bounding boxes are generated around connected components in the image, the connected components having pixel coordinates and pixel count information. Based on the pixel coordinates and the pixel count information, a ranking score is generated for each bounding box. The bounding boxes are filtered based on the pixel coordinates and the pixel count information to remove bounding boxes that exceed a predetermined size and a predetermined pixel count. The bounding boxes are also filtered to remove bounding boxes below a predetermined ranking score, resulting in remaining bounding boxes. Finally, a device may be controlled or otherwise operated based on the remaining bounding boxes.

Description

Hardware and system for bounding box generation for an image processing pipeline
Government rights and interests
The invention was made with government support under U.S. government contract No. HR0011-13-C-0052, "Revolutionary Analog Probabilistic Devices for Unconventional Processing of Signals for Intelligent Data Exploitation" (RAPID-UPSIDE). The government has certain rights in the invention.
Cross Reference to Related Applications
This application is a continuation-in-part of U.S. application No. 15/272,247, filed September 21, 2016, which is a non-provisional application of U.S. provisional application No. 62/221,550, filed September 21, 2015, which is incorporated herein by reference in its entirety.
U.S. application No. 15/272,247 is also a continuation-in-part of U.S. application No. 15/079,899, filed March 24, 2016, which is a non-provisional application of U.S. provisional application No. 62/137,665, filed March 24, 2015, the entire contents of which are incorporated herein by reference. U.S. application No. 15/079,899 is also a non-provisional application of U.S. provisional application No. 62/155,355, filed April 30, 2015, which is incorporated herein by reference in its entirety.
U.S. application No. 15/272,247 is also a continuation-in-part of U.S. application No. 15/043,478, filed February 12, 2016, which is hereby incorporated by reference in its entirety.
U.S. application No. 15/272,247 is also a continuation-in-part of U.S. application No. 15/203,596, filed July 6, 2016, which is a non-provisional application of U.S. provisional application No. 62/221,550, filed September 21, 2015.
This application also claims the benefit of U.S. provisional application No. 62/659,129, filed April 17, 2018, the entire contents of which are incorporated herein by reference.
Background
(1) Field of the invention
The present invention relates to an image processing system, and more particularly, to a system for generating a bounding box in an image for image processing.
(2) Description of Related Art
Image processing is used in a variety of implementations, including tracking and monitoring applications. In tracking or monitoring, a bounding box is used to identify an object and, ideally, to track the object across image frames and scenes. A bounding box may be formed by setting a box around a connected component. For example, Walczyk et al. describe performing connected component labeling of binary images (see Robert Walczyk, Alistair Armitage, and T. D. Binnie, "Comparative Study of Connected Component Labelling Algorithms for Embedded Video Processing Systems," Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV) (CSREA, 2010), the entire contents of which are incorporated herein by reference). Although Walczyk et al. disclose performing connected component labeling, that work is directed only to the labeling of images and does not go further to process frames or images efficiently.
Therefore, a continuing need exists for a system that generates bounding boxes while efficiently calculating the bounding box coordinates and object pixel counts, to facilitate subsequent sorting and filtering of the object boxes for image processing.
Disclosure of Invention
The present disclosure provides a system for bounding box generation. In various aspects, the system includes a memory and one or more processors. The memory has executable instructions such that, when executed, the one or more processors perform operations of: receiving an image made up of a plurality of pixels having a one-bit value per pixel; generating bounding boxes around connected components in the image, the connected components having pixel coordinates and pixel count information; generating a ranking score for each bounding box based on the pixel coordinates and the pixel count information; filtering the bounding boxes based on the pixel coordinates and the pixel count information to remove bounding boxes that exceed a predetermined size and a predetermined pixel count; filtering the bounding boxes to remove bounding boxes below a predetermined ranking score, thereby resulting in remaining bounding boxes; and controlling a device based on the remaining bounding boxes.
In another aspect, the processor is a Field Programmable Gate Array (FPGA).
In yet another aspect, generating the bounding box further comprises the operations of: grouping consecutive pixels in the image; and merging connected pixels into connected components, wherein the bounding box is formed by a box surrounding the connected components.
Additionally, controlling the device includes moving a video platform to maintain at least one of the remaining bounding boxes within a field of view of the video platform.
Finally, the present invention also includes a computer program product and a computer-implemented method. The computer program product includes computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors such that, when the instructions are executed, the one or more processors perform the operations listed herein. Alternatively, a computer-implemented method includes acts that cause a computer to execute such instructions and perform the resulting operations.
Drawings
The objects, features and advantages of the present invention will become apparent from the following detailed description of the various aspects of the invention, when taken in conjunction with the following drawings, in which:
FIG. 1 is a block diagram illustrating components of a system in accordance with various embodiments of the invention;
FIG. 2 is a diagram of a computer program product embodying an aspect of the present invention;
FIG. 3 is a flow diagram illustrating the relationship between variables and arrays during preparation according to various embodiments of the invention;
FIG. 4 is a diagram of a search block for finding pixel marker values, according to various embodiments of the present invention;
FIG. 5 is a flow diagram illustrating a search/label process according to various embodiments of the invention;
FIG. 6A is a diagram illustrating partial images and corresponding markers according to various embodiments of the invention;
FIG. 6B is a diagram illustrating partial images and corresponding markers according to various embodiments of the invention;
FIG. 6C is a diagram of a full image and corresponding markers as partially shown in FIGS. 6A and 6B, in accordance with various embodiments of the present invention;
FIG. 7 is a flow diagram illustrating merging regions according to various embodiments of the invention;
FIG. 8 is a flow diagram illustrating state transitions in accordance with various embodiments of the invention;
FIG. 9A is a flow diagram illustrating state 1 according to various embodiments of the present invention;
FIG. 9B is an example of state 2 code in accordance with various embodiments of the invention;
FIG. 10 is a flow diagram illustrating an incrementer according to various embodiments of the invention;
FIG. 11 is a flow diagram illustrating state 2 according to various embodiments of the present invention;
FIG. 12 is a diagram of a current label module, according to various embodiments of the present invention;
FIG. 13 is a flow diagram illustrating state 3 according to various embodiments of the present invention;
FIG. 14 is a flow chart illustrating state 4, state 5, and state 6 according to various embodiments of the invention;
FIG. 15 is a flow diagram illustrating state 7 and callback operations according to various embodiments of the invention;
FIG. 16 is a flow diagram illustrating state 7 and sort operations according to various embodiments of the present invention;
FIG. 17 is a flow diagram illustrating state 7 and the sort operation in the sorting modules according to various embodiments of the present invention;
FIG. 18 is a diagram illustrating an example input image in which each pixel location has a one-bit value, according to various embodiments of the invention;
FIG. 19 is a diagram illustrating an image with a bounding box resulting after bounding box processing and filtering in accordance with various embodiments of the present invention; and
FIG. 20 is a block diagram illustrating control of a device according to various embodiments.
Detailed Description
The present invention relates to an image processing system, and more particularly, to a system for generating a bounding box in an image for image processing. The following description is presented to enable any person skilled in the art to make and use the invention and is incorporated in the context of a particular application. Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide variety of aspects. Thus, the present invention is not intended to be limited to the aspects shown, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without limitation to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
The reader is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
Furthermore, any element in the claims that does not explicitly recite "means for" performing a specified function, or "step for" performing a specified function, is not to be construed as a "means" or "step" clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of "step of" or "act of" in the claims herein is not intended to invoke the provisions of 35 U.S.C. Section 112, Paragraph 6.
Before describing the invention in detail, a description of the various principal aspects of the present invention is first provided. Next, an introduction provides the reader with a general understanding of the present invention. Finally, specific details of various embodiments of the invention are provided to give an understanding of the specific aspects.
(1) Main aspects of the invention
Various embodiments of the present invention include three "principal" aspects. The first is a system for image processing. The system is typically in the form of a computer system operating software, in the form of a "hard-coded" instruction set, or as a field programmable gate array (FPGA). This system may be incorporated into a wide variety of devices that provide different functionalities. The second principal aspect is a method, typically in the form of software, operated using a data processing system (computer). The third principal aspect is a computer program product. The computer program product generally represents computer-readable instructions stored on a non-transitory computer-readable medium such as an optical storage device (e.g., a compact disc (CD) or a digital versatile disc (DVD)) or a magnetic storage device (e.g., a floppy disk or magnetic tape). Other non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories. These aspects will be described in more detail below.
FIG. 1 provides a block diagram illustrating an example of the system of the present invention (i.e., computer system 100). The computer system 100 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm. In one aspect, certain processes and steps discussed herein are implemented as a series of instructions (e.g., a software program) residing in a computer readable memory unit and executed by one or more processors of the computer system 100. When executed, the instructions cause the computer system 100 to perform particular actions and exhibit particular behaviors, such as those described herein.
Computer system 100 may include an address/data bus 102 configured to communicate information. In addition, one or more data processing units, such as a processor 104 (or multiple processors), are coupled to the address/data bus 102. The processor 104 is configured to process information and instructions. In one aspect, the processor 104 is a microprocessor. Alternatively, the processor 104 may be a different type of processor, such as a parallel processor, an Application Specific Integrated Circuit (ASIC), a Programmable Logic Array (PLA), a Complex Programmable Logic Device (CPLD), or a Field Programmable Gate Array (FPGA), configured to perform the operations described herein.
Computer system 100 is configured to utilize one or more data storage units. The computer system 100 may include a volatile memory unit 106 (e.g., random access memory ("RAM"), static RAM, dynamic RAM, etc.) coupled to the address/data bus 102, wherein the volatile memory unit 106 is configured to store information and instructions for the processor 104. The computer system 100 may also include a non-volatile memory unit 108 (e.g., read only memory ("ROM"), programmable ROM ("PROM"), erasable programmable ROM ("EPROM"), electrically erasable programmable ROM ("EEPROM"), flash memory, etc.) coupled to the address/data bus 102, wherein the non-volatile memory unit 108 is configured to store static information and instructions for the processor 104. Alternatively, the computer system 100 may execute instructions retrieved from an online data storage unit, such as in "cloud" computing. In one aspect, computer system 100 may also include one or more interfaces (such as interface 110) coupled to address/data bus 102. The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wired communication techniques (e.g., serial cable, modem, network adapter, etc.) and/or wireless communication techniques (e.g., wireless modem, wireless network adapter, etc.).
In one aspect, computer system 100 may include an input device 112 coupled to address/data bus 102, wherein input device 112 is configured to communicate information and command selections to processor 104. According to one aspect, the input device 112 is an alphanumeric input device (such as a keyboard) that may include alphanumeric and/or function keys. Alternatively, input device 112 may be an input device other than an alphanumeric input device. In one aspect, the computer system 100 may include a cursor control device 114 coupled with the address/data bus 102, wherein the cursor control device 114 is configured to communicate user input information and/or command selections to the processor 104. In one aspect, cursor control device 114 is implemented using a device such as a mouse, trackball, trackpad, optical tracking device, or touch screen. Nonetheless, in one aspect, cursor control device 114 is directed and/or activated via input from input device 112, such as in response to using special keys and key sequence commands associated with input device 112. In an alternative aspect, cursor control device 114 is configured to be guided or directed by voice commands.
In one aspect, computer system 100 may also include one or more optional computer usable data storage devices (such as storage device 116) coupled to address/data bus 102. Storage device 116 is configured to store information and/or computer-executable instructions. In one aspect, the storage device 116 is a storage device such as a magnetic or optical disk drive (e.g., a hard disk drive ("HDD"), a floppy disk, a compact disk read only memory ("CD-ROM"), a digital versatile disk ("DVD")). In accordance with one aspect, a display device 118 is coupled with the address/data bus 102, wherein the display device 118 is configured to display video and/or graphics. In one aspect, display device 118 may include a cathode ray tube ("CRT"), a liquid crystal display ("LCD"), a field emission display ("FED"), a plasma display, or any other display device suitable for displaying video and/or graphical images, as well as alphanumeric characters recognizable to a user.
The computer system 100 presented herein is an example computing environment in accordance with an aspect. However, non-limiting examples of computer system 100 are not strictly limited to a computer system. For example, one aspect provides that computer system 100 represents a type of data processing analysis that can be used in accordance with various aspects described herein. Other computing systems may also be implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in one aspect, computer-executable instructions, such as program modules, executed by a computer are used to control or implement one or more operations of various aspects of the present technology. In one implementation, such program modules include routines, programs, objects, components, and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, one aspect provides for implementing one or more aspects of the technology through the use of one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer storage media including memory-storage devices.
FIG. 2 illustrates a diagram of a computer program product (i.e., a storage device) embodying the present invention. The computer program product is shown as a floppy disk 200 or an optical disk 202 such as a CD or DVD. However, as previously mentioned, the computer program product generally represents computer readable instructions stored on any compatible non-transitory computer readable medium. The term "instructions" as used in relation to the present invention generally indicates a set of operations to be performed on a computer and may represent a fragment of an entire program or a single separable software module. Non-limiting examples of "instructions" include computer program code (source code or object code) and "hard-coded" electronic devices (i.e., computer operations encoded into a computer chip). "instructions" are stored on any non-transitory computer readable medium, such as on a floppy disk, CD-ROM, and flash drive or in the memory of a computer. In either case, the instructions are encoded on a non-transitory computer readable medium.
(2) Introduction to
The present disclosure provides a system and corresponding hardware implementation for bounding box generation for an image processing pipeline. In various aspects, the system is implemented on a field programmable gate array (FPGA) that receives a binary image, detects object pixels in the binary image, and generates bounding boxes around connected components. The system implements a connected component labeling method that groups contiguous pixels found throughout the image and then merges the connected pixels to create unique box locations. These unique boxes are saved as single units containing the bounding box coordinates and a count of the contained object pixels. The system also calculates a ranking score, based on the height and width of the bounding box and the number of contained object pixels, which is used for subsequent filtering of objects by size and aspect ratio. This processing is intended to provide the bounding box information while minimizing FPGA resources and achieving sufficient throughput to keep up with the required input image frame rate (e.g., 30 frames per second).
While performing connected component labeling of a binary image, the system simultaneously records the bounding box coordinates and the number of object pixels detected for each bounding box. This additional information is used for subsequent sorting and filtering of objects by size and aspect ratio, and it is collected without significant additional computation time or hardware resources. In addition, the design of the present invention is optimized to minimize both FPGA utilization and computation time. This advantage allows the present invention to be used as part of a small-size, light-weight, and low-power (SWAP) image processing pipeline running at high image frame rates (e.g., greater than 30 frames per second), such as that described in U.S. application No. 15/272,247. Use of the process of the present disclosure simplifies the image processing pipeline by reducing the computations needed for particular objects detected across the entire image.
The systems and processes described herein may be implemented as a key component of a low-SWAP image processing pipeline. Furthermore, the system and process may be applied in a variety of implementations, including unmanned autonomous vehicles and platforms with severely limited SWAP. By quickly detecting task-relevant targets and obstacles in hardware near the sensor, the present invention can improve task responsiveness and reduce the amount of raw sensor data that must be transmitted over constrained communication bandwidths. Further, the system and process may be used for both active safety and autonomous driving applications. By performing object detection in low-power, low-cost hardware near the camera, an automobile may detect obstacles on the road more quickly and robustly, providing a more timely warning to the driver or a more rapid automatic response to the obstacle in an autonomous vehicle. Additional details are provided below.
(3) Details of various embodiments
(3.1) Introduction to the bounding box
Bounding box generation is a method by which the system accepts a matrix of single-bit data as an input image and uses it as the basis for creating an array of boxes. Each "box" contains the coordinates of two x-positions, two y-positions, and a valid pixel count. As one example, a set of box data from the bounding box array might contain x-positions (Xmin = 100, Xmax = 150) and y-positions (Ymin = 80, Ymax = 90), with a pixel count of 70. This would be a "box" 10 pixels high and 50 pixels wide containing 70 active pixels. Other sets of pixels are placed in boxes and assigned their respective x-positions, y-positions, and valid pixel counts. What separates one box from another is the proximity and separation of the active pixels. This distinction is further described below with respect to the software and hardware implementations, respectively.
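For illustration, the worked example above can be captured in a minimal software sketch; the field and property names are assumptions for illustration, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """One entry of the bounding box array: two x-positions, two
    y-positions, and the count of valid (value-1) pixels inside."""
    xmin: int
    xmax: int
    ymin: int
    ymax: int
    pixel_count: int

    @property
    def width(self) -> int:
        return self.xmax - self.xmin

    @property
    def height(self) -> int:
        return self.ymax - self.ymin

# The example box from the text: 50 pixels wide, 10 pixels high,
# containing 70 active pixels.
box = Box(xmin=100, xmax=150, ymin=80, ymax=90, pixel_count=70)
assert box.width == 50 and box.height == 10
```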
(3.2) Software bounding box design in Matlab
Any suitable software product may be used to implement the bounding box processing. As one example, the bounding box is implemented in Matlab. The software-based bounding box design can be divided into three distinct parts: preparation, search/label, and merge regions. Each part is later converted for implementation in the hardware design.
(3.2.1) Preparation
The preparation process is a simple instantiation and initialization of variables and arrays to their default values. The initialized variables are the region count, the previous y-position, and the current y-position. The initialized arrays are "Image", "Labeled Image", "Merge to Region", and "Bounding Box Data". The region count is used as the "ticket" value for labeling found pixels. The previous/current y-positions are used as references to reduce the size of the labeled image matrix. The reduced overall size of the labeled image allows a digital circuit to use fewer hardware resources when the bounding box method is implemented in such a circuit. Because the labeling algorithm's search pattern uses previous positions, the image must be enlarged by placing additional blank pixels around it, increasing the width and height by 2. The labeled image has a height of 2, and its width is set to the image width. Merge to region is a one-dimensional array whose length is the maximum region count, a set limit on how many labels are at most expected to be handed out. This limit reflects the finite resources available in a digital implementation. The bounding box data is two-dimensional and five fields wide: two x-positions, two y-positions, and a valid pixel count. The bounding box data height is set to the maximum region count.
For example, FIG. 3 shows the relationships between the various matrices and variables during the instantiation of the preparation process. Once the variables have been created, they must be initialized to the correct starting values. The system takes the multi-dimensional image array 316 (i.e., its sizes) plus a pixel border (i.e., blank pixels padding each dimension) to form image 318. All values of both merge to region 300 and the labeled image 302 are initialized to the maximum region count 304. The labeled image 302 then goes through a change 312 of values such that Y Previous is set to 0 and Y Current is set to 1. The minimum x-value and minimum y-value 314 of the bounding box data 306 are set to the maximum sizes 308 of the image width and image height, respectively. Y Previous is then set to 1 and Y Current is set to 2, so the values in the above matrices are initialized to non-zero values. The last item to initialize is the region count 310, which is set to 0. This keeps track of how many labels have been generated, which prevents processing from exceeding the maximum array size.
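A minimal sketch of this preparation step, mirroring the initializations above; the names, the 256-label cap, and the 0-based indexing are assumptions for illustration (the Matlab description above is 1-indexed).

```python
import numpy as np

MAX_REGION_COUNT = 256           # cap on how many labels may be handed out

def prepare(image_height: int, image_width: int):
    """Instantiate and initialize the working arrays described above.
    Only two rows of the labeled image are kept (indexed by y_previous
    and y_current) to reduce storage, as in the text."""
    # Image padded with a one-pixel blank border, so width/height grow by 2.
    padded_h, padded_w = image_height + 2, image_width + 2

    # Two-row labeled image, initialized to the maximum region count.
    labeled_image = np.full((2, padded_w), MAX_REGION_COUNT, dtype=np.int32)

    # Merge-to-region lookup, also initialized to the maximum region count.
    merge_to_region = np.full(MAX_REGION_COUNT, MAX_REGION_COUNT, dtype=np.int32)

    # Bounding box data is 5 fields wide: xmin, ymin, xmax, ymax, pixel count.
    # The minima start at the image extents so any real pixel shrinks them.
    bbox = np.zeros((MAX_REGION_COUNT, 5), dtype=np.int32)
    bbox[:, 0] = padded_w        # xmin
    bbox[:, 1] = padded_h        # ymin

    region_count = 0             # tracks how many labels have been issued
    y_previous, y_current = 0, 1 # 0-indexed here; the text uses 1 and 2
    return labeled_image, merge_to_region, bbox, region_count, y_previous, y_current
```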
(3.2.2) Search/label
After the preparation process, the system continues with the search/label process, as shown in FIG. 5. As the name indicates, the system searches the image and labels the pixels it finds (from top to bottom of the image, or in any other predetermined order). A pixel is a binary number with a value of 0 or 1, so a "found" pixel is a pixel with a value of one. FIG. 4 shows an example of how the search may be performed. Taking the current pixel position in the image as (X, Y), the system first checks whether (X, Y) holds a valid pixel. If so, the system then checks whether (X-1, Y-1), (X+1, Y-1), and/or (X-1, Y) have already been labeled. As shown in FIG. 5, the process continues until the image has been searched through 500 (e.g., to its bottom or top, etc.), at which point the search/label process is complete 502. Alternatively, assume this is the first pixel found, so that no location yet has a label (e.g., the image has not been read to its edge 504, there is a valid pixel 506 at (x, y), and no adjacent pixels have been labeled). In this case, the pixel is labeled 508 with a region count of 1 and the region count is incremented 510. Using the region count, the system indexes the bounding box data array, setting the stored pixel count to a value of 1 and storing the current x-position and current y-position as its minimum/maximum values. Since the box found is only one pixel in size, the maximum and minimum of the x-position and y-position are equal. Now, assume the next pixel is also valid. This creates the condition that (X-1, Y-1), (X+1, Y-1), and/or (X-1, Y) have already been labeled. The system then compares the labels of the various neighbors to find the lowest labeled position and refers to it as the "current label" 512. To do so, the process indexes the labeled image using Y Previous and Y Current, while (X-1) and (X+1) move in the other dimension (i.e., in the x-dimension rather than the y-dimension). Suppose that, in this example, (X-1, Y Current) is the lowest, holding the value "1", while the labeled image array elsewhere holds the maximum value for the region. The value "1" is then assigned as the label of the pixel. After completing the evaluation of one pixel location, the system implements block 526, incrementing the "X" index to move closer to the edge of the image. Further, after reaching the edge of the image as shown in block 504, the system implements block 524, which increments the "Y" dimension and swaps the Y Current and Y Previous values. Recall that Y Current and Y Previous serve to keep the labeled image array small; swapping them preserves the data needed for the next evaluation while opening a new row to be overwritten.
The process then continues by determining whether the bounding box data must be updated, comparing the new data (X, Y) against the stored maximum/minimum X and Y values. If the stored minima are greater than, or the stored maxima less than, the new data, the system updates the bounding box data with the new X and Y locations. Using the example above, the system must update 514 some values in the bounding box data by increasing Xmax, because the connected pixel grows the box by 1 in the x-direction, and the pixel count also increases by 1.
After updating the bounding box data, a note must be made of pixels to be merged later, because the process may not have an opportunity to re-label those pixels in the current scan. As described below, this concept becomes more meaningful in more complex examples. To record a merge, the process first identifies 516 which neighbors have valid pixels. The label of any neighbor with a valid pixel is compared against the current label, checking its merge-to-region location 518 to find the minimum label. The smaller of the two is then stored back 520 into that merge-to-region location. In this example, the search ends by indexing merge to region with the label of the previous pixel location and then storing the value of the current label. Finally, if the current pixel is invalid 522, the labeled image array, indexed by Y Current and x, is set to the maximum region count plus 1.
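The scan just described can be sketched in software. This is a minimal version that, for clarity, keeps a full label image rather than the two-row buffer, uses 255 as an assumed "unlabeled" sentinel, and records merges with a simple minimum rule; the directly-above neighbor (X, Y-1) is included here for standard 8-connectivity, although the text lists only three prior neighbors.

```python
import numpy as np

MAX_LABEL = 255          # labels 0..254; 255 serves as the "unlabeled" sentinel

def search_and_label(image: np.ndarray):
    """One raster scan: label each valid pixel from its already-visited
    neighbors and record the bounding box and merge information."""
    h, w = image.shape
    labels = np.full((h, w), MAX_LABEL, dtype=np.int32)
    merge_to = np.arange(MAX_LABEL + 1, dtype=np.int32)   # identity = no merge
    bbox = np.zeros((MAX_LABEL, 5), dtype=np.int32)       # xmin,ymin,xmax,ymax,count
    bbox[:, 0], bbox[:, 1] = w, h
    count = 0
    for y in range(h):
        for x in range(w):
            if image[y, x] != 1:
                continue
            neigh = []                                    # already-scanned neighbors
            if y > 0 and x > 0:
                neigh.append(labels[y - 1, x - 1])        # (X-1, Y-1)
            if y > 0:
                neigh.append(labels[y - 1, x])            # (X, Y-1): see note above
            if y > 0 and x < w - 1:
                neigh.append(labels[y - 1, x + 1])        # (X+1, Y-1)
            if x > 0:
                neigh.append(labels[y, x - 1])            # (X-1, Y)
            neigh = [int(n) for n in neigh if n != MAX_LABEL]
            if not neigh:                                 # first pixel of a new region
                if count == MAX_LABEL:                    # hard stop: out of labels
                    return labels, merge_to, bbox, count
                cur = count
                count += 1
            else:
                cur = min(neigh)                          # lowest neighboring label wins
                for n in neigh:                           # note merges for later
                    merge_to[n] = min(int(merge_to[n]), cur)
            labels[y, x] = cur
            # Update this label's box with the newly found pixel.
            bbox[cur, 0] = min(bbox[cur, 0], x)
            bbox[cur, 1] = min(bbox[cur, 1], y)
            bbox[cur, 2] = max(bbox[cur, 2], x)
            bbox[cur, 3] = max(bbox[cur, 3], y)
            bbox[cur, 4] += 1
    return labels, merge_to, bbox, count
```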
(3.2.3) Merging regions
To understand the merge region process, it is helpful to view and understand a more complex search/label example. For example, assume the system is processing the image shown in FIG. 6C. As the system scans the pixel image of FIG. 6C, it gradually "sees" the image through the progression from FIG. 6A to FIG. 6B to FIG. 6C. Note that in FIGS. 6A and 6B, the image contains separate components during the scan, so there is not yet any way to know that the two separate components are connected. In this case, as an example, assume the left side has label 1 and the right side has label 2. Only when a pixel bridges the two sides is it known that they are part of the same object and should be labeled accordingly; however, the system/process cannot change the information recorded under label 2, because at that time it is not known whether the components are connected, nor how many components are connected. To solve this problem, the merge-to-region array sets position 2 with a value of 1. Later, this entry is used to convert all label positions of 2 into label positions of 1.
In the full image, as shown in FIG. 6C, line 600 can be used to cut or otherwise segment the image, showing that, in this example, A and C will contain all those components labeled 1. B will contain all those components that are labeled 2 and, in this example, remain labeled 2. D will contain all those components that would have been labeled 2 but are now instead labeled 1.
The final step is the merge region process, as shown in FIG. 7. In this step, the update reminders recorded in the merge-to-region array are applied to the appropriate bounding boxes. Before this section begins, a new array is created 700 that tracks the valid bounding box data. The valid data array starts with "true" entries up to the region count and "false" entries for everything above it (it has the same length as the bounding box data). Stored bounding box data values may be merged into a central location, thereby invalidating one or more sets of data. Since labels are known to be assigned from small numbers to large, the system walks the bounding box data backwards from the current region count (which previously counted how many labels were assigned), counting down to 1 in a for loop. A for loop is a process that repeats until completion; there are different kinds of loops, but typically a for loop performs some action until an end condition is reached, usually counting down or up to reach the condition that ends the process and exits the loop.
Whether an update is necessary is determined by looking at the current index of the for loop and comparing it with the merge-to-region entry 702 indexed by the same value. If connectivity to a different, smaller label was found at some point during the search/label phase, the merge-to-region array 702 will have been updated 704 to a value lower than the index. If this never happened, the merge-to-region entry naturally retains the larger number from the original preparation phase. Thus, if the for-loop index is greater than the information stored in merge to region at that same index, the system needs to update 706 the bounding box data to include the newly found information. The bounding box data contains Xmin, Ymin, Xmax, Ymax, and pixel count information; to update 706, the process looks at the bounding box data at both index locations (i.e., (1) the currently indexed bounding box data and (2) the bounding box data stored at the merge-to-region location of the same index). Using both positions, the minimum values are compared to see which is smaller, the maximum values are compared to see which is larger, and the pixel count values are combined. The information is then stored back into the bounding box data indexed by the merge-to-region value (indexed by the for loop), which contains the lower label. Once the loop reaches the lower index 708, the bounding box data at that location is fully updated with the most current information about the minimum values, maximum values, and pixel count. This process repeats until the region count reaches one 710, which is the lowest label and has no further location to merge into. At that point, the merge region process terminates 712. The output is the values stored as bounding box data, together with a validation array that identifies which portions of the bounding box data contain boxes whose regions were not merged away or that hold the most recent merge information.
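A sketch of this count-down merge pass, operating on the merge-to-region and bounding box arrays produced by the labeling sketch above; names are illustrative.

```python
import numpy as np

def merge_regions(bbox: np.ndarray, merge_to: np.ndarray, region_count: int):
    """Walk the labels from highest to lowest; whenever merge_to points at a
    lower label, fold this box into that one and mark this entry invalid."""
    valid = np.zeros(len(bbox), dtype=bool)
    valid[:region_count] = True
    for lbl in range(region_count - 1, 0, -1):   # count down toward the lowest label
        target = int(merge_to[lbl])
        if target < lbl:                          # a merge was recorded here
            dst, src = bbox[target], bbox[lbl]
            dst[0] = min(dst[0], src[0])          # keep the smaller minima
            dst[1] = min(dst[1], src[1])
            dst[2] = max(dst[2], src[2])          # keep the larger maxima
            dst[3] = max(dst[3], src[3])
            dst[4] += src[4]                      # combine the pixel counts
            valid[lbl] = False                    # this entry was merged away
    return bbox, valid
```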
(3.3) Hardware bounding box implementation
As described above, the present disclosure also provides a digital hardware implementation for generating bounding boxes. Creating a digital hardware implementation requires reducing the bounding box method to known value ranges. For illustrative purposes, the implementation is described for an image 512 pixels wide and 256 pixels tall, where each pixel is a one-bit value. The design is controlled from an external module, so the bounding box module receives an activation signal and must allow the necessary image positions to be indexed as requested. To meet other specifications of the image processing design, additional filtering is also implemented, reducing the provided results to the top 15 ranked boxes. All bounding boxes are found and stored in memory; however, this module specifically provides 15 bounding boxes, ranked in a manner discussed in further detail below.
As in the software design, the hardware can be generalized into three phases: preparation, search/label, and merge regions. However, the hardware implementation has an additional phase, known as callback and sorting. Translating to hardware also requires that functions complete on specific clock cycles; accordingly, the algorithm has been decomposed into different states. In addition, to reduce the hardware burden of using many flip-flops, the large bounding box arrays are stored in block random access memory (BRAM). A flip-flop is a type of register that stores a bit. Flip-flops are found in the fabric of the FPGA, and data can instead be placed into a BRAM (another component of the FPGA) to reduce the amount of storage needed. This further requires decomposing the algorithm into multiple states, some of which exist to hide the indexing and to receive information from the BRAM.
(3.3.1) Preparation
In hardware, the preparation part translates to the instantiation of variables and some initialization required by the algorithm. As discussed previously and shown in FIG. 8, hardware constraints require fixing the sizes of the variables. This is illustrated in FIG. 8, where some blocks represent arrays 800 and the remaining blocks represent values. The key limit is the number of labels that can be provided. In one example, the number of labels is reduced to 256, which in turn sets the range of many arrays. As described previously, a labeled image array 802 is kept for the input image 801; in this example, the input image 801 has height indices as large as 255 and width indices as large as 511 (the width of the image). The data contained will have a maximum region count 804 of at most 255, meaning a size of 8 bits can represent these ranges. The width of merge to region 806 is 256, with values as large as 255 (meaning a size of 8 bits). Accordingly, any term over the width can be as large as 511 (requiring 9 bits) and over the height as large as 255 (requiring 8 bits). The width of a rendered pixel is 1 bit. Since the depth of BRAM 808 matches the largest label count (256), the write and read addresses are 8 bits wide. The BRAM contains all information 810 of the bounding box BRAM array 808. As in the software case described above, the contained information 810 is Xmax, Ymax, Xmin, Ymin, and the pixel count. In addition, however, the hardware implementation also includes Xsize and Ysize, which are simple calculations (Xmax - Xmin and Ymax - Ymin).
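The bit widths above imply a record layout along the following lines; the packing order and the pixel-count width are assumptions for illustration, not taken from the patent.

```python
# Field widths implied by a 512x256 one-bit image (indices 0..511 and 0..255)
# and a 256-label cap. The packing order and the 17-bit pixel count are
# assumptions: 512*256 = 131072 object pixels at most.
FIELDS = [
    ("xmin",        9),   # 0..511
    ("xmax",        9),
    ("ymin",        8),   # 0..255
    ("ymax",        8),
    ("xsize",       9),   # xmax - xmin
    ("ysize",       8),   # ymax - ymin
    ("pixel_count", 17),
]

def pack(record: dict) -> int:
    """Pack one bounding-box record into a single BRAM word (68 bits total)."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = record[name]
        assert 0 <= value < (1 << width), f"{name} out of range"
        word |= value << shift
        shift += width
    return word

word = pack({"xmin": 100, "xmax": 150, "ymin": 80, "ymax": 90,
             "xsize": 50, "ysize": 10, "pixel_count": 70})
```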
Just as in software, all variables must be initialized. Variables are initialized in reset and in state 0. In reset, the labeled image and all merge-to-region 806 values are set to 255, which is the maximum-value label, and all other values are set to 0. Since the contents of the BRAM are unknown, bounding box BRAM 808 is not set to any value; instead, the location of the write pointer is tracked to know which portion of the BRAM is valid. In state 0 812, when a start command is given, the labeled image and merge-to-region values are set to 255, the state is set to 1, Y Previous is set to 0, Y Current is set to 1, and all other values are set to 0. As a reminder, the digital implementation indexes the first line starting from 0 rather than 1, as was done previously in the software implementation.
(3.3.2) Search/label
As in the software, a search/label section follows. Since the bounding boxes are located in BRAM, the search/label function is split to allow the BRAM to be read. Given this constraint, the search/label software design can be further divided into different state sections to take advantage of clock delays. Thus, the search/label is divided into three states plus a special incrementer stage.
As shown in FIGS. 9A and 10, state 1 contains the conditions for finding a new pixel, or finding no pixel, at the current search position. If a hard stop condition 900 is not present, and if a currently valid pixel is found 902, it must be determined whether any of the pixel's neighbors (as shown in FIG. 4) have a valid pixel 904. If none of the neighbors have valid pixels, the labeled image and the bounding box BRAM are updated 906 and the incrementer 908 is activated 918. A write command to the bounding box BRAM takes one clock cycle, but since the process increments the write address (via incrementer 908) so that it never overwrites a region in BRAM, and since state 1 does not read the BRAM, the process can return to state 1 910 without any problem, avoiding an unnecessary wait state. For further understanding, the FPGA/hardware consists of processes that run in parallel every clock cycle, and actions are committed at the end of each clock cycle. It should also be noted that BRAM operates with some clock cycles of delay. The BRAM is the location the system writes to and reads from. Considering that the FPGA commits actions at the end of a clock cycle and the BRAM has delay, the BRAM must not be accessed incorrectly before a write operation completes. In other words, the system does not attempt to read a location that is currently being written; it should read the location only after the write delay has ended. Note also that state 1 does not overwrite a location or require a read; it only writes when the process returns to state 1 910. This hides the write clock cycle so that the BRAM is ready, and no wait clock needs to be added to this state.
If there is a valid pixel among the neighboring pixels 904, the process moves to state 2 912. Unique to the digital implementation are the hard stop 900 and incrementer 908 functions. The incrementer 908 acts as a for loop, moving the current pixel and requesting the next set of pixel values. Once the entire image has been read, the incrementer 908 moves the state machine to state 4 914 to begin the merge region portion. However, there is a chance of overflow if too many labels are given out, so state 1 implements a hard stop 900 to check that the process remains below the maximum label range. If it does not, the search of the image need not continue; instead, the process proceeds to state 4 916 to begin the merge region portion. If the system does not hit the hard stop and no valid pixel is found, the labeled image must reset the data stored at that pixel location to the maximum region count 920. This ensures that when the system compares labeled image positions (see FIG. 12), the lowest position is the most recent and contains only valid data relating to that part of the image.
Further, FIG. 10 illustrates the incrementer 908 process, showing the decision between proceeding to state 1 910 or state 4 914. After the incrementer 908 is activated, if the process has not read to an edge 1000 of the image, the system increments the "X" index to move closer to the edge 1002 of the image and proceeds to state 1 910. Alternatively, if the process has read to the edge 1000 of the image, the "X" index is reset 1004 and it is determined whether the process is at the end of the image 1006. If not, the "Y" index is incremented to move closer to the end of the image, "Y Current" and "Y Previous" are swapped 1008, and the process proceeds to state 1 910. Otherwise, "Y Current" and "Y Previous" are reset to their initial values, the valid region count is locked to the current value of the region count 1010, and the process proceeds to state 4 914.
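The incrementer's next-pixel decision can be sketched as follows; the state names, the 0-based reset value, and the return convention are assumptions for illustration.

```python
def incrementer(x: int, y: int, y_current: int, y_previous: int,
                width: int, height: int):
    """Next-pixel logic sketched from FIG. 10: step X toward the edge, then
    reset X, step Y, and swap the two stored labeled-image rows. Returns
    (next_state, x, y, y_current, y_previous)."""
    if x < width - 1:                       # not yet at the edge of the row
        return "STATE_1", x + 1, y, y_current, y_previous
    if y < height - 1:                      # end of row, but not end of image
        # Swap the roles of the two stored rows: the row just finished
        # becomes "previous", and the other row is free to be overwritten.
        return "STATE_1", 0, y + 1, y_previous, y_current
    # Whole image read: restore the initial row indices and merge regions.
    return "STATE_4", 0, 0, 1, 0
```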
As shown in FIG. 11, state 2 uses another module, referred to as the current label module 1100, which is shown in further detail in FIG. 12. Here, a clock cycle delay is used to perform the current label assignment 1112 in preparation for this phase (note the discussion above regarding FPGA clock cycles and BRAM read/write delays). Referring again to FIG. 11, since the bounding box BRAM 1102 takes one clock cycle to read, a read signal must be sent along with the current label 1104 to read the address containing the data to be combined. The BRAM needs a signal, as well as an address, to indicate that the process wants to read; the current label 1104 serves as the read address. Note the comments above regarding the function of the search/label section, where the system records labels for later merging.
State 2 contains only the portion where merge to region is set. State 2 exists to load the merge-to-region 1108 array with data that will be used later during state 3a 1110 and the merge region phase.
Just as in software, the current label 1104 is compared 1106 with the contents stored in merge to region so as to update based on the valid neighboring pixels 1114; the system compares the value found in the merge-to-region array against the lowest label value. The lower "assigned" label is the value stored into the merge-to-region array. Since labels are handed out in consecutive order, the lower label should be the reference used subsequently in the merge region portion. See, e.g., FIG. 9B, which illustrates the state 2 code, and FIG. 12, which illustrates the current label module 1100.
State 3, shown in FIG. 13, covers the final part of the search/label phase. In state 3, the bounding box BRAM 1102 is updated. However, since the read was issued in state 2, the data is only valid after one clock cycle; therefore, state 3 has two parts. State 3a 1300 waits one clock cycle to obtain valid data from BRAM 1102, and then state 3b 1302 updates the bounding box BRAM 1102. Once the data is received in the second cycle, the update of the bounding box BRAM 1102 can be performed. The update combines the data read from the computed current label position with the current information. Just as in software, the maximum and minimum x and y positions are compared and the larger and smaller values are set back into the bounding box BRAM 1102. The pixel count is also increased for each new pixel found. The digital implementation additionally computes the sizes in the X and Y directions; these are later used to filter out unneeded bounding boxes. Another addition filters out unwanted boxes that do not meet a particular range 1308 of pixel counts: lower- and upper-boundary pixel counts are enforced when validating the bounding box BRAM 1102 data (pixel counts within the valid pixel range 1310 are designated valid, and pixel counts outside the valid pixel range are designated invalid 1312). A valid region array is thus used to determine which information stored in the BRAM should be examined at a later stage, by assigning a value of "1" to each address holding a valid data set. The valid region array is cycled through later to decide whether the system should read the data stored in the BRAM at that address location during the callback and sorting phase. Finally, the system activates the incrementer 908 and sends the process 1304 to state 1. However, if the incrementer 908 detects that the process is at the last pixel location, it instead sends the process 1306 to state 4, as shown in FIG. 10.
As shown in FIG. 14, states 4, 5, and 6 cover the merge region phase. In state 4, the system searches merge to region 1402, starting at the region count 1400 and counting down, to determine whether merge data is needed. If so, two addresses (the region count 1400 and merge-to-region[region count] 1402) are needed from the bounding box BRAM 1102, and the system writes back to the bounding box BRAM 1102 location. If not, the system continues searching merge to region by decreasing the region count 1422 and decreasing the valid region count 1420 based on the data created during state 6 (1426, 1430), eventually finishing through 1418.
Because of the BRAM 1102 read, the process must return to state 4 to cover the one-clock-cycle read delay and to request a different address. For clarity, state 4 is divided into two parts: state 4a 1404, which determines whether consolidation is required, and state 4b 1406, which covers the wait clock cycle and the read of the next address.
State 5 1408 is very simple, because BRAM 1102 is known to hold valid data from state 4a 1404. The data that has arrived must therefore be saved 1410 so that it can later be compared with the address read in state 4b 1406.
State 6 1412 then compares 1414 the two sets of data read from the bounding box BRAM 1102 and stores the consolidated information back into the bounding box BRAM 1102 at the lower address. Since the write occurs during state 4a 1404, the system will have written correctly before activating the read of the next address. Reaching state 6 indicates that the current region count is being merged, so the system invalidates that region in the valid region array and decreases the valid region count 1424. As described above, a filter can be added to remove regions (unneeded boxes) that do not satisfy a particular range 1426 of pixel counts: pixel counts within the valid pixel range 1428 are designated valid, while pixel counts outside the valid pixel range are designated invalid 1430.
Unique to this implementation are the additional callback and sorting phases carried by state 7 1416. Specifically, and as shown in FIGS. 15 and 16, state 7 focuses on the callback operation and the sort operation, respectively. As shown in the callback operation of FIG. 15, this phase calls back all valid information from BRAM 1102 until the process has read every valid region that was stored. From the previous states, the process has always counted how many locations in the bounding box BRAM 1102 remain valid after filtering, and it keeps track of the addresses using the valid array. Thus, to call back all valid locations, the valid array is used to check whether a region in BRAM 1102 has valid data, after which the valid data count is decremented. On a read, the BRAM has two clock cycles of delay. Processing moves from state 4a to state 7 by starting a read 1508 from the BRAM at the valid region 1510 at address zero 1512, and by forcing a one-clock-cycle wait 1514 before returning to state 7 1516. The idea is to read addresses from the bounding box BRAM 1102 continuously, so that the process runs only one clock cycle 1500 behind each read. State 7 1416 filters based on whether the read from the bounding box BRAM 1102 is valid 1502 and whether the additional filtering is satisfied. A delay clock cycle is included at 1504 to account for the additional filtering delay before returning to state 0 1506. Block 1518 shows the end of the search for valid regions, at which point the system must be forced to stop continuously reading addresses, as seen in 1520.
During states 3 and 6, the pixel count valid range is used as an initial condition for validating the bounding box BRAM 1102 data. Next, oddly shaped "boxes" are filtered out by comparing Xsize (Xmax - Xmin) with Ysize (Ymax - Ymin): a box is invalid if Xsize/Ysize or Ysize/Xsize <= 30% (or any other predetermined value). Beyond this, it is desirable to find boxes that are well filled with pixel blobs, covering a good portion of the found box. Therefore, the system also filters by checking whether (pixel count)/(Xsize x Ysize) <= 30% (or any other predetermined value) and setting those positions to invalid. If the data has passed through all the filters, it is marked as valid for sorting and passed to the sort section. Only those valid boxes are sorted and saved locally for use by other modules. Due to the delay in sorting, state 7 may complete 8 clock cycles after the last valid data is read from BRAM 1102. Thus, for example, the process waits 8 clock cycles before determining that the bounding boxes are complete and saved. Since the sorting is partially done in another module, the values are then reset, which is the transition from state 0 to state 1.
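The validity tests described above amount to a small predicate. In the sketch below, the pixel-count window bounds are placeholders (the text leaves them as a predetermined range), while the 30% thresholds follow the text, which allows any other predetermined value.

```python
def passes_filters(xsize: int, ysize: int, pixel_count: int,
                   min_pixels: int = 4, max_pixels: int = 4096,
                   ratio_floor: float = 0.30) -> bool:
    """Apply the three filters described above: a pixel-count window,
    an aspect-ratio test, and a fill-ratio test."""
    if xsize == 0 or ysize == 0:
        return False
    if not (min_pixels <= pixel_count <= max_pixels):
        return False                        # outside the valid pixel range
    if min(xsize, ysize) / max(xsize, ysize) <= ratio_floor:
        return False                        # oddly shaped: too elongated
    if pixel_count / (xsize * ysize) <= ratio_floor:
        return False                        # too sparse: box poorly filled
    return True
```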
As shown in FIG. 16, the sorting is done in state 7 together with a further sorting module 1600. The valid BRAM reads 1508 and the bounding box BRAM 1102 data are filtered 1604 to identify valid sorts. In addition, one clock delay 1606, 1608 is added to the read result and the valid sort. If the data is found to be a valid sort 1602, it is further sorted by the sorting module 1600.
For further understanding, FIG. 17 shows a flow diagram for state 7, focusing on the sort operation in each individual sorting module. Each sorting module first determines 1700 whether the sort associated with the bounding box or region is valid, or whether a reset sort has been issued. If the sort is invalid, as indicated by a valid-sort value of 0, the module issues a "0" value that conveys an invalid sort command and its associated data. If the sort is a reset sort 1704, the system clears the data stored in the sorter 1706. Clearing the data 1706 stored in the sorter refers to the locally stored sort data. This sort data later holds the values found in the callback BRAM reads, which ultimately store locally the pixel count, xmax, xmin, ymax, ymin, and the sizes (Xsize = xmax - xmin and Ysize = ymax - ymin). Each sorting module creates a sort number by dividing the pixel count by the larger of Xsize and Ysize. The sort numbers and valid sorts are passed 1708 between sorting modules so that higher sort numbers remain at the top while lower sort numbers fall through, either opening a slot in the sorter or leaving the save area entirely. This is done by first determining 1710 whether the incoming sort is greater than the higher sort of the currently stored data. If so, the incoming sort is set 1712 as the higher sorted stored data, the previously stored higher sorted data is then set 1714 as the stored lower data, and the displaced data is passed out 1716 of the sorting module. If the incoming sort is less than the higher sort of the currently stored data, it is determined 1718 whether the incoming sort is greater than the lower sort of the previously stored data. If not, the incoming sort data is passed out 1722 of the sorting module. Alternatively, if the incoming sort is greater than the lower sort of the previously stored data, the incoming sort data is set 1720 as the lower sorted stored data, and the displaced data is passed out 1716 of the sorting module.
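One way to realize the described sorting chain in software is sketched below. The two entries per stage follow the higher/lower comparisons above, while the chain length of 8 (consistent with the 8-cycle sort delay noted earlier and a top-15 output) is an assumption; the sort number follows the text, pixel count divided by the larger of Xsize and Ysize.

```python
class SortModule:
    """One stage of the sorting chain: stores a higher- and a lower-ranked
    entry; anything that ranks below both falls through to the next stage."""
    def __init__(self):
        self.higher = None    # (sort_number, data) tuples, or None if empty
        self.lower = None

    def offer(self, item):
        """Insert an entry; return whatever falls out of this stage."""
        if self.higher is None or item[0] > self.higher[0]:
            fallout = self.lower          # incoming outranks stored: demote
            self.lower = self.higher
            self.higher = item
            return fallout
        if self.lower is None or item[0] > self.lower[0]:
            fallout = self.lower
            self.lower = item
            return fallout
        return item                       # ranks below both: keep falling

def sort_number(xsize: int, ysize: int, pixel_count: int) -> float:
    """Ranking value from the text: pixel count over the larger box side."""
    return pixel_count / max(xsize, ysize)

def top_boxes(boxes, n_stages: int = 8, keep: int = 15):
    """boxes: iterable of (xsize, ysize, pixel_count, payload) tuples."""
    chain = [SortModule() for _ in range(n_stages)]
    for xsize, ysize, count, payload in boxes:
        item = (sort_number(xsize, ysize, count), payload)
        for stage in chain:
            item = stage.offer(item)
            if item is None:              # absorbed by an empty slot
                break
    kept = [e for m in chain for e in (m.higher, m.lower) if e is not None]
    return sorted(kept, key=lambda e: e[0], reverse=True)[:keep]
```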
(3.3.3) Results of the hardware implementation
A simulation was performed in which a known image was run through the bounding box implementation described above. Per the algorithm requirements, and as shown in FIG. 18, the known image 1800 is 512 × 256 pixels with one-bit pixels (i.e., a one-bit value at each pixel location). Passing the image 1800 through the filter, the system identified 220 marked locations, which were then merged into 182 unique bounding boxes. Through sorting, the system filtered the bounding boxes down to the top 15 "sorted" positions (as shown in FIG. 19). This demonstrates that the bounding box processing of the present disclosure is effective at recognizing objects in an image and generating bounding boxes around them. Accordingly, the bounding box processing described herein can be applied to successive frames of a video to serve as an efficient and effective motion tracker in any desired setting.
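To convey the end-to-end flow of this experiment, the following assumed software analogue substitutes SciPy's connected-component labeling for the hardware merge logic and reuses the BoxRecord, is_valid_box, and sort_number helpers sketched earlier. It models the data flow only and is not the patented hardware path.

# Assumed software analogue of the FIG. 18/19 flow: label connected
# components, filter them, and keep the top "sorted" boxes.

import numpy as np
from scipy import ndimage

def top_boxes(binary_image: np.ndarray, keep: int = 15):
    labels, n = ndimage.label(binary_image)          # group connected pixels
    scored = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        ys, xs = sl
        box = BoxRecord(xmin=xs.start, xmax=xs.stop - 1,
                        ymin=ys.start, ymax=ys.stop - 1,
                        pixel_count=int((labels[sl] == i).sum()))
        if is_valid_box(box):                        # shape and fill filters
            scored.append((sort_number(box), box))
    scored.sort(key=lambda e: e[0], reverse=True)    # highest scores first
    return scored[:keep]

# Example on a synthetic 256-row x 512-column one-bit frame:
frame = np.zeros((256, 512), dtype=np.uint8)
frame[40:80, 100:150] = 1          # one well-filled rectangular blob
print(top_boxes(frame))            # -> a single valid, high-scoring box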
(3.4) Device control
As shown in FIG. 20, a processor 2000 may be used to control a device 2002 (e.g., a mobile device display, a virtual reality display, an augmented reality display, a computer monitor, a motor, a machine, a drone, a camera, etc.) based on the bounding box generation. For example, the device 2002 may be controlled to display the location of an object on a still image or video depicting the object. In other embodiments, the device 2002 may be controlled, based on the identification and location of the object, to move or otherwise initiate a physical action.
In some embodiments, a drone or other autonomous vehicle may be controlled to move to an area determined, from the image, to contain the object's location. In still other embodiments, a camera may be controlled to track an identified object by keeping the object's moving bounding box within the field of view. In other words, an actuator or motor is activated to move the camera (or sensor) so that the bounding box stays within the field of view, allowing an operator or other system to identify and track the object. As yet another example, the device may be an autonomous vehicle, such as an unmanned aerial vehicle (UAV), that includes a camera and the bounding box design described herein. In operation, when a bounding box is generated by the system implemented in the UAV, the UAV may be maneuvered to follow the object so that the bounding box remains within the UAV's field of view; for example, the rotors and other components of the UAV are actuated to cause the UAV to track and follow the object.
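As one hedged sketch of this idea, a simple proportional controller can convert the offset between the bounding box center and the frame center into pan/tilt rate commands. The gain value, the sign conventions, and the returned rate interface below are hypothetical; the disclosure requires only that the actuators keep the bounding box within the field of view.

# Minimal proportional tracking sketch: nudge pan/tilt so the tracked
# bounding box stays centered. Gain and conventions are assumptions.

def track_box(box: "BoxRecord", frame_w: int, frame_h: int,
              gain: float = 0.002):
    err_x = (box.xmin + box.xmax) / 2.0 - frame_w / 2.0
    err_y = (box.ymin + box.ymax) / 2.0 - frame_h / 2.0
    pan_rate = gain * err_x     # positive -> pan right toward the object
    tilt_rate = gain * err_y    # positive -> tilt down toward the object
    return pan_rate, tilt_rate  # commands for the platform's actuators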
Finally, while the invention has been described in terms of various embodiments, those skilled in the art will readily recognize that the invention may have other applications in other environments. It should be noted that many embodiments and implementations are possible. Additionally, use of the term "means for" is intended to invoke a means-plus-function reading of an element and of the claims, and any element not specifically recited using the term "means for" should not be interpreted as a means-plus-function element, even if the claim otherwise includes the word "means". Further, although particular method steps have been recited in a particular order, these method steps may be performed in any desired order and remain within the scope of the invention.

Claims (12)

1. A system for bounding box generation, the system comprising:
a memory and one or more processors, the memory having executable instructions such that, when the instructions are executed, the one or more processors perform the following:
receiving an image comprised of pixels having a one-bit value per pixel;
generating bounding boxes around connected components in the image, the connected components having pixel coordinates and pixel count information;
generating a ranking score for each bounding box based on the pixel coordinates and the pixel count information;
filtering the bounding boxes based on the pixel coordinates and the pixel count information to remove bounding boxes that exceed a predetermined size and a predetermined pixel count;
filtering the bounding boxes to remove bounding boxes below a predetermined ranking score, resulting in remaining bounding boxes; and
controlling a device based on the remaining bounding boxes.
2. The system of claim 1, wherein the processor is a Field Programmable Gate Array (FPGA).
3. The system of claim 1, wherein generating the bounding box further comprises the operations of:
grouping contiguous pixels in the image; and
merging the connected pixels into connected components, each bounding box being formed as a box enclosing a connected component.
4. The system of claim 1, wherein controlling the device comprises: moving a video platform to maintain at least one of the remaining bounding boxes within a field of view of the video platform.
5. A computer program product for bounding box generation, the computer program product comprising:
a non-transitory computer-readable medium having executable instructions encoded thereon such that, when executed by one or more processors, the one or more processors perform operations comprising:
receiving an image comprised of pixels having a one-bit value per pixel;
generating bounding boxes around connected components in the image, the connected components having pixel coordinates and pixel count information;
generating a ranking score for each bounding box based on the pixel coordinates and the pixel count information;
filtering the bounding boxes based on the pixel coordinates and the pixel count information to remove bounding boxes that exceed a predetermined size and a predetermined pixel count;
filtering the bounding boxes to remove bounding boxes below a predetermined ranking score, resulting in remaining bounding boxes; and
controlling a device based on the remaining bounding boxes.
6. The computer program product of claim 5, wherein the processor is a Field Programmable Gate Array (FPGA).
7. The computer program product of claim 5, wherein generating the bounding box further comprises:
grouping contiguous pixels in the image; and
merging the connected pixels into connected components, each bounding box being formed as a box enclosing a connected component.
8. The computer program product of claim 5, wherein controlling the device comprises: moving a video platform to maintain at least one of the remaining bounding boxes within a field of view of the video platform.
9. A computer-implemented method for bounding box generation, the method comprising the acts of:
causing one or more processors to execute instructions encoded on a non-transitory computer-readable medium such that, when executed, the one or more processors perform the following:
receiving an image comprised of pixels having a one-bit value per pixel;
generating bounding boxes around connected components in the image, the connected components having pixel coordinates and pixel count information;
generating a ranking score for each bounding box based on the pixel coordinates and the pixel count information;
filtering the bounding boxes based on the pixel coordinates and the pixel count information to remove bounding boxes that exceed a predetermined size and a predetermined pixel count;
filtering the bounding boxes to remove bounding boxes below a predetermined ranking score, resulting in remaining bounding boxes; and
controlling a device based on the remaining bounding boxes.
10. The method of claim 9, wherein the processor is a Field Programmable Gate Array (FPGA).
11. The method of claim 9, wherein generating the bounding box further comprises:
grouping contiguous pixels in the image; and
merging the connected pixels into connected components, each bounding box being formed as a box enclosing a connected component.
12. The method of claim 9, wherein controlling the device comprises: moving a video platform to maintain at least one of the remaining bounding boxes within a field of view of the video platform.
CN201980016252.4A 2018-04-17 2019-02-14 Hardware and system for bounding box generation for an image processing pipeline Pending CN111801703A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862659129P 2018-04-17 2018-04-17
US62/659,129 2018-04-17
PCT/US2019/018049 WO2019203920A1 (en) 2018-04-17 2019-02-14 Hardware and system of bounding box generation for image processing pipeline

Publications (1)

Publication Number Publication Date
CN111801703A true CN111801703A (en) 2020-10-20

Family

ID=68240216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980016252.4A Pending CN111801703A (en) 2018-04-17 2019-02-14 Hardware and system for bounding box generation for an image processing pipeline

Country Status (3)

Country Link
EP (1) EP3782114A4 (en)
CN (1) CN111801703A (en)
WO (1) WO2019203920A1 (en)

Patent Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5848184A (en) * 1993-03-15 1998-12-08 Unisys Corporation Document page analyzer and method
US20010026633A1 (en) * 1998-12-11 2001-10-04 Philips Electronics North America Corporation Method for detecting a face in a digital image
US6351559B1 (en) * 1998-12-22 2002-02-26 Matsushita Electric Corporation Of America User-enclosed region extraction from scanned document images
US6763137B1 (en) * 2000-09-14 2004-07-13 Canon Kabushiki Kaisha Recognition and clustering of connected components in bi-level images
CN1897638A (en) * 2005-07-11 2007-01-17 索尼株式会社 Image processing apparatus and image capturing apparatus
US20090245640A1 (en) * 2008-03-31 2009-10-01 Jilin Li Image determination apparatus, image search apparatus and a recording medium on which an image search program is recorded
CN101551859A (en) * 2008-03-31 2009-10-07 夏普株式会社 Image recognition device and image retrieval device
US20090263025A1 (en) * 2008-04-21 2009-10-22 Jilin Li Image determination apparatus, image search apparatus and computer readable recording medium storing an image search program
US20100111440A1 (en) * 2008-10-31 2010-05-06 Motorola, Inc. Method and apparatus for transforming a non-linear lens-distorted image
US20130058589A1 (en) * 2008-10-31 2013-03-07 General Instrument Corporation Method and apparatus for transforming a non-linear lens-distorted image
CN102576463A (en) * 2009-10-07 2012-07-11 微软公司 Systems and methods for removing a background of an image
US20120206567A1 (en) * 2010-09-13 2012-08-16 Trident Microsystems (Far East) Ltd. Subtitle detection system and method to television video
CN102572316A (en) * 2010-09-30 2012-07-11 苹果公司 Overflow control techniques for image signal processing
US20120099765A1 (en) * 2010-10-21 2012-04-26 SET Corporation Method and system of video object tracking
WO2012138828A2 (en) * 2011-04-08 2012-10-11 The Trustees Of Columbia University In The City Of New York Kalman filter approach to augment object tracking
US20140010456A1 (en) * 2011-04-08 2014-01-09 The Trustees Of Columbia University In The City Of New York Kalman filter approach to augment object tracking
US20130039409A1 (en) * 2011-08-08 2013-02-14 Puneet Gupta System and method for virtualization of ambient environments in live video streaming
CN105745687A (en) * 2012-01-06 2016-07-06 派尔高公司 Context aware moving object detection
TW201337793A (en) * 2012-03-13 2013-09-16 Tatung Co Image processing method of connected component labeling
CN105474213A (en) * 2013-07-30 2016-04-06 柯达阿拉里斯股份有限公司 System and method for creating navigable views of ordered images
US20170186174A1 (en) * 2014-02-17 2017-06-29 General Electric Company Method and system for processing scanned images
CN105023233A (en) * 2014-04-16 2015-11-04 Arm有限公司 Graphics processing systems
US20160357784A1 (en) * 2015-06-02 2016-12-08 Thomson Licensing Method and apparatus for scoring an image
CN107690657A (en) * 2015-08-07 2018-02-13 谷歌有限责任公司 Trade company is found according to image
US20170134631A1 (en) * 2015-09-15 2017-05-11 SZ DJI Technology Co., Ltd. System and method for supporting smooth target following
CN107148639A (en) * 2015-09-15 2017-09-08 深圳市大疆创新科技有限公司 It is determined that method and device, tracks of device and the system of the positional information of tracking target
US20170206669A1 (en) * 2016-01-14 2017-07-20 RetailNext, Inc. Detecting, tracking and counting objects in videos
US20170228940A1 (en) * 2016-02-09 2017-08-10 Intel Corporation Recognition-based object segmentation of a 3-dimensional image
CN107491456A (en) * 2016-06-13 2017-12-19 阿里巴巴集团控股有限公司 Image ranking method and device
CN107784059A (en) * 2016-08-24 2018-03-09 百度(美国)有限责任公司 For searching for and selecting the method and system and machine-readable medium of image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C. Lawrence Zitnick et al., "Edge Boxes: Locating Object Proposals from Edges," ECCV 2014, Part V, LNCS 8693, p. 391 *
Ming-Ming Cheng et al., "BING: Binarized Normed Gradients for Objectness Estimation at 300fps," Computational Visual Media, vol. 5, no. 1, p. 3 *

Also Published As

Publication number Publication date
EP3782114A4 (en) 2022-01-05
EP3782114A1 (en) 2021-02-24
WO2019203920A1 (en) 2019-10-24

Similar Documents

Publication Publication Date Title
AU2018250370B2 (en) Weakly supervised model for object detection
CN110414499B (en) Text position positioning method and system and model training method and system
CN111368600B (en) Remote sensing image target detection and identification method and device, readable storage medium and equipment
US10262229B1 (en) Wide-area salient object detection architecture for low power hardware platforms
CN110378297B (en) Remote sensing image target detection method and device based on deep learning and storage medium
US11600091B2 (en) Performing electronic document segmentation using deep neural networks
CN110084299B (en) Target detection method and device based on multi-head fusion attention
CN102970456B (en) Image forming apparatus, image forming apparatus control method, and program
EP3879450A1 (en) Text recognition method and terminal device
CN111259846B (en) Text positioning method and system and text positioning model training method and system
KR102305230B1 (en) Method and device for improving accuracy of boundary information from image
JP7026165B2 (en) Text recognition method and text recognition device, electronic equipment, storage medium
US11275970B2 (en) Systems and methods for distributed data analytics
CN113761999A (en) Target detection method and device, electronic equipment and storage medium
US20220270341A1 (en) Method and device of inputting annotation of object boundary information
US20190258888A1 (en) Hardware and system of bounding box generation for image processing pipeline
CN111801703A (en) Hardware and system for bounding box generation for an image processing pipeline
CN115797955A (en) Table structure identification method based on cell constraint and application thereof
de Carvalho et al. Bounding box-free instance segmentation using semi-supervised learning for generating a city-scale vehicle dataset
You et al. Small traffic sign detection and recognition in high-resolution images
CN114067097A (en) Image blocking target detection method, system and medium based on deep learning
CN115335874A (en) System and method for improved computer vision in on-device applications
Ren et al. RBS-YOLO: a vehicle detection algorithm based on multi-scale feature extraction
Xu A fusion-based approach to deep-learning and edge-cutting algorithms for identification and color recognition of traffic lights
JP4720805B2 (en) Image processing apparatus and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination