WO2020046524A1

WO2020046524A1 - Automatic feed pellet monitoring based on camera footage in an aquaculture environment

Info

Publication number: WO2020046524A1
Application number: PCT/US2019/044298
Authority: WO
Inventors: Bryton SHANG
Original assignee: Aquabyte, Inc.
Priority date: 2018-08-27
Filing date: 2019-07-31
Publication date: 2020-03-05

Abstract

The present invention may encompass automatically estimating the amount of feed pellets dispensed in a fish farming enclosure that are uneaten and providing spatio-temporal feedback based on the estimates that help fish farmers determine how much feed should be dispensed in the fish farming enclosure at a next feeding. For example, a web dashboard may be presented in a user interface to the fish farmers where the web dashboard presents information about or derived from the estimated amount of uneaten feed pellets.

Description

International Patent Application

for

AUTOMATIC FEED PELLET MONITORING BASED ON CAMERA FOOTAGE IN

AN AQUACULTURE ENVIRONMENT

TECHNICAL FIELD

[0001] The present disclosure is directed to automatic feed pellet monitoring based on camera footage in an aquaculture environment.

BACKGROUND

[0002] The growth rate of world human population is applying substantial pressure on the planet’s natural food resources. Aquaculture will play a significant part in feeding this growing human population.

[0003] Aquaculture is the farming of aquatic organisms (fish) in both coastal and inland areas involving interventions in the rearing process to enhance production. Aquaculture has experienced dramatic growth in recent years. The United Nations Food and Agriculture Organization estimates that aquaculture now accounts for half of the world’s fish that is used for food.

[0004] Feed is a significant cost of raising farmed fish. Thus, fish farm operators would appreciate technology that enables them to optimize feeding. Current approaches for optimizing feed are suboptimal and typically involves a human operator monitoring the feeding via a video monitor that is connected to an underwater camera immersed in the fish farming enclosure where the fish are feeding to determine how many of the dispensed feed pellets sink to the bottom of the fish farming enclosure and go uneaten. This is a very manual and tedious task. Moreover, it is often not very accurate because of inattentiveness to the video monitor. As a result, using current feed optimization techniques, fish farmers can waste as much as 5% of the feed or more that is dispensed.

[0005] The techniques herein address this and other issues.

[0006] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely because of their inclusion in this section.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. l is a schematic diagram of an example aquaculture environment in which techniques for automatic pellet monitoring in aquaculture environment may be implemented. [0008] FIG. 2 is a flowchart of a process for detection and recognition of uneaten or partially eaten fish feed pellets in an aquaculture environment.

[0009] FIG. 3 depicts information flow for feed pellet monitoring.

[0010] FIG. 4 diagrams conventional and truss dimensions of a fish.

[0011] FIG. 5 depicts a biomass estimation sub-system of an image processing system.

[0012] FIG. 6 depicts basic computer hardware that may be used in an implementation.

DETAILED DESCRIPTION

[0013] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

AUTOMATIC PELLET MONITORING

[0014] A computer vision based on approach for feed pelleting monitoring in an aquaculture environment is disclosed. The approach utilizes deep machine learning to identify and count the number of uneaten pellets in a fish farming enclosure where fish are feeding. Based on the count, an estimate of the percentage of pelletized feed that is being wasted may be generated and reported to fish farmers. The estimate may also be input to a feed dispenser to automatically adjust the amount of feed that is dispensed in the fish farming enclosure so as to reduce waste of feed.

AQUACULTURE ENVIRONMENT FOR PELLET MONITORING

[0015] FIG. 1 is a schematic diagram of an aquaculture environment 100 for automatic pellet monitoring of pelletized feed 114 dispensed to fish 102 in a fish farming enclosure 104. The environment 100 includes a high-resolution, light sensitive, digital camera 106 within a waterproof housing immersed underwater in the fish farming enclosure 104.

[0016] In some implementations, camera 106 is an approximately l2-megapixel monochrome or color camera with a resolution of approximately 4096 pixels by 3000 pixels, and a frame rate of approximately 1 to 8 frames per second. Although different cameras with different capabilities including higher frame rates may be used according to the requirements of the particular implementation at hand. For example, various different lens filters and artificial lighting techniques may be used. Further, camera 106 can be a stereo camera or a monoscopic camera.

[0017] Selection of the camera lens for camera 106 may be based on an appropriate baseline and focal length to capture images of a fish swimming in front of a camera where the fish is close enough to the lenses for proper pixel resolution and feature detection in the captured image, but far enough away from the lenses such that the fish can fit in both the left and right frames. For example, 8-millimeter focal length lenses with high line pair count (lp/mm) can be used such that each of the pixels in a captured image the left and right images can be resolved. The baseline of the camera 106 may have greater variance such as, for example, within the range of 6 to 12-centimeter millimeter baseline.

[0018] The fish farming enclosure 104 may be a net pen framed by a plastic or steel cage that provides a substantially inverted conical, circular, or rectangular cage, or cage of other desired dimensions. The fish farming enclosure 104 may hold a number of fish of a particular type (e.g., salmon). The number of fish held may vary depending on a variety of factors such as the size of the fish farming enclosure 104 and the maximum stocking density of the particular fish caged. For example, a fish farming enclosure for salmon may be 50 meters in diameter, 20-50 meters deep, and hold up to approximately 200,000 salmon assuming a maximum stocking density of 10 to 25 kg/m3.

[0019] While in some implementations the techniques for automatic pelleting monitoring disclosed herein are applied to a sea pen environment such as a fish farming enclosure 104, the techniques are applied to other fish farming enclosures in other embodiments. For example, the techniques may be applied to fish farm ponds, tanks, or other like fish farm enclosures.

[0020] The camera 106 may be attached to a winch system that allows the camera 106 to be relocated underwater in the fish farming enclosure 104 to capture stereo images of feed 114 and fish 102 from different locations within the fish farming enclosure 104. For example, the winch system may allow the camera 106 to move around the perimeter and the interior of the fish farming enclosure 104 and at various depths within the fish farming enclosure 104 to capture images of feed 114 and fish 102 at different depths and locations within the fish farming enclosure 104. The winch system may also allow control of pan and tilt of the camera 106.

[0021] The winch system may be operated manually by a human controller such as, for example, by directing user input to an above-water surface winch control system.

Alternatively, the winch system may operate autonomously according to a winch control program configured to adjust the location of the camera 106 within the fish farming enclosure 104, for example, in terms of location on the perimeter of the cage and depth within the fish farming enclosure 104. [0022] The autonomous winch control system may adjust the location of the camera 106 according to a series of predefined or pre-programmed adjustments and /or according to detected signals in the fish farming enclosure 104 that indicate better or more optimal locations for capturing images of feed 114 and the fish 102 relative to a current position and/or orientation of the camera 106. A variety of signals may be used such as, for example, machine learning and computer vision techniques applied to images captured by the camera 106 to detect schools or clusters of feed 114 or fish 102 currently distant from the camera 106 such that a closer location can be determined and the location, tilt, and / or pan of the camera 106 adjusted to capture more suitable images of feed 114 and fish 102. The same techniques may be used to automatically determine that the camera 106 should remain or linger in a current location and /or orientation because the camera 106 are currently in a good position to capture suitable images of feed 114 or fish 102 for pellet monitoring.

[0023] It is also possible to illuminate the fish farming enclosure 104 with ambient lighting in the blue-green spectrum (450nm to 570nm). This may be useful to increase the length of the daily sample period during which useful images of feed 114 and fish 102 in the fish farming enclosure 104 may be captured. For example, depending on the current season (e.g., winter), time of day (e.g., sunrise or sunset), and latitude of the fish farming enclosure 104, only a few hours during the middle of the day may be suitable for capturing useful images without using ambient lighting. This daily period may be extended with ambient lighting.

[0024] The fish farming enclosure 104 may be configured with a wireless cage access point 108 A for transmitting stereo images captured by the camera 106 and other information wirelessly to a barge 110 or other water vessel that is also configured with a wireless access point 108B. The barge 110 may be where on-site fish farming process control, production, and planning activities are conducted.

[0025] The barge 110 may house a computer image processing system 112 that embodies techniques disclosed herein for automatic pellet monitoring. While camera 106 can be communicatively coupled to image processing system 112 wirelessly via wireless access points 108, camera 106 can be communicatively coupled to image processing system 112 by wire such as, for example, via a wired fiber connection between fish farming enclosure 104 and barge 110.

[0026] Some or all of image processing system 112 can be located remotely from camera 106 and connected to camera 106 by wire or coupled wirelessly. However, some or all of the image processing system 112 can be a component of the camera 106. In this implementation, the camera 106 may be configured within an on-board graphics processing unit (GPU) or other on-board processor or processors capable of executing the image processing system 112, or portions thereof.

[0027] In any both implementations where the system 112 is integrated with the camera 106 and implementations where system 112 and camera 106 are remote from each other, output of the image processing system 112 based on processing images captured by camera 106 may be uploaded to the cloud or otherwise over the internet via a cellular data network, satellite data network, or other suitable data network to an online service configured to provide uneaten / wasted pellet count or density estimates or other information derived by the online service therefrom in a web dashboard or the like (e.g., in a web browser, a mobile application, a client application, or other graphical user interface.) System 112 may also be locally coupled to a web dashboard or the like to support on-site fish farming operations and analytics.

[0028] One skilled in the art will recognize from the foregoing description that there is no requirement that image processing system 112 be contained on barge 110 or that barge 110 be present in the aquaculture environment. Instead, camera 106 may contain image processing system 112 or be coupled by wire to a computer system that contains image processing system 112. The computer system may be affixed above the water surface to fish farming enclosure 104 and may include wireless data communications capabilities for transmitting and receiving information over a data network (e.g., the Internet).

[0029] As another alternative, image processing system 112 may be located in the cloud (e.g., on the internet). In this configuration, camera footage captured by camera 106 is uploaded over a network (e.g., the internet) to the system 112 in the cloud for processing there. The barge 110 or other location at the fish farm may have a personal computing device (e.g., a laptop computer) for accessing a web application over the network. The web application may drive a graphical user interface (e.g., web browser web pages) at the personal computing device where the graphical user interface presents results produced by the system 112 such as analytics, reports, etc. generated by the web application based on the automatic pellet monitoring.

[0030] Although not shown in FIG. 1, the barge 110 may include a mechanical feed system that is connected by physical pipes to the fish farming enclosure 104. The feed system may deliver food pellets 114 via the pipes in doses to fish 102 in the fish farming enclosure 104. The feed system may include other components such as a feed blower connected to an air cooler which is connected to an air controller and a feed doser which is connected to a feed selector that is connected to the pipes to the fish farming enclosure 104. Results and outputs of the automatic pellet monitoring performed by the image processing system 112 may be used as input to the feed system for determining the correct amount of feed 114 to dispense in terms of dosage amounts and dosage frequency, thereby improving the operation of the feed system.

[0031] As well as being useful for determining the correct amount of feed 114 to dispense, the results and outputs of the automatic pellet monitoring by the image processing system 112 are also useful for determining more optimal feed formulation. Feed formulation includes determining the ratio of fat, protein, and other nutrients in the food pellets fed to the fish 102. Feed formulation can also include determining the inclusion and amount of feed additives in the food pellets fed to the fish 102. Using the results and outputs of the automatic pellet monitoring, precise feed formulations for the fish in that fish farming enclosure may be determined. It is also possible to have different formulations for the fish in different fish farming enclosures based on the results and outputs. For example, the results and outputs may be used to select feed 114 to dispense in the fish farming enclosure 104 from multiple different silos of pelletized feed. The different silos of feed may have different predetermined nutrient mixes and / or different pellet sizes. The results and outputs of the automatic pellet monitoring may be used to automatically select which silo or silos to dispense feed from.

EXAMPLE PROCESSING FOR AUTOMATIC PELLET MONITORING

[0032] Waste of feed pellets is a serious problem in aquaculture. In part, this is because fish food accounts for a significant portion of a fish farmer’s capital investment. Also, wasted feed pellets pollute the water and can make fish sick such as for example by causing gill damage. As mentioned above, a reason feed is wasted is because of the limited feedback on the consumption of dispensed feed pellets. As a result, fish farmers have a difficult time determining the amount of feed pellets that should be delivered at a given feeding time and at regular feeding intervals.

[0033] The present invention may encompass automatically estimating the amount of feed pellets dispensed in a fish farming enclosure that are uneaten and providing spatio- temporal feedback based on the estimates that help fish farmers determine how much feed should be dispensed in the fish farming enclosure at a next feeding. For example, a web dashboard may be presented in a user interface to the fish farmers where the web dashboard presents information about or derived from the estimated amount of uneaten feed pellets. Generally, feed pellets are dispensed at the water surface of the fish farming enclosure and sink toward the bottom of the fish farming enclosure. The feed pellets may be dispensed in different areas of the fish farming enclosure water surface. The area selected may affect the number of uneaten pellets depending on various conditions such as time of day and the current swimming patterns of the fish in the fish farming enclosure.

[0034] The process 200 begins by positioning 202 the camera 106 at the bottom or substantially at the bottom of the fish farming enclosure 104 Alternatively, the camera 106 can be positioned just under the typical feeding area in the pen. For example, the camera 106 can be positioned at a depth between 5 and 15 meters and the lens of the camera 106 can be tilted up toward the water surface of the fish farming enclosure 104 or substantially laterally depending on the depth of the camera 106 position relative to the feeding area. In this relatively shallow depth position, the camera can capture images of feed pellets dispensed at the water surface of the fish farming enclosure 104 that sink through the area of the feeding fish 102 uneaten before currents in the water carry the uneaten feed outside the frame of the camera 106

[0035] The image processing system 112 may contain a mass storage hard disk or other suitable mass storage non-volatile media for storing the video signal captured 204 by camera 106 as compressed video files (e.g., AVI files).

[0036] The image processing system 112 may contain a pellet detection sub-system that detects 206 the pixels in a captured 202 video image that correspond to feed pellets. The output of the pellet detection sub-system may be one or more bounding boxes for the image.

[0037] The pellet detection sub-system may include a convolutional neural network for analyzing an input image to perform classification on objects (e.g., pellets) detected in the image. The convolutional neural network may be composed of an input layer, an output layer, and multiple hidden layers including convolutional layers that emulate in a computer the response of neurons in animals to visual stimuli.

[0038] Given an image, a convolutional neural network of the pellet detection and image segmentation system may convolve the image by a number of convolution filters. Non-linear transformations (e.g., MaxPool and RELU) may then then be applied to the output of convolution. The convolutional neural network may perform the convolution and the non linear transformations multiple times. The output of the final layer may then be sent to a Softmax layer that gives the probability of the image being of a particular class (e.g., an image of one or more pellets). The pellet detection sub-system may discard the input image from further processing by system 112 if the probability that the image is not an image of one or more pellets is above a threshold (e.g., 80%). [0039] Assuming the input image is not discarded by pellet detection sub-system, the sub-system may use a convolutional neural network to detect 206 feed pellets 114 in the image using bounding boxes. A bounding box may also be associated with a label identifying the bounding box as containing a feed pellet (as opposed to identifying the object as a fish or a bubble or other object).

[0040] The images of pellets on which a convolutional neural network is trained may include images that are representative of images containing pellets from which feed pellets can be identified. Such training data may be generated synthetically using a computer graphics application (e.g., a video gaming engine or a computer graphical animation application (e.g., Blender) in order to generate sufficient training data.

[0041] After obtaining the bounding boxes for pellets detected 206 in a captured 204 video image, the number of feed pellets is counted 208. This can be done simply by counting the number of bounding boxes corresponding to pellets detected 206 in the image. This number (or a metric based on thereon) may be provided to a computer that regulates the controller of a fish food feeding machine to provide precise fish food quantity in the next feeding process.

[0042] In the above process 200, the camera 106 is positioned at or near the bottom of the fish farming enclosure 104 to capture video of feed pellets that sink through fish 102 feeding near the surface and fall toward the bottom on the fish farming enclosure 104 uneaten.

However, it is also possible to position the camera 106 at or near the surface of the fish farming enclosure 104 during feeding. In this case, the system 112 may count the number of feed pellets 114 detected over the feeding period and an estimate of the number or amount of uneaten pellets can be determined based on a difference between a number of feed pellets detected at a beginning of the feeding period and a number of feed pellets remaining that are detected at the end of the feeding period. The start and end of the feeding period may also be automatically determined by system 112 based on the classified activity of fish 102.

[0043] As an alternative, as estimate of the absolute or relative density of uneaten pellets is estimated. This estimate may be based on a known dispensed volume reflecting the amount of feed dispensed such as for example according to software that controls the feed

blower/dispenser and is configured to track the number of pellets of a known average volume or density that are dispensed to the water surface of the fish farming enclosure. The camera footage (i.e., a series of images not necessarily consecutive) captured by camera 106 can be used to estimate the volume of uneaten pellets. The camera footage that is analyzed for this may be footage captured by camera 106 after a certain amount of time since a feeding commenced. The length of this time can be determined in a variety of manners including empirically and automatically based on the current camera footage indicating that feeding by fish 102 has slowed down or ceased. For the camera footage analyzed by system 112 to detect uneaten pellets, a trained convolutional neural network can be used to detect uneaten pellets in the images and based on the real-world area of the image that contains a pellet, the volume of the pellet can be estimated. Determination of the real-world area may be aided by using stereo camera 106 from which depth or disparity map information can be obtained from captured stereo images of uneaten pellets and ultimately the real world cartesian distance of a detected pellet from the camera 106.

[0044] FIG. 3 is a schematic 300 of the feed pellet monitoring process according to some embodiments of the present invention. Various signals about conditions in the fish farming enclosure are detected including underwater video camera footage that is input to an image processing system. The image processing system provides fish biomass estimates and uneaten feed estimates to a feed optimization system that is configured with a feed optimization model. The fish biomass estimates may be the estimated average weight of fish in the fish farming enclosure, or a distribution thereof. Techniques that may be used by image processing system to produce the biomass estimates are described in greater detail below. The uneaten feed estimates may be in terms of a number of uneaten pellets, a volume of uneaten pellets, or a relative or absolute density of the uneaten pellets. Temperature readings and dissolved oxygen sensor readings may also be input to the feed optimization system and incorporated into the model.

[0045] The feed optimization system applies the feed optimization model to the biomass estimates and the uneaten feed estimates and other inputs to determine a feed dosage. A goal of the feed optimization model may be to determine a feed dispense rate (dosage) for the fish farming enclosure over a growing cycle (e.g., 12 to 18 months) that maximizes marginal profit given various factors including the biomass estimates that reflect the growth rate of the fish in the fish farming enclosure during the growing cycle and the uneaten feed estimates which reflect the amount of feed that is going uneaten (wasted) during the growing cycle. Other parameters of the model may include the cost of feed during the growing cycle, the market price of fish during the growing cycle, dissolved oxygen in the fish farming enclosure during the growing cycle, and water temperature in the fish farming enclosure during the growing cycle. The feed dosage may include an amount of feed to dispense at a next feeding of fish in the fish farming enclosure. [0046] In addition, or alternatively, the feed dosage may also include a rate of feed to dispense at the next feeding. In addition, the feed dosage may include a nutrient mix or feed- type ratio of which the feed at the next feeding should be composed. The dosage is input to a feed selector/doser. Based on the input dosage, the feed selector/doser selects an amount of feed to dispense over the water surface of the fish farming enclosure at the next feeding. If the dosage encompasses a nutrient mix, then the feed selector/doser may select the appropriate mix of feed from various feed silos containing different feed compositions. The selected feed in the selected amount is then dispersed over the water surface of the fish farming enclosure via a feed blower. The video and other signals captured during the feeding may be input to the image processing system and the feed optimization system to optimize the next feeding cycle. This cycle may be repeated over time to optimize feeding

automatically, on an ongoing basis.

[0047] The various signals may include video captured by a camera that is input to an image processing system. The various signals may also include water temperature readings from a temperature sensor and dissolved oxygen readings from a dissolved oxygen sensor input to a feed optimization system. The readings may be obtained by the feed optimization system in a time series manner such as a reading every minute or every few minutes.

[0048] Image processing system may apply various computer vision techniques estimate the amount of uneaten feed pellets. The uneaten feed estimates may be generated for each feeding based on video footage captured during the feeding.

COMPUTER VISION TECHNIQUES FOR ESTIMATING THE AVERAGE BIOMASS OF

FISH IN THE FISH FARMING ENCLOSURE

[0049] The camera 106 immersed in the fish farming enclosure 104 may be a stereo- vision camera. One challenge with a stereo-vision system is the accurate estimation of biomass from the two-dimensional stereo camera images. For example, a single lateral dimension of a fish such as fork length not be sufficient to accurately predict the weight of the fish because of variances in fish size and feeding regimes. In some embodiments, to improve the accuracy of the weight prediction, the image processing system 112

automatically detects and captures a set of one or more morphological lateral body dimensions of a fish that are useful for accurately predicting the weight of the fish.

[0050] Once the dimensions are known, the weight may be calculated. The weight may be calculated using a regression equation, for example. The regression equation may be developed using regression analysis on known ground truth weight and dimension relationships of fish. The regression equation may be fish specific reflecting the particular morphological characteristics of the fish. For example, a different regression equation or set of equations may be used for Scottish salmon that is used for Norwegian salmon which are typically heavier than Scottish salmon.

[0051] The weight calculated can be a discrete value representing a predicted weight of the fish or multiple values representing a probability distribution of the predicted weight of the fish.

[0052] Multiple regression equations may be used to calculate multiple weights for a fish and the weights averaged (e.g., a weighted average) to a calculate a final weight prediction for the fish.

[0053] A regression equation used can be a single-factor regression equation or a multi- factor regression equation. A single-factor regression equation can predict the weight within a degree of accuracy using only one of the dimensions. A multi-factor regression equation can predict the weight within a degree of accuracy using multiple of the dimensions.

[0054] Different regression equations or sets of regression equations may be used in different situations depending on the particular morphological lateral body dimensions the system 112 is able to determine from different sets of stereo images. Various different morphological lateral body measurements and regression equations can be used such as those described in the paper by Beddow, T.A. and Ross, L.G.,“Predicting biomass of Atlantic salmon from morphometric lateral measurements,” in the Journal of Fish Biology 49(3): 469- 482. The system 112 is not limited to any particular set of morphological lateral body dimensions or any particular set of regression equations.

[0055] According to the Beddow and Ross paper referenced above, the morphological lateral body dimensions depicted in FIG. 2 may be especially suitable for predicting the weight of an individual fish to in some cases within plus or minus two percent (2%) of the actual weight.

[0056] FIG. 4 depicts truss dimensions 410 and conventional dimensions 430 of a fish.

[0057] The truss dimensions 410 (shown as dashed lines) are established by

corresponding landmark points on the fish and lines between the landmark points. The various landmark points include (1) the posterior most part of the eye, (2) the posterior point of the neurocranium (where scales begin), (3) the origin of the pectoral fin, (4) the origin of the dorsal fin, (5) the origin of the pelvic fin, (6) the posterior end of the dorsal fin, (7) the origin of the anal fin, (8) the origin of the adipose fin, (9) the anterior attachment of the caudal fin to the tail, (10) the posterior attachment of the caudal fin to the tail and (11) the base of the middle caudal rays. [0058] The conventional dimensions 430 include (A) the body depth at origin of the pectoral fin, (B) the body depth at origin of the dorsal fin, (C) the body depth at end of the dorsal fin, (D) the body depth at origin of the anal fin, (E) the least depth of the caudal peduncle, (POL) the post-orbital body length, and (SL) the standard body length.

[0059] The conventional dimensions 430 correspond to various landmark areas of the fish. The head area (SL)-(A) is between the start of the (SL) the standard body length at the anterior end of the fish and (A) the body depth at the origin pectoral fin. The pectoral area (A)-(B) is between the (A) the body depth at the origin pectoral fin and (B) the body depth at the origin of the dorsal fin is. The anterior dorsal area (B)-(C) is between the (B) body depth at the origin of the dorsal fin and (C) the body depth at the end of the dorsal fin. The posterior dorsal area (C)-(D) is between the (C) the body depth at end of the dorsal fin and (D) the body depth at origin of the anal fin. The anal area (D)-(E) is between (D) the body depth at origin of the anal fin and (E) the least depth of the caudal peduncle. The tail area (E)-(SL) is between the (E) the least depth of the caudal peduncle and the end of the (SL) the standard body length at the posterior end of the fish.

[0060] The system 112 may automatically detect and identify one or more or all the landmark points and the landmark areas discussed above for purposes of predicting the weight of the fish. The system 112 may do this even though the yaw, roll, and pitch angle of the fish 102 captured in the stereo images may be greater than zero degrees with respect to a fish that is perfectly lateral with the camera 106. By doing so, the system 112 can estimate biomass from stereo images of freely swimming fish 102 in the fish farming enclosure 104. The system 100 does not require a tube or a channel in the fish farming enclosure 104 through which the fish 102 must swim to accurately estimate biomass.

[0061] FIG. 5 is a schematic diagram showing high-level components of the image processing system 112. The system 112 includes image storage 510, image filtration system 512, object detection and image segmentation system 514, stereo matching and occlusion handling system 516, and weight predictor system 518.

[0062] At a high-level, operation of the system 112 for biomass estimation may be as follows.

[0063] High-resolution monochrome or color rectified stereo images captured by the camera 106 in the fish farming enclosure 104 are transmitted to image processing system 112 via a communication link established between wireless access points 108A and 108B. The images received by the image processing system 112 are stored in image storage 510 (e.g., one or more non-volatile and /or volatile memory devices.) Pairs of rectified stereo images are output (read) from image storage 510 and input to image filtration system 512.

[0064] Image filtration system 512 analyzes the input image pair, or alternatively one of the images in the pair, to make a preliminary, relatively low-processing cost determination of whether the image pair contains suitable images for further processing by system 112. If so, then the pair of images are input stereo matching and occlusion handling system 516.

[0065] Stereo matching and occlusion handling system 516 determines corresponding pairs of pixels in the stereo images and outputs a disparity map for the base image of the stereo pair.

[0066] Object detection and image segmentation system 514 detects fish in the input base image and produces one or more image segmentation masks corresponding to one or more landmark points and / or landmark areas of the fish detected in the base image.

[0067] In some embodiments, weight predictor system 518 obtains three-dimensional (3- D) world coordinates of points corresponding to pixels from the input disparity map output by the stereo matching and occlusion handling system 516. The 3-D world coordinates of points corresponding to pixels corresponding to the landmark point(s) and / or landmark area(s) of the fish are used to calculate one or more truss dimensions and / or one or more conventional dimensions of the fish. The calculated dimensions then used in a calculation to predict the weight of the fish.

[0068] As an alternative, instead of calculating the truss and / or conventional dimensions, the weight predictor 518 may generate a 3-D point cloud object from the 3-D world coordinates. The volume of the fish may be estimated from the 3-D point cloud and then the weight predicting based on the estimated volume and a predetermined density or density distribution of the fish (e.g., a known average density or density distribution of Atlantic Salmon.)

[0069] A threshold number of individual weight predictions determined (e.g., 1,000) for a period of time (e.g., a day) may be averaged. From this, a distribution of the average daily (or other time period) fish biomass in the fish farming enclosure 104 over an extended period of time (e.g., the past month) may be charted (e.g., as a histogram or other visual distribution presented in a web browser) to provide a visualization of whether the total fish biomass in the fish farming enclosure 104 is increasing, decreasing, or staying relatively constant over that extended period according to the aggregated individual estimates. A decreasing biomass may be indicative, for example, of fish that escaped the fish farming enclosure 104, a predator gained access to the fish farming enclosure 104, fish mortality, etc. An increasing biomass may be indicative that the fish are still growing and not yet ready for harvest. While a steady biomass distribution may be indicative that the fish are ready for harvest. Other applications of the individual biomass estimates are possible, and the present invention is not limited to any particular application of the individual biomass estimates.

[0070] It should also be noted that the sampling strategy may vary depending on if the object detection and segmentation system 514 has the capability to uniquely identify fish 102 in the fish farming enclosure 104. For example, one or more features of a fish detected in an image may be used to uniquely identify the fish in the image. Inf that case, a weight prediction may not be determined for the fish if a weight prediction was recently obtained for the fish within a sampling window (e.g., the same day or the same week.) The sampling strategy, in this case, may be to predict the weight of each unique fish identified during the sampling window and avoid re-predicting the weight of a fish for which a weight prediction has already been made during the sampling window. By doing so, a more accurate average biomass may be calculated because double counting is avoided.

[0071] To produce rectified stereo images for input to stereo matching and occlusion handling system 516, camera 106 may be calibrated underwater against a target of a known geometry such as a black and white checkered board of alternative squares that provides good contrast in underwater conditions. Other calibration techniques are possible, and the present invention is not limited to any particular calibration techniques.

[0072] In some embodiments, stereo matching and occlusion handling system 516 detects pairs of corresponding pixels in the rectified stereo pair. From this, stereo matching and occlusion handling system 516 outputs a disparity map for the base image of the stereo pair. The disparity map may be an array of values, one value for each pixel in the base image, where each value numerical represents the disparity with respect to a corresponding pixel in the other (match) image of the stereo pair. A depth map for the base image may be derived from the disparity map using known techniques. For example, the depth of a pixel in the base image may be calculated from a known focal length of the camera 106, a known baseline distance between the camera 106, and the disparity of the pixel in the base image and its corresponding pixel in the match image. The depth map may also be an array of values, one value for each pixel in the image, but where each value numerically represents the distance of the object in the image scene corresponding to the pixel from the camera 106 (e.g., from the center of the lens of the camera that captured the base image.)

[0073] The pair of images may be rectified, either vertically or horizontally. The matching task performed by the stereo matching and occlusion handling system 516 is to match pixels in one of the stereo images (the base image) to corresponding pixels in the other stereo image (the match image) according to a disparity matching algorithm (e.g., basic block matching, semi-global block matching, etc.) The output of the matching task is a disparity map for the stereo image pair. A depth map may be calculated from the disparity map using basic geometrical calculations. The depth map may contain per-pixel information relating to the distance of the surfaces of objects in the base image from a viewpoint (e.g., the center of the lens of the camera that captured the base image.)

[0074] One of the challenges to accurately estimating the weight of a fish in an image is occlusions. This challenge is magnified in the fish biomass estimation context because freely swimming fish 102 in the fish farming enclosure 104 may swim close to each other or in schools such that one fish occludes another as they swim in front of the camera 106. In this case, it can be difficult to stereo match corresponding pixels in the stereo image pair because a portion of a fish that is visible in one of the images may not be visible in the other of the images because of occlusion.

[0075] To address this occlusion problem, a convolutional neural network may be trained to aid the stereo matching task. In particular, the convolution neural network may be used to learn a similarity measure on small image patches of the base and match images. The network may be trained with a binary classification set having examples of similar and dissimilar pairs of patches. The output of the convolutional neural network may be used to initialize the stereo matching cost. Post-processing steps such as, for example, semi-global block matching, may follow to determine pixel or pixel region correspondence between the base and match images. In some embodiments, a depth map is extracted from the input stereo image pair based on stereo matching method described in the paper by J. Zbontar and Y. LeCun,“Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches,” JMLR 17(65): 1-32, 2016.

[0076] Object detection and image segmentation system 514 identifies the pixels in an image that corresponds to a landmark point or a landmark area of the fish. The output of the object detection and image segmentation system 514 may be one or more image

segmentation masks for the image. For example, an image segmentation mask may be a binary mask where the binary mask is an array of values, each value corresponding to one of the pixels in the image, the value being 1 (or 0) to indicate that the corresponding pixel does correspond to a landmark point or a landmark area of the fish, or being the opposite binary value to indicate that the corresponding pixel does not correspond to a landmark point or a landmark area of the fish. Instead of binary values, confidence values may be used in the image segmentation mask. In addition, or as an alternative to representing an image segmentation using raster coordinates, an image segmentation may be represented by vector coordinates that indicate the outline area of a landmark point or a landmark area of the fish.

[0077] The object detection and image segmentation system 514 may output multiple image segmentation masks for the same input image corresponding to different landmark points and different landmark features detected in the image.

[0078] The object detection and image segmentation system 514 may include a convolutional neural network for analyzing an input image to perform classification on objects (fish) detected in the image. The convolutional neural network may be composed of an input layer, an output layer, and multiple hidden layers including convolutional layers that emulate in a computer the response of neurons in animals to visual stimuli.

[0079] Given an image, a convolutional neural network of the object detection and image segmentation system 514 may convolve the image by a number of convolution filters. Non linear transformations (e.g., MaxPool and RELU) may then then be applied to the output of convolution. The convolutional neural network may perform the convolution and the non linear transformations multiple times. The output of the final layer may then be sent to a Softmax layer that gives the probability of the image being of a particular class (e.g., an image of one or more fish). The object detection and image segmentation system 514 may discard the input image from further processing by system 112 if the probability that the image is not an image of one or more fish is above a threshold (e.g., 80%).

[0080] Assuming the input image is not discarded by object detection and image segmentation system 514, may use a convolutional neural network to identify fish in the image via a bounding box. A bounding box may also be associated a label identifying the fish within the bounding box. The convolutional neural network may perform a selective search on the image through pixel windows of different sizes. For each size, the convolutional neural network attempts to group together adjacent pixels by texture, color, or intensity to identify fish in the image. This may result in a set of region proposals which may then be input to a convolutional neural network trained on images of fish with the locations of the fish within the image labeled to determine which regions contain fish and which do not.

[0081] As an alternative, it is also possible to detect fish, landmark points and / or landmark areas of the fish in the image without region proposals using a single stage method (e.g., YOLO, SSD, SSD with recurrent rolling convolution, deconvolutional single shot detector, or the like.) YOLO is described in the paper by J. Redmon, S. Divvala, R. Girschick and A. Farhadi,“You Only Look Once: Unified, Real-Time Object Detection,” arXiv: l506.02640v5, May 9, 2016, the entire contents of which is hereby incorporated by reference. SSD is described in the paper by W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, Cheng-Yang Fu and A. C. Berg,“SSD: Single Shot MultiBox Detector,”

arXiv: l5l2.02325v5, December 19, 2016, the entire contents of which is hereby incorporated by reference. SSD with recurrent rolling convolution is described in the paper by J. Ren, X. Chen, J. Liu, W. Sun, J. Pang, Q. Yan, Y. Tai and L. Xu,“Accurate Single Stage Detector Using Recurrent Rolling Convolution,” arXiv: l704:05776vl, April 19,

2017. Deconvolutional single shot detector is described in the paper by C. Fu, W. Liu, A. Ranga, A. Tyagi, A. Berg,“DSSD: Deconvolutional Single Shot Detector,”

arXiv: l70l.06659vl, January 23, 2017.

[0082] The images of fish on which a convolutional neural network is trained may include images that are representative of images containing fish from which landmark point and landmark areas can be identified. For example, a convolutional neural network may be trained on images of fish that provide a full lateral view of the fish including the head and tail at various different yaw, roll, and pitch angles and at different sizes in the image representing different distances from the camera 106. Such training data may also be generated

synthetically using a computer graphics application (e.g., a video gaming engine or a computer graphical animation application (e.g., Blender) in order to generate sufficient training data. A final layer of a convolutional neural network may include a support vector machine (SVM) that classifies the fish in each valid region. Such classification may include whether a full lateral view of a fish including both the head and the tail of the fish is detected in the region. Object detection and image segmentation system 514 may tighten the bounding box around a region of a fish by running a linear regression on the region. This produces new bounding box coordinates for the fish in the region.

[0083] In addition, or alternatively, a convolutional neural network, a classifier, and abounding box linear regressor may be jointly trained for greater accuracy. This may be accomplished by replacing the SVM classifier with a softmax layer on top of a convolutional neural network to output a classification for a valid region. The linear regressor layer may be added in parallel to the softmax layer to output bounding box coordinates for the valid region.

[0084] To speed up objection detection, it is also possible to leverage the convolutional feature maps used by a region-based detector such as Fast R-CNN, to generate region proposals such as is done with Faster R-CNN. For example, a single stage region-based detector may be used. Fast R-CNN is described in the paper by R. Girshick,“Fast R-CNN,” arXiv: l504.08083v2, September 27, 2015. Faster R-CNN is described in the paper by S. Ren, K. He, R. Girschick and J. Sun,“Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv: l 506.01497n3, January 6, 2016.

[0085] The bounding box of a full lateral view of a fish may be cropped from the image in which is detected. A convolutional neural network-based object detection may then be performed again on the cropped image to detect and obtain bounding boxes or image segmentation masks corresponding to the landmark points and the landmark areas of the fish in the cropped image. For this, a trained convolutional neural network may be used. The convolutional neural network may be trained on tight images of fish (synthetically generated or images captured in situ from a camera) with the locations of the various landmark points and the landmark areas in the images labeled in the tight images.

[0086] Once a suitable cropped image is obtained, the object detection and image segmentation system 514 performs pixel level segmentation on the cropped image. This may be accomplished using a convolutional neural network that runs in parallel with a

convolutional neural network for object detection such as Mask R-CNN. The output of this may be image segmentation masks for the locations of the detected landmark points and landmark areas in the image. Mask R-CNN is described in the paper by K. He, G. Gkioxari,

P. Dollar and R. Girschik,“Mask R-CNN,” arXiv: l703.06870v3, January 24, 2018.

[0087] Weight predictor 518 combines the depth map output by stereo matching and occlusion handling system 516 with the image segmentation mask(s) generated by the object detection and image segmentation system 514. The combining is done to determine the location of the landmark point(s) and / or landmark area(s) in 3-D cartesian space This combining may be accomplished by superimposing the depth map on an image segmentation mask, or vice versa, on a pixel-by-pixel basis (pixelwise.)

[0088] For example, one or more truss dimensions 310 and / or one or more conventional dimensions 330 of the fish may then be calculated using planar geometry. For example, the distance of the truss dimension between landmark point (1) the posterior most part of the eye and landmark point (4) the origin of the dorsal fin may be calculated using the Pythagorean theorem given the x, y, and z coordinates in the 3D space for each landmark point. Weight predictor 518 may then predict the weight of the fish according to a regression equation using the one or more truss and /or one or more conventional dimensions. In this way, weight prediction of the fish 102 in the fish farming enclosure 104 may be made on an individual basis.

EXAMPLE HARDWARE IMPLEMENTING MECHANISM FOR AUTOMATIC PELLET

MONITORING [0089] FIG. 6 is a block diagram that illustrates a computer system 600 with which some embodiments of the present invention may be implemented. Computer system 600 includes a bus 602 or other communication mechanism for communicating information, and a hardware processor 604 coupled with bus 602 for processing information. Hardware processor 604 may be, for example, a general-purpose microprocessor, a central processing unit (CPU) or a core thereof, a graphics processing unit (GPU), or a system on a chip (SoC).

[0090] Computer system 600 also includes a main memory 606, typically implemented by one or more volatile memory devices, coupled to bus 602 for storing information and instructions to be executed by processor 604. Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 604. Computer system 600 may also include a read-only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604. A storage device 610, typically implemented by one or more non-volatile memory devices, is provided and coupled to bus 602 for storing information and instructions.

[0091] Computer system 600 may be coupled via bus 602 to a display 612, such as a liquid crystal display (LCD), a light emitting diode (LED) display, or a cathode ray tube (CRT), for displaying information to a computer user. Display 612 may be combined with a touch sensitive surface to form a touch screen display. The touch sensitive surface is an input device for communicating information including direction information and command selections to processor 604 and for controlling cursor movement on display 612 via touch input directed to the touch sensitive surface such by tactile or haptic contact with the touch sensitive surface by a user’s finger, fingers, or hand or by a hand-held stylus or pen. The touch sensitive surface may be implemented using a variety of different touch detection and location technologies including, for example, resistive, capacitive, surface acoustical wave (SAW) or infrared technology.

[0092] An input device 614, including alphanumeric and other keys, may be coupled to bus 602 for communicating information and command selections to processor 604.

[0093] Another type of user input device may be cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

[0094] Instructions, when stored in non-transitory storage media accessible to processor 604, such as, for example, main memory 606 or storage device 610, render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions. Alternatively, customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or hardware logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine.

[0095] A computer-implemented process may be performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process. Alternatively, hard-wired circuitry may be used in place of or in combination with software instructions to perform the process.

[0096] The term“storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media (e.g., storage device 610) and/or volatile media (e.g., main memory 606). Non-volatile media includes, for example, read-only memory (e.g., EEPROM), flash memory (e.g., solid-state drives), magnetic storage devices (e.g., hard disk drives), and optical discs (e.g., CD-ROM). Volatile media includes, for example, random-access memory devices, dynamic random-access memory devices (e.g., DRAM) and static random-access memory devices (e.g., SRAM).

[0097] Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the circuitry that comprise bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

[0098] Computer system 600 also includes a network interface 618 coupled to bus 602. Network interface 618 provides a two-way data communication coupling to a wired or wireless network link 620 that is connected to a local, cellular or mobile network 622. For example, communication interface 618 may be IEEE 302.3 wired“ethernet” card, an IEEE 302.11 wireless local area network (WLAN) card, a IEEE 302.15 wireless personal area network (e.g., Bluetooth) card or a cellular network (e.g., GSM, LTE, etc.) card to provide a data communication connection to a compatible wired or wireless network. In any such implementation, communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information. [0099] Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through network 622 to a local computer system 624 that is also connected to network 622 or to data communication equipment operated by a network access provider 626 such as, for example, an internet service provider or a cellular network provider. Network access provider 626 in turn provides data communication connectivity to another data communications network 628 (e.g., the internet). Networks 622 and 628 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.

[0100] Computer system 600 can send messages and receive data, including program code, through the networks 622 and 628, network link 620 and communication interface 618. In the internet example, a remote computer system 630 might transmit a requested code for an application program through network 628, network 622 and communication interface 618. The received code may be executed by processor 604 as it is received, and/or stored in storage device 610, or other non-volatile storage for later execution.

[0101] In the foregoing specification, the embodiments have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A system for automatic feed pellet monitoring of feed dispensed to freely swimming fish in a fish farming enclosure in an aquaculture environment, the system comprising:

a digital camera for immersion underwater in the fish farming enclosure and for capturing digital images of feed pellets as the feed pellets sink in the fish farming enclosure; an image processing system of, or operatively coupled to, the digital camera, the image processing system comprising one or more processors, storage media, and one or more programs stored in the storage media and configured for execution by the one or more processors;

a feed optimization system of, or operatively coupled to, the image processing system, the feed optimization system comprising one or more processors, storage media, and one or more programs stored in the storage media and configured for execution by the one or more processors;

wherein the one or more programs of the image processing system are configured to generate an uneaten feed estimate based on digital images captured by the digital camera; wherein the one or more programs of the feed optimization system are configured to: receive the uneaten feed estimate from the image processing system;

determine a feed dosage based on a feed optimization model and the uneaten feed estimate; and

cause a feed doser system to dispense an amount of feed into the fish farming enclosure based on the feed dosage.

2. The system of Claim 1, wherein the one or more programs of the image processing system are configured to generate a biomass estimates based on digital images captured by the digital camera; and wherein the one or more programs of the feed optimization system are configured to: receive the biomass estimate from the image processing system; and determine the feed dosage amount based on the feed optimization model, the biomass estimate, and the uneaten feed estimate.

3. The system of Claim 1, wherein the one or more programs of the image processing system are configured to:

use a convolutional neural network to detect feed pellet objects in digital images captured by the digital camera;

count a number of feed pellet objects that are detected in digital images captured by the digital camera using the convolutional neural network; compute the uneaten feed estimate based on the number of feel pellet objects counted; and

provide the uneaten feed estimate to the feed optimization system.

4. The system of Claim 1, wherein the one or more programs of the feed optimization system are configured to:

receive a temperature reading from a temperature sensor immersed underwater in the fish farming enclosure; and

determine the feed dosage amount based on the feed optimization model, the uneaten feed estimate, and the temperature reading.

5. The system of Claim 1, wherein the one or more programs of the feed optimization system are configured to:

receive a dissolved oxygen reading by a dissolved oxygen sensor immersed underwater in the fish farming enclosure; and

determine the feed dosage amount based on the feed optimization model, the uneaten feed estimate, and the dissolved oxygen reading.

6. The system of Claim 1, wherein the one or more programs of the feed optimization system are configured to:

receive a feed dispense rate from the feed doser system reflecting a rate at which the feed doser dispensed feed into the fish farming enclosure over a period of time; and

determine the feed dosage amount based on the feed optimization model, the uneaten feed estimate, and the feed dispense rate.

7. The system of Claim 1, wherein the uneaten feed estimate is a volume of feed.

8. The system of Claim 1, wherein the one or more programs of the feed optimization system are configured to determine the feed dosage based on a feed optimization model, the uneaten feed estimate, and a known volume of feed dispensed into the fish farming enclosure.

9. The system of Claim 1, wherein the feed dosage is a feed dosage rate.

10. The system of Claim 1, wherein the feed dosage is a feed dosage amount.

11. The system of Claim 1, wherein the one or more programs of the feed optimization system are configured to:

determine the feed dosage for a particular type of feed based on a feed optimization model and the uneaten feed estimates; and

cause the feed doser system to dispense an amount of feed of the particular type of feed into the fish farming enclosure based on the feed dosage.

12. The system of Claim 1, wherein the feed optimization model accounts for monetary cost of feed over a period of time.

13. The system of Claim 1, wherein the feed optimization model accounts for a market price of fish over a period of time.

14. A computer-implemented method for optimizing feeding of freely swimming fish in a fish farming enclosure in an aquaculture environment, the method comprising:

generating an uneaten feed estimate based on digital images of feed pellet objects dispensed in the fish farming enclosure captured by a digital camera immersed underwater in the fish farming enclosure;

determining a feed dosage based on a feed optimization model and the uneaten feed estimate; and

causing a feed doser system to dispense an amount of feed into the fish farming enclosure based on the feed dosage.

15. One or more non-transitory computer-readable media storing one or more computer programs for optimizing feeding of freely swimming fish in a fish farming enclosure in an aquaculture environment, the one or more computer programs comprising instructions configured to perform a method as recited in Claim 14.