WO2020007363A1 - Method and apparatus for identifying the number of targets, and computer-readable storage medium - Google Patents

Method and apparatus for identifying the number of targets, and computer-readable storage medium Download PDF

Info

Publication number
WO2020007363A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
convolution
predicted
target point
Prior art date
Application number
PCT/CN2019/094876
Other languages
English (en)
French (fr)
Inventor
刘明
王怀庆
付靖玲
Original Assignee
京东数字科技控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东数字科技控股有限公司 filed Critical 京东数字科技控股有限公司
Publication of WO2020007363A1 publication Critical patent/WO2020007363A1/zh

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Definitions

  • The present disclosure relates to the field of artificial intelligence technology, and in particular to a method, an apparatus, and a computer-readable storage medium for identifying the number of targets.
  • Ear notching: Generally within 1 to 2 days after a piglet is born, notches are cut at the edge of the pig's ears with ear-notching pliers according to corresponding rules, and the notches are combined into numbers according to those rules to identify different pigs. Within the same pig farm, in the same year, the numbers of pigs of the same breed must not repeat. This method has been used in the industry for many years and is a relatively traditional numbering method.
  • Ear tags: In most cases, ear tags are used for adult breeding pigs kept for reproduction, but ear tags are now gradually being applied to piglets as well. In use, the ear tag head pierces the animal's ear, engages the auxiliary tag, and fixes the ear tag; the ear tag neck remains in the perforation, and the tag face carries the encoded information.
  • The inventors found that in related methods, counting the pigs on a farm mainly relies on manually counting the individuals in different pens. Each marking method also has corresponding defects, as follows.
  • Ear notching: In related technical solutions, different pig farms use different, non-uniform marking standards and specifications. Some numbers are cut incorrectly and cannot be corrected, and the error rate when reading them is also high. The workload generated over the whole process is enormous, and the marking process itself harms the pig.
  • Ear tags: Different pigs need ear tags of different specifications, and tags can fall off while the pigs move, causing individuals to be confused. Significant labor costs are incurred during the marking process.
  • A technical problem solved by the present disclosure is how to quickly, accurately, and efficiently identify the number of targets in a target group.
  • A method for identifying the number of targets includes: processing an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow image features and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted; and identifying the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
  • In some embodiments, the deep learning neural network includes convolution layers and deconvolution layers, where a deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
  • In some embodiments, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer; the first deconvolution layer is configured to deconvolve the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block; the second deconvolution layer is configured to deconvolve the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and the third deconvolution layer is configured to deconvolve the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
  • In some embodiments, the method further includes: setting the number of channels of the first, second, and third deconvolution layers to 256; before the first superposition, processing the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, processing the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, processing the image features output by the third convolution block to 256 channels with a 1×1 convolution.
  • In some embodiments, the deep learning neural network further includes additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
  • In some embodiments, the method further includes: performing a dot operation on each target in the training images; and training the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
  • In some embodiments, Gaussian blur processing is performed after the dot operation on each target in the training images, and the training images together with the Gaussian-blurred training images are used to train the deep learning neural network.
  • In some embodiments, performing a dot operation on each target in the training images includes: placing dots at different body parts of the targets, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
  • In some embodiments, the method further includes: recording the target group to be identified with cameras to obtain a video of the target group; capturing video images from the video; and identifying the number of targets in the video images in real time using the foregoing steps.
  • In some embodiments, there are multiple cameras; the resolution of each camera is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera varies with its installation position.
  • An apparatus for identifying the number of targets includes: an image processing module configured to process an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted; and a number recognition module configured to identify the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
  • In some embodiments, the deep learning neural network includes convolution layers and deconvolution layers, where a deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
  • In some embodiments, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer; the first deconvolution layer is configured to deconvolve the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block; the second deconvolution layer is configured to deconvolve the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and the third deconvolution layer is configured to deconvolve the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
  • In some embodiments, the number of channels of the first, second, and third deconvolution layers is 256; the image processing module is configured to: before the first superposition, process the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, process the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, process the image features output by the third convolution block to 256 channels with a 1×1 convolution.
  • In some embodiments, the deep learning neural network further includes additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
  • In some embodiments, the apparatus further includes a network training module configured to: perform a dot operation on each target in the training images; and train the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
  • In some embodiments, the network training module is configured to perform Gaussian blur processing after the dot operation on each target in the training images, and to train the deep learning neural network with the training images and the Gaussian-blurred training images.
  • In some embodiments, the network training module is configured to place dots at different body parts of the targets in the training images, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
  • In some embodiments, the apparatus further includes: a camera module configured to record the target group to be identified to obtain a video of the target group; and an image interception module configured to capture video images from the video and identify the number of targets in the video images in real time using the foregoing steps.
  • In some embodiments, there are multiple camera modules; the resolution of each camera module is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera module varies with its installation position.
  • An apparatus for identifying the number of targets includes: a memory; and a processor coupled to the memory, the processor being configured to perform the foregoing method for identifying the number of targets based on instructions stored in the memory.
  • A computer-readable storage medium stores computer instructions that, when executed by a processor, implement the foregoing method for identifying the number of targets.
  • The present disclosure adopts artificial intelligence technology and can quickly, accurately, and efficiently identify the number of targets in a target group; applied to the animal husbandry industry, it can provide basic support for intelligent feeding.
  • Figure 1 shows a schematic diagram of different data storage methods.
  • Figure 2 shows a training image after the pigs have been dotted.
  • FIG. 3 shows a schematic structural diagram of a deep learning neural network used in the present disclosure.
  • FIG. 4 shows a schematic diagram of an image to be predicted.
  • FIG. 5 shows a schematic diagram of a target point cloud image.
  • FIG. 6 is a schematic structural diagram of an apparatus for identifying the number of targets in some embodiments of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an apparatus for identifying the number of targets in other embodiments of the present disclosure.
  • Artificial intelligence technology applied across the whole process can greatly improve pig-raising efficiency and save a great deal of labor cost.
  • During feeding, artificial intelligence technology can be used to monitor and record the entire life cycle of each pig, and, through real-time monitoring of each pig's behavioral trajectory, physical condition, and characteristic data, to carry out scientific feeding and disease prevention and control, making ultra-large-scale breeding feasible.
  • The present disclosure addresses the inaccurate and tedious counting of pig numbers on pig farms.
  • A process for identifying the number of pigs using an artificial intelligence algorithm is proposed.
  • The artificial intelligence algorithm, combined with monitoring cameras, counts the pigs in real time, which greatly saves costs and improves feeding efficiency.
  • It also eliminates the over-counting and under-counting that occur when pig numbers are tallied manually, provides basic support for subsequent intelligent feeding, and improves the feasibility of deploying artificial intelligence projects in the livestock industry.
  • The method for identifying the number of targets provided by the present disclosure is described below in stages.
  • The cameras are deployed to monitor the overall operation of the pig farm, to detect problems in time, and to provide the corresponding image data for the artificial intelligence algorithm.
  • Cameras with corresponding parameters are selected and erected according to the image and video quality required at different locations.
  • The main function of this process is to identify the number of pigs. Therefore, cameras generally need to be mounted at roof corners with their angles set so that, as far as possible, all pigs on the farm can be captured, allowing the artificial intelligence algorithm to count the farm's pigs in real time.
  • A camera installed at a roof corner needs sufficiently high resolution and must be able to stream video back in real time for analysis by the algorithm.
  • Cameras mounted lower down need a wide enough angle to bring more pigs into view. Depending on the situation, recording is sometimes needed at night or under different lighting conditions; while keeping costs in mind, cameras with different parameters should be selected for different locations to reduce the overall hardware cost.
  • The cameras deployed in the early stage are the infrastructure that provides the data.
  • The deployed cameras may use cloud storage or local external storage devices.
  • The images recorded by the cameras should be saved according to the chosen storage method, and screenshots should be taken from the cameras' footage in the required data format, so that the captured images show all pigs as far as possible without occlusion.
  • The cameras are also required to capture valid frames at any time, to guarantee clear picture quality, and to store the footage in a format that the downstream algorithm can recognize.
  • The output of the pig count is based on data processed by the algorithm. The collected raw data is entered into the system through an interface or by batch import and awaits processing by the artificial intelligence system. If a camera uses cloud storage, a third-party video stream interface must be accessed to obtain real-time data. If a camera stores to a local external storage device, a local server is needed, or the video stream must be sent back in real time over the network.
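As a minimal illustration of this data-access step, the sketch below (Python with OpenCV) pulls frames from a live stream and hands one frame out of every few dozen to the counting algorithm. The stream URL, the sampling interval, and the choice of OpenCV are assumptions for illustration; the disclosure does not prescribe a specific interface.

```python
import cv2  # OpenCV, used here to read the camera's video stream

# Hypothetical stream address; a real deployment would use the camera
# vendor's or cloud provider's video-stream interface instead.
STREAM_URL = "rtsp://example-farm-camera/stream1"

def capture_frames(url: str, every_n: int = 25):
    """Yield roughly one frame out of every `every_n` frames of a stream."""
    cap = cv2.VideoCapture(url)
    index = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break  # stream ended or a network error occurred
        if index % every_n == 0:
            yield frame  # hand this frame to the counting algorithm
        index += 1
    cap.release()
```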
  • Figure 1 shows a schematic diagram of different data storage methods.
  • Different storage methods should be selected for different situations. Some cloud storage services compress the video during upload, so the original picture quality cannot be recovered; if later stages place high demands on the video picture, a storage method that meets those demands should be chosen. Local storage also has shortcomings: first, the cost rises, and second, the transmission process places high demands on the network, so data can be lost if an accident occurs.
  • The input image data first undergoes preliminary filtering and screening. Only data that meets the specification (mainly the image data format and resolution requirements) is passed along the normal process; otherwise the image data is classified as abnormal, exits the process directly, and no longer occupies system resources.
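A minimal sketch of such a screening step is given below, assuming a file-based workflow; the allowed formats and the resolution floor are illustrative values, since the disclosure only names format and resolution as the two kinds of requirement.

```python
import os
import cv2

ALLOWED_FORMATS = {".jpg", ".jpeg", ".png"}  # assumed format specification
MIN_WIDTH, MIN_HEIGHT = 640, 480             # assumed resolution floor

def is_valid_image(path: str) -> bool:
    """Keep only data meeting the format and resolution specification;
    everything else is treated as abnormal data and exits the process."""
    if os.path.splitext(path)[1].lower() not in ALLOWED_FORMATS:
        return False
    image = cv2.imread(path)
    if image is None:  # unreadable or corrupted file
        return False
    height, width = image.shape[:2]
    return width >= MIN_WIDTH and height >= MIN_HEIGHT
```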
  • The method for training the deep learning neural network specifically includes the following steps:
  • Figure 2 shows a training image after the pigs have been dotted.
  • When dotting, one dot is placed on each pig. Dot operations may be performed at different body parts of the targets in the training images in order to strengthen the generalization ability of the deep learning neural network, so that it learns to recognize different parts of a pig.
  • The deep learning neural network includes convolution layers and deconvolution layers. A deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
  • FIG. 3 shows a schematic structural diagram of a deep learning neural network used in the present disclosure.
  • The deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer Q1, second deconvolution layer Q2, and third deconvolution layer Q3.
  • The deep learning neural network uses the first five convolution blocks of VGG16 and removes VGG16's own fully connected layers.
  • Each convolution block of VGG16 ends with a pooling operation that halves the length and width of its input, so the output sizes of the five convolution blocks are 1/2, 1/4, 1/8, 1/16, and 1/32 of the training image, and the numbers of channels they output are 64, 128, 256, 512, and 512, respectively.
  • The first deconvolution layer performs a deconvolution operation on the image features output by the fifth convolution block P5 (raising the output size to 1/16 of the training image) and superimposes the result onto the image features output by the fourth convolution block P4 (whose output size is 1/16 of the training image); the second deconvolution layer deconvolves the image features of the first superposition (raising the output size to 1/8 of the training image) and superimposes the result onto the image features output by the third convolution block P3 (1/8 of the training image); and the third deconvolution layer deconvolves the image features of the second superposition (raising the output size to 1/4 of the training image) and superimposes the result onto the image features output by the second convolution block P2 (1/4 of the training image).
  • The deep learning neural network may also include additional convolution layers P6, P7, and P8.
  • The additional convolution layers are configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
  • The additional convolution layer P6 outputs image features whose length and width are 1/4 of the training image, with 256 channels; the additional convolution layer P7 outputs image features whose length and width are 1/4 of the training image, with 256 channels; and the additional convolution layer P8 outputs image features whose length and width are 1/4 of the training image, with 1 channel.
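To make the data flow concrete, the following is a minimal PyTorch sketch of one consistent reading of this architecture. The deconvolution kernel size, the 3×3 kernels in P6 through P8, and the extra 1×1 projection of the second convolution block's 128-channel output (which the final superposition needs for the channel counts to match, although the text lists only three 1×1 convolutions) are our assumptions, not details fixed by the disclosure.

```python
import torch.nn as nn
from torchvision.models import vgg16

class PointCloudCounter(nn.Module):
    """One consistent reading of the patent's network: VGG16's first five
    convolution blocks, three 256-channel deconvolution layers, 1x1
    projections before each superposition, and additional convolution
    layers producing a single-channel point cloud map at 1/4 scale."""

    def __init__(self):
        super().__init__()
        f = vgg16(weights=None).features  # VGG16 without its FC layers
        self.block1, self.block2 = f[:5], f[5:10]      # 1/2 (64), 1/4 (128)
        self.block3, self.block4 = f[10:17], f[17:24]  # 1/8 (256), 1/16 (512)
        self.block5 = f[24:31]                         # 1/32 (512)

        # 1x1 convolutions that bring skip features to 256 channels.
        self.lat5 = nn.Conv2d(512, 256, 1)
        self.lat4 = nn.Conv2d(512, 256, 1)
        self.lat3 = nn.Conv2d(256, 256, 1)
        # The text lists only three 1x1 convolutions; this fourth
        # projection (128 -> 256) of block 2's output is our assumption,
        # needed for the third superposition's channels to match.
        self.lat2 = nn.Conv2d(128, 256, 1)

        # Deconvolution layers Q1..Q3, each doubling the spatial size;
        # the 4x4 kernel with stride 2 is an assumed choice.
        def deconv():
            return nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1)
        self.q1, self.q2, self.q3 = deconv(), deconv(), deconv()

        # Additional convolution layers P6/P7 (256 channels) and P8
        # (1 channel) smoothing the third superposition into a gray map.
        self.head = nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),  # P6
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),  # P7
            nn.Conv2d(256, 1, 3, padding=1),                           # P8
        )

    def forward(self, x):
        c1 = self.block1(x)
        c2 = self.block2(c1)
        c3 = self.block3(c2)
        c4 = self.block4(c3)
        c5 = self.block5(c4)
        s1 = self.q1(self.lat5(c5)) + self.lat4(c4)  # first superposition, 1/16
        s2 = self.q2(s1) + self.lat3(c3)             # second superposition, 1/8
        s3 = self.q3(s2) + self.lat2(c2)             # third superposition, 1/4
        return self.head(s3)  # 1-channel point cloud map at 1/4 input size
```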
  • In step S302, the target point representing a given pig can be expanded to the eight surrounding pixels while still lying on that pig, and Gaussian blur processing can be applied after the dot operation on each target in the training images.
  • The deep learning neural network is then trained with the training images and the Gaussian-blurred training images.
  • A specific dotting specification is used for marking, and Gaussian blur processing is performed after the dot operation on the training images, which turns the single pixel representing a target into a small circle of pixels, so that the target point shows up more clearly in the training images and the deep learning neural network can be trained more efficiently. The trained network saves manual annotation cost and can quickly identify the targets in an image.
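A minimal sketch of this label-construction step, assuming a NumPy/SciPy pipeline, is shown below; the blur width `sigma` is an assumed value, and the 3×3 expansion mirrors the "eight surrounding pixels" rule described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_label_map(shape, points, sigma=3.0):
    """Build a training label from dot annotations: each annotated pig is
    one pixel, expanded to its eight surrounding pixels, then Gaussian-
    blurred into a small circle of pixels."""
    height, width = shape
    label = np.zeros((height, width), dtype=np.float32)
    for x, y in points:  # one (x, y) dot per pig
        y0, y1 = max(y - 1, 0), min(y + 2, height)
        x0, x1 = max(x - 1, 0), min(x + 2, width)
        label[y0:y1, x0:x1] = 1.0  # 3x3 patch: the dot plus 8 neighbours
    return gaussian_filter(label, sigma=sigma)  # sigma is an assumed width
```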
  • The trained deep learning neural network model processes the input image data, analyzes the pigs in the image, and performs dot processing on them; each pig receives one dot, and finally the dots in the image are counted to compute the number of pigs on the farm. As shown in the figures below, the pig count is obtained by individually identifying each pig.
  • The method for identifying the number of targets includes the following steps:
  • A deep learning neural network is used to process the image to be predicted to obtain a target point cloud image having both shallow and deep image features of the image to be predicted; each target point in the target point cloud image represents one target in the image to be predicted.
  • FIG. 4 shows a schematic diagram of an image to be predicted, and FIG. 5 shows a schematic diagram of the corresponding target point cloud image.
  • Integrating and summing the target points in the target point cloud image yields the number of pigs in the image to be predicted.
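As one hedged reading of this counting step, the sketch below thresholds the network's grayscale point cloud map and counts the resulting blobs, one blob per target point; the threshold is an assumed value, and dividing the map's integrated mass by a known per-target mass would be an equivalent "integrate and sum" variant.

```python
import numpy as np
from scipy import ndimage

def count_targets(point_map: np.ndarray, threshold: float = 0.5) -> int:
    """Count targets in the predicted grayscale point cloud map by
    thresholding it and counting connected blobs, one blob per target
    point; the threshold value is an assumption."""
    binary = point_map > threshold
    _, num_targets = ndimage.label(binary)
    return num_targets
```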
  • The data is packaged and sorted, the pig count is output, and the number of pigs on the farm is displayed in real time on, for example, an intelligent pig farm app client or another terminal.
  • The terminal's display interface is embedded in the intelligent pig farm management software and presented to users in as intuitive a form as possible, so that staff can clearly and intuitively check the farm's real-time status, assisting subsequent production.
  • The above embodiment uses artificial intelligence technology to automatically, quickly, accurately, and efficiently identify the number of targets in a target group. It eliminates over-counting and under-counting when pigs are tallied on farms, is intuitive and visual, and can save substantial labor and time costs; applied to the livestock industry, it can provide basic support for intelligent unmanned farms.
  • Meanwhile, cameras with different parameters are chosen according to the image requirements of different locations, which maximizes the utilization of each camera and reduces costs as much as possible while still meeting requirements.
  • Preliminary screening that filters out invalid data further improves resource utilization and saves costs.
  • The data access scheme is designed so that valid data can be captured whenever needed, and different schemes can be selected according to the actual situation of each pig farm, enabling personalized customization.
  • FIG. 6 is a schematic structural diagram of an apparatus for identifying the number of targets in some embodiments of the present disclosure. As shown in FIG. 6, the device 60 for identifying the number of targets in this embodiment includes:
  • An image processing module 603 is configured to process an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted;
  • a number recognition module 604 is configured to identify the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
  • In some embodiments, the deep learning neural network includes convolution layers and deconvolution layers, where a deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
  • In some embodiments, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer; the first deconvolution layer is configured to deconvolve the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block; the second deconvolution layer is configured to deconvolve the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and the third deconvolution layer is configured to deconvolve the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
  • In some embodiments, the number of channels of the first, second, and third deconvolution layers is 256; the image processing module is configured to: before the first superposition, process the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, process the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, process the image features output by the third convolution block to 256 channels with a 1×1 convolution.
  • In some embodiments, the deep learning neural network further includes additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
  • In some embodiments, the device 60 further includes a network training module 602 configured to: perform a dot operation on each target in the training images; and train the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
  • In some embodiments, the network training module 602 is configured to perform Gaussian blur processing after the dot operation on each target in the training images, and to train the deep learning neural network with the training images and the Gaussian-blurred training images.
  • In some embodiments, the network training module 602 is configured to place dots at different body parts of the targets in the training images, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
  • In some embodiments, the device 60 further includes: a camera module 600 configured to record the target group to be identified to obtain a video of the target group; and an image interception module 601 configured to capture video images from the video and identify the number of targets in the video images in real time using the foregoing steps.
  • In some embodiments, there are multiple camera modules; the resolution of each camera module is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera module varies with its installation position.
  • The above embodiment uses artificial intelligence technology to automatically, quickly, accurately, and efficiently identify the number of targets in a target group. It eliminates over-counting and under-counting when pigs are tallied on farms, is intuitive and visual, and can save substantial labor and time costs; applied to the livestock industry, it can provide basic support for intelligent unmanned farms.
  • FIG. 7 is a schematic structural diagram of an apparatus for identifying the number of targets in other embodiments of the present disclosure.
  • The device 70 for identifying the number of targets in this embodiment includes a memory 710 and a processor 720 coupled to the memory 710.
  • The processor 720 is configured to execute the method for identifying the number of targets in any of the foregoing embodiments based on instructions stored in the memory 710.
  • The memory 710 may include, for example, system memory, a fixed non-volatile storage medium, and the like.
  • The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), and other programs.
  • The device 70 for identifying the number of targets may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, and 750, as well as the memory 710 and the processor 720, may be connected through a bus 760, for example.
  • The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen.
  • The network interface 740 provides a connection interface for various networked devices.
  • The storage interface 750 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
  • The present disclosure also includes a computer-readable storage medium having computer instructions stored thereon that, when executed by a processor, implement the method for identifying the number of targets in any of the foregoing embodiments.
  • The embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method and an apparatus for identifying the number of targets, and a computer-readable storage medium, and relates to the field of artificial intelligence technology. The method for identifying the number of targets includes: processing an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow image features and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted; and identifying the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted. The present disclosure adopts artificial intelligence technology, can quickly, accurately, and efficiently identify the number of targets in a target group, and, applied to the animal husbandry industry, can provide basic support for intelligent feeding.

Description

Method and apparatus for identifying the number of targets, and computer-readable storage medium
This application is based on, and claims priority to, the application with CN application number 201810733440.6 filed on July 6, 2018, the disclosure of which is hereby incorporated into this application in its entirety.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to a method, an apparatus, and a computer-readable storage medium for identifying the number of targets.
Background
Agriculture is China's primary industry; China has been an agrarian society since ancient times, and its fertile land has nurtured the great Chinese nation. Agriculture is also the foundation of the national economy and bears on our daily diet. As an important part of agriculture, the pig-raising industry plays an important role in guaranteeing a safe supply of meat. China's pig industry is currently shifting from traditional to modern pig farming. However, existing pig farm management remains rather crude, lacking the involvement of technical personnel from farm construction through later feeding management. Many small backyard farmers have very poor resilience to risk and cannot guarantee stable profits. During feeding, inadequate disinfection and precautionary measures make the transmission of bacteria and disease through frequent contact between keepers and pigs another major hazard.
Pig farms identify individual pigs by the following related methods.
Ear notching: Generally within 1 to 2 days after a piglet is born, notches are cut at the edge of the pig's ears with ear-notching pliers according to corresponding rules, and the notches are combined into numbers according to those rules to identify different pigs. Within the same pig farm, in the same year, the numbers of pigs of the same breed must not repeat. This method has been used in the industry for many years and is a relatively traditional numbering method.
Tattooing: Tattoo pliers are used to tattoo pigs so that individual pigs can be distinguished.
Ear tags: In most cases, ear tags are used for adult breeding pigs kept for reproduction, but ear tags are now gradually being applied to piglets as well. In use, the ear tag head pierces the animal's ear, engages the auxiliary tag, and fixes the ear tag; the ear tag neck remains in the perforation, and the tag face carries the encoded information.
Summary
The inventors found that in related methods, counting the pigs on a farm mainly relies on manually counting the individuals in different pens, and each marking method has corresponding defects, as follows.
Ear notching: In related technical solutions, different pig farms use different, non-uniform marking standards and specifications. Some numbers are cut incorrectly and cannot be corrected, and the error rate when reading them is also high. The workload generated over the whole process is enormous, and the marking process itself harms the pig.
Tattooing: It is rarely used domestically; the procedure is cumbersome and the cost is high.
Ear tags: Different pigs need ear tags of different specifications, and tags can fall off while the pigs move, causing individuals to be confused. Significant labor costs are incurred during the marking process.
It can be seen that in the related technical solutions, manually counting individual pigs incurs heavy labor costs and produces miscounts, undercounts, and double counts, making the individual statistics inaccurate.
A technical problem solved by the present disclosure is how to quickly, accurately, and efficiently identify the number of targets in a target group.
According to one aspect of the embodiments of the present disclosure, a method for identifying the number of targets is provided, including: processing an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow image features and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted; and identifying the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
In some embodiments, the deep learning neural network includes convolution layers and deconvolution layers, where a deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
In some embodiments, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer; the first deconvolution layer is configured to deconvolve the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block; the second deconvolution layer is configured to deconvolve the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and the third deconvolution layer is configured to deconvolve the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
In some embodiments, the method further includes: setting the number of channels of the first, second, and third deconvolution layers to 256; before the first superposition, processing the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, processing the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, processing the image features output by the third convolution block to 256 channels with a 1×1 convolution.
In some embodiments, the deep learning neural network further includes additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
In some embodiments, the method further includes: performing a dot operation on each target in the training images; and training the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
In some embodiments, Gaussian blur processing is performed after the dot operation on each target in the training images, and the training images together with the Gaussian-blurred training images are used to train the deep learning neural network.
In some embodiments, performing a dot operation on each target in the training images includes: placing dots at different body parts of the targets, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
In some embodiments, the method further includes: recording the target group to be identified with cameras to obtain a video of the target group; capturing video images from the video; and identifying the number of targets in the video images in real time using the foregoing steps.
In some embodiments, there are multiple cameras; the resolution of each camera is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera varies with its installation position.
According to another aspect of the embodiments of the present disclosure, an apparatus for identifying the number of targets is provided, including: an image processing module configured to process an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow image features and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted; and a number recognition module configured to identify the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
In some embodiments, the deep learning neural network includes convolution layers and deconvolution layers, where a deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
In some embodiments, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer; the first deconvolution layer is configured to deconvolve the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block; the second deconvolution layer is configured to deconvolve the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and the third deconvolution layer is configured to deconvolve the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
In some embodiments, the number of channels of the first, second, and third deconvolution layers is 256; the image processing module is configured to: before the first superposition, process the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, process the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, process the image features output by the third convolution block to 256 channels with a 1×1 convolution.
In some embodiments, the deep learning neural network further includes additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
In some embodiments, the apparatus further includes a network training module configured to: perform a dot operation on each target in the training images; and train the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
In some embodiments, the network training module is configured to: perform Gaussian blur processing after the dot operation on each target in the training images; and train the deep learning neural network with the training images and the Gaussian-blurred training images.
In some embodiments, the network training module is configured to: place dots at different body parts of the targets in the training images, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
In some embodiments, the apparatus further includes: a camera module configured to record the target group to be identified to obtain a video of the target group; and an image interception module configured to capture video images from the video and identify the number of targets in the video images in real time using the foregoing steps.
In some embodiments, there are multiple camera modules; the resolution of each camera module is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera module varies with its installation position.
According to yet another aspect of the embodiments of the present disclosure, an apparatus for identifying the number of targets is provided, including: a memory; and a processor coupled to the memory, the processor being configured to perform the foregoing method for identifying the number of targets based on instructions stored in the memory.
According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, where the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the foregoing method for identifying the number of targets.
The present disclosure adopts artificial intelligence technology, can quickly, accurately, and efficiently identify the number of targets in a target group, and, applied to the animal husbandry industry, can provide basic support for intelligent feeding.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present disclosure and form a part of this application. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not unduly limit it. In the drawings:
Figure 1 shows a schematic diagram of different data storage methods.
Figure 2 shows a training image after the pigs have been dotted.
Figure 3 shows a schematic structural diagram of the deep learning neural network used in the present disclosure.
Figure 4 shows a schematic diagram of an image to be predicted.
Figure 5 shows a schematic diagram of a target point cloud image.
Figure 6 shows a schematic structural diagram of an apparatus for identifying the number of targets in some embodiments of the present disclosure.
Figure 7 shows a schematic structural diagram of an apparatus for identifying the number of targets in other embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will now be described clearly and completely with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is in fact merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
As the core driving force of a new round of industrial transformation, artificial intelligence technology is releasing the enormous energy accumulated by previous technological revolutions and industrial transformations, creating a powerful new engine, reconstructing every link of economic activity from production and distribution to exchange and consumption, generating new intelligent demands in every field from the macro to the micro level, giving rise to new technologies, products, industries, business forms, and models, triggering major changes in the economic structure, profoundly changing the way humans produce, live, and think, and achieving an overall leap in social productivity.
Artificial intelligence technology applied across the whole process can greatly improve pig-raising efficiency and save a great deal of labor cost. During feeding, artificial intelligence technology can be used to monitor and record the entire life cycle of each pig, and, through real-time monitoring of each pig's behavioral trajectory, physical condition, and characteristic data, to carry out scientific feeding and disease prevention and control, making ultra-large-scale breeding feasible.
The present disclosure addresses the inaccurate and tedious counting of pig numbers on pig farms and proposes a process for identifying the number of pigs using an artificial intelligence algorithm. The artificial intelligence algorithm, combined with monitoring cameras, counts the pigs in real time, which greatly saves costs and improves feeding efficiency. It also eliminates the over-counting and under-counting that occur when pig numbers are tallied manually, provides basic support for subsequent intelligent feeding, and improves the feasibility of deploying artificial intelligence projects in the livestock industry. The method for identifying the number of targets provided by the present disclosure is described below in stages.
(1) Deploying cameras
The cameras are deployed to monitor the overall operation of the pig farm, to detect problems in time, and to provide the corresponding image data for the artificial intelligence algorithm. Cameras with corresponding parameters are selected and erected according to the image and video quality required at different locations. The main function of this process is to identify the number of pigs; therefore, cameras generally need to be mounted at roof corners with their angles set so that, as far as possible, all pigs on the farm can be captured later, allowing the artificial intelligence algorithm to count the farm's pigs in real time. Generally speaking, a camera installed at a roof corner needs sufficiently high resolution and must be able to stream video back in real time for analysis by the algorithm. Cameras mounted lower down need a wide enough angle to bring more pigs into view. Depending on the situation, recording is sometimes needed at night or under different lighting conditions; while keeping costs in mind, cameras with different parameters should be selected for different locations to reduce the overall hardware cost.
(2) Collecting data
The cameras deployed in the early stage are the infrastructure that provides the data; they may use cloud storage or local external storage devices. The images they record should be saved according to the chosen storage method, and screenshots should be taken from the cameras' footage in the required data format, so that the captured images show all pigs as far as possible without occlusion. The cameras are also required to capture valid frames at any time, to guarantee clear picture quality, and to store the footage in a format that the downstream algorithm can recognize.
(3) Data access
The output of the pig count is based on data processed by the algorithm. The collected raw data is entered into the system through an interface or by batch import and awaits processing by the artificial intelligence system. If a camera uses cloud storage, a third-party video stream interface must be accessed to obtain real-time data. If a camera stores to a local external storage device, a local server is needed, or the video stream must be sent back in real time over the network.
Figure 1 shows a schematic diagram of different data storage methods. In practice, different storage methods should be selected for different situations. Some cloud storage services compress the video during upload, so the original picture quality cannot be recovered; if later stages place high demands on the video picture, a storage method that meets those demands should be chosen. Local storage also has shortcomings: first, the cost rises, and second, the transmission process places high demands on the network, so data can be lost if an accident occurs.
(4) Data validity check
After the data lands in the database, its authenticity and validity are verified. A simple algorithm built from the corresponding specifications first performs preliminary filtering and screening of the input image data. Only data that meets the specification (mainly the image data format and resolution requirements) is passed along the normal process; otherwise the image data is classified as abnormal, exits the process directly, and no longer occupies system resources.
(5) Model training
In the early stage of building the algorithm model, the machine-pre-labeled image data needs to be outsourced to a data annotation platform for manual annotation; the pigs in the image data are the targets to be annotated.
The method for training the deep learning neural network specifically includes the following steps:
(1) Perform a dot operation on each target in the training images.
Figure 2 shows a training image after the pigs have been dotted. When dotting, one dot is placed on each pig. Dot operations may be performed at different body parts of the targets in the training images in order to strengthen the generalization ability of the deep learning neural network, so that it learns to recognize different parts of a pig.
(2) Train the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
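The disclosure does not spell out the optimization details, but as a hedged sketch, training could pair each image with its Gaussian-blurred dot label map and regress the network's output against it; the loss function, learning rate, and Adam optimizer below are our assumptions.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-4, device="cpu"):
    """Hypothetical training loop: regress the predicted point cloud map
    against the Gaussian-blurred dot labels (downsampled beforehand to
    the network's 1/4-scale output resolution)."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()  # assumed pixel-wise regression loss
    for _ in range(epochs):
        for images, label_maps in loader:  # label maps built from the dots
            images = images.to(device)
            label_maps = label_maps.to(device)
            pred = model(images)  # single-channel map at 1/4 scale
            loss = criterion(pred, label_maps)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```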
The deep learning neural network includes convolution layers and deconvolution layers. A deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
Figure 3 shows a schematic structural diagram of the deep learning neural network used in the present disclosure. As shown in Figure 3, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer Q1, second deconvolution layer Q2, and third deconvolution layer Q3.
The deep learning neural network uses the first five convolution blocks of VGG16 and removes VGG16's own fully connected layers. Each convolution block of VGG16 ends with a pooling operation that halves the length and width of its input, so the output sizes of the five convolution blocks are 1/2, 1/4, 1/8, 1/16, and 1/32 of the training image, and the numbers of channels they output are 64, 128, 256, 512, and 512, respectively.
The first deconvolution layer performs a deconvolution operation on the image features output by the fifth convolution block P5 (raising the output size to 1/16 of the training image) and superimposes the result onto the image features output by the fourth convolution block P4 (whose output size is 1/16 of the training image); the second deconvolution layer deconvolves the image features of the first superposition (raising the output size to 1/8 of the training image) and superimposes the result onto the image features output by the third convolution block P3 (1/8 of the training image); and the third deconvolution layer deconvolves the image features of the second superposition (raising the output size to 1/4 of the training image) and superimposes the result onto the image features output by the second convolution block P2 (1/4 of the training image). The number of channels of the first, second, and third deconvolution layers is set to 256; before the first superposition, a 1×1 convolution processes the image features output by the fifth convolution block to 256 channels; before the second superposition, a 1×1 convolution processes the image features output by the fourth convolution block to 256 channels; and before the third superposition, a 1×1 convolution processes the image features output by the third convolution block to 256 channels, so that the superposition of image features can be carried out.
The deep learning neural network may also include additional convolution layers P6, P7, and P8, configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image. The additional convolution layer P6 outputs image features whose length and width are 1/4 of the training image, with 256 channels; the additional convolution layer P7 outputs image features whose length and width are 1/4 of the training image, with 256 channels; and the additional convolution layer P8 outputs image features whose length and width are 1/4 of the training image, with 1 channel.
In some embodiments, in step S302, the target point representing a given pig can be expanded to the eight surrounding pixels while still lying on that pig, and Gaussian blur processing can be applied after the dot operation on each target in the training images. In step S304, the deep learning neural network is trained with the training images and the Gaussian-blurred training images.
In the above embodiments, a specific dotting specification is used for marking, and Gaussian blur processing is performed after the dot operation on the training images, which turns the single pixel representing a target into a small circle of pixels, so that the target point shows up more clearly in the training images and the deep learning neural network can be trained more efficiently. The trained network saves manual annotation cost and can quickly identify the targets in an image.
(6) Artificial intelligence recognition
After the validity check is completed, the system begins artificial intelligence annotation of this data. The trained deep learning neural network model processes the input image data, analyzes the pigs in the image, and performs dot processing on them; each pig receives one dot, and finally the dots in the image are counted to compute the number of pigs on the farm. As shown in the figures below, the pig count is obtained by individually identifying each pig. The method for identifying the number of targets specifically includes the following steps:
(1) A deep learning neural network processes the image to be predicted to obtain a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted. Figure 4 shows a schematic diagram of an image to be predicted, and Figure 5 shows a schematic diagram of the corresponding target point cloud image.
(2) The number of point clouds in the target point cloud image is identified to obtain the number of targets in the image to be predicted.
For example, integrating and summing the target points in the target point cloud image yields the number of pigs in the image to be predicted.
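Tying the hypothetical sketches above together, an end-to-end usage example might look as follows; `PointCloudCounter`, `capture_frames`, `STREAM_URL`, `count_targets`, and the checkpoint file name are all assumed names from those sketches, not artifacts of the disclosure.

```python
import torch

# Assumed names from the earlier sketches: PointCloudCounter,
# capture_frames, STREAM_URL, count_targets.
model = PointCloudCounter()
model.load_state_dict(torch.load("counter.pt"))  # assumed checkpoint file
model.eval()

frame = next(capture_frames(STREAM_URL))  # H x W x 3 BGR frame
# Minimal preprocessing; real code would resize so that both sides are
# divisible by 32 and apply the normalization used during training.
x = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
with torch.no_grad():
    point_map = model(x.unsqueeze(0))[0, 0].numpy()
print("pigs on farm:", count_targets(point_map))
```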
(7) Result presentation
The data is packaged and sorted, the pig count is output, and the number of pigs on the farm is displayed in real time on, for example, an intelligent pig farm app client or another terminal. The terminal's display interface is embedded in the intelligent pig farm management software and presented to users in as intuitive a form as possible, so that staff can clearly and intuitively check the farm's real-time status, assisting subsequent production.
The above embodiments use artificial intelligence technology to automatically, quickly, accurately, and efficiently identify the number of targets in a target group. They eliminate over-counting and under-counting when pigs are tallied on farms, are intuitive and visual, and can save substantial labor and time costs; applied to the livestock industry, they can provide basic support for intelligent unmanned farms.
Meanwhile, cameras with different parameters are chosen according to the image requirements of different locations, which maximizes the utilization of each camera and reduces costs as much as possible while still meeting requirements. Preliminary screening that filters out invalid data further improves resource utilization and saves costs. In addition, the data access scheme is designed so that valid data can be captured whenever needed, and different schemes can be selected according to the actual situation of each pig farm, enabling personalized customization.
An apparatus for identifying the number of targets according to some embodiments of the present disclosure is described below with reference to Figure 6.
Figure 6 shows a schematic structural diagram of an apparatus for identifying the number of targets in some embodiments of the present disclosure. As shown in Figure 6, the apparatus 60 for identifying the number of targets in this embodiment includes:
an image processing module 603, configured to process an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted; and
a number recognition module 604, configured to identify the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
In some embodiments, the deep learning neural network includes convolution layers and deconvolution layers, where a deconvolution layer is configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
In some embodiments, the deep learning neural network includes the first five convolution blocks of the VGG16 network model plus an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer; the first deconvolution layer is configured to deconvolve the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block; the second deconvolution layer is configured to deconvolve the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and the third deconvolution layer is configured to deconvolve the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
In some embodiments, the number of channels of the first, second, and third deconvolution layers is 256; the image processing module is configured to: before the first superposition, process the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, process the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, process the image features output by the third convolution block to 256 channels with a 1×1 convolution.
In some embodiments, the deep learning neural network further includes additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
In some embodiments, the apparatus 60 further includes a network training module 602 configured to: perform a dot operation on each target in the training images; and train the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, where each target point in the target point cloud image represents one target in the image to be predicted.
In some embodiments, the network training module 602 is configured to: perform Gaussian blur processing after the dot operation on each target in the training images; and train the deep learning neural network with the training images and the Gaussian-blurred training images.
In some embodiments, the network training module 602 is configured to: place dots at different body parts of the targets in the training images, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
In some embodiments, the apparatus 60 further includes: a camera module 600 configured to record the target group to be identified to obtain a video of the target group; and an image interception module 601 configured to capture video images from the video and identify the number of targets in the video images in real time using the foregoing steps.
In some embodiments, there are multiple camera modules; the resolution of each camera module is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera module varies with its installation position.
The above embodiments use artificial intelligence technology to automatically, quickly, accurately, and efficiently identify the number of targets in a target group. They eliminate over-counting and under-counting when pigs are tallied on farms, are intuitive and visual, and can save substantial labor and time costs; applied to the livestock industry, they can provide basic support for intelligent unmanned farms.
Figure 7 shows a schematic structural diagram of an apparatus for identifying the number of targets in other embodiments of the present disclosure. As shown in Figure 7, the apparatus 70 for identifying the number of targets in this embodiment includes a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute the method for identifying the number of targets in any of the foregoing embodiments based on instructions stored in the memory 710. The memory 710 may include, for example, system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), and other programs.
The apparatus 70 for identifying the number of targets may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, and 750, as well as the memory 710 and the processor 720, may be connected, for example, through a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 740 provides a connection interface for various networked devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
The present disclosure also includes a computer-readable storage medium having computer instructions stored thereon that, when executed by a processor, implement the method for identifying the number of targets in any of the foregoing embodiments.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above are only preferred embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within its protection scope.

Claims (22)

  1. A method for identifying the number of targets, comprising:
    processing an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow image features and deep image features of the image to be predicted, wherein each target point in the target point cloud image represents one target in the image to be predicted; and
    identifying the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
  2. The method according to claim 1, wherein the deep learning neural network comprises convolution layers and deconvolution layers, a deconvolution layer being configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
  3. The method according to claim 1, wherein
    the deep learning neural network comprises the first five convolution blocks of the VGG16 network model and an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer;
    the first deconvolution layer is configured to perform a deconvolution operation on the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block;
    the second deconvolution layer is configured to perform a deconvolution operation on the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and
    the third deconvolution layer is configured to perform a deconvolution operation on the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
  4. The method according to claim 3, further comprising:
    setting the number of channels of the first deconvolution layer, the second deconvolution layer, and the third deconvolution layer to 256;
    before the first superposition, processing the image features output by the fifth convolution block to 256 channels with a 1×1 convolution;
    before the second superposition, processing the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and
    before the third superposition, processing the image features output by the third convolution block to 256 channels with a 1×1 convolution.
  5. The method according to claim 3, wherein the deep learning neural network further comprises additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
  6. The method according to claim 1, further comprising:
    performing a dot operation on each target in training images; and
    training the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, wherein each target point in the target point cloud image represents one target in the image to be predicted.
  7. The method according to claim 6, wherein
    Gaussian blur processing is performed after the dot operation on each target in the training images; and
    the deep learning neural network is trained with the training images and the Gaussian-blurred training images.
  8. The method according to claim 7, wherein performing a dot operation on each target in the training images comprises:
    placing dots at different body parts of the targets in the training images, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
  9. The method according to claim 1, further comprising:
    recording a target group to be identified with cameras to obtain a video of the target group; and
    capturing video images from the video and identifying the number of targets in the video images in real time using the steps of claim 1.
  10. The method according to claim 9, wherein there are multiple cameras, the resolution of each camera is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera varies with its installation position.
  11. An apparatus for identifying the number of targets, comprising:
    an image processing module configured to process an image to be predicted using a deep learning neural network to obtain a target point cloud image having both shallow image features and deep image features of the image to be predicted, wherein each target point in the target point cloud image represents one target in the image to be predicted; and
    a number recognition module configured to identify the number of point clouds in the target point cloud image to obtain the number of targets in the image to be predicted.
  12. The apparatus according to claim 11, wherein the deep learning neural network comprises convolution layers and deconvolution layers, a deconvolution layer being configured to perform a deconvolution operation on the image features output by one convolution layer and superimpose the result onto the image features output by another convolution layer, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted.
  13. The apparatus according to claim 11, wherein
    the deep learning neural network comprises the first five convolution blocks of the VGG16 network model and an additional first deconvolution layer, second deconvolution layer, and third deconvolution layer;
    the first deconvolution layer is configured to perform a deconvolution operation on the image features output by the fifth convolution block and superimpose the result onto the image features output by the fourth convolution block;
    the second deconvolution layer is configured to perform a deconvolution operation on the image features of the first superposition and superimpose the result onto the image features output by the third convolution block; and
    the third deconvolution layer is configured to perform a deconvolution operation on the image features of the second superposition and superimpose the result onto the image features output by the second convolution block.
  14. The apparatus according to claim 13, wherein the number of channels of the first deconvolution layer, the second deconvolution layer, and the third deconvolution layer is 256; and
    the image processing module is configured to: before the first superposition, process the image features output by the fifth convolution block to 256 channels with a 1×1 convolution; before the second superposition, process the image features output by the fourth convolution block to 256 channels with a 1×1 convolution; and before the third superposition, process the image features output by the third convolution block to 256 channels with a 1×1 convolution.
  15. The apparatus according to claim 13, wherein the deep learning neural network further comprises additional convolution layers configured to process the image features of the third superposition to obtain a smooth grayscale target point cloud image.
  16. The apparatus according to claim 11, further comprising a network training module configured to:
    perform a dot operation on each target in training images; and
    train the deep learning neural network with the training images and the dotted training images, so that the deep learning neural network can process the image to be predicted into a target point cloud image having both shallow and deep image features of the image to be predicted, wherein each target point in the target point cloud image represents one target in the image to be predicted.
  17. The apparatus according to claim 16, wherein the network training module is configured to:
    perform Gaussian blur processing after the dot operation on each target in the training images; and
    train the deep learning neural network with the training images and the Gaussian-blurred training images.
  18. The apparatus according to claim 17, wherein the network training module is configured to:
    place dots at different body parts of the targets in the training images, such that a target point representing a first target, after being expanded to the eight surrounding pixels, still lies on the first target.
  19. The apparatus according to claim 11, further comprising:
    a camera module configured to record a target group to be identified to obtain a video of the target group; and
    an image interception module configured to capture video images from the video and identify the number of targets in the video images in real time using the steps of claim 1.
  20. The apparatus according to claim 19, wherein there are multiple camera modules, the resolution of each camera module is negatively correlated with the light intensity of its recording environment, and the wide-angle of each camera module varies with its installation position.
  21. An apparatus for identifying the number of targets, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to perform the method for identifying the number of targets according to any one of claims 1 to 10 based on instructions stored in the memory.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the method for identifying the number of targets according to any one of claims 1 to 10.
PCT/CN2019/094876 2018-07-06 2019-07-05 Method and apparatus for identifying the number of targets, and computer-readable storage medium WO2020007363A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810733440.6 2018-07-06
CN201810733440.6A CN108921105B (zh) 2018-07-06 2018-07-06 Method and apparatus for identifying the number of targets, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020007363A1 (zh) 2020-01-09

Family

ID=64425405

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/094876 WO2020007363A1 (zh) 2018-07-06 2019-07-05 Method and apparatus for identifying the number of targets, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108921105B (zh)
WO (1) WO2020007363A1 (zh)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921105B (zh) * 2018-07-06 2020-11-03 京东数字科技控股有限公司 识别目标数量的方法、装置及计算机可读存储介质
CN109658414A (zh) * 2018-12-13 2019-04-19 北京小龙潜行科技有限公司 一种猪只的智能盘点方法及装置
CN109785337B (zh) * 2018-12-25 2021-07-06 哈尔滨工程大学 一种基于实例分割算法的栏内哺乳动物清点方法
CN110189264B (zh) * 2019-05-05 2021-04-23 Tcl华星光电技术有限公司 图像处理方法
CN111008561B (zh) * 2019-10-31 2023-07-21 重庆小雨点小额贷款有限公司 一种牲畜的数量确定方法、终端及计算机存储介质
CN111310805B (zh) * 2020-01-22 2023-05-30 中能国际高新科技研究院有限公司 一种对图像中的目标进行密度预测的方法、装置及介质
CN111401182B (zh) * 2020-03-10 2023-12-08 京东科技信息技术有限公司 针对饲喂栏的图像检测方法和装置


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8525835B1 (en) * 2010-02-24 2013-09-03 The Boeing Company Spatial data compression using implicit geometry
CN107025642B (zh) * 2016-01-27 2018-06-22 百度在线网络技术(北京)有限公司 Vehicle contour detection method and device based on point cloud data
CN108021923B (zh) * 2017-12-07 2020-10-23 上海为森车载传感技术有限公司 Image feature extraction method for deep neural networks
CN108229548A (zh) * 2017-12-27 2018-06-29 华为技术有限公司 Object detection method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150079124A (ko) * 2013-12-31 2015-07-08 하나 마이크론(주) Livestock feed feeding device and feeding method
CN105488534A (zh) * 2015-12-04 2016-04-13 中国科学院深圳先进技术研究院 Traffic scene deep parsing method, device, and system
CN107680080A (zh) * 2017-09-05 2018-02-09 翔创科技(北京)有限公司 Livestock sample library establishment and counting methods, storage medium, and electronic device
CN107844790A (zh) * 2017-11-15 2018-03-27 上海捷售智能科技有限公司 Image-recognition-based dish identification and cashier system and method
CN108921105A (zh) * 2018-07-06 2018-11-30 北京京东金融科技控股有限公司 Method and apparatus for identifying the number of targets, and computer-readable storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021217934A1 (zh) * 2020-04-28 2021-11-04 平安国际智慧城市科技股份有限公司 Method and apparatus for monitoring livestock quantity, computer device, and storage medium
CN112348089A (zh) * 2020-11-10 2021-02-09 中南民族大学 Working state recognition method, server, storage medium, and device
CN112348089B (zh) * 2020-11-10 2024-01-16 中南民族大学 Working state recognition method, server, storage medium, and device
US12026968B2 (en) 2020-11-12 2024-07-02 Sony Group Corporation Training machine learning-based models for animal feature detection
CN112530004A (zh) * 2020-12-11 2021-03-19 北京奇艺世纪科技有限公司 Three-dimensional point cloud reconstruction method and device, and electronic device
CN112530004B (zh) * 2020-12-11 2023-06-06 北京奇艺世纪科技有限公司 Three-dimensional point cloud reconstruction method and device, and electronic device
CN112581016A (zh) * 2020-12-28 2021-03-30 深圳硅纳智慧科技有限公司 Material management system and material management method using the same
CN113920454A (zh) * 2021-10-21 2022-01-11 广西科技大学 Rapid identification and classification method for construction site materials under low-contrast conditions
CN113920454B (zh) * 2021-10-21 2024-03-19 广西科技大学 Rapid identification and classification method for construction site materials under low-contrast conditions

Also Published As

Publication number Publication date
CN108921105B (zh) 2020-11-03
CN108921105A (zh) 2018-11-30

Similar Documents

Publication Publication Date Title
WO2020007363A1 (zh) Method and apparatus for identifying the number of targets, and computer-readable storage medium
CN114359727B (zh) Tea disease identification method and system based on lightweight-optimized Yolo v4
CN107229947A (zh) Financial insurance method and system based on animal identification
WO2020030054A1 (zh) Animal identification method and apparatus, medium, and electronic device
CN113919442B (zh) Tobacco leaf maturity state recognition method based on a convolutional neural network
CN110321956B (zh) Artificial-intelligence-based forage grass disease and pest control method and device
CN113221864A (zh) Construction and application of a visual recognition model for sick chickens with multi-region deep feature fusion
CN107437068A (zh) Individual pig identification method based on Gabor orientation histograms and pig hair patterns
CN111727457A (zh) Computer-vision-based cotton crop row detection method, device, and storage medium
CN104007733B (zh) System and method for monitoring intensive agricultural production
CN113312999A (zh) High-precision detection method and device for citrus psyllids in natural orchard scenes
CN114049550A (zh) Deep-learning-based automatic identification method for dead caged broilers
CN115661650A (zh) Farm management system based on Internet-of-Things data monitoring
Zhang et al. Research on application technology of 5G Internet of Things and big data in dairy farm
CN112861666A (zh) Deep-learning-based chicken flock counting method and application
CN114997725A (zh) Dairy cow body condition scoring method based on an attention mechanism and a lightweight convolutional neural network
CN117456358A (zh) Plant disease and pest detection method based on a YOLOv5 neural network
CN108334938A (zh) Automatic mosquito vector monitoring system based on image recognition
Yang et al. A defencing algorithm based on deep learning improves the detection accuracy of caged chickens
WO2022104867A1 (zh) Feature detection method and apparatus for target objects
Zhu et al. Automated chicken counting using yolo-v5x algorithm
CN214201204U (zh) Raspberry-Pi-based rice disease and pest detection device
TWM638328U (zh) Pest monitoring and control system based on image recognition technology
CN112329697B (zh) On-tree fruit recognition method based on improved YOLOv3
Deng et al. A real-time sheep counting detection system based on machine learning.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19831511

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19831511

Country of ref document: EP

Kind code of ref document: A1