CN111325084A - Dish information identification method and terminal based on YOLO neural network - Google Patents

Dish information identification method and terminal based on YOLO neural network Download PDF

Info

Publication number
CN111325084A
Authority
CN
China
Prior art keywords
neural network
dish
yolo neural
yolo
identification method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910806784.XA
Other languages
Chinese (zh)
Inventor
于文涛
郝继伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xiaoniu Zhixun Technology Co ltd
Original Assignee
Xi'an Iridium Shiyun Catering Management Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Iridium Shiyun Catering Management Co ltd
Priority to CN201910806784.XA
Publication of CN111325084A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/68 Food, e.g. fruit or vegetables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of data processing and discloses a dish information identification method and terminal based on a YOLO neural network. The method comprises: making xml files in VOC format; converting the VOC-format xml files into the txt files required by the YOLO neural network; setting the training parameters of the YOLO neural network; preprocessing the images in the training set and sending the resized images into the YOLO neural network for training; observing the loss curve during training and judging whether the YOLO neural network has converged; and packaging the code into pyd and lib files with the pybind11 library and calling the YOLO neural network from Python to identify the dishes. Compared with existing dish identification methods, the dish identification method provided by the invention has higher accuracy and identification speed; the dish identification speed for video at 1080P resolution can reach more than 30 FPS.

Description

Dish information identification method and terminal based on YOLO neural network
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a dish information identification method and terminal based on a YOLO neural network.
Background
Currently, the closest prior art is as follows. A dish refers to a prepared food item, such as stir-fried mushrooms with greens, shredded pork with green pepper, or braised pork with potato. Dish identification means identifying the name of a dish. In restaurants, the staff responsible for settlement need to calculate the bill according to the dishes ordered by the customers. Traditional dish identification relies entirely on the human eye. However, because of the wide variety of dishes and because the colour, aroma, taste and appearance of a dish differ every time it is cooked, manual dish identification has low accuracy and is slow. YOLO is an excellent target detection network proposed in recent years; it predicts the category and bounding box of a target at the same time, thereby turning the target detection problem into a regression problem. YOLO strikes a balance between speed and performance, achieving very high accuracy and recall while maintaining a very high target detection speed.
The first prior art discloses a dish identification method and system that treats dish identification as a classification problem, so it can only identify one dish at a time and cannot simultaneously identify multiple dishes placed together. In the actual situation of a restaurant, however, the staff responsible for settlement must calculate the bill whether the customer has ordered one dish or several. Dish identification is therefore a target detection problem, namely detecting one dish or several dishes placed together. Because this prior art adopts a classification method, it can only identify a single dish at a time; when several dishes are placed together it cannot work properly and its output is meaningless. In short, it can identify only one dish at a time and cannot identify multiple dishes placed together.
The second prior art discloses a dish identification method comprising the following steps: 1) acquiring a web request, where the server obtains the image corresponding to the request; 2) saving the image, i.e., acquiring the input data stream, generating an image file name and saving the file to disk; 3) image preprocessing, namely resizing and normalizing the input image; 4) processing with a pre-trained convolutional neural network to detect and classify objects in the image; if no dish is detected the process ends, and if a dish is detected the corresponding dish information is output based on the classification result. This method adopts a shallow convolutional neural network with few layers, so its performance is poor; the image features it can extract are limited, and the final dish identification accuracy is therefore low.
In summary, the problems of the prior art are as follows: the dish identification methods in the prior art cannot identify multiple dishes placed together at the same time, and their final dish identification accuracy is low.
The difficulty of solving these technical problems is as follows:
the technical problems cannot be fundamentally solved by simple modifications of the prior art; the difficulty of solving them is very high, so a new method has to be devised. The present invention is a major innovation over the prior art and can solve the above technical problems.
The significance of solving these technical problems is as follows:
solving the first technical problem means that multiple dishes placed together can be identified at the same time; solving the second technical problem means that the accuracy of dish identification can be improved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a dish information identification method and terminal based on a YOLO neural network.
The invention is realized in such a way that a dish information identification method based on a YOLO neural network comprises the following steps:
firstly, making an xml file in a VOC format, labeling the real category and the boundary box of dishes in an image, and automatically generating the xml file required by a YOLO neural network;
secondly, converting the xml file in the VOC format into a txt file required by a YOLO neural network;
thirdly, setting training parameters of the YOLO neural network, wherein the learning rate is 0.01, the batch size is 64, the dropout is 0.25, and the number of iterations is 100,000;
fourthly, preprocessing the images in the training set, and sending the images with the adjusted sizes into a YOLO neural network for training;
fifthly, observing a loss curve in the training process, and judging whether the YOLO neural network is converged; if the convergence occurs, stopping training; if not, continuing training;
and sixthly, packaging the code into pyd and lib files by a pybind11 library, and calling a YOLO neural network by using a python language to realize the identification of the dishes.
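As an illustration of the second step above, the following is a minimal Python sketch of converting a VOC-format xml annotation into the txt format commonly used for YOLO training (one line per object: class index followed by the normalized center coordinates, width and height). The file names and the class list are illustrative assumptions, not details taken from the patent.

import xml.etree.ElementTree as ET

CLASSES = ["boiled_fish", "fish_flavored_pork", "rice"]  # hypothetical subset of the 32 dish classes

def voc_xml_to_yolo_txt(xml_path, txt_path):
    # Read the VOC xml produced by the labeling tool and write one YOLO-format line per object.
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # class x_center y_center width height, all normalized to [0, 1]
        cx, cy = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
        bw, bh = (xmax - xmin) / img_w, (ymax - ymin) / img_h
        lines.append(f"{cls_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))

voc_xml_to_yolo_txt("dish_0001.xml", "dish_0001.txt")  # illustrative file names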
Further, the dish information identification method based on the YOLO neural network performs feature extraction of dish images through a darknet-53 network.
Further, the dish information identification method based on the YOLO neural network divides the extracted feature maps of three different sizes into grids of different sizes, and carries out bounding box prediction and category judgment on dishes of different sizes.
Further, in the dish information identification method based on the YOLO neural network, the size of the image input into the YOLO neural network is uniformly defined as 416 × 416, and feature maps of three different sizes, namely 13 × 13, 26 × 26 and 52 × 52, are obtained through a series of convolution, up-sampling, residual-unit and tensor-concatenation operations.
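The three output sizes correspond to downsampling the 416 × 416 input by factors of 32, 16 and 8 respectively; a two-line check of this relation (an illustration only, not code from the patent):

input_size = 416
print([input_size // stride for stride in (32, 16, 8)])  # [13, 26, 52]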
Further, in the dish information identification method based on the YOLO neural network, the three feature maps of different sizes select corresponding prediction box sizes according to the size of their receptive fields, each selecting bounding boxes of 3 sizes, wherein:
for the output feature map with a size of 13 × 13, the corresponding preset template boxes, mapped onto the 416 × 416 input image, give prediction box sizes of 116 × 90, 156 × 198 and 373 × 326 respectively;
for the output feature map with a size of 26 × 26, the corresponding prediction box sizes are 30 × 61, 62 × 45 and 59 × 119;
for the output feature map with a size of 52 × 52, the corresponding prediction box sizes are 10 × 13, 16 × 30 and 33 × 23 respectively.
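A compact way to record the scale-to-anchor assignment listed above is a lookup table keyed by output size; the sketch below is an illustration of that mapping in 416 × 416 input coordinates, not code taken from the patent.

# Anchor (preset template box) sizes per output scale, in 416 x 416 input-image coordinates.
ANCHORS_BY_SCALE = {
    13: [(116, 90), (156, 198), (373, 326)],  # largest receptive field -> large dishes
    26: [(30, 61), (62, 45), (59, 119)],      # medium receptive field -> medium dishes
    52: [(10, 13), (16, 30), (33, 23)],       # smallest receptive field -> small dishes
}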
Further, the dish information identification method based on the YOLO neural network further includes:
step one, image preprocessing: the size of the input image of the YOLO neural network is set to 416 × 416, and the image is divided into grid cells according to the size of the output feature map; the original input image is divided into a 13 × 13 grid, and each grid cell corresponds to a 47-dimensional vector within the 13 × 13 × 47 output tensor;
step two, outputting 3 prediction boxes of different sizes for the grid cell in which the center point of the dish is located, wherein, of the 47 output values, the first part corresponds to the 32 identified dish categories, the next 3 values correspond to the 3 prediction boxes, and the last 12 parameters are bx, by, bw and bh for each of the 3 bounding boxes;
and step three, performing target boundary frame prediction and category judgment according to whether the central point of the dish falls in the grid or not, and outputting the dish identification result.
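The grid-cell assignment in steps one to three can be sketched as follows; the helper name and the example coordinates are assumptions for illustration, and the 47-value layout (32 class scores, 3 prediction-box values, 12 box parameters) follows the description above.

def responsible_cell(center_x, center_y, input_size=416, grid=13):
    # The grid cell that contains the dish's center point is responsible for predicting it.
    cell_size = input_size / grid                                   # 32 pixels per cell at the 13 x 13 scale
    return int(center_y // cell_size), int(center_x // cell_size)   # (row, col)

row, col = responsible_cell(210.0, 305.0)   # e.g. a dish centered at pixel (210, 305)
print(row, col)                             # that cell's 47 output values encode class scores and 3 boxes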
Another object of the present invention is to provide a dish information identification system based on a YOLO neural network, which operates the above dish information identification method based on a YOLO neural network, the dish information identification system based on a YOLO neural network including:
the image feature extraction module, used for extracting the features of the dish images through a darknet-53 network;
the characteristic diagram dividing module is used for dividing the extracted three characteristic diagrams with different sizes into grids with different sizes;
and the judging module is used for carrying out boundary frame prediction and type judgment on dishes with different sizes.
Another object of the present invention is to provide a computer program for implementing the dish information identification method based on the YOLO neural network.
Another object of the present invention is to provide an information data processing terminal for implementing the dish information identification method based on the YOLO neural network.
Another object of the present invention is to provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to execute the dish information identification method based on the YOLO neural network.
In summary, the advantages and positive effects of the invention are: the invention provides a dish identification method based on a YOLO neural network, wherein YOLO is an excellent target detection network, and dish identification by using the network can achieve very high accuracy and very high identification speed.
The invention adopts the YOLO neural network, whereas the convolutional neural network adopted in the prior art has few layers and therefore poor performance. The method applies YOLO to the field of dish identification for the first time and solves the problem that, in this field, identification accuracy and identification speed could not previously be achieved at the same time; multiple dishes can thereby be identified. Compared with existing dish identification methods, the dish identification method provided by the invention has higher accuracy and identification speed: the accuracy is improved by 30% and the identification speed by 15%. Deep learning learns automatically during network training and automatically extracts image features, rather than being limited to traditional hand-crafted features. Dish identification is accelerated with a deep-learning GPU graphics card, and the dish identification speed for video at 1080P resolution can reach more than 30 FPS.
Drawings
Fig. 1 is a flowchart of a dish information identification method based on a YOLO neural network according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a YOLO neural network provided in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a calculation process of bounding box prediction according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a dish information identification method based on the YOLO neural network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating changes in parameters of the YOLO neural network structure and each layer according to an embodiment of the present invention.
Fig. 6 is a schematic view of the loss curve during training of the YOLO neural network for dish identification according to the embodiment of the present invention.
Fig. 7 is a schematic view of the IOU curve during training of the YOLO neural network for dish identification according to the embodiment of the present invention.
Fig. 8 is an original image input by the dish identification method according to the embodiment of the present invention.
Fig. 9 is a schematic diagram of an identification result output by the dish identification method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a dish information identification method and a dish information identification terminal based on a YOLO neural network, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the dish information identification method based on the YOLO neural network according to the embodiment of the present invention includes the following steps:
s101: making an xml file in a VOC format, marking the real category and the boundary box of dishes in the image, and automatically generating the xml file required by a YOLO neural network;
s102: converting the XML file in the VOC format into a txt file required by a YOLO neural network;
s103: setting training parameters of the YOLO neural network, wherein the learning rate is 0.01, the batch size is 64, the dropout is 0.25, and the number of iterations is 100,000;
s104: preprocessing the images in the training set, and sending the images with the adjusted sizes into a YOLO neural network for training;
s105: observing a loss curve in the training process, and judging whether the YOLO neural network is converged; if the convergence occurs, stopping training; if not, continuing training;
s106: and packaging the codes into pyd and lib files by a pybind11 library, and calling a YOLO neural network by using a python language to realize the identification of the dishes.
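For step S106, a possible calling pattern from Python is sketched below. The module name yolo_dish and the load/detect interface are assumptions for illustration; the patent only states that the code is packaged into pyd and lib files with pybind11 and called from Python.

# Hypothetical usage of the pybind11-compiled extension; all names and signatures are assumed.
import cv2          # OpenCV, used here only to read the test image
import yolo_dish    # the compiled .pyd module (assumed name)

detector = yolo_dish.load("yolo-dish.cfg", "yolo-dish.weights")   # assumed interface
image = cv2.imread("tray.jpg")
for name, score, (x, y, w, h) in detector.detect(image):          # assumed interface
    print(name, score, x, y, w, h)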
In a preferred embodiment of the invention, the xml files in VOC format are made with the software tool LabelImg, which conveniently labels the dish data set: the real categories and bounding boxes of the dishes in each image are marked, and the xml file required by the YOLO neural network is generated automatically.
In a preferred embodiment of the invention, the hardware environment of the training process is as follows: the CPU is a 20-core Intel Xeon(R) E5-2640 v4 with a base frequency of 2.4 GHz, and the memory is 64 GB. Training is accelerated with a GPU, an NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2 with 20 GB of video memory. The software environment of the training process: the operating system is Ubuntu 16.04 LTS, the OpenCV version is 3.3.0, and the TensorFlow version is 1.2.1.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
The invention provides a method for identifying dishes, which adopts a YOLO neural network to detect the names of one or more dishes; the dish identification process comprises the following steps:
In the Residual modules in Fig. 2, ×1 indicates that the number of residual units is 1, ×2 indicates that the number is 2, and so on; the total number of residual units in the YOLO neural network is 1 + 2 + 8 + 8 + 4 = 23. Each convolutional layer is followed by a BN regularization operation and a Leaky ReLU nonlinear activation function. Upsampling is mainly used to fuse shallow features with deep features so as to achieve a better detection effect on dishes. To cope with the size differences among dish images, the output part in Fig. 2 contains 3 feature maps of different sizes (13 × 13, 26 × 26, 52 × 52). For each of the 3 feature-map sizes, template boxes of different aspect ratios and areas are selected according to the feature-map size, and from each feature map the corresponding region in the original dish image can be deduced in the reverse direction, so that the prediction target region (the optimal regression region) for detecting the dish is finally obtained.
The number of output feature-map channels is directly related to the number of target categories to be identified; the calculation formula (1) is as follows:
filters_num=3*(class_num+5) (1)
Taking the dish identification method provided by the invention, which identifies 32 types of dishes, as an example, the formula gives 3 × (32 + 5) = 111 output channels for the feature maps at each of the three sizes.
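A worked check of formula (1) for the 32 dish categories considered here:

class_num = 32
filters_num = 3 * (class_num + 5)   # 3 boxes per cell, each with 4 box parameters + 1 objectness + class scores
print(filters_num)                  # 111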
In bounding box prediction, the template boxes (anchor boxes) are determined by a dimension clustering method, and the coordinates of the bounding-box center relative to the upper-left corner of the grid cell are obtained by directly predicting the relative position. The bounding box prediction process is shown in Fig. 3 below.
It can be seen from Fig. 3 that there is a window fine-tuning process during bounding box prediction, which makes the network localization more accurate and increases the IOU value. The coordinates of the output box predicted on the feature map are bx, by, bw, bh, i.e., the position and size of the bounding box with respect to the feature map. The formulas are as follows:
bx = σ(tx) + cx (2)
by = σ(ty) + cy (3)
bw = Pw · e^tw (4)
bh = Ph · e^th (5)
The learning objectives of the network are tx, ty, tw, th, where tx, ty are the coordinate offsets of the prediction box and tw, th are scaling factors. Gx, Gy are the coordinates of the center point of the actual box (ground truth) in this feature map, and Gw, Gh are the width and height of the ground truth on the feature map. Cx, Cy are the coordinates of the upper-left corner of the grid cell containing the center; the width and height of each grid cell of the YOLO neural network in the feature map are both 1. Pw, Ph in the formulas are the width and height of the preset template box mapped onto the feature map. Here tx, ty are calculated directly as the offset of the bounding-box center from the upper-left corner of the grid cell, as follows:
tx = Gx - Cx (6)
ty = Gy - Cy (7)
where tw, th are obtained from the ratio of the width and height of the box containing the object to the width and height of the template box, as follows:
tw = log(Gw / Pw) (8)
th = log(Gh / Ph) (9)
As can be seen from expressions (2) to (5), the position of the bounding box is determined by (tx, ty, tw, th). When computing bx and by, the sigmoid function compresses tx and ty into the interval [0, 1], which effectively ensures that the target center stays in the grid cell performing the prediction and prevents excessive deviation. To obtain a more stable model, the predicted bounding-box position is further constrained to [0, 1], i.e., bx, by, bw, bh are each divided by the width and height of the feature map, as follows:
bx = (σ(tx) + cx) / w (10)
by = (σ(ty) + cy) / h (11)
bw = Pw · e^tw / w (12)
bh = Ph · e^th / h (13)
After division by w and h, multiplying the four values bx, by, bw, bh by the width and height of the network input image (e.g., 416 × 416) gives the position and size of the bounding box in the (416 × 416) coordinate system, i.e., the desired target box can be output.
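Expressions (2) to (5) and the normalization in (10) to (13) can be written compactly as the following sketch, which maps a raw prediction back to 416 × 416 input coordinates; the example values at the bottom are illustrative assumptions.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, w, h, input_size=416):
    # Expressions (2)-(5): box center and size on the feature map.
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    # Expressions (10)-(13): normalize by the feature-map width/height,
    # then scale to the 416 x 416 input-image coordinate system.
    return (bx / w * input_size, by / h * input_size,
            bw / w * input_size, bh / h * input_size)

# Illustrative prediction in grid cell (cx, cy) = (6, 4) of the 13 x 13 map,
# using the 116 x 90 template box mapped onto that map (divide by stride 32).
print(decode_box(0.2, -0.1, 0.05, 0.1, 6, 4, 116 / 32, 90 / 32, 13, 13))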
When the YOLO neural network predicts bounding boxes, logistic regression is used. Logistic regression assigns each template (anchor) an objectness score, i.e., the probability that the region covered by the anchor contains an object. This step is performed before prediction and removes unnecessary anchors, which reduces the amount of computation.
Thus the YOLO neural network operates on only one anchor prior, namely the best one, and logistic regression is used to find, among the 9 anchor priors, the one with the highest objectness score; logistic regression models the mapping from an anchor prior to its objectness score.
The confidence of the YOLO neural network is defined by the probability Pr(object) that the bounding box contains a target, together with the accuracy of that bounding box. When the bounding box is background (i.e., contains no object), Pr(object) = 0; when the bounding box contains an object, Pr(object) = 1. The accuracy of the bounding box is measured by the IOU (intersection over union) between the predicted box and the actual box (ground truth), recorded as IOU(pred, truth).
Confidence is defined as follows:
Confidence = Pr(object) × IOU(pred, truth)
From the above formula, the confidence is the product of these two factors, so the accuracy of the prediction box is also reflected in it.
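For reference, the IOU term and the confidence defined above can be computed as in the following sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) corner coordinates:

def iou(box_a, box_b):
    # Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def confidence(pr_object, pred_box, truth_box):
    # Confidence = Pr(object) * IOU(pred, truth), as defined above.
    return pr_object * iou(pred_box, truth_box)

print(confidence(1.0, (50, 60, 200, 220), (55, 70, 210, 230)))   # illustrative boxes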
Based on the above theoretical basis, the details of dish identification with the YOLO neural network are shown in Fig. 4. First, image preprocessing is performed: the size of the input image of the YOLO neural network is set to 416 × 416, and the image is divided into grid cells according to the size of the output feature map. Taking the 13 × 13 scale in Fig. 4 as an example, the original input image is divided into a 13 × 13 grid, and each grid cell corresponds to a 47-dimensional slice of the 13 × 13 × 47 output, shown as a cuboid in Fig. 4. For the gray grid cell in which the dish center point is located, 3 prediction boxes of different sizes are output. Of the 47 output values, the first part corresponds to the identified dish categories (32 items when 32 dishes are identified, as in this example), the next 3 values correspond to the 3 prediction boxes, and the last 12 parameters are bx, by, bw and bh for the 3 bounding boxes. Target bounding-box prediction and category judgment are performed according to whether the dish center point falls in the grid cell, and the final dish identification result is output.
In the process of dish identification by the YOLO neural network, feature extraction of the dish image is first carried out through the darknet-53 network; then the extracted feature maps of three different sizes are divided into grids of different sizes, and bounding box prediction and category judgment are carried out on dishes of different sizes.
The YOLO neural network used in the dish identification process determines the number of priors k = 9 after clustering, and predicts the dishes in the input image through preset template boxes of different sizes to obtain 9 corresponding bounding boxes of different sizes.
The three feature maps of different sizes select corresponding prediction box sizes according to the size of their receptive fields, each selecting bounding boxes of 3 sizes, wherein:
the output feature map with a size of 13 × 13 has the largest receptive field and is therefore suitable for detecting large dishes, such as a large serving of boiled fish; the corresponding preset template boxes, mapped onto the (416 × 416) input image, give prediction box sizes of 116 × 90, 156 × 198 and 373 × 326 respectively;
the output feature map with a size of 26 × 26 has a medium receptive field and is used for detecting medium-sized dishes, such as a medium serving of fish-flavored shredded pork; the corresponding prediction box sizes are 30 × 61, 62 × 45 and 59 × 119 respectively;
the output feature map with a size of 52 × 52 has the smallest receptive field and is used for detecting small dishes, such as a small bowl of rice; the corresponding prediction box sizes are 10 × 13, 16 × 30 and 33 × 23 respectively.
Finally, the YOLO neural network uses logistic regression to select, from the 9 template boxes, the one with the highest objectness score, i.e., to output the predicted bounding box closest to the real dish.
The technical effects of the present invention will be described in detail with reference to experiments.
The loss curve and IOU curve of the present invention during a certain training process are shown in FIG. 6 and FIG. 7.
The effect of the dish identification method of the present invention is shown in fig. 8 and 9.
TABLE 1 Performance of the dish identification method of the present invention
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD- or DVD-ROM, programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A dish information identification method based on a YOLO neural network is characterized by comprising the following steps:
firstly, making an xml file in a VOC format, labeling the real category and the boundary box of dishes in an image, and automatically generating the xml file required by a YOLO neural network;
secondly, converting the xml file in the VOC format into a txt file required by a YOLO neural network;
thirdly, setting training parameters of the YOLO neural network, wherein the learning rate is 0.01, the batch size is 64, the dropout is 0.25, and the number of iterations is 100,000;
fourthly, preprocessing the images in the training set, and sending the images with the adjusted sizes into a YOLO neural network for training;
fifthly, observing a loss curve in the training process, and judging whether the YOLO neural network is converged; if the convergence occurs, stopping training; if not, continuing training;
and sixthly, packaging the code into pyd and lib files by a pybind11 library, and calling a YOLO neural network by using a python language to realize the identification of the dishes.
2. The dish information identification method based on the YOLO neural network of claim 1, wherein the dish information identification method based on the YOLO neural network performs feature extraction of dish images through a darknet-53 network.
3. The dish information identification method based on the YOLO neural network of claim 1, wherein the dish information identification method based on the YOLO neural network divides the extracted feature maps of three different sizes into grids of different sizes, and performs bounding box prediction and category judgment on dishes of different sizes.
4. The dish information identification method based on the YOLO neural network of claim 1, wherein the size of the image input to the YOLO neural network is uniformly defined as 416 × 416, and feature maps of three different sizes, 13 × 13, 26 × 26 and 52 × 52, are obtained through a series of convolution, up-sampling, residual-unit and tensor-concatenation operations.
5. The dish information identification method based on the YOLO neural network of claim 4, wherein the three feature maps of different sizes select corresponding prediction box sizes according to the size of their receptive fields, each selecting bounding boxes of 3 sizes, wherein:
for the output feature map with a size of 13 × 13, the corresponding preset template boxes, mapped onto the 416 × 416 input image, give prediction box sizes of 116 × 90, 156 × 198 and 373 × 326 respectively;
for the output feature map with a size of 26 × 26, the corresponding prediction box sizes are 30 × 61, 62 × 45 and 59 × 119;
for the output feature map with a size of 52 × 52, the corresponding prediction box sizes are 10 × 13, 16 × 30 and 33 × 23 respectively.
6. The dish information identification method based on the YOLO neural network of claim 1, further comprising:
step one, image preprocessing: the size of the input image of the YOLO neural network is set to 416 × 416, and the image is divided into grid cells according to the size of the output feature map; the original input image is divided into a 13 × 13 grid, and each grid cell corresponds to a 47-dimensional vector within the 13 × 13 × 47 output tensor;
step two, outputting 3 prediction boxes of different sizes for the grid cell in which the center point of the dish is located, wherein, of the 47 output values, the first part corresponds to the 32 identified dish categories, the next 3 values correspond to the 3 prediction boxes, and the last 12 parameters are bx, by, bw and bh for each of the 3 bounding boxes;
And step three, performing target boundary frame prediction and category judgment according to whether the central point of the dish falls in the grid or not, and outputting the dish identification result.
7. A dish information identification system based on a YOLO neural network, which operates the dish information identification method based on a YOLO neural network of any one of claims 1 to 6, wherein the dish information identification system based on a YOLO neural network comprises:
the image feature extraction module, used for extracting the features of the dish images through a darknet-53 network;
the characteristic diagram dividing module is used for dividing the extracted three characteristic diagrams with different sizes into grids with different sizes;
and the judging module is used for carrying out boundary frame prediction and type judgment on dishes with different sizes.
8. A computer program for implementing the method for identifying dish information based on the YOLO neural network as claimed in any one of claims 1 to 6.
9. An information data processing terminal for implementing the dish information identification method based on the YOLO neural network as claimed in any one of claims 1 to 6.
10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the dish information identification method based on the YOLO neural network of any one of claims 1 to 6.
CN201910806784.XA 2019-08-29 2019-08-29 Dish information identification method and terminal based on YOLO neural network Pending CN111325084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806784.XA CN111325084A (en) 2019-08-29 2019-08-29 Dish information identification method and terminal based on YOLO neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910806784.XA CN111325084A (en) 2019-08-29 2019-08-29 Dish information identification method and terminal based on YOLO neural network

Publications (1)

Publication Number Publication Date
CN111325084A true CN111325084A (en) 2020-06-23

Family

ID=71172474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806784.XA Pending CN111325084A (en) 2019-08-29 2019-08-29 Dish information identification method and terminal based on YOLO neural network

Country Status (1)

Country Link
CN (1) CN111325084A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115902A (en) * 2020-09-25 2020-12-22 广州市派客朴食信息科技有限责任公司 Dish identification method based on single-stage target detection algorithm
CN112560918A (en) * 2020-12-07 2021-03-26 杭州电子科技大学 Dish identification method based on improved YOLO v3

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052946A (en) * 2017-12-11 2018-05-18 国网上海市电力公司 A kind of high pressure cabinet switch automatic identifying method based on convolutional neural networks
US20190050981A1 (en) * 2017-08-09 2019-02-14 Shenzhen Keya Medical Technology Corporation System and method for automatically detecting a target object from a 3d image
CN109508664A (en) * 2018-10-26 2019-03-22 浙江师范大学 A kind of vegetable identification pricing method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050981A1 (en) * 2017-08-09 2019-02-14 Shenzhen Keya Medical Technology Corporation System and method for automatically detecting a target object from a 3d image
CN108052946A (en) * 2017-12-11 2018-05-18 国网上海市电力公司 A kind of high pressure cabinet switch automatic identifying method based on convolutional neural networks
CN109508664A (en) * 2018-10-26 2019-03-22 浙江师范大学 A kind of vegetable identification pricing method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈聪; 杨忠; 宋佳蓉; 韩家明: "An Improved Convolutional Neural Network Method for Pedestrian Recognition" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115902A (en) * 2020-09-25 2020-12-22 广州市派客朴食信息科技有限责任公司 Dish identification method based on single-stage target detection algorithm
CN112560918A (en) * 2020-12-07 2021-03-26 杭州电子科技大学 Dish identification method based on improved YOLO v3
CN112560918B (en) * 2020-12-07 2024-02-06 杭州电子科技大学 Dish identification method based on improved YOLO v3

Similar Documents

Publication Publication Date Title
TWI746674B (en) Type prediction method, device and electronic equipment for identifying objects in images
Aguilar et al. Grab, pay, and eat: Semantic food detection for smart restaurants
CN104424482B (en) Image processing equipment and image processing method
CN109165645A (en) A kind of image processing method, device and relevant device
WO2021115345A1 (en) Image processing method and apparatus, computer device, and storage medium
CN109952614A (en) The categorizing system and method for biomone
WO2022227770A1 (en) Method for training target object detection model, target object detection method, and device
CN112907595B (en) Surface defect detection method and device
CN113033706B (en) Multi-source two-stage dish identification method based on visual detection and re-identification
CN109858547A (en) A kind of object detection method and device based on BSSD
CN106650743B (en) Image strong reflection detection method and device
CN110334594A (en) A kind of object detection method based on batch again YOLO algorithm of standardization processing
CN110363224B (en) Object classification method and system based on image and electronic equipment
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN111325084A (en) Dish information identification method and terminal based on YOLO neural network
CN112991238A (en) Texture and color mixing type food image segmentation method, system, medium and terminal
CN116092179A (en) Improved Yolox fall detection system
CN114581744A (en) Image target detection method, system, equipment and storage medium
CN112149664A (en) Target detection method for optimizing classification and positioning tasks
CN109815854A (en) It is a kind of for the method and apparatus of the related information of icon to be presented on a user device
CN114462469B (en) Training method of target detection model, target detection method and related device
CN110472673B (en) Parameter adjustment method, fundus image processing device, fundus image processing medium and fundus image processing apparatus
KR20190018274A (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
CN114255377A (en) Differential commodity detection and classification method for intelligent container
CN114332602A (en) Commodity identification method of intelligent container

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240305

Address after: Building 2, 2102, No. 35 Guanlan Avenue, Xinhe Community, Fucheng Street, Longhua District, Shenzhen City, Guangdong Province, 518000

Applicant after: Shenzhen Xiaoniu Zhixun Technology Co.,Ltd.

Country or region after: China

Address before: 710000, Unit 4, Building 6, Beihang Science and Technology Park, No. 588 Feitian Road, National Civil Aerospace Industry Base, Xi'an City, Shaanxi Province, China, Third Floor, Aviation Enterprise Service Center D999

Applicant before: Xi'an iridium Shiyun Catering Management Co.,Ltd.

Country or region before: China

TA01 Transfer of patent application right