CN112650866A

CN112650866A - Catering health analysis method based on image semantic deep learning

Info

Publication number: CN112650866A
Application number: CN202010836022.7A
Authority: CN
Inventors: 戴超; 盛斌; 朱双奇; 潘思源
Original assignee: Shanghai Zhitang Health Technology Co ltd
Current assignee: Shanghai Zhitang Health Technology Co ltd
Priority date: 2020-08-19
Filing date: 2020-08-19
Publication date: 2021-04-13

Abstract

The invention provides a catering health analysis method based on deep learning of image semantics, which takes a menu image as input and can realize high-precision dish image classification and dish nutrient calculation. In the dish image classification part, the invention constructs a dish image classification network capable of learning the distance between the recipes, the network takes the dish image and the recipe information as input, and the information of the raw material part in the recipe is deeply understood while the image information is learned, thereby further improving the classification accuracy. In the nutrient calculation part, pixel-level semantic segmentation is carried out on the image, the image information represented by each pixel point is accurate, the proportion of various raw materials in each picture is determined, and the raw material content in the menu is further corrected. For the same dish, different pictures can return different nutrient content information, so that the nutrient calculation module is more accurate and scientific.

Description

Catering health analysis method based on image semantic deep learning

Technical Field

The invention mainly relates to computer vision correlation technology, in particular to a dish image identification and dish image semantic segmentation technology based on deep learning.

Background

In today's society, diet health has been a topic of concern and concern to the general public. The reasonable and healthy diet can also help people to prevent diet-related diseases such as diabetes mellitus and the like. However, at the present stage, the science and popularity of dietary health are still insufficient, and most people still have insufficient understanding of truly scientific dietary health. Therefore, the diet health needs not only the improvement of attention, but also a way for helping the masses to scientifically know the diet and provide guidance suggestions with medical values, and the masses need not only a perceptual knowledge on diet, but also specific numbers and data guidance.

In the aspect of a dish analysis system, most of related catering analysis systems at the present stage have two defects: the user is required to have dish related knowledge such as raw materials, recipes and the like; the provided nutrient information is not comprehensive enough, and scientific medical guidance suggestions are lacked.

In the aspect of dish analysis algorithm, two mainstream dish image analysis technologies are currently used:

(1) a single label classifier is trained by means of a convolutional neural network, a dish corresponds to a class, and each picture obtains one class. Because pictures in the task of dish identification often have high similarity and complexity, the method cannot achieve a good effect.

(2) And training a multi-label classifier by taking the convolutional neural network as a framework, wherein each raw material corresponds to one category, and each picture obtains a plurality of categories. The method needs a large amount of extra manual information such as prior relation among dish raw materials, meanwhile, deep learning on menu pictures is not carried out, and the method still has a space for improving the accuracy.

Meanwhile, pixel-level semantic segmentation of dish picture raw materials is not performed at present, and data query aiming at dishes and staying on the surface is performed on dish nutrient calculation, namely unit-mass nutrients obtained from different pictures of the same dish are the same, and fine analysis and calculation aiming at the user pictures cannot be performed.

Disclosure of Invention

The invention provides a catering health analysis system which can meet most daily requirements on dietary health. The method can identify the corresponding dish name and the menu thereof according to the dish picture input by inquiry. And then, performing pixel-level semantic segmentation on the picture, and understanding the dish raw material information contained in each pixel in the picture, thereby accurately calculating the nutrient content information of the dish. The invention only needs to input dish pictures and the quality thereof and output a nutrient element reference table.

The technical scheme of the invention is as follows:

(1) target detection: after the user inputs the picture, the positions of the containers such as the bowl and the like are detected through a target detection method, the position of the dish is further accurate, the enclosing frame of the dish is obtained, and irrelevant influence factors such as the background are removed.

(2) The dish identification of the distance between the recipes can be learned: and (2) after the enclosing frame of the dishes is obtained according to the step (1), simultaneously learning picture and menu information through a classification model capable of learning the distance between the classes, matching the picture and the menu information, and finally obtaining five dishes with the highest matching degree with the picture and the menu thereof for a user to select.

(3) Calculating nutrient of the dish: after the name and the menu of the dishes are obtained, semantic division is performed on the dishes with various main food materials at the pixel level, the dish raw materials are divided according to colors, the proportion of each raw material is further refined, and therefore the content of nutrients is calculated more accurately.

The method has the advantages that the method does not need to be capable of automatically extracting the effective area of the dish image, and the dish type is analyzed by adopting the recognition model; and the content of the nutrients is further accurately calculated by means of a semantic segmentation model while the related nutrient information is obtained through the pictures and the quality.

Drawings

FIG. 1 is a method framework and flow chart

FIG. 2 is a diagram showing the effect of the target detection model

FIG. 3 is a frame diagram of a classification model for learnable inter-class distance

FIG. 4 is a graph showing the relationship between recipes obtained by the model shown in FIG. 3

FIG. 5 is a diagram showing pixel-level semantic segmentation effect of vegetable raw materials

Detailed Description

As shown in fig. 1, the specific flow of the catering health analysis system based on the deep learning of image semantic meanings is as follows:

step 1, inputting a dish picture.

And 2, entering the dish identification module by the image. First, as shown in fig. 2, redundant information such as a background is removed by using the target detection model, and a dish enclosure frame is obtained. And then judging whether the picture is the dish or not by means of a two-classifier. And finally, when the judgment result is true, entering a model shown in the figure 3 to obtain 5 dish names and the menu thereof which have the highest matching degree with the picture.

For the model illustrated in fig. 3, the model is entirely comprised of an image encoder and a text encoder. During the training process, each model input is a set of image-material pairs (images)_k,ingredient_k,y_k)k∈[0,K]Wherein the image_kIs the kth picture, ingredient_kIs the corresponding raw material, i.e. recipe, y_kIt indicates whether the first two match, K being the total number of pictures. In concrete training, y_kRandomly assigning values, wherein 80% of the probability is assigned as 1, 20% of the probability is assigned as 0, namely, the menus are not matched, and a menu which is not matched is randomly selected as an ingredient_k. Recording the image encoder as Encode_ir_mageThe text encoder is denoted as Encode_ir_ngreThen

The goal of the training is such that when the image matches the text,

and

as small as possible and vice versa. In the most ideal case, the vector calculated by the dish picture and the corresponding menu is completely consistent, and the non-corresponding menu is orthogonal to the dish picture.

Thus setting the loss function to

When testing and using, firstly, the recipe is assigned to each dish_iCalculating and storing

When a user inputs a picture image_queryWhen requested, the score of the ith menu is

Fig. 4 verifies that the model learns the relationship information between recipes, and a recipe which is close to the raw material is often higher in matching score.

Step 3 nutrient query

For dishes with single main food materials, the ratio of the input quality of the user to the quality of the dishes in the menu is calculated, and then the result is obtained by multiplying the ratio by the corresponding nutrient elements of the food materials. For more than two dishes of main food materials, the proportional relation among the food materials is further refined by means of an image semantic segmentation model, and the effect is shown in fig. 5, so that the nutrient result is further accurate.

The image semantic segmentation model adopts a structure of MobileNetV2+ PPM. MobileNetV2 is a lightweight convolutional network. The common convolutional layer is divided into a DepthWise convolutional layer and a PointWise convolutional layer, so that the number of times of multiplication required by the convolution is greatly reduced. Meanwhile, the method adjusts the use condition of ReLu and adds a residual error network structure, so that the accuracy is further improved. The PPM pyramid type pooling module can better understand the context relationship, namely the relationship between the raw materials. Meanwhile, the method has good performance on the detection of small objects. The model classifies the raw materials according to colors, and better robustness and universality are obtained.

Claims

1. A catering health analysis method based on deep learning of image semantics is characterized by comprising the following steps:

(1) inputting a picture, detecting the positions of containers such as bowls and the like by a target detection method, further accurately detecting the positions of dishes to obtain a surrounding frame of the dishes, and removing irrelevant influence factors such as backgrounds and the like.

(1) Identifying the dish image: the dish image without the background is used as input, the dish image and the menu are mapped to the same domain through a classification network capable of identifying the distance between the menus, so that the distance between the dish image and the menu is obtained, and the five menus closest to the image are further obtained.

(2) Calculating and correcting nutrient of the dish: and performing pixel-level image semantic segmentation on the dish image, and determining that each pixel belongs to a background or a certain dish raw material, so that the menu is further corrected, and the nutrient content is accurately calculated.

2. The method for classifying the identifiable inter-menu distance as claimed in claim 1, further comprising:

(1) the dish classification model is composed of an image encoder and a text encoder, pictures and recipes are encoded into 512-dimensional vectors respectively, and the distance relation between the recipes is learned by means of the text encoder;

(2) adopting cosine loss as a loss function to enable the result of an image encoder to be closer to that of a text encoder;

(3) the dot product of the vectors of the two encoders is used as a matching score between the picture and the menu, and the matching score of any menu and menu picture can be provided.

3. The dish nutrient calculation method of claim 1, further comprising:

(1) firstly, the vegetable food materials are divided into five types of red, yellow, green, black and white according to the colors;

(2) then, classifying each pixel point in the dish image by adopting a training semantic segmentation network, wherein the classification result is red, yellow, green, black, white and non-dish six types, and the pixel proportion among various food materials is returned;

(3) and further correcting the content of the raw materials in the menu according to the returned pixel proportion relation.

4. The recipe raw material correction method as set forth in claim 3, further comprising:

the invention defines a standard menu, and for the dishes with the name R in the standard menu, the raw materials of the dishes are expressed as (ingre)_red,ingre_yellow,ingre_green,ingre_black,ingre_white,ingre_other) They respectively represent the results of six categories of food materials, and the corresponding mass of the food materials in the standard menu is m ═ m_red,m_yellow,m_green,m_black,m_white,m_other) Defining the ratio of each food material in the ith picture of the DIMAX data set

Comprises the following steps:

wherein

Showing the pixel size occupied by the corresponding food material in the ith picture of the R;

further calculating the volume ratio V of various food materials in R as follows:

wherein n is the total number of pictures of R in the DIMAX dataset;

for the R picture input by the user, the ratio of actual various food materials is V 'obtained through the image voice segmentation model, and the actual due quality is m', then the food material with the color of c is taken as an example:

where i, c ∈ { red, yellow, green, black, white }, m'_cI.e. the quality of the food material with color c after correction.