CN112650866A - Catering health analysis method based on image semantic deep learning - Google Patents

Catering health analysis method based on image semantic deep learning

Info

Publication number
CN112650866A
CN112650866A (application CN202010836022.7A)
Authority
CN
China
Prior art keywords
dish
image
menu
picture
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010836022.7A
Other languages
Chinese (zh)
Inventor
戴超
盛斌
朱双奇
潘思源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhitang Health Technology Co ltd
Original Assignee
Shanghai Zhitang Health Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhitang Health Technology Co ltd filed Critical Shanghai Zhitang Health Technology Co ltd
Priority to CN202010836022.7A priority Critical patent/CN112650866A/en
Publication of CN112650866A publication Critical patent/CN112650866A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/60 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to nutrition control, e.g. diets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Library & Information Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Nutrition Science (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a catering health analysis method based on deep learning over image semantics, which takes a dish image as input and achieves high-precision dish image classification and dish nutrient calculation. In the dish image classification part, the invention constructs a dish classification network that can learn the distance between recipes: the network takes the dish image and the recipe information as input, and while learning the image information it also develops a deep understanding of the raw-material part of the recipe, further improving classification accuracy. In the nutrient calculation part, pixel-level semantic segmentation is performed on the image to determine the content represented by each pixel, the proportions of the various raw materials in each picture are established, and the raw-material content in the recipe is corrected accordingly. For the same dish, different pictures can thus return different nutrient content information, making the nutrient calculation module more accurate and scientific.

Description

Catering health analysis method based on image semantic deep learning
Technical Field
The invention mainly relates to computer vision technologies, in particular to deep-learning-based dish image recognition and dish image semantic segmentation.
Background
In today's society, dietary health is a topic of broad public concern. A reasonable, healthy diet also helps people prevent diet-related diseases such as diabetes. At the present stage, however, popular scientific understanding of dietary health remains insufficient, and most people still lack a truly scientific grasp of the subject. Dietary health therefore needs not only greater attention but also a channel that helps the public understand diet scientifically and provides guidance of medical value: people need not merely an intuitive sense of their diet, but concrete numbers and data.
As for dish analysis systems, most existing catering analysis systems have two shortcomings: they require the user to possess dish-related knowledge such as raw materials and recipes; and the nutrient information they provide is not comprehensive enough and lacks scientific medical guidance.
As for dish analysis algorithms, two mainstream dish image analysis techniques are currently in use:
(1) Training a single-label classifier with a convolutional neural network, where one dish corresponds to one class and each picture receives a single class. Because pictures in dish recognition tasks often exhibit high similarity and complexity, this method cannot achieve a good effect.
(2) Training a multi-label classifier on a convolutional-neural-network backbone, where each raw material corresponds to one category and each picture receives several categories. This method needs a large amount of extra manual information, such as prior relations among dish raw materials; meanwhile it performs no deep learning on the recipe information itself, so there is still room to improve its accuracy.
Meanwhile, pixel-level semantic segmentation of dish raw materials has not yet been attempted, and dish nutrient calculation remains a surface-level data lookup per dish: the per-unit-mass nutrients obtained from different pictures of the same dish are identical, and no fine-grained analysis of the user's own picture is performed.
Disclosure of Invention
The invention provides a catering health analysis system that can meet most daily needs regarding dietary health. Given a queried dish picture, the method identifies the corresponding dish name and its recipe. It then performs pixel-level semantic segmentation on the picture to understand the dish raw-material information carried by each pixel, and thereby computes the dish's nutrient content accurately. The user only needs to input a dish picture and its mass; the method outputs a nutrient-element reference table.
The technical scheme of the invention is as follows:
(1) Target detection: after the user inputs a picture, the positions of containers such as bowls are detected by a target-detection method, the dish position is localized to obtain the dish's bounding box, and irrelevant factors such as the background are removed.
(2) Dish recognition with learnable inter-recipe distance: after the dish bounding box is obtained in step (1), picture and recipe information are learned simultaneously by a classification model that can learn inter-class distances, the picture is matched against recipes, and the five dishes (with their recipes) that best match the picture are returned for the user to choose from.
(3) Dish nutrient calculation: after the dish name and recipe are obtained, pixel-level semantic segmentation is performed on dishes with more than one main food material, the dish's raw materials are segmented by color, and the proportion of each raw material is refined, so that the nutrient content is calculated more accurately.
The advantage of the method is that it extracts the effective region of the dish image automatically and analyzes the dish type with a recognition model; beyond the nutrient information looked up from the picture and the input mass, the nutrient content is further refined by means of a semantic segmentation model.
Drawings
FIG. 1 is a method framework and flow chart
FIG. 2 is a diagram showing the effect of the target detection model
FIG. 3 is a frame diagram of a classification model for learnable inter-class distance
FIG. 4 is a graph showing the relationship between recipes obtained by the model shown in FIG. 3
FIG. 5 is a diagram showing pixel-level semantic segmentation effect of vegetable raw materials
Detailed Description
As shown in fig. 1, the specific flow of the catering health analysis system based on deep learning over image semantics is as follows:
Step 1: input a dish picture.
Step 2: the image enters the dish recognition module. First, as shown in fig. 2, the target detection model removes redundant information such as the background and yields the dish bounding box. A binary classifier then judges whether the picture actually shows a dish. Finally, if the judgment is true, the image enters the model shown in fig. 3, which returns the 5 dish names (with their recipes) that best match the picture.
The model illustrated in fig. 3 consists of an image encoder and a text encoder. During training, each model input is a set of image-recipe pairs (image_k, ingredient_k, y_k), k ∈ [0, K], where image_k is the kth picture, ingredient_k is the corresponding raw-material list, i.e. the recipe, y_k indicates whether the two match, and K is the total number of pictures. In training, y_k is assigned randomly: with probability 80% it is set to 1 (matched), and with probability 20% it is set to 0 (unmatched), in which case a non-matching recipe is randomly chosen as ingredient_k. Denoting the image encoder Encoder_image and the text encoder Encoder_ingre, the two embeddings are

v_k = Encoder_image(image_k)
u_k = Encoder_ingre(ingredient_k)

The goal of training is that when the image matches the recipe, the distance between v_k and u_k is as small as possible, and vice versa. In the most ideal case, the vector computed from a dish picture coincides with that of its corresponding recipe and is orthogonal to the vectors of all non-corresponding recipes. The loss function is therefore set to the cosine embedding loss

L = Σ_k [ y_k · (1 − cos(v_k, u_k)) + (1 − y_k) · max(0, cos(v_k, u_k)) ]

At test time, for each dish i the recipe vector

u_i = Encoder_ingre(recipe_i)

is computed and stored. When a user submits a query picture image_query, the score of the ith recipe is

score_i = Encoder_image(image_query) · u_i
Fig. 4 verifies that the model learns relational information between recipes: recipes whose raw materials are similar tend to receive higher mutual matching scores.
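As a concrete illustration, the matching machinery (cosine loss between matched/unmatched pairs during training, dot-product scoring at query time) can be sketched in a few lines of numpy, assuming the two encoders have already produced their 512-dimensional vectors; the encoder architectures themselves are whatever fig. 3 specifies and are not reproduced here:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_embedding_loss(img_vecs, recipe_vecs, labels):
    """Cosine embedding loss over a batch: pull matched pairs together
    (y=1), push unmatched pairs toward orthogonality (y=0)."""
    total = 0.0
    for v, u, y in zip(img_vecs, recipe_vecs, labels):
        c = cosine(v, u)
        total += (1.0 - c) if y == 1 else max(0.0, c)
    return total / len(labels)

def rank_recipes(query_vec, recipe_vecs, top_k=5):
    """Score each stored recipe vector by its dot product with the
    query image vector; return indices of the top_k best matches."""
    scores = np.asarray([np.dot(query_vec, u) for u in recipe_vecs])
    return [int(i) for i in np.argsort(-scores)[:top_k]]
```

Ranking all stored recipe vectors against the query vector and keeping the top five reproduces the "5 best matches" behaviour of step 2.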
Step 3: nutrient query.
For a dish with a single main food material, the ratio of the user's input mass to the dish mass in the standard recipe is computed and multiplied by the food material's nutrient values to obtain the result. For a dish with two or more main food materials, the proportional relations among the food materials are further refined by means of the image semantic segmentation model (effect shown in fig. 5), making the nutrient result more accurate.
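The single-food-material case is plain proportional arithmetic. A minimal sketch, with hypothetical nutrient figures for a standard-recipe portion (the real values would come from a nutrient database):

```python
def scale_nutrients(input_mass_g, recipe_mass_g, nutrients_per_recipe):
    """Scale each nutrient by the ratio of the user's input mass to the
    mass of the dish in the standard recipe."""
    ratio = input_mass_g / recipe_mass_g
    return {name: value * ratio for name, value in nutrients_per_recipe.items()}

# Hypothetical example: a 300 g standard-recipe portion, user ate 150 g.
print(scale_nutrients(150, 300, {"protein_g": 20.0, "fat_g": 10.0}))
# → {'protein_g': 10.0, 'fat_g': 5.0}
```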
The image semantic segmentation model adopts a MobileNetV2 + PPM structure. MobileNetV2 is a lightweight convolutional network: it splits an ordinary convolution layer into a depthwise convolution and a pointwise convolution, greatly reducing the number of multiplications the convolution requires; it also adjusts where ReLU is applied and adds residual connections, further improving accuracy. The PPM (pyramid pooling module) captures context better, i.e. the relations between raw materials, and performs well on small-object detection. The model classifies raw materials by color, which yields better robustness and generality.
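Downstream of the segmentation model, the per-color pixel proportions that feed the correction step reduce to counting mask pixels. A sketch assuming a six-class labelling as described in this document (the specific class ids are an assumption of this sketch, not fixed by the text):

```python
import numpy as np

# Assumed class ids for the six-way segmentation:
# 0 = non-dish background, 1..5 = red, yellow, green, black, white.
COLOR_CLASSES = {1: "red", 2: "yellow", 3: "green", 4: "black", 5: "white"}

def pixel_proportions(mask):
    """Given a per-pixel class mask (H x W int array) predicted by the
    segmentation model, return each food-material color's share of the
    dish pixels (background excluded)."""
    mask = np.asarray(mask)
    dish_pixels = np.count_nonzero(mask)  # every pixel except background
    if dish_pixels == 0:
        return {name: 0.0 for name in COLOR_CLASSES.values()}
    return {name: np.count_nonzero(mask == cid) / dish_pixels
            for cid, name in COLOR_CLASSES.items()}
```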

Claims (4)

1. A catering health analysis method based on deep learning over image semantics, characterized by comprising the following steps:
(1) Target detection: input a picture, detect the positions of containers such as bowls by a target-detection method, localize the dish to obtain its bounding box, and remove irrelevant factors such as the background;
(2) Dish image recognition: taking the background-free dish image as input, map the dish image and the recipes into the same domain through a classification network that can learn inter-recipe distances, so as to obtain the distance between the dish image and each recipe and return the five recipes closest to the image;
(3) Dish nutrient calculation and correction: perform pixel-level semantic segmentation on the dish image and decide whether each pixel belongs to the background or to a certain dish raw material, thereby further correcting the recipe and accurately calculating the nutrient content.
2. The classification method with learnable inter-recipe distance of claim 1, further characterized in that:
(1) the dish classification model consists of an image encoder and a text encoder, which encode pictures and recipes respectively into 512-dimensional vectors, the distance relation between recipes being learned by means of the text encoder;
(2) a cosine loss is adopted as the loss function so that, for matching pairs, the image encoder's output approaches the text encoder's output;
(3) the dot product of the two encoders' vectors serves as the matching score between a picture and a recipe, so a matching score can be provided for any recipe and any dish picture.
3. The dish nutrient calculation method of claim 1, further characterized in that:
(1) the dish food materials are first divided by color into five classes: red, yellow, green, black, and white;
(2) a trained semantic segmentation network then classifies every pixel of the dish image into one of six classes (red, yellow, green, black, white, non-dish) and returns the pixel proportions among the food materials;
(3) the raw-material content in the recipe is further corrected according to the returned pixel-proportion relations.
4. The recipe raw-material correction method of claim 3, further characterized in that:
A standard recipe is defined. For a dish named R in the standard recipe, its raw materials are expressed as ingre = (ingre_red, ingre_yellow, ingre_green, ingre_black, ingre_white, ingre_other), one entry per food-material class, with corresponding standard-recipe masses m = (m_red, m_yellow, m_green, m_black, m_white, m_other). The proportion p_c^i of the food material of color c in the ith picture of R in the DIMAX dataset is defined as

p_c^i = s_c^i / Σ_j s_j^i

where s_c^i denotes the number of pixels occupied by the food material of color c in the ith picture of R. The overall proportion V_c of each food material in R is then

V_c = (1/n) Σ_{i=1}^{n} p_c^i

where n is the total number of pictures of R in the DIMAX dataset. For a picture of R input by the user, the actual food-material proportions V' are obtained through the image semantic segmentation model, and the actual total mass is m'. Taking the food material of color c as an example,

w_c = m_c · V'_c / V_c
m'_c = m' · w_c / Σ_i w_i

where i, c ∈ {red, yellow, green, black, white}, and m'_c is the corrected mass of the food material of color c.
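The correction step can be sketched as follows; the exact formula is an interpretation consistent with the surrounding definitions (weight each standard-recipe mass by the observed-to-standard proportion ratio, then renormalize so the corrected masses sum to the measured total mass m'):

```python
def correct_masses(std_masses, std_ratios, observed_ratios, total_mass):
    """Redistribute the user's measured total dish mass across the color
    categories: weight each standard-recipe mass by how much more (or
    less) of that color the segmentation saw relative to the standard
    proportion, then renormalize to the measured total mass.

    std_masses, std_ratios, observed_ratios: dicts keyed by color name.
    """
    weights = {c: std_masses[c] * observed_ratios[c] / std_ratios[c]
               for c in std_masses if std_ratios[c] > 0}
    total_w = sum(weights.values())
    return {c: total_mass * w / total_w for c, w in weights.items()}

# Hypothetical two-color dish: the segmentation saw more red than the
# standard recipe, so red's corrected mass grows at green's expense.
corrected = correct_masses(
    std_masses={"red": 100.0, "green": 100.0},
    std_ratios={"red": 0.5, "green": 0.5},
    observed_ratios={"red": 0.75, "green": 0.25},
    total_mass=200.0,
)
```

The corrected masses still sum to the user-supplied total, which is what lets different pictures of the same dish return different nutrient figures.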
Application CN202010836022.7A, filed 2020-08-19 (priority date 2020-08-19): Catering health analysis method based on image semantic deep learning. Status: Pending. Publication: CN112650866A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010836022.7A CN112650866A (en) 2020-08-19 2020-08-19 Catering health analysis method based on image semantic deep learning


Publications (1)

Publication Number Publication Date
CN112650866A true CN112650866A (en) 2021-04-13

Family

ID=75346136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010836022.7A Pending CN112650866A (en) 2020-08-19 2020-08-19 Catering health analysis method based on image semantic deep learning

Country Status (1)

Country Link
CN (1) CN112650866A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114360690A (en) * 2022-03-18 2022-04-15 天津九安医疗电子股份有限公司 Method and system for managing diet nutrition of chronic disease patient
WO2023159909A1 (en) * 2022-02-25 2023-08-31 重庆邮电大学 Nutritional management method and system using deep learning-based food image recognition model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100173269A1 (en) * 2009-01-07 2010-07-08 Manika Puri Food recognition using visual analysis and speech recognition
CN104730073A (en) * 2015-02-16 2015-06-24 中国土产畜产进出口总公司 Quantitative analysis method and system for dishes contained in plates
CN108198188A (en) * 2017-12-28 2018-06-22 北京奇虎科技有限公司 Food nutrition analysis method, device and computing device based on picture
KR20180093141A (en) * 2017-02-09 2018-08-21 주식회사 롭썬컴퍼니 A meal calendar system using the image processing method based on colors
CN110852733A (en) * 2019-10-22 2020-02-28 杭州效准智能科技有限公司 Intelligent catering settlement system based on RFID fusion dish image matching identification
CN111128341A (en) * 2019-11-07 2020-05-08 北京航空航天大学 Dish identification APP based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
飞桨PaddlePaddle: "I built a dish image recognition *** with PaddlePaddle" (in Chinese), 《HTTPS://BLOG.CSDN.NET/PADDLEPADDLE/ARTICLE/DETAILS/104666572/》, 4 March 2020 (2020-03-04), pages 1-9 *


Similar Documents

Publication Publication Date Title
Aguilar et al. Grab, pay, and eat: Semantic food detection for smart restaurants
CN107578060B (en) Method for classifying dish images based on depth neural network capable of distinguishing areas
CN110837870B (en) Sonar image target recognition method based on active learning
US7848577B2 (en) Image processing methods, image management systems, and articles of manufacture
EP2064677B1 (en) Extracting dominant colors from images using classification techniques
CN110717554B (en) Image recognition method, electronic device, and storage medium
CN103699532B (en) Image color retrieval method and system
CN111178120B (en) Pest image detection method based on crop identification cascading technology
CN104572965A (en) Search-by-image system based on convolutional neural network
CN102385592B (en) Image concept detection method and device
CN112650866A (en) Catering health analysis method based on image semantic deep learning
CN111599438A (en) Real-time diet health monitoring method for diabetic patient based on multi-modal data
WO2017016886A1 (en) System and method for providing a recipe
CN111652273A (en) Deep learning-based RGB-D image classification method
CN110503140A (en) Classification method based on depth migration study and neighborhood noise reduction
EP3044733A1 (en) Image processing
CN111476319A (en) Commodity recommendation method and device, storage medium and computing equipment
CN114241226A (en) Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model
CN110097603B (en) Fashionable image dominant hue analysis method
CN113705310A (en) Feature learning method, target object identification method and corresponding device
CN109685146A (en) A kind of scene recognition method based on double convolution sum topic models
CN114882973A (en) Daily nutrient intake analysis method and system based on standard food recognition
Rimiru et al. GaborNet: investigating the importance of color space, scale and orientation for image classification
CN114494827A (en) Small target detection method for detecting aerial picture
Afifi Image retrieval based on content using color feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination