CN114463740A - Food nutrition assessment method and system based on visual analysis - Google Patents

Food nutrition assessment method and system based on visual analysis

Info

Publication number
CN114463740A
CN114463740A (application CN202210023062.9A)
Authority
CN
China
Prior art keywords
food
depth image
neural network
convolutional neural
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210023062.9A
Other languages
Chinese (zh)
Inventor
李海生
王薇
董笑笑
李楠
李勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University filed Critical Beijing Technology and Business University
Priority to CN202210023062.9A priority Critical patent/CN114463740A/en
Publication of CN114463740A publication Critical patent/CN114463740A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20 - Finite element generation, e.g. wire-frame surface description, tesselation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/60 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance, relating to nutrition control, e.g. diets
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20212 - Image combination
    • G06T 2207/20224 - Image subtraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Geometry (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Graphics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Nutrition Science (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a food nutrition assessment method and system based on visual analysis, wherein the method comprises the following steps: S1: acquiring RGB (red, green and blue) images and depth images of the food before and after eating; S2: obtaining the category and visible area of the food in the RGB image using Mask R-CNN, and marking the corresponding food regions in the depth image; S3: constructing and training a 3D convolutional neural network, inputting the marked depth image into the trained network, and predicting the opposite-view depth image; S4: registering the depth image and the opposite-view depth image into the same world coordinate system to obtain a three-dimensional point cloud of the food; S5: applying the point cloud to each food object and meshing the food objects with a convex-hull-based algorithm to calculate the food volume; S6: calculating the food mass and nutrient amount from the food volume and food category, and subtracting the post-meal nutrient amount from the pre-meal amount to obtain the accurate food intake. The method provided by the invention effectively solves the problem of dietary assessment in daily life.

Description

Food nutrition assessment method and system based on visual analysis
Technical Field
The invention relates to the field of computer vision and computer graphics, in particular to a food nutrition assessment method and system based on visual analysis.
Background
A healthy diet and balanced nutrition are key to preventing life-threatening diseases such as obesity, cardiovascular disease and cancer. According to WHO statistics, 39 million children under 5 years of age were overweight or obese in 2020. Fortunately, obesity and many chronic diseases can be prevented through dietary assessment, which monitors daily food intake and helps control eating habits. Dietary assessment has therefore attracted widespread attention in computer vision, medicine, nutrition, and health.
In nutritional epidemiology, detailed food information is needed to help the dietician assess the dietary behavior of the participant. Traditional dietary intake is typically assessed by techniques such as dietary records, 24-hour dietary recall (24HR) and food frequency questionnaires (FFQ). In the dietary record method, respondents record the consumption of each food and drink for one or more days. To accomplish this task, each respondent must receive detailed training to adequately describe the food and the amounts consumed, including the name of the food, the preparation method, and the recipe and portion sizes of mixed dishes. The 24-hour dietary recall is a typical method of measuring daily dietary information; its idea is to list, in a special format, all food consumed within 24 hours. However, it is not always easy for a person to remember the actual food content and the amount of food eaten, and in real life it is difficult, and often not feasible, to see an expert every 24 hours. The food frequency method focuses on describing eating patterns or eating habits rather than caloric intake. It requires respondents to report how often they typically consume each food from a list of foods during a specified period. Information is collected on a frequency basis, but there is little detail about other characteristics of the foods eaten, such as the cooking method. The total nutrient intake is estimated by summing, over all foods, the product of the reported consumption frequency and the nutrient content of the specified (or assumed) portion, yielding an estimated daily intake of nutrients, dietary components and food groups. In most cases the goal of the food frequency method is a rough estimate of total intake over a specified period. These conventional manual recording methods are complicated and cumbersome and contain many biases and errors. There is therefore a need for objective dietary assessment techniques to address such inaccurate and subjective measures.
With recent advances in artificial intelligence (AI), particularly computer vision and machine learning, the way has been paved for more powerful automatic dietary assessment. With the widespread use of portable devices (e.g., smartphones) and progress in computer vision, food monitoring applications based on automated food image processing have proliferated. They not only relieve the burden of recording food but also provide an immediate dietary assessment, showing great potential for effective diet monitoring and control. Two pieces of information required for food nutrition must be obtained from the food image: the food name and the food volume. Existing methods have made great progress through image recognition techniques, but accurate and convenient estimation of food volume remains a challenge. Food volume measurement techniques include model-based and stereo-based approaches, among others. However, model-based techniques typically involve some degree of user intervention, and stereo-based techniques require the user to photograph the food from multiple angles, increasing the user's burden. Thus, there is not yet a good way to monitor a user's dietary intake conveniently and in a timely manner.
Disclosure of Invention
In order to solve the technical problems, the invention provides a food nutrition assessment method and system based on visual analysis.
The technical solution of the invention is as follows: a food nutrition assessment method based on visual analysis, comprising:
step S1: acquiring RGB images and depth images of the food before and after eating, wherein the shooting angles of the RGB images and the depth images are kept consistent;
step S2: acquiring the food category and visible area from the RGB image using a Mask R-CNN neural network, and marking the corresponding food area in the depth image to obtain a marked depth image;
step S3: constructing and training a 3D convolutional neural network, inputting the marked depth image into the trained 3D convolutional neural network, and predicting the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises an initial layer, an encoder, fully connected layers, and a decoder;
step S4: registering the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of the target object;
step S5: applying the point cloud to each marked food object and meshing the food objects using a convex-hull-based algorithm to calculate the food volume;
step S6: calculating the food mass according to the food volume and the food category, looking it up in a food nutrition table to obtain the nutrient amounts before and after eating, and subtracting the post-meal amount from the pre-meal amount to obtain the accurate food intake.
Compared with the prior art, the invention has the following advantages:
the invention discloses a food nutrition assessment method based on visual analysis, which can predict depth images of food at opposite visual angles, relieve the common problem of food occlusion in real life, reduce the burden of a user for shooting food images from multiple angles and effectively solve the problem of diet assessment in daily life.
Drawings
FIG. 1 is a flow chart of a method for food nutrition assessment based on visual analysis in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a 3D convolutional neural network structure according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of calculating the volume of food based on the convex hull algorithm according to the embodiment of the present invention;
fig. 4 is a block diagram of a food nutrition evaluation system based on visual analysis according to an embodiment of the present invention.
Detailed Description
The food nutrition assessment method based on visual analysis of the invention can predict the depth image of the food at the opposite viewing angle, alleviates the common problem of food occlusion in real life, reduces the user's burden of photographing the food from multiple angles, and effectively solves the problem of dietary assessment in daily life.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in fig. 1, a food nutrition assessment method based on visual analysis according to an embodiment of the present invention includes the following steps:
step S1: acquiring RGB images and depth images of the food before and after eating, wherein the shooting angles of the RGB images and the depth images are kept consistent;
step S2: acquiring the food category and visible area from the RGB image using a Mask R-CNN neural network, and marking the corresponding food area in the depth image to obtain a marked depth image;
step S3: constructing and training a 3D convolutional neural network, inputting the marked depth image into the trained 3D convolutional neural network, and predicting the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises an initial layer, an encoder, fully connected layers, and a decoder;
step S4: registering the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of the target object;
step S5: applying the point cloud to each marked food object and meshing the food objects using a convex-hull-based algorithm to calculate the food volume;
step S6: calculating the food mass according to the food volume and the food category, looking it up in a food nutrition table to obtain the nutrient amounts before and after eating, and subtracting the post-meal amount from the pre-meal amount to obtain the accurate food intake.
In one embodiment, step S1: acquiring RGB images and depth images of the food before and after eating, wherein the shooting angles of the RGB images and the depth images are kept consistent, specifically comprises:
The user uses a portable device with a depth sensor or depth camera to capture RGB food images and depth images before and after eating. The user can shoot from any angle and does not need to place a reference card (or reference marker) beside the food, which greatly reduces the user's burden. During this process, the shooting angles of the RGB image and the depth image must be kept consistent.
In one embodiment, step S2: acquiring the food category and visible area from the RGB image using a Mask R-CNN neural network, and marking the corresponding food area in the depth image to obtain a marked depth image, specifically comprises:
This step concerns food recognition and food segmentation, so after the food image is acquired, a visual analysis of the food is performed first. The key to visual analysis is obtaining a compact and expressive feature representation. The embodiment of the invention adopts the general-purpose instance segmentation framework Mask R-CNN to handle food segmentation and food recognition simultaneously. Mask R-CNN is an extension of Faster R-CNN that can effectively detect objects in an image while also generating a high-quality segmentation mask for each instance. In this way the food category and the visible food area can be obtained.
First, the RGB image is fed into the RoI Align layer of Mask R-CNN, which maps each region of interest to its corresponding position, rounds the coordinates, outputs features at a set size, divides the original image region into different parts (sections), and passes the resulting sections through a max-pooling layer. In the embodiment of the invention, the RoI Align stage uses a standard 54-layer residual network (ResNet) as the feature extractor for the section features. The extracted section features are then fed sequentially into two convolutional layers with 3×3 and 1×1 kernels respectively; a branch is inserted after the 3×3 convolutional layer to compute the classification boxes, and a feature pyramid network (FPN) is connected after the second convolutional layer as the structure that predicts the segmentation mask. The food category and visible food area information are finally obtained.
In addition, the corresponding food area in the depth image needs to be marked to obtain a marked depth image.
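As a minimal sketch of this step, the snippet below runs an instance-segmentation model on the RGB image and masks the aligned depth image with the predicted food regions. It assumes a torchvision Mask R-CNN fine-tuned on a food dataset; the checkpoint path, the FOOD_CLASSES label set and the score/mask thresholds are illustrative assumptions, not values given in the patent.

```python
# Sketch: instance segmentation of the RGB food image, then marking the
# matching regions in the aligned depth image (assumed fine-tuned model).
import numpy as np
import torch
import torchvision

FOOD_CLASSES = ["background", "rice", "apple", "chicken"]   # hypothetical label set

model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    weights=None, num_classes=len(FOOD_CLASSES))
model.load_state_dict(torch.load("food_maskrcnn.pth", map_location="cpu"))  # assumed checkpoint
model.eval()

def segment_and_mark(rgb, depth, score_thr=0.5, mask_thr=0.5):
    """rgb: HxWx3 uint8, depth: HxW float, taken from the same viewpoint.
    Returns a list of (category, binary mask, marked depth image)."""
    img = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([img])[0]
    results = []
    for label, score, mask in zip(pred["labels"], pred["scores"], pred["masks"]):
        if score < score_thr:
            continue
        m = mask[0].numpy() > mask_thr                  # visible area of this food item
        marked_depth = np.where(m, depth, 0.0)          # keep depth only inside the food region
        results.append((FOOD_CLASSES[int(label)], m, marked_depth))
    return results
```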
Since the food image is photographed from a single angle, food occlusion inevitably occurs: only the front of the food is visible, and the shape of its back cannot be observed. To solve this problem, the embodiment of the present invention designs a 3D convolutional neural network to predict the depth image of the back of the food.
As shown in the schematic structural diagram of the 3D convolutional neural network in fig. 2, in one embodiment, step S3: constructing and training a 3D convolutional neural network, inputting the marked depth image into the trained 3D convolutional neural network, and predicting the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises an initial layer, an encoder, fully connected layers, and a decoder, specifically comprises:
Step S31: constructing a 3D convolutional neural network, comprising:
Initial layer: formed by connecting convolutional layers with different kernel sizes. In the embodiment of the invention, 3 initial layers are used to process the input image, and each initial layer is formed by connecting 4 convolutional layers with different kernel sizes. The initial layers are intended to handle food objects captured at different distances: since food objects vary in size, initial layers with kernels of different sizes capture the details of the food object more conveniently and efficiently.
Encoder: composed of several convolutional layers with different kernels. In the embodiment of the invention, one convolutional layer with a 3×3 kernel and three convolutional layers with 2×2 kernels are connected in sequence to form the encoder.
Fully connected layers: the features are shared through fully connected layers and the feature dimensions are aligned. In the embodiment of the invention, the vector features output by the encoder are shared through 2 fully connected layers, which align the feature dimensions.
Decoder: formed by inverting the encoder and appending several convolutional layers. As shown in fig. 2, the decoder is the inverted encoder followed by two 3×3 convolutional layers and one 5×5 convolutional layer.
The 3D convolutional neural network of the embodiment of the present invention takes a depth image of size 480 x 640 as input and outputs the corresponding 480 x 640 opposite-view depth image, i.e. the depth image that would be captured from the opposite viewpoint.
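As a point of reference, the sketch below assembles this layout in PyTorch. The patent only fixes the coarse structure (three multi-kernel initial layers, an encoder of one 3×3 and three 2×2 convolutions, two fully connected layers, and a mirrored decoder ending in two 3×3 and one 5×5 convolution); all channel counts, strides and the bottleneck width below are assumptions chosen so that a 480×640 input maps back to a 480×640 output.

```python
# Sketch of the opposite-view depth prediction network of step S3
# (channel counts, strides and bottleneck size are assumptions).
import torch
import torch.nn as nn

class InitialBlock(nn.Module):
    """Four parallel convolutions with different kernel sizes, concatenated,
    so food items photographed at different distances are captured."""
    def __init__(self, in_ch, branch_ch=8):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5, 7)])
    def forward(self, x):
        return torch.relu(torch.cat([b(x) for b in self.branches], dim=1))

class OppositeViewNet(nn.Module):
    def __init__(self, bottleneck=512):
        super().__init__()
        self.initial = nn.Sequential(InitialBlock(1), InitialBlock(32), InitialBlock(32))
        self.encoder = nn.Sequential(                       # 480x640 -> 30x40
            nn.Conv2d(32, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 2, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, 2, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, 2, stride=2), nn.ReLU(),
        )
        flat = 16 * 30 * 40
        self.fc = nn.Sequential(nn.Linear(flat, bottleneck), nn.ReLU(),
                                nn.Linear(bottleneck, flat), nn.ReLU())
        self.decoder = nn.Sequential(                       # 30x40 -> 480x640
            nn.ConvTranspose2d(16, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 5, padding=2),
        )
    def forward(self, depth):                               # depth: Bx1x480x640
        x = self.encoder(self.initial(depth))
        x = self.fc(x.flatten(1)).view(-1, 16, 30, 40)
        return self.decoder(x)                              # Bx1x480x640 opposite view
```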
Step S32: training the 3D convolutional neural network on a public dataset of two-dimensional views and corresponding three-dimensional models to obtain a trained 3D convolutional neural network; the 3D convolutional neural network outputs a depth image of the occluded far side of the input image, namely the opposite-view depth image; the loss function of the 3D convolutional neural network is defined as shown in equation (1):
[Equation (1): loss function of the 3D convolutional neural network; shown as an image in the original document]
where d̂(u, v) is the pixel value of the opposite-view depth image, d(u, v) is the pixel value of the input depth image, w and h respectively denote the width and height of the image, λ is the regularization term in the loss function, and b is an offset;
the embodiment of the invention trains the 3D convolutional neural network constructed in the step S31 by using the disclosed two-dimensional view and the corresponding three-dimensional model data set, and simultaneously utilizes the loss function of the formula (1) to control the convergence speed of the network, thereby finally obtaining the trained 3D convolutional neural network.
Step S33: and inputting the marked depth image into a trained 3D convolutional neural network to generate a depth image of the opposite view of the depth image.
The marked depth image obtained in step S2 is input into the trained 3D convolutional neural network to generate the opposite-view depth image of the depth image.
After the food depth image and its opposite-view depth image are obtained, the corresponding three-dimensional point cloud can be calculated. Existing volume estimation methods obtain an extrinsic calibration matrix from a reference marker, fuse the synthesized points with the initial points, and obtain the complete point cloud through a transformation matrix. The embodiment of the invention provides a method for reconstructing the three-dimensional point cloud without a reference marker, so no additional marker needs to be placed.
In one embodiment, step S4: registering the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of the target object, specifically comprises:
Step S41: moving the origin of the world coordinate system to the center of the camera that captured the depth image, and re-projecting the depth image into world coordinates according to equation (2):
[X, Y, Z]^T = Z · K^(-1) · [u, v, 1]^T,  with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]   (2)
where u, v are coordinates in the depth image and X, Y, Z are coordinates in the world coordinate system; Z is a scalar equal to depthmap(u, v); K is the camera matrix, in which f_x and f_y are the focal-length parameters and c_x, c_y are the principal point offset, i.e. the position of the principal point relative to the image (projection) plane;
Step S42: the camera is rotated by 180 degrees and translated through the rotation and translation matrices respectively; the rotation about the y-axis can be simplified to equation (3):
R(θ) = [[cos θ, 0, sin θ], [0, 1, 0], [-sin θ, 0, cos θ]]   (3)
where θ is the rotation angle of the camera about the y-axis;
Step S43: registering both views into the same world coordinate system by equation (4), the food point cloud is synthesized:
P' = R · P + T   (4)
where P is a re-projected point, P' the registered point, R the rotation matrix, and T the translation matrix.
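A compact numpy sketch of this registration step is given below: both depth maps are back-projected with the pinhole model of equation (2), the opposite view is rotated by θ = 180 degrees about the y-axis as in equations (3) and (4), and the two point sets are merged. The intrinsic parameters f_x, f_y, c_x, c_y must come from the actual camera, and the translation t between the real and the mirrored virtual camera is assumed known or estimated; the numeric values shown are placeholders.

```python
# Sketch of step S4: back-projection and registration of the two views.
import numpy as np

fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0   # placeholder camera intrinsics

def backproject(depth):
    """depth: HxW array in metres -> Nx3 points in camera coordinates."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[z.reshape(-1) > 0]                  # keep only marked (non-zero) food pixels

def merge_views(front_depth, back_depth, theta=np.pi, t=np.zeros(3)):
    """Register both views in one world frame; t is the translation between
    the real and the mirrored virtual camera (assumed known or estimated)."""
    R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                  [ 0.0,           1.0, 0.0          ],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
    front = backproject(front_depth)
    back = backproject(back_depth) @ R.T + t       # rotate the opposite view by 180 degrees
    return np.vstack([front, back])                # complete food point cloud
```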
In one embodiment, step S5: applying the point cloud to each marked food object and meshing the food objects using a convex-hull-based algorithm to calculate the food volume, specifically comprises:
step S51: layering the food point cloud from bottom to top at preset equal intervals, and storing each layer of points as an independent unit;
In the embodiment of the invention, the layer interval is set to 5 cm.
step S52: projecting each layered food point cloud along the z-axis, and then constructing the outer convex hull of each layer using a convex hull algorithm to obtain the convex hull contour;
step S53: setting a side-length threshold L_limit; for each edge of the convex hull contour longer than L_limit, taking that edge as the diameter of a circle and selecting the points inside the circle as suspected boundary points; among the suspected boundary points, the point forming the largest angle with the endpoints of the diameter is taken as a new boundary point, so that the food boundary shrinks and gaps are eliminated;
step S54: repeating steps S52-S53 until all edge lengths are below the threshold, then stopping the iteration;
step S55: calculating the volume of each layer and summing the volumes of all layers to obtain the food volume.
As shown in fig. 3, a schematic flow chart for calculating the volume of food based on the convex hull algorithm is shown.
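As a simplified sketch of this layered computation: the point cloud is sliced into horizontal layers of equal thickness (5 cm in the embodiment), the 2D convex hull of each layer's projection is built, and layer area times layer thickness is accumulated. The boundary-shrinking refinement of steps S52-S54 is omitted, so this yields the plain convex-hull volume; treating z as the vertical axis follows the z-axis projection described above.

```python
# Sketch of step S5 without the boundary-shrinking refinement.
import numpy as np
from scipy.spatial import ConvexHull

def food_volume(points, layer_height=0.05):
    """points: Nx3 food point cloud in metres, with z as the vertical axis.
    Returns the estimated volume in cubic metres."""
    z = points[:, 2]
    volume = 0.0
    for z0 in np.arange(z.min(), z.max(), layer_height):
        layer = points[(z >= z0) & (z < z0 + layer_height)]
        if len(layer) < 3:
            continue                                   # not enough points for a hull
        xy = layer[:, :2]                              # project the layer along the z-axis
        hull = ConvexHull(xy)                          # outer convex hull contour of this layer
        volume += hull.volume * layer_height           # in 2D, ConvexHull.volume is the area
    return volume
```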
In one embodiment, step S6: calculating the food mass according to the food volume and the food category, comparing it with a food nutrition table to obtain the nutrient amounts before and after eating, and subtracting the post-meal amount from the pre-meal amount to obtain the accurate food intake, specifically comprises:
From the food category obtained in step S2 and the food volume obtained in step S5, the mass M of the food item can be calculated according to equation (5):
M = ρV   (5)
where V is the food volume and ρ is the food density, obtained from a food density table;
Based on the food mass M, the nutritional information of the food can be determined from a food nutrition table such as the example shown in Table 1 below; the food nutrient amount N is calculated using equation (6):
N = N_T · M / M_T   (6)
where N_T and M_T are respectively the nutrient amount and the food mass looked up from the table.
Table 1: food nutrition table example (the table is reproduced in the original document as an image)
After the pre-meal and post-meal nutrient amounts are calculated, the user's actual food intake is obtained by subtracting the post-meal nutrient amount from the pre-meal nutrient amount.
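The arithmetic of this step is illustrated below: mass from density as in equation (5), nutrient amount by linear scaling against a nutrition-table entry in the assumed form of equation (6), and intake as the pre-meal minus post-meal difference. The density and nutrition-table values are illustrative placeholders, not data from the patent.

```python
# Sketch of step S6 with placeholder density and nutrition-table entries.
FOOD_DENSITY = {"rice": 750.0}                      # kg/m^3, illustrative value
NUTRITION_TABLE = {"rice": {"mass_g": 100.0, "energy_kcal": 130.0, "protein_g": 2.7}}

def nutrients(category, volume_m3):
    mass_g = FOOD_DENSITY[category] * volume_m3 * 1000.0          # M = rho * V, in grams
    entry = NUTRITION_TABLE[category]
    scale = mass_g / entry["mass_g"]                               # N = N_T * M / M_T
    return {k: v * scale for k, v in entry.items() if k != "mass_g"}

before = nutrients("rice", 2.0e-4)     # pre-meal portion volume in m^3
after = nutrients("rice", 0.5e-4)      # leftover volume after the meal
intake = {k: before[k] - after[k] for k in before}                 # actual nutrient intake
```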
The food nutrition assessment method based on visual analysis of the invention can predict the depth image of the food at the opposite viewing angle, alleviates the common problem of food occlusion in real life, reduces the user's burden of photographing the food from multiple angles, and effectively solves the problem of dietary assessment in daily life.
Example two
As shown in fig. 4, an embodiment of the present invention provides a food nutrition evaluation system based on visual analysis, comprising the following modules:
an RGB image and depth image acquisition module 71, configured to acquire RGB images and depth images of the food before and after eating, wherein the shooting angles of the RGB images and the depth images are kept consistent;
a food category and visible area acquisition module 72, configured to obtain the food category and visible area of the RGB image using a Mask R-CNN neural network, and to mark the corresponding food area in the depth image to obtain a marked depth image;
an opposite-view depth image prediction module 73, configured to construct and train a 3D convolutional neural network, input the marked depth image into the trained 3D convolutional neural network, and predict the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises an initial layer, an encoder, fully connected layers, and a decoder;
a food three-dimensional point cloud acquisition module 74, configured to register the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of the target object;
a food volume calculation module 75, configured to apply the point cloud to each marked food object and mesh the food objects using a convex-hull-based algorithm to calculate the food volume;
a food intake calculation module 76, configured to calculate the food mass according to the food volume and the food category, compare it with a food nutrition table to obtain the nutrient information before and after eating, and subtract the post-meal value from the pre-meal value to obtain the accurate food intake.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (5)

1. A method for food nutrition assessment based on visual analysis, comprising:
step S1: acquiring RGB images and depth images of the food before and after eating, wherein the shooting angles of the RGB images and the depth images are kept consistent;
step S2: acquiring the food category and visible area from the RGB image using a Mask R-CNN neural network, and marking the corresponding food area in the depth image to obtain a marked depth image;
step S3: constructing and training a 3D (three-dimensional) convolutional neural network, inputting the marked depth image into the trained 3D convolutional neural network, and predicting the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises: an initial layer, an encoder, fully connected layers, and a decoder;
step S4: registering the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of a target object;
step S5: applying the point cloud to each marked food object, and meshing the food objects using a convex-hull-based algorithm to calculate the food volume;
step S6: calculating the food mass according to the food volume and the food category, comparing it with a food nutrition table to obtain the nutrient amounts before and after eating, and subtracting the post-meal amount from the pre-meal amount to obtain the accurate food intake.
2. The visual analysis-based food nutrition assessment method according to claim 1, wherein said step S3: constructing and training a 3D convolutional neural network, inputting the marked depth image into the trained 3D convolutional neural network, and predicting the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises an initial layer, an encoder, fully connected layers, and a decoder, specifically comprises:
step S31: constructing a 3D convolutional neural network, comprising:
an initial layer: formed by connecting convolutional layers with different kernel sizes;
an encoder: composed of several convolutional layers with different kernels;
fully connected layers: the features are shared through the fully connected layers and the feature dimensions are aligned;
a decoder: formed by inverting the encoder and appending several convolutional layers;
step S32: training the 3D convolutional neural network on a public dataset of two-dimensional views and corresponding three-dimensional models to obtain a trained 3D convolutional neural network; the 3D convolutional neural network outputs a depth image of the occluded far side of the input image, namely the opposite-view depth image; the loss function of the 3D convolutional neural network is defined as shown in equation (1):
[Equation (1): loss function of the 3D convolutional neural network; shown as an image in the original document]
where d̂(u, v) is the pixel value of the opposite-view depth image, d(u, v) is the pixel value of the input depth image, w and h respectively denote the width and height of the image, λ is the regularization term in the loss function, and b is an offset;
step S33: inputting the marked depth image into the trained 3D convolutional neural network to generate the opposite-view depth image of the depth image.
3. The visual analysis-based food nutrition assessment method according to claim 1, wherein said step S4: registering the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of the target object, specifically comprises:
step S41: moving the origin of the world coordinate system to the center of the camera that captured the depth image, and re-projecting the depth image into world coordinates according to equation (2):
[X, Y, Z]^T = Z · K^(-1) · [u, v, 1]^T,  with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]   (2)
where u, v are coordinates in the depth image and X, Y, Z are coordinates in the world coordinate system; Z is a scalar equal to depthmap(u, v); K is the camera matrix, in which f_x and f_y are the focal-length parameters and c_x, c_y are the principal point offset, i.e. the position of the principal point relative to the image (projection) plane;
step S42: the camera is rotated by 180 degrees and translated through the rotation and translation matrices respectively; the rotation about the y-axis can be simplified to equation (3):
R(θ) = [[cos θ, 0, sin θ], [0, 1, 0], [-sin θ, 0, cos θ]]   (3)
where θ is the rotation angle of the camera about the y-axis;
step S43: registering both views into the same world coordinate system by equation (4), the food point cloud is synthesized:
P' = R · P + T   (4)
where P is a re-projected point, P' the registered point, R the rotation matrix, and T the translation matrix.
4. The visual analysis-based food nutrition assessment method according to claim 1, wherein said step S5: applying the point cloud to each marked food object and meshing the food objects using a convex-hull-based algorithm to calculate the food volume, specifically comprises:
step S51: layering the food point cloud from bottom to top at preset equal intervals, and storing each layer of points as an independent unit;
step S52: projecting each layered food point cloud along the z-axis, and then constructing the outer convex hull of each layer using a convex hull algorithm to obtain the convex hull contour;
step S53: setting a side-length threshold L_limit; for each edge of the convex hull contour longer than L_limit, taking that edge as the diameter of a circle and selecting the points inside the circle as suspected boundary points; among the suspected boundary points, the point forming the largest angle with the endpoints of the diameter is taken as a new boundary point, so that the food boundary shrinks and gaps are eliminated;
step S54: repeating steps S52-S53 until all edge lengths are below the threshold, then stopping the iteration;
step S55: calculating the volume of each layer and summing the volumes of all layers to obtain the food volume.
5. A food nutrition assessment system based on visual analysis, comprising the following modules:
an RGB image and depth image acquisition module, configured to acquire RGB images and depth images of the food before and after eating, wherein the shooting angles of the RGB images and the depth images are kept consistent;
a food category and visible area acquisition module, configured to obtain the food category and visible area of the RGB image using a Mask R-CNN neural network, and to mark the corresponding food region in the depth image to obtain a marked depth image;
an opposite-view depth image prediction module, configured to construct and train a 3D (three-dimensional) convolutional neural network, input the marked depth image into the trained 3D convolutional neural network, and predict the opposite-view depth image of the depth image, wherein the 3D convolutional neural network comprises: an initial layer, an encoder, fully connected layers, and a decoder;
a food three-dimensional point cloud acquisition module, configured to register the depth image and the opposite-view depth image into the same world coordinate system to obtain a complete three-dimensional point cloud of a target object;
a food volume calculation module, configured to apply the point cloud to each marked food object and mesh the food objects using a convex-hull-based algorithm to calculate the food volume;
and a food intake calculation module, configured to calculate the food mass according to the food volume and the food category, compare it with a food nutrition table to obtain the nutrient information before and after eating, and subtract the post-meal value from the pre-meal value to obtain the accurate food intake.
CN202210023062.9A 2022-01-10 2022-01-10 Food nutrition assessment method and system based on visual analysis Pending CN114463740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210023062.9A CN114463740A (en) 2022-01-10 2022-01-10 Food nutrition assessment method and system based on visual analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210023062.9A CN114463740A (en) 2022-01-10 2022-01-10 Food nutrition assessment method and system based on visual analysis

Publications (1)

Publication Number Publication Date
CN114463740A true CN114463740A (en) 2022-05-10

Family

ID=81408753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210023062.9A Pending CN114463740A (en) 2022-01-10 2022-01-10 Food nutrition assessment method and system based on visual analysis

Country Status (1)

Country Link
CN (1) CN114463740A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078955A (en) * 2023-08-22 2023-11-17 海啸能量实业有限公司 Health management method based on image recognition
CN117078955B (en) * 2023-08-22 2024-05-17 海口晓建科技有限公司 Health management method based on image recognition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination