CN112995572A - Remote conference system and physical display method in remote conference
- Publication number: CN112995572A
- Application number: CN202110438782.7A
- Authority
- CN
- China
- Prior art keywords
- image
- real object
- host
- end equipment
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
- H04N13/117—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
- H04N13/279—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals the virtual viewpoint locations being selected by the viewers or determined by tracking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention relates to a remote conference system and a real object display method in a remote conference. The system comprises near-end equipment and far-end equipment, each comprising a host and peripherals; the peripherals comprise a camera module, a microphone and an adjusting device. The camera module is connected with the adjusting device and is used for acquiring a 3D virtual image of a real object and user gestures; the host recognizes user gestures and forms instructions to control the 3D virtual image, and also recognizes voice and forms instructions to control the adjusting device. The 3D image acquisition device comprises a 3D scanner or a 3D depth perception camera. The system allows the whole conference or teaching activity to proceed smoothly and improves communication efficiency: there is no need to operate inconvenient peripherals such as a keyboard or mouse, repeated interruption of teaching explanations or the conference program is avoided, and remote conference or remote teaching activities run more smoothly and efficiently.
Description
Technical Field
The invention relates to the technical field of teleconferencing, in particular to a teleconferencing system and a real object display method in a teleconference.
Background
When a teleconference is carried out, communication generally takes place by video. When a real object must be displayed as a three-dimensional figure, the presenter often needs to adjust the camera and rotate or flip the object so that users at the other, far-end devices can observe it conveniently, which is very inconvenient.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a remote conference system and a method for displaying a real object in a remote conference, aiming at the above-mentioned defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
constructing a remote conference system, which comprises a near-end device and one or more far-end devices remotely connected with it, wherein the near-end device and the far-end devices each comprise a host and a peripheral connected with the host;
the peripheral comprises a display module, a camera module, a pickup module, a microphone, a loudspeaker and an adjusting device; wherein:
the camera module is arranged on the adjusting device and comprises a 3D image acquisition device and a depth lens, and the 3D image acquisition device is used for acquiring a 3D virtual image of a real object;
the host acquires user gestures through the depth lens, recognizes the user gestures and controls the 3D virtual image through a command corresponding to the gestures; the host computer also obtains voice information through the microphone, recognizes voice and controls the adjusting device through a command corresponding to the voice;
the 3D image acquisition device comprises a 3D scanner or a 3D depth perception camera.
Preferably, the host is further used for remote audio-video adjustment. Specifically, the host of the near-end device outputs audio and video to the peripheral of the far-end device; the peripheral of the far-end device displays the image through the display module and plays sound through the loudspeaker; the pickup module then collects the sound played by the loudspeaker to obtain sound data, and the camera module captures the video shown by the display module to obtain video data; the video data and sound data are sent back to the host of the near-end device, and a worker debugs the display module and loudspeaker of the far-end device according to them.
Preferably, the host is further configured to record the conference: the voice acquired by the microphone is stored as an audio file, the audio file is converted into a text file through voice recognition, the speaker of each voice segment is recognized through voiceprint recognition, and the corresponding speaker information is marked on the text converted from each segment.
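As an illustration of this meeting-record feature, the sketch below assembles speaker-labelled transcript lines from already-recognized segments. The `Segment` structure and `label_transcript` helper are hypothetical names; the voice recognition and voiceprint recognition engines themselves are assumed to run upstream and supply the fields.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float    # segment start time in seconds
    end_s: float      # segment end time
    text: str         # text produced by voice recognition
    speaker_id: str   # identity produced by voiceprint recognition

def label_transcript(segments, speaker_names):
    """Render each recognized segment with its speaker's name,
    falling back to the raw voiceprint id when no name is known."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        name = speaker_names.get(seg.speaker_id, seg.speaker_id)
        lines.append(f"[{seg.start_s:07.1f}] {name}: {seg.text}")
    return "\n".join(lines)
```

A transcript rendered this way keeps the speaker labels next to the text converted from each segment, as the paragraph above requires.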
A real object display method in a teleconference is based on the teleconference system and comprises the following steps:
the method comprises the following steps: when the real object is to be displayed, a user of the near-end equipment sends an instruction through voice, the host of the near-end equipment acquires the voice through the microphone, the voice of the user is recognized, the adjusting device is controlled through the instruction corresponding to the voice to adjust the position of the camera module, the real object image information is acquired through the camera module and is transmitted to the far-end equipment in real time;
Step two: when a 3D virtual image of a three-dimensional real object is to be displayed, picture data of the real object is acquired through the 3D scanner or the 3D depth perception camera. When the position or angle of the camera module needs to be adjusted, a user of the near-end device issues an instruction by voice; the host of the near-end device acquires the voice through the microphone, recognizes it, and controls the adjusting device through the corresponding instruction to adjust the position of the camera module. While the user manually turns the real object, the camera module follows its direction and angle, graphic information of the three-dimensional real object is acquired from every angle, and this graphic information is edited to form a virtual three-dimensional image of the real object;
Step three: when the 3D virtual image of the real object is not clear, execute step one;
step four: when a 3D virtual image of a real object is to be controlled, the host machine acquires a user gesture through the depth lens or the 3D depth perception camera, recognizes the user gesture and controls the 3D virtual image through a command corresponding to the gesture;
Step five: when the planar image of a planar real object is to be displayed, the host of the near-end device acquires the planar display image of the object through the camera module.
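The five steps above reduce to routing recognized voice commands to the adjusting device and recognized gestures to the 3D virtual image. A minimal dispatch sketch follows; the command and gesture names are invented for illustration, since the patent does not enumerate them:

```python
def make_dispatcher():
    """Return (voice handler, gesture handler) sharing one state dict.
    Voice drives the adjusting device; gestures drive the 3D image."""
    state = {"pan": 0, "tilt": 0, "zoom": 1.0, "rotation": 0}

    def on_voice(command):
        # Hypothetical voice commands controlling camera position.
        if command == "pan left":
            state["pan"] -= 10
        elif command == "pan right":
            state["pan"] += 10
        elif command == "zoom in":
            state["zoom"] *= 1.25
        return state

    def on_gesture(gesture):
        # Hypothetical gestures rotating the 3D virtual image.
        if gesture == "swipe_left":
            state["rotation"] = (state["rotation"] - 30) % 360
        elif gesture == "swipe_right":
            state["rotation"] = (state["rotation"] + 30) % 360
        return state

    return on_voice, on_gesture
```

In a real system the recognizers would emit richer events and the state changes would be transmitted to the far-end devices in real time.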
Preferably, when the 3D image acquisition device employs a 3D depth perception camera:
if the displayed real object is three-dimensional, the host acquires a 3D virtual image of it through the 3D depth perception camera, acquires the user's gesture through the depth lens, and controls the state of the 3D virtual image through the corresponding instruction;
if the actual real object needs to be displayed remotely and its angle is turned over by hand, the host acquires a voice instruction and controls the adjusting device to adjust the position of the 3D depth perception camera; combined with the user's manual turning of the real object, detailed structural information of the object is displayed;
when the display module displays the 3D virtual image and the real image of the real object simultaneously in a split screen mode, the host has the function of controlling the real object display through voice and controlling the 3D virtual image through gestures.
Preferably, when the object to be displayed is a planar paper form document, the host acquires an image of the paper form document through the camera module;
the host scans the fields of the image, calls up the corresponding prefabricated template file according to the first-row and/or first-column fields of the scanned form, fills the scanned fields into the template file to form a form file, and sends the form file to all far-end devices;
when the near-end equipment needs to display the table, the host of the near-end equipment opens the table file, the host of the far-end equipment synchronously opens the table file, the host of the near-end equipment acquires a control instruction of a user on the table and sends the control instruction to the host of the far-end equipment, and the far-end equipment and the near-end equipment update the table display state in real time according to the instruction.
Preferably, when the table image is scanned, the maximum bounding box is found first, a plurality of bounding boxes in the maximum bounding box are positioned, the rows and columns of the table are determined, and fields of the rows and columns of the table are scanned; calling out a prefabricated template, matching the image form with the prefabricated template according to the scanned form row and column information and the first row and/or first column character information, determining the form type, and newly building a corresponding template form file according to the determined form type; and filling the identified fields of the rest rows and columns into corresponding areas of the template format file, storing the file, and simultaneously respectively sending the file to all remote equipment.
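The row/column determination described above can be sketched by clustering the coordinates of the located cell bounding boxes. The `(x, y, w, h)` tuple representation is an assumption; the patent does not fix one:

```python
def cluster_coords(values, tol=5):
    """Group 1-D coordinates lying within `tol` pixels of each other;
    each group corresponds to one row (or column) line of the table."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]

def table_shape(cell_boxes, tol=5):
    """cell_boxes: (x, y, w, h) of each bounding box located inside the
    maximum bounding box. Returns (number of rows, number of columns)."""
    rows = cluster_coords([y for _, y, _, _ in cell_boxes], tol)
    cols = cluster_coords([x for x, _, _, _ in cell_boxes], tol)
    return len(rows), len(cols)
```

Once the rows and columns are known, the first-row/first-column cells can be OCR-scanned and matched against the prefabricated templates as the paragraph describes.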
Preferably, when the form image is scanned and needs to be corrected, the longest straight line in the image is found first; it is then determined whether this line is closer to the horizontal or to the vertical, the included angle between the line and the horizontal/vertical is found, and the inclination angle of the image is determined, so that the angle of the image can be corrected by rotation.
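A sketch of this inclination-angle step, assuming the longest line is already available as two endpoints (e.g. from a Hough transform, which is not specified in the patent): decide whether the line is nearer the horizontal or the vertical, then return the small rotation that would square it up.

```python
import math

def skew_angle(p1, p2):
    """Rotation (degrees) that makes the longest detected line exactly
    horizontal or vertical, whichever it is already nearer to."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx)) % 180  # line direction, 0..180
    if min(angle, 180 - angle) <= abs(angle - 90):
        # near-horizontal: rotate away the deviation from 0 (or 180)
        return -angle if angle <= 90 else 180 - angle
    # near-vertical: rotate away the deviation from 90
    return 90 - angle
```

For example, a table edge running from (0, 0) to (100, 3) yields a correction of about -1.7 degrees.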
Preferably, when the host acquires a plurality of image files, scanning header text information outside the maximum bounding box, if it is detected that the header text information is consistent and the templates are the same, performing field scanning on the areas to be identified of the image files with the same template, and sequentially filling the scanned fields into files with the same template format according to the receiving sequence and storing the files.
Preferably, the target remote device obtains a table display instruction and a table editing instruction, sends the table display instruction and the table editing instruction to the other remote devices through the host of the near-end device, the remote device updates the table according to the table display instruction and the table editing instruction, and after confirming that the other remote devices are all updated, the target remote device updates the table according to the table display instruction and the table editing instruction.
The beneficial effects of the invention are: by acquiring voice information, recognizing the voice and forming control instructions, the adjusting device adjusts the position of the camera module; combined with the user manually rotating the real object, images of the object from different angles are obtained and transmitted to the far-end devices in real time, which is convenient for explanation.
The 3D virtual image of the real object can be obtained through the cooperation of the adjusting device and the 3D depth perception camera and transmitted to the far-end devices. After the 3D virtual image is obtained, the 3D depth perception camera can also recognize the user's gesture and control the 3D virtual image through the corresponding instruction, with changes to the image transmitted synchronously to the far-end devices. This reduces the interference of the presenter's fingers with the real object image, so that users of the far-end devices can observe the object clearly and carefully.
The 3D depth perception camera can itself intelligently track the real object or the presenter; combined with voice control of the adjusting device to adjust the camera's position and angle, conference or teaching activities can be carried out more efficiently.
In a remote conference or remote teaching environment, controlling the 3D virtual image through gestures and the position and angle of the camera device through voice lets the whole conference or teaching activity proceed smoothly and improves communication efficiency. With this system and method there is no need to operate inconvenient peripherals such as a keyboard or mouse, repeated interruption of teaching explanations or the conference program is avoided, and remote conference or remote teaching activities run more smoothly and efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the present invention will be further described with reference to the accompanying drawings and embodiments, wherein the drawings in the following description are only part of the embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts:
FIG. 1 is a schematic diagram of a teleconferencing system in accordance with a preferred embodiment of the present invention;
fig. 2 is a flowchart of a method for displaying a real object in a teleconference according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the following will clearly and completely describe the technical solutions in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.
As shown in fig. 1, the teleconference system in the preferred embodiment of the present invention includes a near-end device and one or more far-end devices remotely connected to it, where the near-end device and the far-end devices each include a host 1 and a peripheral 2 connected to the host 1;
the peripheral 2 comprises a display module 21, a camera module 22, a sound pickup module 23, a microphone 24, a loudspeaker 25 and an adjusting device 26; wherein:
the camera module 22 is arranged on the adjusting device 26, the camera module 22 comprises a 3D image acquiring device 221 and a depth lens 222, and the 3D image acquiring device 221 is used for acquiring a 3D virtual image of a real object;
the host 1 acquires the user gesture through the depth lens 222, recognizes the user gesture, and controls the 3D virtual image through an instruction corresponding to the gesture; the host 1 also acquires voice information through the microphone 24, recognizes the voice and controls the adjusting device 26 through an instruction corresponding to the voice;
the 3D image acquisition device 221 includes a 3D scanner or a 3D depth perception camera.
The 3D virtual image of the real object can be obtained through the cooperation of the adjusting device and the 3D depth perception camera and transmitted to the far-end devices. After the 3D virtual image is obtained, the 3D depth perception camera can recognize the user's gesture and control the 3D virtual image through the corresponding instruction, with changes to the image transmitted synchronously to the far-end devices. This reduces the interference of the presenter's fingers when the real object is observed, so that users of the far-end devices can observe the object more clearly and carefully.
The 3D depth perception camera can itself intelligently track the real object or the presenter; combined with voice control of the adjusting device to adjust the camera's position and angle, conference or teaching activities can be carried out more efficiently.
In a remote conference or remote teaching environment, controlling the 3D virtual image through gestures and the position and angle of the camera device through voice lets the whole conference or teaching activity proceed smoothly and improves communication efficiency. With this system and method there is no need to operate inconvenient peripherals such as a keyboard or mouse, repeated interruption of teaching explanations or the conference program is avoided, and remote conference or remote teaching activities run more smoothly and efficiently.
As shown in fig. 1, the host 1 is further configured for remote audio-video adjustment. Specifically, the host 1 of the near-end device outputs audio and video to the peripheral 2 of the far-end device; the peripheral 2 of the far-end device displays images through the display module 21 and plays sound through the loudspeaker 25; the sound pickup module 23 collects the sound played by the loudspeaker 25 to obtain sound data, and the camera module 22 captures the video shown by the display module 21 to obtain video data; the video data and sound data are sent to the host 1 of the near-end device, and a worker debugs the display module 21 and the loudspeaker 25 of the far-end device according to the video data and sound data;
The host can also debug automatically: the video data and sound data transmitted to the host 1 are analyzed for image information such as definition and noise and for sound information such as decibel level and clarity, and the image and sound are adjusted to within preset ranges. The sound pickup module 23 comprises a plurality of sound pickups evenly distributed on the conference table; the camera module 22 comprises a plurality of cameras that can collect sound and image information of the display module 21 from each position in the meeting room and from different angles, so that after the display module 21 and the loudspeaker 25 are debugged, all personnel in the meeting room can hear and see clearly.
As shown in fig. 2, the host 1 is further configured to record the conference: the voice acquired through the microphone 24 is stored as an audio file, the audio file is converted into a text file through voice recognition, the speaker of each voice segment is recognized through voiceprint recognition, and the corresponding speaker information is labeled on the text converted from each segment.
As shown in fig. 2, the method for displaying a real object in a teleconference according to the preferred embodiment of the present invention, based on the previous embodiment, includes the following steps:
the method comprises the following steps: when the real object is to be displayed, a user of the near-end equipment sends an instruction through voice, the host 1 of the near-end equipment acquires the voice through the microphone 24, recognizes the voice of the user, controls the adjusting device 26 to adjust the position of the camera module 22 through the instruction corresponding to the voice, acquires real object image information through the camera module 22, and transmits the real object image information to the far-end equipment in real time;
Step two: when a 3D virtual image of a three-dimensional real object is to be displayed, picture data of the real object is acquired through the 3D scanner or the 3D depth perception camera. When the position or angle of the camera module 22 needs to be adjusted, a user of the near-end device issues an instruction by voice; the host 1 of the near-end device acquires the voice through the microphone 24, recognizes it, and controls the adjusting device 26 through the corresponding instruction to adjust the position of the camera module 22. While the user manually turns the real object, the camera module 22 follows its direction and angle, graphic information of the three-dimensional real object is acquired from every angle, and this graphic information is edited to form a virtual three-dimensional image of the real object;
Step three: when the 3D virtual image of the real object is not clear, execute step one;
step four: when a 3D virtual image of a real object is to be controlled, the host 1 acquires a user gesture through the depth lens 222 or the 3D depth perception camera, recognizes the user gesture, and controls the 3D virtual image through an instruction corresponding to the gesture;
step five: when the planar image of the planar real object is to be displayed, the host 1 of the near-end device acquires the planar display image of the planar real object through the camera module 22.
The 3D depth perception camera can itself intelligently track the real object or the presenter; combined with voice control of the adjusting device to adjust the camera's position and angle, conference or teaching activities can be carried out more efficiently.
In a remote conference or remote teaching environment, controlling the 3D virtual image through gestures and the position and angle of the camera device through voice lets the whole conference or teaching activity proceed smoothly and improves communication efficiency. With this system and method there is no need to operate inconvenient peripherals such as a keyboard or mouse, repeated interruption of teaching explanations or the conference program is avoided, and remote conference or remote teaching activities run more smoothly and efficiently.
As shown in fig. 2, when the 3D image acquisition device 221 employs a 3D depth perception camera:
if the displayed real object is three-dimensional, the host 1 acquires a 3D virtual image of it through the 3D depth perception camera, acquires the user's gesture through the depth lens 222, and controls the state of the 3D virtual image through the corresponding instruction;
if the actual real object needs to be displayed remotely and its angle is turned over by hand, the host 1 acquires a voice instruction and controls the adjusting device 26 to adjust the position of the 3D depth perception camera; combined with the user's manual turning of the real object, detailed structural information of the object is displayed;
when the display module 21 displays the 3D virtual image and the real image of the real object simultaneously in a split screen manner, the host has the function of controlling the real object display by voice and the 3D virtual image by gesture at the same time.
As shown in fig. 2, when the object to be displayed is a planar paper form document, the host 1 obtains an image of the paper form document through the camera module 22;
the host 1 scans the fields of the image, calls up the corresponding prefabricated template file according to the first-row and/or first-column fields of the scanned form, fills the scanned fields into the template file to form a form file, and sends the form file to all far-end devices;
when the near-end equipment needs to display the table, the host 1 of the near-end equipment opens the table file, the host 1 of the far-end equipment synchronously opens the table file, the host 1 of the near-end equipment acquires a control instruction of the user on the table and sends the control instruction to the host 1 of the far-end equipment, and the far-end equipment and the near-end equipment update the table display state in real time according to the instruction.
Information can be extracted from paper forms quickly and with high accuracy. By generating instructions for the table document to be displayed and sending the document and its display state to each far-end device, all users can display the table synchronously without transmitting video, so little memory or network bandwidth is consumed, the performance requirements on the far-end devices are low, and cost can be reduced.
As shown in fig. 2, when scanning the table image, first find the largest bounding box, locate a plurality of bounding boxes within the largest bounding box, determine the rows and columns of the table, and scan the fields of the rows and columns of the table; calling out a prefabricated template, matching the image form with the prefabricated template according to the scanned form row and column information and the first row and/or first column character information, determining the form type, and newly building a corresponding template form file according to the determined form type; and filling the identified fields of the rest rows and columns into corresponding areas of the template format file, storing the file, and simultaneously respectively sending the file to all remote equipment.
As shown in fig. 2, when the form image is scanned and needs to be corrected, the longest straight line in the image is found first; it is then determined whether this line is closer to the horizontal or to the vertical, the included angle between the line and the horizontal/vertical is found, and the inclination angle of the image is determined, so that the image can be corrected by rotation. Correcting the image improves the utilization of the image acquisition equipment, reduces the amount of image data to be scanned and processed, and improves the response rate of the system;
After the image is corrected, noise reduction is required. The specific steps are: convert the image to the HSV color space and remove pixels falling in the red interval; then determine a binarization threshold at each pixel from the pixel-value distribution of its neighborhood block, and binarize the image with this adaptive threshold to reduce noise interference.
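A pure-Python sketch of both noise-reduction steps; a real implementation would use an image-processing library, and the red-hue bounds and neighborhood block size here are illustrative assumptions:

```python
import colorsys

def is_red(rgb, sat_min=0.3, val_min=0.2):
    """Red-interval test in HSV: hue near 0 degrees (or wrapping past
    340), with enough saturation/value to rule out grey noise."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    hue_deg = h * 360
    return (hue_deg <= 20 or hue_deg >= 340) and s >= sat_min and v >= val_min

def adaptive_binarize(gray, block=3, offset=0):
    """Per-pixel threshold = mean of the (block x block) neighborhood
    minus `offset`; pixels above threshold become 1 (paper), else 0 (ink)."""
    h, w = len(gray), len(gray[0])
    r = block // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            thresh = sum(vals) / len(vals) - offset
            out[y][x] = 1 if gray[y][x] > thresh else 0
    return out
```

The neighborhood mean makes the threshold adapt to local lighting, which is what keeps the binarization robust against uneven illumination and noise.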
After the fields in the table are recognized, dictionary optimization is performed: a dictionary library is established, and each field recognized by OCR is matched against the fields in the library. If the matching score is greater than a preset threshold, the field in the dictionary library is replaced by the OCR-recognized field, so that the fields in the library are optimized and updated; manually confirmed correct fields are also supplemented into the library. The matching score equals the total number of words recognized by OCR divided by the total number of matched words in the current dictionary library.
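The patent's wording is ambiguous about which count is the divisor in the matching score; the sketch below assumes the more common form, matched words divided by total OCR-recognized words, which yields a 0..1 score comparable against a threshold. All names are hypothetical:

```python
def match_score(ocr_words, dictionary):
    """Fraction of OCR-recognized words that are found in the dictionary
    library (assumed form of the patent's matching score)."""
    if not ocr_words:
        return 0.0
    matched = sum(1 for w in ocr_words if w in dictionary)
    return matched / len(ocr_words)

def optimize_field(ocr_field, dictionary, threshold=0.8):
    """If the OCR field matches the dictionary library well enough,
    accept it and fold it into the library; otherwise hold it for
    manual confirmation, after which it would be supplemented too."""
    words = list(ocr_field)  # character-level matching, as for Chinese fields
    if match_score(words, dictionary) > threshold:
        dictionary.update(words)
        return ocr_field, True
    return ocr_field, False
```

The character-level tokenization is also an assumption; a word-level tokenizer would fit Latin-script fields better.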
Some forms contain too much content for a single sheet of paper, so several paper forms may together constitute one unified form. When the same far-end device transmits a plurality of image files, the title text outside the maximum bounding box is scanned; if the title text is detected to be consistent and the templates are the same, field scanning is performed on the areas to be identified of these image files, and the scanned fields are filled into a file of the same template format in receiving order and saved. In this way a plurality of paper files may be merged into one document file.
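The multi-page merge just described can be sketched as grouping pages by (header text, template id) and concatenating their rows in receiving order; the tuple representation is an assumption for illustration:

```python
def merge_pages(pages):
    """pages: list of (header_text, template_id, rows) in receiving order.
    Pages whose header text and template agree are merged into one table
    file; pages with a different header/template start files of their own."""
    files = {}   # (header, template) -> accumulated rows
    order = []   # first-seen order of the keys
    for header, template, rows in pages:
        key = (header, template)
        if key not in files:
            files[key] = []
            order.append(key)
        files[key].extend(rows)  # fill fields in receiving order
    return [(h, t, files[(h, t)]) for h, t in order]
```

Because rows are appended in receiving order, a form split across several sheets reassembles in the order the sheets were transmitted.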
The near-end device acquires a table display instruction and a table editing instruction and sends them to the other far-end devices through the host (1) of the near-end device; each far-end device updates its table according to the table display instruction and the table editing instruction, and after it is confirmed that all the other far-end devices have been updated, the near-end device updates its own table according to the same instructions.
The display instructions include one or more of the following: page turning, page zooming, and cursor display. The editing instructions include one or more of the following: a formula editing instruction, a table editing instruction, and a chart insertion instruction.
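The update order described above (far ends first, near end only after every acknowledgement) can be sketched as follows, with a deliberately minimal stand-in `Host` class; real hosts would apply the instruction to an on-screen table and acknowledge over the network:

```python
class Host:
    """Minimal stand-in for a conference host."""
    def __init__(self):
        self.state = []

    def apply(self, instruction):
        self.state.append(instruction)
        return True  # acknowledgement

def broadcast_update(near_host, far_hosts, instruction):
    """Send a display/edit instruction to every far-end host first, and
    apply it at the near end only once all far ends have acknowledged."""
    acks = [far.apply(instruction) for far in far_hosts]
    if all(acks):
        near_host.apply(instruction)
        return True
    return False
```

This ordering keeps the near-end view from running ahead of the far-end views when an acknowledgement is missing.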
When a conference is recorded, the display state and updates of the form are captured by screen recording and saved as a video file bound to real time; the voice acquired by the microphone (24) while the form is displayed is saved as a second audio file, likewise bound to real time, and the video file is associated with the second audio file through the shared real time. The second audio file is also converted into a text file through voice recognition, converted into a subtitle file through the real time, and associated with the video file; the two modes can be implemented simultaneously or singly.
The first audio file is likewise bound to real time; when the first audio file is converted into a text file, the real time is not displayed, but the associated video file and second audio file are inserted at the corresponding time periods.
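The subtitle binding above can be illustrated by rendering timestamped recognition output as SubRip (.srt) text; the `(start_seconds, end_seconds, text)` segment shape is an assumption about the speech-recognition output, not specified in the patent:

```python
def to_srt(segments):
    """Render timestamped transcript segments as SubRip subtitle text.
    Binding both the screen recording and the subtitles to the same
    real-time stamps is what keeps them associated."""
    def fmt(t):
        # Seconds -> "HH:MM:SS,mmm" as required by the SubRip format.
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((t - int(t)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{fmt(start)} --> {fmt(end)}\n{text}\n")
    return "\n".join(blocks)
```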
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (8)
1. A teleconferencing system, characterized in that it comprises a near-end device and one or more far-end devices remotely connected with the near-end device, the near-end device and each far-end device comprising a host (1) and a peripheral (2) connected with the host (1);
the peripheral (2) comprises a display module (21), a camera module (22), a pickup module (23), a microphone (24), a loudspeaker (25) and an adjusting device (26); wherein:
the camera module (22) is arranged on the adjusting device (26), the camera module (22) comprises a 3D image acquisition device (221) and a depth lens (222), and the 3D image acquisition device (221) is used for acquiring a 3D virtual image of a real object;
the host (1) acquires user gestures through the depth lens (222), recognizes the user gestures and controls the 3D virtual image through commands corresponding to the gestures; the host (1) also acquires voice information through the microphone (24), recognizes the voice and controls the adjusting device (26) through a command corresponding to the voice;
the 3D image acquisition device (221) comprises a 3D scanner or a 3D depth perception camera;
the host (1) is also used for remote audio-visual adjustment; specifically, the near-end device host (1) outputs audio and video to the far-end device peripheral (2); the far-end device peripheral (2) displays the image through the display module (21) and plays the sound through the loudspeaker (25); the pickup module (23) then collects the sound played by the loudspeaker (25) to obtain sound data, and the camera module (22) collects the video played by the display module (21) to obtain video data; the video data and sound data are sent to the near-end device host (1), and a worker debugs the display module (21) and the loudspeaker (25) of the far-end device according to the video data and sound data.
2. A method for displaying a real object in a teleconference, based on the teleconference system of claim 1, characterized by comprising the steps of:
step one: when a real object is to be displayed, a user of the near-end device sends an instruction by voice; the host (1) of the near-end device acquires the voice through the microphone (24), recognizes the user's voice, and controls the adjusting device (26) through the instruction corresponding to the voice to adjust the position of the camera module (22); real-object image information is acquired through the camera module (22) and transmitted to the far-end devices in real time;
step two: when a 3D virtual image of a three-dimensional real object is to be displayed, picture data of the real object is acquired through the 3D scanner or the 3D depth perception camera; when the position or angle of the camera module (22) needs to be adjusted, a user of the near-end device sends an instruction by voice, the host (1) of the near-end device acquires the voice through the microphone (24), recognizes the user's voice, and controls the adjusting device (26) through the instruction corresponding to the voice to adjust the position of the camera module (22); meanwhile, the camera module (22) follows the direction and angle of the real object as it is turned over by hand, acquiring graphic information of the three-dimensional real object from every angle, and the graphic information of the three-dimensional real object is edited to form a three-dimensional image of the virtual real object;
step three: when the 3D virtual image of the real object is not clear, step one is executed;
step four: when a 3D virtual image of a real object is to be controlled, the host (1) acquires a user gesture through the depth lens (222) or the 3D depth perception camera, recognizes the user gesture and controls the 3D virtual image through a command corresponding to the gesture;
step five: when the plane image of the plane object is to be displayed, the host (1) of the near-end equipment acquires the plane display image of the plane object through the camera module (22).
3. The method for displaying the real object in the teleconference according to claim 2, wherein, when the 3D image acquisition device (221) employs a 3D depth perception camera:
if the displayed real object is a three-dimensional object, the host (1) acquires a 3D virtual image of the real object through a 3D depth perception camera, acquires a gesture of a user through the depth lens (222), and sends a corresponding instruction to control the state of the 3D virtual image;
if the actual real object needs to be displayed remotely, the angle of the real object is turned over by hand; the host (1), on acquiring a voice command, sends a control instruction to the adjusting device (26) to adjust the position of the 3D depth perception camera, displaying detailed structure information of the real object in coordination with the user manually turning the real object over;
when the display module (21) simultaneously displays the 3D virtual image and the real image of the real object in split-screen mode, the host (1) can simultaneously control the real-object display through voice and the 3D virtual image through gestures.
4. The method for displaying the real object in the teleconference according to claim 2, wherein when the real object to be displayed is a planar paper form document, the host (1) acquires an image of the paper form document through the camera module (22);
the host (1) scans fields of the image, calls out corresponding prefabricated template files according to the first row and/or the first column fields of the scanned form, correspondingly fills the scanned fields into the template files to form files, and sends the form files to all remote devices;
when the near-end equipment needs to display the table, the host (1) of the near-end equipment opens the table file, the host (1) of the far-end equipment synchronously opens the table file, the host (1) of the near-end equipment acquires a control instruction of a user on the table and sends the control instruction to the host (1) of the far-end equipment, and the far-end equipment and the near-end equipment update the table display state in real time according to the instruction.
5. The method as claimed in claim 4, wherein when the table image is scanned, the largest bounding box is first found, the plurality of bounding boxes in the largest bounding box are located, the rows and columns of the table are determined, and the fields of the rows and columns of the table are scanned; calling out a prefabricated template, matching the image form with the prefabricated template according to the scanned form row and column information and the first row and/or first column character information, determining the form type, and newly building a corresponding template form file according to the determined form type; and filling the identified fields of the rest rows and columns into corresponding areas of the template format file, storing the file, and simultaneously respectively sending the file to all remote equipment.
6. The method as claimed in claim 5, wherein when the image of the form is scanned and the image needs to be corrected, the method specifically comprises finding the longest straight line in the image, determining whether the longest straight line is close to the horizontal line or the vertical line, finding the included angle between the longest straight line and the horizontal line/vertical line, and determining the inclination angle of the image, thereby performing rotation correction on the angle of the image.
7. The method for displaying the real object in the teleconference according to claim 5, wherein when the host (1) acquires the plurality of image files, the information of the title words outside the maximum bounding box is scanned, and if it is detected that the information of the title words is consistent and the templates are the same, the field scanning is performed on the areas to be identified of the image files with the same templates, and the scanned fields are sequentially filled in the files with the same template format according to the receiving sequence and are stored.
8. The method according to claim 4, wherein the near-end device obtains a table display instruction and a table editing instruction, and sends the table display instruction and the table editing instruction to the other far-end devices through the host (1) of the near-end device, the far-end device updates the table according to the table display instruction and the table editing instruction, and after confirming that the other far-end devices are all updated, the near-end device updates the table according to the table display instruction and the table editing instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110438782.7A CN112995572A (en) | 2021-04-23 | 2021-04-23 | Remote conference system and physical display method in remote conference |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112995572A true CN112995572A (en) | 2021-06-18 |
Family
ID=76339974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110438782.7A Pending CN112995572A (en) | 2021-04-23 | 2021-04-23 | Remote conference system and physical display method in remote conference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112995572A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103654967A (en) * | 2013-12-06 | 2014-03-26 | 傅松青 | Speech-controlled auxiliary imaging device for minimally invasive operations |
CN104135619A (en) * | 2014-08-12 | 2014-11-05 | 广东欧珀移动通信有限公司 | Method and device of controlling camera |
CN104918000A (en) * | 2015-06-30 | 2015-09-16 | 国家电网公司 | Video conference remote control device |
CN107609045A (en) * | 2017-08-17 | 2018-01-19 | 深圳壹秘科技有限公司 | A kind of minutes generating means and its method |
CN108986826A (en) * | 2018-08-14 | 2018-12-11 | 中国平安人寿保险股份有限公司 | Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes |
CN109344831A (en) * | 2018-08-22 | 2019-02-15 | 中国平安人寿保险股份有限公司 | A kind of tables of data recognition methods, device and terminal device |
CN109413364A (en) * | 2018-12-04 | 2019-03-01 | 湖北安心智能科技有限公司 | A kind of interactive remote meeting system and method |
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Minutes generation method, device and storage medium based on speech recognition |
CN211930771U (en) * | 2020-04-03 | 2020-11-13 | 杭州优甲科技有限公司 | Novel intelligent network live broadcast machine and live broadcast system |
CN112037791A (en) * | 2020-08-12 | 2020-12-04 | 广东电力信息科技有限公司 | Conference summary transcription method, apparatus and storage medium |
CN112148922A (en) * | 2019-06-28 | 2020-12-29 | 鸿富锦精密工业(武汉)有限公司 | Conference recording method, conference recording device, data processing device and readable storage medium |
CN112363563A (en) * | 2020-10-11 | 2021-02-12 | 王小龙 | External image sound input device mounted on computer display |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4770178B2 (en) | Camera control apparatus, camera system, electronic conference system, and camera control method | |
WO2017215295A1 (en) | Camera parameter adjusting method, robotic camera, and system | |
WO2021143315A1 (en) | Scene interaction method and apparatus, electronic device, and computer storage medium | |
US20080180519A1 (en) | Presentation control system | |
US20090257730A1 (en) | Video server, video client device and video processing method thereof | |
US9076345B2 (en) | Apparatus and method for tutoring in convergence space of real and virtual environment | |
US20130300934A1 (en) | Display apparatus, server, and controlling method thereof | |
CN111010610A (en) | Video screenshot method and electronic equipment | |
CN113014857A (en) | Control method and device for video conference display, electronic equipment and storage medium | |
CN110928509B (en) | Display control method, display control device, storage medium, and communication terminal | |
KR102424150B1 (en) | An automatic video production system | |
CN114531564A (en) | Processing method and electronic equipment | |
CN113301367B (en) | Audio and video processing method, device, system and storage medium | |
KR20160082291A (en) | Image processing apparatus and image processing method thereof | |
US7986336B2 (en) | Image capture apparatus with indicator | |
CN103959805A (en) | Method and device for displaying image | |
CN112995572A (en) | Remote conference system and physical display method in remote conference | |
US11729489B2 (en) | Video chat with plural users using same camera | |
CN112887653B (en) | Information processing method and information processing device | |
WO2021226821A1 (en) | Systems and methods for detection and display of whiteboard text and/or an active speaker | |
CN115118913A (en) | Projection video conference system and projection video method | |
KR20170031941A (en) | Display device and luminance control method thereof | |
US11805231B2 (en) | Target tracking method applied to video transmission | |
JPH07319886A (en) | Retrieving device for drawing image interlinked with dynamic image and retrieving method for drawing image interlinked with dynamic image | |
KR102507873B1 (en) | Method for generating customized video based on objects and service server using the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210618 |