
CN113280817B - Visual navigation based on landmarks - Google Patents

Visual navigation based on landmarks

Info

Publication number
CN113280817B
Authority
CN
China
Prior art keywords: landmark, information, degree, freedom, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010652637.4A
Other languages
Chinese (zh)
Other versions
CN113280817A (en)
Inventor
诸小熊
李军舰
姚迪狄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010652637.4A
Publication of CN113280817A
Application granted
Publication of CN113280817B
Legal status: Active

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20: Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a landmark-based visual navigation method comprising the following steps: determining a landmark in the visual scene; acquiring multi-degree-of-freedom information of the agent relative to the landmark; acquiring multi-degree-of-freedom change information of the agent relative to the landmark; and navigating the movement of the agent according to the multi-degree-of-freedom change information. In this scheme, six-degree-of-freedom information of a landmark in the visual scene is constructed from camera image information and pose information from the agent's gyroscope. The six degrees of freedom comprise the landmark's three coordinates in the visual scene (horizontal, vertical, and depth) and three angles at that coordinate point (pitch, rotation, and yaw). From this six-degree-of-freedom information, high-frame-rate visual navigation with the landmark as the reference point can then be realized. The invention can be used to display virtual objects/characters in VR/AR, and also in scenarios such as autonomous driving and robot navigation; combined with a gyroscope, it enables highly real-time navigation of an agent on mobile devices of ordinary computing capability.

Description

Visual navigation based on landmarks
Technical Field
The invention relates to the technical field of map navigation, and in particular to a landmark-based visual navigation method and device.
Background
With the rapid development of computer vision, visual scene map construction and navigation based on computer vision are widely applied in VR/AR (virtual reality/augmented reality), automatic navigation, and other scenarios owing to their low cost and broad applicability.
The most commonly used visual map construction scheme is visual SLAM (Simultaneous Localization And Mapping), which builds map information through sensors, visual odometry, and the like, and uses it to determine the current position of the agent. This solution has several problems. First, the map construction flow of SLAM is complex: visual SLAM requires capturing scene information of the environment from multiple angles and then constructs the map through feature extraction, matching, and similar techniques. Second, the computational complexity is high and the navigation speed is low: because the map built by visual SLAM is relatively large and feature-rich, navigation against it is computationally heavy, and real-time navigation is difficult to achieve on ordinary computing devices, especially mobile devices.
Therefore, a visual navigation scheme is needed that reduces the complexity of map construction, improves navigation speed, and can run on ordinary computing devices.
Disclosure of Invention
An object of the present invention is to provide a landmark-based visual navigation method that enables immediate, simple visual landmark construction and highly real-time visual navigation.
To achieve the above object, an embodiment of the present invention provides a landmark-based visual navigation method, comprising:
determining a landmark in the visual scene;
acquiring multi-degree-of-freedom information of the agent relative to the landmark;
acquiring multi-degree-of-freedom change information of the agent relative to the landmark;
and navigating the movement of the agent according to the multi-degree-of-freedom change information.
Further, the multi-degree-of-freedom information includes coordinate information and angle information.
Further, the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate, and a depth coordinate of the agent relative to the landmark in the visual scene, and a pitch angle, a yaw angle, and a rotation angle of the agent in a spatial coordinate system; acquiring the multi-degree-of-freedom information of the agent relative to the landmark specifically comprises the following steps:
acquiring the visual scene captured by the agent's camera, analyzing it, and determining the abscissa, ordinate, and depth coordinate of the agent relative to the landmark;
and acquiring sensor data of the agent and determining the pitch angle, yaw angle, and rotation angle of the agent in the spatial coordinate system.
Further, determining the landmark in the visual scene is specifically: a region of the visual scene preselected by the user serves as the landmark.
Further, determining the landmark in the visual scene is specifically: a salient object in the visual scene is identified as the landmark using a subject-identification algorithm, or a specific region is detected as the landmark using a target-detection algorithm.
Further, the method further comprises: after the multi-degree-of-freedom information is acquired, initializing an image tracking algorithm with it, the image tracking algorithm being used to acquire the position and/or area of the landmark in the current visual scene.
Further, the method further comprises: judging whether the current landmark is lost and, if it is, stopping the motion navigation and starting the re-detection step.
Further, the re-detection step is specifically: detecting the landmark using the last frame before the loss as a template and, if the landmark is detected, re-acquiring the multi-degree-of-freedom information of the agent relative to the landmark.
Further, the center coordinates of the landmark's image region are taken as the landmark's abscissa and ordinate, from which the abscissa and ordinate of the agent relative to the landmark are obtained, and the distance of the agent's camera from the landmark is taken as the depth coordinate. The depth value is obtained as follows: the minimum circumscribed circle of the landmark's image region is acquired, and the product of the circumscribed circle's radius R and a prior coefficient k is taken as the landmark's depth coordinate, from which the depth coordinate of the agent relative to the landmark is obtained.
Further, the multi-degree-of-freedom change information of the agent relative to the landmark comprises: change information of the pitch angle, yaw angle, and rotation angle; the displacement of the agent in the landmark plane; and the depth displacement of the agent relative to the landmark. The displacement in the landmark plane is the variation between the landmark's coordinates in the current image frame and its initial coordinates.
Further, the depth displacement is determined from the minimum circumscribed circle radius of the current landmark image region and the minimum circumscribed circle radius of the landmark image region when the landmark was constructed.
The embodiment of the invention also provides a landmark-based visual navigation device, comprising:
a landmark determination module for determining the landmark in the visual scene;
a multi-degree-of-freedom information construction module for acquiring multi-degree-of-freedom information of the agent relative to the landmark;
a change information acquisition module for acquiring position change information of the agent relative to the landmark;
and a visual navigation module for navigating the movement of the agent according to the position change information.
Further, the multi-degree-of-freedom information includes coordinate information and angle information.
Further, the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate, and a depth coordinate of the agent relative to the landmark in the visual scene, and a pitch angle, a yaw angle, and a rotation angle of the agent in a spatial coordinate system; the multi-degree-of-freedom information construction module is specifically used for:
acquiring the visual scene captured by the agent's camera, analyzing it, and determining the abscissa, ordinate, and depth coordinate of the agent relative to the landmark;
and acquiring sensor data of the agent and determining the pitch angle, yaw angle, and rotation angle of the agent in the spatial coordinate system.
Further, the landmark determination module is specifically configured to take a region of the visual scene preselected by the user as the landmark.
Further, the landmark determination module is specifically configured to identify a salient object in the visual scene as the landmark using a subject-identification algorithm, or to detect a specific region as the landmark using a target-detection algorithm.
Further, the multi-degree-of-freedom information construction module is further configured to: after the multi-degree-of-freedom information is acquired, initialize an image tracking algorithm with it, the image tracking algorithm being used to acquire the position and/or area of the landmark in the current visual scene.
Further, the visual navigation module is further configured to judge whether the current landmark is lost and, if it is, to stop the motion navigation and start the re-detection module.
Further, the re-detection module is configured to detect the landmark using the last frame before the loss as a template and, if the landmark is detected, to re-acquire the multi-degree-of-freedom information of the agent relative to the landmark.
Further, the center coordinates of the landmark's image region are taken as the landmark's abscissa and ordinate, from which the abscissa and ordinate of the agent relative to the landmark are obtained, and the distance of the agent's camera from the landmark is taken as the depth coordinate. The depth value is obtained as follows: the minimum circumscribed circle of the landmark's image region is acquired, and the product of the circumscribed circle's radius R and a prior coefficient k is taken as the landmark's depth coordinate, from which the depth coordinate of the agent relative to the landmark is obtained.
Further, the multi-degree-of-freedom change information of the agent relative to the landmark comprises: change information of the pitch angle, yaw angle, and rotation angle; the displacement of the agent in the landmark plane; and the depth displacement of the agent relative to the landmark. The displacement in the landmark plane is the variation between the landmark's coordinates in the current image frame and its initial coordinates.
Further, the depth displacement is determined from the minimum circumscribed circle radius of the current landmark image region and the minimum circumscribed circle radius of the landmark image region when the landmark was constructed.
The embodiment of the invention also provides an image acquisition method, comprising the following steps:
determining an acquisition object in a visual scene, wherein the acquisition object is at least one salient object or specific region in the visual scene;
acquiring an image of the acquisition object;
acquiring multi-degree-of-freedom information of the agent relative to the acquisition object;
correlating the image of the acquisition object with the multi-degree-of-freedom information;
storing the image of the acquisition object and the associated multi-degree-of-freedom information.
Further, determining the acquisition object in the visual scene is specifically: identifying a salient object in the visual scene as the acquisition object using an image subject-identification algorithm, or detecting a specific region as the acquisition object using a target-detection algorithm.
Further, the multi-degree-of-freedom information includes coordinate information and angle information.
Further, the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate, and a depth coordinate of the agent relative to the acquisition object in the visual scene, and a pitch angle, a yaw angle, and a rotation angle of the agent in a spatial coordinate system; acquiring the multi-degree-of-freedom information of the agent relative to the acquisition object specifically comprises the following steps:
acquiring the visual scene captured by the agent's camera, analyzing it, and determining the abscissa, ordinate, and depth coordinate of the agent relative to the acquisition object;
and acquiring sensor data of the agent and determining the pitch angle, yaw angle, and rotation angle of the agent in the spatial coordinate system.
Further, the method further comprises:
acquiring environment attribute information at the time the agent captures the image of the acquisition object;
associating the image of the acquisition object with the environment attribute information;
and storing the associated environment attribute information.
Further, the method further comprises:
acquiring multi-degree-of-freedom information and/or environment attribute information of the current agent relative to a specified object;
acquiring an image of the specified object according to the multi-degree-of-freedom information and/or the environment attribute information;
and presenting the image of the specified object.
The embodiments of the present invention also provide a computer program product comprising computer program instructions for implementing the aforementioned landmark-based visual navigation method or the aforementioned image acquisition method when the instructions are executed by a processor.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed, implements the aforementioned landmark-based visual navigation method or the aforementioned image acquisition method.
The beneficial effects of the invention are as follows. The invention provides a landmark-based visual navigation method comprising: acquiring multi-degree-of-freedom information of the agent relative to the landmark; acquiring multi-degree-of-freedom change information of the agent relative to the landmark; and navigating the movement of the agent according to the multi-degree-of-freedom change information. In this scheme, six-degree-of-freedom information of a landmark in the visual scene is constructed from camera image information and pose information from the agent's gyroscope. The six degrees of freedom comprise the landmark's three coordinates in the visual scene (horizontal, vertical, and depth) and three angles at that coordinate point (pitch, rotation, and yaw). From this six-degree-of-freedom information, high-frame-rate visual navigation with the landmark as the reference point can then be realized. The invention can be used to display six-degree-of-freedom virtual objects/characters in VR/AR, and also in scenarios such as autonomous driving and robot navigation. Combined with gyroscope information, highly real-time navigation of an agent can be achieved on mobile devices of ordinary computing capability.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly described below. It is evident that the drawings described below show only some embodiments of the present invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flow chart of a method according to a first embodiment of the invention.
Fig. 2 is a schematic diagram of landmark regions in a visual scene.
Fig. 3 is a block diagram of a device according to a second embodiment of the present invention.
Fig. 4 is a flow chart of a method according to a third embodiment of the invention.
Detailed Description
To enable those skilled in the art to understand the technical solutions of the present invention, they are described below fully and clearly with reference to the accompanying drawings. It is evident that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
Because the map image information constructed by visual SLAM contains complex image features and the algorithms involved have high complexity, real-time navigation is difficult to achieve on mobile devices, especially mobile devices of ordinary computing power (such as mobile phones).
In this scheme, an image tracking algorithm is used so that only the landmark region needs to be tracked, and the displacement and attitude change of the agent relative to the landmark are obtained from the image coordinate system information and the gyroscope information. Because most existing image tracking algorithms are highly real-time, real-time image tracking can be achieved on mobile terminal devices. Therefore, combined with gyroscope information, highly real-time navigation of the agent can be achieved on mobile devices of ordinary computing capability.
The agent here mainly refers to a movable device equipped with a camera, a gyroscope, and a computing unit, such as a smartphone or a camera-equipped unmanned aerial vehicle.
Example 1
Referring to Fig. 1, an embodiment of the present invention provides a landmark-based visual navigation method comprising a landmark determination step, a multi-degree-of-freedom information construction step, a change information acquisition step, and a visual navigation step.
Landmark determination step: determine the landmark in the visual scene. A landmark in the present invention is a marker region in the visual scene used as a position and attitude reference during motion navigation; the region is preselected by the user, as shown in Fig. 2, where the user takes a wardrobe in the visual scene as the landmark. Landmarks may also be determined by intelligent algorithms, for example: the most salient object in the visual scene is identified as the landmark by a subject-identification algorithm, or, in a specific scenario, a specific region (e.g., a logo) is detected as the landmark by a target-detection algorithm; a sketch follows.
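As an illustrative sketch (not taken from the patent text), the user's preselection can be captured with OpenCV's built-in ROI selector; a subject-identification or target-detection model could be substituted at the same point:

```python
import cv2

def select_landmark(frame):
    """Return the landmark region (x, y, w, h) preselected by the user.

    Minimal sketch: a saliency or detection model could replace the
    manual selection for the algorithmic variants described above.
    """
    # selectROI opens a window and returns the user-drawn rectangle
    bbox = cv2.selectROI("select landmark", frame, showCrosshair=True)
    cv2.destroyWindow("select landmark")
    return bbox  # (x, y, w, h)
```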
Multi-degree-of-freedom information construction step: acquire the multi-degree-of-freedom information of the agent relative to the landmark. After this information is obtained, it is used to initialize an image tracking algorithm so as to achieve region tracking at the visual image level.
The multi-degree-of-freedom information here is six-degree-of-freedom information. The six degrees of freedom comprise the abscissa, ordinate, and depth coordinate of the agent relative to the landmark in the visual scene, and the pitch angle, yaw angle, and rotation angle of the agent in a spatial coordinate system. The visual scene is the scene in an image frame captured by the agent's camera. Attitude angle information such as the pitch, yaw, and rotation angles can be obtained from the gyroscope in the agent.
As shown in Fig. 2, the center coordinates (x, y) of the landmark's image region are taken as the landmark's abscissa and ordinate, from which the abscissa and ordinate of the agent relative to the landmark are obtained. The distance of the agent's camera from the landmark is taken as the depth coordinate, obtained as follows: acquire the minimum circumscribed circle of the landmark's image region and take the product of its radius R and a prior coefficient k as the landmark's depth coordinate, that is, d = R × k, where k is an empirical value set per application and scene. A sketch of this construction follows.
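A minimal sketch of the six-degree-of-freedom construction, assuming the landmark region is given as pixel points and the attitude angles are read from the agent's gyroscope (the value of the prior coefficient k below is a placeholder assumption):

```python
import cv2
import numpy as np

K_PRIOR = 0.05  # prior coefficient k; an empirical value set per application and scene

def landmark_six_dof(region_points, pitch, yaw, roll, k=K_PRIOR):
    """Build the six-degree-of-freedom record of the agent relative to a landmark.

    region_points: Nx2 pixel coordinates covering the landmark's image region.
    pitch/yaw/roll: attitude angles read from the agent's gyroscope.
    """
    pts = np.asarray(region_points, dtype=np.float32)
    # minimum circumscribed circle; its center approximates the region center (x, y)
    (cx, cy), radius = cv2.minEnclosingCircle(pts)
    depth = radius * k  # d = R * k, as in the description
    return {"x": cx, "y": cy, "d": depth,
            "pitch": pitch, "yaw": yaw, "roll": roll,
            "R": radius}  # radius kept for the depth-change computation below
```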
Change information acquisition step: acquire the multi-degree-of-freedom change information of the agent relative to the landmark. The multi-degree-of-freedom change information comprises: the three attitude-angle changes (delta_p, delta_r, delta_y), the displacement of the agent in the landmark plane (delta_x, delta_y), and the depth displacement delta_d of the agent relative to the landmark.
The three attitude-angle changes are the differences between the agent's current attitude information and its attitude information when the landmark was constructed. Taking the pitch angle as an example, let the current gyroscope pitch be P1 and the pitch at landmark construction be P0; the pitch change is then delta_P = P1 - P0. The other angle changes are obtained in the same way, giving (delta_p, delta_r, delta_y).
For the change in position, the position and region of the current landmark in the image are obtained through the image tracker. The displacement in the landmark plane is the difference between the landmark's coordinates in the current image frame and its initial coordinates at landmark construction. Taking the horizontal axis x as an example, let the landmark's horizontal coordinate in the current image region be x1 and its initial position be x0; then delta_x = x1 - x0. The displacement (delta_x, delta_y) in the image plane is obtained in the same way.
For the depth displacement, let the minimum circumscribed circle radius of the current landmark image region be R1 and that at landmark construction be R0; then delta_d = k × (R1 / R0). The sketch below gathers the three kinds of change information.
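Gathering the change information, a hedged sketch operating on records shaped like the output of landmark_six_dof above (note that the description reuses the name delta_y for both the yaw change and the vertical image displacement; the code disambiguates them):

```python
def six_dof_change(init, cur, k=0.05):
    """Multi-degree-of-freedom change of the agent relative to the landmark.

    init: six-DOF record at landmark construction; cur: record for the
    current frame. k is the same prior coefficient as K_PRIOR above.
    """
    return {
        "delta_p": cur["pitch"] - init["pitch"],  # delta_P = P1 - P0
        "delta_r": cur["roll"] - init["roll"],
        "delta_y": cur["yaw"] - init["yaw"],
        "delta_x": cur["x"] - init["x"],          # displacement in the landmark plane
        "delta_y_img": cur["y"] - init["y"],
        "delta_d": k * (cur["R"] / init["R"]),    # depth displacement, k * (R1 / R0)
    }
```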
Visual navigation step: navigate the movement of the agent according to the multi-degree-of-freedom change information.
Preferably, the visual navigation step further comprises: judging with the image tracking algorithm whether the current landmark is lost and, if it is, stopping the motion navigation and starting the re-detection step. Taking the KCF (Kernelized Correlation Filter) tracking algorithm as an example, the current tracking state can be determined from the filter response value of each frame.
Preferably, the re-detection step is specifically: detect the landmark using the image of the last frame before the loss as a template and, if the landmark is detected, re-acquire the six-degree-of-freedom information of the landmark. A sketch combining tracking, loss detection, and re-detection follows.
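A sketch of this tracking loop, assuming an OpenCV build that ships the KCF tracker (opencv-contrib); OpenCV exposes a per-frame success flag rather than the raw filter response, so the flag stands in for the response-value test here:

```python
import cv2

REDETECT_THRESHOLD = 0.7  # assumed template-matching score threshold

def track_with_redetection(video, init_bbox):
    """Yield the landmark bbox frame by frame; on loss, re-detect by template."""
    tracker = cv2.TrackerKCF_create()
    ok, frame = video.read()
    if not ok:
        return
    x, y, w, h = [int(v) for v in init_bbox]
    tracker.init(frame, (x, y, w, h))
    template = frame[y:y + h, x:x + w].copy()  # last known landmark appearance

    while True:
        ok, frame = video.read()
        if not ok:
            return
        found, bbox = tracker.update(frame)
        if found:
            x, y, w, h = [int(v) for v in bbox]
            template = frame[y:y + h, x:x + w].copy()
            yield bbox
        else:
            # landmark lost: motion navigation stops (no bbox is yielded);
            # re-detect using the last frame before the loss as the template
            res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
            _, score, _, top_left = cv2.minMaxLoc(res)
            if score > REDETECT_THRESHOLD:
                bbox = (top_left[0], top_left[1], w, h)
                tracker = cv2.TrackerKCF_create()  # re-initialize on the new region
                tracker.init(frame, bbox)
```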
The image tracking algorithm can be any algorithm that tracks objects through images; it is not limited to KCF-style visual target tracking algorithms. Putting the steps of this embodiment together, an end-to-end sketch follows.
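The loop below is a hedged sketch under stated assumptions, not the patent's implementation: read_gyro and steer are assumed platform hooks, and the helper functions are the ones sketched above.

```python
import numpy as np

def navigate(video, read_gyro, steer, k=0.05):
    """Landmark-based visual navigation loop (sketch)."""
    ok, frame = video.read()
    if not ok:
        return
    bbox = select_landmark(frame)  # landmark determination step

    def corners(b):
        x, y, w, h = b
        return np.array([[x, y], [x + w, y], [x + w, y + h], [x, y + h]],
                        dtype=np.float32)

    init = None
    for bbox in track_with_redetection(video, bbox):
        # read_gyro() is assumed to return (pitch, yaw, roll) from the gyroscope
        rec = landmark_six_dof(corners(bbox), *read_gyro(), k=k)
        if init is None:
            init = rec  # six-DOF information at landmark construction
            continue
        steer(six_dof_change(init, rec, k=k))  # navigate by the change information
```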
Example two
Referring to Fig. 3, a second embodiment of the present invention provides a landmark-based visual navigation device 300 comprising a landmark determination module 301, a multi-degree-of-freedom information construction module 302, a change information acquisition module 303, and a visual navigation module 304.
The landmark determination module 301 is configured to determine the landmark in the visual scene. A landmark in the present invention is a marker region in the visual scene used as a position and attitude reference during motion navigation, being a region of the visual scene preselected by the user. Landmarks may also be determined by intelligent algorithms, for example: the most salient object in the visual scene is identified as the landmark by a subject-identification algorithm, or, in a specific scenario, a specific region (e.g., a logo) is detected as the landmark by a target-detection algorithm.
The multi-degree-of-freedom information construction module 302 is configured to acquire the multi-degree-of-freedom information of the agent relative to the landmark. After this information is obtained, it is used to initialize an image tracking algorithm so as to achieve region tracking at the visual image level. The multi-degree-of-freedom information is six-degree-of-freedom information comprising the abscissa, ordinate, and depth coordinate of the agent relative to the landmark in the visual scene, and the pitch, yaw, and rotation angles of the agent in a spatial coordinate system; the visual scene is the scene in an image frame captured by the agent's camera.
The change information acquisition module 303 acquires the multi-degree-of-freedom change information of the agent relative to the landmark.
The visual navigation module 304 navigates the movement of the agent according to the change information of the agent's six degrees of freedom relative to the landmark.
Preferably, the device further comprises a re-detection module 305. The visual navigation module 304 is further configured to judge through the image tracking algorithm whether the current landmark is lost and, if it is, to stop the motion navigation and start the re-detection module 305.
The re-detection module 305 is configured to detect the landmark using the image of the last frame before the loss as a template and, if the landmark is detected, to re-acquire the multi-degree-of-freedom information of the agent relative to the landmark.
Example III
Referring to Fig. 4, a third embodiment of the present invention provides an image acquisition method comprising:
S401: determine the acquisition object in the visual scene, the acquisition object being at least one salient object or specific region in the visual scene.
Besides specified objects, all objects in the visual scene may be collected with the present invention. The kinds of objects differ across scenes: a model-room (show home) scene contains objects such as furniture and decorations, while a museum scene contains exhibits.
The acquisition object is determined by intelligent algorithms, for example: a salient object in the visual scene is identified as the acquisition object by an image subject-identification algorithm, or, in a specific scenario, a specific region (e.g., a logo, a piece of furniture, a decoration) is detected as the acquisition object by a target-detection algorithm. As shown in Fig. 2, a wardrobe in the visual scene is taken as the acquisition object.
S402: acquire an image of the acquisition object. The collected images help the user browse the scene space, such as a home-decoration scene or a museum scene; besides browsing certain specific objects from multiple angles, the user can browse overall images of the visual scene and/or images of other objects.
S403: acquire the multi-degree-of-freedom information of the agent relative to the acquisition object. After this information is obtained, it is used to initialize an image tracking algorithm so as to achieve region tracking at the visual image level.
The multi-degree-of-freedom information here is six-degree-of-freedom information. The six degrees of freedom comprise the abscissa, ordinate, and depth coordinate of the agent relative to the acquisition object, and the pitch angle, yaw angle, and rotation angle of the agent in a spatial coordinate system. The visual scene is the scene in an image frame captured by the agent's camera. Attitude angle information such as the pitch, yaw, and rotation angles can be obtained from the gyroscope in the agent.
As shown in Fig. 2, the center coordinates (x, y) of the landmark's image region are taken as the landmark's abscissa and ordinate, from which the abscissa and ordinate of the agent relative to the acquisition object are obtained. The distance of the agent's camera from the acquisition object is taken as the depth coordinate, obtained as follows: acquire the minimum circumscribed circle of the landmark's image region and take the product of its radius R and a prior coefficient k as the depth coordinate, that is, d = R × k, where k is an empirical value set per application and scene.
S404: associate the image of the acquisition object with the multi-degree-of-freedom information, thereby establishing a mapping between the image and that information.
Preferably, the environment attribute information at the time the agent captures the image is also acquired, and the object in the visual scene is associated with it. Environment information includes the shooting time, the scene type, the season at shooting, and the like.
S405: store the image of the acquisition object and the associated multi-degree-of-freedom information. Preferably, the associated environment attribute information is stored as well.
Through these steps, images of the objects in the visual scene at different angles and different distances from the agent are built up. The method can also be used to present six-degree-of-freedom virtual objects/characters in VR/AR.
Preferably, the method further comprises: acquiring the multi-degree-of-freedom information and/or environment attribute information of the current agent relative to a specified object; acquiring an image of the specified object according to that information; and presenting the image of the specified object. A sketch of the association, storage, and retrieval follows.
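A minimal sketch of S404, S405, and the retrieval just described; the record fields and nearest-pose matching rule are illustrative assumptions, not the patent's data format:

```python
import math

class AcquisitionStore:
    """Associates captured images with six-DOF records and environment attributes."""

    def __init__(self):
        self.records = []

    def add(self, image, six_dof, env=None):
        # S404/S405: associate and store the image, its pose, and its attributes
        self.records.append({"image": image, "six_dof": six_dof, "env": env or {}})

    def query(self, six_dof, env=None):
        """Return the stored image whose pose best matches the agent's current
        six-DOF state, optionally filtered by environment attributes (e.g. season)."""
        def pose_distance(rec):
            a, b = rec["six_dof"], six_dof
            keys = ("x", "y", "d", "pitch", "yaw", "roll")
            return math.dist([a[k] for k in keys], [b[k] for k in keys])

        pool = [r for r in self.records
                if env is None or all(r["env"].get(k) == v for k, v in env.items())]
        return min(pool, key=pose_distance)["image"] if pool else None
```

A caller would add one record per captured view during acquisition and later query with the agent's current pose to present the best-matching image.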
With the method of the third embodiment, images of certain objects in the visual scene can be collected during visual navigation, and an association is established between the agent's trajectory information and the collected images. From the trajectory information, the viewing position, the viewing angle for a specified object, and the like can be determined accurately.
Taking model-room image collection as an example, after adopting the method of the third embodiment, an overall 3D view of the model room and images of particular furniture/decorations from different viewing angles can be generated from the collected images. Other users (e.g., customers visiting the model room) can view a specified object from different perspectives, or view the overall 3D effect of the room as a reference for their own house purchase or decoration. It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described apparatus, modules, and units can be found in the corresponding procedures of the foregoing method embodiments and are not repeated here.
The embodiments of the present invention also disclose a computer program product comprising computer program instructions for implementing the method as in embodiment one or embodiment three when the instructions are executed by a processor.
The embodiment of the invention also discloses a computer readable storage medium, on which a computer program is stored, which when executed, implements the method as in the first or third embodiment.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart and block diagrams may represent a module, segment, or portion of code, which comprises one or more computer-executable instructions for implementing the logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. It will also be noted that each block or combination of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention; it is illustrative only, not limiting. Other variations or modifications of the various aspects of the invention will be apparent to those skilled in the art and fall within the scope of the invention.

Claims (20)

1. A landmark-based visual navigation method, the method comprising:
determining landmarks in a visual scene, wherein the landmarks are marker areas in the visual scene for position and posture reference in a motion navigation process;
acquiring multi-degree-of-freedom information of the agent relative to the landmark;
acquiring multi-degree-of-freedom change information of the agent relative to the landmark;
and navigating the movement of the agent according to the multi-degree-of-freedom change information.
2. The method of claim 1, wherein the multi-degree-of-freedom information includes coordinate information and angle information.
3. The method of claim 2, wherein the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate, and a depth coordinate of the agent relative to the landmark in the visual scene, and a pitch angle, a yaw angle, and a rotation angle of the agent in a spatial coordinate system; acquiring the multi-degree-of-freedom information of the agent relative to the landmark specifically comprises the following steps:
acquiring the visual scene captured by the agent's camera, analyzing it, and determining the abscissa, ordinate, and depth coordinate of the agent relative to the landmark;
and acquiring sensor data of the agent and determining the pitch angle, yaw angle, and rotation angle of the agent in the spatial coordinate system.
4. The method of claim 1, wherein determining the landmark in the visual scene is specifically: a region of the visual scene preselected by the user serves as the landmark.
5. The method of claim 1, wherein determining the landmark in the visual scene is specifically: a salient object in the visual scene is identified as the landmark using a subject-identification algorithm, or a specific region is detected as the landmark using a target-detection algorithm.
6. The method of claim 1, wherein the method further comprises: after the multi-degree-of-freedom information is acquired, initializing an image tracking algorithm by utilizing the multi-degree-of-freedom information, wherein the image tracking algorithm is used for acquiring the position and/or the area of the landmark in the current visual scene.
7. The method of claim 1, wherein the method further comprises: judging whether the current landmark is lost, if so, stopping the motion navigation and starting the re-detection step.
8. The method of claim 7, wherein the re-detection step is specifically: detecting the landmark using the last frame before the loss as a template and, if the landmark is detected, re-acquiring the multi-degree-of-freedom information of the agent relative to the landmark.
9. The method of claim 3, wherein the center coordinates of the landmark's image region are taken as the landmark's abscissa and ordinate, from which the abscissa and ordinate of the agent relative to the landmark are obtained, and the distance of the agent's camera from the landmark is taken as the depth coordinate; the depth coordinate is obtained as follows: the minimum circumscribed circle of the landmark's image region is acquired, and the product of the circumscribed circle's radius R and a prior coefficient k is taken as the landmark's depth coordinate, from which the depth coordinate of the agent relative to the landmark is obtained.
10. The method of claim 3, wherein the multi-degree-of-freedom change information of the agent relative to the landmark comprises: change information of the pitch angle, yaw angle, and rotation angle; the displacement of the agent in the landmark plane; and the depth displacement of the agent relative to the landmark; the displacement in the landmark plane is the variation between the landmark's coordinates in the current image frame and its initial coordinates.
11. The method of claim 10, wherein the depth displacement is determined from the minimum circumscribed circle radius of the current landmark image region and the minimum circumscribed circle radius of the landmark image region when the landmark was constructed.
12. A landmark-based visual navigation device, comprising:
The landmark determining module is used for determining landmarks in a visual scene, wherein the landmarks are mark areas for making position and gesture references in the motion navigation process in the visual scene;
the multi-degree-of-freedom information construction module is used for acquiring multi-degree-of-freedom information of the agent relative to the landmark;
the change information acquisition module is used for acquiring position change information of the agent relative to the landmark;
and the visual navigation module is used for navigating the movement of the agent according to the position change information.
13. An image acquisition method, the method comprising:
determining an acquisition object in a visual scene, wherein the acquisition object is at least one salient object or specific region in the visual scene;
acquiring an image of the acquisition object;
acquiring multi-degree-of-freedom information of the agent relative to the acquisition object;
correlating the image of the acquisition object with the multi-degree-of-freedom information;
storing the image of the acquisition object and the associated multi-degree-of-freedom information.
14. The method of claim 13, wherein determining the acquisition object in the visual scene is specifically: identifying a salient object in the visual scene as the acquisition object using an image subject-identification algorithm, or detecting a specific region as the acquisition object using a target-detection algorithm.
15. The method of claim 13, wherein the multi-degree-of-freedom information includes coordinate information and angle information.
16. The method of claim 15, wherein the multi-degree-of-freedom information is six-degree-of-freedom information comprising an abscissa, an ordinate, and a depth coordinate of the agent relative to the acquisition object in the visual scene, and a pitch angle, a yaw angle, and a rotation angle of the agent in a spatial coordinate system; acquiring the multi-degree-of-freedom information of the agent relative to the acquisition object specifically comprises the following steps:
acquiring the visual scene captured by the agent's camera, analyzing it, and determining the abscissa, ordinate, and depth coordinate of the agent relative to the acquisition object;
and acquiring sensor data of the agent and determining the pitch angle, yaw angle, and rotation angle of the agent in the spatial coordinate system.
17. The method of claim 13, wherein the method further comprises:
acquiring environment attribute information at the time the agent captures the image of the acquisition object;
associating the image of the acquisition object with the environment attribute information;
and storing the associated environment attribute information.
18. The method of claim 13, wherein the method further comprises:
acquiring multi-degree-of-freedom information and/or environment attribute information of the current agent relative to a specified object;
acquiring an image of the specified object according to the multi-degree-of-freedom information and/or the environment attribute information;
And presenting the image of the specified object.
19. A computer program product comprising computer program instructions for implementing the visual navigation method of any one of claims 1-11 or the image acquisition method of any one of claims 13-18 when the instructions are executed by a processor.
20. A computer readable storage medium having stored thereon a computer program which, when executed, implements the visual navigation method of any of claims 1-11 or the image acquisition method of any of claims 13-18.
CN202010652637.4A 2020-07-08 2020-07-08 Visual navigation based on landmarks Active CN113280817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652637.4A CN113280817B (en) 2020-07-08 2020-07-08 Visual navigation based on landmarks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010652637.4A CN113280817B (en) 2020-07-08 2020-07-08 Visual navigation based on landmarks

Publications (2)

Publication Number Publication Date
CN113280817A CN113280817A (en) 2021-08-20
CN113280817B (en) 2024-07-23

Family

ID=77275622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652637.4A Active CN113280817B (en) 2020-07-08 2020-07-08 Visual navigation based on landmarks

Country Status (1)

Country Link
CN (1) CN113280817B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182010B1 (en) * 1999-01-28 2001-01-30 International Business Machines Corporation Method and apparatus for displaying real-time visual information on an automobile pervasive computing client
JP2004030445A (en) * 2002-06-27 2004-01-29 National Institute Of Advanced Industrial & Technology Mobile robot self-position estimation method and system, and program
WO2006109527A1 (en) * 2005-03-30 2006-10-19 National University Corporation Kumamoto University Navigation device and navigation method
CN105241445B (en) * 2015-10-20 2018-07-31 深圳大学 A kind of indoor navigation data capture method and system based on intelligent mobile terminal
TWI574223B (en) * 2015-10-26 2017-03-11 行政院原子能委員會核能研究所 Navigation system using augmented reality technology
CN105910615B (en) * 2016-03-30 2019-08-30 上海工业控制安全创新科技有限公司 A walking navigation method and system based on virtual reality
CN111197984A (en) * 2020-01-15 2020-05-26 重庆邮电大学 Vision-inertial motion estimation method based on environmental constraint

Also Published As

Publication number Publication date
CN113280817A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN106092104B (en) A kind of method for relocating and device of Indoor Robot
CN111275763B (en) Closed loop detection system, multi-sensor fusion SLAM system and robot
CN108489482B (en) The realization method and system of vision inertia odometer
CN107967457B (en) A method and system for location recognition and relative positioning that adapts to changes in visual features
CN108406731B (en) A positioning device, method and robot based on depth vision
CN109084732A (en) Positioning and navigation method, device and processing equipment
CN109887053A (en) A kind of SLAM map joining method and system
US20060188131A1 (en) System and method for camera tracking and pose estimation
CN110749308B (en) SLAM-oriented outdoor localization method using consumer-grade GPS and 2.5D building models
CN107665505B (en) Method and device for realizing augmented reality based on plane detection
CN114063099B (en) Positioning method and device based on RGBD
CN110599545B (en) Feature-based dense map construction system
CN113447014A (en) Indoor mobile robot, mapping method, positioning method, and mapping positioning device
Liu et al. Towards SLAM-based outdoor localization using poor GPS and 2.5 D building models
US10977810B2 (en) Camera motion estimation
KR102342945B1 (en) Estimating location method and apparatus for autonomous driving with surround image
CN110827353A (en) A robot positioning method based on monocular camera assistance
CN113689499B (en) A visual rapid positioning method, device and system based on point-surface feature fusion
Xian et al. Fusing stereo camera and low-cost inertial measurement unit for autonomous navigation in a tightly-coupled approach
Huttunen et al. A monocular camera gyroscope
CN112200917A (en) High-precision augmented reality method and system
Bergeon et al. Low cost 3D mapping for indoor navigation
CN117213515A (en) Visual SLAM path planning method and device, electronic equipment and storage medium
CN113280817B (en) Visual navigation based on landmarks
CN114627253A (en) Map construction method, device and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant