WO2025186882A1

WO2025186882A1 - Position specification system and recording medium

Info

Publication number: WO2025186882A1
Application number: PCT/JP2024/008137
Authority: WO
Inventors: 翔太今村
Original assignee: Nefront; Nefront Inc
Current assignee: Nefront; Nefront Inc
Priority date: 2024-03-04
Filing date: 2024-03-04
Publication date: 2025-09-12
Anticipated expiration: 2026-09-04
Also published as: WO2025186882A8

Abstract

Provided is a system in which a control unit 22 of a server 20 that has acquired an image captured by a terminal device 10 specifies the position of the terminal device. The server comprises a database 23 which stores a large number of 3D maps indicating respective areas and feature data related to the 3D maps. The feature data includes data obtained by vectorizing an image of a 3D map and text information related thereto, and context information of a 3D map, etc. A map narrowing unit 222 of the control unit vectorizes the acquired image, etc., performs vector retrieval with the feature data stored in the database, and provides the acquired image and the context information of the 3D map to an LLM to narrow down candidates of the 3D map (exclude 3D maps of similar shapes). A map matching unit 221 can accurately specify a position and a posture by matching the shape with the narrowed 3D map candidates.

Description

Location identification system and recording medium

　本発明は、位置特定システム及び記録媒体に関する。 The present invention relates to a location identification system and a recording medium.

　画像情報に基づく撮像装置の位置及び姿勢の計測は、複合現実感／拡張現実感における現実空間と仮想物体の位置合わせ、撮像装置を携帯するユーザ、ロボット、自動車・ドローンなどのモビリティの位置推定など様々な目的で利用される。ＶＰＳ（Ｖｉｃｕａｌ　Ｐｏｓｉｔｉｏｎｉｎｇ　Ｓｙｓｔｅｍ）を用いた位置検出は、ＧＰＳ衛星信号を受信できない環境下であっても位置を特定できる。したがって、例えば、撮影装置を携帯するユーザが、建物内や地下空間に存在している場合に、その空間内での存在位置の特定が可能となる。 Measuring the position and orientation of an imaging device based on image information is used for a variety of purposes, such as aligning real space with virtual objects in mixed reality/augmented reality, and estimating the position of users carrying the imaging device, robots, cars, drones, and other mobility devices. Position detection using a VPS (Viral Positioning System) can identify a location even in environments where GPS satellite signals cannot be received. Therefore, for example, if a user carrying an imaging device is inside a building or underground, it is possible to identify their location within that space.

　この種の撮影した画像情報を利用した位置を特定するシステムは、例えば特許文献１等に開示されている。 This type of system for identifying a location using captured image information is disclosed, for example, in Patent Document 1.

特許第７３６１０７５号公報Patent No. 7361075

　スーパーマーケットやコンビニエンスストアなどの売り場は、同じような陳列棚が似たレイアウトで配置されている。また、複合商業施設における屋内で出店される各店舗は、通路に沿って用意された同じ大きさの部屋を利用し、門構えが似たものがある。展示場で開催される展示会は、同じ寸法形状で区切られたブースが、同じ配置レイアウトで多数用意されることがある。 Sales areas in supermarkets and convenience stores tend to have similar display shelves arranged in a similar layout. Furthermore, indoor stores in commercial complexes tend to use rooms of the same size along the corridors, and some have similar entrances. Exhibitions held at exhibition halls often have numerous booths with the same dimensions and shapes, arranged in the same layout.

　このような場合、例えば同一のスーパーマーケット内で似たような形状の売り場が多数存在し、各売り場で撮影した画像の特徴点が似てしまう。このように特徴点が似ているものが存在すると、ＶＰＳを用いた位置特定の精度が低下する。このことは、複合商業施設や展示場等の屋内でも同様に発生する。 In such cases, for example, if there are many similarly shaped sales areas within the same supermarket, the feature points of the images taken at each sales area will be similar. When similar feature points exist, the accuracy of location identification using VPS will decrease. This also occurs indoors in commercial complexes, exhibition halls, etc.

　さらに、屋内の広範囲の領域内で画像に基づいて位置特定をする場合、対象となる比較対象の画像が多数存在する。システムは、特徴点が似ている画像が多数存在すると、正しい位置検出が行えず、また、１つに特定できずにフリーズしてしまうおそれがある。 Furthermore, when performing image-based location identification within a wide indoor area, there are many images to compare. If there are many images with similar features, the system may not be able to correctly detect the location, and may freeze up without being able to identify one.

　上述した課題はそれぞれ独立したものとして記載しているものであり、本発明は、必ずしも記載した課題の全てを解決できる必要はなく、少なくとも一つの課題が解決できればよい。またこの課題を解決するための構成についても単独で分割出願・補正等により権利取得する意思を有する。 The above-mentioned problems are described as being independent of each other, and the present invention does not necessarily have to solve all of the problems described; it is sufficient if it can solve at least one of the problems. Furthermore, we intend to obtain rights to the configurations that solve these problems separately through divisional applications, amendments, etc.

　（１）取得した画像が撮影された位置を特定する位置特定システムであって、前記画像に基づき抽象化した特徴データを用いて検索対象の３Ｄマップの候補を絞り込むマップ絞り込み機能と、前記画像を、その絞り込んだ前記３Ｄマップの候補と形状のマッチングを行い、前記画像を撮影した位置を特定するマップマッチング機能と、を備えた位置特定システムとするとよい。 (1) A location identification system that identifies the location where an acquired image was taken should preferably have a map filtering function that narrows down the 3D map candidates to be searched using feature data abstracted based on the image, and a map matching function that matches the shape of the image with the narrowed down 3D map candidates to identify the location where the image was taken.

　このようにすることで、内容面でのエリアの絞り込み（３Ｄマップの絞り込み）を行った上で、形状の特徴点マッチングを精度良く行える。３Ｄマップの候補の絞り込み処理は、例えば、複数の３Ｄマップの中から、形状のマッチングを行う３Ｄマップを所定数選択する場合、１つの３Ｄマップ内で形状のマッチングを行うエリアの３Ｄマップの部分を所定数選択する場合、あるいはそれら組み合わせなどを含む。 By doing this, it is possible to narrow down the area in terms of content (narrow down the 3D maps) and then perform shape feature point matching with high accuracy. The process of narrowing down 3D map candidates includes, for example, selecting a predetermined number of 3D maps for shape matching from multiple 3D maps, selecting a predetermined number of 3D map portions of the area for shape matching within a single 3D map, or a combination of these.

　（２）前記マップ絞り込み機能は、前記画像と前記３Ｄマップのコンテクスト情報を入力として大規模言語モデルを用いて前記抽象化を行い、前記３Ｄマップの候補を絞り込むように構成するとよい。 (2) The map narrowing function may be configured to perform the abstraction using a large-scale language model with context information of the image and the 3D map as input, thereby narrowing down the candidates for the 3D map.

　大規模言語モデルを用いた検索は、ピンポイントでの位置特定や姿勢・向きの特定まではできないが、似た形状でもその内容を加味して識別できる。よって、３Ｄマップの候補を精度良くピックアップできる。そして、３Ｄマップの候補を絞り込むことで、後段のマップマッチング機能にて、形状のマッチングを用いて精度良く位置特定等を行うことができる。 Searches using large-scale language models cannot pinpoint location or pinpoint posture or orientation, but they can identify similar shapes by taking their content into account. This allows for accurate selection of 3D map candidates. Then, by narrowing down the 3D map candidates, the map matching function in the subsequent stage can use shape matching to accurately pinpoint location, etc.

　（３）前記マップ絞り込み機能は、前記画像に基づく情報をベクトル化して前記特徴データを作成するように構成するとよい。 (3) The map refinement function may be configured to vectorize information based on the image to create the feature data.

　（４）前記マップ絞り込み機能は、前記画像に関するテキスト情報をベクトル化して前記特徴データを作成するように構成するとよい。画像に関するテキスト情報は、画像に含まれる文字を文字認識して作成したもの、画像認識して画像中に存在する物体を示すもの、画像の説明文書などがある。説明文書は、予め人が作成したものや画像を生成ＡＩに与え生成したものなどとするとよい。 (4) The map narrowing function may be configured to vectorize text information related to the image to create the feature data. Text information related to the image may be created by character recognition of characters contained in the image, by image recognition to indicate objects present in the image, or an explanatory document for the image. The explanatory document may be one created in advance by a person or one generated by providing the image to a generation AI.

　（５）前記マップ絞り込み機能が行う前記抽象化は、前記画像と、その画像に関係するコンテクスト情報を用いて行うように構成するとよい。 (5) The abstraction performed by the map refinement function may be configured to be performed using the image and context information related to that image.

　（６）前記マップ絞り込み機能は、前記画像を撮影した端末が検出する位置情報を利用して行うように構成するとよい。 (6) The map narrowing function may be configured to use location information detected by the device that captured the image.

　（７）比較対象となる特徴データと、３Ｄマップを記憶し、前記マップ絞り込み機能及び前記マップマッチング機能がアクセス可能なデータベースを備えるように構成するとよい。 (7) It is preferable to configure the system to include a database that stores feature data to be compared and 3D maps and that can be accessed by the map narrowing function and the map matching function.

　（８）カメラと通信機能を有する端末を備え、前記端末は、前記通信機能を用いて前記カメラが撮影した画像を前記マップ絞り込み機能を備えるサーバに送信するように構成するとよい。また、端末は、前記マップマッチング機能が特定した位置を取得し、その取得した位置を報知する機能を備えるとよい。 (8) The system may include a camera and a terminal having a communication function, and the terminal may be configured to use the communication function to send images captured by the camera to a server having the map filtering function. The terminal may also have a function to acquire the location identified by the map matching function and notify the user of the acquired location.

　（９）本発明の記録媒体は、（１）から（８）のいずれかに記載の位置特定システムの機能をコンピュータに実現させるためのプログラムを記録したコンピュータ読み取り可能な記録媒体とするとよい。 (9) The recording medium of the present invention may be a computer-readable recording medium on which a program for causing a computer to realize the functions of the location identification system described in any one of (1) to (8) is recorded.

　上記（１）から（８）に記載の各発明の構成要素は、それぞれ２つ或いは３つ以上の任意の組み合わせをすることができ、その組み合わせにより発明を構成することができる。 The components of each invention described above in (1) to (8) can be combined in any combination of two or more, and the invention can be constituted by such combinations.

　本発明は、例えば、ベクター検索や大規模言語モデルによる推論等の抽象化した特徴データを用いて検索対象の３Ｄマップ候補を絞り込んだ後、その絞り込んだ３Ｄマップとの特徴点のマッチングで、似た形状が存在する空間内であっても正確に位置を特定することができる。 The present invention narrows down the candidate 3D maps to be searched using abstracted feature data, such as vector search or inference using large-scale language models, and then matches feature points with the narrowed-down 3D maps to accurately identify a location even within a space containing similar shapes.

図１は、本発明に係る位置特定システムの好適な一実施形態を示す図である。FIG. 1 is a diagram showing a preferred embodiment of a location specifying system according to the present invention. 図２は、実際のシステム構成の一例を示す図である。FIG. 2 is a diagram showing an example of an actual system configuration. 図３は、サーバの制御部の処理を説明するフローチャートである。FIG. 3 is a flowchart illustrating the processing of the control unit of the server. 図４は、屋内の位置を特定する場所の一例を示す図である。FIG. 4 is a diagram showing an example of a place for identifying an indoor position. 図５は、屋内の位置を特定する場所の一例を示す図である。FIG. 5 is a diagram showing an example of a place for identifying an indoor position. 図６は、屋内の位置を特定する場所の一例を示す図である。FIG. 6 is a diagram showing an example of a place for identifying an indoor position. 図７は、屋内の位置を特定する場所の一例を示す図である。FIG. 7 is a diagram showing an example of a place for identifying an indoor position. 図８は、屋内の位置を特定する場所の一例を示す図である。FIG. 8 is a diagram showing an example of a place for identifying an indoor position. 図９は、屋内の位置を特定する場所の一例を示す図である。FIG. 9 is a diagram showing an example of a place for identifying an indoor position.

　以下、本発明の好適な実施形態について図面に基づき、詳細に説明する。なお、本発明は、これに限定されて解釈されるものではなく、本発明の範囲を逸脱しない限りにおいて、当業者の知識に基づいて、種々の変更、修正、改良を加え得るものである。 Below, a preferred embodiment of the present invention will be described in detail with reference to the drawings. However, the present invention should not be construed as being limited to this embodiment, and various changes, modifications, and improvements may be made based on the knowledge of those skilled in the art, provided that they do not deviate from the scope of the present invention.

　本発明に係る位置特定システムは、展示場、商業施設、各種の建物内、地下街、地下駐車場などのＧＰＳ信号を精度良く受信できない屋内における位置特定を精度良く行うものである。位置特定を行う屋内のエリアについて、事前にカメラやＬｉＤＡＲ（Ｌｉｇｈｔ　Ｄｅｔｅｃｔｉｏｎ　Ａｎｄ　Ｒａｎｇｉｎｇ）を用いて取得した情報を元に、各種の計算処理、統計モデル、機械学習モデルを用いて、屋内の所定の領域を示す３Ｄマップと、その３Ｄマップに関連する特徴データ等を作成したものを記憶保持する。そして、位置特定システムは、実際に現場で撮影した画像データを取得し、取得した画像データと記憶保持した特徴データ等から、屋内における存在位置の絞り込みを行い、さらに記憶保持した３Ｄマップとの特徴点のマッチングで正確に位置を特定する機能を備える。 The positioning system of the present invention accurately identifies positions indoors where GPS signals cannot be received accurately, such as in exhibition halls, commercial facilities, various buildings, underground shopping malls, and underground parking lots. Based on information acquired in advance using cameras and LiDAR (Light Detection and Ranging) for the indoor area to be located, the system uses various computational processes, statistical models, and machine learning models to create and store a 3D map showing a specific indoor area, along with feature data related to the 3D map. The positioning system then acquires image data actually taken on-site, narrows down indoor locations based on the acquired image data and the stored feature data, and is further equipped with the function of accurately identifying the location by matching feature points with the stored 3D map.

　図１は、本発明に係る位置特定システム１の好適な一実施形態を示す構成図であり、図２は実際の構成の一例を示す図である。図１に示すように、位置特定システム１は、端末装置１０で撮影した画像データを、ネットワーク２を経由して取得したサーバ２０が、その取得した画像データに基づきその端末装置１０の存在位置を特定する機能を有する。ネットワーク２は、例えば公衆回線、インターネット等であり、相互に離れた場所にいてもアクセス可能なものとするとよい。 FIG. 1 is a block diagram showing a preferred embodiment of a location identification system 1 according to the present invention, and FIG. 2 is a diagram showing an example of an actual configuration. As shown in FIG. 1, the location identification system 1 has a function in which a server 20 acquires image data captured by a terminal device 10 via a network 2 and identifies the location of the terminal device 10 based on the acquired image data. The network 2 may be, for example, a public line or the Internet, and is preferably accessible even when the devices are in distant locations.

　端末装置１０は、制御部１１、カメラ１２、入力部１３、通信部１４、表示部１５、ＧＰＳ受信部１６等を備える。端末装置１０は、例えば、人が携帯する場合、スマートフォン、スマートグラス等を用いるとよい。端末装置１０がスマートフォンの場合、端末装置１０を構成するハードウェア（制御部１１、カメラ１２、入力部１３、通信部１４、表示部１５、ＧＰＳ受信部１６）は、スマートフォンに標準装備される機器を用いる。 The terminal device 10 includes a control unit 11, a camera 12, an input unit 13, a communication unit 14, a display unit 15, a GPS receiver unit 16, etc. When the terminal device 10 is carried by a person, it may be a smartphone, smart glasses, etc. When the terminal device 10 is a smartphone, the hardware that constitutes the terminal device 10 (control unit 11, camera 12, input unit 13, communication unit 14, display unit 15, GPS receiver unit 16) uses equipment that is standard equipment on smartphones.

　制御部１１は、端末装置１０の各部を制御する。制御部１１は、例えば、プロセッサ及びメモリを含むコンピュータである。メモリは、例えば、ＲＡＭ（Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）、及びＲＯＭ（Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）等を有する主記憶装置である。プロセッサは、メモリのＲＯＭから読み出したプログラムをＲＡＭに一時的に記憶させる。また、メモリのＲＡＭは、プロセッサに作業領域を提供する。プロセッサは、プログラムの実行中に生成されるデータをＲＡＭに一時的に記憶させながら演算処理を行うことにより、各種の制御を行う。そして、端末装置１０がスマートフォンの場合、位置特定を行うための処理は、例えば、制御部がスマホアプリを起動することで行うとよい。 The control unit 11 controls each part of the terminal device 10. The control unit 11 is, for example, a computer including a processor and memory. The memory is, for example, a main storage device having RAM (Random Access Memory) and ROM (Read Only Memory). The processor temporarily stores programs read from the ROM in the RAM. The RAM also provides a working area for the processor. The processor performs various controls by performing arithmetic processing while temporarily storing data generated during program execution in the RAM. If the terminal device 10 is a smartphone, the process for determining location may be performed, for example, by the control unit launching a smartphone app.

　通信部１４は、上記のネットワーク２に接続し、ネットワーク２に接続される他の装置とデータ・情報を送受信する機能、Ｗｉ－Ｆｉのアクセスポイントに接続する機能等を備える。 The communication unit 14 connects to the above-mentioned network 2 and has functions such as sending and receiving data and information with other devices connected to network 2, and connecting to Wi-Fi access points.

　制御部１１は、カメラ１２で撮影した画像を表示部１５に表示する。また制御部１１は、撮影中の画像データを、ネットワーク２を用いてサーバ２０に送信する。また、制御部１１は、現在位置に基づくセンサ情報を、ネットワーク２を用いてサーバ２０に送信する。センサ情報は、例えばＧＰＳ受信部１６が受信し取得した位置情報（経度緯度）や、通信部１４が取得したＷｉ－Ｆｉのアクセスポイントを特定する情報等を備える。これらの情報を送信するタイミングは、所定の間隔で定期的に、或いは入力部１３で受け付けた送信指示を契機とするとよい。さらに、制御部１１は、サーバ２０から取得した現在位置に基づき、表示部１５に所定の情報を表示する機能等を備える。 The control unit 11 displays images captured by the camera 12 on the display unit 15. The control unit 11 also transmits image data being captured to the server 20 using the network 2. The control unit 11 also transmits sensor information based on the current location to the server 20 using the network 2. The sensor information includes, for example, location information (longitude and latitude) received and acquired by the GPS receiving unit 16, and information identifying a Wi-Fi access point acquired by the communication unit 14. The timing for transmitting this information may be periodically at a predetermined interval, or triggered by a transmission instruction received by the input unit 13. The control unit 11 also has a function for displaying predetermined information on the display unit 15 based on the current location acquired from the server 20.

　サーバ２０は、インターネット接続機能等のネットワーク２に接続し通信するインタフェース等を有する通信部２１と、サーバ２０の動作を司る制御部２２と、位置特定をするためのデータを記憶するデータベース２３等を備える。また、図１では、サーバ２０は、１台のサーバコンピュータから構成される例を図示しているが、図２に示すように、直接或いはネットワークを介して接続される複数台のコンピュータから構成されるものとしてもよい。また、データベースを実装するサーバは、物理的なサーバに限ることなく、ソフトウエアプログラムで構成したり、クラウドに実装したりするなど各種の形態のものを利用するとよい。 The server 20 comprises a communication unit 21 having an interface for connecting to and communicating with a network 2 such as an internet connection function, a control unit 22 that controls the operation of the server 20, and a database 23 that stores data for location identification. In addition, while Figure 1 shows an example in which the server 20 is composed of a single server computer, as shown in Figure 2, it may also be composed of multiple computers connected directly or via a network. The server that implements the database is not limited to a physical server, and various other forms may be used, such as being composed of a software program or implemented in the cloud.

　データベース２３は、端末装置１０のカメラ１２で撮影した画像に基づき位置を特定するためのデータを記憶する。データベース２３は、３Ｄマップ記憶部２３１と、特徴データ記憶部２３２を備える。３Ｄマップ記憶部２３１は、事前に作成した位置特定を行う対象エリアについての３Ｄマップを記憶する。すなわち、３Ｄマップは、多数の画像、さらには各画像と紐付くＬｉＤＡＲの深度データを用いて作成される特徴点（ｘ，ｙ，ｚ）の集合体もしくはポリゴンメッシュから構成される３次元形状に基づくデータである。そこで、システム管理者等は、事前に位置特定を行う屋内のエリア内を移動しながら、カメラを用いて撮影して各地点の画像データを取得し、必要に応じてＬｉＤＡＲ（Ｌｉｇｈｔ　Ｄｅｔｅｃｔｉｏｎ　Ａｎｄ　Ｒａｎｇｉｎｇ）を用いて深度データを取得する。屋内のある地点で取得した複数の画像データ並びに深度データに基づき、各種の計算処理等を行い、３Ｄマップを作成する。この３Ｄマップの作成処理は、既存の技術を用いて行うことができる。 The database 23 stores data for identifying a location based on images captured by the camera 12 of the terminal device 10. The database 23 includes a 3D map storage unit 231 and a feature data storage unit 232. The 3D map storage unit 231 stores a 3D map of the target area for which location is to be identified, created in advance. In other words, the 3D map is data based on a three-dimensional shape composed of a collection of feature points (x, y, z) or a polygon mesh created using a large number of images and the LiDAR depth data associated with each image. Therefore, a system administrator or the like moves around the indoor area for which location is to be identified in advance, taking images using a camera to acquire image data for each location, and acquiring depth data using LiDAR (Light Detection and Ranging) as needed. Various calculations and other processes are performed based on the multiple image data and depth data acquired at a certain indoor location to create a 3D map. This 3D map creation process can be performed using existing technology.

　同じ屋内には、似た形状からなるエリアが複数存在することがある。その結果、異なる場所における３Ｄマップの形状、特徴点のうち、類似するものが存在する。特に、展示場は、同一の展示会の場合、同じような寸法形状のブースが多数存在し、会場内での複数のブースの配置レイアウトが同じになることが多い。また、大型商業施設やショッピングモール内では、同じような門構えの店が複数存在することがある。また、同一の店舗内においても、同じ陳列棚を多数用意し、同じレイアウトで配置した場合、類似する３Ｄマップが多数存在してしまう。 In the same indoor space, there can be multiple areas with similar shapes. As a result, there will be similar shapes and feature points on 3D maps in different locations. In particular, at exhibition halls, for the same exhibition, there will be many booths with similar dimensions and shapes, and the layout of multiple booths within the venue will often be the same. Furthermore, in large commercial facilities and shopping malls, there may be multiple stores with similar entrances. Furthermore, even within the same store, if many identical display shelves are prepared and arranged in the same layout, there will be many similar 3D maps.

　所定の地点における３Ｄマップの作成処理は、例えば、３Ｄマップデータの特徴が一意に定まるように範囲を分割する。その分割したエリアで取得した少しずつ重なり合った画像のシークエンスと各画像に紐づく姿勢情報から三次元再構成を行い、分割した範囲の３Ｄマップを作成するとよい。また、ＬｉＤＡＲを用いて取得した震度データが存在する場合、それも利用して分割した範囲の３Ｄマップを作成するとよい。なお、ＬｉＤＡＲのデータが無い場合、画像データのみに基づいて３Ｄマップを作成する。 The process of creating a 3D map for a given location involves, for example, dividing the area so that the characteristics of the 3D map data are uniquely determined. A 3D map of the divided area can be created by performing three-dimensional reconstruction from a sequence of slightly overlapping images acquired in the divided area and the posture information associated with each image. Furthermore, if seismic intensity data acquired using LiDAR is available, this can also be used to create a 3D map of the divided area. If LiDAR data is not available, the 3D map can be created based on image data alone.

　この分割する範囲は、例えば展示場の場合、イベント会場のコーナー単位、ブース単位など一定のエリアとする。また、ショッピングモールの場合の分割する範囲は、例えば店舗単位で門構えが含まれるエリア等とする。また、店舗の内部における分割する範囲は、例えば、１または複数の陳列棚を含むエリア等とする。この分割の範囲は適宜選択するとよく、何れの場合も、分割した各エリアに対し１つずつ３Ｄマップを作成する。各３Ｄマップには、ＩＤが付与され、また、屋内の実際のエリアを特定する位置情報も関連付けて登録する。 In the case of an exhibition hall, for example, this division range may be a fixed area such as a corner of an event venue or a booth. In the case of a shopping mall, the division range may be, for example, an area including the entrance to each store. Furthermore, the division range within a store may be, for example, an area including one or more display shelves. The division range can be selected as appropriate, and in either case, one 3D map is created for each divided area. Each 3D map is assigned an ID, and location information identifying the actual area indoors is also associated and registered.

　このように３Ｄマップは、屋内の所定の領域を複数に分割した各エリアに対して作成するため、例えば、１つの展示場で開催される同一の展示会・イベント会場において多数の３Ｄマップが用意される。１つの商業施設の屋内においても区分けされた複数の領域に対しそれぞれ３Ｄマップが作成される。 In this way, 3D maps are created for each of the multiple areas that a specific indoor area is divided into, so for example, multiple 3D maps are prepared for the same exhibition or event venue held at a single exhibition center. 3D maps are also created for each of the multiple divided areas inside a single commercial facility.

　サーバ２０は、異なる建物・施設内における複数の領域についての３Ｄマップを記憶する。そこで、サーバ２０は、どの展示場や商業施設についての３Ｄマップであるかがわかるように記憶する。例えば、建物毎に記憶領域を分けて３Ｄマップを記憶したり、３Ｄマップの付記情報として、建物等を特定する識別情報を登録したりするとよい。また展示場が同じであっても、開催される展示会が異なると３Ｄマップは異なるため、３Ｄマップは、展示会毎に分けて３Ｄマップ記憶部２３１に記憶する。 The server 20 stores 3D maps for multiple areas within different buildings and facilities. Therefore, the server 20 stores the 3D maps in a way that makes it clear which exhibition hall or commercial facility the 3D map pertains to. For example, it is possible to store 3D maps in separate storage areas for each building, or to register identification information that identifies the building, etc., as additional information for the 3D map. Furthermore, since the 3D maps will be different for different exhibitions held at the same exhibition hall, the 3D maps are stored separately for each exhibition in the 3D map storage unit 231.

　特徴データ記憶部２３２は、３Ｄマップを作成したエリアを撮像した１または複数枚の画像データ等を中間表現に抽象化した特徴データを記憶する。この特徴データは、それぞれの３Ｄマップに関連付けられ、１または複数有する。抽象化した特徴データは、例えば、３Ｄマップを作成した際に使用した画像をマルチモーダルの埋め込みモデルでベクトル化したデータ、また、各分割範囲の画像中に存在する説明情報（コンテクスト）、おおよその位置を特定するための付随する位置情報などがある。コンテクストは、テキストデータとしてもよいがベクトル化した多次元ベクトルとするとよい。特徴データを表すベクトルは、例えば、画像から文字検出や物体検出して説明文を生成した上で、テキスト情報を埋め込みモデルでベクトル化して作成するとよい。また、特徴データを表すベクトルは、テキスト情報を用いるものに限ることなく、例えば、画像を畳み込んで特徴ベクトルを直接的に生成するとよい。画像に基づきベクトルを生成することで、文字検出では生成できない例えば壁の様相などの情報に基づく特徴データを生成することができ、情報量的に漏れが少なくなるのでよい。また、同一範囲内の別角度から撮影した複数枚の画像の説明情報を結合した情報を、単一の３Ｄマップのコンテクスト情報としてもよい。この説明情報を作成するに際に使用するテキストは、例えば画像データを画像認識し、ＯＣＲの技術を用いて作成するとよい。また、説明情報は、マルチモーダルの大規模言語モデルを用い、画像を入力に説明テキストを生成するとより好ましい。これにより画像の情報を人間が説明するかのように記述した文章が作成され、形状や色など文字以外の見た目の情報なども含めて文字にできるため、他の場所との識別がより精度よくできる特徴データを生成できる。さらにまた、作成した説明情報（説明テキスト）を見た作業員が、テキストを補正し、再撮影して再度説明情報を作成する処理を行うとよい。 The feature data storage unit 232 stores feature data abstracted into an intermediate representation from one or more image data images of the area for which the 3D map was created. Each 3D map has one or more pieces of this feature data. The abstracted feature data includes, for example, data vectorized using a multimodal embedding model from the images used to create the 3D map, explanatory information (context) present in the images of each divided area, and associated location information for identifying the approximate location. The context may be text data, but it is preferable to use a vectorized multidimensional vector. The vector representing the feature data may be created, for example, by detecting characters or objects from the image to generate explanatory text, and then vectorizing the text information using an embedding model. Furthermore, the vector representing the feature data is not limited to text information; for example, it may be generated directly by convolving the image. Generating vectors based on images allows for the generation of feature data based on information that cannot be generated by character detection, such as the appearance of walls, which is advantageous in terms of reducing information loss. Additionally, information combining explanatory information from multiple images taken from different angles within the same area may be used as context information for a single 3D map. The text used to create this explanatory information can be created, for example, by image recognition of image data and using OCR technology. It is also more preferable to use a multimodal large-scale language model to generate explanatory text using images as input. This creates text that describes the image information as if it were a human, and can also include non-textual visual information such as shape and color, generating feature data that can be more accurately distinguished from other locations. Furthermore, it is also preferable for a worker to view the created explanatory information (explanatory text), correct the text, re-take photos, and create new explanatory information.

　また付随する位置情報は、例えばＷｉ－Ｆｉのアクセスポイント、ＧＰＳ情報等がある。例えば屋内であってもＧＰＳ信号を受信できるエリアがあるため、ＧＰＳに基づく位置情報からおおよその存在位置を特定できる。また、地下街などでは適宜位置にＷｉ－Ｆｉのアクセスポイントが設置されている。Ｗｉ－Ｆｉのアクセスポイントは、通信可能なエリアが狭い。よって、アクセスポイントと通信可能なエリア内の３Ｄマップに関連付ける特徴データとして、そのアクセスポイントを特定する情報を用いる。 Furthermore, the associated location information includes, for example, Wi-Fi access points and GPS information. For example, there are areas where GPS signals can be received even indoors, so the approximate location can be determined from GPS-based location information. Furthermore, Wi-Fi access points are installed in appropriate locations in underground shopping malls, etc. Wi-Fi access points have a small communication area. Therefore, information identifying the access point is used as feature data to associate the access point with a 3D map of the communication area.

　データベース２３に記憶する３Ｄマップと特徴データの作成は、何れを先に行ってもよく、同一の範囲に基づくものであればよい。 The 3D map and feature data stored in database 23 can be created in any order, as long as they are based on the same area.

　制御部２２は、例えば、プロセッサ及びメモリを含むコンピュータである。メモリは、例えば、ＲＡＭ（Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）、及びＲＯＭ（Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）等を有する主記憶装置である。プロセッサは、メモリのＲＯＭから読み出したプログラムをＲＡＭに一時的に記憶させる。また、メモリのＲＡＭは、プロセッサに作業領域を提供する。プロセッサは、プログラムの実行中に生成されるデータをＲＡＭに一時的に記憶させながら演算処理を行うことにより、各種の制御を行う。 The control unit 22 is, for example, a computer including a processor and memory. The memory is, for example, a main storage device having RAM (Random Access Memory) and ROM (Read Only Memory). The processor temporarily stores programs read from the ROM in the RAM. The RAM also provides a working area for the processor. The processor performs various controls by performing arithmetic processing while temporarily storing data generated during program execution in RAM.

　制御部２２は、位置特定を行う機能として、マップマッチング部２２１と、マップ絞り込み部２２２を備える。マップマッチング部２２１は、形状に基づくマッチングを行う。すなわち、マップマッチング部２２１は、端末装置１０が撮影した画像データ等に基づき抽出した特徴点と、３Ｄマップ記憶部２３１に記憶された３Ｄマップ群とマッチングし、一致度の高い３Ｄマップを１つ特定し、端末装置１０のカメラ１２で撮影時の位置や姿勢を導出する。さらにマップマッチング部２２１は、導出した撮影位置や姿勢をもとに端末装置１０の３Ｄマップ上の座標や３Ｄマップ群全体の中での座標を求める機能を備えるとよい。また、形状に基づくマッチング処理は、例えばＶＰＳ等の撮影した画像データと、予め用意した３Ｄマップの対比に基づく既存の位置特定のアルゴリズムを利用するとよい。そして制御部２２は、特定した位置や姿勢等に関する位置情報を、ネットワーク２経由で端末装置１０に送る。 The control unit 22 has a map matching unit 221 and a map narrowing unit 222 as functions for performing position identification. The map matching unit 221 performs shape-based matching. That is, the map matching unit 221 matches feature points extracted based on image data captured by the terminal device 10 with the 3D maps stored in the 3D map storage unit 231, identifies one 3D map with a high degree of match, and derives the position and orientation at the time of capture by the camera 12 of the terminal device 10. Furthermore, the map matching unit 221 preferably has a function for calculating coordinates on the 3D map of the terminal device 10 or coordinates within the entire 3D map group based on the derived capture position and orientation. Furthermore, the shape-based matching process preferably utilizes an existing position identification algorithm based on comparing image data captured by a VPS or the like with a pre-prepared 3D map. The control unit 22 then sends position information regarding the identified position, orientation, etc. to the terminal device 10 via the network 2.

　マップ絞り込み部２２２は、マップマッチング部２２１による形状マッチングを行うに先立ち、その前段として取得した画像データ等を用い、画像が撮影されたエリアの絞り込み処理を行う。すなわち、マップ絞り込み部２２２は、画像データが表す内容を抽象化してマップの絞り込みを行う。マップ絞り込み部２２２は、取得した画像データ等と特徴データ記憶部２３２に記憶された特徴データに基づいて３Ｄマップの候補を抽出する。このマップ絞り込み部２２２の詳細な機能は、後述する。 Prior to shape matching by the map matching unit 221, the map narrowing down unit 222 uses the acquired image data, etc., as a preliminary step to narrow down the area where the image was taken. In other words, the map narrowing down unit 222 abstracts the content represented by the image data and narrows down the map. The map narrowing down unit 222 extracts candidates for the 3D map based on the acquired image data, etc., and the feature data stored in the feature data storage unit 232. The detailed functions of this map narrowing down unit 222 will be described later.

　図３は、実際にシステムを利用するユーザが、現場で端末装置１０のカメラ１２で撮影した画像を取得した場合の制御部２２の処理を説明するフローチャートである。システムを利用するユーザは、端末装置１０の一例であるスマートフォンにインストールされた所定のスマホアプリを起動する。スマホアプリは、所定の入力指示を受け付け、カメラ１２を起動し、常時撮影するモードにし、撮影中の画像を表示部１５に表示する。また、スマホアプリは、所定のタイミングで撮影中の画像をサーバ２０に送る。この所定タイミングは、決められた時間間隔、画面のタップなどの入力部１３から与えられる送信指示の受け付け等とするとよい。また、制御部１１は、画像データとともに所定のセンサ情報を送信する。 Figure 3 is a flowchart explaining the processing of the control unit 22 when an image captured by the camera 12 of the terminal device 10 at the site is acquired by a user actually using the system. The user using the system launches a specific smartphone app installed on a smartphone, which is an example of the terminal device 10. The smartphone app accepts a specific input instruction, starts the camera 12, switches it to continuous shooting mode, and displays the image being captured on the display unit 15. The smartphone app also sends the image being captured to the server 20 at a specific timing. This specific timing may be a set time interval, or the acceptance of a transmission instruction given from the input unit 13 such as a tap on the screen. The control unit 11 also transmits the specific sensor information along with the image data.

　サーバ２０の制御部２２は、端末装置１０で撮影した画像並びにセンサ情報を取得する（Ｓ１）。制御部２２は、取得した画像データを所定の記憶部に記憶する。記憶部は、例えば揮発性のメモリなど一時記憶部とするとよい。 The control unit 22 of the server 20 acquires the image captured by the terminal device 10 and sensor information (S1). The control unit 22 stores the acquired image data in a specified storage unit. The storage unit may be a temporary storage unit such as a volatile memory.

　制御部２２のマップ絞り込み部２２２は、取得した画像をマルチモーダルの大規模言語モデルでコンテクストと照合してエリアの判別を行い、機械学習モデルを用いた処理で特徴情報に落とし込んだ上で、撮影した位置を検出する画像の位置する３Ｄマップ候補を絞り込む（Ｓ２）。マップ絞り込み部２２２は、絞り込んだ３Ｄマップ候補のＩＤをマップマッチング部２２１に送る。 The map narrowing down unit 222 of the control unit 22 compares the acquired image with the context using a multimodal large-scale language model to determine the area, then processes it using a machine learning model to convert it into feature information, and narrows down the 3D map candidates where the image whose photographed position is to be detected is located (S2). The map narrowing down unit 222 sends the IDs of the narrowed down 3D map candidates to the map matching unit 221.

　次いで、マップマッチング部２２１は、一時記憶部に記憶する画像データを読み出し、機械学習モデルやＳＬＡＭの手法を用いて特徴点を抽出する（Ｓ３）。そしてマップマッチング部２２１は、求めた特徴点と、マップ絞り込み部２２２が絞り込んだ候補の３Ｄマップとのマッチングを行い、最も一致する３Ｄマップを特定し、その３Ｄマップ内の位置や姿勢を導出する（Ｓ４）。 Next, the map matching unit 221 reads the image data stored in the temporary storage unit and extracts feature points using a machine learning model or SLAM method (S3). The map matching unit 221 then matches the obtained feature points with the candidate 3D maps narrowed down by the map narrowing down unit 222, identifies the 3D map that best matches, and derives the position and orientation within that 3D map (S4).

　制御部２２は、導出した撮影位置や姿勢をもとに端末の３Ｄマップ上の座標や３Ｄマップ群全体の中での座標を求める（Ｓ５）。そしてこの求めた座標データを、端末装置１０に送る。スマホアプリは、受け取った座標データで特定される位置に基づき、例えば道案内などをする。 The control unit 22 calculates the coordinates on the terminal's 3D map and the coordinates within the entire group of 3D maps based on the derived shooting position and orientation (S5). The calculated coordinate data is then sent to the terminal device 10. The smartphone app then provides, for example, route guidance based on the position identified by the received coordinate data.

　上記の処理Ｓ２を実施するマップ絞り込み部２２２は、マップ絞り込み部２２２は、（１）画像認識とベクトル検索機能と、（２）マルチモーダルＬＬＭ（Ｌａｒｇｅ　Ｌａｎｇｕａｇｅ　Ｍｏｄｅｌｓ）を用いる検索機能を備える。 The map narrowing unit 222 that performs the above process S2 has (1) image recognition and vector search functions, and (2) a search function that uses multimodal LLMs (Large Language Models).

　（１）の画像認識とベクトル検索機能は、各種情報をいずれかの方法で特徴ベクトルにし、そのベクトル検索を元に位置特定を行うものであり、類似度検索である。このベクトル検索機能は、取得した入力画像（シークエンス）をベクトル化し、特徴データ記憶部２３２に記憶した３Ｄマップに関連付けたベクトルとの一致度を求め、所定の基準を満たす特徴データを抽出する。制御部２２は、抽出した特徴データに関連付けられた３Ｄマップを、マップ候補に決定する。所定の基準は、例えば、一致度が閾値を超えるものや、一致度の高いものから所定数のものとするとよい。入力画像のベクトル化処理は、画像をそのままマルチモーダルの埋め込みモデルを用いてベクトル化する（複数の情報から１つの抽象的な特徴データを生成する）ものや、画像中の文字を検出して得られたテキスト情報を抽象化してベクトル化するものなどがある。抽象化する対象のテキスト情報は、例えば画像認識処理を行い抽出した画像中に存在する文字を用いることができる。また、そのテキスト情報は、画像を直接或いは画像認識して画像中に存在する物体の名称等を生成ＡＩに与え、生成ＡＩが生成した画像の説明文を用いるとよい。このベクトル検索は、高速に処理できるため、比較対象の３Ｄマップが多い場合に効率よく３Ｄマップが対象になるか否かの判断を短時間で行える。 The image recognition and vector search function (1) converts various information into feature vectors using some method, and performs location identification based on the vector search; this is a similarity search. This vector search function vectorizes the acquired input image (sequence), calculates the degree of match with vectors associated with 3D maps stored in the feature data storage unit 232, and extracts feature data that meets specified criteria. The control unit 22 determines the 3D maps associated with the extracted feature data as map candidates. The specified criteria may be, for example, a match exceeding a threshold, or a predetermined number of highly matched maps. Vectorization processing of input images can involve vectorizing the image directly using a multimodal embedding model (generating a single abstract feature data set from multiple pieces of information), or abstracting and vectorizing the text information obtained by detecting characters in the image. The text information to be abstracted can be, for example, characters present in the image extracted through image recognition processing. Furthermore, the text information can be provided to the generation AI directly, or by image recognition, and the names of objects present in the image can be provided, and the description of the image generated by the generation AI can be used. This vector search can be processed quickly, so when there are many 3D maps to compare, it is possible to efficiently determine in a short amount of time whether a 3D map is eligible.

　（２）のＬＬＭを用いる検索機能は、画像と文字情報を入力として大規模言語モデルを用いて論理的に推論して位置特定を行うものである。このＬＬＭを用いる検索機能は、特徴データ記憶部２３２に記憶された候補マップの情報（説明文など）をマルチモーダルＬＬＭにコンテクストとして与え、端末装置１０が撮影しネットワーク２を経由して取得した画像を入力に与え、候補の３Ｄマップを推定する。すなわち、３Ｄマップのコンテクスト情報と、端末装置１０が撮影した画像情報をＬＬＭに与え、ＬＬＭが３Ｄマップ候補を求める。ＬＬＭのモデル自体をファインチューニングしてもよい。ＬＬＭを用いる検索は、ベクトル検索と比較すると精度は高いが処理時間がかかるという特徴を有する。ＬＬＭを用いた検索は、対応できるエリアの数に制約がある。 The search function using LLM (2) uses a large-scale language model to logically infer and identify a location using image and text information as input. This search function using LLM provides information about the candidate map (such as a description) stored in the feature data storage unit 232 as context to the multimodal LLM, and provides as input an image captured by the terminal device 10 and acquired via the network 2, thereby estimating a candidate 3D map. In other words, the context information for the 3D map and image information captured by the terminal device 10 are provided to the LLM, which then searches for 3D map candidates. The LLM model itself may be fine-tuned. Searches using LLM are characterized by higher accuracy compared to vector searches, but require longer processing times. Searches using LLM are limited in the number of areas they can cover.

　また、実際にシステムを利用するユーザが、現場で端末装置１０のカメラ１２で撮影した場合、予め特徴データ等の作成時に同じ位置にいてもカメラ１２の向きが異なると、撮影される画像データは異なる。このような場合でも、例えば、マルチモーダル大規模言語モデルの場合は、３Ｄマップが該当する範囲をうまく説明する特徴データとなるように複数の角度の画像の情報を結合・編集して取り扱うため、特に角度の違いには強くなる。よって、カメラ１２の向きが異なっていても多少の範囲であればマップの絞り込みで３Ｄマップ候補として抽出される。 Furthermore, when a user actually using the system takes a photograph on-site with the camera 12 of the terminal device 10, the captured image data will differ if the orientation of the camera 12 is different, even if the camera 12 is in the same position when the feature data, etc. was created in advance. Even in such cases, for example, in the case of a multimodal large-scale language model, image information from multiple angles is combined and edited to create feature data that effectively describes the range to which the 3D map applies, making it particularly tolerant to differences in angle. Therefore, even if the orientation of the camera 12 is different, if the range is within a certain range, the map can be narrowed down and extracted as a 3D map candidate.

　一方、ベクトル検索の場合、同じ位置から撮影しても向きが異なると、一致度は低くなり、異なる角度が大きくなると検出できなくなる。そこで、ベクトル検索の場合には、一つの３Ｄマップの範囲の中で複数の向きの画像について特徴ベクトルを作成し、各ベクトルを同一マップのＩＤに関連付けるとよい。つまり、事前に特徴データを作成し、データベース２３に記憶する際に、１つの３Ｄマップに複数の特徴ベクトルを関連付ける。このようにすると、利用時の撮影した画像に紐づく３Ｄマップを特徴ベクトルのマッチングから適切に特定することができる。 On the other hand, with vector searches, even if the images are taken from the same position, if the orientation is different the degree of match will be low, and if the angle difference is large it will not be possible to detect them. Therefore, with vector searches, it is advisable to create feature vectors for images taken in multiple orientations within the range of a single 3D map, and associate each vector with the ID of the same map. In other words, feature data is created in advance, and multiple feature vectors are associated with a single 3D map when it is stored in database 23. In this way, the 3D map linked to the image taken during use can be appropriately identified by matching the feature vectors.

　（１）と（２）の機能は、いずれか一方を用いてもよく、両方を組み合わせてもよい。一方を用いる場合、汎用性の高い（２）を用いるとよい。また、（２）は処理速度や処理可能なエリア数に制限があるため、（１）の画像認識とベクトル検索のみで絞り込みを行うようにしてもよい。また、組み合わせる場合、例えば、３Ｄマップ群の数が多い場合に、（１）で絞り込み、（２）で最終判断を下すとよい。また、このように使用する順番を分けるのではなく、（１）と（２）の検索機能を並列で動作させ、それぞれについて信頼度を算出し、信頼度をもとに複数の３Ｄマップ候補を求めるとよい。 Either of the functions (1) and (2) can be used, or both can be combined. If one is used, it is recommended to use the more versatile (2). Also, since (2) has limitations in processing speed and the number of areas it can process, it is also possible to narrow down the results using only the image recognition and vector search of (1). If combining them, for example, if there are a large number of 3D map groups, it is recommended to narrow down the results using (1) and make the final decision using (2). Also, rather than dividing the order of use in this way, it is recommended to run the search functions of (1) and (2) in parallel, calculate the reliability for each, and find multiple 3D map candidates based on the reliability.

　また、上述したように（１）の検索は高速処理ができるので、広いエリアに対して対応させることができるという特徴を有する。一方、（２）の検索は、凝った趣向、高精度に検出はできるが、その分時間が掛かり、対象を広くするとリアルタイムで絞り込みができなくなる恐れがある。そこで、位置特定を行う場所に応じて、いずれかを使い分ける機能を備えるとよい。例えば、ある建物Ａ内のエリアは比較的狭く、比較対象の３Ｄマップや特徴データの数が少ない場合には、（２）の検索を用いて絞り込みを行う。一方、建物Ｂ内のエリアは比較的広く、比較対象の３Ｄマップや特徴データの数が多い場合には、（１）の検索を用いて絞り込みを行う。何れの検索を行うかは、例えば、ＧＰＳ信号やＷｉ－Ｆｉのアクセスポイント等のセンサ情報に基づく現在位置から、制御部２２は建物Ａと建物Ｂのどちらにいるかを認識し、現在位置に基づき動作する検索機能を切替えるとよい。 Furthermore, as mentioned above, search (1) can be processed quickly and is characterized by its ability to be applied to wide areas. On the other hand, search (2) can be sophisticated and allows for highly accurate detection, but it takes time, and if the target is wide, it may not be possible to narrow down the search in real time. Therefore, it is advisable to provide a function that allows for the use of either search depending on the location to be located. For example, if the area within building A is relatively small and there are few 3D maps and feature data to compare, search (2) is used to narrow down the search. On the other hand, if the area within building B is relatively large and there are many 3D maps and feature data to compare, search (1) is used to narrow down the search. To determine which search to perform, the control unit 22 can determine whether the user is in building A or building B based on the current location based on sensor information such as GPS signals and Wi-Fi access points, and then switch the search function to operate based on the current location.

　また、マップ絞り込み部２２２は、ＧＰＳ（ＧＮＳＳ）情報、Ｗｉ－Ｆｉ情報、地磁気情報など各種センサ情報から事前に対象とする３Ｄマップ群を絞り込む機能を備えるとよい。さらに、制御部２２は、送られた画像データが対応エリア外と判定した場合、その旨を端末装置１０に送信し、必要な処理を促す機能を備える。対象エリア外か否かの判定は、例えば、センサ情報に基づき、端末装置１０が存在する現在位置の周囲に、位置特定を行う対象の施設等が存在するか否かにより行うとよい。また、対象エリア外か否かの判定は、マップ絞り込み部２２２が検索した結果、３Ｄマップ候補が検出されなかった場合に、対象エリア外と判定するようにしてもよい。 The map narrowing unit 222 may also have a function to narrow down the target 3D map group in advance based on various sensor information such as GPS (GNSS) information, Wi-Fi information, and geomagnetic information. Furthermore, the control unit 22 has a function to, if it determines that the sent image data is outside the target area, send a message to that effect to the terminal device 10 and prompt the terminal device 10 to perform the necessary processing. The determination of whether or not the target area is outside may be made, for example, based on sensor information, by determining whether or not there are any facilities or the like to be located around the current position of the terminal device 10. The determination of whether or not the target area is outside may also be made if no 3D map candidates are detected as a result of a search by the map narrowing unit 222.

　マップ絞り込み部２２２は、抽出した１または複数の３Ｄマップ候補の情報、例えばＩＤをマップマッチング部２２１に送る。そして、マップマッチング部２２１は、３Ｄマップ候補に対してのみマッチング処理を行う。このようにすると、マッチング対象が絞り込まれているため、マップマッチング部２２１は、正確な位置や姿勢（３次元的な向き）を特定することがでる。 The map narrowing down unit 222 sends information about the extracted 3D map candidate or candidates, such as their IDs, to the map matching unit 221. The map matching unit 221 then performs matching processing only on the 3D map candidates. In this way, the matching targets are narrowed down, allowing the map matching unit 221 to identify accurate positions and attitudes (three-dimensional orientations).

　このように、本実施形態の制御部２２は、内容面でのエリアの絞り込み（３Ｄマップの絞り込み）を行った上で、形状の特徴点マッチングを行う。また、マップ絞り込み部２２２は、ベクター検索やＬＬＭも含めて、特定の事物ではなくて、周辺情報も含めた総合的な特徴を、一度をベクターや大規模言語モデルにおける中間表現（これもベクター）に抽象化した上で、総合的に比較推論等することで絞り込みを行うことができる。そして、その後にマップマッチング部２２１が、形状に基づくマッチングを行うことで、精度良く現在位置を特定できる。 In this way, the control unit 22 of this embodiment narrows down the area in terms of content (narrows down the 3D map) and then performs shape feature point matching. Furthermore, the map narrowing down unit 222, including vector search and LLM, can narrow down the area by abstracting comprehensive features including peripheral information rather than specific objects into vectors or intermediate representations in large-scale language models (also vectors), and then performing comprehensive comparative inference, etc. Then, the map matching unit 221 performs shape-based matching, thereby enabling the current location to be identified with high accuracy.

　すなわち、上述したように形状が似ているエリアの３Ｄマップは、特徴量が似ているため、形状の特徴点マッチングのみではその識別を精度良く行うことができない場合がある。一方、そのように形状が似ていても、色や文字などを含めた画像を抽象化すると、その特徴は異なる。例えば、ベクトルの方向は異なるため、両者を弁別できる。 In other words, as mentioned above, 3D maps of areas with similar shapes have similar features, so they may not be able to be accurately identified by matching feature points on shape alone. On the other hand, even if the shapes are similar, when the images are abstracted to include colors, characters, etc., the features differ. For example, the vector directions are different, so the two can be distinguished.

　そこで、マップ絞り込み部２２２が絞り込むことで、異なるエリアの３Ｄマップを排除することができる。一方、抽象化した検索処理は、類似する３Ｄマップなど大まかな存在位置は絞り込むことができるが、正確な位置や姿勢を特定することはできない。そこで、事前に似たような３Ｄマップを排除することで、マップマッチング部２２１は、精度良く位置を特定できる。 The map narrowing unit 222 narrows down the search, eliminating 3D maps of different areas. On the other hand, abstract search processing can narrow down the search to similar 3D maps, etc., but cannot identify the exact location or orientation. Therefore, by eliminating similar 3D maps in advance, the map matching unit 221 can identify the location with high accuracy.

　よって、制御部２２は、反復形状、類似形状の判別が行える。例えば、制御部２２は、３Ｄマップの特徴点マッチングだけでは判別ができないところを判別できる。また、３Ｄマップの特徴点マッチングを行う処理対象の３Ｄマップを削減できるため、マップマッチング部２２１が行う点群マッチング処理の負荷低減や高速化ができ、それに伴い対応エリアを拡大できる。 Therefore, the control unit 22 can distinguish between repetitive shapes and similar shapes. For example, the control unit 22 can distinguish between areas that cannot be distinguished by 3D map feature point matching alone. Furthermore, because the number of 3D maps to be processed for 3D map feature point matching can be reduced, the load on the point cloud matching process performed by the map matching unit 221 can be reduced and the speed can be increased, thereby expanding the coverage area.

　つまり、形状が似たものでも抽象化すると特徴が相違し、似た３Ｄマップも弁別でき、正しい３Ｄマップを候補として抽出できる。よって特定のランドマークや物だけでなくて、マルチモーダルＬＬＭ（大規模言語モデル）や全体の特徴を認識する画像認識処理や機械学習モデルを用いて、総合的に判断を行うことで複雑なエリアの判別が可能となる。 In other words, even objects with similar shapes will have different characteristics when abstracted, making it possible to distinguish between similar 3D maps and extract the correct 3D map as a candidate. Therefore, it is possible to distinguish complex areas by making a comprehensive judgment using not only specific landmarks or objects, but also multimodal LLMs (large-scale language models) and image recognition processing and machine learning models that recognize overall features.

　例えば、３Ｄマップが類似するケースとしては、例えば展示会の場合、図４と図５に示すように、類似したブースのパッケージを使用している場合がある。この２つの図では、入口側に同じようなカウンター５１があり、周囲との仕切り壁５２も似た形状のものを用いている。従って、形状のみに基づいて判断すると、識別できない恐れがある。一方、２つのブースは、壁に貼り付けるポスター等の展示物５３の大きさや、数、設置位置が異なり、社名などを文字情報５４も異なる。そこで、それらの情報を抽象化した特徴データは、異なる。よって、マップ絞り込み部２２２により絞り込みにより、２つのブースを適切に識別ができる。 For example, in the case of an exhibition, similar booth packaging may be used, as shown in Figures 4 and 5. In these two figures, there is a similar counter 51 at the entrance, and the partition wall 52 from the surrounding area is also similar in shape. Therefore, discrimination based on shape alone may not be possible. On the other hand, the two booths differ in the size, number, and installation location of exhibits 53 such as posters pasted on the walls, and the text information 54 such as company names also differs. Therefore, the feature data that abstracts this information is different. Therefore, the two booths can be properly identified by narrowing down the search using the map narrowing unit 222.

　また、例えば商業施設の場合、ショッピングモール内の似たような門構えの店の場合でも、色や店舗名が異なっていたり、あるいは扱っている商品が異なっていたりする時に、マップマッチング部２２１を用いた形状の特徴点マッチングの前段階でマップ絞り込み部２２２による判別を行い、対象となる３Ｄマップ候補を絞り込むことで、適切かつ効率的に形状の特徴点マッチングを行えます。例えば、図６と図７に示す店舗は、直線的な通路に沿って配置され、出入り口が大きく開放された門構えであり、似た形状となっている。しかし、店のロゴ（文字情報）５５や、壁面の配色などが異なるため、それらを加味した特徴データは異なる。よって、マップ絞り込み部２２２による絞り込みにより、２つの店舗を適切に識別ができる。 Furthermore, in the case of a commercial facility, for example, even if two stores in a shopping mall have similar entrances, if they have different colors or store names, or if the products they carry are different, the map narrowing down unit 222 can make a distinction before the map matching unit 221 performs shape feature point matching, narrowing down the target 3D map candidates, allowing for appropriate and efficient shape feature point matching. For example, the stores shown in Figures 6 and 7 are located along a straight corridor, have wide, open entrances, and are similar in shape. However, because the store logos (text information) 55 and color schemes on the walls are different, the feature data that takes these factors into account is different. Therefore, the two stores can be appropriately identified by narrowing down the search results using the map narrowing down unit 222.

　また、図８と図９に示す店舗は、曲線形状と天井の照明５７などの構成や形状が類似するが扱っている商品５８が異なる（例えば、書籍と洋服）。同じような棚であっても、陳列されている商品が違えば、商品の形状・色なども異なり、文字も違う。よって、マップ絞り込み部２２２による絞り込みにより、２つの店舗を適切に識別ができる。 Furthermore, the stores shown in Figures 8 and 9 have similar configurations and shapes, such as curved shapes and ceiling lights 57, but carry different products 58 (for example, books and clothing). Even if the shelves are similar, if the products displayed are different, the shapes and colors of the products will be different, and the text will also be different. Therefore, the two stores can be properly identified by narrowing down the search using the map narrowing down unit 222.

　本実施形態によれば、おおまかな特徴を捉えるのに適した画像認識処理・ベクトル検索と、精度良く位置を検出するのに適した３Ｄマップの特徴マッチングを組み合わせることで、複雑な建物の中等、ＧＰＳでは検出できない位置特定が可能になる。 According to this embodiment, by combining image recognition processing and vector search, which are suitable for capturing general characteristics, with 3D map feature matching, which is suitable for detecting positions with high accuracy, it becomes possible to identify positions that cannot be detected by GPS, such as inside complex buildings.

　本実施形態では、画像とスマートフォンのセンサ情報を活用して屋内での位置特定を行うことができる。そして、端末装置１０は、サーバ２０が特定した位置情報を取得することで、例えば指定されたアイテムや目的地への誘導やＡＲコンテンツ提示を行う機能を備えるとよい。 In this embodiment, indoor location can be determined using images and smartphone sensor information. The terminal device 10 may then acquire the location information determined by the server 20 and have the functionality to, for example, provide guidance to a specified item or destination, or present AR content.

　本実施形態の位置特定システムは、既存のＶＰＳが苦手とする似た形状が存在する屋内であっても精度良く位置を特定することができる。その結果、大型商業施設やイベント会場などで特別な機器なく現在位置を特定し、案内や場所に応じたＡＲコンテンツの提供が可能となる。また、コンテンツや目的地の情報を管理アプリやＡＰＩを通じてリアルタイム更新できるシステムを構築し、小売店舗やイベント会場のような流動的な配置の場所でも利用可能となる。 The positioning system of this embodiment can accurately identify positions even indoors where similar shapes exist, something that existing VPSs struggle with. As a result, it is possible to identify the current location without special equipment in places such as large commercial facilities and event venues, and provide guidance and AR content tailored to the location. Furthermore, a system can be created that can update content and destination information in real time via a management app or API, making it possible to use the system in places with dynamic layouts such as retail stores and event venues.

　［変形例］
　位置特定システムは、以下に示す各種の変形実施が可能となる。図２は、実際の構成の一例を示す図である。この例では、サーバ２０を複数のクライアントサーバシステムで実現している。マップ絞り込み部２２２は、第１クラウドシステム３１に実装する。マップマッチング部２２１は、第２クラウドシステム３２に実装する。この第２クラウドシステム３２は、画像から特徴点を抽出する処理や、特徴点から現在の姿勢を推定する処理を行う。また、３Ｄマップの実体データは、第２クラウドシステム３２に関連したストレージ３３に記憶保持する。さらに３Ｄマップの特徴データと３ＤマップのＩＤとはデータベース３４で結合する。 [Modification]
The positioning system can be modified in various ways as described below. FIG. 2 is a diagram showing an example of an actual configuration. In this example, the server 20 is realized as a multiple client-server system. The map narrowing unit 222 is implemented in the first cloud system 31. The map matching unit 221 is implemented in the second cloud system 32. This second cloud system 32 performs processing to extract feature points from images and to estimate the current posture from the feature points. Furthermore, the entity data of the 3D map is stored and held in storage 33 associated with the second cloud system 32. Furthermore, the feature data of the 3D map and the 3D map ID are linked in a database 34.

　３Ｄマップの絞り込み処理と３Ｄマップとのマッチング処理の一部もしくは全部を、撮影機器のある端末装置１０上で行っても、データセンターや外部のサービスなどの計算資源（これを総称してクラウド）で行ってもよい。 The 3D map narrowing process and the 3D map matching process may be performed in part or in whole on the terminal device 10 containing the imaging equipment, or on computational resources such as a data center or external services (collectively referred to as the cloud).

　また、撮影機器のある端末装置と計算処理を行う端末装置が別に存在し、有線通信もしくは無線通信で接続する構成としてもいい（これを総称して端末とする）。マップの絞り込み処理をクラウドで行い、その結果を端末が取得したのち、再度３Ｄマップとのマッチング処理をクラウドで行ってもよいし、マップの絞り込み処理と３Ｄマップとのマッチング処理の全てをクラウド上で行ってその結果を取得してもよい。また、クラウドの処理も複数のクラウドにまたがって処理を行ってもよい。 Alternatively, a configuration may be adopted in which a terminal device containing the imaging equipment and a terminal device performing the calculation processing exist separately, and are connected via wired or wireless communication (collectively referred to as terminals). The map narrowing process may be performed in the cloud, and the terminal may obtain the results, after which the matching process with the 3D map may be performed again in the cloud, or both the map narrowing process and the matching process with the 3D map may be performed on the cloud, and the results obtained. Cloud processing may also be performed across multiple clouds.

　マップを絞り込む処理は、クラウド上で行っても、端末上で行ってもよい。端末上でマップの絞り込み処理を行うために必要な計算処理や機械学習モデル、特徴データなどを事前にダウンロードしておくことができる。ダウンロードする範囲はＧＰＳやＷｉ－Ｆｉなどのセンサ情報を用いて絞り込むことができる。また、ユーザの操作によっても絞り込みが可能である。 The process of narrowing down the map can be performed on the cloud or on the device. The computational processes, machine learning models, feature data, and other information required for the map narrowing process can be downloaded in advance to the device. The range to be downloaded can be narrowed down using sensor information such as GPS and Wi-Fi. Narrowing down can also be done by user operation.

　画像もしくは画像シークエンスから特徴点を抽出する処理もクラウド上で行っても、端末上で行ってもよい。 The process of extracting feature points from images or image sequences can be performed on the cloud or on the device.

　画像もしくは画像シークエンスから特徴点を抽出する処理は、マップの絞り込みの処理の前に行っても、同時もしくは後に行ってもよい。また、マップ絞り込みと同じ画像を用いて特徴点を抽出してもいい。 The process of extracting feature points from an image or image sequence may be performed before, at the same time as, or after the map refinement process. Feature points may also be extracted using the same images as those used for map refinement.

　３Ｄマップと画像の特徴点とをマッチングする処理はクラウド上で行っても、端末上で行ってもよい。端末上でマッチング処理を行うために３Ｄマップを事前にダウンロードしておくか、マップの絞り込み処理の結果を返す際に取得することができる。 The process of matching the 3D map with the image's feature points can be performed on the cloud or on the device. The 3D map can be downloaded in advance to perform the matching process on the device, or it can be obtained when the results of the map filtering process are returned.

　特徴データと３Ｄマップを保持するデータベースは単一のものではなくて、特定の３Ｄマップの特徴データと３Ｄマップとを対応付けた上で、別々のデータベースやクラウドシステム上で保存されていてもよい。 The database that holds the feature data and 3D maps may not be a single database; the feature data for a specific 3D map may be associated with the 3D map and stored in separate databases or cloud systems.

　最終的な位置の情報を端末に返さず、クラウド上での記録や活用だけに利用することもできる。また、すでに撮影されている画像を用いて、マップの絞り込み処理や３Ｄマップとのマッチング処理を行うことも考えられる。 It is also possible to use the final location information only for recording and utilization on the cloud without returning it to the device. It is also possible to use images that have already been taken to narrow down the map or match it with a 3D map.

　また、上述した各実施形態並びに変形例では、本実施形態では、屋内の所定の領域を示す３Ｄマップは、その領域を複数に分割した各エリアに対して作成したが、屋内の所定の領域の全体に対し単一の３Ｄマップを作成してもよい。この場合、単一の３Ｄマップ内の各位置に対しそれぞれ所定の特徴データを関連付け、特徴データ記憶部に記憶する。マップ絞り込み部は、取得した画像データ等に基づき生成した特徴データと、特徴データ記憶部に記憶された特徴データとに基づき、形状のマッチングを行うべき候補の地点を絞り込む。すなわち、単一の３Ｄマップ内の絞り込んだ地点に対応する３Ｄマップの部分を３Ｄマップの候補として選択する。そして、マップマッチング部は、候補となった単一の３Ｄマップ内の一部の３Ｄマップに対して形状のマッチングを行う。 In addition, in the above-described embodiments and variations, in this embodiment, 3D maps showing a predetermined indoor area are created for each of the areas into which the area is divided, but a single 3D map may be created for the entire predetermined indoor area. In this case, predetermined feature data is associated with each position within the single 3D map and stored in the feature data storage unit. The map narrowing unit narrows down the candidate points for shape matching based on the feature data generated based on the acquired image data, etc., and the feature data stored in the feature data storage unit. In other words, the 3D map portion corresponding to the narrowed-down point within the single 3D map is selected as a 3D map candidate. The map matching unit then performs shape matching on the 3D map portion within the single 3D map candidate.

　このようにすることで、単一の比較的広く類似する形状の３Ｄマップの要素を含む３Ｄマップの全体に対して形状のマッチングを行うと、検索できないような場合であっても、単一の３Ｄマップ内で絞り込んだ範囲で形状のマッチングを行うことができ、精度よく位置を特定することができる。 By doing this, even in cases where shape matching would not be possible for the entire 3D map containing elements of a single, relatively broad, similar shape, shape matching can be performed within a narrowed-down range within a single 3D map, allowing for accurate location identification.

　この場合、単一の３Ｄマップは、上述した実施形態における複数の３Ｄマップの集合体ととらえてもよい。 In this case, a single 3D map may be considered a collection of multiple 3D maps in the above-mentioned embodiments.

　［カメラを搭載する装置］
　上述した実施形態では、端末装置１０は、主にスマートフォンを用いた例を説明したが、実施例で例示列挙したスマートグラスなど、カメラを備えた端末とするとよい。また、端末は、スマートフォンやスマートグラスのように人が持ち歩くものに限ることはなく、自ら移動する移動体としてもよい。移動体は、自動車やドローンといったモビリティや、ロボット等から構成するとよい。ドローンやロボットなどの移動体へ適用は、例えば、複数階の建物内で、今何階にいるか、そのフロアのどこにいるかなどを認識し、自動走行などに利用できるとよい。また自動車の場合、例えば地下駐車場などにおいて現在位置を認識し自動走行に利用できる。 [Device equipped with a camera]
In the above-described embodiment, the terminal device 10 is mainly a smartphone. However, it may be a terminal equipped with a camera, such as smart glasses exemplified in the examples. Furthermore, the terminal is not limited to devices carried by people, such as smartphones and smart glasses, but may also be a mobile object that moves on its own. The mobile object may be a mobility such as an automobile or drone, or a robot. For example, application to a mobile object such as a drone or robot may be possible by recognizing the current floor and the location within a multi-story building, and using the information for autonomous driving. Furthermore, in the case of an automobile, it may be possible to recognize the current location in an underground parking lot, for example, and use the information for autonomous driving.

　端末は、表示部分を備えていることは必須ではなく、単に撮影機器を備えるものでもよい。 The terminal does not necessarily have to have a display; it may simply have a camera.

　また、位置特定システムは、現在リアルタイムで撮影して取得した画像に基づいて現在の位置を特定するものに限ることはない。例えば、位置特定システムは、すでに撮影されている画像を用いて一連の処理を行い、処理対象の画像が撮影された位置の推定を行うものに適用してもよい。 Furthermore, the location identification system is not limited to one that identifies the current location based on images captured in real time. For example, the location identification system may be applied to one that performs a series of processes using images that have already been captured, and estimates the location where the image being processed was captured.

　以上、本発明の様々な側面を実施形態並びに変形例を用いて説明してきたが、これらの実施形態や説明は、本発明の範囲を制限する目的でなされたものではなく、本発明の理解に資するために提供されたものであることを付言しておく。本発明の範囲は、明細書に明示的に説明された構成や製法に限定されるものではなく、本明細書に開示される本発明の様々な側面の組み合わせをも、その範囲に含むものである。本発明のうち、特許を受けようとする構成を、添付の特許請求の範囲に特定したが、現在の処は特許請求の範囲に特定されていない構成であっても、本明細書に開示される構成を、将来的に特許請求する可能性があることを、念のために申し述べる。 Although various aspects of the present invention have been described above using embodiments and modifications, it should be noted that these embodiments and descriptions are not intended to limit the scope of the present invention, but are provided to aid in understanding the present invention. The scope of the present invention is not limited to the configurations and manufacturing methods explicitly described in the specification, but also includes combinations of the various aspects of the present invention disclosed herein. While the configurations of the present invention that are sought to be patented are specified in the appended claims, it is hereby emphasized that configurations not currently specified in the claims may be claimed in the future as disclosed herein.

１　　　：位置特定システム
２　　　：ネットワーク
１０　　：端末装置
１１　　：制御部
１２　　：カメラ
１３　　：入力部
１４　　：通信部
１５　　：表示部
１６　　：ＧＰＳ受信部
２０　　：サーバ
２１　　：通信部
２２　　：制御部
２３　　：データベース
３１　　：第１クラウドシステム
３２　　：第２クラウドシステム
３３　　：ストレージ
３４　　：データベース
２２１　：マップマッチング部
２２２　：マップ絞り込み部
２３１　：３Ｄマップ記憶部
２３２　：特徴データ記憶部 1: Positioning system 2: Network 10: Terminal device 11: Control unit 12: Camera 13: Input unit 14: Communication unit 15: Display unit 16: GPS receiving unit 20: Server 21: Communication unit 22: Control unit 23: Database 31: First cloud system 32: Second cloud system 33: Storage 34: Database 221: Map matching unit 222: Map narrowing unit 231: 3D map storage unit 232: Feature data storage unit

Claims

A location identification system for identifying a location where an acquired image was taken,
a map narrowing function that narrows down candidates of 3D maps to be searched using feature data abstracted based on the image;
a map matching function that performs shape matching of the image with the narrowed-down candidates of the 3D map and identifies the location where the image was taken;
A location system comprising:

The location identification system according to claim 1 , wherein the map narrowing function performs the abstraction using a large-scale language model with context information of the image and the 3D map as input, thereby narrowing down the candidates for the 3D map.

The location identification system described in claim 1, wherein the map refinement function creates the feature data by vectorizing information based on the image.

The location identification system described in claim 3, wherein the map refinement function creates the feature data by vectorizing text information related to the image.

The location identification system described in claim 1, wherein the abstraction performed by the map refinement function is performed using the image and context information related to the image.

The location identification system described in claim 1, wherein the map narrowing function is performed using location information detected by the device that captured the image.

The location identification system of claim 1 includes a database that stores feature data to be compared and 3D maps and that is accessible by the map narrowing function and the map matching function.

equipped with a camera and a terminal with communication capabilities,
The position specifying system according to claim 1 , wherein the terminal has a function of transmitting, using the communication function, an image captured by the camera to a server having the map narrowing function.

A computer-readable recording medium having recorded thereon a program for causing a computer to implement the functions of the location identification system described in any one of claims 1 to 8.