KR20130142191A

KR20130142191A - Robust feature matching for visual search

Info

Publication number: KR20130142191A
Application number: KR1020137030249A
Authority: KR
Inventors: 선딥 바드다디; 오누르 씨. 함시치; 유리 레즈닉; 존 에이치. 홍; 종 유. 이
Original assignee: 퀄컴 인코포레이티드
Priority date: 2011-04-14
Filing date: 2012-04-13
Publication date: 2013-12-27
Also published as: US20120263388A1; JP2014512057A; WO2012142483A1; JP5749394B2; CN103582884A; US9036925B2; EP2697725A1

Abstract

비주얼 탐색을 위한 강건한 특징 매칭을 수행하기 위한 기법들이 기재된다. 인터페이스 및 특징 매칭 유닛을 포함하는 장치가 이들 기법들을 구현한다. 인터페이스는 질의 특징 기술자를 수신한다. 특징 매칭 유닛은 이어서 질의 특징 기술자와 기준 특징 기술자들 간의 거리를 컴퓨팅하고 클러스터링 알고리즘에 따라 컴퓨팅된 거리들의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 결정한다. 특징 매칭 유닛은 이어서 컴퓨팅된 거리들의 결정된 제 1 그룹 및 제 2 그룹에 기초하여 질의 특징 기술자가 컴퓨팅된 거리들 중 최소의 것에 연관되는 기준 특징 기술자들 중 하나에 매칭하는지를 결정한다. Techniques for performing robust feature matching for visual search are described. An apparatus including an interface and feature matching unit implements these techniques. The interface receives a query feature descriptor. The feature matching unit then computes the distance between the query feature descriptor and the reference feature descriptors and determines a first group of computed distances and a second group of computed distances according to a clustering algorithm. The feature matching unit then determines whether the query feature descriptor matches one of the reference feature descriptors associated with the least of the computed distances based on the determined first and second groups of computed distances.

Description

Robust Feature Matching for Visual Search {ROBUST FEATURE MATCHING FOR VISUAL SEARCH}

[0001] 본 개시는 영상 프로세싱 시스템들에 관한 것으로서, 보다 구체적으로는, 영상 프로세싱 시스템들을 통해 비주얼 탐색들을 수행하는 것에 관한 것이다. TECHNICAL FIELD The present disclosure relates to image processing systems, and more particularly, to performing visual searches through image processing systems.

[0002] 컴퓨팅 디바이스들 또는 컴퓨터들의 맥락에서 비주얼 탐색(visual search)은 컴퓨터 또는 다른 디바이스가 하나 이상의 영상들 내의 다른 객체들 및/또는 특징들 사이에서 객체들 및/또는 특징들에 대한 탐색을 수행하는 것을 가능하게 하는 기법들을 지칭한다. 비주얼 탐색에 있어서의 최근의 관심은 영상 스케일의 변경들, 노이즈, 조명 및 로컬 지오메트리 왜곡을 포함하는 매우 다양한 변하는 영상 조건들에서 부분적으로 차폐된 객체들 및/도는 특징들을 컴퓨터가 식별하는 것을 가능하게 하는 알고리즘들을 발생시켰다. 이 동일한 시간 동안, 텍스트를 입력하거나 모바일 디바이스들과 다른 방식으로 인터페이싱하기 위한 제한된 사용자 인터페이스들을 가질 수 있지만 카메라들을 피처링(feature)하는 모바일 디바이스들이 출현하였다. 모바일 디바이스들 및 모바일 디바이스 애플리케이션들의 개발자들은 모바일 디바이스와의 사용자 상호작용을 강화하기 위해 모바일 디바이스의 카메라를 활용하고자 하였다. Visual search in the context of computing devices or computers allows a computer or other device to search for objects and / or features among other objects and / or features in one or more images. Refers to techniques that make it possible to do so. Recent interest in visual search has enabled the computer to identify partially shielded objects and / or features in a wide variety of changing imaging conditions, including changes in image scale, noise, lighting and local geometry distortion. Generated algorithms. During this same time, mobile devices have emerged that feature limited camera interfaces for entering text or otherwise interfacing with mobile devices. Developers of mobile devices and mobile device applications have sought to leverage the camera of a mobile device to enhance user interaction with the mobile device.

[0003] 일 강화를 예시하기 위해, 모바일 디바이스의 사용자는 상점에서 쇼핑하는 동안 임의의 정해진 물건의 영상을 캡처하도록 모바일 디바이스의 카메라를 활용할 수 있다. 모바일 디바이스는 이어서 매칭하는 기준 영상에 기초하여 영상("탐색 영상"으로서 지칭될 수 있음)에서 도시되는 물건을 식별하도록 다양한 기준 영상들에 대한 아카이빙된 특징 기술자들(archived feature descriptors)의 세트 내에서 비주얼 탐색 알고리즘을 시작할 수 있다. 물건을 식별한 이후, 모바일 디바이스는 이어서 인터넷의 탐색을 시작하고 물건이 근처의 상인들 및/또는 온라인 상인들로부터 이용 가능한 최저 가격을 포함해서 식별된 물건에 관한 정보를 포함하는 웹페이지를 제시할 수 있다. 이러한 방식으로, 사용자는 키보드(사용자가 인터페이스하는 영상으로서 터치 스크린 상에 그것이 제시된다는 점에서 종종 "가상" 키보드임) 또는 다른 입력 매커니즘을 통해 모바일 디바이스와 인터페이스해야만 할 필요성을 방지할 수 있지만, 비주얼 탐색 및 후속 웹페이지 탐색들을 개시하기 위해 탐색 영상을 단지 캡처할 수 있다. To illustrate one enhancement, a user of a mobile device may utilize the camera of the mobile device to capture an image of any given item while shopping in a store. The mobile device is then within a set of archived feature descriptors for the various reference images to identify the object shown in the image (which may be referred to as a "navigation image") based on the matching reference image. You can start the visual search algorithm. After identifying the object, the mobile device may then begin searching the Internet and present a webpage that contains information about the identified object, including the lowest price available from nearby merchants and / or online merchants. Can be. In this way, the user can avoid the need to interface with the mobile device through a keyboard (which is often a "virtual" keyboard in that it is presented on a touch screen as the image the user interfaces) or other input mechanism, but visual The search image may only be captured to initiate the search and subsequent webpage searches.

[0004] 비주얼 탐색에 대한 액세스 및 카메라가 장착된 모바일 디바이스가 이용할 수 있는 다수의 애플리케이션들이 있지만, SIFT(scale invariant feature transform) 알고리즘과 같은 비주얼 탐색을 구현하기 위한 비주얼 탐색 알고리즘들은 특징 매칭을 수행하는 견지에서 부족할 수 있다. 특징 매칭은 탐색 영상으로부터 추출된 탐색 특징 기술자들이 기준 영상들로부터 추출된 기준 특징 기술자들에 대해 매칭되는 비주얼 탐색 알고리즘의 양상을 지칭한다. Although there are many applications that can be used by mobile devices equipped with cameras and access to visual navigation, visual search algorithms for implementing visual search, such as a scale invariant feature transform (SIFT) algorithm, perform feature matching. May be lacking in terms of Feature matching refers to an aspect of a visual search algorithm in which search feature descriptors extracted from a search image are matched against reference feature descriptors extracted from reference images.

[0005] 이러한 부족들을 예시하기 위해, 탐색 특징 기술자 및 기준 특징 기술자들이 각각 빌딩을 가로질러 반복하는 특유의 아치들 또는 창문들과 같이 탐색 및 기준 영상들의 반복하는 특징으로부터 추출될 때의 인스턴스들에서, 그렇지 않았다면 탐색 특징 기술자에 매칭했었을 기준 특징 기술자를 폐기할 수 있는 SIFT 알고리즘을 고려한다. 또한, SIFT 알고리즘은 공통적으로 임의의 정해진 비주얼 탐색에 응답하여 단일의 영상만을 리턴하며, 여기서 이러한 리턴된 영상은 SIFT 알고리즘에 의해 알고리즘적으로 "최상의 매칭"인 것으로 결정된다. 그러나 사용자들은 단일의 SIFT 최상의 매칭 결과가 사용자의 기대에 매칭하지 않을 수 있기 때문에 사용자를 좌절시킬 수 있는 SIFT 알고리즘과 동일한 방식으로 "최상의 매칭"를 무엇이 구성하는지를 결정할 수 없을 수 있다. To illustrate these deficiencies, in the instances when the search feature descriptor and the reference feature descriptors are extracted from the repeating feature of the search and reference images, such as the distinct arches or windows that repeat across the building, respectively. Consider an SIFT algorithm that can discard reference feature descriptors that would otherwise have matched the search feature descriptors. In addition, the SIFT algorithm commonly returns only a single image in response to any given visual search, where this returned image is determined to be "best match" algorithmically by the SIFT algorithm. However, users may not be able to determine what constitutes "best match" in the same way as a SIFT algorithm that can be frustrating because a single SIFT best match result may not match the user's expectations.

[0006] 일반적으로, 본 개시는 비주얼 탐색을 수행할 때 특징 매칭을 용이하게 하는 기법들을 기술한다. 이 기법들은 탐색 결과들의 랭크된 리스트들을 제공할 수 있는 특징 매칭 알고리즘을 제공함으로써 특징 매칭을 개선할 수 있다. 기법들은 반복 특징들을 수용하는 특징 매칭 알고리즘을 제공함으로써 강건성을 개선할 수 있다. 현재 탐색 특징 기술자와 매칭하는 2개 이상의 기준 특징들이 서로 근접하게 위치될 때 기준 특징 기술자들을 거절하기 보단 오히려, 예를 들어, SIFT 알고리즘에 의해 제공되는 바와 같이(2개 이상의 매칭 기준 특징 기술자들이 너무 가까운 경우, 이들 매칭 기준 특징 기술자들은 고유할 가능성이 적고 이에 따라 탐색 영상의 분류를 용이하게 할 가능성이 적다는 전제 하에서), 기법들은 특징 매칭을 수행할 때 고유성을 보다 적절히 결정하기 위해 클러스터링 알고리즘들을 활용할 수 있다. 또한, 기법들은 종래의 비주얼 탐색 알고리즘들에서 공통적인 바와 같이 단일의 기준 영상을 단순히 리턴하기 보단 오히려 매칭 기준 영상들의 랭크된 리스트의 생성 및 리턴을 용이하게 할 수 있다. 매칭 기준 영상들의 랭크된 리스트는 최상의 매칭을 구성하는 것의 알고리즘 결정을 수용하도록 강제되기 보단 오히려, "최상의 매칭"으로서 사용자가 고려하는 것을 선택하는 기뢰를 사용자에게 제공할 수 있다. In general, the present disclosure describes techniques for facilitating feature matching when performing a visual search. These techniques can improve feature matching by providing a feature matching algorithm that can provide ranked lists of search results. The techniques can improve robustness by providing a feature matching algorithm that accommodates iterative features. Rather than rejecting the reference feature descriptors when two or more reference features matching the current search feature descriptor are located in close proximity to each other, for example, as provided by the SIFT algorithm (two or more matching reference feature descriptors are too large). In the near term, under the premise that these matching criteria feature descriptors are less likely to be unique, and thus less likely to facilitate classification of search images), the techniques employ clustering algorithms to more appropriately determine uniqueness when performing feature matching. It can be utilized. In addition, the techniques may facilitate generation and return of a ranked list of matching reference images rather than simply returning a single reference image as is common in conventional visual search algorithms. Rather than being forced to accept an algorithmic decision of what constitutes the best match, the ranked list of matching reference images may provide the user with a mine to choose what the user considers as “best match”.

[0007] 일 예에서, 비주얼 탐색 디바이스를 통해 비주얼 탐색을 수행하기 위한 방법으로서, 이 방법은 상기 비주얼 탐색 디바이스를 통해, 비주얼 탐색 질의에 의해 제공된 질의 특징 기술자(query feature descriptor)와 복수의 기준 특징 기술자들 각각 간의 거리를 컴퓨팅하는 단계를 포함하고, 상기 비주얼 탐색 질의는 상기 비주얼 탐색을 개시한다. 이 방법은 또한 상기 비주얼 탐색 디바이스를 통해, 클러스터링 알고리즘(clustering algorithm)에 따라 컴퓨팅된 거리들의 하나 이상의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 결정하는 단계를 포함하고 상기 컴퓨팅된 거리들의 제 1 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자에 대해 근처에 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함하고 컴퓨팅된 거리들의 제 2 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자로부터 떨어져 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함한다. 이 방법은 추가로 상기 비주얼 탐색 디바이스를 통해, 상기 컴퓨팅된 거리들의 결정된 제 1 그룹 및 상기 컴퓨팅된 거리들의 제 2 그룹에 기초하여 상기 질의 특징 기술자가 상기 컴퓨팅된 거리들 중 최소의 것과 연관되는 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정하는 단계를 포함한다. In one example, a method for performing a visual search via a visual search device, the method comprising, via the visual search device, a query feature descriptor provided by a visual search query and a plurality of reference features. Computing a distance between each of the descriptors, wherein the visual search query initiates the visual search. The method also includes determining, via the visual navigation device, one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm. The group includes those of computed distances indicating that the associated of the plurality of reference feature descriptors is near to the query feature descriptor relative to those of the computed distances determined to be within the second group of computed distances. The second group of distances includes those of computed distances indicating that the associated of the plurality of reference feature descriptors is away from the query feature descriptor relative to those of the computed distances determined to be within the first group of computed distances. . The method further includes, via the visual navigation device, a plurality of query feature descriptors associated with the least of the computed distances based on the determined first group of computed distances and the second group of computed distances. And determining whether to match one of the reference feature descriptors of.

[0008] 다른 예로서, 비주얼 탐색을 수행하기 위한 장치는 비주얼 탐색 질의에 의해 제공된 질의 특징 기술자(query feature descriptor)와 복수의 기준 특징 기술자들 각각 간의 거리를 컴퓨팅하기 위한 수단을 포함하며, 상기 비주얼 탐색 질의는 상기 비주얼 탐색을 개시한다. 이 장치는 또한 클러스터링 알고리즘(clustering algorithm)에 따라 컴퓨팅된 거리들의 하나 이상의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 결정하기 위한 수단을 포함하고 상기 컴퓨팅된 거리들의 제 1 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자에 대해 근처에 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함하고 컴퓨팅된 거리들의 제 2 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자로부터 떨어져 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함한다. 이 장치는 추가로 상기 컴퓨팅된 거리들의 결정된 제 1 그룹 및 상기 컴퓨팅된 거리들의 제 2 그룹에 기초하여 상기 질의 특징 기술자가 상기 컴퓨팅된 거리들 중 최소의 것과 연관되는 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정하는 단계를 포함한다. As another example, an apparatus for performing a visual search includes means for computing a distance between each of a plurality of reference feature descriptors and a query feature descriptor provided by a visual search query, wherein the visual The search query initiates the visual search. The apparatus also includes means for determining one or more first groups of computed distances and a second group of computed distances in accordance with a clustering algorithm wherein the first group of computed distances is determined by the plurality of criteria. The second group of computed distances includes those of computed distances indicating that an association of feature descriptors is near to the query feature descriptor relative to those of the computed distances determined to be within the second group of computed distances. The associated of the plurality of reference feature descriptors includes those of computed distances indicating that the query feature descriptor is away from those of the computed distances determined to be within the first group of computed distances. The apparatus further includes one of a plurality of reference feature descriptors in which the query feature descriptor is associated with the least of the computed distances based on the determined first group of computed distances and the second group of computed distances. Determining whether to match.

[0009] 다른 예에서, 비주얼 탐색을 수행하기 위한 장치는 질의 특징 기술자를 수신하는 인터페이스, 및 특징 매칭 유닛을 포함하고, 상기 특징 매칭 유닛은 비주얼 탐색 질의에 의해 제공된 질의 특징 기술자(query feature descriptor)와 복수의 기준 특징 기술자들 각각 간의 거리를 컴퓨팅하도록 - 상기 비주얼 탐색 질의는 상기 비주얼 탐색을 개시함 - ; 및 클러스터링 알고리즘(clustering algorithm)에 따라 컴퓨팅된 거리들의 하나 이상의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 결정하도록; 상기 컴퓨팅된 거리들의 제 1 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자에 대해 근처에 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함하고 컴퓨팅된 거리들의 제 2 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자로부터 떨어져 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함함 - ; 및 상기 컴퓨팅된 거리들의 결정된 제 1 그룹 및 상기 컴퓨팅된 거리들의 제 2 그룹에 기초하여 상기 질의 특징 기술자가 상기 컴퓨팅된 거리들 중 최소의 것과 연관되는 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정하도록 구성된다. In another example, an apparatus for performing a visual search includes an interface to receive a query feature descriptor, and a feature matching unit, wherein the feature matching unit is a query feature descriptor provided by a visual search query. And compute a distance between each of a plurality of reference feature descriptors, wherein the visual search query initiates the visual search; And determine one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm; The first group of computed distances is computed to indicate that an associated of the plurality of reference feature descriptors is near to the query feature descriptor as compared to those of the computed distances determined to be within the second group of computed distances. A second group of computed distances, including those of distances, indicating that an associated of the plurality of reference feature descriptors is away from the query feature descriptor relative to those of the computed distances determined to be within the first group of computed distances. Includes those of computed distances; And based on the determined first group of computed distances and the second group of computed distances, determine whether the query feature descriptor matches one of a plurality of reference feature descriptors associated with the least of the computed distances. It is configured to.

[0010] 다른 예에서, 컴퓨터-판독 가능한 매체는 명령을 포함하고, 상기 명령은 실행될 때, 하나 이상의 프로세서들로 하여금, 비주얼 탐색 질의에 의해 제공된 질의 특징 기술자(query feature descriptor)와 복수의 기준 특징 기술자들 각각 간의 거리를 컴퓨팅하게 하고 - 상기 비주얼 탐색 질의는 상기 비주얼 탐색을 개시함 - ; 및 클러스터링 알고리즘(clustering algorithm)에 따라 컴퓨팅된 거리들의 하나 이상의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 결정하게 하고 - 상기 컴퓨팅된 거리들의 제 1 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자에 대해 근처에 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함하고 컴퓨팅된 거리들의 제 2 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자로부터 떨어져 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함함 - ; 및 상기 컴퓨팅된 거리들의 결정된 제 1 그룹 및 상기 컴퓨팅된 거리들의 제 2 그룹에 기초하여 상기 질의 특징 기술자가 상기 컴퓨팅된 거리들 중 최소의 것과 연관되는 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정하게 한다. In another example, a computer-readable medium includes instructions that, when executed, cause the one or more processors to execute a query feature descriptor and a plurality of reference features provided by a visual search query. Compute a distance between each of the descriptors, wherein the visual search query initiates the visual search; And determine one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm, wherein the first group of computed distances is associated with the plurality of reference feature descriptors. And those of computed distances indicating that the query feature descriptor is near relative to those of the computed distances determined to be within the second group of computed distances, wherein the second group of computed distances is the plurality of reference feature descriptors. Includes those of computed distances indicating that the association of the distance from the query feature descriptor is relative to those of the computed distances determined to be within the first group of computed distances; And based on the determined first group of computed distances and the second group of computed distances, determine whether the query feature descriptor matches one of a plurality of reference feature descriptors associated with the least of the computed distances. Let's do it.

[0011] 다른 예로서, 시스템은 비주얼 탐색을 개시하기 위한 탐색 질의에 의해 질의 특징 기술자를 전송하는 클라이언트 디바이스, 복수의 기준 질의 기술자를 저장하는 데 및 비주얼 탐색을 수행하는 비주얼 탐색 서버 디바이스를 포함한다. 비주얼 탐색 서버는 탐색 질의에 의해 질의 특징 기술자를 수신하는 인터페이스 및 특징 매칭 유닛을 포함하고, 상기 특징 매칭 유닛은 질의 특징 기술자와 상기 복수의 기준 특징 기술자들 각각 간의 거리를 컴퓨팅하고; 클러스터링 알고리즘(clustering algorithm)에 따라 컴퓨팅된 거리들의 하나 이상의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 결정하고 - 상기 컴퓨팅된 거리들의 제 1 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자에 대해 근처에 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함하고 컴퓨팅된 거리들의 제 2 그룹은 상기 복수의 기준 특징 기술자들의 연관된 것이 상기 컴퓨팅된 거리들의 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 비해 상기 질의 특징 기술자로부터 떨어져 있다고 표시하는 컴퓨팅된 거리들의 것들을 포함함 - ; 및 상기 컴퓨팅된 거리들의 결정된 제 1 그룹 및 상기 컴퓨팅된 거리들의 제 2 그룹에 기초하여 상기 질의 특징 기술자가 상기 컴퓨팅된 거리들 중 최소의 것과 연관되는 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정하는 인터페이스를 포함한다. As another example, a system includes a client device that sends a query feature descriptor by a search query to initiate a visual search, a plurality of reference query descriptors, and a visual search server device that performs the visual search. . The visual search server includes an interface and a feature matching unit to receive a query feature descriptor by a search query, the feature matching unit computing a distance between a query feature descriptor and each of the plurality of reference feature descriptors; Determine one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm, wherein the first group of computed distances is associated with the plurality of reference feature descriptors. Including those of computed distances indicating that the query feature descriptor is near relative to those of the computed distances determined to be within a second group of distances, and wherein the second group of computed distances are associated with the plurality of reference feature descriptors. Includes those of computed distances indicating that the one is farther from the query feature descriptor compared to those of the computed distances determined to be within the first group of computed distances; And based on the determined first group of computed distances and the second group of computed distances, determine whether the query feature descriptor matches one of a plurality of reference feature descriptors associated with the least of the computed distances. Contains the interface to

[0012] 하나 이상의 예들의 상세들은 아래의 첨부 도면 및 설명에서 기술된다. 다른 특징들, 목적들 및 이점들은 설명 및 도면 및 청구범위로부터 자명하게 될 것이다.
Details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

도 1은 본 개시에서 기술되는 강건한 특징 기술자 매칭 기법들을 구현하는 영상 프로세싱 시스템을 예시하는 블록도.
도 2는 도 1의 특징 매칭 유닛을 보다 상세히 예시하는 블록도.
도 3a 및 도 3b는 본 개시에서 기술되는 특징 매칭 기법들을 구현하는데 있어 비주얼 탐색 서버의 예시적인 동작을 예시하는 흐름도.
도 4는 키포인트 추출을 수행하는데 이용하기 위한 DoG(difference of Gaussian) 피라미드를 특징 추출 유닛이 결정하는 프로세스를 예시하는 다이어그램.
도 5는 DoG(difference of Gaussian) 피라미드를 결정한 이후 키포인트의 검출을 예시하는 다이어그램.
도 6은 특징 추출 유닛이 그라디언트 분포 및 방위 히스토그램을 결정하는 프로세스를 예시하는 다이어그램.
도 7은 특징 추출 유닛이 그라디언트 분포 및 방위 히스토그램을 결정하는 프로세스를 예시하는 다이어그램.1 is a block diagram illustrating an image processing system implementing the robust feature descriptor matching techniques described in this disclosure.
2 is a block diagram illustrating the feature matching unit of FIG. 1 in more detail.
3A and 3B are flow diagrams illustrating exemplary operation of a visual search server in implementing the feature matching techniques described in this disclosure.
4 is a diagram illustrating a process by which a feature extraction unit determines a difference of Gaussian (DoG) pyramid for use in performing keypoint extraction.
5 is a diagram illustrating detection of a keypoint after determining a difference of Gaussian (DoG) pyramid.
6 is a diagram illustrating a process by which a feature extraction unit determines a gradient distribution and an orientation histogram.
7 is a diagram illustrating a process by which a feature extraction unit determines a gradient distribution and an orientation histogram.

[0020] 도 1은 본 개시에서 기술되는 강건한 특징 매칭 기법들을 구현하는 영상 프로세싱 시스템(10)을 예시하는 블록도이다. 도 1의 예에서, 영상 프로세싱 시스템(10)은 클라이언트 디바이스(12), 비주얼 탐색 서버(14) 및 네트워크(16)를 포함한다. 클라이언트 디바이스(12)는 이 예에서, 랩톱, 이른바 넷북, 개인 휴대 정보 단말(PDA), 셀룰러 또는 모바일 전화 또는 핸드셋(이른바 "스마트폰들"로 불림), GPS(global positioning system) 디바이스, 디지털 카메라, 디지털 미디어 재생기, 게임 디바이스, 또는 비주얼 탐색 서버(14)와 통신할 수 있는 임의의 다른 모바일 디바이스와 같은 모바일 디바이스를 이 예에서 표현한다. 본 개시에서 비주얼 탐색 서버(14)에 대해 기술되지만, 본 개시에서 기술되는 기법들은 비주얼 탐색 서버들로 제한되어선 안 된다. 대신, 기법들은 로컬 특징-기반 비주얼 탐색 알고리즘들의 특징 매칭 양상들을 구현할 수 있는 임의의 디바이스에 의해 구현될 수 있다. FIG. 1 is a block diagram illustrating an image processing system 10 implementing the robust feature matching techniques described in this disclosure. In the example of FIG. 1, image processing system 10 includes client device 12, visual search server 14, and network 16. The client device 12 may in this example be a laptop, a so-called netbook, a personal digital assistant (PDA), a cellular or mobile phone or handset (called "smartphones"), a global positioning system (GPS) device, a digital camera. , A mobile device such as a digital media player, a game device, or any other mobile device capable of communicating with the visual search server 14 is represented in this example. Although described with respect to visual search server 14 in this disclosure, the techniques described in this disclosure should not be limited to visual search servers. Instead, the techniques may be implemented by any device capable of implementing feature matching aspects of local feature-based visual search algorithms.

[0021] 비주얼 탐색 서버(14)는 TCP(transmission control protocol) 접속의 형태로 통상적으로 접속들을 수락하는 서버 디바이스를 표현하고 질의 데이터를 수신하고 식별 데이터를 제공하는 근거가 되는 TCP 세션을 형성하도록 그의 자신의 TCP 접속에 응답한다. 비주얼 탐색 서버(14)는 하나 이상의 기준 영상들 내의 하나 이상의 특징들 또는 객체들을 식별하기 위해 로컬 특징-기반 비주얼 탐색 알고리즘을 수행하거나 다른 방식으로 구현한다는 점에서, 비주얼 탐색 서버(14)는 비주얼 탐색 서버 디바이스를 표현할 수 있다. The visual search server 14 typically represents a server device that accepts connections in the form of a transmission control protocol (TCP) connection and establishes a TCP session upon which it receives a query data and provides identification data. Respond to its own TCP connection. The visual search server 14 performs or otherwise implements a local feature-based visual search algorithm to identify one or more features or objects in one or more reference images. Represents a server device.

[0022] 네트워크(16)는 인터넷과 같이, 클라이언트 디바이스(12)와 비주얼 탐색 서버(14)를 상호접속하는 공개 네트워크를 표현하지만, 네트워크(16)는 또한 사설 네트워크일 수 있다. 흔히, 네트워크(16)는 클라이언트 디바이스(12)와 비주얼 탐색 서버(14) 간에 데이터 또는 통신의 전달을 용이하게 하기 위한 OSI(open system interconnection) 모델의 다양한 층들을 구현한다. 네트워크(16)는 통상적으로 클라이언트 디바이스(12)와 비주얼 탐색 서버(14) 간에 데이터의 전달을 가능하게 하기 위해 스위치들, 허브들, 라우터들, 서버들과 같은 임의의 수의 네트워크 디바이스들을 포함한다. 단일의 네트워크로서 도시되지만, 네트워크(16)는 네트워크(16)를 형성하기 위해 상호접속되는 하나 이상의 서브-네트워크들을 포함할 수 있다. 이들 서브-네트워크들은 서비스 제공자 네트워크들, 액세스 네트워크들, 백엔드 네트워크들 또는 네트워크(16) 전체에 걸쳐서 데이터의 전달을 제공하기 위해 공개 네트워크에서 흔히 이용되는 임의의 다른 타입의 네트워크를 포함할 수 있다. 이 예에서 공개 네트워크로서 기술되지만, 네트워크(16)는 대중에 의해 일반적으로 액세스 가능하지 않은 사설 네트워크를 포함할 수 있다. Network 16 represents a public network that interconnects client device 12 and visual search server 14, such as the Internet, but network 16 may also be a private network. Often, the network 16 implements various layers of an open system interconnection (OSI) model to facilitate the transfer of data or communications between the client device 12 and the visual search server 14. Network 16 typically includes any number of network devices such as switches, hubs, routers, servers to enable the transfer of data between client device 12 and visual search server 14. . Although shown as a single network, network 16 may include one or more sub-networks that are interconnected to form network 16. These sub-networks may include service provider networks, access networks, back end networks, or any other type of network commonly used in public networks to provide for the transfer of data throughout network 16. Although described as a public network in this example, network 16 may comprise a private network that is not generally accessible by the public.

[0023] 도 1의 예에서 도시된 바와 같이, 클라이언트 디바이스(12)는 특징 추출 유닛(18), 특징 압축 유닛(20), 인터페이스(22) 및 디스플레이(24)를 포함한다. 특징 추출 유닛(18)은 스케일 불변 특징 변형 알고리즘 또는 특징들의 추출을 제공하는 임의의 다른 특징 기술 추출 알고리즘과 같은 특징 추출 알고리즘에 따라 특징 추출을 수행하는 유닛을 표현한다. 일반적으로, 특징 추출 유닛(18)은 클라이언트 디바이스(12) 내에 포함되는 카메라 또는 다른 영상 캡처 디바이스(도 1의 예에서 도시되지 않음)를 이용하여 로컬적으로 캡처될 수 있는 영상 데이터(26) 상에서 동작한다. 대안적으로 클라이언트 디바이스(12)는 국부적으로 다른 컴퓨팅 디바이스와의 유선 접속을 통해 또는 통신의 임의의 다른 유성 또는 무성 형태를 통해 네트워크(16)로부터 이러한 영상 데이터(26)를 다운로딩함으로써 그 영상 데이터를 스스로 캡처하지 않고 영상 데이터(26)를 저장할 수 있다. As shown in the example of FIG. 1, the client device 12 includes a feature extraction unit 18, a feature compression unit 20, an interface 22, and a display 24. Feature extraction unit 18 represents a unit that performs feature extraction in accordance with a feature extraction algorithm, such as a scale invariant feature modification algorithm or any other feature description extraction algorithm that provides for the extraction of features. In general, feature extraction unit 18 is on image data 26 that can be captured locally using a camera or other image capture device (not shown in the example of FIG. 1) included within client device 12. It works. Alternatively, client device 12 may download such image data 26 from network 16 via a wired connection locally with another computing device or through any other voiced or unvoiced form of communication. Can store the image data 26 without capturing by itself.

[0024] 아래에서 보다 상세히 기술되지만, 특징 추출 유닛(18)은 요약하면, 2개의 연속적인 가우시안-블러링된 영상들(Gaussian-blurred images)을 생성하기 위해 영상 데이터(26)를 가우시안 블러링함으로써 특징 기술자(28)를 추출할 수 있다. 가우시안 블러링은 일반적으로 정의된 스케일에서 가우시안 블러 함수로 영상 데이터(26)를 컨볼브(convolve)하는 것을 포함한다. 특징 추출 유닛(18)은 영상 데이터(26)를 증분적으로 컨볼브할 수 있고, 여기서 결과적인 가우시안-블러링된 영상들은 스케일 공간에서 상수만큼 서로로부터 분리된다. 특징 추출 유닛(18)은 "가우시안 피라미드" 또는 "가우시안 피라미드의 차이"로서 지칭될 수 있는 것을 형성하도록 이들 가우시안-블러링된 영상들을 적층(stack)할 수 있다. 특징 추출 유닛(18)은 DoG(difference of Gaussian) 영상들을 생성하도록 2개의 연속적으로 적층된 가우시안-블러링된 영상들을 비교한다. DoG 영상들은 "DoG 공간"으로서 지칭되는 것을 형성할 수 있다. Although described in more detail below, feature extraction unit 18 summarizes Gaussian blurring image data 26 to produce two consecutive Gaussian-blurred images. By doing so, the feature descriptor 28 can be extracted. Gaussian blurring generally involves convolving image data 26 with a Gaussian blur function at a defined scale. Feature extraction unit 18 may incrementally convolve image data 26, where the resulting Gaussian-blurred images are separated from each other by a constant in scale space. Feature extraction unit 18 may stack these Gaussian-blurred images to form what may be referred to as "Gaussian pyramids" or "differences of Gaussian pyramids." The feature extraction unit 18 compares two consecutively stacked Gaussian-blurred images to produce difference of Gaussian (DoG) images. DoG images may form what are referred to as “DoG spaces”.

[0025] 이러한 DoG 공간에 기초하여 특징 추출 유닛(18)은 키포인트(keypoint)들을 검출하며, 여기서 키포인트는 잠재적으로 지오메트리 관점에서 흥미로운 영상 데이터(26) 내의 특정한 샘플 지점 또는 픽셀 주위의 영역 또는 픽셀들의 패치(patch)를 지칭한다. 일반적으로 특징 추출 유닛(18)은 구성된 DoG 공간에서 로컬 최대 및/또는 로컬 최소로서 키포인트들을 식별한다. 특징 추출 유닛(18)은 이어서 키포인트가 검출된 패치에 대한 로컬 영상 그라디언트의 방향에 기초하여 하나 이상의 방위들, 또는 방향들을 이들 키포인트들에 할당한다. 이들 방위들을 특성화하기 위해, 특징 추출 유닛(18)은 그라디언트 방위 히스토그램(gradient orientation histogram)들의 견지에서 방위를 결정할 수 있다. 특징 추출 유닛(18)은 이어서 위치 및 방위(예를 들어, 그라디언트 방위 히스토그램에 의해)으로서 특징 기술자(28)를 정의한다. 특징 기술자(28)를 정의한 이후, 특징 추출 유닛(18)은 이 특징 기술자(28)를 특징 압축 유닛(20)에 출력한다. 통상적으로, 특징 추출 유닛(18)은 이러한 프로세스를 이용하여 특징 기술자들(28)의 세트를 특징 압축 유닛(20)에 출력한다.Based on this DoG space, feature extraction unit 18 detects keypoints, where the keypoints are of a region or pixels around a particular sample point or pixel in image data 26 that is potentially interesting in terms of geometry. Refers to a patch. In general, feature extraction unit 18 identifies keypoints as a local maximum and / or a local minimum in the configured DoG space. Feature extraction unit 18 then assigns one or more orientations, or directions, to these keypoints based on the direction of the local image gradient for the patch from which the keypoint was detected. To characterize these orientations, feature extraction unit 18 may determine orientation in terms of gradient orientation histograms. Feature extraction unit 18 then defines feature descriptor 28 as a position and orientation (eg, by a gradient orientation histogram). After defining the feature descriptor 28, the feature extraction unit 18 outputs this feature descriptor 28 to the feature compression unit 20. Typically, feature extraction unit 18 uses this process to output a set of feature descriptors 28 to feature compression unit 20.

[0026]특징 압축 유닛(20)은 이들 특징 기술자들을 정의하기 위해 특징 추출 유닛(18)에 의해 이용된 데이터의 양에 대해, 특징 기술자들(28)과 같은 특징 기술자들을 정의하는데 이용되는 데이터의 양을 압축하거나 또는 다른 방식으로 감소시키는 유닛을 표현한다. 특징 기술자들(28)을 압축하기 위해, 특징 압축 유닛(20)은 특징 기술자들(28)을 압축하기 위해 타입 양자화(type quantization)로서 지칭되는 양자화의 형태를 수행할 수 있다. 이것에 관하여, 특징 기술자들(28)에 의해 정의된 히스토그램 그 전체를 송신하기 보단 오히려, 특징 압축 유닛(20)은 이른바 "타입"으로서 히스토그램을 표현하도록 타입 양자화를 수행한다. 일반적으로 타입은 히스토그램의 압축된 표현이다(예를 들어, 타입이 전체 히스토그램 보단 히스토그램의 형상을 표현하는 경우). 타입은 일반적으로 심볼들의 주파수들의 세트를 표현하고, 히스토그램의 맥락에서, 히스토그램의 그라디언트 분포의 주파수를 표현할 수 있다. 즉, 타입은 특징 기술자(28)의 대응하는 것을 생성한 소스의 진정한 분포의 추정을 표현한다. 이것에 관하여, 타입의 인코딩 및 전송은 분포의 형상을 인코딩하고 전송하는 것과 등가인 것으로 간주될 수 있는데, 그 이유는 그것이 특정한 샘플에 기초한 추정일 수 있기 때문이다(즉, 이 예에서 특징 기술자들(28)의 대응하는 것에 의해 정의된 히스토그램임). The feature compression unit 20 is responsible for the amount of data used by the feature extraction unit 18 to define these feature descriptors, of the data used to define feature descriptors, such as feature descriptors 28. Represents a unit that compresses or otherwise reduces the amount. To compress the feature descriptors 28, the feature compression unit 20 may perform a form of quantization referred to as type quantization to compress the feature descriptors 28. In this regard, rather than transmitting the entire histogram defined by the feature descriptors 28, the feature compression unit 20 performs type quantization to represent the histogram as a so-called “type”. In general, a type is a compressed representation of a histogram (for example, if the type represents the shape of the histogram rather than the entire histogram). The type generally represents a set of frequencies of symbols, and in the context of the histogram, can represent the frequency of the gradient distribution of the histogram. That is, the type represents an estimate of the true distribution of the source that produced the corresponding of the feature descriptors 28. In this regard, the encoding and transmission of a type may be considered equivalent to encoding and transmitting the shape of the distribution because it may be an estimate based on a particular sample (ie, feature descriptors in this example). A histogram defined by the corresponding of (28)).

[0027] 특징 기술자들(28) 및 양자화의 레벨(" n"으로서 여기서 수학적으로 표시될 수 있음)이 정해지면, 특징 압축 유닛(20)은 특징 기술자들(28) 각각에 대해 파라미터들(k1, ..., km)(여기서 m은 치수들의 수를 표시함)을 갖는 타입을 컴퓨팅한다. 각각의 타입은 정해진 공통 분모를 갖는 유리수들의 세트를 표현하며, 여기서 유리수들은 1로 합산된다. 특징 기술자들(28)은 이어서 사전 편찬상 열거(lexicographic enumeration)를 이용한 인덱스로서 이러한 타입을 인코딩할 수 있다. 즉, 정해진 공통 분모를 갖는 모든 가능한 타입들에 대해, 특징 압출 유닛(20)은 이러한 타입들의 사전 편찬상의 순서에 기초하여 이들 타입들 각각에 인덱스를 효율적으로 할당한다. 특징 압출 유닛(20)은 그에 의해 단일의 사전 편찬상으로 배열된 인덱스들로 특징 기술자들(28)을 압축하고 질의 데이터(30)의 형태의 이들 압축된 특징 기술자들을 인터페이스(22)에 출력한다. Once the feature descriptors 28 and the level of quantization (which may be represented mathematically herein as “n”) are determined, the feature compression unit 20 may determine parameters k1 for each of the feature descriptors 28. , ..., km), where m represents the number of dimensions. Each type represents a set of rational numbers with a given common denominator, where the rational numbers sum to one. Feature descriptors 28 may then encode this type as an index using lexicographic enumeration. That is, for all possible types with a given common denominator, the feature extrusion unit 20 efficiently assigns an index to each of these types based on the precomputation order of these types. Feature extrusion unit 20 thereby compresses feature descriptors 28 into a single precompiled index and outputs these compressed feature descriptors in the form of query data 30 to interface 22. .

[0028] 인터페이스(22)는 무선 네트워크 및 유선 네트워크를 포함하는 네트워크(16)를 통해 비주얼 탐색 서버(14)와 통신할 수 있는 임의의 타입의 인터페이스를 표현한다. 인터페이스(22)는 무선 셀룰러 인터페이스를 표현하고 비주얼 탐색 서버(14)와 네트워크(16)를 통해 그리고 네트워크(16)와 무선 셀룰러 네트워크를 통해 통신하기 위해, 필수 하드웨어 또는 안테나, 변조기들 등과 같은 다른 컴포넌트들을 포함할 수 있다. 이 예에서, 도 1의 예에서 도시되진 않았지만, 네트워크(16)는 무선 셀룰러 인터페이스(22)가 네트워크(16)와 통신하는 무선 셀룰러 액세스 네트워크를 포함한다. 디스플레이(24)는 영상 데이터(26), 또는 임의의 다른 타입들의 데이터와 같이 영상들을 디스플레이할 수 있는 임의의 타입의 디스플레이 유닛을 표현한다. 디스플레이(24)는 예를 들어, LED(light emitting diode) 디스플레이 디바이스, OLED(organic LED) 디스플레이 디바이스, LCD(liquid crystal display) 디바이스, 플라즈마 디스플레이 디바이스 또는 임의의 다른 타입의 디스플레이 디바이스를 표현할 수 있다. The interface 22 represents any type of interface capable of communicating with the visual search server 14 over a network 16 including a wireless network and a wired network. The interface 22 represents the wireless cellular interface and communicates with the visual search server 14 and the network 16 and through the network 16 and the wireless cellular network, the necessary hardware or other components such as antennas, modulators, and the like. Can include them. In this example, although not shown in the example of FIG. 1, the network 16 includes a wireless cellular access network with which the wireless cellular interface 22 communicates with the network 16. Display 24 represents any type of display unit capable of displaying images, such as image data 26, or any other type of data. Display 24 may represent, for example, a light emitting diode (LED) display device, an organic LED (OLED) display device, a liquid crystal display (LCD) device, a plasma display device, or any other type of display device.

[0029] 비주얼 탐색 서버(14)는 인터페이스(32), 특징 재구성 유닛(34), 특징 매칭 유닛(36) 및 특징 기술자 데이터베이스(38)를 포함한다. 비주얼 탐색 서버(14)의 인터페이스(32)는 인터페이스(32)가 네트워크(16)와 같은 네트워크와 통신할 수 있는 임의의 타입의 인터페이스를 표현할 수 있다는 점에서 클라이언트 디바이스(12)의 인터페이스(22)와 유사하다. 특징 재구성 유닛(34)은 압축된 특징 기술자들로부터 특징 기술자들을 재구성하기 위해 압축된 특징 기술자들을 압축해제하는 유닛을 표현한다. 압축 재구성 유닛(34)은 압축된 특징 기술자들로부터 특징 기술자들을 재구성하기 위해 특징 재구성 유닛(34)이 양자화의 역(종종 재구성으로서 지칭됨)을 수행한다는 점에서 특징 압축 유닛(20)에 의해 수행된 동작들에 역으로 동작들을 수행할 수 있다. The visual search server 14 includes an interface 32, a feature reconstruction unit 34, a feature matching unit 36, and a feature descriptor database 38. The interface 32 of the visual search server 14 can represent any type of interface through which the interface 32 can communicate with a network, such as the network 16. Similar to The feature reconstruction unit 34 represents a unit that decompresses the compressed feature descriptors to reconstruct the feature descriptors from the compressed feature descriptors. Compression reconstruction unit 34 is performed by feature compression unit 20 in that feature reconstruction unit 34 performs the inverse of quantization (often referred to as reconstruction) to reconstruct feature descriptors from compressed feature descriptors. It is possible to perform operations in reverse to the specified operations.

[0030] 특징 매칭 유닛(36)은 재구성된 특징 기술자들에 기초하여 영상 데이터(26)에서 하나 이상의 특징들 또는 객체들을 식별하기 위해 특징 매칭을 수행하는 유닛을 표현한다. 특징 매칭 유닛(36)은 이 특징 식별을 수행하기 위해 특징 기술자 데이터베이스(38)에 액세스할 수 있으며, 여기서 특징 기술자 데이터베이스(38)는 특징 기술자들을 정의하고 영상 데이터(26)로부터 추출된 대응하는 특징 또는 객체를 포함하는 기준 영상들에 이 특징 기술자들 중 적어도 일부를 연관시키는 데이터를 저장한다. 이들 기준 영상들은 또한 기준 영상들의 하나 이상의 주제들, 특징들 또는 객체들을 식별하는 식별 데이터와 연관될 수 있다. 데이터베이스(38)는 압축된 k-차원 트리(KD 트리)를 이용하여 이 데이터를 저장할 수 있다. The feature matching unit 36 represents a unit that performs feature matching to identify one or more features or objects in the image data 26 based on the reconstructed feature descriptors. Feature matching unit 36 can access feature descriptor database 38 to perform this feature identification, where feature descriptor database 38 defines feature descriptors and corresponding features extracted from image data 26. Or store data associating at least some of these feature descriptors with reference images containing the object. These reference images may also be associated with identification data that identifies one or more subjects, features, or objects of the reference images. The database 38 can store this data using a compressed k-dimensional tree (KD tree).

[0031] 재구성된 특징 기술자들(이 데이터는 비주얼 탐색 또는 질의를 수행하는데 이용되는 비주얼 탐색 질의 데이터를 표현한다는 점에서 "질의 데이터 40"으로서 여기서 또한 지칭될 수 있음)에 기초하여 영상 데이터(26)로부터 추출된 특징 또는 객체를 성공적으로 식별하면, 특징 매칭 유닛(36)은 질의 결과 데이터(42)로서 하나 이상의 매칭 기준 영상들 및 임의의 연관된 식별 데이터를 리턴한다. [0031] Image data 26 based on reconstructed feature descriptors (this data may also be referred to herein as “query data 40” in that it represents visual search query data used to perform a visual search or query). Upon successful identification of the extracted feature or object, feature matching unit 36 returns one or more matching reference images and any associated identification data as query result data 42.

[0032] 초기에, 클라이언트 디바이스(12)의 사용자는 비주얼 탐색들 시작하기 위해 클라이언트 디바이스(12)와 상호작용한다. 사용자는 질의 영상 데이터(26)를 선택하기 위해 디스플레이(24)에 의해 제시되는 사용자 인터페이스 또는 다른 타입의 인터페이스와 상호작용하고 이어서 질의 영상 데이터(26)로서 저장된 영상의 포커스인 하나 이상의 특징들 또는 객체들을 식별하기 위해 비주얼 탐색을 개시할 수 있다. 예를 들어, 질의 영상 데이터(26)는 피사의 사탑과 같은 랜드마크의 영상을 특정할 수 있다. 사용자는 클라이언트 디바이스(12)의 영상 캡처 유닛(예를 들어, 카메라)을 이용하여 이 영상을 캡처하거나, 또는 대안적으로 네트워크(16)로부터 또는 다른 컴퓨팅 디바이스와의 유선 또는 무선 접속을 통해 로컬적으로 영상을 다운로드할 수 있다. 임의의 이벤트 시에, 질의 영상 데이터(26)를 선택한 이후, 사용자는 이 예에서, 랜드마크를 식별하기 위한 비주얼 탐색을 개시한다. Initially, a user of client device 12 interacts with client device 12 to begin visual searches. The user interacts with the user interface or other type of interface presented by display 24 to select query image data 26 and then one or more features or objects that are the focus of the image stored as query image data 26. May initiate a visual search to identify them. For example, the query image data 26 may specify an image of a landmark such as a leaning tower of Pisa. The user captures this image using an image capture unit (eg, a camera) of the client device 12, or alternatively locally from the network 16 or via a wired or wireless connection with another computing device. You can download the video. At any event, after selecting the query image data 26, the user, in this example, initiates a visual search to identify the landmark.

[0033] 비주얼 탐색의 개시에 응답하여, 클라이언트 디바이스(12)는 적어도 하나의 특징 기술자(28) 및 통상적으로 질의 영상 데이터(26)의 분석을 통해 발견된 이른바 "키포인트들" 중 하나를 기술하는 다수의 특징 기술자들(28)을 추출하기 위해 특징 추출 유닛(18)을 시동(invoke)한다. 특징 추출 유닛(18)은 이 질의 특징 기술자(28)를, 질의 특징 기술자(28)를 압축하고 질의 데이터(30)를 생성하도록 진행하는 특징 압축 유닛(20)에 포워딩한다. 특징 압축 유닛(2)은 네트워크(16)를 통해 비주얼 탐색 서버(14)에 질의 데이터(30)를 포워딩하는 인터페이스(22)에 질의 데이터(30)를 출력한다. In response to the initiation of the visual search, the client device 12 describes one of the so-called “keypoints” found through the analysis of at least one feature descriptor 28 and typically query image data 26. Invoke the feature extraction unit 18 to extract a number of feature descriptors 28. The feature extraction unit 18 forwards this query feature descriptor 28 to the feature compression unit 20 which proceeds to compress the query feature descriptor 28 and generate the query data 30. The feature compression unit 2 outputs the query data 30 to the interface 22 which forwards the query data 30 to the visual search server 14 via the network 16.

[0034] 비주얼 탐색 서버(14)의 인터페이스(32)는 질의 데이터(30)를 수신한다. 질의 데이터(30)의 수신에 응답하여, 비주얼 탐색 서버(14)는 특징 재구성 유닛(34)을 시동한다. 특징 재구성 유닛(34)은 질의 데이터(30)에 기초하여 질의 특징 기술자(28)를 재구성하고 재구성된 특징 기술자들(40)을 출력한다. 특징 매칭 유닛(36)은 재구성된 질의 특징 기술자들(30)을 수신하고 질의 특징 기술자들(40)에 기초하여 특징 매칭을 수행한다. 특징 매칭 유닛(36)은 질의 특징 기술자들(40) 각각에 대해, 특징 기술자 데이터베이스(38)를 액세스하고 실질적으로 매칭 특징 기술자를 식별하기 위해 특징 기술자 데이터베이스(38)에 의해 데이터로서 저장된 기준 특징 기술자들을 트래버싱(traversing)함으로써 특징 매칭을 수행한다. 재구성된 질의 특징 기술자들(40)에 기초하여 영상 데이터(26)로부터 추출된 특징을 성공적으로 식별하면, 특징 매칭 유닛(36)은 질의 결과 데이터(42)로서 매칭 기준 특징 기술자들과 연관된 하나 이상의 매칭 기준 영상들 및 임의의 연관된 식별 데이터를 출력한다. 인터페이스(32)는 이러한 질의 결과 데이터(42)를 수신하고 네트워크(16)를 통해 클라이언트 디바이스(12)에 질의 결과 데이터(42)를 포워딩한다. The interface 32 of the visual search server 14 receives the query data 30. In response to receiving the query data 30, the visual search server 14 starts the feature reconstruction unit 34. The feature reconstruction unit 34 reconstructs the query feature descriptor 28 based on the query data 30 and outputs the reconstructed feature descriptors 40. The feature matching unit 36 receives the reconstructed query feature descriptors 30 and performs feature matching based on the query feature descriptors 40. The feature matching unit 36 stores, for each of the query feature descriptors 40, a reference feature descriptor stored as data by the feature descriptor database 38 to access the feature descriptor database 38 and substantially identify the matching feature descriptor. Feature matching is performed by traversing them. Upon successfully identifying the feature extracted from the image data 26 based on the reconstructed query feature descriptors 40, the feature matching unit 36 may be one or more associated with the matching criteria feature descriptors as the query result data 42. Matching reference images and any associated identification data are output. The interface 32 receives this query result data 42 and forwards the query result data 42 to the client device 12 via the network 16.

[0035] 클라이언트 디바이스(12)의 인터페이스(22)는 이 질의 결과 데이터(42)를 수신하고 일반적으로 탐색 질의를 시동한 모든 애플리케이션에 이 질의 결과 데이터(42)를 포워딩한다. 즉, 클라이언트 디바이스(12)는 통상적으로 비주얼 탐색을 시동하고 질의 결과 데이터(42)와 같은 리턴된 질의 결과 데이터의 제시(presentation)를 관리할 수 있는 하나 이상의 애플리케이션들을 실행한다. 이 애플리케이션은 질의 결과 데이터(42)를 제시하도록 디스플레이(24)와 인터페이스할 수 있다. 몇몇 인터페이스들에서, 애플리케이션은 질의 결과 데이터(42)를 증가시키기 위해 질의 결과 데이터(42)에 의해 정의된 식별 데이터에 기초하여 부가적인 정보를 리트리브(retrieve)하도록 인터넷 탐색 또는 다른 동작들을 수행할 수 있다. 이 인스턴스에서, 질의 결과 데이터(42)의 식별 데이터는 랜드마크의 명칭, 즉, 이 예에서, 피사의 사탑, 피사의 사탑을 건설한 건설자의 명칭, 피사의 사탑의 완성 데이터, 및 이 랜드마크와 관련된 임의의 다른 정보를 포함할 수 있다. The interface 22 of the client device 12 receives this query result data 42 and generally forwards this query result data 42 to all the applications that initiated the search query. That is, client device 12 typically executes one or more applications that can initiate a visual search and manage the presentation of returned query result data, such as query result data 42. The application may interface with the display 24 to present the query result data 42. In some interfaces, the application may perform an internet search or other operations to retrieve additional information based on the identification data defined by the query result data 42 to increment the query result data 42. have. In this instance, the identification data of the query result data 42 is the name of the landmark, that is, in this example, the Leaning Tower of Pisa, the name of the builder who built the Leaning Tower of Pisa, the completion data of the Leaning Tower of Pisa, and this landmark It may include any other information related to.

[0036] 비주얼 탐색이 제한된 입력 매커니즘들(이를 테면, 사용자에 의한 텍스트의 입력을 제한하거나 좌절시킬 수 있는 햅틱 피드백(haptic feedback)을 거의 제공하지 않는 가상 키보드) 및/또는 더 작은 스크린들을 갖는 클라이언트 디바이스들과의 사용자 상호작용을 용이하게 하고 다양한 애플리케이션들을 용이하게 하는 견지에서 다수의 이점들을 제공할 수 있지만, 몇몇 인스턴스들에서, SIFT(scale invariant feature transform) 알고리즘과 같은 몇몇 비주얼 탐색 알고리즘들은 특징 매칭의 수행의 견지에서 부족할 수 있다. 이들 부족들을 예시하기 위해 SIFT 알고리즘은 빌딩(예를 들어, 피사의 사탑)에 걸쳐서 반복되는 특유의 아치들 또는 창문들과 같이 탐색 및 기준 영상들의 반복 특징들로부터 질의 특징 기술자 및 기준 특징 기술자들 각각이 추출될 때, 그렇지 않았다면 인스턴스들에서 탐색 또는 질의 특징 기술자에 매칭했었을 기준 특징 기술자를 폐기할 수 있다. 또한, SIFT 알고리즘은 흔히 임의의 정해진 비주얼 탐색에 응답하여 단일의 영상만을 리턴하며, 여기서 이 리턴된 영상은 SIFT 알고리즘에 의해 알고리즘적으로 "최상의 매칭"인 것으로 결정된다. 그러나 사용자들은 무엇이 SIFT 알고리즘과 동일한 방식으로 "최상의 매칭"를 구성할지 결정할 수 없을 수 있고, 이는 단일의 SIFT 최상의 매칭 결과가 사용자의 기대들에 매칭하지 않을 수 있기 때문에 사용자의 좌절을 초래할 수 있다. [0036] Client with visual navigation limited input mechanisms (such as a virtual keyboard that provides little or no haptic feedback that can limit or frustrate input of text by a user) and / or a client with smaller screens While it may provide a number of advantages in terms of facilitating user interaction with devices and facilitating a variety of applications, in some instances, some visual search algorithms, such as scale invariant feature transform (SIFT) algorithms, may feature match. May be lacking in terms of the performance of To illustrate these deficiencies, the SIFT algorithm uses query feature descriptors and reference feature descriptors, respectively, from repetitive features of search and reference images, such as unique arches or windows that are repeated across a building (eg, the Leaning Tower of Pisa). When this is extracted, it may discard the reference feature descriptor that would otherwise have matched the search or query feature descriptor in the instances. In addition, the SIFT algorithm often returns only a single image in response to any given visual search, where the returned image is determined to be "best match" algorithmically by the SIFT algorithm. However, users may not be able to determine what constitutes "best match" in the same way as the SIFT algorithm, which may cause user frustration because a single SIFT best matching result may not match the user's expectations.

[0037] 보다 구체적으로, SIFT 알고리즘을 포함하는 비주얼 탐색 알고리즘들 "유사성 측정(similarity measure)"으로서 지칭될 수 있는 것을 이용하여 특징 매칭을 구현할 수 있다. SIFT 알고리즘은 기준 특징 기술자들의 각각에 대해 질의 특징 기술자가 얼마나 유사한지를 특정하기 위해 거리 비 테스트(distance ratio test)를 이용한다. 이 거리 비는 질의 특징 기술자들의 현재의 것의 최근접 기준 특징 기술자가 질의 특징 기술자들의 현재의 것의 제 2 최근접(특징 기술자 공간에서) 특징 기술자에 대해 얼마나 멀리(재차 특징 기술자 공간에서) 있는지를 측정한다. 최근접 및 제 2 최근접 기준 특징 기술자가 매우 근접(임계치에 의해 식별된 바와 같이)한 경우, SIFT 알고리즘은 질의 특징 기술자의 현재의 것이 고유하지 않고 그에 따라 매칭이 신뢰할 수 있지 않다고 결정한다. 거리 비 유사성 측정은 다음의 수학식(1)에 의해 표현될 수 있다. More specifically, feature matching may be implemented using what may be referred to as visual search algorithms “similarity measure” including a SIFT algorithm. The SIFT algorithm uses a distance ratio test to specify how similar the query feature descriptors are to each of the reference feature descriptors. This distance ratio measures how far (in the feature descriptor space) the nearest reference feature descriptor of the current feature of the query feature descriptors with respect to the second nearest (in the feature descriptor space) feature of the query feature descriptors' current one. do. If the nearest and second nearest reference feature descriptors are very close (as identified by the threshold), the SIFT algorithm determines that the current of the query feature descriptor is not unique and therefore the matching is not reliable. The distance dissimilarity measurement can be expressed by the following equation (1).

수학식(1)에서, q는 질의 기술자를 표현하고, nn₁은 데이터베이스 내의 최근접 기준 특징 기술자("가장 근처의 이웃(nearest neighbor)" 또는 "nn")으로서 또한 지칭될 수 있음)를 표현하고, nn₂는 데이터베이스 내의 제 2 최근접 기준 특징 기술자("제 2 가장 근처의 이웃"으로서 지칭될 수 있음)를 표현하고, d는 놈 연산(norm operation)을 표현하고, T는 임계치를 표현한다. In equation (1), q represents the query descriptor and nn ₁ represents the nearest reference feature descriptor (which may also be referred to as "nearest neighbor" or "nn") in the database. Nn ₂ represents a second nearest reference feature descriptor (which may be referred to as a “second closest neighbor”) in the database, d represents a norm operation, and T represents a threshold do.

[0038] SIFT 거리 비 유사성 측정이 통상적으로 적합한 결과들을 제공하지만, SIFT 거리 비 유사성 측정은 특정한 인스턴스들에서 매칭 기준 특징 기술자를 올바르게 식별하는데 실패할 수 있다. 예를 들어, 영상이 반복 구조들(이를 테면, 도 7의 예에서 도시된 피사의 사탑의 반복되는 창문 구조들)을 포함하는 인스턴스들에서, SIFT 거리 비 유사성 측정은 그의 2개의 최근접 기준 특징 기술자들(피사의 사탑을 도시하는 영상 데이터로부터 또한 추출됨)간의 거리가 작을 수 있다는 것을 고려하면, 이러한 반복되는 영상 특징들로부터 추출된 질의 특징 기술자들이 고유하지 않다고 결정할 수 있다. 이 인스턴스에서, SIFT 거리 비 유사성 측정은 잠재적으로 올바른 매칭을 거절할 수 있다. Although the SIFT distance dissimilarity measure typically provides suitable results, the SIFT distance dissimilarity measure may fail to correctly identify the matching criteria feature descriptor in certain instances. For example, in instances where the image includes repeating structures (such as repeating window structures of the Leaning Tower of Pisa shown in the example of FIG. 7), the SIFT distance dissimilarity measure is characterized by its two nearest reference features. Given that the distance between the descriptors (also extracted from the image data showing the Leaning Tower of Pisa) may be small, one may determine that the query feature descriptors extracted from these repeated image features are not unique. In this instance, the SIFT distance dissimilarity measure can potentially reject correct matching.

[0039] 본 개시에서 기술된 기법들에 따라, 비주얼 탐색 서버(14)의 특징 매칭 유닛(36)은 반복 구조들을 수용하고 이들 및 다른 인스턴스들에서 모다 정확한 매칭을 다른 방식으로 용이하게 하는 더 많은 강건한 거리 비 유사성 측정을 이용한다. 또한, 기법들은 종래의 시스템들에서 공통적인 바와 같이 단일의 "최상의 매칭" 기준 영상을 단순히 리턴하기 보다는 오히려 "최상의 매칭"의 사용자 지각 및 알고리즘식 결정에서의 차이들을 수용하도록 기준 영상들의 순서화된 랭킹(ordered ranking)을 제공할 수 있다. In accordance with the techniques described in this disclosure, the feature matching unit 36 of the visual search server 14 accommodates repeating structures and more in other ways to facilitate accurate matching in all of these and other instances. We use a robust distance dissimilarity measure. In addition, the techniques order the ranking of reference images to accommodate differences in user perception and algorithmic determination of "best matching" rather than simply returning a single "best matching" reference image as is common in conventional systems. (ordered ranking) can be provided.

[0040] 본 개시에서 기술된 기법들에 따라 거리 비 유사성 측정을 구현하는데 있어서, 특징 매칭 유닛(36)은 재구성된 질의 특징 기술자들(40) 중 하나와 특징 기술자 데이터베이스(38)에 저장된 복수의 기준 특징 기술자들 각각 간의 거리를 우선 컴퓨팅할 수 있다. 특징 매칭 유닛(36)은 이어서 컴퓨팅된 거리들의 하나 이상의 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹을 클러스터링 알고리즘(clustering algorithm)을 이용하여 결정할 수 있다. 공통적은 클러스터링 알고리즘들은 k-평균 클러스터링 알고리즘(여기서 k는 이 예에서 제 1 및 제 2 그룹들을 생성하기 위해 2로 세팅됨), 가우시안 필터링 알고리즘 및 그래픽 컷팅 알고리즘을 포함할 수 있다. In implementing the distance dissimilarity measurement in accordance with the techniques described in this disclosure, the feature matching unit 36 may include a plurality of reconstructed query feature descriptors 40 and a plurality of stored in the feature descriptor database 38. The distance between each of the reference feature descriptors may first be computed. The feature matching unit 36 may then determine one or more first groups of computed distances and a second group of computed distances using a clustering algorithm. Common clustering algorithms may include a k-means clustering algorithm, where k is set to 2 in this example to generate the first and second groups, a Gaussian filtering algorithm, and a graphical cutting algorithm.

[0041] 몇몇 인스턴스들에서, 클러스터링 알고리즘은 가증 근처의 또는 최소 거리를 제 1 그룹으로 그리고 복수의 다음의 가장 근처의 또는 최소 거리들을 제 2 그룹으로 클러스터링하는 알고리즘을 포함할 수 있으며, 여기서 제 2 그룹의 복수의 다음의 가장 근처의 또는 최소 거리들의 수는 예를 들어, 2, 3, 또는 4개의 다음의 가장 근처의 거리들을 포함할 수 있다. k-평균 클러스터링 알고리즘과 같은 보다 형식적인 클러스터링 알고리즘들에 관하여 여기서 기술되지만, 2개 의상의 그룹들(여기서 제 1 그룹은 하나 이상의 거리들을 포함하고 제 2 그룹은 2개 이상의 거리들을 포함함)을 결정하는 임의의 알고리즘은 본 개시에서 기술되는 기법들에 관하여 활용될 수 있다. In some instances, the clustering algorithm may include an algorithm that clusters the near or minimal distance into the first group and the plurality of next nearest or minimum distances into the second group, wherein the second The number of the plurality of next nearest or minimum distances of the group may include, for example, two, three, or four next nearest distances. More formal clustering algorithms, such as the k-means clustering algorithm, are described herein, but include two groups of clothing, where the first group contains one or more distances and the second group contains two or more distances. Any algorithm that determines may be utilized with respect to the techniques described in this disclosure.

[0042] 클러스터링 알고리즘을 이용하면, 특징 매칭 유닛(36)은 컴퓨팅된 거리들 중 2개 의상의 제 1 그룹을 결정할 수 있어서, 제 1 그룹은 특징 기술자 데이터베이스(38)에 저장된 복수의 기준 특징 기술자들 중 연관된 하나가 컴퓨팅된 거리들 의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 관하여 질의 특징 기술자에 대해 근처에 있게 된다. 또한, 이 클러스터링 알고리즘을 이용하면, 특징 매칭 유닛(36)은 컴퓨팅된 차이들의 제 2 그룹을 결정할 수 있어서, 이러한 제 2 그룹은 특징 기술자 데이터베이스(38)에 저장된 복수의 기준 특징 기술자들 중 연관된 하나가 컴퓨팅된 거리들의 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 관하여 질의 특징 기술자들(30) 중 현재의 것으로부터 떨어져 있게 된다. Using the clustering algorithm, the feature matching unit 36 may determine a first group of two garments of the computed distances, such that the first group includes a plurality of reference feature descriptors stored in the feature descriptor database 38. The associated one of them is near to the query feature descriptor with respect to those of the computed distances determined to be within the second group of computed distances. In addition, using this clustering algorithm, the feature matching unit 36 can determine a second group of computed differences such that the second group is an associated one of a plurality of reference feature descriptors stored in the feature descriptor database 38. Is away from the current of the query feature descriptors 30 with respect to those of the computed distances determined to be within the second group of computed distances.

[0043] 비주얼 탐색 서버(14)의 특징 매칭 유닛(36)은 이어서 질의 특징 기술자들(40) 중 현재의 것이 컴퓨팅된 거리들의 결정된 제 1 그룹 및 컴퓨팅된 거리들의 제 2 그룹에 기초하여 컴퓨팅된 거리들의 최소의 것과 연관된 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정할 수 있다. 예를 들어, 특징 매칭 유닛(36)은 제 1 그룹 거리 평균을 생성하기 위해 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 평균을 우선 컴퓨팅하고, 마찬가지로 제 2 그룹 거리 평균을 생성하기 위해 제 2 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 평균을 컴퓨팅할 수 있다. 제 1 및 제 2 그룹 거리 평균들의 컴퓨팅 시에, 특징 매칭 유닛(36)은 평균 거리 비 측정을 생성하기 위해 제 2 그룹 거리 평균으로 제 1 그룹 거리 평균을 나눌 수 있다. 특징 매칭 유닛(36)은 다음으로 임계값에 평균 거리 비 측정을 비교하고 비교에 기초하여 질의 특징 기술자들(40)의 현재의 것이 컴퓨팅된 거리들 중 최소의 것과 연관된 복수의 기준 특징 기술자들 중 하나에 매칭하는지를 결정할 수 있다. The feature matching unit 36 of the visual search server 14 is then computed based on the determined first group of computed distances and the current one of the query feature descriptors 40 based on the determined first group of computed distances. One may determine whether to match one of the plurality of reference feature descriptors associated with the minimum of distances. For example, feature matching unit 36 first computes an average of the computed distances determined to be within the first group to produce a first group distance mean, and likewise produces a second group distance mean. Compute the average of computed distances determined to be within. Upon computing the first and second group distance averages, feature matching unit 36 may divide the first group distance average by the second group distance average to produce an average distance ratio measurement. The feature matching unit 36 next compares the average distance ratio measurement to the threshold and based on the comparison, the current of the query feature descriptors 40 is one of a plurality of reference feature descriptors associated with the least of the computed distances. It can be determined whether to match one.

[0044] 이러한 방식으로 컴퓨팅된 거리들을 클러스터링하거나 다른 방식으로 그룹핑하고 이어서 각각의 그룹의 거리들을 평균화함으로써, 기법들은 서로 근처에 있고 양자가 판에 박은 듯이 공통적인(common conventionally) 바와 같이 질의 특징 기술자에 매칭하는 임의의 기준 특징 기술자들을 완전히 거절함 없이 가까운 반복 특징들로부터 추출된 특징 기술자들의 매칭을 수용할 수 있다. 클러스터링 알고리즘을 이용하는 거리 측정들의 그룹핑은 또한 근처의 또는 멀리 있는 특징 기술자들 간의 더 명확한 판별을 달성할 수 있어서, 비주얼 탐색 서버(14)의 특징 매칭 유닛(36)은 거리를 이용하여 서로로부터 특유의 또는 고유한 특징 기술자들을 올바르게 식별할 수 있다. 거리 측정들을 평균화함으로써, 특징 매칭 유닛(36)은 임계치를 올바르게 적용하고 그에 의해 매칭 기준 특징 기술자로 고려되는 것의 거절을 방지하도록 이들 2개의 그룹들이 서로로부터 얼마나 멀리 떨어져 있는지에 관한 상대적 근접 근사를 제공할 수 있다. By clustering or otherwise grouping the distances computed in this manner and then averaging the distances of each group, the techniques are in the vicinity of each other and the query feature descriptor as common conventionally as if both are plated. It is possible to accept a match of feature descriptors extracted from nearby repeating features without completely rejecting any reference feature descriptors that match. Grouping of distance measurements using a clustering algorithm can also achieve a clearer distinction between nearby or distant feature descriptors, such that feature matching unit 36 of visual search server 14 uses distances to distinguish unique from each other. Or unique feature descriptors can be correctly identified. By averaging distance measurements, feature matching unit 36 provides a relative approximation of how far apart these two groups are from each other to correctly apply the threshold and thereby avoid rejection of what is considered a matching criteria feature descriptor. can do.

[0045] 기법들의 매칭 영상 양상들의 순서화된 또는 랭크된 리스트를 구현하기 위해 비주얼 탐색 서버(14)의 특징 매칭 유닛(36)은 컴퓨팅된 거리들의 제 2 그룹에 비교하여 질의 특징 기술자들(40)의 현재의 것에 비교적 근처에 있는 컴퓨팅된 거리들의 제 1 그룹들을 활용한다. 위에서 기술된 보다 강건한 거리 유사성 측정을 이용하여 매칭하는 것으로 결정되고 제 1 그룹에 있는 것으로 결정된 모든 컴퓨팅된 거리들에 대해, 특징 매칭 유닛(36)은 제 1 그룹의 컴퓨팅된 거리들에 대응하는 매칭 기준 특징 기술자들과 연관되는 특유의 영상들에 보트(vote)를 할당할 수 있다. 즉, 제 1 그룹의 컴퓨팅된 거리들에 대응하는 기준 특징 기술자들의 다수의 것들이 동일한 기준 영상으로부터 추출되는 경우, 특징 매칭 유닛(36)은 단일의 보트를 그 기준 영상에 할당한다. To implement an ordered or ranked list of matching image aspects of the techniques, feature matching unit 36 of visual search server 14 compares query feature descriptors 40 to a second group of computed distances. Utilize first groups of computed distances that are relatively near to the current of. For all computed distances determined to match and determined to be in the first group using the more robust distance similarity measure described above, the feature matching unit 36 matches the corresponding correspondence to the computed distances in the first group. A boat may be assigned to unique images associated with the reference feature descriptors. That is, if multiple ones of the reference feature descriptors corresponding to the computed distances of the first group are extracted from the same reference image, the feature matching unit 36 assigns a single boat to that reference image.

[0046] 예를 들어, 특징 매칭 유닛(36)은 기준 영상들의 그룹이 어떤 복제 기준 영상들도 포함하지 않도록 고유한 기준 영상들의 그룹을 결정할 수 있다. 특징 매칭 유닛(36)은 제 1 그룹에 있는 것으로 결정된 컴퓨팅된 거리들이 컴퓨팅된 기준 특징 기술자들을 먼저 고려할 수 있으며, 여기서 이들 기준 특징 기술자들은 "제 1 그룹 기준 특징 기술자들"로서 지칭될 수 있다. 각각의 기준 특징 기술자는 특징 기술자 데이터베이스(38)를 초기에 구축할 때 그 기준 특징 기술자가 추출된 기준 영상(특징 기술자 데이터베이스(38) 또는 도 1의 예에서 도시되지 않은 몇몇 다른 데이터베이스, 메모리 또는 저장 유닛에 또한 저장될 수 있음)과 연관된다. 특징 매칭 유닛(36)은 이어서 제 1 그룹 기준 특징 기술자들과 연관된 그러한 기준 영상들로서 기준 영상들의 제 1 그룹을 결정할 수 있다. 특징 매칭 유닛(36)은 이어서 기준 영상들의 그룹이 어떠한 복제 기준 영상들도 포함하지 않도록 고유한 기준 영상들의 그룹을 생성하기 위해 기준 영상들이 초기 그룹으로부터 임의의 복제 기준 영상들을 제거할 수 있다. For example, feature matching unit 36 may determine a group of unique reference images such that the group of reference images does not include any duplicate reference images. The feature matching unit 36 may first consider reference feature descriptors for which computed distances determined to be in the first group have been computed, where these reference feature descriptors may be referred to as “first group reference feature descriptors”. When each reference feature descriptor initially builds the feature descriptor database 38, the reference feature descriptor is extracted from the reference image (feature descriptor database 38 or some other database, memory or storage not shown in the example of FIG. 1). May also be stored in the unit). Feature matching unit 36 may then determine a first group of reference images as those reference images associated with the first group reference feature descriptors. Feature matching unit 36 may then remove any duplicate reference images from the initial group of reference images to create a group of unique reference images such that the group of reference images does not contain any duplicate reference images.

[0047] 특징 매칭 유닛(36)은 이어서 고유한 기준 영상들의 그룹 내에 있는 것으로 결정된 기준 영상들 각각에 보트를 할당할 수 있다. 보트는 1과 같은 상수일 수 있다. 대안적으로 보트는 최근접 기준 특징 기술자 및 질의 특징 기술자들(40) 중 현재의 것 간의 컴퓨팅된 거리에 비견되는 현재의 컴퓨팅된 거리의 거리 비에 비례할 수 있다. 다른 대안으로서, 보트는 제 1 그룹의 컴퓨팅된 거리들이 최소로부터 최대까지 순서화될 때 제 1 그룹 내의 랭킹에 비례할 수 있다(예를 들어, 제 1 그룹 기준 특징 기술자들 중 하나가 질의 특징 기술자들(40)의 현재의 것에 5번째 최근접일 때, 보트는 1/5일 수 있음). 특징 매칭 유닛(36)은 이어서 질의 특징 기술자들(40) 각각에 대해 이러한 방식으로 보트들을 할당한다. 이러한 방식으로 기준 또는 타겟 영상들에 보트들이 할당된 이후, 특징 매칭 유닛(36)은 이어서 수집된 보트들에 기초하여 기준 또는 타겟 영상들을 랭크하고 질의 결과 데이터(42)의 형태로 질의 데이터(30)에 응답하여 타겟 영상들의 랭크 또는 순서화된 리스트를 제공할 수 있다. The feature matching unit 36 may then assign a boat to each of the reference images determined to be within a group of unique reference images. The boat can be a constant equal to one. Alternatively, the boat may be proportional to the distance ratio of the current computed distance compared to the computed distance between the nearest one of the nearest reference feature descriptor and query feature descriptors 40. As another alternative, the boat may be proportional to the ranking in the first group when the computed distances of the first group are ordered from minimum to maximum (eg, one of the first group reference feature descriptors may be a query feature descriptor. When the fifth nearest to the current of 40, the boat may be one fifth). The feature matching unit 36 then assigns the boats in this manner for each of the query feature descriptors 40. After the boats have been assigned to the reference or target images in this manner, the feature matching unit 36 then ranks the reference or target images based on the collected boats and query data 30 in the form of query result data 42. ) May provide a rank or ordered list of target images.

[0048] 다양한 컴포넌트들, 모듈들 또는 유닛들이 개시된 기법들을 수행하도록 구성된 디바이스들의 기능적 양상들을 강조하기 위해 본 개시에서 기술되지만, 이들 유닛들이 반드시 상이한 하드웨어 유닛들에 의한 실현을 요구하는 것은 아니다. 오히려, 다양한 유닛들은 하드웨어 유닛에서 조합되거나, 컴퓨터-판독 가능한 매체들에 저장된 적합한 소프트웨어 및/또는 펌웨어와 함께, 위에서 기술된 바와 같은 하나 이상의 프로세서들을 포함하는 상호-동작 가능한 하드웨어의 모음에 의해 제공될 수 있다. 이러한 것에 관하여, 본 개시들에서 유닛들에 대한 참조는 별개의 하드웨어 유닛들 및/또는 하드웨어 및 소프트웨어 유닛들로서 구현될 수 있거나 구현될 수 없는 상이한 기능적 유닛들을 제안하는 것으로 의도된다. 여기서 이용되는 바와 같은 구문 "컴퓨터 판독 가능한 매체"는 단지 제조물들에 적용되며 일시적인 전파 신호들에 적용되지 않는다. Although various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, these units do not necessarily require realization by different hardware units. Rather, the various units may be provided by a collection of inter-operable hardware, including one or more processors as described above, in combination with the hardware unit, or in combination with suitable software and / or firmware stored on computer-readable media. Can be. In this regard, reference to units in the present disclosures is intended to suggest different functional units that may or may not be implemented as separate hardware units and / or hardware and software units. The phrase “computer readable medium” as used herein applies only to articles of manufacture and not to transient propagation signals.

[0049] 도 2는 도 1의 예에서 도시된 특징 매칭 유닛(36)을 보다 상세히 예시하는 블록도이다. 도 2의 예에서, 특징 매칭 유닛(36)은 거리 컴퓨팅 유닛(distance computation unit; 50), 그룹핑 유닛(52), 매칭 유닛(54) 및 결과 생성기 유닛(56)을 포함한다. 거리 컴퓨팅 유닛(50)은 질의 특징 기술자들(40) 및 특징 기술자 데이터베이스(38)에 저장된 기준 특징 기술자들(51)에 기초하여 거리들(62A 내지 62N)("거리 62")을 컴퓨팅하는 유닛이다. 보다 구체적으로, 거리 컴퓨팅 유닛(50)은 질의 특징 기술자들(40) 중 현재의 것으로부터 기준 특징 기술자들(51) 각각을 차감한 절댓값으로서 질의 특징 기술자들(40) 각각에 대한 거리들(62)을 컴퓨팅하는 유닛을 표현할 수 있다. 거리 컴퓨팅 유닛(50)은 거리들(62)을 그룹핑 유닛(52)에 출력한다. FIG. 2 is a block diagram illustrating in more detail the feature matching unit 36 shown in the example of FIG. 1. In the example of FIG. 2, the feature matching unit 36 includes a distance computation unit 50, a grouping unit 52, a matching unit 54 and a result generator unit 56. The distance computing unit 50 is a unit that computes the distances 62A-62N (“distance 62”) based on the reference feature descriptors 51 stored in the query feature descriptors 40 and the feature descriptor database 38. to be. More specifically, the distance computing unit 50 calculates the distances 62 for each of the query feature descriptors 40 as an absolute value subtracting each of the reference feature descriptors 51 from the current one of the query feature descriptors 40. ) Can be expressed as a unit for computing. The distance computing unit 50 outputs the distances 62 to the grouping unit 52.

[0050] 그룹핑 유닛(52)은 거리들(62)과 같은 거리들 또는 2개의 상의 값들의 임의의 다른 비-엠티(non-empty) 세트를 수신하고 클러스터링 알고리즘(64)에 따라 이 값들을 적어도 2개의 그룹들로 그룹핑하는 유닛을 표현한다. 클러스터링 알고리즘(64)은 k-평균 클러스터링 알고리즘(여기서 k는 이 예에서 제 1 및 제 2 그룹들을 생성하기 위해 2로 세팅됨), 가우시안 필터링 알고리즘 및 그래픽 컷팅 알고리즘은 물론 값들의 세트로부터 적어도 2개의 그룹들을 생성할 수 있는 임의의다른 공통 클러스터링 알고리즘 중 하나 이상을 표현한다. 도 2의 예에서, 그룹핑 유닛(52)은 거리들(62)을 수신하고 클러스터링 알고리즘(64)에 따라 거리들(62) 중 상이한 것들의 그룹들(66A, 66B("그룹들(66)")을 컴퓨팅한다. 그룹핑 유닛(52)은 컴퓨팅된 거리들(62)의 2개 이상의 그룹(66a)을 결정할 수 있어서, 이 제 1 그룹(66A)은 특징 기술자 데이터베이스(38)에 저장된 기준 특징 기술자들(51) 중 연관된 하나가 컴퓨팅된 거리들(62)의 그룹(66B) 내에 있을 것으로 결정된 컴퓨팅된 거리들(62)의 것들에 관하여 질의 특징 기술자들(40)의 현재의 것에 대해 근처에 있음을 표시하는 컴퓨팅된 거리들(62)의 것들을 포함한다. 또한, 클러스터링 알고리즘(64)에 따라, 그룹핑 유닛(52)은 컴퓨팅된 거리들(62)의 그룹(66B)을 결정할 수 있어서, 이 제 2 그룹(66B)은 컴퓨팅된 거리들의 제 1 그룹 내에 있는 것으로 결정된 컴퓨팅된 거리들의 것들에 관하여 질의 특징 기술자들(40)의 현재의 것으로부터 떨어져 있음을 표시하는 컴퓨팅된 거리들(62)의 것들을 포함한다. 그룹핑 유닛(52)은 거리들(62)의 그룹들(66)을 매칭 유닛(54)에 출력한다. Grouping unit 52 receives any other non-empty set of distances, such as distances 62, or values of two phases, and at least these values in accordance with clustering algorithm 64. Represents a unit grouping into two groups. Clustering algorithm 64 is a k-average clustering algorithm, where k is set to 2 in this example to generate the first and second groups, a Gaussian filtering algorithm and a graphical cutting algorithm as well as at least two from a set of values. Represent one or more of any other common clustering algorithm that can create groups. In the example of FIG. 2, grouping unit 52 receives distances 62 and groups 66A, 66B (“groups 66”) of different ones of distances 62 according to clustering algorithm 64. The grouping unit 52 can determine two or more groups 66a of computed distances 62 so that the first group 66A is a reference feature descriptor stored in the feature descriptor database 38. The associated one of the 51 is near to the current one of the query feature descriptors 40 with respect to those of the computed distances 62 determined to be within the group 66B of the computed distances 62. And those of computed distances 62. In addition, in accordance with the clustering algorithm 64, the grouping unit 52 may determine a group 66B of computed distances 62, so that Two groups 66B are those of computed distances determined to be within the first group of computed distances. With respect to those of the computed distances 62 indicating a distance from the current one of the query feature descriptors 40. The grouping unit 52 matches the groups 66 of the distances 62. Output to unit 54.

[0051] 매칭 유닛(54)은 질의 특징 기술자들(40) 중 하나 이상이 기준 특징 기술자들(51) 중 하나 이상에 고유하게 매칭하는지를 결정하는 유닛을 표현한다. 질의 특징 기술자들(40)의 현재의 것이 기준 특징 기술자들(51) 중 하나 이상에 고유하게 매칭하는지를 결정하기 위해, 매칭 유닛(54)은 통상적으로 제공된 그룹들 각각의 값들의 평균을 컴퓨팅한다. 도 2의 예에서, 매칭 유닛(54)은 그룹(66A)의 거리들(62) 및 그룹(66B)의 거리들(62)을 각각 평균화함으로써 그룹 평균(68A, 68B)을 컴퓨팅한다. 다음으로, 매칭 유닛(54)은 그룹 평균(68B)으로 그룹 평균(68A)을 나누고 결과를 임계치(70)에 비교한다. 즉, 매칭 유닛(54)은 일반적으로 클러스터링 알고리즘(64)을 이용하여 형성된 거리들(62)의 그룹들(66)로부터 컴퓨팅된 그룹 평균들에 대해서만 SIFT 알고리즘에 의해 현재 수행된(및 수학식(1)으로서 위에서 도시된) 것에 대한 유사한 비교를 수행한다. Matching unit 54 represents a unit that determines whether one or more of query feature descriptors 40 uniquely matches one or more of reference feature descriptors 51. To determine if the current of query feature descriptors 40 uniquely matches one or more of the reference feature descriptors 51, matching unit 54 typically computes an average of the values of each of the provided groups. In the example of FIG. 2, matching unit 54 computes group averages 68A, 68B by averaging the distances 62 of group 66A and the distances 62 of group 66B, respectively. Next, the matching unit 54 divides the group average 68A by the group average 68B and compares the result to the threshold 70. That is, the matching unit 54 is generally performed by the SIFT algorithm only for group averages computed from groups 66 of distances 62 formed using the clustering algorithm 64 (and the equation ( Similar comparisons are made with 1) as shown above).

[0052] 그룹 평균(68B)으로 그룹 평균(68A)을 나눈 결과가 임계치 보다 큰 경우(그룹(66B)과 연관된 기준 특징 기술자들(51)이 그룹(66A)과 연관된 것에 근접하다는 것을 의미함), 매칭 유닛(54)은 특징 기술자들(40) 중 현재의 것이 기준 특징 기술자들(51) 중 어느 것에도 고유하게 매칭하지 않는다는 것을 표시하는 매칭 표시자(72)를 출력한다. 용어 "고유한 매칭(unique match)"는 매칭이 존재하지만 질의 영상 데이터의 식별을 용이하게 할 수 없고 그에 따라 매칭이 고유한 것으로 간주되지 않는 비주얼 탐색을 수행하는 맥락에서의 특징 매칭의 양상들을 지칭하도록 여기서 이용된다. 매칭은 매칭이 질의 영상 데이터의 식별을 용이하게 하는 경우 고유하다. 예를 들어, 피사의 사탑 상의 아치들은 이 랜드마크를 고유하게 식별할 수 있는 질의 특징 기술자들로서 피사의 사탑의 질의 영상으로부터 추출될 수 있다. 피사의 사탑의 기준 영상에서 이러한 아치들로부터 추출된 기준 특징 기술자들은 매칭할 수 있다. 고유성의 좁은 정의 및 이들 아치들의 반복 성질을 고려하는 종래의 SIFT 알고리즘에서 공통적인 바와 같은 이들 매칭 기준 영상 특징 기술자들을 거절하기 보단 오히려, 매칭 유닛(54)은 매칭 유닛(54)이 동작하는 데이터의 클러스터링 성질로 인해 반복에도 불구하고 이들이 고유하게 매칭한다고 적절히 결정할 수 있다. 이 고유한 매칭은 위에서 기술된 유사성 측정으로서 또한 지칭될 수 있다. 고유한 매칭들은, 이들이 매칭 유닛(54)이 비-고유 매칭들을 거절하는 원인인 비-고유한 매칭들보다 더 빨리 질의 영상 데이터의 식별을 용이하게 하기 때문에 바람직하다. When the result of dividing the group average 68A by the group average 68B is greater than the threshold (meaning that the reference feature descriptors 51 associated with the group 66B are close to those associated with the group 66A) The matching unit 54 outputs a matching indicator 72 indicating that the current of the feature descriptors 40 does not uniquely match any of the reference feature descriptors 51. The term “unique match” refers to aspects of feature matching in the context of performing a visual search where a match exists but cannot facilitate the identification of query image data and thus the match is not considered unique. Is used here. Matching is unique when the matching facilitates identification of query image data. For example, the arches on the Leaning Tower of Pisa can be extracted from the Leaning Tower's query image as query feature descriptors that can uniquely identify this landmark. Reference feature descriptors extracted from these arches in the reference image of the leaning tower of Pisa can be matched. Rather than rejecting these matching reference image feature descriptors as is common in conventional SIFT algorithms that take into account the narrow definition of uniqueness and the repetitive nature of these arches, the matching unit 54 may not be able to match the data on which the matching unit 54 operates. Due to the clustering nature, one can properly determine that they match uniquely despite repetition. This unique match may also be referred to as the similarity measure described above. Unique matches are desirable because they facilitate the identification of query image data faster than non-unique matches, which causes the matching unit 54 to reject non-unique matches.

[0053] 임의의 이벤트 시에, 그룹 평균(68B)으로 그룹 평균(68A)을 나눈 결과가 임계치 미만인 경우(그룹(66B)과 연관된 기준 특징 기술자들(51)이 그룹(66A)과 연관된 것들로부터 떨어져 있음을 의미함), 매칭 유닛(54)은 특징 기술자들(40)의 현재의 것이 거리들(62) 중 최소의 것과 연관되는 기준 특징 기술자들(51) 중 하나와 고유하게 매칭한다는 것을 표시하는 매칭 표시자(72)를 출력한다. 매칭 유닛(54)은 이 매칭 표시자(72)를 출력하고 그룹(66A)을 결과 생성기 유닛(56)으로 포워딩한다. 몇몇 인스턴스들에서, 매칭 유닛(54)은 매칭 유닛(54)이 어떠한 매칭도 존재하지 않는다고 결정할 때 결과 생성기 유닛(56)이 그룹(66A)을 고려할 필요가 없기 때문에, 매칭 표시자(72)가 매칭하는 경우에만 그룹(66A)을 포워딩한다. At any event, if the result of dividing the group average 68A by the group average 68B is below the threshold (the reference feature descriptors 51 associated with the group 66B may be from those associated with the group 66A). Matching unit 54 indicates that the current of feature descriptor 40 uniquely matches one of the reference feature descriptors 51 associated with the least of the distances 62. The matching indicator 72 is outputted. Matching unit 54 outputs this matching indicator 72 and forwards group 66A to result generator unit 56. In some instances, the matching indicator 54 is not matched because the result generator unit 56 does not need to consider the group 66A when the matching unit 54 determines that no match exists. Only group 66A is forwarded if there is a match.

[0054] 결과 생성기 유닛(56)은 질의 결과 데이터(42)를 생성하는 유닛을 표현한다. 결과 생성기 유닛(56)은 리스트 형성 유닛(58) 및 보트 할당 유닛(60)을 포함한다. 리스트 형성 유닛(58)은 그룹(66A)에 기초하여 고유한 기준 영상들(74)의 리스트를 형성하는 유닛을 표현한다. 보다 구체적으로, 리스트 형성 유닛(58)은 매칭 표시자(72)가 매칭을 시그널링할 때, 그룹(66A)의 거리들(62)이 컴퓨팅된 기준 특징 기술자들(51)의 것들을 결정하도록 그룹(66A)을 프로세싱할 수 있다. 리스트 생성 유닛(58)은 이어서 기준 특징 기술자들(51)의 이러한 것들 각각이 추출된 기준 영상들(53)을 리트리브하고 초기 기준 영상 리스트(76)를 형성한다. 리스트 형성 유닛(58)은 이어서 고유한 기준 영상 리스트(74)를 생성하기 위해 초기 기준 영상 리스트(76)로부터 임의의 중복 영상들을 제거할 수 있다. Result generator unit 56 represents a unit that generates query result data 42. The result generator unit 56 includes a list forming unit 58 and a boat assignment unit 60. The list forming unit 58 represents a unit for forming a list of unique reference images 74 based on the group 66A. More specifically, the list forming unit 58 determines that the distances 62 of the group 66A determine those of the computed reference feature descriptors 51 when the matching indicator 72 signals a match. 66A). The list generating unit 58 then retrieves the reference images 53 from which each of these of the reference feature descriptors 51 has been extracted and forms the initial reference image list 76. The list forming unit 58 may then remove any duplicate images from the initial reference image list 76 to generate a unique reference image list 74.

[0055] 초기 기준 영상 리스트(76)를 형성하는 것으로서 기술되지만, 리스트 형성 유닛(58)은 이러한 리스트(76)를 반드시 형성할 필요는 있는 것이 아니라 대신 기준 영상들(53)의 고유한 것들 만을 리트리브할 수 있다. 즉, 리스트 형성 유닛(58)은 기준 특징 기술자들(51)의 결정된 것들이 추출된 기준 영상들(53)을 리트리브할 때, 먼저 그것이 이미 영상들(53)의 이러한 하나를 리트리브하였는지를 결정하고 이어서 그것이 영상들(53)의 이러한 하나를 이미 검색하지 않은 경우에만 영상들(53)의 이러한 하나를 리트리브할 수 있다. 영상들(53)의 사본 또는 복제들을 저장하는 것을 방지함으로써, 리스트 형성 유닛(58)은 메모리 소비를 방지할 수 있고, 이는 구현자가 메모리 요건들을 감소하는 것을 가능하게 할 수 있다. 이러한 이유로, 기법들은 고유한 기준 영상 리스트(74)를 컴퓨팅하는 임의의 하나의 방식으로 제한되어선 안 된다. 이로서, 리스트 형성 유닛(58)이 고유한 기준 영상 리스트(74)를 어떻게 컴퓨팅하는지에 무관하게, 리스트 형성 유닛(58)은 보트 할당 유닛(60)에 고유한 기준 영상 리스트(74)를 출력한다. Although described as forming the initial reference image list 76, the list forming unit 58 does not necessarily need to form such a list 76, but instead only the unique ones of the reference images 53. It can retrieve it. That is, when the list forming unit 58 retrieves the extracted reference images 53 where the determined ones of the reference feature descriptors 51 have been extracted, it first determines whether it has already retrieved this one of the images 53 and then it is This one of the images 53 can be retrieved only if this one of the images 53 has not already been retrieved. By preventing storing copies or copies of the images 53, the list forming unit 58 can prevent memory consumption, which can enable an implementer to reduce memory requirements. For this reason, the techniques should not be limited to any one way of computing the unique reference picture list 74. As such, regardless of how the list forming unit 58 computes the unique reference image list 74, the list forming unit 58 outputs the unique reference image list 74 to the boat assignment unit 60. .

[0056] 보트 할당 유닛(60)은 기준 영상들(53)의 하나 이상에 보트들을 할당하는 유닛을 표현한다. 통상적으로, 보트 할당 유닛(60)은 보트 집계 표(78) 내의 엔트리(entry)에 이들 영상들(53) 각각을 연관시킴으로써 고유한 기준 영상 리스트(74)에 의해 식별되는 영상들(53)에 보트들을 할당한다. 보트 할당 유닛(60)은 각각의 엔트리의 보트 카운트를 증가시키도록 영상들(53)과 연관된 각각의 엔트리를 업데이트할 수 있다. 보트 할당 유닛(60)은 위에서 기술된 방식들을 포함해서 임의의 수의 방식들로 보트들을 결정할 수 있다. 즉, 보트는 1과 같은 상수 값일 수 있다. 대안적으로, 보트는 최근접 기준 특징 기술자와 질의 특징 기술자들(40) 중 현재의 것 간의 컴퓨팅된 거리에 비견되는 현재의 컴퓨팅된 거리의 거리 비에 비례할 수 있다. 다른 대안으로서, 보트는 제 1 그룹의 컴퓨팅된 거리들이 최소로부터 최대까지 순서화될 때 제 1 그룹 내의 랭킹에 비례할 수 있다(예를 들어, 제 1 그룹 기준 특징 기술자들 중 하나가 질의 특징 기술자들(40)의 현재의 것에 5번째 최근접일 때, 보트는 1/5일 수 있음).The boat assignment unit 60 represents a unit for assigning boats to one or more of the reference images 53. Typically, the boat assignment unit 60 associates each of these images 53 with an entry in the boat tally table 78 to the images 53 identified by the unique reference image list 74. Allocate boats. Boat assignment unit 60 may update each entry associated with images 53 to increase the boat count of each entry. Boat assignment unit 60 may determine the boats in any number of ways, including those described above. That is, the boat may be a constant value such as one. Alternatively, the boat may be proportional to the distance ratio of the current computed distance compared to the computed distance between the nearest reference feature descriptor and the query feature descriptors 40 present. As another alternative, the boat may be proportional to the ranking in the first group when the computed distances of the first group are ordered from minimum to maximum (eg, one of the first group reference feature descriptors may be a query feature descriptor. When the fifth nearest to the current of 40, the boat may be one fifth).

[0057] 거리 컴퓨팅 유닛(50), 그룹핑 유닛(52), 매칭 유닛(54), 및 결과 생성기 유닛(56)은 질의 특징 기술자들(40) 모두가 이 특징 매칭 프로세스를 경험할 때까지 위에서 기술된 방식으로 지속될 수 있다. 질의 특징 기술자들(40)의 마지막 것이 프로세싱된 후에, 결과 생성기 유닛(56)은 보트 집계 표(78)에 기초하여 질의 결과 데이터(42)를 구성한다. 예를 들어, 결과 생성기 유닛(56)은 임의의 제한 또는 임계치까지 최다 보트들, 제 2의 최다 보트들, 제 3의 최다 보트들 등을 수신한 기준 영상들(53)의 것들을 리트리브하고 이들 영상들(53)을 (특징 기술자 데이터베이스(38) 내의 이들 영상들 각각을 기술하거나 또는 다른 방식으로 연관되는 메타데이터 또는 다른 데이터에 따라) 질의 결과 데이터(42)로서 출력할 수 있다. 대안적으로, 결과 생성기 유닛(56)은 보트를 수신한 영상들(53) 중 임의의 것으로서 (재차, 특징 기술자 데이터베이스(38) 내의 이들 영상들 각각을 기술하거나 또는 다른 방식으로 연관되는 메타데이터 또는 다른 데이터에 따라) 질의 결과 데이터(42)를 형성할 수 있다.The distance computing unit 50, the grouping unit 52, the matching unit 54, and the result generator unit 56 are described above until the query feature descriptors 40 all experience this feature matching process. It can last in a way. After the last of the query feature descriptors 40 has been processed, the result generator unit 56 constructs the query result data 42 based on the boat aggregation table 78. For example, the result generator unit 56 retrieves those of the reference images 53 that received the most boats, the second most boats, the third most boats, etc. up to any limit or threshold and these images. Fields 53 may be output as query result data 42 (depending on metadata or other data describing or otherwise associated with each of these images in feature descriptor database 38). Alternatively, the result generator unit 56 may be any of the images 53 that have received the boat (again, describing metadata or otherwise associated with each of these images in the feature descriptor database 38 or The query result data 42 may be formed according to other data.

[0058] 도 3은 본 개시에서 기술되는 특징 매칭 기법들을 구현하는데 있어 도 1의 예에서 도시된 비주얼 탐색 서버(14)와 같은 비주얼 탐색 서버의 예시적인 동작을 예시하는 흐름도이다. 초기에, 비주얼 탐색 서버(14)는 통상적으로 압축된 특징 기술자들을 정의하는 질의 데이터(30)를 인터페이스(32)를 통해 수신한다(90). 비주얼 탐색 서버(14)의 특징 재구성 유닛(34)은 재구성된 특징 기술자들(40)을 생성하기 위해 질의 데이터(30)에 의해 정의된 압축된 특징 기술자들로부터 질의 특징 기술자들(38)을 재구성한다(92). 특징 재구성 유닛(34)은 특징 매칭 유닛(36)에 재구성된 질의 특징 기술자들(40)을 출력한다. FIG. 3 is a flowchart illustrating an example operation of a visual search server, such as the visual search server 14 shown in the example of FIG. 1, in implementing the feature matching techniques described in this disclosure. Initially, visual search server 14 typically receives 90 interface data 30 via query 32 that defines compressed feature descriptors. Feature reconstruction unit 34 of visual search server 14 reconstructs query feature descriptors 38 from the compressed feature descriptors defined by query data 30 to produce reconstructed feature descriptors 40. (92). Feature reconstruction unit 34 outputs reconstructed query feature descriptors 40 to feature matching unit 36.

[0059] 특징 매칭 유닛(36)은 (도 2의 예에서 도시된 바와 같은) 거리 컴퓨팅 유닛(50)을 통해 이러한 질의 특징 기술자들(40)을 수신한다. 거리 컴퓨팅 유닛(50)은 질의 특징 기술자들(40) 중 하나를 선택한다(94). 거리 컴퓨팅 유닛(50)은 이어서 위에서 기술된 방식으로 특징 기술자 데이터베이스(38)에 저장된 기준 특징 기술자들(51) 중 하나 이상과 질의 특징 기술자들(40) 중 선택된 하나 간의 거리를 컴퓨팅한다(96). 이들 거리들은 거리들(62)로서 도 2의 예에서 도시된다. 거리 컴퓨팅 유닛(50)은 이들 거리들(62)을 그룹핑 유닛(52)에 출력한다. Feature matching unit 36 receives these query feature descriptors 40 via distance computing unit 50 (as shown in the example of FIG. 2). Distance computing unit 50 selects one of query feature descriptors 40 (94). Distance computing unit 50 then computes 96 a distance between one or more of reference feature descriptors 51 and a selected one of query feature descriptors 40 stored in feature descriptor database 38 in the manner described above. . These distances are shown in the example of FIG. 2 as distances 62. The distance computing unit 50 outputs these distances 62 to the grouping unit 52.

[0060] 이들 거리들(62)을 수신 시에, 그룹핑 유닛(52)은 클러스터링 알고리즘(64)에 따라 컴퓨팅된 거리(62)의 제 1 및 제 2 그룹들(66A, 66B)을 결정한다(98). 제 1 그룹(66A)은 제 2 그룹(66B)의 비-제로 거리들(62)의 세트와 상이한 거리들(62)의 비-제로 세트를 포함한다. 즉, 거리들(62)은 제 1 및 제 2 그룹들(66)로 분할되어서, 거리들(62) 중 어느 하나도 양자의 그룹들(66)과 연관될 수 없다. 그룹핑 유닛(52)은 그룹들(66)을 매칭 유닛(54)에 출력한다. Upon receiving these distances 62, the grouping unit 52 determines the first and second groups 66A, 66B of the computed distance 62 according to the clustering algorithm 64 ( 98). The first group 66A includes a non-zero set of distances 62 different from the set of non-zero distances 62 of the second group 66B. That is, the distances 62 are divided into first and second groups 66 so that none of the distances 62 can be associated with both groups 66. The grouping unit 52 outputs the groups 66 to the matching unit 54.

[0061] 그룹들(66)의 수신에 응답하여 매칭 유닛(54)은 제 1 그룹 거리들(66A)의 평균 및 제 2 그룹 거리들(66B)의 평균을 컴퓨팅한다(100). 이들 평균들은 그룹 평균(68A) 및 그룹 평균(68B)으로서 도 1의 예에서 도시되며, 여기서 그룹 평균(68A)은 그룹(66A)의 거리들(62)의 컴퓨팅된 평균을 표현하고 그룹 평균(68B)은 그룹(66B)의 거리들(62)의 컴퓨팅된 평균을 표현한다. 매칭 유닛(54)은 이어서 분할의 결과가 임계치(70) 미만인지를 결정하기 위해 분할의 결과를 임계치(70)에 비교하도록 그룹 평균(58B0로 그룹 평균(68A)을 나눈다(102-106). 결과가 임계치(70) 미만인 경우("예" 106), 매칭 유닛(54)은 질의 특징 기술자들(40) 중 선택된 것이 기준 특징 기술자들(51)의 결정된 가장 근처의 것에 매칭(거리(62)의 견지에서)한다고 식별하는 매칭 표시자(72)를 생성한다(108). 매칭 유닛(54)은 표시자(72) 및 제 1 그룹(66A)을 결과 생성기 유닛(56)에 포워딩한다. In response to receiving the groups 66, the matching unit 54 computes 100 the average of the first group distances 66A and the average of the second group distances 66B. These averages are shown in the example of FIG. 1 as group average 68A and group average 68B, where group average 68A represents the computed average of the distances 62 of group 66A and the group average ( 68B represents the computed average of the distances 62 of group 66B. Matching unit 54 then divides 102-106 the group mean 68A by group mean 58B0 to compare the result of the segmentation to threshold 70 to determine if the result of the segmentation is below threshold 70. FIG. If the result is below the threshold 70 (“YES” 106), the matching unit 54 matches the selected one of the query feature descriptors 40 to the determined nearest one of the reference feature descriptors 51 (distance 62). And a matching indicator 72 identifying 108). The matching unit 54 forwards the indicator 72 and the first group 66A to the result generator unit 56.

[0062] 결과 생성기 유닛(56)은 매칭 표시자(72) 및 그룹(66A)을 수신한다. 매칭 표시자(72) 및 그룹(66A)의 수신에 응답하여, 결과 생성기 유닛(56)은 리스트 형성 유닛(58)을 시동한다. 리스트 형성 유닛(58)은 위에서 기술된 방식으로 거리들(62)의 제 1 그룹(66A)과 연관된 기준 특징 기술자들(51)을 먼저 식별함으로써 초기 기준 영상 리스트(76)를 생성한다(110). 리스트 형성 유닛(58)은 다음으로, 재차 위에서 기술된 방식으로 기준 특징 기술자들(51)의 식별된 것들과 연관된 기준 영상들(53)을 식별한다(112). 리스트 형성 유닛(58)은 초기 기준 영상 리스트(76)로서 식별된 기준 영상들(53)을 저장할 수 있다. 또한, 위에서 기술된 바와 같이, 리스트 형성 유닛(58)은 이어서 식별된 기준 특징 기술자들(51)(또는 초기 기준 영상 리스트(76))에 기초하여 고유한 기준 영상들(53)의 리스트를 생성하며(114), 여기서 이 리스트는 고유한 기준 영상 리스트(74)로서 도 2의 예에서 도시된다. 결과 생성기 유닛(56)은 다음으로, 위에서 기술된 방식으로 고유한 기준 영상들(53)(또는 고유한 기준 영상 리스트(74))의 리스트에 보트들을 할당하는 보트 할당 유닛(60)을 시동할 수 있다(116). 보트 할당 유닛(60)은 고유한 기준 영상 리스트(74)에 의해 식별된 그러한 기준 영상들(53)에 보트들을 할당하도록 보트 집계 표(78)를 유지할 수 있다. [0062] The result generator unit 56 receives the matching indicator 72 and the group 66A. In response to receiving the matching indicator 72 and the group 66A, the result generator unit 56 starts the list forming unit 58. List forming unit 58 generates an initial reference image list 76 by first identifying reference feature descriptors 51 associated with the first group 66A of distances 62 in the manner described above (110). . The list forming unit 58 next identifies 112 reference images 53 associated with the identified ones of the reference feature descriptors 51 in the manner described above again. The list forming unit 58 may store the reference images 53 identified as the initial reference image list 76. Further, as described above, the list forming unit 58 then generates a list of unique reference images 53 based on the identified reference feature descriptors 51 (or the initial reference image list 76). 114, where this list is shown in the example of FIG. 2 as a unique reference picture list 74. The result generator unit 56 then starts the boat assignment unit 60 which assigns the boats to the list of unique reference images 53 (or the unique reference image list 74) in the manner described above. It may be 116. The boat assignment unit 60 may maintain a boat aggregation table 78 to assign boats to those reference images 53 identified by the unique reference image list 74.

[0063] 특징 매칭 유닛(36)은 이어서 그것이 위에서 기술된 방식으로 모든 질의 특징 기술자들(40)의 프로세싱을 완료하였는지를 결정한다(118). 대안적으로 매칭 유닛(54)은 그룹 평균(68B)으로 그룹 평균(68A)을 나눈 결과가 임계치(70) 미만이 아니라고 결정하는 경우(도 3a을 역으로 참조하여, "아니오"(106)), 특징 매칭 유닛(36)은 마찬가지로 그것이 위에서 기술된 방식으로 모든 질의 특징 기술자들(40)의 프로세싱을 완료하였다고 결정할 수 있다(118). 어떤 맥락에서, 특징 매칭 유닛(36)이 모든 질의 특징 기술자들(40의 프로세싱을 그것이 완료하였는지를 결정했는지 여부와 무관하게, 모든 질의 특징 기술자들(40)의 프로세싱을 완료하지 않은 경우,("아니오"(118)), 유닛들(50 내지 60)은 잔여 질의 특징 기술자들(40)을 프로세싱하도록 기술된 방식으로 동작한다(94 내지 118). The feature matching unit 36 then determines 118 whether it has completed the processing of all query feature descriptors 40 in the manner described above. Alternatively, the matching unit 54 determines that the result of dividing the group average 68A by the group average 68B is not less than the threshold 70 (reversely referring to FIG. 3A, "no" 106). The feature matching unit 36 may likewise determine that it has completed the processing of all query feature descriptors 40 in the manner described above (118). In some context, if feature matching unit 36 has not completed processing of all query feature descriptors 40, regardless of whether it has determined that it has completed the processing of all query feature descriptors 40 (“No 118, the units 50-60 operate in a manner described to process residual query feature descriptors 40 (94-118).

[0064] 그러나, 모든 질의 특징 기술자들(40)의 프로세싱을 완료한 경우("예" (118)), 보트 할당 유닛(60)은 위에서 기술된 방식으로 할당된 보트들(보트 집계 표(78)에 의해 정의된 바와 같이)에 기초하여 기준 영상들(53)의 랭크된 리스트를 생성할 수 있다(120). 결과 생성기 유닛(56)은 이어서 기준 영상들(53)의 랭크된 리스트를 포함하도록 질의 결과 데이터(42)를 생성할 수 있다(122). 결과 생성기 유닛(56)은 이어서 클라이언트 디바이스(12)에 대한 인터페이스(32)를 통해 질의 결과 데이터(42)를 전송할 수 있다(124). However, when the processing of all query feature descriptors 40 is completed (“YES” 118), the boat assignment unit 60 is assigned to the boats assigned in the manner described above (boat aggregation table 78). The rank list of the reference images 53 may be generated based on (as defined by)). The result generator unit 56 may then generate query result data 42 to include a ranked list of reference images 53 (122). The result generator unit 56 may then transmit 124 query result data 42 via the interface 32 to the client device 12.

[0065] 도 4를 참조하면 특징 기술자 추출에서 이용하기 위해 결정된 DoG(difference of Gaussian) 피라미드(204)를 예시하는 다이어그램이다. 도 1의 특징 추출 유닛(18)은 가우시안 피라미드(202)에서 임의의 2개의 연속적인 가우시안-블러링된 영상들의 차이를 컴퓨팅함으로써 DoG 피라미드(204)를 구성할 수 있다. 도 1의 예에서 영상 데이터(26)로서 도시된 입력 영상(I(x, y))은 가우시안 피라미드(202)를 구성하도록 점진적으로 가우시안 블러링된다. 가우시안 블러링은 일반적으로 가우시안 블러링된 함수(L(x, y, cσ))가 L(x, y, cσ) = G(x, y, cσ)*I(x, y))로서 정의되도록 스케일 cσ에서 가우시안 블러 함수(G(x, y, cσ))로 원래의 영상(I(x, y))을 컨볼브하는 것을 포함한다. 여기서, G는 가우시안 커널이고, cσ는 영상(I(x, y))을 블러링하는데 이용되는 가우시안 함수의 표준 편차를 나타낸다. c가 변동되면(c0 < c1 < c2 < c3 < c4), 표준 편차(cσ)가 변동되고, 점진적인 블러링이 획득된다. 시그마 σ는 기본 스케일 변수(본질적으로 가우시안 커널의 폭)이다. 초기 영상(I(x, y))이 블러링된 영상들(L)을 생성하기 위해 가우시안(G)으로 증분적으로 컨볼브될 때, 블러링된 영상들(L)은 스케일 공간에서 상수 팩터(c)에 의해 분리된다. 4 is a diagram illustrating a difference of Gaussian (DoG) pyramid 204 determined for use in feature descriptor extraction. The feature extraction unit 18 of FIG. 1 may construct the DoG pyramid 204 by computing the difference between any two consecutive Gaussian-blurred images in the Gaussian pyramid 202. The input image I (x, y) shown as image data 26 in the example of FIG. 1 is gradually Gaussian blurred to form a Gaussian pyramid 202. Gaussian blurring is generally such that a Gaussian blurred function (L (x, y, cσ)) is defined as L (x, y, cσ) = G (x, y, cσ) * I (x, y) Convolving the original image I (x, y) with a Gaussian blur function G (x, y, cσ) at scale cσ. Where G is a Gaussian kernel and cσ represents the standard deviation of the Gaussian function used to blur the image (I (x, y)). If c is changed (c0 <c1 <c2 <c3 <c4), the standard deviation cσ is changed and gradual blurring is obtained. Sigma σ is the basic scale variable (essentially the width of a Gaussian kernel). When the initial image I (x, y) is incrementally convolved with Gaussian G to produce blurred images L, the blurred images L are constant factors in scale space. separated by (c).

[0066] DoG 공간 또는 피라미드(204)에서, D(x, y, a) = L(x, y, cnσ) - L(x, y, cn-1σ)이다. DoG 영상(D(x, y, σ))은 스케일들(cnσ및 cn-1σ)에서 2개의 가까운 가우시안 블러링된 영상들(L) 간의 차이이다. D(x, y, σ)의 스케일은 cnσ와 cn-1σ 사이 어딘가에 있다. 가우시안-블러링된 영상들(L)의 수가 증가하고 가우시안 피라미드(202)에 대해 제공된 근사가 연속적인 공간에 접근하면, 2개의 스케일들은 서로 하나의 스케일로 접근한다. 컨볼브된 영상들(L)은 옥타브(octave)에 의해 그룹핑될 수 있으며, 여기서 옥타브는 표준 편차 σ의 값의 2배에 대응한다. 또한, 승산기들의 값들(k)(예를 들어, c0 < c1 < c2 < c3 < c4)은 고정된 수의 컨볼브된 영상들(L)이 옥타브 당 획득되도록 선택된다. 이어서, DoG 영상들(D)은 옥타브 당 가까운 가우시안-블러링된 영상들(L)로부터 획득될 수 있다. 각각의 옥타브 이후에, 가우시안 영상은 2배로 다운-샘플링되고 이어서 프로세스가 반복된다. In the DoG space or pyramid 204, D (x, y, a) = L (x, y, cnσ)-L (x, y, cn-1σ). The DoG image D (x, y, σ) is the difference between two close Gaussian blurred images L in the scales cnσ and cn-1σ. The scale of D (x, y, σ) lies somewhere between cnσ and cn-1σ. If the number of Gaussian-blurred images L increases and the approximation provided for the Gaussian pyramid 202 approaches a continuous space, the two scales approach one scale with each other. Convolved images L may be grouped by octave, where the octaves correspond to twice the value of the standard deviation σ. In addition, the values k of multipliers (e.g., c0 <c1 <c2 <c3 <c4) are selected such that a fixed number of convolved images L are obtained per octave. DoG images D may then be obtained from close Gaussian-blurred images L per octave. After each octave, the Gaussian image is down-sampled twice and then the process is repeated.

[0067] 특징 추출 유닛(18)은 이어서 영상(I(x, y))에 대한 키포인트들을 식별하기 위해 DoG 피라미드(204)를 이용할 수 있다. 키포인트 검출을 수행 시에, 특징 추출 유닛(18)은 영상 내의 특정한 샘플 지점 또는 픽셀 주위의 로컬 영역 또는 패치가 잠재적으로 흥미로운 패치(기하학적으로 말하면)인지를 결정한다. 일반적으로 특징 추출 유닛(18)은 DoG 공간(204)에서 로컬 최대 및/또는 로컬 최소를 식별하고 DoG 공간(204) 내의 키포인트 위치들로서 이들 최대 및 최소의 위치들을 이용한다. 도 4에 예시된 예에서, 특징 추출 유닛(18)은 패치(206) 내의 키포인트(208)를 식별한다. 로컬 최대 및 최소의 발견(또한 로컬 극단 검출로서 알려짐)은 총 26개의 픽셀들((9x2+8=26))에 대해, DoG 공간(204) 내의 각각의 픽셀(예를 들어, 키포인트(208)에 대한 픽셀)을 동일한 스케일에서의 그의 8개의 이웃하는 픽셀들에 그리고 2개의 측면들 상에서 이웃하는 스케일들 각각 내의 9개의 이웃하는 픽셀들(가까운 패치들(210 및 212)의)에 비교함으로써 달성될 수 있다. 키포인트(208)에 대한 픽셀 값이 패치들(206, 210 및 208)에서 모든 26개의 비교된 픽셀들 사이에서 최대 또는 최소인 경우, 특징 추출 유닛(18)은 이것을 키포인트로서 선택한다. 특징 추출 유닛(18)은 추가로 그의 위치가 보다 정확히 식별되도록 키포인트들을 추가로 프로세싱할 수 있다. 특징 추출 유닛(18)은 몇몇 인스턴스들에서, 저 콘트라스트 키 포인트들 및 에지 키 포인트들과 같은 키포인트들 중 일부를 폐기할 수 있다. Feature extraction unit 18 may then use DoG pyramid 204 to identify keypoints for image I (x, y). In performing keypoint detection, feature extraction unit 18 determines whether a local area or patch around a particular sample point or pixel in the image is a potentially interesting patch (geometrically speaking). In general, feature extraction unit 18 identifies the local maximum and / or local minimum in DoG space 204 and uses these maximum and minimum locations as keypoint locations within DoG space 204. In the example illustrated in FIG. 4, feature extraction unit 18 identifies keypoint 208 in patch 206. The local maximum and minimum discovery (also known as local extreme detection) is based on each pixel in the DoG space 204 (eg, keypoint 208) for a total of 26 pixels ((9x2 + 8 = 26)). Is achieved by comparing it to its eight neighboring pixels at the same scale and to nine neighboring pixels (of close patches 210 and 212) within each of the neighboring scales on the two sides. Can be. If the pixel value for keypoint 208 is the maximum or minimum among all 26 compared pixels in patches 206, 210 and 208, feature extraction unit 18 selects this as a keypoint. Feature extraction unit 18 may further process keypoints such that its location is more accurately identified. Feature extraction unit 18 may discard some of the keypoints, such as low contrast key points and edge key points, in some instances.

[0068] 도 5는 키포인트의 검출을 보다 상세히 예시하는 다이어그램이다. 도 5의 예에서, 패치들(206, 210 및 212) 각각은 3x3 픽셀 영역을 포함한다. 특징 추출 유닛(18)은 우선 흥미의 픽셀(예를 들어, 키포인트(208)를 동일한 스케일에서(예를 들어, 패치(206))의 그의 8개의 이웃하는 픽셀들(302)에, 그리고 키포인트(208)의 2개의 측면들 상의 이웃하는 스케일들 각각 내의 가까운 패치들(210 및 212) 내의 9개의 이웃하는 픽셀들(304 및 306)에 비교한다. 5 is a diagram illustrating in more detail the detection of a keypoint. In the example of FIG. 5, each of the patches 206, 210, and 212 includes a 3 × 3 pixel region. The feature extraction unit 18 first first places the pixel of interest (eg, keypoint 208) at its eight neighboring pixels 302 at the same scale (eg, patch 206), and the keypoint ( Compare to nine neighboring pixels 304 and 306 in close patches 210 and 212 in each of the neighboring scales on the two sides of 208.

[0069] 특징 추출 유닛(18)은 로컬 영상 그라디언트의 방향들에 기초하여 하나 이상의 방위들, 또는 방향들을 각각의 키포인트에 할당할 수 있다. 로컬 영상 특성들에 기초하여 각각의 키포인트에 일관되는 방위를 할당함으로써, 특징 추출 유닛(18)은 이 방위에 대해 키포인트 식별자를 표현하고 이에 따라 영상 회전에 대한 불변성을 달성한다. 특징 추출 유닛(18)은 이어서 키포인트 스케일에서 및/또는 가우시안-블러링된 영상(L)의 키포인트(208) 주위의 이웃하는 영역들에서 각각의 픽셀에 대한 크기 및 방향을 계산한다. (x, y)에 위치되는 키포인트(208)에 대한 그라디언트의 크기는 m(x, y)로서 표현될 수 있고, (x, y)의 키포인트에 대한 그라디언트의 방위 또는 방향은 Γ(x, y)로서 표현될 수 있다. The feature extraction unit 18 may assign one or more orientations, or directions, to each keypoint based on the directions of the local image gradient. By assigning a consistent orientation to each keypoint based on local image characteristics, feature extraction unit 18 represents a keypoint identifier for this orientation and thus achieves invariance to image rotation. The feature extraction unit 18 then calculates the size and direction for each pixel at the keypoint scale and / or in neighboring regions around the keypoint 208 of the Gaussian-blurred image L. FIG. The magnitude of the gradient for keypoint 208 located at (x, y) can be expressed as m (x, y), and the orientation or direction of the gradient relative to the keypoint of (x, y) is Γ (x, y Can be expressed as

[0070] 특징 추출 유닛(18)은 이어서 모든 계산들이 스케일-불변 방식으로 수행되도록 키포인트(208)의 스케일에 대한 최근접 스케일을 갖는 가우시안 스무스드 영상(Gaussian smoothed image)(L)을 선택하도록 키포인트의 스케일을 이용한다. 이 스케일에서 각각의 영상 샘플 L(x, y)에 대해, 특징 추출 유닛(18)은 픽셀 차이들을 이용하여 그라디언트 크기, m(x, y) 및 방위 Γ(x, y)를 컴퓨팅한다. 예를 들어, 크기 m(x, y)는 다음의 수학식(2)에 따라 컴퓨팅될 수 있다:The feature extraction unit 18 then selects a keypoint to select a Gaussian smoothed image L having the nearest scale to the scale of the keypoint 208 such that all calculations are performed in a scale-invariant manner. Use the scale of. For each image sample L (x, y) at this scale, feature extraction unit 18 computes the gradient size, m (x, y) and orientation Γ (x, y) using the pixel differences. For example, the size m (x, y) can be computed according to the following equation (2):

특징 추출 유닛(18)은 다음의 수학식(3)에 따라 방향 또는 방위(Γ(x, y))를 계산할 수 있다:The feature extraction unit 18 may calculate the direction or orientation Γ (x, y) according to the following equation (3):

수학식(3)에서, L(x, y)는 또한 키포인트의 스케일인 스케일 σ에서 가우시안-블러링된 영상(L(x, y, σ))의 샘플을 표현한다. In equation (3), L (x, y) also represents a sample of a Gaussian-blurred image L (x, y, σ) at scale σ, which is the scale of the keypoint.

[0071] 특징 추출 유닛(18)은 DoG 공간에서 키포인트의 평면 위에, 더 높은 스케일에 있는 가우시안 피라미드의 평면에 대해 또는 키포인트 아래, 더 낮은 스케일에 있는 가우시안 피라미드의 평면에서 키포인트에 대한 그라디언트를 일관되게 계산할 수 있다. 각각의 키포인트에 대해, 양자의 방식에서, 특징 추출 유닛(18)은 동일한 스케일에서 키포인트를 둘러싸는 직사각형 영역(예를 들어, 패치)에서 그라디언트를 계산한다. 또한, 영상 신호의 주파수는 가우시안-블러링 영상의 스케일에 반영된다. 아직, SIFT 및 CHoG(compressed histogram of gradients) 알고리즘과 같은 다른 알고리즘은 단순히 패치(예를 들어, 직사각형 지역) 내의 모든 픽셀들의 그라디언트 값들을 이용한다. 패치는 키포인트 주위로 정의되고; 서브-블록들은 블록 내에서 정의되고; 샘플들은 서브-블록들 내에 정의되고, 이러한 구조는 키포인트들의 스케일들이 상이할 때조차도 모든 키포인트들에 대해 동일하게 유지된다. 그러므로 영상 신호의 주파수가 동일한 옥타브에서 가우시안 스무싱 필터들(Gaussian smoothing filters)의 연속적인 적용을 통해 변경되지만, 상이한 스케일들에서 식별된 키포인트들은 스케일에 의해 표현되는 영상 신호의 주파수의 변화와 무관하게 동일한 수의 샘플들로 샘플링될 수 있다. The feature extraction unit 18 consistently gradients the gradient for the keypoint in the plane of the Gaussian pyramid at the lower scale, above or below the plane of the Gaussian pyramid at the higher scale, above the plane of the keypoint in the DoG space. Can be calculated For each keypoint, in both ways, feature extraction unit 18 calculates the gradient in the rectangular region (eg, a patch) surrounding the keypoint at the same scale. In addition, the frequency of the image signal is reflected in the scale of the Gaussian-blurred image. Yet, other algorithms, such as SIFT and the compressed histogram of gradients (CHoG) algorithm, simply use the gradient values of all the pixels in the patch (eg, rectangular area). Patches are defined around keypoints; Sub-blocks are defined within a block; Samples are defined in sub-blocks, and this structure remains the same for all keypoints even when the scales of the keypoints are different. Therefore, although the frequency of the video signal is changed through successive application of Gaussian smoothing filters at the same octave, the keypoints identified at different scales are independent of the change in the frequency of the video signal represented by the scale. It can be sampled with the same number of samples.

[0072] 키포인트 방위를 특성화하기 위해, 특징 추출 유닛(18)은 예를 들어, SIFT를 이용함으로써 그라디언트 방위 히스토그램(도 6 참조)을 생성할 수 있다. 각각의 이웃한 픽셀들의 기여(contribution)는 그라디언트 크기 및 가우시안 윈도우에 의해 가중화될 수 있다. 히스토그램의 피크들을 지배적인 방위들에 대응한다. 특징 추출 유닛(18)은 키포인트 방위에 대한 키포인트의 모든 특성들을 측정할 수 있고, 이는 회전에 대한 불변성을 제공한다. To characterize the keypoint orientation, feature extraction unit 18 may generate a gradient orientation histogram (see FIG. 6), for example, by using SIFT. The contribution of each neighboring pixel can be weighted by the gradient size and the Gaussian window. The peaks of the histogram correspond to the dominant orientations. The feature extraction unit 18 can measure all the characteristics of the keypoint relative to the keypoint orientation, which provides invariance to rotation.

[0073] 일 예에서, 특징 추출 유닛(18)은 각각의 블록에 대한 가우시안-가중화된 그라디언트의 분포를 컴퓨팅하며, 여기서 각각의 블록은 2개의 서브-블록 x 2개의 서브 블록이며 총 4개의 서브-블록들이 있다. 가우시안-가중화된 그라디언트들의 분포를 계산하기 위해, 특징 추출 유닛(18)은 몇 개의 빈(bin)들로 방위 히스토그램을 형성하며 각각의 빈은 키포인트 주위의 영역의 부분을 커버한다. 예를 들어, 방위 히스토그램은 36개의 빈들을 가질 수 있고, 각각의 빈은 360도 범위의 방위들 중 10도를 커버한다. 대안적으로, 히스토그램은 8개의 빈들을 가질 수 있고, 각각은 360도 범위 중 45도를 커버한다. 여기서 기술되는 히스토그램 코딩 기법들은 임의의 수의 빈들의 히스토그램들에 응용 가능할 수 있다는 것이 자명해져야 한다.In one example, feature extraction unit 18 computes a distribution of Gaussian-weighted gradients for each block, where each block is two sub-blocks x two sub-blocks and a total of four There are sub-blocks. To calculate the distribution of Gaussian-weighted gradients, feature extraction unit 18 forms an azimuth histogram with several bins, each bin covering a portion of the area around the keypoint. For example, an azimuth histogram can have 36 bins, each bin covering 10 degrees of azimuths in a 360 degree range. Alternatively, the histogram can have eight bins, each covering 45 degrees in a 360 degree range. It should be apparent that the histogram coding techniques described herein may be applicable to histograms of any number of bins.

[0074] 도 6은 특징 추출 유닛(18)과 같은 특징 추출 유닛이 그라디언트 분포 및 방위 히스토그램을 결정하는 프로세스를 예시하는 다이어그램이다. 여기서, 2-차원 그라디언트 분포(dx, dy)(예를 들어, 블록(406))는 1-차원 분포(예를 들어, 히스토그램(414))로 변환된다. 키포인트(208)는 키포인트(208)를 둘러싸는 블록(406)의 중심(또한 패치, 셀 또는 영역으로 불림)에 위치된다. 피라미드의 각각의 레벨에 대해 사전-컴퓨팅되는 그라디언트들은 각각의 샘플 위치(48)에서 작은 화살표로서 도시된다. 도시된 바와 같이, 샘플들(408)의 영역들은 빈들(410)로서 또한 지칭될 수 있는 서브-블록들(410)을 형성한다. 특징 추출 유닛(18)은 서브-블록들 또는 빈들(410) 내의 각각의 샘플(408)에 가중치를 할당하기 위해 가우시안 가중화 함수를 이용할 수 있다. 가우시안 가중화 함수에 의한 샘플(408) 각각에 할당된 가중치는 빈들(410) 및 키포인트(208)의 중심으로부터 스무스하게 떨어진다. 가우시안 가중화 함수의 목적은 기술자의 중심으로부터 멀리 떨어진 그라디언트들에 더 적은 강조(emphasis)를 제공하고 윈도우의 위치에서 작은 변경들을 통해 기술자의 급작스런 변경들을 방기하기 위한 것이다. 특징 추출 유닛(18)은 차원 특징 기술자를 발생시키는 히스토그램의 각각의 빈의 8개의 방위들을 갖는 방위 히스토그램(412)의 어레이를 결정한다. 예를 들어, 방위 히스토그램(413)은 서브-블록(410)에 대한 그라디언트 분포에 대응할 수 있다. FIG. 6 is a diagram illustrating a process by which a feature extraction unit, such as feature extraction unit 18, determines the gradient distribution and the orientation histogram. Here, the two-dimensional gradient distribution (dx, dy) (e.g., block 406) is converted to a one-dimensional distribution (e.g., histogram 414). Keypoint 208 is located at the center (also called a patch, cell or area) of block 406 that surrounds keypoint 208. Gradients pre-computed for each level of the pyramid are shown as small arrows at each sample location 48. As shown, the regions of samples 408 form sub-blocks 410, which may also be referred to as bins 410. Feature extraction unit 18 may use a Gaussian weighting function to assign a weight to each sample 408 in sub-blocks or bins 410. The weight assigned to each of the samples 408 by the Gaussian weighting function falls smoothly from the center of the bins 410 and the keypoint 208. The purpose of the Gaussian weighting function is to provide less emphasis to gradients farther away from the center of the descriptor and to avoid abrupt changes of the descriptor through small changes in the position of the window. Feature extraction unit 18 determines an array of orientation histograms 412 with eight orientations of each bin of the histogram that generates a dimensional feature descriptor. For example, azimuth histogram 413 may correspond to the gradient distribution for sub-block 410.

[0075] 몇몇 인스턴스들에서, 특징 추출 유닛(18)은 그라디언트 분포들을 획득하기 위해 다른 타입들의 양자화 빈 성상도들(quantization bin constellations)(예를 들어, 상이한 보로노이(Voronoi) 셀 구조들을 가짐)을 이용할 수 있다. 이러한 다른 타입들의 빈 성상도들은 마찬가지로 소프트 비닝(soft binning)의 형태를 이용할 수 있으며, 여기서 소프트 비닝은 이른바 DAISY 구성이 이용될 때 정의된 것들과 같은 오버랩하는 빈들을 지칭한다. 도 6의 예에서, 그러나 3개의 소프트 빈들이 정의되지만, 키포인트(208) 주위의 원형 구성(402)으로 일반적으로 위치되는 중심과 함께 9 또는 그 초과만큼 많이 이용될 수 있다. In some instances, feature extraction unit 18 may have different types of quantization bin constellations (eg, having different Voronoi cell structures) to obtain gradient distributions. Can be used. These other types of bin constellations can likewise use the form of soft binning, where soft binning refers to overlapping bins, such as those defined when a so-called DAISY configuration is used. In the example of FIG. 6, however, three soft bins are defined, but as many as nine or more may be used with the center generally located in the circular configuration 402 around the keypoint 208.

[0076] 여기서 이용되는 바와 같이, 히스토그램은 빈들로서 알려진 다양한 분리된 카테고리들 내에 있는 관찰들, 샘플 또는 발생(예를 들어, 그라디언트들)의 수를 카운트하는 맵(k_i)이다. 히스토그램의 그래프는 히스토그램을 표현하기 위한 단지 하나의 방식이다. 따라서 k가 관찰들, 샘플들, 또는 발생들의 총 수이고 m인 빈들의 총 수인 경우, 히스토그램(k_i)의 주파수는 수학식(4)으로서 표현되는 다음의 조건을 만족시킨다:As used herein, a histogram is a map k _i that counts the number of observations, samples, or occurrences (eg, gradients) within various separate categories known as bins. Graphs of histograms are just one way to represent histograms. Thus, if k is the total number of observations, samples, or occurrences and the total number of bins, m, then the frequency of the histogram k _i satisfies the following condition, expressed as equation (4):

여기서

는 합산 연산자이다. here

Is an addition operator.

[0077] 특징 추출 유닛(18)은 키포인트의 스케일에 1.5배인 표준 편차를 갖는 가우시안-가중화된 함수에 의해 정의된 그의 그라디언트 크기에 의해 히스토그램들(412)에 부가된 각각의 샘플을 가중화한다. 결과적인 히스토그램(414)의 피크들은 로컬 그라디언트들의 지배적인 방향들에 대응한다. 특징 추출 유닛(18)은 이어서 히스토그램에서 최고 피크를 검출하고 이어서 최고 피크의 특정한 퍼센티지, 예를 들어, 80% 내에 있는 임의의 다른 로컬 피크를 검출한다(이것은 그 배향을 갖는 키포인트를 또한 생성하는데 또한 이용할 수 있음). 그러므로 유사한 크기의 다수의 피크들을 갖는 위치들에 대해, 특징 추출 유닛(18)은 위치 및 스케일이 동일하지만 상이한 방위로 생성되는 다수의 키포인트들을 추출한다. Feature extraction unit 18 weights each sample added to histograms 412 by its gradient size defined by a Gaussian-weighted function having a standard deviation 1.5 times the scale of the keypoint. . The peaks of the resulting histogram 414 correspond to the dominant directions of the local gradients. Feature extraction unit 18 then detects the highest peak in the histogram and then detects a certain percentage of the highest peak, for example any other local peak that is within 80% (which also generates a keypoint with that orientation) Available). Therefore, for locations with multiple peaks of similar size, feature extraction unit 18 extracts multiple keypoints that are the same in position and scale but generated in different orientations.

[0078] 특징 추출 유닛(18)은 이어서 일 타입으로서 히스토그램을 표현하는 타입 양자화로서 지칭되는 양자화의 형태를 이용하여 히스토그램을 양자화한다. 이 방식으로, 특징 추출 유닛(18)은 각각의 키포인트에 대한 기술자를 추출할 수 있고, 여기서 각각의 기술자는 일 타입의 형태의 가우시안-가중화된 그라디언트들의 분포들의 위치(x, y) 및 방위 및 기술자에 의해 특성화될 수 있다. 이러한 방식으로, 영상은 하나 이상의 키포인트 기술자들(또한 영상 기술자들로서 지칭됨)에 의해 특성화될 수 있다. Feature extraction unit 18 then quantizes the histogram using a form of quantization referred to as type quantization that represents the histogram as one type. In this way, feature extraction unit 18 can extract a descriptor for each keypoint, where each descriptor is the position (x, y) and orientation of distributions of Gaussian-weighted gradients of one type of form. And technicians. In this way, an image may be characterized by one or more keypoint descriptors (also referred to as image descriptors).

[0079] 도 4 내지 도 6에서 도시된 위의 예들이 SIFT 알고리즘에 의해 수행되는 특징 추출의 일 예를 제공하지만, 기법들은 CHoG(compressed histogram of gradient) 알고리즘과 같은 임의의 다른 비주얼 탐색 알고리즘에 따라 추출되는 특징 기술자들에 관해 구현될 수 있다. 그러므로 기법들은 이러한 점에서 SIFT 알고리즘을 이용하여 추출된 특징 기술자들만으로 제한되어야 하는 것이 아니라, 임의의 비주얼 탐색 알고리즘에 따라 추출된 특징 기술자들에 관하여 구현될 수 있다. While the above examples shown in FIGS. 4-6 provide an example of feature extraction performed by a SIFT algorithm, the techniques are in accordance with any other visual search algorithm, such as a compressed histogram of gradient (CHoG) algorithm. It may be implemented with respect to the feature descriptors being extracted. Therefore, the techniques should not be limited to feature descriptors extracted using SIFT algorithm in this respect, but may be implemented with respect to feature descriptors extracted according to any visual search algorithm.

[0080] 도 7은 본 개시에서 기술되는 기법들에 따라 특징 매칭을 수행하는데 있어 도 1의 특징 매칭 유닛(36)과 같은 특징 매칭 유닛의 예시적인 동작을 예시하는 다이어그램이다. 도 7의 예에서, 질의 영상 데이터(500) 및 기준 영상 데이터(502)가 도시되며, 여기서 질의 영상 데이터(500)는 기준 영상 데이터(502)의 좌측에 도시된다. 흑색 라인들은 특징 기술자 데이터베이스(38)에 저장되고 기준 영상 데이터(502)로부터 추출된 (우측 화살표 머리에 의해 표현되는) 기준 특징 기술자들과 질의 영상 데이터로부터 추출된 (이용 가능한 경우 좌측 화살표 머리에 의해 표현되는) 질의 특징 기술자들 간의 관계를 도시한다. FIG. 7 is a diagram illustrating an example operation of a feature matching unit, such as feature matching unit 36 of FIG. 1, in performing feature matching in accordance with the techniques described in this disclosure. In the example of FIG. 7, query image data 500 and reference image data 502 are shown, where query image data 500 is shown to the left of reference image data 502. The black lines are stored in feature descriptor database 38 and by reference feature descriptors (represented by the right arrow head) extracted from reference image data 502 and by the left arrow heads (if available) extracted from query image data. The relationship between the query feature descriptors (expressed).

[0081] 질의 영상 데이터(500)는 클라이언트 디바이스(12)와 같은 클라이언트 디바이스에 의해 캡처되거나 또는 다른 방식으로 획득된 피사의 사탑의 영상을 도시한다. 질의 영상 데이터(500) 및 기준 영상 데이터(502)는 상이한 (비록 유사하지만) 관점에서 그리고 상이한 스케일들로서 캡처되는 피사의 사탑의 영상들을 도시한다. 비주얼 탐색 서버(14)의 특징 매칭 유닛(36)은 이 예에서 좌측 화살표들로 표시되는 특징들을 표현하는 질의 특징 기술자들(40)을 수신한다. 특징 매칭 유닛(36)은 우측 화살표에 의해 표시되는 대응하는 특징들을 표현하는 매칭 기준 특징 기술자들(51)(도 2를 참조)을 결정한다. Query image data 500 shows an image of a Leaning Tower of Pisa captured or otherwise acquired by a client device, such as client device 12. Query image data 500 and reference image data 502 show images of the Leaning Tower of Pisa captured from different (although similar) viewpoints and as different scales. Feature matching unit 36 of visual search server 14 receives query feature descriptors 40 that represent the features indicated by the left arrows in this example. Feature matching unit 36 determines matching criteria feature descriptors 51 (see FIG. 2) that represent the corresponding features indicated by the right arrow.

[0082] 이 예에서, 피사의 사탑은 피사의 사탑을 둘러싸는 반복 아치들의 형태의 다수의 반복 구조들을 포함한다. 특히 504A 내지 504C로 표시되는 3개의 흑색 라인들은 특징 기술자 공간에서 서로 근접할 수 있는 기준 특징 기술자들과 대응하는 질의 간의 잠재적인 매칭을 예시한다. 종래의 비주얼 탐색 알고리즘에서 공통적이었던 바와 같이 고유하지 않은 것으로서 이들 매칭들을 거절하기 보단 오히려, 특징 매칭 유닛(36)은 이들 라인들(504A 내지 504C)의 좌측 화살표들에 의해 표현되는 대응하는 질의 특징 기술자들(40)의 각각에 대하여, 서로에 관해 제 1 그룹(66A)에서 그룹핑되는 거리(62)(재차 도 2 참조)를 결정할 수 있다. In this example, the leaning tower of Pisa includes a number of repeating structures in the form of repeating arches surrounding the leaning tower of Pisa. In particular, the three black lines, labeled 504A-504C, illustrate a potential match between a reference feature descriptor and a corresponding query that may be close to each other in the feature descriptor space. Rather than rejecting these matches as not unique as was common in conventional visual search algorithms, feature matching unit 36 does not have a corresponding query feature descriptor represented by the left arrows of these lines 504A-504C. For each of the fields 40, the distance 62 grouped in the first group 66A with respect to each other (see FIG. 2 again) may be determined.

[0083] 예를 들어, 라인(504A)에 의해 표시되는 매칭을 고려하면, 특징 매칭 유닛(36)은 라인(504A)의 좌측 화살표에 의해 표현되는 질의 특징 기술자들(40)이 라인들(504A 내지 504C는 물론 504D 내지 504G) 각각에 대한 우측 화살표들에 의해 표현되는 기준 특징 기술자들(51)에 매칭한다고 결정할 수 있다. 특징 매칭 유닛(36)은 위에서 기술된 방식으로 라인들(504A 내지 504G)과 연관되는 기준 특징 기술자들(51)에 대해 라인(504A)과 연관된 질의 특징 기술자(40)에 관한 거리들(62)을 결정할 수 있다. 다음으로, 특징 매칭 유닛(36)은 라인(504A)과 연관된 질의 특징 기술자 및 라인들(504A 내지 504C)과 연관된 기준 특징 기술자들에 관해 컴퓨팅된 거리들(62)을 그룹핑할 수 있고 다른 그룹은 모든 잔여 거리들을 포함한다. 특징 매칭 유닛(36)은 이어서 제 1 및 제 2 그룹 거리들(66)을 평균화하고 제 2 그룹 평균(68B)으로 제 1 그룹 평균(68A)을 나누고 나눈 결과를 임계치(70)에 비교할 수 있다. 특징 매칭 유닛(36)은 이 예에서 고유한 매칭을 결정하고 그에 의해 라인(504A)으로서 도시된 매칭을 결정한다. For example, considering the match indicated by line 504A, feature matching unit 36 may have query feature descriptors 40 represented by the left arrow of line 504A in lines 504A. To 504C as well as to match the reference feature descriptors 51 represented by the right arrows for each of 504D to 504G). Feature matching unit 36 provides distances 62 with respect to query feature descriptor 40 associated with line 504A with respect to reference feature descriptors 51 associated with lines 504A-504G in the manner described above. Can be determined. Next, the feature matching unit 36 may group the computed distances 62 with respect to the query feature descriptor associated with the line 504A and the reference feature descriptors associated with the lines 504A-504C and another group Include all remaining distances. The feature matching unit 36 may then average the first and second group distances 66 and divide and divide the first group average 68A by the second group average 68B and compare the result to the threshold 70. . Feature matching unit 36 determines unique matching in this example and thereby determines the matching shown as line 504A.

[0084] SIFT와 같은 비주얼 탐색 알고리즘들의 종래의 특징 매칭 양상들은 클러스터링 알고리즘을 이용하지 않음으로써 라인(504A)에 의해 표현되는 매칭을 거절할 수 있다. 이러한 매칭의 거절은 라인(504A)의 좌측 화살표에 의해 표현되는 특징으로부터 추출된 질의 특징 기술자(40) 및 라인(504A)의 우측 화살표에 의해 표현되는 특징으로부터 추출되는 기준 특징 기술자(51)가 최소일 것이기 때문에 발생할 수 있다. 이 예에서, 다음의 최소 거리는 라인(504A)의 좌측 화살표에 의해 표현되는 특징으로부터 추출된 질의 특징 기술자(40)와 라인(504B)의 우측 화살표에 의해 표현되는 특징으로부터 추출되는 기준 특징 기술자(51) 간의 거리일 것이다. 이 다음의 최소 거리는 거의 위에서 언급된 최소 거리 만큼 작을 수 있다. 따라서 비주얼 탐색 알고리즘들의 종래의 특징 매칭 양상들에 따라 다음의 최소 거리로 최소 거리를 나누면 임계치보다 클 수 있는 1에 가까운 번호가 발생할 수 있으며, 이는 통상적으로 0.5와 같이 1의 임의의 단편으로 세팅된다. Conventional feature matching aspects of visual search algorithms, such as SIFT, may reject the match represented by line 504A by not using a clustering algorithm. This rejection of matching is minimal with the query feature descriptor 40 extracted from the feature represented by the left arrow of line 504A and the reference feature descriptor 51 extracted from the feature represented by the right arrow of line 504A. It can happen because it will be. In this example, the next minimum distance is the query feature descriptor 40 extracted from the feature represented by the left arrow of line 504A and the reference feature descriptor 51 extracted from the feature represented by the right arrow of line 504B. Will be the distance between This next minimum distance can be as small as the above mentioned minimum distance. Thus, according to conventional feature matching aspects of visual search algorithms, dividing the minimum distance by the next minimum distance may result in a number close to 1, which may be greater than the threshold, which is typically set to any fragment of 1, such as 0.5. .

[0085] 따라서 본 개시에서 기술되는 기법들에 따라 그룹들을 결정하기 위해 클러스터링 알고리즘을 이용함으로써, 특징 매칭 유닛(36)은 반복 특징 패턴들을 참조하는 이들 비교적 작은 거리들(62) 모두를 제 1 그룹(66A)으로 그룹핑할 수 있는 반면에, 모든 다른 거리들(62)은 제 2 그룹(그룹 66B)으로 그룹핑될 수 있다. 특징 매칭 유닛(36)은 이어서 이들 그룹들 각각의 비교를 용이하게 하도록 이들 거리들을 평균화하고 가능하게는 고유한 매칭으로 간주되는 것을 거절하는 것을 방지하도록 이 평균들을 비교한다. 그 결과, 기법들은 특히, 반복 특징들 또는 양상들을 갖는 영상들에 관하여 특징 매칭을 개선할 수 있다. Thus, by using a clustering algorithm to determine groups in accordance with the techniques described in this disclosure, feature matching unit 36 causes all of these relatively small distances 62 to refer to repeating feature patterns to the first group. All other distances 62 can be grouped into a second group (group 66B), while grouping into 66A. The feature matching unit 36 then compares these averages to average these distances to facilitate comparison of each of these groups and possibly to avoid rejecting what is considered a unique match. As a result, the techniques may improve feature matching, particularly with respect to images with repetitive features or aspects.

[0086] 또한, 틀린 매칭의 수락에 관한 제어를 유지하면서 잠재적인 매칭들의 거절을 방지하기 위해, 본 개시의 강건한 거리 비 기법들은 위에서 기술된 보다 차별적인 유사사어 측정을 제공한다. 강건한 거리 비 기법들은 k-차원 트리(KD-트리)-베스트 빈 퍼스트(Best Bin First)를 이용하여 최초의 가장 근처의 이웃에 대해 질의하면서 저장된 중간 데이터를 이용함으로써 약간의 내지 전혀 없는 오버헤드 컴퓨팅 비용으로 수행될 수 있다. In addition, the robust distance ratio techniques of the present disclosure provide the more discriminatory quasi-measurement measures described above, in order to prevent rejection of potential matches while maintaining control over acceptance of the wrong match. Robust distance-ratio techniques use little to no overhead computing by using stored intermediate data while querying the first nearest neighbor using the k-dimensional tree-Best Bin First. It can be done at a cost.

[0087] 하나 이상의 예들에서, 기술된 기능들은 하드웨어, 소프트웨어, 펌웨어, 또는 이들의 임의의 조합으로 구현될 수 있다. 소프트웨어로 구현되는 경우, 기능들은 컴퓨터-판독 가능한 매체 상의 하나 이상의 명령들 또는 코드로서 저장되거나, 또는 이를 통해 전송될 수 있다. 컴퓨터-판독 가능한 매체들은 컴퓨터 데이터 저장 매체들을 포함할 수 있다. 데이터 저장 매체들은 본 개시에서 기술된 기법들의 구현을 위해 명령들, 코드 및/또는 데이터 구조들을 리트리브하도록 하나 이상의 컴퓨터들 또는 하나 이상의 프로세서들에 의해 액세스될 수 있는 임의의 이용 가능한 매체들일 수 있다. 여기서 이용되는 바와 같은 "데이터 저장 매체"는 제조품을 지칭하며 일시적인 전파 신호들을 지칭하지 않는다. 제한이 아닌 예로서, 이러한 컴퓨터-판독 가능한 매체들은 RAM, ROM, EEPROM, CD-ROM 또는 다른 광학 디스크 저장소, 자기 디스크 저장 또는 다른 자기 저장 디바이스들, 플래시 메모리, 또는 컴퓨터에 의해 액세스될 수 있고 데이터 구조 또는 명령들의 형태로 원하는 프로그램 코드를 저장하는데 이용될 수 있는 임의의 다른 매체를 포함할 수 있다. 본 명세서에서 사용되는 디스크(disk) 및 디스크(disc)는 컴팩트 디스크(disc)(CD), 레이저 디스크(disc), 광 디스크(disc), 디지털 다기능 디스크(disc)(DVD), 플로피 디스크(disk), 및 블루-레이 디스크(disc)를 포함하며, 여기서 디스크(disk)들은 데이터를 보통 자기적으로 재생하지만, 디스크(disc)들은 레이저를 이용하여 광학적으로 데이터를 재생한다. 상기한 것들의 조합들 역시 컴퓨터 판독가능 매체들의 범위 내에 포함되어야 한다.In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and / or data structures for implementation of the techniques described in this disclosure. "Data storage medium" as used herein refers to an article of manufacture and does not refer to transient propagation signals. By way of example, and not limitation, such computer-readable media may be RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or computer accessible data. It can include any other medium that can be used to store the desired program code in the form of structures or instructions. As used herein, a disc and a disc may be a compact disc (CD), a laser disc, an optical disc, a digital versatile disc (DVD), a floppy disc ), And Blu-ray discs, where discs usually reproduce data magnetically, while discs reproduce data optically using a laser. Combinations of the above should also be included within the scope of computer readable media.

[0088] 코드는 하나 이상의 DSP들(digital signal processors), 범용 마이크로프로세서들, ASIC들(application specific integrated circuits), FPGA(field programmable logic arrays)들, 또는 다른 등가의 통합된 또는 이상 로직 회로와 같은 하나 이상의 프로세서들에 의해 실행될 수 있다. 이에 따라, 여기서 이용된 바와 같은 용어 "프로세서"는 위의 구조 또는 여기서 기술된 기법들의 구현에 적합한 임의의 다른 구조 중 임의의 것을 지칭할 수 있다. 또한, 몇몇 양상들에서, 여기서 기술된 기능은 코딩 및 디코딩을 위해 구성되는 전용 하드웨어 및/또는 소프트웨어 모듈들 내에 제공될 수 있거나 조합된 코덱에 통합될 수 있다. 또한, 기법들은 하나 이상의 회로들 또는 로직 엘리먼트들에서 완전히 구현될 수 있다. The code may be one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or abnormal logic circuits. It may be executed by one or more processors. As such, the term “processor” as used herein may refer to any of the above structures or any other structure suitable for the implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and / or software modules configured for coding and decoding or integrated into a combined codec. Techniques may also be fully implemented in one or more circuits or logic elements.

[0089] 본 개시의 기법들은 무선 핸드셋, 집적 회로(IC) 또는 IC들의 세트(예를 들어, 칩 세트)를 포함하는 매우 다양한 디바이스들 또는 장치들로 구현될 수 있다. 다양한 컴포넌트들, 모듈들, 또는 유닛들은 기재된 기법들을 수행하도록 구성된 디바이스들의 기능적인 양상들을 강조하기 위해 본 개시에서 기술되었지만, 반드시 상이한 하드웨어 유닛들에 의한 실현을 요구하는 것은 아니다. 오히려, 위에서 기술된 바와 같이, 다양한 유닛들은 컴퓨터-판독 가능한 매체 상에 저장된 적합한 소프트웨어 및/또는 펌웨어와 함께 위에서 기술된 바와 같은 하나 이상의 프로세서들을 포함하는 상호작용식 하드웨어 유닛들의 모음에 의해 또는 코덱 하드웨어 유닛으로 조합될 수 있다. The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (eg, a chip set). Various components, modules, or units have been described in this disclosure to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be coded hardware or by a collection of interactive hardware units including one or more processors as described above in conjunction with suitable software and / or firmware stored on a computer-readable medium. Can be combined into units.

다응? 예들이 기술되었다. 이들 및 다른 예들은 다음의 청구항들의 범위 내에 있다.
Are you? Examples have been described. These and other examples are within the scope of the following claims.

Claims

A method for performing visual navigation via a visual navigation device, the method comprising:
Computing, via the visual search device, a distance between a query feature descriptor provided by a visual search query and each of a plurality of reference feature descriptors, wherein the visual search query initiates the visual search;
Determining, via the visual navigation device, one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm, wherein the first group of computed distances is determined by the plurality of criteria. The second group of computed distances includes those of computed distances indicating that an association of feature descriptors is near to the query feature descriptor relative to those of the computed distances determined to be within the second group of computed distances. The computed distances indicating that the association of the plurality of reference feature descriptors is away from the query feature descriptor relative to those of the computed distances determined to be within one or more first groups of the computed distances; And
Via the visual search device, a plurality of reference feature descriptors with which the query feature descriptor is associated with the least of the computed distances based on the determined first group of computed distances and the second group of computed distances. Determining whether to match one of the
/ RTI >
Method for performing visual navigation.

The method of claim 1,
Determining the one or more first group of computed distances and the second group of computed distances,
one or more first groups of computed distances and the computed distances according to one or more of a k-means clustering algorithm, a Gaussian fitting algorithm, and a graph cutting algorithm Determining a second group of subjects,
Method for performing visual navigation.

The method of claim 1,
Determining if the query feature descriptor matches one of the plurality of reference feature descriptors,
Computing at least two averages of the computed distances determined to be within the first group to produce a first group distance average;
Computing an average of the computed distances determined to be within the second group to produce a second group distance average;
Dividing the first group distance average by the second group distance average to produce an average distance ratio measure;
Comparing the average distance ratio measurement to a threshold; And
Based on the comparison, determining whether the query feature descriptor matches one of a plurality of reference feature descriptors associated with the minimum of the computed distances.
/ RTI >
Method for performing visual navigation.

The method of claim 1,
A reference such that the computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images include reference images associated with those of the plurality of reference feature descriptors computed for the corresponding group of reference feature descriptors. Determining a unique group of images;
Assigning a bot to each of the reference images determined to be in a unique group of the reference images;
Ordering the reference images based on the assigned boat; And
Returning ordered reference images in response to the visual search query
&Lt; / RTI >
Method for performing visual navigation.

5. The method of claim 4,
Assigning a boat to each of the reference images determined to be in a unique group of the reference images,
Assigning a constant boat to each of the reference images;
Assigning a boat in proportion to a distance ratio of a corresponding computed distance that is comparable to the computed distance between the query feature descriptor and the nearest one of the plurality of reference feature descriptors; And
Allocating a boat in proportion to the rank of the corresponding computed distance in the first group when the computed distances of the first group are ordered from minimum to maximum
&Lt; / RTI >
Method for performing visual navigation.

The method of claim 1,
The query feature descriptor includes one of a plurality of query feature descriptors provided by the visual search query and extracted from query image data according to a visual search algorithm,
The method comprises:
For each of the plurality of query feature descriptors, computing a distance between a current one of the plurality of query feature descriptors and each of the plurality of reference feature descriptors to produce a plurality of computed distances;
For each of the plurality of distances, determining a first group of the computed plurality of distances and a second group of the computed plurality of distances according to the clustering algorithm-a first group of the computed plurality of distances Includes those of computed distances indicating that the associated of the plurality of reference feature descriptors is near to the query feature descriptor relative to those of the computed plurality of distances determined to be within the second group of the computed plurality of groups. And wherein the second group of computed plurality of distances is associated with the plurality of query features relative to those of the computed plurality of distances that are associated with the plurality of reference feature descriptors within a first group of the computed plurality of distances. The com which marks the engineer as being separated from the current one Tingdoen including ones of the plurality of distances; And
With the visual search device, each of the plurality of query feature descriptors is configured to determine a minimum of the computed plurality of distances based on the determined first group of the computed plurality of distances and the second group of the computed plurality of distances. Determining one of a plurality of reference feature descriptors associated with the
&Lt; / RTI >
Method for performing visual navigation.

The method according to claim 6,
For each of the plurality of query feature descriptors, the plurality of computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images are computed for the corresponding group of reference feature descriptors. Determining a unique group of reference images to include reference images associated with those of the reference feature descriptors;
Assigning a boat to each of the reference images determined to be in a unique group of reference images determined for each of the plurality of query feature descriptors;
After allocating boats for each of the plurality of query feature descriptors, ordering the reference images based on the assigned boats; And
Returning ordered reference images in response to the visual search query
&Lt; / RTI >
Method for performing visual navigation.

The method of claim 1,
The feature descriptor of the above quality,
Extracted from the query image according to a local feature-based visual search algorithm,
Method for performing visual navigation.

The method of claim 8,
The local-feature based visual search algorithm,
Including a scale invariant feature transform (SIFT) algorithm,
Method for performing visual navigation.

The method of claim 1,
Receiving the query feature descriptor as a compressed query feature descriptor from a client device through an interface of a visual navigation device; And
Reconstructing the query feature descriptor from the compressed query feature descriptor via the visual search device.
&Lt; / RTI >
Method for performing visual navigation.

An apparatus for performing visual navigation,
Means for computing a distance between each of a plurality of reference feature descriptors and a query feature descriptor provided by a visual search query, wherein the visual search query initiates the visual search;
Means for determining one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm, wherein the first group of computed distances is associated with the plurality of reference feature descriptors. And those of computed distances indicating that the query feature descriptor is near relative to those of the computed distances determined to be within the second group of computed distances, wherein the second group of computed distances is the plurality of reference feature descriptors. Includes those of computed distances indicating that the association of the distance from the query feature descriptor is relative to those of the computed distances determined to be within the one or more first groups of computed distances; And
Determining whether the query feature descriptor matches one of a plurality of reference feature descriptors associated with the least of the computed distances based on the determined first group of computed distances and the second group of computed distances. step
/ RTI >
Device for performing visual navigation.

The method of claim 11,
one or more first groups of computed distances and the computed distances according to one or more of a k-means clustering algorithm, a Gaussian fitting algorithm, and a graph cutting algorithm Means for determining a second group of people
&Lt; / RTI >
Device for performing visual navigation.

The method of claim 11,
Means for computing two or more averages of the computed distances determined to be within the first group to produce a first group distance average;
Means for computing an average of the computed distances determined to be within the second group to produce a second group distance average;
Means for dividing the first group distance average by the second group distance average to produce an average distance ratio measurement;
Means for comparing the average distance ratio measurement to a threshold; And
Means for determining whether the query feature descriptor matches one of a plurality of reference feature descriptors associated with the minimum of the computed distances based on the comparison
Including,
Device for performing visual navigation.

The method of claim 11,
A reference such that the computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images include reference images associated with those of the plurality of reference feature descriptors computed for the corresponding group of reference feature descriptors. Means for determining a unique group of images;
Means for assigning a bot to each of the reference images determined to be in a unique group of the reference images;
Means for ordering the reference images based on the assigned boat; And
Means for returning ordered reference images in response to the visual search query
&Lt; / RTI >
Device for performing visual navigation.

15. The method of claim 14,
Means for assigning a constant boat to each of the reference images;
Means for assigning a boat in proportion to a distance ratio of a corresponding computed distance that is comparable to a computed distance between the query feature descriptor and the nearest one of the plurality of reference feature descriptors; And
Means for allocating a boat in proportion to the rank of the corresponding computed distance in the first group when the computed distances of the first group are ordered from minimum to maximum
More than one,
Device for performing visual navigation.

The method of claim 11,
The query feature descriptor includes one of a plurality of query feature descriptors provided by the visual search query and extracted from query image data according to a visual search algorithm,
The apparatus comprises:
Means for computing, for each of the plurality of distances, a distance between a current one of the plurality of query feature descriptors and each of the plurality of reference feature descriptors to produce a plurality of computed distances;
Means for determining, for each of the plurality of distances, a first group of the computed plurality of distances and a second group of the computed plurality of distances according to the clustering algorithm-a first of the computed plurality of distances The group includes those of computed distances indicating that the associated of the plurality of reference feature descriptors is near to the query feature descriptor as compared to those of the computed plurality of distances determined to be within the second group of the computed plurality of groups. And wherein the second group of computed plurality of distances comprises the plurality of queries relative to those of the computed plurality of distances that are associated with the plurality of reference feature descriptors to be within the first group of computed plurality of distances. Mark the feature descriptors apart from the current one. Including ones of a plurality of computing the distance; And
Each of the plurality of query feature descriptors associated with a minimum of the computed plurality of distances based on the determined first group of the computed plurality of distances and the second group of the computed plurality of distances. Means for determining whether to match one of the
&Lt; / RTI >
Device for performing visual navigation.

17. The method of claim 16,
For each of the plurality of query feature descriptors, the plurality of computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images are computed for the corresponding group of reference feature descriptors. Means for determining a unique group of reference images to include reference images associated with those of the reference feature descriptors;
Means for assigning a boat to each of the reference images determined to be in a unique group of reference images determined for each of the plurality of query feature descriptors;
Means for ordering the reference images based on assigned boats after allocating boats for each of the plurality of query feature descriptors; And
Means for returning ordered reference images in response to the visual search query
&Lt; / RTI >
Device for performing visual navigation.

The method of claim 11,
The feature descriptor of the above quality,
Extracted from the query image according to a local feature-based visual search algorithm,
Device for performing visual navigation.

The method of claim 18,
The local-feature based visual search algorithm,
Including a scale invariant feature transform (SIFT) algorithm,
Device for performing visual navigation.

The method of claim 11,
Means for receiving the query feature descriptor as a compressed query feature descriptor from a client device through an interface of a visual navigation device; And
Means for reconstructing the query feature descriptor from the compressed query feature descriptor via the visual search device.
&Lt; / RTI >
Device for performing visual navigation.

A device configured to perform visual navigation,
An interface configured to receive a query feature descriptor;
A feature matching unit,
The feature matching unit,
To compute a distance between a query feature descriptor provided by a visual search query and each of a plurality of reference feature descriptors, wherein the visual search query initiates the visual search; And
Determine one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm.
Configured
The first group of computed distances is computed to indicate that an associated of the plurality of reference feature descriptors is near to the query feature descriptor as compared to those of the computed distances determined to be within the second group of computed distances. A second group of computed distances, including those of distances, that the associated of the plurality of reference feature descriptors is away from the query feature descriptor relative to those of the computed distances determined to be within the one or more first groups of computed distances. Include those of the computed distances to indicate; And
The feature matching unit is one of a plurality of reference feature descriptors with which the query feature descriptor is associated with the least of the computed distances based on the determined first group of computed distances and the second group of computed distances. Further configured to determine whether to match
Device for performing visual navigation.

22. The method of claim 21,
The feature matching unit,
one or more first groups of computed distances and the computed distances according to one or more of a k-means clustering algorithm, a Gaussian fitting algorithm, and a graph cutting algorithm To determine the second group of
Configured,
Device for performing visual navigation.

22. The method of claim 21,
The feature matching unit further comprises:
Compute two or more averages of the computed distances determined to be within the first group to produce a first group distance average;
Compute an average of the computed distances determined to be within the second group to produce a second group distance average;
Divide the first group distance average by the second group distance average to produce an average distance ratio measurement;
Compare the average distance ratio measurement to a threshold; And
Based on the comparison to determine if the query feature descriptor matches one of a plurality of reference feature descriptors associated with the minimum of the computed distances.
Configured,
Device for performing visual navigation.

22. The method of claim 21,
The feature matching unit further comprises:
A reference such that the computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images include reference images associated with those of the plurality of reference feature descriptors computed for the corresponding group of reference feature descriptors. Determine a unique group of images;
Assign a bot to each of the reference images determined to be in a unique group of the reference images;
To order the reference images based on the assigned boat; And
To return ordered reference images in response to the visual search query.
Configured,
Device for performing visual navigation.

25. The method of claim 24,
The feature matching unit,
Assigning a constant boat to each of the reference images;
Assigning a boat in proportion to a distance ratio of a corresponding computed distance that is comparable to a computed distance between the query feature descriptor and the nearest one of the plurality of reference feature descriptors; And
To allocate the boat by one or more of allocating the boat in proportion to the rank of the corresponding computed distance in the first group when the computed distances of the first group are ordered from minimum to maximum.
Configured,
Device for performing visual navigation.

22. The method of claim 21,
The query feature descriptor includes one of a plurality of query feature descriptors provided by the visual search query and extracted from query image data according to a visual search algorithm,
The feature matching unit further comprises:
For each of the plurality of query feature descriptors, compute a distance between a current one of the plurality of query feature descriptors and each of the plurality of reference feature descriptors to produce a plurality of computed distances;
For each of the plurality of distances, to determine a first group of the computed plurality of distances and a second group of the computed plurality of distances according to the clustering algorithm—the first group of computed plurality of distances is The associated of the plurality of reference feature descriptors include those of computed distances indicating that the query feature descriptor is near relative to those of the computed plurality of distances determined to be within the second group of the computed plurality of groups; Wherein the second group of computed plurality of distances is compared to those of the computed plurality of distances that are associated with the plurality of reference feature descriptors within the first group of computed plurality of distances. Computing that indicates that you are away from the current one Includes those of a plurality of distances; And
Each of the plurality of query feature descriptors associated with a minimum of the computed plurality of distances based on the determined first group of the computed plurality of distances and the second group of the computed plurality of distances. To determine if it matches one of
Configured,
Device for performing visual navigation.

27. The method of claim 26,
The feature matching unit further comprises:
For each of the plurality of query feature descriptors, the plurality of computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images are computed for the corresponding group of reference feature descriptors. Determine a unique group of reference images to include reference images associated with those of the reference feature descriptors;
Assign a boat to each of the reference images determined to be in a unique group of reference images determined for each of the plurality of query feature descriptors;
After allocating boats for each of the plurality of query feature descriptors, order the reference images based on the assigned boats.
Respectively,
The interface returns ordered reference images in response to the visual search query,
Device for performing visual navigation.

22. The method of claim 21,
The feature descriptor of the above quality,
Extracted from the query image according to a local feature-based visual search algorithm,
Device for performing visual navigation.

29. The method of claim 28,
The local-feature based visual search algorithm,
Including a scale invariant feature transform (SIFT) algorithm,
Device for performing visual navigation.

22. The method of claim 21,
The interface is,
Receive the query feature descriptor as a compressed query feature descriptor from a client device,
The apparatus comprises:
A feature reconstruction unit configured to reconstruct the query feature descriptor from the compressed query feature descriptor via a visual search device,
Device for performing visual navigation.

A computer-readable medium containing instructions,
When executed, the instruction causes one or more processors to:
Compute a distance between each of a plurality of reference feature descriptors and a query feature descriptor provided by a visual search query, wherein the visual search query initiates the visual search; And
Determine one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm, wherein the first group of computed distances is associated with the plurality of reference feature descriptors. And those of computed distances indicating that they are near to the query feature descriptor as compared to those of the computed distances determined to be within a second group of calculated distances, wherein the second group of computed distances includes Includes those of computed distances indicating that an association is away from the query feature descriptor relative to those of computed distances determined to be within one or more first groups of computed distances; And
And based on the determined first group of computed distances and the second group of computed distances, determine that the query feature descriptor matches one of a plurality of reference feature descriptors associated with the least of the computed distances. doing
Computer-readable medium.

The method of claim 31, wherein
When executed, causes the one or more processors to:
one or more first groups of computed distances and the computed distances according to one or more of a k-means clustering algorithm, a Gaussian fitting algorithm, and a graph cutting algorithm Instructions to determine a second group of characters
&Lt; / RTI >
Computer-readable medium.

The method of claim 31, wherein
When executed, causes the one or more processors to:
Compute two or more averages of the computed distances determined to be within the first group to produce a first group distance average;
Compute an average of the computed distances determined to be within the second group to produce a second group distance average;
Divide the first group distance average by the second group distance average to produce an average distance ratio measurement;
Compare the average distance ratio measurement to a threshold; And
Instructions that, based on the comparison, cause the query feature descriptor to match one of a plurality of reference feature descriptors associated with the minimum of the computed distances.
&Lt; / RTI >
Computer-readable medium.

The method of claim 31, wherein
When executed, causes the one or more processors to:
A reference such that the computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images include reference images associated with those of the plurality of reference feature descriptors computed for the corresponding group of reference feature descriptors. Determine a unique group of images;
Assign a bot to each of the reference images determined to be in a unique group of the reference images;
Order the reference images based on the assigned boat; And
Instructions to return ordered reference images in response to the visual search query
&Lt; / RTI >
Computer-readable medium.

35. The method of claim 34,
When executed, causes the one or more processors to:
Assign a constant boat to each of the reference images
Assign a boat in proportion to a distance ratio of a corresponding computed distance that is comparable to a computed distance between the query feature descriptor and the nearest one of the plurality of reference feature descriptors; And
Assigning the boat by one or more of allocating the boat in proportion to the rank of the corresponding computed distance in the first group when the computed distances of the first group are ordered from minimum to maximum. Commands
Further included,
Computer-readable medium.

The method of claim 31, wherein
The query feature descriptor includes one of a plurality of query feature descriptors provided by the visual search query and extracted from query image data according to a visual search algorithm,
The computer-readable medium, when executed, causes the one or more processors to:
For each of the plurality of query feature descriptors, compute a distance between a current one of the plurality of query feature descriptors and each of the plurality of reference feature descriptors to produce a plurality of computed distances;
For each of the plurality of distances, determine, according to the clustering algorithm, a first group of the computed plurality of distances and a second group of the computed plurality of distances-a first group of the computed plurality of distances Includes those of computed distances indicating that the associated of the plurality of reference feature descriptors is near to the query feature descriptor relative to those of the computed plurality of distances determined to be within the second group of the computed plurality of groups. And wherein the second group of computed plurality of distances is associated with the plurality of query features relative to those of the computed plurality of distances that are associated with the plurality of reference feature descriptors within a first group of the computed plurality of distances. The com which marks the engineer as being separated from the current one Tingdoen including ones of the plurality of distances; And
Each of the plurality of query feature descriptors associated with a minimum of the computed plurality of distances based on the determined first group of the computed plurality of distances and the second group of the computed plurality of distances. Instructions to determine whether to match one of the
&Lt; / RTI >
Computer-readable medium.

The method of claim 36,
When executed, causes the one or more processors to:
For each of the plurality of query feature descriptors, the plurality of computed distances determined to be within the first group such that the set of reference images does not include duplicate reference images are computed for the corresponding group of reference feature descriptors. Determine a unique group of reference images to include reference images associated with those of the reference feature descriptors;
Assign a boat to each of the reference images determined to be in a unique group of reference images determined for each of the plurality of query feature descriptors;
Instructions for assigning boats for each of the plurality of query feature descriptors, and then ordering the reference images based on the assigned boats
&Lt; / RTI >
Computer-readable medium.

The method of claim 31, wherein
The feature descriptor of the above quality,
Extracted from the query image according to a local feature-based visual search algorithm,
Computer-readable medium.

The method of claim 38,
The local-feature based visual search algorithm,
Including a scale invariant feature transform (SIFT) algorithm,
Computer-readable medium.

The method of claim 31, wherein
When executed, causes the one or more processors to:
Receive the query feature descriptor as a compressed query feature descriptor from a client device through an interface of a visual search device; And
Instructions to reconstruct the query feature descriptor from the compressed query feature descriptor via the visual search device
&Lt; / RTI >
Computer-readable medium.

As a system,
Input for receiving information received from one or more client devices, the information including a query feature descriptor for initiating a visual search;
A database containing information indicative of a plurality of reference query descriptors; And
A visual search server device configured to perform the visual search
Lt; / RTI >
The visual search server device,
An interface for receiving the query feature descriptor; And
Feature Matching Unit
/ RTI >
The feature matching unit,
Compute a distance between the query feature descriptor and each of the plurality of reference query descriptors;
Determine one or more first groups of computed distances and a second group of computed distances according to a clustering algorithm, wherein the first group of computed distances is associated with the plurality of reference query descriptors. And those of computed distances indicating that they are near to the query feature descriptor as compared to those of the computed distances determined to be within the second group of distances, wherein the second group of computed distances is associated with the plurality of reference query descriptors. Includes those of computed distances indicating that the one is farther from the query feature descriptor relative to those of the computed distances determined to be within the one or more first groups of computed distances; And
Based on the determined first group of computed distances and the second group of computed distances to determine whether the query feature descriptor matches one of a plurality of reference query descriptors associated with the least of the computed distances. Composed,
system.